Showing posts with label Open Access Movement. Show all posts

Tuesday, April 12, 2022

India’s open data movement has found a white knight

 Administrative datasets are generated using public funds but are typically withheld from the public. So I am glad to report that things appear to be changing. In an unprecedented step, the Union ministry of rural development has released data on key facilities (roads, bus stands, schools, hospitals, panchayat offices, agri-markets, etc) across 1 million rural habitations of the country. This dataset is a byproduct of India’s flagship rural roads scheme, the Pradhan Mantri Gram Sadak Yojana (PMGSY).

A key goal of the PMGSY is to provide all-weather roads in the hinterlands to connect rural habitations (clusters of dwellings or village sub-units) to important sites such as schools or bus stands. The ministry used a weighting formula to prioritize roads that would link a habitation to a secondary school, hospital or a mandi (agri-market). To collect data, field engineers fanned out across India over the past few years to record the geographic coordinates of these facilities on an application developed by the Pune-based Centre for Development of Advanced Computing (C-DAC). The data on these facilities have now been released as part of the rural connectivity dataset (https://geosadak-pmgsy.nic.in/OpenData). It is perhaps one of the most granular geo-tagged datasets available in the public domain today. Given the paucity of rural data, this database could help researchers and private firms understand and serve rural India better. The dataset has been released under an open data licence, which means that it can be used freely by both public and private organizations.

Like any other administrative dataset, this one too poses several statistical challenges. Coverage and definitions vary across states because state-level officials were given the discretion to tailor the scheme according to the needs of each region. There could be errors in some location coordinates as well. So the data cannot be naively merged with other databases without accounting for these definitional, coverage, and quality issues.
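One way to act on the caution about location errors is to screen coordinates before any merge. The sketch below is illustrative only: the bounding box for India is an assumed, deliberately generous approximation, and the field names (`facility`, `lat`, `lon`) are hypothetical rather than taken from the ministry's files.

```python
# Assumed, deliberately generous bounding box for India (decimal degrees).
# These limits are an illustration, not values taken from the PMGSY dataset.
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)


def in_box(lat, lon):
    """True if the pair falls inside the assumed bounding box."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])


def classify_point(lat, lon):
    """Label a coordinate pair as 'ok', 'swapped' or 'invalid'."""
    if lat is None or lon is None:
        return "invalid"
    if in_box(lat, lon):
        return "ok"
    if in_box(lon, lat):
        # Latitude and longitude were probably entered in the wrong order.
        return "swapped"
    return "invalid"


def split_records(records):
    """Separate usable records from ones that need manual review."""
    usable, review = [], []
    for rec in records:
        status = classify_point(rec.get("lat"), rec.get("lon"))
        (usable if status == "ok" else review).append({**rec, "status": status})
    return usable, review


# Hypothetical sample rows; the field names are made up for this sketch.
sample = [
    {"facility": "school", "lat": 18.52, "lon": 73.86},
    {"facility": "mandi", "lat": 77.59, "lon": 12.97},    # order swapped
    {"facility": "bus stand", "lat": 0.0, "lon": 0.0},    # placeholder value
    {"facility": "hospital", "lat": None, "lon": 80.27},  # missing value
]

usable, review = split_records(sample)
print(len(usable), "usable;", len(review), "flagged for review")
```

A screen like this catches only gross errors. A point that sits inside the box but in the wrong district would pass, so definitional and coverage differences across states still need separate handling.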

Yet, this data release is highly promising on three counts. First, the data release has been done in an open and accessible format, which makes it easy for developers to build other applications or conduct research. The open data licence will also enable officials in other government departments to mine the data intensively without having to go through a Kafkaesque maze of approvals. The biggest beneficiary of open government data of this kind is the government itself. Despite limitations, rural connectivity data can be of immense value in framing rural policies.

Second, the ministry’s data team is open about both the strengths and weaknesses of the dataset, and is keen to improve data quality. The data team is engaging with data users to make them aware of the potential uses of the dataset, context under which it was collected, and also to collect feedback, said Harsh Nisar, the lead data scientist at the ministry’s data insights unit. The ministry is trying to work out a governance mechanism to incorporate public responses on deficiencies in the dataset, such as missing habitations or roads, he added.

Third, the ministry has tied up with what is perhaps India’s largest open data community, DataMeet. Started by Bengaluru-based techies S. Anand and Thejesh G.N. on 26 January 2011, DataMeet has grown into a country-wide community of data nerds today, with its membership running into the thousands. Like many other journalists, I have benefited from its high-quality discussions and pool of resources. The ministry, too, is likely to gain much from its engagement with DataMeet.

DataMeet acts as a channel of communication among data users through its mailing list, which is also used to update and upgrade its repository of open data and maps. In its early years, the group would petition ministries and departments to open up their datasets. With ministry officials now reaching out to them, life seems to have come full circle for the community. Community partners such as DataMeet can help import the geo-tagged facilities into an open map framework such as OpenStreetMap (an open-source alternative to Google Maps) for wider use, said Nisar.

The rural development ministry’s example could inspire other ministries to start opening up their datasets. Involvement of the open data community in these initiatives can help improve data accessibility and quality. If all open datasets are connected via common geographic identifiers, then they could generate rich insights for both the government and private sector.

This process can become smoother over time if the government standardizes data formats and definitions across states, departments and ministries. Lack of such standardization means that a data user has to use a fair number of assumptions and adjustments to be able to use the available public datasets. This adds to the cost of doing business or research in the country, and slows down innovation. This is where an empowered data regulator such as a statutory National Statistical Commission could play a vital role by harmonizing data standards and pulling up data laggards within the government.

If only the second wish in my wish list were to come true now.

Pramit Bhattacharya is a Chennai-based journalist.

Source: Mintepaper, 12/04/22

Thursday, January 13, 2022

arXiv.org: Free online repository of 2 million research papers

 

arXiv — pronounced ‘archive’ because the ‘X’ stands for ‘chi’, the 22nd letter of the Greek alphabet — is a gigantic online repository of research that physicists, astronomers, computer scientists and mathematicians among others find indispensable.

Over the last two years, non-science specialists and other lay people have read references to “bioRxiv” and “medRxiv” in news reports on the Covid-19 pandemic, frequently described as “preprint servers”.

Both bioRxiv and medRxiv, which have played an invaluable role in quickly disseminating the conclusions of scientific research on the coronavirus to doctors, scientists, and health policymakers around the world, were inspired by arXiv.org, the original preprint server that published its two millionth paper — a numerical analysis titled ‘Affine Iterations and Wrapping Effect: Various Approaches’ — earlier this month.

For 30-plus years

arXiv “started out in 1989 as an e-mail list for a few dozen string theorists”, according to a long profile published on January 10 in Scientific American magazine. In 1991, physicist Paul Ginsparg, who was then a technical staff member at the Los Alamos National Laboratory, automated his colleague Joanne Cohn’s e-mail list, turning it into a repository which anyone could access or submit to, says the article.

Thus was born arXiv, to which as many as 500,000 papers had been submitted by 2008. It took only six years until 2014 for this number to double to a million, and seven more years to double again.
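The two doubling periods can be turned into rough annual growth rates. This is a back-of-the-envelope sketch that assumes steady compound growth within each period, which the article does not claim; the function name is made up for illustration.

```python
def annual_growth_from_doubling(years):
    """Constant yearly growth rate that doubles a quantity in `years` years."""
    return 2 ** (1 / years) - 1


# Doubling periods as stated in the article.
periods = [("500,000 to 1 million", 6), ("1 million to 2 million", 7)]

for label, years in periods:
    rate = annual_growth_from_doubling(years)
    print(f"{label}: about {rate:.1%} a year")
```

Under that assumption the rates come to roughly 12.2% and 10.4% a year. The percentage rate eased slightly, but the absolute flow rose: about 83,000 papers a year on average over the first doubling against about 143,000 a year over the second.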

Ginsparg is now at Cornell University, where arXiv is also located legally. Cohn, whose exchange of string theory manuscripts seeded the idea of arXiv, is at UC Berkeley.

Fast and free

While the material posted on arXiv is not peer-reviewed, it allows the wider community of researchers to circulate their findings quickly and freely pending peer-review. Research could appear online within a day of submission, compared with perhaps several months at the traditional journals. This holds true for the life sciences preprint servers bioRxiv and medRxiv as well — and made an immense contribution to speeding up biomedical research in the literally life-and-death situation of the pandemic.

“It’s like the backbone for our field,” the Scientific American article quoted Alex Kohls, head of the Scientific Information Service at CERN, as saying. “It’s not only a tool for physicists and computer scientists — it has had an impact on the overall scholarly communication process.”

The Scientific American article cited the work of Lanu Kim, who led a study that found that authors of highly-cited arXiv papers were increasingly likely not to publish in a traditional journal at all. Kim’s team, the article said, found that the journals still had a significant impact on citations, but they were now more like curators than the main distributors of research.

Some concerns

But there are problems as well. arXiv acknowledges support from the Simons Foundation based in New York City and a large number of academic and research institutions around the world but is still short of resources. A small paid staff helps volunteer moderators handle up to 1,200 submissions every day, according to the Scientific American article. “We are understaffed and underfunded — and have been for years,” the article quoted Steinn Sigurdsson, the scientific director of arXiv, as saying.

The article also flagged concern over some of the moderation policies at arXiv, quoting, among others, physicist Deepak Vaid of the National Institute of Technology Karnataka, Surathkal: “They are taking actions which seem to go against what the role of a preprint server should be.” Dr Vaid, the article said, pointed to “inconsistent moderation and a lack of transparency”.

Source: Indian Express, 13/01/22

Friday, January 08, 2021

Draft Science, Technology and Innovation Policy Proposes Major Changes to India’s Open Access Culture

On 2/01/2021, the Ministry of Science and Technology rolled out the draft version of the proposed Science, Technology and Innovation Policy. The process of framing a new policy to succeed the 2013 one has been in the works since May 2020 (see here for our coverage of the same). After a claimed four-track process and “nearly 300 rounds of consultations with more than 40,000 stakeholders well distributed in terms of region, age, gender, education, economic status, etc”, the present draft has been brought out for public consultation. The substantive portion of the policy is spread over eleven chapters, and for the purpose of this post we shall discuss the first chapter, titled “Open Science”. It is to be noted that the Ministry of Science and Technology is concerned only with STEM-type sciences, while research in the social sciences seems to fall under the ambit of the Indian Council of Social Science Research, under the Ministry of Education. Therefore, on the face of it, research in the social sciences is not covered by this policy, and it would be pertinent to see whether the Ministry of Education joins this endeavour. As per the Press Release, the draft is open for comments till Jan 25, 2021 on email: india-stip[at]gov[dot]in

Is Access now granted (read Open)? 

The draft policy places a lot of importance on Open Science and the need for publicly funded research to be inclusive and accessible. In pertinent part it states: 

“Open Science fosters more equitable participation in science through diverse steps like increasing access to research outputs, more transparency and accountability in research, inclusiveness, better resource utilisation through minimal restrictions on reuse of research outputs and infrastructure, and ensuring constant exchange of knowledge between producers and users of knowledge. It is important to make publicly-funded research output and resources available to all to foster learning and innovation. STIP aspires to build an ecosystem where research data, infrastructure, resources and knowledge are accessible to all.” (emphasis provided)

Open Access Portal: The policy proposes to establish an open access, interoperable portal called the Indian Science and Technology Archive of Research (INDSTA). The portal shall be dedicated “to provide access, specifically, to the outputs of all publicly-funded research (including manuscripts, research data, supplementary information, research protocols, review articles, conference proceedings, monographs, book chapters, etc.).” Notably, INDSTA is to also support text and data mining, querying and visualisations. 

Open Data: Importantly, the draft policy also proposes to make available all the data used in and generated from publicly funded research to the scientific community and public at large. The Policy suggests that all the data shall be available in Findable, Accessible, Interoperable and Reusable (FAIR) terms. These guiding principles provide both machines and humans better ability to engage with the vast amounts of data that is being generated in scientific eco-systems. (More on FAIR principles can be read here.) 

It also states that, wherever applicable on grounds of privacy, national security and IPRs, data will be made available to the public subject to anonymisation or redaction. If that is not possible, it will still be made available to “bonafide and authorized researchers”. While it is understandable that not all data can (or should) be made available, this does require clarity on what qualifies a researcher as ‘bonafide’, what type of data may be kept away from the general public, etc.

Post-Print Repositories: The policy calls for an important Open Access mandate on manuscripts coming from public funds. It states,

“Full text of final accepted author versions of manuscripts (postprints and optionally preprints) along with supplementary materials, which are the result of public funding or performed in publicly funded institutions, or were performed using infrastructure built with the support of public funds will be deposited, immediately upon acceptance, to an institutional repository or central repository”.

The draft doesn’t elaborate on this, but a requirement of this type would also have the added benefit of pushing all publicly funded manuscripts away from ‘closed’ journals, which traditionally don’t allow post-prints (i.e., post peer review versions) to be shared in an accessible manner. This institutional push away from ‘closed’ journal publishing is a huge step in itself, as such mandates may be the only way of getting around the high-pressure academic publishing environment that often pushes (i.e., forces) academics to publish in closed journals on the basis of impact factor, reputation, etc. And regarding the central repository: unlike Mendeley, SSRN etc, there is no question of a giant publisher acquiring it, since it would be a government repository.

One Nation, One Subscription

The most notable feature of the policy is the call for one centrally negotiated subscription which will enable access to “all individuals in India”. While this would have huge repercussions, the draft policy currently gives little detail. The whole provision is reproduced below:

“The Government of India will negotiate with journal publishers for a ‘one nation, one subscription’ policy whereby, in return for one centrally negotiated payment, all individuals in India will have access to journal articles. This will replace individual institutional journal subscriptions.”

It is laudable that such a radical proposal is being considered in a way that makes clear that the research communities’ concerns regarding access and excessive subscription fees have been heard. While this, if successfully implemented, would be a game changer for researchers in the country, a lot depends on how large the theory-practice gap is when this provision is sought to be converted from paper to practice. As noted in an earlier post – there is a strong need to question why so many people need to depend on shadow libraries in the first place – and this policy proposal goes right to the heart of that question – but in its current limited form, leaves many other questions open. 

Firstly, would journal publishers be open to such a proposal? While it would certainly make their job easier to negotiate with a single bulk governmental consumer, would it make business sense (read: profit maximisation) for them to provide access to ‘all individuals in India’ at one price? On the other hand, demand for access to top scientific journals is inelastic, i.e., at the end of the day, institutes need this access if they want their researchers to be internationally relevant. And at this unprecedented scale of an India-wide subscription, will the Government end up paying whatever ridiculous price the journals put forth? (relevant – see here and here). Another question is who will now decide which journals are worth subscribing to. This is especially relevant since the draft says this will replace individual institutional journal subscriptions. (This would presumably be more problematic in the social sciences, where various other considerations could come into the picture, but perhaps a less troubled, even if still tedious, issue within STEM sciences.) Given market dynamics, if an individual or private institute wants or needs to subscribe to a journal outside the government-selected ones, is there a chance that these non-subscribed journals will become even higher priced, since the only ones who go after them will presumably have a higher demand for them?

Regardless, much of the direction of this policy marks significant progress by the Indian government towards a culture of greater, open access. It also shows an understanding that publicly funded research is meant for the public (see here and here), as well as a desire to tap the vast catalytic potential that such access would provide. It remains to be seen whether converting this into implementation is feasible.

The Consultation Process

The consultation and public participation behind the policy merit appreciation independent of the policy document. The policy discloses that close to 300 rounds of consultation have taken place in its formation since May 2020. The participative model behind the policy is based on four interdependent tracks.

  • Track I is concerned with creating a repository of public voices to guide the drafting process.
  • Track II consults 21 expert-driven thematic collectives to feed evidence-based recommendations into the drafting process.
  • Track III comprises engaging with ministries through nominated nodal officers.
  • Track IV (a bit ambiguous) involves apex-level multi-stakeholder engagement at national and global levels.

The independent organization Science Policy Forum (SPF) led the Track I initiatives and devised six instruments for fulfilling the commitments therein (more about these instruments can be found here).

Additional notes

The Open Science portion of the document also touches upon other important aspects, even if only briefly. It looks at the infrastructural needs of the community by calling for libraries at publicly funded universities to be accessible to the public without undue hassle. It further endeavours to make ‘learning spaces’ universally accessible, “especially for people with special needs”, and seeks to enable the right of attribution, preservation and translation (especially into regional languages) of publicly funded educational resources. The policy also highlights the need to improve awareness and accessibility of Indian journals internationally, as well as the issue of predatory journals in India. To that extent, the limited text in the draft policy does seem to reflect a well-rounded understanding of the problems of access in India. However, the devil is often in the details, and only when those details are available will we know if the solutions also reflect an understanding of these problems. As mentioned above, the draft policy is laudable for its initiative to rattle the cage. It is yet to be seen, however, whether the proposed ‘maverick-esque’ solutions have the teeth needed to take the proposed bite.

 

Source: SpicyIP, 4/01/21

 

Monday, August 13, 2018

Let’s share

The free software and open access movements are among the most important developments after the rise of the world wide web


Back in 2010, Aaron Swartz, a tech prodigy and political activist, sneaked into a basement closet at the Massachusetts Institute of Technology (MIT) and secretly connected his Acer laptop to the institute’s high-speed internet network. Using MIT’s credentials, he gained access to JSTOR, a digital academic database, and began to download thousands of files. Once a talented entrepreneur, he had renounced the traditional Silicon Valley career path for a non-conformist campaign of rigorous public-interest activism.
But what was his intent behind downloading these articles? He aimed to make these journals openly available to online users, as he believed in the fundamental principle of freedom of information. He also believed that the digitised academic documents behind the JSTOR paywall were for public use and needed to be liberated by guerrilla action, as he had argued in his “Guerrilla Open Access Manifesto” in 2008.
Swartz’s antics were detected, and he was soon apprehended. The several million documents he downloaded were never released. The US attorney’s office in Boston indicted him in federal court, and he eventually faced 13 felony charges carrying a potential prison term of decades. He was never tried or sentenced: in January 2013, Swartz hanged himself in his apartment in Brooklyn. His death drew media attention and can be seen as a lens for reconsidering the entire history of copyright in the digital age. It also gave fresh impetus to the Free Culture movement.
“Information wants to be free” goes the slogan of the social movement encouraging open-source software, file sharing and a permissive legal environment for modifying and distributing the creative works in the form of open content or free content by using the internet and other forms of media. The free software and open access movements are among the most important developments after the rise of the world wide web. Swartz was not the only internet activist who believed in the concept of an open and free internet. There were people like Richard Stallman, who gave birth to the term “free software”, free as in freedom, not free as in no cost.
The aura of the information age is not just about new ideas but about a shift in the paradigms of communication and control. In this age of digital feudalism, we do not actually own the products we buy; we are merely granted limited use of them as long as we continue to pay the rent. The radical expansion of intellectual property (IP) rights threatens to reach the point where they suppress any and all other rights of the individual and society. The current copyright laws have hindered creativity and resulted in a read-only internet culture in which we only consume information and content, despite technological advances that make it easy to create and contribute to culture. Copyright law doesn’t extend neatly to the digital world, and the digital rights management tools the industry is developing to maintain copyright control are dampening the growth of a rich read-write culture.
We need to bring that open-source mentality to the content layer. Two-thirds of all websites run on open-source software, but most of the premium academic resources remain closed behind digital gates. The Directory of Open Access Journals reports that nearly 4,000 publications are available to the masses via the internet, a number that grows rapidly each year. It is essential to liberate data, liberate knowledge — especially data that taxpayers have already paid for.
Thanks to the Free Culture movement, vast knowledge repositories like Wikipedia and Stack Exchange and open access efforts like the science article sharing site arXiv.org have flourished as they permit content to be re-used for free and built upon, and many major websites offer Creative Commons (CC) licensing as part of their user interfaces (UI). In 2012, Google launched a worldwide campaign named Take Action for building a free and open world wide web. Here is the kernel of Google’s argument: “A free and open world depends on a free and open internet. Governments alone, working behind closed doors, should not direct its future. The billions of people around the globe who use the internet should have a voice”.
In India, the campaign for an open and free internet was triggered in 2015, when Airtel planned to charge extra for Voice over Internet Protocol (VoIP) services such as Skype. Airtel also had a tie-up with e-commerce website Flipkart for a platform called Airtel Zero, which gave users free access to Flipkart’s website while access to other sites was still charged, a practice dubbed “zero rating”. Airtel scrapped both the platform and the plan following a large-scale public outcry, which eventually spawned broader discussions and deliberations in India over net neutrality.
Recently, India’s inter-ministerial Telecom Commission took a landmark decision by accepting the recommendations of the Telecom Regulatory Authority of India (TRAI) and finally adopting Net Neutrality rules (keeping the internet free from corporate and political interests), granting 1.3 billion people equal access to online content. This is consequential for India, a developing nation and the world’s largest functioning democracy, as an open internet fosters online democracy and innovation and creates a level playing field for everyone.
A free and open culture would enrich our world in immeasurable ways. Let’s take a step closer to a world in which free access to knowledge is a basic human right and sharing is the norm, not the exception.
Source: Indian Express, 13/08/2018