Open data and code as partial solutions to scientific research reproducibility

If the difference between science and superstition is that science is reproducible, then the methods used to conduct a study must be made available so that others can replicate it. Across many disciplines, research now relies on applying code to data sets to produce outputs that are analyzed; those results then become the foundation of new knowledge and the subject of further study. In "Keeping science reproducible in a world of custom data and code," Ben Klemens outlines these fundamentals of scientific research and the challenges to enabling reproducibility.

Open data and code are increasingly recognized by journal editors as critical components of research that authors must provide prior to acceptance for publication, but this requirement is not applied equally across fields. Klemens checked the submission guidelines of 2,700 Springer journals across 13 fields: 37% of ecology, 23% of management, 7% of education and 6% of surgery journals required data and code submission. Many of these journals accept an author statement that "data is available upon request," yet researchers who have tested this system have received no response, or unfruitful ones, from the authors. To address this problem, funders often require data management and sharing plans.

Further reproducibility problems arise from poorly articulated procedures, badly formatted code, code errors, or variances in code platform, versions and add-ons. Writing clear, reproducible code takes expertise and time that a research team may not have. In an extension of the "scoop" defense – the fear that others will publish research based on a data set before the data collector does – some researchers are reluctant to share their data and code until they have exhausted every use they can imagine, so some funders do grant a period of exclusivity. In fact, open data sharing with proper metadata offers broader opportunities to receive credit for that work. Unfortunately, metrics that count data and code references are not as widely available as those that count article citations.
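One common mitigation for the platform- and version-variance problem mentioned above is to publish a record of the exact computing environment alongside the analysis code. A minimal Python sketch of such a report, using only the standard library (the package list passed in is illustrative; a real analysis would list its own dependencies):

```python
import sys
from importlib import metadata

def environment_report(packages):
    """Build a reproducibility report: the Python version plus pinned package versions."""
    lines = [f"python=={sys.version.split()[0]}"]
    for pkg in packages:
        try:
            lines.append(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing: a reader should see what is missing.
            lines.append(f"{pkg}  # not installed")
    return "\n".join(lines)

# Example: pin the packages an analysis depends on (names are illustrative).
print(environment_report(["pip"]))
```

Including such a report in a paper's supplementary materials lets a replicator reconstruct the original environment rather than guess at it.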

Concerns are widespread about how to properly share data and the code that produces study results, and incentives still favor quantity of article citations. Still, the increasing number of journals that hire data editors, grants that fund coders, and tools available for sharing are signs of greater recognition of their importance. The central role of computer analysis is here to stay, and more credit is coming for those who make it happen.

Policy Commons: bringing the works of research centers and think tanks to light

At last week’s NISO Open Research virtual conference, Toby Green of CoherentDigital.net described how the research of thousands of governmental agencies, non-governmental organizations, research centers and think tanks is published in works without metadata, unique identifiers or other standards that would enable it to be found in library catalogs or the major search tools that many researchers use. Consequently, a vast body of research produced by experts, supported by data and reviewed, is lost to communities that could benefit from it. It is “on the dark side of the moon.”

Policy Commons is an initiative to work with these organizations to ingest, describe and index their research in ways that the public can find and use. The database includes entries with links to the text of over 2.5 million papers from thousands of organizations. Individuals can register to search for and access content for free, up to 25 searches per month; membership gives unlimited searching. Fees are collected from research groups and institutions for higher capacity to harvest, upload and organize their works. Any registered user can upload their own content.

The vision and goals of Policy Commons are worthy and its coverage is broad. It includes 317 works published in Mali and 1,801 from the Seychelles, for example. A user has multiple options for finding content – browsing by topic, identifying organizations, viewing publications or tables – and then applying filters for language, publication type, publisher type, year published, publisher country, and more. You can also conduct a simple or advanced search. The advanced search starts with a title, summary or full-text search, after which you can limit by any facet or combination of facets. I found 6 reports with “voting” in the title published in Dutch between 2010 and 2021. To fully integrate with other catalogs and search tools, Policy Commons could make a broader contribution to knowledge management by adopting standard identifiers, such as those for digital objects, organizations and authors. Even with plenty of room for further development, it already serves a need for high-quality information retrieval.

Reviewing open access in Canada in the wake of Covid-19

In a two-part series on the Scholarly Kitchen blog, Leigh-Ann Butler, Shannon Cobb and Michael Donaldson, three authors representing Canadian national funding agencies and a not-for-profit scientific publisher, reflect on the impact of the Covid-19 pandemic on open access publishing in Canada. Their stated goal is greater collaboration “to advance science for the public good” in a scholarly publishing ecosystem dominated by not-for-profit publishers and university presses.

In Part 1, the authors review the open access policies of three federal granting agencies – the Natural Sciences and Engineering Research Council of Canada (NSERC), the Social Sciences and Humanities Research Council (SSHRC) and the Canadian Institutes of Health Research (CIHR) – the Fonds de recherche du Québec (FRQ, the province of Quebec’s research funding body) and the Université de Montréal, all of which had established, before 2020, a requirement that funded research be deposited in an open repository within 12 months of article publication. The Université de Montréal’s policy covers articles, book chapters and conference proceedings. FRQ has since amended its policy to eliminate the 12-month embargo as of 2023. These open access policies complemented new initiatives to make Covid-19 and related research and data openly available. Many publishers eliminated paywalls to research and data to encourage rapid and wide sharing of Covid research. However, some of these policy changes were limited in scope and temporary. Questions remain about increasing compliance with open access requirements and expanding participation in knowledge production. The Ownership, Control, Access, and Possession (OCAP)™ principles of Canada’s First Nations communities are recognized as a positive example of legitimizing different perspectives and needs within research processes and outcomes. Much more must be done.

In Part 2, the authors address three areas – the pre-publication stage, peer review and infrastructure – that need attention to make research more permanently and widely open. Researchers shared preprints at a much higher rate in the past year in an effort to quickly solve the problems associated with Covid-19. Sharing research earlier in the production lifecycle raised questions of validity and trust, and the traditional publication peer review process was challenged. Certainly the need for research review was highlighted, and openly sharing methodologies, data and analysis contributed to a faster and more widely participatory review process. Making research outputs available without fees to users has been recognized as only part of open scholarly exchange; the content must also have standard metadata, persistent identifiers (ORCiD and DOIs, for example), and indexing, and it must be delivered on open-by-default platforms. Without this infrastructure, research is not discoverable or usable. Financial stability is also needed to maintain these systems, and shrinking Covid-19 budgets threaten the organizations that have been supporting open scholarship.

The authors of this series on open access in Canada address goals, rationales and challenges to open scholarship that are familiar to readers everywhere. I hope they will follow up with proposed solutions and actions taken to achieve the goal of advancing science for the public good.

Inclusive access to college textbooks: is it the right choice?

Inclusive access, or automated textbook billing, has become a popular sales model for textbook publishers to provide course materials to students. Pitched by publishers as a way to save students money on skyrocketing textbook prices, the inclusive access model may not be entirely positive. The new InclusiveAccess.org initiative was developed by open education resource organizations – Creative Commons, SPARC, the Student PIRGs and others – to provide facts about automated textbook billing programs and their consequences. The site has sections for administrators, students, faculty and policymakers, and addresses myths vs. facts, frequently asked questions and resources for further action.

Use Google Scholar citations to update your ORCiD profile

There are many and ever-growing reasons to create, populate and maintain an ORCiD unique researcher and contributor identifier and profile, but a barrier can be the time spent populating your profile with your complete works, especially those created or published before the publisher was integrated with ORCiD. ORCiD is integrated with a number of databases, such as the MLA International Bibliography, Scopus and The Lens, from which you can import citations or patents. Still, you may not find as many of your works in these databases as are cited by Google Scholar. Though Google Scholar is not yet integrated with ORCiD, you can export citations from your Google Scholar profile to BibTeX, and then import that file to your ORCiD profile. This blog post provides step-by-step instructions with screenshots for using this method to build out your ORCiD profile.
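Before importing the exported BibTeX file into ORCiD, it can be worth sanity-checking it, since bulk exports sometimes contain duplicates or malformed entries. A minimal Python sketch, using only the standard library, that counts the entries and lists their titles (the regex handles simple brace-delimited fields, not every BibTeX edge case):

```python
import re

def summarize_bibtex(bibtex_text):
    """Count BibTeX entries and extract their titles with a simple regex scan."""
    # Each entry starts with "@type{" (e.g. @article{, @inproceedings{).
    entries = re.findall(r"@(\w+)\s*\{", bibtex_text)
    # Capture brace-delimited title fields; nested braces are not handled.
    titles = re.findall(r"title\s*=\s*\{+([^}]*)\}", bibtex_text)
    return {"entry_count": len(entries), "titles": titles}

sample = """@article{doe2020,
  title={Open data and reproducibility},
  author={Doe, Jane},
  year={2020}
}"""
print(summarize_bibtex(sample))
```

A quick scan like this makes it easy to spot missing titles or an unexpected entry count before uploading the file to ORCiD.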