Open data and code as partial solutions to scientific research reproducibility

If the difference between science and superstition is that science is reproducible, then the methods used to conduct a study must be made available so that others can replicate it. Across many disciplines, research now relies on applying code to data sets to produce results that are analyzed, and those results in turn become the foundation of new knowledge and the subject of further study. In “Keeping science reproducible in a world of custom data and code,” Ben Klemens outlines these fundamentals of scientific research and the challenges to enabling reproducibility.

Open data and code are increasingly recognized by journal editors as critical pieces of research that authors must provide before a paper is accepted for publication, but this is not equally true across fields. Klemens checked submission guidelines for 2,700 Springer journals: across 13 fields, 37% of ecology, 23% of management, 7% of education and 6% of surgery journals required data and code submission. However, many of these journals accepted an author statement that “data is available upon request,” and researchers who have tested that system have sometimes received no response, or an unfruitful one, from the authors. To address this problem, funders often require data management and sharing plans.

Further reproducibility problems arise from poorly articulated procedures, badly formatted code, code errors, or variations in code platforms, versions, and add-ons. Writing clear, reproducible code takes expertise and time that a research team may or may not have. In an extension of the “scoop” defense (the fear that others will publish research based on the data before the data collector does), some researchers are reluctant to share their data and code until they have used them in every way they can imagine, so some funders do grant a period of exclusivity. In fact, sharing open data with proper metadata offers broader opportunities to receive credit for that work. Unfortunately, metrics that count data and code references are not as widely available as those that count article citations.
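
One frequent culprit named above is variance in platforms and package versions. As a minimal sketch of one possible mitigation (not a method from Klemens's book), the Python snippet below records the interpreter, operating system, and package versions next to a study's results; the file name "environment.json" and the example package list are assumptions for illustration.

import json
import platform
import sys
from importlib import metadata

# Packages assumed to be part of the analysis; adjust to the study's actual stack.
packages = ["numpy", "pandas"]

env = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {},
}
for name in packages:
    try:
        env["packages"][name] = metadata.version(name)
    except metadata.PackageNotFoundError:
        env["packages"][name] = "not installed"

# Write the snapshot alongside the analysis output so a replicator can
# compare environments when results diverge.
with open("environment.json", "w") as f:
    json.dump(env, f, indent=2)

Publishing a snapshot like this with the data and code does not guarantee reproducibility, but it gives anyone attempting a replication a concrete starting point when results differ.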

Concerns about how to properly share data and the code that produces study results are widespread, and incentives still favor the quantity of article citations. Still, the growing number of journals that hire data editors, grants that fund coders, and tools available for sharing are signs of greater recognition of their importance. The central role of computer analysis is here to stay, and more props are coming for those who make it happen.

Timeline: open source, open access & open scholarship

Flavors of Science has updated an interactive timeline of milestones in open source software, open access and open scholarship, starting with basics such as the creation of the printing press in 1439 and ending with the November 2018 release of AmeliCA, a cooperative infrastructure project for scientific communication in Latin America and the Global South. It’s a fascinating history to follow through the development of Multics in 1964, CLACSO in 1967, the Oxford Text Archive in 1976, the GNU Project in 1984, the serials crisis in 1990, the launch of arXiv and the release of the Linux kernel in 1991, and so much more!

The timeline was first developed as a supporting document to the collaborative paper: Tennant, Jonathan, Ritwik Agarwal, Ksenija Baždarić, David Brassard, Tom Crick, Daniel J. Dunleavy, Thomas R. Evans, et al. 2020. “A Tale of Two ‘opens’: Intersections Between Free and Open Source Software and Open Scholarship.” SocArXiv. March 6. https://doi.org/10.31235/osf.io/2kxq8

Publishing, and receiving credit, for software code

Daniel S. Katz and Hollydawn Murray, supported by more than 15 publishers, write a guest post in The Scholarly Kitchen, “Citing Software in Scholarly Publishing to Improve Reproducibility, Reuse and Credit.” They make the case for authors and publishers to publish and cite software itself, rather than merely an article about the code or research generated with it. Properly cited, open software enables others to reproduce research and to modify and reuse the code for further development. Those who write the code deserve credit for this critical work, and they can’t receive it if their software isn’t properly cited. The FORCE11 Software Citation Implementation Working Group has proposed a set of customizable guidelines to clearly identify the software and credit its developers and maintainers.