If the difference between science and superstition is that science is reproducible, then the methods used to conduct a study must be made available so that others can replicate it. Across many disciplines, research now relies on applying code to data sets to produce outputs; those results are analyzed, become the foundation of new knowledge, and themselves become subjects of study. In "Keeping science reproducible in a world of custom data and code", Ben Klemens outlines these fundamentals of scientific research and the challenges to enabling reproducibility.
Journal editors increasingly recognize open data and code as critical pieces of research that authors must provide before a paper is accepted for publication, but this recognition is far from uniform across fields. Klemens checked the submission guidelines of 2,700 Springer journals spanning 13 fields: 37% of ecology journals required data and code submission, compared with 23% in management, 7% in education, and 6% in surgery. Many of these journals, however, accept an author statement that "data is available upon request", and researchers who have tested such statements report receiving no response from the authors, or an unfruitful one. To address this problem, funders often require data management and sharing plans.
Further reproducibility problems arise from poorly articulated procedures, badly formatted code, outright code errors, or variation in the computing platform, software versions, and add-on libraries used. Writing clear, reproducible code takes expertise and time that a research team may not have. In an extension of the "scoop" defense – the fear that others will publish research based on a data set before its collector does – some researchers are reluctant to share their data and code until they have exhausted every use they can imagine, and some funders accordingly grant a period of exclusivity. Yet open data sharing with proper metadata actually broadens the opportunities to receive credit for that work; unfortunately, metrics that count references to data and code are not as widely available as those that count article citations.
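The version-drift problem has a simple partial remedy: record the exact computing environment alongside the results, so a later reader can tell whether a failed replication reflects the analysis or the platform. The sketch below is a hypothetical illustration of that practice for a Python-based analysis, not a method from Klemens's article; the package names in it are placeholders for a study's actual dependencies.

    # Record the computing environment alongside study outputs, so that
    # differences in platform or library versions can be ruled in or out
    # when a replication attempt disagrees with the original result.
    import json
    import platform
    import sys
    from importlib import metadata

    def environment_snapshot(packages):
        """Return a dict describing the platform and installed package versions."""
        snapshot = {
            "python": sys.version,
            "platform": platform.platform(),
            "packages": {},
        }
        for name in packages:
            try:
                snapshot["packages"][name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                snapshot["packages"][name] = "not installed"
        return snapshot

    if __name__ == "__main__":
        # Illustrative dependency list; a real study would list its own stack.
        print(json.dumps(environment_snapshot(["numpy", "pandas"]), indent=2))

Saving that JSON next to the published figures costs a few lines of code, and it gives anyone rerunning the analysis a concrete starting point for diagnosing discrepancies.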
Concerns about how to properly share data and the code that produces study results remain widespread, and incentives still favor piling up article citations. Still, the growing number of journals that hire data editors, grants that fund coders, and tools available for sharing are signs of greater recognition of their importance. The central role of computer analysis is here to stay, and more props are coming for those who make it happen.