Creating open scholarship by teaching and learning with open scholarship, in community

The authors of “Toward a culture of open scholarship: the role of pedagogical communities” note that as the open scholarship movement gains momentum, its goals of social justice, research quality, and an inclusive research culture are further advanced by training scholars in the practices of study preregistration, data sharing, replication studies, and open access publishing. They argue that “open scholarship is incomplete without open educational practices.”

Integrating these practices throughout higher education curricula is better achieved with pedagogical communities. The authors name several, including the Open Scholarship Knowledge Base (OSKB), Principles and Practices of Open Research (PaPOR TraIL), and Reproducibility for Everyone (R4E), but they elaborate on the Framework for Open and Reproducible Research Training (FORRT). FORRT includes 12 initiatives to date, among them a glossary of open scholarship terms, summaries of open and reproducible science literature, and lesson plans. These pedagogical communities foster participation and collaboration, driving a grassroots movement for open scholarship to generate knowledge as a public good for all of humanity.

Open data and code as partial solutions to scientific research reproducibility

If the difference between science and superstition is that science is reproducible, then the methods used to conduct a study must be made available so that others can replicate it. Across many disciplines, research now relies on applying code to data sets to produce outputs that are analyzed, and these results then become the foundations of new knowledge and subjects of study. In “Keeping science reproducible in a world of custom data and code,” Ben Klemens outlines these scientific research fundamentals and the challenges to enabling reproducibility.

Open data and code are increasingly recognized by journal editors as critical pieces of research that authors must provide prior to acceptance for publication, but this is not true equally across fields. Klemens checked submission guidelines for 2,700 Springer journals. Across 13 fields, 37% of ecology, 23% of management, 7% of education, and 6% of surgery journals required data and code submission. Many of these journals, however, accepted an author statement that “data is available upon request,” yet some researchers who have tested such statements have received no response, or unfruitful ones, from the authors. To address this problem, funders often require data management and sharing plans.

Further reproducibility problems arise from poorly articulated procedures, badly formatted code, code errors, or variances in code platform, versions, and add-ons. Writing clear, reproducible code takes expertise and time that a research team may or may not have. In an extension of the “scoop” defense, the fear that others will publish research based on data before the data collector does, some researchers are reluctant to share their data and code until they have used them in every way they can imagine, so some funders do grant a period of exclusivity. In fact, sharing data openly with proper metadata offers broader opportunities to receive credit for that work. Unfortunately, the metrics to count data and code references are not as widely available as those to count article citations.
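The platform- and version-variance problem Klemens describes can be reduced by having an analysis script record its own execution environment and fix its random seed. Below is a minimal sketch of that practice in Python; the function names (`environment_record`, `run_analysis`) and the toy analysis are illustrative assumptions, not anything from the article.

```python
import json
import platform
import random
import sys


def environment_record(package_names):
    """Capture interpreter, OS, and package versions to archive alongside results."""
    record = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in package_names:
        try:
            module = __import__(name)
            # Many packages expose __version__; stdlib modules often do not.
            record["packages"][name] = getattr(module, "__version__", "unknown")
        except ImportError:
            record["packages"][name] = "not installed"
    return record


def run_analysis(seed=42):
    """A toy 'analysis' whose output is reproducible because the seed is fixed."""
    random.seed(seed)
    sample = [random.random() for _ in range(5)]
    return sum(sample) / len(sample)


if __name__ == "__main__":
    result = run_analysis(seed=42)
    # Archiving this record next to the output lets others rebuild the setup.
    print(json.dumps(environment_record(["random"]), indent=2))
    print(f"mean of sample: {result:.6f}")
```

The same idea underlies lockfiles (`requirements.txt`, `renv.lock`) and container images: the environment that produced a result travels with the result.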

Concerns are widespread about how to properly share data and the code that produces study results, and incentives still favor quantity of article citations. Still, the growing number of journals that hire data editors, grants that fund coders, and available sharing tools are signs of greater recognition of their importance. The central role of computer analysis is here to stay, and more credit is coming for those who make it happen.