Purpose: Scientific evidence builds upon the publication of research findings. This process depends critically on complete and transparent reporting of methods and data integral to findings. However, multiple forces have conspired to discourage reproducible research practices. Reproducible research is work in which the original data, code, and other materials (e.g., protocol, software) are available so that findings can be reproduced and potentially replicated. This review synthesizes reproducible research methods, policies, and future directions pertinent to comparative effectiveness researchers.
Method: We reviewed the published literature describing methods and policies for reproducible research. Searches were limited to English-language articles published from 1/1966 to 4/2012, using keywords such as “reproducible,” “data-sharing,” and “publishing standards.” Additional articles were identified from bibliographies, the related title function of PubMed, and key informants; policies of U.S. funding agencies were also examined. We were specifically interested in reproducible research methods from diverse fields, including the clinical sciences, epidemiology, biological sciences, computational mathematics, and computer science.
Result: A total of 234 potentially relevant articles were identified; 122 described methods, policies, or future directions and were included. While federal funding agencies have data sharing policies, many of these policies provide weak incentives for data sharing and do not address the sharing of code or other materials necessary for reproducible research. Potential policies to promote reproducible research include: registration of all studies, including observational studies; assignment of digital object identifiers to give credit to sharers of materials; funding for data sharing and reproducible research practices; and the use of data repositories linked to study publications. The Creative Commons offers free use of “off the shelf” licensing options to investigators. Although substantial work has been done to develop methods to support de-identification of data, a set of standards agreed upon by funders, ethics committees, and institutions has not been established. Few studies have examined the costs of deposition and sharing of materials.
Conclusion: Adoption of reproducible research methods has been most rapid in the basic and computational sciences, but these methods could be readily applied to decision modeling, systematic reviews, observational studies, and clinical trials. Institutions, journals, and funding agencies all have a role to play in promoting reproducible research practices. Additional infrastructure and training of investigators are needed to support sharing of data, code, and materials.
See more of: The 34th Annual Meeting of the Society for Medical Decision Making