Table 2 Methods and approaches that can enable the reproducibility of biomedical research findings using electronic health records

From: Methods for enhancing the reproducibility of biomedical research findings using electronic health records

Method/approach

Recommendations

Scientific software engineering principles

Create generic functions for common EHR data cleaning and preprocessing operations which can be shared with the community

Produce functions for defining study exposures, covariates and clinical outcomes across datasets which can be maintained across research groups and reused across many research studies

Create modules that logically group common EHR operations (e.g. study population definitions or data source manipulation) to improve code maintainability

Create tests for individual functions and modules to ensure the robustness and correctness of results

Track changes in analytical code and in phenotype definitions that use controlled clinical terminology terms by using a source code revision control system

Use formal software engineering best practices to document workflows and data manipulation operations

Standardized analytical approaches

Build and distribute libraries for common EHR data manipulation or statistical analysis and include sufficient detail (e.g. command line arguments) for all tools used

Produce and annotate machine-readable EHR phenotyping algorithms that can be systematically curated and reused by the community

Use Digital Object Identifiers (DOIs) to turn research artifacts into shareable, citable resources and cross-reference them from research output

Deposit research resources (e.g. algorithms, code) in open-access repositories or scientific software journals and cross-reference them from research output

Consider using virtual machines to encapsulate the data, operating system, analytical software and algorithms used to generate a manuscript and, where applicable, make them available for others to reproduce the analytical pipeline

Literate programming

Encapsulate both logic and programming code using literate programming approaches and tools which ensure logic and underlying processing code coexist
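
As a rough illustration of the software engineering recommendations in the table (shared cleaning functions, reusable phenotype definitions, and tests), the following minimal Python sketch may help. It is not taken from the article: the `Phenotype` structure, the function names, the terminology label and the codes are all hypothetical placeholders, and a real implementation would use a validated code list from a controlled clinical terminology.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Phenotype:
    """Machine-readable phenotype definition (hypothetical structure)."""
    name: str
    terminology: str          # label of the controlled terminology in use
    codes: FrozenSet[str]     # codes stored in normalised form
    version: str              # lets changes be tracked under revision control


def normalise_code(raw: str) -> str:
    """Generic cleaning step: trim whitespace, drop dots, upper-case."""
    return raw.strip().replace(".", "").upper()


def first_event_date(
    records: Iterable[Tuple[str, date]], phenotype: Phenotype
) -> Optional[date]:
    """Return the earliest date on which any phenotype code was recorded."""
    dates = [d for code, d in records if normalise_code(code) in phenotype.codes]
    return min(dates) if dates else None


# Placeholder definition: the terminology and codes are invented for
# illustration and do not describe a real clinical phenotype.
EXAMPLE_PHENOTYPE = Phenotype(
    name="example_condition",
    terminology="EXAMPLE-TERMINOLOGY",
    codes=frozenset({"E100", "E101"}),
    version="0.1.0",
)


def test_normalise_code() -> None:
    assert normalise_code(" e10.0 ") == "E100"


def test_first_event_date() -> None:
    records = [
        (" e10.0 ", date(2015, 3, 1)),
        ("E101", date(2014, 6, 12)),
        ("Z999", date(2010, 1, 1)),   # not part of the phenotype
    ]
    assert first_event_date(records, EXAMPLE_PHENOTYPE) == date(2014, 6, 12)
    assert first_event_date([("Z999", date(2010, 1, 1))], EXAMPLE_PHENOTYPE) is None


if __name__ == "__main__":
    test_normalise_code()
    test_first_event_date()
```

Keeping the definition as data (name, terminology, codes, version) separate from the functions that apply it is one possible design: the definition can then be versioned, cited and reused across studies, while the test functions can be run by a test runner such as pytest or directly as a script.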