Scientific software engineering principles
|
Create generic functions for common EHR data cleaning and preprocessing operations which can be shared with the community
|
|
Produce functions for defining study exposures, covariates and clinical outcomes across datasets which can be maintained across research groups and reused across many research studies
|
|
Create modules that logically group common EHR operations (e.g. study population definitions or data source manipulation) to make code easier to maintain
|
|
Create tests for individual functions and modules to ensure the robustness and correctness of results
|
|
Use a source code revision control system to track changes to analytical code and to phenotype definitions that use controlled clinical terminology terms
|
|
Use formal software engineering best-practices to document workflows and data manipulation operations
|
Standardized analytical approaches
|
Build and distribute libraries for common EHR data manipulation or statistical analysis and include sufficient detail (e.g. command line arguments) for all tools used
|
|
Produce and annotate machine-readable EHR phenotyping algorithms that can be systematically curated and reused by the community
|
|
Use Digital Object Identifiers (DOIs) to turn research artifacts into shareable, citable resources and cross-reference them from research output
|
|
Deposit research resources (e.g. algorithms, code) in open-access repositories or scientific software journals and cross-reference them from research output
|
|
Where applicable, use virtual machines to encapsulate the data, operating system, analytical software and algorithms used to generate a manuscript, and make them available so that others can reproduce the analytical pipeline
|
Literate programming
|
Use literate programming approaches and tools to keep the analytical logic and the underlying processing code together in the same document
|
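
As an illustration of the first group of recommendations (generic, shareable cleaning functions accompanied by tests), a minimal Python sketch might look like the following. The function name `clean_measurements`, the record fields (`patient_id`, `date`, `sbp`) and the plausibility range are hypothetical and chosen only for the example; they are not taken from any specific EHR source or library.

```python
from typing import Dict, Iterable, List


def clean_measurements(
    records: Iterable[Dict],
    value_field: str,
    lower: float,
    upper: float,
) -> List[Dict]:
    """Keep records whose value_field is numeric and within [lower, upper].

    Exact duplicates (same patient, date and value) are kept only once.
    Field names are illustrative assumptions, not a real EHR schema.
    """
    seen = set()
    cleaned = []
    for record in records:
        try:
            value = float(record.get(value_field))
        except (TypeError, ValueError):
            continue  # missing or non-numeric value
        if not lower <= value <= upper:
            continue  # implausible value (this also rejects NaN)
        key = (record.get("patient_id"), record.get("date"), value)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append({**record, value_field: value})
    return cleaned


def test_clean_measurements() -> None:
    """Unit test covering duplicate, missing and implausible values."""
    rows = [
        {"patient_id": 1, "date": "2010-01-05", "sbp": "120"},
        {"patient_id": 1, "date": "2010-01-05", "sbp": "120"},  # duplicate
        {"patient_id": 2, "date": "2011-03-09", "sbp": None},   # missing
        {"patient_id": 3, "date": "2012-07-21", "sbp": "999"},  # implausible
        {"patient_id": 4, "date": "2013-02-11", "sbp": 135.0},
    ]
    result = clean_measurements(rows, "sbp", lower=50, upper=300)
    assert [r["patient_id"] for r in result] == [1, 4]
    assert result[0]["sbp"] == 120.0
```

Keeping the plausibility limits as arguments rather than hard-coding them is one way to let the same function be reused across measurements and datasets, and the accompanying test can be run under a test runner such as pytest whenever the code changes.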