Fig. 1 | BioData Mining

Fig. 1

From: Predicting molecular initiating events using chemical target annotations and gene expression

Diagram of the data processing and classifier training procedure. From left to right, LINCS chemical identifiers are matched to DTXSIDs using ChemReg, then combined with chemical-target annotations from RefChemDB to produce the integrated data for model training and evaluation. A subset of “exemplar” chemicals associated with the MIEs to be modeled is excluded from all training data sets for validation purposes. Training data sets for each MIE classifier are then partitioned, and classifiers are trained with 5-fold cross-validation using the caret package in R. Simultaneously, 500 “null” classifiers are generated for empirical significance testing. Performance for each classifier is evaluated using an MIE-specific holdout data set, internal accuracy, and empirical significance testing, which identifies a set of candidate high-performance classifiers. These candidates undergo a final phase of screening using exemplar chemical-based validation.
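
The empirical significance step in the caption can be illustrated with a minimal sketch. This is not the authors' caret workflow. It assumes that each "null" classifier is trained on randomly shuffled labels (the caption does not say how the null classifiers are built), it swaps in a toy one-feature nearest-centroid classifier, and it uses synthetic data. All function names here are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def centroid_cv_accuracy(x, y, k, rng):
    """Mean k-fold accuracy of a 1-D nearest-centroid classifier."""
    accs = []
    for held_out in k_fold_indices(len(x), k, rng):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        pos = [x[i] for i in train if y[i] == 1]
        neg = [x[i] for i in train if y[i] == 0]
        if not pos or not neg:
            continue  # skip a fold whose training part lacks a class
        c1, c0 = mean(pos), mean(neg)
        correct = sum(
            (1 if abs(x[i] - c1) < abs(x[i] - c0) else 0) == y[i]
            for i in held_out
        )
        accs.append(correct / len(held_out))
    return mean(accs)


def empirical_p_value(observed, null_scores):
    """Share of null scores >= observed, with the usual +1 correction."""
    exceed = sum(s >= observed for s in null_scores)
    return (exceed + 1) / (len(null_scores) + 1)


def run_demo(n_null=500, seed=0):
    """Compare one real classifier against n_null label-shuffled ones."""
    rng = random.Random(seed)
    # Synthetic stand-in for the integrated data: one feature, two classes.
    y = [0] * 50 + [1] * 50
    x = [rng.gauss(2.0 * label, 1.0) for label in y]
    observed = centroid_cv_accuracy(x, y, 5, rng)
    null_scores = []
    for _ in range(n_null):
        shuffled = y[:]
        rng.shuffle(shuffled)  # assumption: null = permuted labels
        null_scores.append(centroid_cv_accuracy(x, shuffled, 5, rng))
    return observed, null_scores, empirical_p_value(observed, null_scores)


if __name__ == "__main__":
    obs, nulls, p = run_demo()
    print(f"observed accuracy={obs:.3f}, null mean={mean(nulls):.3f}, p={p:.4f}")
```

With 500 null classifiers the smallest attainable empirical p-value is 1/501, which is one reason a fixed, fairly large number of null models is useful for this kind of test.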
