Fig. 1 | BioData Mining

Fig. 1

From: Detecting gene-gene interactions using a permutation-based random forest method

Overview of the permuted Random Forest (pRF). Panel a shows the original dataset with the SNP genotypes (0, 1 or 2) and the class (case-control status). Each row represents a sample; three different colors in the SNP columns indicate the genotypes, and two colors in the class column indicate case-control status. Panel b shows the first permutation framework, which keeps the SNPs’ main effects: cases and controls are separated, and the two selected SNP columns are shuffled independently of each other within each class. Panel c shows the second permutation framework, which keeps both the SNPs’ interaction and main effects: cases and controls are separated, and the two selected SNPs are shuffled together, preserving their genotype combinations, separately within each class. RF is trained on the original dataset and tested on the datasets from the two permutation schemes. Error rates are calculated by averaging the classification errors across all samples. The same process is repeated 10 times, and the error rates are averaged over the 10 permutation results. The average classification error from the first permutation framework is named E1, and that from the second permutation framework is named E2. The whole process is repeated for all pairs of SNPs, and the differences in average error rates (ΔE = E1 - E2) are calculated and ranked to identify the top candidates.
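
The two permutation schemes and the ΔE score described in the caption can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names (`permute_separately`, `permute_jointly`, `error_rate`, `delta_error`) are hypothetical, the dataset is assumed to be a list of genotype rows with a parallel list of class labels, and `predict` stands in for any classifier already trained on the original data (a Random Forest in the paper).

```python
import random


def permute_separately(X, y, i, j, rng):
    """Scheme (b): shuffle SNP columns i and j independently within each class."""
    Xp = [row[:] for row in X]
    for label in sorted(set(y)):
        idx = [k for k, c in enumerate(y) if c == label]
        for col in (i, j):
            values = [X[k][col] for k in idx]
            rng.shuffle(values)  # each column gets its own shuffle
            for k, v in zip(idx, values):
                Xp[k][col] = v
    return Xp


def permute_jointly(X, y, i, j, rng):
    """Scheme (c): shuffle SNPs i and j together, keeping genotype pairs intact."""
    Xp = [row[:] for row in X]
    for label in sorted(set(y)):
        idx = [k for k, c in enumerate(y) if c == label]
        pairs = [(X[k][i], X[k][j]) for k in idx]
        rng.shuffle(pairs)  # one shuffle moves both genotypes as a unit
        for k, (a, b) in zip(idx, pairs):
            Xp[k][i], Xp[k][j] = a, b
    return Xp


def error_rate(predict, X, y):
    """Fraction of samples whose predicted class differs from the true class."""
    preds = predict(X)
    return sum(p != t for p, t in zip(preds, y)) / len(y)


def delta_error(predict, X, y, i, j, n_perm=10, seed=0):
    """Return E1 - E2 for SNP pair (i, j), each averaged over n_perm permutations."""
    rng = random.Random(seed)
    e1 = sum(error_rate(predict, permute_separately(X, y, i, j, rng), y)
             for _ in range(n_perm)) / n_perm
    e2 = sum(error_rate(predict, permute_jointly(X, y, i, j, rng), y)
             for _ in range(n_perm)) / n_perm
    return e1 - e2
```

Calling `delta_error` for every pair of SNP column indices and sorting the pairs by the returned value in descending order would give a ranking of the kind the caption describes. Because only scheme (b) breaks the genotype combinations, a pair whose interaction the classifier relies on should show a larger E1 than E2.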
