In this work, we have proposed an Aggregated Multifactor Dimensionality Reduction (A-MDR) model to elucidate complex and non-linear genetic associations contributing to disease risk and variability in response to treatment. The proposed method is innovative in three important ways: 1) a continuous GxG enriched risk score is generated to replace the dichotomous risk factor in prediction of susceptibility to disorders; 2) new measures of gene-gene interaction using pOR, pRR, and pChi along with p-values and confidence intervals are proposed to detect and characterize multiple gene-gene interactions; and, 3) a radial network is generated to depict patterns of epistasis. This approach allows for prediction on not just a single interactive model, which is important given the growing appreciation in human genetics for the accumulative impact of a large number of variants with low effect size
. By pooling moderate and inter-related genetic contributors together, the A-MDR model becomes robust and predictive of complex traits. In addition to GxG interactions, the A-MDR can also be applied to model gene-environment interactions where environmental risk factors such as smoking, alcohol consumption, exercise, and diet can be incorporated into multi-factorial models.
The original MDR model selects an optimal multi-factorial (SNP) combination for each two-way, three-way or higher order interaction. When multiple genes function together in a pathway, the original MDR is prone to overlook genes with weaker signals and lose power for selecting one optimal GxG interaction in cross-validation. For the MTX data, the optimal two-locus interaction detected by the original MDR among 7 candidate SNPs was ATIC (rs4673990) + MTHFD2 (rs12196) with testing accuracy of 0.73 (p=0.0005). However, there exist other pairs of interactions with comparable accuracy. As a result, CVC, which measures the percentage of times that an optimal GxG interaction is selected when splitting the training and validation sets randomly, was not significant (CVC=8/10, p=0.2700). Our A-MDR analysis in Table
5 identified 15 pairs of two-locus interactions. When multiple GxG interactions with bio-equivalent effects are involved in epistasis, the original MDR will select an optimal model, by chance and lose some of the real pathway-based signals. The recent extended MDR methods, including OR-MDR
 and G-MDR
, adopt the same strategy of selecting one optimal GxG interaction as does the original MDR, which means they have the same limitations.
A continuous GxG enriched risk score is another major distinction between A-MDR and all the majority of existing MDR models, in which a binary risk factor is utilized to predict the outcome variable. For M-way interactions, the existing MDR models classify ~3
genotypic combinations as either high-risk or low-risk. A-MDR evolves from the traditional MDR outputs to the predisposing risk scores and epistasis based network as shown in Figure
Another important result of the simulation experiments is the potential of A-MDR to detect models that include genetic heterogeneity. Previous work with the original MDR has shown that heterogeneity is disastrous when using MDR to detect interactions
. Because of the use of the continuous enrichment score, A-MDR is less impacted by heterogeneity in the enclosed simulations. Further evaluation of this initial result with expanded simulations and real data applications will be an important next step.
We explore a radial network (Figure
2A) to depict patterns of epistasis. From the systems biology perspective, genetic variants might jointly impact the disease susceptibility and response to treatment. The gene-gene interaction network reveals intriguing information when interpreted in the context of what we know about the folate pathway and the effect that MTX has upon the disruption of this pathway as it relates to arthritis. ATIC and MTHFD2 were the two genes with the strongest interaction, and it is of interest to note that the genes included in the model (Figure
2A) include a transporter involved in folate uptake into mitochondria, SLC25A32, and the bifunctional methylenetetrahydrofolate dehydrogenase-cyclohydrolase MTHFD2, a key constituent of the mitochondrial folate pathway. The mitochondrial folate pathway is responsible for the generation of formate (in the form of 10-formylTHF) specifically to support purine biosynthesis, represented by ATIC, GART, and ITPA. The anti-inflammatory effect of low-dose MTX used to treat JIA and RA is thought to be due the anti-inflammatory effects of adenosine, formed as a consequence of the inhibitory effects of MTX on amino-imidazole carboxamide ribonucleotide (AICAR) transformylase (gene name, ATIC), which promotes the accumulation of AICAR ribotide, inhibiting adenosine deaminase and leading to a build up of adenosine, a potent anti-inflammatory agent
. A disruption of this process may result in a decreased anti-inflammatory effect of the drug. Therefore, the combined effect of SNPs in ATIC and MTHFD2 may indeed yield a more clinically apparent result by altering the anti-inflammatory effects of methotrexate. There is a potential to apply the proposed method to GWAS study by dissecting SNPs into pathways in order to detect GxG interactions in GWAS pathways. The major computational challenges from the proposed A-MDR and other approach in MDR framework are in the generation of p-values for MDR. MDR permutation computing time is largely dependent on the dimension of data sets. In other words, the computing time increases as the number of SNPs and/or the number of subjects increases. Several works have been devoted to improve the efficiency and shorten the computing time in MDR analysis in high-throughput data
[5, 31, 32]. We will defer interested readers to the corresponding citations for computing issues in high-throughput MDR analysis. These computational limitations make our strategy appropriate in large scale candidate gene studies, but may be limited in application to genome-wide association studies until further improvements in computing speed are realized or very large-scale computing resources are available.
In summary, bioinformatics challenges remain in detecting and modeling epistasis in complex biological traits. We have developed a new A-MDR framework to interpret complex genetic variation and have proposed predicting an outcome using a continuous risk factor. Several other extensions and modifications of the original MDR have been proposed in the literature. Incorporation of valuable features from other MDR extension models into the A-MDR framework is worth further investigation. Prospective studies and validation in independent samples are needed to assess reliability of the A-MDR model’s predictive ability. Tools for statistical inference, including asymptotic distributions of the proposed test statistics, need to be developed to save computing time and improve reliability.