Skip to main content

Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes



Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect interaction, several filtration techniques have been developed to remove genes (SNPs) that have no interactive effects prior to analysis. However, the cutoff values implemented for these filtration methods are arbitrary, therefore different choices of cutoff values will lead to different selections of genes (SNPs).


We suggest incorporating a global test of p-values to filtration procedures to identify the optimal number of genes/SNPs for further MDR analysis and demonstrate this approach using a ReliefF filter technique. We compare the performance of different global testing procedures in this context, including the Kolmogorov-Smirnov test, the inverse chi-square test, the inverse normal test, the logit test, the Wilcoxon test and Tippett’s test. Additionally we demonstrate the approach on a real data application with a candidate gene study of drug response in Juvenile Idiopathic Arthritis.


Extensive simulation of correlated p-values show that the inverse chi-square test is the most appropriate approach to be incorporated with the screening approach to determine the optimal number of SNPs for the final MDR analysis. The Kolmogorov-Smirnov test has high inflation of Type I errors when p-values are highly correlated or when p-values peak near the center of histogram. Tippett’s test has very low power when the effect size of GxG interactions is small.


The proposed global tests can serve as a screening approach prior to individual tests to prevent false discovery. Strong power in small sample sizes and well controlled Type I error in absence of GxG interactions make global tests highly recommended in epistasis studies.

Peer Review reports


Recent advances in genotyping technology have allowed for the rapid and easy interrogation of large numbers of genetic variants for association with common, complex disease. While there have been a number of successes in association mapping studies, the associations found typically explain very little of the overall heritability of the traits being studied. There are several potential reasons for this “missing heritability”, and one of those potential explanations is epistatic interactions (gene-gene interactions). It is hypothesized that such interactions play an important role in the etiology of complex (non-Mendelian) traits, but detecting such interactions presents a number of statistical and computation challenges [1]. In response to these challenges, a number of new data-mining approaches have been developed [2].

Multifactor Dimensionality Reduction (MDR) is a popular and highly successful statistical method developed to detect and characterize nonlinear complex gene-gene or gene-environment interactions (epistasis) that could be associated with disease susceptibility. The method was first proposed by Ritchie et al. [3] to detect estrogen-metabolism gene interactions associated with sporadic breast cancer. MDR has several advantages over more traditional statistical approaches such as logistic regression modeling: 1) MDR is a non-parametric approach with no requirement to the distribution of data. 2) MDR can analyze non-linear associations in genotypic combinations. 3) MDR has improved power to detect gene-gene interaction in small to moderate sample sizes. Since the introduction of the original MDR implementation, many works have been published to improve modeling and prediction accuracy with the MDR method. For more information on the history and development of the method, please refer to the comprehensive review of the MDR and its extended methods by Moore [4].

While the MDR approach is widely used, to make this paper self-contained, we give a brief description of the method. MDR is often applied to genotypic data to detect gene-gene (GxG) interactions among single nucleotide polymorphism (SNP) and the original implementation of this method can be extended to detect the interactions in other types of data when the explanatory variables are categorical variables and the outcome variable is binary. As the scale of association studies has expanded (with larger numbers of SNPs), a filtration step is often implemented in the first step of MDR analysis to remove noisy SNPs. In this step, a subset of genes that are unlikely to interact with others is removed by filtration methods such as SURF [5], TuRF [6] etc. ReliefF [7], has become a commonly applied filter, and we will focus on this filter in the current study. After this step, the remaining SNPs are used for the dimensionality reduction and model selection steps of the MDR algorithm. In this step, all variable combinations are considered for k-way (k = 2, 3, 4 …) interactions. For each multi-locus combination, the ratio of cases to controls within each contingency table cell is calculated, and then each cell is assigned a status of high-risk or low-risk by comparing this ratio to the ratio of cases: controls in the overall dataset. Cells with a ratio greater than the overall ratio are assigned “high-risk” status, and those with a ratio lower than the overall ratio are assigned “low-risk” status. Subsequently, a balanced classification accuracy is calculated for each multi-locus combination, and the optimal model is selected based on the highest balanced accuracy. This model selection approach is performed in concert with a cross-validation procedure, usually 10-fold, which randomly divides the whole data set into a training set and a validation set. The testing accuracy is the balanced accuracy when the classification rule developed from the training data set is applied to the testing data set. The cross validation count (CVC) summarizes the number of times a model is the top model in each of the cross-validation splits of the data. The optimal k-way (k = 2, 3, 4 …) interaction model with the highest training accuracy and the highest CVC is then selected as the winner model. Finally, the significance of the selected optimal model is assessed by permutation testing (comparing the testing/prediction accuracy against the empirical distribution built by at least 1000 permutations). MDR can be performed by an open source software mdr2.0 and model goodness-of-fit and significance can be assessed using software mdrpt1.0 [8] or in the MDR.R R software package [9].

In this work, we seek to address two existing issues in the current MDR analysis. First, current filtration approaches do not evaluate the significance of the SNPs considered (or provide p-value for their measures) and there is no clear guideline for the cutoff point of such filtration measures. This leads inconsistency in the optimal number of SNPs remaining for the final MDR analysis.

Second, as there is a growing appreciation that the etiology of human diseases is extremely complex, many investigators are using MDR to evaluate many potential interactive effects, and not just a single final best model [10]. In this type of approach, not one but numerous tests can be performed in search of an optimal model in the k-way interaction, as the number of partitions for k-way interaction over m loci is m k = m ! k ! ( m k ) ! . For instance, if an investigator is interested to detect significant 2-way interactions among 50 SNPs, 1225 tests will be performed which will inflate the family-wise Type I error rate to 1 ( 1 α ) m k = 1 ( 1 0.05 ) 1225 1 , where α = 0.05 is the nominal error rate for an individual test without proper control.

False discoveries and losing power to detect the signal after the multiplicity adjustment are two concurrent issues in analyzing high dimensional data. Instead of replacing all the existing methods to control the false discovery, we propose to add the global tests to the current MDR framework as an ad-hoc screening process to prevent false discoveries. We will explain the rationale and utility of global tests in Section Global tests.

Incorporating global tests within the filtration procedures can reveal a trend of gene interactive patterns when noisy genes (SNPs) are removed step by step using ReliefF or other filtration techniques. In the current study, we demonstrate this approach in a candidate gene study of drug response in Juvenile Idiopathic Arthritis, using the ReleifF filter. Additionally, we perform a simulation study comparing several different global testing approaches for a range of genetic etiologies to compare the power of different approaches.


Global tests

The idea of global testing is to assess the patterns of p-values from multiple testing of k way interactions among m loci ( n = m k = m ! k ! ( m k ) ! tests). Under the null hypothesis of no GxG interactions, the p-values will follow uniform (0, 1). To see this, let T be the test statistic with the cumulative distribution function (CDF) F 0 (t) and the inverse CDF F 0 1 ( t ) for t R under H 0 . Let P be the p-value corresponding to the test statistic T. Under H 0 , we have Pr ( P p ) = Pr ( F 0 1 ( P ) F 0 1 ( p ) ) = F 0 ( F 0 1 ( p ) ) = p for p ( 0 , 1 ) (Pattern 1 in Figure 1).

Figure 1
figure 1

Four patterns of p-values.

When a proportion of m loci have k- way interactions (H α ), it is expected to observe the p-values shifting towards 0. To see this, let F(t) be the CDF under H α and F ( t ) > F 0 ( t ) for t R , then Pr ( P p ) = Pr ( F 0 1 ( P ) ) F 0 1 ( p ) ) = F ( F 0 1 ( p ) ) > p for p ( 0 , 1 ) (Pattern 2 in Figure 1). Due to correlations/linkage disequilibrium among SNPs and the redundancy of SNPs in high order models, sometimes p-values shift toward 1, i.e. Pr ( P p ) < p for p ( 0 , 1 ) (Pattern 3 in Figure 1). When p-values are correlated, they might peak near center of histogram (Pattern 4 in Figure 1). Patterns 3 and 4 are deviated from uniformity but they do not indicate potential k- way interactions among m loci.

The rationale of global testing is to ensure p-values are not randomly and uniformly distributed (Pattern 1) before we investigate each single p-value. Correlated p-values without significant effects (H 0) might even shift toward 1 or peak near the center (Patterns 3 and 4). The goals are to rule out Patterns 1, 3 and 4 and only move forward to the final MDR analysis when p-values are in Pattern 2.

If the entire set of p-value follows a uniform distribution, then it is very likely for a small p-value to be a false discovery by chance. As shown in Figure 1, the entire set of p-values might have four different Patterns: uniform, shifting to 0, shifting to 1 or peak near the center. In all four cases, we notice that the minimum p-values are less than 0.05 (0.0001111 in Pattern 1, 2.65e-6 in Pattern 2, 0.00617 in Pattern 3 and 0.003734 in Pattern 4). If we take the distribution of the entire set of p-values into account, then the minimum p-values in Patterns 1, 3 and 4 are false discoveries by chance.

Combined global testing and filtration technique

A global test will serve as an ad-hoc diagnostic tool to exam all p-values from k- way interactions among m genes in MDR-analysis. These p-values come from empirical distributions generated through permutation testing. Let P = p i ( i = 1 , 2 , , n ) be identical and independently distributed (i.i.d.) p-values from the MDR analysis of k–way interactions among m– loci ( n = m k = m ! k ! ( m k ) ! ). We will consider a one-sided test to compare

{ H 0 : P ~ U n i f o r m ( 0 , 1 ) v e r s u s H a : Pr ( P p ) > p f o r p ( 0 , 1 )

Rejecting H 0 indicates significant GxG interactions in some target genes.

We propose incorporating global testing of p-values with ReliefF [7] gene filtration technique to detect the patterns of k-way GxG interaction among m genes (SNPs) and remove noisy genes (SNPs) with little interactive effects to determine the optimal number of SNPs for the final MDR analysis. The ReliefF algorithm estimates weights to measure the potential accuracy of attributes in prediction of phenotype. The redundant attribute will be assigned a lower score. When applied in gene-gene interactions, a higher ReliefF score indicates a stronger interactive effect for the corresponding gene (SNP). ReliefF algorithm first uses x- nearest neighborhood approach ( x = 1 , 2 , , m ) to match a selected subject with x subjects in neighborhood (with shortest distances across all SNPs) from the control group and from test group respectively. An attribute (SNP) will be assigned score 1 (−1) if the attribute from the selected subject matches (mismatches) one of x nearest subjects from the same phenotype group. Similarly, an attribute will be assigned score −1 (1) if the attribute from the selected subject matches (mismatches) one of the nearest subjects from the different phenotype group. The score will be aggregated for all subjects and normalized (divided) by the total number of subjects and neighbors. Detailed description of ReliefF algorithm for filtering genotyping data can be found in Section 3 of [4].

The flow chart of the testing procedure is presented in Figure 2. Starting with a set of m candidate SNPs, perform the ReliefF algorithm on m SNPs and sort SNPs by ReliefF scores in an ascending order. Generate p-values for the exhaustive search of k- way interactions among a total of m SNPs using the original or extended MDR methods. For each k- way interaction, one p-value of MDR analysis is generated by permutation test. Let m 1 be the number of remaining SNPs. Perform the global testing on m 1 k p-values. Remove one SNP that has the lowest ReliefF score and all interactions corresponding to this SNP. Perform the global testing about hypothesis testing (1) on the p-values of the k-way interactions of the remaining SNPs. Continue to remove SNPs with the lowest ReliefF score one by one and perform global testing after each removal of a SNP. One can stop the process when the remaining SNPs reach a predetermined minimum number. Choose the optimal number of SNPs for the final MDR analysis as the largest number of SNPs with global testing p-value < α. Often we set α = 0.05 . To be more rigorous of controlling family-wise Type I error, one can apply FDR algorithm on the global testing p-values and the first p-value with FDR < 0.05 will determine the optimal number of SNPs.

Figure 2
figure 2

Flow chart of global testing of p-values in conjunction with filtration process.

Global tests of P-values

Here we introduce 6 global tests that can be applied to test hypothesis (1). These six tests are based on different approaches to detect deviation from uniformity. We will survey these methods and compare their power using a case study and Monte-Carlo simulations.

Test 1 one sided Kolmogorov-Smirnov test [KS]

KS test is a non-parametric test that can be applied to compare the distance between an empirical distribution of i.i.d. p-values and Uniform(0, 1). For hypothesis test (1), define the one-sided KS statistic as D n + = sup p 1 n i = 1 n I { p i p } p , where I { p i p } is an indicator function which equals 1 if p i p and 0 if p i > p . According to [11], the p-value of one-sided KS test follows Pr D n + > t = t i = 0 [ n ( 1 t ) ] n i 1 t i / n n i t + i / n i 1 where [ n ( 1 t ) ] is the largest integer not greater than n ( 1 t ) and t ( 0 , 1 ) .

Test 2 one-sided inverse chi-square test [inverse chi]

Fisher [12] shows that if p i ~ i . i . d . U n i f o r m ( 0 , 1 ) for i = 1 , 2 , , n , then 2 i = 1 n ln ( p i ) ~ χ 2 n 2 where χ 2 n 2 is chi-square distribution with 2n degrees of freedom. For a one sided test (1), reject H 0 if 2 i = 1 n ln ( p i ) > χ 2 n , 1 α 2 where χ 2 n , 1 α 2 is ( 1 α ) * 100 % percentile of χ 2 n 2 .

Test 3 one sided inverse normal test [inverse norm]

Transform p-value to normal z score by letting z i = Φ 1 ( p i ) where Φ 1 is inverse cumulative normal distribution. Under H 0 , Z = i = 1 n z i / n ~ N ( 0 , 1 ) . For one sided test (1), reject H 0 if Z < Z α where Z α is α * 100 % percentile of the standard normal distribution.

Test 4 one sided logit test [logit]

Logit transform p-value by letting L = i = 1 n ln ( p i / ( 1 p i ) ) . [13] shows that under H 0, the distribution of L can be closely approximated by Student’s t-distribution with 5 n + 4 degrees of freedom, namely L * = L 3 ( 5 n + 4 ) π 2 n ( 5 n + 2 ) t 5 n + 4 . Therefore, for one-sided test (1), we can reject H 0 if L * < t 5 n + 4 , α where t 5 n + 4 , α is α * 100 % percentile of the t-distribution.

Test 5 one sided Wilcoxon test [Wilcoxon]

Order n p-values from MDR testing along with n 2 observations randomly drawn from Uniform(0,1) from least to greatest and denote them by S 1 , S 2 , , S N with N = n + n 2 . Let W be the sum of the ranks corresponding to n p-values from MDR testing. For one-sided test (1), we can reject H 0 if W n ( N + 1 ) ω α where the constant ω α is chosen to make the Type I error probability equal α. Values of ω α are given in Table A6 by [14]. For large sample sizes, i.e. min(n,n 2) going to infinity, one can apply normal approximation on the standardized W.

Test 6 Tippett and Wilkinson’s test [Tippett]

Tippett’s Test [15] is based on the property of the minimal p-value in multiple testing. Let p ( 1 ) , p ( 2 ) , , p ( n ) be the ordered p-values in an ascending order. When p-values identically and independently follow Uniform(0,1) distribution, Tippett’s test will reject H 0 if p ( 1 ) < 1 ( 1 α ) 1 / n . The p-value of Tippett’s test equals 1 ( 1 p ( 1 ) ) n . Tippett’s test is very easy to perform but it only takes the smallest p-value into account.

Wilkinson [16] extended Tippett’s procedure to the r th smallest p-values where r = 1 , 2 , , n . By expanding ( α + ( 1 α ) ) n , Wilkinson tabulated the probability, denoted by C γ,α of obtaining r significant statistics by chance in a group of n tests. Suppose there are r tests with p-values less than α, Wilkinson’s test rejects H 0 if c r , α < α [17]. Because P (r) has been distributed with parameters r and n-r + 1, tables of the incomplete beta function can be used to obtain critical values of P (r) directly. In our work, we will not include Wilkinson’s test in case study and power simulation because this method does not provide p-value for the testing results.

Case study

We used a real dataset to illustrate how to apply our proposed global testing to prevent false discovery and to determine the optimal number of SNPs for the final MDR analysis. Juvenile Idiopathic Arthritis (JIA) is one of the most common chronic diseases of childhood, affecting an estimated 300,000 children in the U.S. alone, and is an important cause of morbidity and disability in children [18]. Although methotrexate (MTX) is the most commonly used second-line agent used to treat JIA worldwide, this antifolate drug has shown considerable inter-individual variability in clinical response and adverse reactions [19]. The polyglutamation of methotrexate (MTXglu) is an intracellular mechanism that retains the drug and enhances target enzyme inhibition within the folate pathway [20], and high concentrations of “long chain” methotrexate polyglutamates (MTXglu3-5) have been associated with improved response to the drug in adults with rheumatoid arthritis [21]. Studies have reported the extensive variability in intracellular MTXglu concentrations in JIA, and an association of long chain MTXglu with toxicity (but not efficacy) in children [22]. Due to the complexity of the folate cycle as well as the extensive variability in response to the drug in clinical practice, it is hypothesized that genetic factors may contribute to differences seen in distinct patterns of MTXglu concentrations intracellularly, which might further impact patients’ responses to MTX.

In this case study, we analyzed 25 SNPs from 17 candidate genes in the folate pathway (Table 1). MTXglu was measured in all patients after at least 3 months on stable MTX therapy and a range of 1 to 5 glutamate moieties were reported as a percentage of the total polyglutamate concentration (MTXglun%). Hierarchical clustering was performed to identify patterns of MTXglun% , and two clusters were determined based on the hierarchical clustering of normalized MTXglu1-5%. Subjects in cluster 1 had lower concentration of short chain polyglutamates (MTXglu1-2%) and higher concentration of long chain polyglutamates (MTXglu3-5%) as compared to subjects in cluster 2 (p < 0.05). These clusters reflected distinct patterns in the proportion of MTXglu concentrations.

Table 1 List of 25 SNPs from 17 candidate genes in the folate pathway

There were 30 subjects in Cluster 1 and 74 subjects in Cluster 2. The MTXglu clustering phenotype was coded 1 and 0 for MDR analysis. Genotypes, coded 0 for common homozygote, 1 for heterozygote and 2 for rare homozygote for 25 SNPs, were measured. The overall goal of the analysis was to assess whether interactions among SNPs are associated with MTXglu clustering. While the scale of this study is not so large that an exhaustive search of all SNPs is computationally limited, this data is used to demonstrate the proposed approach.

For illustrative purposes, we will focus on 2-way interactions among 25 SNPs and will determine the optimal number of targeted SNPs for testing 2-way interactions. We first applied the ReliefF algorithm to 25 SNPs. As shown in Table 2, ReliefF scores ranged from −0.0308 to 0.1163. Although a higher score indicates stronger interaction with other SNPs, there is no clear cutoff point for ReliefF scores.

Table 2 FDR adjusted P-value in global testing

To circumvent this limitation, we incorporated global testing of p-values and ReliefF algorithm using the method proposed in Section Combined global testing and filtration technique. We first generated p-values for all 2-way interactions among 25 SNPs through permutation testing. Then we applied global testing, including KS test, Inverse chi test, Inverse norm test, Logit test, Wilcoxon test and Tippett’s tests on 25 2 = 300 p-values of 2-way interactions among 25 SNPs. The global tests were performed to evaluate whether the distribution of p-values deviated from uniformity (null hypothesis) and shifted towards 0 (alternative hypothesis - Pattern 2 in Figure 1). Then we removed one SNP with the lowest ReliefF score step by step and repeated the global testing process until only 5 SNPs were remained. We stopped the global testing procedure at 5 SNPs because it is not meaningful or necessary to perform global testing when the number of SNPs is less than 5 in any case study.

The entire procedure of filtration and global testing of p-values are summarized in Table 2. The optimal number of SNPs is the largest number of SNPs with global testing p-value <0.05. In this case study, KS test and Inverse chi test were more sensitive to deviation from uniformity as the tests became significant after 5 and 4 SNPs removed respectively (Table 2). Inverse norm test and Logit test were more conservative, suggesting removal of 11 SNPs. Wilcoxon test, removing 8 SNPs, was moderate as compared to the other tests. Tippett’s test failed to detect significant GxG interactions with all FDR corrected p-values > 0.05. Our further simulation studies (discussed in Section Power simulation) indicate that Tippett’s test, which only takes the smallest p-value into account, might not be appropriate for global testing of p-values.

Figure 3 and 4 both revealed strong patterns of transition when noisy SNPs were removed. In Figure 3, as SNPs with low ReliefF scores were removed sequentially, the histogram of p-values started to shift toward 0. Also in Figure 4, the global test p-values were in decreasing trends when noisy SNPs were removed one by one. Once the p-value of global testing was under 0.05, it continued to stay under 0.05. There were only a few exceptions at the end of the filtration procedure, probably due to smaller sample sizes and reduction of power.

Figure 3
figure 3

Histograms of p-values from multiple tests GxG interactions (Genes with weak interactive effects/low ReliefF scores are removed step by step. The red dash curves are fitted beta density functions).

Figure 4
figure 4

Global Testing of p-values combined with filtration technique (The red line is at nominal rate 0.05. The optimal number of genes is determined when the global test first has p-value < 0.05).

Figure 5
figure 5

Gene to gene interaction detected by MDR after filtering out 6 SNPs according to the one sided inverse chi-square test. (The distribution of MTXglu clustering among genotypic combinations between DHFR-rs7387 and ITPA-rs2295553 is listed. The genotype for DHFR-rs7387 and ITPA-rs2295553 is coded as 0-homozygote, 1-heterozygote and 2-rare homozygote. In each cell, the first column stands for the number of subjects in cluster 1 (low concentration of MTXglu1-2% and high concentration of MTXglu3-4%) and the second column stands for the number of subjects in cluster 2(high concentration of MTXglu1-2% and low concentration of MTXglu3-5%. Genotypic combinations in relatively high (low) likelihood of cluster 1 are displayed in darkly (lightly) shaded cells).

After the filtration and global testing, we removed 4 SNPs as suggested by Inverse chi test with FDR correction (Table 2) and performed MDR analysis on the remaining 21 SNPs. The results of MDR analysis indicated that there was significant two-way interaction between DHFR (rs7387) and ITPA (rs2295553) with testing balance accuracy = 0.7374 (p = 0.0045). The MDR analysis was performed by an open source software mdr2.0 and model goodness-of-fit and significance was assessed by permutation using software mdrpt1.0 [8].

The dihydrofolate reductase (DHFR) enzyme is a well known important target of MTX action. When DHFR is inhibited by MTX the subsequent production of reduced folates such as tetrahydrofolate (THF) 5,10 methenyl-THF and 5-methyl-THF are altered, affecting not only total cellular folate concentrations but also the downstream effects from one carbon donation including homocysteine remethylation and pyrimidine and purine synthesis thought to result in an anti-proliferative effect [23]. The inhibition of DHFR also results in a buildup of the precursor dihydrofolate (DHF) which in its polyglutamated state has inhibitory effects upon enzymes within the pathway as well [24]. Inosine triphosphate pyrophosphatase (ITPA) plays a role in de novo purine synthesis, and is closely related to adenosine metabolism, which is thought to contribute to MTX response via its anti-inflammatory effect [25]. Variations in ITPA interestingly have been shown by other authors to contribute to MTX response as part of a candidate gene study in JIA [26], as well as “predisposing genetic attribute” in studies utilizing MDR in adults with rheumatoid arthritis [27, 28]. How these 2 genes directly affect MTXglu patterns remains difficult to determine, as the direct understanding of how MTXglu patterns are associated with response is yet to be elucidated. However, both genes encode enzymes closely linked to or directly affected by MTX, thus as we gain a more detailed knowledge of cellular folate metabolism and its disruption by anti-folate agents such as MTX, we will then develop a better understanding of this complex system, and how alterations in the folate pathway affect response to the drug.

Power simulation

In the two empirical studies described below, we investigate the performance of 6 global testing when p-values exhibit different patterns (Figure 1) of variation. Data were generated from the Uniform distribution or varying mixtures of Beta distributions based on the inference regarding the patterns of p-values as described in Section Global tests and Figure 1. Here we give the rationale of using uniform or beta mixture to simulate p-values under null and alternative hypotheses. Under the null hypothesis of no GxG interactions, we have proved that p-values follow Uniform(0,1) distribution (Pattern 1). When this null hypothesis is violated, we introduce a latent variable to indicate the status of underlying hypothesis for each test. For p i , i = 1 , 2 , , n , introduce a latent variable Z i where for hypothesis testing (1), we have

{ Z i = 0 if H 0 : no GxG for the i th test ( Pattern 1 , Pr ( P p ) = p ) Z i = 1 i f H a : GxG for the i th test ( Pattern 2 , Pr ( P p ) > p ) ,

for p ( 0 , 1 ) . The proportion of tests where H α holds is denoted by the mixing weight Pr ( Z i = 1 ) = π where π ( 0 , 1 ) .

Conditioning on Z i , we have

{ p i | Z i = 0 ~ Uniform ( 0 , 1 ) p i | Z i = 1 ~ Beta ( a , b ) where a > 0 and b > 0 , ( a , b ) ( 1 , 1 ) .

The marginal distribution of combined p-values becomes P ~ ( 1 π ) Uniform ( 0 , 1 ) + π Beta ( a , b ) , which indicates that with ( 1 π ) × 100 % of chance, a p-value is drawn from Uniform(0,1) and with π × 100 % of chance, a p-value is drawn from Beta(α,b). Beta distribution is very flexible to characterize the patterns of p-values (Figure 1) where Uniform (0,1) is a special case of Beta (1,1). One can also adjust the shape and scale parameters a and b to model the deviation from uniformity.

The p-values from MDR analysis are correlated due to linkage disequilibrium among SNPs and sharing the SNPs among GxG interactions. The dependence among p-values might cause inflation of Type I errors or lead to bias in global tests. As a result, it is critical to extensively simulate p-values with varying correlation structures and assess the robustness of global tests for correlated p-values. In this work, we simulated correlated Uniform variables with random correlation matrix Σ and Beta random variables with correlation coefficient ρ = 0.2 , 0.8 , Beta ( 2 , 5 ) , Uniform ( 0.1 , 0.9 ) respectively. The details of generating correlated uniform [29] and beta distributions [30] are summarized in Appendix 1.

The first simulation study concerns the Type I error of global testing when there does not exist any GxG interactions among genes (SNPs). We generated p-values from

  • Independent Uniform(0,1),

  • Correlated Uniform ( 0 , 1 ) ,

  • Correlated 0.9 Uniform ( 0 , 1 ) + 0.1 Beta ( 5 , 1 ) , (4.1)

  • Correlated 0.5 Uniform ( 0 , 1 ) + 0.5 Beta ( 5 , 1 ) , (4.2)

  • Correlated 0.9 Uniform ( 0 , 1 ) + 0.1 Beta ( 6 , 3 ) , (4.3)

  • Correlated 0.5 Uniform ( 0 , 1 ) + 0.5 Beta ( 6 , 3 ) . (4.4)

These six scenarios cover Patterns 1, 3 and 4 with no signs of GxG interactions in Figure 1. For each simulation, the sample size of p-values varies from 20 to 500 and we performed global tests on each sample of p-values. We repeated the process 1000 times, and calculate the percentage of rejection of null hypothesis for each test. Under the hypothesis of no GxG interaction, this rejection rate is considered as Type I error. As shown in Table 3, the Type I error rates are well controlled to be near or under the nominal rate 0.05 when p-values are i.i.d Uniform(0,1). When p-values are correlated Uniform with random correlation matrices, there was slight inflation in five global tests except Tippett’s test. It is good to notice that the inflation is not severe as most tests have Type I error rates under 0.07. Such mild inflation is acceptable in screening testing and we will discuss how to further address this issue in Discussion Section.

Table 3 Type I error of six global tests of p-values when p-values are independent or strongly correlated (The nominal Type I error rate is 0.05 and the severe inflation of Type I error with simulated error rate > 0.1 is written in bold italic)

In addition to Uniform distributions, we also simulated correlated Beta mixtures in formula (4.1)-(4.4) regarding p-values shifting to 1 or peaking near the center (Patterns 3 and 4). We conservatively set correlation coefficient ρ = 0.8 (Table 3) to simulate very strong correlation among Beta variates, which is most likely to inflate Type I errors in global tests. The simulation results for mild correlation including ρ = 0.2 , Beta ( 2 , 5 ) , Uniform ( 0.1 , 0.9 ) are summarized in Appendix 1 (Table 4). The results from Table 3 and Appendix 1 (Table 4) show that Inverse chi test and Tippett’s test are very robust to dependency in p-values with well controlled Type I error rates. The KS test has the highest inflation in several scenarios we simulated, especially when correlated p-values had peaks near center (Pattern 4). For strongly correlated p-values ( ρ = 0.8 Table 3), the inverse norm, the Wilcoxon and Logit tests also had modest inflations when sample sizes get larger (n > 200). When p-values were moderately correlated ( ρ = 0.2 , Beta ( 2 , 5 ) , Uniform ( 0.1 , 0.9 ) , (Table 4)), the inverse norm, the Wilcoxon and the Logit tests had well controlled Type I errors for all tested sample sizes.

Table 4 Type I error of six global tests of p-values when p-values are moderately correlated (The nominal Type I error rate is 0.05 and the severe inflation of Type I error with simulated error rate > 0.1 is written in bold italic)

In the second simulation study, we are interested in the power of each of the approaches to detect the GxG interactions by performing the hypothesis testing (1) to detect Pr ( P p ) > p (Pattern 2 with GxG interactions). We simulated p-values from a wide range of beta mixture distribution where the mixing π was set to be 0.1 and 0.4, indicating different proportions of tests with significant GxG interactions. In most cases, parameters a < b will have Pr ( P p ) > p for p ( 0 , 1 ) which coincides with Pattern 2. Under alternative hypothesis of a proportion of tests having GxG interaction, we simulated p-values from 6 Beta mixtures:

  • Correlated 0.9 Uniform ( 0 , 1 ) + 0.1 Beta ( 0.4 , 6 ) ,

  • Correlated 0.6 Uniform ( 0 , 1 ) + 0.4 Beta ( 0.4 , 6 ) ,

  • Correlated 0.9 Uniform ( 0 , 1 ) + 0.1 Beta ( 0.5 , 4.5 ) ,

  • Correlated 0.6 Uniform ( 0 , 1 ) + 0.4 Beta ( 0.5 , 4.5 ) ,

  • Correlated 0.9 Uniform ( 0 , 1 ) + 0.1 Beta ( 1 , 5 ) , and

  • Correlated 0.6 Uniform ( 0 , 1 ) + 0.4 Beta ( 1 , 5 ) .

As in the Type I error simulation, in the power simulation, we used sample sizes ranging from 20 to 500 and perform simulation 1000 times. The power is the percentage of results across the 1000 replicates where the null hypothesis was rejected. We present the power comparison results under week correlation ρ = 0.2 in Table 5 because all six global tests are free of Type I errors in this scenario. Power comparisons for ρ = 0.8 , Beta ( 2 , 5 ) , Uniform ( 0.1 , 0.9 ) are summarized in Appendix 1 (Table 6). These results indicate that Tippett’s test, which only takes the smallest p-value into account, might not be appropriate for detecting the patterns of alternation. Among the other five global tests, Inverse chi test has the strongest power in most simulated cases. These five global tests have strong power to detect Pr ( P p ) > p for small to moderate sample sizes.

Table 5 Power of six global tests of correlated P-values (The correlation coefficient for Beta random variables is ρ . Uniform distributions have random correlation matrices)
Table 6 Power of six global tests of correlated P-values (The correlation coefficient for Beta random variables is ρ Uniform distributions have random correlation matrices)

The global tests have been implemented in R. The R code is available at

Discussion and conclusions

Multifactor Dimensionality Reduction (MDR) is a novel statistical method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that could be associated with disease susceptibility. We suggest incorporating global test to filtration procedures to reveal a trend of gene interactive patterns when noisy genes are removed step by step using ReliefF or other filtration techniques. The optimal number of genes for further MDR analysis can be identified by p-values of global testing. A real data applications and empirical assessment of our proposed methods reveal strong trends in global testing of p-values and clear patterns of distribution of p-values in three scenarios: 1) presence of GxG interactions, 2) absence of GxG interactions, 3) weak GxG interactions that needs filtration to remove noisy genes. The proposed global tests can serve as a screening approach before individual tests to prevent false discovery. Strong power in small sample sizes and well controlled Type I error in absence of GxG interactions makes these tests highly recommended in epistasis studies.

Global testing has not been implemented in MDR analyses in the literature we have reviewed. Currently, researchers rely on adjustment of individual p-values such as false discovery rate (FDR) as suggested by [31]. Due to high dimensionality in genetic interactions, the FDR and other multiple testing adjustments often lose power in MDR analyses. Some MDR studies [27] have utilized the false positive report probability proposed by [32] but this method has been pointed out by [33] to be heuristic and wrong in formulation. In contrast, the global tests proposed by this paper are based on rigorous statistical theories and inferences.

Through extensive simulation on correlated p-values, our study shows that the Inverse chi test is the most powerful approach to be incorporated with the filtration techniques to determine the optimal number of SNPs for the final MDR analysis. The KS test might have high inflation of Type I errors when p-values are highly correlated or when p-values peak near the center of histogram (Pattern 4). The Tippett’s test has very low power when the effect size of Pattern 2 is small.

We observe mild inflation of Type I error (<0.07) when p-values are Uniform with a random correlation matrix. Our global tests are implemented for screening SNPs and investigators can continue to use multiplicity adjustment algorithms such as FDR to adjust individual p-values in the final MDR analysis to prevent false discoveries. As a result, slight inflation in Type I error (<0.07) is acceptable in practice. Moreover, in our case study, we show that one can utilize the decreasing trend of global test results (Figure 4) to facilitate decision making. If global tests provided false discoveries, then the trend of global tests results would randomly fluctuate up and down. Figure 4 with a decreasing trend for global testing results as well Figure 3 with histograms systematically switching to Pattern 2 can also serve as diagnostic tools to prevent false discoveries or selection bias in global tests.

It is worthwhile to point out the proposed global tests can effectively prevent false discovery without losing the power to detect significant GxG interactions. To prevent the false discovery, current MDR applications typically choose one optimal model for each k-way interaction. This method has two major drawbacks: firstly, the false positive discovery is not reduced by choosing one optimal model; secondly, choosing one optimal models may overlook other potential GxG interactions that also contributes to the disease susceptibility.

The major contribution of our manuscript is to incorporate global testing procedures to MDR framework. Our proposed global tests will provide p-values to help practitioners determine the appropriate number of SNPs to be remained in the final analysis. The current filtration process does not provide p-values. Therefore, using arbitrary cutoff value in the current process might lead to over-filtering or under-filtering of SNPs.

All 6 global tests are based on statistical inference instead of permutation. These 6 tests run very fast in a single computer. The major computational challenges are in the generation of p-values for MDR through permutation tests but this is not the major focus of our work. Several works have been devoted to improve the efficiency and shorten the computing time in MDR analysis in high-throughput data. We will defer interested readers to the corresponding citations for computing issues in high-throughput MDR analysis. These computational limitations make our strategy appropriate in large scale candidate gene studies, but may be limited in application to genome-wide association studies until further improvements in computing speed are realized or very large-scale computing resources are available.

MDR permutation computing time is largely dependent on the dimension of data sets. In other words, the computing time increases as the number of SNPs and/or the number of subjects increases. Interestingly, the dimension of data does little impact on the computing time of global tests. The computing time for global tests of 1000 p-values is very close to tests of 10 p-values. Several filtration approaches have been proposed and some (ReliefF, SURF and TuRF etc.) have been implemented in the MDR software ( In this work, we utilize ReliefF for filtration. There have been other filtration techniques proposed in literature. For instance, [34] introduced entropy-based information gain to search and evaluate interactions among risk factors. The current MDR software [8] provides ReliefF, entropy, chi-square test, etc. about 10 filtration methods. The global tests could be integrated in the workflow with other filtration techniques, although the comparison and evaluation of all filtration technique requires more research attention.

Appendix 1

Simulation of correlated p-values

We generate correlated Beta variables using the method proposed by [30]. According to Bayesian theory, random variables from a Beta prior and a Beta-Binomial conjugate function will yield correlated random deviates whose marginal distribution is also Beta. Firstly, randomly generate a variable K from K ~ Beta Binomial ( v , α , β ) where α and β are the shape parameters. Conditioning on K = k , generate P deviates from Beta ( α + k , v + β 1 ) . By integrating on K, the P deviates have unconditional marginal distribution as Beta ( α , β ) and the correlation coefficient among P deviates is ρ = v / ( v + α + β ) . In this paper, we simulated different correlation coefficient with constant ρ = 0.2 , 08 or ρas a random variable from ρ ~ Beta ( 2 , 5 ) and ρ ~ Uniform ( 0.1 , 0.9 ) respectively.

In the above method, ρ = v / ( v + α + β ) can be written as v = ( α + β ) ρ / ( 1 ρ ) but algorithm to generate Beta-Binomial with non-integer v is not widely available. As a result, we use an alternative method to generate correlated uniform distributions . In essence, correlated uniform variables, U, with a random correlation matrix Σ can be generated by transforming multivariate normal variables X using formula U = F ( X ) where F is the CDF of the standard normal distribution. First, we generated a positive definite covariance matrix, Σ0 with randomly selected eigenvalues and randomly generated orthogonal matrix as eigenvectors(R clusterGeneration package). Let σij be the component in Σ0. We can convert the covariance matrix, Σ0 to correlation matrix Σ with components r ij = σ ij σ ii σ jj . To ensure the correlation is invariant to transformation, we need to adjust correlation matrix Σ into adj = 2 sin ( π / 6 ) . Perform Choleski factorization to generate C = adj 1 2 . Generate a vector of i.i.d. standard normal variables, X 0. Let X = X 0 C and U = F ( X ) . As a result, the variables U are correlated uniform variables with correlation matrix Σ.


  1. Moore JH, Williams SM: Traversing the conceptual divide between biological and statistical epistasis: systems biology and a more modern synthesis. Bioessays. 2005, 27 (6): 637-646. 10.1002/bies.20236.

    Article  CAS  PubMed  Google Scholar 

  2. Motsinger AA, Ritchie MD, Reif DM: Novel methods for detecting epistasis in pharmacogenomics studies. Pharmacogenomics. 2007, 8 (9): 1229-1241. 10.2217/14622416.8.9.1229.

    Article  CAS  PubMed  Google Scholar 

  3. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001, 69 (1): 138-147. 10.1086/321276.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Moore JH: Detecting, characterizing, and interpreting nonlinear gene-gene interactions using multifactor dimensionality reduction. Adv Genet. 2010, 72: 101-116.

    Article  PubMed  Google Scholar 

  5. Greene CS, Penrod NM, Kiralis J, Moore JH: Spatially uniform relieff (SURF) for computationally-efficient filtering of gene-gene interactions. BioData Min. 2009, 2 (1): 5-10.1186/1756-0381-2-5.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Moore JH, White BC: Tuning relieff for genome-wide genetic analysis. Lecture Notes in Computer Science. 2007, 4447: 166-175. 10.1007/978-3-540-71783-6_16.

    Article  Google Scholar 

  7. Robnik-Sikonja, Igor K: Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning Journal. 2003, 53: 23-69. 10.1023/A:1025667309714.

    Article  Google Scholar 

  8. Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics. 2003, 19 (3): 376-382. 10.1093/bioinformatics/btf869.

    Article  CAS  PubMed  Google Scholar 

  9. Winham SJ, Motsinger-Reif AA: An R package implementation of multifactor dimensionality reduction. BioData Min. 2011, 4 (1): 24-10.1186/1756-0381-4-24.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Oki N, Motsinger-Reif A: Multifactor dimensionality reduction as a filter based approach for genome wide association studies. Frontiers in Genetics. 2011, 2: 80-

    Article  PubMed  PubMed Central  Google Scholar 

  11. Birnbaum ZW, Tingey FH: One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics. 1951, 22 (4): 592-596. 10.1214/aoms/1177729550.

    Article  Google Scholar 

  12. Fisher RA: Statistical methods for research workers. 1932, Oliver & Boyd, London

    Google Scholar 

  13. Mudholkar GS, George EO: The logit statistic for combining probabilities - an overview. Optimizing Methods in Statistics. Edited by: Rustagi JS. 1979, 345-365.

    Google Scholar 

  14. Myles H, Wolfe DA: Nonparametric statistical methods. 1999, Wiley, New York, 2

    Google Scholar 

  15. Tippett L: The methods of statistics. 1931, Williams & Norgate, London

    Google Scholar 

  16. Wilkinson B: A statistical consideration in psychological research. Psychological Bulletin. 1951, 48: 156-158.

    Article  CAS  PubMed  Google Scholar 

  17. Sakoda JM, Cohen BH, Beall G: Test of significance for a series of statistical tests. Psychological Bulletin. 1954, 51 (2): 172-175.

    Article  CAS  PubMed  Google Scholar 

  18. Helmick CG, Felson DT, Lawrence RC, Gabriel S, Hirsch R, Kwoh CK, Liang MH, Kremers HM, Mayes MD, Merkel PA: Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Part I. Arthritis Rheum. 2008, 58 (1): 15-25. 10.1002/art.23177.

    Article  PubMed  Google Scholar 

  19. Becker ML, Rose CD, Cron RQ, Sherry DD, Bilker WB, Lautenbach E: Effectiveness and toxicity of methotrexate in juvenile idiopathic arthritis: comparison of 2 initial dosing regimens. J Rheumatol. 2010, 37 (4): 870-875. 10.3899/jrheum.090826.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Chabner BA, Allegra CJ, Curt GA, Clendeninn NJ, Baram J, Koizumi S, Drake JC, Jolivet J: Polyglutamation of methotrexate. Is methotrexate a prodrug?. J Clin Invest. 1985, 76 (3): 907-912. 10.1172/JCI112088.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Stamp LK, Barclay ML, O'Donnell JL, Zhang M, Drake J, Frampton C, Chapman PT: Effects of changing from oral to subcutaneous methotrexate on red blood cell methotrexate polyglutamate concentrations and disease activity in patients with rheumatoid arthritis. J Rheumatol. 2011, 38 (12): 2540-2547. 10.3899/jrheum.110481.

    Article  CAS  PubMed  Google Scholar 

  22. Dolezalova P, Krijt J, Chladek J, Nemcova D, Hoza J: Adenosine and methotrexate polyglutamate concentrations in patients with juvenile arthritis. Rheumatology (Oxford). 2005, 44 (1): 74-79. 10.1093/rheumatology/keh401.

    Article  CAS  Google Scholar 

  23. Allegra CJ, Chabner BA, Drake JC, Lutz R, Rodbard D, Jolivet J: Enhanced inhibition of thymidylate synthase by methotrexate polyglutamates. J Biol Chem. 1985, 260 (17): 9720-9726.

    CAS  PubMed  Google Scholar 

  24. Baggott JE, Vaughn WH, Hudson BB: Inhibition of 5-aminoimidazole-4-carboxamide ribotide transformylase, adenosine deaminase and 5'-adenylate deaminase by polyglutamates of methotrexate and oxidized folates and by 5-aminoimidazole-4-carboxamide riboside and ribotide. Biochem J. 1986, 236 (1): 193-200.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Cronstein BN, Naime D, Ostad E: The antiinflammatory mechanism of methotrexate. Increased adenosine release at inflamed sites diminishes leukocyte accumulation in an in vivo model of inflammation. J Clin Invest. 1993, 92 (6): 2675-2682. 10.1172/JCI116884.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Hinks A, Moncrieffe H, Martin P, Ursu S, Lal S, Kassoumeri L, Weiler T, Glass DN, Thompson SD, Wedderburn LR: Association of the 5-aminoimidazole-4-carboxamide ribonucleotide transformylase gene with response to methotrexate in juvenile idiopathic arthritis. Ann Rheum Dis. 2011, 70 (8): 1395-1400. 10.1136/ard.2010.146191.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Dervieux T, Wessels JA, van der Straaten T, Penrod N, Moore JH, Guchelaar HJ, Kremer JM: Gene-gene interactions in folate and adenosine biosynthesis pathways affect methotrexate efficacy and tolerability in rheumatoid arthritis. Pharmacogenet Genomics. 2009, 19 (12): 935-944. 10.1097/FPC.0b013e32833315d1.

    Article  CAS  PubMed  Google Scholar 

  28. Dervieux T, Wessels JA, Kremer JM, Padyukov L, Seddighzadeh M, Saevarsdottir S, van Vollenhoven RF, Klareskog L, Huizinga TW, Guchelaar HJ: Patterns of interaction between genetic and nongenetic attributes and methotrexate efficacy in rheumatoid arthritis. Pharmacogenet Genomics. 2012, 22 (1): 1-9. 10.1097/FPC.0b013e32834d3e0b.

    Article  CAS  PubMed  Google Scholar 

  29. Hotelling H, Pabst MR: Rank correlation and tests of significance involving no assumption of normality. Annals of Mathematical Statistics. 1936, 7: 29-43. 10.1214/aoms/1177732543.

    Article  Google Scholar 

  30. Minhajuddin ATM, Harris IR, Schucany WR: Simulating multivariate distributions with specific correlations. Journal of Statistical Computation and Simulation. 2004, 74 (8): 599-607. 10.1080/00949650310001626161.

    Article  Google Scholar 

  31. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B. 1995, 57: 289-833.

    Google Scholar 

  32. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N: Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst. 2004, 96 (6): 434-442. 10.1093/jnci/djh075.

    Article  PubMed  Google Scholar 

  33. Lucke JF: A critique of the false-positive report probability. Genetic epidemiology. 2009, 33 (2): 145-150. 10.1002/gepi.20363.

    Article  PubMed  Google Scholar 

  34. Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC: A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 2006, 241 (2): 252-261. 10.1016/j.jtbi.2005.11.036.

    Article  PubMed  Google Scholar 

  35. Poonkuzhali B, Lamba J, Storm S, Sparreboom A, Thummel K, Watkins P, Schuetz E: Association of breast cancer resistance protein/ABCG2 phenotypes and novel promoter and intron 1 single nucleotide polymorphisms. Drug metab Dispos. 2008, 36 (4): 780-795. 10.1124/dmd.107.018366.

    Article  CAS  PubMed  Google Scholar 

Download references


This work is supported for collaboration between HD and AMR by Bursary Award of the 1st Short Course on Statistical Genetics and Genomics from University of Alabama at Birmingham from the National Institute of Health R25GM093044 (PI: Tiwari). We are grateful to two reviewers whose comments have helped us improve the manuscript.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Hongying Dai.

Additional information

Competing interests

There are no competing interests to this work.

Authors’ contributions

HD and MB conceived of the study. AMR aided in study design and MDR method. HD and AMR performed the simulations and data analysis. MB, RG, and SL performed the clinical data collection, genotyping and interpretation of case study findings. All authors contributed to the manuscript writing. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Dai, H., Bhandary, M., Becker, M. et al. Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes. BioData Mining 5, 3 (2012).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: