Global tests of P-values for multifactor dimensionality reduction models in selection of optimal number of target genes

Background Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect interaction, several filtration techniques have been developed to remove genes (SNPs) that have no interactive effects prior to analysis. However, the cutoff values implemented for these filtration methods are arbitrary, therefore different choices of cutoff values will lead to different selections of genes (SNPs). Methods We suggest incorporating a global test of p-values to filtration procedures to identify the optimal number of genes/SNPs for further MDR analysis and demonstrate this approach using a ReliefF filter technique. We compare the performance of different global testing procedures in this context, including the Kolmogorov-Smirnov test, the inverse chi-square test, the inverse normal test, the logit test, the Wilcoxon test and Tippett’s test. Additionally we demonstrate the approach on a real data application with a candidate gene study of drug response in Juvenile Idiopathic Arthritis. Results Extensive simulation of correlated p-values show that the inverse chi-square test is the most appropriate approach to be incorporated with the screening approach to determine the optimal number of SNPs for the final MDR analysis. The Kolmogorov-Smirnov test has high inflation of Type I errors when p-values are highly correlated or when p-values peak near the center of histogram. Tippett’s test has very low power when the effect size of GxG interactions is small. Conclusions The proposed global tests can serve as a screening approach prior to individual tests to prevent false discovery. Strong power in small sample sizes and well controlled Type I error in absence of GxG interactions make global tests highly recommended in epistasis studies.


Background
Recent advances in genotyping technology have allowed for the rapid and easy interrogation of large numbers of genetic variants for association with common, complex disease. While there have been a number of successes in association mapping studies, the associations found typically explain very little of the overall heritability of the traits being studied. There are several potential reasons for this "missing heritability", and one of those potential explanations is epistatic interactions (gene-gene interactions). It is hypothesized that such interactions play an important role in the etiology of complex (non-Mendelian) traits, but detecting such interactions presents a number of statistical and computation challenges [1]. In response to these challenges, a number of new data-mining approaches have been developed [2].
Multifactor Dimensionality Reduction (MDR) is a popular and highly successful statistical method developed to detect and characterize nonlinear complex gene-gene or gene-environment interactions (epistasis) that could be associated with disease susceptibility. The method was first proposed by Ritchie et al. [3] to detect estrogen-metabolism gene interactions associated with sporadic breast cancer. MDR has several advantages over more traditional statistical approaches such as logistic regression modeling: 1) MDR is a non-parametric approach with no requirement to the distribution of data. 2) MDR can analyze non-linear associations in genotypic combinations. 3) MDR has improved power to detect gene-gene interaction in small to moderate sample sizes. Since the introduction of the original MDR implementation, many works have been published to improve modeling and prediction accuracy with the MDR method. For more information on the history and development of the method, please refer to the comprehensive review of the MDR and its extended methods by Moore [4].
While the MDR approach is widely used, to make this paper self-contained, we give a brief description of the method. MDR is often applied to genotypic data to detect gene-gene (GxG) interactions among single nucleotide polymorphism (SNP) and the original implementation of this method can be extended to detect the interactions in other types of data when the explanatory variables are categorical variables and the outcome variable is binary. As the scale of association studies has expanded (with larger numbers of SNPs), a filtration step is often implemented in the first step of MDR analysis to remove noisy SNPs. In this step, a subset of genes that are unlikely to interact with others is removed by filtration methods such as SURF [5], TuRF [6] etc. ReliefF [7], has become a commonly applied filter, and we will focus on this filter in the current study. After this step, the remaining SNPs are used for the dimensionality reduction and model selection steps of the MDR algorithm. In this step, all variable combinations are considered for kway (k = 2, 3, 4 . . .) interactions. For each multi-locus combination, the ratio of cases to controls within each contingency table cell is calculated, and then each cell is assigned a status of high-risk or low-risk by comparing this ratio to the ratio of cases: controls in the overall dataset. Cells with a ratio greater than the overall ratio are assigned "high-risk" status, and those with a ratio lower than the overall ratio are assigned "low-risk" status. Subsequently, a balanced classification accuracy is calculated for each multi-locus combination, and the optimal model is selected based on the highest balanced accuracy. This model selection approach is performed in concert with a cross-validation procedure, usually 10-fold, which randomly divides the whole data set into a training set and a validation set. The testing accuracy is the balanced accuracy when the classification rule developed from the training data set is applied to the testing data set. The cross validation count (CVC) summarizes the number of times a model is the top model in each of the cross-validation splits of the data. The optimal k-way (k = 2, 3, 4 . . .) interaction model with the highest training accuracy and the highest CVC is then selected as the winner model. Finally, the significance of the selected optimal model is assessed by permutation testing (comparing the testing/prediction accuracy against the empirical distribution built by at least 1000 permutations). MDR can be performed by an open source software mdr2.0 and model goodness-of-fit and significance can be assessed using software mdrpt1.0 [8] or in the MDR.R R software package [9].
In this work, we seek to address two existing issues in the current MDR analysis. First, current filtration approaches do not evaluate the significance of the SNPs considered (or provide p-value for their measures) and there is no clear guideline for the cutoff point of such filtration measures. This leads inconsistency in the optimal number of SNPs remaining for the final MDR analysis.
Second, as there is a growing appreciation that the etiology of human diseases is extremely complex, many investigators are using MDR to evaluate many potential interactive effects, and not just a single final best model [10]. In this type of approach, not one but numerous tests can be performed in search of an optimal model in the kway interaction, as the number of partitions for k-way inter- For instance, if an investigator is interested to detect significant 2-way interactions among 50 SNPs, 1225 tests will be performed which will inflate the family-wise Type I error rate where α ¼ 0:05 is the nominal error rate for an individual test without proper control.
False discoveries and losing power to detect the signal after the multiplicity adjustment are two concurrent issues in analyzing high dimensional data. Instead of replacing all the existing methods to control the false discovery, we propose to add the global tests to the current MDR framework as an ad-hoc screening process to prevent false discoveries. We will explain the rationale and utility of global tests in Section Global tests.
Incorporating global tests within the filtration procedures can reveal a trend of gene interactive patterns when noisy genes (SNPs) are removed step by step using ReliefF or other filtration techniques. In the current study, we demonstrate this approach in a candidate gene study of drug response in Juvenile Idiopathic Arthritis, using the ReleifF filter. Additionally, we perform a simulation study comparing several different global testing approaches for a range of genetic etiologies to compare the power of different approaches.

Global tests
The idea of global testing is to assess the patterns of p-values from multiple testing of k way interactions among Under the null hypothesis of no GxG interactions, the p-values will follow uniform (0, 1). To see this, let T be the test statistic with the cumulative distribution function (CDF) F 0 (t) and the inverse CDF F À1 0 t ð Þ for t 2 R under H 0 . Let P be the p-value corresponding to the test statistic T. Under H 0 , we have Figure 1). When a proportion of m loci have k-way interactions (H α ), it is expected to observe the p-values shifting towards 0. To see this, let F(t) be the CDF under H α and F t ð Þ > Figure 1). Due to correlations/linkage disequilibrium among SNPs and the redundancy of SNPs in high order models, sometimes pvalues shift toward 1, i.e. Pr P≤p ð Þ < p for p 2 0; 1 ð Þ (Pattern 3 in Figure 1). When p-values are correlated, they might peak near center of histogram (Pattern 4 in Figure 1). Patterns 3 and 4 are deviated from uniformity but they do not indicate potential k-way interactions among m loci.
The rationale of global testing is to ensure p-values are not randomly and uniformly distributed (Pattern 1) before we investigate each single p-value. Correlated p-values without significant effects (H 0 ) might even shift toward 1 or peak near the center (Patterns 3 and 4). The goals are to rule out Patterns 1, 3 and 4 and only move forward to the final MDR analysis when p-values are in Pattern 2.
If the entire set of p-value follows a uniform distribution, then it is very likely for a small p-value to be a false discovery by chance. As shown in Figure 1, the entire set of p-values might have four different Patterns: uniform, shifting to 0, shifting to 1 or peak near the center. In all four cases, we notice that the minimum p-values are less than 0.05 (0.0001111 in Pattern 1, 2.65e-6 in Pattern 2, 0.00617 in Pattern 3 and 0.003734 in Pattern 4). If we take the distribution of the entire set of p-values into account, then the minimum p-values in Patterns 1, 3 and 4 are false discoveries by chance.

Combined global testing and filtration technique
A global test will serve as an ad-hoc diagnostic tool to exam all p-values from k-way interactions among m genes in MDR-analysis. These p-values come from empirical distributions generated through permutation testing. Let P ¼ p i i ¼ 1; 2;⋯; n ð Þ be identical and independently Rejecting H 0 indicates significant GxG interactions in some target genes.
We propose incorporating global testing of p-values with ReliefF [7] gene filtration technique to detect the patterns of k-way GxG interaction among m genes (SNPs) and remove noisy genes (SNPs) with little interactive effects to determine the optimal number of SNPs for the final MDR analysis. The ReliefF algorithm estimates weights to measure the potential accuracy of attributes in prediction of phenotype. The redundant attribute will be assigned a lower score. When applied in gene-gene interactions, a higher ReliefF score indicates a stronger interactive effect for the corresponding gene (SNP). ReliefF algorithm first uses x-nearest neighborhood approach x ¼ 1; 2;⋯; m ð Þto match a selected subject with x subjects in neighborhood (with shortest distances across all SNPs) from the control group and from test group respectively. An attribute (SNP) will be assigned score 1 (−1) if the attribute from the selected subject matches (mismatches) one of x nearest subjects from the same phenotype group. Similarly, an attribute will be assigned score −1 (1) if the attribute from the selected subject matches (mismatches) one of the nearest subjects from the different phenotype group. The score will be aggregated for all subjects and normalized (divided) by the total number of subjects and neighbors. Detailed description of ReliefF algorithm for filtering genotyping data can be found in Section 3 of [4].
The flow chart of the testing procedure is presented in Figure 2. Starting with a set of m candidate SNPs, perform the ReliefF algorithm on m SNPs and sort SNPs by ReliefF scores in an ascending order. Generate p-values for the exhaustive search of k-way interactions among a total of m SNPs using the original or extended MDR methods. For each k-way interaction, one p-value of MDR analysis is generated by permutation test. Let m 1 be the number of remaining SNPs. Perform the global testing on m 1 k p-values. Remove one SNP that has the lowest ReliefF score and all interactions corresponding to this SNP. Perform the global testing about hypothesis testing (1) on the p-values of the k-way interactions of the remaining SNPs. Continue to remove SNPs with the lowest ReliefF score one by one and perform global testing after each removal of a SNP. One can stop the process when the remaining SNPs reach a predetermined minimum number.
Choose the optimal number of SNPs for the final MDR analysis as the largest number of SNPs with global testing p-value < α. Often we set α ¼ 0:05 . To be more rigorous of controlling family-wise Type I error, one can apply FDR algorithm on the global testing p-values and the first pvalue with FDR < 0.05 will determine the optimal number of SNPs.

Global tests of P-values
Here we introduce 6 global tests that can be applied to test hypothesis (1). These six tests are based on different approaches to detect deviation from uniformity. We will survey these methods and compare their power using a case study and Monte-Carlo simulations.

Test 1 one sided Kolmogorov-Smirnov test [KS]
KS test is a non-parametric test that can be applied to compare the distance between an empirical distribution of i.i.d. p-values and Uniform(0, 1). For hypothesis test (1), define the one-sided KS statistic as According to [11], the p-value of one-sided KS test follows is the largest integer not greater than n 1 À t ð Þ and t 2 0; 1 ð Þ.

Test 3 one sided inverse normal test [inverse norm]
Transform p-value to normal z score by letting

Test 4 one sided logit test [logit]
Logit transform p-value by letting L ¼ P Þ . [13] shows that under H 0 , the distribution of L can be Fisher [12] shows that if p i where closely approximated by Student's t-distribution with 5n þ 4 degrees of freedom, namely L Ã ¼ L Therefore, for one-sided test (1), we can reject H 0 if L Ã < t 5nþ4;α where t 5nþ4;α is α Ã 100% percentile of the t-distribution.

Test 5 one sided Wilcoxon test [Wilcoxon]
Order n p-values from MDR testing along with n 2 observations randomly drawn from Uniform(0,1) from least to greatest and denote them by S 1 ; S 2 ;⋯; S N with N ¼ n þ n 2 . Let W be the sum of the ranks corresponding to n p-values from MDR testing. For one-sided test (1), we can reject H 0 if W ≤n N þ 1 ð ÞÀω α where the constant ω α is chosen to make the Type I error probability equal α. Values of ω α are given in Table A6 by [14]. For large sample sizes, i.e. min(n,n 2 ) going to infinity, one can apply normal approximation on the standardized W.

Test 6 Tippett and Wilkinson's test [Tippett]
Tippett's Test [15] is based on the property of the minimal p-value in multiple testing. Let p 1 ð Þ ; p 2 ð Þ ;⋯; p n ð Þ be the ordered p-values in an ascending order. When p-values identically and independently follow Uniform (0,1) distribution, Tippett's test will reject H 0 if p 1 ð Þ < 1 À 1 À α ð Þ 1=n . The p-value of Tippett's test equals Tippett's test is very easy to perform but it only takes the smallest p-value into account. Wilkinson [16] extended Tippett's procedure to the r th smallest p-values where r ¼ 1; 2;⋯; n. By expanding α þ 1 À α ð Þ ð Þ n , Wilkinson tabulated the probability, Start with a total of SNPs.
Perform the ReliefF algorithm on SNPs. Sort SNPs by ReliefF scores in an ascending order.
Let be the number of remaining SNPs. At the beginning, = .
Perform the global test on p-values regarding all -way interactions among remaining SNPs.
If the global test fails to reject , then remove one SNP with the lowest ReliefF score; let = 1; let = .
If the global test rejects , then p-values shift towards 0 (Pattern 2 in Figure 1). the optimal number of SNPs for the final MDR analysis is .
Perform the final MDR analysis on the remaining SNPs.
Generate p-values for the exhaustive search of -way interactions among a total of SNPs using the original or extended MDR methods. denoted by C γ,α of obtaining r significant statistics by chance in a group of n tests. Suppose there are r tests with p-values less than α, Wilkinson's test rejects H 0 if c r;α < α [17]. Because P (r) has been distributed with parameters r and n-r + 1, tables of the incomplete beta function can be used to obtain critical values of P (r) directly. In our work, we will not include Wilkinson's test in case study and power simulation because this method does not provide p-value for the testing results.

Case study
We used a real dataset to illustrate how to apply our proposed global testing to prevent false discovery and to determine the optimal number of SNPs for the final MDR analysis. Juvenile Idiopathic Arthritis (JIA) is one of the most common chronic diseases of childhood, affecting an estimated 300,000 children in the U.S. alone, and is an important cause of morbidity and disability in children [18]. Although methotrexate (MTX) is the most commonly used second-line agent used to treat JIA worldwide, this antifolate drug has shown considerable inter-individual variability in clinical response and adverse reactions [19]. The polyglutamation of methotrexate (MTXglu) is an intracellular mechanism that retains the drug and enhances target enzyme inhibition within the folate pathway [20], and high concentrations of "long chain" methotrexate polyglutamates (MTXglu 3-5 ) have been associated with improved response to the drug in adults with rheumatoid arthritis [21]. Studies have reported the extensive variability in intracellular MTXglu concentrations in JIA, and an association of long chain MTXglu with toxicity (but not efficacy) in children [22]. Due to the complexity of the folate cycle as well as the extensive variability in response to the drug in clinical practice, it is hypothesized that genetic factors may contribute to differences seen in distinct patterns of MTXglu concentrations intracellularly, which might further impact patients' responses to MTX.
In this case study, we analyzed 25 SNPs from 17 candidate genes in the folate pathway (Table 1). MTXglu was measured in all patients after at least 3 months on stable MTX therapy and a range of 1 to 5 glutamate moieties were reported as a percentage of the total polyglutamate concentration (MTXglu n% ). Hierarchical clustering was performed to identify patterns of MTXglu n% , and two clusters were determined based on the hierarchical clustering of normalized MTXglu 1-5% . Subjects in cluster 1 had lower concentration of short chain polyglutamates (MTXglu 1-2% ) and higher concentration of long chain polyglutamates (MTXglu 3-5% ) as compared to subjects in cluster 2 (p < 0.05). These clusters reflected distinct patterns in the proportion of MTXglu concentrations.
There were 30 subjects in Cluster 1 and 74 subjects in Cluster 2. The MTXglu clustering phenotype was coded 1 and 0 for MDR analysis. Genotypes, coded 0 for common homozygote, 1 for heterozygote and 2 for rare homozygote for 25 SNPs, were measured. The overall goal of the analysis was to assess whether interactions among SNPs are associated with MTXglu clustering. While the scale of this study is not so large that an exhaustive search of all SNPs is computationally limited, this data is used to demonstrate the proposed approach.
For illustrative purposes, we will focus on 2-way interactions among 25 SNPs and will determine the optimal number of targeted SNPs for testing 2-way interactions. We first applied the ReliefF algorithm to 25 SNPs. As shown in Table 2, ReliefF scores ranged from −0.0308 to 0.1163. Although a higher score indicates stronger interaction with other SNPs, there is no clear cutoff point for ReliefF scores.
To circumvent this limitation, we incorporated global testing of p-values and ReliefF algorithm using the method proposed in Section Combined global testing and filtration technique. We first generated p-values for  SNPs. The global tests were performed to evaluate whether the distribution of p-values deviated from uniformity (null hypothesis) and shifted towards 0 (alternative hypothesis -Pattern 2 in Figure 1). Then we removed one SNP with the lowest ReliefF score step by step and repeated the global testing process until only 5 SNPs were remained. We stopped the global testing procedure at 5 SNPs because it is not meaningful or necessary to perform global testing when the number of SNPs is less than 5 in any case study. The entire procedure of filtration and global testing of p-values are summarized in Table 2. The optimal number of SNPs is the largest number of SNPs with global testing p-value <0.05. In this case study, KS test and Inverse chi test were more sensitive to deviation from uniformity as the tests became significant after 5 and 4 SNPs removed respectively (  Figure 3 and 4 both revealed strong patterns of transition when noisy SNPs were removed. In Figure 3, as SNPs with low ReliefF scores were removed sequentially, the histogram of p-values started to shift toward 0. Also in Figure 4, the global test p-values were in decreasing trends when noisy SNPs were removed one by one. Once the p-value of global testing was under 0.05, it continued to stay under 0.05. There were only a few exceptions at the end of the filtration procedure, probably due to smaller sample sizes and reduction of power. After the filtration and global testing, we removed 4 SNPs as suggested by Inverse chi test with FDR correction ( Table 2) and performed MDR analysis on the remaining 21 SNPs. The results of MDR analysis indicated that there was significant two-way interaction between DHFR (rs7387) and ITPA (rs2295553) with testing balance accuracy = 0.7374 (p = 0.0045). The MDR analysis was performed by an open source software mdr2.0 and model goodness-of-fit and significance was assessed by permutation using software mdrpt1.0 [8].
The dihydrofolate reductase (DHFR) enzyme is a well known important target of MTX action. When DHFR is inhibited by MTX the subsequent production of reduced folates such as tetrahydrofolate (THF) 5,10 methenyl-THF and 5-methyl-THF are altered, affecting not only total cellular folate concentrations but also the downstream effects from one carbon donation including homocysteine remethylation and pyrimidine and purine synthesis thought to result in an anti-proliferative effect [23]. The inhibition of DHFR also results in a buildup of the precursor dihydrofolate (DHF) which in its polyglutamated state has inhibitory effects upon enzymes within the pathway as well [24]. Inosine triphosphate pyrophosphatase (ITPA) plays a role in de novo purine synthesis, and is closely related to adenosine metabolism, which is thought to contribute to MTX response via its anti-inflammatory effect [25]. Variations in ITPA interestingly have been shown by other authors to contribute to MTX response as part of a candidate gene study in JIA [26], as well as "predisposing genetic attribute" in studies utilizing MDR in adults with rheumatoid arthritis [27,28]. How these 2 genes directly affect MTXglu patterns remains difficult to determine, as the direct understanding of how MTXglu patterns are associated with response is yet to be elucidated. However, both genes encode enzymes closely linked to or directly affected by MTX, thus as we gain a more detailed knowledge of cellular folate metabolism and its disruption by anti-folate agents such as MTX, we will then develop a better understanding of this complex system, and how alterations in the folate pathway affect response to the drug.

Power simulation
In the two empirical studies described below, we investigate the performance of 6 global testing when p-values exhibit different patterns (Figure 1 The marginal distribution of combined p-values becomes Pe 1 À π ð ÞUniform 0; 1 ð ÞþπBeta a; b ð Þ , which indicates that with 1 À π ð ÞÂ100% of chance, a p-value is drawn from Uniform(0,1) and with π Â 100% of chance, a p-value is drawn from Beta(α,b). Beta distribution is very flexible to characterize the patterns of p-values ( Figure 1) where Uniform (0,1) is a special case of Beta (1,1). One can also adjust the shape and scale parameters a and b to model the deviation from uniformity.
The p-values from MDR analysis are correlated due to linkage disequilibrium among SNPs and sharing the SNPs among GxG interactions. The dependence among p-values might cause inflation of Type I errors or lead to bias in global tests. As a result, it is critical to extensively simulate p-values with varying correlation structures and assess the robustness of global tests for correlated p-values. In this work, we simulated correlated Uniform variables with random correlation matrix Σ and Beta random variables with correlation coefficient ρ ¼ 0:2; 0:8; Beta 2; 5 ð Þ; Uniform 0:1 ; 0:9 ð Þ respectively. The details of generating correlated uniform [29] and beta distributions [30] are summarized in Appendix 1.
The first simulation study concerns the Type I error of global testing when there does not exist any GxG interactions among genes (SNPs). We generated p-values from These six scenarios cover Patterns 1, 3 and 4 with no signs of GxG interactions in Figure 1. For each simulation, the sample size of p-values varies from 20 to 500 and we performed global tests on each sample of p-values. We repeated the process 1000 times, and calculate the percentage of rejection of null hypothesis for each test. Under the hypothesis of no GxG interaction, this rejection rate is considered as Type I error. As shown in Table 3, the Type I error rates are well controlled to be near or under the nominal rate 0.05 when p-values are i.i.d Uniform(0,1). When p-values are correlated Uniform with random correlation matrices, there was slight inflation in five global tests except Tippett's test. It is good to notice that the inflation is not severe as most tests have Type I error rates under 0.07. Such mild inflation is acceptable in screening testing and we will discuss how to further address this issue in Discussion Section.
In addition to Uniform distributions, we also simulated correlated Beta mixtures in formula (4.1)-(4.4) regarding p-values shifting to 1 or peaking near the center (Patterns 3 and 4). We conservatively set correlation coefficient ρ ¼ 0:8 (Table 3) to simulate very strong correlation among Beta variates, which is most likely to inflate Type I errors in global tests. The simulation results for mild correlation including ρ ¼ 0:2; Beta 2; 5 ð Þ; Uniform 0:1; 0:9 ð Þare summarized in Appendix 1 ( Table 4). The results from Table 3 and Appendix 1 (Table 4) show that Inverse chi test and Tippett's test are very robust to dependency in p-values with well controlled Type I error rates. The KS test has the highest inflation in several scenarios we simulated, especially when correlated p-values had peaks near center (Pattern 4). For strongly correlated p-values (ρ ¼ 0:8 Table 3), the inverse norm, the Wilcoxon and Logit tests also had modest inflations when sample sizes get larger (n > 200). When p-values were moderately correlated ( ρ ¼ 0:2; Beta 2; 5 ð Þ; Uniform 0:1; 0:9 ð Þ, (Table 4)), the inverse norm, the Wilcoxon and the Logit tests had well controlled Type I errors for all tested sample sizes.
In the second simulation study, we are interested in the power of each of the approaches to detect the GxG interactions by performing the hypothesis testing (1) to detect Pr P≤p ð Þ > p(Pattern 2 with GxG interactions). We simulated p-values from a wide range of beta mixture distribution where the mixing π was set to be 0.1 and 0.4, indicating different proportions of tests with significant GxG interactions. In most cases, parameters a < b will have Pr P≤p ð Þ > p for p 2 0; 1 ð Þ which coincides with Pattern 2. Under alternative hypothesis of a proportion of tests having GxG interaction, we simulated p-values from 6 Beta mixtures: As in the Type I error simulation, in the power simulation, we used sample sizes ranging from 20 to 500 and perform simulation 1000 times. The power is the percentage of results across the 1000 replicates where the null hypothesis was rejected. We present the power comparison results under week correlation ρ ¼ 0:2 in Table 5

Discussion and conclusions
Multifactor Dimensionality Reduction (MDR) is a novel statistical method developed to characterize and detect nonlinear complex gene-gene interactions (epistasis) that could be associated with disease susceptibility. We suggest incorporating global test to filtration procedures to reveal a trend of gene interactive patterns when noisy genes are removed step by step using ReliefF or other filtration techniques. The optimal number of genes for further MDR analysis can be identified by p-values of global testing. A real data applications and empirical assessment of our proposed methods reveal strong trends in global testing of p-values and clear patterns of distribution of p-values in three scenarios: 1) presence of GxG interactions, 2) absence of GxG interactions, 3) weak GxG interactions that needs filtration to remove noisy genes. The proposed global tests can serve as a screening approach before individual tests to prevent false discovery. Strong power in      small sample sizes and well controlled Type I error in absence of GxG interactions makes these tests highly recommended in epistasis studies. Global testing has not been implemented in MDR analyses in the literature we have reviewed. Currently, researchers rely on adjustment of individual p-values such as false discovery rate (FDR) as suggested by [31]. Due to high dimensionality in genetic interactions, the FDR and other multiple testing adjustments often lose power in MDR analyses. Some MDR studies [27] have utilized the false positive report probability proposed by [32] but this method has been pointed out by [33] to be heuristic and wrong in formulation. In contrast, the global tests proposed by this paper are based on rigorous statistical theories and inferences.
Through extensive simulation on correlated p-values, our study shows that the Inverse chi test is the most powerful approach to be incorporated with the filtration techniques to determine the optimal number of SNPs for the final MDR analysis. The KS test might have high inflation of Type I errors when p-values are highly correlated or when p-values peak near the center of histogram (Pattern 4). The Tippett's test has very low power when the effect size of Pattern 2 is small.
We observe mild inflation of Type I error (<0.07) when p-values are Uniform with a random correlation matrix.   Figure 4 with a decreasing trend for global testing results as well Figure 3 with histograms systematically switching to Pattern 2 can also serve as diagnostic tools to prevent false discoveries or selection bias in global tests. It is worthwhile to point out the proposed global tests can effectively prevent false discovery without losing the power to detect significant GxG interactions. To prevent the false discovery, current MDR applications typically choose one optimal model for each k-way interaction. This method has two major drawbacks: firstly, the false positive discovery is not reduced by choosing one optimal model; secondly, choosing one optimal models may overlook other potential GxG interactions that also contributes to the disease susceptibility.
The major contribution of our manuscript is to incorporate global testing procedures to MDR framework. Our proposed global tests will provide p-values to help practitioners determine the appropriate number of SNPs to be remained in the final analysis. The current filtration process does not provide p-values. Therefore, using arbitrary cutoff value in the current process might lead to over-filtering or under-filtering of SNPs.
All 6 global tests are based on statistical inference instead of permutation. These 6 tests run very fast in a single computer. The major computational challenges are in the generation of p-values for MDR through permutation tests but this is not the major focus of our work. Several works have been devoted to improve the efficiency and shorten the computing time in MDR analysis in high-throughput data.    We will defer interested readers to the corresponding citations for computing issues in high-throughput MDR analysis. These computational limitations make our strategy appropriate in large scale candidate gene studies, but may be limited in application to genome-wide association studies until further improvements in computing speed are realized or very large-scale computing resources are available. MDR permutation computing time is largely dependent on the dimension of data sets. In other words, the computing time increases as the number of SNPs and/or the number of subjects increases. Interestingly, the dimension of data does little impact on the computing time of global tests. The computing time for global tests of 1000 p-values is very close to tests of 10 p-values. Several filtration approaches have been proposed and some (ReliefF, SURF and TuRF etc.) have been implemented in the MDR software (http://www.epistasis.org). In this work, we utilize ReliefF for filtration. There have been other filtration techniques proposed in literature. For instance, [34] introduced entropy-based information gain to search and evaluate interactions among risk factors. The current MDR software [8] provides ReliefF, entropy, chi-square test, etc. about 10 filtration methods. The global tests could be integrated in the workflow with other filtration techniques, although the comparison and evaluation of all filtration technique requires more research attention. Uniform distributions have random correlation matrices).