 Methodology
 Open Access
 Published:
Global tests of Pvalues for multifactor dimensionality reduction models in selection of optimal number of target genes
BioData Mining volume 5, Article number: 3 (2012)
Abstract
Background
Multifactor Dimensionality Reduction (MDR) is a popular and successful data mining method developed to characterize and detect nonlinear complex genegene interactions (epistasis) that are associated with disease susceptibility. Because MDR uses a combinatorial search strategy to detect interaction, several filtration techniques have been developed to remove genes (SNPs) that have no interactive effects prior to analysis. However, the cutoff values implemented for these filtration methods are arbitrary, therefore different choices of cutoff values will lead to different selections of genes (SNPs).
Methods
We suggest incorporating a global test of pvalues to filtration procedures to identify the optimal number of genes/SNPs for further MDR analysis and demonstrate this approach using a ReliefF filter technique. We compare the performance of different global testing procedures in this context, including the KolmogorovSmirnov test, the inverse chisquare test, the inverse normal test, the logit test, the Wilcoxon test and Tippett’s test. Additionally we demonstrate the approach on a real data application with a candidate gene study of drug response in Juvenile Idiopathic Arthritis.
Results
Extensive simulation of correlated pvalues show that the inverse chisquare test is the most appropriate approach to be incorporated with the screening approach to determine the optimal number of SNPs for the final MDR analysis. The KolmogorovSmirnov test has high inflation of Type I errors when pvalues are highly correlated or when pvalues peak near the center of histogram. Tippett’s test has very low power when the effect size of GxG interactions is small.
Conclusions
The proposed global tests can serve as a screening approach prior to individual tests to prevent false discovery. Strong power in small sample sizes and well controlled Type I error in absence of GxG interactions make global tests highly recommended in epistasis studies.
Background
Recent advances in genotyping technology have allowed for the rapid and easy interrogation of large numbers of genetic variants for association with common, complex disease. While there have been a number of successes in association mapping studies, the associations found typically explain very little of the overall heritability of the traits being studied. There are several potential reasons for this “missing heritability”, and one of those potential explanations is epistatic interactions (genegene interactions). It is hypothesized that such interactions play an important role in the etiology of complex (nonMendelian) traits, but detecting such interactions presents a number of statistical and computation challenges [1]. In response to these challenges, a number of new datamining approaches have been developed [2].
Multifactor Dimensionality Reduction (MDR) is a popular and highly successful statistical method developed to detect and characterize nonlinear complex genegene or geneenvironment interactions (epistasis) that could be associated with disease susceptibility. The method was first proposed by Ritchie et al. [3] to detect estrogenmetabolism gene interactions associated with sporadic breast cancer. MDR has several advantages over more traditional statistical approaches such as logistic regression modeling: 1) MDR is a nonparametric approach with no requirement to the distribution of data. 2) MDR can analyze nonlinear associations in genotypic combinations. 3) MDR has improved power to detect genegene interaction in small to moderate sample sizes. Since the introduction of the original MDR implementation, many works have been published to improve modeling and prediction accuracy with the MDR method. For more information on the history and development of the method, please refer to the comprehensive review of the MDR and its extended methods by Moore [4].
While the MDR approach is widely used, to make this paper selfcontained, we give a brief description of the method. MDR is often applied to genotypic data to detect genegene (GxG) interactions among single nucleotide polymorphism (SNP) and the original implementation of this method can be extended to detect the interactions in other types of data when the explanatory variables are categorical variables and the outcome variable is binary. As the scale of association studies has expanded (with larger numbers of SNPs), a filtration step is often implemented in the first step of MDR analysis to remove noisy SNPs. In this step, a subset of genes that are unlikely to interact with others is removed by filtration methods such as SURF [5], TuRF [6] etc. ReliefF [7], has become a commonly applied filter, and we will focus on this filter in the current study. After this step, the remaining SNPs are used for the dimensionality reduction and model selection steps of the MDR algorithm. In this step, all variable combinations are considered for kway (k = 2, 3, 4 …) interactions. For each multilocus combination, the ratio of cases to controls within each contingency table cell is calculated, and then each cell is assigned a status of highrisk or lowrisk by comparing this ratio to the ratio of cases: controls in the overall dataset. Cells with a ratio greater than the overall ratio are assigned “highrisk” status, and those with a ratio lower than the overall ratio are assigned “lowrisk” status. Subsequently, a balanced classification accuracy is calculated for each multilocus combination, and the optimal model is selected based on the highest balanced accuracy. This model selection approach is performed in concert with a crossvalidation procedure, usually 10fold, which randomly divides the whole data set into a training set and a validation set. The testing accuracy is the balanced accuracy when the classification rule developed from the training data set is applied to the testing data set. The cross validation count (CVC) summarizes the number of times a model is the top model in each of the crossvalidation splits of the data. The optimal kway (k = 2, 3, 4 …) interaction model with the highest training accuracy and the highest CVC is then selected as the winner model. Finally, the significance of the selected optimal model is assessed by permutation testing (comparing the testing/prediction accuracy against the empirical distribution built by at least 1000 permutations). MDR can be performed by an open source software mdr2.0 and model goodnessoffit and significance can be assessed using software mdrpt1.0 [8] or in the MDR.R R software package [9].
In this work, we seek to address two existing issues in the current MDR analysis. First, current filtration approaches do not evaluate the significance of the SNPs considered (or provide pvalue for their measures) and there is no clear guideline for the cutoff point of such filtration measures. This leads inconsistency in the optimal number of SNPs remaining for the final MDR analysis.
Second, as there is a growing appreciation that the etiology of human diseases is extremely complex, many investigators are using MDR to evaluate many potential interactive effects, and not just a single final best model [10]. In this type of approach, not one but numerous tests can be performed in search of an optimal model in the kway interaction, as the number of partitions for kway interaction over m loci is $\left(\begin{array}{l}m\\ k\end{array}\right)=\frac{m!}{k!(mk)!}$. For instance, if an investigator is interested to detect significant 2way interactions among 50 SNPs, 1225 tests will be performed which will inflate the familywise Type I error rate to $1{(1\alpha )}^{\left(\begin{array}{l}m\\ k\end{array}\right)}=1{(10.05)}^{1225}\approx 1$, where $\alpha =0.05$ is the nominal error rate for an individual test without proper control.
False discoveries and losing power to detect the signal after the multiplicity adjustment are two concurrent issues in analyzing high dimensional data. Instead of replacing all the existing methods to control the false discovery, we propose to add the global tests to the current MDR framework as an adhoc screening process to prevent false discoveries. We will explain the rationale and utility of global tests in Section Global tests.
Incorporating global tests within the filtration procedures can reveal a trend of gene interactive patterns when noisy genes (SNPs) are removed step by step using ReliefF or other filtration techniques. In the current study, we demonstrate this approach in a candidate gene study of drug response in Juvenile Idiopathic Arthritis, using the ReleifF filter. Additionally, we perform a simulation study comparing several different global testing approaches for a range of genetic etiologies to compare the power of different approaches.
Methods
Global tests
The idea of global testing is to assess the patterns of pvalues from multiple testing of k way interactions among m loci ($n=\left(\begin{array}{l}m\\ k\end{array}\right)=\frac{m!}{k!(mk)!}$tests). Under the null hypothesis of no GxG interactions, the pvalues will follow uniform (0, 1). To see this, let T be the test statistic with the cumulative distribution function (CDF) F _{0} (t) and the inverse CDF ${F}_{0}^{1}\left(t\right)$ for $t\in \mathfrak{R}$under H _{ 0 }. Let P be the pvalue corresponding to the test statistic T. Under H _{ 0 }, we have $\mathrm{Pr}(P\le p)=\mathrm{Pr}\left({F}_{0}^{1}\right(P)\le {F}_{0}^{1}(p\left)\right)={F}_{0}\left({F}_{0}^{1}\right(p\left)\right)=p$ for $p\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$ (Pattern 1 in Figure 1).
When a proportion of m loci have k way interactions (H _{ α }), it is expected to observe the pvalues shifting towards 0. To see this, let F(t) be the CDF under H _{ α } and $F\left(t\right)>{F}_{0}\left(t\right)$ for $t\in \mathfrak{R}$, then $\mathrm{Pr}(P\le p)=\mathrm{Pr}\left({F}_{0}^{1}\right(P\left)\right)\le {F}_{0}^{1}\left(p\right))=F({F}_{0}^{1}\left(p\right))>p$ for $p\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$ (Pattern 2 in Figure 1). Due to correlations/linkage disequilibrium among SNPs and the redundancy of SNPs in high order models, sometimes pvalues shift toward 1, i.e. $\mathrm{Pr}(P\le p)<p$ for $p\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$(Pattern 3 in Figure 1). When pvalues are correlated, they might peak near center of histogram (Pattern 4 in Figure 1). Patterns 3 and 4 are deviated from uniformity but they do not indicate potential k way interactions among m loci.
The rationale of global testing is to ensure pvalues are not randomly and uniformly distributed (Pattern 1) before we investigate each single pvalue. Correlated pvalues without significant effects (H _{0}) might even shift toward 1 or peak near the center (Patterns 3 and 4). The goals are to rule out Patterns 1, 3 and 4 and only move forward to the final MDR analysis when pvalues are in Pattern 2.
If the entire set of pvalue follows a uniform distribution, then it is very likely for a small pvalue to be a false discovery by chance. As shown in Figure 1, the entire set of pvalues might have four different Patterns: uniform, shifting to 0, shifting to 1 or peak near the center. In all four cases, we notice that the minimum pvalues are less than 0.05 (0.0001111 in Pattern 1, 2.65e6 in Pattern 2, 0.00617 in Pattern 3 and 0.003734 in Pattern 4). If we take the distribution of the entire set of pvalues into account, then the minimum pvalues in Patterns 1, 3 and 4 are false discoveries by chance.
Combined global testing and filtration technique
A global test will serve as an adhoc diagnostic tool to exam all pvalues from k way interactions among m genes in MDRanalysis. These pvalues come from empirical distributions generated through permutation testing. Let ${P}_{}={p}_{i}(i=1\text{,}\phantom{\rule{0.12em}{0ex}}2\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}n)$be identical and independently distributed (i.i.d.) pvalues from the MDR analysis of k–way interactions among m– loci ($n=\left(\begin{array}{l}m\\ k\end{array}\right)=\frac{m!}{k!(mk)!}$). We will consider a onesided test to compare
Rejecting H _{0} indicates significant GxG interactions in some target genes.
We propose incorporating global testing of pvalues with ReliefF [7] gene filtration technique to detect the patterns of kway GxG interaction among m genes (SNPs) and remove noisy genes (SNPs) with little interactive effects to determine the optimal number of SNPs for the final MDR analysis. The ReliefF algorithm estimates weights to measure the potential accuracy of attributes in prediction of phenotype. The redundant attribute will be assigned a lower score. When applied in genegene interactions, a higher ReliefF score indicates a stronger interactive effect for the corresponding gene (SNP). ReliefF algorithm first uses x nearest neighborhood approach $(x=1\text{,}\phantom{\rule{0.12em}{0ex}}2\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}m)$ to match a selected subject with x subjects in neighborhood (with shortest distances across all SNPs) from the control group and from test group respectively. An attribute (SNP) will be assigned score 1 (−1) if the attribute from the selected subject matches (mismatches) one of x nearest subjects from the same phenotype group. Similarly, an attribute will be assigned score −1 (1) if the attribute from the selected subject matches (mismatches) one of the nearest subjects from the different phenotype group. The score will be aggregated for all subjects and normalized (divided) by the total number of subjects and neighbors. Detailed description of ReliefF algorithm for filtering genotyping data can be found in Section 3 of [4].
The flow chart of the testing procedure is presented in Figure 2. Starting with a set of m candidate SNPs, perform the ReliefF algorithm on m SNPs and sort SNPs by ReliefF scores in an ascending order. Generate pvalues for the exhaustive search of k way interactions among a total of m SNPs using the original or extended MDR methods. For each k way interaction, one pvalue of MDR analysis is generated by permutation test. Let m _{ 1 } be the number of remaining SNPs. Perform the global testing on $\left(\begin{array}{l}{m}_{1}\\ k\end{array}\right)$ pvalues. Remove one SNP that has the lowest ReliefF score and all interactions corresponding to this SNP. Perform the global testing about hypothesis testing (1) on the pvalues of the kway interactions of the remaining SNPs. Continue to remove SNPs with the lowest ReliefF score one by one and perform global testing after each removal of a SNP. One can stop the process when the remaining SNPs reach a predetermined minimum number. Choose the optimal number of SNPs for the final MDR analysis as the largest number of SNPs with global testing pvalue < α. Often we set $\alpha =0.05$. To be more rigorous of controlling familywise Type I error, one can apply FDR algorithm on the global testing pvalues and the first pvalue with FDR < 0.05 will determine the optimal number of SNPs.
Global tests of Pvalues
Here we introduce 6 global tests that can be applied to test hypothesis (1). These six tests are based on different approaches to detect deviation from uniformity. We will survey these methods and compare their power using a case study and MonteCarlo simulations.
Test 1 one sided KolmogorovSmirnov test [KS]
KS test is a nonparametric test that can be applied to compare the distance between an empirical distribution of i.i.d. pvalues and Uniform(0, 1). For hypothesis test (1), define the onesided KS statistic as ${D}_{n}^{+}=\underset{p}{sup}\left(\frac{1}{n}\sum _{i=1}^{n}{I}_{\{{p}_{i}\le p\}}p\right)$, where ${I}_{\{{p}_{i}\le p\}}$is an indicator function which equals 1 if ${p}_{i}\le p$and 0 if ${p}_{i}>p$. According to [11], the pvalue of onesided KS test follows $\mathrm{Pr}\left({D}_{n}^{+}>t\right)=t\sum _{i=0}^{\left[n\right(1t\left)\right]}\left(\begin{array}{l}n\\ i\end{array}\right){\left(1ti/n\right)}^{ni}{\left(t+i/n\right)}^{i1}$ where $\left[n\right(1t\left)\right]$ is the largest integer not greater than $n(1t)$ and $t\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$.
Test 2 onesided inverse chisquare test [inverse chi]
Fisher [12] shows that if ${p}_{i}\stackrel{i.i.d.}{~}Uniform(0,1)$ for $i=1\text{,}\phantom{\rule{0.12em}{0ex}}2\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}n$, then $2\sum _{i=1}^{n}ln\left({p}_{i}\right)~{\chi}_{2n}^{2}$where ${\chi}_{2n}^{2}$is chisquare distribution with 2n degrees of freedom. For a one sided test (1), reject H _{0} if $2\sum _{i=1}^{n}ln\left({p}_{i}\right)>{\chi}_{2n,1\alpha}^{2}$where ${\chi}_{2n,1\alpha}^{2}$ is $(1\alpha )*100\%$percentile of ${\chi}_{2n}^{2}$.
Test 3 one sided inverse normal test [inverse norm]
Transform pvalue to normal z score by letting ${z}_{i}={\Phi}^{1}\left({p}_{i}\right)$where ${\Phi}^{1}$ is inverse cumulative normal distribution. Under H _{ 0 }, $Z=\left(\sum _{i=1}^{n}{z}_{i}\right)/\sqrt{n}~N\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$. For one sided test (1), reject H _{0} if Z < Z _{ α } where Z _{ α } is $\alpha *100\%$ percentile of the standard normal distribution.
Test 4 one sided logit test [logit]
Logit transform pvalue by letting $L=\sum _{i=1}^{n}ln({p}_{i}/(1{p}_{i}\left)\right)$. [13] shows that under H _{0}, the distribution of L can be closely approximated by Student’s tdistribution with $5n+4$degrees of freedom, namely ${L}^{*}=L\sqrt{\frac{3(5n+4)}{{\pi}^{2}n(5n+2)}}\approx {t}_{5n+4}$. Therefore, for onesided test (1), we can reject H _{0} if ${L}^{*}<{t}_{5n+4,\alpha}$where ${t}_{5n+4,\alpha}$ is $\alpha *100\%$ percentile of the tdistribution.
Test 5 one sided Wilcoxon test [Wilcoxon]
Order n pvalues from MDR testing along with n _{ 2 } observations randomly drawn from Uniform(0,1) from least to greatest and denote them by ${S}_{1}\text{,}\phantom{\rule{0.12em}{0ex}}{S}_{2}\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}{S}_{N}$ with $N=n+{n}_{2}$. Let W be the sum of the ranks corresponding to n pvalues from MDR testing. For onesided test (1), we can reject H _{0} if $W\le n(N+1){\omega}_{\alpha}$where the constant ω _{ α } is chosen to make the Type I error probability equal α. Values of ω _{ α } are given in Table A6 by [14]. For large sample sizes, i.e. min(n,n _{2}) going to infinity, one can apply normal approximation on the standardized W.
Test 6 Tippett and Wilkinson’s test [Tippett]
Tippett’s Test [15] is based on the property of the minimal pvalue in multiple testing. Let ${p}_{\left(1\right)}\text{,}\phantom{\rule{0.12em}{0ex}}{p}_{\left(2\right)}\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}{p}_{\left(n\right)}$be the ordered pvalues in an ascending order. When pvalues identically and independently follow Uniform(0,1) distribution, Tippett’s test will reject H _{0} if ${p}_{\left(1\right)}<1{(1\alpha )}^{1/n}$. The pvalue of Tippett’s test equals $1{(1{p}_{\left(1\right)})}^{n}$. Tippett’s test is very easy to perform but it only takes the smallest pvalue into account.
Wilkinson [16] extended Tippett’s procedure to the r ^{th} smallest pvalues where $r=1\text{,}\phantom{\rule{0.12em}{0ex}}2\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}n$. By expanding ${(\alpha +(1\alpha \left)\right)}^{n}$, Wilkinson tabulated the probability, denoted by C _{ γ,α } of obtaining r significant statistics by chance in a group of n tests. Suppose there are r tests with pvalues less than α, Wilkinson’s test rejects H _{0} if ${c}_{r,\alpha}<\alpha $[17]. Because P _{ (r) } has been distributed with parameters r and nr + 1, tables of the incomplete beta function can be used to obtain critical values of P _{ (r) } directly. In our work, we will not include Wilkinson’s test in case study and power simulation because this method does not provide pvalue for the testing results.
Case study
We used a real dataset to illustrate how to apply our proposed global testing to prevent false discovery and to determine the optimal number of SNPs for the final MDR analysis. Juvenile Idiopathic Arthritis (JIA) is one of the most common chronic diseases of childhood, affecting an estimated 300,000 children in the U.S. alone, and is an important cause of morbidity and disability in children [18]. Although methotrexate (MTX) is the most commonly used secondline agent used to treat JIA worldwide, this antifolate drug has shown considerable interindividual variability in clinical response and adverse reactions [19]. The polyglutamation of methotrexate (MTXglu) is an intracellular mechanism that retains the drug and enhances target enzyme inhibition within the folate pathway [20], and high concentrations of “long chain” methotrexate polyglutamates (MTXglu_{35}) have been associated with improved response to the drug in adults with rheumatoid arthritis [21]. Studies have reported the extensive variability in intracellular MTXglu concentrations in JIA, and an association of long chain MTXglu with toxicity (but not efficacy) in children [22]. Due to the complexity of the folate cycle as well as the extensive variability in response to the drug in clinical practice, it is hypothesized that genetic factors may contribute to differences seen in distinct patterns of MTXglu concentrations intracellularly, which might further impact patients’ responses to MTX.
In this case study, we analyzed 25 SNPs from 17 candidate genes in the folate pathway (Table 1). MTXglu was measured in all patients after at least 3 months on stable MTX therapy and a range of 1 to 5 glutamate moieties were reported as a percentage of the total polyglutamate concentration (MTXglu_{n%}). Hierarchical clustering was performed to identify patterns of MTXglu_{n%} ^{,} and two clusters were determined based on the hierarchical clustering of normalized MTXglu_{15%}. Subjects in cluster 1 had lower concentration of short chain polyglutamates (MTXglu_{12%}) and higher concentration of long chain polyglutamates (MTXglu_{35%}) as compared to subjects in cluster 2 (p < 0.05). These clusters reflected distinct patterns in the proportion of MTXglu concentrations.
There were 30 subjects in Cluster 1 and 74 subjects in Cluster 2. The MTXglu clustering phenotype was coded 1 and 0 for MDR analysis. Genotypes, coded 0 for common homozygote, 1 for heterozygote and 2 for rare homozygote for 25 SNPs, were measured. The overall goal of the analysis was to assess whether interactions among SNPs are associated with MTXglu clustering. While the scale of this study is not so large that an exhaustive search of all SNPs is computationally limited, this data is used to demonstrate the proposed approach.
For illustrative purposes, we will focus on 2way interactions among 25 SNPs and will determine the optimal number of targeted SNPs for testing 2way interactions. We first applied the ReliefF algorithm to 25 SNPs. As shown in Table 2, ReliefF scores ranged from −0.0308 to 0.1163. Although a higher score indicates stronger interaction with other SNPs, there is no clear cutoff point for ReliefF scores.
To circumvent this limitation, we incorporated global testing of pvalues and ReliefF algorithm using the method proposed in Section Combined global testing and filtration technique. We first generated pvalues for all 2way interactions among 25 SNPs through permutation testing. Then we applied global testing, including KS test, Inverse chi test, Inverse norm test, Logit test, Wilcoxon test and Tippett’s tests on $\left(\begin{array}{l}25\\ 2\end{array}\right)=300$ pvalues of 2way interactions among 25 SNPs. The global tests were performed to evaluate whether the distribution of pvalues deviated from uniformity (null hypothesis) and shifted towards 0 (alternative hypothesis  Pattern 2 in Figure 1). Then we removed one SNP with the lowest ReliefF score step by step and repeated the global testing process until only 5 SNPs were remained. We stopped the global testing procedure at 5 SNPs because it is not meaningful or necessary to perform global testing when the number of SNPs is less than 5 in any case study.
The entire procedure of filtration and global testing of pvalues are summarized in Table 2. The optimal number of SNPs is the largest number of SNPs with global testing pvalue <0.05. In this case study, KS test and Inverse chi test were more sensitive to deviation from uniformity as the tests became significant after 5 and 4 SNPs removed respectively (Table 2). Inverse norm test and Logit test were more conservative, suggesting removal of 11 SNPs. Wilcoxon test, removing 8 SNPs, was moderate as compared to the other tests. Tippett’s test failed to detect significant GxG interactions with all FDR corrected pvalues > 0.05. Our further simulation studies (discussed in Section Power simulation) indicate that Tippett’s test, which only takes the smallest pvalue into account, might not be appropriate for global testing of pvalues.
Figure 3 and 4 both revealed strong patterns of transition when noisy SNPs were removed. In Figure 3, as SNPs with low ReliefF scores were removed sequentially, the histogram of pvalues started to shift toward 0. Also in Figure 4, the global test pvalues were in decreasing trends when noisy SNPs were removed one by one. Once the pvalue of global testing was under 0.05, it continued to stay under 0.05. There were only a few exceptions at the end of the filtration procedure, probably due to smaller sample sizes and reduction of power.
After the filtration and global testing, we removed 4 SNPs as suggested by Inverse chi test with FDR correction (Table 2) and performed MDR analysis on the remaining 21 SNPs. The results of MDR analysis indicated that there was significant twoway interaction between DHFR (rs7387) and ITPA (rs2295553) with testing balance accuracy = 0.7374 (p = 0.0045). The MDR analysis was performed by an open source software mdr2.0 and model goodnessoffit and significance was assessed by permutation using software mdrpt1.0 [8].
The dihydrofolate reductase (DHFR) enzyme is a well known important target of MTX action. When DHFR is inhibited by MTX the subsequent production of reduced folates such as tetrahydrofolate (THF) 5,10 methenylTHF and 5methylTHF are altered, affecting not only total cellular folate concentrations but also the downstream effects from one carbon donation including homocysteine remethylation and pyrimidine and purine synthesis thought to result in an antiproliferative effect [23]. The inhibition of DHFR also results in a buildup of the precursor dihydrofolate (DHF) which in its polyglutamated state has inhibitory effects upon enzymes within the pathway as well [24]. Inosine triphosphate pyrophosphatase (ITPA) plays a role in de novo purine synthesis, and is closely related to adenosine metabolism, which is thought to contribute to MTX response via its antiinflammatory effect [25]. Variations in ITPA interestingly have been shown by other authors to contribute to MTX response as part of a candidate gene study in JIA [26], as well as “predisposing genetic attribute” in studies utilizing MDR in adults with rheumatoid arthritis [27, 28]. How these 2 genes directly affect MTXglu patterns remains difficult to determine, as the direct understanding of how MTXglu patterns are associated with response is yet to be elucidated. However, both genes encode enzymes closely linked to or directly affected by MTX, thus as we gain a more detailed knowledge of cellular folate metabolism and its disruption by antifolate agents such as MTX, we will then develop a better understanding of this complex system, and how alterations in the folate pathway affect response to the drug.
Power simulation
In the two empirical studies described below, we investigate the performance of 6 global testing when pvalues exhibit different patterns (Figure 1) of variation. Data were generated from the Uniform distribution or varying mixtures of Beta distributions based on the inference regarding the patterns of pvalues as described in Section Global tests and Figure 1. Here we give the rationale of using uniform or beta mixture to simulate pvalues under null and alternative hypotheses. Under the null hypothesis of no GxG interactions, we have proved that pvalues follow Uniform(0,1) distribution (Pattern 1). When this null hypothesis is violated, we introduce a latent variable to indicate the status of underlying hypothesis for each test. For ${p}_{i}\text{,}\phantom{\rule{0.12em}{0ex}}i=1\text{,}\phantom{\rule{0.12em}{0ex}}2\text{,}\cdots \text{,}\phantom{\rule{0.12em}{0ex}}n$, introduce a latent variable ${Z}_{i}$ where for hypothesis testing (1), we have
for $p\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$. The proportion of tests where H _{ α } holds is denoted by the mixing weight $\mathrm{\text{Pr}}({Z}_{i}=1)=\pi $ where $\pi \in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$.
Conditioning on Z _{ i }, we have
The marginal distribution of combined pvalues becomes $P~(1\pi )\text{Uniform}(0,1)+\pi \text{Beta}\left(a\text{,}\phantom{\rule{0.12em}{0ex}}b\right)$, which indicates that with $(1\pi )\times 100\%$ of chance, a pvalue is drawn from Uniform(0,1) and with $\pi \times 100\%$ of chance, a pvalue is drawn from Beta(α,b). Beta distribution is very flexible to characterize the patterns of pvalues (Figure 1) where Uniform (0,1) is a special case of Beta (1,1). One can also adjust the shape and scale parameters a and b to model the deviation from uniformity.
The pvalues from MDR analysis are correlated due to linkage disequilibrium among SNPs and sharing the SNPs among GxG interactions. The dependence among pvalues might cause inflation of Type I errors or lead to bias in global tests. As a result, it is critical to extensively simulate pvalues with varying correlation structures and assess the robustness of global tests for correlated pvalues. In this work, we simulated correlated Uniform variables with random correlation matrix Σ and Beta random variables with correlation coefficient $\rho =0.2\text{,}\phantom{\rule{0.12em}{0ex}}0.8\text{,}\phantom{\rule{0.12em}{0ex}}\text{Beta}\left(2\text{,}\phantom{\rule{0.12em}{0ex}}5\right)\text{,}\phantom{\rule{0.12em}{0ex}}\text{Uniform}\left(0.1\phantom{\rule{0.12em}{0ex}}\text{,}\phantom{\rule{0.12em}{0ex}}0.9\right)$ respectively. The details of generating correlated uniform [29] and beta distributions [30] are summarized in Appendix 1.
The first simulation study concerns the Type I error of global testing when there does not exist any GxG interactions among genes (SNPs). We generated pvalues from

Independent Uniform(0,1),

Correlated $\text{Uniform}\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$,

Correlated $0.9\text{Uniform}\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)+0.1\text{Beta}\left(5\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$, (4.1)

Correlated $0.5\text{Uniform}\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)+0.5\text{Beta}\left(5\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$, (4.2)

Correlated $0.9\text{Uniform}\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)+0.1\text{Beta}\left(6\text{,}\phantom{\rule{0.12em}{0ex}}3\right)$, (4.3)

Correlated $0.5\text{Uniform}\left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)+0.5\text{Beta}\left(6\text{,}\phantom{\rule{0.12em}{0ex}}3\right)$. (4.4)
These six scenarios cover Patterns 1, 3 and 4 with no signs of GxG interactions in Figure 1. For each simulation, the sample size of pvalues varies from 20 to 500 and we performed global tests on each sample of pvalues. We repeated the process 1000 times, and calculate the percentage of rejection of null hypothesis for each test. Under the hypothesis of no GxG interaction, this rejection rate is considered as Type I error. As shown in Table 3, the Type I error rates are well controlled to be near or under the nominal rate 0.05 when pvalues are i.i.d Uniform(0,1). When pvalues are correlated Uniform with random correlation matrices, there was slight inflation in five global tests except Tippett’s test. It is good to notice that the inflation is not severe as most tests have Type I error rates under 0.07. Such mild inflation is acceptable in screening testing and we will discuss how to further address this issue in Discussion Section.
In addition to Uniform distributions, we also simulated correlated Beta mixtures in formula (4.1)(4.4) regarding pvalues shifting to 1 or peaking near the center (Patterns 3 and 4). We conservatively set correlation coefficient $\rho =0.8$ (Table 3) to simulate very strong correlation among Beta variates, which is most likely to inflate Type I errors in global tests. The simulation results for mild correlation including $\rho =0.2\text{,}\phantom{\rule{0.12em}{0ex}}\text{Beta}\left(2\text{,}\phantom{\rule{0.12em}{0ex}}5\right)\text{,}\phantom{\rule{0.12em}{0ex}}\text{Uniform}(0.1,0.9)$are summarized in Appendix 1 (Table 4). The results from Table 3 and Appendix 1 (Table 4) show that Inverse chi test and Tippett’s test are very robust to dependency in pvalues with well controlled Type I error rates. The KS test has the highest inflation in several scenarios we simulated, especially when correlated pvalues had peaks near center (Pattern 4). For strongly correlated pvalues ($\rho =0.8$ Table 3), the inverse norm, the Wilcoxon and Logit tests also had modest inflations when sample sizes get larger (n > 200). When pvalues were moderately correlated ($\rho =0.2\text{,}\phantom{\rule{0.12em}{0ex}}\text{Beta}\left(2\text{,}\phantom{\rule{0.12em}{0ex}}5\right)\text{,}\phantom{\rule{0.12em}{0ex}}\text{Uniform}(0.1,0.9)$, (Table 4)), the inverse norm, the Wilcoxon and the Logit tests had well controlled Type I errors for all tested sample sizes.
In the second simulation study, we are interested in the power of each of the approaches to detect the GxG interactions by performing the hypothesis testing (1) to detect $\mathrm{Pr}(P\le p)>p$(Pattern 2 with GxG interactions). We simulated pvalues from a wide range of beta mixture distribution where the mixing π was set to be 0.1 and 0.4, indicating different proportions of tests with significant GxG interactions. In most cases, parameters $a<b$will have $\mathrm{Pr}(P\le p)>p$for $p\in \left(0\text{,}\phantom{\rule{0.12em}{0ex}}1\right)$ which coincides with Pattern 2. Under alternative hypothesis of a proportion of tests having GxG interaction, we simulated pvalues from 6 Beta mixtures:

Correlated $0.9\text{Uniform}(0,1)+0.1\text{Beta}(0.4,6)$,

Correlated $0.6\text{Uniform}(0,1)+0.4\text{Beta}\left(0.4\text{,}\phantom{\rule{0.12em}{0ex}}6\right)$,

Correlated $0.9\text{Uniform}(0,1)+0.1\text{Beta}\left(0.5\text{,}\phantom{\rule{0.12em}{0ex}}4.5\right)$,

Correlated $0.6\text{Uniform}(0,1)+0.4\text{Beta}\left(0.5\text{,}\phantom{\rule{0.12em}{0ex}}4.5\right)$,

Correlated $0.9\text{Uniform}(0,1)+0.1\text{Beta}(1,5)$, and

Correlated $0.6\text{Uniform}(0,1)+0.4\text{Beta}(1,5)$.
As in the Type I error simulation, in the power simulation, we used sample sizes ranging from 20 to 500 and perform simulation 1000 times. The power is the percentage of results across the 1000 replicates where the null hypothesis was rejected. We present the power comparison results under week correlation $\rho =0.2$ in Table 5 because all six global tests are free of Type I errors in this scenario. Power comparisons for $\rho =0.8\text{,}\phantom{\rule{0.12em}{0ex}}\text{Beta}\left(2\text{,}\phantom{\rule{0.12em}{0ex}}5\right)\text{,}\phantom{\rule{0.12em}{0ex}}\text{Uniform}\left(0.1\phantom{\rule{0.12em}{0ex}}\text{,}\phantom{\rule{0.12em}{0ex}}0.9\right)$ are summarized in Appendix 1 (Table 6). These results indicate that Tippett’s test, which only takes the smallest pvalue into account, might not be appropriate for detecting the patterns of alternation. Among the other five global tests, Inverse chi test has the strongest power in most simulated cases. These five global tests have strong power to detect $\mathrm{Pr}(P\le p)>p$for small to moderate sample sizes.
The global tests have been implemented in R. The R code is available at http://www.childrensmercy.org/Content/view.aspx?id=22812.
Discussion and conclusions
Multifactor Dimensionality Reduction (MDR) is a novel statistical method developed to characterize and detect nonlinear complex genegene interactions (epistasis) that could be associated with disease susceptibility. We suggest incorporating global test to filtration procedures to reveal a trend of gene interactive patterns when noisy genes are removed step by step using ReliefF or other filtration techniques. The optimal number of genes for further MDR analysis can be identified by pvalues of global testing. A real data applications and empirical assessment of our proposed methods reveal strong trends in global testing of pvalues and clear patterns of distribution of pvalues in three scenarios: 1) presence of GxG interactions, 2) absence of GxG interactions, 3) weak GxG interactions that needs filtration to remove noisy genes. The proposed global tests can serve as a screening approach before individual tests to prevent false discovery. Strong power in small sample sizes and well controlled Type I error in absence of GxG interactions makes these tests highly recommended in epistasis studies.
Global testing has not been implemented in MDR analyses in the literature we have reviewed. Currently, researchers rely on adjustment of individual pvalues such as false discovery rate (FDR) as suggested by [31]. Due to high dimensionality in genetic interactions, the FDR and other multiple testing adjustments often lose power in MDR analyses. Some MDR studies [27] have utilized the false positive report probability proposed by [32] but this method has been pointed out by [33] to be heuristic and wrong in formulation. In contrast, the global tests proposed by this paper are based on rigorous statistical theories and inferences.
Through extensive simulation on correlated pvalues, our study shows that the Inverse chi test is the most powerful approach to be incorporated with the filtration techniques to determine the optimal number of SNPs for the final MDR analysis. The KS test might have high inflation of Type I errors when pvalues are highly correlated or when pvalues peak near the center of histogram (Pattern 4). The Tippett’s test has very low power when the effect size of Pattern 2 is small.
We observe mild inflation of Type I error (<0.07) when pvalues are Uniform with a random correlation matrix. Our global tests are implemented for screening SNPs and investigators can continue to use multiplicity adjustment algorithms such as FDR to adjust individual pvalues in the final MDR analysis to prevent false discoveries. As a result, slight inflation in Type I error (<0.07) is acceptable in practice. Moreover, in our case study, we show that one can utilize the decreasing trend of global test results (Figure 4) to facilitate decision making. If global tests provided false discoveries, then the trend of global tests results would randomly fluctuate up and down. Figure 4 with a decreasing trend for global testing results as well Figure 3 with histograms systematically switching to Pattern 2 can also serve as diagnostic tools to prevent false discoveries or selection bias in global tests.
It is worthwhile to point out the proposed global tests can effectively prevent false discovery without losing the power to detect significant GxG interactions. To prevent the false discovery, current MDR applications typically choose one optimal model for each kway interaction. This method has two major drawbacks: firstly, the false positive discovery is not reduced by choosing one optimal model; secondly, choosing one optimal models may overlook other potential GxG interactions that also contributes to the disease susceptibility.
The major contribution of our manuscript is to incorporate global testing procedures to MDR framework. Our proposed global tests will provide pvalues to help practitioners determine the appropriate number of SNPs to be remained in the final analysis. The current filtration process does not provide pvalues. Therefore, using arbitrary cutoff value in the current process might lead to overfiltering or underfiltering of SNPs.
All 6 global tests are based on statistical inference instead of permutation. These 6 tests run very fast in a single computer. The major computational challenges are in the generation of pvalues for MDR through permutation tests but this is not the major focus of our work. Several works have been devoted to improve the efficiency and shorten the computing time in MDR analysis in highthroughput data. We will defer interested readers to the corresponding citations for computing issues in highthroughput MDR analysis. These computational limitations make our strategy appropriate in large scale candidate gene studies, but may be limited in application to genomewide association studies until further improvements in computing speed are realized or very largescale computing resources are available.
MDR permutation computing time is largely dependent on the dimension of data sets. In other words, the computing time increases as the number of SNPs and/or the number of subjects increases. Interestingly, the dimension of data does little impact on the computing time of global tests. The computing time for global tests of 1000 pvalues is very close to tests of 10 pvalues. Several filtration approaches have been proposed and some (ReliefF, SURF and TuRF etc.) have been implemented in the MDR software (http://www.epistasis.org). In this work, we utilize ReliefF for filtration. There have been other filtration techniques proposed in literature. For instance, [34] introduced entropybased information gain to search and evaluate interactions among risk factors. The current MDR software [8] provides ReliefF, entropy, chisquare test, etc. about 10 filtration methods. The global tests could be integrated in the workflow with other filtration techniques, although the comparison and evaluation of all filtration technique requires more research attention.
Appendix 1
Simulation of correlated pvalues
We generate correlated Beta variables using the method proposed by [30]. According to Bayesian theory, random variables from a Beta prior and a BetaBinomial conjugate function will yield correlated random deviates whose marginal distribution is also Beta. Firstly, randomly generate a variable K from $K~\text{Beta}\text{Binomial}\left(v\text{,}\phantom{\rule{0.12em}{0ex}}\alpha \text{,}\phantom{\rule{0.12em}{0ex}}\beta \right)$ where α and β are the shape parameters. Conditioning on $K=k$, generate P deviates from $\text{Beta}(\alpha +k\text{,}\phantom{\rule{0.12em}{0ex}}v+\beta 1)$. By integrating on K, the P deviates have unconditional marginal distribution as $\text{Beta}\left(\alpha \text{,}\phantom{\rule{0.12em}{0ex}}\beta \right)$ and the correlation coefficient among P deviates is $\rho =v/(v+\alpha +\beta )$. In this paper, we simulated different correlation coefficient with constant $\rho =0.2\text{,}\phantom{\rule{0.12em}{0ex}}08$ or $\rho $as a random variable from $\rho ~\text{Beta}\left(2\text{,}\phantom{\rule{0.12em}{0ex}}5\right)$ and $\rho ~\text{Uniform}\left(0.1\text{,}\phantom{\rule{0.12em}{0ex}}0.9\right)$ respectively.
In the above method, $\rho =v/(v+\alpha +\beta )$ can be written as $v=(\alpha +\beta )\rho /(1\rho )$ but algorithm to generate BetaBinomial with noninteger v is not widely available. As a result, we use an alternative method to generate correlated uniform distributions . In essence, correlated uniform variables, U, with a random correlation matrix Σ can be generated by transforming multivariate normal variables X using formula $U=F\left(X\right)$ where F is the CDF of the standard normal distribution. First, we generated a positive definite covariance matrix, Σ_{0} with randomly selected eigenvalues and randomly generated orthogonal matrix as eigenvectors(R clusterGeneration package). Let σ_{ij} be the component in Σ_{0}. We can convert the covariance matrix, Σ_{0} to correlation matrix Σ with components ${r}_{\mathit{ij}}=\frac{{\sigma}_{\mathit{ij}}}{\sqrt{{\sigma}_{\mathit{ii}}{\sigma}_{\mathit{jj}}}}$. To ensure the correlation is invariant to transformation, we need to adjust correlation matrix Σ into ${\sum}_{}^{\mathit{adj}}=2sin(\pi \sum /6)$. Perform Choleski factorization to generate $C={\left({\sum}^{\mathit{adj}}\right)}^{\frac{1}{2}}$. Generate a vector of i.i.d. standard normal variables, X _{0}. Let $X={X}_{0}\ast C$ and $U=F\left(X\right)$. As a result, the variables U are correlated uniform variables with correlation matrix Σ.
References
 1.
Moore JH, Williams SM: Traversing the conceptual divide between biological and statistical epistasis: systems biology and a more modern synthesis. Bioessays. 2005, 27 (6): 637646. 10.1002/bies.20236.
 2.
Motsinger AA, Ritchie MD, Reif DM: Novel methods for detecting epistasis in pharmacogenomics studies. Pharmacogenomics. 2007, 8 (9): 12291241. 10.2217/14622416.8.9.1229.
 3.
Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH: Multifactordimensionality reduction reveals highorder interactions among estrogenmetabolism genes in sporadic breast cancer. Am J Hum Genet. 2001, 69 (1): 138147. 10.1086/321276.
 4.
Moore JH: Detecting, characterizing, and interpreting nonlinear genegene interactions using multifactor dimensionality reduction. Adv Genet. 2010, 72: 101116.
 5.
Greene CS, Penrod NM, Kiralis J, Moore JH: Spatially uniform relieff (SURF) for computationallyefficient filtering of genegene interactions. BioData Min. 2009, 2 (1): 510.1186/1756038125.
 6.
Moore JH, White BC: Tuning relieff for genomewide genetic analysis. Lecture Notes in Computer Science. 2007, 4447: 166175. 10.1007/9783540717836_16.
 7.
RobnikSikonja, Igor K: Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning Journal. 2003, 53: 2369. 10.1023/A:1025667309714.
 8.
Hahn LW, Ritchie MD, Moore JH: Multifactor dimensionality reduction software for detecting genegene and geneenvironment interactions. Bioinformatics. 2003, 19 (3): 376382. 10.1093/bioinformatics/btf869.
 9.
Winham SJ, MotsingerReif AA: An R package implementation of multifactor dimensionality reduction. BioData Min. 2011, 4 (1): 2410.1186/17560381424.
 10.
Oki N, MotsingerReif A: Multifactor dimensionality reduction as a filter based approach for genome wide association studies. Frontiers in Genetics. 2011, 2: 80
 11.
Birnbaum ZW, Tingey FH: Onesided confidence contours for probability distribution functions. The Annals of Mathematical Statistics. 1951, 22 (4): 592596. 10.1214/aoms/1177729550.
 12.
Fisher RA: Statistical methods for research workers. 1932, Oliver & Boyd, London
 13.
Mudholkar GS, George EO: The logit statistic for combining probabilities  an overview. Optimizing Methods in Statistics. Edited by: Rustagi JS. 1979, 345365.
 14.
Myles H, Wolfe DA: Nonparametric statistical methods. 1999, Wiley, New York, 2
 15.
Tippett L: The methods of statistics. 1931, Williams & Norgate, London
 16.
Wilkinson B: A statistical consideration in psychological research. Psychological Bulletin. 1951, 48: 156158.
 17.
Sakoda JM, Cohen BH, Beall G: Test of significance for a series of statistical tests. Psychological Bulletin. 1954, 51 (2): 172175.
 18.
Helmick CG, Felson DT, Lawrence RC, Gabriel S, Hirsch R, Kwoh CK, Liang MH, Kremers HM, Mayes MD, Merkel PA: Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Part I. Arthritis Rheum. 2008, 58 (1): 1525. 10.1002/art.23177.
 19.
Becker ML, Rose CD, Cron RQ, Sherry DD, Bilker WB, Lautenbach E: Effectiveness and toxicity of methotrexate in juvenile idiopathic arthritis: comparison of 2 initial dosing regimens. J Rheumatol. 2010, 37 (4): 870875. 10.3899/jrheum.090826.
 20.
Chabner BA, Allegra CJ, Curt GA, Clendeninn NJ, Baram J, Koizumi S, Drake JC, Jolivet J: Polyglutamation of methotrexate. Is methotrexate a prodrug?. J Clin Invest. 1985, 76 (3): 907912. 10.1172/JCI112088.
 21.
Stamp LK, Barclay ML, O'Donnell JL, Zhang M, Drake J, Frampton C, Chapman PT: Effects of changing from oral to subcutaneous methotrexate on red blood cell methotrexate polyglutamate concentrations and disease activity in patients with rheumatoid arthritis. J Rheumatol. 2011, 38 (12): 25402547. 10.3899/jrheum.110481.
 22.
Dolezalova P, Krijt J, Chladek J, Nemcova D, Hoza J: Adenosine and methotrexate polyglutamate concentrations in patients with juvenile arthritis. Rheumatology (Oxford). 2005, 44 (1): 7479. 10.1093/rheumatology/keh401.
 23.
Allegra CJ, Chabner BA, Drake JC, Lutz R, Rodbard D, Jolivet J: Enhanced inhibition of thymidylate synthase by methotrexate polyglutamates. J Biol Chem. 1985, 260 (17): 97209726.
 24.
Baggott JE, Vaughn WH, Hudson BB: Inhibition of 5aminoimidazole4carboxamide ribotide transformylase, adenosine deaminase and 5'adenylate deaminase by polyglutamates of methotrexate and oxidized folates and by 5aminoimidazole4carboxamide riboside and ribotide. Biochem J. 1986, 236 (1): 193200.
 25.
Cronstein BN, Naime D, Ostad E: The antiinflammatory mechanism of methotrexate. Increased adenosine release at inflamed sites diminishes leukocyte accumulation in an in vivo model of inflammation. J Clin Invest. 1993, 92 (6): 26752682. 10.1172/JCI116884.
 26.
Hinks A, Moncrieffe H, Martin P, Ursu S, Lal S, Kassoumeri L, Weiler T, Glass DN, Thompson SD, Wedderburn LR: Association of the 5aminoimidazole4carboxamide ribonucleotide transformylase gene with response to methotrexate in juvenile idiopathic arthritis. Ann Rheum Dis. 2011, 70 (8): 13951400. 10.1136/ard.2010.146191.
 27.
Dervieux T, Wessels JA, van der Straaten T, Penrod N, Moore JH, Guchelaar HJ, Kremer JM: Genegene interactions in folate and adenosine biosynthesis pathways affect methotrexate efficacy and tolerability in rheumatoid arthritis. Pharmacogenet Genomics. 2009, 19 (12): 935944. 10.1097/FPC.0b013e32833315d1.
 28.
Dervieux T, Wessels JA, Kremer JM, Padyukov L, Seddighzadeh M, Saevarsdottir S, van Vollenhoven RF, Klareskog L, Huizinga TW, Guchelaar HJ: Patterns of interaction between genetic and nongenetic attributes and methotrexate efficacy in rheumatoid arthritis. Pharmacogenet Genomics. 2012, 22 (1): 19. 10.1097/FPC.0b013e32834d3e0b.
 29.
Hotelling H, Pabst MR: Rank correlation and tests of significance involving no assumption of normality. Annals of Mathematical Statistics. 1936, 7: 2943. 10.1214/aoms/1177732543.
 30.
Minhajuddin ATM, Harris IR, Schucany WR: Simulating multivariate distributions with specific correlations. Journal of Statistical Computation and Simulation. 2004, 74 (8): 599607. 10.1080/00949650310001626161.
 31.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B. 1995, 57: 289833.
 32.
Wacholder S, Chanock S, GarciaClosas M, El Ghormli L, Rothman N: Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst. 2004, 96 (6): 434442. 10.1093/jnci/djh075.
 33.
Lucke JF: A critique of the falsepositive report probability. Genetic epidemiology. 2009, 33 (2): 145150. 10.1002/gepi.20363.
 34.
Moore JH, Gilbert JC, Tsai CT, Chiang FT, Holden T, Barney N, White BC: A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J Theor Biol. 2006, 241 (2): 252261. 10.1016/j.jtbi.2005.11.036.
 35.
Poonkuzhali B, Lamba J, Storm S, Sparreboom A, Thummel K, Watkins P, Schuetz E: Association of breast cancer resistance protein/ABCG2 phenotypes and novel promoter and intron 1 single nucleotide polymorphisms. Drug metab Dispos. 2008, 36 (4): 780795. 10.1124/dmd.107.018366.
Acknowledgements
This work is supported for collaboration between HD and AMR by Bursary Award of the 1^{st} Short Course on Statistical Genetics and Genomics from University of Alabama at Birmingham from the National Institute of Health R25GM093044 (PI: Tiwari). We are grateful to two reviewers whose comments have helped us improve the manuscript.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
There are no competing interests to this work.
Authors’ contributions
HD and MB conceived of the study. AMR aided in study design and MDR method. HD and AMR performed the simulations and data analysis. MB, RG, and SL performed the clinical data collection, genotyping and interpretation of case study findings. All authors contributed to the manuscript writing. All authors read and approved the final manuscript.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Dai, H., Bhandary, M., Becker, M. et al. Global tests of Pvalues for multifactor dimensionality reduction models in selection of optimal number of target genes. BioData Mining 5, 3 (2012). https://doi.org/10.1186/1756038153
Received:
Accepted:
Published:
Keywords
 Pvalue
 Global tests
 ReliefF
 Multifactor dimensionality reduction