An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use
© Che et al.; licensee BioMed Central Ltd. 2014
Received: 24 September 2013
Accepted: 2 June 2014
Published: 14 June 2014
Permutation testing is a robust and popular approach for significance testing in genomic research, which has the broad advantage of estimating significance non-parametrically, thereby safe guarding against inflated type I error rates. However, the computational efficiency remains a challenging issue that limits its wide application, particularly in genome-wide association studies (GWAS). Because of this, adaptive permutation strategies can be employed to make permutation approaches feasible. While these approaches have been used in practice, there is little research into the statistical properties of these approaches, and little guidance into the proper application of such a strategy for accurate p-value estimation at the GWAS level.
In this work, we advocate an adaptive permutation procedure that is statistically valid as well as computationally feasible in GWAS. We perform extensive simulation experiments to evaluate the robustness of the approach to violations of modeling assumptions and compare the power of the adaptive approach versus standard approaches. We also evaluate the parameter choices in implementing the adaptive permutation approach to provide guidance on proper implementation in real studies. Additionally, we provide an example of the application of adaptive permutation testing on real data.
The results provide sufficient evidence that the adaptive test is robust to violations of modeling assumptions. In addition, even when modeling assumptions are correct, the power achieved by adaptive permutation is identical to the parametric approach over a range of significance thresholds and effect sizes under the alternative. A framework for proper implementation of the adaptive procedure is also generated.
While the adaptive permutation approach presented here is not novel, the current study provides evidence of the validity of the approach, and importantly provides guidance on the proper implementation of such a strategy. Additionally, tools are made available to aid investigators in implementing these approaches.
Permutation testing is a popular nonparametric (distribution-free) randomization procedure, which provides a robust and powerful method of testing statistical hypothesis. In the classical form of the permutation test, the response is shuffled b times, and the test statistics are recorded for each permuted data set. These test statistics are then used to generate the distribution under the null hypothesis of no true association. If the observed test statistic is the R th largest among all the test statistics, then a p-value is declared as R/b, and the null hypothesis is rejected at this significance level [1–3].
A permutation test is commonly used when the standard distributional assumptions are violated . For instance, quantitative trait loci (QTL) analysis typically assumes that quantitative traits are normally distributed within each genotype . If the model is correctly specified, a simple one-way ANOVA test is often powerful in this situation. In more complicated models, large sample asymptotic approximations to test statistics can often be derived. Nevertheless, in practice, it is often difficult to determine the appropriate asymptotic distribution or the sample size may not be large enough for asymptotic assumptions to hold, limiting the generality of this approach. In these situations, the permutation test has become an emerging attractive choice, in particular when the distribution is doubtful. Such issues are becoming increasingly important as the field considers uncommon and even rare variants, with the introduction of the new exome chips with fixed sample sizes, as rare/less common variant frequencies can impact distributional assumptions like small sample sizes. As imputation tools have improved (from a methods development point of view, and with an increased number of reference genomes available), it is likely that many genome-wide association studies performed on early generation technologies will be reevaluated with imputed genotypes that cover more common variants, and less common or even rare variants.
The rapid explosion of computing capabilities has greatly facilitated the use of permutation tests , particularly in genomic studies . The recent explosion in genome-wide association studies (GWAS) has presented several statistical challenges, as well as unprecedented computational burdens . Most of the current GWAS methods were developed for analyzing each genetic marker, often a single nucleotide polymorphism (SNP), individually. However, in most GWAS applications, analyzing a large number of markers is required. In these situations, it is important to control the experiment-wise error rate (EWER) . Using Bonferroni adjustment with the estimated number of informative markers in the genome, a common threshold for establishing significance in GWAS is generally considered to be p < 5e - 8 or p < 1e - 8, for α e = 0.05 and 0.01, respectively . Due to the discreet nature of permutation p-values, at least 2e 7 and 1e 8 permutation samples would be required for every SNP just to achieve the correct type I error rate. Even more samples would be needed to estimate p-values accurately at this level. This computational burden limits the application of standard permutation testing in GWAS.
Much success has been achieved with efficient computational algorithms to make each individual permutation test faster [8–10]. Although all of these studies yield tremendous improvements in computing times, they still have difficulty estimating p-values less than 5e - 8 accurately in reasonable time, and are limited to case–control data. These studies all rely on improvements or approximations to standard permutation testing, where a predetermined number of test statistics are computed for each locus, regardless of locus significance. In order to accurately estimate genome-wide significance, this number is necessarily large. Many resources wasted on markers having low associations. In a large-scale GWAS, it is assumed that the vast majority of SNPs are non-causal/non-associated. Identifying these SNPs early in permutation testing could reduce the total number of permutation samples. Computational burden could then be reserved for a small subset of SNPs with high associations to phenotype. These ideas were captured in a sampling scheme originally described by Besag and Clifford .
In this study, we propose an adaptive permutation strategy based on this existing schema , and generalize it to genomic studies. Furthermore, we establish a concordance between the standard and adaptive permutation approaches. We also make recommendations for the appropriate number of permutation samples to use, in order to achieve a desired level of accuracy for both permutation procedures using distributional theory. We design a wide range of simulation scenarios and demonstrate the validity of adaptive permutation. It achieves good power, and controls the type I error as well as experiment-wise error rate very well. In addition, the adaptive permutation is much faster than the standard permutation and provides good estimates of p-values, and thus it is computationally feasible for GWAS. Finally, we briefly demonstrate this approach in a real data analysis in a GWAS in pharmacogenomics.
Relationship between standard and adaptive permutation
Standard permutation: binomial distribution
and the estimate of is
Adaptive permutation: censored negative binomial distribution
These two permutation tests are similar, except they differ in their sampling distributions. The standard permutation fixes the total number of permutations b, and calculates the number of successes, R, estimating p as R/b, where the numerator R is a random variable. The adaptive permutation fixes the total number of successes, r, estimating p using Eq. 1, where the denominator B is a random variable if R = r, or the numerator R is a random variable otherwise. To ensure the permutation stops in a finite number, the adaptive permutation stops if b permutations have been conducted, but the number of successes is still less than r.
Choice of parameters for both permutation approaches
For example, if c =0.1, and α =0.05, then we would choose permutation sampling parameters that guarantee the standard error in estimating p-values at 0.05 be less than cα = 0.005.
Establishing the precision defined in Eq. 2 for statistically significant associations, and solving affords b = (1 - α)/c 2 α. Since b is quite large for small α, the central limit theorem provides that is very close to being normally distributed. This implies that when b is chosen as (1 - α)/c 2 α, we have that is approximately a 68% confidence interval (CI). This CI implies .
Similarly, choice of the censoring point, b, and the total number of successes, r, determines the precision in estimating the p-value for adaptive permutation testing. One way to ensure the same precision for the adaptive approach as the traditional approach is choosing r such that , which is guaranteed when . The smallest value of r that affords both of these probabilities can be calculated exactly using the qnbinom() function in R. These values for r (which are a function of α and c) were used in all simulations. The censoring point, b, was chosen to equal the total number of tests (also b) in the traditional approach. In this way, the adaptive permutation approach uses a censored negative binomial distribution. However, since the truncation occurs at b which is extremely large for small α, it approximates a (non-censored) negative binomial distribution quite well.
Point-wise error rate (PWER) is type I error rate for an individual test or the probability of incorrectly rejecting null hypothesis. Experiment-wise error rate (EWER), also called family-wise error rate (FWER), is the probability of making at least one type I error when performing a large number of related tests. Keeping EWER (α e ) at a nominal significance level, the adjusted PWER was denoted as α p . Since we assume all the tests are independent, it is appropriate to apply the Bonferroni correction approach, leading to α p = α e /m, where m is the number of tests.
Adaptive permutation recommendations
The adaptive permutation algorithm
The adaptive permutation algorithm proceeds as following:
Step 1: Determine the EWER (α e ) and the number of independent tests (m). Apply the Bonferroni correction method to calculate the adjusted PWER as α p = α e /m.
Step 2: Decide the precision level c. Choose the maximum number of permutations as b and the cut-off value, r, in order to achieve c (see Table 1).
If , reject the null hypothesis at the nominal significance level of α e .
Step 4: Repeat step 3 until to complete all SNPs, that is, i = m.
where β reflected the effect size of SNP and ∈ was the random error. Also, m is the total number of SNPs in the data set, and n is the sample size. No linkage disequilibrium (LD) was considered in the current study.
Simulation design 1: comparison of ANOVA, standard and adaptive permutation type I error rates
Simulation was used to compare ANOVA to the standard and adaptive permutation procedures. Each SNP was generated assuming HWE with a MAF of 0.1. The quantitative phenotype Y was calculated using Eq. 3, where the error term ∈ was generated using the normal distribution with mean 0 and standard deviation 1, or the Student's t-distribution with 5 degrees of freedom. For the null distribution, the effect size β is 0. The primary objective of this simulation was to test the type I error rates of ANOVA and both permutation testing methods under correct and incorrect modeling assumptions. 10,000 datasets were simulated and for each replicate, 200 samples were generated. Using a precision (c) of 0.1, b was set to 9,900, and r was set to 120 (Table 1).
Simulation design 2: comparison of ANOVA and adaptive permutation under GWAS settings
Type I error rates
One million replicates were simulated to resemble one million SNPs. Each replicate was generated independently. To obtain the Bonferroni adjustment p-value (α p = 5e ‒ 8) and a c precision level of 0.1, the number of permutations is 1,999,999,900 and the cut-off value for adaptive permutation is 121. Data sets were generated under the null model, with a sample size of 200, SNP MAF = 0.1, and with error terms distributed as Student's t with 5 degrees of freedom. The sheer computational burden did not allow for comparison with standard permutation testing in this simulation.
To demonstrate the performance of ANOVA and adaptive permutation in terms of type I error rate and experiment-wise error rate (EWER), data sets of either 1,000 or 10,000 SNPs were generated. Each scenario was replicated 100 times, under the null with error terms following a Student’s t-distribution with 5 degrees of freedom. The means and standard deviations of the type I error rate across 100 replicates were calculated. Using a Bonferroni correction, with a desired EWER of 0.05, significance thresholds were set to 5e – 5 and 5e – 6 for 1,000 and 10,000 SNPs, respectively. EWER was calculated as the proportion of tests (across the 100 replicates) that have at least one false positive for any SNP. A sample size of 200 and MAF of 0.1 were used in this simulation.
β represents the effect sizes for disease associated SNP(s). Varying effect sizes were used to ensure that the expected power, using ANOVA, was constant near 0.3, for various α levels. Based on the recommendations from Table 1, and a 0.1 precision level, 900, 1,900, 9,900 and 99,900 permutations were chosen for the α levels of 0.1, 0.05, 0.01 and 0.001, respectively. A single SNP with MAF of 0.1 was considered, and 5,000 data sets were replicated with a sample size of 500 for each. The power was defined as the proportion of times the null hypothesis was (correctly) rejected. Because ANOVA did not have the correct type I error rate when error terms were distributed using the Student's t-distribution, only the normal distribution was used for power comparisons.
Simulation design 3: comparison of standard and adaptive permutation computation times
Permutation time under the null model
giving a model r-squared of 0.968. In this way, computation times for 1,000, 10,000, 100,000 and 1,000,000 SNPs could be estimated using extrapolation.
Relationship between time and effect size
Because of the nature of the adaptive permutation design, estimation of p-values for markers having a strong association with response is expected to take longer. To illustrate how the strength of association influences computation times, data were simulated using Eq. 3 with various effect sizes β ranging from 0 (null) to 1.8. For each effect size, 1,000 replicates were simulated using a MAF of 0.1 and a sample size of 200. The number of permutations, b, was set to 9,900. For standard permutation, the computation time was expected to be only dependent on the number of SNPs (m) and the number of permutations (b). For adaptive permutation, it was expected that the time would increase as the effect size of SNP increases.
The computation time required to evaluate 10,000 SNPs using b = 99,900 permutations was also estimated, by varying effect sizes of SNPs. For the adaptive approach, this was found by averaging the actual computation times across 100 replicates, while computation time for the standard permutation approach was predicted using Eq. 4.
All the simulation experiments were performed in R (http://www.r-project.org) using the High Performance Computing (HPC) cluster resource (hpc.ncsu.edu) at North Carolina State University. The HPC is a IBM Blade Center Linux Cluster, 1053 dual Xeon compute nodes with Intel Xeon Processors (mix of single-, dual-, quad-, and six- core), with 2-4GB per core distributed memory. The data analyses were performed using SAS 9.3 (http://www.sas.com). Plots were generated using R and JMP 10 (http://www.jmp.com).
Real data analysis: application of adaptive permutation testing to published GWAS
In , 29 chemotherapeutic pharmaceutical agents were studied in concentration response cytotoxic assays across 520 cell lines with ~2 million genetic markers. The original analysis was performed with a MANCOVA approach using the Multivariate Ancova Genome-Wide Association Software (MAGWAS) package. We have extended the MAGWAS package using our adaptive permutation procedure. The following parameter choices were used: precision c = .2 and EWER (α e ) = 5e-6. The entire set of SNP data was split by chromosome and separate processes were carried across all 22 chromosomes in parallel on a computing cluster with Xeon processors.
Simulation result 1: comparison of ANOVA, standard and adaptive permutation type I error rates
Additional file 1: Figure S1 demonstrates that three approaches are similar when the model is correctly specified, while ANOVA has much smaller -values than permutation tests when the model is misspecified. Furthermore, it shows that the standard and adaptive permutation approaches provide similar p-values regardless if the model is specified correctly.
Simulation result 2: comparison of ANOVA and adaptive permutation under GWAS settings
Type I error rates
Means of type I error rate (SD)
EWERs of ANOVA are 0.71 for 1,000 SNPs and 1.0 for 10,000 SNPs, while that of adaptive permutation are 0.05 and 0.06 respectively. It is clear that under the null model using the Student's t-distribution, ANOVA tests are too liberal and give very high EWERs. Adaptive permutation performs very well in controlling the EWER.
Simulation result 3: comparison of standard and adaptive permutation computation times
Permutation time under the null model
Adaptive empirical (SD)
0.042 (0.026) hours
0.55 (0.29) hours
6.0 (2.0) hours
1.9 (0.079) days*
Causal/associated SNPs add significantly to the computational burden in the adaptive approach. However, for a more realistic scenario with 1 million non-associated SNPs and 5 extremely strongly associated SNPs, if all the five causal SNPs require maximal permutation times, the worst computation time would be around 17 days (on a single processor).
Relationship between time and effect size
An additional study was conducted in a scenario of 10,000 SNPs and 99,900 permutations by varying effect sizes from weak to strong, with 100 replications for each. Additional file 1: Table S1 shows a similar pattern. For adaptive permutation, the computation time increases when the effect sizes of SNPs increase.
Real data analysis
The adaptive permutation procedure on the drug, carboplatin, was compared with the previously published MANCOVA results. A pairwise comparison of negative log-transformed p-values across all SNPs (Additional file 1: Figure S3) shows the differences between the uncorrected and corrected association results. Since association analyses rely on large sample asymptotic theory, the previously published results reported associations solely on SNPs with at least 20 individuals for each genotype (black points), any SNPs in violation of this assumption (green or blue points) were previously filtered out. This filtering step was necessary to remove potentially spurious associations in the original analysis, and resulted in the removal of 38.4% of the SNPs from the original analysis. However, the adaptive permutation procedure was able to provide meaningful statistical tests for these SNPs, given the capacity to test for potential associations on rare or uncommon variants.
Overall, the top association (rs1982901, p < 10-6) from the original analysis remains significant after permutation testing. In the adaptive permutation analysis, an additional SNP crossed the significance threshold (rs6594545, p < 10-6). While not necessarily expected, this result does not contradict the conclusions of the original study, since the additional SNP is located physically close to the original top association. Thus, both associations are likely pointing to the same downstream biological mechanism.
We presented an adaptive permutation procedure for testing associations in high-dimensional studies having a large number of tests. This adaptive permutation advocates an intuitive idea that we may terminate the permutation at an early stage if there is little or even no evidence towards alternative based on our stopping rule, while perform exhaustive permutations for highly associated variables. This is a similar approach to that implemented in the popular software package PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/), but other approaches do not provide guidance on the parameter choices to use when implementing such an adaptive approach.
By applying distributional theory, we provided some guidance for researchers about how to choose the number of permutations (b) and cut-off value (r), based on a series of factors, including the number of independent tests (m), the experiment-wise error rate (α e ) and a desirable precision level (c). This guidance will be important for the proper application of our adaptive implementation, or others. The results tables provide parameter guidance for standard needs, and the available R code aids investigators in choosing parameters based on specifics of their own study.
Additionally, we show that the adaptive procedure maintains the correct type I and experiment-wise error rates, even when modeling assumptions are incorrect. However, there is essentially no loss in power when compared to the parametric approach. In addition, this adaptive procedure is computationally efficient enough to be applied to a large-scale GWAS. Indeed, this strategy has been successfully employed in a previous GWAS that involved mapping 520 individuals jointly across six responses, with several covariates, across over two million genetic markers. Albeit the study in question was not explicitly designed for locating rare variants, it is an important distinction that the adaptive permutation procedure gave meaningful results across all SNPs regardless of genotyping frequencies and without the need to filter SNPs based on any violation of asymptotic assumptions. All 22 chromosomes were run in parallel with the longest run time at 23.33 hrs. This is a remarkable example of how efficient this algorithm is, considering little effort was made to improve the computational efficiency of each individual permutation test. Also, parallel computations vastly improve the computational feasibility of the adaptive permutation procedure. Since each association test can theoretically be computed independently, the problem is embarrassingly parallel.
In the follow-up studies, it would be promising and attractive to incorporate some challenging issues, such as LD and multiple testing, in a more realistic simulation study. We suggest a very simple way to combine two component algorithms: the simple M method developed by Gao et.al  and our adaptive permutation algorithm. The simple M method was based on the Bonferroni correction, but it could account for composite LD correlation and derive an effective number of SNPs. By implementing this M eff _ G , it would be easy to replace the actual number of SNPs m to by the effective number of SNPs M eff _ G in our algorithm step 1. Correspondingly, the adjusted PWER is α p = α e /M eff _ G , leading to a less conservative threshold of point-wise significance and in turn a relatively small number of permutation times. It is believed that it may be highly computational efficient, while effectively accounting for the complex LD.
In summary, the adaptive permutation procedure provides an easy, simple, fast and accurate test, and it is comparable to the existing permutation methods. While in conjunction with other multiple correction method, it is completely feasible to apply this adaptive method in a high-dimensional data, which we have shown through application to a published GWAS.
This work was funded by NCI 1R01CA161608.
- Besag J, Clifford P: Sequential Monte-Carlo p-values. Biometrika. 1991, 78 (2): 301-304. 10.1093/biomet/78.2.301.View ArticleGoogle Scholar
- Boos DD, Zhang J: Monte Carlo evaluation of resampling-based hypothesis tests. J Am Stat Assoc. 2000, 95 (450): 486-492. 10.1080/01621459.2000.10474226.View ArticleGoogle Scholar
- Phipson B, Smyth GK: Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Stat Appl Genet Mol Biol. 2010, 9: Article39-PubMedGoogle Scholar
- Churchill GA, Doerge RW: Empirical threshold values for quantitative trait mapping. Genetics. 1994, 138 (3): 963-971.PubMedPubMed CentralGoogle Scholar
- Cantor RM, Lange K, Sinsheimer JS: Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am J Hum Genet. 2010, 86 (1): 6-22. 10.1016/j.ajhg.2009.11.017.View ArticlePubMedPubMed CentralGoogle Scholar
- Gao X, Starmer J, Martin ER: A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol. 2008, 32 (4): 361-369. 10.1002/gepi.20310.View ArticlePubMedGoogle Scholar
- Johnson RC, Nelson GW, Troyer JL, Lautenberger JA, Kessing BD, Winkler CA, O'Brien SJ: Accounting for multiple comparisons in a genome-wide association study (GWAS). BMC Genomics. 2010, 11: 724-10.1186/1471-2164-11-724.View ArticlePubMedPubMed CentralGoogle Scholar
- Browning BL: PRESTO: rapid calculation of order statistic distributions and multiple-testing adjusted P-values via permutation for one and two-stage genetic association studies. BMC Bioinformatics. 2008, 9: 309-10.1186/1471-2105-9-309.View ArticlePubMedPubMed CentralGoogle Scholar
- Kimmel G, Shamir R: A fast method for computing high-significance disease association in large population-based studies. Am J Hum Genet. 2006, 79 (3): 481-492. 10.1086/507317.View ArticlePubMedPubMed CentralGoogle Scholar
- Pahl R, Schafer H: PERMORY: an LD-exploiting permutation test algorithm for powerful genome-wide association testing. Bioinformatics. 2010, 26 (17): 2093-2100. 10.1093/bioinformatics/btq399.View ArticlePubMedGoogle Scholar
- Lunetta KL: Genetic association studies. Circulation. 2008, 118 (1): 96-101. 10.1161/CIRCULATIONAHA.107.700401.View ArticlePubMedGoogle Scholar
- Brown CC, Havener TM, Medina MW, Jack JR, Krauss RM, McLeod HL, Motsinger-Reif AA: Genome-wide association and pharmacological profiling of 29 anticancer agents using lymphoblastoid cell lines. Pharmacogenomics. 2014, 15 (2): 137-146. 10.2217/pgs.13.213.View ArticlePubMedPubMed CentralGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.