An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use

Che, Ronglin; Jack, John R; Motsinger-Reif, Alison A; Brown, Chad C

doi:10.1186/1756-0381-7-9

Research
Open access
Published: 14 June 2014

An adaptive permutation approach for genome-wide association study: evaluation and recommendations for use

Ronglin Che¹,
John R Jack¹,
Alison A Motsinger-Reif¹ &
…
Chad C Brown¹

BioData Mining volume 7, Article number: 9 (2014) Cite this article

7322 Accesses
61 Citations
6 Altmetric
Metrics details

Abstract

Background

Permutation testing is a robust and popular approach for significance testing in genomic research, which has the broad advantage of estimating significance non-parametrically, thereby safe guarding against inflated type I error rates. However, the computational efficiency remains a challenging issue that limits its wide application, particularly in genome-wide association studies (GWAS). Because of this, adaptive permutation strategies can be employed to make permutation approaches feasible. While these approaches have been used in practice, there is little research into the statistical properties of these approaches, and little guidance into the proper application of such a strategy for accurate p-value estimation at the GWAS level.

Methods

In this work, we advocate an adaptive permutation procedure that is statistically valid as well as computationally feasible in GWAS. We perform extensive simulation experiments to evaluate the robustness of the approach to violations of modeling assumptions and compare the power of the adaptive approach versus standard approaches. We also evaluate the parameter choices in implementing the adaptive permutation approach to provide guidance on proper implementation in real studies. Additionally, we provide an example of the application of adaptive permutation testing on real data.

Results

The results provide sufficient evidence that the adaptive test is robust to violations of modeling assumptions. In addition, even when modeling assumptions are correct, the power achieved by adaptive permutation is identical to the parametric approach over a range of significance thresholds and effect sizes under the alternative. A framework for proper implementation of the adaptive procedure is also generated.

Conclusions

While the adaptive permutation approach presented here is not novel, the current study provides evidence of the validity of the approach, and importantly provides guidance on the proper implementation of such a strategy. Additionally, tools are made available to aid investigators in implementing these approaches.

Peer Review reports

Background

Permutation testing is a popular nonparametric (distribution-free) randomization procedure, which provides a robust and powerful method of testing statistical hypothesis. In the classical form of the permutation test, the response is shuffled b times, and the test statistics are recorded for each permuted data set. These test statistics are then used to generate the distribution under the null hypothesis of no true association. If the observed test statistic is the R th largest among all the test statistics, then a p-value is declared as R/b, and the null hypothesis is rejected at this significance level [1–3].

A permutation test is commonly used when the standard distributional assumptions are violated [3]. For instance, quantitative trait loci (QTL) analysis typically assumes that quantitative traits are normally distributed within each genotype [4]. If the model is correctly specified, a simple one-way ANOVA test is often powerful in this situation. In more complicated models, large sample asymptotic approximations to test statistics can often be derived. Nevertheless, in practice, it is often difficult to determine the appropriate asymptotic distribution or the sample size may not be large enough for asymptotic assumptions to hold, limiting the generality of this approach. In these situations, the permutation test has become an emerging attractive choice, in particular when the distribution is doubtful. Such issues are becoming increasingly important as the field considers uncommon and even rare variants, with the introduction of the new exome chips with fixed sample sizes, as rare/less common variant frequencies can impact distributional assumptions like small sample sizes. As imputation tools have improved (from a methods development point of view, and with an increased number of reference genomes available), it is likely that many genome-wide association studies performed on early generation technologies will be reevaluated with imputed genotypes that cover more common variants, and less common or even rare variants.

The rapid explosion of computing capabilities has greatly facilitated the use of permutation tests [2], particularly in genomic studies [4]. The recent explosion in genome-wide association studies (GWAS) has presented several statistical challenges, as well as unprecedented computational burdens [5]. Most of the current GWAS methods were developed for analyzing each genetic marker, often a single nucleotide polymorphism (SNP), individually. However, in most GWAS applications, analyzing a large number of markers is required. In these situations, it is important to control the experiment-wise error rate (EWER) [6]. Using Bonferroni adjustment with the estimated number of informative markers in the genome, a common threshold for establishing significance in GWAS is generally considered to be p < 5e - 8 or p < 1e - 8, for α _e = 0.05 and 0.01, respectively [7]. Due to the discreet nature of permutation p-values, at least 2e 7 and 1e 8 permutation samples would be required for every SNP just to achieve the correct type I error rate. Even more samples would be needed to estimate p-values accurately at this level. This computational burden limits the application of standard permutation testing in GWAS.

Much success has been achieved with efficient computational algorithms to make each individual permutation test faster [8–10]. Although all of these studies yield tremendous improvements in computing times, they still have difficulty estimating p-values less than 5e - 8 accurately in reasonable time, and are limited to case–control data. These studies all rely on improvements or approximations to standard permutation testing, where a predetermined number of test statistics are computed for each locus, regardless of locus significance. In order to accurately estimate genome-wide significance, this number is necessarily large. Many resources wasted on markers having low associations. In a large-scale GWAS, it is assumed that the vast majority of SNPs are non-causal/non-associated. Identifying these SNPs early in permutation testing could reduce the total number of permutation samples. Computational burden could then be reserved for a small subset of SNPs with high associations to phenotype. These ideas were captured in a sampling scheme originally described by Besag and Clifford [1].

In this study, we propose an adaptive permutation strategy based on this existing schema [1], and generalize it to genomic studies. Furthermore, we establish a concordance between the standard and adaptive permutation approaches. We also make recommendations for the appropriate number of permutation samples to use, in order to achieve a desired level of accuracy for both permutation procedures using distributional theory. We design a wide range of simulation scenarios and demonstrate the validity of adaptive permutation. It achieves good power, and controls the type I error as well as experiment-wise error rate very well. In addition, the adaptive permutation is much faster than the standard permutation and provides good estimates of p-values, and thus it is computationally feasible for GWAS. Finally, we briefly demonstrate this approach in a real data analysis in a GWAS in pharmacogenomics.

Methods

Relationship between standard and adaptive permutation

Standard permutation: binomial distribution

Here, b permutation test statistics are calculated, with each statistic having probability p of being larger than the observed test statistic. In this case, the number of successes is distributed as binomial:

R \sim Bin (b, p),

and the estimate of is $\hat{p} = R / b .$

Adaptive permutation: censored negative binomial distribution

In the adaptive approach, permutation test statistics are sampled until either r of these statistics are larger than the observed statistic, or b total permutations are calculated with R total successes, where R < r. In this case, the total number of permutations is distributed according to the censored negative binomial distribution:

B \sim truncNB (r, p, b) .

The estimate of p is given by:

\hat{p} = \{\begin{array}{c} \frac{r}{B}, |R = r, \frac{R + 1}{b + 1}, R < r . \end{array}

(1)

These two permutation tests are similar, except they differ in their sampling distributions. The standard permutation fixes the total number of permutations b, and calculates the number of successes, R, estimating p as R/b, where the numerator R is a random variable. The adaptive permutation fixes the total number of successes, r, estimating p using Eq. 1, where the denominator B is a random variable if R = r, or the numerator R is a random variable otherwise. To ensure the permutation stops in a finite number, the adaptive permutation stops if b permutations have been conducted, but the number of successes is still less than r.

Choice of parameters for both permutation approaches

It is useful to define the desired precision of p-value estimation as c, or the fraction of the significance threshold that equals the standard error in estimation, when p = α:

c = SE (\hat{p}) / α .

(2)

For example, if c =0.1, and α =0.05, then we would choose permutation sampling parameters that guarantee the standard error in estimating p-values at 0.05 be less than cα = 0.005.

Choice for the total number of permutations determines the precision in estimating the p-value for traditional permutation testing. Here we have:

Var (\hat{p}) = α * (1 - α) / b .

Establishing the precision defined in Eq. 2 for statistically significant associations, and solving affords b = (1 - α)/c ² α. Since b is quite large for small α, the central limit theorem provides that $\hat{p}$ is very close to being normally distributed. This implies that when b is chosen as (1 - α)/c ² α, we have that $(\hat{p} - SE (\hat{p}), \hat{p} + SE (\hat{p}))$ is approximately a 68% confidence interval (CI). This CI implies $P ((1 - c) α < \hat{p} < (1 + c) α) \approx 0.68$ .

Similarly, choice of the censoring point, b, and the total number of successes, r, determines the precision in estimating the p-value for adaptive permutation testing. One way to ensure the same precision for the adaptive approach as the traditional approach is choosing r such that $P ((1 - c) α < \hat{p} < (1 + c) α) \approx 0.68$ , which is guaranteed when $P (\hat{p} < (1 - c) α) = P (\hat{p} > (1 + c) α) \leq 0.16$ . The smallest value of r that affords both of these probabilities can be calculated exactly using the qnbinom() function in R. These values for r (which are a function of α and c) were used in all simulations. The censoring point, b, was chosen to equal the total number of tests (also b) in the traditional approach. In this way, the adaptive permutation approach uses a censored negative binomial distribution. However, since the truncation occurs at b which is extremely large for small α, it approximates a (non-censored) negative binomial distribution quite well.

Point-wise error rate (PWER) is type I error rate for an individual test or the probability of incorrectly rejecting null hypothesis. Experiment-wise error rate (EWER), also called family-wise error rate (FWER), is the probability of making at least one type I error when performing a large number of related tests. Keeping EWER (α _e) at a nominal significance level, the adjusted PWER was denoted as α _p[6]. Since we assume all the tests are independent, it is appropriate to apply the Bonferroni correction approach, leading to α _p = α _e/m, where m is the number of tests.

Choice for threshold values b and r is therefore a function of the number of tests m, PWER α _p and precision level c. The corresponding R code is available from http://www4.stat.ncsu.edu/~motsinger/Lab_Website/Software.html. Table 1 shows the recommendation of threshold values b and r in a series of scenarios.

Table 1 Adaptive permutation recommendations

Full size table

The adaptive permutation algorithm

The adaptive permutation algorithm proceeds as following:

Step 1: Determine the EWER (α _e) and the number of independent tests (m). Apply the Bonferroni correction method to calculate the adjusted PWER as α _p = α _e/m.

Step 2: Decide the precision level c. Choose the maximum number of permutations as b and the cut-off value, r, in order to achieve c (see Table 1).

Step 3: Start with SNP 1 as i = 1. Calculate the test statistics for i th SNP as $u_{i}^{*}$ . For each j th permutation, calculate the test statistics u _ij and compare with $u_{i}^{*}$ . Let R _i denote the number of values u _ij exceeding $u_{i}^{*}$ (success). Stop as soon as R _i equals r, and estimate the p-value as r/B _i, where B _i is the number of permutations to achieve r number of successes. However, if this has not occurred after b permutations, we stop in any case and declare a p-value of (R _i + 1)/(b + 1). In sum, the estimate of p-value for i th SNP is:

{\hat{p}}_{l} = \{\begin{array}{c} \frac{r}{B_{i}}, |R_{i} = r, \frac{R_{i} + 1}{b + 1}, R_{i} < r . \end{array}

If ${\hat{p}}_{l} < α_{p}$ , reject the null hypothesis at the nominal significance level of α _e.

Step 4: Repeat step 3 until to complete all SNPs, that is, i = m.

Simulation design

Each single nucleotide polymorphism (SNP) was assumed to have two alleles A and a, corresponding to the three genotypes AA, Aa and aa. An additive genetic model was applied, and thus the genotype value S was coded as 0, 1 and 2, corresponding to the number of minor alleles. Let p _a denote the minor allele frequency (MAF). Each SNP was assumed to be under Hardy-Weinberg Equilibrium (HWE) [11]. The quantitative phenotype Y was generated according to

Y = β S + \in,

(3)

where β reflected the effect size of SNP and ∈ was the random error. Also, m is the total number of SNPs in the data set, and n is the sample size. No linkage disequilibrium (LD) was considered in the current study.

Simulation design 1: comparison of ANOVA, standard and adaptive permutation type I error rates

Simulation was used to compare ANOVA to the standard and adaptive permutation procedures. Each SNP was generated assuming HWE with a MAF of 0.1. The quantitative phenotype Y was calculated using Eq. 3, where the error term ∈ was generated using the normal distribution with mean 0 and standard deviation 1, or the Student's t-distribution with 5 degrees of freedom. For the null distribution, the effect size β is 0. The primary objective of this simulation was to test the type I error rates of ANOVA and both permutation testing methods under correct and incorrect modeling assumptions. 10,000 datasets were simulated and for each replicate, 200 samples were generated. Using a precision (c) of 0.1, b was set to 9,900, and r was set to 120 (Table 1).

Simulation design 2: comparison of ANOVA and adaptive permutation under GWAS settings

Type I error rates

One million replicates were simulated to resemble one million SNPs. Each replicate was generated independently. To obtain the Bonferroni adjustment p-value (α _p = 5e ‒ 8) and a c precision level of 0.1, the number of permutations is 1,999,999,900 and the cut-off value for adaptive permutation is 121. Data sets were generated under the null model, with a sample size of 200, SNP MAF = 0.1, and with error terms distributed as Student's t with 5 degrees of freedom. The sheer computational burden did not allow for comparison with standard permutation testing in this simulation.

EWER

To demonstrate the performance of ANOVA and adaptive permutation in terms of type I error rate and experiment-wise error rate (EWER), data sets of either 1,000 or 10,000 SNPs were generated. Each scenario was replicated 100 times, under the null with error terms following a Student’s t-distribution with 5 degrees of freedom. The means and standard deviations of the type I error rate across 100 replicates were calculated. Using a Bonferroni correction, with a desired EWER of 0.05, significance thresholds were set to 5e – 5 and 5e – 6 for 1,000 and 10,000 SNPs, respectively. EWER was calculated as the proportion of tests (across the 100 replicates) that have at least one false positive for any SNP. A sample size of 200 and MAF of 0.1 were used in this simulation.

Power

β represents the effect sizes for disease associated SNP(s). Varying effect sizes were used to ensure that the expected power, using ANOVA, was constant near 0.3, for various α levels. Based on the recommendations from Table 1, and a 0.1 precision level, 900, 1,900, 9,900 and 99,900 permutations were chosen for the α levels of 0.1, 0.05, 0.01 and 0.001, respectively. A single SNP with MAF of 0.1 was considered, and 5,000 data sets were replicated with a sample size of 500 for each. The power was defined as the proportion of times the null hypothesis was (correctly) rejected. Because ANOVA did not have the correct type I error rate when error terms were distributed using the Student's t-distribution, only the normal distribution was used for power comparisons.

Simulation design 3: comparison of standard and adaptive permutation computation times

Permutation time under the null model

In this simulation, the computation times for the two permutation methods were compared. For the adaptive permutation, 1,000, 10,000 and 100,000 SNPs were simulated with 100 replicates for each scenario, while 1,000,000 SNPs were simulated with 3 replicates because of the burden of computation time. All α values were set according to Bonferroni correction level. The elapsed time was recorded using the proc.time() function in R, and the means and standard deviations of empirical elapsed time were calculated among all replicates. For the standard permutation, it was computationally infeasible to obtain the empirical time for even 1000 SNPs. However, using linear regression with a smaller number of SNPs, computation time was found to be almost exactly proportional to the number of calculated test statistics, which was the number of SNPs (m) multiplied by the number of permutations (b),

time = γmb,

(4)

giving a model r-squared of 0.968. In this way, computation times for 1,000, 10,000, 100,000 and 1,000,000 SNPs could be estimated using extrapolation.

Relationship between time and effect size

Because of the nature of the adaptive permutation design, estimation of p-values for markers having a strong association with response is expected to take longer. To illustrate how the strength of association influences computation times, data were simulated using Eq. 3 with various effect sizes β ranging from 0 (null) to 1.8. For each effect size, 1,000 replicates were simulated using a MAF of 0.1 and a sample size of 200. The number of permutations, b, was set to 9,900. For standard permutation, the computation time was expected to be only dependent on the number of SNPs (m) and the number of permutations (b). For adaptive permutation, it was expected that the time would increase as the effect size of SNP increases.

The computation time required to evaluate 10,000 SNPs using b = 99,900 permutations was also estimated, by varying effect sizes of SNPs. For the adaptive approach, this was found by averaging the actual computation times across 100 replicates, while computation time for the standard permutation approach was predicted using Eq. 4.

All the simulation experiments were performed in R (http://www.r-project.org) using the High Performance Computing (HPC) cluster resource (hpc.ncsu.edu) at North Carolina State University. The HPC is a IBM Blade Center Linux Cluster, 1053 dual Xeon compute nodes with Intel Xeon Processors (mix of single-, dual-, quad-, and six- core), with 2-4GB per core distributed memory. The data analyses were performed using SAS 9.3 (http://www.sas.com). Plots were generated using R and JMP 10 (http://www.jmp.com).

Real data analysis: application of adaptive permutation testing to published GWAS

In [12], 29 chemotherapeutic pharmaceutical agents were studied in concentration response cytotoxic assays across 520 cell lines with ~2 million genetic markers. The original analysis was performed with a MANCOVA approach using the Multivariate Ancova Genome-Wide Association Software (MAGWAS) package. We have extended the MAGWAS package using our adaptive permutation procedure. The following parameter choices were used: precision c = .2 and EWER (α _e) = 5e-6. The entire set of SNP data was split by chromosome and separate processes were carried across all 22 chromosomes in parallel on a computing cluster with Xeon processors.

Results

Simulation result 1: comparison of ANOVA, standard and adaptive permutation type I error rates

Figure 1 shows the quartile-quartile (QQ) plots of the p-values from the ANOVA, standard and adaptive permutation tests under the normal and Student's t-distribution. It is expected all the tests should lie in a diagonal line under the null distribution. The upper three panels demonstrate all the three approaches have the correct type I error rate under correct modeling assumptions (normality). The bottom three panels show that ANOVA has an inflated type I error under the Student's t-distribution, especially for p-values smaller than 0.01, while both permutation tests are still valid.

Additional file 1: Figure S1 demonstrates that three approaches are similar when the model is correctly specified, while ANOVA has much smaller -values than permutation tests when the model is misspecified. Furthermore, it shows that the standard and adaptive permutation approaches provide similar p-values regardless if the model is specified correctly.