- Short report
- Open Open Peer Review
Network-based analysis of genetic variants associated with hippocampal volume in Alzheimer’s disease: a study of ADNI cohorts
BioData Miningvolume 9, Article number: 3 (2016)
Alzheimer’s disease (AD) is a neurodegenerative disease that causes dementia. While molecular basis of AD is not fully understood, genetic factors are expected to participate in the development and progression of the disease. Our goal was to uncover novel genetic underpinnings of Alzheimer’s disease with a bioinformatics approach that accounts for tissue specificity.
We performed genome-wide association studies (GWAS) for hippocampal volume in two Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohorts. We used these GWAS in a subsequent tissue-specific network-wide association study (NetWAS), which applied nominally significant associations in the initial GWAS to identify disease relevant patterns in a functional network for the hippocampus. We compared prioritized gene lists from NetWAS and GWAS with literature curated AD-associated genes from the Online Mendelian Inheritance in Man (OMIM) database. In the ADNI-1 GWAS, where we also observed an enrichment of low p-values, NetWAS prioritized disease-gene associations in accordance with OMIM annotations. This was not observed in the ADNI-2 dataset. We provide source code to replicate these analyses as well as complete results under permissive licenses.
We performed the first analysis of hippocampal volume using NetWAS, which uses machine learning algorithms applied to tissue-specific functional interaction network to prioritize GWAS results. Our findings support the idea that tissue-specific networks may provide helpful context for understanding the etiology of common human diseases and reveal challenges that network-based approaches encounter in some datasets. Our source code and intermediate results files can facilitate the development of methods to address these challenges.
Alzheimer’s disease (AD) is a neurodegenerative disease that affects memory and behavior. Around 70 % of the risk is expected to be attributable to genetics . Including the well documented association with the gene APOE, previous studies have identified 22 susceptibility loci for late-onset Alzheimer’s disease (LOAD), of which 11 were recently reported by a large meta-analysis . Most existing genetic analyses lack the affected tissue context, which is crucial to the precise actions of genes, though some eQTL analyses have begun to address this problem. For example, analysis of eQTLs in brain samples from control individuals and those with LOAD identified variants associated with changes in gene expression in the brain .
To identify genes that play a tissue-specific role in disease, we used the recently developed network-wide association study (NetWAS) method, which takes the tissue-specific functional roles of genes into account, to identify genes associated with differences in hippocampal volume . We applied the NetWAS approach to genome-wide association study (GWAS) results from ADNI using hippocampal volume as an endophenotype. NetWAS identifies features in the hippocampus tissue-specific network that are associated with nominally significant GWAS p-values, and it uses these features to identify potentially novel AD gene associations. We evaluated the extent to which the NetWAS approach using a hippocampus network could identify AD-associated genes and compared results to the GWAS alone.
Data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. Determination of sensitive and specific markers of very early AD progression is critical to aid researchers and clinicians to develop new treatments and monitor their effectiveness, and to reduce the time and cost of clinical trials.
The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California - San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1,500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI (EMCI or LMCI), and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. Thousands of longitudinal imaging scans [5, 6], performance on neuropsychological and clinical assessments  and biological samples  were collected at baseline and at follow-up visits for all or a subset of participants. Genome-wide genotyping data  are available on the full ADNI sample. For up-to-date information, see www.adni-info.org.
Genotype data of all participants from both ADNI-1 and ADNI-2 were downloaded, quality controlled, imputed to the Illumina 610Quad platform and combined, following the procedure described in . Freesurfer version 5.1 was used to extract hippocampal volume and intracranial volume (ICV) measures from the 1.5 T baseline MRI scans of the ADNI-1 participants, and from the 3 T baseline MRI scans of the ADNI-2 participants, as previously described in . Siblings and participants without hippocampal volume were excluded. As a result, we included 748 subjects and 817 subjects in the ADNI-1 and ADNI-2 analyses, respectively. GWAS analyses were performed by PLINK v1.07 using a linear regression model covaried for age, gender, education and ICV.
The analytic workflow
Our analytical workflow was designed to identify genes associated with differences in hippocampal volume. Figure 1 provides an overview of our analytical strategy. To ensure reproducibility, we have used methods that are available from webservers and open source software packages to ensure accessibility of this analysis for other researchers. Because we performed a large number of NetWAS to evaluate the robustness of results, the actual analyses were performed on a computing cluster. We provide our source code and intermediate results under permissive licenses. The goal of phase I was to calculate gene-based association p-values based on SNP association p-values from the hippocampal volume GWAS. The goals of phase II were to reprioritize gene-based associations using hippocampus functional network and to assess the concordance of identified disease-gene associations with documented associations.
Phase I: gene-level analysis
We applied the versatile gene-based association study (VEGAS) webserver to calculate gene-based p-values from individual SNP p-values, described in the data section above, for associations with hippocampal volume endpoint . VEGAS assigns SNPs to individual genes using a window of 50 kb upstream and downstream from the gene sequence and generates a combined test-statistic that accounts for linkage disequilibrium (LD). The ADNI data we used were restricted to non-Hispanic Caucasian participants , so we used the HapMap CEU population for LD. This stage of analysis provided us with gene candidates associated with the hippocampal volume endpoint from the GWAS.
Phase II: tissue-specific functional network analysis
The gene-based p-values calculated in phase I were used as input for a NetWAS analysis. NetWAS reprioritizes disease-gene associations using a functional network by constructing a support vector machine classifier using nominally significant genes (p < 0.01) as positive examples and non-associated genes as negative examples. The resulting classifier, which learned patterns in the biological network associated with nominal significance, was then applied back to the network to re-rank genes .
We performed NetWAS on the DISCOVERY supercomputer at Dartmouth. To evaluate the stability of predictions and estimate the variance of multiple NetWAS runs, we performed 1000 iterations of each analysis with randomly selected cross-validation intervals. To evaluate potential underlying bias in the method towards OMIM annotations, we also performed NetWAS on 1000 different GWAS for each cohort in which the associated genes had been permuted. Source code and intermediate and final results for these analyses are provided under permissive open source licenses  (code: BSD 3-clause; data: CC0).
The ultimate goal of this analysis was to use information about the tissue-specific functions and relationships of genes to identify disease-associated genes more effectively than GWAS. We mapped OMIM  annotations onto the Disease Ontology (DO) . To assess the performance of this approach for this phenotype, we used the area under the receiver-operator characteristic curve (AUC) to compare NetWAS, permuted NetWAS and GWAS gene rankings. AUCs were calculated using annotations that were mapped to the DO term for Alzheimer’s disease (DOID:10652) as gold standard positives.
Results and discussion
Using VEGAS, in phase I of the analysis we converted SNP-level association statistics from two different ADNI studies into two sets of gene-level association statistics for the hippocampal volume phenotype. These statistical associations were reprioritized in phase II using NetWAS analyses that leveraged a functional network for the hippocampus performed across 1000 independent sets of five-fold cross validation. To generate an aggregate list, we summed the ranks for each NetWAS. Tabular complete results for each of these analyses are available in the source code repository associated with this paper (ADNI-1 VEGAS: data/vegas_adni1.xls; ADNI-2 VEGAS: data/vegas_adni2.xls; NetWAS ADNI-1 and ADNI-2: combined-results.csv).
The underlying idea of NetWAS is that multiple truly associated genes may exhibit nominal, though not genome-wide, significance, which a machine learning method could use to extract phenotype-associated network features. Because NetWAS was developed to identify network commonalities among nominally associated genes, we generated quantile-quantile plots of observed VEGAS gene-wise p-values against a uniform distribution. ADNI-1 showed slight inflation, suggesting that this GWAS might represent a good NetWAS candidate (Additional file 1: Figure S1). ADNI-2 did not show evidence of inflation (Additional file 2: Figure S2).
We compared the GWAS ranked and NetWAS reprioritized genes to genes with a documented association with Alzheimer’s disease in the OMIM database using the AUC (Fig. 2). Comparing the GWAS against the 95 % confidence intervals, we found that the NetWAS ranking more closely matched genes with a documented relationship to AD than genes identified by the initial VEGAS analysis in the ADNI-1 cohort. In the ADNI-2 cohort, this relationship was reversed but less pronounced. In both cases, the scores for the NetWAS performed using the GWAS associations were outside of and above the 95 % bootstrap confidence intervals for the mean of NetWAS with permuted gene-significance.
We noted that in the NetWAS results generated from ADNI-1 GWAS, the top 14 genes all belonged to the protocadherin alpha gene cluster (PCDHA). This unusual result had not previously been observed in a NetWAS analysis, so we investigated both network-based and association-based explanations. Examination of p-values from VEGAS revealed multiple significant PCDHA genes (PCDHA1-13, PCDHAC1, PCDHAC2). These associations could be driven by the same variants within the overall 50 kb window, as the PCDHA gene cluster includes multiple overlapping genes (Fig. 3). In the case that these genes are from the same family, as with PCDHA genes, they may also have similar network connectivity patterns, making it difficult for NetWAS to distinguish whether one or all members of the gene family should be most closely associated with the disease. The multiple associated genes by VEGAS also make it difficult for NetWAS to successfully distinguish involved members. These properties can, in certain circumstances, make it difficult to pinpoint the principally associated gene. This multiple-mapping problem has previously been reported to have impact on gene-set based analysis . Our analysis demonstrates that users should also be cognizant of this for network-based approaches. Gene-based association tests that attempt to control for SNPs with potential associations with multiple genes may be required to address this challenge.
While a 50 kb window can lead to multi-gene mapping, it can also fail to capture long-range interactions between SNPs and genes. For example, SNPs 500 kb from the gene c-MYC were found to directly interact with the promoter of c-MYC, upregulating the expression of this gene . Such relationships cannot be captured by existing gene-based association tests without creating additional multiple-mapping challenges. Gene-based association tests that account for potential long-range interactions through both haplotype structures and 3D chromosomal organization information would help to ameliorate this problem.
In summary, we have performed the first NetWAS for hippocampal volume, or any AD-related phenotype, in AD patients and controls using the ADNI-1 and ADNI-2 cohorts. In the course of this analysis, we identified the need for new gene-based association tests that aim to solve multiple-mapping challenges for SNP-gene relationships.
Ballard C, Gauthier S, Corbett A, Brayne C, Aarsland D, Jones E. Alzheimer’s disease. Lancet. 2011;377(9770):1019–31.
Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet. 2013;45(12):1452–8.
Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, Holmans P, et al. Genetic control of human brain transcript expression in Alzheimer disease. Am J Hum Genet. 2009;84(4):445–58.
Greene CS, Krishnan A, Wong AK, Ricciotti E, Zelaya RA, Himmelstein DS, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet. 2015 Jun;47(6):569-76. doi:10.1038/ng.3259.
Jack Jr CR, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, et al. The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging. 2008;27(4):685–91.
Jagust WJ, Bandy D, Chen K, Foster NL, Landau SM, Mathis CA, et al. The Alzheimer’s disease neuroimaging initiative positron emission tomography core. Alzheimers Dement. 2010;6(3):221–9.
Petersen RC, Aisen PS, Beckett LA, Donohue MC, Gamst AC, Harvey DJ, et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010;74(3):201–9.
Shaw LM, Vanderstichele H, Knapik-Czajka M, Clark CM, Aisen PS, Petersen RC, et al. Cerebrospinal fluid biomarker signature in Alzheimer’s disease neuroimaging initiative subjects. Ann Neurol. 2009;65(4):403–13.
Saykin AJ, Shen L, Foroud TM, Potkin SG, Swaminathan S, Kim S, et al. Alzheimer’s disease neuroimaging initiative biomarkers as quantitative phenotypes: Genetics core aims, progress, and plans. Alzheimers Dement. 2010;6(3):265–73.
Kim S, Swaminathan S, Inlow M, Risacher SL, Nho K, Shen L, et al. Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLoS One. 2013;8(7):e70269.
Risacher SL, Kim S, Shen L, Nho K, Foroud T, Green RC, et al. The role of apolipoprotein E (APOE) genotype in early mild cognitive impairment (E-MCI). Front Aging Neurosci. 2013;5:11.
Liu JZ, McRae AF, Nyholt DR, Medland SE, Wray NR, Brown KM, et al. A versatile gene-based test for genome-wide association studies. Am J Hum Genet. 2010;87(1):139–45.
Greene CS, Song A, Wong AK. adni-netwas: v1.0. Zenodo. 2016. doi:10.5281/zenodo.44401.
Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33(Database issue):D514–7.
Schriml LM, Arze C, Nadendla S, Chang YW, Mazaitis M, Felix V et al. Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):D940–6.
Sedeno-Cortes AE, Pavlidis P. Pitfalls in the application of gene-set analysis to genetics studies. Trends Genet. 2014;30(12):513–4.
Wright JB, Brown SJ, Cole MD. Upregulation of c-MYC in cis through a large chromatin loop linked to a cancer risk-associated single-nucleotide polymorphism in colorectal cancer cells. Mol Cell Biol. 2010;30(6):1411–20.
We sincerely thank Dr. Jason Moore for his insights during the conception of this work. We also gratefully acknowledge Peter Andrews, Jonathan Fisher, and Dr. Christian Darabos for their technical support.
AS was funded by the Sophomore Science Scholarship Program of Dartmouth College. CSG is supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4552. Additionally, the work with ADNI was supported by NIH R01 LM010098 and P20 GM103534 at Dartmouth and by NIH R01 LM011360, U01 AG024904, RC2 AG036535, R01 AG19771, P30 AG10133, UL1TR001108, DOD W81XWH-14-2-0151, NCAA 14132004, and NSF IIS-1117335 at IU.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
The authors declare they have no competing interests.
AS, LS, and CSG designed the study. JY, SK, SLR, AJS, and LS analyzed ADNI data and provided association statistics. AS, AKW, and CSG performed VEGAS, NetWAS, and analyzed the results. AS, JY, LS, and CSG created figures and wrote the manuscript. SK, SLR, AKW, and AJS provided critical review of the manuscript.
Quantile-quantile plot showing the deviation of the distribution of p-values in the ADNI-1 dataset from the uniform expected under the null. Some inflation was observed, consistent with gene-phenotype associations captured by the GWAS. (PDF 73 kb)
Quantile-quantile plot showing the deviation of the distribution of p-values in the ADNI-2 dataset from the uniform expected under the null. We did not observe inflation of p-values for the ADNI-2 dataset. (PDF 77 kb)