Visually integrating and exploring high throughput Phenome-Wide Association Study (PheWAS) results using PheWAS-View
© Pendergrass et al.; licensee BioMed Central Ltd. 2012
Received: 5 March 2012
Accepted: 8 June 2012
Published: 8 June 2012
Phenome-Wide Association Studies (PheWAS) can be used to investigate the association between single nucleotide polymorphisms (SNPs) and a wide spectrum of phenotypes. This approach is complementary to Genome-Wide Association Studies (GWAS), which calculate the association between hundreds of thousands of SNPs and one or a limited range of phenotypes. The extensive exploration of the association between phenotypic structure and genotypic variation through PheWAS produces a set of complex and comprehensive results. Visualization of the data is integral to fully inspecting, analysing, and interpreting PheWAS results.
We have developed the software PheWAS-View for visually integrating PheWAS results, including information about the SNPs, relevant genes, phenotypes, and the interrelationships between phenotypes that exist in PheWAS. As a result, both the fine-grained detail and the larger trends within PheWAS results can be elucidated.
PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. These results can be both explored and presented with PheWAS-View. PheWAS-View is freely available to non-commercial research institutions; for full details see http://ritchielab.psu.edu/ritchielab/software.
Keywords: PheWAS, Phenome-Wide Association Study, Visualization
In Phenome-Wide Association Studies (PheWAS), the association between single nucleotide polymorphisms (SNPs) and an extensive range of phenotypic measurements is calculated in a high-throughput, unbiased manner. The phenotypic data used in PheWAS can come from a variety of sources. One possible source is epidemiologic health surveys linked to genotypic data. These surveys include measurements of intermediate traits or biomarkers, such as blood cell counts and blood pressure measurements, as well as information on case/control status for multiple clinical conditions and risk factors, such as the presence or absence of diabetes or hypertension. One such example is the Population Architecture using Genomics and Epidemiology (PAGE) network, a National Human Genome Research Institute (NHGRI)-supported network of four study sites and a coordinating center that accesses eight extensively characterized studies for PheWAS in diverse populations [1, 2]. These survey-based PheWAS efforts are complementary to ongoing PheWAS efforts using electronic medical records linked to biorepositories, such as those in the electronic Medical Records & Genomics (eMERGE) network [3, 4].
The exploration of data in a PheWAS effort presents several challenges, including the need for data visualization to assist with interpretation of the data. GWAS of a single trait or a limited number of traits lend themselves to Manhattan plots, where the p-value for every test of association is plotted by chromosomal location (x-axis) and the level of significance is easily visualized (y-axis). Such a plot does not present the complex relationships that exist between genotypes and phenotypes in PheWAS. Therefore, to visualize the complex results of PheWAS, we have developed PheWAS-View, software that can be used to create visual summaries of the SNP, gene, phenotype, and association information resulting from these studies. Using specialized tools such as PheWAS-View to investigate results at both the summary level and the individual result level is key for interpretation, analysis, and sharing of PheWAS results. While this tool was developed specifically for PheWAS, it could be applied to other high-throughput bioinformatics data where thousands of association results are being explored.
Table 1. PheWAS-View plotting options, parameters, and flags (format: -flag name) for creating PheWAS-View plots using phewas_view.rb

Standard PheWAS Plot
Show PheWAS-View version
-e phewas file: PheWAS-View formatted file for input
-o output name: Optional output name for the resultant plot
Main title for the plot (enclose in quotes)
-f image type: Image format for output (png default). Other options depend on ImageMagick installation.
Low resolution image (72 dpi)
Rotate final image 90 degrees
-p p-value threshold: p-value threshold; less significant values will be plotted in grey
-m, --maxp maximum p-value: Maximum p-value to plot. Values less significant than the specified cutoff are not plotted
-R, --redline p-value: Draw a red line at the designated p-value
Include direction of effect on plot
Include sample size plot
-l ancestry map file: Optional ancestry map file
-c phenotype class names: Only results matching this phenotype class name are plotted
Display detailed information for best score at each phenotype
-x phenotype/SNP file: PheWAS expected SNP/phenotype file
List of race/ethnicities to include (AA, EA, MA)
-s SNP ID: SNP ID to display from input file
-L phenotype list file: Optional phenotype list for inclusion
No background lines drawn on plot
-C phenotype correlation file: Optional file with phenotype correlations

Sun Plot Settings
Produce sun plot
-s SNP ID: SNP ID to display in center of sun plot
-P phenotype name: Phenotype to display in center of sun plot
-g gene name: Gene to display in center of sun plot
Include gene name along with SNP, when SNP is selected for sun plot
Include ancestry as description of result for sun plot
-m, --maxp p-value: Choose a p-value threshold; less significant p-values will not be plotted
For plotted results, any results more significant will be plotted in red
To apply direction of effect for phenotypes in sun plot, - is negative direction, + is positive direction
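As a rough illustration of how several of the listed flags might be combined, the following Ruby sketch assembles a command line and prints it without running anything. Only the flags -e, -o, -f, -p, and -R are taken from the option list. The file names, the threshold values, and the assumption that the script is launched through the ruby interpreter are hypothetical.

```ruby
require 'shellwords'

# Hypothetical file and plot names; substitute your own.
input_file  = 'phewas_results.txt'
output_name = 'my phewas plot'

# Flags from the option list: -e (input file), -o (output name),
# -f (image format), -p (p-value threshold), -R (red line p-value).
# The numeric values are illustrative only.
args = [
  'ruby', 'phewas_view.rb', # assumption: the script is run via the ruby interpreter
  '-e', input_file,
  '-o', output_name,
  '-f', 'png',
  '-p', '0.01',
  '-R', '0.0001'
]

# Shellwords.join escapes each argument so the result can be pasted into a shell.
command = Shellwords.join(args)
puts command
```

Building the command as an array and escaping it in one step keeps names containing spaces intact when the line is pasted into a shell.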
A single input file is required to produce a standard PheWAS-View plot (example: Additional file 1). The standard input file requires a column of unique SNP identifiers such as rs numbers, a column of unique phenotype descriptions or identifiers for the tests of association calculated for each SNP, and a column of p-values for each test of association. Adding further columns enables additional PheWAS-View features (example: Additional file 2). Table 1 lists the parameters and flag settings available for modifying PheWAS-View plots (format: -flag name) at the command line.
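Before plotting, it may help to confirm that a file contains the three required columns. The Ruby sketch below is one way to do so, not part of PheWAS-View. It assumes a tab-delimited file and uses the hypothetical header names SNP, Phenotype, and P-value; the actual header names and delimiter are those of the example input files. The sample rows are invented placeholders.

```ruby
# Hypothetical header names for the three required columns (assumption).
SNP_COL   = 'SNP'
PHENO_COL = 'Phenotype'
P_COL     = 'P-value'
REQUIRED  = [SNP_COL, PHENO_COL, P_COL].freeze

# Returns a list of problems found in tab-delimited text (empty if none).
def check_phewas_input(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  return ['file is empty'] if lines.empty?

  header = lines.first.split("\t", -1)
  missing = REQUIRED - header
  return ["missing columns: #{missing.join(', ')}"] unless missing.empty?

  p_index = header.index(P_COL)
  problems = []
  lines.drop(1).each_with_index do |line, i|
    raw = line.split("\t", -1)[p_index].to_s
    p_value = Float(raw, exception: false) # nil when the field is not numeric
    unless p_value && p_value.between?(0.0, 1.0)
      problems << "data row #{i + 1}: p-value #{raw.inspect} is not a number in [0, 1]"
    end
  end
  problems
end

# Invented placeholder rows, for illustration only.
sample = <<~TSV
  SNP\tPhenotype\tP-value
  rs0000001\tExample phenotype A\t0.00002
  rs0000001\tExample phenotype B\t0.31
  rs0000002\tExample phenotype A\tNA
TSV

puts check_phewas_input(sample)
```

The check reports the third data row, whose p-value field is not numeric, and would report any missing required column before looking at the rows.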
Results and discussion
For plots in vertical format, the parameter -B plots the SNP identifier, gene symbol, and direction of the genetic effect (positive (+) or negative (−)) for the most significant p-value for each phenotype (Figure 4). To plot effect size, effect sizes must be provided in a column “ES”, and gene symbols must be provided in a column “Gene” (as in the example file, Additional file 2).
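The selection described here, the most significant result for each phenotype together with the sign of its effect, can be sketched in a few lines of Ruby. This is an illustration of the idea only and not the PheWAS-View implementation. The struct fields, identifiers, gene names, and values are invented placeholders, with `es` and `gene` standing in for the “ES” and “Gene” columns.

```ruby
# One association result; field names are illustrative (assumption).
Result = Struct.new(:snp, :gene, :phenotype, :p_value, :es)

# For each phenotype, keep the result with the smallest p-value and report
# the sign of its effect size as "+" or "-" (an effect of zero is shown as "+").
def best_per_phenotype(results)
  results.group_by(&:phenotype).map do |phenotype, rows|
    best = rows.min_by(&:p_value)
    direction = best.es.negative? ? '-' : '+'
    [phenotype, { snp: best.snp, gene: best.gene,
                  direction: direction, p_value: best.p_value }]
  end.to_h
end

# Invented placeholder data, for illustration only.
results = [
  Result.new('rs0000001', 'GENE1', 'Phenotype A', 0.04,  0.12),
  Result.new('rs0000002', 'GENE2', 'Phenotype A', 0.001, -0.30),
  Result.new('rs0000001', 'GENE1', 'Phenotype B', 0.2,   0.05)
]

best_per_phenotype(results).each do |phenotype, best|
  puts "#{phenotype}: #{best[:snp]} #{best[:gene]} (#{best[:direction]})"
end
```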
PheWAS-View recognizes the population or genetic ancestry abbreviations listed below that can be used as “group” identifiers in the input file
Asian Pacific Islander
The PheWAS approach provides a way to explore pleiotropy and the interrelationships between phenotypes, as well as to generate new hypotheses about the genetic architecture of complex traits. The various plots available within PheWAS-View allow complex PheWAS results to be explored visually, facilitating data analysis and interpretation. This software could also be used for other phenotypically rich association studies, such as expression quantitative trait loci (eQTL) studies, which have high numbers of phenotypes because the expression of multiple genes is coupled with genotypic data.
Availability and requirements
Project name: PheWAS-View
Project home page: http://ritchielab.psu.edu/ritchielab/software
Operating system(s): Linux, Mac OS X, Windows
Programming language: Ruby
Other requirements: RMagick
License: GNU General Public License
Any restrictions to use by non-academics: The use of PheWAS-View is restricted to academic and non-profit users
1. Pendergrass SA, Brown-Gentry K, Dudek SM, Torstenson ES, Ambite JL, Avery CL, Buyske S, Cai C, Fesinmeyer MD, Haiman C: The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet Epidemiol. 2011, 35: 410-422. 10.1002/gepi.20589.
2. Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, Crawford DC, Haiman CA, Heiss G, Kooperberg C, Marchand LL: The Next PAGE in understanding complex traits: design for the analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. Am J Epidemiol. 2011, 174: 849-859. 10.1093/aje/kwr160.
3. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D, Masys DR, Roden DM, Crawford DC: PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010, 26: 1205-1210. 10.1093/bioinformatics/btq126.
4. Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P: Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011, 89: 529-542. 10.1016/j.ajhg.2011.09.008.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.