
The goldmine of GWAS summary statistics: a systematic review of methods and tools

Abstract

Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.


Background

Genome-wide association studies (GWAS) enable the simultaneous testing of thousands of genetic variants, usually SNPs, across the genome in order to find variants associated with a trait or a disease [1]. The GWAS methodology, so far, has generated many robust associations for various traits and diseases and has revolutionized our understanding of the genetic architecture of complex traits. With increasing sample sizes, new sequencing technologies and the accumulation of large biobanks it is expected that our ability to investigate the effects of human genetic variation in complex traits will increase in the near future [2]. In the first years of the development of the field, efforts were oriented towards the statistical aspects of the analysis [3], which involved thousands of SNPs simultaneously, including the methodology for multiple testing and quality control. This task was successful and enabled the discovery of associations replicated in subsequent studies, and in several cases, validated experimentally and functionally using a wide variety of methods [4]. However, it was soon clear that most variants discovered via GWAS have small overall effects on disease susceptibility [5]. Thus, it became evident that integrating data from multiple sources and developing reliable bioinformatics tools was a necessary step in order to address the complexity of the underlying genetic basis of common human diseases [5].

Soon after the publication of the first GWAS it also became evident that, at least theoretically, individuals could be identified in such cohorts even if only the summary statistics are available [6]. This led to strict access controls on the sharing of individual patients’ data (IPD) from GWAS. Subsequent works found that privacy attacks are possible in theory but unsuccessful and unconvincing in real practice. For instance, even sharing 1,000 SNPs for datasets with more than 500 individuals generally leads to a low power of the “attack” [7]. A more thorough investigation is given in [8]. In practice, however, not all studies share their data, at least when it comes to the studies published in the first decade of GWAS. It has been estimated that the overall proportion is only 13%, having increased from 3% in 2010 to 23% in 2017 [9]. By contrast, researchers sharing their summary data have been shown to receive on average 81.8% more citations, an effect that is probably related, at least partially, to the usability of the data in downstream analyses [10]. Summary statistics not only offer the additional protection of privacy, but also offer significant advantages in computational cost when using the data in downstream analyses, since this cost does not scale with the number of participants in the study [11]. Thus, it is no surprise that in recent years a large variety of methods have been developed to perform a so-called post-GWAS analysis using the summary results of a single study, or of several studies, in most cases integrating data from other sources [11]. The majority of these methods use the summary data in the form of per-allele SNP effect sizes (log odds ratios or betas) along with their standard errors, or equivalently the z-scores (per-allele effect sizes divided by their standard errors).
These methods seek to go a step further than the simple analysis, or re-analysis, of a study, and aim to improve our understanding of the functional role of the identified variants [12]. The most important factors that played a significant role in the development of such methods, in this so-called post-GWAS era, are the linkage disequilibrium (LD) information from a population reference panel such as HapMap or the 1000 Genomes Project, the gene expression variation in the form of eQTL, and the integration of functional information on biological pathways [13,14,15].
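The relation between effect sizes, standard errors and z-scores mentioned above can be illustrated with a minimal sketch. The function names and the example SNP are hypothetical, and the two-sided p-value assumes the usual standard normal null distribution for the z-score.

```python
from math import log
from statistics import NormalDist


def z_score(beta, se):
    """Per-allele effect size divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def two_sided_p(z):
    """Two-sided p-value, assuming z is standard normal under the null."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical SNP: odds ratio 1.10, standard error 0.02 on the log-odds scale.
beta = log(1.10)
z = z_score(beta, 0.02)
p = two_sided_p(z)
```

Because the z-score carries the sign of the effect, it preserves the direction of association, whereas the p-value alone does not.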

The methods developed so far cover a broad range of different types of analysis, either in the study of a single trait or in the combined analysis of multiple traits. For a single trait, we may have methods for meta-analysis [16, 17], methods for inferring heritability [18, 19], gene-based tests [20], methods for Gene Set (or Pathway) Analysis [21], or methods for fine-mapping causal variants [22]. Regarding the analysis of multiple traits there is also a variety of methods [23], ranging from those that estimate the genetic correlation between traits [24], the joint analysis of multiple traits [25], or the methods that try to estimate causality between traits such as Mendelian Randomization [26], transcriptome-wide association studies [27], or colocalization [28]. Of course, the data standards [29] used to facilitate these analyses and the databases that the results are stored in, are also of great importance for the community.

In order to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics we performed a systematic review following the PRISMA guidelines [30]. We conducted a comprehensive search of the literature to identify relevant software tools and databases. We categorized the tools and databases by their functionality, in categories related to data, single-trait analysis, and multiple-trait analysis, along with their sub-categories mentioned in the previous paragraph. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a wide range of software tools and databases for GWAS summary statistics analysis, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of using GWAS summary statistics.

The systematic review

In order to collect all the available published papers, we performed a systematic review of the literature following the PRISMA guidelines [30]. The search was performed in PubMed (https://pubmed.ncbi.nlm.nih.gov) with the following query: ("Summary Statistics" OR "Summary Data" OR "Summary Association Statistics" OR "Summary Association Data") AND (GWAS OR genomewide OR genome-wide). First the abstracts, and then the full articles, were scrutinized in order to collect the necessary information. Methods, software tools and databases suitable for the analysis of GWAS summary data were eligible for inclusion. Methods papers that do not report software, and software whose pages are not currently available, were excluded. Additional searches were performed in the reference lists of the identified articles in order to identify studies that had been missed. In many cases multiple articles regarding a single tool were found, so we kept only one. We decided to include reports deposited in preprint servers like medRxiv/bioRxiv, but some of these papers were eventually published in peer-reviewed journals, so in such cases we retained only the latter reference. Tools regarding Polygenic Risk Scores (PRSs) and visualization were excluded. For all included tools we recorded the URL, the PMID, and the main functionalities, along with comments regarding their main methodological features. The initial search identified 2942 articles (as of 22/12/2023).
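For readers who wish to re-run the search programmatically, the query above can be passed to the NCBI E-utilities ESearch endpoint. The sketch below only builds the request URL and performs no network call. The text does not state whether the search was run through the web interface or through an API, so this is an illustration rather than a description of the original procedure; the helper name esearch_url and the paging defaults are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The query string as reported in the text.
QUERY = (
    '("Summary Statistics" OR "Summary Data" OR '
    '"Summary Association Statistics" OR "Summary Association Data") '
    "AND (GWAS OR genomewide OR genome-wide)"
)


def esearch_url(term, retmax=100, retstart=0):
    """Build an ESearch URL for PubMed (hypothetical helper, no request is sent)."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,      # number of PMIDs per page
        "retstart": retstart,  # offset of the first PMID returned
    }
    return ESEARCH + "?" + urlencode(params)


url = esearch_url(QUERY)
```

Paging through the results by increasing retstart would retrieve the full list of PMIDs; the count returned today will differ from the number reported above because PubMed is continuously updated.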

In total we identified 305 tools and databases (Fig. 1). We classified them in three broad categories: data, tools for single traits and tools for multiple traits, along with the various sub-categories. The total breakdown is given in Table 1. Several tools may perform different tasks and thus they can be considered for more than one category; so, we classified them to the one most closely related to the primary goal of the analysis they claim to perform. Other tools do not fit exactly to the general description of the category, but we nevertheless classified them to the most similar one. The largest sub-category consists of the tools for pleiotropy analysis, whereas the smallest one is related to reconstruction of genotypes and effect sizes. Most tools are written in R (56.4%) with the largest proportion being in the multiple traits category, followed by Python (12.5%) and C/C++ (8.2%) (Fig. 2). Apart from the publicly available databases only a handful of tools are offered as webservers (6.95%). Most of the tools were published after 2015 (Fig. 3). Nearly 60% of the tools and databases were published in: Bioinformatics, American Journal of Human Genetics, Nature Genetics, Nature Communications, Nucleic Acids Research and PloS Genetics (Fig. 4). In the following sections we proceed with the detailed description of the various tools identified, classified in the different categories and sub-categories. The complete list of identified tools along with the relevant information is given in Supplementary Table 1.

Fig. 1

PRISMA Flow diagram for systematic review

Table 1 The broad categories and the sub-categories of tools and databases
Fig. 2

Number of Tools and Databases included in the review Published Per Year

Fig. 3

The programming languages used in the various categories of identified tools

Fig. 4

The journals in which the studies included in the review were published

The data

First, we present the tools dedicated to the data themselves. We include here tools for quality control of GWAS summary statistics, tools for imputation and genotype reconstruction, as well as the publicly available databases of summary results.

Standards and quality control

The need for sharing and re-using GWAS summary statistics has been an issue for the community in recent years. It is generally accepted that the minimum (“mandatory”) information contained in GWAS summary statistics should include: the chromosome and the base-pair location, the p-value of the association, the risk allele and the other allele, the risk allele frequency, and an estimate of the effect size (odds ratio or beta) along with its standard error [29]. Other important summary statistics, which are nevertheless termed “encouraged”, include the sample size, the variant ID, the rsID, the confidence interval of the effect size and so on. Such specifications were considered for the GWAS-SSF format [31], which was developed to meet the requirements set by the community. GWAS-SSF consists of a tab-separated data file with well-defined fields and an accompanying metadata file. Most repositories and programs use some variant of the GWAS-SSF. However, such tabular formats in several cases lead to ambiguity or incomplete storage of information, and in other cases lack essential metadata. This leads to poor performance and an increased risk of errors in downstream analyses. To address these issues, an adaptation of the well-known variant call format (VCF) [32], capable of storing GWAS summary statistics and called GWAS-VCF, was developed along with software tools to apply it in downstream analyses [33]. The VCF contains a file header with metadata and a file body containing variant-level (one locus per row with one or more alternative alleles/variants) and sample-level (one sample per column) information. The VCF was thus adapted to include GWAS-specific metadata, utilizing the sample column to store variant-trait association data. The GWAS-VCF is the standard used by the MRC-IEU OpenGWAS database [34] and it comes with appropriate tools to map GWAS summary statistics to VCF with on-the-fly harmonization (https://github.com/mrcieu/gwas2vcf).
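A check for the mandatory fields listed above can be sketched in a few lines. The column names below are assumed to follow the GWAS-SSF naming and should be adapted to the file at hand; the sketch handles only the beta form of the effect size, and check_sumstats is a hypothetical helper, not part of any of the tools discussed.

```python
import csv
import io

# Assumed column names for the mandatory fields; adjust to your file.
MANDATORY = (
    "chromosome", "base_pair_location", "effect_allele", "other_allele",
    "beta", "standard_error", "effect_allele_frequency", "p_value",
)


def check_sumstats(handle):
    """Return (missing_columns, row_problems) for a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in MANDATORY if name not in header]
    problems = []
    if missing:
        return missing, problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            float(row["beta"])
            se = float(row["standard_error"])
            eaf = float(row["effect_allele_frequency"])
            p = float(row["p_value"])
        except (TypeError, ValueError):
            problems.append((line_no, "non-numeric value"))
            continue
        if not se > 0:
            problems.append((line_no, "standard_error must be positive"))
        if not 0 <= eaf <= 1:
            problems.append((line_no, "allele frequency outside [0, 1]"))
        if not 0 <= p <= 1:
            problems.append((line_no, "p_value outside [0, 1]"))
    return missing, problems


# Two hypothetical variants; the second has an invalid standard error.
example = io.StringIO(
    "\t".join(MANDATORY) + "\n"
    "1\t1000\tA\tG\t0.05\t0.01\t0.30\t1e-6\n"
    "1\t2000\tC\tT\t0.02\t-0.01\t0.40\t0.05\n"
)
missing, problems = check_sumstats(example)
```

Reporting the line number with each problem keeps the check usable on files with millions of rows, where a simple pass/fail answer would be of little help.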

Despite these efforts, not all available data are in line with the standards, especially when dealing with data from older studies. Thus, there is a need for additional tools to harmonize the data, as well as to identify and correct errors. Tools belonging to the former class were developed early and focused mainly on harmonizing data in preparation for a meta-analysis. These include QCGWAS [35], GWAtoolbox [36] and EasyQC [37]. GEAR [38] is notable in that it incorporates ideas from population genetics which allow verification of the genetic origin and geographic location of each cohort and identification of significant sample overlap. Among more recent tools, MungeSumstats [39] and GWASlab [40] perform standardization and quality control, handling the most common formats; SumStatsRehab [41] can be used for data validation, restoration of missing data, and correction of errors or formatting; and GWASinspector [42] provides extensive QC reports and performs harmonization, being compatible with recent reference panels and handling insertion/deletion and multi-allelic variants. The latter class of methods additionally leverages information from the LD among SNPs. One such tool is GQS [43], which identifies suspicious regions and prevents erroneous interpretations by comparing the significance of the association for each SNP to its LD value with the reported index SNP. Similar functionalities are offered by DENTIST [44], which uses LD to detect and eliminate errors and disagreements between GWAS data and the LD reference panel. EXTminus23andMe [45] evaluates the quality of summary statistics after data removal and the suitability of the downsampled summary statistics for typical follow-up genetic analyses.
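To make the notion of harmonization concrete, the following sketch aligns one biallelic SNP to a reference allele pair. It is a generic illustration and not the algorithm of any of the tools named above; the function name, the status labels and the decision to drop strand-ambiguous (palindromic) SNPs rather than resolve them by allele frequency are all assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(a1, a2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT.get(a1) == a2


def harmonize(effect_allele, other_allele, beta, eaf, ref_effect, ref_other):
    """Return (beta, eaf, status) expressed relative to the reference alleles."""
    ea, oa = effect_allele.upper(), other_allele.upper()
    r_ea, r_oa = ref_effect.upper(), ref_other.upper()
    if any(a not in COMPLEMENT for a in (ea, oa, r_ea, r_oa)):
        return None, None, "unsupported"  # indels and other non-SNP alleles
    if is_palindromic(ea, oa):
        return None, None, "palindromic"  # strand cannot be inferred from alleles
    if (ea, oa) == (r_ea, r_oa):
        return beta, eaf, "aligned"
    if (ea, oa) == (r_oa, r_ea):
        return -beta, 1 - eaf, "swapped"  # effect and other allele exchanged
    c_ea, c_oa = COMPLEMENT[ea], COMPLEMENT[oa]
    if (c_ea, c_oa) == (r_ea, r_oa):
        return beta, eaf, "strand_flipped"
    if (c_ea, c_oa) == (r_oa, r_ea):
        return -beta, 1 - eaf, "strand_flipped_swapped"
    return None, None, "mismatch"


result = harmonize("A", "G", 0.1, 0.25, "G", "A")
```

Swapping the alleles changes the sign of the effect and replaces the frequency by its complement, whereas a strand flip alone leaves both unchanged.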

Databases

The publicly available biological databases played and continue to play a central role in bioinformatics and in biological research in general [46,47,48]. The same is the case for databases related to human research [49] and in particular those involved in GWAS [50]. The databases we identified can be roughly divided in two categories: databases that contain summary statistics from GWAS and databases that contain important secondary analyses on those data with some of the methods that we will describe in later sections.

Regarding the databases of the first category, NCBI’s dbGaP [51] was developed to contain the results of studies investigating the interaction of genotype and phenotype, which include GWAS. One of dbGaP’s primary objectives was to house individual-level GWAS data, but the database also contains summary data. Summary statistics are generally available to the public, whereas access to IPD requires varying levels of authorization. The NHGRI-EBI GWAS Catalog [52], which was established in 2008, has for years been considered the central repository of GWAS summary statistics. It is a high-quality curated collection of all published GWAS and, as of 2023-12-20, contains 6,680 publications, 566,798 top associations and 66,825 full summary statistics (Fig. 5). The database played an important role in the community efforts leading to the development of the GWAS-SSF format. GWAScentral [53], previously known as the Human Genome Variation (HGV) database of Genotype-to-Phenotype information, is a database that contains over 72.5 million P-values for over 5,000 studies, with over 7.4 million unique genetic markers involved in more than 1,700 unique phenotypes. The database contains data from several sources (including NHGRI-EBI GWAS Catalog, OpenGWAS, Japanese GWASdb, dbGaP, WTCCC and so on). The MRC-IEU OpenGWAS [34] is a new addition and contains 346 million genetic associations from 50,037 GWAS summary datasets. It contains complete data from various consortia and the UK Biobank and comes with many tools for harmonizing the data and storing them in the GWAS-VCF format. At the time of writing there are 4,126 binary traits, 725 metabolites, 3,371 proteins, 3,143 brain imaging phenotypes, and 3,217 other continuous phenotypes. In addition to the complete GWAS summary data, it also contains independent top hits for every dataset, totaling 116,918 independent signals, with 7,109 datasets having at least one hit.
GeneATLAS [54] and GBE [55] contain associations from the UK Biobank cohort. GeneATLAS currently contains data for 452,264 individuals, 778 traits and 30 million variants, whereas GBE contains summary statistics from over 750,000 individuals combining data from the UK Biobank, the Million Veterans Program and the Biobank Japan. GTEx [56] and QTLbase [57] are the primary resources for xQTL data. The GTEx project has been expanded over time, and currently contains data of genetic associations for gene expression and splicing in 838 individuals in 49 tissues. QTLbase, similarly, contains genome-wide QTL summary statistics for many molecular traits across 95 tissue/cell types and multiple conditions. It contains tens of millions of significant genotype-molecular trait associations under different conditions. Other resources of this category, related to various large consortia (GIANT, WTCCC, PGC etc.) as well as other biobanks (FinnGen etc.) can be found in Supplementary Table 2.

Fig. 5

A snapshot of the data. A A view of the Type 2 Diabetes Mellitus studies deposited in NHGRI-EBI GWAS Catalog. B Type 2 Diabetes Mellitus studies contained in GWAS Central, depicting the significant hits in the chromosomes. C The SSF format

The second category contains databases of important secondary analyses performed on GWAS summary statistics with some of the methods that we describe in detail in later sections, such as gene-based tests, heritability analysis, TWAS, colocalization and so on. TSEA-DB [58] and PCGA [59] use information from gene expression in various tissues to perform tissue or cell-type enrichment analysis of the GWAS association statistics. webTWAS [60] and COLOCdb [61] also use information on eQTL but in a different fashion. webTWAS currently contains data for over 1,389 full GWAS for which it calculates the causal genes using single tissue expression imputation (using MetaXcan and FUSION), or cross-tissue expression imputation (using UTMOST). COLOCdb, on the other hand, offers the most comprehensive collection of colocalization analyses, integrating publicly available GWASs with different types of xQTL and different algorithms (COLOC, SMR). GWAS ATLAS [62] contains results of 4,756 GWAS from 473 unique studies across 3,302 unique traits accompanied by useful information obtained from downstream analysis. Each study is accompanied by MAGMA results (see also “gene-based tests”), SNP heritability estimation and genetic correlations with other traits in the database. GWASROCS [63], on the other hand, contains a large and comprehensive set of SNP-derived AUROCs and heritabilities. It currently includes 579 simulated populations (corresponding to 219 traits) and SNP data (odds ratio, risk allele frequency, and p-values) for 2,886 unique SNPs. Phenome-wide association studies (PheWAS) invert the idea of a GWAS by searching for phenotypes associated with specific variants across the range of thousands of human phenotypes, or the “phenome” [64,65,66]. Thus, it is expected that a PheWAS will need large databases of GWAS results. PhenoScanner [67] is the most complete such database with publicly available results from over 65 billion associations and more than 150 million unique genetic variants.
Similar functionalities are offered also by OpenGWAS, GWAS ATLAS and PheWAS Catalog [68]. Lastly, we need to mention LD Hub [69], a centralized database of publicly available GWAS results for 173 diseases/traits which offers a web interface that automates the LD score regression (LDSC) analysis pipeline (see also “Genetic correlation”).

Imputation and genotype reconstruction

Although some of the methods for quality control mentioned previously can correct errors and alter the data, the methods used for imputation go one step further. As expected, imputation methods were developed initially for individual data, for handling studies genotyped with different platforms [70,71,72]. Such methods can infer missing genotypes using LD information from reference samples genotyped using denser arrays or sequencing. Genotype imputation increases the coverage of SNPs and thus can be used to increase statistical power, increase the accuracy of fine-mapping and harmonize the data in order to facilitate meta-analysis [70]. Several factors can influence the imputation accuracy: the sample size, the suitability of the reference panel for the particular sample, the genotyping chip and the allele frequency [71]. In general, however, these methods are time-consuming since they process individuals one at a time, and thus methods that directly impute the summary statistics were developed. These methods utilize only the information provided in the sample regarding the studied population (p-value, z-score or odds-ratio/beta) and require additional information regarding the LD structure. Nearly all methods perform a kind of multiple regression, assuming the multivariate normal distribution for the test statistics and utilizing the theoretical result that the correlation of such test statistics equals the correlation of the corresponding variables [73], that is, the genotype correlation, which is available through the reference panel. Such methods include FAPI [74], ImpG [75], RAISS [76], DIST [77] and SSimp [78], with most of the differences lying in the choice of the reference panel and the exact details of the mathematical methods used to handle matrix inversions in the multivariate normal. DISSCO [79] uses a similar framework but allows for covariates.
Such methods may perform poorly in cases where the sample has a different LD structure compared to the reference panel. Thus, extensions such as DISTMIX [80] and ARDISS [81] were developed to handle mixed ethnicity cohorts, improving the imputation performance. Adapt-Mix [82] estimates the correlation structure in both admixed and non-admixed individuals using simulated and real data and allows the use of this matrix with other imputation methods. Other methods such as LS-meta [83] and LSimputing [84] offer additional advantages; LS-meta imputes both genetic and environmental components using information from additional omics-trait association summary data, whereas LSimputing implements a non-parametric method that allows for nonlinear SNP-trait associations and predictions in case a sample of IPD is available. Using the same principles, simGWAS [84] allows simulation of whole GWAS summary data, without generating individual data as an intermediate step.
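The multivariate normal argument behind summary-statistics imputation can be sketched as follows: if z_t denotes the z-scores of the typed SNPs, the conditional mean of an untyped SNP is Σ_ut Σ_tt^{-1} z_t, where Σ is the LD (correlation) matrix from the reference panel. The sketch below is a generic illustration under these assumptions and not the implementation of any of the tools named above; the function names and the ridge term added to stabilize the inversion are hypothetical choices.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; consider a larger ridge")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(ld_tt, ld_ut, z_t, ridge=0.0):
    """Impute one untyped z-score from typed z-scores and reference LD.

    ld_tt: LD matrix among typed SNPs; ld_ut: LD of the untyped SNP with each
    typed SNP; ridge: assumed regularization added to the diagonal of ld_tt.
    """
    n = len(z_t)
    reg = [[ld_tt[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    weights = solve(reg, ld_ut)  # (ld_tt + ridge * I)^{-1} ld_ut
    z_hat = sum(w * z for w, z in zip(weights, z_t))
    r2 = sum(w * c for w, c in zip(weights, ld_ut))  # variance explained by typed SNPs
    return z_hat, r2


# Two independent typed SNPs; the untyped SNP has r = 0.5 with the first one.
z_hat, r2 = impute_z([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0], [4.0, 1.0])
```

The r2 value plays the role of an imputation quality score: values close to zero indicate that the typed SNPs carry little information about the untyped one, which is also what happens when the reference panel does not match the LD structure of the sample.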

Genotype reconstruction methods take a different approach. Given the summary statistics for a SNP (either directly measured or imputed), one can reconstruct the genotype counts that produced them. This offers many advantages, since with the reconstructed genotypes researchers can perform additional analyses using other statistical methods suitable for grouped data and test different hypotheses [85]. For instance, one can calculate grouped Polygenic Risk Scores (PRS) [85], perform logistic regression for grouped data [85, 86], perform multivariate meta-analysis [87], or implement robust tests for association that are expected to work better when the underlying model of inheritance deviates from the additive one which is usually assumed [88, 89]. The details and the success of the reconstruction depend heavily on the available summary statistics. P-values and z-scores alone cannot be used, and one must rely on available effect sizes such as the odds ratio (OR). When the OR, the standard error and the sample size are given, methods are available in epidemiology that allow the reconstruction of the allelic 2×2 table [90]. If z-scores, confidence intervals or p-values are available, one can use them to obtain the standard error. React [85] uses an equivalent method relying on solving a system of nonlinear equations. If the allele frequency in one group (usually the controls) is also known, the allelic counts may easily be obtained with a simple calculation. In all cases the accuracy of the reconstruction depends on the precision of the available summary statistics. After the allelic 2×2 table is reconstructed, it is straightforward to obtain the genotype counts, assuming HWE (which, as one might expect, adds another source of potential bias).
MetaSubtract [91] is a tool that analytically subtracts the results of the validation cohort from meta-analysis summary statistics, allowing researchers to compute meta-analysis summary statistics that are independent of the validation cohort, without requiring access to the IPD. Spkmt [92] works in a similar fashion but in families; it can be used to derive the summary statistics of one parent from the data of the offspring and the other parent. Finally, we need to mention two tools that work in somewhat different modes. OATH [93] is used to reproduce reported results from a GWAS and recover underreported results from alternative models with a different combination of nuisance parameters, whereas LMOR [94] performs transformations from the genetic effects estimated under the Linear Mixed Model to the Odds Ratio that rely only on summary statistics.
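The simple calculation mentioned in this section for the case where the control allele frequency is known can be sketched as follows. This illustrates only that allele-frequency route and the HWE step; it is not the nonlinear-equation method of the cited tools, and the function names and the numerical example are hypothetical.

```python
from math import log
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(OR) recovered from a confidence interval of the OR."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (log(upper) - log(lower)) / (2 * z)


def case_allele_freq(odds_ratio, control_freq):
    """Effect-allele frequency in cases implied by the allelic OR."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def allelic_table(odds_ratio, control_freq, n_cases, n_controls):
    """Expected allelic 2x2 table: ((case effect, case other), (control effect, control other))."""
    p1 = case_allele_freq(odds_ratio, control_freq)
    p0 = control_freq
    return (
        (2 * n_cases * p1, 2 * n_cases * (1 - p1)),
        (2 * n_controls * p0, 2 * n_controls * (1 - p0)),
    )


def hwe_genotype_counts(freq, n):
    """Expected genotype counts (effect/effect, heterozygous, other/other) under HWE."""
    return (n * freq ** 2, 2 * n * freq * (1 - freq), n * (1 - freq) ** 2)


# Hypothetical SNP: OR 1.5, control allele frequency 0.2, 1000 cases and 1000 controls.
table = allelic_table(1.5, 0.2, 1000, 1000)
case_genotypes = hwe_genotype_counts(case_allele_freq(1.5, 0.2), 1000)
```

The counts are expected values and are generally not integers; rounding them, like the HWE assumption itself, introduces a small discrepancy with the original data.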

Analysis of a single trait

In this section we are going to present the various types of methods and tools dedicated to the analysis of a single trait. These include tools for meta-analysis, tools for the estimation of heritability, tools for implementing gene-based tests, gene set methods and fine mapping methods.

Meta-analysis

One of the most obvious uses of GWAS summary data is to combine them and perform a meta-analysis. Meta-analysis is the statistical procedure used to combine evidence from multiple studies in order to increase statistical power and it is a methodology that has been widely used in medical research for decades [95]. A meta-analysis can be performed with various methods [16] using IPD or summary data; the former offers many advantages, but the latter is far easier to perform, taking into account the various restrictions imposed on sharing GWAS IPD and the difficulties in the logistics of such a project [17]. Moreover, given the large samples usually encountered in GWAS it has been shown, both theoretically and empirically, that meta-analysis using summary statistics has the same efficiency as the joint analysis of IPD [96]. A compromise between these two extremes arises when a research group has access to individual-level genotype data of a limited sample size and wants to integrate these with existing summary data available in the databases. Such methods have been in use in epidemiology for years [97] and several tools have been developed especially for handling GWAS data, for instance IGESS [98], metaGIM [99] and LEP [100]. PolyGIM [101] can be applied with or without IPD and uses polytomous logistic regression to investigate disease subtype heterogeneity in situations when only summary data is available.

Regarding summary-data meta-analysis of GWAS, the most commonly used methods include standard approaches, such as combining p-values, z-statistics or effect sizes like the Odds Ratio (for binary traits) or mean differences (for continuous traits) using fixed or random effects models [16, 102]. These statistical methods are straightforward to implement, and are available in general purpose statistical packages such as STATA and R. However, there are several specialized tools that facilitate the process and provide integration with useful bioinformatics or visualization functions. Such widely used tools include METAL [103], GWAMA [104] and PLINK [105]. Other tools are oriented to more specialized cases offering advanced options. For instance, YAMAS performs meta-analysis including missing SNPs identified with LD without performing imputation [106] and rareMETALS [107] uses a partial correlation based score to perform meta-analysis in the presence of large amounts of missing values. There is also a class of tools which focus on the replication of GWAS and the combined analysis of data from primary and replication studies. Such tools include rfdr [108] and Jlfdr [109], which control for the False Discovery Rate (FDR), Rrate [110], which determines the sample size of the replication study and checks the consistency between the primary and the replication study, and MAJAR [111], which jointly tests prognostic and predictive effects in meta-analysis without the need for an independent cohort. metaGAP [112] is an online tool for calculating the statistical power of a meta-analysis of GWAS (Fig. 6). METACARPA works with overlapping or related samples, even when details of the overlap or relatedness are unknown [113], MAGENTA [114] performs meta-analysis with gene set enrichment analysis (GSEA), whereas GWASmeta [115] and MetABF [116] work in a Bayesian framework calculating the Approximate Bayes Factor (ABF).
Other tools offer more advanced options such as meta-analysis with multiple traits (see also “multiple traits”), like nGWAMA [117], metaCCA [118], CPASSOC [119], metaUSAT [120] and CPBayes [114] (and its extension GCPBayes [121]), and others are designed for meta-analysis under different genetic models, like GWAR [89] which uses robust methods (like MIN2 or MAX) in order to handle the uncertainty in the underlying genetic model, or like the simulation tool [122] which implements an alternate strategy for the additive genetic model simulating data for the individual studies. Finally, we need to mention sPLINK [123] which performs privacy-aware GWAS on distributed datasets, and XPEB [124] which is an empirical Bayes approach designed to improve the power of GWAS in minority populations by exploiting information from GWASs performed in populations of different origin.
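The standard fixed-effects approach to combining effect sizes can be written in a few lines. The sketch below uses inverse-variance weighting and also shows how the same weights can be inverted to remove one cohort from a published meta-analysis, which is the algebra underlying leave-one-cohort-out summary statistics. It assumes independent cohorts and no genomic-control correction; the function names and the numbers are hypothetical, and the specialized tools above add features not shown here.

```python
from math import sqrt


def ivw_fixed(betas, ses):
    """Fixed-effects inverse-variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, sqrt(1.0 / total)


def subtract_cohort(beta_meta, se_meta, beta_c, se_c):
    """Remove one cohort from a fixed-effects meta-analysis result."""
    w_meta, w_c = 1.0 / se_meta ** 2, 1.0 / se_c ** 2
    w_rest = w_meta - w_c
    if w_rest <= 0:
        raise ValueError("cohort weight exceeds the meta-analysis weight")
    beta_rest = (w_meta * beta_meta - w_c * beta_c) / w_rest
    return beta_rest, sqrt(1.0 / w_rest)


# Three hypothetical cohorts reporting the same SNP.
betas = [0.10, 0.20, 0.15]
ses = [0.05, 0.10, 0.08]
beta_all, se_all = ivw_fixed(betas, ses)

# Removing the third cohort should reproduce the meta-analysis of the first two.
beta_rest, se_rest = subtract_cohort(beta_all, se_all, betas[2], ses[2])
```

Each study is weighted by the reciprocal of its variance, so large studies with small standard errors dominate the pooled estimate.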

Fig. 6

Tools for meta-analysis. A GWASmeta (SMetABF) for performing Bayesian meta-analysis. B The MetaGAP power calculator. C GWAR for robust analysis and meta-analysis of GWAS

Inferring heritability

Heritability is generally defined as the fraction of phenotypic variation explained by genetic variation. Heritability is a dimensionless parameter of the population, and it was introduced by Sewall Wright and Ronald Fisher in the previous century. Traditionally, heritability is estimated using family-based designs such as twin studies. However, there are controversies regarding the various methodologies for estimation and interpretation of the results [125]. Despite these controversies, heritability remains an important concept in modern genetics research and in the prediction of disease risk from genomic data [126]. Technological advancements have facilitated the development of methods that use large samples of unrelated, or related, individuals. Thus, family-based designs using genomic data (trio-genome-wide complex trait analysis, and so on) have emerged. Such methods are discussed and compared in [127]. Of course, heritability can also be estimated via the results obtained in a traditional GWAS using unrelated individuals. The gap between these estimates and those obtained from classical heritability estimation methods has been termed the "missing heritability problem" and it is an important open question in current research [128]. Recent reviews of the methods that use GWAS data are given in [18, 19], focusing on their modeling assumptions, their similarities, and their applicability.

One of the first and simplest methods, which calculates heritability from the allele frequency, the odds ratio and the prevalence of the disease, was implemented in the SumVg package [129]. This method, however, utilizes only the significant SNPs. The same authors later extended the method to allow calculation using the z-statistics from the whole GWAS sample [130]. A disadvantage of this method is that LD is not accounted for, and highly correlated SNPs need to be filtered manually. AVENGEME [131] is a tool that treats causal effect sizes as fixed effects and models the genotypes as random correlated variables. HESS [132], which was presented later, builds upon the same ideas; its estimator can be viewed as a weighted sum of the squares of the projections of effect sizes onto the eigenvectors of the LD matrix at the particular locus, with weights inversely proportional to the corresponding eigenvalues. LD Score Regression (LDSC) has been frequently applied to summary statistics from GWAS and one of its functionalities is to estimate the SNP heritability of a trait [133]. LDER [134] extends LDSC by making full use of the information from the LD matrix, providing more accurate estimates, whereas s-LDSC [135] is an extension suitable for partitioning heritability. SumHer [136] was presented later and offers the same functionalities, with the main difference being that it allows for different so-called “heritability models”. According to these, a SNP with high MAF is expected to contribute more to the total heritability compared to one with low MAF, whereas a SNP in a region of low LD is expected to contribute more compared to one in a region of high LD. In contrast, LDSC estimates are obtained by assuming that all SNPs contribute equally. HEELS [137] is a new tool that uses REML to produce accurate and precise local heritability estimates, and RSS, a multiple regression-based fine-mapping tool (see “Fine-mapping”), can also calculate SNP heritability from the regression model.
VarExp [138] and GxESum [139] are methods for estimating the phenotypic variance explained by genome-wide gene-environment (GxE) interactions. There are also tools like GWIZ [63] and SummaryAUC [140] that calculate the Receiver Operating Characteristic (ROC) curve and the associated Area Under the Curve (AUC). GWIZ generates ROC curves and the AUC using simulations and then estimates heritability using the square of the Somers’ rank correlation D. SummaryAUC, on the other hand, approximates the AUC of a PRS and its variance. HAMSTA [141] is a tool that, among other things, estimates the heritability explained by local ancestry using data from admixture mapping studies. Estimating the effect-size distribution is a related and important task. GENESIS [142] uses LD and a likelihood-based approach to estimate effect-size distributions. It also allows predictions regarding the yield of future GWAS with larger sample sizes. GWEHS [143] calculates the distribution of effect sizes of SNPs, as well as their contribution to trait heritability. Furthermore, it predicts the change in the effect size and in the heritability when new variants are identified. FMR [144] is a method-of-moments approach for calculating the effect-size distribution and GWAS-Causal-Effects-Model [145] is a random effects model for estimating the causal variants and their effect-size distribution. Finally, there are tools that implicate gene expression in heritability analysis: MESC [146], which estimates the proportion of heritability mediated by gene expression levels using linkage disequilibrium (LD) scores and eQTL, and GCSC [147], which uses results from a TWAS (see “TWAS and Colocalization”) in the so-called gene co-regulation score regression to identify gene sets enriched for disease heritability.
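
The regression idea behind LDSC can be illustrated with a minimal sketch. Under the LDSC model, the expected chi-square statistic of SNP j is approximately 1 + Na + N h² l_j / M, where l_j is the LD score of the SNP, N the sample size, M the number of SNPs and a a term capturing confounding, so the slope of a regression of chi-square statistics on LD scores yields an estimate of h². The Python function below is only an illustration under simplifying assumptions: it uses ordinary least squares, whereas the actual LDSC software uses weighted regression and a block jackknife. The function name and the toy numbers are hypothetical.

```python
def ldsc_h2_sketch(chi2, ld_scores, n_samples, n_snps):
    """Unweighted regression of chi-square statistics on LD scores.

    Returns (h2_estimate, intercept). Illustrative only: real LDSC uses
    weighted least squares and a block jackknife for standard errors.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx                      # estimates N * h2 / M
    intercept = mean_c - slope * mean_l    # close to 1 without confounding
    return slope * n_snps / n_samples, intercept


# Toy, noise-free data generated from the model with h2 = 0.4 (hypothetical values).
N, M, H2 = 50_000, 1_000_000, 0.4
ld = [10.0, 50.0, 100.0, 200.0]
chi2 = [1.0 + N * H2 * l / M for l in ld]
h2_hat, intercept = ldsc_h2_sketch(chi2, ld, N, M)
```

With real summary statistics the chi-square values are noisy, and an intercept clearly above 1 would point to confounding rather than polygenicity.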

Gene-based tests

Historically, association tests have been oriented towards single variants, both in traditional association studies and in GWAS. However, this approach has limitations that were noted early on, and a call for a shift towards gene-based tests was made [148]. Gene-based tests aggregate individual variant associations within a gene, providing a more comprehensive assessment of the gene's overall contribution to a trait or disease. This approach helps prioritize genes with multiple associated variants, enhancing the biological relevance of findings, and it has proven particularly useful in the case of low-frequency variants [148]. There are many different methods for combining the association statistics or p-values within a gene, ranging from the simple Fisher’s method or the minimum p-value approach, to more advanced methods like the Burden Test (BT) [149] or quadratic tests like SKAT [150], with variations in power [151]. Nevertheless, there is a consensus that the LD information of nearby variants must be incorporated into these methods in order to control the type I error rate at the desired level [20].
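
The two simplest combination rules mentioned above, Fisher’s method and the minimum p-value approach, can be sketched in a few lines of Python. This is a minimal illustration that assumes the SNP-level p-values are independent, which is exactly the assumption that LD violates and that the dedicated tools correct for; the function names and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under independence."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)   # equals the statistic / 2
    # Closed-form survival function of a chi-square with an even number (2k) of df.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def min_p_sidak(pvalues):
    """Minimum p-value with a Sidak correction for k independent tests."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


snp_p = [0.01, 0.04, 0.20, 0.03]   # hypothetical SNP p-values within one gene
gene_p_fisher = fisher_combined_p(snp_p)
gene_p_minp = min_p_sidak(snp_p)
```

Fisher’s method tends to favour genes with many moderately associated SNPs, whereas the minimum p-value approach favours genes with a single strong signal.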

VEGAS, GATES, fastBAT and GCTA are among the oldest tools available for summary data, and they remain efficient and widely used. SKAT (Sequence Kernel Association Test) is a well-known regression method for testing association between variants and traits while adjusting for covariates. As a score-based variance-component test, it calculates p-values analytically by fitting the null model containing only the covariates [150]. The original SKAT method uses only IPD, but later implementations like metaSKAT or SKAT-O have been extended to handle summary data. GCTA and VEGAS also use the multivariate normal framework, adjusting the estimates for LD using a reference panel [152, 153]. Of note, GCTA also offers methods for conditional analysis (see “Fine mapping”), and the same holds for KGG [154], whereas the new version of VEGAS allows for mixed ethnicity populations. GATES [155], on the other hand, uses an extended Simes procedure that integrates functional information and association evidence to combine p-values, whereas fastBAT [156] offers fast analytical p-value computations. The gene analysis in MAGMA (Multi-marker Analysis of GenoMic Annotation) is based on a multiple linear principal components regression model to account for LD and uses an F-test to compute the overall gene p-value [157]. Its extension, nMAGMA, expands the list of genes that can be annotated by integrating local signals, long-range regulation signals, and tissue-specific gene networks. It also provides tissue-specific risk signals, which are useful for understanding disorders with multi-tissue origins [158]. H-MAGMA [159] and eMAGMA [160] are two other extensions. The former integrates 3D chromatin configuration, whereas the latter leverages significant tissue-specific cis-eQTL information to assign SNPs to putative genes.
EPIC [161] and GAMBIT [162] also utilize functional data for gene-based analysis; the former using cell-type-specific gene expression data obtained from single-cell RNA sequencing and the latter using coding and tissue-specific regulatory annotations. Such methods share several features in common with TWAS methods (see respective section). AgglomerativLD [163] also captures LD between SNPs of nearby genes, which induces correlation of the gene-based test statistics. DOT [164] is one of the few methods that applies a decorrelation-based approach before combining SNP-level statistics or p-values. Tools like GPA [165], oTFisher [166], TS [167] and aSPU [168] implement some type of so-called adaptive tests (AT), that is, they account for possibly varying association patterns across SNPs, whereas some modern tools like MKATR [169], COMBAT [170], MCA [171], OWC [172], FST [173], ACAT [174], HYST [175], GBJ [176] and sumFREGAT [177] perform analysis with multiple statistical methods and test and combine the results. Notably, tools like aSPU [168], snpGeneSets [178], Pascal/PascalX [179, 180], MAGMA, chromMAGMA [181] and FUMA [182], also offer the option of performing gene-set analysis after performing the gene-based analysis (see next section), whereas HSVS-M [168, 183] tests the association of a gene with multiple correlated traits.

Gene Set analysis

Gene set analysis (GSA), or Pathway Analysis, extends the concept of gene-based methods by jointly analyzing groups of functionally related genes and identifying biological pathways enriched with trait-associated genes. By considering the collective impact of multiple genes within a pathway, researchers can obtain a clearer picture of the underlying biological mechanisms influencing the phenotype under investigation. The first applications of such methods borrowed ideas from the microarray data analysis literature, and they have since become widespread in the analysis of GWAS [184]. Any GSA method needs to address three issues: firstly, how to handle SNPs of the same gene; secondly, how to define the appropriate gene-set or pathway; and finally, how to combine the effects from multiple SNPs/genes within the same set/pathway [185]. The choices made by different methods can be very diverse, leading to a wide variety of approaches. For instance, some methods operate with SNP-level statistics (effect sizes, z, or p-values), assigning each SNP to the closest gene (usually within a range of ±20 kb), whereas others take as input a gene-level statistic or simply a gene list obtained by a gene-based method (of course, several tools allow for both a gene-based and a GSA approach). Regarding the choice of set, there is a plethora of databases containing biological pathways (KEGG, PANTHER etc.), or other types of gene-set representation like protein-protein interactions (PPI), ontologies and so on [186]. Finally, regarding the statistical method used to aggregate evidence, there is also a wide range of methods that differ in how they handle the gene set size and gene length, the LD patterns and the presence of overlapping genes within pathways, or that apply different statistical approaches, such as those using the so-called competitive null hypothesis or those using the self-contained one [14, 187]. A tutorial regarding the use of such methods is given in [21].

Among the most easily used and frequently cited are the tools that are available through a webserver. FUMA [182] and iGSE4GWAS [188] are tools specialized in GWAS and use SNP-level statistics as inputs, differing in the subsequent analyses: FUMA uses MAGMA for gene-based testing and allows for ORA and the Kolmogorov-Smirnov test (GSEA), whereas iGSE4GWAS maps the most significant SNP to a gene and then performs an improved GSEA with label permutation to obtain accurate p-values. Tools like Enrichr [189], g:Profiler [190], DAVID [191], WebGestalt [192] and PANTHER [193] are general-purpose enrichment tools that provide functionalities for different types of omics data (Fig. 7). They accept a gene or SNP list as input and provide an Application Programming Interface (API) ensuring interoperability, whereas for the statistical analysis they all use some version of ORA and/or GSEA (WebGestalt also uses Network Topology-based Analysis). A major feature of these tools is that they incorporate a large number of biological and pathway databases, with g:Profiler and Enrichr offering the most complete collection. GSA-SNP2 is one of the first methods to be developed for GWAS and has seen several improvements regarding the calculation of the combined gene score and the execution time, being among the fastest methods [194]. aSPUpath2 [195] and GIGSEA [196] are two methods that integrate expression data (eQTL) in the pathway analysis. The former uses an adaptive test that extends the aSPU methodology based on chi-square, whereas the latter uses a regression-based approach coupled with permutations to calculate accurate p-values. In a similar fashion, deTS [197] and PGCA perform tissue-specific enrichment analysis (TSEA) for detecting tissue-specific genes and for enrichment testing of different forms of query data. Other methods use different definitions of the gene-sets, in some cases utilizing additional information.
For instance, dmGWAS [198] integrates PPI networks and uses a search method to identify subnetworks. Compared with standard pathway methods it offers to the users the flexibility in the definition of a gene set and can utilize local PPI information. GEMB [199] defines the gene-sets using gene weights from model predictions and gene ranks from GWAS, and GENOMICper [200] uses permutations of the identified SNPs by rotation with respect to the genomic locations. GWAB [201] uses network connections to reprioritize candidate genes by integrating the GWAS and network data, whereas GenToS [202] searches for trait-associated variants in existing human GWAS. We also need to mention PAPA [203] which is a flexible tool for pleiotropic pathway analysis. As we already mentioned, aSPU, snpGeneSets, PascalX/PASCAL and MAGMA/chromMAGMA are gene-based methods that also perform GSA, whereas MAGENTA is a tool that performs meta-analysis and subsequently GSA (see “meta-analysis”). Lastly, we need to mention Inferno [204] and Mergeomics [205] which are webservers offering a variety of options, extending typical GSA applications. Inferno integrates a variety of functional genomics sources to identify causal noncoding variants using COLOC, WebGestalt, LDSC and MetaXcan. Mergeomics uses summary statistics of multi-omics association studies (GWAS, EWAS, TWAS, PWAS, etc.) and performs correction for LD, GSEA, meta-analysis and identification of regulators of disease-associated pathways and networks.
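
The over-representation analysis (ORA) used by many of the enrichment tools above reduces, in its simplest form, to an upper-tail hypergeometric test. The Python sketch below shows this calculation under the usual simplifying assumptions (a fixed background gene universe and no correction for gene length, LD or multiple testing); it is not the implementation of any particular tool, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of background genes
    n_in_set:   genes annotated to the pathway
    n_selected: genes in the GWAS-derived list
    n_overlap:  listed genes that belong to the pathway
    """
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, i) * comb(n_universe - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Hypothetical example: 20,000 background genes, a pathway of 150 genes,
# 300 GWAS-implicated genes, 12 of which fall in the pathway.
p_enrich = ora_pvalue(20_000, 150, 300, 12)
```

In practice the test is repeated for every pathway in a database, so the resulting p-values need to be corrected for multiple testing.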

Fig. 7

Enrichment. A Summary view in g:Profiler of the significant SNPs for Type 2 Diabetes Mellitus. B Enrichr results for the same set. C Output of GWAB for Type 2 Diabetes Mellitus SNPs. D Detailed results from g:Profiler

Fine-mapping

While GWAS can identify broad genomic regions associated with a trait, they do not pinpoint the exact causal variant within those regions. Fine-mapping, which works in the opposite direction to the gene-based approaches, aims to narrow down these regions and identify the causal variants, that is, the specific genetic variants responsible for the observed associations between genomic regions and traits of interest. The plethora of statistical methods and study designs makes it difficult to choose an optimal approach. The different approaches that have been proposed to perform fine-mapping can be divided into three broad categories: heuristic methods that select SNPs based on LD patterns, conditional or penalized regression models that perform variable selection, and Bayesian methods that calculate posterior probabilities or Bayes Factors. Theoretical and empirical evidence suggests that Bayesian methods have superior performance [22]. Several factors may influence the performance of fine-mapping approaches, including the true number of causal SNPs in a region and their effect sizes, the local LD structure, the sample size, and the SNP density [22, 206]. Functional annotations are also of great importance, leading to the so-called functionally informed fine-mapping (FIFM) methods [206]. The hypothesis of a single causal variant is also very restrictive, and several methods have been developed to allow multiple causal variants in a region as well as to incorporate additional layers of functional annotations, like eQTL [207]. Moreover, methods for fine-mapping of multiple datasets have been proposed, either exploiting different LD patterns across ethnic groups or borrowing information between different traits [207].

As we already noted, Bayesian methods seem to have superior performance [22], and thus it is no surprise that most of the currently available methods operate in a Bayesian framework, calculating Posterior Inclusion Probabilities (PIP) and/or Bayes Factors (BFs) in various settings: PAINTOR [208], DAP [209], fgwas [210], FINEMAP [211], flashfm [212], FINMOM [213], CARMA [214] and CAVIAR/CAVIARBF [215]. MsCAVIAR [216] is an extension of CAVIAR that leverages information from multiple studies, which is useful in trans-ethnic fine-mapping. Similarly, XMAP [217] performs cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. BEATRICE [218] is a unique method that combines a hierarchical Bayesian model with a deep learning-based inference procedure, whereas RIVIERA-beta [219] performs Bayesian fine-mapping using Epigenomic Reference Annotation. On a different level, PolyFun/PolyLoc [220] do not perform fine-mapping per se but are used for estimating the prior causal probabilities of SNPs, which can then be used by other Bayesian fine-mapping methods. SusieR [221], BVS-PICA [222] and JAM [223] also operate in a Bayesian regression framework, performing variable selection and penalized regression. Other regression-based methods, like SOJO [224] and ANNORE [225], work in a frequentist framework and perform lasso-type and differential shrinkage via random effects, respectively, whereas GSR utilizes a gene score regression approach [226] and RSS performs multiple regression utilizing the so-called summary statistics likelihood [227]. AHIUT [228] performs an intersection–union test based on a joint/conditional regression model with all the SNPs in a region.
Lastly, we need to mention PICS2 [229], which performs probabilistic identification of causal SNPs and is the only one of these methods that is available as a web server, and echocolatoR [230], which requires minimal input from users and integrates a suite of fine-mapping tools to identify consensus variants, test enrichment and visualize the results.
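
To make the notions of Bayes Factors and PIPs concrete, the following Python sketch computes PIPs for a locus under the restrictive single-causal-variant assumption discussed above, using Wakefield's approximate Bayes factor and equal prior probabilities for all SNPs. It is a simplified illustration and not the algorithm of any of the tools listed; the prior standard deviation of 0.2, the function names and the example values are assumptions made here.

```python
import math


def log_abf(z, se, prior_sd):
    """Log of Wakefield's approximate Bayes factor in favour of association."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def single_causal_pips(z_scores, std_errors, prior_sd=0.2):
    """PIPs assuming exactly one causal SNP and equal prior probabilities."""
    labf = [log_abf(z, se, prior_sd) for z, se in zip(z_scores, std_errors)]
    top = max(labf)                              # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in labf]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pips, coverage=0.95):
    """Smallest set of SNP indices whose PIPs sum to at least `coverage`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical locus with four SNPs.
pips = single_causal_pips([5.8, 5.5, 2.1, 0.4], [0.05, 0.05, 0.05, 0.05])
cs95 = credible_set(pips)
```

Methods that allow multiple causal variants replace the simple normalization step with a search over causal configurations that takes the LD matrix into account.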

Analysis of multiple traits

In this section we analyze methods developed for handling multiple traits. Depending on the type of data and the purpose of the analysis the methods can be divided into pleiotropy methods, methods that calculate the genetic correlation, methods for mendelian randomization, transcriptome-wide association and colocalization methods.

Pleiotropy

Pleiotropy is the phenomenon in which a single variant influences several traits [231]. Methods for detecting pleiotropy are of great importance in genetic research and several have been developed in recent years. A major goal of such methods is to increase the statistical power over single trait methods. Imagine, for instance, a variant that produces a near-significant effect when analyzed separately for two or three traits. A method that can combine these estimates may produce significant results. Another application of a joint analysis would be to identify variants that influence both traits, or variants that influence only one of them. When all the relevant variants are considered, one can also estimate the kind of relationship between the traits (see “genetic correlation”). A review of the statistical methods to detect pleiotropy in complex traits can be found in [25]. Usually, the methods that allow for multiple trait analysis are oriented toward quantitative traits like BMI, SBP, DBP and so on, which are traditionally measured on a single cohort, resulting in cross-trait correlation that needs to be taken into account in the analysis. However, there are also methods for performing the same analysis with summary estimates derived from different cohorts, as well as methods that allow for binary traits with the case–control design, using overlapped or non-overlapped controls.

All these methods base their inference on the assumption that the z-statistics follow a multivariate normal distribution (MVN) and perform different types of tests and/or different procedures to estimate or approximate the correlation structure. ACA [232], one of the first methods, estimates the trait covariance from a subset of the phenotypic data or from published studies, p_ACT [233] integrates the MVN using the trait correlation, PAT [234] uses a likelihood-ratio test, and PLEI [235] uses the union-intersection testing method; in addition to the likelihood ratio test, PLEI also applies generalized estimating equations under the working independence model, and it can be applied for both marginal analysis and conditional analysis. USAT [236] uses a score-based test, JaSPU [237] uses an adaptive test which is robust to violations of the MVN assumptions and MTAR [238] uses a Principal Components (PC)-based test. BMASS [239], on the other hand, is a Bayesian multivariate method, whereas TWT [240], MTAFS [241] and EBMMT [242], which are among the newer tools, perform a Cauchy Combined Test (CCT) to handle the correlation structure and obtain accurate p-values. SHAHER [243] uses a linear combination of traits, chosen by maximizing the proportion of its genetic variance explained by the shared variants, and allows both shared and unshared variants to be effectively analyzed, and HIPO [244] performs heritability-informed power optimization for conducting multi-trait association analysis. HOPS [245] computes a horizontal pleiotropy score by removing correlations between traits caused by vertical pleiotropy and normalizing effect sizes across all traits, and PDR [246] performs a pleiotropic decomposition regression to identify shared components and their underlying genetic variants.
We also need to mention methods like MTAG [247] and PLEIO [248], which use LDSC and, apart from handling sample overlap, also allow data from multiple studies, something that can be considered meta-analysis. Methods like MSKAT [249], multiSKAT [250], MGAS [251], MAIUP [252] and MTAR (multi-trait analysis of rare variants) [253] are gene-based methods specialized for multiple traits. Finally, methods like iMAP [254] and graphGPA2 [255] use graphical models and are capable of analyzing a large number of traits.
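
The power gain from a joint analysis can be illustrated with two of the simplest tests that follow from the MVN assumption. The Python sketch below implements an omnibus chi-square test for two traits with a known correlation between their z-statistics, and an equal-weight Cauchy combination of p-values. Both are generic textbook forms rather than the exact procedures of the tools cited above; the function names and the example values are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def omnibus_two_trait_p(z1, z2, trait_corr):
    """Chi-square (2 df) test of z' R^-1 z for two traits with z-score correlation R."""
    q = (z1 * z1 - 2.0 * trait_corr * z1 * z2 + z2 * z2) / (1.0 - trait_corr ** 2)
    return math.exp(-q / 2.0)          # survival function of a chi-square with 2 df


def cauchy_combined_p(pvalues):
    """Equal-weight Cauchy combination of p-values."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi


# A variant that is near-significant for each of two traits (hypothetical values).
z1, z2, corr = 5.0, 5.0, 0.2
p_single = two_sided_p(z1)
p_joint = omnibus_two_trait_p(z1, z2, corr)
p_cct = cauchy_combined_p([two_sided_p(z1), two_sided_p(z2)])
```

In this toy example neither single-trait p-value reaches the conventional genome-wide threshold of 5e-8, whereas the omnibus test does; the Cauchy combination does not require the correlation to be specified, which is why it is attractive when the correlation structure is hard to estimate.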

On the other hand, there are several methods that assume independence of the studied samples. Most of them are designed for larger analyses of many traits from multiple studies, for instance PolarMorphism [256], JASS [257], gwas-pw [258], FactorGo [259], sumDAG [260], combGWAS [261] and the GCPBayes pipeline [262]. The GCPBayes pipeline uses the functionality of GCPBayes to perform cross-phenotype gene-set analysis between two traits. gwas-pw is used for the joint analysis of two GWAS in order to identify variants influencing both traits. PolarMorphism is based on a transform from Cartesian to polar coordinates and reports a per-variant degree of 'sharedness' across traits, whereas FactorGo provides a scalable variational factor analysis model that is computationally efficient for a large number of traits. JASS provides interactive exploration and visualization of the results of the comparison of many traits through a web interface (Fig. 8 A-C), sumDAG goes one step further and constructs phenotype networks by using a Gaussian linear model and a directed acyclic graph, and combGWAS identifies susceptibility variants for comorbid disorders and calculates genetic correlations. EPS [263] and GPA [264] differ in that they integrate pleiotropy and functional annotation from eQTL.

Fig. 8

Analysis of multiple traits. A JASS analysis for Type 2 Diabetes Mellitus (T2DM), Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP), indicating the pairwise genetic correlations between the traits. B Manhattan Plot from JASS for the combined analysis of the three traits. C Pairwise analysis of the SNPs identified as significant in the univariate analysis and in the combined analysis. D Two-sample Mendelian Randomization analysis for the association of SBP and T2DM obtained by MR-BASE

Genetic correlation

Genetic correlation is related to pleiotropy and describes the relationship between two traits, that is, the extent to which the genetic variants influencing one trait overlap with the genetic variants associated with the other. It can thus quantify the overall genetic similarity and provide insights into the polygenic genetic architecture of complex traits [23]. As we already saw, analyzing multiple traits simultaneously may increase power in the case of horizontal pleiotropy; an additional potential application is to use the estimated correlation in order to establish causality between traits in the case of vertical pleiotropy (see also next sections). Since heritability is the proportion of the phenotypic variance explained by genotypic variation, it is no surprise that genetic correlation (or the genetic covariance) is related to the traits’ heritabilities. Thus, several of the methods for estimating heritability discussed earlier, like HESS and SumHer, can also calculate the correlation between traits. The most commonly used method for calculating genetic correlation, however, is LDSC (LD Score Regression). The method was originally developed for distinguishing polygenicity from bias by examining the relationship between test statistics and LD score, but it is also used for estimating heritability and genetic correlation [133]. LDSC is also available through the LD Hub server. PCGC-s [265] is an adaptation of stratified LDSC for case–control studies and can also estimate genetic heritability, genetic correlation, and functional enrichment. Another popular tool is GNOVA [266], which calculates annotation-stratified covariance using the method of moments and allows for sample overlap. Its extension, SUPERGNOVA [267], identifies global and local genetic correlations that could provide new insights into the shared genetic basis of many phenotypes. Local correlations, among other quantities, can also be computed using LAVA [268].
HDL [269] is a likelihood-based method which is reported to produce more precise estimates. However, a recent comparison found that LDSC and GNOVA give similar results and are more robust to LD and sample overlap than HDL. In that comparison, HDL provided biased estimates of the genetic covariance in most cases and could not distinguish genetic from non-genetic correlation. Moreover, HDL restricts the users to the built-in reference panel, and it performs poorly when the number of SNPs shared between the reference panel and the GWAS is small [24]. Other tools provide somewhat different types of analyses. For instance, Popcorn [270] estimates transethnic genetic correlation, GECKO [271] estimates both genetic and environmental covariances, PhenoSD [272] uses LDSC for estimating phenotypic correlations and then performs correction for multiple testing using the spectral decomposition of matrices, whereas LPM [273] is a latent probit model, scalable to hundreds of annotations and phenotypes, that integrates functional annotations. ccGWAS [274] is a tool for comparing two different disorders with small genetic correlation, providing a case-case association test, and RHOGE [275] estimates the genetic correlation between two traits as a function of predicted gene expression effect. LOGOdetect [276] uses scan statistics with an LD score-weighted inner product of local z-scores to identify small segments that harbor local genetic correlation between two traits. DONUTS [277] is a unique method since it operates on summary statistics from families.

Mendelian randomization

Mendelian Randomization (MR) is a method suggested in the pre-GWAS era to investigate causal relationships between two traits, usually a phenotype and a disease [278], using genotype–trait associations to make inferences about environmentally modifiable causes of the traits. In technical terms, MR uses genetic variants as instrumental variables [279] to mimic the random assignment of exposures in a randomized controlled trial, in the same way that Mendel's laws of inheritance dictate the random assortment of alleles during gamete formation. By utilizing the natural randomization of genetic inheritance, MR aims to minimize the biases introduced by confounding factors that usually affect observational studies investigating the association of two traits. Usually, we are interested in a disease and some other intermediate phenotype, or another disease. For instance, the MR approach may involve the relationship between hypertension and BMI, or between hypertension and diabetes. Traditionally, MR was performed with one sample (1SMR) using a single variant (these are usually referred to as IPD methods), and subsequently multivariate methods for MR meta-analysis were developed [280]. With the emergence of GWAS, these methods evolved into the most commonly used two-sample MR (2SMR) methods, which utilize summary data estimates from several variants regarding the genotype–phenotype and genotype-disease associations from different samples [26, 281]. To establish the connection with the previous sections, MR seeks to analyze correlated traits [282] and to provide evidence for causation, in other words to distinguish vertical from horizontal pleiotropy.

Several standard methods for MR in GWAS with summary data have been made available in recent years: the inverse-variance weighted method (IVW), the various types of median estimators (simple or weighted) and the MR-Egger regression approach. IVW gives consistent estimates only if all the genetic variants in the analysis are valid instruments. The median estimator is consistent even when up to 50% of the information comes from invalid instrumental variables, whereas MR-Egger performs equally well but provides somewhat less precise estimates [283]. These methods are readily available in standard packages like TwoSampleMR [284] and MR [285]. The functionalities of TwoSampleMR are also offered, at least partially, through the webserver of MRBASE [284], which is the only MR tool available as a webserver (see Fig. 8, D). BWMR [286] is a tool that performs MR in a Bayesian framework. Besides the important issue of weak instruments, most modern methods also aim to perform the MR analysis while accounting or correcting for horizontal pleiotropy. For instance, pIVW [287] is an extension of the IVW that accounts simultaneously for weak instruments and balanced horizontal pleiotropy, and MRmix [288] uses a mixture approach allowing a fraction of the instruments to have pleiotropic effects on the outcome. Similarly, MRcML [289], MR-LDP [290], MR-Corr2 [291] and MR-PRESSO [292] provide functionalities to account for horizontal pleiotropy, whereas IMRP [293] takes a different approach and searches iteratively for horizontally pleiotropic variants and causal effects. MR-APSS [294] differs in that it performs MR accounting for both pleiotropy and sample structure, which seems to be another important confounder (and includes population stratification, cryptic relatedness, and sample overlap); MRlap [295] considers both weak instrument bias and winner's curse, accounting for sample overlap.
MR.CUE [296] and TS_LMM [297] offer additional functionality for handling variability of the estimates. LCV [298] is a method that estimates causal associations between traits while avoiding confounding by genetic correlation, whereas OMR [299] uses information from all GWAS SNPs for causal inference and JAM-MR [300] performs variable selection and causal effect estimation in MR. CS [301], BiDirectCausal [302], MRCI [303] and LHC-MR [304] constitute another important class of methods since they can identify bidirectional causal effects. Another important extension is offered by methods like MR2 [305], MV-MR [306], MRBEE [307], MVMR-cML [308] and adOMICs [309], which extend the MR framework to the multivariate setting, allowing more than one exposure or outcome, as well as MR-BMA [310], which goes one step further by performing multivariate MR in a Bayesian framework. Finally, other methods like hJAM [311], MR.RAPS [312] and MRPEA [313] offer more advanced options. hJAM unifies the framework of MR and TWAS and can be applied to correlated instruments and multiple intermediates, MR.RAPS uses a three-sample genome-wide design with many independent genetic instruments across the genome to handle many weak genetic instruments and pleiotropy, whereas MRPEA uses a pathway association MR analysis approach with data on environmental exposures.
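
The difference between the IVW estimator and MR-Egger regression can be seen in a short Python sketch that works directly on summary statistics. The IVW estimate is a weighted regression of the SNP-outcome effects on the SNP-exposure effects through the origin, whereas MR-Egger adds an intercept that absorbs directional horizontal pleiotropy. The code is a bare-bones illustration (fixed-effect weights, no standard error for MR-Egger, no heterogeneity statistics) and does not reproduce the API of TwoSampleMR or any other package; the function names and the toy data are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW causal estimate and its standard error."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def mr_egger(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on beta_x with an intercept.

    Returns (slope, intercept); the intercept estimates the average
    directional pleiotropic effect.
    """
    # Orient every variant so that its effect on the exposure is positive.
    pairs = [(-bx, -by) if bx < 0 else (bx, by) for bx, by in zip(beta_x, beta_y)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    xbar = sum(wi * x for wi, (x, _) in zip(w, pairs)) / sw
    ybar = sum(wi * y for wi, (_, y) in zip(w, pairs)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, (x, _) in zip(w, pairs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, (x, y) in zip(w, pairs))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


# Toy data: true causal effect 0.5 plus a constant pleiotropic effect of 0.05.
bx = [0.10, 0.15, 0.20, 0.30]
by = [0.05 + 0.5 * b for b in bx]
se = [0.02, 0.02, 0.02, 0.02]
ivw_beta, ivw_se = ivw_estimate(bx, by, se)
egger_slope, egger_intercept = mr_egger(bx, by, se)
```

In this noise-free toy example the IVW estimate is biased upwards by the pleiotropic effect, while MR-Egger recovers the causal effect in the slope and the pleiotropic effect in the intercept.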

Colocalization and TWAS

As we already described, the MR approach involves the combination of two types of data, a genotype-disease association and a genotype–phenotype association. If the phenotype is gene expression, that is, the result of an eQTL study, then we have two distinct but fundamentally related methods, the transcriptome-wide association study (TWAS) and the colocalization approach (Fig. 9). TWAS is based on the idea that genetic variants can influence gene expression, which subsequently can affect complex traits or diseases [27]. Thus, the approach uses information from eQTL to identify associations between predicted gene expression levels and complex traits/diseases [314]. Even though there are several different methods, the resemblance to MR is obvious. In fact, several methods use the term MR: SMR, which uses a single variant [315], GSMR, which uses multiple variants [310], and PMR [316], which can account for correlated instruments and horizontal pleiotropy and can accommodate both single traits and multiple correlated outcomes. Likewise, the authors of TScML [317], which uses two-stage constrained maximum likelihood, an extension of 2SLS, explicitly state that it can be used for both MR and TWAS analyses. FUSION and S-PrediXcan are the oldest and most widely known methods. FUSION is the current implementation of the first TWAS method [318], whereas S-PrediXcan [319] is the summary-data version of PrediXcan. Xu et al. [320] noted that PrediXcan and TWAS can be viewed as a special case of general association testing with multiple SNPs in a GLM and proposed the so-called sum of powered score (SPU) test implemented in aSPU-TWAS [320]. A subsequent evaluation has shown that the original TWAS statistic is equivalent to an LD-aware version of standard MR [321]. iFunMed [322] and sMIST [323] formulate the problem within the framework of mediator analysis, and similarly PTWAS [324] applies principles from instrumental variables analysis.
Comm-S* [325] uses a variational Bayesian EM algorithm and a likelihood ratio test to assess expression-trait association. Its extension Tiss-Comm [326] leverages the co-regulation of genetic variations across different tissues explicitly via a unified probabilistic model and also detects the tissue-specific role of candidate target genes in complex traits. Similar multi-tissue approaches are followed by fQTL [327], sCCA [328] and UTMOST [329]. Primo [330] and OPERA [331] further extend the integration by accepting different types of xQTL data (eQTL, pQTL, mQTL etc.) to allow estimation under different conditions, whereas SUMMIT [332] uses a large eQTL summary-level dataset, penalized regression and the Cauchy Combination Test, and HMAT [333] aggregates TWAS association tests obtained across multiple gene expression prediction models using the harmonic mean P-value combination (HMP). BGW [334] and ARCHIE [335] are two methods that utilize trans-regulated eQTLs. Other tools use a combination of methods, like TIGAR [336], which combines DPR and PrediXcan, whereas others, like JEPEGMIX2‐P [337] or FOCUS [338], perform TWAS using pathway information, or use LD to perform fine-mapping over the gene–trait association signals obtained from TWAS, respectively. Even though the various methods discussed here have different modeling assumptions and many were initially developed to answer different biological questions, a recent technical review of the TWAS methods showed that all can be viewed as versions of the two-sample MR analysis [339]. Indeed, several recent tools like MRLocus [340], TWMR [341], and Mr.MtRobin [342] make explicit use of the MR methodology and jargon in order to perform a sophisticated TWAS. MRLocus first performs a colocalization step for each nearly-LD-independent eQTL, and then performs an MR analysis step across eQTLs. TWMR performs a multi-gene multi-instrument MR approach to identify genes whose expression influences the phenotype. 
Finally, Mr.MtRobin uses multi-tissue eQTL and a reverse regression random slope mixed model to infer whether a gene is associated with a complex trait. As we have already noted, webTWAS, apart from the database, also offers a webserver for accessing S-PrediXcan, SMR and UTMOST with user-supplied datasets.
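
To make the idea of a summary-data TWAS test concrete, the following minimal Python sketch computes a gene-level z-score as a weighted combination of GWAS z-scores, normalised by the variance implied by the LD matrix. This is the general form of the statistic used by summary-based TWAS approaches, not the exact implementation of any tool reviewed here. The function name twas_zscore and its arguments are hypothetical, and the sketch assumes that the expression weights, the GWAS z-scores and the LD matrix refer to the same SNPs, in the same order and with the same effect alleles.

```python
import numpy as np


def twas_zscore(weights, gwas_z, ld):
    """Gene-level TWAS z-score from summary data (illustrative sketch).

    weights : expression prediction weights, one per SNP (assumed to be on
              the standardized-genotype scale)
    gwas_z  : GWAS z-scores for the same SNPs, aligned to the same alleles
    ld      : SNP-by-SNP LD (correlation) matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    r = np.asarray(ld, dtype=float)
    if r.shape != (w.size, w.size) or z.size != w.size:
        raise ValueError("weights, gwas_z and ld must describe the same SNPs")
    # Variance of the weighted sum of z-scores under the null, given LD.
    variance = w @ r @ w
    if variance <= 0:
        raise ValueError("weights give zero predicted-expression variance")
    return float(w @ z / np.sqrt(variance))
```

With a single SNP of weight 1 the statistic reduces to the GWAS z-score of that SNP, which illustrates the resemblance to single-instrument MR noted above.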

Fig. 9

Incorporation of eQTL data. A Overview of the gene-expression patterns in T2DM obtained by PCGA. B Top associated tissues and cells for T2DM (PCGA). C An example of colocalization output produced by LocusFocus. D TSEA-DB view of the analysis of significant SNPs involved in T2DM. E Heat-map for the tissues involved in T2DM significant hits obtained by COLOC. F Plots of the genome-wide significant hits obtained from GWAS and eQTL (COLOC). G Heat-map for the tissues involved in T2DM (TSEA-DB). H Example of fine-mapping regarding a SNP indicated in T2DM obtained by PICS2

Another method that also uses GWAS results along with eQTL data is colocalization. Colocalization approaches are used to assess whether two different traits or diseases share a common causal genetic variant or set of variants at a specific genomic locus [13]. Colocalization analysis identifies genetic variants that show significant association in both GWAS and eQTL studies. However, unlike TWAS, it does not perform gene expression prediction and gene-trait association tests, but it focuses on the colocalized SNPs [28]. TWAS and colocalization are related but not identical approaches: it has been shown that they may give different results under different conditions (for instance in the case of horizontal pleiotropy), and thus it has been suggested that they should be used in a complementary manner [28, 343]. COLOC was one of the first methods for colocalization and has seen several improvements [344, 345] (see also Fig. 9). The latest version uses SuSiE and allows evidence for association at multiple causal variants to be evaluated simultaneously, while at the same time separating the statistical support for each variant conditional on the causal signal being considered. MOLOC [346] is a multiple-trait version of COLOC, operating in a Bayesian framework that integrates GWAS summary data with multiple xQTL data to identify regulatory effects, HyPrColoc [347] is a deterministic Bayesian method that detects colocalization across large numbers of traits, and SS2 [348] operates across any number of gene-tissue pairs allowing for sample overlap. LLR [349] works for colocalizing genetic risk variants in multiple GWAS and phenotypes, whereas POEMcoloc [350] is an approximation to the COLOC method that can be applied when limited data are available. SparkINFERNO [351], PwCoCo [352] and ColocQuiaL [353] are pipelines offering additional functionalities, all using COLOC. 
eCAVIAR is another popular method [354] that uses a probabilistic model that accounts for more than one causal variant at a given locus. MSG [355] increases the power using a spliced gene approach and SharePro [356] integrates LD modeling and colocalization assessment to account for multiple causal variants in colocalization analysis. PESCA [357] uses estimates of LD that are ancestry-matched, in order to infer proportions of population-specific and shared causal variants in two populations. These estimates are then used as priors in an empirical Bayes framework for colocalization and to test for enrichment of these causal variants in loci of interest. Lastly, we have to mention the methods that operate as webservers offering ease of use. Sherlock [358], which is also one of the oldest methods, uses a database of eQTL associations from different tissues to identify genetic signatures that match those for specific genes. Unlike other methods, it incorporates information from both cis- and trans-eQTL SNPs. LocusFocus [359] is a web-based colocalization tool that tests colocalization using the Simple Sum method to identify relevant genes and tissues for a particular GWAS locus in the presence of high linkage disequilibrium and/or allelic heterogeneity. Regarding the analysis of eQTL data, ezQTL [360] is a webserver performing various tasks like data quality control for variants matched between different datasets, LD visualization, and colocalization analysis using eCAVIAR and HyPrColoc, whereas BAGEA [361] uses a variational Bayes framework to model cis-eQTLs using directed and undirected genomic annotations.
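
As an illustration of how a Bayesian colocalization test can be computed from summary statistics alone, the sketch below follows the general logic of a single-causal-variant analysis: an approximate Bayes factor is computed per SNP for each trait, and these are combined into posterior probabilities for five hypotheses (H0: no association; H1 and H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). It is a simplified sketch and not the code of COLOC or of any other tool discussed above. The function names, the prior standard deviation of 0.15 and the prior probabilities p1, p2 and p12 are assumptions chosen for illustration, and the sketch requires at least two SNPs, aligned between the two datasets.

```python
import numpy as np


def log_abf(z, se, prior_sd=0.15):
    """Per-SNP log approximate Bayes factor from a z-score and standard error.

    prior_sd is the assumed prior standard deviation of the true effect.
    """
    z = np.asarray(z, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    r = prior_sd ** 2 / (prior_sd ** 2 + v)  # shrinkage factor
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)


def _logsum(x):
    # log(sum(exp(x))) computed stably.
    return float(np.logaddexp.reduce(x))


def _logdiff(a, b):
    # log(exp(a) - exp(b)) for a > b, computed stably.
    return a + np.log1p(-np.exp(b - a))


def coloc_posteriors(z1, se1, z2, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4, assuming one causal SNP per trait."""
    l1 = log_abf(z1, se1)
    l2 = log_abf(z2, se2)
    s1, s2, s12 = _logsum(l1), _logsum(l2), _logsum(l1 + l2)
    log_h = np.array([
        0.0,                                               # H0
        np.log(p1) + s1,                                   # H1
        np.log(p2) + s2,                                   # H2
        np.log(p1) + np.log(p2) + _logdiff(s1 + s2, s12),  # H3
        np.log(p12) + s12,                                 # H4
    ])
    post = np.exp(log_h - np.logaddexp.reduce(log_h))
    return dict(zip(["H0", "H1", "H2", "H3", "H4"], (float(p) for p in post)))
```

When both traits show a strong signal at the same SNP the posterior mass concentrates on H4, whereas strong signals at different SNPs favour H3. Methods that allow multiple causal variants relax the single-variant assumption made here.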

Conclusions

Summary statistics offer protection of privacy over IPD, as well as significant advantages in computational cost, which does not scale with the number of individuals in the study [11]. Naturally, in the post-GWAS era it is expected that a large number of methods would be developed to perform analysis using the summary results of GWAS [11]. These methods, integrating data from multiple sources such as LD, gene expression and biological pathways, aim to provide biological insight and improve our understanding of the functional role of identified variants [12,13,14,15]. We should emphasize that GWAS summary statistics are not mere replacements for IPD. Of course, some types of analysis can be applied using either summary data or IPD, like meta-analysis, heritability analysis, fine-mapping and so on. In such cases the summary data methods greatly enhance the applicability and the ease of use, overcoming the limitations of IPD mentioned earlier. However, methods for other types of analysis, and particularly those that use multiple datasets, like TWAS, colocalization or Mendelian Randomization, were designed with summary data and the integration of data from multiple sources in mind. This is exactly the spirit of the so-called post-GWAS analysis that brought bioinformatics into a central role in genetics research [11]. Most of the “success stories” in GWAS during recent years can be attributed to the development and the application of such methods in identifying new variants, in functional annotation, causal discovery or even in medical applications [2, 12, 362].

In this work we conducted, for the first time in the literature, a systematic review in order to identify software tools and databases dedicated to GWAS summary data analysis. We categorized the tools and databases by their functionality, in categories related to data, single-trait analysis, and multiple-trait analysis, along with their sub-categories, which we analyzed and reviewed. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a wide range of tools, each with unique strengths and limitations. We provided descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discussed the overall usability and applicability of each tool for different research scenarios. We identified families of related tools for performing different or complementary tasks, for instance the CAVIAR tools (CAVIAR, CAVIARBF, msCAVIAR, eCAVIAR), the EpiXcan tools (S-MultiXcan, S-PrediXcan), the LDAK programs (SumHer, GBAT), the MAGMA tools (nMAGMA, H-MAGMA, eMAGMA) and so on. We need to emphasize that in many cases a tool originally developed for IPD is later adapted to handle summary data, whereas in other cases a tool is succeeded by a newer version with added capabilities. For instance, the original PrediXcan method uses only IPD, but it is now considered deprecated. S-PrediXcan and S-MultiXcan are later versions that are designed to be used with summary data. The same holds for SKAT. The original method uses only IPD, but later implementations like metaSKAT or SKAT-O allow for summary data as well. At the same time, it is important to note that several tools combine different functionalities. 
For instance, there are tools that can perform meta-analysis and GSA (MAGENTA), gene-based methods that also offer functionalities for conditional analysis (GCTA), methods for analysis of multiple traits with gene-based tests (multiSKAT, MSKAT), methods that can be seen either as methods for multiple traits or as meta-analysis (PLEIO, PASCAL), and methods that perform both GSA and gene-based tests (aSPU, snpGeneSets, PascalX, PASCAL, MAGMA, FUMA). Of course, there are several single-purpose methods that use and combine different statistical tests or different methods (OWC, MCA, TWT, EBMMT, COMBAT, sumFREGAT, MKATR), and we should not forget methods like LDSC, with its variants, which was originally developed for distinguishing polygenicity from bias, but is also used for estimating heritability and genetic correlation and has been integrated in many other tools and pipelines.

As we already mentioned, the tools and databases included in the study were those with a functioning URL. In many publications identified through the literature search the URL was not working. In some situations, we recovered a valid link by performing Google searches, or by identifying the authors’ websites, but in many cases, this was not enough. Similarly, several tools deposited in CRAN had been removed or archived. This kind of problem has been known in the scientific community for years [363,364,365]. However, there is more to it. Even for the tools included in the review we could not verify without proper testing that they all work seamlessly, especially for the older ones [366]. Operating systems evolve, programming languages change, and with these the dependencies of each software also change. Even though there are available best practices [367], it is not always realistic to expect complex software to work forever without maintenance. Even for some of the tools having valid URLs, for instance deposited on GitHub, or on personal web pages, we found statements by the authors indicating that the software is no longer maintained and that it is not easy to provide technical support. It is clear that more advanced solutions should be pursued. For instance, among the tools we identified the majority are written in R and Python, but only a handful are available as a webserver: ten of the tools for GSA, three tools for colocalization, two tools for meta-analysis, and one for pleiotropy analysis, MR and fine-mapping. Of course, several of the secondary databases we identified also provide the functionality of performing the analysis using data provided by the user (webTWAS, TSEA-DB, PCGA), but even counting these the proportion of web-tools is rather low (< 10%). 
Web servers and web services have become highly relevant to the field of bioinformatics during the last 20 years [368], so an increasing number of relevant webservers can be expected in the near future, as tools are available to facilitate the incorporation of existing applications [369,370,371,372]. On the other hand, some tools may be too computationally demanding, so other solutions must be found. Container-based applications [373, 374] such as Docker can simplify maintenance procedures and add to the reproducibility of research [375]. Community efforts such as udocker [376] may promote usability of complex software tools by non-experts in multi-user environments.

As data accumulate, analyses on an even larger scale become unavoidable. Traditionally the large-scale analysis of many gene-disease associations is modeled by the so-called diseasome [377, 378] using graph theoretic methods [379, 380]. The gene-disease network is composed of pairwise associations obtained from public databases and is a bipartite network [379] consisting of two separate sets of nodes and the interactions between nodes belonging to the different sets. The projection onto one or the other of the sets leads to the gene–gene or the disease-disease projected networks that inform us about the associations between members of the same set (for instance, two diseases are connected if they share common genes, and so on). Such methods have been available for years, but they treat the associations as fixed inputs to the graph. As data accumulate and even more complex statistical methods are developed that allow cross-trait comparisons and combined analyses of multiple traits, along with the integration of different types of data such as xQTL, it is tempting to speculate that a fusion of these two traditions may come, in which the statistical formalism of the tools presented in this review will merge with the graph theoretic approaches developed in the systems biology literature. For instance, we may see network approaches leading to causal analyses (similar to MR) that consider simultaneously all the diseases and traits for which we have GWAS summary data, or similar approaches that integrate xQTL data of various types, different tissues and so on.
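
The projection of a bipartite gene-disease network described above can be sketched in a few lines of Python. The example is purely illustrative: the function name project_diseases is hypothetical, and the input is assumed to be a list of (gene, disease) pairs taken as fixed, as in the classical diseasome approach. Two diseases are linked when they share at least one gene, and the edge weight is the number of shared genes.

```python
from collections import defaultdict
from itertools import combinations


def project_diseases(gene_disease_pairs):
    """Project a bipartite gene-disease network onto the disease set.

    Returns a dict mapping (disease_a, disease_b), in sorted order, to the
    number of genes the two diseases share. Pairs with no shared gene are
    omitted.
    """
    genes_by_disease = defaultdict(set)
    for gene, disease in gene_disease_pairs:
        genes_by_disease[disease].add(gene)

    edges = {}
    for d1, d2 in combinations(sorted(genes_by_disease), 2):
        shared = genes_by_disease[d1] & genes_by_disease[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges


# Placeholder identifiers, not real associations.
pairs = [
    ("gene_1", "disease_A"),
    ("gene_1", "disease_B"),
    ("gene_2", "disease_B"),
    ("gene_2", "disease_C"),
    ("gene_3", "disease_C"),
]
disease_network = project_diseases(pairs)
```

The gene–gene projection is obtained in the same way by swapping the roles of the two node sets.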

We hope that this comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases, as well as for methodologists who develop and test relevant methods. We provided a detailed overview of the available tools and databases, and we hope that this work will facilitate informed tool selection and maximize the effectiveness of using GWAS summary statistics.

Availability of data and materials

The data collected in this study are available in Supplementary Material. Supplementary Table 1 contains the list with the identified tools along with the URLs, the references and the descriptions. Supplementary Table 2 contains the list with the additional datasets identified in various consortia.

References

  1. Uffelmann E, Huang QQ, Munung NS, de Vries J, Okada Y, Martin AR, et al. Genome-wide association studies. Nature Reviews Methods Primers. 2021;1(1):59.

  2. Abdellaoui A, Yengo L, Verweij KJH, Visscher PM. 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet. 2023;110(2):179–94.

  3. Ziegler A, Konig IR, Thompson JR. Biostatistical aspects of genome-wide association studies. Biom J. 2008;50(1):8–28.

  4. Alsheikh AJ, Wollenhaupt S, King EA, Reeb J, Ghosh S, Stolzenburg LR, et al. The landscape of GWAS validation; systematic review identifying 309 validated non-coding variants across 130 human diseases. BMC Med Genomics. 2022;15(1):74.

  5. Moore JH, Asselbergs FW, Williams SM. Bioinformatics challenges for genome-wide association studies. Bioinformatics. 2010;26(4):445–55.

  6. Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008;4(8): e1000167.

  7. Craig DW, Goor RM, Wang Z, Paschall J, Ostell J, Feolo M, et al. Assessing and managing risk when sharing aggregate genetic variant data. Nat Rev Genet. 2011;12(10):730–6.

  8. Cai R, Hao Z, Winslett M, Xiao X, Yang Y, Zhang Z, et al. Deterministic identification of specific individuals from GWAS results. Bioinformatics. 2015;31(11):1701–7.

  9. Thelwall M, Munafo M, Mas-Bleda A, Stuart E, Makita M, Weigert V, et al. Is useful research data usually shared? An investigation of genome-wide association study summary statistics. PLoS ONE. 2020;15(2): e0229578.

  10. Reales G, Wallace C. Sharing GWAS summary statistics results in more citations. Commun Biol. 2023;6(1):116.

  11. Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet. 2017;18(2):117–27.

  12. Gallagher MD, Chen-Plotkin AS. The Post-GWAS Era: From Association to Function. Am J Hum Genet. 2018;102(5):717–30.

  13. Cano-Gamez E, Trynka G. From GWAS to Function: Using Functional Genomics to Identify the Mechanisms Underlying Complex Diseases. Front Genet. 2020;11:424.

  14. Chimusa ER, Dalvie S, Dandara C, Wonkam A, Mazandu GK. Post genome-wide association analysis: dissecting computational pathway/network-based approaches. Brief Bioinform. 2019;20(2):690–700.

  15. Ishigaki K. Beyond GWAS: from simple associations to functional insights. Semin Immunopathol. 2022;44(1):3–14.

  16. Begum F, Ghosh D, Tseng GC, Feingold E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res. 2012;40(9):3777–84.

  17. Ioannidis JP, Rosenberg PS, Goedert JJ, O'Brien TR, International Meta-analysis of HIVHG. Commentary: meta-analysis of individual participants' data in genetic epidemiology. Am J Epidemiol. 2002;156(3):204–10.

  18. Tang M, Wang T, Zhang X. A review of SNP heritability estimation methods. Brief Bioinform. 2022;23(3).

  19. Zhu H, Zhou X. Statistical methods for SNP heritability estimation and partition: A review. Comput Struct Biotechnol J. 2020;18:1557–68.

  20. Cinar O, Viechtbauer W. A Comparison of Methods for Gene-Based Testing That Account for Linkage Disequilibrium. Front Genet. 2022;13: 867724.

  21. Mooney MA, Wilmot B. Gene set analysis: A step-by-step guide. Am J Med Genet B Neuropsychiatr Genet. 2015;168(7):517–27.

  22. Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. 2018;19(8):491–504.

  23. van Rheenen W, Peyrot WJ, Schork AJ, Lee SH, Wray NR. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet. 2019;20(10):567–81.

  24. Zhang Y, Cheng Y, Jiang W, Ye Y, Lu Q, Zhao H. Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics. Brief Bioinform. 2021;22(5).

  25. Hackinger S, Zeggini E. Statistical methods to detect pleiotropy in human complex traits. Open Biol. 2017;7(11).

  26. Boehm FJ, Zhou X. Statistical methods for Mendelian randomization in genome-wide association studies: A review. Comput Struct Biotechnol J. 2022;20:2338–51.

  27. Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet. 2019;51(4):592–9.

  28. Hukku A, Sampson MG, Luca F, Pique-Regi R, Wen X. Analyzing and reconciling colocalization and transcriptome-wide association studies from the perspective of inferential reproducibility. Am J Hum Genet. 2022;109(5):825–37.

  29. MacArthur JAL, Buniello A, Harris LW, Hayhurst J, McMahon A, Sollis E, et al. Workshop proceedings: GWAS summary statistics standards and sharing. Cell Genom. 2021;1(1).

  30. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71.

  31. Hayhurst J, Buniello A, Harris L, Mosaku A, Chang C, Gignoux CR, et al. A community driven GWAS summary statistics standard. bioRxiv. 2023:2022.07.15.500230.

  32. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.

  33. Lyon MS, Andrews SJ, Elsworth B, Gaunt TR, Hemani G, Marcora E. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol. 2021;22(1):32.

  34. Elsworth B, Lyon M, Alexander T, Liu Y, Matthews P, Hallett J, et al. The MRC IEU OpenGWAS data infrastructure. bioRxiv. 2020:2020.08.10.244293.

  35. van der Most PJ, Vaez A, Prins BP, Munoz ML, Snieder H, Alizadeh BZ, et al. QCGWAS: A flexible R package for automated quality control of genome-wide association results. Bioinformatics. 2014;30(8):1185–6.

  36. Fuchsberger C, Taliun D, Pramstaller PP, Pattaro C. GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data. Bioinformatics. 2012;28(3):444–5.

  37. Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, Mägi R, et al. Quality control and conduct of genome-wide association meta-analyses. Nat Protoc. 2014;9(5):1192–212.

  38. Chen GB, Lee SH, Robinson MR, Trzaskowski M, Zhu ZX, Winkler TW, et al. Across-cohort QC analyses of GWAS summary statistics from complex traits. Eur J Hum Genet. 2016;25(1):137–46.

  39. Murphy AE, Schilder BM, Skene NG. MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics. Bioinformatics. 2021;37(23):4593–6.

  40. He Y, Koido M, Shimmori Y, Kamatani Y. GWASLab: a Python package for processing and visualizing GWAS summary statistics. 2023.

  41. Matushyn M, Bose M, Mahmoud AA, Cuthbertson L, Tello C, Bircan KO, et al. SumStatsRehab: an efficient algorithm for GWAS summary statistics assessment and restoration. BMC Bioinformatics. 2022;23(1):443.

  42. Ani A, van der Most PJ, Snieder H, Vaez A, Nolte IM. GWASinspector: comprehensive quality control of genome-wide association study results. Bioinformatics. 2021;37(1):129–30.

  43. Awasthi S, Chen CY, Lam M, Huang H, Ripke S, Altar CA. GWAS quality score for evaluating associated regions in GWAS analyses. Bioinformatics. 2023;39(1).

  44. Chen W, Wu Y, Zheng Z, Qi T, Visscher PM, Zhu Z, et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat Commun. 2021;12(1):7117.

  45. Williams CM, Poore H, Tanksley PT, Kweon H, Courchesne-Krak NS, Londono-Correa D, et al. Guidelines for Evaluating the Comparability of Down-Sampled GWAS Summary Statistics. Behav Genet. 2023;53(5–6):404–15.

  46. Baxevanis AD, Bateman A. The Importance of Biological Databases in Biological Discovery. Curr Protoc Bioinformatics. 2015;50:1–8.

  47. Ison J, Rapacki K, Menager H, Kalas M, Rydza E, Chmura P, et al. Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Res. 2016;44(D1):D38-47.

  48. Rigden DJ, Fernandez XM. The 27th annual Nucleic Acids Research database issue and molecular biology database collection. Nucleic Acids Res. 2020;48(D1):D1–8.

  49. Zou D, Ma L, Yu J, Zhang Z. Biological databases for human research. Genomics Proteomics Bioinformatics. 2015;13(1):55–63.

  50. Hassani-Pak K, Rawlings C. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes. J Integr Bioinform. 2017;14(1).

  51. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39(10):1181–6.

  52. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47(D1):D1005–12.

  53. Beck T, Rowlands T, Shorter T, Brookes AJ. GWAS Central: an expanding resource for finding and visualising genotype and phenotype data from genome-wide association studies. Nucleic Acids Res. 2023;51(D1):D986–93.

  54. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nat Genet. 2018;50(11):1593–9.

  55. McInnes G, Tanigawa Y, DeBoever C, Lavertu A, Olivieri JE, Aguirre M, et al. Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics. Bioinformatics. 2019;35(14):2495–7.

  56. Consortium GT. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369(6509):1318–30.

  57. Huang D, Feng X, Yang H, Wang J, Zhang W, Fan X, et al. QTLbase2: an enhanced catalog of human quantitative trait loci on extensive molecular phenotypes. Nucleic Acids Res. 2023;51(D1):D1122–8.

  58. Dai Y, Hu R, Manuel AM, Liu A, Jia P, Zhao Z. CSEA-DB: an omnibus for human complex trait and cell type associations. Nucleic Acids Res. 2021;49(D1):D862–70.

  59. Xue C, Jiang L, Zhou M, Long Q, Chen Y, Li X, et al. PCGA: a comprehensive web server for phenotype-cell-gene association analysis. Nucleic Acids Res. 2022;50(W1):W568–76.

  60. Cao C, Wang J, Kwok D, Cui F, Zhang Z, Zhao D, et al. webTWAS: a resource for disease candidate susceptibility genes identified by transcriptome-wide association study. Nucleic Acids Res. 2022;50(D1):D1123–30.

  61. Pan S, Kang H, Liu X, Li S, Yang P, Wu M, et al. COLOCdb: a comprehensive resource for multi-model colocalization of complex traits. Nucleic Acids Res. 2024;52(D1):D871–81.

  62. Watanabe K, Stringer S, Frei O, Umicevic Mirkov M, de Leeuw C, Polderman TJC, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2019;51(9):1339–48.

  63. Patron J, Serra-Cayuela A, Han B, Li C, Wishart DS. Assessing the performance of genome-wide association studies for predicting disease risk. PLoS ONE. 2019;14(12): e0220215.

  64. Bastarache L, Denny JC, Roden DM. Phenome-Wide Association Studies. JAMA. 2022;327(1):75–6.

  65. Verma A, Ritchie MD. Current Scope and Challenges in Phenome-Wide Association Studies. Curr Epidemiol Rep. 2017;4(4):321–9.

  66. Wang L, Zhang X, Meng X, Koskeridis F, Georgiou A, Yu L, et al. Methodology in phenome-wide association studies: a systematic review. J Med Genet. 2021;58(11):720–8.

  67. Kamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics. 2019;35(22):4851–3.

  68. Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31(12):1102–10.

  69. Zheng J, Erzurumluoglu AM, Elsworth BL, Kemp JP, Howe L, Haycock PC, et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics. 2017;33(2):272–9.

  70. Li Y, Willer C, Sanna S, Abecasis G. Genotype imputation. Annu Rev Genomics Hum Genet. 2009;10:387–406.

  71. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11(7):499–511.

  72. Naj AC. Genotype Imputation in Genome-Wide Association Studies. Curr Protoc Hum Genet. 2019;102(1): e84.

  73. Dickhaus T, Stange J, Demirhan H. On an extended interpretation of linkage disequilibrium in genetic case-control association studies. Stat Appl Genet Mol Biol. 2015;14(5):497–505.

  74. Kwan JS, Li MX, Deng JE, Sham PC. FAPI: Fast and accurate P-value Imputation for genome-wide association study. Eur J Hum Genet. 2016;24(5):761–6.

  75. Pasaniuc B, Zaitlen N, Shi H, Bhatia G, Gusev A, Pickrell J, et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics. 2014;30(20):2906–14.

  76. Julienne H, Shi H, Pasaniuc B, Aschard H. RAISS: robust and accurate imputation from summary statistics. Bioinformatics. 2019;35(22):4837–9.

  77. Lee D, Bigdeli TB, Williamson VS, Vladimirov VI, Riley BP, Fanous AH, et al. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts. Bioinformatics. 2015;31(19):3099–104.

  78. Rueger S, McDaid A, Kutalik Z. Evaluation and application of summary statistic imputation to discover new height-associated loci. PLoS Genet. 2018;14(5): e1007371.

  79. Xu Z, Duan Q, Yan S, Chen W, Li M, Lange E, et al. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics. 2015;31(15):2434–42.

  80. Lee D, Bigdeli TB, Riley BP, Fanous AH, Bacanu SA. DIST: direct imputation of summary statistics for unmeasured SNPs. Bioinformatics. 2013;29(22):2925–7.

  81. Togninalli M, Roqueiro D, COPDGene Investigators, Borgwardt KM. Accurate and adaptive imputation of summary statistics in mixed-ethnicity cohorts. Bioinformatics. 2018;34(17):i687–96.

  82. Park DS, Brown B, Eng C, Huntsman S, Hu D, Torgerson DG, et al. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics. 2015;31(12):i181–9.

  83. Ren J, Lin Z, Pan W. Integrating GWAS summary statistics, individual-level genotypic and omic data to enhance the performance for large-scale trait imputation. Hum Mol Genet. 2023;32(17):2693–703.

  84. Ren J, Lin Z, He R, Shen X, Pan W. Using GWAS summary data to impute traits for genotyped individuals. HGG Adv. 2023;4(3): 100197.

  85. Yang Z, Paschou P, Drineas P. Reconstructing SNP allele and genotype frequencies from GWAS summary statistics. Sci Rep. 2022;12(1):8242.

  86. Bagos PG, Nikolopoulos GK. A method for meta-analysis of case-control genetic association studies using logistic regression. Stat Appl Genet Mol Biol. 2007;6:Article17.

  87. Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. Stat Appl Genet Mol Biol. 2008;7(1):Article31.

  88. Bagos PG. Genetic model selection in genome-wide association studies: robust methods and the use of meta-analysis. Stat Appl Genet Mol Biol. 2013;12(3):285–308.

  89. Dimou NL, Tsirigos KD, Elofsson A, Bagos PG. GWAR: robust analysis and meta-analysis of genome-wide association studies. Bioinformatics. 2017;33(10):1521–7.

  90. Di Pietrantonj C. Four-fold table cell frequencies imputation in meta analysis. Stat Med. 2006;25(13):2299–322.

  91. Nolte IM. Metasubtract: an R-package to analytically produce leave-one-out meta-analysis GWAS summary statistics. Bioinformatics. 2020;36(16):4521–2.

  92. Woolf B, Sallis HM, Munafò MR, Gill D. Deriving GWAS summary estimates for paternal smoking in UK biobank: a GWAS by subtraction. BMC Res Notes. 2023;16(1):159.

  93. Niu YF, Ye C, He J, Han F, Guo LB, Zheng HF, et al. Reproduction and In-Depth Evaluation of Genome-Wide Association Studies and Genome-Wide Meta-analyses Using Summary Statistics. G3 (Bethesda). 2017;7(3):943–52.

  94. Lloyd-Jones LR, Robinson MR, Yang J, Visscher PM. Transformation of Summary Statistics from Linear Mixed Model Association on All-or-None Traits to Odds Ratio. Genetics. 2018;208(4):1397–408.

  95. Forero DA, Lopez-Leon S, González-Giraldo Y, Bagos PG. Ten simple rules for carrying out and writing meta-analyses. PLoS Comput Biol. 2019;15(5): e1006922.

  96. Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet Epidemiol. 2010;34(1):60–6.

  97. Riley RD, Lambert PC, Staessen JA, Wang J, Gueyffier F, Thijs L, et al. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med. 2008;27(11):1870–93.

  98. Dai M, Ming J, Cai M, Liu J, Yang C, Wan X, et al. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies. Bioinformatics. 2017;33(18):2882–9.

  99. Fu S, Deng L, Zhang H, Qin J, Yu K. Integrative analysis of individual-level data and high-dimensional summary statistics. Bioinformatics. 2023;39(4).

  100. Dai M, Wan X, Peng H, Wang Y, Liu Y, Liu J, et al. Joint analysis of individual-level and summary-level GWAS data by leveraging pleiotropy. Bioinformatics. 2019;35(10):1729–36.

  101. Fu S, Purdue MP, Zhang H, Qin J, Song L, Berndt SI, et al. Improve the model of disease subtype heterogeneity by leveraging external summary data. PLoS Comput Biol. 2023;19(7): e1011236.

  102. Evangelou E, Ioannidis JP. Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet. 2013;14(6):379–89.

  103. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1.

  104. Mägi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics. 2010;11:288.

  105. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.

  106. Meesters C, Leber M, Herold C, Angisch M, Mattheisen M, Drichel D, et al. Quick, “imputation-free” meta-analysis with proxy-SNPs. BMC Bioinformatics. 2012;13:231.

  107. Jiang Y, Chen S, McGuire D, Chen F, Liu M, Iacono WG, et al. Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes. PLoS Genet. 2018;14(7): e1007452.

  108. Jiang W, Yu W. Jointly determining significance levels of primary and replication studies by controlling the false discovery rate in two-stage genome-wide association studies. Stat Methods Med Res. 2018;27(9):2795–808.

  109. Jiang W, Yu W. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies. Bioinformatics. 2017;33(4):500–7.

  110. Jiang W, Xue JH, Yu W. What is the probability of replicating a statistically significant association in genome-wide association studies? Brief Bioinform. 2017;18(6):928–39.

  111. Xie Y, Zhai S, Jiang W, Zhao H, Mehrotra DV, Shen J. Statistical assessment of biomarker replicability using MAJAR method. Stat Methods Med Res. 2023;32(10):1961–72.

  112. de Vlaming R, Okbay A, Rietveld CA, Johannesson M, Magnusson PK, Uitterlinden AG, et al. Meta-GWAS Accuracy and Power (MetaGAP) Calculator Shows that Hiding Heritability Is Partially Due to Imperfect Genetic Correlations across Studies. PLoS Genet. 2017;13(1): e1006495.

  113. Province MA, Borecki IB. A correlated meta-analysis strategy for data mining "OMIC" scans. Pac Symp Biocomput. 2013:236–46.

  114. Segrè AV, Groop L, Mootha VK, Daly MJ, Altshuler D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 2010;6(8).

  115. Sun J, Lyu R, Deng L, Li Q, Zhao Y, Zhang Y. SMetABF: A rapid algorithm for Bayesian GWAS meta-analysis with a large number of studies included. PLoS Comput Biol. 2022;18(3): e1009948.

  116. Trochet H, Pirinen M, Band G, Jostins L, McVean G, Spencer CCA. Bayesian meta-analysis across genome-wide association studies of diverse phenotypes. Genet Epidemiol. 2019;43(5):532–47.

  117. Baselmans BML, Jansen R, Ip HF, van Dongen J, Abdellaoui A, van de Weijer MP, et al. Multivariate genome-wide analyses of the well-being spectrum. Nat Genet. 2019;51(3):445–51.

  118. Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, et al. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics. 2016;32(13):1981–9.

  119. Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N, et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet. 2015;96(1):21–36.

  120. Ray D, Boehnke M. Methods for meta-analysis of multiple traits using GWAS summary statistics. Genet Epidemiol. 2018;42(2):134–45.

  121. Baghfalaki T, Sugier PE, Truong T, Pettitt AN, Mengersen K, Liquet B. Bayesian meta-analysis models for cross cancer genomic investigation of pleiotropic effects using group structure. Stat Med. 2021;40(6):1498–518.

  122. John M, Lencz T, Malhotra AK, Correll CU, Zhang JP. A simulations approach for meta-analysis of genetic association studies based on additive genetic model. Meta Gene. 2018;16:143–64.

  123. Nasirigerdeh R, Torkzadehmahani R, Matschinske J, Frisch T, List M, Späth J, et al. sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies. Genome Biol. 2022;23(1):32.

  124. Coram MA, Candille SI, Duan Q, Chan KH, Li Y, Kooperberg C, et al. Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach. Am J Hum Genet. 2015;96(5):740–52.

  125. Tenesa A, Haley CS. The heritability of human disease: estimation, uses and abuses. Nat Rev Genet. 2013;14(2):139–49.

  126. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era–concepts and misconceptions. Nat Rev Genet. 2008;9(4):255–66.

  127. Barry CS, Walker VM, Cheesman R, Davey Smith G, Morris TT, Davies NM. How to estimate heritability: a guide for genetic epidemiologists. Int J Epidemiol. 2023;52(2):624–32.

  128. Zaitlen N, Kraft P. Heritability in the genome-wide association era. Hum Genet. 2012;131(10):1655–64.

  129. So HC, Gui AH, Cherny SS, Sham PC. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases. Genet Epidemiol. 2011;35(5):310–7.

  130. So HC, Li M, Sham PC. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet Epidemiol. 2011;35(6):447–56.

  131. Palla L, Dudbridge F. A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait. Am J Hum Genet. 2015;97(2):250–9.

  132. Shi H, Kichaev G, Pasaniuc B. Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data. Am J Hum Genet. 2016;99(1):139–53.

  133. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5.

  134. Song S, Jiang W, Zhang Y, Hou L, Zhao H. Leveraging LD eigenvalue regression to improve the estimation of SNP heritability and confounding inflation. Am J Hum Genet. 2022;109(5):802–11.

  135. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh PR, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47(11):1228–35.

  136. Speed D, Balding DJ. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat Genet. 2019;51(2):277–84.

  137. Li H, Mazumder R, Lin X. Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix. Nat Commun. 2023;14(1):7954.

  138. Laville V, Bentley AR, Privé F, Zhu X, Gauderman J, Winkler TW, et al. VarExp: estimating variance explained by genome-wide GxE summary statistics. Bioinformatics. 2018;34(19):3412–4.

  139. Shin J, Lee SH. GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data. Genome Biol. 2021;22(1):183.

  140. Song L, Liu A, Shi J. SummaryAUC: a tool for evaluating the performance of polygenic risk prediction models in validation datasets with only summary level statistics. Bioinformatics. 2019;35(20):4038–44.

  141. Chan TF, Rui X, Conti DV, Fornage M, Graff M, Haessler J, et al. Estimating heritability explained by local ancestry and evaluating stratification bias in admixture mapping from summary statistics. Am J Hum Genet. 2023;110(11):1853–62.

  142. Zhang Y, Qi G, Park JH, Chatterjee N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat Genet. 2018;50(9):1318–26.

  143. López-Cortegano E, Caballero A. GWEHS: A Genome-Wide Effect Sizes and Heritability Screener. Genes (Basel). 2019;10(8).

  144. O’Connor LJ. The distribution of common-variant effect sizes. Nat Genet. 2021;53(8):1243–9.

  145. Holland D, Frei O, Desikan R, Fan CC, Shadrin AA, Smeland OB, et al. Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model. PLoS Genet. 2020;16(5): e1008612.

  146. Yao DW, O’Connor LJ, Price AL, Gusev A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat Genet. 2020;52(6):626–33.

  147. Siewert-Rocks KM, Kim SS, Yao DW, Shi H, Price AL. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am J Hum Genet. 2022;109(3):393–404.

  148. Neale BM, Sham PC. The future of association studies: gene-based analysis and replication. Am J Hum Genet. 2004;75(3):353–62.

  149. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83(3):311–21.

  150. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93.

  151. Chapman J, Whittaker J. Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol. 2008;32(6):560–6.

  152. Lee D, Williamson VS, Bigdeli TB, Riley BP, Fanous AH, Vladimirov VI, et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics. 2015;31(8):1176–82.

  153. Yang J, Ferreira T, Morris AP, Medland SE, Madden PA, Heath AC, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369–75, s1–3.

  154. Li M, Jiang L, Mak TSH, Kwan JSH, Xue C, Chen P, et al. A powerful conditional gene-based association approach implicated functionally important genes for schizophrenia. Bioinformatics. 2019;35(4):628–35.

  155. Li MX, Gui HS, Kwan JS, Sham PC. GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet. 2011;88(3):283–93.

  156. Bakshi A, Zhu Z, Vinkhuyzen AA, Hill WD, McRae AF, Visscher PM, et al. Fast set-based association analysis using summary data from GWAS identifies novel gene loci for human complex traits. Sci Rep. 2016;6:32894.

  157. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 2015;11(4): e1004219.

  158. Yang A, Chen J, Zhao XM. nMAGMA: a network-enhanced method for inferring risk genes from GWAS summary statistics and its application to schizophrenia. Brief Bioinform. 2021;22(4).

  159. Sey NYA, Pratt BM, Won H. Annotating genetic variants to target genes using H-MAGMA. Nat Protoc. 2023;18(1):22–35.

  160. Gerring ZF, Mina-Vargas A, Gamazon ER, Derks EM. E-MAGMA: an eQTL-informed method to identify risk genes using genome-wide association study summary statistics. Bioinformatics. 2021;37(16):2245–9.

  161. Wang R, Lin DY, Jiang Y. EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing. PLoS Genet. 2022;18(6): e1010251.

  162. Quick C, Wen X, Abecasis G, Boehnke M, Kang HM. Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis. PLoS Genet. 2020;16(12): e1009060.

  163. Yurko R, Roeder K, Devlin B, G'Sell M. An approach to gene-based testing accounting for dependence of tests among nearby genes. Brief Bioinform. 2021;22(6).

  164. Vsevolozhskaya OA, Shi M, Hu F, Zaykin DV. DOT: Gene-set analysis by combining decorrelated association statistics. PLoS Comput Biol. 2020;16(4): e1007819.

  165. Zhang J, Zhao Z, Guo X, Guo B, Wu B. Powerful statistical method to detect disease-associated genes using publicly available genome-wide association studies summary data. Genet Epidemiol. 2019;43(8):941–51.

  166. Chen X, Zhang H, Liu M, Deng HW, Wu Z. Simultaneous detection of novel genes and SNPs by adaptive p-value combination. Front Genet. 2022;13:1009428.

  167. Zhang J, Guo X, Gonzales S, Yang J, Wang X. TS: a powerful truncated test to detect novel disease associated genes using publicly available gWAS summary data. BMC Bioinformatics. 2020;21(1):172.

  168. Kwak IY, Pan W. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics. Bioinformatics. 2017;33(1):64–71.

  169. Guo B, Wu B. Statistical methods to detect novel genetic variants using publicly available GWAS summary data. Comput Biol Chem. 2018;74:76–9.

  170. Wang M, Huang J, Liu Y, Ma L, Potash JB, Han S. COMBAT: A Combined Association Test for Genes Using Summary Statistics. Genetics. 2017;207(3):883–91.

  171. Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics. 2022;23(1):359.

  172. Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics. 2023;24(1):2.

  173. He Z, Xu B, Lee S, Ionita-Laza I. Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data. Am J Hum Genet. 2017;101(3):340–52.

  174. Liu Y, Chen S, Li Z, Morrison AC, Boerwinkle E, Lin X. ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. Am J Hum Genet. 2019;104(3):410–21.

  175. Li MX, Kwan JS, Sham PC. HYST: a hybrid set-based test for genome-wide association studies, with application to protein-protein interaction-based association analysis. Am J Hum Genet. 2012;91(3):478–88.

  176. Sun R, Lin X. Genetic Variant Set-Based Tests Using the Generalized Berk-Jones Statistic with Application to a Genome-Wide Association Study of Breast Cancer. J Am Stat Assoc. 2020;115(531):1079–91.

  177. Berrandou TE, Balding D, Speed D. LDAK-GBAT: Fast and powerful gene-based association testing using summary statistics. Am J Hum Genet. 2023;110(1):23–9.

  178. Mei H, Li L, Jiang F, Simino J, Griswold M, Mosley T, et al. snpGeneSets: An R Package for Genome-Wide Study Annotation. G3 (Bethesda). 2016;6(12):4087–95.

  179. Krefl D, Brandulas Cammarata A, Bergmann S. PascalX: a Python library for GWAS gene and pathway enrichment tests. Bioinformatics. 2023;39(5).

  180. Lamparter D, Marbach D, Rueedi R, Kutalik Z, Bergmann S. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics. PLoS Comput Biol. 2016;12(1): e1004714.

  181. Nameki R, Shetty A, Dareng E, Tyrer J, Lin X, Pharoah P, et al. chromMAGMA: regulatory element-centric interrogation of risk variants. Life Sci Alliance. 2022;5(10).

  182. Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826.

  183. Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for gene-based association testing with multiple traits in genome-wide association studies. Genet Epidemiol. 2022;46(1):63–72.

  184. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet. 2007;81(6):1278–83.

  185. Mooney MA, Nigg JT, McWeeney SK, Wilmot B. Functional and genomic context in pathway analysis of GWAS data. Trends Genet. 2014;30(9):390–400.

  186. Pers TH. Gene set analysis for interpreting genetic studies. Hum Mol Genet. 2016;25(R2):R133–40.

  187. Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics. 2011;98(1):1–8.

  188. Zhang K, Cui S, Chang S, Zhang L, Wang J. i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Res. 2010;38(Web Server issue):W90–5.

  189. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90–7.

  190. Kolberg L, Raudvere U, Kuzmin I, Adler P, Vilo J, Peterson H. g:Profiler-interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023;51(W1):W207–12.

  191. Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50(W1):W216–21.

  192. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019;47(W1):W199–205.

  193. Mi H, Ebert D, Muruganujan A, Mills C, Albou LP, Mushayamaha T, et al. PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Res. 2021;49(D1):D394–403.

  194. Yoon S, Nguyen HCT, Yoo YJ, Kim J, Baik B, Kim S, et al. Efficient pathway enrichment and network analysis of GWAS summary data using GSA-SNP2. Nucleic Acids Res. 2018;46(10): e60.

  195. Wu C, Pan W. Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia. Genet Epidemiol. 2018;42(3):303–16.

  196. Zhu S, Qian T, Hoshida Y, Shen Y, Yu J, Hao K. GIGSEA: genotype imputed gene set enrichment analysis using GWAS summary level data. Bioinformatics. 2019;35(1):160–3.

  197. Pei G, Dai Y, Zhao Z, Jia P. deTS: tissue-specific enrichment analysis to decode tissue specificity. Bioinformatics. 2019;35(19):3842–5.

  198. Jia P, Zheng S, Long J, Zheng W, Zhao Z. dmGWAS: dense module searching for genome-wide association studies in protein-protein interaction networks. Bioinformatics. 2011;27(1):95–102.

  199. Cochran AL, Nieser KJ, Forger DB, Zöllner S, McInnis MG. Gene-set Enrichment with Mathematical Biology (GEMB). Gigascience. 2020;9(10).

  200. Cabrera CP, Navarro P, Huffman JE, Wright AF, Hayward C, Campbell H, et al. Uncovering networks from genome-wide association studies via circular genomic permutation. G3 (Bethesda). 2012;2(9):1067–75.

  201. Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, et al. GWAB: a web server for the network-based boosting of human genome-wide association data. Nucleic Acids Res. 2017;45(W1):W154–61.

  202. Hoppmann AS, Schlosser P, Backofen R, Lausch E, Köttgen A. GenToS: Use of Orthologous Gene Information to Prioritize Signals from Human GWAS. PLoS ONE. 2016;11(9): e0162466.

  203. Wen Y, Wang W, Guo X, Zhang F. PAPA: a flexible tool for identifying pleiotropic pathways using genome-wide association study summaries. Bioinformatics. 2016;32(6):946–8.

  204. Amlie-Wolf A, Tang M, Mlynarski EE, Kuksa PP, Valladares O, Katanic Z, et al. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Res. 2018;46(17):8740–53.

  205. Ding J, Blencowe M, Nghiem T, Ha SM, Chen YW, Li G, et al. Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics. Nucleic Acids Res. 2021;49(W1):W375–87.

  206. Wang QS, Huang H. Methods for statistical fine-mapping and their applications to auto-immune diseases. Semin Immunopathol. 2022;44(1):101–13.

  207. Hutchinson A, Asimit J, Wallace C. Fine-mapping genetic associations. Hum Mol Genet. 2020;29(R1):R81–8.

  208. Kichaev G, Roytman M, Johnson R, Eskin E, Lindström S, Kraft P, et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics. 2017;33(2):248–55.

  209. Wen X, Lee Y, Luca F, Pique-Regi R. Efficient Integrative Multi-SNP Association Analysis via Deterministic Approximation of Posteriors. Am J Hum Genet. 2016;98(6):1114–29.

  210. Pickrell JK. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am J Hum Genet. 2014;94(4):559–73.

  211. Benner C, Spencer CC, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics. 2016;32(10):1493–501.

  212. Hernández N, Soenksen J, Newcombe P, Sandhu M, Barroso I, Wallace C, et al. The flashfm approach for fine-mapping multiple quantitative traits. Nat Commun. 2021;12(1):6147.

  213. Karhunen V, Launonen I, Järvelin MR, Sebert S, Sillanpää MJ. Genetic fine-mapping from summary data using a nonlocal prior improves the detection of multiple causal variants. Bioinformatics. 2023;39(7).

  214. Yang Z, Wang C, Liu L, Khan A, Lee A, Vardarajan B, et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet. 2023;55(6):1057–65.

  215. Chen W, Larrabee BR, Ovsyannikova IG, Kennedy RB, Haralambieva IH, Poland GA, et al. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics. 2015;200(3):719–36.

  216. LaPierre N, Taraszka K, Huang H, He R, Hormozdiari F, Eskin E. Identifying causal variants by fine mapping across multiple studies. PLoS Genet. 2021;17(9): e1009733.

  217. Cai M, Wang Z, Xiao J, Hu X, Chen G, Yang C. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat Commun. 2023;14(1):6870.

  218. Ghosal S, Schatz MC, Venkataraman A. BEATRICE: Bayesian Fine-mapping from Summary Data using Deep Variational Inference. bioRxiv. 2023.

  219. Li Y, Kellis M. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases. Nucleic Acids Res. 2016;44(18): e144.

  220. Weissbrod O, Hormozdiari F, Benner C, Cui R, Ulirsch J, Gazal S, et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat Genet. 2020;52(12):1355–63.

  221. Zou Y, Carbonetto P, Wang G, Stephens M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 2022;18(7): e1010299.

  222. Chen S, Nunez S, Reilly MP, Foulkes AS. Bayesian variable selection for post-analytic interrogation of susceptibility loci. Biometrics. 2017;73(2):603–14.

  223. Newcombe PJ, Conti DV, Richardson S. JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects. Genet Epidemiol. 2016;40(3):188–201.

  224. Ning Z, Lee Y, Joshi PK, Wilson JF, Pawitan Y, Shen X. A Selection Operator for Summary Association Statistics Reveals Allelic Heterogeneity of Complex Traits. Am J Hum Genet. 2017;101(6):903–12.

  225. Fisher V, Sebastiani P, Cupples LA, Liu CT. ANNORE: genetic fine-mapping with functional annotation. Hum Mol Genet. 2021;31(1):32–40.

  226. Zhang W, Li SY, Liu T, Li Y. Partitioning gene-based variance of complex traits by gene score regression. PLoS ONE. 2020;15(8): e0237657.

  227. Zhu X, Stephens M. Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Ann Appl Stat. 2017;11(3):1561–92.

  228. Deng Y, Pan W. Significance Testing for Allelic Heterogeneity. Genetics. 2018;210(1):25–32.

  229. Taylor KE, Ansel KM, Marson A, Criswell LA, Farh KK. PICS2: next-generation fine mapping via probabilistic identification of causal SNPs. Bioinformatics. 2021;37(18):3004–7.

  230. Schilder BM, Humphrey J, Raj T. echolocatoR: an automated end-to-end statistical and functional genomic fine-mapping pipeline. Bioinformatics. 2022;38(2):536–9.

  231. Tyler AL, Crawford DC, Pendergrass SA. The detection and characterization of pleiotropy: discovery, progress, and promise. Brief Bioinform. 2016;17(1):13–22.

  232. Wu P, Wang B, Lubitz SA, Benjamin EJ, Meigs JB, Dupuis J. Approximate conditional phenotype analysis based on genome wide association summary statistics. Sci Rep. 2021;11(1):2518.

  233. Conneely KN, Boehnke M. So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. Am J Hum Genet. 2007;81(6):1158–68.

  234. Taraszka K, Zaitlen N, Eskin E. Leveraging pleiotropy for joint analysis of genome-wide association studies with per trait interpretations. PLoS Genet. 2022;18(11): e1010447.

  235. Deng Y, Pan W. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses. Genetics. 2017;207(4):1285–99.

  236. Ray D, Pankow JS, Basu S. USAT: A Unified Score-Based Association Test for Multiple Phenotype-Genotype Analysis. Genet Epidemiol. 2016;40(1):20–34.

  237. Sitlani CM, Baldassari AR, Highland HM, Hodonsky CJ, McKnight B, Avery CL. Comparison of adaptive multiple phenotype association tests using summary statistics in genome-wide association studies. Hum Mol Genet. 2021;30(15):1371–83.

  238. Guo B, Wu B. Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach. Bioinformatics. 2019;35(13):2251–7.

  239. Turchin MC, Stephens M. Bayesian multivariate reanalysis of large genetic studies identifies many new associations. PLoS Genet. 2019;15(10): e1008431.

  240. Bu D, Wang X, Li Q. Summary statistics-based association test for identifying the pleiotropic effects with set of genetic variants. Bioinformatics. 2023;39(4).

  241. Deng Q, Song C, Lin S. An adaptive and robust method for multi-trait analysis of genome-wide association studies using summary statistics. Eur J Hum Genet. 2023.

  242. Liu W, Xu Y, Wang A, Huang T, Liu Z. The eigen higher criticism and eigen Berk-Jones tests for multiple trait association studies based on GWAS summary statistics. Genet Epidemiol. 2022;46(2):89–104.

  243. Svishcheva GR, Tiys ES, Elgaeva EE, Feoktistova SG, Timmers P, Sharapov SZ, et al. A Novel Framework for Analysis of the Shared Genetic Background of Correlated Traits. Genes (Basel). 2022;13(10).

  244. Qi G, Chatterjee N. Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLoS Genet. 2018;14(10): e1007549.

  245. Jordan DM, Verbanck M, Do R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol. 2019;20(1):222.

  246. Ballard JL, O’Connor LJ. Shared components of heritability across genetically correlated traits. Am J Hum Genet. 2022;109(6):989–1006.

  247. Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet. 2018;50(2):229–37.

  248. Lee CH, Shi H, Pasaniuc B, Eskin E, Han B. PLEIO: a method to map and interpret pleiotropic loci with GWAS summary statistics. Am J Hum Genet. 2021;108(1):36–48.

  249. Guo B, Wu B. Powerful and efficient SNP-set association tests across multiple phenotypes using GWAS summary data. Bioinformatics. 2019;35(8):1366–72.

  250. Dutta D, Scott L, Boehnke M, Lee S. Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes. Genet Epidemiol. 2019;43(1):4–23.

  251. Van der Sluis S, Dolan CV, Li J, Song Y, Sham P, Posthuma D, et al. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis. Bioinformatics. 2015;31(7):1007–15.

  252. Wang T, Lu H, Zeng P. Identifying pleiotropic genes for complex phenotypes with summary statistics from a perspective of composite null hypothesis testing. Brief Bioinform. 2022;23(1).

  253. Luo L, Shen J, Zhang H, Chhibber A, Mehrotra DV, Tang ZZ. Multi-trait analysis of rare-variant association summary statistics using MTAR. Nat Commun. 2020;11(1):2850.

  254. Zeng P, Hao X, Zhou X. Pleiotropic mapping and annotation selection in genome-wide association studies with penalized Gaussian mixture models. Bioinformatics. 2018;34(16):2797–807.

  255. Deng Q, Gupta A, Jeon H, Nam JH, Yilmaz AS, Chang W, et al. graph-GPA 2.0: improving multi-disease genetic analysis with integration of functional annotation data. Front Genet. 2023;14:1079198.

  256. von Berg J, Ten Dam M, van der Laan SW, de Ridder J. PolarMorphism enables discovery of shared genetic variants across multiple traits from GWAS summary statistics. Bioinformatics. 2022;38(Suppl 1):i212–9.

  257. Julienne H, Laville V, McCaw ZR, He Z, Guillemot V, Lasry C, et al. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet. 2021;17(8): e1009713.

  258. Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48(7):709–17.

  259. Zhang Z, Jung J, Kim A, Suboc N, Gazal S, Mancuso N. A scalable approach to characterize pleiotropy across thousands of human diseases and complex traits using GWAS summary statistics. Am J Hum Genet. 2023;110(11):1863–74.

  260. Zilinskas R, Li C, Shen X, Pan W, Yang T. Inferring a directed acyclic graph of phenotypes from GWAS summary statistics. bioRxiv. 2023.

  261. Yin L, Chau CK, Lin YP, Rao S, Xiang Y, Sham PC, et al. A framework to decipher the genetic architecture of combinations of complex diseases: applications in cardiovascular medicine. Bioinformatics. 2021;37(22):4137–47.

  262. Asgari Y, Sugier PE, Baghfalaki T, Lucotte E, Karimi M, Sedki M, et al. GCPBayes pipeline: a tool for exploring pleiotropy at the gene level. NAR Genom Bioinform. 2023;5(3):lqad065.

  263. Liu J, Wan X, Ma S, Yang C. EPS: an empirical Bayes approach to integrating pleiotropy and tissue-specific information for prioritizing risk genes. Bioinformatics. 2016;32(12):1856–64.

  264. Chung D, Yang C, Li C, Gelernter J, Zhao H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 2014;10(11): e1004787.

  265. Weissbrod O, Flint J, Rosset S. Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics. Am J Hum Genet. 2018;103(1):89–99.

  266. Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, et al. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. Am J Hum Genet. 2017;101(6):939–64.

  267. Zhang Y, Lu Q, Ye Y, Huang K, Liu W, Wu Y, et al. SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biol. 2021;22(1):262.

  268. Werme J, van der Sluis S, Posthuma D, de Leeuw CA. An integrated framework for local genetic correlation analysis. Nat Genet. 2022;54(3):274–82.

  269. Ning Z, Pawitan Y, Shen X. High-definition likelihood inference of genetic correlations across human complex traits. Nat Genet. 2020;52(8):859–64.

  270. Brown BC, Ye CJ, Price AL, Zaitlen N. Transethnic Genetic-Correlation Estimates from Summary Statistics. Am J Hum Genet. 2016;99(1):76–88.

  271. Gao B, Yang C, Liu J, Zhou X. Accurate genetic and environmental covariance estimation with composite likelihood in genome-wide association studies. PLoS Genet. 2021;17(1): e1009293.

  272. Zheng J, Richardson TG, Millard LAC, Hemani G, Elsworth BL, Raistrick CA, et al. PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics. Gigascience. 2018;7(8).

  273. Ming J, Wang T, Yang C. LPM: a latent probit model to characterize the relationship among complex traits using summary statistics from multiple GWASs and functional annotations. Bioinformatics. 2020;36(8):2506–14.

  274. Peyrot WJ, Price AL. Identifying loci with different allele frequencies among cases of eight psychiatric disorders using CC-GWAS. Nat Genet. 2021;53(4):445–54.

  275. Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet. 2017;100(3):473–87.

  276. Guo H, Li JJ, Lu Q, Hou L. Detecting local genetic correlations with scan statistics. Nat Commun. 2021;12(1):2033.

  277. Wu Y, Zhong X, Lin Y, Zhao Z, Chen J, Zheng B, et al. Estimating genetic nurture with summary statistics of multigenerational genome-wide association studies. Proc Natl Acad Sci U S A. 2021;118(25).

  278. Smith GD, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. Int J Epidemiol. 2004;33(1):30–42.

  279. Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16(4):309–30.

  280. Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD. Meta-analysis of genetic studies using Mendelian randomization–a multivariate approach. Stat Med. 2005;24(14):2241–54.

  281. Bowden J, Holmes MV. Meta-analysis and Mendelian randomization: A review. Res Synth Methods. 2019;10(4):486–96.

  282. Kraft P, Chen H, Lindström S. The Use Of Genetic Correlation And Mendelian Randomization Studies To Increase Our Understanding of Relationships Between Complex Traits. Curr Epidemiol Rep. 2020;7(2):104–12.

  283. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genet Epidemiol. 2016;40(4):304–14.

  284. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7.

  285. Burgess S, Foley CN, Allara E, Staley JR, Howson JMM. A robust and efficient method for Mendelian randomization with hundreds of genetic variants. Nat Commun. 2020;11(1):376.

  286. Zhao J, Ming J, Hu X, Chen G, Liu J, Yang C. Bayesian weighted Mendelian randomization for causal inference based on summary statistics. Bioinformatics. 2020;36(5):1501–8.

  287. Xu S, Wang P, Fung WK, Liu Z. A novel penalized inverse-variance weighted estimator for Mendelian randomization with applications to COVID-19 outcomes. Biometrics. 2023;79(3):2184–95.

  288. Qi G, Chatterjee N. Mendelian randomization analysis using mixture models for robust and efficient estimation of causal effects. Nat Commun. 2019;10(1):1941.

  289. Xue H, Shen X, Pan W. Constrained maximum likelihood-based Mendelian randomization robust to both correlated and uncorrelated pleiotropic effects. Am J Hum Genet. 2021;108(7):1251–69.

  290. Cheng Q, Yang Y, Shi X, Yeung KF, Yang C, Peng H, et al. MR-LDP: a two-sample Mendelian randomization for GWAS summary statistics accounting for linkage disequilibrium and horizontal pleiotropy. NAR Genom Bioinform. 2020;2(2):lqaa028.

  291. Cheng Q, Qiu T, Chai X, Sun B, Xia Y, Shi X, et al. MR-Corr2: a two-sample Mendelian randomization method that accounts for correlated horizontal pleiotropy using correlated instrumental variants. Bioinformatics. 2022;38(2):303–10.

  292. Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50(5):693–8.

  293. Zhu X, Li X, Xu R, Wang T. An iterative approach to detect pleiotropy and perform Mendelian Randomization analysis using GWAS summary statistics. Bioinformatics. 2021;37(10):1390–400.

  294. Hu X, Zhao J, Lin Z, Wang Y, Peng H, Zhao H, et al. Mendelian randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics. Proc Natl Acad Sci U S A. 2022;119(28): e2106858119.

  295. Mounier N, Kutalik Z. Bias correction for inverse variance weighting Mendelian randomization. Genet Epidemiol. 2023;47(4):314–31.

  296. Cheng Q, Zhang X, Chen LS, Liu J. Mendelian randomization accounting for complex correlated horizontal pleiotropy while elucidating shared genetic etiology. Nat Commun. 2022;13(1):6490.

  297. Ding M. A Two-stage Linear Mixed Model (TS-LMM) for Summary-data-based Multivariable Mendelian Randomization. medRxiv. 2023.

  298. O’Connor LJ, Price AL. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018;50(12):1728–34.

  299. Wang L, Gao B, Fan Y, Xue F, Zhou X. Mendelian randomization under the omnigenic architecture. Brief Bioinform. 2021;22(6).

  300. Gkatzionis A, Burgess S, Conti DV, Newcombe PJ. Bayesian variable selection with a pleiotropic loss function in Mendelian randomization. Stat Med. 2021;40(23):5025–45.

  301. Xue H, Pan W. Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data. PLoS Genet. 2020;16(11): e1009105.

  302. Xue H, Pan W. Robust inference of bi-directional causal relationships in presence of correlated pleiotropy with GWAS summary data. PLoS Genet. 2022;18(5): e1010205.

  303. Liu Z, Qin Y, Wu T, Tubbs JD, Baum L, Mak TSH, et al. Reciprocal causation mixture model for robust Mendelian randomization analysis using genome-scale summary data. Nat Commun. 2023;14(1):1131.

  304. Darrous L, Mounier N, Kutalik Z. Simultaneous estimation of bi-directional causal effects and heritable confounding from GWAS summary statistics. Nat Commun. 2021;12(1):7274.

  305. Zuber V, Lewin A, Levin MG, Haglund A, Ben-Aicha S, Emanueli C, et al. Multi-response Mendelian randomization: Identification of shared and distinct exposures for multimorbidity and multiple related disease outcomes. Am J Hum Genet. 2023;110(7):1177–99.

  306. Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol. 2019;48(3):713–27.

  307. Lorincz-Comi N, Yang Y, Li G, Zhu X. MRBEE: A novel bias-corrected multivariable Mendelian Randomization method. bioRxiv. 2023.

  308. Lin Z, Xue H, Pan W. Robust multivariable Mendelian randomization based on constrained maximum likelihood. Am J Hum Genet. 2023;110(4):592–605.

  309. Jin C, Lee B, Shen L, Long Q. Integrating multi-omics summary data using a Mendelian randomization framework. Brief Bioinform. 2022;23(6).

  310. Zuber V, Colijn JM, Klaver C, Burgess S. Selecting likely causal risk factors from high-throughput experiments using multivariable Mendelian randomization. Nat Commun. 2020;11(1):29.

  311. Jiang L, Xu S, Mancuso N, Newcombe PJ, Conti DV. A Hierarchical Approach Using Marginal Summary Statistics for Multiple Intermediates in a Mendelian Randomization or Transcriptome Analysis. Am J Epidemiol. 2021;190(6):1148–58.

  312. Zhao Q, Chen Y, Wang J, Small DS. Powerful three-sample genome-wide design and robust statistical inference in summary-data Mendelian randomization. Int J Epidemiol. 2019;48(5):1478–92.

  313. Fan Q, Zhang F, Wang W, Xu J, Hao J, He A, et al. GWAS summary-based pathway analysis correcting for the genetic confounding impact of environmental exposures. Brief Bioinform. 2018;19(5):725–30.

  314. Mai J, Lu M, Gao Q, Zeng J, Xiao J. Transcriptome-wide association studies: recent advances in methods, applications and available databases. Commun Biol. 2023;6(1):899.

  315. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48(5):481–7.

  316. Yuan Z, Zhu H, Zeng P, Yang S, Sun S, Yang C, et al. Testing and controlling for horizontal pleiotropy with probabilistic Mendelian randomization in transcriptome-wide association studies. Nat Commun. 2020;11(1):3861.

  317. Xue H, Shen X, Pan W. Causal Inference in Transcriptome-Wide Association Studies with Invalid Instruments and GWAS Summary Data. J Am Stat Assoc. 2023;118(543):1525–37.

  318. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245–52.

  319. Barbeira AN, Dickinson SP, Bonazzola R, Zheng J, Wheeler HE, Torres JM, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun. 2018;9(1):1825.

  320. Xu Z, Wu C, Wei P, Pan W. A Powerful Framework for Integrating eQTL and GWAS Summary Data. Genetics. 2017;207(3):893–902.

  321. Barfield R, Feng H, Gusev A, Wu L, Zheng W, Pasaniuc B, et al. Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet Epidemiol. 2018;42(5):418–33.

  322. Rojo C, Zhang Q, Keleş S. iFunMed: Integrative functional mediation analysis of GWAS and eQTL studies. Genet Epidemiol. 2019;43(7):742–60.

  323. Dong X, Su YR, Barfield R, Bien SA, He Q, Harrison TA, et al. A general framework for functionally informed set-based analysis: Application to a large-scale colorectal cancer study. PLoS Genet. 2020;16(8): e1008947.

  324. Zhang Y, Quick C, Yu K, Barbeira A, Luca F, Pique-Regi R, et al. PTWAS: investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis. Genome Biol. 2020;21(1):232.

  325. Yang Y, Yeung KF, Liu J. CoMM-S(4): A Collaborative Mixed Model Using Summary-Level eQTL and GWAS Datasets in Transcriptome-Wide Association Studies. Front Genet. 2021;12: 704538.

  326. Shi X, Chai X, Yang Y, Cheng Q, Jiao Y, Huang J, et al. A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies. bioRxiv. 2019:789396.

  327. Park Y, Sarkar A, Bhutani K, Kellis M. Multi-tissue polygenic models for transcriptome-wide association studies. bioRxiv. 2017:107623.

  328. Feng H, Mancuso N, Gusev A, Majumdar A, Major M, Pasaniuc B, et al. Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies. PLoS Genet. 2021;17(4): e1008973.

  329. Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet. 2019;51(3):568–76.

  330. Gleason KJ, Yang F, Pierce BL, He X, Chen LS. Primo: integration of multiple GWAS and omics QTL summary statistics for elucidation of molecular mechanisms of trait-associated SNPs and detection of pleiotropy in complex traits. Genome Biol. 2020;21(1):236.

  331. Wu Y, Qi T, Wray NR, Visscher PM, Zeng J, Yang J. Joint analysis of GWAS and multi-omics QTL summary statistics reveals a large fraction of GWAS signals shared with molecular phenotypes. Cell Genom. 2023;3(8): 100344.

  332. Zhang Z, Bae YE, Bradley JR, Wu L, Wu C. SUMMIT: An integrative approach for better transcriptomic data imputation improves causal gene identification. Nat Commun. 2022;13(1):6336.

  333. Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet. 2021;30(10):939–51.

  334. Luningham JM, Chen J, Tang S, De Jager PL, Bennett DA, Buchman AS, et al. Bayesian Genome-wide TWAS Method to Leverage both cis- and trans-eQTL Information through Summary Statistics. Am J Hum Genet. 2020;107(4):714–26.

  335. Dutta D, He Y, Saha A, Arvanitis M, Battle A, Chatterjee N. Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood. Nat Commun. 2022;13(1):4323.

  336. Nagpal S, Meng X, Epstein MP, Tsoi LC, Patrick M, Gibson G, et al. TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits. Am J Hum Genet. 2019;105(2):258–66.

  337. Chatzinakos C, Georgiadis F, Lee D, Cai N, Vladimirov VI, Docherty A, et al. TWAS pathway method greatly enhances the number of leads for uncovering the molecular underpinnings of psychiatric disorders. Am J Med Genet B Neuropsychiatr Genet. 2020;183(8):454–63.

  338. Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat Genet. 2019;51(4):675–82.

  339. Zhu H, Zhou X. Transcriptome-wide association studies: a view from Mendelian randomization. Quant Biol. 2021;9(2):107–21.

  340. Zhu A, Matoba N, Wilson EP, Tapia AL, Li Y, Ibrahim JG, et al. MRLocus: Identifying causal genes mediating a trait through Bayesian estimation of allelic heterogeneity. PLoS Genet. 2021;17(4): e1009455.

  341. Porcu E, Rüeger S, Lepik K, Santoni FA, Reymond A, Kutalik Z. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat Commun. 2019;10(1):3300.

  342. Gleason KJ, Yang F, Chen LS. A robust two-sample transcriptome-wide Mendelian randomization method integrating GWAS with multi-tissue eQTL summary statistics. Genet Epidemiol. 2021;45(4):353–71.

  343. Al-Barghouthi BM, Rosenow WT, Du KP, Heo J, Maynard R, Mesner L, et al. Transcriptome-wide association study and eQTL colocalization identify potentially causal genes responsible for human bone mineral density GWAS associations. Elife. 2022;11.

  344. Plagnol V, Smyth DJ, Todd JA, Clayton DG. Statistical independence of the colocalized association signals for type 1 diabetes and RPS26 gene expression on chromosome 12q13. Biostatistics. 2009;10(2):327–34.

  345. Wallace C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 2021;17(9): e1009440.

  346. Giambartolomei C, Zhenli Liu J, Zhang W, Hauberg M, Shi H, Boocock J, et al. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics. 2018;34(15):2538–45.

  347. Foley CN, Staley JR, Breen PG, Sun BB, Kirk PDW, Burgess S, et al. A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat Commun. 2021;12(1):764.

  348. Wang F, Panjwani N, Wang C, Sun L, Strug LJ. A flexible summary statistics-based colocalization method with application to the mucin cystic fibrosis lung disease modifier locus. Am J Hum Genet. 2022;109(2):253–69.

  349. Liu J, Wan X, Wang C, Yang C, Zhou X, Yang C. LLR: a latent low-rank approach to colocalizing genetic risk variants in multiple GWAS. Bioinformatics. 2017;33(24):3878–86.

  350. King EA, Dunbar F, Davis JW, Degner JF. Estimating colocalization probability from limited summary statistics. BMC Bioinformatics. 2021;22(1):254.

  351. Kuksa PP, Lee CY, Amlie-Wolf A, Gangadharan P, Mlynarski EE, Chou YF, et al. SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants. Bioinformatics. 2020;36(12):3879–81.

  352. Zheng J, Haberland V, Baird D, Walker V, Haycock PC, Hurle MR, et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat Genet. 2020;52(10):1122–31.

  353. Chen BY, Bone WP, Lorenz K, Levin M, Ritchie MD, Voight BF. ColocQuiaL: a QTL-GWAS colocalization pipeline. Bioinformatics. 2022;38(18):4409–11.

  354. Hormozdiari F, van de Bunt M, Segrè AV, Li X, Joo JWJ, Bilow M, et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet. 2016;99(6):1245–60.

  355. Ji Y, Wei Q, Chen R, Wang Q, Tao R, Li B. Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery. PLoS Genet. 2022;18(6): e1009814.

  356. Zhang W, Lu T, Sladek R, Li Y, Najafabadi HS, Dupuis J. SharePro: an accurate and efficient genetic colocalization method accounting for multiple causal signals. bioRxiv. 2023:2023.07.24.550431.

  357. Shi H, Burch KS, Johnson R, Freund MK, Kichaev G, Mancuso N, et al. Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data. Am J Hum Genet. 2020;106(6):805–17.

  358. He X, Fuller CK, Song Y, Meng Q, Zhang B, Yang X, et al. Sherlock: detecting gene-disease associations by matching patterns of expression QTL and GWAS. Am J Hum Genet. 2013;92(5):667–80.

  359. Panjwani N, Wang F, Mastromatteo S, Bao A, Wang C, He G, et al. LocusFocus: Web-based colocalization for the annotation and functional follow-up of GWAS. PLoS Comput Biol. 2020;16(10): e1008336.

  360. Zhang T, Klein A, Sang J, Choi J, Brown KM. ezQTL: A Web Platform for Interactive Visualization and Colocalization of QTLs and GWAS Loci. Genomics Proteomics Bioinformatics. 2022;20(3):541–8.

  361. Lamparter D, Bhatnagar R, Hebestreit K, Belgard TG, Zhang A, Hanson-Smith V. A framework for integrating directed and undirected annotations to build explanatory models of cis-eQTL data. PLoS Comput Biol. 2020;16(6): e1007770.

  362. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet. 2017;101(1):5–22.

  363. Schultheiss SJ, Münch MC, Andreeva GD, Rätsch G. Persistence and availability of Web services in computational biology. PLoS ONE. 2011;6(9): e24914.

  364. Veretnik S, Fink JL, Bourne PE. Computational biology resources lack persistence and usability. PLoS Comput Biol. 2008;4(7): e1000136.

  365. Wren JD. 404 not found: the stability and persistence of URLs published in MEDLINE. Bioinformatics. 2004;20(5):668–72.

  366. Kern F, Fehlmann T, Keller A. On the lifetime of bioinformatics web services. Nucleic Acids Res. 2020;48(22):12523–33.

  367. Taschuk M, Wilson G. Ten simple rules for making research software more robust. PLoS Comput Biol. 2017;13(4): e1005412.

  368. Brazas MD, Yim D, Yeung W, Ouellette BF. A decade of Web Server updates at the Bioinformatics Links Directory: 2003–2012. Nucleic Acids Res. 2012;40(Web Server issue):W3–W12.

  369. Chakiachvili M, Milanesi S, Arigon Chifolleau AM, Lefort V. WAVES: a web application for versatile enhanced bioinformatic services. Bioinformatics. 2019;35(1):140–2.

  370. Daniluk P, Wilczyński B, Lesyng B. WeBIAS: a web server for publishing bioinformatics applications. BMC Res Notes. 2015;8:628.

  371. Jia L, Yao W, Jiang Y, Li Y, Wang Z, Li H, et al. Development of interactive biological web applications with R/Shiny. Brief Bioinform. 2022;23(1).

  372. Joppich M, Zimmer R. From command-line bioinformatics to bioGUI. PeerJ. 2019;7: e8111.

  373. Kadri S, Sboner A, Sigaras A, Roy S. Containers in Bioinformatics: Applications, Practical Considerations, and Best Practices in Molecular Pathology. J Mol Diagn. 2022;24(5):442–54.

  374. Williams CL, Sica JC, Killen RT, Balis UG. The growing need for microservices in bioinformatics. J Pathol Inform. 2016;7:45.

  375. Boettiger C. An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review. 2015;49(1):71–9.

  376. Gomes J, Bagnaschi E, Campos I, David M, Alves L, Martins J, et al. Enabling rootless Linux Containers in multi-user environments: the udocker tool. Comput Phys Commun. 2018;232:84–97.

  377. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL. The human disease network. Proc Natl Acad Sci U S A. 2007;104(21):8685–90.

  378. Kontou PI, Pavlopoulou A, Dimou NL, Pavlopoulos GA, Bagos PG. Network analysis of genes and their association with diseases. Gene. 2016;590(1):68–78.

  379. Corrigendum to: Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience. 2020;9(1).

  380. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 2011;4:10.

Acknowledgements

The authors would like to thank the anonymous reviewers whose comments and constructive criticism helped in improving the quality of the manuscript.

Funding

This work is funded by the project “Bridging big omic, genetic and medical data for Precision Medicine implementation in Greece” (TAEDR-0539180), which is carried out within the framework of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union – NextGenerationEU.

Author information

Contributions

PK: Investigation, Methodology, Data Curation, Visualization. PB: Conceptualization, Supervision, Investigation, Methodology, Data Curation, Visualization. PK and PB wrote parts of the manuscript and have read and approved the final manuscript.

Corresponding author

Correspondence to Pantelis G. Bagos.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Kontou, P.I., Bagos, P.G. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Mining 17, 31 (2024). https://doi.org/10.1186/s13040-024-00385-x

  • DOI: https://doi.org/10.1186/s13040-024-00385-x

Keyword