Required Input Data Frames | ||
GWAS.df, a data frame, one row per SNP, with columns as one might obtain from a genome-wide association study performed in PLINK using either the --logistic or --linear flags | ||
Column Name | Data type | Description |
CHR | Integer | Chromosome for SNP (sex chromosomes coded numerically) |
BP | Integer | Chromosomal position for each SNP, in base pairs |
SNP | Character | Variant ID (such as a dbSNP ID, “rs...”). (Note: the naming scheme must match the one used in eQTL.df to ensure proper SNP matching) |
P | Numeric | P-value for the SNP from GWAS analysis |
BETA | Numeric | Beta for SNP from GWAS analysis |
PHE (Optional) | Character | Name of the phenotype to which the GWAS data refer. This column is optional and is useful if your GWAS.df contains data for multiple phenotypes, such as one might obtain from a PheWAS. If GWAS.df does not contain a “PHE” column, eQTpLot will assume all the supplied GWAS data are for a single phenotype, whose name is specified with the “trait” argument. |
eQTL.df, a data frame, one row per SNP, with columns as one might download directly from the GTEx Portal in .csv format | ||
Column Name | Data type | Description |
SNP.Id | Character | Variant ID (such as a dbSNP ID, “rs...”). (Note: the naming scheme must match the one used in GWAS.df to ensure proper matching) |
Gene.Symbol | Character | Gene symbol to which the eQTL expression data refers (Note: gene symbol must match entries in Genes.df to ensure proper matching) |
P.value | Numeric | P-value for the SNP from eQTL analysis |
NES | Numeric | Normalized effect size for the SNP from eQTL analysis (per GTEx, defined as the slope of the linear regression, computed as the effect of the alternative allele relative to the reference allele in the human genome reference) |
Tissue | Character | Tissue type to which the eQTL p-value/NES refer (Note: eQTL.df can contain multiple tissue types) |
N (Optional) | Numeric | The number of samples used to calculate the p-value and NES for the eQTL data. This value is used when performing a MultiTissue or PanTissue analysis with the option CollapseMethod set to “meta” for a simple sample-size-weighted meta-analysis. |
Optional Input Data Frames | ||
Genes.df, an optional data frame, one row per gene, with the following columns (Note: eQTpLot automatically loads a default Genes.df containing information for most protein-coding genes for genomic builds hg19 and hg38, but you may wish to specify your own Genes.df data frame if your gene of interest is not included in the default data frame, or if your eQTL data use a different gene naming scheme (for example, Gencode ID instead of gene symbol)) | ||
Column Name | Data type | Description |
Gene | Character | Gene symbol/name (Note: gene naming scheme must match entries in eQTL.df to ensure proper matching) |
CHR | Integer | Chromosome the gene is on (Note: do not include a “chr” prefix, and sex chromosomes should be coded numerically) |
Start | Integer | Base pair coordinate of the beginning of the gene (Note: this should be the smaller of the two values between Start and Stop) |
Stop | Integer | Base pair coordinate of the end of the gene (Note: this should be the larger of the two values between Start and Stop) |
Build | Character, “hg19” or “hg38” | The genome build (either hg19 or hg38) for the location data |
LD.df, an optional data frame of SNP linkage data, one row per SNP pair, with columns as one might obtain from a PLINK linkage disequilibrium analysis using the PLINK --r2 option. (Note: If no LD.df is supplied, eQTpLot will plot data without LD information) | ||
Column Name | Data type | Description |
BP_A | Integer | Base pair position of the first variant in the LD pair |
SNP_A | Character | Variant ID of the first variant in the LD pair (Note: only variants that also appear in the GWAS.df SNP column will be used for LD analysis) |
BP_B | Integer | Base pair position of the second variant in the LD pair |
SNP_B | Character | Variant ID of the second variant in the LD pair (Note: only variants that also appear in the GWAS.df SNP column will be used for LD analysis) |
R2 | Numeric | Squared correlation measure of linkage between the two variants |
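
The column requirements listed above can be checked before any plotting is attempted. The following is a minimal sketch in Python, assuming the tables have been exported to files with header rows (for example, read with `csv.DictReader`). It is not part of eQTpLot: the names `REQUIRED_COLUMNS`, `missing_columns`, and `shared_snp_ids` are hypothetical helpers introduced here for illustration, and the variant IDs in the usage note are toy values.

```python
# Hypothetical pre-flight checks; these helpers are NOT part of eQTpLot.
# Required (non-optional) columns, taken from the tables above.
REQUIRED_COLUMNS = {
    "GWAS.df": {"CHR", "BP", "SNP", "P", "BETA"},
    "eQTL.df": {"SNP.Id", "Gene.Symbol", "P.value", "NES", "Tissue"},
    "Genes.df": {"Gene", "CHR", "Start", "Stop", "Build"},
    "LD.df": {"BP_A", "SNP_A", "BP_B", "SNP_B", "R2"},
}


def missing_columns(table_name, header):
    """Return the sorted list of required columns absent from `header`."""
    return sorted(REQUIRED_COLUMNS[table_name] - set(header))


def shared_snp_ids(gwas_rows, eqtl_rows):
    """Return variant IDs found in both GWAS.df (SNP) and eQTL.df (SNP.Id).

    Each argument is an iterable of dict-like rows, such as those
    produced by csv.DictReader.
    """
    gwas_ids = {row["SNP"] for row in gwas_rows}
    eqtl_ids = {row["SNP.Id"] for row in eqtl_rows}
    return gwas_ids & eqtl_ids
```

For example, `missing_columns("GWAS.df", ["CHR", "BP", "SNP", "P"])` reports that `BETA` is absent, and an empty result from `shared_snp_ids` suggests that the two tables use different variant naming schemes.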