Table 1 Description of required and optional input data frames for eQTpLot

From: eQTpLot: a user-friendly R package for the visualization of colocalization between eQTL and GWAS signals

Required Input Data Frames

GWAS.df, a data frame, one row per SNP, with columns as one might obtain from a genome-wide association study performed in PLINK using either the --logistic or --linear flags

| Column Name | Data type | Description |
| --- | --- | --- |
| CHR | Integer | Chromosome for the SNP (sex chromosomes coded numerically) |
| BP | Integer | Chromosomal position for each SNP, in base pairs |
| SNP | Character | Variant ID (such as a dbSNP ID, “rs...”). (Note: the naming scheme must be the same as the one used in eQTL.df to ensure proper SNP matching) |
| P | Numeric | P-value for the SNP from GWAS analysis |
| BETA | Numeric | Beta for the SNP from GWAS analysis |
| PHE (Optional) | Character | Name of the phenotype to which the GWAS data refer. This column is optional and is useful if your GWAS.df contains data for multiple phenotypes, such as one might obtain from a PheWAS. If GWAS.df does not contain a “PHE” column, eQTpLot will assume all the supplied GWAS data are for a single phenotype, with a name to be specified with the “trait” argument. |
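
The column requirements for GWAS.df can be checked before the data are handed to eQTpLot. The following is a minimal sketch in Python (it is not part of eQTpLot, which is an R package) that assumes the GWAS results have already been read into a list of dictionaries of strings, for example with `csv.DictReader`. The function name `check_gwas_rows` and the SNP IDs in the sample rows are hypothetical.

```python
# Required GWAS.df columns and the Python type each value should parse as.
REQUIRED_GWAS_COLUMNS = {"CHR": int, "BP": int, "SNP": str, "P": float, "BETA": float}


def check_gwas_rows(rows):
    """Return a list of human-readable problems found in the GWAS rows."""
    problems = []
    for i, row in enumerate(rows):
        for name, caster in REQUIRED_GWAS_COLUMNS.items():
            if name not in row:
                problems.append(f"row {i}: missing column {name}")
                continue
            try:
                caster(row[name])  # e.g. int("X") raises ValueError
            except ValueError:
                problems.append(f"row {i}: {name}={row[name]!r} is not {caster.__name__}")
    return problems


# Hypothetical sample rows; the second one codes a sex chromosome as a letter.
rows = [
    {"CHR": "11", "BP": "65000000", "SNP": "rs0000001", "P": "1e-8", "BETA": "0.12"},
    {"CHR": "X", "BP": "65000500", "SNP": "rs0000002", "P": "0.03", "BETA": "-0.05"},
]
problems = check_gwas_rows(rows)
```

Here the second row is flagged because CHR must be an integer, which is why sex chromosomes have to be coded numerically.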

eQTL.df, a data frame, one row per SNP, with columns as one might download directly from the GTEx Portal in .csv format

| Column Name | Data type | Description |
| --- | --- | --- |
| SNP.Id | Character | Variant ID (such as a dbSNP ID, “rs...”). (Note: the naming scheme must be the same as the one used in GWAS.df to ensure proper matching) |
| Gene.Symbol | Character | Gene symbol to which the eQTL expression data refer (Note: the gene symbol must match entries in Genes.df to ensure proper matching) |
| P.value | Numeric | P-value for the SNP from eQTL analysis |
| NES | Numeric | Normalized effect size for the SNP from eQTL analysis (per GTEx, defined as the slope of the linear regression and computed as the effect of the alternative allele relative to the reference allele in the human genome reference) |
| Tissue | Character | Tissue type to which the eQTL p-value/NES refer (Note: eQTL.df can contain multiple tissue types) |
| N (Optional) | Numeric | The number of samples used to calculate the p-value and NES for the eQTL data. This value is used if performing a MultiTissue or PanTissue analysis with the option CollapseMethod set to “meta” for a simple sample-size-weighted meta-analysis. |
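
To illustrate why the N column matters, the sketch below shows one common form of sample-size-weighted meta-analysis, the weighted Z-score (Stouffer) method, applied to the P.value, NES, and N values of a single SNP-gene pair across tissues. This is an assumption for illustration only: the table does not state which formula eQTpLot uses, and the function name `weighted_z_meta` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def weighted_z_meta(records):
    """Combine per-tissue eQTL results for one SNP-gene pair.

    records: iterable of (p_value, nes, n) tuples with 0 < p_value <= 1.
    Returns the combined two-sided p-value.
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p_value, nes, n in records:
        # Convert the two-sided p-value to |z|, then attach the sign of the effect.
        z = -_STD_NORMAL.inv_cdf(p_value / 2.0)
        if nes < 0:
            z = -z
        weight = sqrt(n)  # assumption: weight each tissue by sqrt(sample size)
        numerator += weight * z
        sum_sq_weights += weight ** 2
    combined_z = numerator / sqrt(sum_sq_weights)
    return 2.0 * _STD_NORMAL.cdf(-abs(combined_z))


# A single tissue should return its own p-value unchanged.
single = weighted_z_meta([(0.05, 0.3, 100)])
# Equal-sized tissues with opposite effect directions cancel out.
cancelled = weighted_z_meta([(0.01, 0.4, 200), (0.01, -0.4, 200)])
```

Under this scheme, tissues with larger sample sizes contribute more to the combined statistic, and effects in opposite directions offset each other because the sign of NES is carried into the Z-score.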

Optional Input Data Frames

Genes.df, an optional data frame, one row per gene, with the following columns (Note: eQTpLot automatically loads a default Genes.df containing information for most protein-coding genes for genomic builds hg19 and hg38, but you may wish to specify your own Genes.df data frame if your gene of interest is not included in the default data frame, or if your eQTL data use a different gene naming scheme (for example, Gencode ID instead of gene symbol))

| Column Name | Data type | Description |
| --- | --- | --- |
| Gene | Character | Gene symbol/name (Note: gene naming scheme must match entries in eQTL.df to ensure proper matching) |
| CHR | Integer | Chromosome the gene is on (Note: do not include a “chr” prefix, and sex chromosomes should be coded numerically) |
| Start | Integer | Base pair coordinate of the beginning of the gene (Note: this should be the smaller of the two values between Start and Stop) |
| Stop | Integer | Base pair coordinate of the end of the gene (Note: this should be the larger of the two values between Start and Stop) |
| Build | Character, “hg19” or “hg38” | The genome build (either hg19 or hg38) for the location data |
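
When supplying a custom Genes.df, the three notes in the table (no “chr” prefix, Start smaller than Stop, Build restricted to two values) are easy to verify per row. The following Python sketch is not part of eQTpLot; it assumes each gene is held as a dictionary with Start and Stop already parsed as integers, and the function name `check_gene_row` and the gene name `GENE1` are hypothetical.

```python
def check_gene_row(row):
    """Return a list of problems for one Genes.df row (a dict)."""
    problems = []
    if str(row["CHR"]).lower().startswith("chr"):
        problems.append("CHR has a 'chr' prefix")
    if not row["Start"] < row["Stop"]:
        problems.append("Start is not smaller than Stop")
    if row["Build"] not in ("hg19", "hg38"):
        problems.append("Build is not 'hg19' or 'hg38'")
    return problems


# Hypothetical rows: the first is well formed, the second breaks all three notes.
good_gene = {"Gene": "GENE1", "CHR": 11, "Start": 1000, "Stop": 5000, "Build": "hg19"}
bad_gene = {"Gene": "GENE1", "CHR": "chr11", "Start": 5000, "Stop": 1000, "Build": "GRCh37"}

good_problems = check_gene_row(good_gene)
bad_problems = check_gene_row(bad_gene)
```

A gene on the reverse strand is a typical source of the Start/Stop problem, since some annotation sources list its coordinates in transcription order.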

LD.df, an optional data frame of SNP linkage data, one row per SNP pair, with columns as one might obtain from a PLINK linkage disequilibrium analysis using the PLINK --r2 option. (Note: If no LD.df is supplied, eQTpLot will plot data without LD information)

| Column Name | Data type | Description |
| --- | --- | --- |
| BP_A | Integer | Base pair position of the first variant in the LD pair |
| SNP_A | Character | Variant ID of the first variant in the LD pair (Note: only variants that also appear in the GWAS.df SNP column will be used for LD analysis) |
| BP_B | Integer | Base pair position of the second variant in the LD pair |
| SNP_B | Character | Variant ID of the second variant in the LD pair (Note: only variants that also appear in the GWAS.df SNP column will be used for LD analysis) |
| R2 | Numeric | Squared correlation measure of linkage between the two variants |
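
Because only variants present in the GWAS.df SNP column are used for LD analysis, it can be useful to see in advance how many LD pairs will survive. The Python sketch below is not part of eQTpLot; it assumes LD rows are dictionaries keyed by the column names above, and it assumes that a pair is usable only when both of its variants appear in GWAS.df. The function name `filter_ld_pairs` and all SNP IDs are hypothetical.

```python
def filter_ld_pairs(ld_rows, gwas_snps):
    """Keep LD pairs whose two variants both appear among the GWAS SNP IDs."""
    gwas_snps = set(gwas_snps)  # set lookup keeps the filter linear in the number of pairs
    return [
        row for row in ld_rows
        if row["SNP_A"] in gwas_snps and row["SNP_B"] in gwas_snps
    ]


# Hypothetical data: rs0000003 is absent from the GWAS SNP list.
gwas_snps = ["rs0000001", "rs0000002"]
ld_rows = [
    {"BP_A": 100, "SNP_A": "rs0000001", "BP_B": 250, "SNP_B": "rs0000002", "R2": 0.91},
    {"BP_A": 100, "SNP_A": "rs0000001", "BP_B": 400, "SNP_B": "rs0000003", "R2": 0.40},
]
usable_pairs = filter_ld_pairs(ld_rows, gwas_snps)
```

If very few pairs remain, the usual cause is a mismatch in variant naming between the LD file and GWAS.df rather than a true absence of LD data.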