This article has Open Peer Review reports available.
Integrative genomics and transcriptomics analysis of human embryonic and induced pluripotent stem cells
© Laurila et al.; licensee BioMed Central. 2014
Received: 16 January 2014
Accepted: 4 December 2014
Published: 13 December 2014
Human genomic variations, including single nucleotide polymorphisms (SNPs) and copy number variations (CNVs), are associated with several phenotypic traits varying from mild features to hereditary diseases. Several genome-wide studies have reported genomic variants that correlate with gene expression levels in various tissue and cell types.
We studied human embryonic stem cells (hESCs) and human induced pluripotent stem cells (hiPSCs) measuring the SNPs and CNVs with Affymetrix SNP 6 microarrays and expression values with Affymetrix Exon microarrays. We computed the linear relationships between SNPs and expression levels of exons, transcripts and genes, and the associations between gene CNVs and gene expression levels. Further, for a few of the resulted genes, the expression value was associated with both CNVs and SNPs. Our results revealed altogether 217 genes and 584 SNPs whose genomic alterations affect the transcriptome in the same cells. We analyzed the enriched pathways and gene ontologies within these groups of genes, and found out that the terms related to alternative splicing and development were enriched.
Our results revealed that in the human pluripotent stem cells, the expression values of several genes, transcripts and exons were affected due to the genomic variation.
KeywordshESC hiPSC Association analysis SNP CNV Gene expression Exon expression Transcript expression
After sequencing the human genome, numerous projects have focused on characterizing genomic alterations and associating them with the different diseases or functional elements of the genome. For example, the focus of two such projects, HapMap project  and 1000 genomes , is to identify the variants in the human genome. These variants are diverse and include e.g. single nucleotide polymorphisms (SNPs), insertions, deletions and copy number variations (CNVs) that comprise together 0.1% of the genome . Moreover, the variants cause different types of phenotypic traits varying from mild properties such as eye color to severe hereditary diseases. These traits can be the consequences of alterations that are directly changing the protein function, or they can emerge after several gene expression regulation steps, caused for example by alternative splicing or methylation. Indeed, recent studies have shown that alterations in SNPs are of great importance as they can affect gene expression levels, alternative splicing, DNA methylation and miRNA-mediated gene expression levels in different types of cells -. Similarly, CNVs have been associated with changes in gene expression values in various cell types -.
Several genome-wide studies have reported differences in cis-acting genomic variations between individuals and populations, in different cell types -, and SNPs have also been associated with transcript isoform variation and alternative splicing -. Further, it has been reported that intronic SNPs are associated with both exon skipping events and complex traits, and that they are also predicted to result in protein domain changes . Moreover, the correlation between SNPs and alternative splicing of exons is found to be the strongest at the exon-intron boundary, and the SNP closest to the alternative splicing event is most likely the functional one .
Genetic variants have been found to affect chromatin accessibility and transcription factor binding resulting in gene expression changes and phenotypic variation . To that end, eQTLs are often (50%) DNase I sensitivity quantitative trait loci (dsQTLs) and majority of dsQTLs are located near genes . Some of the cis-acting variations are similar across various cell types, while most of them can be detected in only some tissue and cell types . These cell type specific variations differ between separate differentiation stages as has been shown with the cells of the hemapoietic system, using stem cells, progenitor cells, and differentiated cells of the myeloid and erythroid lineages . On the other hand, when comparing human induced pluripotent stem cells (hiPSCs) with cells with less potency during the differentiation process, several autosomal allele-specific gene expressions remained similar during the differentiation process and were more dependent on genotypes than cell types even though more genes were expressed in hiPSCs . However, similar behavior could not be detected in X chromosomal regions .
Despite the importance of the associations between the SNPs and other measurements, they have not yet been studied in human pluripotent stem cells. As most association studies are linking SNPs to a specific disease, there is no such known phenotype with the stem cells. The SNP arrays have been used in several studies of pluripotent stem cells and they have been mostly utilized for the copy number analysis ,-. By associating the SNPs to the gene expression levels we believe we can find new insights to the behavior of the pluripotent stem cells. As stem cells hold promise for the future medicine, the possible aberrations in the genome level with their associations to the transcriptomics must be recognized before the cells can be safely used for example in stem cell therapy .
In this study, we detected the effects of SNPs on expression values of both human embryonic stem cells (hESCs) and hiPSCs that are derived from fibroblasts. Further, we analyzed the associations between the gene copy numbers and gene expression values in hiPSCs. Similar associations between CNVs and gene expressions in hESCs have been reported earlier . Previously, it has been shown that when different variants are associated with the gene expression levels, only a small part (<2%) of the SNPs (associated with 84% of gene expression differences) and CNVs (associated with 18%) overlap . We studied the correlations between the SNPs and the expression levels of gene, transcript and exon expressions and the associations between the gene copy numbers and gene expressions. Further, we performed downstream analyses of the resulting cis-acting pairs and detected the overlapping expression changes associated with SNPs and CNVs.
Exon array hybridizations
For measuring the expression values of the hiPSC samples, the RNA was isolated using RNeasy Kit (Qiagen) and DNase I (Qiagen) digestion was performed to eliminate DNA from RNA samples. The concentration of the samples was measured with Nanodrop and the expression values of all hESC, hiPSC and fibroblast samples were measured with the Affymetrix GeneChip Human Exon 1.0 ST arrays. All the samples were hybridized in the Finnish Microarray and Sequencing Centre (Turku, Finland) according to manufacturer's protocol as described in  and .
Effect of gene copy number on expression values
The CNVs of the genes were detected with Affymetrix Genotyping Console (3.0.2) utilizing the Birdseed v2 algorithm and were analyzed against the 40 in house hybridized HapMap samples (available in GSE15097) as reported in . We used the regional GC correction, and the variations with at least five markers and length 10 kb in hiPSCs  and 50 kb in hESCs  were considered to have a CNV. All the variations were linked to Ensembl genes (build 49, corresponds the genome version hg18). The gene expression values of the exon array samples were computed with the aroma.affymetrix package of Bioconductor  in R . The probe values were directly linked to Ensembl genes (build 49) with the CDF files provided by aroma.affymetrix and preprocessed with the RMA method . We performed the integration analysis for all the genes having a duplication or deletion in copy number in at least one sample. All the gene values in each sample were labeled based on the copy number value as gain, normal or loss. In the integration analysis, for each gene we computed a weight value where the m G1 is the mean expression value and std G1 is the standard deviation of the gene i in those samples where the gene i was detected to be gained, and m G0 and std G0 are the mean and standard deviation of the expression values of the gene in those samples where the copy number of the gene has not altered ,,-. The weight value corresponds to the difference between the groups. The weight value is high in cases where the distance between the mean values of the groups is large and the deviations within the groups are small. Therefore, high weight value for duplication indicates that gene expression is likely to be over expressed due to duplication. Similarly, we computed the weight values for the genes with the loss in their CNV. We computed the p-values for each weight value by 10000 permutations, by permuting the sample labels based on the permutation events. Further, the p-values were adjusted with the Benjamini-Hochberg method.
Effect of SNPs on expression values
Exon array data were analyzed separately in the levels of exons, transcripts and genes using custom CDF files, version 11  linking the probes to Ensembl (build 49, corresponds the genome version hg18) genes, transcripts and exons. Additionally, several probes providing possibly unreliable information were filtered out before the actual correlation analysis. In the first filtering step, probes with low expression were considered to be background probes and thus filtered out. The limit for the background probes in Human Exon 1.0 ST Array (26 445 gene / 45 439 transcript / 23 2376 exon probesets) was separately defined for every G + C content using the antigenomic probes of the exon array. This limit was the average plus two standard deviations of the intensity of the antigenomic probes, and for each genomic probe of the array the maximum probe intensity across all the samples was compared with the limit to qualify the probe between the background and non-background probes . In addition to the filtering of the low intensity probes, the probes that could be cross-hybridized were filtered out . Moreover, since single nucleotide polymorphisms (SNPs) at probe sequences can affect the intensities of probes -, especially they can severely bias exon expression estimates in individuals, not when using pooled samples , we filtered out the probes whose sequences had an SNP (SNP locations were determined using dbSNP version 129 ). As some of the exons are small, this QC filtering can result in removing some SNP-exon pairs, in which an SNP is located inside the exon and could contain a real association signal. Nevertheless, this additional QC step mainly removes false positives. Finally, the probesets with only one or two probes were masked out to avoid the unreliability of the expression values . This filtering dramatically reduced the number of analyzed probesets. For example in the gene level analysis, the original 26 445 different probesets are reduced to 17 742 due to the filtering and in the exon level analysis the effect is even larger when more than half of the probesets are filtered out (Figure 2). After the filtering, the expression values were computed with the Robust Multi-array Average (RMA)-normalization method  of the Bioconductor's affy –package .
SNP 6.0 arrays were analyzed using the Bioconductor oligo package . First, the array data was normalized using the SNPRMA normalization and the genotypes for each of the 906 600 SNPs of the array were determined with the Correct Robust Linear Model with Maximum likelihood based distance (CRLMM) method . Further, the SNPs with less than 25% of genetic variation among the samples, were filtered out as well as the probes with a confidence value smaller than 0.9 in at least one of the samples. Remarkably, the number of filtered SNPs varied between the analyzed groups (hiPSCs, hESCs and all samples) as the proportion of the genetic variation differs in them. In the cases where an SNP was filtered out from the analysis for the whole data set (i.e. 20 samples) it was removed of the hiPSCs or hESCs groups also. Finally, we filtered out the SNPs occurring in the regions reported to have copy number changes in hESCs  and in hiPSCs . Likewise, we excluded the SNPs within regions of the gained chromosomes of the samples having a mosaic karyotype.
After the data preparation for the SNPs and expressions, the SNPs were linked to exons, transcripts and genes. For the exons the linking was performed with two different ways; the SNP was considered to be linked to the studied exon if the polymorphism occurred 1) in the exon sequence or in the adjacent intron regions (short interval) or 2) in the whole gene area (long interval). An SNP located within the transcript region was considered to be linked to a transcript, as well as to a gene if located within or 5000 bps up- or downstream of the gene region. The linear relationships were detected with correlation analysis separately performed for each SNP - exon / transcript / gene pair by fitting a linear model for data (homozygous genotypes had values 1 and 3 and the heterozygous one had the value 2) by linear regression analysis with R's lm function. The correlation p-values were adjusted with Benjamini-Hochberg multiple testing correction method.
For validation, we studied the overlap between our associations and associations in rSNPBase . We studied also the relation of the expression correlating SNPs with transcription factors. Specifically, the possible binding of the key embryonic stem cell transcription factors (NANOG, OCT4, SOX2, E2F4) to the SNP-regions was explored. This was performed by searching the overlapping SNP locations and the parts of the reported transcription factor binding sites (TFBSs) for the key stem cell factors identified with chromatin immunoprecipitation microarray (ChIP-chip) . We further studied possible overlap between TFBSs and the SNPs by predicting the transcription factor (TF) binding to the SNP regions. The prediction of TF binding was computed using the TRANSFAC (Release 2010.2)  human binding motif position weight matrices (PWMs) (altogether 618 models) with the pseudocount 0.005. The binding was scored for the sequences of both SNP alleles by sliding the PWM over the flanking SNP sequence and computing the maximum binding score using uniform background probabilities . Only the sequences with at least 80% of the maximum possible binding score on either of the alleles were further studied by computing the absolute difference of the binding scores between the alleles. We also studied if the genes, whose expressions were correlated with SNPs, showed enrichments in different pathways, networks or annotations. These analyses were performed with core analysis of Ingenuity Pathway Analysis (IPA) with Benjamini-Hochberg multiple testing correction method and with the Database for Annotation, Visualization and Integrated Discovery system (DAVID) ,.
Results and discussion
Copy number variation association with gene expression in hiPSCs
Top genes with highest FC change between the average gene expression values between the gained and normal and lost and normal samples
Logarithmic fold change
Glutathione S-transferase theta-1
Calcium-activated potassium channel
RING finger and transmembrane
Amyloid protein-binding protein
Structural maintenance of chromosomes exible hinge domain-containing protein 1
Kinetochore protein NDC80 homolog
HEAT repeat-containing protein
HLA class II histocompatibility antigen, DRB5 beta chain precursor
Protein phosphatase 1D
Podoplanin precursor (Glycoprotein 36)
Alkaline phosphatase, tissue nonspecific isozyme precursor
Natriuretic peptides B precursor
Calcipressin-3 (Regulator of calcineurin 3)
DNA fragmentation factor subunit alpha
Tumor protein p63-regulated gene 1-like protein
Protein DDI1 homolog 2
Mitotic spindle assembly checkpoint protein MAD2B
Lysine-specific histone demethylase 1
ATPase family AAA
For the most of these 217 genes having association in hiPSC, the copy number of the gene is altered only in one sample. Especially the clear majority of the genes that are detected to have an association between the loss and low expression are locating in 1p36 which is only lost in the sample FiPS3-12 (Additional file 1: Table S1, Additional file 2: Figure S1). This deletion does not occur in the parental fibroblast sample. However, there are many genes whose copy numbers have altered in several samples, such as SUMF1 (ENSG00000144455, Sulfatase-modifying factor 1 precursor) and CFHR1 (ENSG00000080910, Complement factor H related protein 1 precursor) that are lost in the fibroblast sample HFF and in all the four hiPSC samples derived from HFF, and also further significantly associated with the low expression value in hiPSCs. In addition, genes TBC1D3F (ENSG00000189309, protein coding TBC1 domain family member 3B) and TBC1D3C (ENSG00000205019) gained in the IMR fibroblast sample are copied not only in the hiPSCs derived from IMR but also in hiPSCs derived from other fibroblasts, and are further having a significant association between the gained copy number and a high expression value in hiPSC samples. The gene HLA-DRB5, (ENSG00000198502, HLA class II histocompatibility antigen) which is gained in the fibroblast samples HFF and MRC5, is also gained in all the hiPSCs derived from HFF and MRC5, and has further an association between the gain and the expression value. Interestingly, the gene DEFB4 (ENSG00000171711, Beta-defensin 2 precursor) is gained in three hiPSC samples and is significantly associated to a high expression value even though the copy number in all fibroblast samples is normal. In addition, there are two genes SLC30A6 (ENSG00000152683, protein coding Zinc transporter 6) and TUBA8 (ENSG00000183785, protein coding Tubulin alpha-8 chain) that are gained in all of the fibroblast and hiPSC samples. In total, 4% of the genes detected with an association between the changed copy number and expression value had a copy number variation already in the fibroblast samples (i.e. 5/29 of the gained associations and 3/188 of the lost associations), while 96% of these associated genes had a normal copy number value in fibroblasts. Interestingly, our analysis detected genes for which the CNV in their region is actually associating negatively with the expression value. For the gene CALB1, the deletion in copy number data resulted with more than 1.5 fold higher gene expression value, as well as gains have decreased with 1.5 fold the expression for NLGN4Y, TMPRSS11, NEBL, GPC6 and C10orf113 (Additional file 1: Table S1). However, these negative correlations were not confirmed with good p-values.
Furthermore, the functionality of the genes with positive association between the copy number and the expression value in the hiPSC samples was detected with the enrichment analysis on gene ontologies and pathways. Regulation of RAS protein signal transduction, organelle localization, macromolecule catabolism, as well as alternative splicing, alcaloid, coenzyme and secondary metabolism terms are significantly enriched (Additional file 3: Table S2).
SNP association with exon, transcript and gene expression in hiPSCs and hESC
Results of different SNP correlation analyses
Area of SNPs
Number of significant correlation pairs/number of all pairs
Number of unique genes/transcripts/exons affected
Number of unique genes
Number of correlating SNPs inside/outside gene/transcript/exon regions
Within the resulted SNPs correlating with expression values, often only two genotypes existed among the samples. This indicates that heterozygosity in a single nucleotide can have significant effect on the expression levels of genes. The detected variants in genotypes were equally common among different cell types, and thus these findings seem to be independent of the stem cell type. Further, as the fibroblasts were included in this part of the analysis, the results indicate that differentiated cells can also have similar effects. Based on our analysis of separate groups of hiPSC and hESC data, only few SNPs correlate with transcript expression and none with exon expression. However, in our analysis the sample size in the individual cell types is small, and as at least 25% of variation in an SNP is required, some of the interesting correlations might have been filtered out. Nevertheless, our results indicate that the expression differences in stem cells can be caused by SNPs and therefore they should be taken into account when considering the differential expression and alternative splicing.
Most of the associations in the levels of genes, transcripts and exons, were detected when all the samples were combined into one sample group (Table 2). Our study revealed three SNPs having an effect on the expression values at all the levels of genes, transcripts and exons (Figure 5C). Further as the datasets overlap, if the transcript expression is affected, the effect can always be seen at the gene level too. The majority of the effects in the exon levels do not occur at the gene levels, indicating potential cases of exon skipping or other alternative splicing events. Our results showed also that the affecting SNPs are located usually at the introns or at one of the other exons when affecting the exon expression, and within the gene region (not in the up-/downstream regions), when affecting the gene expression. Also, we did not detect any SNP enriched locations. The locations of SNPs detected to be associated with the expression values of genes, transcripts and exons are illustrated with POMO  in Additional file 5: Figure S2.
In total we found 584 SNPs that were associated with the expression value. Of these SNPs 438 (75%) have been reported in other eQTL studies and 458 (78%) of them were involved in RNA binding protein mediated regulation based on the rSNPBase . Further, we overlapped the detected SNP – expression associations in the gene level to the list of rSNP related genes in the rSNPBase, and found out that 88% of our SNP- gene pairs and 95% of the SNP – transcript pairs were found also in rSNPBase. As expected, almost all of our SNP – exon pairs were not detected in the rSNPBase because rSNPBase had only associations in gene level, and most our exon level findings were exon specific and did not correspond to whole gene association in our data either. Further, when we compared the SNP locations with the stem cells key transcription factors (TF) (NANOG, OCT4, SOX2, E2F4) binding regions according to the ChIP-chip measurements , none of the SNPs are located in the binding sites. Next, we studied if the SNPs could affect other TF binding sites by comparing the binding scores computed according to TRANSFAC position weight matrices, and detected 29 unique transcription factors with a large difference in the binding score (Additional file 6: Table S4). Thus, this analysis suggests transcriptional regulators, which may have a causal role in regulating these genes in stem cells.
The functional analysis using the protein information resource (PIR)  for the exons, transcripts, and genes associating with an SNP genotype variation (SNP located in gene region) showed enrichment for alternative splicing and splice variant terms. Further, the Ingenuity Pathway Analysis (IPA, Ingenuity® Systems, http://www.ingenuity.com) showed that in several networks our result genes are not the most strategic genes of the network, but rather the targets of other network molecules instead (Additional file 6: Table S5, Table S6). For example in the network "Cellular Development, Embryonic Development, Organ Development", one third of affected molecules (ATXN1, C9orf3, KCNJ3, NUAK1) are indirect targets of TGFB1. Similar effects could be seen in other networks as well. In particular, several of these networks are related to the development, such as embryonic development, cardiovascular system development, tissue development etc. (Additional file 6: Table S6). This is however understandable, as 85% of the samples in the analysis are pluripotent stem cells.
Overlap between SNP and CNV association results
Further, we wanted to know if the associated expression pairs with SNPs are the same ones as the expressions associated with CNVs. Therefore, we compared the genes in each group and found that four genes could be found in both of the results, all these genes are detected in hESC samples. Genes ARNTL2 (ENSG00000029153, aryl hydrocarbon receptor nuclear translocator-like 2) and PPP1CC (ENSG00000186298, protein phosphatase 1, catalytic subunit, gamma isozym) are both in gained regions and genes CDS2 (ENSG00000101290, CDP-diacylglycerol synthase) and LRRN4 (ENSG00000125872, leucine rich repeat neuronal 4) are in loss regions. Thus for these four genes, we cannot rule out the possibility that the gene expression differences might be related actually to copy number variations, and not to the genotype. All the other associations occurred only either between SNPs and the expression value or between CNV and expression value, which indicates that most likely they were not due to other genomic variation. This small amount of overlapping genes in these analyses also confirms the finding that the CNV and SNP associations with expression levels overlap only slightly .
In this paper, we have analyzed human embryonic and induced pluripotent stem cells with two different methods for finding associations between genomic variations i.e. SNPs and CNVs, and the transcriptomics data. The results revealed several associations between gene copy numbers and the gene expression values, and also some associations between the SNPs and exon, transcript and gene expression values. Several copy number associations can be found in hiPSCs suggesting similar features with genomic instability as has been described in hESCs . In hiPSCs data, we detected altogether 217 genes, i.e. ~1% of all genes, of which copy number variation were associated with the expression value. Further, after careful filtering the suspicious SNP probes, we had 366138 SNPs of which 584 (0.16%) were significantly associated with the expression value in at least one of the analysis done for the hESCs, hiPSCs and combined group of samples. When also transcript and exon results were studied in the gene level, we had together 721 associations between SNP and genes.
Some integrative analyses have already been performed where individual variation has been linked to transcription factor binding , or to DNA methylation . We believe that these type of analyses need to be correlated with the functional parameters of differentiated stem cells in order to understand how the genetic and epigenetic variability of pluripotent stem cells translates into the performance and safety of their differentiated progeny. Although we report several such genotype-transcriptome effects here and as earlier results have shown , further studies are needed to understand the significance of such associations. Meanwhile the size of the sample groups in such studies should be enlarged to detect the subgroups having certain association. Particularly in cell therapy studies, it would be essential to have the detailed information of such associations.
This study is supported by funding by Academy of Finland project nos 134117, 134290, and 129657, SyMMyS, ERA-NET Erasysbio+, Emil Aaltonen Foundation and ESTOOLS consortium under the Sixth Research Framework Programme of the European Union, and Finnish Cancer Organizations. We also want to acknowledge Finnish Microarray and Sequencing Centre (Turku, Finland).
- The International HapMap Consortium: The International HapMap Project. Nature. 2003, 426: 789-796. 10.1038/nature02168.View ArticleGoogle Scholar
- Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, 1000 Genomes: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491: 56-65. 10.1038/nature11632.View ArticlePubMedGoogle Scholar
- Wang L, Luhm R, Lei M: SNP and mutation analysis. Adv Exp Med Biol. 2007, 593: 105-116. 10.1007/978-0-387-39978-2_11.View ArticlePubMedGoogle Scholar
- Hull J, Campino S, Rowlands K, Chan M-S, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J, Kwiatkowski D: Identification of Common Genetic Variation That Modulates Alternative Splicing. PLoS Genet. 2007, 3: e99-10.1371/journal.pgen.0030099.View ArticlePubMedPubMed CentralGoogle Scholar
- Coulombe-Huntington J, Lam KCL, Dias C, Majewski J: Fine-Scale Variation and Genetic Determinants of Alternative Splicing across Individuals. PLoS Genet. 2009, 5: e1000766-10.1371/journal.pgen.1000766.View ArticlePubMedPubMed CentralGoogle Scholar
- Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai S-L, Arepalli S, Dillman A, Rafferty IP, Troncoso J, Johnson R, Zielke HR, Ferrucci L, Longo DL, Cookson MR, Singleton AB: Abundant Quantitative Trait Loci Exist for DNA Methylation and Gene Expression in Human Brain. PLoS Genet. 2010, 6: e1000952-10.1371/journal.pgen.1000952.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang W, Edwards A, Zhu D, Flemington EK, Deininger P, Zhang K: miRNA-mediated relationships between Cis-SNP genotypes and transcript intensities in lymphocyte cell lines. PloS One. 2012, 7: e31429-10.1371/journal.pone.0031429.View ArticlePubMedPubMed CentralGoogle Scholar
- Närvä E, Autio R, Rahkonen N, Kong L, Harrison N, Kitsberg D, Borghese L, Itskovitz-Eldor J, Rasool O, Dvorak P, Hovatta O, Otonkoski T, Tuuri T, Cui W, Brustle O, Baker D, Maltby E, Moore HD, Benvenisty N, Andrews PW, Yli-Harja O, Lahesmaa R: High-resolution DNA analysis of human embryonic stem cell lines reveals culture-induced copy number changes and loss of heterozygosity. Nat Biotech. 2010, 28: 371-377. 10.1038/nbt.1615.View ArticleGoogle Scholar
- Järvinen A-K, Autio R, Haapa-Paananen S, Wolf M, Saarela M, Grénman R, Leivo I, Kallioniemi O, Mäkitie AA, Monni O: Identification of target genes in laryngeal squamous cell carcinoma by high-resolution copy number and gene expression microarray analyses. Oncogene. 2006, 25: 6997-7008. 10.1038/sj.onc.1209690.View ArticlePubMedGoogle Scholar
- Skotheim RI, Autio R, Lind GE, Kraggerud SM, Andrews PW, Monni O, Kallioniemi O, Lothe RA: Novel genomic aberrations in testicular germ cell tumors by array-CGH, and associated gene expression changes. Cell Oncol Off J Int Soc Cell Oncol. 2006, 28: 315-326.Google Scholar
- Stranger BE, Montgomery SB, Dimas AS, Parts L, Stegle O, Ingle CE, Sekowska M, Smith GD, Evans D, Gutierrez-Arcelus M, Price A, Raj T, Nisbett J, Nica AC, Beazley C, Durbin R, Deloukas P, Dermitzakis ET: Patterns of Cis Regulatory Variation in Diverse Human Populations. PLoS Genet. 2012, 8: e1002639-10.1371/journal.pgen.1002639.View ArticlePubMedPubMed CentralGoogle Scholar
- Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG: Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet. 2007, 39: 226-231. 10.1038/ng1955.View ArticlePubMedPubMed CentralGoogle Scholar
- Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P, Hudson TJ, Sladek R, Majewski J: Genome-wide analysis of transcript isoform variation in humans. Nat Genet. 2008, 40: 225-231. 10.1038/ng.2007.57.View ArticlePubMedGoogle Scholar
- Zhang W, Duan S, Bleibel WK, Wisel SA, Huang RS, Wu X, He L, Clark TA, Chen TX, Schweitzer AC, Blume JE, Dolan ME, Cox NJ: Identification of common genetic variants that account for transcript isoform variation between human populations. Hum Genet. 2009, 125: 81-93. 10.1007/s00439-008-0601-x.View ArticlePubMedGoogle Scholar
- Richards JB, Rivadeneira F, Inouye M, Pastinen TM, Soranzo N, Wilson SG, Andrew T, Falchi M, Gwilliam R, Ahmadi KR, Valdes AM, Arp P, Whittaker P, Verlaan DJ, Jhamai M, Kumanduri V, Moorhouse M, van Meurs JB, Hofman A, Pols HAP, Hart D, Zhai G, Kato BS, Mullin BH, Zhang F, Deloukas P, Uitterlinden AG, Spector TD: Bone mineral density, osteoporosis, and osteoporotic fractures: a genome-wide association study. Lancet. 2008, 371: 1505-1512. 10.1016/S0140-6736(08)60599-1.View ArticlePubMedPubMed CentralGoogle Scholar
- Lee Y, Gamazon ER, Rebman E, Lee Y, Lee S, Dolan ME, Cox NJ, Lussier YA: Variants affecting exon skipping contribute to complex traits. PLoS Genet. 2012, 8: e1002998-10.1371/journal.pgen.1002998.View ArticlePubMedPubMed CentralGoogle Scholar
- Degner JF, Pai AA, Pique-Regi R, Veyrieras J-B, Gaffney DJ, Pickrell JK, De Leon S, Michelini K, Lewellen N, Crawford GE, Stephens M, Gilad Y, Pritchard JK: DNase I sensitivity QTLs are a major determinant of human expression variation. Nature. 2012, 482: 390-394. 10.1038/nature10808.View ArticlePubMedPubMed CentralGoogle Scholar
- Heap G, Trynka G, Jansen R, Bruinenberg M, Swertz M, Dinesen L, Hunt K, Wijmenga C, van Heel D, Franke L: Complex nature of SNP genotype effects on gene expression in primary human leucocytes. BMC Med Genomics. 2009, 2: 1-10.1186/1755-8794-2-1.View ArticlePubMedPubMed CentralGoogle Scholar
- Gerrits A, Li Y, Tesson BM, Bystrykh LV, Weersing E, Ausema A, Dontje B, Wang X, Breitling R, Jansen RC, de Haan G: Expression Quantitative Trait Loci Are Highly Sensitive to Cellular Differentiation State. PLoS Genet. 2009, 5: e1000692-10.1371/journal.pgen.1000692.View ArticlePubMedPubMed CentralGoogle Scholar
- Lee J-H, Park I-H, Gao Y, Li JB, Li Z, Daley GQ, Zhang K, Church GM: A robust approach to identifying tissue-specific gene expression regulatory variants using personalized human induced pluripotent stem cells. PLoS Genet. 2009, 5: e1000718-10.1371/journal.pgen.1000718.View ArticlePubMedPubMed CentralGoogle Scholar
- Hussein SM, Batada NN, Vuoristo S, Ching RW, Autio R, Närvä E, Ng S, Sourour M, Hamalainen R, Olsson C, Lundin K, Mikkola M, Trokovic R, Peitz M, Brustle O, Bazett-Jones DP, Alitalo K, Lahesmaa R, Nagy A, Otonkoski T: Copy number variation and selection during reprogramming to pluripotency. Nature. 2011, 471: 58-62. 10.1038/nature09871.View ArticlePubMedGoogle Scholar
- Laurent LC, Ulitsky I, Slavin I, Tran H, Schork A, Morey R, Lynch C, Harness JV, Lee S, Barrero MJ, Ku S, Martynova M, Semechkin R, Galat V, Gottesfeld J, Belmonte JCI, Murry C, Keirstead HS, Park H-S, Schmidt U, Laslett AL, Muller F-J, Nievergelt CM, Shamir R, Loring JF: Dynamic Changes in the Copy Number of Pluripotency and Cell Proliferation Genes in Human ESCs and iPSCs during Reprogramming and Time in Culture. Cell Stem Cell. 2011, 8: 106-118. 10.1016/j.stem.2010.12.003.View ArticlePubMedPubMed CentralGoogle Scholar
- Lund RJ, Närvä E, Lahesmaa R: Genetic and epigenetic stability of human pluripotent stem cells. Nat Rev Genet. 2012, 13: 732-744. 10.1038/nrg3271.View ArticlePubMedGoogle Scholar
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavaré S, Deloukas P, Hurles ME, Dermitzakis ET: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007, 315: 848-853. 10.1126/science.1136678.View ArticlePubMedPubMed CentralGoogle Scholar
- Bengtson H, Simpson K, Bullard J, Hansen K: Aroma.affymetrix: A Generic Framework in R for Analyzing Small to Very Large Affymetrix Data Sets in Bounded Memory. Technical Report 745, University of California, Department of Statistics. 2008, Department of Statistics, University of CaliforniaGoogle Scholar
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.View ArticlePubMedPubMed CentralGoogle Scholar
- Ihaka R, Gentleman R: R: A Language for Data Analysis and Graphics. J Comput Graph Stat. 1996, 5: 299-314.Google Scholar
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostat Oxf Engl. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.View ArticleGoogle Scholar
- Hautaniemi S, Ringnér M, Kauraniemi P, Autio R, Edgren H, Yli-Harja O, Astola J, Kallioniemi A, Kallioniemi O-P: A strategy for identifying putative causes of gene expression variation in human cancers. Genomics Signal Process Stat. 2004, 341: 77-88.Google Scholar
- Järvinen A-K, Autio R, Kilpinen S, Saarela M, Leivo I, Grénman R, Mäkitie AA, Monni O: High-resolution copy number and gene expression microarray analyses of head and neck squamous cell carcinoma cell lines of tongue and larynx. Genes Chromosomes Cancer. 2008, 47: 500-509. 10.1002/gcc.20551.View ArticlePubMedGoogle Scholar
- Myllykangas S, Junnila S, Kokkola A, Autio R, Scheinin I, Kiviluoto T, Karjalainen-Lindsberg M-L, Hollmén J, Knuutila S, Puolakkainen P, Monni O: Integrated gene copy number and expression microarray analysis of gastric cancer highlights potential target genes. Int J Cancer J Int Cancer. 2008, 123: 817-825. 10.1002/ijc.23574.View ArticleGoogle Scholar
- Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, Watson SJ, Meng F: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005, 33: e175-10.1093/nar/gni179.View ArticlePubMedPubMed CentralGoogle Scholar
- Kapur K, Jiang H, Xing Y, Wong WH: Cross-hybridization modeling on Affymetrix exon arrays. Bioinforma Oxf Engl. 2008, 24: 2887-2893. 10.1093/bioinformatics/btn571.View ArticleGoogle Scholar
- Cooper TA, Wan L, Dreyfuss G: RNA and Disease. Cell. 2009, 136: 777-793. 10.1016/j.cell.2009.02.011.View ArticlePubMedPubMed CentralGoogle Scholar
- Gamazon ER, Zhang W, Dolan ME, Cox NJ: Comprehensive Survey of SNPs in the Affymetrix Exon Array Using the 1000 Genomes Dataset. PLoS ONE. 2010, 5: e9366-10.1371/journal.pone.0009366.View ArticlePubMedPubMed CentralGoogle Scholar
- Sela N, Mersch B, Hotz-Wagenblatt A, Ast G: Characteristics of transposable element exonization within human and mouse. PloS One. 2010, 5: e10907-10.1371/journal.pone.0010907.View ArticlePubMedPubMed CentralGoogle Scholar
- Ciobanu DC, Lu L, Mozhui K, Wang X, Jagalur M, Morris JA, Taylor WL, Dietz K, Simon P, Williams RW: Detection, validation, and downstream analysis of allelic variation in gene expression. Genetics. 2010, 184: 119-128. 10.1534/genetics.109.107474.View ArticlePubMedPubMed CentralGoogle Scholar
- Benovoy D, Kwan T, Majewski J: Effect of polymorphisms within probe-target sequences on olignonucleotide microarray experiments. Nucleic Acids Res. 2008, 36: 4417-4423. 10.1093/nar/gkn409.View ArticlePubMedPubMed CentralGoogle Scholar
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001, 29: 308-311. 10.1093/nar/29.1.308.View ArticlePubMedPubMed CentralGoogle Scholar
- Bolstad BM, Irizarry R, Astrand M, Speed T: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185-193. 10.1093/bioinformatics/19.2.185.View ArticlePubMedGoogle Scholar
- Gautier L, Cope L, Bolstad BM, Irizarry RA: affy–analysis of Affymetrix GeneChip data at the probe level. Bioinforma Oxf Engl. 2004, 20: 307-315. 10.1093/bioinformatics/btg405.View ArticleGoogle Scholar
- Carvalho B, Bengtsson H, Speed TP, Irizarry RA: Exploration, normalization, and genotype calls of high-density oligonucleotide SNP array data. Biostat Oxf Engl. 2007, 8: 485-499. 10.1093/biostatistics/kxl042.View ArticleGoogle Scholar
- Guo L, Du Y, Chang S, Zhang K, Wang J: rSNPBase: a database for curated regulatory SNPs. Nucleic Acids Res. 2014, 42: D1033-D1039. 10.1093/nar/gkt1167.View ArticlePubMedGoogle Scholar
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122: 947-956. 10.1016/j.cell.2005.08.020.View ArticlePubMedPubMed CentralGoogle Scholar
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34 (Database issue): D108-110. 10.1093/nar/gkj143.View ArticlePubMedGoogle Scholar
- Stormo GD: DNA binding sites: representation and discovery. Bioinforma Oxf Engl. 2000, 16: 16-23. 10.1093/bioinformatics/16.1.16.View ArticleGoogle Scholar
- Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003, 4: P3-10.1186/gb-2003-4-5-p3.View ArticlePubMedGoogle Scholar
- Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009, 4: 44-57. 10.1038/nprot.2008.211.View ArticleGoogle Scholar
- Lin J, Kreisberg R, Kallio A, Dudley A, Nykter M, Shmulevich I, May P, Autio R: POMO - Plotting Omics analysis results for Multiple Organisms. BMC Genomics. 2013, 14: 918-10.1186/1471-2164-14-918.View ArticlePubMedPubMed CentralGoogle Scholar
- McGarvey PB, Huang H, Barker WC, Orcutt BC, Garavelli JS, Srinivasarao GY, Yeh LS, Xiao C, Wu CH: PIR: a new resource for bioinformatics. Bioinforma Oxf Engl. 2000, 16: 290-291. 10.1093/bioinformatics/16.3.290.View ArticleGoogle Scholar
- Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, Waszak SM, Habegger L, Rozowsky J, Shi M, Urban AE, Hong MY, Karczewski KJ, Huber W, Weissman SM, Gerstein MB, Korbel JO, Snyder M: Variation in Transcription Factor Binding Among Humans. Science. 2010, 328: 232-235. 10.1126/science.1183621.View ArticlePubMedPubMed CentralGoogle Scholar
- Chen K, van Nimwegen E, Rajewsky N, Siegal ML: Correlating gene expression variation with cis-regulatory polymorphism in Saccharomyces cerevisiae. Genome Biol Evol. 2010, 2: 697-707.PubMedPubMed CentralGoogle Scholar
- Hellman A, Chess A: Extensive sequence-influenced DNA methylation polymorphism in the human genome. Epigenetics Chromatin. 2010, 3: 11-10.1186/1756-8935-3-11.View ArticlePubMedPubMed CentralGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.