- Research
- Open access
- Published:
On predicting regulatory genes by analysis of functional networks in C. elegans
BioData Mining volume 8, Article number: 33 (2015)
Abstract
Background
Connectivity networks, which reflect multiple interactions between genes and proteins, possess not only a descriptive but also a predictive value, as new connections can be extrapolated and tested by means of computational analysis. Integration of different types of connectivity data (such as co-expression and genetic interactions) in one network has proven to benefit ‘guilt by association’ analysis. However predictive values of connectives of different types, that had their specific functional meaning and topological characteristics were not obvious, and have been addressed in this analysis.
Methods
eQTL data for 3 experimental C.elegans age groups were retrieved from WormQTL. WormNet has been used to obtain pair-wise gene interactions. The Shortest Path Function (SPF) has been adopted for statistical validation of the co-expressed gene clusters and for computational prediction of their potential gene expression regulators from a network context. A new SPF-based algorithm has been applied to genetic interactions sub-networks adjacent to the clusters of co-expressed genes for ranking the most likely gene expression regulators causal to eQTLs.
Results
We have demonstrated that known co-expression and genetic interactions between C. elegans genes can be complementary in predicting gene expression regulators. Several algorithms were compared in respect to their predictive potential in different network connectivity contexts. We found that genes associated with eQTLs are highly clustered in a C. elegans co-expression sub-network, and their adjacent genetic interactions provide the optimal functional connectivity environment for application of the new SPF-based algorithm. It was successfully tested in the reverse-prediction analysis on groups of genes with known regulators and applied to co-expressed genes and experimentally observed expression quantitative trait loci (eQTLs).
Conclusions
This analysis demonstrates differences in topology and connectivity of co-expression and genetic interactions sub-networks in WormNet. The modularity of less continuous genetic interaction network does not correspond to modularity of the dense network comprised by gene co-expression interactions. However the genetic interaction network can be used much more efficiently with the SPF method in prediction of potential regulators of gene expression. The developed method can be used for validation of functional significance of suggested eQTLs and a discovery of new regulatory modules.
Background
Reconstruction of functional networks from the known pair-wise connectivity between biological molecules offers systems level insights into complex biological processes [1]. The topology of such a network is determined by all types of interactions used for its reconstruction and selected from: direct physical or regulatory protein interactions as well as indirect indicators of functional links between proteins. The indirect indicators such as genetic interactions, gene co-expression, co-occurrence, fusions are usually presented as probabilistic. Integrative analysis of different types of data is widely applied to construct regulatory gene networks [2, 3], overall improving the predictive power of such networks [4]. It has been shown that indirect indications of functional relevance between genes such as gene co-expression and genome co-localization are largely complementary and correlate well with ontology-based protein groupings [5, 6]. However, not all interactions correlate well: genetic and protein interactions barely overlap [7], that causes some obvious challenges in retrieving useful information from a reconstructed species-specific network. Network-based investigations require an accurate choice of data and significance thresholds to reflect a proper balance between the connectivity and the reliability of a network. Pitfalls are that not all types of connectivity data have been investigated to the same extent and the unequal availability of data for different organisms. Therefore a prior understanding of the impact of the different available data types on the topology of the generated networks is essential.
The gaps in experimentally-derived knowledge on regulatory and structural features of biological systems can be filled to some extent by theoretical predictions. For this study we propose a new application of a modified statistical algorithm [8], based on the “shortest path function” (SPF) to rank the regulators by their potential involvement with the genes in a co-expressed cluster. The suggested algorithm can also be applied to any explicitly defined group of genes.
One of the efficient methods that allow reconstruction of the regulatory interactions between genes is based on expression quantitative trait locus (eQTL) data derived from genetical genomics experiments. eQTL data has been used in several ways for network/pathway reconstruction [3, 9–11]. However these methods focused on a small number of genes or only used eQTL data without consideration of other available information on gene and protein connectivity. The advantage of using of eQTL related co-expression clusters is an opportunity to filter potential candidates by their genomic position. Here we present an algorithm which uses eQTL data in combination with published functional interactions in C. elegans [1]. By application to age-specific eQTL data for C. elegans [12] we show that it leads to reasonable predictions for the underlying regulatory genes. The suggested approach can refine interpretation of organism- specific integral biological networks and used for prediction of protein complexes and genetic regulators from a network context.
Methods
Data sets
Dataset for validation of gene clusters
For eQTL-hotspot gene selection we used previously published eQTL data [12], retrieved from WormQTL [13]. This experiment was done on three C. elegance worm age groups. In each of the 3 experimental age groups the genes with a shared regulatory locus were selected by taking all the genes having an eQTL with a log10(p)-value above 3 at the same locus (see Table 1).
WormBase WS220 [14] has been used for retrieval of gene names and IDs, associated functional annotations and ontological categories. WormNet [1] has been used to obtain pair-wise interactions between genes. WormNet contains connectivity data from C. elegans, Drosophila melanogaster, Homo sapiens, and Saccharomyces cerevisiae. Among the different types of data there are: co-citation, co-expression, protein physical interactions, protein complexes, genetic interactions. In our analysis we used complete Wormnet, which was denoted as ’wWormnet’ and Wormnet sub networks, comprised of the genetic interactions (’gWormnet’) or the gene co-expression connections in C. elegans (’eWormNet’).
Data set for testing predictive algorithms
To test our algorithms for detection of potential regulators from the gWormNet we used 3 groups of genes, each known to be regulated by 3 regulators highly ranked in our eQTL analysis (see Table 2). These groups of genes were retrieved from WormBase and complemented with their genetic interactions and co-expression data retrieved from WormNet.
Application of the SPF method to a new data set
To test our algorithm on a larger set of highly interconnected and co-expressed genes we selected a group of genes involved in translation that had a strong co-expression pattern in two C.elegans strains [15–17]. The micro-array data [18] were retrieved from NCBI’s Gene Expression Omnibus (GEO [19]) under GSE5395. By means of the Mev4 application [20] we performed clustering of the gene expression profiles by absolute mRNA values. By application of K-means clustering of the expression profiles we have produced a number of gene cluster arrays and have chosen the most robust cluster of genes (slightly changes depending on the requested cluster number) from a 50-cluster K-means analysis where it was composed by genes with highly homogenous expression values. This largest cluster (Cluster K1) enriched for highly co-expressed genes relevant to translation was selected for further analysis. String software [21, 22] has been used for visualization of graphical networks reconstructed for sets of C. elegans genes.
Methods
Statistical validation of the gene clusters
To investigate WormNet connectivity properties of the selected gene clusters we have used quite a standard approach based on calculation of gene pairs (GP) connected in a cluster (or a module) [2, 4]. But in contrast to common methods, where the fraction of direct links among all links is defined, we calculate this fraction as dependent on a random set of genes (nodes) with the same degree distribution in a network (further, we use the notation referred to as “random cluster” for such a set). The respective values of statistical significance (GP score) can be defined as
Where <GP> random and σ(<GP> random ) are the mean and the deviation of GP in ensemble of random clusters defined in a given network. Note that the use a set of nodes of the same degree distribution is a quite natural “null model”, arising in the study of motif‘s distribution [23] or of the specificity and the stability in a topology of networks [24]. Another approach to characterize the connectivity of a gene cluster is based on the so-called the shortest path function (SPF): where the shortest path along the network from a given vertex node i to some another vertex node j is denoted by di,j. The SPF for a given cluster is determined by the following
Thus defined, the SPF has a very transparent meaning, since it defines the sum of lengths of reciprocal paths between a pair of genes. If i and j are not linked within the network, the contribution to the SPF from this pair (i,j) equals zero. Whereas, for a directly linked node pair the contribution reaches its maximum (2). So, the SPF can be used to quantitatively characterize the connectivity of a gene cluster within a given network (see Fig. 1).
As for the gene pairs, we compare the SPF coefficients calculated for gene clusters and for random sets of nodes with the same degree distribution in a network. Apparently, direct links contribute to the calculated SPF coefficient the most; however for quite rare networks a contribution of longer paths can be significant.
Prediction of potential regulators
The methods for prediction of potential gene expression regulators from a network are usually based on the enrichment of direct links [2] or the overlap of the sub network of the genes directly connected to the potential regulator with the studied gene cluster [4]. Our approach to predict the potential gene expression regulators from a network context is based on the assumption that potential regulators of a cluster are associated with the cluster by multiple genetic connections. The SPF coefficients \(k^{SPF}_{M}\), determined as
is proved to be promising identifier of potential regulators of co-expressed gene clusters. Note that a potential regulator that belongs to the cluster or lie outside of it, can have connections to not all cluster genes. It should however have more connections to the cluster than to other genes in the network, where the connections can be non-direct. As before, we analyze the correlation of the SPF coefficients and a respective number of direct links between a potential regulator and the cluster genes.
Results
Statistical properties of co-expression clusters
Statistical validation of eQTL hot spot gene clusters
Our first goal was to define the topological properties of the eQTL hot spots in a network. We calculated the SPF coefficients for the trans-regulatory hotspots (or trans-bands) associated with ageing in C.elegans (see Table 1) in eWormNet (Fig. 1). Table 3 demonstrates the values of GP scores defined by (1) and the respective GP scores in the SPF coefficients. The SPF coefficients for trans-bands (numbering according to Table 1) are slightly smaller than the respective GP scores. This indicates a nearly homogeneous distribution of the shortest paths in eWormNet. However, the number of direct links (di,j=1) is contrasting between the eQTL hotspots and random clusters for all trans-bands (the respective scores are more than 1). Therefore, we can conclude that the eQTL trans-bands are characterized by higher connectivity at the level of direct links in eWormNet.
The same analysis was applied to the co-expression cluster of genes involved in translation (Cluster K1). We found that the genes within the cluster have a higher connectivity both in SPF and in the number of direct links Table 3. Again, we can confidently distinguish the co-expression gene cluster by only on the number of the comprised direct links, with the value of the respective statistical significance is order of 100.
Although we cannot distinguish both the cluster of co-expressed genes and eQTL hotspots from a random gene set by the values of the SPF coefficient alone, this method can be successfully applied to predict potential regulatory genes. We assume that a cluster‘s regulatory gene must be well-connected with all genes in the cluster, rather than directly linked to a small subcluster. Therefore the calculation of a SPF coefficient (averaging the lengths of all paths to a cluster) should be helpful and interpretive.
Prediction of expression cluster regulators
Testing of the SPF method
We have used the 3 pre-selected gene groups with known regulators (Table 2) to test our algorithm on prediction of potential regulators. Two different methods have been tested. We consider every gene in a network as a potential regulator and rank them by the fraction of direct links (FDL), which a potential regulator has to genes in a co-expression cluster. To be efficient, this method requires a dense connectivity matrix. The predicted regulators were expected to have a strong and specific involvement in modulation of expression of at least part of the gene cluster. A gene with any function could have an effect on other gene expression. However, we expect to define a direct regulator of the mRNA pool from the set of highly ranked genes. We used a number of ontological categories, such as transcription, splicing, mRNA degradation and transport functions to filter the best candidates.
The second method was based on ranking of the network nodes by their average distance to all cluster genes, which was defined by the SPF. The different connectivity subnetworks reconstructed for genes in a cluster had different topologies, with the genetic interactions subnetwork (gWormNet) generally being the least connected. The SPF method did not require a matrix to be dense and could be applied to a gWormNet. The power of the SPF method was in its compensation for fragmentary connectivity data, as the top ranked regulatory links would be projected on the rest of the gene cluster.
Both the FDL and SPF methods were applied to the wWormNet and gWormNet subnetworks reconstructed for the 3 test clusters (Table 4). We ranked all genes in WormNet as potential regulators to a given gene group. Well studied regulators had high ranks in both wWormNet and gWormNet subnetworks with pop-1 and daf-2 showing the largest number of direct links to their gene groups (Table 4). Their coefficients in eWormNet were equal to zero (not shown in Table 4), indicating that the identified regulators would be unlikely co-expressed with the regulated groups of genes. Application of the SPF algorithm increased the rank of the regulator lin-11 from the 19th to 8th position for the test cluster 3 in wWormNet, and from 3rd to 1st position in gWormNet, proving that the method may be successfully applied to even a small group of C. elegans genes connected in WormNet.
Comparison of computational extrapolations made via applications of the wSPF and gSPF methods demonstrate robustness of the highest and also other highly ranked predictions. Two other suggested regulators for the pop-1 regulated test gene group (mom-2 and skn-1), were ranked second and third by both methods, and 6 regulators suggested for daf-2 test gene group were also among the top 10 predicted by application of the wSPF or gSPF. Three other regulators for emphlin-11 test group (lin-1, lin-29 and egl-38), were also highly ranked by both the wSPF and gSPF scores.
To address problems with potential false-positive outcomes in the application of SPF-based algorithms, we have compared the scores and inter-connectivity of the top regulators predicted for each test gene group. This comparison showed that the very top ranked regulators occurred to have strongly distinguished increments in their ranking scores compared to the other suggested regulators, making them to stand out. For instance, the gSPF score of daf-2’s is 0.94, which is high compared to the second ranked (daf-7’s- 0.69), the third ranked (age-1’s -0.67) and the eighth ranked (daf-16’s -0.56). The score of daf-2’s in wSPF equals 1 and for both daf-7 and age-1, now ranked as the 8th and the 9th, scores equal at 0.69. The gSPF scores for the three top ranked regulators of the pop-1 associated gene group are 0.55, 0.50, and 0.50 respectively, whereas the score of the fourth ranked gene decreases abruptly to 0.37. Their corresponding scores using wSPF are: 0.78, 0.65, and 0.63. Lin-11 shares the gSPF score of 0.5 with lin-29, however lin-11 has a higher score (0.73 versus 0.67) using wSPF.
The top suggested regulators that have high scores in both gSPF and wSPF ranking are also strongly inter-connected in a network. age-1, daf-7, and Pdk-1 that followed daf-2 in the gSPF prediction list are the most connected to daf-2 by the number of experimentally supported links. On the other hand, dao-5, dao-6 and isp-1 were only higly ranked using the wSPF method and have lower positions in the gSPF list. These genes did not have experimentally defined connections to daf-2 or its immediate connectors. Functions of dao-5and dao-6 are also linked to regulation of dauer stage of larva development, which may explain their high over-all connection to the dauer-associated daf-2-regulated test gene group on wSPF. The gene isp-1 which is a component of the respiratory chain and probably is associated with the test group of genes via co-expression connections taken in consideration only in wSPF analysis. In case of the lin-11-regulated gene group all the other highly ranked regulators contributed to cell differentiation and egg laying and can be potentially functionally-relevant. wSPF however could generate potentially false predictions, such as of the gene B0034.1 ranked as 3rd by wSPF and having ‘0’ ranking score in gSPF.
From this analysis we conclude that there is no critical false-positive issue with respect to reliability of SPF method-based predictions, especially if the utilized network context is comprised of only experimentally validated genetic interactions. Parallel application of wSPF and gSPF can help to refine the predictions by contrasting true functional regulators among at least top 10 ranked genes.
Prediction of regulators for eQTL hotspots
Subsequently we applied the SPF method to the age-associated C. elegance eQTL data (Table 1) [13, 16]. Application of the SPF method to a gWormNet led to promising regulatory predictions for eQTL-hotspots. Four regulators could be predicted for the eQTL-hotspot on the left arm of chromosome I in the juvenile (L4) group when selection included the position of the eQTL-hotspot locus. Interestingly, 3 from top 4 suggested regulators (Pop-1, xnp-1, lin-17 and lin-44) are related to WNT pathway (see Table 5). Pop-1 also associates with the chromosome V eQTL-hotspot in juvenile but cannot be the first-order causal regultor of this QTL-hotspot as it is not located on the chromosome V locus. When the location of the regulators was not considered, we found that both daf-2 and daf-16 were associated with the two juvenile eQTL-hotspots possibly functionally linking these loci to wnt signaling. Age-1 was suggested for the eQTL-hotspot on chromosome II, specifically found in old worms, by the analysis of the gWormNet (see Table 5). No regulator could be identified for the eQTL-hotspot on the far right arm of chromosome V, found in all three age groups, even though the genes in this eQTL-hotspot were highly linked in wWormNet. This could mean that a relatively less well studied gene is involved in this eQTL regulation.
Application of the SPF method to wWormNet gave more diverse results presented in Table 6. Besides a long list of candidate genes with unknown functions there were promising predictions of steroid-hormone receptors nhr-218 and nhr-269, linked to let-60 and thus to ras and wnt signaling for the Chromosome V eQTL-hotspot in old worms, and also a prediction of RNA binding protein modulator encoded by moe-3 for the chromosome II eQTL-hotspot in old worms.
Predictions for the co-expression cluster K1
Both algorithms were applied to the cluster of co-expressed genes involved in translation (Cluster K1). Among the most promising predicted regulators are: daf-2, iff-1, cgh-1, tin-9.2 and car-1. They all are related to mRNA processing/translation/decay and are in a cross-talking relationship (see Table 7). The SPF method allowed us to predict some regulators which we could not detect by the FDL method. Especially, it was demonstrated on gWormNet, where the density of network links is low (see Table 8). We could identify a number of genes encoding TFs that may be considered for a role of transcription regulators of genes in Cluster K1, such as: taf-5— transcription initiation factor TFIID subunit 5, xbp-1— heat-shock transcription factor, sin-3 — histone deacetylase subunit and premRNA-splicing factor cwc-22. Compared to FDL, SPF greatly increased the ranking position of daf-2, genes upstream daf-2 (C25A1.10) or ones that were known to be directly affected by daf-2 mutation (C05C8.3).
Figure 2 illustrates typical positions of the predicted regulators for the Cluster K1. Nodes predicted by the FDL method (purple frame) are located proximal to the cluster or even inside the cluster. The nodes predicted by the SPF method can be distant from the many nodes in the cluster (ces-1, eor-1, orange frames in Fig. 2). Though the connections between the SPF-predicted node and the cluster may include several intermediate steps, the majority of these steps do contain the nodes that can translate signals at the level of mRNA pool regulation, potentially representing complexes of proteins with a joint regulatory performance.
Discussion
Our study aimed to refine algorithms that use biological networks for identification of gene regulators. We used test clusters of co-regulated genes with known regulators and a large cluster of co-expressed house-keeping genes to validate their performances. Both test computations gave us positive outcomes for an application of the SPF-function based algorithm, especially when only genetic interactions were used for network reconstructions. We were able to perform a reverse prediction of the regulators for the selected clusters of co-regulated genes and to suggest a number of expected functional links/potential regulators of the cluster of co-expressed genes relevant to translation.
Our method may be especially useful in finding the causal regulators for gene expression QTLs in genetical genomics studies. Genes sharing an eQTL are very likely to have a common regulator as well as a joined biological function, however, candidate regulators in the relevant genomic position are still too numerous for a focused experimental validation. As more diverse organisms like yeast [4, 25, 26], plants [3, 9, 10, 27], animals [12, 18, 28–32], human [33] are interrogated via eQTL analysis, an efficient way of candidate gene selection is indeed becoming essential. This application is also of potential interest for interpreting the results of population genomic studies, since eQTLs from individual experiments may provide inconclusive clues to the relevant functional relationships underlying observed responses. Our methods predict the most likely regulator based on hundreds of previously published experiments, as, in our case, are those used to generate WormNet [34].
The test co-expression gene Cluster K1 mainly contained genes involved in the translational machinery. Our analysis highlighted its primary association with insulin-dependent pathway via such suggested regulators as daf-2 and insulin-regulated mRNA decay functions iff-1 and bir-2 [35]. The insulin pathway has an established role in the regulation of translation [36, 37]. As it is involved in regulation of the aging process, and iff-1 was shown to have a longevity phenotype, we investigated the genes of cluster K1 for association with longevity phenotypes (Fig. 3). The analysis has produced a supportive outcome. Predicted K1 connections, iff-1 and tin-9.2, are associated in a network with a ribosome maturation protein SBDS [38, 39], which is required for the longevity phenotype of daf-2 [40]. Interestingly, the transcription factors predicted for Cluster K1 by the SPF method were also found to be involved in regulation of longevity. The genes cgh-1 [41], dao-5 [42], hel-1 [43] were already linked to aging processes downstream of daf-2, daf-16, and in case of dao-5, to a daf-16 independent pathway associated with determination of the adult life span GO-term in WormBase database.
All top regulators predicted for age-associated eQTLs were relevant to aging and longevity processes. Finding AGE-1 as a possible regulator for an eQTL-hotspot expressed in old age worms is especially interesting as this protein gene has been already suggested as a regulator of lifespan after heat shock [44]. POP-1, a predicted regulator of the chromosome I eQTL-hotspot in juveniles, is a TF that functions as a component of WNT signaling pathways [36], and both longevity-related DAF-2 and DAF-16 are known to interact with its components [45]. For instance, it was shown that DAF-2, DAF-16 and POP-1 synergistically affect the immune response in C. elegans [46].
We anticipate that our work provides new insights to the structure of biological functional networks and highlights the aspects that need to be considered in the prediction of regulatory nodes and functional modules from a multilevel and heterogeneous network context.
Conclusions
-
1.
Application of the SPF function has been adopted for computational prediction of potential gene expression regulators from a network context.
-
2.
Computational identification of groups of co-expressed genes in a network was proven to be achievable. The developed method can be used for validation of functional significance of suggested eQTLs and a discovery of new regulatory interactions.
-
3.
We have demonstrated differences in topology and connectivity of co-expression and genetic interactions subnetworks in WormNet. The modularity of less continuous genetic interaction network does not correspond to modularity of the dense network comprised by gene co-expression interactions. However the genetic interaction network can be used much more efficiently with the SPF method in prediction of potential regulators of gene expression.
-
4.
Regulators predicted for the test cluster of co-expressed genes related to translation revealed the relation of this gene group to longevity. RNA decay may be suggested as an important player in longevity regulation.
References
Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM. Predicting genetic modifier loci using functional gene networks. Genome Res. 2010; 20(8):1143–53. doi:http://dx.doi.org/10.1101/gr.102749.109.
Segal E, Shapira M, Regev A, Peer D, Botstein D, Koller D, Friedman N. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet. 2003; 34(2):166–76. doi:http://dx.doi.org/10.1038/ng1165.
Terpstra IR, Snoek LB, Keurentjes JJ, Peeters AJ, Van den Ackerveken G. Regulatory network identification by genetical genomics: signaling downstream of the Arabidopsis receptor-like kinase ERECTA. Plant Physiol. 2010; 154(3):1067–78.
Zhu J, Zhang B, Smith EN, Drees B, Brem RB, Kruglyak L, et al.Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat Genet. 2008; 40(7):854–61. doi:http://dx.doi.org/10.1038/ng.167.
Sharabiani M, Siermala M, Lehtinen TO, Vihinen M. Dynamic covariation between gene expression and proteome characteristics. Bioinformatics. 2005; 6(1):215. doi:http://dx.doi.org/10.1186/1471-2105-6-215.
Fagan A, Culhane AC, Higgins DG. A multivariate analysis approach to the integration of proteomic and gene expression data. Proteomics. 2007; 7(13):2162–171. doi:http://dx.doi.org/10.1002/pmic.200600898.
Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, et al.The genetic landscape of a cell. Science. 2010; 327(5964):425–31. doi:http://dx.doi.org/10.1126/science.1180823.
Vohradsky J. Neural model of the genetic network. J Biol Chem. 2001; 276(39):36168–73. doi:http://dx.doi.org/10.1074/jbc.M104391200.
Keurentjes JJB, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Snoek LB, et al.Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci. 2007; 104(5):1708–13. doi:http://dx.doi.org/10.1073/pnas.0610429104.
Snoek LB, Terpstra IR, Dekter R, Van den Ackerveken G, Peeters AJM. Genetical genomics reveals large scale genotype-by-environment interactions in arabidopsis thaliana. Front Gene. 2013; 3. doi:http://dx.doi.org/10.3389/fgene.2012.00317.
Bing N. Genetical genomics analysis of a yeast segregant population for transcription network inference. Genetics. 2005; 170(2):533–42. doi:http://dx.doi.org/10.1534/genetics.105.041103.
Vinuela A, Snoek LB, Riksen JAG, Kammenga JE. Genome-wide gene expression regulation as a function of genotype and age in C. elegans. Genome Res. 2010; 20(7):929–37. doi:http://dx.doi.org/10.1101/gr.102160.109.
Snoek LB, Van der Velde KJ, Arends D, Li Y, Beyer A, Elvin M, et al.WormQTL–public archive and analysis web portal for natural variation data in Caenorhabditis spp. Nucleic Acids Res. 2012; 41(D1):738–43. doi:http://dx.doi.org/10.1093/nar/gks1124.
WormBase WS220. http://www.wormbase.org/. access date November 2012.
Capra EJ, Skrovanek SM, Kruglyak L. Comparative developmental expression profiling of two C. elegans isolates. PLOS One. 2008;3(12). doi:http://dx.doi.org/10.1371/journal.pone.0004055.
Vinuela A, Snoek LB, Riksen JAG, Kammenga JE. Aging uncouples heritability and expression-QTL in Caenorhabditis elegans. G3: Genes Genomes Genet. 2012; 2(5):597–605. doi:http://dx.doi.org/10.1534/g3.112.002212.
Volkers RJ, Snoek L, Hubar CJVH, Coopman R, Chen W, Yang W, et al.Gene-environment and protein-degradation signatures characterize genomic and phenotypic diversity in wild caenorhabditis elegans populations. BMC Biol. 2013; 11(1):93. doi:http://dx.doi.org/10.1186/1741-7007-11-93.
Li Y, Alvarez OA, Gutteling EW, Tijsterman M, Fu J, Riksen JAG, et al.Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLOS Genet.2006;2(12). doi:http://dx.doi.org/10.1371/journal.pgen.0020222.
Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/. access date September 2012.
Saeed A, Sharov V, White J, Li J, Liang W, Bhagabati N, et al.TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 2003; 34:374–8.
STRING - Known and Predicted Protein-Protein Interactions. http://string-db.org/, access date March 2013.
Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al.The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2010; 39(Database):561–8. doi:http://dx.doi.org/10.1093/nar/gkq973.
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science. 2002; 298(5594):824–7.
Maslov S, Sneppen K. Specificity and stability in topology of protein networks. Science. 2002; 296(5569):910–3.
Lorenz DR, Cantor CR, Collins JJ. A network biology approach to aging in yeast. Proc Natl Acad Sci. 2009; 106(4):1145–50. doi:http://dx.doi.org/10.1073/pnas.0812551106.
Ye C, Galbraith SJ, Liao JC, Eskin E. Using network component analysis to dissect regulatory networks mediated by transcription factors in yeast. PLoS Comput Biol. 2009; 5(3):1000311. doi:http://dx.doi.org/10.1371/journal.pcbi.1000311.
West MA, Kim K, Kliebenstein DJ, van Leeuwen H, Michelmore RW, Doerge R, et al.Global eQTL mapping reveals the complex genetic architecture of transcript-level variation in Arabidopsis. Genetics. 2007; 175(3):1441–50.
Li Y, Breitling R, Snoek LB, van der Velde KJ, Swertz MA, Riksen J, et al.Global genetic robustness of the alternative splicing machinery in Caenorhabditis elegans. Genetics. 2010; 186(1):405–10. doi:http://dx.doi.org/10.1534/genetics.110.119677.
Rockman MV, Skrovanek SS, Kruglyak L. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science. 2010; 330(6002):372–6. doi:http://dx.doi.org/10.1126/science.1194208.
Stigler B, Chamberlin HM. A regulatory network modeled from wild-type gene expression data guides functional predictions in Caenorhabditis elegans development. BMC Syst Biol. 2012; 6(1):77. doi:http://dx.doi.org/10.1186/1752-0509-6-77.
Wang W, Zhang X. Network-based group variable selection for detecting expression quantitative trait loci (eQTL). Bioinformatics. 2011; 12(1):269. doi:http://dx.doi.org/10.1186/1471-2105-12-269.
Stanley D, Watson-Haigh NS, Cowled CJ, Moore RJ. Genetic architecture of gene expression in the chicken. BMC Genomics. 2013; 14(1):13. doi:http://dx.doi.org/10.1186/1471-2164-14-13.
Suthram S, Beyer A, Karp RM, Eldar Y, Ideker T. eQED: an efficient method for interpreting eQTL associations using protein networks. Mol Syst Biol. 2008; 4(1):162.
Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM. A single gene network accurately predicts phenotypic effects of gene perturbation in caenorhabditis elegans. Nat Genet. 2008; 40(2):181–8. doi:http://dx.doi.org/10.1038/ng.2007.70.
Murphy CT, McCarroll SA, Bargmann CI, Fraser A, Kamath RS, Ahringer J, et al.Genes that act downstream of DAF-16 to influence the lifespan of Caenorhabditis elegans. Nature. 2003; 424(6946):277–83. doi:http://dx.doi.org/10.1038/nature01789.
Syntichaki P, Tavernarakis N. Signaling pathways regulating protein synthesis during ageing. Exp Gerontol. 2006; 41(10):1020–5. doi:http://dx.doi.org/10.1016/j.exger.2006.05.014.
Hamilton B. A systematic rnai screen for longevity genes in c. elegans. Genes Dev. 2005; 19(13):1544–55. doi:http://dx.doi.org/10.1101/gad.1308205.
Menne TF, Goyenechea B, Sanchez-Puig N, Wong CC, Tonkin LM, Ancliff PJ, et al.The Shwachman-Bodian-Diamond syndrome protein mediates translational activation of ribosomes in yeast. Nat Genet. 2007; 39(4):486–95. doi:http://dx.doi.org/10.1038/ng1994.
Vasieva O. Role of Shwachman-Bodian-Diamond syndrome protein in translation machinery and cell chemotaxis: a comparative genomics approach. Adv Appl Bioinforma Chem.2011;43. doi:http://dx.doi.org/10.2147/AABC.S23510.
Samuelson AV, Carr CE, Ruvkun G. Gene activities that mediate increased life span of C. elegans insulin-like signaling mutants. Genes Dev. 2007; 21(22):2976–994. doi:http://dx.doi.org/10.1101/gad.1588907.
Navarro RE, Shim EY, Kohara Y, Singson A, Blackwell TK. Cgh-1. a conserved predicted RNA helicase required for gametogenesis and protection from physiological germline apoptosis in C. elegans. Development. 2001; 128:322–2.
Simmer F, Moorman C, van der Linden AM, Kuijk E, van den Berghe PVE, Kamath RS, et al.Genome-wide RNA of C. elegans using the hypersensitive rrf-3 strain reveals novel gene functions. PLoS Biol. 2003; 1(1):2. doi:http://dx.doi.org/10.1371/journal.pbio.0000012.
Halaschek-Wiener J. Analysis of long-lived c. elegans daf-2 mutants using serial analysis of gene expression. Genome Res. 2005; 15(5):603–15. doi:http://dx.doi.org/10.1101/gr.3274805.
Rodriguez M, Snoek LB, Riksen JAG, Bevers RP, Kammenga JE. Genetic variation for stress-response hormesis in C. elegans lifespan. Exp Gerontol. 2012; 47(8):581–7. doi:http://dx.doi.org/10.1016/j.exger.2012.05.005.
van der Bent ML, Sterken MG, Volkers RJM, Riksen JAG, Schmid T, Hajnal A, et al.Loss-of-function of catenin bar-1 slows development and activates the wnt pathway in caenorhabditis elegans. Sci Rep.2014;4. doi:http://dx.doi.org/10.1038/srep04926.
Irazoqui JE, Urbach JM, Ausubel FM. Evolution of host innate defence: insights from Caenorhabditis elegans and primitive invertebrates. Nat Rev Immun. 2010; 10(1):47–58. doi:http://dx.doi.org/10.1038/nri2689.
Acknowledgements
This work has been funded by ERASysbio-plus ZonMW project GRAPPLE (project nr. 90201066). We are very grateful to Andrew Cossins for his helpful advice. The work of O.V. Valba was supported in 2014 by the Programm of fundamental study of National Research University Higher School of Economics.
Author information
Authors and Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests. This study did not require Ethics committee approval.
Authors’ contributions
OVV participated in design of the study, carried out the computational analysis, drafted the computational part of the manuscript and performed the required formatting. SKN participated in design of the study, coordination of the computational analysis and drafting of the manuscript, MGS and LBS participated in experimental data retrieval; analysis of the results and drafting of the manuscript, JEK participated in design of the study and drafting of the manuscript, OOV participated in design and coordination of the study, data retrieval and analysis, and drafted the manuscript. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Valba, O.V., Nechaev, S.K., Sterken, M.G. et al. On predicting regulatory genes by analysis of functional networks in C. elegans. BioData Mining 8, 33 (2015). https://doi.org/10.1186/s13040-015-0066-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13040-015-0066-0