Functional dyadicity and heterophilicity of gene-gene interactions in statistical epistasis networks
© Hu et al. 2015
Received: 12 December 2014
Accepted: 3 July 2015
Published: 21 December 2015
The interaction effect among multiple genetic factors, i.e. epistasis, plays an important role in explaining susceptibility to common human diseases and phenotypic traits. Uncertainty about the number of genetic attributes involved in interactions poses great challenges in genetic association studies and calls for advanced bioinformatics methodologies. Network science has gained popularity in modeling genetic interactions thanks to its ability to structurally characterize large numbers of entities and their complex relationships. However, little has been done to functionally interpret statistically inferred epistatic interactions using networks.
In this study, we propose to characterize gene functional properties in the context of interaction network structure. We used Gene Ontology (GO) to functionally annotate genes as vertices in a statistical epistasis network, and quantitatively characterized the correlation between the distribution of gene functional properties and the network structure by measuring the dyadicity and heterophilicity of each functional category in the network. These two parameters quantify whether genetic interactions tend to occur more frequently between genes from the same functional category, i.e. the dyadic effect, or between genes from different functional categories, i.e. the heterophilic effect.
By applying this framework to a population-based bladder cancer dataset, we identified several GO categories with significant dyadicity or heterophilicity associated with bladder cancer susceptibility. Our informatics framework thus offers a new methodology for embedding functional analysis in network modeling of statistical epistasis in genetic association studies.
The goal of genetic association studies is to identify heritable genetic factors that can help explain common human diseases and phenotypic traits [1–3]. The recent rapid development of sequencing technologies enables thousands to millions of single nucleotide polymorphisms (SNPs) to be genotyped and tested for phenotypic associations, bringing genetic association studies into a new era [4, 5]. Although studies have uncovered numerous disease susceptibility loci over the years [1, 6, 7], most of them found only limited associations between individual genetic factors and disease risk using conventional main-effect methods . Non-linear interaction effects among multiple genetic attributes have been recognized as playing an important role in explaining the missing heritability [9, 10]. This interaction effect, also termed epistasis, describes the departure from independence among multiple genetic attributes associated with a particular phenotypic outcome [11–14]. Epistasis holds great potential and has become a new focus of genetic association studies [15–17]. However, it also poses great statistical and computational challenges due to the high dimensionality and heavy computational demands of interaction analysis [18, 19].
Network science has gained popularity in the biological sciences thanks to its ability to model complex relationships among large numbers of entities [20, 21]. A network is generally defined as a collection of vertices joined in pairs by edges. Networks have been used to study biological systems at multiple levels of organization, including metabolism , protein-protein interactions , genetic regulatory networks , and food webs . They also provide a suitable framework for epistasis studies, since they allow a structural representation of a large number of genetic attributes and their interaction relationships . A number of genetic association studies have used networks to characterize epistatic interactions and have applied them successfully to various human diseases and traits [27–29].
Most existing epistasis network methodologies construct genetic interaction networks by representing genetic attributes, e.g. SNPs or genes, as vertices and linking pairs of vertices if significant interaction relationships are detected, either biologically or statistically. Vertices with outstanding network properties are then identified as key vertices, including hubs, i.e. vertices with a significantly larger number of neighbors than average, and bottlenecks, i.e. vertices with high centrality that hold essential positions in the information flow between all pairs of vertices in a network. Annotation of these key vertices is then used to prioritize particular functional categories, such as pathways, with high disease/phenotype association and to propose hypotheses for further biological validation [30–32]. In this study, we take a different route to incorporating functional annotation in genetic interaction networks: we analyze the distribution of vertex functional characteristics in the context of network structure.
In most complex networks, besides contributing to the network topology, vertices may possess various characteristics, for instance individual education level in social networks or biological function in protein-protein interaction networks . The distribution of these vertex characteristics is likely not random but correlated with the underlying network structure. There are, in fact, many empirical observations that vertices with similar characteristics tend to be linked together, or, conversely, that vertices with dissimilar characteristics tend to be linked . However, few analytical methods have been proposed to quantify such correlations. A recent study by Park and Barabasi  proposed two parameters, dyadicity and heterophilicity, to quantify the interplay between the distribution of vertex properties and the network structure. Their method was applied to complex networks including a protein-protein interaction network and a mobile service network, and proved effective at quantifying the dyadic and heterophilic effects in the distribution of vertex properties.
In this study, we adopted the dyadicity and heterophilicity measures to characterize gene-gene interactions in epistasis networks. We previously developed the Statistical Epistasis Network (SEN) framework for inferring large-scale genetic interactions in disease association studies [27, 35, 36]. Here we constructed a gene-gene interaction network based on the SEN methodology and investigated the distribution of the Gene Ontology functions of genes in this network. We expected this analysis to help elucidate how the properties of gene-gene interactions vary across functional categories, and thus to help us better understand the underlying biology of statistical genetic interactions.
Bladder cancer dataset
We used a population-based bladder cancer dataset in this study. Bladder cancer cases were residents of New Hampshire identified through the State Cancer Registry, aged 25 to 74 years and diagnosed between July 1, 1994 and June 30, 2001. Healthy controls younger than 65 were selected using population lists from the New Hampshire Department of Transportation, and those aged 65 and above were chosen from data files provided by the Centers for Medicare & Medicaid Services (CMS) of New Hampshire. More than 95 % of the study population was of Caucasian origin. Each participant provided informed consent, and all data collection procedures and study materials were approved by the Committee for the Protection of Human Subjects at Dartmouth College.
For genotyping, DNA was isolated from peripheral circulating blood lymphocyte specimens using Qiagen genomic DNA extraction kits (QIAGEN Inc., Valencia, CA). All DNA samples of sufficient concentration were genotyped with the GoldenGate Assay system through Illumina’s Custom Genetic Analysis service (Illumina, Inc., San Diego, CA). Of the submitted samples, 99.5 % were successfully genotyped, and samples repeated on multiple plates yielded the same call for 99.9 % of the SNPs. SNPs with more than 5 % missing data were removed from the analysis, and the remaining missing genotypes were imputed using the most frequent alleles in the population. The final dataset includes 1422 SNPs from 396 cancer susceptibility genes, genotyped in 491 bladder cancer cases and 791 healthy controls. More details of this dataset are given in [37, 38].
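The SNP-level quality control described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the dataset: it assumes genotypes are coded as minor-allele counts (0, 1, 2) with `None` for a missing call, and it interprets "imputed using the most frequent alleles" as filling each gap with that SNP's most frequent observed genotype. The function name `qc_and_impute` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumed coding: 0, 1 or 2 copies of the minor allele; None marks a missing call.
Genotype = Optional[int]


def qc_and_impute(snps: Dict[str, List[Genotype]],
                  max_missing: float = 0.05) -> Dict[str, List[int]]:
    """Drop SNPs with more than `max_missing` missing calls, then fill the
    remaining gaps with that SNP's most frequent observed genotype."""
    cleaned: Dict[str, List[int]] = {}
    for snp_id, calls in snps.items():
        if not calls:
            continue  # nothing to evaluate
        missing_rate = sum(1 for g in calls if g is None) / len(calls)
        if missing_rate > max_missing:
            continue  # SNP fails the 5 % missingness filter
        observed = [g for g in calls if g is not None]
        mode = Counter(observed).most_common(1)[0][0]  # most frequent genotype
        cleaned[snp_id] = [mode if g is None else g for g in calls]
    return cleaned


# Hypothetical example: rs1 has 1/20 missing (kept), rs2 has 2/20 missing (dropped).
example = {"rs1": [0, 1, 1, None] + [1] * 16, "rs2": [None, None] + [0] * 18}
print(sorted(qc_and_impute(example)))
```

The threshold comparison is strict (`>`), so a SNP with exactly 5 % missing calls is retained, matching "more than 5 % missing data were removed".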
Gene interaction network
We previously developed the Statistical Epistasis Network (SEN) framework to infer the global structure of interactions among a large set of genetic attributes in genetic association studies . First, all pairwise epistatic interactions were measured using the information-theoretic metric of information gain [39, 40]. Specifically, given a pair of SNPs A and B, the amount of information each of them individually provides about the phenotypic outcome C was measured by the mutual information I(A;C) and I(B;C). Considering A and B jointly, I(A,B;C) captured the total association of A and B together with C. Subtracting the individual associations I(A;C) and I(B;C) from I(A,B;C) gives the information gain, IG(A;B;C) = I(A,B;C) − I(A;C) − I(B;C), which quantifies the additional information about C obtained by combining A and B, and served as the measure of the epistatic interaction between A and B with respect to C.
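The information gain measure can be illustrated with empirical (plug-in) entropies. This is a sketch under stated assumptions: mutual information is computed as I(X;C) = H(X) + H(C) − H(X,C) from observed frequencies, entropies are in bits (the log base is our choice), and the genotype pair (A, B) is treated as a single joint variable. The function names are hypothetical. A positive IG indicates synergy beyond the individual effects; a negative value indicates redundancy.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy, in bits, of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], c: Sequence[Hashable]) -> float:
    """I(X;C) = H(X) + H(C) - H(X,C), using observed frequencies."""
    return entropy(x) + entropy(c) - entropy(list(zip(x, c)))


def information_gain(a: Sequence[Hashable], b: Sequence[Hashable],
                     c: Sequence[Hashable]) -> float:
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C)."""
    joint_ab = list(zip(a, b))  # treat each genotype pair as one symbol
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# Toy XOR pattern: neither SNP alone predicts C, but together they determine it.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
c = [0, 1, 1, 0]
print(information_gain(a, b, c))
```

The XOR example is the classic purely epistatic case: I(A;C) = I(B;C) = 0, while I(A,B;C) equals the full 1 bit of entropy in C.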
Networks were then built by connecting pairs of SNPs as vertices if their epistatic interaction strength exceeded a threshold. To derive this threshold, we compared global network properties, including the size of the network, the size of the giant connected component and the vertex degree distribution, between networks built from the real data and from permuted data, and chose the threshold at which the real-data network showed the most distinctive topological properties relative to the permuted-data networks. Such statistical epistasis networks were able to capture the global interaction structure of a large set of SNPs.
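Building a network at a given threshold and measuring its global properties can be sketched as below. This assumes the pairwise IG scores have already been computed and stored in a dictionary keyed by SNP pair; the names `build_network`, `giant_component_size` and `network_stats` are hypothetical. In the SEN procedure these statistics would be compared, threshold by threshold, against networks built from permuted data (e.g. with shuffled case/control labels); that outer loop and the selection criterion are not shown, and the degree distribution is omitted for brevity.

```python
from collections import deque
from typing import Dict, Set, Tuple

SnpPair = Tuple[str, str]


def build_network(ig: Dict[SnpPair, float], threshold: float) -> Dict[str, Set[str]]:
    """Adjacency sets for all SNP pairs whose IG is stronger than `threshold`."""
    adj: Dict[str, Set[str]] = {}
    for (a, b), score in ig.items():
        if score > threshold:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def giant_component_size(adj: Dict[str, Set[str]]) -> int:
    """Number of vertices in the largest connected component (breadth-first search)."""
    seen: Set[str] = set()
    largest = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest


def network_stats(ig: Dict[SnpPair, float], threshold: float) -> Dict[str, int]:
    """Global properties used to compare real-data and permuted-data networks."""
    adj = build_network(ig, threshold)
    return {
        "vertices": len(adj),  # SNPs with at least one edge
        "edges": sum(len(nbrs) for nbrs in adj.values()) // 2,
        "giant_component": giant_component_size(adj),
    }


# Hypothetical IG scores for five SNPs.
scores = {("s1", "s2"): 0.02, ("s2", "s3"): 0.015, ("s4", "s5"): 0.011, ("s1", "s5"): 0.001}
print(network_stats(scores, 0.01))
```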
The SEN framework was successfully applied to the population-based bladder cancer dataset, identifying a SNP interaction network with a significantly larger giant connected component and a distinctive heavy-tailed degree distribution compared with all permuted-data networks built using the same pairwise interaction threshold . This finding suggests the hypothesis that a large connected structure of complex interactions exists among bladder cancer associated SNPs, which calls for further validation and investigation.
In the current study, we used Gene Ontology to assign functional annotations to each gene and examined the distribution of these vertex properties in the epistasis network. Because Gene Ontology annotation is at the gene level, we built a gene-gene interaction network from the previously identified SNP-SNP interaction network of bladder cancer. In the gene interaction network, each vertex represents a gene, and two genes are connected by an edge if at least one pair of SNPs, one from each gene, is connected in the identified SNP-SNP interaction network. This transformation allowed functional categories to be assigned directly to the genes as vertices in the network.
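The SNP-to-gene collapse described above amounts to mapping each SNP edge onto its genes and de-duplicating. A minimal sketch, assuming each SNP maps to exactly one gene and that SNP pairs falling within the same gene are simply dropped rather than recorded as self-loops (the text does not specify this case). The function name and gene labels are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def snp_to_gene_network(snp_edges: Iterable[Tuple[str, str]],
                        snp_to_gene: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Connect two genes if at least one SNP pair, one SNP from each gene,
    is an edge of the SNP-SNP interaction network."""
    gene_edges: Set[FrozenSet[str]] = set()
    for snp_a, snp_b in snp_edges:
        gene_a, gene_b = snp_to_gene[snp_a], snp_to_gene[snp_b]
        if gene_a != gene_b:  # assumption: within-gene SNP pairs add no self-loop
            gene_edges.add(frozenset((gene_a, gene_b)))  # undirected; duplicates collapse
    return gene_edges


# Hypothetical SNP network and SNP-to-gene map.
edges = [("rs1", "rs3"), ("rs2", "rs4"), ("rs1", "rs2"), ("rs5", "rs3")]
genes = {"rs1": "GENE_A", "rs2": "GENE_A", "rs3": "GENE_B", "rs4": "GENE_C", "rs5": "GENE_C"}
print(sorted(sorted(e) for e in snp_to_gene_network(edges, genes)))
```

Using `frozenset` makes each gene-gene edge undirected, so several SNP pairs linking the same two genes yield a single edge.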
Gene ontology annotation
We used the Database for Annotation, Visualization and Integrated Discovery (DAVID)  to functionally annotate the 185 genes in our epistasis network based on Gene Ontology. The FAT level was used for biological process (BP), cellular component (CC), and molecular function (MF) annotations. GO categories were considered significantly enriched in our network if their enrichment p-values were below the conventional threshold of 0.05. We set the gene-in-category count threshold to 3, i.e., a GO term was included in the annotation analysis only if it contained at least three of our 185 network genes.
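The term-selection rule above (at least three network genes and an enrichment p-value below 0.05) can be sketched as a simple filter. DAVID reports its own enrichment p-values; the `ease_like_score` function here is only our reading of DAVID's EASE score as a one-sided Fisher exact (hypergeometric) p-value computed after removing one gene from the list count, and DAVID's actual computation and background set may differ. All function names and term identifiers are hypothetical.

```python
from math import comb
from typing import List, Tuple


def hypergeom_upper_tail(k: int, pop_total: int, pop_hits: int, list_total: int) -> float:
    """P(X >= k) when `list_total` genes are drawn from `pop_total` background
    genes, `pop_hits` of which carry the GO term."""
    upper = min(pop_hits, list_total)
    total = sum(comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
                for i in range(max(k, 0), upper + 1))
    return total / comb(pop_total, list_total)


def ease_like_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """Conservative one-sided Fisher p-value with one list hit removed (assumed EASE form)."""
    return hypergeom_upper_tail(list_hits - 1, pop_total, pop_hits, list_total)


def select_terms(terms: List[Tuple[str, int, float]],
                 min_count: int = 3, alpha: float = 0.05) -> List[str]:
    """Keep terms with at least `min_count` network genes and p-value below `alpha`."""
    return [name for name, count, p in terms if count >= min_count and p < alpha]


# Hypothetical (term, genes-in-category count, p-value) records.
records = [("GO:A", 48, 0.001), ("GO:B", 2, 0.01), ("GO:C", 5, 0.2)]
print(select_terms(records))
```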
Distribution of vertex properties in networks
Networks have been used to model interactions in complex systems in areas ranging from the biological sciences and engineering to the social sciences. In most real complex networks, vertices also possess functional characteristics, and examining the distribution of these characteristics in the context of network structure reveals whether vertices with similar functions tend to connect to each other. A recent study on complex networks  proposed a quantitative approach to describing the interplay between vertex properties and the structure of the underlying network, using two parameters, dyadicity and heterophilicity, to measure the degree to which vertex characteristics are correlated with the network structure.
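For a binary vertex property, the two parameters can be written out following the definitions of Park and Barabasi (the symbol names below are our own choice). Let N be the number of vertices, M the number of edges, n_1 the number of vertices with the property, m_11 the number of edges joining two such vertices, and m_10 the number of edges joining a vertex with the property to one without it. If the property were assigned at random, the expected edge counts would follow from the network's connectance p:

```latex
\begin{gathered}
p = \frac{2M}{N(N-1)}, \qquad
\bar{m}_{11} = \binom{n_1}{2}\, p = \frac{n_1 (n_1 - 1)}{2}\, p, \qquad
\bar{m}_{10} = n_1 (N - n_1)\, p, \\
D = \frac{m_{11}}{\bar{m}_{11}}, \qquad
H = \frac{m_{10}}{\bar{m}_{10}}.
\end{gathered}
```

D > 1 means vertices sharing the property are linked more often than expected by chance (dyadic), while H > 1 means they link to vertices without the property more often than expected (heterophilic); H < 1 is termed heterophobic.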
Gene interaction network of bladder cancer
Enriched gene functional categories
Gene Ontology annotation using DAVID returned 808 GO terms as significantly enriched functional categories for our set of 185 network genes. The category with the largest gene-in-category count was GO_MF_FAT nucleotide binding, with 48 genes, followed by GO_BP_FAT response to organic substance (45 genes), GO_CC_FAT cell fraction (45 genes), and GO_CC_FAT membrane-enclosed lumen (45 genes). We then used these 808 enriched GO terms as vertex properties in the analysis of the distribution of vertex characteristics in the network.
Dyadicity and heterophilicity of enriched GO categories
Each enriched GO term was treated as a binary vertex property: a vertex was assigned the value 1 if the gene it represents belongs to the GO category and 0 otherwise. The dyadicity (D) and heterophilicity (H) values were then calculated for each of the 808 GO categories. A permutation test with 100,000 permutations, each shuffling the assignment of property values among vertices, was used to estimate the significance of the observed D and H. The p-value was calculated as the proportion of permuted D (H) values that were greater than or equal to the observed value for the real network.
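The per-term calculation and permutation test can be sketched as follows. This is an illustrative implementation, not the authors' code: D and H follow the Park and Barabasi definitions for a binary property, the network is given as a vertex list and an undirected edge list, and each permutation shuffles the 0/1 labels across vertices (so the number of labeled vertices stays fixed). The function names, the seed and the toy network are hypothetical; the default of 100,000 permutations mirrors the text.

```python
import random
from typing import Dict, Sequence, Tuple

Edge = Tuple[str, str]


def dyadicity_heterophilicity(vertices: Sequence[str], edges: Sequence[Edge],
                              has_property: Dict[str, bool]) -> Tuple[float, float]:
    """D and H for a binary vertex property (needs 2 <= n1 < number of vertices)."""
    n, m = len(vertices), len(edges)
    n1 = sum(1 for v in vertices if has_property[v])
    p = 2 * m / (n * (n - 1))  # connectance: fraction of possible edges present
    m11 = sum(1 for u, v in edges if has_property[u] and has_property[v])
    m10 = sum(1 for u, v in edges if has_property[u] != has_property[v])
    expected_m11 = n1 * (n1 - 1) / 2 * p
    expected_m10 = n1 * (n - n1) * p
    return m11 / expected_m11, m10 / expected_m10


def permutation_pvalues(vertices: Sequence[str], edges: Sequence[Edge],
                        has_property: Dict[str, bool], n_perm: int = 100_000,
                        seed: int = 0) -> Tuple[float, float, float, float]:
    """Observed D and H plus the proportion of permutations with D (H) >= observed."""
    rng = random.Random(seed)
    d_obs, h_obs = dyadicity_heterophilicity(vertices, edges, has_property)
    labels = [has_property[v] for v in vertices]
    d_hits = h_hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign property values, keeping their count fixed
        permuted = dict(zip(vertices, labels))
        d_perm, h_perm = dyadicity_heterophilicity(vertices, edges, permuted)
        d_hits += d_perm >= d_obs
        h_hits += h_perm >= h_obs
    return d_obs, h_obs, d_hits / n_perm, h_hits / n_perm


# Toy network: three annotated genes form a triangle with one edge to the rest.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
links = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4"), ("g4", "g5"), ("g5", "g6")]
annotated = {g: g in ("g1", "g2", "g3") for g in genes}
print(permutation_pvalues(genes, links, annotated, n_perm=2000, seed=1))
```

Because shuffling keeps the number of labeled vertices fixed, the expected counts are identical across permutations, so the test is effectively comparing the observed within-category and cross-category edge counts against their permutation distributions.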
Dyadicity and heterophilicity analysis results of the bladder cancer gene interaction network
Gene Ontology terms
Identical protein binding
Response to estrogen stimulus
RNA biosynthetic process
Negative regulation of DNA binding
Regulation of phagocytosis
Nucleotide-excision repair, DNA gap filling
Regulation of sterol transport
Regulation of cholesterol transport
In this article, we proposed a methodology for analyzing the distribution of gene functional properties in the context of statistical epistasis networks. The gene interaction network was constructed by first identifying the network of strong and significant pairwise SNP epistatic interactions and then building a gene network on top of the SNP interaction network. After annotating the genes (vertices) with their Gene Ontology functions, we performed dyadicity and heterophilicity analysis for each GO term to investigate to what degree the vertex characteristics correlate with the underlying interaction network topology. Using a population-based bladder cancer dataset and its previously identified SNP statistical epistasis network, we performed this analysis on the enriched GO terms of the genes in the bladder cancer gene interaction network. We found 12 GO categories with significant dyadicity or heterophilicity, indicating differential interaction patterns among genes from different functional categories: genes in some functional categories tend to interact with each other within the same category, whereas genes in other categories tend to interact more with genes from other categories.
This study complements our previous framework of statistical epistasis networks by constructing gene interaction networks and further analyzing the distribution of gene functional characteristics in these networks. Network science has become a powerful approach to modeling epistatic interactions in genetic association studies, capable of representing and analyzing complex interactions among a large number of genetic attributes. However, less has been done to incorporate the functional properties of genetic attributes in the context of interaction networks. Our work analyzed the interplay between functional properties and network topology, and provides insights into the interpretation of the interactions and a better understanding of the etiology of the associated diseases.
The bladder cancer gene interaction network had a large giant connected component, which suggests a complex genetic architecture underlying bladder cancer. In the GO functional annotation analysis, a total of 808 functional categories were enriched across the 185 genes in the gene interaction network. Seven GO terms were significantly dyadic and five others were significantly heterophilic. These different interaction properties of GO categories provide useful insights into the roles of various functional components in the etiology of bladder cancer. For instance, the functional category nucleotide-excision repair, DNA gap filling was enriched in our set of network genes and showed significantly high heterophilicity (H = 2.162, p_H = 0.037). DNA repair genes were previously found to be associated with bladder cancer susceptibility . The current study suggests that these genes contribute to bladder cancer susceptibility through epistatic interactions, and that their interaction effect is heterophilic, which could indicate that DNA repair genes are more likely to interact with genes from other functional categories than with each other. SNPs that increase the level of DNA damage (e.g. by increasing the bioactivation of toxins to reactive intermediates) could synergize with impaired DNA repair mechanisms, leading to a greater-than-additive increase in cancer risk.
Also note that regulation of cholesterol transport (D = 32.794, p_D = 0.048) and regulation of sterol transport (D = 32.794, p_D = 0.047), which included the genes APOA2, BZRP, and LEP in the network, were enriched and found to be highly dyadic in the gene interaction network. A growing body of literature suggests that an increased risk of cancers, including bladder cancer, is associated with high intake of dietary cholesterol . Recent studies have identified cholesterol homeostasis as a potential target for cancer therapeutics . It is widely accepted that cancer cells need excess cholesterol and intermediates of the cholesterol biosynthesis pathway to maintain a high level of proliferation, and cholesterol and sterol transport mechanisms could serve as potential targets for cancer drug design . Our results suggest that the interaction effects of cholesterol and sterol transport regulation genes, which are mostly dyadic, contribute to bladder cancer susceptibility and might be useful for the future identification of cancer drug targets. We also speculate that the dyadic interaction effect may reflect the fact that cholesterol transport molecules must bind to cholesterol and to each other to move cholesterol through the body, since cholesterol is insoluble in blood, and that many of these molecules exhibit feedback regulation. Genes regulating cholesterol and sterol transport may therefore have more protein-protein interactions among themselves than with other functional groups, and these could be reflected as statistical epistatic interactions in relation to bladder cancer.
Our methodology itself has broad application potential in genetic association studies: it can be used to analyze and interpret gene-gene interactions for a wide range of phenotypes and diseases. In the current study, we adopted GO annotation, which has limitations: categories are assigned based on current knowledge and may change as new scientific discoveries are made, and categories are sometimes subsets of one another. In future extensions and applications, we are interested in using other functional annotation methods, such as pathways and drug and environment associations, to examine how these different functional categorizations interplay with the vertex property distribution in gene interaction networks.
This work was supported by the National Institutes of Health (NIH) of the United States of America grants R01-LM010098, R01-LM009012, R01-AI59694, P20-GM103506, P20-GM103534 to JHM, and R25-CA134286, R01-CA05749, P20-GM104416 to MRK.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Hardy J, Singleton A. Genome-wide association studies and human disease. N Engl J Med. 2009; 360(17):1759–1768.
- Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005; 6(2):95–108.
- Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996; 273(5281):1516–1517.
- The international HapMap Consortium. The international HapMap project. Nature. 2003; 426:789–96.
- Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001; 409:928–33.
- Hindorff LA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci. 2009; 106(23):9362–367.
- Hirschhorn JN. Genomewide association studies - illuminating biologic pathways. N Engl J Med. 2009; 360(17):1699–1701.
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009; 461:747–53.
- Moore JH. A global view of epistasis. Nat Genet. 2005; 37(1):13–14.
- Musani SK, Shriner D, Liu N, Feng R, Coffey CS, Yi N, et al. Detection of gene-gene interactions in genome-wide association studies of human population data. Hum Hered. 2007; 63:67–84.
- Moore JH, Williams SM. Traversing the conceptual divide between biological and statistical epistasis: Systems biology and a more modern synthesis. BioEssays. 2005; 27(6):637–46.
- Moore JH, Williams SM. Epistasis and its implications for personal genetics. Am J Hum Genet. 2009; 85(3):309–20.
- Phillips PC. The language of gene interaction. Genetics. 1998; 149:1167–1171.
- Phillips PC. Epistasis - the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008; 9:855–67.
- Carlborg O, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet. 2004; 5:618–524.
- Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered. 2003; 56:73–82.
- Van Steen K. Travelling the world of gene-gene interactions. Brief Bioinform. 2012; 13(1):1–19.
- Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009; 10(6):392–404.
- Moore JH, Asselbergs FW, Williams SM. Bioinformatics challenges for genome-wide association studies. Bioinformatics. 2010; 26(4):445–55.
- Newman MEJ. Networks: An Introduction. Oxford, UK: Oxford University Press; 2010.
- Strogatz SH. Exploring complex networks. Nature. 2001; 410:268–76.
- Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL. Hierarchical organization of modularity in metabolic networks. Science. 2002; 297:1551–1555.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001; 411:41–2.
- Barabasi AL, Oltvai ZN. Network biology: Understanding the cell’s functional organization. Nat Rev Genet. 2004; 5:101–13.
- Martinez ND. Constant connectance in community food webs. The Am Soc Nat. 1992; 140(6):1208–1218.
- Hu T, Moore JH. Network modeling of statistical epistasis. In: Elloumi M, Zomaya AY, editors. Biological knowledge discovery handbook: preprocessing, mining, and postprocessing of biological data. NJ, USA: Wiley; 2013. p. 175–90. Chap. 8.
- Hu T, Sinnott-Armstrong NA, Kiralis JW, Andrew AS, Karagas MR, Moore JH. Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC Bioinformatics. 2011; 12:364.
- McKinney BA, Crowe JE, Guo J, Tian D. Capturing the spectrum of interaction effects in genetic association studies by simulated evaporative cooling network analysis. PLoS Genet. 2009; 5(3):1000432.
- Wu Y, Zhu X, Chen J, Zhang X. Einvis: a visualization tool for analyzing and exploring genetic interactions in large-scale association studies. Genet Epidemiol. 2013; 37(7):675–85.
- Hu T, Pan Q, Andrew AS, Langer JM, Cole MD, Tomlinson CR, et al. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility. BioData Min. 2014; 7(1):5.
- Pandey A, Davis NA, White BC, Pajewski NM, Savitz J, Drevets WC, et al. Epistasis network centrality analysis yields pathway replication across two GWAS cohorts for bipolar disorder. Transl Psychiatry. 2012; 2:154.
- West J, Widschwendter M, Teschendorff AE. Distinctive topology of age-associated epigenetic drift in the human interactome. Proc Natl Acad Sci. 2013; 110(35):14138–14143.
- Newman MEJ. Assortative mixing in networks. Phys Rev Lett. 2002; 89(20):208701.
- Park J, Barabasi AL. Distribution of node characteristics in complex networks. Proc Natl Acad Sci. 2007; 104(46):17916–17920.
- Hu T, Andrew AS, Karagas MR, Moore JH. Statistical epistasis networks reduce the computational complexity of searching three-locus genetic models. Proc Pac Symp Biocomput. 2013; 18:397–408.
- Hu T, Chen Y, Kiralis JW, Moore JH. ViSEN: Methodology and software for visualization of statistical epistasis networks. Genet Epidemiol. 2013; 37:283–5.
- Andrew AS, Nelson HH, Kelsey KT, Moore JH, Meng AC, Casella DP, et al. Concordance of multiple analytical approaches demonstrates a complex relationship between DNA repair gene SNPs, smoking and bladder cancer susceptibility. Carcinogenesis. 2006; 27(5):1030–1037.
- Karagas MR, Tosteson TD, Blum J, Morris JS, Baron JA, Klaue B. Design of an epidemiologic study of drinking water arsenic exposure and skin and bladder cancer risk in a U.S. population. Environ Health Perspect. 1998; 106(4):1047–1050.
- Cover TM, Thomas JA. Elements of Information Theory: Second Edition. NJ, USA: Wiley; 2006.
- Hu T, Chen Y, Kiralis JW, Collins RL, Wejse C, Sirugo G, et al. An information-gain approach to detecting three-way epistatic interactions in genetic association studies. J Am Med Inform Assoc. 2013; 20:630–6.
- Huang D, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009; 4:44–57.
- Hu J, La Vecchia C, de Groh M, Negri E, Morrison H, Mery L. Dietary cholesterol intake and cancer. Ann Oncol. 2012; 23(2):491–500.
- Cruz PMR, Mo H, McConathy WJ, Sabnis N, Lacko AG. The role of cholesterol metabolism and cholesterol transport in carcinogenesis: a review of scientific findings, relevant to future cancer therapeutics. Front Pharmacol. 2013; 4:119.
- Kang M, Jeong CW, Ku JH, Kwak C, Kim HH. Inhibition of autophagy potentiates atorvastatin-induced apoptotic cell death in human bladder cancer cells in vitro. Int J Mol Sci. 2014; 15(5):8106–121.
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13:2498–504.