GAMETES: a fast, direct algorithm for generating pure, strict, epistatic models with random architectures
 Ryan J Urbanowicz^{1},
 Jeff Kiralis^{1},
 Nicholas A Sinnott-Armstrong^{1},
 Tamra Heberling^{1},
 Jonathan M Fisher^{1} and
 Jason H Moore^{1}
DOI: 10.1186/1756-0381-5-16
© Urbanowicz et al.; licensee BioMed Central Ltd. 2012
Received: 24 April 2012
Accepted: 14 September 2012
Published: 1 October 2012
Abstract
Background
Geneticists who look beyond single locus disease associations require additional strategies for the detection of complex multilocus effects. Epistasis, a multilocus masking effect, presents a particular challenge, and has been the target of bioinformatic development. Thorough evaluation of new algorithms calls for simulation studies in which known disease models are sought. To date, the best methods for generating simulated multilocus epistatic models rely on genetic algorithms. However, such methods are computationally expensive, difficult to adapt to multiple objectives, and unlikely to yield models with a precise form of epistasis which we refer to as pure and strict. Purely and strictly epistatic models constitute the worst case in terms of detecting disease associations, since such associations may only be observed if all n loci are included in the disease model. This makes them an attractive gold standard for simulation studies considering complex multilocus effects.
Results
We introduce GAMETES, a user-friendly software package and algorithm which generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. These constraints include heritability, minor allele frequencies of the SNPs, and population prevalence. GAMETES also includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. We highlight the utility and limitations of GAMETES with an example simulation study using MDR, an algorithm designed to detect epistasis.
Conclusions
GAMETES is a fast, flexible, and precise tool for generating complex n-locus models with random architectures. While GAMETES has a limited ability to generate models with higher heritabilities, it is proficient at generating the lower heritability models typically used in simulation studies evaluating new algorithms. In addition, the GAMETES modeling strategy may be flexibly combined with any dataset simulation strategy. Beyond dataset simulation, GAMETES could be employed to pursue theoretical characterization of genetic models and epistasis.
Keywords
GAMETES SNP Epistasis Simulation Model Genetics
Background
Despite the rising quality and abundance of genetic data, epidemiologists continue to struggle to explain the known heritability of many complex phenotypes with existing genetic factors. While strategies seeking single locus associations (i.e. main effects) are often sufficient to address diseases which follow Mendelian patterns of inheritance, their application to diseases characterized as complex has yielded limited success in the pursuit of explaining heritability [1, 2]. Epistasis is one of several phenomena, reviewed by Thornton-Wells et al. [3], believed to hinder the reliable identification of predictive genetic markers in association studies. The term epistasis was coined to describe a genetic “masking” effect viewed as a multilocus extension of the dominance phenomenon, where a variant at one locus prevents the variant at another locus from manifesting its effect [4]. In the present study we consider statistical epistasis, which is the phenomenon as it would be observed in association studies. Statistical epistasis is traditionally defined as a deviation from additivity in a mathematical model summarizing the relationship between multilocus genotypes and phenotypic variation in a population [5]. Specifically, we focus on statistical epistasis that is both strict and pure. An n-locus model is purely and strictly epistatic if all n loci, but no fewer, are predictive of disease status. The loci in these models could be viewed as “fully masked” in that no predictive information is gained until all n loci are considered in concert.
Computational geneticists are putting significant effort into developing algorithms for the detection of complex disease associations within clinical data [6–9]. While real biological datasets serve as the gold standard for validating new techniques, the development and systematic evaluation of computational strategies calls for simulated data.
Previous genetic data simulation efforts have introduced strategies for generating different data types including: quantitative trait loci mapping [10], pedigree association [11–15], and case/control-based association [16–23]. The simulation of datasets typically involves two stages: model generation, and sample generation. While most of the strategies cited above focus on the latter stage, the generation of realistic complex disease models constitutes a key challenge.
Moore et al. [18] noted that simple, biallelic, 1-locus or purely epistatic 2-locus models of disease can be assembled with relative ease via trial and error. Li and Reich [24] characterized all fully penetrant two-locus models, where genotype disease probabilities were restricted to zero and one. Hallgrímsdóttir and Yuster [25] later expanded upon this to include models with continuous penetrance values. Culverhouse et al. [26] generated purely epistatic models using the double description method [27] (for 2-locus models) and a nonlinear maximization strategy (for 3- and 4-locus models). Moore et al. [17, 18] recruited an evolutionary algorithm (EA) to evolve 2- to 5-locus epistatic models encoded as binary chromosomes. Over successive generations, models evolved towards a state of pure epistasis. Most recently, Greene et al. [23] circumvented the typical first stage of data simulation (i.e. model generation), directly evolving the genotypes and affection status of samples in a dataset. Evolving datasets with 3 to 5 SNPs at a time, this EA strategy evolved 3- to 5-locus epistatic interactions while attempting to avoid main effects and any intermediate nested 2-locus interactions.
While EAs have offered some success in the automated discovery of complex genetic models, there are distinct drawbacks to their use. First, they are computationally expensive, limiting the order and the quantity of models that can be feasibly generated. Second, EAs are not guaranteed to find epistatic models that are either pure or strict. Finally, if researchers wish to specify other model constraints (e.g. heritability, or prevalence), the EA fitness landscape becomes multi-objective, introducing further challenges and limitations to the evolutionary process.
In this study we introduce a Genetic Architecture Model Emulator for Testing and Evaluating Software (GAMETES). This algorithm provides a direct approach for the simulation of biallelic n-locus epistatic models which may be used in conjunction with any sample generation strategy. Specifically, GAMETES generates a precise class of epistatic models that are both pure and strict. Each n-locus model is generated deterministically, based on a set of random parameters, a randomly selected direction, and specified values of heritability, minor allele frequencies, and (optionally) population disease prevalence. For valid combinations of these model constraints, GAMETES attempts to generate a population of model architectures. We use the term architecture to reference the unique composition of a model (i.e. the penetrance values and arrangement of those values across genotypes). We demonstrate that GAMETES is a fast, reliable, and flexible method for generating complex genetic models of random architecture. We evaluate GAMETES over an example simulation study using MDR [28], an exhaustive combinatorial search algorithm designed to detect epistasis.
Methods
In this section, we describe (1) relevant background in genetics and modeling, (2) the specific steps for generating n-locus epistatic models that are both strict and pure and (3) an example simulation study using GAMETES.
Genetics and modeling
Single nucleotide polymorphisms (SNPs) are single loci in the DNA sequence where alternate nucleotides (i.e. alleles) are observed between members of a species or between paired chromosomes in an individual. Most characterized SNPs are biallelic, meaning that only two alleles (A or a) are observed in a population. The genotype of a SNP in a diploid organism is determined by alleles found on each chromosome of the homologous pair. A biallelic SNP can have one of three genotypes: AA, Aa, or aa. The term genotype has been used to refer both to the allele states of a single SNP, as well as the combined allele states of multiple SNPs. To avoid confusion, we refer to the latter as a multilocus genotype (MLG) whenever necessary.
SNPs not under selective pressure within a population typically exhibit genotype frequencies that are predicted by the Hardy-Weinberg Law [29]. Like most data simulation strategies, GAMETES adopts the assumption of Hardy-Weinberg Equilibrium (HWE) such that the allele frequencies of a SNP may be used to calculate its genotype frequencies as follows: freq(AA) = p^{2}, freq(Aa) = 2pq, and freq(aa) = q^{2}, where p is the frequency of the major (more common) allele ‘A’, q is the minor allele frequency (MAF) of the minor allele ‘a’, and p + q = 1. GAMETES also assumes that alleles at different loci are in linkage equilibrium.
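The two assumptions above (HWE within each SNP and linkage equilibrium between SNPs) can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the GAMETES software; genotypes are indexed 0 = AA, 1 = Aa, 2 = aa.

```python
from itertools import product

def genotype_frequencies(maf):
    """Return HWE frequencies for genotypes (AA, Aa, aa) given the MAF q."""
    q = maf
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)

def mlg_frequencies(mafs):
    """Return {MLG: frequency}, assuming linkage equilibrium between SNPs.

    Each MLG is a tuple of per-SNP genotype indices (0 = AA, 1 = Aa, 2 = aa),
    so its frequency is the product of the single-SNP genotype frequencies.
    """
    per_snp = [genotype_frequencies(q) for q in mafs]
    freqs = {}
    for mlg in product(range(3), repeat=len(mafs)):
        f = 1.0
        for snp, genotype in enumerate(mlg):
            f *= per_snp[snp][genotype]
        freqs[mlg] = f
    return freqs

# Two SNPs with MAFs 0.4 and 0.5 give 3^2 = 9 MLG frequencies summing to 1.
example = mlg_frequencies([0.4, 0.5])
```

With MAFs of 0.4 and 0.5, the single-SNP frequencies are (.36, .48, .16) and (.25, .5, .25).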
Table 1. Penetrance function providing penetrance values for three genotypes from a SNP acting under an autosomal recessive disease model

| SNP 1 genotype (frequency) | AA (.25) | Aa (.5) | aa (.25) |
| --- | --- | --- | --- |
| Penetrance | 0 | 0 | 1 |
This penetrance function is fully penetrant, since disease status is completely dependent on genotype (i.e. penetrance values are either 0 or 1). For single-locus models such as this, one can quickly determine if the SNP displays any main effect. The absence of a main effect would manifest itself as equal penetrance values for all three genotypes. Penetrance functions are easily extended to describe interactions between n predictive loci: an n-locus penetrance function comprises 3^{n} penetrance values, one for each of the 3^{n} MLGs.
Statistical models of epistasis
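Table 2. A fully penetrant 2-locus purely epistatic penetrance function

Rows are SNP 1 genotypes and columns are SNP 2 genotypes, with genotype frequencies in parentheses.

| Genotype | BB (.25) | Bb (.5) | bb (.25) | Marginal penetrance |
| --- | --- | --- | --- | --- |
| AA (.25) | 0 | 1 | 0 | .5 |
| Aa (.5) | 1 | 0 | 1 | .5 |
| aa (.25) | 0 | 1 | 0 | .5 |
| Marginal penetrance | .5 | .5 | .5 | K = .5 |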
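Table 3. A 2-locus purely epistatic penetrance function

Rows are SNP 1 genotypes and columns are SNP 2 genotypes, with genotype frequencies in parentheses.

| Genotype | BB (.25) | Bb (.5) | bb (.25) | Marginal penetrance |
| --- | --- | --- | --- | --- |
| AA (.36) | .266 | .764 | .664 | .614 |
| Aa (.48) | .928 | .398 | .733 | .614 |
| aa (.16) | .456 | .927 | .147 | .614 |
| Marginal penetrance | .614 | .614 | .614 | K = .614 |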
Thus, SNP 1’s genotype alone is not predictive of disease status. Similar computations (using the probabilities of SNP 1’s genotypes and the columns in Table 3) give the same value of .614 for the three marginal penetrances associated with SNP 2. Therefore, SNP 2 alone is also not predictive of disease status. The equality of all six of these marginal penetrances is the mathematical definition of strict, pure epistasis for 2-locus models. Their common value equals the population prevalence of disease (K). An expanded definition for n-locus models is given in Additional file 1 (§2.3). It essentially says that an n-locus model is strictly and purely epistatic if all of the n·3^{n−1} dot products, analogous to the six just discussed, are equal.
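The six dot products for Table 3 can be recomputed directly. The following Python sketch is only an illustration; the variable and function names are hypothetical, and the penetrance values are the rounded ones printed in Table 3, so the marginals agree with .614 only to about three decimal places.

```python
# Table 3 penetrance values: rows are SNP 1 genotypes (AA, Aa, aa),
# columns are SNP 2 genotypes (BB, Bb, bb).
PENETRANCE = [
    [0.266, 0.764, 0.664],
    [0.928, 0.398, 0.733],
    [0.456, 0.927, 0.147],
]
SNP1_FREQ = [0.36, 0.48, 0.16]  # genotype frequencies for MAF 0.4
SNP2_FREQ = [0.25, 0.50, 0.25]  # genotype frequencies for MAF 0.5

def marginal_penetrances(table, row_freq, col_freq):
    """Return (row marginals, column marginals) as frequency-weighted sums."""
    # Each SNP 1 genotype's marginal averages its row over SNP 2 genotypes.
    rows = [sum(f * x for f, x in zip(col_freq, row)) for row in table]
    # Each SNP 2 genotype's marginal averages its column over SNP 1 genotypes.
    cols = [sum(row_freq[i] * table[i][j] for i in range(3)) for j in range(3)]
    return rows, cols

rows, cols = marginal_penetrances(PENETRANCE, SNP1_FREQ, SNP2_FREQ)
# Strict, pure epistasis: all six marginals equal the prevalence K (about .614).
all_equal = all(abs(m - 0.614) < 1e-3 for m in rows + cols)
```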
The GAMETES algorithm
We first outline some of the main ideas GAMETES uses to generate random, pure, strict epistatic models. More detail is given in the following sections, and still more is in Additional file 1 (§2). A key part of the GAMETES algorithm is that certain choices of 2^{n} entries of an n-way strict, pure epistatic penetrance table uniquely determine the rest of the table. Here we are assuming that the population prevalence is specified. For example, the four entries .764, .398, .733 and .456 of the penetrance table in Table 3 determine the remaining five entries via equations (1), (2) and (3), and the analogous equations involving the columns of the table. So a first attempt to generate random penetrance tables might be to randomly vary these entries or even to randomly seed the four positions in the penetrance table they occupy. Two difficulties arise with this. One is that the mere choice of these positions (the ones that would be randomly seeded) biases the resulting penetrance tables (to have, for instance, high heritabilities). The other is that varying the values, even slightly, of the four chosen entries might not produce a penetrance table. For instance, if the entry .398 of the four chosen ones is changed to .37, then the entry just below it is forced to be greater than one.
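The last example can be checked numerically. In the Bb column of Table 3, the SNP 1 genotype frequencies weight the three entries, and their weighted sum must equal K = .614. The sketch below (hypothetical helper name; rounded values from Table 3) solves that one constraint for the bottom entry.

```python
SNP1_FREQ = (0.36, 0.48, 0.16)  # weights for genotypes AA, Aa, aa
K = 0.614                       # prevalence, i.e. the common marginal penetrance

def forced_third_entry(weights, known, prevalence):
    """Solve w0*x0 + w1*x1 + w2*x2 = prevalence for x2, given x0 and x1."""
    (w0, w1, w2), (x0, x1) = weights, known
    return (prevalence - w0 * x0 - w1 * x1) / w2

# Original entries .764 and .398 leave a valid probability for the third cell
# (close to the tabulated .927, up to rounding of the printed values).
original = forced_third_entry(SNP1_FREQ, (0.764, 0.398), K)

# Lowering .398 to .37 pushes the forced entry above 1, so the result is
# no longer a penetrance table.
perturbed = forced_third_entry(SNP1_FREQ, (0.764, 0.37), K)
```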
This latter difficulty is addressed by working with what we call prepenetrance tables. These are easily converted to strict, pure, epistatic penetrance tables and are essentially such tables with all marginal penetrances equal to zero. Prepenetrance tables were used in a somewhat different context from ours in §8 of [5]. There, they arise as the difference between the 2-locus genotypic means of a quantitative trait and the linear model which best fits these means, and so show the wealth of epistatic models. They are our starting point for generating these models, primarily because all prepenetrance tables, unlike strict, pure, epistatic penetrance tables, are easily described: each point of $\mathbb{R}^{2^n}$, Euclidean space of dimension 2^{n}, corresponds to a unique n-locus prepenetrance table. So picking a random point in $\mathbb{R}^{2^n}$ (only a random direction is needed) determines a random prepenetrance table. Converting this to a penetrance table gives our notion of a random, strict, pure, epistatic penetrance table.
There is a catch here which gets back to the first difficulty: the one-to-one correspondence between points of $\mathbb{R}^{2^n}$ and all prepenetrance tables depends on the choice of the 2^{n} positions that are randomly seeded. The GAMETES algorithm accounts for this by choosing these positions as randomly as is computationally feasible, as discussed in the next section.
Generating random parameters
A truly random way to pick parameters for n-locus prepenetrance tables is to sequentially pick 2^{n} positions at random and then check to see if these give parameters. The check, however, is computationally slow. (It requires determining if a 2^{n}×2^{n} matrix is invertible.) Moreover, many such choices do not give parameters. GAMETES uses what we call the Sudoku method, a much more efficient approach to picking parameters, with only a slight loss of randomness (see Figure 2A). The Sudoku method also picks parameters sequentially. However, after each choice, it expresses as many entries as possible in terms of the chosen parameters, and these are then omitted as possible future choices. This greatly reduces the number of choices which do not lead to parameters. Another advantage of the Sudoku method is that if after 2^{n} choices all 3^{n} entries of an n-locus table have been expressed in terms of the chosen ones, then no check is required: the 2^{n} choices always give parameters in this case.
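One simplified reading of the Sudoku method can be sketched in Python. The sketch assumes that an entry counts as "expressed in terms of the chosen parameters" once the other two entries on some line through it (the three MLGs that differ only at one SNP) are already determined, since each such line has a fixed weighted sum. This propagation rule and all names are assumptions made for illustration; the GAMETES implementation described in Additional file 1 may propagate differently.

```python
import random
from itertools import product

def sudoku_parameters(n, rng):
    """Pick 2^n parameter positions for an n-locus table, or None on failure."""
    cells = list(product(range(3), repeat=n))
    determined = set()  # chosen parameters plus entries expressed through them
    params = []

    def propagate():
        # Repeatedly mark any cell whose two line-mates along some axis
        # are both determined; the line constraint then fixes its value.
        changed = True
        while changed:
            changed = False
            for cell in cells:
                if cell in determined:
                    continue
                for axis in range(n):
                    mates = [cell[:axis] + (g,) + cell[axis + 1:]
                             for g in range(3) if g != cell[axis]]
                    if all(m in determined for m in mates):
                        determined.add(cell)
                        changed = True
                        break

    while len(params) < 2 ** n:
        free = [c for c in cells if c not in determined]
        if not free:
            return None  # ran out of candidates before 2^n choices
        choice = rng.choice(free)
        params.append(choice)
        determined.add(choice)
        propagate()
    # Success only if every one of the 3^n entries has been expressed.
    return params if len(determined) == 3 ** n else None

positions = sudoku_parameters(2, random.Random(0))
```

For two loci this sketch always returns four positions; for more loci it can return `None`, in which case a caller would simply retry.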
The Sudoku method always produces parameters randomly when the number of loci n is 2. As n grows, two things happen: (1) some bias occurs in parameter generation, and (2) the success rate for finding parameters decreases: for n = 2,…,8, these rates were 100%, 99.9%, 99%, 92.9%, 61.2%, 3.28%, and 0% based on 750,000 GAMETES runs. Because of this, GAMETES uses what we call the Point method to generate parameters when n ≥ 6. Figure 2 (B,C,D) illustrates three iterations of the Point method in the case of two loci. For n loci, the method starts by randomly choosing any one of the 3^{n} entries. The parameters the Point method then produces are the ones given by the 2^{n} entries whose corresponding genotypes differ at every single SNP from the genotype corresponding to the chosen entry. The Point method always succeeds, and since it picks randomly from 3^{n} choices of parameters, not much randomness is lost.
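The Point method as described above is short enough to sketch directly. The function name and the 0/1/2 genotype indexing are illustrative assumptions.

```python
import random
from itertools import product

def point_method_parameters(n, rng):
    """Return (chosen MLG, list of 2^n parameter positions).

    A random MLG is chosen, and the parameters are all MLGs whose genotype
    differs from the chosen one at every SNP (two alternatives per SNP).
    """
    chosen = tuple(rng.randrange(3) for _ in range(n))
    alternatives = [[g for g in range(3) if g != c] for c in chosen]
    return chosen, list(product(*alternatives))

chosen, params = point_method_parameters(3, random.Random(0))
```

Because each SNP contributes exactly two alternative genotypes, the product always has 2^{n} elements, which is why the method cannot fail.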
From parameters to penetrance functions
Any prepenetrance function can be converted to a pure, strict, epistatic penetrance function by applying a linear scaling function S to each of its entries. This function S is defined entrywise: for any prepenetrance function G, the ij^{th} entry of S(G) is $S(G)_{ij}=\frac{g_{ij}-m}{M-m}$. Here g_{ij} is the ij^{th} entry of G, and M and m are the maximum and minimum, respectively, of the entries of G. The entries of S(G) lie in the interval [0,1], with the minimum entry 0 and the maximum 1. Note that if all entries of G are multiplied by any positive constant c, giving the prepenetrance table cG, then S(cG) = S(G). So applying the function S to all prepenetrance tables in the direction of G, meaning all positive multiples of G, gives the same penetrance table.
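A small 2-locus sketch in Python illustrates the scaling function S. To obtain a table with all marginals equal to zero, the sketch combines outer products of vectors orthogonal to each SNP's genotype frequency vector. This coefficient-based construction is an assumption chosen for brevity; it is not the position-based parameterization that GAMETES uses, and the function names are hypothetical.

```python
def zero_marginal_basis(freq):
    """Two independent vectors orthogonal to the genotype frequencies."""
    f0, f1, f2 = freq
    return [(f1, -f0, 0.0), (0.0, f2, -f1)]

def prepenetrance_table(freq1, freq2, coeffs):
    """Build a 3x3 table whose frequency-weighted marginals are all zero."""
    b1, b2 = zero_marginal_basis(freq1), zero_marginal_basis(freq2)
    table = [[0.0] * 3 for _ in range(3)]
    for a in range(2):
        for b in range(2):
            c = coeffs[2 * a + b]  # one coefficient per basis pair (4 = 2^2)
            for i in range(3):
                for j in range(3):
                    table[i][j] += c * b1[a][i] * b2[b][j]
    return table

def scale_to_penetrance(table):
    """Apply S entrywise: (g - m) / (M - m), mapping entries onto [0, 1]."""
    flat = [x for row in table for x in row]
    m, M = min(flat), max(flat)
    if M == m:
        raise ValueError("all entries equal; S is undefined")
    return [[(x - m) / (M - m) for x in row] for row in table]

F1, F2 = (0.36, 0.48, 0.16), (0.25, 0.50, 0.25)
G = prepenetrance_table(F1, F2, (0.3, -1.2, 0.7, 0.5))
P = scale_to_penetrance(G)
# Every marginal of P equals -m / (M - m), which plays the role of K.
row_marginals = [sum(f * x for f, x in zip(F2, row)) for row in P]
col_marginals = [sum(F1[i] * P[i][j] for i in range(3)) for j in range(3)]
```

Since S subtracts the same constant from every entry and divides by the same positive constant, it maps the common marginal value zero to a common value in [0,1], so equality of the marginals is preserved.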
Now, given a random direction of prepenetrance tables, the function S converts it to our notion of a random, strict, pure, epistatic penetrance table. Choosing such a random direction requires an explicit description of all prepenetrance tables. Any parameters, for instance those supplied by one of the methods discussed above, provide such a description. It takes the form of a one-to-one correspondence between all points of $\mathbb{R}^{2^n}$ and all n-locus prepenetrance tables. Specifically, given a point in $\mathbb{R}^{2^n}$, the corresponding prepenetrance table has parameter values equal to the coordinates of the point. So, given randomly chosen parameters, picking a random direction of n-locus prepenetrance tables now amounts to picking a random direction or, equivalently, a unit vector in $\mathbb{R}^{2^n}$. We do this using G.W. Brown’s algorithm discussed on page 135 of [31].
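A common way to draw a uniformly random direction, and as far as we can tell the method Knuth attributes to G. W. Brown, is to normalize a vector of independent standard normal deviates. The sketch below assumes that this is the algorithm meant; the function name is hypothetical.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Return a uniformly random unit vector in R^dim.

    Independent standard normal coordinates have a rotationally symmetric
    joint distribution, so dividing by the norm gives a uniform direction.
    """
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:  # guard against the (vanishingly unlikely) zero vector
            return [x / norm for x in v]

# For a 2-locus model the direction lives in R^(2^2) = R^4.
direction = random_unit_vector(2 ** 2, random.Random(42))
```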
Adjusting heritability and prevalence
Assume now that a random, pure, strict epistatic penetrance table with a specified heritability, and perhaps also a specified prevalence is desired. Then GAMETES generates a random penetrance table as above and linearly scales its entries to achieve, if possible, the specified values. Linearly scaling the entries, meaning applying a function of the form f(x) = mx + b with m > 0 to each, changes the penetrance table in a relatively minor way so that randomness is preserved. If just heritability is specified, this scaling is done without changing prevalence. The required values of m and b are discussed in Additional file 1 (§2.5).
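The heritability-only case can be sketched under one explicit assumption: that heritability is defined as the frequency-weighted variance of the penetrance values about K, divided by K(1 − K). The main text defers the definition and the actual values of m and b to Additional file 1, so the formulas below are a derivation from that assumed definition rather than the authors' stated ones. Scaling about K with m = sqrt(target / current) and b = K(1 − m) then leaves prevalence unchanged. All names are hypothetical.

```python
import math

def prevalence(pen, freq):
    """K: frequency-weighted mean of the penetrance values."""
    return sum(f * p for f, p in zip(freq, pen))

def heritability(pen, freq):
    """Assumed definition: sum_g f_g (p_g - K)^2 / (K (1 - K))."""
    k = prevalence(pen, freq)
    var = sum(f * (p - k) ** 2 for f, p in zip(freq, pen))
    return var / (k * (1.0 - k))

def rescale_heritability(pen, freq, target):
    """Apply f(x) = m*x + b so heritability equals target and K is unchanged.

    Returns None when the scaled values leave [0, 1], i.e. the target
    cannot be reached from this table by linear scaling.
    """
    k = prevalence(pen, freq)
    m = math.sqrt(target / heritability(pen, freq))
    b = k * (1.0 - m)
    scaled = [m * p + b for p in pen]
    if min(scaled) < 0.0 or max(scaled) > 1.0:
        return None
    return scaled

# Table 2 flattened row by row, with its MLG frequencies (both MAFs 0.5).
GENO = (0.25, 0.50, 0.25)
FREQ = [a * b for a in GENO for b in GENO]
TABLE2 = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
low = rescale_heritability(TABLE2, FREQ, 0.1)
```

Under this definition Table 2 has heritability 1 and prevalence .5, and rescaling it to a heritability of 0.1 keeps the prevalence at .5. Asking for a heritability above 1 returns `None`, mirroring the case where GAMETES cannot achieve the specified value.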
Certain values of heritability and prevalence can never be achieved, as discussed in the next section. So GAMETES will always fail if these values are specified. Also, penetrance tables with certain other values of heritability and prevalence are very sparse among all penetrance tables and so are generated by GAMETES with extremely low probability. For example, there exist n-locus penetrance tables with all heritabilities ≤1 having prevalence and all MAFs equal to $\frac{1}{2}$. However, in both the 5- and 6-locus case, GAMETES generated none of these with heritability ≥.1 after 100,000 iterations.
Limits on heritability and prevalence
The values of heritability and prevalence which a penetrance table can have are limited. For instance, no penetrance table has heritability 1 and prevalence $\frac{1}{4}$. The limits are more severe if MAFs are specified. For example, there are 2-locus penetrance tables with heritability 1 and prevalence $\frac{1}{2}$, but none if both MAFs are $\frac{1}{4}$. Penetrance tables with heritability 1 only exist if the MAFs are $\frac{1}{2}$ or $1-\frac{1}{\sqrt{2}}$. See Table 2 for an example of a penetrance table with a heritability of 1.
Summary of the GAMETES Algorithm
The GAMETES algorithm first (1) generates random parameters and a random unit vector in $\mathbb{R}^{2^n}$, then (2) generates a random prepenetrance table by seeding these parameters using the unit vector, and then (3) uses the function S to scale the entries of this random prepenetrance table to generate a random penetrance table. In case a random penetrance table having a specified heritability, or heritability and prevalence is desired, it further (4) scales the entries of this penetrance table to achieve, if possible, these values. If the Sudoku method fails in step (1) or the scaling in step (4), the algorithm iterates until it either succeeds or a specified iteration limit is reached.
A GAMETES simulation study
Our evaluation of GAMETES included an assessment of run time, tracking of model generation success, and an example application of GAMETES-simulated datasets to a simulation study using Multifactor Dimensionality Reduction (MDR). MDR is a well-documented combinatorial genetics algorithm that exhaustively searches for epistatic interactions [28]. MDR finds models by scanning through all possible combinations of factors up to a prespecified order of interaction, and has the ability to identify pure, strict epistatic models. MDR was run on a spectrum of simulated datasets, generated using GAMETES. While model generation is the focus of GAMETES, we have included a simple sample generation strategy within the software (see Additional file 1, §2.7). We used GAMETES to attempt generating 5 random models for each of 48 different constraint combinations which differ by number of loci (2, 3, 4, or 5), heritability (0.005, 0.01, 0.025, 0.05, 0.1, or 0.2), and MAF (0.2 or 0.4) with prevalence allowed to vary. GAMETES successfully generated 5 random models for 40 of these constraint combinations. For each of these models, we simulated 100 replicate datasets having balanced sample sizes of 200, 400, 800, and 1600 (400 datasets per model). For simplicity, each dataset was generated with a total of 20 SNPs. On the whole we generated a total of 200 models and 80,000 datasets, all of which were evaluated using MDR.
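The sample generation strategy itself is described only in Additional file 1, so the following Python sketch should be read as one plausible approach rather than the GAMETES implementation: individuals are drawn under HWE and linkage equilibrium, disease status is drawn from the penetrance of the predictive MLG, and individuals are kept until both classes are full. The function names and the `noise_maf` parameter for non-predictive SNPs are hypothetical.

```python
import random

def genotype_frequencies(maf):
    """HWE frequencies for genotypes (AA, Aa, aa) given the MAF."""
    p = 1.0 - maf
    return (p * p, 2.0 * p * maf, maf * maf)

def simulate_balanced_dataset(penetrance, model_mafs, n_snps, per_class, rng,
                              noise_maf=0.2):
    """Return rows of n_snps genotypes plus a final 0/1 disease status.

    penetrance maps each MLG of the first len(model_mafs) SNPs to P(disease).
    The prevalence must lie strictly between 0 and 1 for the loop to finish.
    """
    n = len(model_mafs)
    mafs = list(model_mafs) + [noise_maf] * (n_snps - n)
    weights = [genotype_frequencies(q) for q in mafs]
    cases, controls = [], []
    while len(cases) < per_class or len(controls) < per_class:
        genotypes = [rng.choices((0, 1, 2), weights=w)[0] for w in weights]
        affected = rng.random() < penetrance[tuple(genotypes[:n])]
        bucket = cases if affected else controls
        if len(bucket) < per_class:  # discard draws for an already full class
            bucket.append(genotypes + [int(affected)])
    return cases + controls

# Table 2 as a model: penetrance is 1 exactly when the genotype indices
# (0 = AA/BB, 1 = Aa/Bb, 2 = aa/bb) have an odd sum.
TABLE2 = {(i, j): float((i + j) % 2) for i in range(3) for j in range(3)}
data = simulate_balanced_dataset(TABLE2, [0.5, 0.5], 20, 100, random.Random(1))
```

This produces a balanced dataset of 200 individuals with 20 SNPs, of which only the first two are predictive.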
For every evaluation, MDR was directed to search up to one order higher than the order of the simulated model. The best model was selected based on cross validation (CV) consistency, and in the event of a CV tie, based on testing accuracy. We evaluated model detection success, defined as the proportion of datasets in which the correct underlying model was identified. Detection success was considered to be meaningful when it was greater than 0.8.
Results and discussion
Average n-locus run time of GAMETES

| n loci | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- |
| Time (sec.) | 2.5 | 5.5 | 15.3 | 46.7 | 153.2 |
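Model generation frequency

Entries give the generation frequency for each combination of heritability (rows) and number of loci (columns).

| Heritability | 2 loci | 3 loci | 4 loci | 5 loci | 6 loci |
| --- | --- | --- | --- | --- | --- |
| 0.005 | 1 | .99 | .99 | .93 | .71 |
| 0.01 | 1 | .99 | .99 | .88 | .24 |
| 0.025 | 1 | .99 | .85 | .16 | <.01 |
| 0.05 | 1 | .93 | .17 | <.01 | <.01 |
| 0.1 | 1 | .24 | <.01 | 0 | 0 |
| 0.2 | .33 | .01 | 0 | 0 | 0 |
| 0.3 | .10 | <.01 | 0 | 0 | 0 |
| 0.4 | .02 | <.01 | 0 | 0 | 0 |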
In Additional file 1 (§3) we give the results of an example simulation study evaluating MDR using the models and datasets generated with GAMETES as previously described. We find that GAMETES is able to generate models useful for such evaluations, i.e. models with constraints at the boundary of what MDR is able to detect.
Conclusions
This study introduces GAMETES, a fast and reliable strategy for the generation of complex genetic models with random architectures. Specifically, we focus on the generation of pure, strict n-locus epistatic models (considered to be the most difficult to detect). The benefits of our strategy include (1) speed; deterministic calculation of models makes our approach much faster than an EA approach, (2) randomness; models are generated using a strategy which seeks to maximize the randomness of model architecture, (3) the ability to precisely specify genetic constraints, (4) the ability to generate a large population of models from which to choose, and (5) the potential to combine the GAMETES modeling strategy with any data simulation strategy. An obvious limitation of this approach is the difficulty it has finding models of higher heritability. However, GAMETES is proficient at finding models with heritabilities typically used in evaluating and comparing new bioinformatic strategies.
While the probability is low that these types of ‘extreme’ epistatic interactions occur in biology by chance alone, we instead focus on the fact that they ‘can’ occur. With that in mind, our focus on pure strict epistasis is intended to promote the development of strategies that can accommodate even the most challenging relationships. In doing so, we make minimal assumptions about the true nature of biological interaction. Notably, the GAMETES strategy may be extended to produce impure epistatic models, as well as nested epistatic models, which are more likely to occur by chance. This extension will be a focus of future work.
The GAMETES software is open source and freely available for download. It offers an intuitive and flexible framework for the simulation of complex genetic models, and the option to generate simulated datasets from these models. This software offers both a graphical user interface and command line accessibility to facilitate the quick generation of a large simulated dataset archive. The GAMETES software along with a detailed user’s guide is included as Additional file 2 and Additional file 3, respectively.
Availability and requirements
Project name: GAMETES
Project home page: http://sourceforge.net/projects/gametes/files/
Operating systems: Linux, Mac, PC
Programming languages: Java
Other requirements: None
License: None
Authors’ contributions
RU co-developed the methodology and software, carried out experiments, and co-wrote the manuscript. JK developed the mathematical proofs, co-developed the methodology, and co-wrote the methods and supplemental material of the manuscript. NSA co-developed the computational methodology, and carried out the run-time experiment. TH carried out the experiment determining maximum heritabilities and worked on the respective figure. JF co-developed the software, and programmed the GAMETES GUI software. JH co-developed software. All authors read and approved the final manuscript.
Abbreviations
SNP: Single nucleotide polymorphism
EA: Evolutionary Algorithm
GAMETES: Genetic Architecture Model Emulator for Testing and Evaluating Software
MLG: Multilocus genotype
HWE: Hardy-Weinberg Equilibrium
MAF: Minor Allele Frequency
K: Population Prevalence
Declarations
Acknowledgements
We thank Joshua Payne, Davnah Urbach, Christian Darabos, Richard Cowper, Nadia Penrod, Dov Pechenick, and Qinxin Pan for their careful review of this paper. This work was supported by the William H. Neukom 1964 Institute for Computational Science at Dartmouth College along with NIH grants AI59694, LM009012, and LM010098.
References
1. Shriner D, Vaughan L, Padilla M: Problems with genome-wide association studies. Science. 2007, 316 (5833): 1840c. 10.1126/science.316.5833.1840c.
2. Eichler E, Flint J, Gibson G, Kong A, Leal S, Moore J, Nadeau J: Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010, 11 (6): 446-450. 10.1038/nrg2809.
3. Thornton-Wells T, Moore J, Haines J: Genetics, statistics and human disease: analytical retooling for complexity. TRENDS in Genetics. 2004, 20 (12): 640-647. 10.1016/j.tig.2004.09.007.
4. Bateson W: Mendel’s Principles of Heredity. 1909, Cambridge University Press.
5. Fisher R: The Correlation Between Relatives on the Supposition of Mendelian Inheritance. Trans R Soc of Edinburgh. 1918, 52: 399-433.
6. Cordell H: Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Human Mol Genet. 2002, 11 (20): 2463. 10.1093/hmg/11.20.2463.
7. McKinney B, Reif D, Ritchie M, Moore J: Machine learning for detecting gene-gene interactions: a review. Appl Bioinf. 2006, 5 (2): 77-88. 10.2165/00822942-200605020-00002.
8. Cordell H: Detecting gene–gene interactions that underlie human diseases. Nat Rev Genet. 2009, 10 (6): 392-404. 10.1038/nrg2579.
9. Moore J, Asselbergs F, Williams S: Bioinformatics challenges for genome-wide association studies. Bioinformatics. 2010, 26 (4): 445. 10.1093/bioinformatics/btp713.
10. Carlborg O, Andersson L, Kinghorn B: The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics. 2003, 155 (4): 2000.
11. Ploughman L, Boehnke M: Estimating the power of a proposed linkage study for a complex genetic trait. Am J Human Genet. 1989, 44 (4): 543.
12. Weeks D, Ott J, Lathrop G: SLINK: a general simulation program for linkage analysis. Am J Hum Genet. 1990, 47 (3): A204.
13. Bass M, Martin E, Mauser E: Pedigree generation for analysis of genetic linkage and association. Pacific Symposium on Biocomputing Hawaii. 2004, World Scientific Pub Co Inc, USA, 93-93.
14. Schmidt M, Hauser E, Martin E, Schmidt S: Extension of the SIMLA package for generating pedigrees with complex inheritance patterns: environmental covariates, gene-gene and gene-environment interaction. Stat App Genet and Mol Biol. 2005, 4: 1133.
15. Lemire M: SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values. BMC Genet. 2006, 7: 40.
16. Nothnagel M: Simulation of LD block-structured SNP haplotype data and its use for the analysis of case-control data by supervised learning methods. Am J Hum Genet. 2002, 71 (suppl 4): A2363.
17. Moore J, Hahn L, Ritchie M, Thornton T, White B: Application Of Genetic Algorithms To The Discovery Of Complex Models For Simulation Studies In Human Genetics. Proceedings of the Genetic and Evolutionary Computation Conference New York. 2002, Morgan Kaufmann Publishers Inc, USA, 1155-1155.
18. Moore J, Hahn L, Ritchie M, Thornton T, White B: Routine discovery of complex genetic models using genetic algorithms. Appl Soft Comput. 2004, 4: 79-86. 10.1016/j.asoc.2003.08.003.
19. Mailund T, Schierup M, Pedersen C, Mechlenborg P, Madsen J, Schauser L: CoaSim: a flexible environment for simulating genetic data under coalescent models. BMC Bioinf. 2005, 6: 252. 10.1186/1471-2105-6-252.
20. Dudek S, Motsinger A, Velez D, Williams S, Ritchie M: Data simulation software for whole-genome association and other studies in human genetics. Pacific Symposium on Biocomputing: Hawaii, USA. 2006, 11: 499-510.
21. Li C, Li M: GWAsimulator: a rapid whole-genome simulation program. Bioinformatics. 2008, 24: 140. 10.1093/bioinformatics/btm549.
22. Li J, Chen Y: Generating samples for association studies based on HapMap data. BMC Bioinf. 2008, 9: 44. 10.1186/1471-2105-9-44.
23. Greene C, Himmelstein D, Moore J: A Model Free Method to Generate Human Genetics Datasets with Complex Gene-Disease Relationships. Evol Comput, Machine Learning and Data Mining in Bioinformatics. 2010, 6023: 74-85. 10.1007/978-3-642-12211-8_7.
24. Li W, Reich J: A Complete Enumeration and Classification of Two-Locus Disease Models. Human Heredity. 2000, 50 (6): 334-349. 10.1159/000022939.
25. Hallgrímsdóttir I, Yuster D: A complete classification of epistatic two-locus models. BMC Genet. 2008, 9: 17.
26. Culverhouse R, Suarez B, Lin J, Reich T: A perspective on epistasis: limits of models displaying no main effect. Am J Human Genet. 2002, 70 (2): 461-471. 10.1086/338759.
27. Motzkin T, Raiffa H, Thompson G, Thrall R: The Double Description Method. In: Kuhn HW, Tucker AW (eds) Contributions to theory of games. 1953, 2: 51-73.
28. Ritchie M, Hahn L, Roodi N, Bailey L, Dupont W, Parl F, Moore J: Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Human Genet. 2001, 69: 138-147. 10.1086/321276.
29. Hartl D, Clark A, Clark A: Principles of population genetics. 1997, Sinauer associates Sunderland, MA.
30. Brodie III E: Why evolutionary genetics does not always add up. Epistasis and the evolutionary process. 2000, pp. 3–19.
31. Knuth D: The Art of Computer Programming 1: Fundamental Algorithms 2: Seminumerical Algorithms 3: Sorting and Searching. 1968, Addison-Wesley, MA.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.