Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network

Background: The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions. Powerful biotechnologies have been rapidly and successfully measuring levels of genetic expression to illuminate different states of biological systems. This has led to an ensuing challenge to improve the identification of specific regulatory mechanisms through regulatory network reconstructions. Solutions to this challenge will ultimately help to spur forward efforts based on the usage of regulatory network reconstructions in systems biology applications.

Methods: We have developed a hierarchical recurrent neural network (HRNN) that identifies time-delayed gene interactions using time-course data. A customized genetic algorithm (GA) was used to optimize hierarchical connectivity of regulatory genes and a target gene. The proposed design provides a non-fully connected network with the flexibility of using recurrent connections inside the network. These features and the non-linearity of the HRNN facilitate the process of identifying temporal patterns of a GRN.

Results: Our HRNN method was implemented with the Python language. It was first evaluated on simulated data representing linear and nonlinear time-delayed gene-gene interaction models across a range of network sizes and variances of noise. We then further demonstrated the capability of our method in reconstructing GRNs of the Saccharomyces cerevisiae synthetic network for in vivo benchmarking of reverse-engineering and modeling approaches (IRMA). We compared the performance of our method to TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet across different network sizes and levels of stochastic noise. We found our HRNN method to be superior in terms of accuracy for nonlinear data sets with higher amounts of noise.

Conclusions: The proposed method identifies time-delayed gene-gene interactions of GRNs. The topology-based advancement of our HRNN worked as expected by more effectively modeling nonlinear data sets. As a non-fully connected network, an added benefit to HRNN was how it helped to find the few genes which regulated the target gene over different time delays.


Background
New opportunities to reverse engineer the activities of different components of complex cellular systems are arising due to technologies like DNA microarrays and RNA sequencing which provide genomic-scale data sets [1,2]. Time series data may be collected either in longitudinal studies of cell or tissue samples collected over multiple time points [3], or expressional change across state space [4]. Effective models of gene regulatory networks (GRNs) have successfully identified regulatory interactions between genes and the specific functional roles of individual genes in cellular systems [5,6].
Reverse engineering of GRNs occurs within the context of stochastic properties of the system, measurement noise, and high dimensionality [3]. There is strong non-linearity in the temporal patterns of regulatory genes [7]. Further complexity arises because genetic interactions among different genes can have different time delays [8,9]. These delays arise because the genes being transcribed and translated vary in composition and length, and because the kinetics of binding and completion vary as genes are processed by the transcription machinery, the spliceosome and the ribosome. Transcribed and translated products may be further converted and are eventually degraded, with some products being more stable than others. Changing physiological conditions can affect many of these sources of time delay. As shown in Fig. 1, there are complex combinations by which the expression level of a gene at a certain time could depend upon the expression level of another gene at a previous time point.
Diverse methods with different levels of complexity have been used to model, analyze and infer complex regulatory interactions [10-13]. Boolean networks are the simplest among them [14]. They are based only upon binary outcomes (on and off) for gene expression and therefore lack adequate dynamic resolution. Bayesian networks represent probabilistic relationships among genes and have shown some success in capturing the inherent noise and stochasticity of gene expression data [15]. Dynamic Bayesian Networks (DBN) are an extension of Bayesian networks that can unravel the feedback cycles and loops over time points [16]. However, due to their high computational cost, the application of dynamic Bayesian networks is limited to small networks. Ordinary Differential Equations (ODE) are deterministic models, where interactions among genes represent causal interactions rather than statistical dependencies [17]. They can offer continuous representations of genetic networks, but are not robust to imprecise data. Methods such as time delay linear regression [18], correlation matrices [19], stochastic simulation algorithms [9,20], dynamic Bayesian networks [16] and delayed differential equations [21] have been proposed to incorporate a fixed time delay in GRN models. In [22], pairwise correlations between each pair of genes have been used to address the various time delays in gene interactions. TD-ARACNE (Time Delay-Algorithm for the Reconstruction of Accurate Cellular Networks) has been proposed in [23]. This algorithm detects the time-delayed dependencies between the expression profiles in terms of mutual information by assuming a stationary Markov Random Field as its underlying probabilistic model. The TD-ARACNE algorithm does not assign any specific delay or regulatory effect to the edges of the GRN. HCC-CLINDE [24] is an extension of CLINDE [25], and has been developed to infer a time-delayed GRN in the presence of hidden common causes. All directed pairs of genes in the network have possible delays up to a maximum allowed delay, which is obtained based on either a correlation test or a mutual information test.
The main objective of this paper is to reconstruct a time-delayed GRN which takes into account the non-linearity of gene interaction and the noise of temporal measurements. Recurrent Neural Networks (RNNs) are computational tools inspired by the structural and functional aspects of biological nervous systems, and are noted for their effectiveness in temporal data processing and approximating nonlinear patterns of dynamic temporal behaviors [26]. The ability of RNNs to learn from temporal data, estimate multivariate nonlinear functions, and tolerate noise in measurements makes RNNs an ideal fit for the modeling of gene regulatory interactions using gene expression profiles. Several variants of RNNs have been deployed for the modeling of GRNs including neural fuzzy recurrent networks [27], RNNs combined with particle swarm optimization [28], ensemble of RNNs and support vector machines [29], RNNs combined with differential evolution [30] and RNNs hybridized with the generalized extended Kalman filter [31]. Despite the great capabilities of RNNs for predictive modeling with high accuracy, RNNs are usually considered "black box" models whose internal structure and learned parameters are not interpretable. Due to the multiple layers, the non-linearity of the model, and cyclic (feedback) connections in the network structure, their interpretability still remains vague [32]. This, in particular, impedes goals with GRN reconstruction to identify pairs of genes, directions of regulation, effects (i.e. up or down regulation), and time delays.
In this paper, we have proposed a hierarchical RNN (HRNN) that surmounts the interpretation difficulties of the RNNs for application of GRN modeling. The proposed design lets us use the features of hierarchical representation in addition to the capabilities of RNNs for finding temporal dependencies. In this way, time-delayed regulations can be captured through hierarchical paths between leaf nodes (regulatory genes) and a target node (regulated gene) in the HRNN. For discovering the underlying hierarchical structure among the regulatory genes and a target gene, the network topology and connection weights are encoded by a customized genetic algorithm (GA). Through the training procedure, in addition to evolving network connection weights, the GA rewires the connectivity and length of hierarchical paths between leaf nodes and the target gene of a population of candidate networks. From the trained HRNN, the direction and effect of gene regulations in the presence of time delays can be captured. Our proposed model is evaluated on a real biological system and linear/nonlinear synthetic generated data for different sizes of networks and variances of noise. The results of our HRNN method are compared with TD-ARACNE, HCC-CLINDE, ODE implemented in the TSNI package [33] and ebdbNet package (Empirical Bayes Dynamic Bayesian Network Inference) [34].

Method
Assume that $\{G^l_1(t), G^l_2(t), \ldots, G^l_P(t)\}$ are the expression levels of $P$ genes at time $t$ in experiment $l$, where $t \in \{1, \ldots, T_l\}$ and $l \in \{1, \ldots, L\}$. The aim is to capture the potential regulators for each gene $G_i$ in a decoupled hierarchical RNN. In this network, $G_i$ is the target gene and the rest of the genes are the potential regulators. At the beginning, a population of candidate hierarchical RNNs is randomly generated. A candidate network can have $0 \le c \le C$ context nodes in its structure. A network with $c$ context nodes has $c + 1$ neurons. Neurons are the processing units in the RNNs which induce non-linearity on the inputs. Neurons have multiple inputs and one output. The maximum number of context nodes in the candidate networks is set to $C$. Context nodes in the hierarchical RNN are nodes without experimental measurements and assist with modeling of temporal dynamics.
Assume that $x_1, \ldots, x_C$ are context nodes and $x_{C+1}, \ldots, x_{C+P}$ are genes. In a network with $c \le C$ context nodes, the first $c$ context nodes $x_1, \ldots, x_c$ and the genes (excluding the target gene) are potential inputs of the $c + 1$ neurons in the network. In each candidate network, the target gene is the output of the first neuron, and the context node $x_i$ is an input of neuron $i$ and the output of neuron $i+1$, where $i \in \{1, \ldots, c\}$. In addition to the genes and context node $x_i$, other context nodes can also be potential inputs of a neuron, except for neuron $c+1$, which has no context nodes among its inputs. Each input connection has a weight. If the input of a neuron is a context node, the corresponding connection weight is positive; otherwise, it can be positive or negative. Through training, the customized GA evolves both the connectivity between nodes and neurons and the connection weights. Figure 2 shows a candidate network generated from a maximum of three context nodes ($x_1, x_2, x_3$) and five genes ($x_4, x_5, x_6, x_7, x_8$). The candidate network in this figure uses two of the three possible context nodes; thus it has three neurons. The figure shows the regulatory interactions of target gene $x_8$ with $x_4$, $x_5$ and $x_7$.

Fig. 2 The structure of a hierarchical RNN with 2 context variables, 5 genes and 3 neurons. Each neuron has only one outgoing connection. For example, Neuron 2 has three incoming connections $x_1$, $x_2$, $x_5$ at time $t$ with corresponding weights $w_{2,1}$, $w_{2,2}$, $w_{2,5}$ and an outgoing connection to context node $x_1$ at time $t + 1$. Context nodes, regulatory genes and the target gene are shown by broken, highlighted and double-line ovals, respectively.

[Fig. 3]

For the target and context nodes $x_i$ that are outputs of the neurons, Eq. 1 gives the updated value from time $t$ to $t + 1$, where $x_j(t)$ is the value of the $j$th input node connected to $x_i(t + 1)$, and $f$ is a sigmoid function of the form of Eq. 2, which is monotonically increasing with values in the range [0, 1] and is commonly used in the literature to induce non-linearity.
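As a minimal sketch of this update rule, assume that Eq. 1 is a weighted sum of the input nodes passed through the sigmoid, $x_i(t+1) = f\left(\sum_j w_{i,j}\, x_j(t)\right)$, and that Eq. 2 is the standard logistic function $f(z) = 1/(1 + e^{-z})$. The function names, the absence of a bias term and the numeric values below are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function: monotonically increasing, values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_update(weights, inputs):
    """Value of a neuron's output node at t + 1: f(sum_j w_ij * x_j(t))."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))


# Hypothetical Neuron 2 of Fig. 2: inputs x1, x2 (context nodes) and x5 (gene).
x_t = {"x1": 0.2, "x2": 0.7, "x5": 0.5}      # node values at time t (made up)
w_2 = {"x1": 0.8, "x2": 0.3, "x5": -1.1}     # context weights kept positive

names = list(x_t)
x1_next = neuron_update([w_2[n] for n in names], [x_t[n] for n in names])
print(round(x1_next, 3))  # value of context node x1 at time t + 1
```

The negative weight on the gene input illustrates a repressing effect, while the two context-node weights respect the positivity constraint stated above.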
In the case of self-regulation of the target gene, Eq. 3 is used for updating the value of the target gene, where μ is the decay rate of the target gene's expression over time. Estimating the decay rate for each gene helps to model the suppression effect of a gene on itself [35].

Evolutionary training algorithm
A customized GA is proposed for training the HRNN. At the beginning, the GA generates a population of random candidate networks. The structure and connection weights of the candidate networks are evolved over generations of the GA with the guidance of the fitness function. In each generation, new candidate networks (children) are formed by applying the evolutionary operators (crossover and mutation) on the old networks (parents) within the constraints of the HRNN. Parents are selected according to their fitness values, where networks with higher accuracy have more chances to reproduce. The newly generated population is used for the next generation of the GA. At each generation, an elitist evolving strategy is applied to keep the best candidate networks from the last population. The evolutionary process is repeated until the terminating conditions are satisfied. The proposed procedure is summarized in Algorithm 1.

Algorithm 1 Evolutionary training of the HRNN
P: number of genes, C: maximum number of context nodes, G: number of generations in GA
for target gene i in range(1, P) do
    Generate a population of random candidate hierarchical RNNs, Nets, with at most C context nodes
    Calculate fitness of the Nets
    for generation in range(1, G) do
        Make a new pool of candidate networks, Nets, using the selection method
        Apply evolutionary crossover and mutation operators on Nets
        Calculate fitness of the new candidate networks Nets
    end for
    Select the HRNN with the best fitness to represent the regulations on target gene i
    From the selected HRNN, extract all the hierarchical paths from the leaf nodes to the target gene
    From the extracted paths, find the transition time delays and effects of the regulations
end for
Aggregate the obtained information from the decoupled HRNNs of all the genes into a network whose edges are tagged with the time delay
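The path-extraction steps at the end of Algorithm 1 can be illustrated with a small sketch. It assumes that each neuron contributes one time step, so that the delay of a regulation equals the number of neurons on the path from the leaf gene to the target, and that the effect is the product of the weight signs along the path. The dictionary layout, the function name `regulator_paths`, the rule of skipping already visited nodes (so recurrent loops are not unrolled) and the example weights are all hypothetical.

```python
def regulator_paths(net, target):
    """Return sorted (gene, delay, sign) triples for simple leaf-to-target paths.

    `net` maps each neuron's output node to a list of (input_node, weight).
    Nodes that are not keys of `net` are measured genes, i.e. leaf nodes.
    """
    results = []

    def walk(node, delay, sign, visited):
        for src, w in net[node]:
            s = sign * (1 if w > 0 else -1)
            if src in net:                      # output of another neuron
                if src not in visited:          # do not unroll recurrent loops
                    walk(src, delay + 1, s, visited | {src})
            else:                               # leaf gene: one regulation found
                results.append((src, delay + 1, s))

    walk(target, 0, 1, {target})
    return sorted(results)


# Hypothetical network in the spirit of Fig. 2 (two context nodes, target x8).
net = {
    "x8": [("x1", 0.9), ("x7", -0.4)],               # neuron 1 -> target gene
    "x1": [("x1", 0.8), ("x2", 0.3), ("x5", -1.1)],  # neuron 2 -> context x1
    "x2": [("x4", 0.6)],                             # neuron 3 -> context x2
}
paths = regulator_paths(net, "x8")
print(paths)
```

In this toy network, x7 acts on the target after one time step, x5 after two and x4 after three, which is how a hierarchical path length translates into a time delay under the stated assumption.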

Representation of candidate networks
Candidate networks in the GA are represented by their number of neurons ($N_n$), the number of inputs to each neuron ($N_{in}$), the indices of the input nodes (In), the weights of the inputs, and the output of each neuron (Out). Components of the first, $c$th and last neurons in a candidate network with $c + 1$ neurons are represented in Fig. 4. In this candidate network, $P$ genes and $c$ out of $C$ context nodes are used in the network. One of the genes is considered as a target gene. The output of each neuron (Out) does not change in the training process.
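One possible in-memory layout for this representation is sketched below. The class names `Neuron` and `CandidateNet`, the field names and the `is_valid` check are assumptions made for illustration; the node indexing follows the text (context nodes 1..C, genes C+1..C+P).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neuron:
    out: int                # index of the output node (Out), fixed in training
    inputs: List[int]       # indices of the input nodes (In)
    weights: List[float]    # one connection weight per input

    @property
    def n_in(self) -> int:  # number of inputs (N_in)
        return len(self.inputs)


@dataclass
class CandidateNet:
    n_context: int          # c, the number of context nodes in use
    neurons: List[Neuron]   # c + 1 neurons; neurons[0] outputs the target gene

    def is_valid(self, max_context: int) -> bool:
        """Check the structural constraints described in the text."""
        if not 0 <= self.n_context <= max_context:
            return False
        if len(self.neurons) != self.n_context + 1:
            return False
        for neuron in self.neurons:
            if len(neuron.inputs) != len(neuron.weights):
                return False
            for idx, w in zip(neuron.inputs, neuron.weights):
                # indices 1..C are context nodes; their weights must be positive
                if idx <= max_context and w <= 0:
                    return False
        return True


# Hypothetical network with C = 3, two context nodes in use, target gene x8.
candidate = CandidateNet(
    n_context=2,
    neurons=[
        Neuron(out=8, inputs=[1, 7], weights=[0.9, -0.4]),
        Neuron(out=1, inputs=[1, 2, 5], weights=[0.8, 0.3, -1.1]),
        Neuron(out=2, inputs=[4], weights=[0.6]),
    ],
)
print(candidate.is_valid(max_context=3))
```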

Fitness of candidate networks
The performance of the candidate networks (fitness) is evaluated by measuring the trade-off between the goodness of fit and the complexity of the model, using the Akaike information criterion (AIC) and the Akaike information criterion with correction (AICc). AIC is a model selection criterion which estimates the quality of a model relative to other models (Eq. 4), where $x^l_i(t)$ is the expression value of the target gene $i$ at time $t$ in experiment $l$, and $\hat{x}^l_i(t)$ is the corresponding estimate by the candidate HRNN. $k$ is the number of leaf nodes in the HRNN and $n$ is the total number of temporal samples of gene expression. If $n$ is small or $k$ is large, the AICc is preferred over the AIC (Eq. 5). As $n$ gets larger, AICc converges to AIC.
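A minimal sketch of this fitness computation follows. It assumes that Eq. 4 takes the common least-squares form $\mathrm{AIC} = n \ln(\mathrm{RSS}/n) + 2k$, where RSS is the sum of squared differences between $x^l_i(t)$ and $\hat{x}^l_i(t)$, and that Eq. 5 is the standard small-sample correction $\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}$. The function names and the toy data are hypothetical.

```python
import math


def aic(observed, predicted, k):
    """Least-squares AIC: n * ln(RSS / n) + 2k (assumed form of Eq. 4)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if rss == 0:
        raise ValueError("perfect fit: ln(RSS / n) is undefined")
    return n * math.log(rss / n) + 2 * k


def aicc(observed, predicted, k):
    """AIC with the small-sample correction 2k(k + 1) / (n - k - 1)."""
    n = len(observed)
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(observed, predicted, k) + 2 * k * (k + 1) / (n - k - 1)


# Toy target-gene profile and two hypothetical candidate fits.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit_small = [1.2, 1.9, 3.3, 3.8, 5.1, 5.7]   # 1 leaf node, rougher fit
fit_large = [1.1, 2.0, 3.1, 3.9, 5.0, 5.9]   # 3 leaf nodes, closer fit

print(aicc(obs, fit_small, k=1), aicc(obs, fit_large, k=3))
```

Lower values indicate a better trade-off, so a candidate with more leaf nodes is only preferred when its improvement in fit outweighs the complexity penalty.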

Crossover operator
Crossover is an evolutionary operator for generating a new candidate network. Before the proposed crossover is applied, tournament selection produces a new pool of candidate networks. Tournament selection is a method of selecting a candidate among a few candidates chosen at random from the population. The winner of each tournament (the one with the best fitness) is placed in the new pool. To apply the proposed crossover, the networks in the population are first shuffled and then sorted by the number of neurons ($N_n$) in their structure. Then, for each pair of selected networks with the same number of neurons (parents 1 and 2), crossover with probability $P_c$ swaps a randomly chosen neuron $i \in \{1, \ldots, N_n\}$ between the two parents. Figure 5(c)-(d) show the crossover operating on neuron 2 of the parents in Figure 5(a)-(b). Crossover creates new candidate networks (cross-children) with new connectivity and connection weights.
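The selection and crossover steps can be sketched as follows. The sketch treats a network as a plain list of neurons and assumes that a lower fitness value is better, as with AIC-style criteria. The function names, the tournament size parameter `k` and the use of a seeded `random.Random` instance are illustrative assumptions.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Draw k distinct candidates at random and return the fittest one.

    Assumes lower fitness is better (AIC-style criterion).
    """
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: fitness[i])]


def crossover(parent1, parent2, p_c, rng):
    """With probability p_c, swap one randomly chosen neuron between parents.

    Both parents must have the same number of neurons. The children are
    shallow copies, so the neuron objects themselves are shared.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same number of neurons")
    child1, child2 = list(parent1), list(parent2)
    if rng.random() < p_c:
        i = rng.randrange(len(child1))
        child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


rng = random.Random(0)
parent_a = ["a1", "a2", "a3"]   # placeholders standing in for neurons
parent_b = ["b1", "b2", "b3"]
child_a, child_b = crossover(parent_a, parent_b, p_c=1.0, rng=rng)
print(child_a, child_b)
```

Sorting the pool by neuron count before pairing, as described above, guarantees that the swapped position exists in both parents.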

Mutation operator
The mutation operator mutates the number of inputs of the neurons, rewires the connectivity of the inputs of the neurons, and evolves the connection weights with probability $P_m$. For a mutation site $m_{site}$ in the network, the mutation works as follows:
• If $m_{site}$ is on the number of inputs of a neuron ($N_{in}$), it is mutated to $N_{in} = N_{in} \pm 1$. Therefore, a new input and its corresponding weight are added or deleted.
• If $m_{site}$ is on an input connection of a neuron (In), the selected connection is rewired to another node in the network.
• If $m_{site}$ is on a connection weight of a neuron and the input is a context node, Gaussian mutation evolves the weight in the range $[0, w_{max}]$; otherwise, the weight is mutated in the range $[w_{min}, w_{max}]$.
• If $m_{site}$ is on the decay rate $\mu$, Gaussian mutation is applied to evolve the decay rate in the range $[0, w_{max}]$.

The p-values are for how the Link F-scores of other methods compare with HRNN. P is the number of genes for each of the networks. The variance of the noise is equal to 1.
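Two of the mutation rules in this section, the weight mutation and the change in the number of inputs, can be sketched as below. Clipping the mutated weight to its allowed range, drawing the weight of a new input uniformly, and never deleting the last remaining input are assumptions of this sketch; the function names and the `sigma` parameter are hypothetical. Rewiring is omitted for brevity.

```python
import random


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def mutate_weight(w, is_context, w_min, w_max, sigma, rng):
    """Gaussian mutation of one weight.

    Context-node weights stay in [0, w_max]; all others in [w_min, w_max].
    Clipping to the range is an assumption of this sketch.
    """
    lo = 0.0 if is_context else w_min
    return clip(w + rng.gauss(0.0, sigma), lo, w_max)


def mutate_input_count(inputs, weights, candidates, is_context, w_min, w_max, rng):
    """N_in -> N_in + 1 or N_in - 1: add a new weighted input or delete one."""
    inputs, weights = list(inputs), list(weights)
    unused = [node for node in candidates if node not in inputs]
    can_add, can_delete = bool(unused), len(inputs) > 1
    if can_add and (not can_delete or rng.random() < 0.5):
        node = rng.choice(unused)
        lo = 0.0 if is_context(node) else w_min
        inputs.append(node)
        weights.append(rng.uniform(lo, w_max))
    elif can_delete:
        j = rng.randrange(len(inputs))
        del inputs[j], weights[j]
    return inputs, weights


rng = random.Random(1)
new_w = mutate_weight(0.5, is_context=True, w_min=-2.0, w_max=2.0, sigma=0.3, rng=rng)
new_in, new_ws = mutate_input_count(
    [4], [0.3], candidates=[1, 4, 5],
    is_context=lambda node: node <= 3,   # hypothetical: nodes 1..3 are context
    w_min=-2.0, w_max=2.0, rng=rng,
)
print(new_w, new_in, new_ws)
```

The same `mutate_weight` call with `is_context=True` could also serve for the decay rate μ, since both are restricted to $[0, w_{max}]$.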

Experimental results
In order to evaluate the performance of the proposed method, we have tested our method with both synthetic data and real data against TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet. As the underlying regulatory networks for the real biological datasets are generally unknown, synthetic data are helpful for checking the efficiency of methods. The generated synthetic models in this paper have different levels of complexity and enable us to have a broad-ranging performance evaluation of our proposed approach in comparison to other approaches. In a real life experiment, we applied our method for finding the GRN of Saccharomyces cerevisiae.
Table 3 The effect of network size on GRN reconstruction in case of linearity between the genes. Results are the average of the accuracy in terms of the Delay and Effect criterion for GRN reconstruction of 10 different randomly generated synthetic networks.

We assess the performance of the inference algorithms on three aspects, namely Links (a link is considered correct if and only if both the gene pair and the direction are correct), Delays (a delay is considered correct if and only if both the link and the time delay are correct), and Effects (an effect is considered correct if and only if both the link and the sign of the effect are correct) [24]. For each aspect, the metrics $\text{Recall} = \frac{TP}{TP+FN}$, $\text{Precision} = \frac{TP}{TP+FP}$ and $\text{F-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ are computed, where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively. The F-scores of the results are compared as an overall measure of performance. The HCC-CLINDE method provides the F-scores of the Link, Delay and Effect criteria, whereas TD-ARACNE, TSNI and ebdbNet only provide enough information to compute the Link F-score. In all simulations, the algorithms were run with their default parameters.
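The three metrics can be computed from sets of edges, as in the following sketch. Representing a Link as a (regulator, target) tuple, and a Delay or Effect as the same tuple extended by the delay or the sign, is an assumption of this example, as are the function name and the convention of returning 0 when a denominator is zero.

```python
def edge_scores(true_edges, predicted_edges):
    """Return (precision, recall, f_score) for two collections of edges."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy network: (regulator, target, delay) triples.
truth = {("g1", "g2", 1), ("g2", "g3", 2), ("g3", "g4", 1)}
found = {("g1", "g2", 1), ("g2", "g3", 1), ("g4", "g1", 1)}

# Link criterion: ignore the delay, keep gene pair and direction.
link = edge_scores({e[:2] for e in truth}, {e[:2] for e in found})
# Delay criterion: gene pair, direction and delay must all match.
delay = edge_scores(truth, found)
print(link, delay)
```

Because a Delay is only correct when its Link is also correct, the Delay F-score can never exceed the Link F-score for the same prediction.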

Synthetic data
The synthetic data are generated from two kinds of models: linear and nonlinear. To compare the accuracy of the HRNN with TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet, the effects of the noise level and the number of genes in small and medium size networks are investigated. For a chosen number of genes and level of noise, ten random GRNs are generated with random connectivity between nodes, connection weights, time delays and initial gene expression values. The purpose of these experiments is to evaluate the performance of the proposed method in terms of linearity versus non-linearity of gene expression values, network size ($P \in \{5, 10, 20, 30\}$) and noise [...] The time lag for each edge in the synthetic networks is also generated randomly. In Eq. 7, $f$ is a sigmoid, monotonically increasing function of the form of Eq. 2, which adds non-linearity to the model.

Gene expression data is generated from a non-linear model. The p-values are for how the Link F-scores of other methods compare with HRNN. P is the number of genes for each of the networks. The variance of the noise is equal to 1.

Table 6 The effect of network size on GRN reconstruction in case of non-linearity between the genes. Results are the average of the accuracy in terms of the Delay and Effect criterion for GRN reconstruction of 10 different randomly generated synthetic networks.

The accuracy of GRN reconstruction using synthetic gene expression data generated from the linear model is presented in Figs. 6, 7 and 8. These figures compare the Link F-score, Delay F-score and Effect F-score of the HRNN with HCC-CLINDE, TD-ARACNE, TSNI and ebdbNet for different numbers of genes in the network. The variance of the noise is the same in all of these experiments ($\sigma^2 = 1$). In part (a) of these figures, box plots of the F-score values for the Link, Delay and Effect criteria are compared. In part (b), a linear regression model is fit to the F-score values with respect to the number of genes in the network. The R-squared and p-values of the linear regression models in Figs. 6(b), 7(b) and 8(b) are stated in Table 1. For the linear data, the regression models in Fig. 6(b) show that the HRNN and HCC-CLINDE are comparably competitive in finding the correct Links between the nodes in the networks. Table 2 includes the averages of TP, FP, FN, Precision, Recall and F-score for the Link criterion over 10 independent runs of the methods and different numbers of genes in the network. We also conduct a hypothesis test for the difference between mean F-scores. For each number of genes in Table 2, a t-test is performed on the F-score values of the HRNN and each of the other methods. The null hypothesis is that the two population means are equal. The nominal and adjusted p-values are given in the table. Because the HRNN is tested multiple times, once for each of the four network sizes, the adjusted p-values are obtained by multiplying the nominal p-values by four. If the adjusted p-value is less than 0.05, the null hypothesis is rejected, meaning that the mean F-scores are significantly different. The p-values indicate whether the F-scores of the other methods differ significantly from those of the HRNN; therefore, no p-values are shown for the HRNN row in the tables. Table 3 includes the averages of TP, FP, FN, Precision, Recall and F-score, as well as the p-values, for the Delay and Effect criteria over 10 independent runs of the HRNN and HCC-CLINDE and different numbers of genes in the network. The results show that the HRNN and HCC-CLINDE are not significantly different in terms of Delay and Effect F-scores when the relationships among genes are linear.
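Multiplying each nominal p-value by the number of tests is the Bonferroni correction. A minimal sketch is given below; the function name, the cap at 1.0 (the text only states the multiplication) and the example p-values are assumptions and are not results from the paper.

```python
def adjust_p_values(nominal, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1.0."""
    return [min(1.0, p * n_tests) for p in nominal]


# Hypothetical nominal p-values, one per network size P in {5, 10, 20, 30}.
nominal = [0.004, 0.03, 0.2, 0.6]
adjusted = adjust_p_values(nominal, n_tests=4)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

With four tests, a nominal p-value must fall below 0.0125 for the adjusted value to stay under the 0.05 threshold.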
In the next step of testing on synthetic data, we considered a more realistic scenario in which the gene expression values are generated from nonlinear models. For 10 different randomly generated networks, the effect of the number of genes on the accuracy of GRN reconstruction by the HRNN was compared with that of the other methods. Figures 9, 10 and 11 compare the Link F-score, Delay F-score and Effect F-score for different numbers [...] Table 4.
Tables 5 and 6 include the averages of TP, FP, FN, Precision, Recall and F-score for the Link, Delay and Effect criteria over 10 independent runs of the methods and different numbers of genes in the network. For each number of genes in these tables, a t-test is performed on the F-score values. If the adjusted p-value is less than 0.05, the null hypothesis is rejected, meaning that the mean F-score of the HRNN is significantly different from that of the other method. The results show that the proposed HRNN works better than HCC-CLINDE, TD-ARACNE, TSNI and ebdbNet for cases of non-linearity among [...] Table 7. Tables 8 and 9 include the averages of TP, FP, FN, Precision, Recall and F-score, together with the nominal and adjusted p-values, over 10 independent runs of the methods and different levels of noise. Because the HRNN is tested multiple times, once for each of the five noise levels of the gene measurements, the adjusted p-values are obtained by multiplying the nominal p-values by five. Figure 12 shows that the accuracy of our proposed method in terms of Link F-score is often higher than that of HCC-CLINDE, TD-ARACNE, TSNI and ebdbNet.

Real-life biological data of Saccharomyces cerevisiae (IRMA)
In order to validate the performance of the proposed method on real-life biological GRNs, we considered a recent significant contribution to systems biology reported in [36], where the authors built a synthetic network, called IRMA, in the yeast Saccharomyces cerevisiae. The researchers tested transcription of the network genes by culturing cells in the presence of galactose and glucose. This is one of the first attempts at building a reference data set with a largely known set of true regulations [8,23]. The regulatory network includes five genes and is negligibly affected by endogenous genes. Two sets of gene profiles, called Switch ON and Switch OFF, were provided, containing 16 and 21 time points, respectively. The former corresponds to the shifting of the growing cells from glucose to the galactose medium; the latter corresponds to the reverse shift. Due to the lack of stimulus, reconstruction of the GRN from the Switch OFF dataset is difficult [8,23]. The performance comparisons among the various methods for the IRMA ON dataset are shown in Fig. 15 and Table 10. Figure 15(a) shows the true IRMA network, and Fig. 15(b) displays the GRN obtained by the proposed method. Figure 15(c-f) present the GRN reconstructions by TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet obtained with default parameters. Table 10 compares the TP, FP, FN, precision, recall and F-score values. The proposed HRNN finds the true regulations of ASH1 by SWI5, CBF1 by ASH1, GAL80 by SWI5, GAL4 by GAL80, GAL80 by GAL4 and CBF1 by SWI5. The regulation of SWI5 by GAL4 and the regulation of GAL4 by CBF1 are not found. The method finds three regulations (regulations of SWI5 by CBF1, SWI5 by ASH1 and SWI5 by GAL4) that are not in the true network of IRMA. Among the eight connections in the true network, TD-ARACNE finds two correct regulations. The HCC-CLINDE method finds one true regulation and one false regulation. In addition, HCC-CLINDE attributes the regulation of ASH1 and SWI5 to a hidden common cause, which is not reported in the actual GRN of IRMA. The results show that the proposed HRNN approach is more accurate at finding the regulatory interactions between the genes than the other approaches.

Conclusions
In this study, we developed and implemented a hierarchical recurrent neural network (HRNN) approach to identify time-delayed regulatory interactions of genes. The designed HRNN facilitates capturing the paths with different lengths from the leaf nodes in the network to the target node. Hierarchy and non-linearity in the network and the allowance for recurrent connections in HRNN provide an effective capability for modeling the temporal patterns of gene expression. Furthermore, partial connectivity of the nodes aids in finding the limited set of genes which regulate the target gene over different time delays. The proposed method outperformed TD-ARACNE, HCC-CLINDE, TSNI and ebdbNet in terms of reconstructing small and medium size networks having non-linearity and high levels of noise for measurement data.