Understanding the dynamical behavior of living cells from their complex genomic regulatory networks is a challenge posed in systems biology; yet it is one of critical importance (e.g., in morphogenesis). Gene expression data can be used to infer, or reverse-engineer, the underlying genomic network and analyze the interactions between the molecules. Unfortunately, most of the existing work on reverse-engineering genomic regulatory networks estimates a single static network from all available data, which are often collected during different cellular functions or developmental epochs. The idea that molecular networks are remodeled as a function of time and stage is well established; this conclusion is supported by the developmental networks of sea urchin embryos [1]. Throughout a cellular process, such as cancer progression or anticancer therapy, there may exist multiple underlying “themes” that determine the functionalities of each molecule and their relationships to others, and such themes are dynamic. In signal processing terms, summarizing gene expression data collected during different cellular stages into one network is akin to characterizing a non-stationary signal by its Fourier spectrum. Biologically, static networks cannot reveal regime-specific or key transient interactions that lead to biological changes.
One of the challenges of inferring a time-varying network is that only a few observations are available at each time point. The small sample size is compounded by the high dimension of every sample, leading to a small-n, large-p problem (i.e., more variables than observations); in particular, the system is under-determined. However, exploiting the fact that molecular networks are sparse, one can use compressive sensing to find a solution. Compressive sensing is concerned with the optimal reconstruction of a sparse signal from an under-determined linear system [2, 3]. Under-determined systems are quite common in computational biology and ecology, and compressive sensing has become a popular tool for solving them [4–6]. Compressive sensing theory states that, under the restricted isometry property (RIP), the optimal sparsest solution of a linear system is equivalent to the minimum l1-norm solution [2, 3]. Unfortunately, it is almost impossible to check whether a linear system satisfies the RIP condition, and in general the minimum l1-norm solution can be far from the optimal sparse solution.
In our previous work [4], we addressed the problem of under-sampled sparse systems by proposing a new energy-weighted likelihood function that ensures the convergence of the likelihood function for under-determined systems with unknown covariance. The approach was coined Small sample MUltivariate Regression with Covariance estimation (SMURC) and was applied to infer the wing-muscle gene regulatory networks of Drosophila melanogaster during the four phases of its development [4]. However, the network estimated at each epoch used only the data in the corresponding epoch; in particular, the larval network ignored all measurements from the preceding embryonic phase, and similarly for the subsequent stages. Other research efforts have addressed the problem of recovering time-varying gene regulatory networks by using dynamic Bayesian models [7], non-parametric Bayesian regression [8], and random graph models [9].
In this contribution, we introduce a new approach to modeling sparse time-varying networks, with application to gene regulatory networks, based on our recent work [10, 11]. We start by projecting the Kalman solution onto an “approximately sparse” space by using l1-regularization. We further expand upon our previous work by using a Kalman smoother. We then explore the neighborhood of the l1 solution for sparser solutions by leveraging our recent compressive sensing technique known as Kernel RecONstruction (KRON) [12]. KRON recovers the optimal sparsest solution whether the RIP condition is satisfied or not. However, KRON’s computational complexity is still exponential in the number of parameters p. We, therefore, advance Approximate KRON (AKRON) [11], which builds growing neighborhoods of the l1 solution that move towards the optimal sparsest solution and eventually reach it. The size of the neighborhood is tunable depending on the computational resources available. We derive theoretical bounds on the neighborhood size that guarantee optimality. The AKRON Kalman filter is validated on synthetic and real-world data sets.
The state-space model and Kalman filter
Following the works in [4, 13], we model the network dynamics using a state space model. The system equation is given by a random walk model, which reflects a lack of prior knowledge of the network topological changes. The observation equation is given by a first-order differential equation, whose parameters reflect the strength and sign of interactions (positive and negative are activating and repressing, respectively) [14]. The state space model of the incoming edges, \(\mathbf {a}_{i} \in \mathbb {R}^{p}\), for gene i can be shown to be [13]
$$\begin{array}{*{20}l} \mathbf{a}_{i}(k+1) =& \mathbf{a}_{i}(k)+\boldsymbol{w}_{i}(k). \end{array} $$
(1)
$$\begin{array}{*{20}l} \mathbf{y}_{i}(k) =& \mathbf{X}^{\mathrm{T}}(k)~ \mathbf{a}_{i}(k)+\boldsymbol{v}_{i}(k), \end{array} $$
(2)
where i=1,⋯,p and p is the number of genes. \(\boldsymbol{X}(k) \in \mathbb{R}^{p \times n}\) is the gene expression matrix at time k, and yi(k) is the rate of change of expression of gene i at time k. wi(k) and vi(k) are the process and observation noise, respectively; they are assumed to be zero-mean Gaussian processes with known covariances Qk and Rk, respectively, and uncorrelated with the state vector ai(k). The full connectivity matrix A(k) can be recovered by estimating its rows \(\boldsymbol{a}_{i}^{\mathrm{T}}(k)\) in parallel at every time instant k; thus, each gene can be processed independently. The Kalman filter can be used to track ai(k) [13, 15]; however, this requires the system to be observable. The problem with using a Kalman filter in our setting is that the system is under-determined (i.e., more variables than equations, p>n). This problem can be circumvented by taking into account the sparsity of the vector ai(k): since each gene in the genomic regulatory network is governed by only a small number of other genes, these networks are known to be sparse. We also experimented with an optional Kalman smoother applied after the Kalman filter, since smoothing can reduce the covariance of the estimate; we implemented Rauch et al.’s smoothing algorithm [16] and compared AKRON-Kalman both with and without smoothing.
The pseudo code for the proposed Kalman filtering approach is shown in Fig. 1.
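For concreteness, the following is a minimal sketch (in Python/NumPy) of one prediction and update step of the per-gene filter implied by (1) and (2), assuming known covariances Qk and Rk; the function and variable names are illustrative and the snippet does not reproduce the pseudo code of Fig. 1.

```python
import numpy as np

def kalman_step(a_prev, P_prev, X_k, y_k, Q_k, R_k):
    """One prediction/update step for gene i under the random-walk model (1)-(2).

    a_prev : (p,)   filtered estimate a_i(k-1 | k-1)
    P_prev : (p, p) its error covariance
    X_k    : (p, n) gene expression matrix at time k
    y_k    : (n,)   rate of change of expression of gene i at time k
    """
    # Prediction: the random walk (1) carries the estimate over unchanged
    a_pred = a_prev
    P_pred = P_prev + Q_k

    # Update with the observation model (2): y_i(k) = X^T(k) a_i(k) + v_i(k)
    H = X_k.T                                  # (n, p) observation matrix
    S = H @ P_pred @ H.T + R_k                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    a_filt = a_pred + K @ (y_k - H @ a_pred)
    P_filt = (np.eye(a_prev.size) - K @ H) @ P_pred
    return a_filt, P_filt
```

An optional backward (Rauch-Tung-Striebel) smoothing pass, as mentioned above, would reuse the stored predicted and filtered covariances from these steps.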
Constrained Kalman filtering
It is known that the edge connectivity of gene regulatory networks is sparse [13]. Unfortunately, the output of the Kalman filter is unlikely to be sparse. We therefore first project the Kalman solution at time k, ak|k, onto the set of “approximately sparse” vectors by solving the following Lasso problem [17]:
$$ \boldsymbol{a}_{k|k}^{*} = \arg\min_{\boldsymbol{a}\in \mathbb{R}^{p}} \left\{(1 - \alpha) \left\|\boldsymbol{a}_{k|k} - \boldsymbol{a}\right\|_{2}^{2} + \alpha \|\boldsymbol{a}\|_{1}\right\}, $$
(3)
where α∈[0,1] controls the tradeoff between fidelity to the Kalman estimate and sparsity. An α close to zero yields a solution that is close to the Kalman estimate but may not be sparse; an α close to 1 produces a sparser solution that may be far from the Kalman estimate. Fidelity to the Kalman estimate is enforced by the reconstruction error (i.e., the first term in (3)), while sparsity is promoted by the l1 penalty (the second term).
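Because the quadratic term in (3) is a plain Euclidean distance to the Kalman estimate, the problem separates across coordinates and its minimizer is given by element-wise soft-thresholding with threshold α/(2(1−α)). A minimal sketch of this direct solver, offered as an equivalent alternative to calling a generic Lasso routine:

```python
import numpy as np

def sparsify_kalman_estimate(a_kk, alpha):
    """Solve (3): argmin_a (1 - alpha) * ||a_kk - a||_2^2 + alpha * ||a||_1.

    The identity design makes the problem separable, so the minimizer is
    element-wise soft-thresholding of the Kalman estimate a_kk.
    """
    if alpha >= 1.0:                       # pure l1 term: the zero vector
        return np.zeros_like(a_kk)
    thresh = alpha / (2.0 * (1.0 - alpha))
    return np.sign(a_kk) * np.maximum(np.abs(a_kk) - thresh, 0.0)
```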
AKRON: a search for a sparser solution
Consider the following l0-optimization problem, which finds the optimal sparsest solution of an under-determined linear system:
$$\begin{array}{*{20}l} \boldsymbol{x}^{*} = &\arg \min_{\boldsymbol{x} \in \mathbb{R}^{p}} \|\boldsymbol{x}\|_{0} \\ & \text{s.t.} \; \boldsymbol{\Phi}\boldsymbol{x} = \boldsymbol{y} \end{array} $$
(4)
where ∥·∥0 is the l0-norm, defined as the number of non-zero entries (the cardinality of the support) of its argument; \(\boldsymbol {x} \in \mathbb {R}^{p}\), \(\boldsymbol {y} \in \mathbb {R}^{n}\), and \(\boldsymbol {\Phi } \in \mathbb {R}^{n \times p}\). We consider the scenario where p≫n and denote in the sequel s=p−n. Without loss of generality, Φ is assumed to be full-rank. Compressive sensing theory [3] shows that, under the Restricted Isometry Property (RIP) condition on the matrix Φ, the l1-norm solution is equivalent to the l0-norm solution. Unfortunately, it is in general intractable to check whether the RIP condition is satisfied for a given matrix. Despite this strict condition, l1-minimization has been routinely used to find sparse solutions of systems of the form (4).
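For reference, the minimum l1-norm solution discussed here can be computed by recasting the problem as a linear program (minimize ∑iti subject to −ti ≤ xi ≤ ti and Φx = y). A minimal sketch using SciPy, shown only to make the l1 relaxation concrete (this is the standard reformulation, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(Phi, y):
    """Minimum l1-norm solution of Phi @ x = y via the standard LP reformulation:
    minimize sum(t) subject to -t <= x <= t and Phi @ x = y."""
    n, p = Phi.shape
    c = np.concatenate([np.zeros(p), np.ones(p)])       # objective: sum of t
    A_ub = np.block([[ np.eye(p), -np.eye(p)],           #  x - t <= 0
                     [-np.eye(p), -np.eye(p)]])          # -x - t <= 0
    b_ub = np.zeros(2 * p)
    A_eq = np.hstack([Phi, np.zeros((n, p))])            # Phi @ x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=(None, None))
    return res.x[:p]
```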
The proposed Approximate Kernel RecONstruction (AKRON) method is an approximation of the computationally complex Kernel RecONstruction (KRON) method [12]. KRON achieves an exact solution to (4), but the algorithm becomes computationally prohibitive for, typically, p>15. AKRON, detailed below, is introduced to balance the trade-off between the available computational resources and the accuracy of the reconstruction.
The Kalman filter estimate is first sparsified via the l1 regularization in (3) (line 5 of Fig. 1). However, the l1 projection is not guaranteed to be the optimal sparsest solution. The AKRON-Kalman filter (AKRON-KF) therefore starts from the l1-regularized Kalman estimate in (3). Then, the s=p−n smallest-magnitude elements of the l1 projection in (3) are set to zero. The logic behind this strategy is to use the l1-projection to guess the positions of the zeros in the optimal solution. Given that the kernel of the system matrix Φ in (4) has dimension s, we know that if s zero locations are correctly set, then the optimal sparsest solution can be found exactly by solving the linear system in (4) [12].
Following this reasoning, AKRON finds a sparser solution by exploring δ-neighborhoods of the l1-projection. The central idea behind AKRON’s δ-neighborhoods is as follows: (i) find the indices of the (s+δ) smallest-magnitude entries of the l1 solution, (ii) set exactly s of these indices to zero, and (iii) re-solve the system Φx=y. All \(s+\delta \choose s\) possible combinations of the smallest elements in the solution of (3) are evaluated. This idea can also be viewed as a “perturbation” of the l1 approximation that moves it closer to the l0-norm solution. The size of the neighborhood δ is tunable depending on the computational power available, and can range from 0 (the l1-approximation) to n (KRON, i.e., perfect reconstruction).
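A minimal sketch of this δ-neighborhood search for the noiseless case, assuming Φ has full row rank and that the l1 projection x1 is already available (for example, from (3) or from a basis-pursuit solver as sketched above); the helper name and structure are illustrative rather than the reference implementation of [11]:

```python
import numpy as np
from itertools import combinations

def akron(Phi, y, x1, delta):
    """Search the delta-neighborhood of the l1 solution x1 for a sparser
    solution of Phi @ x = y (noiseless case, Phi of full row rank)."""
    n, p = Phi.shape
    s = p - n
    # (i) indices of the (s + delta) smallest-magnitude entries of x1
    candidates = np.argsort(np.abs(x1))[: s + delta]
    best_x, best_nnz = x1.copy(), np.count_nonzero(x1)
    # (ii) set exactly s of these indices to zero, in every possible way
    for zero_idx in combinations(candidates, s):
        keep = np.setdiff1d(np.arange(p), zero_idx)
        # (iii) re-solve the now-square system on the remaining entries
        sub, *_ = np.linalg.lstsq(Phi[:, keep], y, rcond=None)
        x = np.zeros(p)
        x[keep] = sub
        if np.allclose(Phi @ x, y) and np.count_nonzero(x) < best_nnz:
            best_x, best_nnz = x, np.count_nonzero(x)
    return best_x
```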
Example: To understand the AKRON algorithm and illustrate the importance of the δ-neighborhoods, we present a simple numerical example. Consider the following randomly generated noiseless system as in (4):
$$\begin{array}{*{20}l} \boldsymbol{\Phi} =& \left(\begin{array}{ccccc} -0.4588 & 1.5977 & -0.8724 & -0.1121 & -1.3068 \\ 0.2942 & 3.0954 & -1.0530 & 0.3454 & 1.5257 \\ -0.1948 & -0.7558 & -0.9756 & 0.1549 & 0.9586 \\ \end{array} \right); \\ \boldsymbol{y} =& \left(\begin{array}{ccc} -1.2316 & 1.1739 & 0.8135 \\ \end{array} \right)^{T} \end{array} $$
The optimal l0-norm sparsest solution is given by
$$ \boldsymbol{x}^{*} = \left(\begin{array}{ccccc} 0 & 0 & 0 & -1.2372 & 1.04858 \\ \end{array} \right)^{T} $$
(5)
The minimum l1-norm solution, subject to the same constraint Φx=y, is given by
$$ \widehat{\boldsymbol{x}}_{1} = \left(\begin{array}{ccccc} 0.0 & -0.034 & 0.047 & 0.0 & 0.870 \\ \end{array} \right)^{T} $$
Clearly, the l1-solution is not as sparse as the optimal solution and has incorrect zero locations. We have n=3, p=5, and thus s=2. If we choose δ=1, then AKRON considers the s+δ=3 smallest-magnitude entries of \(\widehat {\boldsymbol {x}}_{1}\), which are located at indices 1, 2 and 4. We set s=2 of these 3 indices to zero, considering all \({s+\delta \choose s} = {3 \choose 2} = 3\) combinations of two zeros among indices 1, 2 and 4 of \(\widehat {\boldsymbol {x}}_{1}\). The combination that sets indices 1 and 2 to zero leads to the optimal sparsest solution x∗ in (5). Thus, in this case, the l1-norm solution is sub-optimal; but by considering a δ=1-neighborhood of this approximation, AKRON is able to exactly recover the optimal sparsest l0-solution.
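Because the entries of Φ and y above are rounded for display, the combination search is easiest to reproduce on a freshly drawn instance with the same planted support; a purely illustrative sketch (values and random seed are not from the paper):

```python
import numpy as np
from itertools import combinations

# Plant a 2-sparse ground truth in a random 3 x 5 system (illustrative values).
rng = np.random.default_rng(1)
Phi = rng.standard_normal((3, 5))
x_true = np.array([0.0, 0.0, 0.0, -1.2372, 1.0486])
y = Phi @ x_true
s = 2                                   # s = p - n

# Mimic the example: the l1 projection placed its three smallest magnitudes at
# (1-based) indices 1, 2 and 4, i.e. 0-based indices 0, 1 and 3.
for zero_idx in combinations((0, 1, 3), s):
    keep = [j for j in range(5) if j not in zero_idx]
    x = np.zeros(5)
    x[keep] = np.linalg.solve(Phi[:, keep], y)
    print(zero_idx, np.round(x, 4))
# Zeroing indices (0, 1) recovers the planted support exactly; the other two
# combinations solve the system but produce denser solutions.
```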
In the noisy case, where the constraint in (4) is replaced by ∥Φx−y∥≤ε, with ε being a given noise threshold, the neighborhood δ is chosen adaptively as follows: the s smallest-magnitude elements of the l1 solution are set to zero and the observation error ∥Φx−y∥ is computed. If this error is smaller than the energy of the noise, this solution is adopted; otherwise, the next smallest-magnitude element is set to zero and the error is recomputed.
In the following propositions, we investigate under which assumptions on the entries of the l1 solution, and on its closeness to the l0 solution, AKRON yields the optimal l0 solution.
Proposition 1
Consider the system in (4) with the optimal l0-solution x∗ having k>s zeros. Consider the l1-solution, x1, and assume that ∥x1−x∗∥2≤ε. Let J denote the number of indices j such that
$$\begin{array}{*{20}l} |(\boldsymbol{x}_{1})_{j}| \leq \frac{\epsilon}{\sqrt{k-s+1}}. \end{array} $$
(6)
Then, by choosing δ≤J−s, AKRON yields the optimal l0-solution.
Proof
Let Θ and \(\overline {\Theta }\) be the index sets of the zero and non-zero entries in the l0-solution x∗, respectively; we have |Θ|=k. We need to show that (6) holds for at least s indices in Θ. To prove this, assume the opposite, i.e., that at least (k−s+1) indices in Θ are such that
$$|(\boldsymbol{x}_{1})_{j}| > \frac{\epsilon}{\sqrt{k-s+1}}. $$
However, in this case, since (x∗)j=0 for every j∈Θ, we have
$$\|\boldsymbol{x}_{1} - \boldsymbol{x}^{*}\|_{2} > \sqrt{k-s+1}\frac{\epsilon}{\sqrt{k-s+1}} = \epsilon, $$
which contradicts the assumption ∥x1−x∗∥2≤ε. □
The following proposition derives an upper bound for \(\delta \in \mathbb {N}_{+}\) when the non-zero elements of the optimal l0-solution are bounded from below. We first need the following Lemma.
Lemma 1
Consider the system in (4) with the optimal l0-solution x∗ and approximate l1-solution x1. Denote by Θ and \(\overline {\Theta }\) the index sets of zero and non-zero entries in x∗, respectively. Assume that ∥x1−x∗∥2≤ε. Let R be the number of indices \(j \in \overline {\Theta }\) such that |(x1)j|≤ε. If δ=R, then AKRON yields the optimal solution.
Proof
To obtain the sparsest l0-solution, it is sufficient to set s zeros in the “correct places”, i.e., at indices in Θ. Recall that AKRON sets s out of the (s+δ) smallest-magnitude entries of x1 to zero. Therefore, AKRON yields the optimal solution if, among these (s+δ) smallest-magnitude entries, at least s belong to Θ. All entries with indices in Θ satisfy |(x1)j|≤ε, for otherwise the assumption ∥x1−x∗∥2≤ε would be violated; hence, the only entries from \(\overline {\Theta }\) that can compete with them for the smallest-magnitude positions are those that also satisfy |(x1)j|≤ε, and there are only R of them. Therefore, δ=R yields the optimal l0-solution. □
Lemma 1 provides a sufficient condition on δ for the optimality of AKRON: if δ=R, then the optimal l0-solution is guaranteed. Since this condition is not necessary, optimality may already be reached with δ≤R.
Proposition 2
Consider the system in (4) with the optimal l0-solution x∗ and approximate l1-solution x1. Let \(\overline {\Theta }\) be the set of indices of non-zero elements in x∗. Assume that the non-zero entries of x∗ are bounded from below, i.e.,
$$\left|\left(\boldsymbol{x}^{*}\right)_{j}\right| \geq \eta,~ \text{for all}~ j \in \overline{\Theta}~ \text{and for some}~ \eta > 0. $$
Assume further that ∥x1−x∗∥2≤ε. If ε<η, then \(\delta \leq \left (\frac {\epsilon }{\eta - \epsilon }\right)^{2}\) yields the optimal l0-solution. In particular, if \(\epsilon < \frac {\eta }{2}\) then δ=0 suffices.
Proof
Since |(x∗)j|≥η for \(j \in \overline {\Theta }\), each of the R indices \(j \in \overline {\Theta }\) with |(x1)j|≤ε satisfies |(x1)j−(x∗)j|≥|(x∗)j|−|(x1)j|≥η−ε. Therefore,
$$\left\|\boldsymbol{x}_{1}-\boldsymbol{x}^{*} \right\|_{2} \geq (\eta-\epsilon)\sqrt{R}\geq(\eta-\epsilon)\sqrt{\delta}, $$
where we used the fact that δ≤R from Lemma 1. Thus, \(\epsilon \geq \left \| \boldsymbol {x}_{1}-\boldsymbol {x}^{*} \right \|_{2} \geq (\eta -\epsilon)\sqrt {\delta }\), which completes the proof. □
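As an illustrative instantiation of this bound (with values chosen here only for illustration): if the non-zero entries of x∗ satisfy η=1 and the l1 solution satisfies ε=0.6, then \(\left (\frac {\epsilon }{\eta - \epsilon }\right)^{2} = \left (\frac {0.6}{0.4}\right)^{2} = 2.25\), so the required neighborhood size is at most δ=2; if instead ε=0.4<η/2, then δ=0 already suffices.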
Although Propositions 1 and 2 derive theoretical bounds for the choice of the neighborhood radius δ to recover the optimal sparsest solution, we found in our experiments below that relatively small values of δ are sufficient to achieve a balance between desired accuracy and computational complexity.