Approximate kernel reconstruction for time-varying networks

Background Most existing algorithms for modeling and analyzing molecular networks assume a static or time-invariant network topology. Such view, however, does not render the temporal evolution of the underlying biological process as molecular networks are typically “re-wired” over time in response to cellular development and environmental changes. In our previous work, we formulated the inference of time-varying or dynamic networks as a tracking problem, where the target state is the ensemble of edges in the network. We used the Kalman filter to track the network topology over time. Unfortunately, the output of the Kalman filter does not reflect known properties of molecular networks, such as sparsity. Results To address the problem of inferring sparse time-varying networks from a set of under-sampled measurements, we propose the Approximate Kernel RecONstruction (AKRON) Kalman filter. AKRON supersedes the Lasso regularization by starting from the Lasso-Kalman inferred network and judiciously searching the space for a sparser solution. We derive theoretical bounds for the optimality of AKRON. We evaluate our approach against the Lasso-Kalman filter on synthetic data. The results show that not only does AKRON-Kalman provide better reconstruction errors, but it is also better at identifying if edges exist within a network. Furthermore, we perform a real-world benchmark on the lifecycle (embryonic, larval, pupal, and adult stages) of the Drosophila Melanogaster. Conclusions We show that the networks inferred by the AKRON-Kalman filter are sparse and can detect more known gene-to-gene interactions for the Drosophila melanogaster than the Lasso-Kalman filter. Finally, all of the code reported in this contribution will be publicly available.

different cellular functions or developmental epochs. The idea that molecular networks are remodeled as a function of time and stage is well understood; this conclusion is supported by the developmental networks of sea urchin embryos [1]. Throughout a cellular process, such as cancer progression or anticancer therapy, there may exist multiple underlying "themes" that determine the functionalities of each molecule and their relationships to others, and such themes are dynamic. In signal processing terms, summarizing gene expression data, that comes from different cellular stages, into one network would be similar to characterizing a non-stationary signal by its Fourier spectrum. Biologically, static networks cannot reveal regime-specific or key transient interactions that lead to biological changes.
One of the challenges of inferring a time-varying network is that there are only a few observations available at each time point. This small sample size is amplified by the high dimension of every sample, leading to a small n large p problem (i.e., more variables than observations). In particular, the system is under-determined. However, exploiting the fact that molecular networks are sparse, one can use compressive sensing to find a solution. Compressive sensing is concerned with the optimal reconstruction of a sparse signal from an under-determined linear system [2,3]. Under-determined systems are quite common in computational biology/ecology, and the application of compressive sensing to solve these under-determined systems in nature has been a popular solution [4][5][6]. Compressive sensing theory states that, under the restricted isometry property (RIP), the optimal sparsest solution of a linear system is equivalent to the minimum l 1 -norm solution [2,3]. Unfortunately, it is almost impossible to check whether a linear system satisfies the RIP condition. In general, the minimum l 1 -norm solution can be far from the optimal sparse solution.
In our previous work [4], we addressed the problem of under-sampled sparse systems by proposing a new energy-weighted likelihood function that ensures the convergence of the likelihood function for under-determined systems with unknown covariance. The approach was coined Small sample MUltivariate Regression with Covariance estimation (SMURC) and was applied to infer the wing-muscle gene regulatory networks of the Drosophila Melanogaster during the four phases of its development [4]. However, the estimated networks at every epoch used only the data in the corresponding epoch. In particular, the larval network ignored all the measurements in the previous embryonic phase, and so was the case for the subsequent stages. Other research efforts have been proposed to address the problem of recovering time-varying gene regulatory networks by using dynamic Bayesian models [7], non-parametric Bayesian regression [8], and random graph models [9].
In this contribution, we introduce a new approach to modeling sparse time-varying networks and their applications to gene regulatory networks that are based on our recent work [10,11]. We start by projecting the Kalman solution onto an "approximately sparse" space by using l 1 -regularization. We further expand upon our previous work by using a Kalman smoother. We then explore the l 1 -neighborhood for sparser solutions by leveraging our recent compressive sensing technique known as Kernel RecONstruction (KRON) [12]. KRON recovers the optimal sparsest solution whether the RIP condition is satisfied or not. However, KRON's computational complexity is still exponential in the number of parameters p. We, therefore, advance Approximate KRON (AKRON) [11], which builds growing neighborhoods of the l 1 solution that moves towards the optimal sparsest solution and eventually reaches it. The size of the neighborhood is tunable depending on the computational resources available. We derive theoretical bounds of optimality. The AKRON Kalman filter is validated on synthetic and real-world data sets.

The state-space model and Kalman filter
Following the works in [4,13], we model the network dynamics using a state space model. The system equation is given by a random walk model, which reflects a lack of prior knowledge of the network topological changes. The observation equation is given by a first-order differential equation, whose parameters reflect the strength and sign of interactions (positive and negative are activating and repressing, respectively) [14]. The state space model of the incoming edges, a i ∈ R p , for gene i can be shown to be [13] ( 1 ) where i = 1, · · · , p and p is the number of genes. X(k) ∈ R p×n is the gene expression matrix at time k. y i (k) is the rate of change of expression of gene i at time k. w i (k) and v i (k) are the process and observation noise, respectively. These noise processes are assumed to be zero mean Gaussian noise processes with the known covariances Q k and R k , respectively, and uncorrelated to the state vector a i (k). The full connectivity matrix, A(k), can be recovered by simultaneous parallel recovery of its rows a t i (k) at every time instant k. Thus, we can process each gene in parallel. The Kalman filter can be used to track a(k) [13,15]; however, this is only if the system is observable. The problem with using a Kalman filter in our setting is that the system is under-determined (i.e., more variables than equations, p > n). This problem, however, can be circumvented by taking into account the sparsity of the vector a i (k). Since each gene in the genomic regulatory network is governed by only a small number of other genes, these networks are known to be sparse. Furthermore, we have also experimented with a Kalman Smoother that is applied after the Kalman filter. Note that the Kalman Smoother is optional. A Kalman smoother can reduce the covariance of the optimal estimate. We implemented Rauch et al. 's Kalman smoothing algorithm [16] and we compared AKRON-Kalman both with and without smoothing.
The pseudo code for the proposed Kalman filtering approach is shown in Fig. 1.

Constrained Kalman filtering
It is known that the connectivity of the edges in gene regulatory networks [13] is sparse. Unfortunately, the output of the Kalman filter is likely not going to be sparse. We first start by projecting the Kalman solution at time k, a k|k , onto the set of "approximately sparse" vectors by solving the following Lasso problem [17]: where α ∈[ 0, 1] controls the tradeoff between the Kalman estimate and sparsity. An α close to zero will result in a solution that is close to the Kalman estimate, but that may not be sparse. The opposite happens when α is close 1, which will produce a sparser solution, but may be far from the Kalman estimate. This is achieved by minimizing the reconstruction error (i.e., the first term in (3)).

AKRON: a search for a sparser solution
Consider the following l 0 -optimization problem, which finds the optimal sparsest solution in a linear under-determined system.
where · 0 is the l 0 -norm, which is defined as the support of the vector, x ∈ R p , y ∈ R n , and ∈ R n×p . We consider the scenario where p n and denote in the sequel s = p − n. Without loss of generality, is assumed to be full-rank. Compressive sensing theory [3] shows that, under the Restricted Isometry Property (RIP) condition on the matrix , the l 1 -norm solution is equivalent to the l 0 -norm solution. Unfortunately, it is impossible to check if the RIP condition is satisfied for a given matrix. Despite this strict condition, l 1 has been routinely used to find a sparse solution in systems of the form (4).
The proposed Approximate Kernel RecONstruction (AKRON) is an approximation to computationally complex Kernel RecONstruction (KRON) problems [12]. KRON is able to achieve an exact solution to (4), but the algorithm becomes computationally expensive for typically p > 15. AKRON, detailed below, is introduced to balance the trade-off between the computational resources that are available and the accuracy of the reconstruction.
The Kalman filter estimate is first sparsified by incorporating l 1 regularization in (3) (line 5 of Fig. 1). However, the l 1 projection is not guaranteed to be the optimal sparsest solution. AKRON-Kalman filter (AKRON-KF) starts off from the l 1 -regularized Kalman estimate in (3). Then, the s = p − n smallest elements of the l 1 projection in (3) are set to zero. The logic behind this strategy is to use the l 1 -projection to guess the position of the zeros in the optimal solution. Given that the kernel of the system matrix in (3) has dimension s, we know that if s zero locations are correctly set, then the optimal sparsest solution can be exactly found by solving the linear system in (4) [12].
Following this reasoning, AKRON finds a sparser solution by exploring δ-neighborhoods of the l 1 -projection. The central idea behind AKRON's δneighborhoods is as follows: (i) find the indices with the (s + δ) smallest magnitudes of the l 1 solution, (ii) set exactly s of these indices to zero, (iii) re-solve the system x = y. All the possible s+δ s combinations of the smallest elements in the solution of (3) are evaluated. This idea can also be viewed as a "perturbation" of the l 1 approximation to make it closer to the l 0 -norm. The size of the neighborhood δ is tunable depending on the computational power available, and vary from 0 (l 1 -approximation) to n (KRON, i.e., perfect reconstruction).
Example: To understand the AKRON algorithm and illustrate the importance of the δ-neighborhoods, we present a simple numerical example. Consider the following randomly generated noiseless system as in (4) The optimal 0 -norm sparsest solution is given by The 1 solution, which solves (4), is given by Clearly, the l 1 -solution is not as sparse as the optimal solution and has incorrect zero locations. We have n = 3, p = 5 and thus s = 2. If we choose δ = 1 the AKRON considers the s + δ = 3-smallest magnitudes of x 1 , which are located at indices 1, 2 and 4. We set s = 2 locations to zero among these 3 indices. We consider all s+δ s = 3 2 = 3 combinations of two zeros in indices 1, 2 and 4 of x 1 . The combination of indices 1 and 2 set to zero leads to the sparsest optimal solution x * in (5). Thus, in this case, the 1 -norm solution is sub-optimal; but by considering a δ = 1-neighborhood of this approximation, AKRON is able to exactly recover the sparsest optimal l 0 -solution.
In the noisy case, where the constraint in (4) is replaced by x − y ≤ , with being a given noise threshold level, the neighborhood δ is chosen adaptively as follows: set the s smallest magnitudes of the l 1 solution to zero; compute the observation error x − y . If this error is smaller than the energy of the noise, we adopt this solution. Otherwise, the next smallest element is set to zero and the error is recalculated.
In the following propositions, we investigate under which assumptions on the entries of the l 1 solution and its closeness to the l 0 solution, will AKRON yield the optimal l 0 solution. ( 4) with the optimal l 0 -solution x * having k > s zeros. Consider the l 1 -solution, x 1 , and assume that x 1 −x * 2 ≤ . Let J denote the number of indices j such that

Proposition 1 Consider the system in
Then, by choosing δ ≤ J − s, AKRON yields the optimal l 0 -solution.
Proof Let and be the index sets of zero and non-zero entries in the l 0 -solution x * , respectively. We have | | = k. We need to show that at least s indices in are where (6) holds. To prove this fact, assume the opposite, i.e., at least (k − s + 1) indices in are such that However, in this case, we have which contradicts the assumption x 1 − x * 2 ≤ .
The following proposition derives an upper bound for δ ∈ N + when the non-zero elements of the optimal l 0 -solution are bounded from below. We first need the following Lemma.
Lemma 1 Consider the system in ( 4) with the optimal l 0 -solution x * and approximate l 1 -solution x 1 . Denote by and the index sets of zero and non-zero entries in x * , respectively. Assume that x 1 − x * 2 ≤ . Let R be the number of indices j ∈ such that |(x 1 ) j | ≤ . If δ = R, then AKRON yields the optimal solution.
Proof To obtain the sparsest l 0 -solution, it is sufficient to choose s zeros in "correct places", i.e., with indices in . Recall that AKRON sets s out of the smallest-magnitude (s + δ) entries in x 1 to zero. Therefore, AKRON will yield the optimal solution if out of these (s + δ) smallest-magnitude entries, there are at least s entries from . But all entries from have |(x 1 ) j | ≤ , for otherwise the assumption x 1 − x * 2 ≤ would be violated.
This means that only those entries from , which also satisfy |(x 1 ) j | ≤ , could be chosen by AKRON, and there are only R of them. Hence δ = R will yield the optimal l 0 -solution.
Lemma 1 provides a sufficient condition on δ for optimality of AKRON, namely, if δ = R, then we are guaranteed the optimal l 0 -solution. Since this condition is not necessary, we could reach optimality with δ ≤ R.
Proposition 2 Consider the system in ( 4) with the optimal l 0 -solution x * and approximate l 1 -solution x 1 . Let be the set of indices of non-zero elements in x * . Assume that the non-zero entries of x * are bounded from below, i.e., x * j ≥ η, for all j ∈ and for some η > 0.
Proof Since |(x * ) j | ≥ η for j ∈ , we have where we used the fact that δ ≤ R from Lemma 1.
which completes the proof.
Although Propositions 1 and 2 derive theoretical bounds for the choice of the neighborhood radius δ to recover the optimal sparsest solution, we found in our experiments below that relatively small values of δ are sufficient to achieve a balance between desired accuracy and computational complexity.

Results
In this section, we present an empirical analysis of the AKRON-KF and its smoother, including comparisons to other approaches proposed for detecting the relationships between different genes in a molecular network. The experiments include a number of carefully designed synthetic data sets, as well as a real-world data set, namely the fruit fly.

Overview of experimental protocols
Our experiments are conducted on real-world and synthetic data. The advantage of the synthetic data are that the ground truth networks are known; therefore, we can calculate different statistics about the reconstruction error of the network. Unfortunately, we do not have a clear view of the "ground truth" for real-world data. Therefore, we use findings from the life sciences that have studied these networks and were able to infer gene-to-gene relationships that are well established [18].
Our experiments make use of the following algorithms for a sparse reconstruction of a time-varying network: • l 1 -KF(S): This algorithm is the output of the Kalman filter with the l 1 projection applied to the state vector. The (S) indicates whether the smoother was implemented.
• AKRON-KF(S): This is the proposed approach using the output of l 1 -KF(S) to seed AKRON. It is also implemented with and without the smoother.
Both of the above algorithms can reconstruct a network that represents the interactions between genes. We compute the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) rates. These rates are summarized through accuracy (acc), sensitivity (sen), specificity (spe), and Matthew's correlation coefficient (mcc), which are defined below.
Matthew's correlation coefficient provides a more balanced statistic for examining the overall trade-offs between the different rates (i.e., TP, TN, FP, and FN).

Results on synthetic data
Synthetic time-varying networks are simulated to evaluate the efficacy of the proposed AKRON-KF(S) on data that we have complete control over. All results in this section are presented as the average over 25 monte carlo simulations. Averaging is performed because there could be a large degree of variation in the time-varying networks that are randomly generated. First, we evaluate the impact of α in (3)  AKRON-KF is the better performer across the statistics that we collected. Thus, AKRON significantly improves the previous implementation of sparse Kalman filters for timevarying networks. Furthermore, AKRON-KF detects the location of the edges in the simulated networks (see Fig. 2b). Simply using the solution from l 1 -KF for a small α is not enough to find the location of the zeros. In fact, α needs to be close to one to achieve a high accuracy at edge detection (i.e., (3) will place a large weight on the l 1 penalty and a small weigh on the error). Given these results, we choose α = 0.2 for the remainder of the experiments since this value provides a reasonable trade-off between the different statistics that were assessed. Figure 2e shows that AKRON-KF is also superior to Lasso-KF as assessed by Matthew's correlation coefficient, which provides a balanced measure. Second, we expand the synthetic experiments to evaluate the impact of δ in AKRON and the dimensionality of the kernel of X, and we also evaluate the impact that the smoothing has on the reconstruction of the networks. We simulated three different kernel dimensions and values for δ. Table 1 shows the outcome of these experiments. The entries in the table are presented as A/B, where A and B are the results from AKRON-KF and l 1 -KF, respectively. The table is also divided in half to separate the results for the Kalman filter and Kalman smoother. Similar to the first experiment, we observe that AKRON-KF typically outperforms l 1 -KF in nearly every statistic and the results can be quite significant. Furthermore, systems with a large kernel dimension (i.e., large p small n) benefit significantly from a larger value of δ. For example, consider a network that is 9 × 50 × 4. δ = 1 provides a little reduction in error; however, increasing δ to 3, significantly reduces the error and increases the other statistics. Finally, we observe that smoothing improves the error of the system; however, we should note that this improvement comes at a computational cost.

Results on Flybase
The application of interest is the inference of the time-varying wing-muscle genomic network of the Drosophila Melanogaster (fruit fly). The Drosophila's microarray dataset originally consists of 4028 genes taken over 66 different time points [18]. The data includes 4 stages of the Drosophila's life: embryonic (samples 1 through 30), larval (samples 31 through 40), pupal (samples 41 through 58), and adulthood (samples 59 through 66). Flybase hosts a list of undirected gene interactions [19]. We set α = 0.2 based on the experiments in the previous section for l 1 -KF and AKRON-KF.
In this application, we considered a list of 11 genes that are responsible for the wing muscle development, which has been considered by many researchers before [7][8][9]20]. The embryonic, pupal, and larval stages are undersampled to 9 observations in each stage that were used in the reconstruction of the 11-gene network in each developmental epoch. The other two connections appear through one medium gene (Actn,sls) appears in the embryonic, pupal and adulthood phases through one additional gene and (twin, eve) appears through one or more additional genes only in the embryonic and larval phases.
The networks reconstruction using the l 1 -KF and AKRON-KF with smoothers are shown in Figs. 5a-d and 6a-d, respectively. The smoothing provides very similar network topologies to the ones without the smoother; however, we did observe that the networks were sparser in the larval stage (see Figs. 4b and 6b). Note that while Fig. 7 is considered the ground truth, the could exist relationships that have not yet been discovered. These statistics are shown in Table 2, which again shows the benefit of using the AKRON-KF over the l 1 -KF. Table 3 lists all previous algorithms that were applied to this genetic network. Only the AKRON-KF, l 1 -KF [13], SMURC [4] and Dynamic Bayesian networks [7] considered time-varying networks; and, hence, were able to distinguish the different phases in the network. The other algorithms (minimum description length [20], random graph model [9], and nonparametric Bayesian regression [8]) assumed a stationary network, and hence it is not clear at which stage the detected connections develop. The AKRON-KF along with the l 1 -KF are the only algorithms able to recover all known interactions and specify the developmental stage where these interactions occur. Although the l 1 -KF also finds all reported interactions, the networks are denser (less sparse) than the AKRON-KF. With regard to the MDL results in Table 2, we have not reported the interations. The MDL authors in [20] inferred a single network, using all 66 time points, that characterizes the entire Drosophila's life cycle. In particular, the MDL approach is stationary and does not differentiate between the phases or time-varying epochs of the data. We wanted to report a b c d The known undirected gene interactions in the Drosophila's 11-gene wing muscle network as determined from Flybase [18] the MDL findings as they are in the literature. Notice that AKRON used less points and found more correct interactions.

Conclusion
In this work, we addressed the problem of inferring time-varying molecular networks as a tracking problem that can be solved using the Kalman filter. The major difficulty, however, is that there is not a sufficient number of observations at each time point, which makes the state-space model unobservable and the tracking senseless. Fortunately, molecular networks are known to be sparse because the dynamics of every gene are governed by only a small number of genes. By incorporating the sparsity condition, we show that the tracking problem becomes feasible. We presented the AKRON Kalman filter, which builds on our previous work on the Lasso-Kalman filter (l 1 -KF). Our proposed approach leverages the AKRON algorithm to find a sparser solution that is more representative of the ground truth. The proposed tracker/smoother first computes the output of l 1 -KF; then explores growing neighborhoods of the l 1 -projection to look for sparser solutions, eventually reaching the optimal sparsest estimate. The size of these neighborhoods is a tunable parameter that depends on