QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition

Abstract

Background

Identifying critical genes is important for understanding the pathogenesis of complex diseases. Traditional studies typically compare the changes of biomolecules between normal and disease samples, or detect important vertices from a single static biomolecular network, and therefore often overlook the dynamic changes that occur between different disease stages. However, investigating temporal changes in biomolecular networks and identifying critical genes is essential for understanding the occurrence and development of diseases.

Methods

A novel method called Quantifying Importance of Genes with Tensor Decomposition (QIGTD) is proposed in this study. It first constructs a time series network by integrating both intra- and inter-temporal network information, preserving connections between networks at adjacent stages according to their local similarities. A tensor is employed to describe the connections of this time series network, and a 3-order tensor decomposition method is proposed to capture both the topological information of each network snapshot and the time series characteristics of the whole network. QIGTD is also a learning-free and efficient method that can be applied to datasets with a small number of samples.

Results

The effectiveness of QIGTD was evaluated using lung adenocarcinoma (LUAD) datasets, and three state-of-the-art methods (T-degree, T-closeness, and T-betweenness) were employed as benchmarks. Numerical experimental results demonstrate that QIGTD outperforms these methods in terms of both precision and mAP. Notably, out of the top 50 genes, 29 have been verified to be highly related to LUAD according to the DisGeNET database, and 36 are significantly enriched in LUAD-related Gene Ontology (GO) terms, including nuclear division, mitotic nuclear division, chromosome segregation, organelle fission, and mitotic sister chromatid segregation.

Conclusion

In conclusion, QIGTD effectively captures the temporal changes in gene networks and identifies critical genes. It provides a valuable tool for studying temporal dynamics in biological networks and can aid in understanding the underlying mechanisms of diseases such as LUAD.

Introduction

Complex networks are common in real life, and research has shifted from discovering the macro laws of structure and dynamics to uncovering the role of macro elements as nodes in real systems [1,2,3,4,5]. In the past several decades, designing effective centrality methods to identify critical nodes in complex networks has become a significant topic. Many methods have been designed to measure the importance of nodes in static networks, such as degree [6], closeness [7] and betweenness [8, 9]. Those centrality measures have been used in predicting essential proteins [10], identifying influential nodes from social networks [11, 12], and finding critical links [13]. With the development and evolution of organisms, the structure of biological networks changes dynamically over time [14,15,16,17], by increasing or decreasing the number of nodes or edges [18, 19]. Research on dynamic biological networks and the identification of critical nodes helps to better understand biological processes [20].

At present, several methods have focused on identifying critical genes in biological networks [21,22,23]. Liu et al. [24] identified potential critical genes related to the pathogenesis and prognosis of gastric cancer using a protein-protein interaction (PPI) network and Cox proportional hazards. Li et al. [25] identified critical miRNAs, genes and transcription factors of lung adenocarcinoma by analyzing Gene Ontology terms, pathways, and PPI networks. Liu et al. [26] used the robust rank aggregation method, re-constructed the PPI network and performed module analysis to identify critical genes. However, most of those methods are based on a static network and ignore the stage heterogeneity of complex diseases. Yang et al. [20] investigated miRNAs in serum exosome-like microvesicles to identify stage-common and stage-specific miRNAs, but ignored the connections between stages. Kim et al. [27] defined temporal versions of degree, closeness and betweenness on temporal networks, which reduced a dynamic network to a static one with directed flows. Nevertheless, those methods simply calculated the degree, closeness and betweenness centrality of nodes in different time snapshots and obtained a mean value, so the information on how nodes change with time is lost. In our previous studies, we have shown that studying cancer stages is important for understanding the evolution of cancers [28, 29].

Fig. 1
The module for forming and decomposing the tensor. a Construction of the tensor from the temporal network. Both inter-stage and intra-stage edges are taken into consideration. The yellow network is Stage I, the green one is Stage II, the blue one is Stage III and the red one is Stage IV. Black represents the inter-stage network. b The decomposition of the tensor

In this study, a lightweight and effective method that quantifies the importance of genes with tensor decomposition (QIGTD) is proposed to identify the critical genes along the progression of lung adenocarcinoma (LUAD). To start with, a time-series network was constructed to represent the molecular connections of individual pathological stages of LUAD, and a third-order tensor was employed to capture both intra-stage and inter-stage topological information. The intra-stage topological connections were obtained from gene co-expression relationships, while the inter-stage topological connections were calculated by combining local similarities with a pre-defined parameter. Then, a tensor decomposition method was proposed to identify critical genes from the temporal network, which considers not only the intra-stage topological information, but also the inter-stage temporal characteristics. It is also a learning-free method, which can work well with a small number of samples. Precision and mAP are used to evaluate the performance of QIGTD, and three state-of-the-art methods, the temporal versions of degree, betweenness and closeness [30, 31], were employed as benchmarks. The overall framework of the proposed method is shown in Fig. 1.

Materials and methods

Data collection and processing

The critical genes are identified in the Stage I to Stage IV temporal networks of LUAD. The LUAD-related gene expression dataset was downloaded from Xena (https://xenabrowser.net/), where there are 206 samples in Stage I, 93 samples in Stage II, 59 samples in Stage III, and 20 samples in Stage IV.

The networks of the four stages are constructed separately according to the Pearson correlation coefficient (PCC) of gene expression and the corresponding p-value. In this study, an edge was kept if p-value\(< 0.01\) and \(|PCC| > 0.8\), according to the characteristics of biological networks. As a result, there are 17,830 edges in Stage I, 21,951 in Stage II, 11,170 in Stage III and 611 edges in Stage IV.
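The edge-selection rule can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `coexpression_network`, the genes-by-samples layout of `expr`, and the use of a two-sided t-test to obtain the p-value of each PCC are all assumptions made for the example.

```python
import numpy as np
from scipy import stats


def coexpression_network(expr, r_min=0.8, p_max=0.01):
    """Build a 0/1 adjacency matrix from a genes x samples expression array.

    An edge (i, j) is kept when |PCC| > r_min and the p-value < p_max.
    The p-value comes from a two-sided t-test (an assumption of this sketch).
    """
    n_samples = expr.shape[1]
    r = np.corrcoef(expr)              # gene-by-gene Pearson correlation
    np.fill_diagonal(r, 0.0)           # ignore self-correlation
    r = np.clip(r, -1.0, 1.0)          # guard against rounding outside [-1, 1]
    with np.errstate(divide="ignore"):
        t = r * np.sqrt((n_samples - 2) / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(np.abs(t), df=n_samples - 2)
    return ((np.abs(r) > r_min) & (p < p_max)).astype(int)
```

Applying such a function to the expression matrix of each stage separately would give one adjacency matrix per stage, which is the input assumed by the later steps.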

The known critical genes can be obtained from DisGeNET (https://www.disgenet.org/), where there are 3,899 genes appearing in the temporal network, and 566 of them have been verified to be associated with LUAD.

The temporal network construction

The network representing each stage of LUAD was constructed with the Pearson correlation coefficient (PCC) calculated from gene expression. Besides the connections within each stage, there was also a fixed set of genes connected between different stages of LUAD.

Currently, there are two typical ways to construct connections between networks of adjacent stages. One is to use a fixed constant to represent the interlayer relationship, where the value of the parameter indicates the strength of the interlayer relationship. The other is to use similarity metrics to measure interlayer relationships. Because the features of temporal networks are studied by converting time into a sequence of network snapshots, the similarity measures between nodes in static networks can be extended to node relationships between adjacent layers.

In this study, a novel method for measuring interlayer node similarity in temporal networks is proposed by combining the node local similarity index with a fixed parameter, which is

$$\begin{aligned} TLS_{i}^{(t, t+1)}= C + \frac{\sum _{j}w_{ij}^{t} + \sum _{j}w_{ij}^{t+1}}{2N} + \frac{| SN_{i}^{(t, t+1)} |}{N} \end{aligned}$$
(1)

where C is a constant parameter that indicates the strength of the interlayer relationship and N is the number of vertices in the network. \(w_{ij}^{t} = 1\) if vertex i and vertex j are connected in the network \(G_t\), and \(w_{ij}^{t} = 0\) otherwise, so \(\sum _{j}w_{ij}^{t}\) is the degree of vertex i in \(G_t\). \(SN_{i}^{(t, t+1)}\) represents the set of neighbors that vertex i shares in the two adjacent networks \(G_t\) and \(G_{t+1}\).

The first part of Eq. 1 is a constant parameter, which can be set according to the experimental requirement. A relatively small parameter enhances the importance of vertices with high interlayer similarity in temporal networks, while a larger fixed parameter strengthens the importance of isolated vertices. In this study, the value of C was set to 0.5 based on the characteristics of the biological network. The second part of Eq. 1 is the network local similarity, which represents the proportion of local neighbors of adjacent snapshot nodes in the entire network at different times. The third part of Eq. 1 represents the proportion of shared neighbors of adjacent snapshot vertices in the entire network. Hence, the overall value of TLS characterizes both the degree of vertices in a temporal network and the interlayer relationship of node adjacency in different time snapshots. The larger the TLS value, the higher the probability that the node appears continuously on two snapshot layers, and the more stable the node adjacency relationship.
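To make the three terms of Eq. 1 concrete, the following sketch computes TLS for every vertex from two adjacent 0/1 adjacency matrices. The function name `tls` and the reading of \(SN_{i}^{(t, t+1)}\) as the set of neighbors that vertex i has in both snapshots are assumptions of this example.

```python
import numpy as np


def tls(adj_t, adj_next, c=0.5):
    """Temporal local similarity (Eq. 1) for every vertex.

    adj_t, adj_next: symmetric 0/1 adjacency matrices of G_t and G_{t+1}.
    c: the constant interlayer strength (0.5 in the text).
    """
    n = adj_t.shape[0]
    deg_t = adj_t.sum(axis=1)                   # sum_j w_ij^t
    deg_next = adj_next.sum(axis=1)             # sum_j w_ij^{t+1}
    shared = (adj_t * adj_next).sum(axis=1)     # neighbors present in both snapshots
    return c + (deg_t + deg_next) / (2 * n) + shared / n


adj_1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
adj_2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
scores = tls(adj_1, adj_2)
```

In this toy case vertex 0 has degrees 2 and 1 and one shared neighbor, so its value is 0.5 + 3/6 + 1/3, while vertex 2 keeps no neighbor across the two snapshots and only receives 0.5 + 2/6.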

The tensor description of the temporal network

The temporal network was represented as \(X= \left\{ G_{t},C \right\}\), where \(G_{t}\) is the network of each stage of LUAD and C is the set of interconnections between different networks. The elements in C are regarded as the ‘cross network’. The temporal network can be represented as a tensor as follows. Let \(X\in \mathbb {R}^{I\times J\times K}\). The elements are defined according to Formula 2.

$$\begin{aligned} \ X_{ijk}=\left\{ \begin{array}{ll} w_{ijk}& \text { if } X_{ijk}\in G_{t} \\ c_{ijk}& \text { if } X_{ijk}\in C \\ 0& \text { otherwise } \end{array}\right. \end{aligned}$$
(2)

where \(0\le i<I,0\le j<J,0\le k<K\), the \(w_{ijk}\) is the element in \(G_{t}\) and \(c_{ijk}\) is the element in C.

The process to form the tensor from temporal network is shown in Fig. 1a, where different colors represent different stages. The edges between the different stages compose the cross network. There are four kinds of networks in \(G_{t}\) and three kinds of cross networks in C.
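A direct reading of Formula 2 can be sketched as below. The text does not state at which indices the cross-network entries \(c_{ijk}\) are stored, so this example does not guess a layout: the caller passes them explicitly as a dictionary. The function name `build_tensor` and both argument formats are hypothetical.

```python
import numpy as np


def build_tensor(stage_weights, cross_entries=None):
    """Assemble X in R^{I x J x K} following Formula 2.

    stage_weights: list of K arrays of shape (I, J), one per stage network G_t.
    cross_entries: optional {(i, j, k): c_ijk} for the cross network C
                   (index layout supplied by the caller, an assumption here).
    """
    x = np.stack(stage_weights, axis=2).astype(float)   # slice k holds stage k
    for (i, j, k), value in (cross_entries or {}).items():
        x[i, j, k] = value
    return x                                            # all other entries stay 0
```

With the four LUAD stages, `stage_weights` would hold four gene-by-gene matrices, giving \(K=4\).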

The tensor can be transformed into a matrix by unfolding (flattening). In this study, we expanded the 3-order tensor X along mode-n into a matrix \(X_{(n)}\). Mode-1, mode-2 and mode-3 correspond to the first, second and third orders of the tensor, respectively. After the matricization of the tensor, the Khatri-Rao, Kronecker, and Hadamard products can be calculated respectively as follows.

$$\begin{aligned} A\odot B=\left[ a_{1}\otimes b_{1} \quad a_{2}\otimes b_{2} \quad a_{3}\otimes b_{3} \quad \cdots \quad a_{N}\otimes b_{N} \right] \end{aligned}$$
(3)
$$\begin{aligned} A\otimes B = \left[ \begin{array}{ccc} a_{11}B & \cdots & a_{1N}B\\ \vdots & \ddots & \vdots \\ a_{N1}B & \cdots & a_{NN}B\\ \end{array}\right] \end{aligned}$$
(4)
$$\begin{aligned} A*B = \left[ \begin{array}{ccc} a_{11}b_{11} & \cdots & a_{1N}b_{1N}\\ \vdots & \ddots & \vdots \\ a_{N1}b_{N1} & \cdots & a_{NN}b_{NN}\\ \end{array}\right] \end{aligned}$$
(5)
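The three products can be compared on a small example with NumPy. `np.kron` and elementwise `*` are standard NumPy operations; the helper `khatri_rao` is written here for illustration and builds the column-wise Kronecker product of Eq. 3.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: column r is a[:, r] kron b[:, r]."""
    r = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, r)


a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

kr = khatri_rao(a, b)    # Eq. 3, shape (4, 2)
kron = np.kron(a, b)     # Eq. 4, shape (4, 4), blocks a_ij * B
had = a * b              # Eq. 5, shape (2, 2), entries a_ij * b_ij
```

The Khatri-Rao product keeps the number of columns fixed, which is why it appears in the unfolded form of the CP decomposition, where each column corresponds to one rank-one component.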

The Canonical Polyadic (CP) decomposition of tensor

A tensor can be expressed as the sum of a finite number of rank-one tensors. In this study, the 3-order tensor is decomposed as follows.

$$\begin{aligned} X\approx [\left[ A,B,C \right] ] = \sum \limits _{r=1}^{R}a_{r}\circ b_{r}\circ c_{r} \end{aligned}$$
(6)

where

$$\begin{aligned} & X\in \mathbb {R}^{I\times J\times K} \\ & A=(a_{1},a_{2},a_{3},...,a_{R}) \in \mathbb {R}^{I\times R}\\ & B=(b_{1},b_{2},b_{3},...,b_{R}) \in \mathbb {R}^{J\times R}\\ & C=(c_{1},c_{2},c_{3},...,c_{R}) \in \mathbb {R}^{K\times R}\\ \end{aligned}$$

The symbol “\(\circ\)” is the outer product, the vector \(a_{r}\in \mathbb {R}^{I}\) is column r of factor matrix \(A\in \mathbb {R}^{I\times R}\), the vector \(b_{r}\in \mathbb {R}^{J}\) is column r of factor matrix \(B\in \mathbb {R}^{J\times R}\), and the vector \(c_{r}\in \mathbb {R}^{K}\) is column r of factor matrix \(C\in \mathbb {R}^{K\times R}\).

The outer product of these vectors is a rank-one tensor, so R rank-one tensors are used to approximate the original data, as shown in Fig. 1b. By utilizing the factor matrices, the 3-order tensor can be decomposed as follows.

$$\begin{aligned} \begin{array}{l} min_{A}\sum \limits _{i,j,k} \left(x_{ijk}-\sum \limits _{r=1}^{R}a_{ir}b_{jr}c_{kr}\right)^2 = min_{A}\left\| X_{(1)}-A(C\odot B)^{\top } \right\| ^2_{F} \\ min_{B}\sum \limits _{i,j,k} \left(x_{ijk}-\sum \limits _{r=1}^{R}a_{ir}b_{jr}c_{kr}\right)^2 = min_{B}\left\| X_{(2)}-B(C\odot A)^{\top } \right\| ^2_{F} \\ min_{C}\sum \limits _{i,j,k} \left(x_{ijk}-\sum \limits _{r=1}^{R}a_{ir}b_{jr}c_{kr}\right)^2 = min_{C}\left\| X_{(3)}-C(B\odot A)^{\top } \right\| ^2_{F} \\ \end{array} \end{aligned}$$
(7)

The unfolded matrices can therefore be approximated as

$$\begin{aligned} \begin{array}{l} X_{(1)}\approx A(C\odot B)^{\top } \\ X_{(2)}\approx B(C\odot A)^{\top } \\ X_{(3)}\approx C(B\odot A)^{\top } \\ \end{array} \end{aligned}$$
(8)
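The relations in Eq. 8 can be checked numerically on a tensor that is exactly of rank R. In this sketch the helper names `unfold` and `khatri_rao` are illustrative, and the column ordering used for unfolding (first remaining index varies fastest) is an assumption chosen so that the three identities hold as written.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])


def unfold(x, mode):
    """Mode-n unfolding X_(n) with the column ordering that matches Eq. 8."""
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1), order="F")


rng = np.random.default_rng(0)
I, J, K, R = 5, 5, 4, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

# Exact rank-R tensor from Eq. 6: sum over r of a_r o b_r o c_r
X = np.einsum("ir,jr,kr->ijk", A, B, C)

ok1 = np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)
ok2 = np.allclose(unfold(X, 1), B @ khatri_rao(C, A).T)
ok3 = np.allclose(unfold(X, 2), C @ khatri_rao(B, A).T)
```

For a real data tensor the equalities become approximations, and the residual of each unfolded form is what the optimization reduces.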

Consequently, the factor matrices A, B and C can be calculated with back propagation and gradient descent. Since the goal is to make the tensor \(\hat{X}\) estimated from A, B and C as close as possible to the original tensor X, the loss functions are set as follows.

$$\begin{aligned} \begin{array}{l} Loss_{1} = \frac{1}{2}\left\| X_{(1)}-A(C\odot B)^{\top } \right\| ^2_{F} \\ Loss_{2} = \frac{1}{2}\left\| X_{(2)}-B(C\odot A)^{\top } \right\| ^2_{F} \\ Loss_{3} = \frac{1}{2}\left\| X_{(3)}-C(B\odot A)^{\top } \right\| ^2_{F} \\ \end{array} \end{aligned}$$
(9)

The partial derivative of A, B and C could be quantified and the parameters could be updated by the following formulas.

$$\begin{aligned} \begin{array}{l} A = A-\alpha * \frac{\partial Loss_{1}}{\partial A} \\ B = B-\alpha * \frac{\partial Loss_{2}}{\partial B} \\ C = C-\alpha * \frac{\partial Loss_{3}}{\partial C} \\ \end{array} \end{aligned}$$
(10)
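The update rule of Eq. 10 can be sketched with plain NumPy. The sketch assumes that each loss is half the squared Frobenius norm of the unfolded residual (as in Eq. 7), so that \(\partial Loss_{1}/\partial A = -(X_{(1)}-A(C\odot B)^{\top })(C\odot B)\), and similarly for B and C. The function name, the random initialization, the learning rate and the number of steps are illustrative choices, not values reported in the text.

```python
import numpy as np


def khatri_rao(a, b):
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])


def unfold(x, mode):
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1), order="F")


def cp_gradient_descent(x, rank=1, lr=0.01, steps=500, seed=0):
    """Fit X ~ [[A, B, C]] by alternating gradient steps (Eq. 10)."""
    rng = np.random.default_rng(seed)
    a, b, c = (rng.random((n, rank)) for n in x.shape)
    x1, x2, x3 = (unfold(x, m) for m in range(3))
    history = []
    for _ in range(steps):
        m = khatri_rao(c, b)
        a = a + lr * (x1 - a @ m.T) @ m      # A <- A - lr * dLoss1/dA
        m = khatri_rao(c, a)
        b = b + lr * (x2 - b @ m.T) @ m      # B <- B - lr * dLoss2/dB
        m = khatri_rao(b, a)
        c = c + lr * (x3 - c @ m.T) @ m      # C <- C - lr * dLoss3/dC
        history.append(0.5 * np.sum((x3 - c @ m.T) ** 2))
    return a, b, c, history
```

The returned `history` holds the loss after each full sweep, which gives a simple way to check that the chosen learning rate is small enough for the loss to decrease.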

where \(\alpha\) is the learning rate. The vertex centrality can now be calculated as

$$\begin{aligned} s_i = \frac{1}{T}\sum \limits _{t=1}^{T}((a_1)_i (c_1)_t + (b_1)_i (c_1)_t) \end{aligned}$$
(11)

In this study, \(I=J\) represents the number of genes, and \(K=4\) represents the number of layers in the network. The importance score of every gene, also denoted by I in the following expressions, could be determined with either \(I = A\odot C\) or \(I = B\odot C\). Additionally, if R is set to 1, then \(I = A\) or \(I = B\).
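Eq. 11 reduces to a short vector expression once the first columns of the factor matrices are available. The sketch below assumes factor matrices of shape (genes, R) for A and B and (stages, R) for C; the function name `gene_scores` and the toy values are hypothetical.

```python
import numpy as np


def gene_scores(a, b, c):
    """Vertex centrality s_i of Eq. 11 from the first CP component."""
    a1, b1, c1 = a[:, 0], b[:, 0], c[:, 0]
    # (1/T) * sum_t ((a1)_i (c1)_t + (b1)_i (c1)_t) = ((a1)_i + (b1)_i) * mean(c1)
    return (a1 + b1) * c1.mean()


a = np.array([[1.0], [2.0], [0.5]])
b = np.array([[1.0], [3.0], [0.5]])
c = np.array([[1.0], [3.0]])

scores = gene_scores(a, b, c)
ranking = np.argsort(-scores)   # gene indices from most to least important
```

Sorting the scores in descending order gives the ranked gene list on which the top-10 to top-500 evaluations are based.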

Results

The evaluation indices

The performance of QIGTD is evaluated by the precision, mean average precision (mAP) and fold enrichment.

The precision shows the true positive ratio in a given list of predictions, which is

$$\begin{aligned} precision = \frac{TP}{TP + FP} \end{aligned}$$
(12)

The primary objective is a ranking task, where precision alone may not sufficiently reflect the algorithm’s performance. The mAP considers not only the accuracy of identifying the critical genes, but also the differences in gene order. Therefore, the mAP is also used to reflect the model’s performance, and is defined as Formula 13.

$$\begin{aligned} AP\left( q_{i} \right) = \frac{\sum _{i\in i_{1},i_{2},...,i_{M}} P_{i}\times L_{i}}{M} \end{aligned}$$
(13)

where \(P_{i}=\frac{1}{\sum _{j=1}^{i}j}\), and \(L_{i}\) is the label of the i-th gene. In this study, the label is either 0 or 1. Since there is only one query in the problem, the mAP is equal to the AP.
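Both indices can be computed from a ranked list of 0/1 labels. In the sketch below, `precision_at` follows Formula 12 on the top M predictions. For `average_precision_at`, \(P_{i}\) is taken to be the precision at rank i, which is the usual convention for average precision; this is an assumption of the example and may differ from the weighting used by the authors. Both function names are hypothetical, and the label list is assumed to contain at least M entries.

```python
import numpy as np


def precision_at(labels, m):
    """Fraction of the top-m ranked genes whose label is 1 (Formula 12)."""
    top = np.asarray(labels[:m], dtype=float)
    return float(top.sum() / m)


def average_precision_at(labels, m):
    """AP@M as in Formula 13, with P_i assumed to be the precision at rank i."""
    top = np.asarray(labels[:m], dtype=float)
    ranks = np.arange(1, m + 1)
    p_at_i = np.cumsum(top) / ranks
    return float((p_at_i * top).sum() / m)


labels = [1, 0, 1, 1, 0]   # toy ranking: 1 = known disease gene
p5 = precision_at(labels, 5)
ap5 = average_precision_at(labels, 5)
```

Two rankings with the same number of hits in the top M have the same precision, but the one that places the hits earlier receives the higher AP, which is why both indices are reported.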

Comparing with the benchmark methods

The importance score of every gene in the temporal network was calculated with three benchmark methods (T-degree, T-closeness and T-betweenness) and the proposed QIGTD method.

The T-degree is defined as

$$\begin{aligned} T\text {-}deg(v) = \frac{\sum _{t=1}^{T} D_t(v)}{T} \end{aligned}$$
(14)

where \(D_t(v)\) represents the degree of vertex v at the \(t^{th}\) network snapshot, and T is the total number of network snapshots.
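As a small illustration of Formula 14, T-degree is the average of a vertex's degree over all snapshots. The function name `t_degree` and the toy adjacency matrices are assumptions made for the example.

```python
import numpy as np


def t_degree(snapshots):
    """Average degree of every vertex over a list of 0/1 adjacency matrices."""
    degrees = np.array([adj.sum(axis=1) for adj in snapshots])   # shape (T, N)
    return degrees.mean(axis=0)


g1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
g2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
t_deg = t_degree([g1, g2])
```

Because the degrees are averaged, a vertex whose neighbors change completely between snapshots can receive the same value as one whose neighborhood is stable, which is the loss of temporal information discussed in the Introduction.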

The T-closeness is defined as

$$\begin{aligned} T\text {-}clo(v) = \sum \limits _{1 \le t \le T} \sum \limits _{u \in V\setminus \{v\}} \frac{1}{\Delta _{t,T}(u,v)} \end{aligned}$$
(15)

where \(\Delta _{t,T}(u,v)\) represents the shortest path length between vertex u and vertex v in the time interval from snapshot t to snapshot T.

The T-betweenness is defined as

$$\begin{aligned} T\text {-}bet(v) = \sum \limits _{1 \le t \le T} \sum \limits _{s \ne v \ne d \in V} \frac{\sigma (s,d,v)}{\sigma (s,d)} \end{aligned}$$
(16)

where \(\sigma (s,d,v)\) represents the number of shortest paths between vertex s and vertex d that pass through vertex v, while \(\sigma (s,d)\) represents the number of all shortest paths between vertex s and vertex d.

The precision of the different methods is summarized in Table 1. QIGTD consistently performs better than the other three state-of-the-art methods from the top 10 to the top 500 predictions. In the top 10, the precision of QIGTD is 0.50, while the best of the other three methods is T-betweenness with a precision of only 0.20. This also holds for predictions from the top 50 to the top 500. The top three important genes calculated by QIGTD are highly related to LUAD.

Table 1 The precision of QIGTD and the three SOTA methods

The results of mAP@M, presented in Table 2, indicate that QIGTD outperforms the other three methods. QIGTD exhibits superior performance in accurately identifying the LUAD related genes without learning.

Table 2 The mAP of the four methods

The fold enrichment is carried out to measure the performance of the model, which indicates how precisely the method can locate disease-related genes. QIGTD consistently exhibits significantly higher values compared to the other three methods as shown in Fig. 2.

Fig. 2
The curve of fold enrichment in the top 500 genes. The x axis is the rank of genes in every method. The y axis is the score of fold enrichment. The fold enrichment could be calculated with precision and the correlation rate between all genes and LUAD
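One common way to compute fold enrichment from the two quantities named in the caption is to divide the precision at a given rank by the background rate of known disease genes. The sketch below assumes this definition, since the text does not give an explicit formula, and uses the counts reported in the data section (566 LUAD-associated genes among 3,899 genes in the temporal network). The function name is hypothetical.

```python
def fold_enrichment(precision, n_disease_genes, n_genes):
    """Precision divided by the background rate of disease-associated genes."""
    background = n_disease_genes / n_genes
    return precision / background


# Top-10 precision of 0.50 reported for QIGTD, against 566 of 3,899 genes
fe_top10 = fold_enrichment(0.50, 566, 3899)
```

Under this assumed definition, a value of 1 means the ranking is no better than picking genes at random, and the top-10 precision of 0.50 corresponds to a fold enrichment of about 3.44.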

Biological evidences of the predictions

Table 3 lists the top 10 genes identified by the four methods as well as their ranks. Among the top 10 genes identified by QIGTD, 5 genes are associated with LUAD in DisGeNET, indicating a higher level of association with the disease compared to the other methods.

Additionally, the remaining 5 genes have the potential to become biomarkers of LUAD. NCAPH was verified to be negatively associated with Mcl-1 in non-small cell lung cancer [32]. Nguyen et al. [33] found that CDCA5 (cell division cycle associated 5) is upregulated in the majority of lung cancers. Wei et al. [34] found that the knockdown of HJURP inhibits non-small cell lung cancer cell proliferation, migration, and invasion. Several studies have also suggested that BUB1 may become a novel marker and therapeutic target for LUAD [35,36,37]. BUB1B was also identified as a significant biomarker for poor prognosis and poor clinicopathological outcomes in patients with LUAD [38].

Table 3 The top 10 genes identified by the four methods and verified by DisGeNET

Differential expression analysis is also performed on the top 10 genes, as shown in Fig. 3. The blue box is the gene expression in control samples and the other 4 boxes are the expression in the four stages. The top 10 genes are clearly differentially expressed in the disease stages compared to control.

Fig. 3
The boxplot of the expression of the top 10 genes. The different colors in the plots represent different stages. The blue box represents the expression in control. It shows that the genes identified with QIGTD are differentially expressed genes

Moreover, among the top 50 genes identified by QIGTD, 29 genes are verified to have a strong association with the disease, as shown in Table 4.

Table 4 The top 50 genes identified by QIGTD and verified in DisGeNET

The sub-networks of the top 10 genes are shown in Fig. 4a, which presents the subgraphs of the top 10 genes from Stage I to Stage IV respectively. The thickness of an edge indicates its weight. The number of thick edges gradually decreases from Stage I to Stage IV, and some edges disappear at Stage IV, so the subgraphs of the top 10 genes reflect the evolution of LUAD.

The sub-networks of the top 50 genes are shown in Fig. 4b. The red nodes are the top 10 genes, the purple nodes are the top 20 genes and the green nodes are the top 50. The thickness of the edges is hard to distinguish because there are too many edges in the networks, but the sub-networks gradually become sparse across the stages, which is a signal of the evolution of LUAD.

Fig. 4
The sub-networks of the top 10 genes and the top 50 genes

Among the top 50 genes, 36 genes are enriched in 5 GO terms, as shown in Fig. 5. The GO terms are nuclear division, mitotic nuclear division, chromosome segregation, organelle fission and mitotic sister chromatid segregation, all of which have been verified to be associated with LUAD [39,40,41]. The different colors of the ribbons represent the different GO terms. The number of ribbons indicates the number of GO terms in which a gene is enriched. For example, DLGAP5 has five ribbons, which means it is enriched in all 5 GO terms. SPC25 has a green ribbon, which means it is only enriched in chromosome segregation.

Fig. 5
The GO enrichment of the top 50 genes. Five GO terms chosen from the enrichment results are shown. The ribbons in different colors represent different GO terms. The number of ribbons of a gene is the number of GO terms in which it is enriched. The boxed genes are LUAD-related genes. There are 32 genes enriched in nuclear division, 29 in mitotic nuclear division, 30 in chromosome segregation, 32 in organelle fission and 25 in mitotic sister chromatid segregation

Discussion

The investigation of critical genes in temporal networks has become increasingly prevalent. Most previous studies concentrate on the structure of the network itself, but ignore the connections and changes between networks at adjacent stages. Inspired by tensor decomposition, QIGTD is proposed in this research. Both the inter-stage and intra-stage connections of genes are taken into consideration.

The experimental results show that QIGTD outperforms the other three SOTA methods, especially in identifying the most critical genes. Among the top 10 genes identified by QIGTD, 5 have been verified to be critical, and the other five may also be critical according to recent research. The top 10 genes are also differentially expressed in the disease stages compared to control. Furthermore, 29 of the top 50 genes are highly related to LUAD. The GO terms indicate that the top 50 genes ranked by QIGTD are associated with LUAD. The sub-networks of the top 10 to top 50 genes change across stages, which suggests that the identified genes are potential biomarkers of the evolution of LUAD.

Additionally, QIGTD is a learning-free and effective method, which does not require many samples. QIGTD has a low computational complexity and can be applied to large-scale networks, and it could also be easily embedded into research on other complex problems.

Availability of data and materials

The public LUAD dataset can be downloaded from Xena: https://xenabrowser.net/. The gene-disease association information can be collected via DisGeNET: https://www.disgenet.org/.

References

  1. Boccaletti S, Bianconi G, Criado R, Del Genio CI, Gómez-Gardenes J, Romance M, et al. The structure and dynamics of multilayer networks. Phys Rep. 2014;544(1):1–122.

  2. Jin S, Li Y, Pan R, Zou X. Characterizing and controlling the inflammatory network during influenza A virus infection. Sci Rep. 2014;4(1):1–14.

  3. Li Y, Jin S, Lei L, Pan Z, Zou X. Deciphering deterioration mechanisms of complex diseases based on the construction of dynamic networks and systems analysis. Sci Rep. 2015;5(1):1–11.

  4. Morone F, Makse HA. Influence maximization in complex networks through optimal percolation. Nature. 2015;524(7563):65–8.

  5. Lü L, Chen D, Ren XL, Zhang QM, Zhang YC, Zhou T. Vital nodes identification in complex networks. Phys Rep. 2016;650:1–63.

  6. Bonacich P. Factoring and weighting approaches to status scores and clique identification. J Math Sociol. 1972;2(1):113–20.

  7. Freeman LC. Centrality in social networks conceptual clarification. Soc Networks. 1978;1(3):215–39.

  8. Zhang J, Luo Y. Degree centrality, betweenness centrality, and closeness centrality in social network. In: Proceedings of the 2017 2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM2017), vol. 132. Atlantis press. 2017. pp. 300–303.

  9. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40:35–41.

  10. Tang X, Wang J, Zhong J, Pan Y. Predicting essential proteins based on weighted degree centrality. IEEE/ACM Trans Comput Biol Bioinforma. 2013;11(2):407–18.

  11. Srinivas A, Velusamy RL. Identification of influential nodes from social networks based on Enhanced Degree Centrality Measure. In: 2015 IEEE international advance computing conference (IACC). IEEE; 2015. pp. 1179–84.

  12. Okamoto K, Chen W, Li XY. Ranking of closeness centrality for large-scale social networks. In: International workshop on frontiers in algorithmics. Springer; 2008. pp. 186–195.

  13. Veremyev A, Prokopyev OA, Pasiliao EL. Finding critical links for closeness centrality. INFORMS J Comput. 2019;31(2):367–89.

  14. Hintze A, Adami C. Evolution of complex modular biological networks. PLoS Comput Biol. 2008;4(2):e23.

  15. Tenazinha N, Vinga S. A survey on methods for modeling and analyzing integrated biological networks. IEEE/ACM Trans Comput Biol Bioinforma. 2010;8(4):943–58.

  16. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 2011;4(1):1–27.

  17. Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet. 2007;8(6):450–61.

  18. Kramer MA, Kolaczyk ED, Kirsch HE. Emergent network topology at seizure onset in humans. Epilepsy Res. 2008;79(2–3):173–86.

  19. Gao W, Gilmore JH, Giovanello KS, Smith JK, Shen D, Zhu H, et al. Temporal and spatial evolution of brain network topology during the first two years of life. PLoS ONE. 2011;6(9):e25278.

  20. Yang Q, He S, Huang L, Shao C, Nie T, Xia L, et al. Serum Exosomal miRNAs as Biomarkers of Early Diagnosis and Progression in Parkinson’s Disease. Transl Neurodegener. 2021;10:25.

  21. Liu X, Hong Z, Liu J, Lin Y, Rodríguez-Patón A, Zou Q, et al. Computational methods for identifying the critical nodes in biological networks. Brief Bioinform. 2020;21(2):486–97.

  22. Abedi M, Gheisari Y. Nodes with high centrality in protein interaction networks are responsible for driving signaling pathways in diabetic nephropathy. PeerJ. 2015;3:e1284.

  23. Rezaei J, Zare Mirakabad F, Marashi SA, MirHassani SA. The assessment of essential genes in the stability of PPI networks using critical node detection problem. AUT J Math Comput. 2022;3(1):59–76.

  24. Liu X, Wu J, Zhang D, Bing Z, Tian J, Ni M, et al. Identification of potential key genes associated with the pathogenesis and prognosis of gastric cancer based on integrated bioinformatics analysis. Front Genet. 2018;9:265.

  25. Li J, Li Z, Zhao S, Song Y, Si L, Wang X. Identification key genes, key miRNAs and key transcription factors of lung adenocarcinoma. J Thorac Dis. 2020;12(5):1917.

  26. Liu L, He C, Zhou Q, Wang G, Lv Z, Liu J. Identification of key genes and pathways of thyroid cancer by integrated bioinformatics analysis. J Cell Physiol. 2019;234(12):23647–57.

  27. Kim H, Anderson R. Temporal node centrality in complex networks. Phys Rev E. 2012;85(2):026107.

  28. Chen B, Wang Y, Zhang J, Han Y, Benhammouda H, Bian J, et al. Specific feature recognition on group specific networks (SFR-GSN): a biomarker identification model for cancer stages. Front Genet. 2024;15:1407072.

  29. Chen B, Chakrobortty N, Saha AK, Shang X. Identifying colon cancer stage related genes and their cellular pathways. Front Genet. 2023;14:1120185.

  30. Tang J, Musolesi M, Mascolo C, Latora V, Nicosia V. Analysing information flows and key mediators through temporal centrality metrics. In: Proceedings of the 3rd Workshop on Social Network Systems. New York, Paris: Association for Computing Machinery; 2010. pp. 1–6. https://doi.org/10.1145/1852658.1852661.

  31. Tsalouchidou I, Baeza-Yates R, Bonchi F, Liao K, Sellis T. Temporal betweenness centrality in dynamic graphs. Int J Data Sci Anal. 2020;9(3):257–72.

  32. Xiong YC, Wang J, Cheng Y, Zhang XY, Ye XQ. Overexpression of MYBL2 promotes proliferation and migration of non-small-cell lung cancer via upregulating NCAPH. Mol Cell Biochem. 2020;468(1):185–93.

  33. Nguyen MH, Koinuma J, Ueda K, Ito T, Tsuchiya E, Nakamura Y, et al. Phosphorylation and activation of cell division cycle associated 5 by mitogen-activated protein kinase play a crucial role in human lung carcinogenesis. Cancer Res. 2010;70(13):5337–47.

  34. Wei Y, Ouyang G, Yao W, Zhu Y, Li X, Huang L, et al. Knockdown of HJURP inhibits non-small cell lung cancer cell proliferation, migration, and invasion by repressing Wnt/\(\beta\)-catenin signaling. Eur Rev Med Pharmacol Sci. 2019;23(9):3847–56.

  35. Wang L, Li S, Wang Y, Tang Z, Liu C, Jiao W, et al. Identification of differentially expressed protein-coding genes in lung adenocarcinomas. Exp Ther Med. 2020;19(2):1103–11.

  36. Jeganathan K, Malureanu L, Baker DJ, Abraham SC, Van Deursen JM. Bub1 mediates cell death in response to chromosome missegregation and acts to suppress spontaneous tumorigenesis. J Cell Biol. 2007;179(2):255–67.

  37. Zhou X, Yuan Y, Kuang H, Tang B, Zhang H, Zhang M. BUB1B (BUB1 Mitotic Checkpoint Serine/Threonine Kinase B) Promotes Lung Adenocarcinoma by Interacting with Zinc Finger Protein ZNF143 and Regulating Glycolysis. Bioengineered. 2022;13(2):2471–85.

  38. Chen J, Liao Y, Fan X. Prognostic and clinicopathological value of BUB1B expression in patients with lung adenocarcinoma: a meta-analysis. Expert Rev Anticancer Ther. 2021;21(7):795–803.

  39. Sun ZY, Wang W, Gao H, Chen QF. Potential therapeutic targets of the nuclear division cycle 80 (NDC80) complexes genes in lung adenocarcinoma. J Cancer. 2020;11(10):2921.

  40. Rao CV, Yamada HY, Yao Y, Dai W. Enhanced genomic instabilities caused by deregulated microtubule dynamics and chromosome segregation: a perspective from genetic studies in mice. Carcinogenesis. 2009;30(9):1469–74.

  41. Chiang YY, Chen SL, Hsiao YT, Huang CH, Lin TY, Chiang IP, et al. Nuclear expression of dynamin-related protein 1 in lung adenocarcinomas. Mod Pathol. 2009;22(9):1139–50.

Acknowledgements

Thanks to all members of the laboratory for their valuable discussion and comments.

Funding

This work was supported by the National Key R&D Program of China under Grant No. 2021YFA1000402, the National Natural Science Foundation of China under Grant No. 61972320, and Xi’an municipal bureau of science and technology under Grant No. 22YXYJ0057.

Author information

Contributions

B.C. initiated this study. J.Z. conducted the numerical experiments and drafted the manuscript. C.S. gave suggestions throughout the study and contributed ideas for part of the model. All authors read and revised the manuscript, and agreed with the final version.

Corresponding author

Correspondence to Bolin Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, B., Zhang, J., Shao, C. et al. QIGTD: identifying critical genes in the evolution of lung adenocarcinoma with tensor decomposition. BioData Mining 17, 30 (2024). https://doi.org/10.1186/s13040-024-00386-w

  • DOI: https://doi.org/10.1186/s13040-024-00386-w