RNSCLC-PRSP software to predict the prognostic risk and survival in patients with resected T1-3N0–2 M0 non-small cell lung cancer
BioData Mining volume 12, Article number: 17 (2019)
The clinical outcomes of patients with resected T1-3N0–2M0 non-small cell lung cancer (NSCLC) with the same tumor-node-metastasis (TNM) stage are diverse. Although other prognostic factors and prognostic prediction tools have been reported in many published studies, a convenient, accurate and specific prognostic prediction software for clinicians has not been developed. The purpose of our research was to develop this type of software that can analyze subdivided T and N staging and additional factors to predict prognostic risk and the corresponding mean and median survival time and 1–5-year survival rates of patients with resected T1-3N0–2M0 NSCLC.
Using a Cox proportional hazard regression model, we determined the independent prognostic factors and obtained a prognostic index (PI) equation:
$$\mathrm{PI} = \sum \beta_i x_i = 0.379X_{1} - 0.403X_{2} - 0.267X_{51} - 0.167X_{61} - 0.298X_{62} + 0.460X_{71} + 0.617X_{72} - 0.344X_{81} - 0.105X_{91} - 0.243X_{92} + 0.305X_{101} + 0.508X_{102} + 0.754X_{103} + 0.143X_{111} + 0.170X_{112} + 0.434X_{113} - 0.327X_{122} - 0.247X_{123} + 0.517X_{133} + 0.340X_{134} + 0.457X_{143} + 0.419X_{144} + 0.407X_{145}$$
Using the PI equation, we determined the PI value of every patient. According to the quantile of the PI value, patients were divided into three risk groups: low-, intermediate-, and high-risk groups with significantly different survival rates. Meanwhile, we obtained the mean and median survival times and 1–5-year survival rates of the three groups. We developed the RNSCLC-PRSP software, which is freely available on the web at http://www.rnsclcpps.com with all major browsers supported, to determine the prognostic risk and associated survival of patients with resected T1-3N0–2M0 non-small cell lung cancer.
After prognostic factor analysis, prognostic risk grouping and corresponding survival assessment, we developed a novel software program. It is practical and convenient for clinicians to evaluate the prognostic risk and corresponding survival of patients with resected T1-3N0–2M0 NSCLC. Additionally, it has guiding significance for clinicians to make decisions about complementary treatment for patients.
Lung cancer is the leading cause of cancer death among men and the second leading cause of cancer death among women worldwide. At present, the eighth edition of the non-small cell lung cancer (NSCLC) tumor-node-metastasis (TNM) staging system, developed and validated by the International Association for the Study of Lung Cancer (IASLC) project, is considered to be the most significant prognostic predictor and the main guide for postoperative supplementary treatment. The following factors were incorporated into the IASLC system: histological grade, gender, age, and performance status. No molecular prognostic factors are used in the clinic because of the lack of cross-validation. Even the new biomarker programmed cell death protein 1 ligand (PD-L1) is a predictive marker of good response to immunotherapy drugs but a poor prognostic indicator of survival. However, clinicians know that outcomes are diverse among resected NSCLC patients with the same TNM stage and other similar clinical features. Some die early after surgical treatment, while others remain alive, even living longer than expected. Therefore, subgroups of T and N staging and additional clinicopathological features should be considered in prognostic risk and survival prediction.
Recently, there have been many studies on the prognostic factors for patients with resected NSCLC [4,5,6,7]. Prognostic factors can be divided into clinical factors, tumor-related factors and treatment-related factors. TNM stage, gender, age, number of examined regional lymph nodes (NELNs), number of positive regional lymph nodes (NPLNs), surgery type, histological grade, histology, and marital status have been reported to be prognostic factors for patients with resected NSCLC [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. There have been few studies on T and N staging subgroups as prognostic factors. Meanwhile, some prognostic prediction tools, such as prognostic nomograms, scores, and survival models for patients with resected NSCLC, have been reported in many published studies [23,24,25,26,27]. Unfortunately, for busy clinicians, the TNM staging system and these tools are inconvenient to use, and their results can be inaccurate and vague. Therefore, we aimed to develop software that can conveniently, specifically, and accurately predict the prognostic risk and survival of patients with T1-3N0–2M0 NSCLC. In the process of building the model, T and N staging subgroups and additional clinical features were analyzed as prognostic factors.
We collected information on patients from the Surveillance, Epidemiology, and End Results (SEER) database, which provides cancer statistics for U.S. patients. A total of 6886 patients were included in this study. Eligibility criteria were as follows: (1) histological diagnosis of NSCLC; (2) only a single primary NSCLC in their lifetime, occurring between 2004 and 2014; (3) received resection only; (4) had definitive surgical information; (5) survival time equal to or greater than one month; and (6) ≥20 years old. Moreover, the following criteria were used to exclude patients from the study: (1) M1 stage or without definitive information on M stage; (2) without definitive information on primary site, laterality or histological grade; (3) with T4>7 and without definitive information on tumor size; (4) with T4 Inv, T4 Ipsi Nod and without definitive information on tumor extension; (5) with N3 stage or without definitive information on N stage; (6) without definitive information on the number of examined and positive regional lymph nodes; (7) unknown marital status and race. Figure 1 shows the flow chart of the process used to screen patients according to the inclusion and exclusion criteria. Clinicopathological characteristics and follow-up information were collected, as shown in Table 1, including gender, age, laterality, race, N stage, NELNs, NPLNs, surgery type, primary site, histological grade, histology, marital status, tumor extension, tumor size, survival months and status.
First, approximately 70% of the patients in this data set were randomly assigned to the training set (4821 patients), while the remaining patients comprised the test set (2065 patients). The training set was used to build the model, and the test set was used to verify it. Second, based on the training set, the Cox proportional hazard regression model was used to identify independent prognostic factors and their model coefficients. Third, we obtained a prognostic index (PI) equation, which is the sum of the products of each independent prognostic factor's value and its regression coefficient. Fourth, according to the quantile of the PI value, patients were divided into three risk groups: the low-, intermediate-, and high-risk groups, with significantly different survival rates according to Kaplan-Meier analysis and the log-rank test. Meanwhile, we obtained the mean and median survival times and 1–5-year survival rates of the three risk groups. Finally, we developed a software program named RNSCLC-PRSP to predict the prognostic risk and survival of patients with resected T1-3N0–2M0 non-small cell lung cancer from their clinicopathological features. The software is freely available on the web at http://www.rnsclcpps.com with all major browsers supported. Clinicians register, log in, and select the clinicopathological characteristics of a patient; the software then predicts the prognostic risk and survival outcome.
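The random 70/30 assignment in the first step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_train_test`, the seed, and the choice to round the training-set size up are all assumptions. Rounding up happens to reproduce the reported 4821/2065 split for 6886 patients.

```python
import math
import random


def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Randomly partition patient identifiers into training and test sets.

    The seed and the rounding rule (ceil) are illustrative assumptions;
    the source only states that approximately 70% went to training.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..6885 standing in for the 6886 SEER records.
train_ids, test_ids = split_train_test(range(6886))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the partition reproducible, so the model built on the training set is always checked against the same held-out patients.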
We used SPSS software (version 16.0; SPSS Inc., Chicago, IL, USA) for all statistical calculations, and P<0.05 was considered to be significant. The tree model analysis method was also used to rank the importance of each variable for prediction.
Univariate analysis of prognostic factors
Variable codes and assignment methods for the clinicopathological characteristics are provided in Additional file 1: Table S1. In the univariate analysis, the results of which are presented in Table 2, gender, age, N stage, NELNs, NPLNs, surgery type, primary site, histological grade, histology, marital status, tumor extension, and tumor size were significant prognostic factors (P<0.05).
Multivariate analysis of prognostic factors
By multivariate analysis of prognostic factors, the results of which are shown in Table 3, gender, age, N1 stage, NELNs (6<N ≤ 12, N>12), NPLNs (1 ≤ N ≤ 3, N ≥ 4), lobectomy (LET), primary site (UL, ML), histological grade (II, III, IV), histology (AC, S, ASC), marital status (married, divorced), tumor extension (T3 Inv, T3 Satell), and tumor size (T2a>3–4(3<T ≤ 4), T2b>4–5(4<T ≤ 5), T3>5–7(5<T ≤ 7)) were identified as independent prognostic factors.
The tree model analysis
The tree model analysis method was used to rank the importance of each variable for prediction. The results are shown in Table 4. The third column is standardized importance. The first 12 variables were selected into the model, which was consistent with the Cox regression results.
Prognostic risk model construction and software development
Using the Cox proportional hazard regression model, we obtained the PI equation:
$$\mathrm{PI} = \sum \beta_i x_i = 0.379X_{1} - 0.403X_{2} - 0.267X_{51} - 0.167X_{61} - 0.298X_{62} + 0.460X_{71} + 0.617X_{72} - 0.344X_{81} - 0.105X_{91} - 0.243X_{92} + 0.305X_{101} + 0.508X_{102} + 0.754X_{103} + 0.143X_{111} + 0.170X_{112} + 0.434X_{113} - 0.327X_{122} - 0.247X_{123} + 0.517X_{133} + 0.340X_{134} + 0.457X_{143} + 0.419X_{144} + 0.407X_{145}$$
Using the PI equation, we obtained the PI value of every patient. As shown in Table 5, we obtained PI ranges for the training and test sets. According to the quantiles of the PI value, we divided patients in the training and test sets into three risk groups: the low-risk group (0–50%), the intermediate-risk group (50–90%), and the high-risk group (above 90%). We obtained the mean and median survival times and 1–5-year survival rates of the three risk groups in the training and test sets (Tables 6 and 7, respectively). Using K-M curves and log-rank tests, we found that survival rates in both the training and test sets worsened stepwise from the low- to the intermediate- to the high-risk group (P<0.001) (Fig. 2). Verification on the test set indicated that the model performed well.
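The PI calculation and the quantile-based grouping described above can be sketched in a few lines. The coefficients are copied from the PI equation; everything else is an assumption made for illustration. That includes the function names, the use of the inclusive quantile method, and the rule that a PI equal to a cutoff falls into the higher group (chosen to match the reported "PI ≥ 0.79" for the high-risk group). The variable codes are treated as opaque keys because their definitions are in Additional file 1: Table S1, which is not reproduced here.

```python
from statistics import quantiles

# Regression coefficients copied from the PI equation in the text.
# Keys are the paper's variable codes (defined in Additional file 1: Table S1).
COEFFICIENTS = {
    "X1": 0.379, "X2": -0.403, "X51": -0.267, "X61": -0.167, "X62": -0.298,
    "X71": 0.460, "X72": 0.617, "X81": -0.344, "X91": -0.105, "X92": -0.243,
    "X101": 0.305, "X102": 0.508, "X103": 0.754, "X111": 0.143,
    "X112": 0.170, "X113": 0.434, "X122": -0.327, "X123": -0.247,
    "X133": 0.517, "X134": 0.340, "X143": 0.457, "X144": 0.419,
    "X145": 0.407,
}


def prognostic_index(indicators):
    """Return PI = sum(beta_i * x_i) for a dict of variable code -> value.

    Codes absent from the dict are taken as 0 (reference category).
    """
    unknown = set(indicators) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variable codes: {sorted(unknown)}")
    return sum(COEFFICIENTS[code] * value for code, value in indicators.items())


def risk_cutoffs(training_pi_values):
    """Return the 50th and 90th percentiles of the training-set PI values."""
    deciles = quantiles(training_pi_values, n=10, method="inclusive")
    return deciles[4], deciles[8]


def risk_group(pi, cutoffs):
    """Assign low / intermediate / high risk; ties go to the higher group (assumption)."""
    median_cut, upper_cut = cutoffs
    if pi < median_cut:
        return "low"
    if pi < upper_cut:
        return "intermediate"
    return "high"


# Hypothetical indicator pattern, not a patient from the study.
example_pi = prognostic_index({"X1": 1, "X71": 1})

# Toy PI values standing in for a training cohort.
toy_cutoffs = risk_cutoffs([i / 10 for i in range(11)])
print(round(example_pi, 3), toy_cutoffs, risk_group(example_pi, toy_cutoffs))
```

In practice the cutoffs would be computed once from the training-set PI values and then reused unchanged for the test set and for new patients.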
We developed software named RNSCLC-PRSP to predict the prognostic risk and survival of patients with resected T1-3N0–2M0 NSCLC.
We have developed a novel tool to predict the prognosis of patients with resected T1-3N0–2M0 NSCLC. We determined the independent risk factors and obtained a prognostic risk model, risk groups, and their corresponding survival times. This paper highlights that a comprehensive and more refined analysis incorporating clinicopathological factors can predict the prognosis of resected T1-3N0–2M0 NSCLC.
To access the program, clinicians can enter the URL http://www.rnsclcpps.com in a browser to reach the login screen of the software. At the bottom of the interface is a brief introduction to the software and an explanation of the relevant abbreviations. Above it is the login box. New users can click the register button in the login box to register. After successful registration, users can click the button to return to the login box, enter their username and password, and click the login button to enter the software interface. The first line of the interface is the title: Prognostic risk and survival prediction software RNSCLC-PRSP for resected T1-3N0–2M0 NSCLC (according to the eighth edition AJCC/UICC stage classification). Operational tips (notes) are located under the title, an explanation of the relevant abbreviations is under the notes, and the options to choose from are located under the abbreviations. Following the notes and the explanation of abbreviations, clinicians first need to determine the clinicopathological characteristics of the patient. Taking a patient with resected T1-3N0–2M0 NSCLC (according to the eighth edition of the AJCC/UICC stage classification) as an example, the clinicopathological characteristics of a representative patient were gender (man), age (≤65), N stage (N0), NELNs (N>12), NPLNs (N ≥ 4), surgery type (LET), primary site (UL), histological grade (III), histology (S), marital status (married), tumor extension (T3 Inv), and tumor size (T2b>4–5(4<T ≤ 5)). Clinicians choose the appropriate response for each of these factors. If no option applies to a factor, clinicians should choose none. They then click the button to submit their entry, and the prognostic risk and survival prediction results are shown on the next page.
Here are the prognostic risk and survival prediction results for the representative patient: high-risk group, PI ≥ 0.79; mean and median survival times of 42.93 and 24.0 months, respectively; and 1–5-year survival rates of 68.6, 49.7, 41.6, 32.6, and 26.8%, respectively.
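The survival rates returned by the software are group-level figures. The study reports Kaplan-Meier analysis for the risk groups, so it is reasonable to assume (though not stated explicitly) that the 1–5-year rates were read off such curves. Under that assumption, a minimal sketch of the Kaplan-Meier estimator shows how a survival rate at a given time point is obtained from censored follow-up data. The function names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up in months for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    estimate = 1.0
    for time, value in curve:
        if time > t:
            break
        estimate = value
    return estimate


# Toy data only: six hypothetical patients, two of them censored.
toy_times = [6, 12, 12, 20, 30, 40]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(survival_at(toy_curve, 12), survival_at(toy_curve, 24))
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate differs from a simple proportion of patients still alive.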
The RNSCLC-PRSP software we have developed is based on the actual needs of clinicians predicting the prognosis of patients with resected NSCLC. Clinicians are very busy in clinical work, and the prognosis of resected NSCLC patients is affected by many factors, so clinicians have little time to evaluate every factor to obtain a more accurate prognosis. We provide quantitative analysis software with which clinicians can conveniently and swiftly obtain each patient's prognostic risk and survival just by choosing the patient's clinicopathological features. We expect the RNSCLC-PRSP software to be well received by clinicians. At present, there are no comparable prognostic prediction software programs for resected T1-3N0–2M0 NSCLC. Pilotto S et al. developed clinicopathological prognostic nomograms for resected squamous cell lung cancer based on clinicopathological factors including age, T descriptor (according to the seventh edition of the TNM classification), lymph node status, and grading; every patient was assigned a prognostic score. Francesco Guerrera et al. designed a prognostic model predicting 5-year survival after surgical resection for stage I non-small cell lung cancer based on clinical, pathological and surgical covariates. Compared with these two tools, our software analyzes more clinicopathological features in greater detail, covers more patients with resected non-small cell lung cancer, and is more convenient and practical for clinicians.
Although we have established predictive software using relevant prognostic factors, we may need to analyze more clinicopathological factors to improve the software. Thus, further research will be conducted. Potentially valuable prognostic factors such as smoking status, performance status, comorbidity, molecular biological factors, biochemical and biomarker test results, lung function, tumor vascular or lymphatic invasion, surgical method (minimally invasive or open), and surgical margins could not be determined from, or studied in, the most recent database. However, with the expansion of databases, further research will be carried out, and our software can be updated and improved to provide better service.
Using the SEER database and the Cox proportional hazard model, we identified the independent prognostic factors and corresponding PI value of patients with resected T1-3N0–2M0 NSCLC. According to different PI ranges, three prognostic risk groups (the low-, intermediate-, and high-risk groups) were determined, and their corresponding survival times were obtained. We developed the RNSCLC-PRSP software for clinicians to conveniently and practically predict the prognosis of patients with resected T1-3N0–2M0 NSCLC to guide further treatment. The software offers a new predictive method in this field.
Availability and requirements
Project name: My bioinformatics project.
Project home page: http://www.rnsclcpps.com
Operating system(s): Platform independent.
Programming language: Java.
Other requirements: no.
Any restrictions to use by non-academics: no.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
AJCC: American Joint Committee on Cancer
ATS: American Thoracic Society
DFS: Disease-free survival
ERS: European Respiratory Society
IASLC: International Association for the Study of Lung Cancer
NELNs: Number of examined regional lymph nodes
NPLNs: Number of positive regional lymph nodes
NSCLC: Non-small cell lung cancer
PD-L1: Programmed cell death protein 1 ligand
SEER: The Surveillance, Epidemiology, and End Results
TNM: Tumor Node Metastasis
UICC: Union for International Cancer Control
Torre LA, Siegel RL, Jemal A. Lung Cancer statistics. Adv Exp Med Biol. 2016;893:1–19.
Goldstraw P, Chansky K, Crowley J, Rami-Porta R, Asamura H, Eberhardt WE, et al. The IASLC lung Cancer staging project: proposals for revision of the TNM stage groupings in the forthcoming (eighth) edition of the TNM classification for lung Cancer. J Thorac Oncol. 2016;11(1):39–51.
Woodard GA, Jones KD, Jablons DM. Lung Cancer staging and prognosis. Cancer Treat Res. 2016;170:47–75.
Yu WS, Lee CY, Park SY, Suh JW, Narm KS, Kim DJ, et al. Prognostic factors for resected non-small cell lung cancer in patients with type 2 diabetes mellitus. J Surg Oncol. 2018;117(5):985–93.
Cho S, Kim K, Jheon S. Characteristics and prognostic factors of node-negative non-small cell lung cancer larger than 5 cm. Interact Cardiovasc Thorac Surg. 2017;25(3):448–54.
Berezowska S, Galvan JA, Langer R, Bubendorf L, Savic S, Gugger M, et al. Glycine decarboxylase and HIF-1alpha expression are negative prognostic factors in primary resected early-stage non-small cell lung cancer. Virchows Archiv Int J Pathol. 2017;470(3):323–30.
Hsieh CP, Hsieh MJ, Wu CF, Fu JY, Liu YH, Wu YC, et al. Prognostic factors in non-small cell lung cancer patients who received neoadjuvant therapy and curative resection. J Thorac Dis. 2016;8(7):1477–86.
Yoshida Y, Murayama T, Sato Y, Suzuki Y, Saito H, Nomura Y. Gender differences in long-term survival after surgery for non-small cell lung Cancer. Thorac Cardiovasc Surg. 2016;64(6):507–14.
Bugge A, Kongerud J, Brunborg C, Solberg S, Lund MB. Gender-specific survival after surgical resection for early stage non-small cell lung cancer. Acta Oncol. 2017;56(3):448–54.
Xia W, Wang A, Jin M, Mao Q, Xia W, Dong G, et al. Young age increases risk for lymph node positivity but decreases risk for non-small cell lung cancer death. Cancer Manag Res. 2018;10:41–8.
Eguchi T, Bains S, Lee MC, Tan KS, Hristov B, Buitrago DH, et al. Impact of increasing age on cause-specific mortality and morbidity in patients with stage I non-small-cell lung Cancer: a competing risks analysis. J Clin Oncol. 2017;35(3):281–90.
Liang W, He J, Shen Y, Shen J, He Q, Zhang J, et al. Impact of examined lymph node count on precise staging and long-term survival of resected non-small-cell lung Cancer: a population study of the US SEER database and a Chinese multi-institutional registry. J Clin Oncol. 2017;35(11):1162–70.
Ding N, Pang Z, Zhang X, Huang C, Yang Y, Liu Q, et al. Prognostic and predictive effects of positive lymph node number or ratio in NSCLC. Sci Rep. 2017;7(1):584.
Fukui T, Mori S, Yokoi K, Mitsudomi T. Significance of the number of positive lymph nodes in resected non-small cell lung Cancer. J Thorac Oncol. 2006;1(2):120–5.
Sun F, Ma K, Yang X, Li M, Shi Y, Zhan C, et al. A nomogram to predict prognosis after surgery in early stage non-small cell lung cancer in elderly patients. Int J Surg. 2017;42:11–6.
Takahashi Y, Horio H, Hato T, Harada M, Matsutani N, Kawamura M. Predictors of post-recurrence survival in patients with non-small-cell lung cancer initially completely resected. Interact Cardiovasc Thorac Surg. 2015;21(1):14–20.
Riquet M, Mordant P, Pricopi C, Legras A, Foucault C, Dujon A, et al. A review of 250 ten-year survivors after pneumonectomy for non-small-cell lung cancer. Eur J Cardiothorac Surg. 2014;45(5):876–81.
Lin CY, Wu YM, Hsieh MH, Wang CW, Wu CY, Chen YJ, et al. Prognostic implication of EGFR gene mutations and histological classification in patients with resected stage I lung adenocarcinoma. PLoS One. 2017;12(10):e0186567.
Wu Y, Ai Z, Xu G. Marital status and survival in patients with non-small cell lung cancer: an analysis of 70006 patients in the SEER database. Oncotarget. 2017;8:103518–34.
Bergman P, Brodin D, Lewensohn R, de Petris L. Validation of the 7th TNM classification for non-small cell lung cancer: a retrospective analysis on prognostic implications for operated node-negative cases. Acta Oncol. 2013;52(6):1189–94.
Jin Y, Chen M, Yu X. Comparison of the 7th and proposed 8th editions of the AJCC/UICC TNM staging system for non-small cell lung cancer undergoing radical surgery. Sci Rep. 2016;6(1):1–6.
Chen K, Chen H, Yang F, Sui X, Li X, Wang J. Validation of the eighth edition of the TNM staging system for lung Cancer in 2043 surgically treated patients with non-small-cell lung Cancer. Clin Lung Cancer. 2017;18(6):e457–e66.
Wu J, Zhou L, Huang L, Gu J, Li S, Liu B, et al. Nomogram integrating gene expression signatures with clinicopathological features to predict survival in operable NSCLC: a pooled analysis of 2164 patients. J Exp Clin Cancer Res. 2017;36(1):1–14.
Lopez-Encuentra A, Lopez-Rios F, Conde E, Garcia-Lujan R, Suarez-Gauthier A, Manes N, et al. Composite anatomical-clinical-molecular prognostic model in non-small cell lung cancer. Eur Respir J. 2011;37(1):136–42.
Guerrera F, Errico L, Evangelista A, Filosso PL, Ruffini E, Lisi E, et al. Exploring stage I non-small-cell lung cancer: development of a prognostic model predicting 5-year survival after surgical resection. Eur J Cardiothorac Surg. 2015;47(6):1037–43.
Birim O, Kappetein AP, Waleboer M, Puvimanasinghe JP, Eijkemans MJ, Steyerberg EW, et al. Long-term survival after non-small cell lung cancer surgery: development and validation of a prognostic model with a preoperative and postoperative mode. J Thorac Cardiovasc Surg. 2006;132(3):491–8.
Tanvetyanon T, Finley DJ, Fabian T, Riquet M, Voltolini L, Kocaturk C, et al. Prognostic nomogram to predict survival after surgery for synchronous multiple lung cancers in multiple lobes. J Thorac Oncol. 2015;10(2):338–45.
Pilotto S, Sperduti I, Leuzzi G, Chiappetta M, Mucilli F, Ratto GB, et al. Prognostic model for resected squamous cell lung Cancer: external multicenter validation and propensity score analysis exploring the impact of adjuvant and neoadjuvant treatment. J Thorac Oncol. 2018;13(4):568–75.
We thank Rongsheng Zhang and Yujie Zhang, Shanxi Tumor Hospital, for collecting and assembling the data; and YaoChen Li, The central laboratory of Cancer Hospital of Shantou University Medical College, for data analysis and interpretation and revision of the manuscript; and the SEER program tumor registries for their efforts to create the SEER database.
Ethics approval and consent to participate
The ethics committee of the first affiliated hospital of Soochow University approved this study.
Consent for publication
We provide consent for publication.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zhang, Y., Li, Y., Zhang, R. et al. RNSCLC-PRSP software to predict the prognostic risk and survival in patients with resected T1-3N0–2 M0 non-small cell lung cancer. BioData Mining 12, 17 (2019). https://doi.org/10.1186/s13040-019-0205-0
- Resected non-small cell lung cancer
- Prognostic risk prediction
- Survival prediction
- Prognostic index