RNSCLC-PRSP software to predict the prognostic risk and survival in patients with resected T1-3N0-2M0 non-small cell lung cancer

Background: The clinical outcomes of patients with resected T1-3N0-2M0 non-small cell lung cancer (NSCLC) vary widely even within the same tumor-node-metastasis (TNM) stage. Although many published studies have reported additional prognostic factors and prognostic prediction tools, no convenient, accurate, and specific prognostic prediction software has been developed for clinicians. We aimed to develop such software, which analyzes subdivided T and N staging together with additional factors to predict the prognostic risk, the corresponding mean and median survival times, and the 1-5-year survival rates of patients with resected T1-3N0-2M0 NSCLC.

Results: Using a Cox proportional hazards regression model, we identified the independent prognostic factors and obtained the prognostic index (PI) equation:

$$\mathrm{PI} = \sum_i \beta_i x_i = 0.379X_{1} - 0.403X_{2} - 0.267X_{51} - 0.167X_{61} - 0.298X_{62} + 0.460X_{71} + 0.617X_{72} - 0.344X_{81} - 0.105X_{91} - 0.243X_{92} + 0.305X_{101} + 0.508X_{102} + 0.754X_{103} + 0.143X_{111} + 0.170X_{112} + 0.434X_{113} - 0.327X_{122} - 0.247X_{123} + 0.517X_{133} + 0.340X_{134} + 0.457X_{143} + 0.419X_{144} + 0.407X_{145}$$

Using this equation, we calculated the PI value of every patient. According to quantiles of the PI value, patients were divided into low-, intermediate-, and high-risk groups with significantly different survival rates. We also obtained the mean and median survival times and 1-5-year survival rates of the three groups. We developed the RNSCLC-PRSP software, which is freely available at http://www.rnsclcpps.com and supports all major browsers, to determine the prognostic risk and associated survival of patients with resected T1-3N0-2M0 NSCLC.

Conclusions: After prognostic factor analysis, prognostic risk grouping, and the corresponding survival assessment, we developed a novel software program that is practical and convenient for clinicians evaluating the prognostic risk and corresponding survival of patients with resected T1-3N0-2M0 NSCLC. It can also guide clinicians' decisions about complementary treatment for patients.

Electronic supplementary material: The online version of this article (10.1186/s13040-019-0205-0) contains supplementary material, which is available to authorized users.


Background
Lung cancer is the leading cause of cancer death among men and the second leading cause of cancer death among women worldwide [1]. At present, the eighth edition of the tumor-node-metastasis (TNM) staging system for non-small cell lung cancer (NSCLC), developed and validated by the International Association for the Study of Lung Cancer (IASLC) project, is considered the most significant prognostic predictor and the main guide for postoperative supplementary treatment [2]. The following factors were incorporated into the IASLC system: histological grade, gender, age, and performance status. No molecular prognostic factors are used in the clinic because of the lack of cross-validation. Even the new biomarker programmed cell death protein 1 ligand (PD-L1) is a predictive marker of good response to immunotherapy drugs but a poor prognostic indicator of survival [3]. However, clinicians know that outcomes are diverse among resected NSCLC patients with the same TNM stage and otherwise similar clinical features. Some die soon after surgical treatment, while others remain alive, some longer than expected. Therefore, subgroups of T and N staging and additional clinicopathological features should be considered when predicting prognostic risk and survival.
Few studies have examined T and N staging subgroups as prognostic factors. Meanwhile, prognostic prediction tools for patients with resected NSCLC, such as prognostic nomograms, scores, and survival models, have been reported in many published studies [23][24][25][26][27]. Unfortunately, for clinicians busy with clinical work, the TNM staging system and tools that give inaccurate and vague results are inconvenient to use. Therefore, we aimed to develop software that can conveniently, specifically, and accurately predict the prognostic risk and survival of patients with T1-3N0-2M0 NSCLC. In building the model, T and N staging subgroups and additional clinical features were analyzed as prognostic factors.

Implementation
We collected patient information from the Surveillance, Epidemiology, and End Results (SEER) database, which provides cancer statistics for U.S. patients. A total of 6886 patients were included in this study. Eligibility criteria were as follows: (1) histological diagnosis of NSCLC; (2) only a single primary NSCLC in their lifetime, occurring between 2004 and 2014; (3) treatment with resection only; (4) definitive surgical information; (5) survival time of at least one month; and (6) age ≥20 years. Patients were excluded if they met any of the following criteria: (1) M1 stage or no definitive information on M stage; (2) no definitive information on primary site, laterality, or histological grade; (3) T4>7 or no definitive information on tumor size; (4) T4Inv or T4Ipsi Nod, or no definitive information on tumor extension; (5) N3 stage or no definitive information on N stage; (6) no definitive information on the number of examined and positive regional lymph nodes; or (7) unknown marital status or race. Figure 1 shows the flow chart of the process used to screen patients according to the inclusion and exclusion criteria. Clinicopathological characteristics and follow-up information were collected, as shown in Table 1, including gender, age, laterality, race, N stage, number of examined lymph nodes (NELNs), number of positive lymph nodes (NPLNs), surgery type, primary site, histological grade, histology, marital status, tumor extension, tumor size, survival months, and status.

First, approximately 70% of the patients in this data set were randomly assigned to the training set (4821 patients), and the remaining patients formed the test set (2065 patients). The training set was used to build the model, and the test set was used to verify it. Second, based on the training set, a Cox proportional hazards regression model was used to identify independent prognostic factors and their model coefficients. Third, we obtained a prognostic index (PI) equation, in which the PI is the sum of the products of each independent prognostic factor's value and its regression coefficient. Fourth, according to quantiles of the PI value, patients were divided into three risk groups (low, intermediate, and high risk) with significantly different survival rates according to Kaplan-Meier analysis and the log-rank test. We also obtained the mean and median survival times and 1-5-year survival rates of the three risk groups, and we used the test set to verify the model. Finally, we developed a software program named RNSCLC-PRSP that predicts the prognostic risk and survival of patients with resected T1-3N0-2M0 non-small cell lung cancer from their selected clinicopathological features. The software is freely available at http://www.rnsclcpps.com and supports all major browsers. After registering and logging in, clinicians select the clinicopathological characteristics of a patient, and the software predicts the prognostic risk and survival outcome.
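The PI defined in the third step is a plain weighted sum, which can be illustrated with a minimal Python sketch. The coefficients are transcribed from the PI equation in the abstract, reading its en-dashes as minus signs. The function name `prognostic_index`, the assumption that every variable is a 0/1 indicator, and the example patient are illustrative choices that do not come from the paper, and the clinical meaning of each variable code is defined only in Additional file 1: Table S1.

```python
# Coefficients transcribed from the PI equation in the abstract, reading the
# en-dashes as minus signs. Keys are the paper's variable codes; their clinical
# meaning is defined in Additional file 1: Table S1 and is not reproduced here.
PI_COEFFICIENTS = {
    "X1": 0.379, "X2": -0.403, "X51": -0.267, "X61": -0.167, "X62": -0.298,
    "X71": 0.460, "X72": 0.617, "X81": -0.344, "X91": -0.105, "X92": -0.243,
    "X101": 0.305, "X102": 0.508, "X103": 0.754,
    "X111": 0.143, "X112": 0.170, "X113": 0.434,
    "X122": -0.327, "X123": -0.247, "X133": 0.517, "X134": 0.340,
    "X143": 0.457, "X144": 0.419, "X145": 0.407,
}


def prognostic_index(indicators):
    """Return PI = sum(beta_i * x_i) for a dict mapping variable code -> value.

    Assumption (not stated in the paper): every variable is a 0/1 indicator,
    and any code missing from `indicators` is treated as 0.
    """
    unknown = set(indicators) - set(PI_COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown variable codes: {sorted(unknown)}")
    return sum(PI_COEFFICIENTS[code] * value for code, value in indicators.items())


# Hypothetical patient with only X1 and X72 set (illustrative, not from the paper).
example_pi = prognostic_index({"X1": 1, "X72": 1})
```

Keeping the coefficients in a single dictionary keyed by the paper's variable codes makes it easy to check the transcription against the published equation term by term.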
We used SPSS software (version 16.0; SPSS Inc., Chicago, IL, USA) for all statistical calculations, and P<0.05 was considered significant. The tree model analysis method was also used to rank the importance of each variable for prediction.
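The median survival times and 1-5-year survival rates mentioned in this section are quantities that can be read off a Kaplan-Meier curve. The authors computed them in SPSS; the following pure-Python sketch only illustrates the estimator itself. The function names and the six-patient toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times:  follow-up in months for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


def survival_at(curve, t_query):
    """Estimated survival probability at time t_query (step function)."""
    s = 1.0
    for t, surv in curve:
        if t > t_query:
            break
        s = surv
    return s


# Hypothetical toy cohort (months, event flag); not data from the paper.
toy_times = [6, 12, 12, 24, 36, 60]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
one_year_rate = survival_at(toy_curve, 12)
```

For this toy cohort, `median_survival(toy_curve)` returns 24 and `one_year_rate` is 2/3. A survival rate at k years corresponds to `survival_at(curve, 12 * k)` when follow-up is recorded in months.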

Univariate analysis of prognostic factors
Variable codes and assignment methods for the clinicopathological characteristics are provided in Additional file 1: Table S1. In the univariate analysis, the results of which are presented in Table 2, gender, age, N stage, NELNs, NPLNs, surgery type, primary site, histological grade, histology, marital status, tumor extension, and tumor size were significant prognostic factors (P<0.05).

Multivariate analysis of prognostic factors
The results of the multivariate analysis of prognostic factors are shown in Table 3.

The tree model analysis
The tree model analysis method was used to rank the importance of each variable for prediction. The results are shown in Table 4; the third column gives the standardized importance. The first 12 variables were selected into the model, which was consistent with the Cox regression results.

We obtained the PI ranges for the training and test sets (Table 5). According to quantiles of the PI value, we divided patients in the training and test sets into three risk groups: low risk (0-50%), intermediate risk (50-90%), and high risk (above 90%). We then obtained the corresponding mean and median survival times and 1-5-year survival rates of the three risk groups in the training and test sets (Tables 6 and 7, respectively). Using K-M curves and log-rank tests, we found that survival rates worsened stepwise from the low- to the intermediate- to the high-risk group in both the training and test sets (P<0.001) (Fig. 2). Verification on the test set indicated that the model performed well.
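The quantile-based grouping described above can be sketched in Python as follows. The paper does not state how the quantiles were computed or how boundary values were handled, so the default interpolation of `statistics.quantiles` and the treatment of each cutoff as the lower bound of the next group are assumptions. The function names and the training PI values are hypothetical.

```python
from statistics import quantiles


def risk_cutoffs(training_pi):
    """Return the 50th and 90th percentile cutoffs of the training-set PI values.

    Assumption: default interpolation of statistics.quantiles; the paper does
    not specify the quantile method used.
    """
    deciles = quantiles(training_pi, n=10)  # 9 cut points: 10%, 20%, ..., 90%
    return deciles[4], deciles[8]


def risk_group(pi, median_cut, p90_cut):
    """Assign a risk group from a PI value and the two training-set cutoffs.

    Assumption: a PI equal to a cutoff falls into the higher-risk group.
    """
    if pi < median_cut:
        return "low"
    if pi < p90_cut:
        return "intermediate"
    return "high"


# Hypothetical training-set PI values (illustrative only, not study data).
training_pi = [-1.2, -0.9, -0.7, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1, 0.2,
               0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3]
median_cut, p90_cut = risk_cutoffs(training_pi)
groups = [risk_group(pi, median_cut, p90_cut) for pi in training_pi]
```

Computing the cutoffs on the training set and reusing them unchanged for test-set patients keeps the verification independent of the test data.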
We developed a software program named RNSCLC-PRSP to predict the prognostic risk and survival of patients with resected T1-3N0-2M0 NSCLC.

Discussion
We have developed a novel tool to predict the prognosis of patients with resected T1-3N0-2M0 NSCLC. We determined the independent risk factors and obtained prognostic risk models, risk groups, and their corresponding survival times. This paper highlights that incorporating clinicopathological factors enables a more comprehensive and refined analysis for predicting the prognosis of resected T1-3N0-2M0 NSCLC.

To access the program, clinicians can enter the URL http://www.rnsclcpps.com in a browser to reach the login screen of the software. At the bottom of the interface is a brief introduction to the software and an explanation of the relevant abbreviations. Above it is the login box. New users can click the register button in the login box to register. After successful registration, users can click the button to return to the login box, enter their username and password, and click the login button to enter the software interface. The first line of the interface is titled Prognostic risk and survival prediction software RNSCLC-PRSP for resected T1-3N0-2M0 NSCLC (according to the eighth edition AJCC/UICC stage classification). Operational tips (notes) are located under the title, an explanation of the relevant abbreviations is located under the notes, and the selectable options are located under the abbreviations. Following the notes and the explanation of abbreviations, clinicians first need to determine the clinicopathological characteristics of the patient. Taking a patient with resected T1-3N0-2M0 (according to the eighth edition of the AJCC/UICC stage classification) non-small cell lung cancer as an example, the clinicopathological characteristics of a representative patient were gender (man), age (≤65), N stage (N0), NELNs (N>12), NPLNs (N≥4), surgery type (LET), primary site (UL), histological grade (III), histology (S), marital status (married), tumor extension (T3Inv), and tumor size (T2b>4-5 (4<T≤5)). For each of these clinicopathological characteristics, clinicians choose the appropriate option. If there is no corresponding option, clinicians should choose none. They then click the button to submit their entry, and the prognostic risk and survival prediction results are shown on the next page. Here are the prognostic and prediction results for the representative patient: high-risk group, PI value is PI≥0.79, mean and median survival time are 42.93 and 24.0

The RNSCLC-PRSP software we have developed is based on the actual needs of clinicians predicting the prognosis of patients with resected NSCLC. Clinicians are very busy in clinical work, and the prognosis of resected NSCLC patients is affected by many factors, so clinicians rarely have time to evaluate every factor to reach a more accurate prognosis. We provide quantitative analysis software with which clinicians can conveniently and quickly obtain each patient's prognostic risk and predicted survival just by choosing the patient's clinicopathological features. We expect the RNSCLC-PRSP software to be well received by clinicians. At present, there are no comparable prognostic prediction software programs for resected T1-3N0-2M0 NSCLC. Pilotto et al. developed clinicopathological prognostic nomograms for resected squamous cell lung cancer based on clinicopathological factors including age, T descriptor (according to the seventh edition of the TNM classification), lymph node status, and grading; every patient was assigned a prognostic score [28].   and surgical covariates [25]. Compared with the above two tools, our software analyzes more clinicopathological features in more detail for a broader range of patients with resected non-small cell lung cancer, and it is more convenient and practical for clinicians.
Although we have established predictive software using relevant prognostic factors, more clinicopathological factors may need to be analyzed to improve it. Potentially valuable prognostic factors, such as smoking status, performance status, comorbidity, molecular biological factors, biochemical and biomarker test results, lung function, tumor vascular or lymphatic invasion, surgical method (minimally invasive or open), and surgical margins, could not be determined or studied in the database. As databases expand, further research will be carried out, and our software can be updated and improved to provide better service.

Conclusions
Using the SEER database and the Cox proportional hazards model, we identified the independent prognostic factors and the corresponding PI values of patients with resected T1-3N0-2M0 NSCLC. According to different PI ranges, three prognostic risk groups (low, intermediate, and high risk) were determined, and their corresponding survival times were obtained. We developed the RNSCLC-PRSP software so that clinicians can conveniently and practically predict the prognosis of patients with resected T1-3N0-2M0 NSCLC to guide further treatment. The software offers a new predictive method in this field.

Availability and requirements
Project name: My bioinformatics project.