Multi-scale pathology image texture signature is a prognostic factor for resectable lung adenocarcinoma: a multi-center, retrospective study
Journal of Translational Medicine volume 20, Article number: 595 (2022)
Tumor histomorphology analysis plays a crucial role in predicting the prognosis of resectable lung adenocarcinoma (LUAD). Computer-extracted image texture features have been previously shown to be correlated with outcome. However, a comprehensive, quantitative, and interpretable predictor remains to be developed.
In this multi-center study, we included patients with resectable LUAD from four independent cohorts. An automated pipeline was designed for extracting texture features from the tumor region in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) at multiple magnifications. A multi-scale pathology image texture signature (MPIS) was constructed with the discriminative texture features in terms of overall survival (OS) selected by the LASSO method. The prognostic value of MPIS for OS was evaluated through univariable and multivariable analysis in the discovery set (n = 111) and the three external validation sets (V1, n = 115; V2, n = 116; and V3, n = 246). We constructed a Cox proportional hazards model incorporating clinicopathological variables and MPIS to assess whether MPIS could improve prognostic stratification. We also performed histo-genomics analysis to explore the associations between texture features and biological pathways.
A set of eight texture features was selected to construct MPIS. In multivariable analysis, a higher MPIS was associated with significantly worse OS in the discovery set (HR 5.32, 95%CI 1.72–16.44; P = 0.0037) and the three external validation sets (V1: HR 2.63, 95%CI 1.10–6.29, P = 0.0292; V2: HR 2.99, 95%CI 1.34–6.66, P = 0.0075; V3: HR 1.93, 95%CI 1.15–3.23, P = 0.0125). The model that integrated clinicopathological variables and MPIS had better discrimination for OS compared to the clinicopathological variables-based model in the discovery set (C-index, 0.837 vs. 0.798) and the three external validation sets (V1: 0.704 vs. 0.679; V2: 0.728 vs. 0.666; V3: 0.696 vs. 0.669). Furthermore, the identified texture features were associated with biological pathways, such as cytokine activity, structural constituent of cytoskeleton, and extracellular matrix structural constituent.
MPIS was an independent prognostic biomarker that was robust and interpretable. Integration of MPIS with clinicopathological variables improved prognostic stratification in resectable LUAD and might help enhance the quality of individualized postoperative care.
Lung cancer is one of the most common malignant tumors worldwide, with the highest mortality rate [1, 2]. Lung adenocarcinoma (LUAD) is the most common subtype of lung cancer, accounting for 40% of all lung cancers and more than 55% of non-small cell lung cancers. For patients with resectable LUAD, surgical resection with curative intent is the standard of care, but a significant proportion of patients develop disease recurrence and die even after resection of the entire tumor mass. Tumor-node-metastasis (TNM) stage and tumor differentiation are traditionally considered important postoperative prognostic factors, but because of tumor heterogeneity, postoperative prognosis differs substantially among LUAD patients with the same TNM stage and tumor differentiation. Therefore, a novel prognostic biomarker is needed to quantify the biological behavior of the tumor for precise risk stratification in resectable LUAD.
Histopathological slides, which provide morphological information on tumors and their microenvironment at the tissue and cellular levels, are the gold standard for lung cancer diagnosis [8, 9]. Tumor development and growth depend highly on interactions with the associated microenvironment. Typically, pathologists visually examine Hematoxylin and Eosin (H&E)-stained slides from low to high magnification under a microscope to qualitatively assess the histopathological pattern of the tumor, which can help predict cancer behavior to a certain degree. Nevertheless, manual assessment is time-consuming and subjective. In addition, complex histopathological slides contain many sub-visual attributes of tumors which, if quantified, could allow a more comprehensive characterization of the morphology of tumors and their microenvironment.
The rapid advancement of computer technology and of digital whole-slide images (WSIs) has created opportunities for identifying and quantifying sub-visual features correlated with prognosis. For example, texture features can quantitatively measure relationships between pixel intensities within a region of interest in an image. Recent studies have also shown that image texture analysis plays an important role in quantifying underlying sub-visual tumor heterogeneity [13, 14]. However, these studies focused solely on single-scale image features, such as those of a single cell or tissue type, and ignored multi-scale information, which could diminish the accuracy of outcome prediction. Moreover, deep features extracted from WSIs by computer have also appeared to be prognostic. Nevertheless, deep learning models lack interpretability and may have difficulty gaining widespread acceptance in clinical settings. Thus, although previous studies have identified many prognostic biomarkers, there is still room for improvement in terms of accuracy and interpretability.
In this study, we developed and validated a multi-scale pathology image texture signature (referred to as MPIS) using texture features extracted at multiple magnifications from digital H&E-stained WSIs, and then used MPIS in a Cox proportional hazards model to predict overall survival (OS) in patients with resectable LUAD. We hypothesized that MPIS would be an independent prognostic factor for OS and that integrating MPIS with clinicopathological variables would improve prognostic stratification in patients with resectable LUAD. We also sought to demonstrate that the image-derived texture features correlate with the gene expression of biological pathways affecting tumor development.
This multi-center study was conducted using patients from four independent cohorts: a discovery set (Guangdong Provincial People's Hospital, GDPH) and three external validation sets (Yunnan Cancer Hospital, YNCH; Shanxi Provincial Cancer Hospital, SXCH; The Cancer Genome Atlas, TCGA). We enrolled LUAD patients who were treated with surgical therapy with curative intent at GDPH between 2007 and 2014, patients with resectable LUAD treated at YNCH from 2012 to 2014, and those treated at SXCH from 2014 to 2020. This study was approved by the Research Ethics Committee. Informed consent was waived because only retrospective imaging analysis was performed. Additionally, the TCGA dataset was downloaded from the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/).
OS, defined as the time interval from surgery to death, was chosen as the endpoint of our study. Baseline and clinicopathological variables were collected, including age at surgery, sex, smoking status, tumor site, adjuvant chemotherapy, differentiation, and TNM stage. We excluded patients who received neoadjuvant therapy, had residual tumor, or died within 1 month. The inclusion and exclusion criteria are detailed in Additional file 1: Section 1.
Digital WSIs were acquired from the H&E-stained diagnostic tissue slides of the primary tumor. The slides were scanned with a Leica Aperio AT2 scanner (USA) at 40 × magnification (0.252 μm/pixel). We controlled image quality by excluding WSIs that were blurry, contained artifacts, exhibited poor staining, or lacked sufficient tumor tissue. In the TCGA dataset, some cases had multiple slides; for these, one slide was selected for analysis according to image quality. Pathologists (BB Li, with 5 years of clinical experience, and LX Yan, with 15 years of clinical experience) reviewed and agreed on the image quality of all WSIs. Additionally, these pathologists annotated tumor and normal tissue on a set of 67 WSIs from GDPH for fine-tuning a pre-trained tumor segmentation model based on ResNet50.
Automatic tumor segmentation on WSIs
The overall workflow of this study is shown in Fig. 1. First, ResNet50 was employed for tumor region segmentation. To reduce the annotation burden, we used data from a similar domain for transfer learning. We obtained 270 WSIs of breast cancer (tumor = 160, normal = 110) from the Camelyon16 dataset and extracted millions of small positive and negative image patches of 224 × 224 pixels (40 × magnification) to pre-train the model to classify tumor versus normal tissue. The pre-trained model was then fine-tuned using 100,000 image patches from the 67 annotated WSIs from GDPH.
We used Otsu's method to obtain the tissue region mask of each WSI. A window of 224 × 224 pixels was slid without overlap across the whole tissue region. The trained model predicted the image patch under each window position, producing a tumor probability for each patch, and these probabilities formed a probability heatmap for each histopathological image. Finally, we binarized the heatmap using Otsu's method and retained the largest connected region as the tumor mask for each WSI. The framework is shown in Fig. 1a.
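The heatmap post-processing can be illustrated with a minimal NumPy sketch. It assumes the patch-level tumor probabilities have already been arranged into a 2-D heatmap (one cell per 224 × 224 patch) and that 4-connectivity defines a connected region; the function names are hypothetical, and the authors' implementation may differ (for example, by using library routines for thresholding and labeling).

```python
from collections import deque

import numpy as np


def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Threshold that maximizes the between-class variance of a histogram (Otsu's method)."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)              # weight of the class at or below each candidate bin
    w1 = 1.0 - w0                     # weight of the class above it
    mu_k = np.cumsum(prob * centers)  # cumulative first moment
    mu_t = mu_k[-1]                   # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu_k) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0)  # empty classes score 0
    return float(centers[int(np.argmax(between))])


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest 4-connected foreground region of a 2-D boolean mask."""
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    current, best_label, best_size = 0, 0, 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c]:
                continue
            current += 1
            labels[r, c] = current
            queue, size = deque([(r, c)]), 0
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            if size > best_size:
                best_label, best_size = current, size
    if best_size == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best_label


def tumor_mask_from_heatmap(heatmap: np.ndarray) -> np.ndarray:
    """Binarize a tumor-probability heatmap with Otsu's threshold and keep the largest region."""
    return largest_component(heatmap > otsu_threshold(heatmap))
```

For example, a 6 × 6 heatmap with a 3 × 3 block of high probabilities and one isolated high-probability cell yields a mask containing only the 3 × 3 block, which is how keeping the largest connected region suppresses scattered false-positive patches.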
Multi-scale texture feature extraction
The multi-scale texture feature extraction process is shown in Fig. 1b. Based on the tumor region segmentation results, image patches were acquired at magnifications of 2.5 ×, 10 ×, and 40 ×. Color normalization was performed on these patches to reduce the effect of staining differences on the texture distribution of the images. At 2.5 × magnification, the image of the whole tumor region was acquired directly. At 10 × and 40 × magnifications, we obtained image patches of 1024 × 1024 pixels in the tumor region. To obtain relatively dense image patches in the tumor region, only patches with more than 75% tissue area were used. At 40 × magnification, we randomly sampled 200 patches per WSI to reduce computational time and avoid potential subjective bias.
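The patch-selection rule can be sketched as follows, assuming a boolean tissue mask at the target magnification, a non-overlapping grid of 1024 × 1024 patches, and a fixed random seed. `select_patches` and its parameters are hypothetical names, not part of the published pipeline.

```python
import numpy as np


def select_patches(mask, patch=1024, min_fraction=0.75, n_sample=None, seed=0):
    """Top-left corners of non-overlapping patches whose masked fraction exceeds min_fraction.

    If n_sample is given and more candidates qualify, keep a random subset of that size.
    """
    rows, cols = mask.shape
    corners = []
    for y in range(0, rows - patch + 1, patch):
        for x in range(0, cols - patch + 1, patch):
            # fraction of tissue pixels inside this patch; must be strictly above the threshold
            if mask[y:y + patch, x:x + patch].mean() > min_fraction:
                corners.append((y, x))
    if n_sample is not None and len(corners) > n_sample:
        rng = np.random.default_rng(seed)
        keep = sorted(rng.choice(len(corners), size=n_sample, replace=False))
        corners = [corners[i] for i in keep]
    return corners
```

At 40 ×, calling it with `n_sample=200` mirrors the random sampling of 200 patches per WSI described above; fixing the seed makes the sample reproducible.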
We automatically extracted 68 texture features from the tumor region at each scale, comprising first-order statistics (n = 17), gray level co-occurrence matrix (GLCM, n = 7), and gray level run length matrix (GLRLM, n = 44) features. First-order statistics describe the distribution of pixel intensities within an image region. GLCM-based features describe how often pairs of gray levels co-occur in pixels separated by a given distance. GLRLM-based features quantify gray level runs, defined as the number of consecutive pixels with the same gray level. In total, 204 texture features were extracted at three scales (i.e., 2.5 ×, 10 ×, and 40 × magnifications). Details of these features are provided in Additional file 1: Section 2.
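To make the feature families concrete, the sketch below computes two GLCM features (ASM and dissimilarity) and two first-order features (kurtosis and 10th percentile) with NumPy. It assumes 8 quantized gray levels, a one-pixel distance, a symmetric co-occurrence matrix, and that the `_0` suffix in names such as glcm_ASM_0_2.5 denotes the 0° (horizontal) direction; these settings, and the non-excess form of kurtosis, are assumptions rather than the published configuration.

```python
import numpy as np


def quantize(gray, levels=8):
    """Map 8-bit gray values (0-255) onto integer levels 0..levels-1."""
    return np.clip((np.asarray(gray, dtype=float) * levels / 256).astype(int), 0, levels - 1)


def glcm(q, levels, dy=0, dx=1):
    """Symmetric, normalized GLCM for a non-negative offset; (dy, dx) = (0, 1) is the 0-degree neighbor."""
    rows, cols = q.shape
    ref = q[: rows - dy, : cols - dx]  # reference pixels
    nbr = q[dy:, dx:]                  # their neighbors at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T  # count each pixel pair in both directions
    return m / m.sum()


def glcm_features(p):
    """Angular second moment (uniformity) and dissimilarity of a normalized GLCM."""
    i, j = np.indices(p.shape)
    return {"ASM": float(np.sum(p ** 2)), "dissimilarity": float(np.sum(p * np.abs(i - j)))}


def first_order_features(gray):
    """Kurtosis (non-excess, i.e. 3 for a normal distribution) and 10th percentile of intensities."""
    x = np.asarray(gray, dtype=float).ravel()
    mu, sd = x.mean(), x.std()
    kurtosis = float(np.mean((x - mu) ** 4) / sd ** 4) if sd > 0 else 0.0
    return {"Kurtosis": kurtosis, "Percentile_10th": float(np.percentile(x, 10))}
```

Under these settings, a 4 × 4 checkerboard of 0 and 255 gives ASM = 0.5 and dissimilarity = 7, whereas a uniform patch gives ASM = 1 and dissimilarity = 0, which matches the reading of ASM as a uniformity measure.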
Feature selection and signature construction
To keep the number of features proportionate to the sample size, prognosis-related features were selected from the discovery set using the least absolute shrinkage and selection operator (LASSO) method with tenfold cross-validation (Fig. 1c). Before feature selection, feature values were normalized using the Z-score method. Furthermore, it was important to visualize the prognosis-related texture features so that clinicians could understand them. We quantified and visualized the selected texture features using violin plots and feature heatmaps.
MPIS was computed as a linear combination of the discriminative texture features weighted by their corresponding coefficients. The median MPIS in the discovery set was used as the cut-off: patients with values greater than the cut-off were categorized as high-risk, and those with values equal to or less than the cut-off as low-risk. The threshold identified in the discovery set was then applied to the external validation sets to distinguish high-risk and low-risk groups.
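In other words, MPIS is the sum over the selected features of each Z-scored feature value multiplied by its LASSO coefficient. A minimal sketch follows; it assumes that validation-set features are standardized with the discovery-set mean and standard deviation, and the coefficients in the example are made-up placeholders, not the published values in Additional file 1: Table S2.

```python
import numpy as np


def fit_zscore(X_discovery):
    """Column-wise mean and sample standard deviation estimated on the discovery set only."""
    return X_discovery.mean(axis=0), X_discovery.std(axis=0, ddof=1)


def compute_mpis(X, coef, mean, sd):
    """Coefficient-weighted sum of Z-scored texture features (one score per patient)."""
    return ((X - mean) / sd) @ coef


def risk_group(scores, cutoff):
    """High risk if the score exceeds the cut-off; equal or lower scores are low risk."""
    return np.where(scores > cutoff, "high", "low")


# Hypothetical example: 4 discovery patients x 2 features, placeholder coefficients.
X_disc = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
coef = np.array([0.5, -0.2])
mean, sd = fit_zscore(X_disc)
scores = compute_mpis(X_disc, coef, mean, sd)
cutoff = np.median(scores)  # fixed here and reused unchanged for validation sets
groups = risk_group(scores, cutoff)
```

A validation cohort would be scored with `compute_mpis(X_val, coef, mean, sd)` and split with the same `cutoff`, so no information from the validation sets leaks into the threshold.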
For the TCGA cohort, normalized messenger ribonucleic acid (mRNA) expression data were available for 244 patients after matching with the patients included in the survival analysis. To explore the associations between the gene expression of biological pathways and the texture features derived from histopathological images, we first removed genes whose mRNA expression levels were 0 in patient samples. Patients were then categorized as high-risk or low-risk according to MPIS. We used the Wilcoxon rank-sum test to identify genes that were significantly differentially expressed between the high-risk and low-risk groups, and adjusted the P-values with the Benjamini–Hochberg method. We then used the differentially expressed genes (DEGs) for Gene Ontology (GO) enrichment analysis to identify biological pathways over-represented in the gene set. From the identified pathways, we selected those potentially representative of biological processes related to tumor growth and development. Finally, we assessed the associations between the gene expression of biological pathways and the image-derived texture features by single-sample gene set enrichment analysis (ssGSEA). An ssGSEA enrichment score was calculated for each gene set in each patient, reflecting the degree to which the member genes of the gene set were coordinately upregulated or downregulated in that sample. We used the Wilcoxon rank-sum test to select the pathways significantly associated with the image-derived texture features.
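The Benjamini–Hochberg adjustment multiplies the i-th smallest of m P-values by m/i and then enforces monotonicity so that adjusted values never decrease with rank. A minimal NumPy sketch is given for illustration; it is intended to match R's `p.adjust(p, method = "BH")`, but it is not the authors' code, and the function name is hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the same order as the input."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * m / i
    # running minimum from the largest P-value downwards keeps the sequence monotone
    adjusted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    out = np.empty(n)
    out[order] = adjusted
    return out
```

For example, P-values of 0.01, 0.04, 0.03, and 0.005 are adjusted to 0.02, 0.04, 0.04, and 0.02, respectively.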
Categorical data were reported as count (percentage). Differences in age, sex, smoking status, tumor site, treatment, differentiation, and TNM stage among the four cohorts were evaluated using Pearson's chi-squared test or Fisher's exact test, as appropriate. The distribution of MPIS across tumor differentiation degrees was compared using the independent samples t-test. We used the log-rank test to compare OS between the high-risk and low-risk groups in Kaplan–Meier survival analysis. The prognostic value of MPIS and other clinical variables (i.e., age, sex, smoking status, tumor site, treatment, differentiation, and TNM stage) was assessed via univariable analysis. Factors with P < 0.05 in univariable analysis were then entered into multivariable analysis, in which the Akaike information criterion (AIC) was used to determine the independent prognostic factors.
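For reference, a minimal two-group log-rank test is sketched below in NumPy (the authors used R; this is not their code). At each distinct death time it compares observed with expected deaths in group 1 and accumulates the hypergeometric variance; group labels are assumed to be coded 0 (low risk) and 1 (high risk), and an event value of 1 means death.

```python
import math

import numpy as np


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P-value) with 1 degree of freedom."""
    time, event, group = map(np.asarray, (time, event, group))
    observed_minus_expected, variance = 0.0, 0.0
    for t in np.unique(time[event == 1]):          # distinct death times
        at_risk = time >= t
        n = at_risk.sum()                           # patients at risk in both groups
        n1 = (at_risk & (group == 1)).sum()         # at risk in group 1
        d = ((time == t) & (event == 1)).sum()      # deaths at t in both groups
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With six patients who all die, three in each group and all deaths in group 0 occurring first, the statistic is about 5.05 (P ≈ 0.025).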
In the discovery set, a full model was established by incorporating the independent factors selected in the multivariable analysis, and a clinical model was built from the independent clinicopathological variables only. Both models were validated in the three independent external validation sets. Harrell's concordance index (C-index) was used to assess the discriminative ability of the models, and prognostic accuracy was evaluated using time-dependent receiver operating characteristic (ROC) curves and the area under the curve (AUC) for 5-year OS.
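Harrell's C-index is the proportion of usable patient pairs in which the patient with the higher predicted risk dies first. The quadratic-time sketch below assumes a pair is usable only when the shorter observed time ends in death and the two times differ, and counts ties in predicted risk as one half; R's survival package handles tied times and censoring in more detail, so values may differ slightly. The function name is hypothetical.

```python
import numpy as np


def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier failure of a pair
        for j in range(n):
            if time[i] < time[j]:  # patient i died while j was still under observation
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1 means perfect ranking of survival times by predicted risk, 0.5 corresponds to random ranking, which is the scale on which the C-indices of the full and clinical models are compared.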
We conducted statistical analysis using R software (version 4.1.2, http://www.R-project.org). The R packages used for statistical analysis included glmnet, cutoff, survival, survminer, rms, timeROC, and vioplot. A two-sided P < 0.05 was considered statistically significant.
We summarized the qualified patients in this study after applying all inclusion and exclusion criteria; the selection process is shown in Additional file 1: Figure S1. The discovery set (n = 111) was established from GDPH and used for feature discovery and model training. Three independent cohorts, collected from YNCH, SXCH, and TCGA, were used to validate the trained model and are denoted external validation sets V1 (n = 115), V2 (n = 116), and V3 (n = 246), respectively. Table 1 shows the detailed distributions of demographic and clinicopathological variables in the four cohorts. Significant differences were observed among the four cohorts in all included clinical characteristics except sex (P = 0.1603) and tumor site (P = 0.2230).
Feature selection and signature construction
A set of eight potential predictors was selected from the 204 multi-scale texture features using the LASSO method, namely glrlm_SRLGLE_90_2.5, glrlm_SRLGLE_90_40, glcm_dissimilarity_0_2.5, Kurtosis_10, glrlm_LRHGLE_90_2.5, glrlm_SRE_0_40, glcm_ASM_0_2.5, and Percentile_10th_40 (see Additional file 1: Table S1 for definitions of these texture features). These texture features and their regression coefficients are shown in Additional file 1: Table S2. MPIS was computed for each patient as a linear combination of these feature values weighted by the corresponding regression coefficients. The median MPIS in the discovery set (-0.061) was taken as the cut-off for stratifying patients.
As shown in Fig. 2, for each selected feature we quantified and visualized two representative images, one from the high-risk group and one from the low-risk group as determined by that feature, which differed significantly in the feature value. The low-risk example had higher feature values than the high-risk example for glrlm_SRLGLE_90_2.5, glrlm_SRLGLE_90_40, glcm_dissimilarity_0_2.5, and Kurtosis_10 (Fig. 2(a–d)), but lower values for glrlm_LRHGLE_90_2.5, glrlm_SRE_0_40, glcm_ASM_0_2.5, and Percentile_10th_40 (Fig. 2(e–h)).
Evaluation and validation of MPIS
Kaplan–Meier curves for OS stratified by MPIS showed that the low-risk group had significantly better survival than the high-risk group (Fig. 3). On univariable analysis, MPIS was statistically significant in all four cohorts (Table 2). MPIS was associated with OS in the discovery set (hazard ratio [HR], 9.90; 95% confidence interval [CI], 3.44–28.49; P < 0.0001). Furthermore, MPIS was also prognostic of OS in external validation set V1 (HR, 2.36; 95% CI, 1.08–5.16; P = 0.0312), external validation set V2 (HR, 3.47; 95% CI, 1.60–7.52; P = 0.0016), and external validation set V3 (HR, 2.57; 95% CI, 1.59–4.17; P = 0.0001). Multivariable analysis was conducted using the factors that reached statistical significance (P < 0.05) in univariable analysis (treatment, TNM stage, differentiation, and MPIS). On multivariable analysis, MPIS remained an independent prognostic factor in the discovery set (HR, 5.32; 95% CI, 1.72–16.44; P = 0.0037), external validation set V1 (HR, 2.63; 95% CI, 1.10–6.29; P = 0.0292), external validation set V2 (HR, 2.99; 95% CI, 1.34–6.66; P = 0.0075), and external validation set V3 (HR, 1.93; 95% CI, 1.15–3.23; P = 0.0125).
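The reported hazard ratios, confidence intervals, and P-values can be cross-checked with a Wald approximation: the standard error of log(HR) is recovered from the width of the 95% CI on the log scale, and the two-sided P-value follows from z = log(HR)/SE. This is an illustrative consistency check assuming a symmetric Wald interval, not a reanalysis of the data; the function name is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald P-value from a hazard ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
```

Applied to the multivariable estimate in the discovery set (HR 5.32, 95% CI 1.72–16.44), it gives P ≈ 0.0037, and for external validation set V1 (HR 2.63, 95% CI 1.10–6.29) it gives P ≈ 0.029, consistent with the reported values.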
MPIS could predict OS in patients with TNM stage I LUAD and in patients with early-stage (TNM stages I and II) LUAD (Additional file 1: Figures S2, S3). For early-stage LUAD patients, survival outcomes in the high-risk group were significantly worse than those in the low-risk group. Although no statistically significant association between MPIS and OS was found in external validation set V1 (P = 0.13), a clear trend toward poorer prognosis was still observed in the high-risk group. For TNM stage I LUAD patients, the low-risk group had a better prognosis. Additionally, when patients were stratified by clinicopathological variables, including age (≥ 65 or < 65 years), sex (female or male), smoking status (ever or never smoker), treatment (surgery alone or chemotherapy), and differentiation (well-moderately differentiated or poorly differentiated/undifferentiated), MPIS was associated with OS in most subgroups (Additional file 1: Figures S4–S8).
In addition, MPIS was significantly higher in the poorly differentiated/undifferentiated group than in the well-moderately differentiated group in the discovery set (t = −7.02; P < 0.0001), external validation set V1 (t = −2.19; P = 0.0314), and external validation set V2 (t = −2.61; P = 0.0104). The violin plots in Fig. 4 show the distribution of MPIS in the well-moderately differentiated and poorly differentiated/undifferentiated LUAD patient groups.
Evaluation and validation of the full model
Using stepwise regression based on the AIC, MPIS, differentiation, and TNM stage were identified as independent prognostic factors (Table 2). In the discovery set, we built the full model incorporating these independent factors and the clinical model incorporating the two clinicopathological variables (i.e., differentiation and TNM stage). The C-index of the full model (0.837; 95% CI 0.784–0.890; Table 3) was higher than that of the clinical model (0.798; 95% CI 0.729–0.867), and the AIC of the full model was lower (235.991 vs. 244.905; Table 3). Therefore, the full model showed better discrimination and goodness of fit than the clinical model. Moreover, adding MPIS to the clinical model significantly improved the prediction of OS (likelihood ratio test, P = 0.0010; Table 3). Time-dependent ROC curves at 60 months and time-dependent AUC curves at different time points are shown in Fig. 5a. The full model (AUC, 0.890; 95% CI, 0.822–0.958; for 5-year OS) showed significantly improved predictive performance compared with the clinical model (AUC, 0.843; 95% CI, 0.759–0.927; for 5-year OS). Furthermore, we visualized the full model and the clinical model as nomograms to facilitate clinical application of the full model (Additional file 1: Figure S9).
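Because AIC = 2k - 2 log L, the likelihood ratio statistic for two nested models fit to the same patients can be recovered from their AIC values: LR = (AIC_reduced - AIC_full) + 2 times the number of extra coefficients. The sketch below assumes that the full and clinical models in the discovery set differ by exactly one coefficient (MPIS), so the statistic has one degree of freedom; the function name is hypothetical.

```python
import math


def lr_test_one_df(aic_reduced, aic_full):
    """Likelihood ratio statistic and P-value for nested models differing by one coefficient."""
    lr = (aic_reduced - aic_full) + 2.0    # 2 * (log L_full - log L_reduced)
    p = math.erfc(math.sqrt(lr / 2))       # upper tail of a chi-square with 1 df
    return lr, p
```

With the discovery-set AICs in Table 3 (244.905 for the clinical model and 235.991 for the full model), it gives LR ≈ 10.91 and P ≈ 0.001, in line with the reported P = 0.0010.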
We further validated the performance of the full model in the independent external validation sets (Table 3). In external validation sets V1 and V2, the full model showed better discrimination and goodness of fit than the clinical model (V1: C-index, 0.704 vs. 0.679; likelihood ratio test, P < 0.0001; AIC, 219.568 vs. 222.908; V2: C-index, 0.728 vs. 0.666; P < 0.0001; AIC, 307.537 vs. 313.815). As shown in Figs. 5b, c, the time-dependent AUC curves showed that the full model performed better at every time point in these two sets (V1: AUC, 0.732 vs. 0.708 for 5-year OS; V2: 0.789 vs. 0.658 for 3-year OS). Because information on tumor differentiation was unavailable in external validation set V3, the full model was built there with two variables (i.e., TNM stage and MPIS) and the clinical model with one variable (i.e., TNM stage). The full model still outperformed the clinical model in terms of discrimination and goodness of fit (C-index, 0.696 vs. 0.669; AIC, 717.869 vs. 722.453; likelihood ratio test, P < 0.0001; AUC, 0.706 vs. 0.671 for 3-year OS; Table 3, Fig. 5d).
To further demonstrate the incremental value of MPIS, we also selected features at each individual scale, calculated the corresponding single-scale pathology image signatures, and constructed single-scale models, namely a 2.5 × model, a 10 × model, and a 40 × model. The features selected at each scale and their corresponding coefficients are detailed in Additional file 1: Tables S3–S5. The single-scale texture signatures at 2.5 ×, 10 ×, and 40 × magnifications were associated with OS in the discovery set and the three external validation sets (Additional file 1: Figures S10–S12). Compared with the single-scale models, the full model still had a higher AUC at most time points (Additional file 1: Figure S13).
The transcriptomic data consisted of 19,645 annotated genes across TCGA-LUAD. We performed differential gene expression analysis, and found 194 DEGs between the MPIS-defined high-risk and low-risk groups. These DEGs identified 16 significant biological pathways through GO enrichment analysis. These significant pathways were involved in cytokine activity, cell proliferation, metabolism, growth, division, and extracellular matrix structure, and they were considered to be correlated with the growth and development of tumors. Specifically, DEGs showed significant enrichment in biological pathways such as humoral immune response, regulation of peptidase activity, signal release, and extracellular structure organization (Additional file 1: Figure S14). The full list of DEGs and pathways is presented in Additional file 2. Furthermore, we evaluated the associations between the gene expression of biological pathways and the image-derived texture features with ssGSEA. We used 16 biological pathways to calculate the enrichment scores for each of the eight texture features used to construct the MPIS. As shown in Fig. 6, the texture features of the tumor region derived from histopathological images (i.e., glrlm_SRLGLE_90_2.5, glcm_ASM_0_2.5, and Percentile_10th_40) were significantly associated with biological pathways such as extracellular structure organization, structural constituent of cytoskeleton, hormone activity, and extracellular matrix structural constituent.
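To illustrate what an ssGSEA enrichment score measures, the sketch below follows a common formulation: genes are ranked by expression within one sample, a rank-weighted running sum is accumulated for genes in the set and an unweighted one for genes outside it, and the score is the sum of their differences over all ranks. The weight exponent (alpha = 0.25) and the absence of normalization across samples are assumptions; the software settings used for ssGSEA in this study are not stated here, and the function name is hypothetical.

```python
import numpy as np


def ssgsea_score(expression, gene_names, gene_set, alpha=0.25):
    """Unnormalized single-sample enrichment score of one gene set in one sample."""
    order = np.argsort(-np.asarray(expression, dtype=float), kind="stable")  # highest first
    ranked = [gene_names[i] for i in order]
    n = len(ranked)
    weights = np.arange(n, 0, -1, dtype=float) ** alpha  # top-ranked gene gets the largest weight
    in_set = np.array([g in gene_set for g in ranked])
    n_hit = int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, measured genes")
    p_hit = np.cumsum(np.where(in_set, weights, 0.0)) / weights[in_set].sum()
    p_miss = np.cumsum(~in_set) / (n - n_hit)
    return float(np.sum(p_hit - p_miss))
```

A gene set concentrated among the most highly expressed genes of a sample receives a positive score, and one concentrated among the least expressed genes receives a negative score, which is the sense in which the score reflects coordinated up- or downregulation.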
Accurate prognosis for resectable LUAD could guide clinical decision-making and improve risk stratification. Although morphological examination of tumors in routine histopathological slides by pathologists can help predict cancer behavior, manual review cannot quantify sub-visual features of tumors. In this study, we developed a fully automated pipeline that analyzes the tumor and its microenvironment by extracting multi-scale texture information from the tumor region in H&E-stained WSIs. We used this texture information to construct MPIS and evaluated its ability to predict OS in patients with resectable LUAD. The results demonstrated that MPIS was an independent prognostic factor for OS. Moreover, integrating MPIS with clinicopathological variables improved prognostic stratification in resectable LUAD. In addition, the image-derived multi-scale texture features were associated with biological pathways affecting tumor development. We evaluated the prognostic model in four independent cohorts, including large multi-institutional data from the TCGA cohort. MPIS was an independent prognostic factor in all four cohorts, even though the cohorts differed significantly in clinical characteristics (Table 1). We also observed significant stratification in most subgroups (Additional file 1: Figures S2–S8). This suggests that MPIS is a robust prognostic biomarker of OS in resectable LUAD that can be generalized to other centers.
In recent years, many histopathological biomarkers have been developed for the prognosis of patients with lung cancer. For instance, Yu et al. and Chen et al. employed CellProfiler [25,26,27] software to quantitatively measure cellular phenotypes in histopathological images and correlated these features with prognosis. Several studies [28,29,30] captured cellular-level feature descriptors from segmented nuclei to predict prognosis in early-stage non-small cell lung cancer. In addition, Wang et al. have provided insights into the relationship between tumor shape and prognosis in patients with LUAD. However, most of these potential biomarkers focus on single-scale information, at either the cellular level or the tissue level of histopathological images. In contrast, this study leveraged multi-scale texture features from tumor regions to construct an image signature for predicting OS in LUAD patients. The motivation for quantifying multi-scale texture features came from the routine examination of histopathological slides by pathologists, who generally first survey the whole slide at low magnification (tissue level) and then selectively examine morphological features at high magnification (cellular level). Specifically, a 2.5 × magnification image contains global information about the whole tumor, a 10 × magnification image captures the characteristics of the tumor region at the tissue level, and a 40 × magnification image captures tumor features at the cellular level. Compared with single-scale texture signatures, MPIS improved prognostic stratification in resectable LUAD, and the full model integrating MPIS and clinicopathological variables had better predictive power (Additional file 1: Figure S13). This seems to indicate that MPIS can effectively capture multi-scale information, from the cellular level to the tissue level, in histopathological images and can comprehensively characterize tumor morphology.
Over the past few years, various deep learning approaches have been proposed to quantify tumors and their surrounding microenvironment, yielding potential prognostic biomarkers based on deep features [32,33,34]. For example, Coudray et al. demonstrated that deep learning models could assist pathologists in automatically detecting cancer subtypes or gene mutations. Shi et al. proposed an efficient and labor-saving deep learning method that provides a valuable means of patient risk stratification. Nevertheless, such approaches allow only subjective, hypothetical explanations based on slide-by-slide qualitative assessment and cannot objectively connect deep features to biological phenomena, even though class activation maps [35, 36] can visualize image regions of interest in end-to-end CNN models.
In contrast, our approach relates image features directly to biological concepts and provides interpretability from both histopathological and genomic perspectives. On the one hand, we traced the selected texture features back to observable histopathological patterns to reduce the risk of spurious correlations. Specifically, we observed significant differences in the distribution of MPIS between the well-moderately differentiated and poorly differentiated/undifferentiated groups (Fig. 4), suggesting a significant association between MPIS and tumor differentiation as assessed by pathologists. For example, the abundance and spatial distribution of tumor cells and the growth pattern of the stroma might be reflected in the texture features of WSIs, and MPIS could discriminate the degree of tumor differentiation by quantifying these texture features. Furthermore, the selected multi-scale texture features might be directly related to biological phenomena because they quantify phenotypic information in histopathological images, which makes them interpretable for investigators. More specifically, the feature glrlm_SRLGLE (short run low gray level emphasis) emphasizes short runs of consecutive low-gray-level pixels in an image. In histopathological images, a larger glrlm_SRLGLE value might reflect a sparser distribution of cells in the tissue, which might indicate a lepidic or acinar growth pattern of LUAD (Fig. 2a, b). The feature glcm_ASM (angular second moment) measures the gray-level uniformity of an image, with larger values indicating greater uniformity. As shown in Fig. 2g, the bottom image had a higher glcm_ASM value than the top image; its tissues and cells appear more densely packed, and the tumor growth pattern appears solid.
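The run-length features discussed in this paragraph can be illustrated with a small GLRLM sketch. It counts horizontal runs (0°; passing the transposed image gives 90°) and uses the common definitions of short run emphasis (SRE), short run low gray level emphasis (SRLGLE), and long run high gray level emphasis (LRHGLE) with gray levels numbered from 1. The input is assumed to be already quantized to integer levels, and the quantization and normalization used in the study (Additional file 1: Section 2) may differ from these assumptions.

```python
import numpy as np


def glrlm(q, levels):
    """Gray level run length matrix along rows (0 degrees); pass q.T for the 90-degree direction."""
    max_run = q.shape[1]
    m = np.zeros((levels, max_run))  # rows: gray level, columns: run length - 1
    for row in q:
        run_value, run_len = row[0], 1
        for v in row[1:]:
            if v == run_value:
                run_len += 1
            else:
                m[run_value, run_len - 1] += 1  # close the finished run
                run_value, run_len = v, 1
        m[run_value, run_len - 1] += 1          # close the last run of the row
    return m


def glrlm_features(m):
    """SRE, SRLGLE, and LRHGLE, each normalized by the total number of runs."""
    i = np.arange(1, m.shape[0] + 1)[:, None]  # gray levels counted from 1
    j = np.arange(1, m.shape[1] + 1)[None, :]  # run lengths
    n_runs = m.sum()
    return {
        "SRE": float((m / j ** 2).sum() / n_runs),
        "SRLGLE": float((m / (i ** 2 * j ** 2)).sum() / n_runs),
        "LRHGLE": float((m * i ** 2 * j ** 2).sum() / n_runs),
    }
```

Because SRLGLE divides each run count by the squared gray level and the squared run length, it is dominated by short runs of dark (low-gray-level) pixels, whereas LRHGLE is dominated by long runs of bright pixels.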
On the other hand, we also used histo-genomics analysis to investigate the biological pathways that might drive tumor development, which further explains the texture features from a genomics perspective. In this study, the selected texture features were associated with significant biological pathways affecting tumor development. For instance, the extracellular matrix structural constituent pathway was significantly associated with the features glrlm_SRLGLE_90_2.5 and Percentile_10th_40. Gene expression in these pathways has been shown to affect tumors and their microenvironment, possibly suggesting that stromal tissue structure influences the texture distribution of the tumor region. Moreover, the cellular microenvironment constantly regulates cell growth, apoptosis, and differentiation through cytoskeletal remodeling. We found a significant correlation between the structural constituent of cytoskeleton pathway and image-derived texture features such as glrlm_SRLGLE_90_40, suggesting that these texture features might be driven by pathways related to cellular apoptosis and differentiation. Cytokine activity [39, 40], which reflects the survival, growth, differentiation, and effector function of tissues and cells, could be another latent factor affecting the texture distribution of the tumor region.
This study had some limitations. First, it was based on retrospective cohorts and may therefore be susceptible to bias related to certain risk variables and to loss to follow-up. In the future, we will further validate our model in larger cohorts or in a prospective study. Second, because MPIS was developed and validated with data from different institutions, some relevant demographic parameters were unavailable in some datasets. Third, this study employed a transfer-learning-based deep learning method to segment the tumor region; however, pathologists still had to annotate a small number of slides to fine-tune the segmentation model and improve its performance. In the future, we will use weakly supervised or unsupervised learning models for quantitative analysis to minimize the annotation workload of pathologists.
In summary, we developed and validated MPIS, which successfully stratified patients with resectable LUAD into high-risk and low-risk groups with significant differences in OS. MPIS was an independent prognostic factor for OS, and integrating MPIS with clinicopathological variables improved the prognostic stratification of patients with resectable LUAD. The study demonstrated that MPIS is a comprehensive, robust, and interpretable predictor that could contribute to precision oncology by helping to improve individualized postoperative care.
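The stratification and discrimination claims above rest on a few simple computations, sketched here under stated assumptions: a signature score computed as a linear combination of selected features weighted by LASSO Cox coefficients (the study's actual features and coefficients are listed in Table S2 and are not reproduced here), a split into high- and low-risk groups at a cutoff (the median default is a placeholder assumption, not necessarily the study's threshold), and Harrell's C-index for comparing models. This simplified C-index skips pairs with tied survival times and assumes at least one comparable pair exists.

```python
import numpy as np


def signature_score(features, coefs):
    """Linear risk score: each row of `features` (patients x features)
    dotted with a coefficient vector. Coefficients here are hypothetical."""
    return np.asarray(features, dtype=float) @ np.asarray(coefs, dtype=float)


def stratify(score, cutoff=None):
    """Label patients 'high' or 'low' risk; defaults to a median cutoff
    (an assumption for illustration)."""
    score = np.asarray(score, dtype=float)
    cutoff = np.median(score) if cutoff is None else cutoff
    return np.where(score > cutoff, "high", "low")


def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the patient who died earlier
    has the higher risk score. A pair (i, j) is comparable when
    time[i] < time[j] and patient i had an observed event; ties in risk
    count as 0.5."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # censored patients cannot anchor a comparable pair
        later = time > time[i]  # patients known to survive longer than i
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())
    return concordant / comparable
```

Computing `harrell_c_index` for a clinical-only score and for a score that also includes MPIS on the same patients mirrors the C-index comparisons reported for the discovery and validation sets; in practice, a maintained survival-analysis library would normally be preferred to this hand-rolled version.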
Availability of data and materials
The histopathology images and clinical information of the TCGA cohort are available in a public repository from the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/). All other data supporting the findings of this study are available from the corresponding authors upon reasonable request. The source code for survival analysis can be accessed online: https://github.com/YuMeng-W/MPIS-LUNG.
Abbreviations
H&E: Hematoxylin and eosin
WSI: Whole slide image
MPIS: Multi-scale pathology image texture signature
Guangdong Provincial People's Hospital
Yunnan Cancer Hospital
Shanxi Provincial Cancer Hospital
TCGA: The Cancer Genome Atlas
GLCM: Gray level co-occurrence matrix
GLRLM: Gray level run length matrix
LASSO: Least absolute shrinkage and selection operator
mRNA: Messenger ribonucleic acid
DEG: Differentially expressed gene
ssGSEA: Single-sample gene set enrichment analysis
AIC: Akaike information criterion
ROC: Receiver operating characteristic curve
AUC: Area under the curve
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality Worldwide for 36 cancers in 185 Countries. CA Cancer J Clin. 2021;71:209–49. https://doi.org/10.3322/caac.21660.
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin. 2022;72:7–33. https://doi.org/10.3322/caac.21708.
Thai AA, Solomon BJ, Sequist LV, Gainor JF, Heist RS. Lung cancer. Lancet. 2021;398:535–54. https://doi.org/10.1016/S0140-6736(21)00312-3.
Ettinger DS, Wood DE, Aisner DL, Akerley W, Bauman JR, Bharat A, et al. Non-small cell lung cancer, version 3.2022, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2022;20:497–530. https://doi.org/10.6004/jnccn.2022.0025.
Uramoto H, Tanaka F. Recurrence after surgery in patients with NSCLC. Transl Lung Cancer Res. 2014;3:242–9. https://doi.org/10.3978/j.issn.2218-6751.2013.12.05.
Amin MB, Greene FL, Edge SB, Compton CC, Gershenwald JE, Brookland RK, et al. The Eighth Edition AJCC Cancer Staging Manual: continuing to build a bridge from a population-based to a more “personalized” approach to cancer staging. CA Cancer J Clin. 2017;67:93–9. https://doi.org/10.3322/caac.21388.
Jamal-Hanjani M, Wilson GA, McGranahan N, Birkbak NJ, Watkins TBK, Veeriah S, et al. Tracking the evolution of non–small-cell lung cancer. N Engl J Med. 2017;376:2109–21. https://doi.org/10.1056/NEJMoa1616288.
Bremnes RM, Dønnem T, Al-Saad S, Al-Shibli K, Andersen S, Sirera R, et al. The role of tumor stroma in cancer progression and prognosis: emphasis on carcinoma-associated fibroblasts and non-small cell lung cancer. J Thorac Oncol. 2011;6:209–17. https://doi.org/10.1097/JTO.0b013e3181f8a1bd.
McAllister SS, Weinberg RA. Tumor-host interactions: a far-reaching relationship. J Clin Oncol. 2010;28:4022–8. https://doi.org/10.1200/JCO.2010.28.4257.
Fidler IJ. The pathogenesis of cancer metastasis: the “seed and soil” hypothesis revisited. Nat Rev Cancer. 2003;3:453–8. https://doi.org/10.1038/nrc1098.
Bhargava R, Madabhushi A. Emerging themes in image informatics and molecular analysis for digital pathology. Annu Rev Biomed Eng. 2016;18:387–412. https://doi.org/10.1146/annurev-bioeng-112415-114722.
Hipp J, Flotte T, Monaco J, Cheng J, Madabhushi A, Yagi Y, et al. Computer aided diagnostic tools aim to empower rather than replace pathologists: lessons learned from computational chess. J Pathol Inform. 2011;2:25. https://doi.org/10.4103/2153-3539.82050.
Yu KH, Zhang C, Berry GJ, Altman RB, Ré C, Rubin DL, et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474. https://doi.org/10.1038/ncomms12474.
Luo X, Zang X, Yang L, Huang J, Liang F, Rodriguez-Canales J, et al. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J Thorac Oncol. 2017;12:501–9. https://doi.org/10.1016/j.jtho.2016.10.017.
Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24:1559–67. https://doi.org/10.1038/s41591-018-0177-5.
Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195. https://doi.org/10.1186/s12916-019-1426-2.
He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE; 2016. p. 770–778. https://doi.org/10.1109/CVPR.2016.90.
Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318:2199–210. https://doi.org/10.1001/jama.2017.14585.
Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9:62–6. https://doi.org/10.1109/TSMC.1979.4310076.
Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, et al. A method for normalizing histology slides for quantitative analysis. 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro. Boston, MA, USA: IEEE; 2009. p. 1107–1110. https://doi.org/10.1109/ISBI.2009.5193250.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9. https://doi.org/10.1038/75556.
Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462:108–12. https://doi.org/10.1038/nature08460.
Ihaka R, Gentleman R. R: a Language for Data Analysis and Graphics. J Comput Graph Stat. 1996;5:299–314. https://doi.org/10.1080/10618600.1996.10474713.
Chen L, Zeng H, Xiang Y, Huang Y, Luo Y, Ma X. Histopathological images and multi-omics integration predict molecular characteristics and survival in lung adenocarcinoma. Front Cell Dev Biol. 2021;9:720110. https://doi.org/10.3389/fcell.2021.720110.
McQuin C, Goodman A, Chernyshev V, Kamentsky L, Cimini BA, Karhohs KW, et al. CellProfiler 3.0: next-generation image processing for biology. PLoS Biol. 2018;16:e2005970. https://doi.org/10.1371/journal.pbio.2005970.
Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, et al. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biol. 2006;7:R100. https://doi.org/10.1186/gb-2006-7-10-r100.
Kamentsky L, Jones TR, Fraser A, Bray MA, Logan DJ, Madden KL, et al. Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software. Bioinformatics. 2011;27:1179–80. https://doi.org/10.1093/bioinformatics/btr095.
Corredor G, Wang X, Zhou Y, Lu C, Fu P, Syrigos K, et al. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin Cancer Res. 2019;25:1526–34. https://doi.org/10.1158/1078-0432.CCR-18-2013.
Lu C, Bera K, Wang X, Prasanna P, Xu J, Janowczyk A, et al. A prognostic model for overall survival of patients with early-stage non-small cell lung cancer: a multicentre, retrospective study. Lancet Digit Health. 2020;2:e594–606. https://doi.org/10.1016/S2589-7500(20)30225-9.
Wang X, Bera K, Barrera C, Zhou Y, Lu C, Vaidya P, et al. A prognostic and predictive computational pathology image signature for added benefit of adjuvant chemotherapy in early stage non-small-cell lung cancer. eBioMedicine. 2021;69:103481. https://doi.org/10.1016/j.ebiom.2021.103481.
Wang S, Chen A, Yang L, Cai L, Xie Y, Fujimoto J, et al. Comprehensive analysis of lung cancer pathology images to discover tumor shape and boundary features that predict survival outcome. Sci Rep. 2018;8:10393. https://doi.org/10.1038/s41598-018-27707-4.
Shim WS, Yim K, Kim TJ, Sung YE, Lee G, Hong JH, et al. DeepRePath: identifying the prognostic features of early-stage lung adenocarcinoma using multi-scale pathology images and deep convolutional neural networks. Cancers. 2021;13:3308. https://doi.org/10.3390/cancers13133308.
Shi JY, Wang X, Ding GY, Dong Z, Han J, Guan Z, et al. Exploring prognostic indicators in the pathological images of hepatocellular carcinoma based on deep learning. Gut. 2021;70:951–61. https://doi.org/10.1136/gutjnl-2020-320930.
Kather JN, Pearson AT, Halama N, Jäger D, Krause J, Loosen SH, et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat Med. 2019;25:1054–6. https://doi.org/10.1038/s41591-019-0462-y.
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning Deep Features for Discriminative Localization. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE; 2016. p. 2921–2929. https://doi.org/10.1109/CVPR.2016.319.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE; 2017. p. 618–626. https://doi.org/10.1109/ICCV.2017.74.
Wang S, Rong R, Yang DM, Fujimoto J, Yan S, Cai L, et al. Computational staining of pathology images to study the tumor microenvironment in lung cancer. Cancer Res. 2020;80:2056–66. https://doi.org/10.1158/0008-5472.CAN-19-1629.
Park JS, Burckhardt CJ, Lazcano R, Solis LM, Isogai T, Li L, et al. Mechanical regulation of glycolysis via cytoskeleton architecture. Nature. 2020;578:621–6. https://doi.org/10.1038/s41586-020-1998-1.
Zhang J, Li H, Wu Q, Chen Y, Deng Y, Yang Z, et al. Tumoral NOX4 recruits M2 tumor-associated macrophages via ROS/PI3K signaling-dependent various cytokine production to promote NSCLC growth. Redox Biol. 2019;22:101116. https://doi.org/10.1016/j.redox.2019.101116.
Lu CS, Shiau AL, Su BH, Hsu TS, Wang CT, Su YC, et al. Oct4 promotes M2 macrophage polarization through upregulation of macrophage colony-stimulating factor in lung cancer. J Hematol Oncol. 2020;13:62. https://doi.org/10.1186/s13045-020-00887-1.
We sincerely thank Chao Zhang (Guangdong Provincial People’s Hospital) for genomics consultation.
This research was supported by the Key-Area Research and Development Program of Guangdong Province, China [No. 2021B0101420006]; National Science Fund for Distinguished Young Scholars of China [No. 81925023]; National Science Foundation for Young Scientists of China [No. 62002082, 62102103, 82001986, 82102034]; National Natural Science Foundation of China [No. 82072090, 61866009, 82272075]; China Postdoctoral Science Foundation [No. 2021M690753, 2021M700897, 2022M710843]; Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application [No. 2022B1212010011]; High-level Hospital Construction Project [No. DFJHBF202105]; Guangxi Natural Science Foundation [No. 2020GXNSFBA238014, 2020GXNSFAA297061]; Guangxi Key Research and Development Project [No. AB21220037]; Yunnan digitalization, development and application of biotic resource [No. 202002AA100007]; the Outstanding Youth Science Foundation of Yunnan Basic Research Project [No. 202101AW070001]; Yunnan Fundamental Research Projects [No. 202201AT070010]; Innovation Team of Kunming Medical University [No. CXTD202110]; Science and Technology Projects in Guangzhou [No. 202201020001, 202201010513]; and Regional Innovation and Development Joint Fund of National Natural Science Foundation of China [No. U22A20345].
Ethics approval and consent to participate
The study was approved by the Research Ethics Committee of Guangdong Provincial People's Hospital, the Institutional Review Board of Yunnan Cancer Hospital, and the Ethics Committee of Shanxi Provincial Cancer Hospital (approval numbers: KY-Z-2021-030-02, KY2020139, and 202106, respectively). Informed consent was waived because only retrospective image analysis was performed.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Section 1. Inclusion and exclusion criteria.
Section 2. Texture feature definition.
Figure S1. Data preparation and demographics of all cohorts.
Figure S2. Kaplan–Meier curves of patients stratified by MPIS in the subgroup of patients with TNM stage I LUAD in the (a) discovery set; (b) external validation set V1; (c) external validation set V2; (d) external validation set V3.
Figure S3. Kaplan–Meier curves of patients stratified by MPIS in the subgroup of patients with early-stage LUAD in the (a) discovery set; (b) external validation set V1; (c) external validation set V2; (d) external validation set V3.
Figure S4. Kaplan–Meier curves of patients stratified by MPIS in the subgroups: (a) patients with age < 65 years in the discovery set; (b) patients with age ≥ 65 years in the discovery set; (c) patients with age < 65 years in the external validation set V1; (d) patients with age ≥ 65 years in the external validation set V1; (e) patients with age < 65 years in the external validation set V2; (f) patients with age ≥ 65 years in the external validation set V2; (g) patients with age < 65 years in the external validation set V3; (h) patients with age ≥ 65 years in the external validation set V3.
Figure S5. Kaplan–Meier curves of patients stratified by MPIS in the subgroups: (a) male sex in the discovery set; (b) female sex in the discovery set; (c) male sex in the external validation set V1; (d) female sex in the external validation set V1; (e) male sex in the external validation set V2; (f) female sex in the external validation set V2; (g) male sex in the external validation set V3; (h) female sex in the external validation set V3.
Figure S6. Kaplan–Meier curves of patients stratified by MPIS in the subgroups: (a) non-smokers in the discovery set; (b) smokers in the discovery set; (c) non-smokers in the external validation set V1; (d) smokers in the external validation set V1; (e) non-smokers in the external validation set V2; (f) smokers in the external validation set V2; (g) non-smokers in the external validation set V3; (h) smokers in the external validation set V3.
Figure S7. Kaplan–Meier curves of patients stratified by MPIS in the subgroups: (a) patients without adjuvant chemotherapy in the discovery set; (b) patients who received adjuvant chemotherapy in the discovery set; (c) patients without adjuvant chemotherapy in the external validation set V1; (d) patients who received adjuvant chemotherapy in the external validation set V1; (e) patients without adjuvant chemotherapy in the external validation set V2; (f) patients who received adjuvant chemotherapy in the external validation set V2; (g) patients without adjuvant chemotherapy in the external validation set V3; (h) patients who received adjuvant chemotherapy in the external validation set V3.
Figure S8. Kaplan–Meier curves of patients stratified by MPIS in the subgroups: (a) patients with well-moderately differentiated cancer in the discovery set; (b) patients with poorly-undifferentiated cancer in the discovery set; (c) patients with well-moderately differentiated cancer in the external validation set V1; (d) patients with poorly-undifferentiated cancer in the external validation set V1; (e) patients with well-moderately differentiated cancer in the external validation set V2; (f) patients with poorly-undifferentiated cancer in the external validation set V2.
Figure S9. Nomograms visualizing the full model (a) and the clinical model (b) for patients with resectable LUAD.
Figure S10. Kaplan–Meier curves of patients stratified by the single-scale pathology image signature at 2.5× magnification in the (a) discovery set, (b) external validation set V1, (c) external validation set V2, and (d) external validation set V3.
Figure S11. Kaplan–Meier curves of patients stratified by the single-scale pathology image signature at 10× magnification in the (a) discovery set, (b) external validation set V1, (c) external validation set V2, and (d) external validation set V3.
Figure S12. Kaplan–Meier curves of patients stratified by the single-scale pathology image signature at 40× magnification in the (a) discovery set, (b) external validation set V1, (c) external validation set V2, and (d) external validation set V3.
Figure S13. Time-dependent ROC curves and AUC curves of models in the (a) discovery set, (b) external validation set V1, (c) external validation set V2, and (d) external validation set V3. Time-dependent ROC curves are evaluated for 5-year OS (or for 3-year OS), and time-dependent AUC curves are plotted for 12 to 60 months (or 12 to 36 months).
Figure S14. Significantly enriched biological pathways in Gene Ontology (GO) enrichment analysis.
Table S1. The specific definitions of the selected texture features.
Table S2. The LASSO Cox selected features and corresponding coefficients used to construct the MPIS.
Table S3. The LASSO Cox selected features and corresponding coefficients used to construct the single-scale pathology image signature at 2.5× magnification.
Table S4. The LASSO Cox selected features and corresponding coefficients used to construct the single-scale pathology image signature at 10× magnification.
Table S5. The LASSO Cox selected features and corresponding coefficients used to construct the single-scale pathology image signature at 40× magnification.
The full list of DEGs and pathways.
Cite this article
Wang, Y., Pan, X., Lin, H. et al. Multi-scale pathology image texture signature is a prognostic factor for resectable lung adenocarcinoma: a multi-center, retrospective study. J Transl Med 20, 595 (2022). https://doi.org/10.1186/s12967-022-03777-x