

Glandular orientation and shape determined by computational pathology could identify aggressive tumor for early colon carcinoma: a triple-center study



Distinguishing early-stage colon adenocarcinoma (ECA) patients with lower-risk disease from those with higher-risk disease could improve prognosis. Our study aimed to explore whether glandular morphological features, extracted by computational pathology from digitized H&E images, could identify high-risk cancer in ECA.


A total of 532 ECA patients from 2 independent data centers, together with 113 patients from The Cancer Genome Atlas (TCGA), were retrospectively enrolled in this study. Four tissue microarrays (TMAs) were constructed from ECA hematoxylin and eosin (H&E) stained slides. A total of 797 quantitative glandular morphometric features were extracted, and the 5 most prognostic features were identified using minimum redundancy maximum relevance to construct an image classifier. The image classifier was evaluated on the cohorts D2/D3 (n = 223), D4 (n = 46) and D5 (n = 113). Ki67 expression and serum CEA levels were scored on D3 to explore the correlations between the image classifier, immunohistochemistry data and serum CEA levels. The prognostic value of the clinicopathological data and of the ECA histomorphometric-based image classifier (ECAHBC) was evaluated by univariate and multivariate analyses.


The image classifier predicted ECA recurrence with an accuracy of 88.1%. The ECA histomorphometric-based image classifier (ECAHBC) was an independent prognostic factor for poorer disease-specific survival (DSS; HR = 9.65, 95% CI 2.15–43.12, P = 0.003). ECAHBC positivity correlated significantly with a positive Ki67 labeling index (Ki67Li) and with elevated serum CEA.


Glandular orientation and shape could identify high-risk cancer in ECA and contribute to precision oncology. Computational pathology is emerging as a viable and objective means of identifying predictive biomarkers for cancer patients.


Colon cancer is one of the most common cancer types and causes of cancer-related death worldwide [1], with 80% of cases being colon adenocarcinoma (CA). Detection of early-stage colon adenocarcinoma (T1N0M0–T4N0M0) could improve survival rates and prognosis [2]. Currently, ECA is primarily treated with radical colon resection or endoscopic resection, with or without adjuvant radiation and/or chemotherapy. According to a survey from the American Joint Committee on Cancer [3], the 5-year survival rates are as follows: stage I (T1-2N0), 93%; stage IIA (T3N0), 85%; stage IIB (T4N0), 72%. The overall recurrence rate of ECA is less than 20% [4]. Once the tumor relapses, survival time is significantly shortened. This has prompted efforts to use clinicopathologic and molecular features to select patients with higher-risk early-stage disease, who have a greater risk of recurrence and might derive a greater absolute benefit from adjuvant chemotherapy. Most trials have aimed to identify molecular signatures that provide an accurate and personalized assessment of the risk of relapse, such as the 12-gene recurrence score (Oncotype-DX Colon Cancer Assay) [5,6,7], the 18-gene classifier (ColoPrint) [8, 9], the 13-gene classifier (ColoGuideEx) [10] and other micro-based tests [11]. Other biomarkers, such as Ki-67 [12] and p53 [13, 14], have been postulated to play a role in the clinical and therefore therapeutic aspects of the disease. However, these assays tend to be expensive and tissue destructive. Currently, pathological analysis of haematoxylin and eosin (H&E) stained sections is still the gold standard for prognostic assessment of CA and other types of cancer. However, these assessments often suffer from intra-observer and inter-observer variability [15, 16].

Much recent research has focused on mining quantitative morphological features from diagnostic pathology slides, which has proved to be an effective way to alleviate intra-observer and inter-observer variability, by analyzing digital pathology images in the context of cancer grading [17, 18], risk stratification [19,20,21,22], and tumor outcome prediction [20, 21, 23,24,25,26]. Wang et al. [19] presented an image classifier using nuclear orientation, texture, shape and tumor architecture to predict disease recurrence in early-stage non-small cell lung cancer from digitized H&E images. Yu et al. [24] extracted 9879 quantitative image features and used regularized machine-learning methods to select the top features and stratify patients into long-term vs. short-term survivors.

Glands are important histological structures composed of a single sheet of columnar epithelium, forming a finger-like tubular structure that extends from the inner surface of the colon into the underlying connective tissue [27]. A typical gland is composed of a lumen area forming the interior tubular structure and epithelial cell nuclei surrounding the cytoplasm. Within malignant tumors, irregular, degenerated gland morphology is widely used in routine histopathological examination to assess the degree of malignancy of breast [28], prostate [29, 30], and colon [31] cancer. Considerable evidence indicates that genetic instability can manifest as diversity in gland shape, size [28, 32,33,34] and polarity [35, 36], which play a key role in tumor metastasis, proliferation and recurrence.

Gland histomorphometric features, including gland shape, size, orientation and spatial relationships, have been shown to play an important role in cancer grading and prognosis [18, 28, 36, 37]. However, to the best of our knowledge, quantitative analysis of gland morphology has not been reported in the colon cancer literature. In this paper, we aimed to investigate whether a machine learning risk score based on computer-extracted gland morphologic features, related to gland orientation, shape and size, could distinguish aggressive from indolent tumors in ECA. A total of 797 gland morphometric features were captured and used to build a quantitative histomorphometry model that stratifies ECA patients into different recurrence risk groups. Finally, the labels predicted by the image classifier were validated on independent validation cohorts, as well as on the TCGA cohort, and compared with human grading on D1 and D2 along with immunohistochemical data and serum CEA levels. Our methods may ultimately provide prognostic information for patients and contribute to precision medicine in colon cancer. The overall schema of the proposed method is shown in Fig. 1.

Fig. 1

The overall schema of the proposed method. The overall workflow consists of model construction, recurrence prediction, survival analysis and immunohistochemical and CEA validation. C+ recurrence, C− non-recurrence, CEA carcinoembryonic antigen


Study population and TMA construction

With approval from the ethical committees of Wuhan University Renmin Hospital (WRH) and The Central Hospital of Wuhan (WCH), four TMAs (TMA107–TMA110) were constructed, as described in Additional file 1: File S1, from FFPE tissue samples from the 2 independent data centers, representing a total of 532 patients (486 from WRH and 46 from WCH) between January 2000 and December 2011. The recurrence period ran from surgery to the diagnosis of recurrence or the final follow-up. Disease-specific survival time ran from surgery to death or the end of follow-up. Follow-up ended on December 31st, 2017. The patient selection workflow is shown in Additional file 1: Figure S1. In all cohorts, the inclusion criteria were: pathologically confirmed CA of stage T1N0M0–T4N0M0 according to the AJCC/UICC TNM staging system, 8th edition; radical (R0) resection of the primary tumor; and complete clinicopathological data. Patients who had other primary malignant tumors, received chemotherapy and/or immunotherapy before surgery, or underwent palliative surgery were excluded. The TCGA cohort was also included for validation. For the TCGA cohort, the inclusion criteria were pathologically confirmed CA of clinical stage I or II according to the AJCC/UICC TNM staging system, 8th edition, with complete clinicopathological data.

Image processing and model construction

Individual glands were automatically segmented as described previously [38, 39]. A total of 797 gland histomorphometric features were calculated. A summary of the gland morphometric features used in this study is shown in Additional file 1: Table S1. A comprehensive list of all 797 quantitative features can be found in Additional file 1: Table S2. Additional file 1: File S2 describes the technical and mathematical details of gland shape/size feature extraction.

Minimum redundancy maximum relevance (MRMR) [40] was employed to identify the most informative features from D1. Only the 5 top-ranked gland morphological features were included in model construction to avoid overfitting. A support vector machine (SVM), a random forest (RF) and a discriminant analysis classifier (DAC) were used to construct supervised machine learning classifiers for discriminating recurrence cases (C+) from non-recurrence cases (C−). Five-fold cross-validation on the training cohort was applied to assess classifier robustness. The optimal predictive model was locked down based on the classifiers’ performance. All patients in the validation cohorts were classified into risk groups according to the predicted risk score.
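The greedy selection idea behind MRMR can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes a simplified correlation-based variant (relevance is the absolute Pearson correlation with the recurrence label, redundancy is the mean absolute correlation with already selected features) rather than the mutual-information criterion of [40], and the feature names and values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_select(features, labels, k=5):
    """Greedy selection: at each step pick the feature maximizing relevance minus redundancy.

    features: {name: [value per patient]}, labels: [0/1 per patient].
    """
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() keeps ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: 8 patients, 1 = recurrence (C+), 0 = non-recurrence (C-).
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "orient":     [0.1, 0.2, 0.1, 0.8, 0.9, 0.8, 0.9, 0.2],
    "orient_dup": [0.2, 0.3, 0.1, 0.8, 0.8, 0.7, 0.9, 0.3],  # near copy of "orient"
    "shape":      [0.2, 0.8, 0.1, 0.2, 0.9, 0.2, 0.8, 0.9],
    "noise":      [0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
}
top_two = mrmr_select(toy_features, toy_labels, k=2)
print(top_two)
```

In this toy example the near-duplicate feature is passed over in the second round because its redundancy with the first pick outweighs its relevance, which is the behavior that motivates MRMR over a plain relevance ranking.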

Immunohistochemistry and scoring of immunohistochemical stains and serum CEA

Immunohistochemical (IHC) staining was performed on FFPE tissue microarray sections according to the standard protocol described in Additional file 1: File S3. Ki67Li was determined as the proportion of positive tumor cells observed in 5 randomly selected areas of the section at 400× high-power fields; 200 tumor cells were counted in each area. Ki67Li was classified as positive (≥ 14% reactive tumor cells) or negative (< 14% reactive tumor cells), as recommended by Goldhirsch [41]. Serum CEA levels were determined with an enzyme immunoassay test kit (DPC Diagnostic Product Co., Los Angeles, CA, USA). Serum CEA values up to the upper limit of 5 ng/ml were regarded as normal, according to the kit manufacturer.
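The scoring rules above reduce to simple arithmetic, sketched below. The function names and the example counts are hypothetical; the sketch assumes the counts from the 5 areas are pooled, and that a CEA value exactly at 5 ng/ml counts as normal.

```python
def ki67_labeling_index(positive_counts, cells_per_area=200):
    """Percentage of Ki67-positive tumor cells pooled over the counted areas."""
    total_cells = cells_per_area * len(positive_counts)
    return 100.0 * sum(positive_counts) / total_cells

def ki67_status(positive_counts, cutoff_percent=14.0):
    """'positive' if the labeling index reaches the 14% cutoff, else 'negative'."""
    index = ki67_labeling_index(positive_counts)
    return "positive" if index >= cutoff_percent else "negative"

def cea_status(cea_ng_per_ml, upper_limit=5.0):
    """'elevated' above the upper limit of normal (assumed exclusive), else 'normal'."""
    return "elevated" if cea_ng_per_ml > upper_limit else "normal"

# Hypothetical counts of positive cells in 5 areas of 200 tumor cells each.
example_counts = [30, 25, 40, 22, 33]
print(ki67_labeling_index(example_counts), ki67_status(example_counts))
print(cea_status(7.2), cea_status(3.1))
```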

Inter-observer variability in ECA estimation by human readers

Two expert pathologists (Z.Z and N.Z) were invited to grade each case independently by blindly inspecting the digital H&E images in D1 and D2. Each pathologist assigned a grade to each case according to the widely used two-tiered criterion described by Compton et al. [42]: based on the degree of gland formation, tumors were defined as low grade (≥ 50% of the tumor is glandular) or high grade (< 50% of the tumor is glandular). The Kappa index was used to measure inter-observer variability between the human readers.
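For two readers and a two-tiered grade, the Kappa index can be computed as Cohen's kappa: observed agreement corrected for the agreement expected by chance. The sketch below is illustrative only; the grades are hypothetical and do not reproduce the study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same cases with categorical labels."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical grades for 10 cases ("low" = >= 50% glandular, "high" = < 50% glandular).
reader_1 = ["low"] * 6 + ["high"] * 4
reader_2 = ["low"] * 5 + ["high"] * 4 + ["low"]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 3))
```

Here the readers agree on 8 of 10 cases, but because both grade 60% of cases as low, chance agreement is already 0.52, so kappa is noticeably lower than the raw agreement.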

Survival analysis

The SPSS 17.0 software package was used to compute hazard ratios (HR) with the corresponding 95% confidence intervals (95% CI) and P values; P < 0.05 was considered statistically significant. The Chi-square test was used to compare Ki67 expression rates between ECAHBC-positive and ECAHBC-negative patients. Kaplan–Meier analysis was used to estimate cumulative survival, illustrated by KM curves, and the log-rank test was used to analyze survival differences. Multivariate Cox proportional hazards models were employed to investigate the independence of prognostic variables. Correlations between the binary classifier results and the other categorical clinical and pathologic variables were assessed by Chi-square tests.
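As an illustration of the Kaplan–Meier product-limit estimate (the study itself used SPSS), a minimal sketch is given below. The follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve

# Hypothetical follow-up in months for 6 patients.
toy_times = [6, 12, 12, 20, 30, 36]
toy_events = [1, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the curve themselves, which is why the estimate steps down only at the three event times.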


Study population characteristics

A total of 532 patients with ECA from 2 independent institutions were enrolled in this study; their clinical characteristics are shown in Table 1. The patients were primarily from Asia. Of the 532 patients, 335 (63.0%) were men and 197 (37.0%) were women. A total of 393 patients (73.9%) had T1/T2 disease, whereas 139 (26.1%) had more advanced disease (T3/T4). Tumors were poorly differentiated in 125 (23.5%) of the 532 cases, with 24.7%, 22.9% and 19.6% in D1, D2/D3 and D4, respectively. Approximately 25% of patients had a tumor size ≥ 5 cm. At the end of follow-up, 112 (21.1%) patients had tumor recurrence, with 58 patients (22.1%), 53 patients (23.8%) and 6 patients (13.0%) in D1, D2/D3 and D4, respectively. Characteristics of the TCGA cohort are given in Additional file 1: Table S3.

Table 1 Summary of patients’ clinicopathological characteristics

Representative features

The top 5 discriminative morphologic features identified within the training cohort were (1) mean tensor information_measure1, (2) mean tensor contrast average, (3) mean circularity entropy, (4) mean tensor contrast energy and (5) Standard Deviation energy of Fractal Dimension (for more details, see Additional file 1: Table S4). Among these representative features, the gland orientation related morphometric features (mean tensor information_measure1, mean tensor contrast average and mean tensor contrast energy) predominated (3 out of 5). The gland shape-based features (mean circularity entropy and SD energy of Fractal Dimension) accounted for the remaining 40% of the discriminative features (2 out of 5). For non-recurrence ECA patients, the gland shapes seemed more uniform and regular than in the recurrence group (Fig. 2b, f). Similarly, the arrows on the glands were almost all oriented in the same direction in the non-recurrence group, while those in the recurrence group displayed a higher degree of orientation disorder (Fig. 2c, g). Likewise, the underlying distribution of gland shape appeared more uniform in the non-recurrence cohort than in the recurrence group (Fig. 2d, h).
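The notion of orientation disorder can be illustrated with a simple stand-in measure. The sketch below does not compute the tensor co-occurrence features used in the study (those are defined in the Additional files); it assumes each gland is summarized by a single major-axis angle and uses the circular variance of axial data, which is 0 when all glands are aligned and approaches 1 when orientations are spread evenly. The angles are hypothetical.

```python
import cmath
import math

def orientation_disorder(angles_deg):
    """Circular variance of axial orientations: 0 = aligned, 1 = fully disordered."""
    # An axis at theta and at theta + 180 degrees is the same orientation,
    # so angles are doubled before averaging the unit vectors.
    vectors = [cmath.exp(2j * math.radians(a)) for a in angles_deg]
    resultant_length = abs(sum(vectors) / len(vectors))
    return 1.0 - resultant_length

# Hypothetical gland major-axis angles (degrees) in two TMA cores.
aligned_core = [10, 12, 8, 11, 9]        # glands pointing the same way
disordered_core = [0, 45, 90, 135]       # orientations spread evenly

print(round(orientation_disorder(aligned_core), 4))
print(round(orientation_disorder(disordered_core), 4))
```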

Fig. 2

Representative digital H&E image for recurrence and non-recurrence patient, respectively. a, e Original image of ECA with recurrence and non-recurrence, separately. b, f Gland contours by gland segmentation automatically. c, g Gland orientation map, the arrow on each gland represented the orientation direction. d, h Underlying distribution of gland shape

Image classifier evaluation

The 5 most discriminative gland morphometric features were used to construct three classifiers (SVM, DAC and RF). Their performance is shown in Additional file 1: Table S5. On the validation cohort, SVM predicted 74 cases as high-risk tumors versus 75 cases by DAC and 84 cases by RF. In distinguishing high-risk from low-risk cancer on D2, the SVM yielded accuracy = 0.881, PPV = 0.649 and NPV = 0.969, versus accuracy = 0.723, PPV = 0.560 and NPV = 0.938 for DAC and accuracy = 0.754, PPV = 0.536 and NPV = 0.951 for RF. We therefore locked down the SVM as the optimal ECAHBC. Likewise, the ECAHBC predicted 78 and 11 patients as recurrence cases on D3 and D4, with accuracy = 0.866 and 0.869, PPV = 0.615 and 0.636, and NPV = 0.949 and 0.943, respectively.

Correlations between image classifier and other clinicopathologic features

In D2, the image classifier predicted 74 of 269 patients as positive. Of the 74 ECAHBC-positive patients, 48 developed disease recurrence, compared with 6 of the 195 ECAHBC-negative patients. The recurrence rate of ECAHBC-positive patients was thus over 20 times higher than that of ECAHBC-negative patients. The ECAHBC yielded an accuracy of 0.881, with PPV = 0.649 and NPV = 0.969. The ECAHBC had the best predictive ability compared with any other single clinicopathologic feature (Additional file 1: Table S6). Among the traditional clinical and pathologic variables, T4 versus T1/T2 or T3 (accuracy = 0.822, PPV = 0.563, NPV = 0.878) and poor versus well/moderate (W/M) differentiation (accuracy = 0.781, PPV = 0.453, NPV = 0.861) had the better ability to predict disease recurrence. Details of the correlation analysis are shown in Additional file 1: Table S6.
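The reported accuracy, PPV and NPV follow directly from the counts given above (74 positive of 269, with 48 and 6 recurrences among positive and negative patients). The short sketch below derives the four confusion-matrix cells from those counts and recomputes the metrics; the function name is ours.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, PPV, NPV and the ratio of recurrence rates (positive vs. negative group)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp)              # recurrence rate among classifier-positive patients
    npv = tn / (tn + fn)
    negative_rate = fn / (fn + tn)    # recurrence rate among classifier-negative patients
    return accuracy, ppv, npv, ppv / negative_rate

# Counts derived from the text: 74 positive (48 recurred), 195 negative (6 recurred).
tp, fp = 48, 74 - 48
fn, tn = 6, 195 - 6
accuracy, ppv, npv, rate_ratio = binary_metrics(tp, fp, fn, tn)
print(round(accuracy, 3), round(ppv, 3), round(npv, 3), round(rate_ratio, 1))
```

The recurrence rates of 48/74 and 6/195 give a ratio of about 21, consistent with the statement that recurrence was over 20 times more frequent in ECAHBC-positive patients.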

Correlations between immunohistochemical data, CEA and image classifier

Additionally, a high serum CEA level was observed in 87.8% (65/74) of the ECAHBC-positive patients, whereas a normal serum CEA level was found in 95.8% (183/191) of the ECAHBC-negative patients. The Ki67 labeling index (Fig. 3a, b) positive rate was much higher in ECAHBC-positive patients [66/78 (89.2%)] than in ECAHBC-negative patients [6/185 (3.1%)]. The relative expression levels of Ki67 in ECAHBC-positive and ECAHBC-negative patients are shown in Fig. 3c. ECAHBC-positive and ECAHBC-negative patients differed significantly in serum CEA level (P < 0.001) and in Ki67 labeling index (P < 0.001). More details can be found in Additional file 1: Table S7.
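The Chi-square comparison for serum CEA can be reproduced from the counts given above. The sketch below computes the Pearson chi-square statistic for a 2 × 2 table without continuity correction; whether the original SPSS analysis applied a correction is not stated, so the exact statistic is an assumption, although the conclusion (P < 0.001) does not depend on it.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: ECAHBC-positive, ECAHBC-negative; columns: high CEA, normal CEA.
cea_stat, cea_p = chi_square_2x2(65, 74 - 65, 191 - 183, 183)
print(round(cea_stat, 1), cea_p < 0.001)
```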

Fig. 3

Representative images of IHC for the markers of ECA tested on D3. a Positive Ki67 IHC staining in a case identified by ECAHBC as high risk of recurrence; b negative Ki67 IHC staining; c IHC expression levels. IHC immunohistochemistry, ECAHBC early-stage colon adenocarcinoma histomorphometric-based image classifier

Image classifier evaluation on WSIs from TCGA

The histopathology images, pathology reports, and clinical information of the TCGA data set are available in a public repository from the TCGA Data Portal. The performance of the image classifier on the TCGA cohort is reported in Additional file 1: Table S8. The image model successfully distinguished ECA patients at high risk of recurrence from those at low risk (P < 0.01). In contrast, conventional histopathology variables, such as tumor stage, did not significantly predict recurrence outcomes in patients with ECA (P = 0.18).

Survival analysis

Survival analysis was conducted on D2 to explore the relationship between the traditional clinicopathological characteristics, the image classifier and survival. Table 2 summarizes the univariate log-rank analysis and the multivariate survival analysis for DSS on D2. As seen in Table 2, ECAHBC-positive patients had significantly worse DSS. The Kaplan–Meier survival curves are plotted in Fig. 4. ECAHBC positivity was associated with a 9.65-fold higher hazard (HR = 9.65, 95% CI 2.15–43.12, P = 0.003). That is, patients considered at high risk of recurrence by the ECAHBC (ECAHBC-positive) were more likely to develop disease recurrence and had worse DSS. This indicates that the image classifier might be an attractive image marker of ECA tumor behavior. The relationships between some major clinicopathologic variables and patients’ survival time can be found in Additional file 1: Figure S1. The multivariate survival analysis conducted on D4 can be found in Additional file 1: Table S9.
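As a consistency check on the reported hazard ratio, the standard error of log(HR) can be recovered from the 95% confidence interval and converted to a Wald z statistic and two-sided P value. This sketch assumes the interval is a symmetric Wald interval on the log scale, which is the usual output of Cox regression software; the function name is ours.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover SE of log(HR), Wald z and two-sided p from an HR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p_value

se, z, p_value = wald_from_ci(9.65, 2.15, 43.12)
print(round(se, 3), round(z, 2), round(p_value, 3))
```

The recovered P value rounds to 0.003, in agreement with the value reported alongside HR = 9.65, and the geometric mean of the interval limits is close to the point estimate, as expected for an interval that is symmetric on the log scale.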

Table 2 Univariate log-rank analysis and multivariate survival analysis conducted on D2
Fig. 4

Prognostic prediction results for human readers for D1 and D2, as well as ECAHBC, tumor grade, histology grade and manual grade for D2. a, b Kaplan–Meier curves of reader1 for D1 and D2; c, d Kaplan–Meier curves of reader2 for D1 and D2; e, f Kaplan–Meier curves of ECAHBC for D2 and D3; g, h Kaplan–Meier curves of histology, tumor grade. i Kaplan–Meier curves of ECAHBC for D4


Worldwide, colon cancer is the fourth most common gastrointestinal tumor, with high mortality [43]. In clinical routine, gland morphology is widely used to assess the degree of malignancy of CA and informs prognosis and treatment planning. Unfortunately, assessment of early-stage colon cancer by manual observation provides limited prognostic information.

Computerized methods for automatic estimation, based on quantitative morphology, have been shown to mitigate the subjectivity and low reproducibility associated with human grading. In this work, we first segmented the glands automatically and extracted 797 morphological features relating to gland heterogeneity from the digital H&E images. These features covered gland orientation, gland shape/size, texture, density and gland architecture descriptors. Using these objective computer-extracted features, an image classifier could distinguish high from low risk of recurrence in ECA, indicating that the computer-extracted gland features capture important characteristics of aggressive tumors that are difficult to spot by manual inspection.

The image classifier for recurrence prediction was validated on D2 and D4, yielding accuracies of 0.881 and 0.869, respectively. Focusing specifically on disease recurrence, only 48 of 269 (14.4%) patients had a positive ECAHBC result, but these ECAHBC-positive patients were over 20 times more likely to develop disease recurrence (64.9% vs. 3.1%) than ECAHBC-negative patients. Among the other major clinical and pathological variables, having a T4 tumor made a patient 4.6 times more likely to develop recurrent disease, and having poor histology grade made a patient 3 times more likely to develop recurrent disease. Thus, ECAHBC was the most predictive feature for recurrent disease in this patient cohort. In addition, on cohort D3 (a different tumor region from the same patients as D2), the image classifier achieved an accuracy of 0.866 in predicting disease recurrence, indicating that ECAHBC could cope with intra-tumor heterogeneity. We also validated the glandular feature based image classifier on an independent WSI data set from TCGA, supporting the generalizability of our approach. The model itself could locate the aggressive cancer-related features among the very large set of image measurements. The image model yielded information (accuracy = 0.849, PPV = 0.409, NPV = 0.945, P < 0.01, Additional file 1: Table S6) above and beyond that from other major clinicopathologic measures of cancer severity, such as tumor stage (accuracy = 0.849, PPV = 0.333, NPV = 0.888, P = 0.18, Additional file 1: Table S6). Kaplan–Meier analysis demonstrated a strong relationship between prognosis and ECAHBC predictions for D2 (P = 0.001), D3 (P = 0.005) and D4 (P = 0.009) (Fig. 4e, f, i).

We further investigated the associations between the most discriminative features and prognosis in ECA. Multivariate Cox proportional hazards analysis revealed that the image classifier tended to be prognostic in both D2 (P = 0.03, Table 2) and D4 (P = 0.006, Additional file 1: Table S9). The most representative prognostic morphology features included (1) mean tensor information_measure1, (2) mean tensor contrast average, (3) mean circularity entropy, (4) mean tensor contrast variance, (5) Standard Deviation energy of Fractal Dimension. Among those prognostic morphological features, the gland disorder features predominated [3 out of 5 (60%)]. The mean tensor information_measure1 reflects the degree of disorder of the glands in a TMA core. Higher values indicate a higher likelihood of deformed, closely packed gland clusters spanning the aggressive tumor regions, resulting in more heterogeneous values in linear directions. This could be explained by the extensive proliferation of tumor glands in aggressive colon cancer. The second most predictive gland morphological feature was the mean tensor contrast average, which quantifies the disorder in the orientation of neighboring glands. Another gland morphological feature relating to the disorder of gland orientation was the mean tensor contrast variance, which quantifies the degree of disorder in gland orientation. Intuitively, in aggressive tumors, highly irregular glandular organization patterns form because of rapid, disorganized tumor proliferation, differentiation and apoptosis. Additionally, the morphological features relating to gland shape/size also tended to be prognostic in ECA [2 out of 5 (40%)]. The mean entropy of circularity and the Standard Deviation energy of Fractal Dimension were the most discriminative gland shape/size features and were related to worse prognosis in ECA. The mean entropy of circularity measures the homogeneity of the TMA glands; low values indicate increasingly heterogeneous gland circularity. The SD energy of Fractal Dimension quantifies variation in glandular boundaries; high values indicate more variable glandular boundaries. Intuitively, patients at high risk of ECA recurrence show greater variability in gland shape. Indeed, changes in the shape and size of histologic primitives are hallmarks of different types of cancer, and our model could capture these variations. These findings are consistent with the studies by Farjam [29] and Naik [28], who both reported that gland shape/size features were linked with tumor grade and behavior.
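To make the idea of gland shape variability concrete, the sketch below computes the standard circularity measure, 4πA/P², for each gland and summarizes its spread within a core. This is an illustrative stand-in under our own assumptions: the entropy and fractal-dimension features of the study are defined in the Additional files and are not reproduced here, and the area and perimeter values are hypothetical.

```python
import math
import statistics

def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for elongated or ragged outlines."""
    return 4.0 * math.pi * area / perimeter ** 2

def shape_variability(glands):
    """Population standard deviation of circularity over (area, perimeter) pairs."""
    values = [circularity(area, perimeter) for area, perimeter in glands]
    return statistics.pstdev(values)

# Hypothetical (area, perimeter) measurements for segmented glands in two cores.
regular_core = [(100.0, 36.0), (110.0, 38.0), (95.0, 35.5)]
irregular_core = [(100.0, 36.0), (100.0, 60.0), (100.0, 90.0)]

print(round(shape_variability(regular_core), 3))
print(round(shape_variability(irregular_core), 3))
```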

We further investigated the correlation between manual cancer grading, based on visual estimation of gland morphologic heterogeneity, and ECA prognosis. No significant correlation was found between manual grading by N.Z and ECA prognosis for D1 or D2 (P > 0.05) in Kaplan–Meier analysis (Fig. 4a, b). A significant relationship was found between grading by Z.Z and ECA outcomes for D1 (P = 0.025, Fig. 4c), but not for D2 (P = 0.608, Fig. 4d). For D1 and D2, inter-observer agreement between Z.Z and N.Z was moderate (kappa = 0.51). This moderate agreement may be explained as follows. First, the criteria for cancer grading and the prognostic value of cancer grading in ECA have not been precisely defined; as a result, each pathologist might emphasize different tissue regions and features during visual evaluation (e.g. gland roundness, solidity, major axis length, minor axis length or eccentricity). Second, pathologists differ in their perception of color, shape, roundness, eccentricity and the ratio of minor to major axis length. In contrast, a strong association was found between the image classifier based on computer-extracted features and ECA survival outcomes (P < 0.05).

Additionally, we found that Ki67 and CEA were prognostic biomarkers for ECA (Additional file 1: Table S10). Interestingly, high Ki67 expression was associated with the ECAHBC-positive cohort, whereas ECAHBC-negative patients nearly always had low Ki67 expression. These results suggest that our image model could reflect Ki67 expression, and hence cell proliferation, and could thereby guide prognostic evaluation in patients with ECA. These findings were consistent with previous studies by Salminen et al. [12]. Meanwhile, we found that ECAHBC positivity was an independent predictor of cancer recurrence and was associated with DSS in ECA. Aberrant CEA expression fell almost entirely within the ECAHBC-positive group, so high serum CEA levels may also be reflected by the image model. CEA overexpression has been linked with increased metastatic potential in many types of cancer [44]. These preliminary findings corroborate the work of Thirunavukarasu [45] and Quah [46].

The main contributions of this paper are as follows. (1) A quantitative gland histomorphometric-based image classifier was constructed for predicting ECA recurrence. This is a preliminary attempt to stratify ECA patients into different recurrence risk groups based on gland morphological features from conventional digital H&E images. (2) The image classifier could serve as an additional clinicopathological characteristic for patients with ECA in clinical routine. In this study, the new characteristic generated by the binary classifier output was analyzed together with the collected clinicopathological characteristics. In multivariate analysis, ECAHBC-positive patients showed independently and significantly poorer DSS (HR = 9.65, 95% CI 2.15–43.12, P = 0.003). With the help of the informatics model, we envision that pathologists could identify more aggressive tumors from H&E stained digital images of surgical specimens. Given an accurate pathologic diagnosis, clinicians could individualize treatment, for example with postoperative chemotherapy, radiation therapy and close follow-up. Furthermore, the approach is likely to be cost-effective and repeatable. ECAHBC still needs to be tested in large multicenter studies.

We acknowledge several limitations of our study. First, we used TMAs, not WSIs, to extract the most representative gland morphological features for predicting recurrence in ECA. TMAs contain a much smaller snapshot of the overall tumor characteristics. However, the additional analysis of WSIs from the TCGA cohort suggested that our image model could be extended to whole slide histopathology images. Second, all enrolled patients came from a small number of institutions, and the images were digitized at a limited number of facilities, which could affect the image analysis procedure. Independent large cohorts are needed to validate our informatics model in future work. Future work will also integrate quantitative features from immunohistochemically stained digital images, immune scores [47, 48] or molecular data for predicting ECA recurrence.


In conclusion, ECAHBC can facilitate prognostic prediction based on routinely collected H&E stained slides, thereby contributing to precision oncology, personalized cancer management and advance care planning.

Availability of data and materials

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.



WRH: Wuhan University Renmin Hospital

WCH: The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology

TCGA: The Cancer Genome Atlas

ECA: Early-stage colon adenocarcinoma

ECAHBC: Early-stage colon adenocarcinoma histomorphometric-based image classifier

Ki67Li: Ki67 labeling index

H&E: Hematoxylin and eosin

TMA: Tissue microarray

FFPE: Formalin-fixed and paraffin-embedded


  1. 1.

    Hegde SR, Sun W, Lynch JP. Systemic and targeted therapy for advanced colon cancer. Expert Rev Gastroenterol Hepatol. 2008;2:135–49.

  2. 2.

    Goel G. Evolving role of gene expression signatures as biomarkers in early-stage colon cancer. J Gastrointest Cancer. 2014;45:399–404.

  3. 3.

    Kobayashi H, Mochizuki H, Sugihara K, Morita T, Kotake K, Teramoto T, Kameoka S, Saito Y, Takahashi K, Hase K, et al. Characteristics of recurrence and surveillance tools after curative resection for colorectal cancer: a multicenter study. Surgery. 2007;141:67–75.

  4. 4.

    O’Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J Natl Cancer Inst. 2004;96:1420–5.

  5. 5.

    Barrier A, Boelle PY, Roser F, Gregg J, Tse C, Brault D, Lacaine F, Houry S, Huguier M, Franc B, et al. Stage II colon cancer prognosis prediction by tumor gene expression profiling. J Clin Oncol. 2006;24:4685–91.

  6. 6.

    Lu ATT, Salpeter SR, Reeve AE, Eschrich S, Johnston PG, Barrier AJ, Bertucci F, Buckley NS, Salpeter EE, Lin AY. Gene expression profiles as predictors of poor outcomes in stage II colorectal cancer: a systematic review and meta-analysis. Clin Colorectal Cancer. 2009;8:207–14.

  7. 7.

    Eschrich S, Yang I, Bloom G, Kwong KY, Boulware D, Cantor A, Coppola D, Kruhoffer M, Aaltonen L, Orntoft TF, et al. Molecular staging for survival prediction of colorectal cancer patients. J Clin Oncol. 2005;23:3526–35.

  8. 8.

    Salazar R, Roepman P, Capella G, Moreno V, Simon I, Dreezen C, Lopez-Doriga A, Santos C, Marijnen C, Westerga J, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol. 2011;29:17–24.

  9. 9.

    O’Connell MJ, Lavery I, Yothers G, Paik S, Clark-Langone KM, Lopatin M, Watson D, Baehner FL, Shak S, Baker J, et al. Relationship between tumor gene expression and recurrence in four independent studies of patients with stage II/III colon cancer treated with surgery alone or surgery plus adjuvant fluorouracil plus leucovorin. J Clin Oncol. 2010;28:3937–44.

  10.

    Maak M, Simon I, Nitsche U, Roepman P, Snel M, Glas AM, Schuster T, Keller G, Zeestraten E, Goossens I, et al. Independent validation of a prognostic genomic signature (ColoPrint) for patients with stage II colon cancer. Ann Surg. 2013;257:1053–8.

  11.

    Kopetz S, Tabernero J, Rosenberg R, Jiang ZQ, Moreno V, Bachleitner-Hofmann T, Lanza G, Stork-Sloots L, Maru D, Simon I, et al. Genomic classifier ColoPrint predicts recurrence in stage II colorectal cancer patients more accurately than clinical factors. Oncologist. 2015;20:127–33.

  12.

    Salminen E, Palmu S, Vahlberg T, Roberts PJ, Soderstrom KO. Increased proliferation activity measured by immunoreactive Ki67 is associated with survival improvement in rectal/recto sigmoid cancer. World J Gastroenterol. 2005;11:3245–9.

  13.

    El-Serafi MM, Bahnassy AA, Ali NM, Eid SM, Kamel MM, Abdel-Hamid NA, Zekri AR. The prognostic value of c-Kit, K-ras codon 12, and p53 codon 72 mutations in Egyptian patients with stage II colorectal cancer. Cancer. 2010;116:4954–64.

  14.

    Resnick MB, Routhier J, Konkin T, Sabo E, Pricolo VE. Epidermal growth factor receptor, c-MET, beta-catenin, and p53 expression as prognostic indicators in stage II colon cancer: a tissue microarray study. Clin Cancer Res. 2004;10:3069–75.

  15.

    Ismail SM, Colclough AB, Dinnen JS, Eakins D, Evans DM, Gradwell E, O’Sullivan JP, Summerell JM, Newcombe RG. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. BMJ. 1989;298:707–10.

  16.

    Ruijter E, van Leenders G, Miller G, Debruyne F, van de Kaa C. Errors in histological grading by prostatic needle biopsy specimens: frequency and predisposing factors. J Pathol. 2000;192:229–33.

  17.

    Barker J, Hoogi A, Depeursinge A, Rubin DL. Automated classification of brain tumor type in whole-slide digital pathology images using local representative tiles. Med Image Anal. 2016;30:60–71.

  18.

    Naik S, Madabhushi A, Tomaszewski J, Feldman MD. A quantitative exploration of efficacy of gland morphology in prostate cancer grading. In: 2007 IEEE 33rd annual northeast bioengineering conference; 2007. p. 58.

  19.

    Wang XX, Janowczyk A, Zhou Y, Thawani R, Fu PF, Schalper K, Velcheti V, Madabhushi A. Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images. Sci Rep. 2017;7:13543.

  20.

    Lu C, Lewis JS, Dupont WD, Plummer WD, Janowczyk A, Madabhushi A. An oral cavity squamous cell carcinoma quantitative histomorphometric-based image classifier of nuclear morphology can risk stratify patients for disease-specific survival. Mod Pathol. 2017;30:1655–65.

  21.

    Lewis JS Jr, Ali S, Luo J, Thorstad WL, Madabhushi A. A quantitative histomorphometric classifier (QuHbIC) identifies aggressive versus indolent p16-positive oropharyngeal squamous cell carcinoma. Am J Surg Pathol. 2014;38:128–37.

  22.

    Ali S, Veltri R, Epstein JA, Christudass C, Madabhushi A. Cell cluster graph for prediction of biochemical recurrence in prostate cancer patients from tissue microarrays. In: Medical imaging 2013: digital pathology; 2013. p. 8676.

  23.

    Ji MY, Yuan L, Jiang XD, Zeng Z, Zhan N, Huang PX, Lu C, Dong WG. Nuclear shape, architecture and orientation features from H&E images are able to predict recurrence in node-negative gastric adenocarcinoma. J Transl Med. 2019;17:92.

  24.

    Yu KH, Zhang C, Berry GJ, Altman RB, Re C, Rubin DL, Snyder M. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat Commun. 2016;7:12474.

  25.

    Luo X, Zang X, Yang L, Huang J, Liang F, Rodriguez-Canales J, Wistuba II, Gazdar A, Xie Y, Xiao G. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J Thorac Oncol. 2017;12:501–9.

  26.

    Lu C, Romo-Bucheli D, Wang XX, Janowczyk A, Ganesan S, Gilmore H, Rimm D, Madabhushi A. Nuclear shape and orientation features from H&E images predict survival in early-stage estrogen receptor-positive breast cancers. Lab Invest. 2018;98:1438–48.

  27.

    Humphries A, Wright NA. Colonic crypt organization and tumorigenesis. Nat Rev Cancer. 2008;8:415–24.

  28.

    Naik S, Doyle S, Agner S, Madabhushi A, Feldman M, Tomaszewski J. Automated gland and nuclei segmentation for grading of prostate and breast cancer histopathology. In: 2008 IEEE international symposium on biomedical imaging: from nano to macro, vol. 1–4; 2008. p. 284.

  29.

    Farjam R, Soltanian-Zadeh H, Jafari-Khouzani K, Zoroofi RA. An image analysis approach for automatic malignancy determination of prostate pathological images. Cytometry Part B Clin Cytometry. 2007;72B:227–40.

  30.

    Lee G, Sparks R, Ali S, Shih NN, Feldman MD, Spangler E, Rebbeck T, Tomaszewski JE, Madabhushi A. Co-occurring gland angularity in localized subgraphs: predicting biochemical recurrence in intermediate-risk prostate cancer patients. PLoS ONE. 2014;9:e97954.

  31.

    Awan R, Sirinukunwattana K, Epstein D, Jefferyes S, Qidwai U, Aftab Z, Mujeeb I, Snead D, Rajpoot N. Glandular morphometrics for objective grading of colorectal adenocarcinoma histology images. Sci Rep. 2017;7:16852.

  32.

    Awan R, Sirinukunwattana K, Epstein D, Jefferyes S, Qidwai U, Aftab Z, Mujeeb I, Snead D, Rajpoot N. Glandular morphometrics for objective grading of colorectal adenocarcinoma histology images. Sci Rep. 2017;7:1–12.

  33.

    Nguyen K, Barnes M, Srinivas C, Chefd’hotel C. Automatic glandular and tubule region segmentation in histological grading of breast cancer. In: Medical imaging 2015: digital pathology; 2015. p. 9420.

  34.

    Nguyen K, Sarkar A, Jain AK. Structure and context in prostatic gland segmentation and classification. Med Image Comput Comput Assist Interven Miccai. 2012;7510(Pt I):115–23.

  35.

    Lee G, Sparks R, Ali S, Shih NNC, Feldman MD, Spangler E, Rebbeck T, Tomaszewski JE, Madabhushi A. Co-occurring gland angularity in localized subgraphs: predicting biochemical recurrence in intermediate-risk prostate cancer patients. PLoS ONE. 2014.

  36.

    Leo P, Shankar E, Elliott R, Janowczyk A, Madabhushi A, Gupta S. Combination of nuclear NF-kappa B/p65 localization and gland morphological features from surgical specimens is predictive of early biochemical recurrence in prostate cancer patients. In: Medical imaging 2018: digital pathology; 2018. p. 10581.

  37.

    Fleming M, Ravula S, Tatishchev SF, Wang HL. Colorectal carcinoma: pathologic aspects. J Gastrointest Oncol. 2012;3:153–73.

  38.

    Monaco JP, Tomaszewski JE, Feldman MD, Hagemann I, Moradi M, Mousavi P, Boag A, Davidson C, Abolmaesumi P, Madabhushi A. High-throughput detection of prostate cancer in histological sections using probabilistic pairwise Markov models. Med Image Anal. 2010;14:617–29.

  39.

    Nguyen K, Sarkar A, Jain AK. Structure and context in prostatic gland segmentation and classification. Med Image Comput Comput Assist Interv. 2012;15:115–23.

  40.

    Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27:1226–38.

  41.

    Goldhirsch A, Wood WC, Coates AS, Gelber RD, Thurlimann B, Senn HJ, Panel HJ. Strategies for subtypes—dealing with the diversity of breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2011. Ann Oncol. 2011;22:1736–47.

  42.

    Compton CC, Fielding LP, Burgart LJ, Conley B, Cooper HS, Hamilton SR, Hammond ME, Henson DE, Hutter RV, Nagle RB, et al. Prognostic factors in colorectal cancer: College of American Pathologists Consensus Statement 1999. Arch Pathol Lab Med. 2000;124:979–94.

  43.

    Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68:394–424.

  44.

    Thomas P, Gangopadhyay A, Steele G Jr, Andrews C, Nakazato H, Oikawa S, Jessup JM. The effect of transfection of the CEA gene on the metastatic behavior of the human colorectal cancer cell line MIP-101. Cancer Lett. 1995;92:59–66.

  45.

    Thirunavukarasu P, Talati C, Munjal S, Attwood K, Edge SB, Francescutti V. Effect of incorporation of pretreatment serum carcinoembryonic antigen levels into AJCC staging for colon cancer on 5-year survival. JAMA Surg. 2015;150:747–55.

  46.

    Quah HM, Chou JF, Gonen M, Shia J, Schrag D, Landmann RG, Guillem JG, Paty PB, Temple LK, Wong WD, Weiser MR. Identification of patients with high-risk stage II colon cancer for adjuvant therapy. Dis Colon Rectum. 2008;51:503–7.

  47.

    Pages F, Mlecnik B, Marliot F, Bindea G, Ou FS, Bifulco C, Lugli A, Zlobec I, Rau TT, Berger MD, et al. International validation of the consensus Immunoscore for the classification of colon cancer: a prognostic and accuracy study. Lancet. 2018;391:2128–39.

  48.

    Saltz J, Gupta R, Hou L, Kurc T, Singh P, Nguyen V, Samaras D, Shroyer KR, Zhao T, Batiste R, et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 2018;23:181–93.e7.



Acknowledgements

We thank ZZ and NZ for their expert technical assistance with tissue microarray construction, and MTG for her experienced assistance with the statistical analysis. We also thank the team members of CCIPD at Case Western Reserve University for their support with digital image analysis.


Funding

This work was funded by the Hubei Province Natural Science Foundation of China (No. 2018CFB136), the National Natural Science Foundation of China (Grant Nos. 61401263, 61672333, 81602535 and 81901817), the Innovation Seed Funding of Wuhan University (TFZZ2018020), and the Fundamental Research Funds for the Central Universities (GK201903096, GK201901010).

Author information

Conceptualization, MYJ, LY, WGD and CL; methodology, MYJ, LY and CL; software, LY and CL; validation, all authors; formal analysis, all authors; investigation, MYJ and LY; resources, all authors; data curation, MYJ, NN, YID, ZRL and ZZ; writing—original draft preparation, MYJ and LY; writing—review and editing, MYJ and LY; visualization, MYJ, LY and CL; supervision, LY, CL, WGD and PXH; project administration, CL, WGD and LY; funding acquisition, MYJ, CL and ZZ. All authors read and approved the final manuscript.

Correspondence to Lei Yuan, Cheng Lu or Wei-Guo Dong.

Ethics declarations

Ethics approval and consent to participate

Our study was approved by the ethics committees of Renmin Hospital of Wuhan University and The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, and complied with the Declaration of Helsinki. Tissue samples were used for scientific research purposes only. Written informed consent was waived by the ethics committees for this retrospective study.

Consent for publication

Not applicable.

Competing interests

None declared.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: File S1. TMA construction. File S2. Description of gland co-occurrence morphological feature extraction. File S3. Immunohistochemistry. Figure S1. The workflow of patient selection. Figure S2. Kaplan–Meier curves of perineural invasion and vascular invasion on D2. Table S1. Summary of gland morphometric features. Table S2. A comprehensive list of all 797 quantitative features. Table S3. Patient characteristics of the TCGA cohort. Table S4. The top 5 representative features and their descriptions. Table S5. The performance of the classifiers on D2/D3. Table S6. Correlations between ECAHBC and other major clinicopathologic features and disease recurrence on D2. Table S7. Comparative analysis of the image classifier and immunohistochemistry and CEA on D2. Table S8. The performance of the image classifier on the TCGA cohort. Table S9. Multivariate survival analysis conducted on D4. Table S10. Ki67 and CEA multivariate survival analysis conducted on D1/D2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Ji, M., Yuan, L., Lu, S. et al. Glandular orientation and shape determined by computational pathology could identify aggressive tumor for early colon carcinoma: a triple-center study. J Transl Med 18, 129 (2020).



Keywords

  • Gland heterogeneity
  • Quantitative histopathology images
  • Colon adenocarcinoma
  • Prognosis