Development of a machine learning-based model to predict prognosis of alpha-fetoprotein-positive hepatocellular carcinoma

Abstract

Background

Alpha-fetoprotein (AFP)-positive hepatocellular carcinoma (HCC) shows aggressive biological behavior and carries a poor prognosis, so survival time is one of the greatest concerns for affected patients. This study aimed to develop and compare six machine learning (ML)-based prognostic models for predicting the overall survival of patients with AFP-positive HCC.

Methods

Data on patients with AFP-positive HCC were extracted from the Surveillance, Epidemiology, and End Results database. Six ML algorithms (extreme gradient boosting [XGBoost], logistic regression [LR], support vector machine [SVM], random forest [RF], K-nearest neighbor [KNN], and decision tree [ID3]) were used to develop prognostic models for patients with AFP-positive HCC at 1, 3, and 5 years. The area under the receiver operating characteristic curve (AUC), confusion matrix, calibration curves, and decision curve analysis (DCA) were used to evaluate the models.

Results

A total of 2,038 patients with AFP-positive HCC were included in the analysis. The 1-, 3-, and 5-year overall survival rates were 60.7%, 28.9%, and 14.3%, respectively. Seventeen demographic and clinicopathological features were entered into the six ML algorithms to generate prognostic models. The XGBoost model showed the best performance in predicting survival at 1 year (train set: AUC = 0.771; test set: AUC = 0.782), 3 years (train set: AUC = 0.763; test set: AUC = 0.749), and 5 years (train set: AUC = 0.807; test set: AUC = 0.740). Furthermore, for 1-, 3-, and 5-year survival prediction, the accuracy of the XGBoost model in the training and test sets was 0.709 and 0.726, 0.721 and 0.726, and 0.778 and 0.784, respectively. Calibration curves and DCA likewise indicated good predictive performance.

Conclusions

The XGBoost model exhibited good predictive performance, which may provide physicians with an effective tool for early medical intervention and improve the survival of patients.

Introduction

Hepatocellular carcinoma (HCC) is the most common form of liver cancer, accounting for approximately 75‒85% of cases [1, 2]. It is a highly fatal cancer and a major cause of cancer-related death worldwide, leading to more than 700,000 deaths each year [3].

Alpha-fetoprotein (AFP) is often expressed at high levels in HCC, and approximately 75% of patients with HCC are AFP positive [4, 5]. Compared with AFP-negative HCC, AFP-positive HCC is associated with worse biological behavior and inferior survival [4, 6]. Patients with AFP-positive HCC are more likely to present with a higher clinical stage, TNM classification, and fibrosis score, and with more vascular invasion [4, 7, 8]. A recent study showed that, regardless of surgical or adjuvant therapy, the median overall survival of patients with AFP-positive HCC was much shorter than that of patients with AFP-negative HCC (13 months vs. 48 months) [4]. It is therefore imperative to create prognostic prediction models for patients with AFP-positive HCC, both to address their concerns about survival accurately and to support individualized management.

Machine learning, a branch of artificial intelligence (AI), has recently become a topic of paramount importance, providing methods, techniques, and tools for the analysis of data generated by the biological sciences [9,10,11]. It can learn from examples to make patient-level survival predictions and establish clinical AI prognostic models with significantly improved accuracy [9, 12]. Extreme gradient boosting (XGBoost) is a relatively recent ensemble-learning algorithm in which each new tree is fitted to correct the errors of the trees already in the model [13, 14]. XGBoost has been used for effective survival prediction in cancer patients [14,15,16,17]. However, it has rarely been applied to predicting the prognosis of patients with AFP-positive HCC.

In this study, we implemented six machine learning algorithms including XGBoost, logistic regression (LR), support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (ID3) to predict 1-, 3- and 5-year survival of patients with AFP-positive HCC, using data retrieved from the Surveillance, Epidemiology, and End Results (SEER) database. The present study contributes to developing machine learning-based models to provide insight into the prognosis of patients with AFP-positive HCC.

Methods

Data source and patient selection

Data on patients with AFP-positive HCC were extracted from the SEER database, an important population-based program of the National Cancer Institute that covers approximately 30% of the United States population [18]. According to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), the primary site code for inclusion was C22.0 and the histological codes were 8170/3–8175/3. Patients diagnosed between 2004 and 2015 were collected. The following cases were excluded: (1) patients with AFP-negative HCC; (2) patients with multiple primary tumors; (3) patients with incomplete information on tumor size, race, survival data, AFP, fibrosis score, grade, cause of death, marital status, insurance status, or median household income; (4) patients with unknown TNM stage; and (5) patients for whom it was unknown whether surgery was performed. Finally, 2,038 eligible patients with AFP-positive HCC were included and further analyzed in this study. Figure 1 presents the flowchart of study design and patient selection.
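The five exclusion criteria above amount to a sequence of record-level filters. The following is a minimal sketch in plain Python, assuming each SEER record has already been exported as a dictionary. All field names (`afp`, `n_primaries`, `tnm_stage`, `surgery`, and so on) and the example values are hypothetical and do not correspond to actual SEER column labels.

```python
# Hypothetical field names; real SEER exports use different column labels.
REQUIRED_FIELDS = [
    "tumor_size", "race", "survival_months", "afp", "fibrosis_score", "grade",
    "cause_of_death", "marital_status", "insurance_status", "median_income",
]


def is_missing(value):
    """Treat None, empty strings and 'Unknown' as missing information."""
    return value is None or str(value).strip().lower() in {"", "unknown"}


def is_eligible(record):
    """Apply the five exclusion criteria to one patient record."""
    if record.get("afp") != "positive":       # (1) AFP-negative HCC
        return False
    if record.get("n_primaries", 1) > 1:      # (2) multiple primary tumors
        return False
    if any(is_missing(record.get(f)) for f in REQUIRED_FIELDS):
        return False                          # (3) incomplete information
    if is_missing(record.get("tnm_stage")):   # (4) unknown TNM stage
        return False
    if is_missing(record.get("surgery")):     # (5) unknown surgery status
        return False
    return True


# Invented example records, for illustration only.
complete = {
    "afp": "positive", "n_primaries": 1, "tumor_size": 42, "race": "White",
    "survival_months": 17, "fibrosis_score": "5-6", "grade": "II",
    "cause_of_death": "HCC", "marital_status": "Married",
    "insurance_status": "Insured", "median_income": 61000,
    "tnm_stage": "T2N0M0", "surgery": "yes",
}
records = [
    complete,
    {**complete, "afp": "negative"},
    {**complete, "tnm_stage": "Unknown"},
]
eligible = [r for r in records if is_eligible(r)]
```

Only the first of the three example records passes every filter. The order of the checks does not affect which records are kept, but it does affect the per-step counts that a flowchart such as Fig. 1 would report.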

Fig. 1 Flowchart of study design and patient selection. AFP alpha-fetoprotein; HCC hepatocellular carcinoma; SEER Surveillance, Epidemiology, and End Results; TNM tumor lymph node metastasis; ROC curve receiver operating characteristic curve; AUC area under the curve

Study variables

The following factors were included as explanatory variables: race, sex, age at diagnosis, histological grade, tumor size, TNM stage [American Joint Committee on Cancer (AJCC) 7th edition], SEER stage, fibrosis score, marital status, insurance status, median household income, and treatment strategy (surgery, radiotherapy, and chemotherapy). The outcome variables were survival months and overall survival.
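The text does not state how survival months and vital status were converted into the binary targets used for 1-, 3-, and 5-year prediction. One common convention is sketched below purely as an assumption: a patient who died within the horizon is labeled 1, a patient known to have survived past it is labeled 0, and a patient censored before the horizon is left unlabeled. The function name and the censoring rule are illustrative and are not taken from the study.

```python
def survival_label(survival_months, died, horizon_months):
    """Binary outcome at a fixed horizon.

    Returns 1 if the patient died within the horizon, 0 if the patient is
    known to have survived at least that long, and None if follow-up ended
    (alive) before the horizon, so the status at the horizon is unknown.

    Assumption: this censoring rule is illustrative only; the source does
    not describe how censored patients were handled.
    """
    if died and survival_months <= horizon_months:
        return 1
    if survival_months >= horizon_months:
        return 0
    return None


# Invented patients: (survival months, died during follow-up)
patients = [(8, True), (40, True), (70, False), (20, False)]
labels = {
    horizon: [survival_label(m, d, horizon) for m, d in patients]
    for horizon in (12, 36, 60)
}
```

Under this rule the last example patient (alive at 20 months) contributes to the 1-year model but not to the 3- or 5-year models, which is one reason the effective sample size can differ between horizons.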

XGBoost model

XGBoost is a relatively recent ensemble-learning algorithm that was formally published in 2016 [13, 14] and is more complex than traditional machine learning algorithms [19]. The basic concepts of each machine learning algorithm are presented in Supplementary Text 1. In this study, the model was built on the training set using 10-fold cross-validation to ensure its stability. We tested and adjusted the model repeatedly to determine the key hyperparameters, and a separate test set was then used to further validate the model. We aimed to develop a machine learning-based model to predict the overall survival of patients with AFP-positive HCC at 1, 3, and 5 years.
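To illustrate what 10-fold cross-validation on the training set involves, the sketch below generates the fold indices in plain Python. It is not the authors' code: the function name, the seed, and the shuffling strategy are assumptions, and 1,428 is the training-set size reported in the Results.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Samples are shuffled once, then cut into k folds whose sizes differ
    by at most one. Each fold serves as the validation set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(1428, k=10))
```

In a tuning loop, a candidate hyperparameter setting would be fitted on each `train_idx` and scored on the matching `val_idx`, and the ten scores averaged. The held-out test set plays no part in this step.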

Statistical analysis

For baseline characteristics, categorical variables were presented as number (n) and percentage (%), and the chi-square test was used to compare differences between the training and test sets. Normally distributed continuous variables were expressed as mean ± standard deviation and non-normally distributed continuous variables as median (range); they were compared using the t test or the Mann-Whitney U test, as appropriate. Age, tumor size, and median household income were analyzed as continuous variables.
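As a worked example of the chi-square comparison between the training and test sets, the sketch below computes the Pearson statistic and its P value for a 2 × 2 table using only the standard library. The counts of men and women in each set are invented for illustration and are not taken from Table 1. The closed-form P value applies to one degree of freedom only.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value), 1 degree of freedom."""
    n = a + b + c + d
    statistic = (n * (a * d - b * c) ** 2
                 / ((a + b) * (c + d) * (a + c) * (b + d)))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = training / test set, columns = male / female.
stat, p = chi_square_2x2(1090, 338, 465, 145)
balanced = p >= 0.05  # no significant difference at the 0.05 level
```

With these invented counts the sex distribution is nearly identical in both sets, so the statistic is close to zero and the P value is far above 0.05.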

Six machine learning algorithms (XGBoost, LR, SVM, RF, KNN, and ID3) were used to develop the prognostic models for patients with AFP-positive HCC. The predictive performance of the six models was evaluated using receiver operating characteristic (ROC) analysis and the confusion matrix. The area under the ROC curve (AUC) was calculated for each model, together with accuracy, one of the primary assessment parameters derived from the confusion matrix [15]. Calibration curves and decision curve analyses (DCA) were also performed. All statistical analyses were performed with SPSS version 26 and Python version 3.6 (Python Software Foundation). A P value < 0.05 was considered statistically significant.
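The AUC has a direct probabilistic reading: it is the proportion of (event, non-event) patient pairs in which the model assigns the higher score to the patient with the event, with ties counted as one half. The sketch below implements this pairwise definition in plain Python as an illustration. It is not the routine used in the study, and the example scores are invented.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    y_true holds 0/1 labels and y_score the predicted scores. The result is
    the fraction of positive-negative pairs ranked correctly, with ties
    counted as 0.5. Runs in O(n_pos * n_neg), which is fine for small data.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc_example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the example, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.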

Results

Patient characteristics

We obtained information on 2,038 eligible patients with AFP-positive HCC from the SEER program. The 1-, 3-, and 5-year overall survival rates of these patients were 60.7%, 28.9%, and 14.3%, respectively. The baseline characteristics of the training and test sets are shown in Table 1 and summarized below. No significant differences in baseline data were observed between the training and test sets, except for marital status and median household income.

Table 1 Baseline characteristics of AFP-positive HCC patients

Of these patients, 76.3% were male and 63.4% were white. The average age was 61.07 years. Grade III or IV tumors were present in 23.3% of patients. About 57.9% of patients were married, and 1,509 (74.0%) were insured. The majority of patients (74.0%) had a high fibrosis score (fibrosis score 5–6, i.e., severe fibrosis or cirrhosis). Tumors measuring ≤ 3 cm, 3–5 cm, and ≥ 5 cm accounted for 33.3%, 27.2%, and 39.5% of patients, respectively. Regarding treatment, approximately 59.0% of the entire study population underwent surgery, 41.6% received chemotherapy, and only 6.7% received radiotherapy.

Feature predictor selection

The importance of each feature in the XGBoost prognostic model is illustrated in Fig. 2. The findings revealed that for the 1-year prognostic model, the top five variables affecting prognosis were surgery, AJCC stage, tumor size, marital status, and median household income, while surgery, AJCC stage, tumor size, SEER stage, and age were the top five variables for 3- and 5-year prognostic models. Among them, surgery was the most important variable for 1-, 3- and 5-year prognostic models of XGBoost.

Fig. 2 The importance of each feature in the XGBoost prognostic model. A The importance of each feature in the 1-year prognostic model; B the importance of each feature in the 3-year prognostic model; C the importance of each feature in the 5-year prognostic model. XGBoost extreme gradient boosting

Construction of AI prognostic model

The cases were randomly divided into a training set (n = 1,428) and a test set (n = 610) at a ratio of 7:3 for the construction and verification of the AI prognostic models, respectively. In the training set, ten-fold cross-validation was used for iterative testing and tuning until the key hyperparameters were fixed. The main parameters of the XGBoost model were as follows: colsample_bytree = 0.8, gamma = 0, learning_rate = 0.1, max_depth = 1, min_child_weight = 1, and subsample = 1.
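The random split and the reported hyperparameters can be written down as follows. This is a sketch under stated assumptions rather than the authors' pipeline: the seed and the helper name `split_train_test` are hypothetical, the test-set size is fixed at the reported 610 instead of being derived from the 7:3 ratio, and the dictionary keys use the lowercase spelling of the XGBoost Python package.

```python
import random


def split_train_test(n_samples, n_test, seed=42):
    """Randomly partition sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Sizes as reported: 2,038 patients, of whom 610 form the test set.
train_idx, test_idx = split_train_test(2038, n_test=610)

# Hyperparameters as reported in the text. If the xgboost package is
# installed, they could be passed as xgboost.XGBClassifier(**XGB_PARAMS).
XGB_PARAMS = {
    "colsample_bytree": 0.8,  # fraction of features sampled per tree
    "gamma": 0,               # minimum loss reduction required to split
    "learning_rate": 0.1,     # shrinkage applied to each new tree
    "max_depth": 1,           # each tree is a single split (a stump)
    "min_child_weight": 1,    # minimum sum of instance weights in a leaf
    "subsample": 1,           # fraction of rows sampled per tree
}
```

With max_depth = 1, every tree splits on a single feature, so the fitted ensemble is additive in the features and does not model interactions between them.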

Evaluating predictive models for estimating the prognosis of patients with AFP-positive HCC

Using ROC curve analysis, we calculated the corresponding AUCs for the training and test sets. The XGBoost model performed well in predicting the survival of patients with AFP-positive HCC at 1 year (train set: AUC = 0.771; test set: AUC = 0.782), 3 years (train set: AUC = 0.763; test set: AUC = 0.749), and 5 years (train set: AUC = 0.807; test set: AUC = 0.740) (Fig. 3).

Fig. 3 XGBoost model evaluation. A ROC curve for the 1-year prognostic model in the training and test sets; B ROC curve for the 3-year prognostic model in the training and test sets; C ROC curve for the 5-year prognostic model in the training and test sets. XGBoost extreme gradient boosting; ROC receiver operating characteristic curve; AUC area under the curve

In the ROC curve analysis, the 1-year AUC values of LR, SVM, RF, KNN, and ID3 were 0.758, 0.703, 0.761, 0.746, and 0.762, respectively, in the training set, corresponding to 0.750, 0.734, 0.779, 0.631, and 0.750 in the test set (Table 2). In the 3-year prognostic model, the AUC values of LR, SVM, RF, KNN, and ID3 were 0.756, 0.687, 0.760, 0.744, and 0.752, respectively, in the training set, corresponding to 0.740, 0.739, 0.753, 0.607, and 0.718 in the test set. In the 5-year prognostic model, the AUC values of LR, SVM, RF, KNN, and ID3 were 0.753, 0.686, 0.754, 0.786, and 0.748, respectively, in the training set, corresponding to 0.708, 0.715, 0.718, 0.586, and 0.699 in the test set. Compared with the other five machine learning algorithms, the XGBoost model performed best overall: it had the highest AUC in every setting except the 3-year test set, where RF was marginally higher (0.753 vs. 0.749).

Table 2 Performance of prognostic models built by machine learning algorithms in the training and test sets (area under the ROC curve)
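The AUC values quoted in the text can be tabulated to check which algorithm ranks first in each setting. The sketch below uses only the numbers reported above. The variable names and layout are illustrative.

```python
# AUC values as reported in the text, in the order:
# (1y train, 1y test, 3y train, 3y test, 5y train, 5y test)
AUC = {
    "XGBoost": (0.771, 0.782, 0.763, 0.749, 0.807, 0.740),
    "LR":      (0.758, 0.750, 0.756, 0.740, 0.753, 0.708),
    "SVM":     (0.703, 0.734, 0.687, 0.739, 0.686, 0.715),
    "RF":      (0.761, 0.779, 0.760, 0.753, 0.754, 0.718),
    "KNN":     (0.746, 0.631, 0.744, 0.607, 0.786, 0.586),
    "ID3":     (0.762, 0.750, 0.752, 0.718, 0.748, 0.699),
}
SETTINGS = ("1y train", "1y test", "3y train", "3y test",
            "5y train", "5y test")

# Best-ranked model in each setting.
best = {
    setting: max(AUC, key=lambda model: AUC[model][i])
    for i, setting in enumerate(SETTINGS)
}

# Drop from training to test AUC at 5 years, a rough overfitting signal.
gap_5y = {model: round(v[4] - v[5], 3) for model, v in AUC.items()}
```

XGBoost ranks first in five of the six settings, with RF ahead by 0.004 in the 3-year test set. The 5-year gap is largest for KNN (0.786 in training vs. 0.586 in testing).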

Furthermore, we evaluated the accuracy of the XGBoost model by constructing a confusion matrix (Supplementary Fig. 1). For 1-, 3-, and 5-year survival prediction, the accuracy in the training and test sets was 0.709 and 0.726, 0.721 and 0.726, and 0.778 and 0.784, respectively. Supplementary Table 1 shows the accuracy of each model in predicting 1-, 3-, and 5-year survival in the training and test sets.
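Accuracy is the share of correct predictions in the confusion matrix, (TP + TN) / (TP + FP + TN + FN). The sketch below shows the calculation on invented labels. Sensitivity and specificity are included only to show what else the same four counts provide. They are not reported in the text.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Invented labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

Because accuracy depends on the classification threshold and on the event rate, it complements rather than replaces the threshold-free AUC. With a 5-year survival rate of 14.3%, for instance, a model that always predicts death would already be right for most patients.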

The calibration curves of the XGBoost model showed good agreement between the observed and predicted probabilities of 1-, 3-, and 5-year survival in the training (Supplementary Fig. 2A, B and C, respectively) and test (Supplementary Fig. 2D, E and F, respectively) sets. The DCA curves for 1-, 3-, and 5-year survival in the training (Fig. 4A, B and C, respectively) and test (Fig. 4D, E and F, respectively) sets also demonstrated good clinical utility, showing a positive net benefit.
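A calibration curve compares predicted probabilities with observed event rates. The text does not describe how the curves were built, so the sketch below assumes the simplest approach: equal-width probability bins, with the mean prediction and observed rate computed per bin. The function name, the number of bins, and the data are all illustrative.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean_predicted, observed_rate, count) for each non-empty bin.

    Probabilities are grouped into n_bins equal-width intervals on [0, 1].
    A well-calibrated model has mean_predicted close to observed_rate.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 joins the last bin
        bins[idx].append((y, p))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve


# Invented predictions and outcomes, for illustration only.
y_prob = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9, 0.9, 0.9]
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
curve = calibration_bins(y_true, y_prob)
```

Plotting `observed_rate` against `mean_predicted` gives the calibration curve. Points near the diagonal indicate that predicted and observed risks agree.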

Fig. 4 Decision curve analysis curves of the XGBoost model in the training and test sets. Decision curve analysis curves for A 1-year, B 3-year, and C 5-year prognostic models in the training set; and D 1-year, E 3-year, and F 5-year prognostic models in the test set. XGBoost extreme gradient boosting
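Decision curve analysis plots net benefit against the threshold probability p_t at which a clinician would act. The standard definition is net benefit = TP/n − (FP/n) × p_t / (1 − p_t), compared with the "treat all" and "treat none" strategies. The sketch below evaluates it on invented data. The function names are illustrative and the code is not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients with predicted risk >= threshold."""
    n = len(y_true)
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of acting on every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented outcomes and predicted risks, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
nb_none = 0.0  # acting on nobody yields zero net benefit by definition
```

At a threshold of 0.5 the model yields a net benefit of 0.2, against −0.2 for treating everyone and 0 for treating no one. A model is clinically useful over the range of thresholds where its curve lies above both reference strategies.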

Discussion

AFP-positive HCC shows aggressive biological behavior and carries a poor prognosis, so survival time is one of the greatest concerns for patients [4]. In current clinical practice, however, reliable predictive models are lacking, and accurate and powerful models are clearly needed. In this study, we developed six machine learning-based prognostic models for AFP-positive HCC to comprehensively analyze survival data. The 1-, 3-, and 5-year overall survival rates of patients with AFP-positive HCC were 60.7%, 28.9%, and 14.3%, respectively.

To our knowledge, the current study is the first to create AI prognostic models for patients with AFP-positive HCC. The XGBoost model showed good predictive performance: the AUCs for 1-, 3-, and 5-year overall survival were 0.771, 0.763, and 0.807, respectively, in the training set, corresponding to 0.782, 0.749, and 0.740 in the test set. Compared with the other five machine learning algorithms (LR, SVM, RF, KNN, and ID3), the XGBoost model performed best overall. It holds promise for early medical intervention and for improving the survival of patients.

In recent years, machine learning-based AI models have attracted increasing attention in clinical practice [14, 20, 21]. In particular, AI-based technologies have made a significant contribution to cancer research [21]. Recent studies have examined the use of the XGBoost model in predicting the survival of cancer patients and found that it predicts well in various types of cancer. Xu et al. [14] reported that the XGBoost model performed better than the AJCC staging system in predicting postoperative survival in elderly patients with intrahepatic cholangiocarcinoma, with AUCs above 0.7 in both the training and test sets. Li et al. [15] found that the XGBoost model predicted the survival of patients with breast cancer brain metastases efficiently, with an AUC of 0.8 or above (test data). In addition, Zhong et al. [16] applied the XGBoost algorithm to create a prognostic model for patients with breast cancer with bone metastasis and reported AUC values of 0.88 and 0.80 in the training and test sets. Consistent with these studies [14,15,16], the XGBoost model in the present study also performed well in survival prediction, with all AUCs greater than 0.7 and the 5-year AUC exceeding 0.8 in the training data. Generally, an AUC ≥ 0.7 indicates that a model has adequate predictive ability [22]. This suggests that XGBoost is an efficient machine learning classifier.

Notably, a total of 17 baseline features of patients with AFP-positive HCC were considered in the survival prediction, which may help to provide a comprehensive and accurate prediction. Our findings revealed that surgery, AJCC stage, tumor size, marital status, median household income, SEER stage, and age were relatively important variables affecting prognosis, and that surgery was the most important of them. This is consistent with previous results: several recent studies showed that surgery was an independent prognostic factor for patients with HCC [23,24,25,26], and surgical resection is still considered the gold standard treatment for HCC [27]. This result underlines the importance of surgical treatment in AFP-positive HCC, which is a favorable conclusion for both clinicians and patients. Likewise, AJCC stage, tumor size, and age have been related to the survival of patients with HCC [23, 25]. Previous studies have shown that HCC with a tumor diameter ≤ 3 cm has low malignant potential and is associated with better survival after treatment [28, 29]. Of note, age, tumor size, and median household income were analyzed as continuous rather than categorical variables. This allows survival to be predicted for an individual patient rather than collectively for a group of patients, in line with the concept of personalized prognosis prediction. In this study, marital status and median household income, two socioeconomic factors, were also identified as important predictors of survival in patients with AFP-positive HCC. Psychological and economic support from spouses may help to improve survival in married patients [30].

This study has several strengths. It is the first to create AI prognostic models for patients with AFP-positive HCC. We implemented six machine learning algorithms and used ten-fold cross-validation for iterative testing and tuning of the models. Moreover, across the different machine learning algorithms, we comprehensively analyzed 17 demographic and clinicopathological features, which helps to provide an accurate prediction. Nonetheless, the present study has some potential limitations. First, this is a retrospective study. Second, information on patients with AFP-positive HCC was obtained from the SEER database, so the findings may not be representative of other populations. Third, some other important information, such as exact AFP values, vascular invasion, etiology of HCC, and serum biochemical parameters, was not available in the SEER program. The model may therefore miss some important features, which could bias the results. For example, previous studies revealed that microvascular invasion was an important and independent prognostic factor for patients with HCC [31, 32]. Finally, the AI prognostic models we created were only internally validated; despite their promising predictive performance, external validation in prospective studies is required to assess their applicability.

Conclusions

In conclusion, our study developed six novel machine learning-based prognostic models for the survival of patients with AFP-positive HCC. The XGBoost model exhibited good predictive performance, which may provide physicians with an effective tool for early medical intervention and improve the survival of patients.

Data availability

Publicly available datasets were analyzed in this study. These data can be found at https://seer.cancer.gov/.

Abbreviations

HCC: Hepatocellular carcinoma

AFP: Alpha-fetoprotein

AI: Artificial intelligence

XGBoost: Extreme gradient boosting

SEER: Surveillance, Epidemiology, and End Results

ICD-O-3: International Classification of Diseases for Oncology, Third Edition

AJCC: American Joint Committee on Cancer

LR: Logistic regression

SVM: Support vector machine

RF: Random forest

KNN: K-nearest neighbor

ID3: Decision tree

ROC: Receiver operating characteristic

AUC: Area under the receiver operating characteristic curve

DCA: Decision curve analyses

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

  2. Suk FM, Liu CL, Hsu MH, Chuang YT, Wang JP, Liao YJ. Treatment with a new benzimidazole derivative bearing a pyrrolidine side chain overcomes sorafenib resistance in hepatocellular carcinoma. Sci Rep. 2019;9(1):17259.

  3. Villanueva A. Hepatocellular carcinoma. N Engl J Med. 2019;380(15):1450–62.

  4. He H, Chen S, Fan Z, Dong Y, Wang Y, Li S, et al. Multi-dimensional single-cell characterization revealed suppressive immune microenvironment in AFP-positive hepatocellular carcinoma. Cell Discov. 2023;9(1):60.

  5. Taketa K. Alpha-fetoprotein: reevaluation in hepatology. Hepatology. 1990;12(6):1420–32.

  6. Zhao T, Jia L, Li J, Ma C, Wu J, Shen J, et al. Heterogeneities of site-specific N-glycosylation in HCC tumors with low and high AFP concentrations. Front Oncol. 2020;10:496.

  7. Bai DS, Zhang C, Chen P, Jin SJ, Jiang GQ. The prognostic correlation of AFP level at diagnosis with pathological grade, progression, and survival of patients with hepatocellular carcinoma. Sci Rep. 2017;7(1):12870.

  8. Munson PV, Adamik J, Butterfield LH. Immunomodulatory impact of α-fetoprotein. Trends Immunol. 2022;43(6):438–48.

  9. Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med. 2021;13(1):152.

  10. Nguyen TT, Ho CT, Bui HTT, Ho LK, Ta VT. Multidimensional machine learning for assessing parameters associated with COVID-19 in Vietnam: validation study. JMIR Form Res. 2023;7:e42895.

  11. Sajda P. Machine learning for detection and diagnosis of disease. Annu Rev Biomed Eng. 2006;8:537–65.

  12. Senders JT, Staples P, Mehrtash A, Cote DJ, Taphoorn MJB, Reardon DA, et al. An online calculator for the prediction of survival in glioblastoma patients using classical statistics and machine learning. Neurosurgery. 2020;86(2):E184–92.

  13. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. 2016.

  14. Xu Q, Lu X. Development and validation of an XGBoost model to predict 5-year survival in elderly patients with intrahepatic cholangiocarcinoma after surgery: a SEER-based study. J Gastrointest Oncol. 2022;13(6):3290–9.

  15. Li C, Liu M, Zhang Y, Wang Y, Li J, Sun S, et al. Novel models by machine learning to predict prognosis of breast cancer brain metastases. J Transl Med. 2023;21(1):404.

  16. Zhong X, Lin Y, Zhang W, Bi Q. Predicting diagnosis and survival of bone metastasis in breast cancer using machine learning. Sci Rep. 2023;13(1):18301.

  17. Kinoshita F, Takenaka T, Yamashita T, Matsumoto K, Oku Y, Ono Y, et al. Development of artificial intelligence prognostic model for surgically resected non-small cell lung cancer. Sci Rep. 2023;13(1):15683.

  18. Duggan MA, Anderson WF, Altekruse S, Penberthy L, Sherman ME. The Surveillance, Epidemiology, and End Results (SEER) program and pathology: toward strengthening the critical relationship. Am J Surg Pathol. 2016;40(12):e94–102.

  19. Jiang J, Pan H, Li M, Qian B, Lin X, Fan S. Predictive model for the 5-year survival status of osteosarcoma patients based on the SEER database and XGBoost algorithm. Sci Rep. 2021;11(1):5542.

  20. Çubukçu HC, Topcu Dİ, Yenice S. Machine learning-based clinical decision support using laboratory data. Clin Chem Lab Med. 2023;62(5):793–823.

  21. Kumar Y, Gupta S, Singla R, Hu YC. A systematic review of artificial intelligence techniques in cancer prediction and diagnosis. Arch Comput Methods Eng. 2022;29(4):2043–70.

  22. Fischer JE, Bachmann LM, Jaeschke R. A readers’ guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med. 2003;29(7):1043–51.

  23. Yang R, Yu X, Zeng P. Construction and validation of a SEER-based prognostic nomogram for young and middle-aged males patients with hepatocellular carcinoma. J Cancer Res Clin Oncol. 2023;149(12):10099–108.

  24. Liu K, Huang G, Chang P, Zhang W, Li T, Dai Z, et al. Construction and validation of a nomogram for predicting cancer-specific survival in hepatocellular carcinoma patients. Sci Rep. 2020;10(1):21376.

  25. Yan B, Su BB, Bai DS, Qian JJ, Zhang C, Jin SJ, et al. A practical nomogram and risk stratification system predicting the cancer-specific survival for patients with early hepatocellular carcinoma. Cancer Med. 2021;10(2):496–506.

  26. Xiao Z, Yan Y, Zhou Q, Liu H, Huang P, Zhou Q, et al. Development and external validation of prognostic nomograms in hepatocellular carcinoma patients: a population based study. Cancer Manag Res. 2019;11:2691–708.

  27. Yang LY, Fang F, Ou DP, Wu W, Zeng ZJ, Wu F. Solitary large hepatocellular carcinoma: a specific subtype of hepatocellular carcinoma with good outcome after hepatic resection. Ann Surg. 2009;249(1):118–23.

  28. Yamashita YI, Imai K, Yusa T, Nakao Y, Kitano Y, Nakagawa S, et al. Microvascular invasion of single small hepatocellular carcinoma ≤ 3 cm: predictors and optimal treatments. Ann Gastroenterol Surg. 2018;2(3):197–203.

  29. Cammà C, Di Marco V, Orlando A, Sandonato L, Casaril A, Parisi P, et al. Treatment of hepatocellular carcinoma in compensated cirrhosis with radio-frequency thermal ablation (RFTA): a prospective study. J Hepatol. 2005;42(4):535–40.

  30. Chen Z, Cui J, Dai W, Yang H, He Y, Song X. Influence of marital status on small intestinal adenocarcinoma survival: an analysis of the Surveillance, Epidemiology, and End Results (SEER) database. Cancer Manag Res. 2018;10:5667–76.

  31. Ouyang X, Yan Y, Zhang S, Li M, Li M, Liu Q. Microvascular invasion is associated with poor survival in patients with dual-phenotype hepatocellular carcinoma. Am J Clin Pathol. 2023:aqad143.

  32. Wu F, Sun H, Zhou C, Huang P, Xiao Y, Yang C, et al. Prognostic factors for long-term outcome in bifocal hepatocellular carcinoma after resection. Eur Radiol. 2023;33(5):3604–16.

Acknowledgements

We thank the SEER database for its open data access. We also thank Boya Du for the help with machine learning analysis.

Funding

None.

Author information

Contributions

Conceptualization, BD, HZ and YD; methodology, BD, HZ, YD and SY; formal analysis, BD and HZ; data curation, BD, HZ, YD and YC; writing-original draft preparation, BD and HZ; writing-review and editing, YD, SY and YC; supervision, YC and CZ. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Yongjian Chen or Chaoxue Zhang.

Ethics declarations

Ethics approval and consent to participate

Ethical review and approval were waived for this study due to the fact that the data are fully de-identified and no intervention on patients was performed.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Dong, B., Zhang, H., Duan, Y. et al. Development of a machine learning-based model to predict prognosis of alpha-fetoprotein-positive hepatocellular carcinoma. J Transl Med 22, 455 (2024). https://doi.org/10.1186/s12967-024-05203-w
