Development and validation of a novel immune-related prognostic model in hepatocellular carcinoma

Background Growing evidence has suggested that immune-related genes play crucial roles in the development and progression of hepatocellular carcinoma (HCC). Nevertheless, the utility of immune-related genes for evaluating the prognosis of HCC patients are still lacking. The study aimed to explore gene signatures and prognostic values of immune-related genes in HCC. Methods We comprehensively integrated gene expression data acquired from 374 HCC and 50 normal tissues in The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) analysis and univariate Cox regression analysis were performed to identify DEGs that related to overall survival. An immune prognostic model was constructed using the Lasso and multivariate Cox regression analyses. Furthermore, Cox regression analysis was applied to identify independent prognostic factors in HCC. The correlation analysis between immune-related signature and immune cells infiltration were also investigated. Finally, the signature was validated in an external independent dataset. Results A total of 329 differentially expressed immune‐related genes were detected. 64 immune‐related genes were identified to be markedly related to overall survival in HCC patients using univariate Cox regression analysis. Then we established a TF-mediated network for exploring the regulatory mechanisms of these genes. Lasso and multivariate Cox regression analyses were applied to construct the immune-based prognostic model, which consisted of nine immune‐related genes. Further analysis indicated that this immune-related prognostic model could be an independent prognostic indicator after adjusting to other clinical factors. The relationships between the risk score model and immune cell infiltration suggested that the nine-gene signature could reflect the status of tumor immune microenvironment. The prognostic value of this nine-gene prognostic model was further successfully validated in an independent database. Conclusions Together, our study screened potential prognostic immune-related genes and established a novel immune-based prognostic model of HCC, which not only provides new potential prognostic biomarkers and therapeutic targets, but also deepens our understanding of tumor immune microenvironment status and lays a theoretical foundation for immunotherapy.

each year, and nearly 850,000 new cases occur [1,2]. Hepatocellular carcinoma (HCC) accounts for 85-90% of all liver cancers and has received public attention. Despite rapid advances in new tests and treatments, the 5-year survival rate for HCC is still less than one in five [3]. At present, surgery is still the main treatment for early liver cancer [4]. However, a significant proportion of patients will have postoperative recurrence or distant metastasis [5]. Recently, drugs such as sorafenib and regorafenib have been shown to be effective against advanced HCC [6,7]. It is worth mentioning that patients with the same pathological type and clinical stage often have different outcomes after the same treatment, which is mainly due to the genetic heterogeneity of patients [8].
The immune system is thought to be a decisive factor in the development of cancer [9,10], including HCC. Immune cells are major components of the tumor microenvironment and play a role in many key steps of HCC development from tumor growth to the development of metastasis [11,12]. Besides, a large amount of inflammatory mediators were found to be associated with HCC development. IL-22, belongs to the cytokine family, was overexpressed in the HCC microenvironment and leading to tumor growth [13]. Chemokines (e.g. CXCL12, CCL20), as the immune molecules members of immune system, also play a crucial role in HCC growth, invasion and metastasis [14,15]. It is proven that immunogenicity makes the immunotherapy of HCC a promising prospect [16]. Research progress discovery that programmed cell death-1 (PD-1) pathway is a new target for HCC immunotherapy [17]. As an anti-PD-1 monoclonal antibody, nivolumab can block PD-1 and restore the body's anticancer immune response by interfering with the signaling pathway, thereby preventing T cell activation [18]. In HCC, nivolumab showed significant benefits in objective response rates and overall survival [19]. Therefore, nivolumab may provide a safe, effective and promising treatment for HCC [20]. Increasing studies have suggested that immune-related genes in HCC are closely related to the tumorigenesis and development of HCC [21]. However, there is currently no prognostic model based on immunerelated genes to systematically evaluate tumor immune environment and predict the overall prognosis of HCC patients. Therefore, the construction of an immunebased prognosis model that can reliably predict HCC prognosis is of great clinical significance.
In the first step of this study, we screened differentially expressed immune-related genes closely related to HCC through bioinformatics analysis of large-scale sequencing database. Next, immune-related genes significantly related to prognosis were further detected. Then we constructed an immune-related prognostic model by integrating immune-related genes for HCC. Moreover, the prognostic value of our immune-related prognostic model was further validated in an independent International Cancer Genome Consortium (ICGC) database. We here aimed to provide novel biomarkers that would be effective in predicting the prognosis and monitoring tumor immune microenvironment in HCC patients.

Data collection
Gene expression data and clinical information of HCC samples were acquired from TCGA data portal (https :// porta l.gdc.cance r.gov/cart; up to September 16, 2019). Processed RNA-Seq FPKM data of 374 HCC and 50 adjacent normal HCC tissues were downloaded for further analyses. After careful search and examination, 224 HCC patients were accompanied by hepatitis B virus. This included 81 HBsAg positive patients, 60 HBsAg and HBsAb both positive patients, and 83 patients whose history risk factors were hepatitis B. The International Cancer Genome Consortium (ICGC; https ://dcc.icgc.org/searc h?filte rs=%7B%22don or%22:%7B%22pro jectI d%22:%7B%22is%22:%5B%22LIR I -J P % 2 2 % 5 D % 7 D , % 2 2 a v a i l a b l e D a t a T y p e s % 2 2 : % 7 B % 2 2 i s % 2 2 : % 5 B % 2 2 e x p _ seq%22%5D%7D%7D%7D) was a web-based portal that provided comprehensive molecular genetic profiles of 50 different tumor types. ICGC represents a valuable database for analyzing cancer genome at the genomic and transcriptomic levels. For validation cohort, gene expression data and the corresponding survival information of 231 HCC patients were retrieved from the ICGC database. We download 1811 immune-related genes via the Immunology Database and Analysis Portal (ImmPort; https ://www.immpo rt.org/share d/genel ists) database, which contains 17 immune categories based on various molecular function [22]. The cistrome Cancer (http://cistr ome.org/Cistr omeCa ncer/Cance rTarg et/) represents a useful database for biomedical and genetic research and includes totally 318 transcription factors (TFs) [23]. In order to investigate the regulatory mechanism of immune-related genes, we extracted these TFs for subsequent research. Because our data were downloaded directly from public databases and we strictly abided by the publishing guidelines provided by TCGA and ICGC, there were no requirement for ethical approvals.

Differential expression analyses
The differentially expressed immune-related genes and TFs in HCC and normal tissues were detected using the Wilcoxon test method in R. |log2 foldchange| > 1 and FDR < 0.05 were considered as significant. Heatmaps were generated using pheatmap package and volcano plots were also conducted in R software. To assess the potential biologic functions of differentially expressed immune-related genes, Gene Ontology (GO) [24] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [25] were performed by the cluster Profiler package [26] in R. Functional categories with a adjusted P value < 0.05 were considered as significant pathways.

Survival analysis
Only patients with a follow-up time less than 2000 days were used for the survival analyses. To investigate the prognostic value of differentially expressed immunerelated genes in HCC patients, univariate Cox analysis was implemented by the survival package. Only these genes with a P value < 0.01 were considered as prognostic immune-related genes. These prognostic immune-related genes were further analyzed by GO and KEGG analysis.
To evaluate how TFs regulating these immune-related genes, we first screened prognosis-related TFs using univariate Cox analysis with a P value < 0.01 Then the correlation test between prognosis-related TFs and prognostic immune-related genes was investigated. This step was performed using cor.test function in R, whose core method was Pearson test. The correlation coefficient and P value were calculated by cor.test. The cut-off criteria were set as correlation coefficient > 0.5 and P < 0.05. In order to make the picture clear, we only chose TFs that regulated more than nine immune-related genes. Cytoscape was utilized for constructing and visualizing the regulatory network [27].

Construction of the immune-related signature for HCC
To develop a prognostic model, Lasso and multivariate Cox regression analyses were utilized to assess the relationship between prognostic immune-related genes expressions and overall survival (OS). To avoid over-fitting and delete highly related genes, Lasso Cox regression was carried out using survival and glmnet package. Genes detected via Lasso algorithm were evaluated by step wise multivariate Cox regression analysis. Risk scores were acquired based on genes expression multiplied a linear combination of regression coefficient obtained from the multivariate Cox regression. Patients were assigned to high risk and low risk groups according to the median risk score. The Kaplan-Meier analysis was performed to compare overall survival between high risk and low risk groups via survival package in R. The receiver operating characteristic (ROC) curve was implemented by the R software package survival ROC. In addition, univariate and multivariate analyses were utilized to assess the effect of risk scores on overall survival and several clinical features.

Correlation analysis between immune-related signature and immune cells infiltration
To explore the associations between prognostic model and immune cells infiltration, we employed Tumor Immune Estimation Resource (TIMER) [28], a useful resource for comprehensive analysis of tumorinfiltrating immune cells. TIMER algorithm allows users to estimate the composition of six tumorinfiltrating immune cells subsets (B cells, CD4+ T cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells). The immune infiltrate levels of HCC patients were derived from TIMER website and the correlation between the prognostic model and six tumor-infiltrating immune cells were conducted in R.

Genetical alteration of the immune-related signature
The cBio Cancer Genomics Portal (CBioPortal) represents an important online platform for visualization and analysis of various cancer genomics data [29,30]. CBioPortal was conducted to analyse genetic alterations of prognostic genes in HCC patients (TCGA, Provisional). Anti-cancer drugs that target these genes were also identified.

External validation of the immune-related signature
To verify the prognostic value of immune-related signature risk score model, we used the ICGC database as the validation cohort. The same formula was used to calculate risk scores and patients were classified into high risk and low risk groups based on the optimal cutoff point. Kaplan-Meier and ROC curve analyses were carried out as described above.

Statistical analysis
All analyses were performed using Rversion 3.5.1. Unless otherwise noted, P < 0.05 was considered to be significant.

Differentially expressed immune-related genes and TFs in HCC
A total of 329 immune-related genes (267 upregulated and 62 downregulated) and 117 TFs (108 upregulated and 9 downregulated) were identified as differentially expressed in HCC tissues compared with normal tissues. The heat maps revealed that HCC samples can be obviously distinguished from the normal samples according to differentially expressed immune-related genes and TFs (Fig. 1a, b). Volcano plots shows the distribution of differentially expressed immunerelated genes and TFs between HCC and normal controls (Fig. 1c, d). The 329 differentially expressed immune-related genes were further analyzed by GO and KEGG analysis. GO analysis revealed that primary functional categories in the biological processes (BP) were leukocyte migration, positive regulation of cytokine production and positive regulation of defense response. For cellular components (CC), the major enriched GO terms were receptor complex and external side of plasma membrane. The most enriched cellular components (CC) were receptor ligand activity, cytokine activity and cytokine receptor binding (Fig. 2a). KEGG pathway indicated that the differentially expressed immune-related genes were mainly involved in Cytokine-cytokine receptor interaction, MAPK signaling pathway and PI3K-Akt signaling pathway (Fig. 2b).

Screening of immune-related genes with prognostic value in HCC
To determine the differentially expressed immunerelated genes with prognostic characteristics, the 329 genes expression in the 337 HCC samples were evaluated by univariate Cox analysis. Totally, 64 immune-related genes were found to be related to OS. The prognostic immune-related genes were shown in Table 1. GO and KEGG analysis suggested that these prognostic immunerelated genes mainly participated in semaphorin-plexin signaling pathway, ErbB signaling pathway, Hepatitis C. These pathways were shown to be significantly correlated with the development of cancer (Fig. 3a, b).

TF regulatory network
To explore the regulatory mechanisms of prognostic immune-related genes, we selected the prognosis-related TFs among HCC patients. Univariate Cox regression analysis detected that 54 TFs were correlated to patient overall survival (Table 2). We established a regulatory network according to 54 TFs and 64 immune-related genes. We built the TF regulatory network in three steps: (1) 54 prognosis-related TFs and 64 prognostic immune-related genes were selected. (2) The correlation test between each TF and each immune-related gene was conducted using cor.test function, whose core method was Pearson test. Based on the cut-off criteria, 12 prognosis-related TFs and 32 prognostic immune-related genes were identified to establish the network. (3) Here, we utilized Cytoscape to construct and visualize the main  regulatory network. As shown in Fig. 4, HCFC1 regulated most of the immune-related genes and occupied the dominant position. This transcriptional regulatory network revealed the regulatory relationships among these immune-related genes. . All the nine genes were risky prognostic genes with hazard ratio > 1. Risk scores were based on genes expression levels multiplied its corresponding regression coefficients. Regression coefficients were calculated by multivariate Cox regression. The risk scores were not only related to the expression levels of these genes, but also related to the correlation coefficients. Then 337 HCC samples were classified into a high risk group (n = 168) and low risk group (n = 169) based on the median risk score (Fig. 5a). The survival overview and gene expression heatmap were presented in Fig. 5b-c. Survival analysis   18:67 indicated that patients in the high risk group showed markedly poorer overall survival than those in the low risk group (P < 0.0001; Fig. 5d). The area under the ROC curve for 1 year, 3 year, and 5 year OS were 0.811, 0.711, 0.734, suggesting that this prognostic model exhibited a good sensitivity and specificity (Fig. 5e).

Construction of immune-related prognostic model for HCC
The relationships between the risk score model and immune cell infiltration were investigated. As shown in Fig. 6, dendritic cells, neutrophil and macrophage were positive correlated with risk score. However, no significant correlations were observed between B cells, CD4+ T cells, CD8+ T cells and risk score.  Univariate and multivariate Cox regression analyses were conducted to assess the independent predictive power of immune-related prognostic signature. Univariate analysis indicated that pathological stage (P < 0.0001), T classification (P < 0.0001) and immune-related prognostic model (P < 0.0001) were markedly correlated with overall survival. After the multivariate analysis, only immunerelated prognostic model remained as an independent prognostic factor associated with OS (P < 0.0001; Table 3).

Genetic alterations of nine immune-related prognostic genes
The cBioPortal tool was employed to analyze genomic alternations and potential drugs of nine immunerelated prognostic genes. As shown in Fig. 7a, ANGPT1 and NDRG1 were most commonly altered genes. Amplification was the main frequent genetic alterations and the nine immune-related prognostic genes altered in 94 (25.2%) of 373 cases. Figure 7b illustrated the network built by nine immune-related prognostic genes and their 50 most frequently mutated neighbor genes. Anticancer drugs targeting these genes were exhibited. Among them, two genes (DCK and ANGPT1) were currently regarded as drugs targets. We considered that other genes might act as potential novel therapeutic targets.

Validation of the immune-related prognostic signature by ICGC database
The ICGC database including 231 HCC samples were used for the validation of the immune-related signature. According to the median risk score, we divided patients into high risk (n = 115) and low risk groups (n = 116). In agreement with results of TCGA cohort, the Kaplan-Meier curve demonstrated that patients in the high risk group exhibited markedly poorer overall survival than those in the low risk group (P < 0.001; Fig. 8a). The AUCs for 1 year and 3 year OS were 0.781 and 0.783, demonstrating good performance of the immunerelated signature in predicting OS (Fig. 8b). Because there was only one patient with a 5-year follow-up period, we did not plot a 5-year ROC curve.

Discussion
Hepatocellular carcinoma still remains a lethal malignancy with extremely unfavorable prognosis globally. Precise prediction of HCC overall survival is of great significance for the choice of therapeutic methods and amelioration of prognosis. To date, there are no reliable and effective biomarkers to accurately predict the survival of HCC patients. There is a critical demand to identify robust biomarkers and prediction model to forecast HCC outcomes.
In the current study, based on the analysis of TCGA dataset, 329 differentially expressed immune-related genes were screened out from 374 cases of HCC and 50 normal tissues. According to the results of the GO enrichment, the mentioned genes were primarily  associated with immune response. The KEGG pathways were mostly concentrated on several cancer-related pathways (e.g., MAPK signaling pathway and PI3K-Akt signaling pathway). In univariate regression analysis on the differentially expressed immune-related genes, 64 genes were detected to display significant association with OS. To delve into the regulatory mechanisms of the prognostic immune-related genes, a TF-mediated network was built to reveal crucial TFs that are capable of regulating these immune-related genes. The main network suggested that HCFC1 was the critical key regulator in the network. HCFC1 (host cell factor C1), belongs to the host cell factor family. It is noteworthy that a recent study reported HCFC1 as a clinically hub gene that was remarkably correlated with the survival time, grade and TNM stage of HCC patients [31]. To   [32]. Another study also constructed a sixgene prognostic signature for HCC overall survival prediction based on gene expression data from TCGA [33]. A recent study investigating the prognostic value of TP53-associated immune genes in HCC identified and validated a two-gene (TREM1 and EXO1) prognostic model [34]. However, these studies did not use a large number of samples to comprehensively explore the relationship between immune genes and prognosis of HCC. Compared with the previous researches, this study has several advantages: (1) we utilized the specialized immunology database, which allowed us to analyze as many immune genes as we can. To our knowledge, this is the first study to explore the relationships between a large number of immune-related genes and prognosis in HCC patients. (2) We obtained a number of prognostic immune-related genes and established a novel immunerelated prognostic model. This prognostic model exhibited a prominent performance for OS prediction based on TCGA database. According to the in-depth analysis, the immune-related prognostic model was demonstrated to be an independent prognostic indicator after adjusting to other clinical factors. Subsequently, such model that consists of nine immune-related genes was then successfully validated as a prognostic factor in an independent ICGC dataset. All the mentioned results revealed that immune-related prognostic model could act as an effective marker for HCC prognosis prediction.
To characterize the tumor immune microenvironment status, the relationships between immune-related prognostic model and immune cell infiltration were investigated. The data here indicated that higher infiltration levels of dendritic cells, neutrophil and macrophage may be observed in high risk patients. Dendritic cells, neutrophil and macrophage displayed positive correlation with immune-related prognostic model, revealing that the model may serve as predictor for increased immune cells infiltration. A recent study reported that intratumoral infiltration by dendritic cells had a close relation to the poor prognosis in HCC patients [35], which is consistent with our findings. An existing study reported that neutrophil infiltration within HCC might display an association with a poor clinical outcome [36]. Neutrophils contribute to the activation, regulation and effector function of immune cells [37]; they are also capable of HCC progression by secreting a wide range of cytokines [38], thereby demonstrating their crucial role in the pathogenesis of HCC. A number of studies have reported that increased macrophages were related to poor prognosis in HCC [39]. Macrophages infiltration within the tumor microenvironment could facilitate tumor growth, angiogenesis, invasion, as well as metastasis [40]. Targeting macrophages have been considered a promising adjuvant immunotherapy for HCC patients [41,42]. The role of immune cells in HCC has not been fully elucidated. It may be a promising way to cure HCC by broadening the relationship between immune cells and tumor progression. Nine immune-related genes that constituted the prognostic model were identified as potential biomarkers in HCC. Out of the nine genes, RBP2, NDRG1 and HSPA have been well studied in HCC compared to other immune-related genes. RBP2 (retinol binding protein 2), belongs to the Fatty-acid binding protein (FABP) family, was reported to be involved in the pathogenesis of diverse types of cancer [43]. It has been previously evidenced that RBP2 was overexpressed in HCC and was associated with unfavorable prognosis of HCC [44]. Overexpressed RBP2 was markedly correlated with AFP and TNM stage. RBP2 might be critical to the angiogenesis and progression of HCC. Elevated NDRG1 expression was observed in HCC and dramatically related to overall survival and tumor stage [45]. NDRG1 was suggested to play significant roles in the metastasis, recurrence and and prognosis of HCC [46]. Moreover, overexpressed NDRG1 displayed a significant association with hepatocarcinogenesis [47]. Accordingly, targeting NDRG1 might act as an attractive therapeutic strategy for HCC. HSPA4, also known as hsp70, was demonstrated to enhance the proliferation, invasion and metastasis of various cancers [48]. High Expression of HSPA4 was significantly correlated with worse overall survival of HCC; it was aslo an independent prognostic parameter for OS [49]. Given the findings here, HSPA4 demonstrated huge potential as a therapeutic target in HCC treatment. In the HCC tissues, up-regulated expression of ANGPT1 was detected as compared with normal liver tissues [50], whereas the prognostic implication in HCC was not studied. A previously study showed IL17D had a diagnostic value for HCC and the DNA methylation status of IL17D was related to OS [51]. Nevertheless, the specific role of IL17D in HCC has been rarely known. Only one study has reported OSGIN1 may be a tumor suppressor that was downregulated in HCC, which contradicted our findings [52]. Its exact role in HCC is as yet unclear. Thus far, no relevant research reported MAPT, DCK and SEMA3F in HCC. Further researches are required to elucidate the function of these potential immune-related genes in HCC.
Some shortcomings of this study should be addressed. First, this study was completely based on public databases and the results should be external validated by further experiments. Second, the efficiency of the immunerelated prognostic model should be identified in a large number of HCC samples using experimental methods. Third, the biological functions of nine immune-related genes in HCC require further examined by a series of experiments.

Conclusion
In conclusion, for the first time, numerous immunerelated genes were detected to be significantly related to HCC prognosis by comprehensive analyses. Moreover, we constructed a novel immune-related prognostic model as an independent prognostic predictor for HCC. Validation in an external ICGC database further confirmed the prognostic value of this model. This prognostic model may also serve as predictor for increased immune cells infiltration, proving its key role in tumor immune microenvironment. The current study deepens our understanding of immune-related genes in HCC and provides new potential prognostic and therapeutic biomarkers.