Study design overview
A total of 905 primary PCa cases with available survival data were included for downstream analysis. The flowchart of this study was shown in Fig. 1.
Identification of DNA replication stress-associated features in TCGA-PRAD
A total of 982 genes were curated from 21 DNA replication stress signatures and 894 were recovered in TCGA-PRAD. Univariate cox regression analysis identified 198 genes that were significantly related to PCa biochemical recurrence in the TCGA-PRAD datasets (Additional file 1: Table S6). The bootstrap approach further selected 136 of 198 prognostic genes that were robust to sample resampling and that were also identified in validating datasets (Additional file 1: Table S7). Furthermore, we adopted the Boruta algorithm and narrowed down the selected genes to 47 genes that were confirmed more relevant to recurrence (Additional file 3: Figure S2A, B). As shown in Fig. 2A, 47 genes were ranked according to the importance degree inferred by the Boruta algorithm, with the top 5 including EMD, HJURP, PLK1, TROAP, and CENPK. In addition, we confirmed that these 47 genes have higher mRNA expression in recurrent PCa samples in TCGA-PRAD (Additional file 3: Figure S2C).
Construction of DNA replication stress signature
Using features selected by the Boruta algorithm, we benchmarked 7 survival-related machine learning algorithms including Enet, lasso, Ridge, XGBoost, plsRcox, SuperPC, and CoxBoost, to screen for a hyperparameter-tuned model with the best accuracy and lower risk of overfitting. Nested CV with outer 10 folds for validation and inner 5 folds for hyperparameter tuning was performed in the TCGA-PRAD. As shown in Fig. 2B–D and Additional file 1: Table S8, the XGBoost survival model achieved the best performances with the highest mean C-index (0.725), lowest mean IBS (0.156), and highest mean AUC values (1-year: 0.807; 3-year: 0.746; 5-year: 0.703: 10-year: 0.742). The XGBoost model with tuned hyperparameters was then fitted to the entire TCGA-PRAD dataset and referred as RSS. The inferred feature contribution to RSS was demonstrated in Fig. 2E, with the top 5 features including EMD, CCNE2, PTTG1, TROAP, and TK1.
Evaluation of DNA replication stress signature
Next, we interrogated the prognostic values of RSS in the TCGA-PRAD training cohort and 4 external validating cohorts using 1-year AUC, 3-year AUC, 5-year AUC, and C-index. The 1-year, 3-year and 5-year AUC values were 0.869, 0.890, 0.864 for TCGA-PRAD, and 0.748, 0.732, 0.695 for DKFZ-PRAD, and 0.832, 0.658, 0.636 for GSE70768, and 0.740, 0.689, 0.677 for GSE70769, and 0.701, 0.712, 0.659 for GSE94767 (Fig. 3A–E). The C-index values were 0.851 for TCGA-PRAD, 0.700 for DKFZ-PRAD, 0.724 for GSE70768, 0.654 for GSE70769, and 0.670 for GSE94767. Overall, RSS showed robust predictive power across validating datasets.
We then conducted univariate and multivariate Cox regression analyses and revealed that RSS as a continuous variable was significantly associated with a shorter time to biochemical recurrence in all datasets and was therefore considered an independent risk factor for PCa recurrence. (TCGA-PRAD: HR: 3.461, 95%CI: 2.708–4.424; DKFZ-PRAD: HR: 4.202, 95%CI: 1.541–11.453; GSE70768: HR: 3.859, 95%CI: 1.752–8.500; GSE70769: HR: 2.271, 95%CI: 1.215–4.243; GSE94767: HR: 1.632, 95%CI: 1.024–2.601, Fig. 3F). In addition, we tested whether a fixed RSS threshold could be used to classify all included PCa patients as high- or low-risk. To this end, we used “SurvivalROC” R package and found that a cutoff point of 0.536 in GSE94767 was able to divide all patients into high- and low-risk groups with significant differences in time to recurrence for all datasets, as shown by the Kaplan–Meier analyses (all log-rank P < = 0.05, Fig. 3G–K). Taken together, RSS conferred great potential to facilitate discrete risk stratification for primary PCa.
Comparison of RSS to clinical variables and published signatures
Since clinical variables such as Gleason score, serum PSA, and TNM staging are commonly used to guide PCa management and predict prognosis, we compared them with our proposed RSS using the C-index. Overall, RSS showed better predictive accuracy than most clinical features in the TCGA-PRAD and GSE70768 datasets, and non-inferior predictive power in the DKFZ-PRAD, GSE70769, and GSE94767 datasets (Fig. 4A–E).
We next compared RSS with published PCa signatures and found that RSS had a higher hazard ratio, C-index, and AUC values than other signatures in the TCGA-PRAD cohort (Fig. 4F–J) and demonstrated superior or comparable results in other cohorts (Additional file 4: Figure S3 and Additional file 1: Table S9).
Multiomic comparison between RSS-high and RSS-low groups in TCGA-PRAD
Using GISTIC2.0, we found that the RSS-high group conferred more recurrent copy number alterations than the RSS-low group (Fig. 5A–D). Common deletions in prostate cancers such as 8p21.3 (RSS-high: 70.4%; RSS-low: 60.2%) and 13q14.13 (RSS-high: 61.2%; RSS-low: 42.8%) occurred more frequently in the RSS-high group than the RSS-low group. Similarly, recurrent CNV gains such as 8q24.21 were only detected in the RSS-high group (46.4%). We also investigated common genes affected by somatic copy number alterations. As shown in Fig. 5E and Additional file 1: Table S10, RSS-high patients conferred more deletions of tumor suppressor genes such as TP53 (RSS-high: 48.5%; RSS-low: 29.0%), PTEN (RSS-high: 45.5%; RSS-low: 29.0%), and RB1(RSS-high: 60.5%; RSS-low: 41.2%; all adjusted P < 0.001). Also, amplification of MYC (RSS-high: 46.0%; RSS-low: 22.4%) and CCND1 (RSS-high: 18.5%; RSS-low: 4.4%) was frequently identified in the RSS-high group (all adjusted P < 0.001). We next compared common somatic mutations between RSS-high and RSS-low groups (Fig. 5F and Additional file 1: Table S11) and found that TP53 was more frequently mutated in RSS-high patients (18.6%) than in RSS-low patients (6.2%). Although DNA damage response-related gene mutations were rare (BRCA2: 2%; ATM: 4%) in primary PCa and comparable between groups, the RSS-high group had significantly higher HRD-detect scores (Additional file 5: Figure S4), suggesting that despite the low mutation rate, RSS-high patients exhibited more homologous deficiency at the transcriptome level. Additionally, the aneuploidy score, tumor mutation burden, and tumor neoantigen burden were significantly higher in the RSS-high group than in the RSS-low group (all P < 0.05, Fig. 5G–I). Altogether, the RSS-high group displayed a genomic pattern reminiscent of advanced PCa.
Association of RSS with clinical features and biological processes
We compared the clinical characteristics between RSS-high and RSS-low groups in all cohorts (Additional file 1: Table S12). In general, RSS-high patients were associated with more advanced tumor stage (T3-4), and higher Gleason scores (> = 8). We next investigated the impact of RSS on biological pathways using ssGSEA. As shown in Fig. 6 and Additional file 1: Table S13, the RSS-high group was significantly enriched with cell-cycle related pathways such as mitotic spindle, E2F targets, G2M checkpoint, MYC targets, DNA replication, and DNA repair-related pathways such as base excision repair, nucleotide excision repair, mismatch repair, and several cancer-related pathways such as WNT/beta-catenin signaling, Notch signaling, and angiogenesis (all P < 0.05). In contrast, the RSS-low group was significantly associated with increased androgen response and apoptosis (all P < 0.05). Of note, a lower RSS score was characterized by significant activation of metabolism-related pathways such as fatty acid metabolism, steroid biosynthesis, and amino acid-related metabolism pathways while only several metabolism pathways such as oxidative phosphorylation and pyrimidine metabolism were enriched in the RSS-high group (all P < 0.05). In summary, the RSS-high group was highly proliferative and invasive, whereas the RSS-low group had elevated androgen response and metabolism activity that are reminiscent of prostate tumors derived from primary human luminal epithelial cells [31].
Association of RSS with the immune microenvironment
Replication stress was reported to activate pro-inflammatory responses and change the tumor microenvironment [32, 33]. We, therefore, leveraged CIBERSORT to quantify immune cell infiltration in 905 PCa samples (Additional file 1: Table S14) and investigated the association between RSS and immune infiltration. Interestingly, the RSS-high group showed an increased proportion of CD8 + T cells, regulatory T cells, and M2 macrophages compared to the RSS-low group (all P < 0.001, Fig. 7A). Furthermore, RSS was positively correlated with proportions of CD8 T cells (R = 0.138, P < 0.001, Fig. 7B), regulatory T cells (R = 0.316, P < 0.001, Fig. 7C) and M2 macrophages (R = 0.15, P < 0.001, Fig. 7D). Additionally, the RSS-high group showed significantly higher expression of immunosuppressive markers such as FOXP3, HAVCR2, LAG3, PDCD1, and ARG1 (all P < 0.05, Fig. 7E). As higher RSS exhibited more of an immune-exhausted phenotype, we asked whether RSS could influence the therapeutic response to immune checkpoint inhibitors in cancer patients. We then calculated RSS scores for the IMvigor210 cohort and found that atezolizumab responders had significantly higher RSS scores than non-responders (P < 0.05, Fig. 7F). We also used the threshold 0.536 to stratify the cohort into RSS-high and RSS-low groups and found significantly more responders in RSS-high patients (32.9%, P = 0.016, Fig. 7G).
In silico discovery of potential targets and drugs for RSS-high PCa patients
To identify potential targets for RSS-high patients, we first performed Spearman’s rank correlation analysis between RSS and RNA expression of druggable genes in the TCGA-PRAD and DKFZ-PRAD cohorts (Fig. 8A, 8B and Additional file 1: Table S15, S16) and considered a common subset of positively correlated genes (R > 0.30, FDR < 0.05) in both cohorts as RSS-related targets (n = 54). In addition, we leveraged CERES scores to measure the essentiality of the RSS-related targets in 7 PCa cell lines and narrowed down to 13 potential therapeutic targets with CERES scores mostly < -1 (Fig. 8C). We found that many therapeutic targets such as TOP2A, CDK9, CHEK1, RRM2, and AURKB were tightly linked to cell cycle processes.
Next, CMap analysis was carried out to infer potentially effective chemical compounds. For this purpose, we performed differential gene analysis across 5 PCa cohorts and applied meta-analysis with a random effect model to make a consensus list of differentially expressed genes (Additional file 6: Figure S5 and Additional file 1: Table S17). The 150 most upregulated genes and 150 most downregulated genes were then used as the RSS signature to predict the CMap score for each chemical compound. Through this, we identified a total of 84 compounds with a CMap score below − 95 and with the ability to reverse RSS signature (Fig. 8D and Additional file 1: Table S18). Of 84 compounds, 11.9% and 7.1% belonged to topoisomerase inhibitors and CDK inhibitors, respectively. To improve the confidence of CMap inference, the PRISM-derived drug response data were used to infer AUC values of the compounds selected by CMap. We found that 2 topoisomerase inhibitors, including irinotecan and topotecan, consistently showed lower AUC values in the RSS-high groups in both TCGA-PRAD and DKFZ-PRAD cohorts (Fig. 8E, F), which supported our inference that topoisomerase was one of the potential targets. In addition, we investigated whether RSS could predict therapeutic response to conventional PCa therapy. As shown in Fig. 8G, H, RSS-high patients were more susceptible to taxane-based chemotherapy including docetaxel and paclitaxel, and PARP inhibitors including olaparib and talazoparib. In contrast, RSS-low patients were more sensitive to ADT such as abiraterone.
Knockdown of FEN1 and RFC5 inhibits cell growth
We chose FEN1 and RFC5 for experimental validation because they showed higher mRNA expression in recurrent PCa and are rarely investigated in PCa. We first confirmed the successful knockdown of FEN1 and RFC5 at mRNA and protein levels in C4-2B and PC-3 cells(Fig. 9A, B and Additional file 7). We then performed CCK-8 and colony formation assays in transfected C4-2B and PC-3 cells, which revealed that the knockdown of FEN1 and RFC5 significantly inhibited cell growth (all P < 0.05, Fig. 9C, D). In addition, we used AV and PI staining to assess the percentages of apoptotic cells after transfection. Results showed increased cellular apoptosis rates in C4-2B and PC-3 cells after knocking down FEN1 and RFC5 (Fig. 9E). Taken together, FEN1 and RFC5 could promote PCa progression by promoting cell growth.