The landscape of genetic variation in SFs of hnRNP and SR families
We first analyzed the expression characteristics of two splicing family regulators in AML samples. The expression of 31 out of the 32 SFs was detected in both AML and normal samples. Compared with normal samples, the expression of 10 SFs (HNRNPH2, SRSF4, HNRNPL, HNRNPC, SRSF9, HNRNPK, HNRNPUL1, HNRNPF, HNRNPUL2, HNRNPM) was significantly down-regulated in AML samples. The expression of 17 SFs (HNRNPD, HNRNPH3, SRSF8, HNRNPU, SRSF5, SRSF2, HNRNPDL, HNRNPAB, SRSF7, HNRNPA1, SRSF1, HNRNPR, SRSF6, SRSF11, HNRNPH1, SRSF10, SRSF12) was up-regulated in AML samples. The expression of SRSF3, HNRNPA2B1, HNRNPA3, and HNRNPA0 did not demonstrate significant differences (Fig. 1A). We further analyzed their somatic mutation characteristics, and the results showed that the mutation rate of SFs was low in AML samples. Only 6 out of 134 mutations remained changed, and two of the HNRNPK mutations were frameshift DEL and multihit mutations, respectively. Missense mutations were observed in SRSF2, SRSF11, HNRNPF, and HNRNPH1 in four different samples, and the base changes in these mutations primarily involved conversion from C to T (Fig. 1B). We further conducted copy number variation analysis of SFs to explore the relationship between copy number changes and mRNA expression levels. We observed that the frequency of the increase in the copy number of HNRNPU, HNRNPR, SRSF10, and SRSF8 was upregulated, which may be related to the up-regulation of corresponding mRNA levels. The copy number deletion of HNRNPK and HNRNPC may be one of the reasons for their down-regulation in AML samples (Fig. 1C). Figure 1D shows the positions of the 32 SFs in the chromosome. These results suggest that SFs of hnRNP family and SR family exhibit heterogeneous genetic and expression landscapes in AML samples and may be involved in the onset and progression of AML.
Regulatory patterns of splicing mediated by 32 SFs
The abnormal changes in the genetic and expression of SFs may be constituted as the signatures of the malignant development of AML. In order to better identify the relationship between splicing factors and the biological process of AML, we aimed to elucidate the relationship between SFs and AML based on the overall expression pattern. The correlation in expression and prognostic characteristics of SFs were analyzed. We found that almost all SFs of the SR family were prognostic risk factors, and hazard ratio (HR) > 1 and a high expression of these genes predicted poor prognosis of patients. However, the 20 SFs of hnRNP family comprised 12 risk factors and 8 protective factors (HR < 1) (Additional file 1: Fig. S1A). Survival analysis showed that the expression of 13 SFs was significantly correlated with prognosis (P < 0.05), and patients with high expression of SRSF12 and HNRNPA1 had a better prognosis. Patients with high expression of SRSF4, HNRNPAB, HNRNPH2, HNRNPUL1, HNRNPF, HNRNPC, HNRNPR, SRSF11, HNRNPL, SRSF6, and SRSF1 showed a significantly worse prognosis (Additional file 1: Fig. S1B). Correlation analysis of the co-expression relationships of these genes showed that they were positively correlated (P < 0.001) (Additional file 1: Fig. S1A).
We further performed unsupervised clustering based on the expression of SFs. The results showed that patients with AML could be stably divided into four groups, which were termed clusters A–D (Fig. 2A, B). The expression of SFs was generally low in cluster A and high in cluster B, while the overall expression level in cluster C was between that of cluster A and cluster B. The expression of Cluster D was not as uniform as that of the first three clusters (Fig. 2C). Further survival analysis showed that patients in cluster D had the best prognosis, those in cluster C had the worst prognosis, and those in clusters A and B were in the middle (Fig. 2D). Figure 2E further confirms the differential expressed characteristics of SFs in the four clusters. Somatic mutation analysis showed that the TP53 mutation frequency was the highest in cluster A. The percentage of patients with somatic mutations in cluster B was higher, and many patients have multiple mutated genes that may be associated with abnormally activated expression of SFs in this group. The mutant genes in cluster c were concentrated. These genes mainly included DNMT3A, NPM1, FLT3, and TP53 (Fig. 2F).
Differences in biological characteristics with respect to different splicing regulation patterns
To better analyze the biological differences with respect to splicing regulation patterns, we used the GSVA algorithm to analyze the enrichment differences in KEGG signaling pathways and cancer-related hallmark gene sets associated with different splicing regulation patterns (Fig. 3A, B). In cluster A, the activities of the KRAS signaling pathway mediated up-regulated/down-regulated gene set, IL2/STAT5 signaling pathway, TNF-α signaling pathway via NF-κB, and immune-related pathways, such as coagulation and complement cascade, and the inflammatory response were highly enriched (Fig. 3B), which mainly activates more cell signaling, leading to the continuous release of signals in the pro-cancer pathway. This may be related to a higher proportion of NRAS gene mutations in cluster A. Moreover, cluster B showed increased activity of proliferation-related pathways, such as E2F target, cell cycle G2/M checkpoint, MYC-targeted variant 1/2, mitotic spindle, protein regulatory signaling pathways, such as protein secretion and unfolded protein response, and multiple DNA damage repair pathways (Fig. 3A, B). These findings indicate that cluster B showed a significant promotion in cell proliferation, protein expression, and genetic regulation of the genome. Increased activity of DNA damage repair pathways also favors cell survival. Cluster C showed high activity of a large number of immune- and inflammation-related signaling pathways, such as B/T cell receptor signaling pathway, chemokine signaling pathway, NOD-like receptor cell pathway, Toll-like receptor signaling pathway (Fig. 3A), complement cascade, interferon α/γ signaling pathway, IL6-JAK-STAT3 signaling pathway, and inflammatory response (Fig. 3B). Interestingly, the activity of these pathways was lowest in Cluster D. A high activity of these pathways in tumor cells can promote the development of chronic inflammation and immune escape in the tumor microenvironment and may lead to deterioration, which may be an important reason for the poor prognosis of cluster C patients. Cluster C also showed abnormal metabolic reprogramming, with elevated activities of fatty acid metabolism, oxidative phosphorylation, adipogenesis, reactive oxygen species pathway, and heterologous metabolism pathway (Fig. 3B). Abnormal immune and metabolic changes were the main biological characteristics of cluster C.
Apparent differences were observed in biological processes, immune characteristics, and clinicopathological factors among different splicing regulation patterns. We further examined whether different regulation patterns are associated with AS events. We compared four groups pairwise and via veen diagram analysis (Fig. 3C). We found that Cluster A showed the highest number of overlapping differentially expressed AS events with the other three clusters. The heatmap showed that the expression trend of these differential AS events in cluster A was in contrast to that of the other three clusters, with the differences with cluster B being the most apparent (Fig. 3D). We observed that most of the genes in cluster A have a high and a low expressed AS event, which have opposite expression trends, but their splicing types are the same, only the splice sites are different. Further statistical analysis revealed that the splicing types of these events were concentrated at AP and AT (Fig. 3E). AT and AP are different in the last or first exon of the two transcripts, respectively, indicating that different splicing regulation patterns have different effects on the 5′ or 3′ end of the transcript. These differential AS events were observed corresponding to 209 genes. We performed the KEGG enrichment analysis. Results show that these genes were mainly associated with RNA transport, RIG-I-like receptor signaling pathway, Chemokine signaling pathway, ribosome, mRNA surveillance pathway, oxidative phosphorylation, MAPK signaling pathway, NF-kappa B signaling pathway, Toll-like receptor signaling pathway, cytosolic DNA-sensing pathway, and acute myeloid leukemia. Most of these pathways are the same as the significant enrichment pathways of each cluster. SFs are largely involved in the regulation of these AS events. Therefore, the abnormal expression of SFs may be one of the reasons why different clusters have significant differences in biological characteristics.
Differences in immune-related features among different splicing regulation patterns
Pathway enrichment analysis showed significant differences in immune-related pathways among different splicing regulation patterns. We further analyzed the proportion of immune cell infiltration, immune functional activity, and immune checkpoint expression corresponding to different patterns. We observed that cluster C contained more inflammatory immune cells, including monocytes, M2 macrophages, and neutrophils, and the infiltration ratio of CD8+ T cells was the lowest. The infiltration of naive B cells, CD8+ T cells, follicular helper T cells, resting mast cells, and eosinophils was significantly increased in cluster D (Fig. 4A). Cluster A and cluster B showed no significant immune infiltration characteristics. In terms of the expression activity of immune function (Fig. 4B), cluster A showed high activity of antigen-presenting cell (APC) costimulatory molecules, C-C-motif chemokine receptor (CCR), para-inflammation, T-cell costimulatory molecules, and type I interferon (IFN) response. In cluster C, the expression of APC coinhibitory/costimulatory molecules, CCR, and para-inflammation molecules was high. Compared with other clusters, most immune functions were less potent in cluster D, and only T-cell costimulatory molecules show high activity. Differential expression analysis of immune checkpoints showed that the expression levels of HAVCR2, PD-L2, and CD86 in cluster C were significantly higher than those in other clusters (Fig. 4C). Moreover, the overall expression of immune checkpoints was downregulated in cluster D. The characteristics of high infiltration of inflammatory immune cells, high activity of immune cell function, such as APC inhibition, proinflammatory response, and high expression of immune checkpoints in cluster C reflect the possible presence of chronic inflammatory and highly immunosuppressive microenvironment. Hence, these may be the reasons for the poor prognosis of patients in cluster C. In contrast, T cells may play an immune role in cluster D, which may be the reason why patients in cluster D had a better prognosis.
Prediction of therapeutic sensitivity with different splicing regulation patterns
We predicted the therapeutic sensitivity of common AML drugs based on global gene expression with different splicing regulation patterns. These drugs included cytarabine, doxorubicin, and midostaurin, the first two of which are chemotherapy drugs, and the last one is a targeting agent of FLT3 gene mutation. The results showed that the IC50 of the three drugs was the highest in cluster A and the lowest in cluster B, and cluster D exhibited higher sensitivity to cytarabine than the other three clusters (Fig. 4D). We further compared the four splicing regulation patterns with the expression dataset of patients with melanoma who responded to immunotherapy, notably the potential therapeutic value of anti-PD-1 treatment for cluster C patients (Fig. 4E).
Construction and validation of prognostic risk score model
Splicing regulation patterns reveal the pathological characteristics and potential treatment modalities of patients with AML, and we further explored their prognostic value. The expressed difference of SFs between cluster A and cluster B was the largest, and the prognosis of patients in custer C and cluster D was significantly different. We performed differential expression analysis for cluster A and cluster B, cluster C and cluster D, and identified 1261 and 754 DEGs (|logFC| > 1, adjusted P value < 0.05), respectively, of which 53 contains both in two groups of DEGs (Fig. 5A). We believe that these genes are closely related to SFs and the prognosis of AML. Cox regression analysis showed that 21 genes were significantly associated with the prognosis of patients with AML (P < 0.05) (Fig. 5B). These prognostic genes were used to construct the prognostic risk model. To prevent overfitting of the model, LASSO regression analysis was used to reduce their dimensionality and eliminate redundant prognostic genes. After tenfold cross-validation, we determined the penalty parameter (λ) of the model and the corresponding eight genes involved in model construction (Fig. 5C), LST1, SSBP2, ETS2, TRIM16, TM7SF3, PLXNB1, AUTS2, and MAP7. We identified the corresponding correlation coefficients of model genes according to the λ value (Fig. 5D) (Additional file 2: Table S1). Finally, we calculated the risk score for each sample using the model formula. Based on the cut-off value, patients with AML were divided into high-risk and low-risk groups. Log-rank test results showed that the prognosis of patients in the high-risk group was significantly worse than that in the low-risk group (P < 0.001) (Fig. 5E). The time-dependent ROC curve analysis showed that the AUC values of the risk scores in predicting the 1-year, 3-year and 5-year overall survival (OS) of patients with AML were 0.774, 0.729, and 0.772, respectively, indicating that the prognostic risk score model had high predictive accuracy (Fig. 5F). Univariate and multivariate Cox analysis showed that the risk score could be used as an independent prognostic factor (P < 0.001) (Fig. 5G, H). Attributable changes in patients with AML were visualized using the alluvial diagram (Fig. 5I). The risk score further quantified the prognostic characteristics of different splicing regulation patterns. For example, cluster C had the worst prognosis and the highest risk score, while cluster D showed contrasting results (Fig. 5J). Meanwhile, significant differences in risk scores were observed among patients with different survival status (Fig. 5J).
Next, four validation cohorts confirmed the prognostic value of the risk score. The OS of patients with high-risk scores was significantly shortened (Fig. 6A–D), and the ROC curve also indicated the predictive robustness of the risk score model (Fig. 6E–H).
Construction of a nomogram to predict OS
In order to predict the OS of patients more accurately, we combined the clinicopathological factors (age and cytogenetic risk) significantly related to the prognosis of patients with AML with the risk score model to construct a nomogram (Fig. 6I). By calculating the total score of each patient in the nomogram, the corresponding 1-, 3-, and 5-year survival rates were observed. The 1-, 3-, and 5-year calibration curves also proved that nomograms could accurately predict OS (Fig. 6J). The time-dependent ROC curve showed that the nomogram had the highest AUC value (Fig. 6K). Taken together, these results indicate that the nomogram we constructed can further improve the accuracy of OS prediction in patients with AML. Furthermore, it also provides a new method for clinical prognosis evaluation.
Expression of the splicing factor SRSF10 was up-regulated in AML
Among the 32 SFs, we observed that the expression of SRSF12 and SRSF10 was most significantly upregulated in AML. Based on the low expression of SRSF12 and its prognostic protection factor, we only conducted an in-depth study on the risk factor SRSF10 to explore its relationship with the onset and development of AML. The up-regulated expression of SRSF10 in AML samples of the TCGA database and normal blood samples of the GTEx database is shown in Fig. 7A. Meanwhile, we analyzed the expression level of SRSF10 in the pan-cancer atlas of the TCGA database and found that the expression of SRSF10 was most up-regulated in AML and acute lymphoblastic leukemia (ALL), indicating that it may be involved in hematological tumorigenicity and development (Fig. 7B). We further verified the expression of SRSF10 in AML clinical samples. PCR results showed that the expression of SRSF10 in AML bone marrow samples and peripheral blood samples was significantly higher than that in normal control samples (Fig. 7C, D).
Splicing factor SRSF10 plays a cancer-promoting role in AML cells
To further clarify the biological role of SRSF10 in the development of AML, we obtained cDNA containing SRSF10 sequence and a plasmid targeting shRNA, which were packaged with lentivirus, followed by infection of the AML cell line THP-1. Stably transfected cell lines with SRSF10 overexpression (oeSRSF10) and knockdown (shSRSF10) were screened. As shown in Fig. 7E, F, the transfection efficiency was more than 80% and was verified by PCR and WB experiments.
The proliferation of THP-1 cells was detected by CCK8 assay and EdU assay after SRSF10 overexpression and knockdown. CCK8 assay showed that compared with the control-oeSRSF10 group, the proliferation ability of THP-1 cells in the SRSF10-oe group was significantly enhanced. Compared with the control-shSRSF10 group, the proliferation of THP-1 cells in the SRSF10-SH1 and SRSF10-SH2 groups was significantly reduced (Fig. 7G). EdU staining also showed that more cells underwent DNA replication in the SRSF10-oe group than in the control-oeSRSF10 group. However, fewer cells underwent DNA replication in the SRSF10-SH1 and SRSF10-SH2 groups than in the control-ShSRSF10 group (Fig. 7H, I). All these results demonstrated that SRSF10 overexpression promoted THP-1 cell proliferation, while SRSF10 knockdown inhibited THP-1 cell proliferation.
We used Annexin V-FITC/PI staining to label apoptotic cells. Apoptosis in control-shSRSF10, SRSF10-SH1, and SRSF10-SH2 groups were detected by flow cytometry. The apoptosis rate of the SRSF10-SH1 and SRSF10-SH2 groups was higher than that of the control group (Fig. 7J). The apoptosis trend was consistent with the results of CCK8 method and EdU staining, which were performed to detect cell proliferation ability in different transfection groups. A stronger proliferation of cells corresponded to weaker apoptosis. We further detected the changes in the cell cycle of THP-1 after SRSF10 knockdown by flow cytometry. The results showed that compared with control-ShSRSF10, the arrest of THP-1 cells in the G1 phase was more apparent in the SRSF10-SH1 and SRSF10-SH2 groups (Fig. 7K).