Combinations of immuno-checkpoint inhibitors predictive biomarkers only marginally improve their individual accuracy
Journal of Translational Medicine volume 17, Article number: 131 (2019)
There are no accepted universal biomarkers capable to accurately predict response to immuno-checkpoint inhibitors (ICI). Although recent literature has been flooded with studies on ICI predictive biomarkers, available data show that currently approved companion diagnostics either leave out many possible responders, as in the case of PD-L1 testing for first-line metastatic lung cancer, or apply to a small subset of patients, such as the recently approved treatment for microsatellite instability-high or mismatch repair deficiency tumors. In this study, we conducted a survey of the available data on ICI trials with matched genomic or transcriptomic datasets in order to cross-validate the proposed biomarkers, to assess whether their prediction power was confirmed and, mainly, to investigate if their combination was able to generate a better predictive tool.
We extracted clinical information and sequencing data details from publicly available datasets, along with a list of possible biomarkers obtained from the recent literature. After an operation of data harmonization, we validated the performance of all the biomarkers taken individually. Furthermore, we tested two strategies to combine the best performing biomarkers in order to improve their predictive value.
When considered individually, some of the biomarkers, such as the ImmunoPhenoScore, and the IFN-γ signature, did not confirm their originally proposed predictive power. The best absolute scoring biomarkers are TIDE, one of the ICB resistance signatures and CTLA4 with a mean AUC > 0.66. Among the combinations tested, generalized linear models showed the best performance with an AUC of 0.78.
We confirmed that the available biomarkers, taken individually, fail to provide a satisfactory predictive value. Unfortunately, also combination of some of them only provides marginal improvements. Hence, in order to generate a more robust way to predict ICI efficacy it is necessary to analyze and combine additional biomarkers and interrogate a wider set of clinical data.
Tumors escape from immunological control by activating during their growth and development a wealth of negative immune-checkpoints of which the best characterized and clinically validated are B7/CTLA4 and PD-L1/PD-L2/PD1 . These checkpoints prevent the priming and/or activation of tumor-infiltrating lymphocytes, activate regulatory T cells or cause T cell exhaustion, which all together facilitate unchecked tumor growth. Monoclonal antibodies (mAb) such as anti-CTLA4, anti-PD-1 and anti-PD-L1 were developed to specifically target these pathways. Over the last years, clinical development of these antibodies have led to unprecedented outcomes characterized by increased overall survival and long term disease control for metastatic cancer patients in a growing number of indications [2, 3]. Currently, the US FDA and EU have approved the commercialization of these immuno-checkpoint inhibitors (ICIs) in several metastatic solid tumors. Also one of those inhibitors was recently approved for any metastatic solid tumor with microsatellite instability-high (MSI-H) or mismatch repair deficiency (dMMR), the first drug approval based on genetic features shared across different tumor types .
Despite these major breakthroughs, responses to immunotherapy are observed only in a minority of patients, and some biomarkers approved as companion diagnostics to stratify patients only marginally help to select responders from non-responders. As an example, patients are currently eligible for treatment with anti-PD1 mAb pembrolizumab as first-line treatment of metastatic non-small-cell lung cancer (mNSCLC) if their tumor cells result ≥ 50% positive to immunohistochemistry (IHC) analysis of ligand PD-L1 expression. As a matter of fact, according to a recent meta-analysis, the percentage of responders in PD-L1-negative patients is not strongly dissimilar from that of the PD-L1 positive group (23–29% vs 5–12%) ; this further supports the need for additional and more robust predictive biomarkers, and to integrate a more extensive amount of information into a final prediction score.
Along this line, the ImmunoPhenoScore (IPS), an approach based on RNA-seq, was recently described as a novel system for the prediction of responders to ICI treatment in melanoma patients . Relative expression levels of a number of intrinsic/extrinsic factors were combined (by addition or subtraction) in an algorithm computing a final score called the IPS. The whole model was built with the purpose of predicting the expression of Granzyme and Perforin genes as an indirect measure of immunogenicity. It was developed using TCGA data and validated by the sole available datasets in which ICI clinical data and transcriptomic profile were available, i.e. two melanoma datasets associated to anti-PD-1 and anti-CTLA4 treatment [7, 8]. The discriminating power of the IPS system looked very accurate (mostly for the anti-PD-1 dataset), where the overall responder group appeared to be clearly separated by a particular IPS cutoff (> 6).
Apart from this case, several discrete DNA/RNA-based biomarkers have emerged in recent literature. For instance, the study of inflammatory processes, which led to the identification of an interferon gamma (IFN-γ) signature, called T cell-inflamed GEP (Gene Expression Profile) . The IFN-γ signature was more recently employed in a pan-cancer analysis of 300 patients treated with ICI, resulting in higher response rates when integrated with Tumor Mutational Burden (TMB), even if the authors did not report an AUC performance for this new combinatorial biomarker classifier . Furthermore, animal modeling of the genetic status of MLH1 and POLE genes confirmed the importance of the impairment of DNA mismatch repair (MMR) machinery , as expected. This was then able to increase the neoantigen load and sensitivity to ICI treatment . Recently, the TIDE signature was introduced as a mixed T cell classification, partitioning patients into cytotoxic T lymphocyte (CTL) high (dysfunction score) or CTL low (exclusion score) . Finally, it has been noted that epithelial–mesenchymal transition expression traits could represent the link between tumor intrinsic and tumor extrinsic factors, which lead to immune checkpoint blockade (ICB) resistance, resulting in the emergence of several possible biologic signatures [7, 13,14,15,16,17,18,19,20]. Starting from this background information, we conducted an in silico validation and combination analysis of the proposed biomarkers using publicly available datasets from ICI clinical trials and associated molecular annotations.
We collected genomic and/or transcriptomic data from anti-PD-1 and anti-CTLA4 studies with clinical information (Table 1). There is a clear imbalance regarding the tumor type distribution, with a strong prevalence of melanoma datasets (4) and only one dataset for NSCLC. Furthermore, the dimensionality of these datasets is quite small (mean of 39 and 66 patients for RNA and WES data, respectively), therefore making it difficult to infer a robust model.
We did not re-evaluate the expression levels via sequence re-analysis in order to create a whole comprehensive dataset, since the dramatic clinical confounding factors would have vanished our efforts (Additional file 1: Methods, Additional file 2: Table S1, Additional file 3: Table S2 and Additional file 4: Table S3).
A list of putative immunotherapy biomarkers computable from genomic and transcriptomic data (Table 2) was derived from the aforementioned datasets and other in silico data. For protein-based ones, we included their own RNA derivation, such as PD-L1 gene expression (CD274). These were included with the assumption that there was a strong correlation between the protein and its own RNA transcript levels, while being aware that there are other processes involved in the regulation of final protein expression levels (e.g. degradation, stabilization). On the other hand, our in-house RNA-sequencing of lung specimens demonstrates a strong correlation between the PD-L1 gene FPKM and the IHC protein level (CLONE 22C3, DAKO) (data not shown). We also included multi-gene markers such as the IFN-γ signature (originally computed via NanoString technology). All the genes included in the signature gene lists are listed in Table 3, with the exclusion of the algorithmic ones for which we refer to the original sources (e.g. TIDE and IPS). Furthermore, a high-level summary of the overall analysis process is provided (Additional file 5: Figure S1).
Taking into consideration the lack of uniqueness in the separation of patients in responder and non-responder groups (all the five studies from which we extrapolated data used different methods and response evaluation criteria, Additional file 1: Methods), we selected a method that also takes into consideration the overall survival of progressive disease patients. Moreover, it allowed a good balancing with regards to the number of samples between the two groups (ranging from 38% of responders with respect to the total number of patients, to 70%).
We then computed the classification performance of all the proposed biomarkers in each available study. This resulted in 56 tests since there was a lack of RNA information in some datasets (Additional file 6: Table S4 and Fig. 1). The area under the curve (AUC) values markedly vary from 0.43 to 0.87. Of note, we could not reproduce the classification power of the IPS neither in the Hugo et al.  or in the Van Allen et al.  melanoma datasets (AUC 0.68 and 0.56 vs 1.00/1.00, using their same responder/non-responder separation types, respectively, as confirmed by ). TIDE signature performance was validated in the Hugo dataset (AUC 0.82) and slightly different in the Van Allen dataset (AUC 0.73 vs 0.80). This is due to their multi-class ROC with a third class on long-term survival patients, which we included in the responder group. However, we confirmed the published results with a three-class model (AUC 0.80). Finally, the performance of the IFN-γ reduced marker was slightly lower in our three transcriptomic datasets (AUC: 0.50, 0.64, 0.69) when compared to those of the published study (AUC 0.80, 0.66). This difference may reflect one of the technological biases of comparing RNA-seq results with Nanostring counts.
Moreover, we computed the mean AUC for all datasets for every classifier (Additional file 7: Table S5) to provide a better representation of every performance value among all studies. Therefore, every marker proposed in literature shows a discrete performance delta (0.01–0.35) when tested among different datasets, and, in general, the best performing parameters are TIDE, ICB resistance signature 1 and CTLA4 with a mean AUC of 0.71, 0.67 and 0.66, respectively. Observing the peak performances, TIDE, ICB signature 1 and AXL are the only individual markers with AUC > 0.8.
As a next step, we merged all the three datasets with transcriptomic data and the dichotomous marker predictions in a single whole dataset. This is necessary in order to have a dataset with a sufficient number of observations, which in turn increases statistical significance.
Next, we tried to address the problem of correlation arising from markers that were linked or were function of others (e.g. PD-L1 and IPS, IFN-γ reduced and expanded sets). A Pearson correlation analysis (Additional file 8: Table S6 and Fig. 2a) showed that PD-1 gene and the IFN-γ reduced set are the only correlated markers at > 0.5 (at a 0.01 significance level). Notably, several parameters do not correlate significantly with the most accepted markers (e.g. IPS vs PD-L1, even if PD-L1 is one of the inputs of the IPS model), potentially suggesting that they could contain different information, and may benefit from a combinatorial approach.
In order to validate this hypothesis, we performed a majority voting exploration analysis in which we included all the uncorrelated markers, with the exclusion of PD-1 gene (Additional file 9: Table S7 and Fig. 2b).
All of these biomarker subsets were explored trying all possible combinations to find the best consensus group in 17, with 131,054 possibilities, according to the basic combinatory formula:
where c is the number of combinations and n is the number of markers.
The acceptable combinations in terms of AUC, defined as the percentage of those being equal to or better than 0.65 was then explored. For such analyses, we observed that the percentage of acceptable combinations is 2.1%. The frequencies of these uncorrelated markers in the acceptable combinations was computed (Additional file 10: Table S8): POLE4, ICB resistance signature 1 and TIDE are the most present, with frequency values of 0.68, 0.65 and 0.61, respectively.
Generalized linear models (GLMs) were then built as an additional strategy to combine these classifiers. We iterated cross-validation for 10,000-times, randomly selecting the 20% of the original set for testing. From the GLM predictions, we obtained mean AUC of 0.78, mean sensitivity of 0.87, mean specificity of 0.73 (Fig. 2c). Considering the significance of each coefficient, at 1% we had TIDE and ICB resistance signature 1, at 5% CTLA4 (Additional file 11: Table S9). It is noteworthy that TIDE and ICB resistance signature 1 have also been found among the top three most present markers of the best majority voting analyses and those with the best mean AUC when taken individually.
Taken together, these results confirm that the combination of multiple parameters enables us to improve the prediction power to a mean AUC of 0.71–0.78.
We explored whether several proposed RNA/DNA predictive biomarkers of efficacy of cancer immunotherapy with antibodies inhibitors of immunological checkpoints were confirmed by a cross-comparison among publicly available datasets. From these analyses we demonstrated that their combination led only to a modest improvement of the overall predictive performance. What we could learn from this study is the dramatic difficulty of putting the concept of immunotherapy response in a mathematical and statistical frame. The confounding factors here are hidden in several steps, namely the clinical classification, the patient history and the tumor sampling. The clinical classification, deriving from several different criteria, has already been discussed (see Additional file 1: Methods). The patient’s history is a determinant factor, since these molecular profiles are usually derived from tumor samples that have been collected months, if not years, before ICI therapy initiation. One can easily assume that during this often long period of time, also as a consequence of sequential non-ICI therapies (chemotherapy and target therapy), patients’ tumors have undergone clonal selection processes that have reshaped their molecular landscape. In a scenario of a biopsy with a time frame closer to the therapy, the tumor sampling effect comes at play. These issues have challenged researchers worldwide who are currently designing liquid biopsies strategies in order to find DNA/RNA biomarkers in the patient’s blood. This strategy would incredibly speed up the protocol if such biomarkers do exist and are actually detectable with novel highly sensitive technologies. Finally, a limitation of the currently employed –omic technologies is the lack of information about the spatial organization of the immune infiltrate, critical for an effective action of immune cells on the tumor mass. With the release of novel, larger and better annotated datasets one could stratify the patient population using all these criteria.
We performed a validation study of a large panel of proposed biomarkers related to ICI response, in several solid tumors. Furthermore, a combination effort was carried out, leading to a slight increase in accuracy. We plan to carry on this study by expanding the patient cohort as additional data will become available, and to validate the top scoring biomarkers in selected tumor specimens.
Clinical information and sequencing data from publicly available datasets was extracted. We searched GEO and PubMed records for genomic datasets matching the keyword “Immuno-Checkpoint Blockade”, “PD1”, “CTLA4”, “RNA-seq”, “Exome Sequencing”. We harmonized data in a list of patients with the relative response and for each patient we got the list of mutations derived by WES analysis and/or the list of expressed genes in TPM. All the scripts are written in R and are available at https://gitlab.com/bioinfo-ire-release/ici-biomarker-review, along with input datasets to reproduce all the singular and combinatorial analyses.
Responder and non-responder separation
One of the main challenges of this work was to tackle the variability of how clinical benefit was measured in the different studies (Additional file 1: Methods). All the results shown in this paper are derived from the separation in which LB, CR, PR SD and PD with overall survival (when available) greater than 2 years are in the responder group, while NB and PD with overall survival lower than 2 years (or without this kind information) are in the non-responder group.
Performance of individual markers
We used a partitioning system with the aim of building a receiver operating characteristics (ROC) curve for each biomarker in each study and set the threshold for each marker as the value for which the sum of sensitivity and specificity is maximized, according to Youden’s index. Basic statistics were then extracted: for each biomarker we joined the AUCs coming from different studies to have a minimum, a maximum and a mean value.
Performance of combined markers
Before the combination step, we computed Pearson correlation coefficient among all the biomarkers. Then we excluded the minimum number of them in order to avoid correlation coefficients greater than 0.5. In the majority voting approach, we combined all the patients with all the uncorrelated biomarkers according to all the possible combinations starting from picking up just two of them to selecting all of them. We predicted a patient as responder whether at least half of the biomarkers agreed on that class. For each marker combination, we calculated confusion matrix, sensitivity, specificity and AUC, then we filtered out those with AUC < 0.65 and computed the frequency of each marker in these best combinations. Regarding GLM, we randomly selected a training set containing the 80% of the original set and built GLM models using the binomial family function (logit as link). Next, using the coefficients computed for such models, the response for each patient of the remaining test set was predicted and compared with the real response in order to calculate the ROC curve. We iterated these operations for 10,000-times and computed the mean AUCs and the mean of the best sensitivities and specificities.
cytotoxic T-lymphocyte associated protein 4
programmed cell death protein 1
programmed death-ligand 1
programmed death-ligand 2
non-small-cell lung cancer
immune checkpoint blockade
cytotoxic T lymphocytes
whole exome sequencing
fragments per kilobase million
transcripts per kilobase million
response evaluation criteria in solid tumors
long-term clinical benefit
minimal or no benefit
area under the curve
generalized linear model
receiver operating characteristics
Beatty GL, Gladney WL. Immune escape mechanisms as a guide for cancer immunotherapy. Clin Cancer Res. NIH Public Access. 2015;21:687–92.
Ott PA, Hodi FS, Robert C. CTLA-4 and PD-1/PD-L1 blockade: new immunotherapeutic modalities with durable clinical benefit in melanoma patients. Clin Cancer Res. 2013;19:5300–9.
Philips GK, Atkins M. Therapeutic uses of anti-PD-1 and anti-PD-L1 antibodies. Int Immunol. 2015;27:39–46.
FDA. FDA approves first cancer treatment for any solid tumor with a specific genetic feature. FDA. 2017. https://www.fda.gov/newsevents/newsroom/pressannouncements/ucm560167.htm.
Aguiar PN, Santoro IL, Tadokoro H, de Lima Lopes G, Filardi BA, Oliveira P, et al. The role of PD-L1 expression as a predictive biomarker in advanced non-small-cell lung cancer: a network meta-analysis. Immunotherapy. 2016;8:479–88.
Charoentong P, Angelova M, Charoentong P, Finotello F, Angelova M, Mayer C, et al. Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade resource pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkp. Cell Reports. 2017;18:248–62.
Hugo W, Zaretsky JM, Sun L, Song C, Moreno BH, Hu-Lieskovan S, et al. Genomic and transcriptomic features of response to anti-PD-1 therapy in metastatic melanoma. Cell. 2016;165:35–44.
Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science (80−). 2015;350:207–11.
Ayers M, Lunceford J, Nebozhyn M, Murphy E, Loboda A, Kaufman DR, et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J Clin Invest. 2017;127:2930–40.
Cristescu R, Mogg R, Ayers M, Albright A, Murphy E, Yearley J, et al. Pan-tumor genomic biomarkers for PD-1 checkpoint blockade-based immunotherapy. Science. 2018;362:3593.
Germano G, Lamba S, Rospo G, Barault L, Magrì A, Maione F, et al. Inactivation of DNA repair triggers neoantigen generation and impairs tumour growth. Nature. 2017;552:116.
Jiang P, Gu S, Pan D, Fu J, Sahu A, Hu X, et al. Signatures of T cell dysfunction and exclusion predict cancer immunotherapy response. Nat Med. 2018;24:1550–8.
Zhang T, Xie J, Arai S, Wang L, Shi X, Shi N, et al. The efficacy and safety of anti-PD-1/PD-L1 antibodies for treatment of advanced or refractory cancers: a meta-analysis. Oncotarget. 2016;7:73068.
Seiwert TY, Burtness B, Mehra R, Weiss J, Berger R, Eder JP, et al. Safety and clinical activity of pembrolizumab for treatment of recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-012): an open-label, multicentre, phase 1b trial. Lancet Oncol. 2016;17:956–65.
Prat A, Navarro A, Paré L, Reguart N, Galván P, Pascual T, et al. Immune-related gene expression profiling after PD-1 blockade in non–small cell lung carcinoma, head and neck squamous cell carcinoma, and melanoma. Cancer Res. 2017;77:3540–50.
Gao J, Shi LZ, Zhao H, Chen J, Xiong L, He Q, et al. Loss of IFN-γ pathway genes in tumor cells as a mechanism of resistance to anti-CTLA-4 therapy. Cell. 2016;167(397–404):e9.
Shields BD, Mahmoud F, Taylor EM, Byrum SD, Sengupta D, Koss B, et al. Indicators of responsiveness to immune checkpoint inhibitors. Sci Rep. 2017;7:807.
Ascierto ML, Makohon-Moore A, Lipson EJ, Taube JM, McMiller TL, Berger AE, et al. Transcriptional mechanisms of resistance to anti-PD-1 therapy. Clin Cancer Res. 2017;23:3168–80.
Aguilera TA, Giaccia AJ. Molecular pathways: oncologic pathways and their role in T-cell exclusion and immune evasion—a new role for the AXL receptor tyrosine kinase. Clin Cancer Res. 2017;23:2928–33.
Trono P, Sistigu A, Palermo B, Ciliberto G, Nisticò P. Mesenchymal traits at the convergence of tumor-intrinsic and -extrinsic mechanisms of resistance to immune checkpoint blockers. emergtoplifesci.org. 2017. p. 471–86.
Reck M, Rodríguez-Abreu D, Robinson AG, Hui R, Csőszi T, Fülöp A, et al. Pembrolizumab versus chemotherapy for PD-L1–positive non–small-cell lung cancer. N Engl J Med. 2016;375:1823–33.
Snyder A, Makarov V, Merghoub T, Yuan J, Zaretsky JM, Desrichard A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371:2189–99.
Tumeh PC, Harview CL, Yearley JH, Shintaku IP, Taylor EJM, Robert L, et al. PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature. 2014;515:568–71.
Baptista MZ, Sarian LO, Derchain SFM, Pinto GA, Vassallo J. Prognostic significance of PD-L1 and PD-L2 in breast cancer. Hum Pathol. 2016;47:78–84.
Owonikoko T, et. al. PD-L1, PD-1, and CTLA-4 as prognostic biomarkers in resected non-small cell lung cancer. http://ascopubs.org/doi/abs/10.1200/jco.2015.33.15_suppl.7551.
GC, MP, FP conceived the first idea; MP and DA analyzed data; DA generated plots and tables; MP, DA, GC, FP wrote the manuscript; FS biostatistics supervision; FG, FDN, MF molecular biology and biotechnology supervision; PN and CQ immunological supervision; MF, GC, FP revised the manuscript and the data analysis process. All authors read and approved the final manuscript.
This work was published on behalf of Alleanza Contro il Cancro, Immunotherapy Working Group. We thank the members of our laboratory for critical reading the manuscript and to Dr. Luca Giacomelli, Ph.D., and Aashni Shah (Polistudium srl) for the English language editing of this manuscript. We acknowledge the CINECA award under the ISCRA initiative, for the availability of high-performance computing resources and support. We thank Sara Errigo for graphical visualizations and editorial assistance.
The authors declare that they have no competing interests.
Availability of data and materials
All R scripts developed to produce the tables and plots are available at https://gitlab.com/bioinfo-ire-release/ici-biomarker-review.
Consent for publication
Ethics approval and consent to participate
Work and publication costs supported by Ministero della Salute-Ricerca Corrente 2016 CdC104101, Italian Association for Cancer research (AIRC) to MF (#19499), to PN (#15224) and to GC (#19865) and Alleanza Contro il Cancro-Immunotherapy (ACC-Immuno).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Separation criteria for responder and non-responder groups in the datasets.
Overview of responders and non-responders across datasets where both DNA and RNA data are available.
Per-patient summary of clinical classification where both DNA and RNA data are available.
High level workflow for ICI biomarker validation and combination. The N biomarkers were individually tested for each dataset, this test serving as a primary validation of the proposed performance. Mean accuracy was computed for each classifier in all available datasets. Combinatorial analysis was carried out with majority voting and Generalized Linear Models. The E[X] formula stands as an example of the model to create when estimating the a0…aN linear factors.
Performance of each marker in the single datasets.
Summary of the biomarker performance. The second column indicates the number of datasets on which the marker was tested; the last column is the difference between the maximum and the minimum AUC for each biomarker.
Pearson correlation coefficients among each marker.
Combinations of uncorrelated markers and evaluation according to majority voting algorithm. For each combination, number of markers involved and AUC are shown.
Marker frequencies in the previous combinations with AUC greater than or equal to 0.65.
GLM coefficient estimates, with their respective standard errors, t-values and p-values.
About this article
Cite this article
Pallocca, M., Angeli, D., Palombo, F. et al. Combinations of immuno-checkpoint inhibitors predictive biomarkers only marginally improve their individual accuracy. J Transl Med 17, 131 (2019). https://doi.org/10.1186/s12967-019-1865-8