
Table 2 Description of top-performing models

From: A community challenge to predict clinical outcomes after immune checkpoint blockade in non-small cell lung cancer

Model Name

Model Description

Aginome-Amoy

Top-performer in the BOR sub-challenge

A rule-based model was generated using patients stratified into three groups based on their PD-L1 and TMB expression scores:

Group 1: PD-L1 score below median

Group 2: PD-L1 score above median and TMB score below median

Group 3: Both PD-L1 and TMB expression scores above median

The following heuristic rules were used to decide the ranking of samples:

A. Group 3 > Group 1 > Group 2

B. Within Group 3, the ranking of samples was based on the following score: Score_{response} = TMB_{norm} + 2 * PD-L1_{norm}

C. Within Group 1, the ranking of samples was based on the following score: Score_{response} = TMB_{norm} + PD-L1_{norm}

D. Within Group 2, the ranking of samples was based on the following score: Score_{response} = TMB_{norm} – PD-L1_{norm}
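Rules A to D can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the table does not say how TMB_{norm} and PD-L1_{norm} are normalized (min-max scaling is assumed here), values equal to the median are treated as "above", and the function names `min_max` and `rank_samples` are hypothetical.

```python
from statistics import median


def min_max(values):
    """Scale values to [0, 1]. Assumed normalization; the table does not specify one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_samples(pdl1, tmb):
    """Return sample indices ordered from highest to lowest predicted response."""
    pdl1_med, tmb_med = median(pdl1), median(tmb)
    pdl1_n, tmb_n = min_max(pdl1), min_max(tmb)
    # Rule A: Group 3 > Group 1 > Group 2 (a larger priority ranks first).
    priority = {3: 2, 1: 1, 2: 0}
    keyed = []
    for i in range(len(pdl1)):
        if pdl1[i] < pdl1_med:
            group = 1
            score = tmb_n[i] + pdl1_n[i]          # rule C
        elif tmb[i] < tmb_med:
            group = 2
            score = tmb_n[i] - pdl1_n[i]          # rule D
        else:
            group = 3
            score = tmb_n[i] + 2 * pdl1_n[i]      # rule B
        keyed.append((priority[group], score, i))
    # Sort by group priority first, then by the within-group score.
    keyed.sort(key=lambda k: (k[0], k[1]), reverse=True)
    return [i for _, _, i in keyed]


# Toy cohort of four samples (made-up values, for illustration only).
order = rank_samples(pdl1=[10, 80, 90, 5], tmb=[5, 2, 20, 1])
print(order)
```

Sorting on the pair (group priority, within-group score) keeps rule A strict: no within-group score can move a sample across a group boundary.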

cSysImmunoOnco

Top-performer in the BOR sub-challenge

A score of immune response was computed for each patient using EaSIeR [43], which makes use of elastic-net regularized multitask linear regression models trained on TCGA data using quantitative descriptors of the TME as model input and 10 published transcriptomic signatures of immune response as model output. The quantitative descriptors of the TME included relative abundances of different immune cell types [44], scores of pathway [45] and transcription factor activities [46], and scores of inter-cellular communication and were derived by combining prior knowledge about the tumor microenvironment and patients’ transcriptomics data. The models were fine-tuned by associating penalties with markers of tumor foreignness based on TMB, wherever available, or MSI status estimated using an RNA-seq based signature

DukeLKB1

Top-performer in the OS sub-challenge

A model with six derived features (TMB, PD-L1, 4-gene inflammatory signature, LKB1 loss signature, NRF2 activation signature, and neuroendocrine differentiation signature) was generated [47, 48]

The scores included in the model were calculated as follows: for the TMB and PD-L1 components, tumors with values above the 67th percentile were given a score of 1, and the remaining tumors were scored 0. The 4-gene inflammatory signature and the three tumor-intrinsic gene expression variables were taken as means of the scaled expression scores for the corresponding signature genes. Because we anticipated differences in gene expression and distribution according to tumor histology, the dataset was first separated into squamous and non-squamous subsets, with scaling and averaging across genes performed separately within each subset
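The two feature types described for DukeLKB1 can be illustrated as follows. This is a rough sketch under assumptions, not the team's code: "scaled" is taken to mean z-scoring, the percentile uses linear interpolation, and the helper names (`percentile`, `binarize_top_third`, `signature_scores`) are hypothetical.

```python
from statistics import mean, pstdev


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def binarize_top_third(values):
    """Score 1 for values above the 67th percentile, 0 otherwise (TMB, PD-L1)."""
    cut = percentile(values, 67)
    return [1 if v > cut else 0 for v in values]


def signature_scores(expr, histology):
    """Mean of per-gene z-scores, scaled separately within each histology subset.

    expr: {gene: [expression per tumor]}; histology: one label per tumor.
    """
    n = len(histology)
    totals = [0.0] * n
    for values in expr.values():
        for label in set(histology):
            idx = [i for i in range(n) if histology[i] == label]
            sub = [values[i] for i in idx]
            mu, sd = mean(sub), pstdev(sub)
            for i in idx:
                # A constant gene within a subset contributes 0 instead of dividing by 0.
                totals[i] += (values[i] - mu) / sd if sd > 0 else 0.0
    return [t / len(expr) for t in totals]


# Made-up example: one signature gene measured in four tumors.
scores = signature_scores(
    {"geneA": [1.0, 3.0, 10.0, 20.0]},
    ["squamous", "squamous", "non-squamous", "non-squamous"],
)
print(scores)
```

Scaling within each histology subset means a tumor is scored relative to tumors of the same histology, which is the motivation the description gives for splitting the dataset first.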

FICAN-OSCAR

Top-performer in the OS sub-challenge

A single linear regression model using a novel Optimal Subset CArdinality Regression (oscar) L0-quasinorm regularization was generated using the R package available at https://github.com/Syksy/oscar/releases/tag/v0.6.1 [49, 50]. The model is a linear product of the data matrix X and regularized beta coefficients b. A gene expression signature (CUSTOM_FOPANEL) was estimated using a custom gene panel analyzed with GSVA (with the parameter mx.diff = TRUE). Other variables included in the model were sex, histology (squamous vs. not), smoking history, ECOG performance status (0 vs. not), TMB, and PD-L1. A description of each coefficient is available in Additional file 1: Supplementary Methods 1

FICAN-OSCAR model equation:

Y =  − 0.693 × CUSTOM_FOPANEL − 0.357 × isTMBhigh − 0.105 × isMale − 0.198 × isSquamous − 0.05 × isSquamous&Above5PDL1 − 0.223 × isEversmoker − 0.105 × isECOG0
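As a worked illustration, the equation above can be evaluated as a plain dot product of the published coefficients and a patient's feature values. The coefficients are copied from the equation; the function name `fican_oscar_score`, the dictionary layout, and the example patient are hypothetical, and the indicator variables are assumed to be coded 0/1.

```python
# Coefficients copied from the FICAN-OSCAR model equation in the table.
COEFFICIENTS = {
    "CUSTOM_FOPANEL": -0.693,
    "isTMBhigh": -0.357,
    "isMale": -0.105,
    "isSquamous": -0.198,
    "isSquamous&Above5PDL1": -0.05,
    "isEversmoker": -0.223,
    "isECOG0": -0.105,
}


def fican_oscar_score(features):
    """Return Y as the sum of coefficient * feature value over all model terms."""
    missing = set(COEFFICIENTS) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


# Made-up patient: GSVA signature score of 0.5, TMB high, all other indicators 0.
patient = {name: 0 for name in COEFFICIENTS}
patient["CUSTOM_FOPANEL"] = 0.5
patient["isTMBhigh"] = 1
print(fican_oscar_score(patient))
```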

@jacob.pfeil

Top-performer in the OS sub-challenge

The AbbVie Taux model used an unbiased feature engineering strategy to identify gene expression ratios that differentiate anti–PD-1 responders from non-responders. The reason for using gene expression ratios was to down-weight the effect of response markers by a factor proportional to resistance marker expression level. Cross-validation and regularization were used to mitigate overfitting on the small number of available training samples. An SVM with radial basis function kernel identified a non-linear boundary separating the responder ratio values from non-responder values. Predictive gene expression ratios balanced markers of response (e.g., immune cell markers, Type-I interferon, HLA presentation) with markers of resistance (e.g., proliferation and inhibitors of immune recognition)

I-MIRACLE

Top-performer in the OS sub-challenge

A rule-based prediction model was generated based on classifying TMB and PD-L1 as high or low as follows:

  • TMB: TMB values were classified as high if greater than or equal to the upper tertile and as low otherwise. When TMB was missing, the proliferation score [51] was used as a proxy, as it correlates highly with TMB in NSCLC (see prediction of OS sub-challenge)

  o The proliferation score was calculated for each patient using the yaGST R package (http://github.com/miccec/yaGST) [52]. Patients with missing TMB were classified as TMB high if their proliferation score was greater than or equal to the upper tertile and as TMB low otherwise

  • PD-L1: Patients were classified as PD-L1 high if their PD-L1 value was ≥ 50 and PD-L1 low otherwise. When PD-L1 values were missing, the ICR score was used instead

  o The ICR score was derived from a 20-gene signature that reflects the presence of a Th1/cytotoxic immune response [14, 16]. The ICR score was calculated for all patients using the yaGST R package. Patients with missing PD-L1 were classified as PD-L1 high if their ICR score was greater than or equal to the upper tertile and as PD-L1 low otherwise

  • Patients were given an I-MIRACLE score of 1, 2, or 3 based on their TMB and PD-L1 values, as shown in Fig. 2B and in Additional file 1: Supplementary Methods 1. If TMB was high (or the proliferation score was high when TMB was missing) and PD-L1 expression was high (or the ICR score was high when PD-L1 was missing), we gave a score of 3. A score of 1 was given when both TMB/proliferation score and PD-L1/ICR were low. A score of 2 was given otherwise
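The I-MIRACLE rules, including the fallbacks for missing values, can be sketched as below. This is an assumption-laden illustration rather than the team's code: the upper-tertile cutoffs are passed in as precomputed numbers (the table does not say how the tertiles were computed), missing values are represented as `None`, and the names `is_high` and `i_miracle_score` are hypothetical.

```python
def is_high(primary, primary_cutoff, proxy, proxy_cutoff):
    """Classify as high from the primary marker, or from the proxy if it is missing."""
    if primary is not None:
        return primary >= primary_cutoff
    return proxy >= proxy_cutoff


def i_miracle_score(tmb, pdl1, proliferation, icr, cutoffs):
    """Return 3 (both high), 1 (both low), or 2 (otherwise).

    cutoffs: upper-tertile thresholds for "tmb", "proliferation", and "icr",
    assumed to be computed on the cohort beforehand. PD-L1 uses the fixed
    threshold of 50 given in the table.
    """
    tmb_high = is_high(tmb, cutoffs["tmb"], proliferation, cutoffs["proliferation"])
    pdl1_high = is_high(pdl1, 50, icr, cutoffs["icr"])
    if tmb_high and pdl1_high:
        return 3
    if not tmb_high and not pdl1_high:
        return 1
    return 2


# Made-up cutoffs and patients, for illustration only.
cutoffs = {"tmb": 10, "proliferation": 0.5, "icr": 0.3}
print(i_miracle_score(12, 60, 0.1, 0.1, cutoffs))      # TMB and PD-L1 both measured
print(i_miracle_score(None, None, 0.6, 0.4, cutoffs))  # both markers fall back to proxies
```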

Netphar

Top-performer in the PFS sub-challenge

A decision tree-based model was generated using TMB high (≥ 243) or low (< 243) as a first branching point (prior knowledge: TMB is necessary but not sufficient for triggering the checkpoint inhibitor response) and the expression of PD-L1 in the TMB high branch as the second branching point. The model was designed to be conservative on the TMB low branch with all predictions equal to zero

Model equation: Y = 10 × TMB_binarized + TMB_binarized × PD-L1
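The Netphar equation translates directly into code. In this small sketch the cutoff of 243 and the equation come from the table; the function name `netphar_score` and the example inputs are hypothetical.

```python
def netphar_score(tmb, pdl1, tmb_cutoff=243):
    """Y = 10 * TMB_binarized + TMB_binarized * PD-L1."""
    # First branching point: TMB high (>= cutoff) versus low.
    tmb_binarized = 1 if tmb >= tmb_cutoff else 0
    # Second branching point: PD-L1 only contributes on the TMB-high branch.
    return 10 * tmb_binarized + tmb_binarized * pdl1


print(netphar_score(300, 50))  # TMB high: 10 + 50
print(netphar_score(100, 90))  # TMB low: prediction is 0 regardless of PD-L1
```

Because both terms are multiplied by the binarized TMB, every TMB-low sample scores 0, and every TMB-high sample scores at least 10 when PD-L1 is non-negative, so TMB-high samples always rank above TMB-low ones.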

Team TIDE

Top-performer in the BOR sub-challenge

The model integrated TIDE [24] with other clinical phenotypes (e.g., PD-L1, TMB, and smoking) by the rank aggregation method to enhance the prediction performance on patient survival and response. Treatment-naïve ICI clinical trial data from the TIDE database and late-stage chemotherapy patients of LUAD, LUSC, and SKCM from TCGA were used as the training data. C-index values for survival were calculated for each feature within each individual cohort, and features were ranked according to a custom scoring metric. Features such as TMB, PD-L1, CTL, SMOKE, Dysfunction, Exclusion, T.cell.CD4.non.regulatory from QUANTISEQ [44], B-cell naive from xCell [53], IFNG signature, and antigen presentation by MHC-I were selected in the model prediction

BOR best overall response, C-index concordance index, CTL cytotoxic T lymphocytes, EaSIeR estimate systems immune response, ECOG Eastern Cooperative Oncology Group, GSVA gene set variation analysis, HLA human leukocyte antigen, ICI immune checkpoint inhibitor, ICR immune constant of rejection, IFNG interferon gamma, LUAD lung adenocarcinoma, LUSC lung squamous cell carcinoma, MHC-I major histocompatibility complex I, MSI microsatellite instability, NRF2 nuclear factor erythroid 2–related factor 2, NSCLC non-small cell lung cancer, OS overall survival, PD-1 programmed death-1, PD-L1 programmed death ligand 1, PFS progression-free survival, RNA-seq RNA sequencing, SKCM skin cutaneous melanoma, SVM support vector machine, TCGA The Cancer Genome Atlas, TIDE tumor immune dysfunction and exclusion, TMB tumor mutational burden, TME tumor microenvironment