Table 4 Comparison of response prediction performance between longitudinal and ensemble models in the independent test set for the endpoints PFS6 and PFS9, evaluated by AUC, ACC, SENS, SPEC, PREC and bACC

From: Integration of longitudinal deep-radiomics and clinical data improves the prediction of durable benefits to anti-PD-1/PD-L1 immunotherapy in advanced NSCLC patients

| Endpoint | Model | Features | N test | AUC [95% CI] | ACC [95% CI] | SENS [95% CI] | SPEC [95% CI] | PREC [95% CI] | bACC [95% CI] |
|---|---|---|---|---|---|---|---|---|---|
| PFS6 | Ensemble RF-baseline | DF-imm, Clinical data | 43 | 0.678 [0.513, 0.836] | 0.605 [0.442, 0.744] | **0.875** [0.731, 1.000] | 0.263 [0.071, 0.467] | 0.600 [0.436, 0.758] | 0.569 [0.448, 0.684] |
| PFS6 | Ensemble RF-longitudinal | DF-imm, Clinical data | 32 | **0.824** [0.658, 0.953] | **0.750** [0.594, 0.906] | 0.733 [0.500, 0.938] | **0.765** [0.533, 0.947] | **0.733** [0.471, 0.933] | **0.749** [0.594, 0.897] |
| PFS9 | Ensemble RF-baseline | DF-imm, Clinical data | 43 | 0.560 [0.377, 0.731] | 0.581 [0.442, 0.721] | 0.793 [0.643, 0.933] | 0.143 [0.000, 0.364] | 0.657 [0.487, 0.811] | 0.468 [0.360, 0.590] |
| PFS9 | Ensemble RF-longitudinal | DF-imm, Clinical data | 32 | **0.753** [0.549, 0.931] | **0.813** [0.656, 0.938] | **0.947** [0.826, 1.000] | **0.615** [0.357, 0.889] | **0.783** [0.609, 0.950] | **0.781** [0.631, 0.923] |

  1. For each metric, the 95% confidence interval is shown and the highest value for each endpoint is highlighted in bold
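
The table does not define bACC. Assuming it denotes the usual balanced accuracy, that is, the mean of sensitivity and specificity, the reported point estimates can be cross-checked against the SENS and SPEC columns. The sketch below does only that; the names `rows` and `balanced_accuracy` are illustrative and not taken from the article, and the confidence intervals are not reproduced.

```python
# Point estimates copied from Table 4: (SENS, SPEC, reported bACC).
rows = {
    ("PFS6", "Ensemble RF-baseline"): (0.875, 0.263, 0.569),
    ("PFS6", "Ensemble RF-longitudinal"): (0.733, 0.765, 0.749),
    ("PFS9", "Ensemble RF-baseline"): (0.793, 0.143, 0.468),
    ("PFS9", "Ensemble RF-longitudinal"): (0.947, 0.615, 0.781),
}


def balanced_accuracy(sens: float, spec: float) -> float:
    """Assumed definition: arithmetic mean of sensitivity and specificity."""
    return (sens + spec) / 2


for (endpoint, model), (sens, spec, reported) in rows.items():
    computed = balanced_accuracy(sens, spec)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    agrees = abs(computed - reported) < 1e-3
    print(f"{endpoint} {model}: computed={computed:.3f} reported={reported:.3f} agrees={agrees}")
```

Under this assumed definition, a model with high sensitivity but low specificity (as in the baseline rows) scores close to 0.5 on bACC even when its sensitivity is the highest in the table.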