24h-gene variation effect of combined bevacizumab/erlotinib in advanced non-squamous non-small cell lung cancer using exon array blood profiling

Background The SAKK 19/05 trial investigated the safety and efficacy of the combined targeted therapy bevacizumab and erlotinib (BE) in unselected patients with advanced non-squamous non-small cell lung cancer (NSCLC). Although activating EGFR mutations were the strongest predictors of the response to BE, some patients not harboring driver mutations could benefit from the combined therapy. The identification of predictive biomarkers before or short after initiation of therapy is therefore paramount for proper patient selection, especially among EGFR wild-types. The first aim of this study was to investigate the early change in blood gene expression in unselected patients with advanced non-squamous NSCLC treated by BE. The second aim was to assess the predictive value of blood gene expression levels at baseline and 24h after BE therapy. Methods Blood samples from 43 advanced non-squamous NSCLC patients taken at baseline and 24h after initiation of therapy were profiled using Affymetrix’ exon arrays. The 24h gene dysregulation was investigated in the light of gene functional annotations using gene set enrichment analysis. The predictive value of blood gene expression levels was assessed and validated using an independent dataset. Results Significant gene dysregulations associated with the 24h-effect of BE were detected from blood-based whole-genome profiling. BE had a direct effect on “Pathways in cancer”, by significantly down-regulating genes involved in cytokine–cytokine receptor interaction, MAPK signaling pathway and mTOR signaling pathway. These pathways contribute to phenomena of evasion of apoptosis, proliferation and sustained angiogenesis. Other signaling pathways specifically reflecting the mechanisms of action of erlotinib and the anti-angiogenesis effect of bevacizumab were activated. The magnitude of change of the most dysregulated genes at 24h did not have a predictive value regarding the patients’ response to BE. However, predictive markers were identified from the gene expression levels at 24h regarding time to progression under BE. Conclusions The 24h-effect of the combined targeted therapy BE could be accurately monitored in advanced non-squamous NSCLC blood samples using whole-genome exon arrays. Putative predictive markers at 24h could reflect patients’ response to BE after adjusting for their mutational status. Trial registration ClinicalTrials.gov: NCT00354549 Electronic supplementary material The online version of this article (doi:10.1186/s12967-017-1174-z) contains supplementary material, which is available to authorized users.


Ordinary correspondence analysis
The core procedure in DCCA is correspondence analysis (CA), a powerful ordination method classically used for the analysis of contingency tables [1], and more generally applicable for the analysis of tables of positive or null values (e.g. genomics data) [2]. Ordinary correspondence analyses are used to investigate the dependence between rows and columns in a data set. Theoretical basis underlying CA can be summarized by defining the following: • X the n×m matrix of exon-level expression data (n samples, m exons) • P = X/N the data matrix divided by its grand total • r the n−dim vector of row sums of P (row weights) • c the m−dim vector of column sums of P (column weights) • D r the n × n diagonal matrix of row sums • D c the m × m diagonal matrix of column sums In correspondence analysis, the main matrix of interest is converted into a χ 2 distance matrix after the following pre-processing data transformation: Correspondence analysis performs the singular value decomposition of Z: with Λ the k × k (k = rank(Z)) diagonal matrix of singular values associated with Z with λ 1 ≥ . . . ≥ λ k > 0, U an n × k matrix whose columns are the left singular vectors of Z and V an m × k matrix whose columns are the right singular vectors of Z. The rows of U and V are orthonormal with respect to D r and D c , respectively: The principal components and row coordinates are given by D

Three-way correspondence analysis
Experimental designs including a repeated measurement of all variables for all subjects at a second time-point imply 3 dimensional data sets (patient × exons × time). In the current setting, a pair of tables fully matched on rows and columns was considered. Several approaches have been proposed for the analysis of these specific data in the framework of correspondence analysis. These include methods directly dealing with three-dimensional structures [3], and a series of methods unrolling the third dimension of the data into data structures that can be handled by conventional 2-dimensional CA, such as Foucart's CA [4] or STATIS-CoA [5]. However, these methods generally focus on the similarity of the matched tables (by means of a consensus matrix) rather than on their differences. Our data specifically contain 2 exon-array data sets measuring the same set of exon-level expressions for the same patients, before and 24h after initiation of the treatment. In this situation where the 2 data sets are fully matched, the main question of interest is the identification of variations related to the immediate treatment effect. We are interested in analyzing the expression changes measured at 24h, taking the expression at baseline as reference. The analysis should properly take into account the within-patient experimental design. A few solutions to this problem have been proposed [6,7,8]. The simplest procedure proposed by Torre & Chessel [7], and used in the current work, consists in staking observation-wise the 2 matrices into one table, and carrying out a within-class analysis by defining a categorical variable describing each pair of samples. In this analysis, the within-patient design is accounted for by partialling out the patient effect, which enables to directly investigate the differences before vs. after treatment.

Dually constrained correspondence analysis
Incorporating external constraints in CA is desirable for the direct interpretation of CA in the light of external information structuring the rows and/or columns of a data set. DCCA is an extension of ordinary CA where 2 sets of linear constraints are applied on both rows and columns. Two complementary approaches can be used to impose constraints in the analysis: the reparametrization method and the null space method [9].
Positive constraints can be applied row-wise and column-wise using the respective projection operators: Negative constraints can be applied row-wise and column-wise using the respective orthogonal projection operators: Notice that positive and negative constraints can be combined by examining the effect of one set of variable X 2 while statistically controlling for the effects of a second set of variables X 1 . This is done by partialling out the effect of X 1 from X 2 using the reparametrization method. The resulting row-wise and column-wise partial constraints are the following: with N * = (I − X 1 (X T 1 D c X 1 ) −1 X T 1 D c )X 2 Notice that when applied to the observations only, this type of constraints corresponds to the partial canonical correspondence analysis described by ter Braak [10].
Depending on the study objective three types of constraints can be applied row-wise and columns-wise. The last step consists in performing one of the 9 possible generalized singular value decompositions: The principal components and row coordinates are given by D with c j the weight of the j th variable, and q 2 j,l the coordinate of the j th variable (loading) on the l th dimension (principal axes).

Design of experiment
The structure of the current data set and the scheme of DCCA are summarized in Figure 1. The CA model is dually constrained. Variable-wise, a categorical variable indicating which exon belongs to which gene is used either as a positive constraint (gene-level analysis) or as a negative constraint (exon-level/alternative splicing analysis). Observation-wise, two categorical variables are defined, corresponding to the patient identifier and the time when the measurement was made. Following the previous annotations, the following two transformations are considered: and for the gene-level and exon-level analysis, respectively.

Selection of responders based the metagene score
An illustration of selection of responders based the metagene score is given in Figure 2. Figure 2 depicts the time to progression under BE as a function of the metagene score. Each dot represent a patient either showing a progression (plain dot) or censored (empty dot). The red color indicate the presence of characterized EGFR mutations. Patients with a high metagene score showed a better response to BE. All EGFR mutated patients (red dots) had a TTPBE above the median time to progression. However, as shown in the upper right corner of the plot, several patients without characterized EGFR mutation showed a satisfactory response to BE.