A systematic approach to biomarker discovery; Preamble to "the iSBTc-FDA taskforce on immunotherapy biomarkers"

The International Society for the Biological Therapy of Cancer (iSBTc) has initiated, in collaboration with the United States Food and Drug Administration (FDA), a programmatic look at innovative avenues for the identification of relevant parameters to assist clinical and basic scientists who study the natural course of host/tumor interactions or their response to immune manipulation. The task force has two primary goals: 1) identify best practices of standardized and validated immune monitoring procedures and assays to promote inter-trial comparisons and 2) develop strategies for the identification of novel biomarkers that may enhance our understanding of principles governing human cancer immune biology and, consequently, implement their clinical application. Two working groups were created that will report the developed best practices at an NCI/FDA/iSBTc sponsored workshop tied to the annual meeting of the iSBTc to be held in Washington DC in the Fall of 2009. This foreword provides an overview of the task force and invites feedback from readers that might be incorporated in the discussions and in the final document.


Background
Assumptions about correlation between immunological end-points and clinical outcomes of immunotherapy or anti-cancer vaccine therapy are not supported by current monitoring strategies; standard immunological assays may inform about immunological outcomes but cannot yet predict the efficacy of treatment [1].
The failure of past clinical investigations to identify measurable, reliable biomarkers predictive of treatment efficacy may be explained in two ways:
A. The current understanding of the immune biology of tumor/host interactions and the immunological requirements for the induction of immune-mediated, tissue-specific destruction is insufficient. Thus, novel hypothesis-generating strategies should be considered.
B. The power of immunotherapy clinical studies is often not sufficient to provide robust statistical information because of their small size and because the immune assays are not sufficiently standardized or broad to allow inter-trial, inter-institutional comparisons to enhance statistical power.
To address the first point, a working group (Novel Assays for Immunotherapy Clinical Trials) has been organized under the leadership of Peter Lee and Francesco Marincola aimed at the identification of experimental, bioinformatics and clinical strategies to increase the yield of information relevant to the mechanism of immune-mediated, tissue-specific rejection to develop clinically useful markers and assays.
To address the second point, another working group (Biomarker Validation and Application) has been organized under the leadership of Lisa Butterfield, Nora Disis and Karolina Palucka to evaluate current approaches to the validation of known immune response biomarkers and the standardization of the respective assays to enhance the likelihood of obtaining informative returns from ongoing immunotherapy protocols at different institutions. This working group will focus primarily on the standardization and corroboration of commonly utilized assays for measurement of host-tumor interaction and immune response to therapeutic intervention; in addition, it will develop best practices for the standardization and corroboration of novel assays.

Working group on novel assays for immunotherapy clinical trials
Co-Chairs: Peter P Lee, MD - Stanford University; Francesco M Marincola, MD - Clinical Center, NIH

Goals
The goal of this working group is to test novel, cutting-edge strategies suitable for high-throughput screening of clinical samples for the identification, selection and validation of biomarkers relevant to disease outcome and/or to serve as surrogate equivalents to clinical outcome. In particular, the working group will focus on:
A. Predictors of immune responsiveness are defined as a set of biomarkers that could predict at the time of patient's enrollment her/his responsiveness to treatment [2,3]. This type of marker will be particularly important in immunotherapies since standard response criteria (RECIST and WHO), which define tumor response and disease progression by tumor shrinkage, might not adequately capture the clinical benefit. In immunotherapy trials, some patients derive long-term survival benefit from treatment but respond late and initially show continued tumor growth [4]. By standard criteria, such patients would be classified as having progressive disease and taken off study.
B. Markers predicting risk of toxicity are defined as biomarkers that could predict at the time of patient's enrollment her/his likelihood to suffer major toxicity from a specific therapy.
C. Mechanistic biomarkers are defined as those that may explain or validate the mechanism(s) of action of a given treatment in humans; such biomarkers will more likely be identified by paired comparison of pre- and post-treatment samples [5]. Critical to the design of studies aimed at the identification of mechanistic biomarkers will be the inclusion of relevant control samples to allow treatment-related effects to be differentiated from the effects on tissues of serial biopsies, which induce wound repair associated genes and proteins [6].
D. Prognostic markers predicting survival/clinical benefit are defined as those that could predict overall outcome independent of clinical responsiveness based on standard response criteria [7,8].
E. Surrogate (end-point) biomarkers are defined as those biomarkers that could provide information about the likelihood of clinical benefit/survival at earlier stages compared to prolonged disease-free or overall survival analysis.
The goals of this working group are especially challenging since there are multiple categories of immunotherapies, each with its own complexities and often representing multicomponent systems such as vaccines. Nevertheless, there is a need for biomarkers to determine the effect of the drug on the tumor as well as to assess the host immune response. Thus, the goals are broader and less restrictive than those of the working group on Biomarker Validation and Application because specific challenges to the identification and validation of biomarkers using novel and rapidly evolving approaches have been less clearly characterized. Consequently, the establishment of sub-committees addressing specific issues is planned at a later time, either before or after the 2009 workshop, when defined scientific or practical hurdles will be prioritized and framed into specific questions. Furthermore, the selection and implementation of different sub-committees will follow an adhocracy model according to evolving and progressively recognized needs [9].

Basic considerations
Success will only be achieved by boldly following new strategies likely to provide informative data independent of other practical or financial considerations. In other words, a study should be primarily designed following rigorous and stringent criteria that allow the achievement of its scientific goals. As the design proceeds to the implementation phase, other considerations should obviously be taken into account and negotiated carefully, optimizing the balance between them and the likelihood of obtaining the originally desired outcomes. A good example is the implementation of serial sampling for mechanistic studies [10]; such strategies have been discussed for a long time but rarely applied due to a hesitant attitude on the side of clinicians. On the other hand, examples of the applicability of such strategies in institutionally approved protocols are emerging because of the enormous scientific return that can be obtained from these kinds of studies [5,11,12]. Therefore, the basic belief in relation to the purposes of this working group is that a clinical study should be entertained only if likely to provide significant enhancement of the science of immunotherapy; in other words, a poorly designed clinical study is worse than no study at all. Furthermore, identification of novel and relevant biomarkers should be sought by prospectively designing clinical studies with that purpose rather than piggybacking on ongoing studies.
Marker discovery/development for immunotherapy is especially challenging since humans are: None of these factors are controllable. Therefore, future studies should confront the challenges of clinical investigation by accruing materials that could comprise the genetic background of patients, the heterogeneity of their cancers and other indeterminate factors that may contribute to patients' and cancer cell phenotypes. This goal can likely be achieved through a non-linear mathematical approach based on pattern recognition [13][14][15]. The leading hypothesis is that, within a heterogeneous system, commonalities observed during the occurrence of a particular phenomenology (i.e. response to therapy) are most likely to be relevant and/or causative [16]. Thus, the general strategy will be to obtain:
i. Samples to address the genetic background of the patients (germ line DNA, i.e. peripheral blood mononuclear cells, PBMCs)
ii. Samples to address the altering phenotypes of immune cells in relation to the natural history of disease and/or treatment (i.e. pre, during, and post-treatment PBMCs, sera or plasma at the same time points, pre-treatment and/or serial biopsies) that could provide insights about the identification of biomarkers predictive of responsiveness or toxicity.
iii. Samples that may provide mechanistic insights about the relationship between tumor biology and treatment (i.e. tumor biopsies, sentinel node biopsy etc).
Appropriate sample collection should be considered the independent variable while the technologies applied for their analysis may rapidly evolve and will have to adjust; experts in various fields of genomics, functional genomics, and proteomics will provide useful insights. In addition, recent interest has risen toward the characterization of cellular products, tissue or genetically engineered products for adoptive transfer by high throughput technologies including transcriptional profiling at the messenger RNA [17] and microRNA [18] level.
It should be emphasized that there is no priority scale about which of the three lines of investigation is most important; indeed, only the combination of them can provide a global view of the pathological process. Furthermore, questions regarding the type of material to be utilized (i.e., DNA, RNA or proteins) underline some naiveté in the way clinical investigations may be approached. In an oversimplified view, humans, as multi-cellular organisms, are structured according to a hierarchy of genetic interactions that go from genomic DNA, to transcription into RNA and translation into functional units (proteins in different functional statuses) that may or may not differ among cells within a tissue or from different tissues. The study of each layer within this hierarchy provides distinct information. DNA analysis provides information about relatively stable characteristics of cells and tissues that may explain variations among individual patients, or aberrances between normal and abnormal tissues. Messenger RNA informs mostly about the reaction of cells to environmental conditions. Transcriptional analysis can be compared to the electroencephalographic response to stimulation, which informs about the reaction to a stimulus: while mRNA provides information about the "brain response" of a cell (spikes in response to light), protein analysis (including functional assays descriptive of protein activation [19] and/or expression by immune cell subsets [20]) provides information about what a cell is doing, as when the hand covers the eyes when the light is too strong. Since each component provides different types of information and one kind cannot be assumed from the other, clinical research should study humans by evaluating all components simultaneously at moments relevant to the natural history of a disease or its response to therapy.
Of importance is the realization that protein analysis confronts particular challenges when studying immunologically relevant soluble factors that are generally present in low concentrations (though biologically significant) in body fluids like serum or plasma [21] and potentially exist as isoforms with different functional implications [22].
Advances in metabolic imaging based on positron emission tomography (PET) and in sensitive protein assays based on nanotechnology platforms provide the promise of non-invasive and minimally-invasive immune monitoring. The use of PET-based probes preferentially taken up by activated T cells enables non-invasive imaging of immune responses in vivo without perturbing the biological process with blood cell or tissue sampling [23,24]. In addition, with increased knowledge of the secretome, the proteins secreted during immune activation and tumor cell killing can be detected in small-volume serum samples (ideally from a finger-prick) when analyzed by high-throughput nanotechnology-based assays [25,26]. These new technologies applied to immune monitoring would enable the sequential and repetitive analysis of an effective immune response. Ideally, the novel assay technologies will need to first be compared to more standard approaches to define their analytical bias, leading to adequate correlation with biological processes and clinical outcomes. Furthermore, circulating RNA profiling measures predominantly transcriptional activation of circulating cells, while protein profiling measures abundance of proteins produced by several tissues.

General strategy
Experience from non-linear, pattern-recognizing approaches such as whole genome analysis or functional genomics suggests that the best and most efficient statistical strategy for biomarker identification/validation is a two- (or three-) step process that includes:

i. A discovery/training step
This step may require a relatively limited number of samples to be tested extensively to identify putative informative pathways or genetic traits using costly, high-throughput and comprehensive strategies.

ii. A training/validation step
This bears the same characteristics as the training set with two exceptions: a) it should be performed by an independent group; b) it could be better powered because the study can be designed with a priori knowledge of experimental variance.

iii. A validation step
The validation set follows to validate the previously identified pathway or genetic trait using less costly and more focused analyses on larger patient populations. Thus, the validation step bears the same characteristics as the training/discovery set but it should be performed in a large independent specimen cohort sufficient to provide the results to support the clinical use of the marker (prognostic response, toxicity, etc.). It should include a clear statistical design to assure the marker correlation with the clinical parameter of interest.
Key to successful implementation of this strategy is the decision to move from the "discovery phase" (training set) to the "validation phase". Arguably, in the past the scientific community has been too eager to move from the first to the second without substantial evidence that the first phase had been truly completed. It could be argued that a second "training/validation" set should be added to independently test the reproducibility of the results in a small cohort; several strategies may be adopted, including a paired performance of identical studies at two different institutions blinded to each other's results. Moreover, the separation between training and validation phases is critical because sample collection, storage and utilization may significantly vary; less material may be required during the validation step when narrower questions are approached. However, while some features of sample collection may change, experimental consistency will not be negotiable. The three-step strategy may be able to provide the highest yield of information during the transition from a high cost per patient during the exploratory phase to a less costly per patient but highly powered validation phase.
Bioinformatics and statistical support are critical in defining the most effective and least time-consuming strategies and we advocate that a biostatistician/computational biologist should play a significant role in the working group starting from the clinical study design.
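One practical requirement of the stepwise strategy above is that no specimen used for discovery is reused for validation. The following is only a minimal sketch of that bookkeeping, under assumptions not stated in the text: the function name `split_cohorts`, the cohort labels, the cohort sizes and the random assignment from a single specimen pool are all hypothetical, and in practice the independent cohorts may come from different institutions rather than from a random split.

```python
import random


def split_cohorts(specimen_ids, n_discovery, n_training_validation, seed=0):
    """Assign specimens to three non-overlapping cohorts (hypothetical helper).

    The first two cohorts are small (discovery and independent
    training/validation); everything left over forms the larger
    validation cohort.
    """
    ids = sorted(set(specimen_ids))  # de-duplicate and fix a starting order
    if n_discovery < 1 or n_training_validation < 1:
        raise ValueError("each small cohort needs at least one specimen")
    if n_discovery + n_training_validation >= len(ids):
        raise ValueError("no specimens would remain for the validation cohort")

    rng = random.Random(seed)  # seeded so the assignment is reproducible
    rng.shuffle(ids)

    cut1 = n_discovery
    cut2 = n_discovery + n_training_validation
    return {
        "discovery": ids[:cut1],
        "training_validation": ids[cut1:cut2],
        "validation": ids[cut2:],
    }


if __name__ == "__main__":
    specimens = [f"P{i:03d}" for i in range(100)]  # illustrative IDs only
    cohorts = split_cohorts(specimens, n_discovery=10, n_training_validation=10)
    for name, members in cohorts.items():
        print(name, len(members))
```

Fixing the seed makes the assignment auditable, which matters if the split is later questioned; the small-then-large cohort sizes simply mirror the shift from costly exploratory work to a better powered validation phase.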

Strategy for sample collection
A working hypothesis of the working group is that the biggest obstacle to the identification of useful biomarkers is the difficulty in obtaining relevant material to study, while the potential of current technologies is proportionally limitless. Due to practical, ethical and financial rationalizations, samples are rarely collected with a methodology that allows broad testing opportunities and at a time or anatomical site relevant to the question asked. The working group will address each of these questions by including a bioethicist, members of regulatory agencies and a statistician together with the clinical and research input provided by other members and, potentially, patients' advocacy groups. The contention is that 1) excessive and unnecessary regulatory burdens ultimately result in a disservice to present and future patients; 2) studies limited for financial reasons are likely to be more wasteful than well-designed costly studies because they will eventually need to be repeated; 3) the application of training/validation strategies may significantly reduce costs without compromising the scientific yield of well-designed studies. Strategies for sample collection include the following:

i. Time of collection
The time of collection critically impacts functional studies. Obviously, it is less important when analyzing the genetic background of individuals since germ line DNA does not change throughout the natural history of the disease. However, functional studies involving the utilization of messenger RNA or protein from samples before and during treatment are highly affected by the rapid kinetics of the immune response and the evolving nature of cancer cell phenotypes.

ii. Method of collection
Clinical samples are often difficult or impractical to obtain and may require invasive technology. Although these are important considerations, none should compromise the collection of informative material. Non-invasive technologies have been developed, validated and optimized during the last decade to improve the feasibility of high-throughput studies in clinical settings [10]. Furthermore, use of anti-coagulants and/or other preservatives may have significant impact on measurements [27].

iii. Method of preservation
Strategies can be implemented to preserve materials prospectively in selected cohorts of patients (training set strategy) to improve the quality of the specimens: rapid freezing methods, use of anti-proteases or anti-RNase agents, and aliquoting of material to avoid serial freeze-and-thaw cycles. These precautions will significantly increase the likelihood of obtaining informative results by reducing variance.

iv. Type of sample
DNA, RNA and protein material should be obtained whenever possible. Germ-line DNA is important for testing genetic predisposition/influence on treatment outcome. However, genetic testing often requires a large number of cases due to the functional redundancy of human genes and the co-segregation of genetic traits according to geo-ethnical origin independent of specific phenomenologies. Expertise from immunogeneticists will be important. Transcriptional analysis has matured during the last decade and expertise in RNA handling and amplification will be present in the working group. A protein biochemist will be included who could provide expertise about the sample handling and research approaches appropriate for immunological studies (i.e. low concentration of cytokines and chemokines below the sensitivity of present discovery-driven proteomic approaches).

v. Number of samples
Individual protocols will require a different number of samples to achieve the same statistical power according to the variance expected in the study population and its responsiveness to therapy and/or susceptibility to toxic side effects (i.e. the expected frequency of responders to a given treatment will dictate the size of training and prediction set). Moreover, definition in mathematical terms of biological equivalence vs diversity of cellular and biological products will be discussed (i.e. what parameter defines equality or difference of dendritic cell processing following "identical" procedures).
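To make concrete how the expected frequency of responders drives cohort size, the sketch below estimates how many patients would need to be enrolled to obtain a given number of responders. It is an illustration under stated assumptions, not a substitute for a formal statistical design: the function names are hypothetical, and the binomial model assumes that patients respond independently with the same probability.

```python
import math


def expected_enrollment(responders_needed, response_rate):
    """Enrollment at which the expected number of responders reaches the target."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(responders_needed / response_rate)


def prob_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k responders among n."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def enrollment_for_confidence(responders_needed, response_rate, confidence=0.9):
    """Smallest enrollment giving at least `confidence` probability of
    observing the required number of responders (assumed binomial model)."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    n = responders_needed
    while prob_at_least(n, responders_needed, response_rate) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative numbers only: 10 responders wanted, 25% expected response rate.
    print(expected_enrollment(10, 0.25))
    print(enrollment_for_confidence(10, 0.25, confidence=0.9))
```

With a 25% response rate, `expected_enrollment(10, 0.25)` returns 40, but enrolling exactly that many leaves a substantial chance of falling short; `enrollment_for_confidence` returns a larger figure because it asks for the target to be met with high probability rather than on average.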

vi. Methods of analysis
Concerns often focus on methods for sample collection and storage and on validation and cross-validation of novel technologies. We believe that the significance of these concerns is overrated, particularly in the case of hypothesis-generating studies where the main goal is to screen clinical material for the identification of novel ideas to be validated later on by other techniques. This opinion is based on evidence that results obtained by various groups agree conceptually with results obtained by others using different platforms and samples and with common sense biological knowledge [28][29][30][31][32]. As human biology is an independent variable, different platforms applied to its study should provide concordant results as the essence of life is not changed by the spectacles through which we observe it, though our perceptions might vary from jolly to gloomy in accordance with the pink or dark lenses that we wear. This is critical in clinical research: by far, the key concern should be timing, site and method of sample accrual while rapidly evolving technologies will have to adapt to what is available and worth studying. Although counterintuitive, the methods applied for the study are less critical than the quality of the material accrued. Experience with various functional genomics platforms suggests that results are quite comparable as long as the same material is tested, but most discrepancies occur when studies performed at different institutions or on samples received from different institutions are compared. The potentials of modern technology are proportionately limitless and flexible; bioinformatics tools can robustly evaluate concordance of results, identify consistent and random biases and sieve reliable data. As technology rapidly evolves, tools can be adapted to compare platforms and provide biologically consistent results.
Thus, although the quality of the material will remain a primary focus of the working group, the need for platform standardization or, at least, comparability of results to facilitate inter-trial, inter-institutional comparisons will be a focus of discussion. Furthermore, the definitions used for the collection of clinical information or metadata derived from the bedside vary widely and are likely to make the task of consolidating clinical trial results even more daunting.

vii. Standardization, Centralization, Validation
Although the principles of standardization and validation of assays are the primary purpose of the working group on "Biomarker Validation and Application", sound strategies should be applied to address the imminent needs of the present working group evaluating novel technologies in uncharted territories; it is our opinion that assay standardization is most important in the early phases of biomarker discovery when limited sample size of different protocols can be counterbalanced by the accumulation of comparable results from different studies/institutions. Thus, the following concepts will be considered:

i. Standardization
It is generally difficult to enforce standardization of methods when novel technologies are approached due to the unresolved biases among individual investigators about the pros and cons of emerging technologies. Thus, standardization could instead be pursued by standardizing sample collection (comparable material) and cross-validating samples among different institutions to assure similar results independent of the platform used.

ii. Sample exchange
The comparability of results could be assessed by exchange of training samples among trials/institutions. This may obviate biased selection of platforms based on limited knowledge about their pros and cons.

iii. Centralization
A super core facility could support the analysis of samples from different but comparable trials; one example is the novel Center for Human Immunology, which is part of an inter-NIH initiative with a predominantly intramural scope but open to extramural interactions.

iv. Validation
It is important to distinguish between two concepts: 1) assay validation and 2) biomarker validation.
1. Assay validation is not the purpose of this working group; validation of assays deemed useful by this working group will be performed by the sister working group after discussion of their potential benefits.
2. Biomarker validation: a potential new robust biomarker candidate will need to be validated in a validation set as described above; this is part of the goals of the working group. Arguably, a robust biomarker should be useful independent of the test applied. In general, concordant results about the validity of a biomarker by different platforms should provide stronger confidence about its clinical relevance. Hence, this working group will not focus particular attention on assay validation but rather on biomarker validation.
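When the same exchanged samples are measured on two platforms or at two institutions, one simple way to summarize concordance is a rank correlation, which does not depend on the platforms sharing a measurement scale. The sketch below is a hedged illustration only: the choice of Spearman correlation is an assumption rather than a method prescribed by the working group, and the function names and example values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    if den == 0:
        raise ValueError("correlation is undefined for constant input")
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical readouts for the same five samples on two platforms.
    platform_a = [0.8, 1.9, 3.1, 4.2, 5.0]
    platform_b = [120, 260, 410, 500, 690]
    print(spearman(platform_a, platform_b))
```

A high rank correlation only indicates that the platforms order the samples similarly; it says nothing about systematic offsets between them, which would need a separate agreement analysis.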

Data exchange
Data collection and data exchange are becoming extremely burdensome: a whole genome SNP array from Affymetrix requires approximately 1 Gbyte of memory. Data exchange requires compatible databases and similar languages, which are not readily available. Thus, informatics distances remain large even though the World Wide Web has removed geographical distances. Centralization of information may represent a solution as exemplified by the Center of Information Technology at NCI that standardizes and collects all high-density data for the intramural program. Similarly, data analysis could be centralized as several inter-institutional cooperative groups are already doing for low density data handling. Large bioinformatics wastelands could be avoided if data could be effectively mined by various groups interested in similar problems; however, in our experience this seldom occurs because of the complexity of exchanging basic information about how databases were prepared, and because there is little incentive given the limited funding available for re-analysis and the unclear publication opportunities.

Desired outcomes
This working group has clearly defined goals that can be summarized as follows:
1) Identification of recommended SOPs for blood, serum/plasma and PBMC transportation, processing, cryopreservation and thawing. Many of these have been previously tested, standardized and published [33][34][35]. Specific protocols and SOPs should be posted on the web and broadly available for use and citation. In addition, sample collection and storage should take into account new assays. Similar considerations should be taken into account when collecting sera or plasma during the conduct of clinical trials [36].
2) The identification of specific standardized and validated immunological assays for both potency of products and testing of immunologic biomarkers which incorporate intra-assay and inter-assay reference standards for comparison between laboratories and potentially between clinical trials, as well as standardization of assay data reporting. Again, there have been many reports published in these areas [37], and this group proposes to review the state of the art, including recent undertakings of related international societies, and present a consensus. Our goals are to identify a few assays which are minimally required in a trial to identify successfully vaccinated patients and patients who would respond to specific immunotherapy (and to allow for potential inter-trial comparisons). Also, the activity of this group will focus on criteria for assessment of analytical range and sensitivity, accuracy, precision and reproducibility for assay validation. The group will also identify the most commonly used assay controls and reagents which might be recommended and made available for common use. Recommended cellular product potency assays should be tested now, in Phase I/II trials, in preparation for use in any Phase III trials.
Lastly, 3) the integration of standardized and/or validated assays (with recommended data reporting parameters) into new clinical trial design and outcome structure will be recommended.
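Precision and reproducibility, named above among the criteria for assay validation, are often summarized as a coefficient of variation (CV) computed within a run (intra-assay) and across runs (inter-assay). The sketch below shows one common way to compute both from replicate measurements of a reference sample. It is an assumption-laden illustration: the function names and the example readouts are hypothetical, and other definitions of intra- and inter-assay CV are in use.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def intra_assay_cv(runs):
    """Average of the per-run CVs of replicate measurements."""
    return mean(cv_percent(run) for run in runs)


def inter_assay_cv(runs):
    """CV of the run means, reflecting run-to-run variability."""
    return cv_percent([mean(run) for run in runs])


if __name__ == "__main__":
    # Hypothetical readouts: the same reference sample in triplicate on three days.
    runs = [
        [98, 102, 100],
        [110, 108, 112],
        [95, 97, 93],
    ]
    print(round(intra_assay_cv(runs), 2), round(inter_assay_cv(runs), 2))
```

For example, `cv_percent([90, 100, 110])` is about 10%. Reporting both figures alongside the reference standard used makes results easier to compare between laboratories.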

Critical Issue for discussion
How to take best advantage of the work in the infectious disease and immune tolerance fields where much standardization has already been worked through and implemented?

Charges
1. Identification of validated SOPs for blood handling and transportation, processing, cryopreservation and thawing, with new assays in mind.
2. Development of guidelines for pre-analytical standardization, requirements for assay validation and results reporting that meet CLIA requirements.
3. Development of scientifically sound and statistically significant definitions of immune response based on immune monitoring assays. This would require defining the performance specifications within the reportable range of the assay, as described [38]. Assays should specify whether they are quantitative or semi-quantitative, and the scoring system and threshold values that differentiate between responders and non-responders must be specified.
5. Identification of potency assays for cellular products for development and testing in current immunotherapy trials: a) cellular vaccine phenotypes (DC, other APC, CTL/TIL, NK, NK/T), b) cytokine/chemokine production, c) antigen uptake/presentation and d) functional assessment [39].
6. Develop specific guidelines for detection of T cell frequencies: IFN-γ ELISPOT [40] and "other cytokine" ELISPOTs, intracellular cytokine staining, cytotoxicity assays, proliferation (with a focus on non-radioactive and multiparameter methods), specific antigen ELISA/Luminex and MHC class I tetramer flow cytometry. For most routine assays, a simple statement of general parameters with citations will suffice.
7. Develop strategies for standardization and validation of monitoring non-HLA-A2.1 patients, particularly the use of long peptides, peptide libraries and full-length antigens.
8. Identification of a few core assays which are minimally required in a trial to identify successfully vaccinated patients and/or patients who respond to a specific immunotherapy. In particular, the least costly assay that is standardized and/or validated, with freely available reference standards that can be used in each assay run, should be identified. This should include specific recommendations for assay parameters, coefficient of variation (CV) and data analysis to report in publications. This should also include defining the analytical variation of the assay as well as determining the biological fluctuations of antigen-specific T cells in humans over time in the absence of an intervention [41].
9. Development of assay reference standards that meet CLIA requirements. Recommend optimal sources of critical reagents.
10. Identification of scientific areas in which assays should be developed, including apoptosis, myeloid-derived suppressor cells, tumor microenvironment assessment, discussion of issues inherent to antigen-specific DTH testing, and T regulatory cells assessment. This should be based on a systematic approach of method selection, evaluation, development and implementation (specific recommendations available on the web [42]).
There are increasingly frequent reports of statistically significant correlations between measures of anti-tumor immunity and clinical outcome. Greater standardization is required to strengthen these associations and provide more mechanistic insights to inform future trial design. In addition, utilization of CLIA-certified and inspected central laboratories allows for standardization of most aspects of assay conduct and also for cost effective assay development and validation.

Expected outcomes of the taskforce
• Preparation of a document with input from all participants at the end of the task force to be published after the 2009 Workshop (as done in previous occasions [43,44]).
• Provision of links to recommended SOPs and the resultant document on the iSBTc web site with links to the web sites of participating societies and organizations.
• Potential collaborations among different laboratories, institutions, companies and international societies which are also focused on similar efforts of standardization and harmonization of goals.
• Development of cooperative groups for the study design, identification and sharing of resources, centralization of analyses in core laboratories, establishment of ad hoc tissue and data banks and development of easy to access data repositories.