The results generated in this MULTIMER proficiency panel phase show that introducing a DUMP channel to a MULTIMER experiment will, on average, decrease the number of non-specific MULTIMER-positive events in the CD8-cell population. The beneficial effects of a DUMP channel strategy were observed in non-censored data sets that employed laboratory-specific gating criteria, as well as in a censored data set in which a common strategy for excluding poor replicates and for gating was employed. The reduction of non-specific MULTIMER binding after introduction of a DUMP channel was observed in nearly half of all experiments performed (Figures 1a and 1b). Notably, we observed a 1.65-fold reduction of measured background MULTIMER binding in the whole group, with a large sub-group of experiments (approximately 50% of stainings) showing a 4.1-fold median reduction of the background. The absolute median reduction in the experiments that showed a clear decrease (48 of 100) was 0.049% (about 1 in 2000 CD8 cells) and was observed both in protocols that used a DEAD cell dye and in those that did not. An in silico gating study showed a similar median background reduction for the independent use of DUMP channel markers and/or dead cell dyes, confirming the favorable effects of measures to gate out unwanted cells.
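The fold reduction and the "1 in N cells" conversion quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch only: the function names and the paired background values are hypothetical and are not panel data; only the 0.049% figure is taken from the text.

```python
from statistics import median


def fold_reductions(no_dump, dump):
    """Per-experiment ratio of background without DUMP to background with DUMP.

    Pairs with a zero DUMP background are skipped to avoid division by zero.
    """
    return [a / b for a, b in zip(no_dump, dump) if b > 0]


def one_in_n(percent):
    """Convert a frequency given in percent of CD8 cells to '1 in N' cells."""
    return 100.0 / percent


# Hypothetical background frequencies (% MULTIMER-positive of CD8 cells).
# These values are illustrative assumptions, not results from the panel.
no_dump = [0.080, 0.020, 0.120, 0.030]
dump = [0.020, 0.020, 0.030, 0.025]

ratios = fold_reductions(no_dump, dump)
median_fold = median(ratios)

# 0.049% is the absolute median reduction reported in the text.
cells_per_event = one_in_n(0.049)

print(f"median fold reduction: {median_fold:.2f}")
print(f"0.049% corresponds to about 1 in {cells_per_event:.0f} CD8 cells")
```

With the reported value, 0.049% corresponds to roughly 1 event in 2,041 CD8 cells, consistent with the "about 1 in 2000" stated above.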
Although the observed differences might appear small, they can play a critical role. According to ICH guidelines (ICH Q2 (R1)), the background noise of an analytical test may be used to determine its lower limit of detection. Hence, measures that reduce background increase assay sensitivity. Consequently, the use of a DUMP channel and/or a dead cell marker can become essential to attain assay sensitivity in the range of 1 specific cell in 1,000-3,000 CD8+ lymphocytes. Since most tumor antigen-specific CD8 T-cell responses, and also subdominant microbial-specific CD8 T cells, are in this range, achieving reliable sensitivity around this threshold is central to establishing MULTIMER staining as a monitoring tool in translational immunological research [14, 15]. The data sets generated in this proficiency panel phase suggest that, in about half of all experiments performed in a variety of representative laboratories, the detection of low-frequency T-cell responses will not be technically feasible without use of a DUMP channel. In addition to increasing test sensitivity, the use of DUMP channel antibodies may provide a more accurate measure of the true antigen-specific signal by decreasing the number of non-specific events in the CD8+ cell population. Although use of a DUMP channel might reduce the number of false-positive events in the quadrant displaying the MULTIMER-positive CD8-positive cells, the only way to confirm that a given event is a true positive signal would be to clone and functionally characterize the respective T cell or TCR.
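The link between background and sensitivity can be illustrated with one of the approaches described in ICH Q2 (R1), in which the detection limit is estimated as DL = 3.3σ/S, with σ the standard deviation of the blank response and S the slope of the calibration curve. The sketch below is an illustration under stated assumptions and is not the analysis performed in this panel: the slope is taken as 1 because the readout is already a frequency, and the blank values are hypothetical.

```python
from statistics import stdev


def detection_limit(blank_values, slope=1.0):
    """Detection limit as 3.3 * SD(blank) / slope (ICH Q2 (R1) blank-SD approach).

    slope=1.0 is an assumption made here: the readout is treated as being
    in the same units as the analyte (% of CD8 cells).
    """
    return 3.3 * stdev(blank_values) / slope


# Hypothetical background readings (% MULTIMER-positive of CD8 cells),
# chosen for illustration only.
blank_no_dump = [0.02, 0.10, 0.05, 0.15]
blank_dump = [0.01, 0.03, 0.02, 0.04]

dl_no_dump = detection_limit(blank_no_dump)
dl_dump = detection_limit(blank_dump)

print(f"detection limit without DUMP: {dl_no_dump:.3f}% of CD8 cells")
print(f"detection limit with DUMP:    {dl_dump:.3f}% of CD8 cells")
```

In this illustrative case, the tighter background obtained with the DUMP channel lowers the estimated detection limit, which is the sense in which reducing background increases sensitivity.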
A second outcome of this proficiency panel is that the use of intuitive filters for response determination can lead to an unexpectedly high number of experiments that are not considered to show a successfully detected response. The organizers of this panel acknowledge that the cut-off value (200% difference) used to exclude inconsistent duplicates and the dot plot evaluation score were arbitrarily chosen and should not be considered a standard strategy for filtering results from MULTIMER experiments. The chosen filters should rather be seen as a pragmatic way to remove data sets that might include artefacts and to compute response detection rates in order to compare assay performance in the two tested conditions (DUMP vs. NO DUMP) of this proficiency panel. It is remarkable that, although visual evaluation of dot plots is supposed to be highly subjective, disagreement between the central evaluation and the lab evaluation was observed in only 12% (74/636 stainings) of all collected dot plots. These results demonstrate that, although visual inspection is a rather crude and highly subjective method for response determination, results generated across institutions lead to conclusions that are clearly discordant with a central evaluation in only a minority of cases. Although central optical evaluation of the dot plots can be a valid method to consistently rate data from MULTIMER experiments, optical evaluation will always be inherently subjective. Hence there is an urgent need for algorithms and computer-based tools that identify clustered populations of events in a multi-dimensional data space; such tools are under development [16–19]. Such algorithms could potentially lead to higher reproducibility, save time, and, importantly, enhance gating strategies even for experienced operators.
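A duplicate-consistency filter of the kind described above can be sketched as follows. The text does not specify how the 200% difference was calculated, so the definition used here (difference between replicates relative to the lower value) is an assumption, and the function names are hypothetical.

```python
def percent_difference(a, b):
    """Difference between two replicates, in percent of the lower value.

    Assumed definition; the cut-off calculation is not specified in the text.
    """
    low, high = sorted((a, b))
    if low == 0:
        return float("inf") if high > 0 else 0.0
    return (high - low) / low * 100.0


def is_consistent(a, b, cutoff=200.0):
    """True if the duplicate pair passes the (arbitrarily chosen) cut-off."""
    return percent_difference(a, b) <= cutoff


# Hypothetical duplicate frequencies (% MULTIMER-positive of CD8 cells).
print(is_consistent(0.10, 0.25))  # 150% difference
print(is_consistent(0.05, 0.20))  # 300% difference

# Discordance between central and lab evaluation, from the counts in the text.
discordant_rate = 74 / 636 * 100
print(f"discordant dot plots: {discordant_rate:.1f}%")
```

Under this assumed definition, a pair is excluded when one replicate is more than three times the other. The counts given in the text (74 of 636) correspond to 11.6%, reported above as 12%.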
The data shown in the third part of the results section (Table 8) clearly demonstrate that gating style can dramatically change the result of an experiment. Accordingly, we recommend including at least one representative dot plot whenever results from MULTIMER experiments are published. This could be done either as part of the materials and methods section or as supplementary electronic material, and should enable a better understanding of the experiment. This study also provided evidence that binding of MULTIMER to CD8- and CD8+ cells does not always occur independently, as suggested by the strong linear correlation shown in Figure 3a. Thus, we recommend that MULTIMER results be displayed in such a way that the investigator can also view the amount of MULTIMER binding in the CD8-negative cell fraction.
Based on these results, we revisited the recently published Harmonization Guidelines for MULTIMER experiments. Confirming the findings of the previous panel, the number of CD8+ events acquired from the samples influenced the response detection rates. In experiments with fewer than 100,000 CD8-positive cells counted on average, only 50% had a response detected. However, in experiments with more than 100,000 CD8-positive cells counted, 79% of all responses (including both pp65 and Melan-A) were detected (Additional file 1, Table S2). An additional confirmation of previous findings was that the use of more than 3 colors increased detection rates compared with the use of only 2 or 3 colors (Additional file 1, Table S2). These findings confirm the relevance of the previously published harmonization guidelines.