Skip to main content

Advertisement

A methodology for deriving the sensitivity of pooled testing, based on viral load progression and pooling dilution

Abstract

Background

Pooled testing, in which biological specimens from multiple subjects are combined into a testing pool and tested via a single test, is a common testing method for both surveillance and screening activities. The sensitivity of pooled testing for various pool sizes is an essential input for surveillance and screening optimization, including testing pool design. However, clinical data on test sensitivity values for different pool sizes are limited, and do not provide a functional relationship between test sensitivity and pool size. We develop a novel methodology to accurately compute the sensitivity of pooled testing, while accounting for viral load progression and pooling dilution. We demonstrate our methodology on the nucleic acid amplification testing (NAT) technology for the human immunodeficiency virus (HIV).

Methods

Our methodology integrates mathematical models of viral load progression and pooling dilution to derive test sensitivity values for various pool sizes. This methodology derives the conditional test sensitivity, conditioned on the number of infected specimens in a pool, and uses the law of total probability, along with higher dimensional integrals, to derive pooled test sensitivity values. We also develop a highly accurate and easy-to-compute approximation function for pooled test sensitivity of the HIV ULTRIO Plus NAT Assay. We calibrate model parameters using published efficacy data for the HIV ULTRIO Plus NAT Assay, and clinical data on viral RNA load progression in HIV-infected patients, and use this methodology to derive and validate the sensitivity of the HIV ULTRIO Plus Assay for various pool sizes.

Results

We demonstrate the value of this methodology through optimal testing pool design for HIV prevalence estimation in Sub-Saharan Africa. This case study indicates that the optimal testing pool design is highly efficient, and outperforms a benchmark pool design.

Conclusions

The proposed methodology accounts for both viral load progression and pooling dilution, and is computationally tractable. We calibrate this model for the HIV ULTRIO Plus NAT Assay, show that it provides highly accurate sensitivity estimates for various pool sizes, and, thus, yields efficient testing pool design for HIV prevalence estimation. Our model is generic, and can be calibrated for other infections.

Background

Pooled testing, in which biological specimens (e.g., blood, urine, tissue swabs) from multiple subjects are combined into a testing pool and tested via a single test, can substantially improve the efficiency of public health screening and population-level surveillance of diseases; and enables the use of expensive testing technologies, such as the nucleic acid amplification testing (NAT) technology [2]. Ever since its introduction in the 1940’s [9], pooled testing has been commonly used for both surveillance and screening purposes, including donated blood screening for transfusion-transmittable infections, e.g., the human immunodeficiency virus (HIV), or regional HIV surveillance [1, 7].

In general, biomarker tests have less than perfect sensitivity (true positive probability), mainly because the progression of the load (concentration) of a disease-related biomarker (e.g., the HIV viral RNA, measured by the NAT) in the infected host follows various phases post-exposure, with varying growth rates, e.g., pre ramp-up phase, ramp-up phase with accelerating growth rates, and post ramp-up phase, with the biomarker load tending to a plateau or significantly diminishing due to a resolved infection (e.g., [10, 20]). A majority of false negative testing errors occur during those earlier phases of the infection (pre ramp-up and early ramp-up phases), also known as the window period, the length of which depends on the specific infection and the biomarker being measured by the test (e.g., [27]). Pooled testing may further reduce the test’s sensitivity due to pooling dilution, where the biomarker load of an infected specimen is diluted by infection-free specimens in the pool so that the infected specimen may no longer be detectable by the pooled test. Pooled testing sensitivity at various pool sizes is an essential input to key decisions in surveillance and screening efforts, including testing pool design. However, clinical data on test sensitivity values for different pool sizes are limited, and the extant literature that analytically derives the sensitivity of a pooled test does so under restrictive assumptions, including that the test is perfectly reliable outside of the window period, i.e., all infected specimens that are outside of the window period are detected with probability 1 regardless of the pool size (e.g., [4, 27, 28]). There are commonly adopted mathematical models of viral load progression in infected subjects, but these models consider only the window period (e.g., [6]).

Therefore, our objective in this paper is to develop a generic methodology for analytically deriving the sensitivity of pooled testing at various pool sizes, based on models that account for viral load progression and pooling dilution; and by relaxing various restrictive assumptions adopted in the literature. In particular, our methodology integrates the following components within a probabilistic framework: (1) the “doubling time” model [6], which we expand to model the host’s viral load progression throughout the infection’s life-time; and (2) the probit function [27], which we expand to model pooling dilution to consider the number of infected specimens in a pool. The proposed methodology derives the conditional test sensitivity, conditioned on the number of infected specimens in a pool; and uses the law of total probability to derive overall (unconditional) test sensitivity values for a wide range of pool sizes. We validate this methodology via published test sensitivity data and show that it is highly accurate. This methodology utilizes higher dimensional integrals, which may be computationally expensive for large pools. As a result, we also propose an easy-to-compute, and a highly accurate, approximation function that is based on establishing a functional relationship between the sensitivity of pooled testing and the number of infected specimens in a pool. Our methodology can be used to provide important inputs for surveillance and screening activities, including testing pool design, which has received considerable attention in the literature, (e.g., [16,17,18, 25, 26, 30, 31]). Further, our methodology is generic, and can be calibrated for various infections; and we demonstrate its application for the HIV and HIV ULTRIO Plus NAT Assay. For this purpose, we calibrate model parameters using published test efficacy data for the HIV ULTRIO Plus Assay [20, 22], and clinical data on viral RNA load progression in HIV-infected patients [6, 27]; and use this methodology to derive and validate the sensitivity of the HIV ULTRIO Plus Assay for various pool sizes. We also demonstrate the value of this methodology through optimal testing pool design for HIV prevalence estimation in Sub-Saharan Africa. This case study indicates that the optimal testing pool design is highly efficient, and outperforms a benchmark pool design.

Methods

Our methodology is based on the integration of viral load progression and pooling dilution models. A summary of all the notation used in our study is provided in the Appendix.

Pooled sensitivity estimation methodology

Viral load progression model

We first describe the viral load progression model, which expands the widely adopted doubling time model proposed by Busch et al. [6]. The original doubling time model [6] considers viral load progression only through the window period of an infection, and we expand it to model the infection’s life-time. This is needed to relax a common assumption used in test sensitivity calculations, that all infected specimens outside of their window period are detected with probability 1, regardless of the pool size (e.g., [27, 28]). According to numerous studies, the viral load in infected subjects progresses through various phases of growth rates post-exposure: pre ramp-up phase, ramp-up phase with accelerating growth rates, and post ramp-up phase during which growth rate slows down, eventually reaching a plateau or resolution of the infection (e.g., [5, 6, 10, 12, 20, 27, 28]). To model this phenomenon, we let \(t_w\), \(t_p\), and \(t_s\) respectively denote the time at which the window period ends, viral load peaks, and viral load reaches steady state; and let VL(t) denote the infected subject’s viral load at time t post-exposure. Based on clinical data for HIV and hepatitis B and C infections [5, 6, 10, 12], we model the infected subject’s viral load beyond the window period and up to the steady state as follows:

For \(t_w \le t \le t_s\):

$$\begin{aligned} VL(t)=VL(t_w)+\frac{C_w}{t}\exp \left( -\frac{(ln(t-t_w)-a)^2}{b}\right) \ , \end{aligned}$$

where \(C_w\), a, and b are infection-specific calibration parameters. In this study, we assume that the viral load reaches steady state at time \(t_s\), beyond which it remains constant at a level of \(VL(t_s)\) (i.e., \(VL(t)=VL(t_s)\), \(\forall t> t_s\)); this assumption can be easily relaxed. We note that the steady state viral load, denoted by \(VL(t_s)\), can equal zero for acute infections, or can remain at some positive level for chronic infections. Consequently, the complete viral load model follows:

$$\begin{aligned} VL(t)= {\left\{ \begin{array}{ll} C_0 2^{t/\lambda }, &{} \text {if }t \le t_w\\ VL(t_w)+\frac{C_w}{t}\exp \left( -\frac{(ln(t-t_w)-a)^2}{b}\right) , &{} \text {if } t_w<t \le t_s\\ VL(t_s), &{} \text {if }t > t_s, \end{array}\right. }. \end{aligned}$$
(1)

where the window period component, \(C_0 2^{t/\lambda }\), is the doubling time model in Busch et al. [6], with infection-specific calibration parameters \(C_0\) and \(\lambda \), where \(\lambda \) represents the viral load doubling time within the window period. For demonstration, Fig. 1 plots the base 10 logarithm of the HIV viral RNA load, obtained by Eq. (1), versus post-exposure time in HIV-infected subjects, calibrated as discussed in "Viral load progression model" section.

Fig. 1
figure1

HIV viral RNA load progression spanning the infection’s life-time, covering the window period, peak viremia phase, and chronic phase (based on the data in Table 1)

Pooling dilution model

Pooled testing may reduce the test’s sensitivity due to pooling dilution, that is, the biomarker load of an infected specimen is diluted by infection-free specimens in the pool so that the infected specimen may no longer be detectable by the pooled test [22]. In this section, we model the test sensitivity considering pooling dilution. For this purpose, we first describe the probit function, proposed in the literature to model pooling dilution, and discuss how it is expanded to consider the number of infected specimens in a pool.

Towards this end, let \(\tau \in \mathfrak {R}^+\) denote the life-time of a certain infection, and \(n \in Z^+\) denote the testing pool size. We also let \(T^+ (n)\) denote the event that the test outcome is positive for a pool of size n, indicating the presence of at least one infected specimen in the pool; and let \(N_I(n)\) denote the number of infected specimens in the pool, which is a random variable with possible values \(\{1, \ldots , n\}\). Therefore, the test sensitivity for pool size n (Sens(n)), that is, the probability that the test outcome is positive given that the pool contains at least one infected specimen, follows:

$$\begin{aligned} Sens(n) = P(T^+(n); N_I(n) \ge 1), \end{aligned}$$
(2)

where the “;” notation denotes probabilistic conditioning. Weusten et al. [27, 28] propose the following probit model to derive the sensitivity of pooled testing, under the assumptions that the test is perfectly reliable outside of the infection’s window period (i.e., \(\tau = t_w\)), and the pool contains at most one infected specimen, regardless of the pool size (i.e., \(N_I(n) = 1\) with probability 1):

$$\begin{aligned} Sens(n) = \frac{1}{t_w} \int _0^{t_w} \Phi \left( z\dfrac{log\left( \frac{\chi C_02^{t/\lambda }}{n x_{50}}\right) }{log(x_{95}/x_{50})} \right) dt, \quad \hbox {(from [28])} \end{aligned}$$
(3)

where following [28], \(\Phi (.)\) is the cumulative distribution function (CDF) of the standard normal distribution; z is a constant such that \(\Phi (z) = 0.95\), i.e., \(z=1.6449\); \(\chi \) is the number of nucleic acid copies per viral particle, and \(x_{50}\) and \(x_{95}\) respectively denote the viral load measurement at which the probability of a pool testing positive is 50% and 95% [28].

We expand the probit model in Eq. (3) to consider the performance of a pooled test during the infection’s life-time, and to account for the possibility of multiple infected specimens in a testing pool. In particular, we first derive the test’s conditional sensitivity for a pool size of n, given that the pool contains i infected specimens, denoted by \(Sens(n;i)=P(T^+(n); N_I(n)=i)\), \(\forall i \in \{1,\ldots ,n\}, n \in {\mathbb {Z}}^+\):

$$\begin{aligned} Sens(n;i) = P(T^+(n); N_I(n) =i)=\frac{1}{\tau ^i}\underbrace{\int _0^{\tau }\int _0^{\tau }\cdots \int _0^{\tau }}_{i{\text {-fold}}} \Phi \left( z\dfrac{log\left( \frac{\chi \sum _{j=1}^i (VL(t_j)}{n x_{50}}\right) }{log(x_{95}/x_{50})} \right) dt_1dt_2\ldots dt_i, \end{aligned}$$
(4)

where \(t_j\) denotes the (random) post-exposure time for infected specimen \(j, j=1 \ldots ,i\), in the pool, and \(VL(t_j)\) can be derived from Eq. (1). Observe that the probit model in Eq. (3) follows as a special case of Eq. (4), with \(\tau =t_w\) and \(N_I(n)=1\). Then, using the common binomial model for the number of infected specimens in a pool, and the law of total probability, the overall sensitivity of the pooled test, for pool size of n and infection prevalence rate of p, follows:

$$\begin{aligned} Sens(n) = \sum _{i = 1}^{n} Sens(n;i) P(N_I(n)=i) = \sum _{i = 1}^{n} Sens(n;i) \left( {\begin{array}{c}n\\ i\end{array}}\right) p^i (1-p)^{n-i}. \end{aligned}$$
(5)

On the other hand, the test’s specificity (true negative probability), given by:

$$\begin{aligned} Spec=1-P(T^+(n); N_I(n) =0), \quad \text { } \forall n \in {\mathbb {Z}}^+, \end{aligned}$$

is independent of the pool size, because in the absence of infected specimens in the pool (\(N_I (n) =0\)), pooling dilution does not apply.

In summary, the proposed sensitivity estimation model in Eqs. (4)–(5) can be used in conjunction with Eq. (1) to determine the sensitivity of pooled testing for any pool size.

Calibration and validation

Calibration: We calibrate the sensitivity estimation model based on Stramer et al. [22], which provides the test sensitivity of an infected window period blood specimen diluted 16-fold (i.e., tested within a pool of size 16) as 88%. Therefore, \(C_0\) is calibrated such that Eq. (3) equals 0.88 with \(n=16\). Further, according to various studies, the HIV viral RNA load in blood peaks typically around day 17, with an average load of 6.8 \(\log _{10}\) copies/ml, and reaches steady state around day 61, with an average load of 5.1 \(\log _{10}\) copies/ml; the HIV doubling time (\(\lambda \)) is 0.85 days, and the number of nucleic acid copies per viral particle (\(\chi \)) for HIV is 2 [10, 20, 28]. Therefore, we calibrate the remaining parameters of our model, namely \(C_w\), a, and b, in Eq. (1), based on these values; see Table 1 for the clinical data used and the calibrated parameter values. We note that this calibration is for demonstration purposes, and our model parameters can be calibrated for any given set of data.

Table 1 Calibration and validation data for the HIV and HIV ULTRIO Plus NAT Assay

Validation: We validate our sensitivity estimation model using the overall (life-time) efficacy data for the HIV ULTRIO Plus NAT Assay, in terms of the 95% confidence interval (CI), published by the Food and Drug Administration (FDA); see Table 1. We use our model [Eqs. (1), (4), and (5)], with calibrated parameters reported in Table 1, to derive the conditional sensitivity values for the HIV ULTRIO Plus Assay for various pool sizes; see Table 2, which reports the derived conditional test sensitivity values as a function of the pool size and the number of infected specimens in a pool. According to Table 2, both \(Sens(n=1;N_I(1) =1)= 99.98\%\) and \(Sens(n=16;N_I(16) =1)= 99.26\%\) values are contained within the 95% confidence intervals reported by the FDA (see Table 1).

Table 2 Derived conditional sensitivity values for the HIV ULTRIO Plus Assay (in %) as a function of the pool size and the number of infected specimens in a pool (Sens(ni)) (reported in 9 decimal point accuracy)

As discussed above, the overall test sensitivity at any prevalence rate, p, can then be derived from the conditional sensitivity values in Table 2 via the law of total probability; see Eq. (5). As expected, conditional test sensitivity decreases with pool size, and increases with the number of infected specimens in a pool. Moreover, we observe that test sensitivity rapidly approaches 1 as \(N_I(n)\), the number of infected specimens in a pool of size n, increases, and for \(N_I(n) \ge 4\), test sensitivity becomes almost perfect.

We also derive \(P(T^+(n);N_I(n) =0)=0.07\% =1-Spec\), i.e., \(Spec = 99.93\%\), which is also consistent with the efficacy data for the HIV ULTRIO Plus NAT Assay, published by the FDA [11].

An approximation for sensitivity estimation

Our model in "Pooled sensitivity estimation methodology" section derives the conditional test sensitivity values, Sens(ni), and uses the law of total probability, along with higher dimensional integrals (up to pool size), to derive the overall (unconditional) test sensitivity values for a wide range of pool sizes. Thus, it can be computationally expensive, especially for large pool sizes. Therefore, in this section, we provide an approximation function for computing the pooled test sensitivity,which does not require higher dimensional integrals. We do this by fitting a function to the sensitivity data derived in Table 2 via linear regression so as to minimize the mean squared error (MSE) of the proposed approximation.

Consider the following functional form for conditional test sensitivity for pool size n, given i infected specimens in a pool:

$$\begin{aligned} {\widetilde{Sens}}(n;i) = 1-\beta \alpha ^{\left( \dfrac{i}{n^{\gamma }}\right) }, \; \ i \in \{0,1,\ldots ,n\}, n \in {\mathbb {Z}}^+, \end{aligned}$$
(6)

where \(\alpha \), \(\beta \), and \(\gamma \) are calibration parameters. In particular, by definition of pooling dilution, the probability of detection reduces with pool size, implying that \(\gamma \ge 0\) and \(\alpha \in [0,1]\); and \(P(T^+(n);N_I(n)=0)=1-Spec\) (see "Pooling dilution model" section), implying that \(\beta =Spec\). The remaining parameters (i.e., \(\alpha \) and \(\gamma \)) are derived so as to minimize the MSE between the fitted function and the data in Table 2, that is:

$$\begin{aligned} (\alpha ^*,\gamma ^*)={{\,\mathrm{arg\,min}\,}}_{\alpha ,\gamma }\left( \sum _{n=1}^{16}\sum _{i=0}^{n}\left[ Sens(n;i) - {\widetilde{Sens}}(n;i)\right] ^2\right) , \end{aligned}$$

This minimization problem is a non-convex optimization problem, which we solve numerically in Python for the HIV ULTRIO Plus Assay, obtaining \((\alpha ^*=0.00033,\gamma ^*=0.179)\). The goodness of fit, measured by the coefficient of determination (i.e., \(R^2\)), is equal to 0.9995, suggesting that the fit is highly accurate; see Fig. 2 for the fitted model versus the data points in Table 2.

Fig. 2
figure2

Fitted function versus the data points in Table 2

Results

We apply our sensitivity estimation models (both exact and approximation models, respectively detailed in "Pooled sensitivity estimation methodology" and "An approximation for sensitivity estimation" sections) to determine an optimal testing pool design for HIV prevalence estimation in Sub-Saharan Africa. Specifically, we use the methodologies proposed in Pooled sensitivity estimation methodology" and "An approximation for sensitivity estimation" sections, along with the calibrated parameters in "Calibration and validation" section, to derive sensitivity estimates for the HIV ULTRIO Plus Assay for various pool sizes; and use these sensitivity values as inputs to a testing pool design optimization model, studied in the literature [17, 19, 24, 30].

Testing pool design optimization

The optimization model determines an optimal testing pool design for prevalence estimation, in terms of the number of testing pools to be utilized, s, and the size of each testing pool, n, under a testing budget constraint, so as to minimize the asymptotic variance of the maximum likelihood estimator (MLE) of the unknown prevalence rate [19]:

$$\begin{aligned} \begin{aligned} \underset{n,s}{\text {minimize}}& \, \sigma ^2(n,s;p_0) \\ \text {subject to}& \, c_f \ s + c_v \ s n \le B \\& n \le \overline{N} \\&n, s \in {\mathbb {Z}}^+, \end{aligned} \end{aligned}$$
(7)

where \(\sigma ^2(n,s;p_0)\) denotes the asymptotic variance of the MLE for a pool design (ns), given an initial estimate of the unknown prevalence rate p, which we denote by \(p_0\). The testing cost consists of a fixed testing cost per pool (e.g., cost of the testing kit), denoted by \(c_f\), and a collection cost per specimen (e.g., cost of drawing blood), denoted by \(c_v\). The tester has a total testing budget of B for prevalence estimation. Additionally, the maximum pool size that can be used may be restricted due, for example, to technological constraints, regulations, or other considerations, and we denote the maximum allowable pool size by \(\overline{N}\). The asymptotic variance is a commonly used criterion for optimal testing design in prevalence estimation and for evaluation of estimators in statistical inference, and is also related to the Fisher’s information (e.g., [14, 17, 23,24,25, 30, 31]).

In pooled testing, only one test is used on each pool, and the test provides a binary outcome, with a positive outcome indicating the presence of at least one infected specimen in the pool; and a negative outcome indicating that all specimens in the pool are infection-free. Using the test outcomes, the tester derives the MLE of the unknown prevalence rate (\({\hat{p}}\)). In particular, for a given testing design, (ns), let \(S_I(s)\) denote the number of positive-testing pools among s pools, which is a random variable prior to testing. Then, after the testing is conducted and a realization of \(S_I(s)=k\) is observed, the MLE of the prevalence rate corresponds to the value of p that maximizes the following likelihood function:

$$\begin{aligned} \begin{aligned} L (p; S_I (s)= k)&= \left( {\begin{array}{c}s\\ k\end{array}}\right) \Big [Sens(n;p) - (1-p)^n (Sens(n;p) + Spec - 1) \Big ]^{k} \Big [ 1 - Sens(n;p) + (1-p)^n (Sens(n;p) + Spec - 1) \Big ]^{s-k} \\&\Rightarrow {\hat{p}} \equiv \underset{p\in (0,1)}{\text {argmax}} \Big \{ L(p; S_I (s) = k) \Big \}. \end{aligned} \end{aligned}$$
(8)

The asymptotic variance function, \(\sigma ^2(n,s;p)\), for a pool design of (ns), and with respect to the unknown prevalence rate, p, is then given by (e.g., [17]):

$$\begin{aligned} \sigma ^2(n,s;p) = \frac{\left\{ Sens (n;p) - (1-p)^n (Sens (n;p) + Spec - 1) \right\} \left\{ 1 - Sens (n;p) + (1-p)^n (Sens (n;p) + Spec - 1) \right\} }{sn^2(1-p)^{2(n-1)} (Sens (n;p) + Spec - 1)^2}. \end{aligned}$$
(9)

Study design and data

Our goal is in this section is to demonstrate the value of the sensitivity estimation methodologies developed in this paper through a numerical study. We do this by designing an optimal testing pool, based on the sensitivity estimates derived for the HIV ULTRIO Plus Assay for various pool sizes using the methodologies described in "Methods" section; and comparing the efficiency of the optimal testing design with a benchmark design that does not consider pooling dilution (hence does not need to use our methodology for sensitivity estimation at various pool sizes). As discussed above, we consider pool design for prevalence estimation of HIV in Sub-Saharan Africa using the HIV ULTRIO Plus Assay.

Model parameters are as follows. We assume that the actual prevalence rate is \(p = 0.044\) [29], which is unknown to the tester; this prevalence rate is representative of the HIV prevalence rate in Sub-Saharan Africa. In the absence of this information, the tester determines an initial estimate of \(p_0 = 0.022\), i.e., we consider the case of undershooting. Based on published data, we consider a fixed testing cost per pool of $31.5 [15], a collection cost per specimen of $8 [8], and a total testing budget of $5575 [18], which corresponds to a testing budget of 50 pools, each of size 10. Finally, we consider a maximum allowable pool size, of \(\overline{N}=48\) [21]. These parameter values are for demonstration purposes, and one can conduct similar analyses with different parameter values.

As sensitivity inputs, we utilize the sensitivity values in Table 2, which are derived by the sensitivity estimation model in "Pooled sensitivity estimation methodology" section, in conjunction with the calibration parameters in "Calibration and validation" section. The sensitivity values in Table 2 correspond to pool sizes of \(n=\{1, 2, \ldots , 16\}\). As discussed above, the sensitivity estimation model in "Pooled sensitivity estimation methodology" section requires the computation of higher dimensional integrals (up to pool size), and can be computationally expensive. Therefore, we use the approximation in "An approximation for sensitivity estimation" section to derive the sensitivity values for the remaining pool sizes, i.e., \(n=\{17, \ldots , 48\}\). Then, we perform a two-dimensional search, over all possible values of \(\{ (n,s): n \in \{1, \ldots , 48 \}, c_f \ s + c_v \ s n \le B \}\), to determine the optimal testing pool design, i.e., \((n^*, s^*)\), for the optimization model in Eq. (7) that minimizes the asymptotic variance. To determine the “best” benchmark design, we repeat the two-dimensional search, but without considering pooling dilution, that is, by replacing the parameters, Sens(n), \(\forall n \in {\mathbb {Z}}^+\), with \(99.98\%\), i.e., the sensitivity of individual testing for the HIV ULTRIO Plus Assay; see Table 3 for the resulting optimal design and the benchmark design. For each of these designs, we perform a Monte Carlo simulation to derive estimates for the MLE of p, \({\hat{p}}\) (see Eq.(8)); mean squared error (MSE); and the relative bias (rBias (%)), given by:

$$\begin{aligned} MSE = ({\hat{p}} - p)^2, \text { and } rBias(\%) = 100 \times \left| \frac{{\hat{p}} - p}{p}\right|. \end{aligned}$$
(10)

These performance metrics relate to the efficiency of prevalence estimation, and are commonly used in the literature, e.g., [13, 14, 30].

Table 3 Estimation efficiency (mean ± half-width of 95% CI) of the optimal design and the benchmark design for HIV prevalence estimation (with an actual prevalence rate of \(p=0.044\))

In particular, for each testing design, we perform 10,000 simulation replications. In each replication, we randomly generate the infection status of each of the \(n^* \times s^*\) specimens, where each specimen carries an infection with probability p; and is infection-free otherwise; and for each infected specimen, we randomly generate a post-exposure time from a Uniform distribution with support \([0,\tau ]\), and compute the viral load using Eq. (1) and the parameters of "Calibration and validation" section. Then, we randomly assign the specimens into \(s^*\) pools, each of size \(n^*\), and generate the binary test outcomes based on the test sensitivity model given in Eq. (4). Finally, we compute the MLE, MSE, and rBias for each replication using Eqs. (8) and (10).

Numerical study results

Table 3 reports the average estimation efficiency of the optimal design and the benchmark design, over 10,000 simulation replications. All performance metrics are reported in the form of mean ± half-width of 95% confidence interval (CI).

As indicated by Table 3, the optimal design outperforms the benchmark design, and the differences are statistically significant. The benchmark design yields especially high bias in comparison to the optimal pool design, mainly due to the assumption of no pooling dilution, leading to biased estimates of the unknown prevalence rate.

Discussion

Pooled testing is commonly used in public health settings, for both screening and surveillance of diseases and infections. An accurate and tractable method to compute the sensitivity of a pooled test is extremely important in designing the optimal pooled testing scheme for these efforts. As pooled NAT assays are widely used to screen for diseases, several approaches are proposed in the literature to compute the sensitivity of pooled NAT assays. However, these approaches only account for the window period of the infection, and assume perfect sensitivity past the window period, which is a restrictive assumption, especially as pooling dilution plays an important role in the sensitivity of pooled tests. Further, these studies compute the sensitivity of the pooled test based on the assumption of having at most one infected specimen in any testing pool, when the probability of having multiple infected specimens in a pool is, in fact, a function of both the pool size and the prevalence of the disease.

In this paper, we relax the restricting assumptions in the aforementioned studies and propose both exact and approximate models for computing the sensitivity of a pooled test. We expand the doubling time viral load model [6] to mathematically model the various growth phases of an infection; and propose an exact method to compute the conditional sensitivity of a pooled test as a function of the number of infected specimens in the pool and the pool size, by expanding the probit model in [27, 28]. Then, we can use a binomial model for the number of infected specimens in a pool, along with the law of total probability, to calculate the overall sensitivity of the pooled test given the pool size and the prevalence rate of the disease. We calibrate and validate our exact model using published data on the HIV ULTRIO Plus Assay. Finally, we propose an alternative approximation model to derive the sensitivity of pooled testing that is highly accurate and more analytically tractable than the exact method. We demonstrate the value of our exact and approximate models of pooled testing sensitivity in a case study on HIV prevalence estimation. In particular, we incorporate the proposed models into our testing pool design procedure for prevalence estimation of HIV in Sub-Saharan Africa. Our results show that the sensitivity model is very accurate for the HIV ULTRIO Plus Assay, enabling the design procedure to yield efficient testing pool designs that significantly minimize the estimation error, in comparison to a pool design procedure that utilizes less accurate sensitivity values (i.e., assuming no pooling dilution).

Conclusions

In summary, we develop exact and approximate models for computing the sensitivity of a pooled test by expanding upon the commonly used probit model in [27, 28], and relaxing various restricting assumptions, as we previously discuss. Our methodologies are computationally tractable and highly accurate, and can significantly improve the efficiency of testing pool design for prevalence estimation, as demonstrated by our case study, and for public health screening. We further note that the proposed sensitivity estimation methodology is not infection-specific, and can be calibrated with clinical and published data for any infection or disease, e.g., hepatitis B and C viruses. In addition to its application in prevalence estimation, this methodology can be used in conjunction with other optimization models to make optimal decisions for classification efforts (e.g., [3]), and can also be used for setting a classification threshold, i.e., for classifying a subject as infected versus infection-free for the disease in question. Further, as the expanded viral load model considers the life-time of the infection, in regard to the biomarker load in infected subjects, it allows for more precise sensitivity estimation if information is available about the population of interest, e.g., repeat blood donors have lower overall HIV prevalence rates, and, due to their donation history, one can infer which stage of the infection the donor would be in, if infected. Therefore, integrating the sensitivity estimation methodology proposed in this paper with such optimization models would be worthwhile extensions of this research.

Availability of data and materials

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

References

  1. 1.

    American Red Cross. Blood testing. http://www.redcrossblood.org/learn-about-blood/what-happens-donated-blood/blood-testing. Accessed 27 May 2017.

  2. 2.

    Aprahamian H, Bish DR, Bish EK. Residual risk and waste in donated blood with pooled nucleic acid testing. Stat Med. 2016;35(28):5283–301.

  3. 3.

    Aprahamian H, Bish EK, Bish DR. Adaptive risk-based pooling in public health screening. IISE Trans. 2018;50(9):753–66.

  4. 4.

    Bish EK, Ragavan PK, Bish DR, Slonim AD, Stramer SL. A probabilistic method for the estimation of residual risk in donated blood. Biostatistics. 2014;15(4):620–35.

  5. 5.

    Biswas R, Tabor E, Hsia CC, Wright DJ, Laycock ME, Fiebig EW, Peddada L, Smith R, Schreiber GB, Epstein JS, Nemo GJ, Busch MP. Comparative sensitivity of HBV NATs and HBsAg assays for detection of acute HBV infection. Transfusion. 2003;43(6):788–98.

  6. 6.

    Busch MP. HIV, HBV and HCV: new developments related to transfusion safety. Vox Sang. 2000;78:253–6.

  7. 7.

    Centers for Disease Control and Prevention. HIV testing. https://www.cdc.gov/hiv/testing/index.html. Accessed 20 Nov 2018.

  8. 8.

    Consumers Union. Outrageous health costs: blood test. http://consumersunion.org/outrageous-health-costs/blood-test/. Accessed 14 Dec 2018.

  9. 9.

    Dorfman R. The detection of defective members of large populations. Ann Math Stat. 1943;14(4):436–40.

  10. 10.

    Fiebig EW, Wright DJ, Rawal BD, Garrett PE, Schumacher RT, Peddada L, Heldebrant C, Smith R, Conrad A, Kleinman SH, Busch MP. Dynamics of HIV viremia and antibody seroconversion in plasma donors: implications for diagnosis and staging of primary HIV infection. AIDS. 2003;17(13):1871–9.

  11. 11.

    Food and Drug Administration. Procleix ultrio assay. https://www.fda.gov/ucm/groups/fdagov-public/@fdagov-bio-gen/documents/document/ucm335285.pdf. Accessed 19 Nov 2018.

  12. 12.

    Glynn SA, Wright DJ, Kleinman SH, Hirschkorn D, Tu Y, Heldebrant C, Smith R, Giachetti C, Gallarda J, Busch MP. Dynamics of viremia in early hepatitis C virus infection. Transfusion. 2005;45(6):994–1002.

  13. 13.

    Hughes-Oliver JM, Rosenberger WF. Efficient estimation of the prevalence of multiple rare traits. Biometrika. 2000;87(2):315–27.

  14. 14.

    Hughes-Oliver JM, Swallow WH. A two-stage adaptive group-testing procedure for estimating small proportions. J Am Stat Assoc. 1994;89(427):982–93.

  15. 15.

    Karris MY, Anderson CM, Morris SR, Smith DM, Little SJ. Cost savings associated with testing of antibodies, antigens, and nucleic acids for diagnosis of acute HIV infection. J Clin Microbiol. 2012;50(6):1874–8.

  16. 16.

    Litvak E, Tu XM, Pagano M. Screening for the presence of a disease by pooling sera samples. J Am Stat Assoc. 1994;89(426):424–34.

  17. 17.

    Liu A, Liu C, Zhang Z, Albert PS. Optimality of group testing in the presence of misclassification. Biometrika. 2012;99(1):245–51.

  18. 18.

    Nguyen NT, Bish EK, Aprahamian H. Sequential prevalence estimation with pooling and continuous test outcomes. Stat Med. 2018;37(15):2391–426.

  19. 19.

    Nguyen NT, Bish EK, Bish DR. Optimal pooled testing design for prevalence estimation. Working paper. Department of Industrial and Systems Engineering, Virginia Tech; 2018.

  20. 20.

    Pilcher CD, Joaki G, Hoffman IF, Martinson FEA, Mapanje C, Stewart PW, Powers KA, Galvin S, Chilongozi D, Gama S, Price MA, Fiscus AF, Cohen MS. Amplified transmission of HIV-1: comparison of HIV-1 concentrations in semen and blood during acute and chronic infection. AIDS. 2007;21(13):1723.

  21. 21.

    Shyamala V. Nucleic acid technology (NAT) testing for blood screening: impact of individual donation and mini pool-NAT testing on analytical sensitivity, screening sensitivity and clinical sensitivity. ISBT Sci Ser. 2014;9(2):315–24.

  22. 22.

    Stramer SL, Krysztof DE, Brodsky JP, Fickett TA, Reynolds B, Dodd RY, Kleinman SH. Comparative analysis of triplex nucleic acid test assays in United States blood donors. Transfusion. 2013;53(10):2525–37.

  23. 23.

    Thompson KH. Estimation of the proportion of vectors in a natural population of insects. Biometrics. 1962;18(4):568–78.

  24. 24.

    Tu XM, Litvak E, Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika. 1995;82(2):287–97.

  25. 25.

    Vansteelandt S, Goetghebeur E, Verstraeten T. Regression models for disease prevalence with diagnostic tests on pools of serum samples. Biometrics. 2000;56(4):1126–33.

  26. 26.

    Wein LM, Zenios SA. Pooled testing for HIV screening: capturing the dilution effect. Oper Res. 1996;44(4):543–69.

  27. 27.

    Weusten J, Van Drimmelen H, Lelie N. Mathematic modeling of the risk of HBV, HCV, and HIV transmission by window-phase donations not detected by NAT. Transfusion. 2002;42(5):537–48.

  28. 28.

    Weusten J, Vermeulen M, van Drimmelen H, Lelie N. Refinement of a viral transmission risk model for blood donations in seroconversion window phase screened by nucleic acid testing in different pool sizes and repeat test algorithms. Transfusion. 2011;51(1):203–15.

  29. 29.

    World Health Organization. HIV/AIDS prevalence in Sub-Saharan Africa data by sex and residence. http://apps.who.int/gho/data/node.main.247?lang=en. Accessed 14 Dec 2018.

  30. 30.

    Zenios SA, Wein LM. Pooled testing for HIV prevalence estimation: exploiting the dilution effect. Stat Med. 1998;17(13):1447–67.

  31. 31.

    Zhang Z, Liu C, Kim S, Liu A. Prevalence estimation subject to misclassification: the mis-substitution bias and some remedies. Stat Med. 2014;33(25):4482–500.

Download references

Acknowledgements

Not applicable.

Funding

This material is based upon work supported in part by the National Science Foundation under Grant No. #1761842. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We also acknowledge the Virginia Tech Open Access Subvention Fund for providing partial funding for the publication of this article. Any opinion, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of Virginia Tech.

Author information

All authors contributed to developing and analyzing the sensitivity estimation models studied, and to writing the manuscript. All authors read and approved the final manuscript.

Correspondence to Ngoc T. Nguyen.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

See Tables 4 and 5.

Table 4 Summary of notation
Table 5 Summary of abbreviations

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Keywords

  • Sensitivity estimation
  • Pooled testing
  • Pooling dilution
  • Public health screening
  • Surveillance study