Sonja Hartnack
Valerie Hungerbühler
Eleftherios Meletis
2022-07-14
A particular model formulation that was originally designed for evaluating diagnostic tests in the absence of a gold standard
Not originally/necessarily Bayesian - implemented using Maximum Likelihood
But evaluating an imperfect test against another imperfect test is a bit like pulling a rabbit out of a hat
When the true infection status is known, the Se and Sp of test 1 can be estimated directly:
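For example, cross-classifying test 1 against the known status gives counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN), and then:
\[ Se_1 = \frac{TP}{TP + FN} \qquad Sp_1 = \frac{TN}{TN + FP} \]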
A particular model formulation that was originally designed for evaluating diagnostic tests in the absence of a gold standard
Also known as the two-test, two-population setting/paradigm
In the Hui-Walter paradigm a condition is described that connects \(S\), the number of populations (P), and \(R\), the number of tests (T):
\[ S \geq \frac{R}{2^{R-1} - 1} \]
If this condition is fulfilled, any combination of \(S\) and \(R\) may allow \(Se\) and \(Sp\) to be estimated, e.g. (2T, 2P), (3T, 1P), (4T, 1P), …
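This condition can be checked quickly in R (a minimal sketch; the function name hw_identifiable is ours and not part of any package):
# Does the number of populations S satisfy the Hui-Walter condition for R tests?
hw_identifiable <- function(S, R) {
  S >= R / (2^(R-1) - 1)
}
hw_identifiable(S=2, R=2)  # TRUE:  the classic 2-test, 2-population setting
hw_identifiable(S=1, R=2)  # FALSE: 2 tests in a single population
hw_identifiable(S=1, R=3)  # TRUE:  3 tests in a single population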
We will start with the basic model (hw_definition) and continue adding both populations (\(S\)) and tests (\(R\)) along the way.
The data are summarized in a 2x2 table (2^2 cells), or in a vector that contains all possible combinations of test results:
model{
Cross_Classified_Data ~ dmulti(prob, N)
# Test1+ Test2+
prob[1] <- (prev * ((se[1])*(se[2]))) + ((1-prev) * ((1-sp[1])*(1-sp[2])))
# Test1+ Test2-
prob[2] <- (prev * ((se[1])*(1-se[2]))) + ((1-prev) * ((1-sp[1])*(sp[2])))
# Test1- Test2+
prob[3] <- (prev * ((1-se[1])*(se[2]))) + ((1-prev) * ((sp[1])*(1-sp[2])))
# Test1- Test2-
prob[4] <- (prev * ((1-se[1])*(1-se[2]))) + ((1-prev) * ((sp[1])*(sp[2])))
prev ~ dbeta(1, 1)
se[1] ~ dbeta(1, 1)
sp[1] ~ dbeta(1, 1)
se[2] ~ dbeta(1, 1)
sp[2] ~ dbeta(1, 1)
#data# Cross_Classified_Data, N
#monitor# prev, prob, se, sp
#inits# prev, se, sp
}
library('runjags')
Cross_Classified_Data <- as.numeric(twoXtwo)
N <- sum(Cross_Classified_Data)
prev <- list(chain1=0.05, chain2=0.95)
se <- list(chain1=c(0.01,0.99), chain2=c(0.99,0.01))
sp <- list(chain1=c(0.01,0.99), chain2=c(0.99,0.01))
results <- run.jags('basic_hw.txt', n.chains=2)
Remember to check convergence and effective sample size!
The effective sample size is the equivalent number of independent samples that would contain the same posterior accuracy as the correlated samples from an MCMC. Relatedly, MCMC efficiency is the number of effectively independent samples generated per second.
With high autocorrelation, the effective sample size will be lower.
We want SSeff > 1000
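For example, the effective sample size and psrf can be extracted from the runjags summary (a minimal sketch, using the results object created above):
# Summary statistics as a matrix (columns include SSeff and psrf):
results_summary <- summary(results)
results_summary[, c("SSeff", "psrf")]
# Parameters that have not yet reached an effective sample size of 1000:
rownames(results_summary)[results_summary[, "SSeff"] < 1000]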
| | Lower95 | Median | Upper95 | SSeff | psrf |
|---|---|---|---|---|---|
| prev | 0.328 | 0.501 | 0.669 | 3954 | 2.275 |
| prob[1] | 0.255 | 0.344 | 0.437 | 13822 | 1.000 |
| prob[2] | 0.017 | 0.055 | 0.102 | 9640 | 1.000 |
| prob[3] | 0.073 | 0.132 | 0.201 | 14050 | 1.000 |
| prob[4] | 0.366 | 0.462 | 0.559 | 13257 | 1.000 |
| se[1] | 0.000 | 0.387 | 0.967 | 4459 | 13.597 |
| se[2] | 0.030 | 0.502 | 1.000 | 4683 | 15.382 |
| sp[1] | 0.033 | 0.577 | 1.000 | 4987 | 13.730 |
| sp[2] | 0.000 | 0.456 | 0.971 | 4423 | 15.081 |
How to interpret a test with Se=0% and Sp=0%?
We can force Se + Sp >= 1, i.e. constrain Youden's index \(J = Se + Sp - 1\) to be non-negative.
This allows the test to be useless, but not worse than useless.
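One way to impose this constraint in JAGS is to truncate the prior for each sensitivity at 1 - Sp, as is also done in the multi-population model later in this session (a sketch, keeping the minimally informative Beta(1, 1) priors):
se[1] ~ dbeta(1, 1)T(1-sp[1], )
sp[1] ~ dbeta(1, 1)
se[2] ~ dbeta(1, 1)T(1-sp[2], )
sp[2] ~ dbeta(1, 1)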
Alternatively we can use weakly informative priors, to give the model some information that we expect the test characteristics to be closer to 100% than 0%.
Or we can use stronger priors for one or both tests.
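A weakly informative choice could be Beta(2, 1) for each parameter (the same priors suggested in the exercise below):
se[1] ~ dbeta(2, 1)
sp[1] ~ dbeta(2, 1)
se[2] ~ dbeta(2, 1)
sp[2] ~ dbeta(2, 1)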
A quick way to see the distribution of a prior:
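For example (a minimal sketch in base R, using the minimally informative Beta(1, 1) prior from the model above):
curve(dbeta(x, shape1=1, shape2=1), from=0, to=1, xlab="Probability", ylab="Density")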
This was minimally informative, but how does that compare to a weakly informative prior for e.g. sensitivity?
What we want is e.g. Beta(20,1)
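This can be visualised in the same way (again just a base R sketch):
curve(dbeta(x, shape1=20, shape2=1), from=0, to=1, xlab="Sensitivity", ylab="Density")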
But typically we have median and 95% confidence intervals from a paper, e.g.:
“The median (95% CI) estimates of the sensitivity and specificity of the shiny new test were 94% (92-96%) and 99% (97-100%) respectively”
Median (95% CI) estimates of Se and Sp were 94% (92-96%) and 99% (97-100%)
library("PriorGen")
## Loading required package: rootSolve
findbeta(themedian = 0.94, percentile=0.95, percentile.value = 0.92)
## [1] "The desired Beta distribution that satisfies the specified conditions is: Beta( 429.95 27.76 )"
## [1] "Here is a plot of the specified distribution."
## [1] "Descriptive statistics for this distribution are:"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8923 0.9324 0.9399 0.9394 0.9472 0.9747
## [1] "Verification: The percentile value 0.92 corresponds to the 0.05 th percentile"Note: themedian could also be themean
Part of the problem before was also that we were specifying extreme initial values:
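Less extreme (but still over-dispersed) initial values could be used instead, for example (a sketch; the se and sp values mirror the solution code later in this session):
prev <- list(chain1=0.05, chain2=0.95)
se <- list(chain1=c(0.5, 0.99), chain2=c(0.99, 0.5))
sp <- list(chain1=c(0.5, 0.99), chain2=c(0.99, 0.5))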
Run the hw_definition model under the following
different scenarios and interpret the results in each case.
Change the priors for Se and Sp and try Beta(2,1).
Estimate the Beta parameters for the Se and Sp of the shiny new test described.
Run the model using the beta priors for Se and Sp from Step 2.
Try to run the model with different initial values.
Force Se + Sp > 1. Be careful to specify initial values that are within the restricted parameter space.
Basically an extension of the single-population model
Works best with multiple populations each with differing prevalence
What changes?
In each population the data are again summarized in a 2x2 table (2^2 cells)
or each population has a vector that contains all possible combinations of test results:
model{
for(p in 1:Populations){
Tally[1:4, p] ~ dmulti(prob[1:4, p], TotalTests[p])
# Test1- Test2- Pop1
prob[1, p] <- (prev[p] * ((1-se[1])*(1-se[2]))) + ((1-prev[p]) * ((sp[1])*(sp[2])))
## snip ##
prev[p] ~ dbeta(1, 1)
}
se[1] ~ dbeta(se_prior[1,1], se_prior[1,2])T(1-sp[1], )
sp[1] ~ dbeta(sp_prior[1,1], sp_prior[1,2])
se[2] ~ dbeta(se_prior[2,1], se_prior[2,2])T(1-sp[2], )
sp[2] ~ dbeta(sp_prior[2,1], sp_prior[2,2])
#data# Tally, TotalTests, Populations, se_prior, sp_prior
#monitor# prev, prob, se, sp
#inits# prev, se, sp
}
We have to be careful to make sure that the length of initial values for prev in each chain is equal to the number of populations.
For example with 5 populations we need:
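# One initial value per population, over-dispersed between the two chains:
prev <- list(chain1=c(0.1, 0.1, 0.1, 0.9, 0.9), chain2=c(0.9, 0.9, 0.9, 0.1, 0.1))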
The values you choose for different populations in the same chain can be the same - just make sure you pick different values for the same population between chains (i.e. over-dispersed initial values)
Note: specifying your own initial values is technically optional with JAGS, but it is always a good idea (for now at least)!!!
Up to now prevalence has been a parameter, but it can also be (partially) observed:
model{
for(p in 1:Populations){
Tally[1:4, p] ~ dmulti(prob[1:4, p], TotalTests[p])
# Test1- Test2- Pop1
prob[1, p] <- (prev[p] * ((1-se[1])*(1-se[2]))) + ((1-prev[p]) * ((sp[1])*(sp[2])))
## snip ##
prev[p] ~ dbeta(1, 1)
}
## snip ##
#data# Tally, TotalTests, Populations, se_prior, sp_prior, prev
#monitor# prev, prob, se, sp
#inits# prev, se, sp
}
To fix the prevalence of population 1 we could do:
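For example (a sketch only: the value 0.5 and the 5-population setup are purely illustrative), prev can be passed as partially observed data, with NA for the populations that should still be estimated:
# Fix the prevalence of population 1 at 0.5 and estimate the others:
prev <- c(0.5, NA, NA, NA, NA)
# Any initial values supplied for prev must then be NA for population 1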
There are actually multiple ways to specify data and initial values
to runjags, including via the data and inits
arguments
We will use these to keep separate lists of data and initial values (these could also be data frames, or environments)
See the help file for ?run.jags for more details
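For example (a sketch: the names data_list and inits_list are ours, the file multipopulation.txt matches the solution code below, and the Tally, TotalTests, Populations, se_prior and sp_prior objects are created by the simulation code that follows; the #data#/#inits# lines would then normally be removed from the model file):
# Named list of data:
data_list <- list(Tally=Tally, TotalTests=TotalTests, Populations=Populations,
                  se_prior=se_prior, sp_prior=sp_prior)
# One named list of initial values per chain (here assuming 5 populations):
inits_list <- list(
  list(prev=c(0.1, 0.1, 0.1, 0.9, 0.9), se=c(0.5, 0.99), sp=c(0.5, 0.99)),
  list(prev=c(0.9, 0.9, 0.9, 0.1, 0.1), se=c(0.99, 0.5), sp=c(0.99, 0.5))
)
results <- run.jags("multipopulation.txt", n.chains=2, data=data_list, inits=inits_list)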
Let’s simulate a dataset to work on…
# Set a random seed so that the data are reproducible:
set.seed(2022-07-14)
sensitivity <- c(0.9, 0.6)
specificity <- c(0.95, 0.9)
N <- 1000
# Change the number of populations here:
Populations <- 5
# Change the variation in prevalence here:
(prevalence <- runif(Populations, min=0.1, max=0.9))
library("tidyverse")  # provides tibble, mutate and the pipe operator used below
data <- tibble(Population = sample(seq_len(Populations), N, replace=TRUE)) %>%
mutate(Status = rbinom(N, 1, prevalence[Population])) %>%
mutate(Test1 = rbinom(N, 1, sensitivity[1]*Status + (1-specificity[1])*(1-Status))) %>%
mutate(Test2 = rbinom(N, 1, sensitivity[2]*Status + (1-specificity[2])*(1-Status)))
(twoXtwoXpop <- with(data, table(Test1, Test2, Population)))
(Tally <- matrix(twoXtwoXpop, ncol=Populations))
(TotalTests <- apply(Tally, 2, sum))
Start with 5 populations and analyse the data using the independent prevalence model.
Now try with 2, 3, and 10 populations.
Now change the simulated prevalence so that it varies between 0.4-0.6 rather than 0.1-0.9.
This is what the model should look like:
model{
for(p in 1:Populations){
Tally[1:4, p] ~ dmulti(prob[1:4, p], TotalTests[p])
# Test1- Test2-
prob[1,p] <- (prev[p] * ((1-se[1])*(1-se[2]))) + ((1-prev[p]) * ((sp[1])*(sp[2])))
# Test1+ Test2-
prob[2,p] <- (prev[p] * ((se[1])*(1-se[2]))) + ((1-prev[p]) * ((1-sp[1])*(sp[2])))
# Test1- Test2+
prob[3,p] <- (prev[p] * ((1-se[1])*(se[2]))) + ((1-prev[p]) * ((sp[1])*(1-sp[2])))
# Test1+ Test2+
prob[4,p] <- (prev[p] * ((se[1])*(se[2]))) + ((1-prev[p]) * ((1-sp[1])*(1-sp[2])))
prev[p] ~ dbeta(1, 1)
}
se[1] ~ dbeta(se_prior[1,1], se_prior[1,2])T(1-sp[1], )
sp[1] ~ dbeta(sp_prior[1,1], sp_prior[1,2])
se[2] ~ dbeta(se_prior[2,1], se_prior[2,2])T(1-sp[2], )
sp[2] ~ dbeta(sp_prior[2,1], sp_prior[2,2])
#data# Tally, TotalTests, Populations, se_prior, sp_prior
#monitor# prev, se, sp
#inits# prev, se, sp
#module# lecuyer
}
Here is the R code to run the model:
set.seed(2022-07-14)
# Set up the se_prior and sp_prior variables (optional):
se_prior <- matrix(1, nrow=2, ncol=2)
sp_prior <- matrix(1, nrow=2, ncol=2)
# Set up initial values for 5 populations:
se <- list(chain1=c(0.5,0.99), chain2=c(0.99,0.5))
sp <- list(chain1=c(0.5,0.99), chain2=c(0.99,0.5))
prev <- list(chain1=c(0.1, 0.1, 0.1, 0.9, 0.9), chain2=c(0.9, 0.9, 0.9, 0.1, 0.1))
# And run the model:
results_5p <- run.jags("multipopulation.txt", n.chains=2)
# Remember to check convergence!
# plot(results_5p)
summary(results_5p)
## Lower95 Median Upper95 Mean SD Mode MCerr
## prev[1] 0.64871206 0.7499185 0.8412375 0.7484544 0.04931864 NA 0.0008185605
## prev[2] 0.53243100 0.6354872 0.7299415 0.6342936 0.05051519 NA 0.0008191104
## prev[3] 0.22950999 0.3195574 0.4127857 0.3206858 0.04706803 NA 0.0007504695
## prev[4] 0.17078981 0.2612663 0.3600515 0.2632483 0.04879474 NA 0.0009273518
## prev[5] 0.06213104 0.1335139 0.2182322 0.1367285 0.04055939 NA 0.0007190464
## se[1] 0.76954762 0.8453448 0.9206063 0.8451176 0.03901526 NA 0.0007539896
## se[2] 0.55886277 0.6298138 0.7003766 0.6301646 0.03633644 NA 0.0006306248
## sp[1] 0.85893515 0.9125015 0.9699363 0.9130937 0.02836927 NA 0.0005538714
## sp[2] 0.87601852 0.9155925 0.9551532 0.9153571 0.02010692 NA 0.0003505415
## MC%ofSD SSeff AC.10 psrf
## prev[1] 1.7 3630 0.07007367 1.000097
## prev[2] 1.6 3803 0.06272667 1.000599
## prev[3] 1.6 3934 0.06963468 1.000236
## prev[4] 1.9 2769 0.10733603 1.000262
## prev[5] 1.8 3182 0.08766467 1.000951
## se[1] 1.9 2678 0.08612754 1.000194
## se[2] 1.7 3320 0.08199093 1.000313
## sp[1] 2.0 2623 0.11689246 1.000297
## sp[2] 1.7 3290 0.06616639 1.000099
To change the number of populations and range of prevalence you just need to modify the simulation code; for example 3 populations with prevalence between 0.4-0.6 can be obtained using:
# Change the number of populations here:
Populations <- 3
# Change the variation in prevalence here:
(prevalence <- runif(Populations, min=0.4, max=0.6))
data <- tibble(Population = sample(seq_len(Populations), N, replace=TRUE)) %>%
mutate(Status = rbinom(N, 1, prevalence[Population])) %>%
mutate(Test1 = rbinom(N, 1, sensitivity[1]*Status + (1-specificity[1])*(1-Status))) %>%
mutate(Test2 = rbinom(N, 1, sensitivity[2]*Status + (1-specificity[2])*(1-Status)))
(twoXtwoXpop <- with(data, table(Test1, Test2, Population)))
(Tally <- matrix(twoXtwoXpop, ncol=Populations))
(TotalTests <- apply(Tally, 2, sum))
# Adjust initial values for 3 populations:
se <- list(chain1=c(0.5,0.99), chain2=c(0.99,0.5))
sp <- list(chain1=c(0.5,0.99), chain2=c(0.99,0.5))
prev <- list(chain1=c(0.1, 0.1, 0.9), chain2=c(0.9, 0.9, 0.1))
# And run the model:
results_3p <- run.jags("multipopulation.txt", n.chains=2)
# Remember to check convergence!
# plot(results_3p)
# summary(results_3p)
Note that when the effective sample size is not large enough, you either need to run the model for longer in the first place, or extend the existing run to get more samples:
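For example (a sketch; the additional sample size of 10000 is arbitrary), an existing runjags object can be extended with extend.jags:
# Draw an additional 10000 samples per chain, continuing from the previous run:
results_3p_extended <- extend.jags(results_3p, sample=10000)
# Then re-check convergence and effective sample size:
# plot(results_3p_extended)
# summary(results_3p_extended)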
As a general rule, the more populations you have, and the more the prevalence varies between them, the better. However, this is conditional on the sensitivity and specificity being consistent across your populations!
Time for another break