Hui-Walter models - Adjusting for conditional dependencies between tests

CA18208 HARMONY Zurich Training School - https://harmony-net.eu/

Sonja Hartnack
Valerie Hungerbühler
Eleftherios Meletis

2022-07-14

Recap Hui-Walter

Criticism of the Hui-Walter paradigm assumption (1)

Criticism of the Hui-Walter paradigm assumption (2)

Criticism of the Hui-Walter paradigm assumption (2)

Criticism of the Hui-Walter paradigm assumption (3)

Condtional independencies

Conditional independence

\[P(T\textsubscript{1}\textsuperscript{+},T\textsubscript{2}\textsuperscript{+} \vert D\textsuperscript{+}) = P(T\textsubscript{1}\textsuperscript{+} \vert D\textsuperscript{+}) \times P(T\textsubscript{2}\textsuperscript{+} \vert D\textsuperscript{+})]\]

Conditional dependence

\[P(T\textsubscript{1}\textsuperscript{+},T\textsubscript{2}\textsuperscript{+} \vert D\textsuperscript{+}) \neq P(T\textsubscript{1}\textsuperscript{+} \vert D\textsuperscript{+}) \times P(T\textsubscript{2}\textsuperscript{+} \vert D\textsuperscript{+})\]

Conditional dependencies

Conditional (in)dependencies Interpretation

Conditional dependencies

Conditional dependencies

Conditional dependencies

Example of a COVID-19 data set

Dealing with correlation

It helps to consider the data simulation as a (simplified) biological process (where my parameters are not representative of real life!).

2 pops - 3 tests

# The probability of infection with COVID in two populations:
prevalence <- c(0.01,0.05)
# The probability of shedding COVID in the nose conditional on infection:
nose_shedding <- 0.8
# The probability of shedding COVID in the throat conditional on infection:
throat_shedding <- 0.8
# The probability of detecting virus with the antigen test:
antigen_detection <- 0.75
# The probability of detecting virus with the PCR test:
pcr_detection <- 0.999
# The probability of random cross-reaction with the antigen test:
antigen_crossreact <- 0.05
# The probability of random cross-reaction with the PCR test:
pcr_crossreact <- 0.01

Note: cross-reactions are assumed to be independent!

Simulating latent states:

N <- 20000
Populations <- length(prevalence)

covid_data <- tibble(Population = sample(seq_len(Populations), N, replace=TRUE)) %>%
  ## True infection status:
  mutate(Status = rbinom(N, 1, prevalence[Population])) %>%
  ## Nose shedding status:
  mutate(Nose = Status * rbinom(N, 1, nose_shedding)) %>%
  ## Throat shedding status:
  mutate(Throat = Status * rbinom(N, 1, throat_shedding))

Simulating test results:

covid_data <- covid_data %>%
  ## The nose swab antigen test may be false or true positive:
  mutate(NoseAG = case_when(
    Nose == 1 ~ rbinom(N, 1, antigen_detection),
    Nose == 0 ~ rbinom(N, 1, antigen_crossreact)
  )) %>%
  ## The throat swab antigen test may be false or true positive:
  mutate(ThroatAG = case_when(
    Throat == 1 ~ rbinom(N, 1, antigen_detection),
    Throat == 0 ~ rbinom(N, 1, antigen_crossreact)
  )) %>%
  ## The PCR test may be false or true positive:
  mutate(ThroatPCR = case_when(
    Throat == 1 ~ rbinom(N, 1, pcr_detection),
    Throat == 0 ~ rbinom(N, 1, pcr_crossreact)
  ))

The overall sensitivity of the tests can be calculated as follows:

covid_sensitivity <- c(
  # Nose antigen:
  nose_shedding*antigen_detection + (1-nose_shedding)*antigen_crossreact,
  # Throat antigen:
  throat_shedding*antigen_detection + (1-throat_shedding)*antigen_crossreact,
  # Throat PCR:
  throat_shedding*pcr_detection + (1-throat_shedding)*pcr_crossreact
)
covid_sensitivity
## [1] 0.6100 0.6100 0.8012

The overall specificity of the tests is more straightforward:

covid_specificity <- c(
  # Nose antigen:
  1 - antigen_crossreact,
  # Throat antigen:
  1 - antigen_crossreact,
  # Throat PCR:
  1 - pcr_crossreact
)
covid_specificity
## [1] 0.95 0.95 0.99

However: this assumes that cross-reactions are independent!

Model specification

prob[1,p] <-  prev[p] * ((1-se[1])*(1-se[2])*(1-se[3]) 
                         +covse12 +covse13 +covse23) +
              (1-prev[p]) * (sp[1]*sp[2]*sp[3] 
                             +covsp12 +covsp13 +covsp23)

prob[2,p] <- prev[p] * (se[1]*(1-se[2])*(1-se[3]) 
                           -covse12 -covse13 +covse23) +
               (1-prev[p]) * ((1-sp[1])*sp[2]*sp[3] 
                              -covsp12 -covsp13 +covsp23)

## snip ##
        
# Covariance in sensitivity between tests 1 and 2:
covse12 ~ dunif( (se[1]-1)*(1-se[2]) , 
                     min(se[1],se[2]) - se[1]*se[2] )
# Covariance in specificity between tests 1 and 2:
covsp12 ~ dunif( (sp[1]-1)*(1-sp[2]) , 
                     min(sp[1],sp[2]) - sp[1]*sp[2] )

It is quite easy to get the terms slightly wrong!

Template Hui-Walter

The model code and data format for an arbitrary number of populations (and tests) can be determined automatically using the template_huiwalter function from the runjas package:

template_huiwalter(
  covid_data %>% select(Population, NoseAG, ThroatAG, ThroatPCR), 
  outfile = 'covidmodel.txt', covariance=TRUE)

This generates self-contained model/data/initial values etc

model{

    ## Observation layer:

    # Complete observations (N=20000):
    for(p in 1:Populations){
        Tally_RRR[1:8,p] ~ dmulti(prob_RRR[1:8,p], N_RRR[p])

        prob_RRR[1:8,p] <- se_prob[1:8,p] + sp_prob[1:8,p]
    }


    ## Observation probabilities:

    for(p in 1:Populations){

        # Probability of observing NoseAG- ThroatAG- ThroatPCR- from a true positive::
        se_prob[1,p] <- prev[p] * ((1-se[1])*(1-se[2])*(1-se[3]) +covse12 +covse13 +covse23)
        # Probability of observing NoseAG- ThroatAG- ThroatPCR- from a true negative::
        sp_prob[1,p] <- (1-prev[p]) * (sp[1]*sp[2]*sp[3] +covsp12 +covsp13 +covsp23)

        # Probability of observing NoseAG+ ThroatAG- ThroatPCR- from a true positive::
        se_prob[2,p] <- prev[p] * (se[1]*(1-se[2])*(1-se[3]) -covse12 -covse13 +covse23)
        # Probability of observing NoseAG+ ThroatAG- ThroatPCR- from a true negative::
        sp_prob[2,p] <- (1-prev[p]) * ((1-sp[1])*sp[2]*sp[3] -covsp12 -covsp13 +covsp23)

        # Probability of observing NoseAG- ThroatAG+ ThroatPCR- from a true positive::
        se_prob[3,p] <- prev[p] * ((1-se[1])*se[2]*(1-se[3]) -covse12 +covse13 -covse23)
        # Probability of observing NoseAG- ThroatAG+ ThroatPCR- from a true negative::
        sp_prob[3,p] <- (1-prev[p]) * (sp[1]*(1-sp[2])*sp[3] -covsp12 +covsp13 -covsp23)

        # Probability of observing NoseAG+ ThroatAG+ ThroatPCR- from a true positive::
        se_prob[4,p] <- prev[p] * (se[1]*se[2]*(1-se[3]) +covse12 -covse13 -covse23)
        # Probability of observing NoseAG+ ThroatAG+ ThroatPCR- from a true negative::
        sp_prob[4,p] <- (1-prev[p]) * ((1-sp[1])*(1-sp[2])*sp[3] +covsp12 -covsp13 -covsp23)

        # Probability of observing NoseAG- ThroatAG- ThroatPCR+ from a true positive::
        se_prob[5,p] <- prev[p] * ((1-se[1])*(1-se[2])*se[3] +covse12 -covse13 -covse23)
        # Probability of observing NoseAG- ThroatAG- ThroatPCR+ from a true negative::
        sp_prob[5,p] <- (1-prev[p]) * (sp[1]*sp[2]*(1-sp[3]) +covsp12 -covsp13 -covsp23)

        # Probability of observing NoseAG+ ThroatAG- ThroatPCR+ from a true positive::
        se_prob[6,p] <- prev[p] * (se[1]*(1-se[2])*se[3] -covse12 +covse13 -covse23)
        # Probability of observing NoseAG+ ThroatAG- ThroatPCR+ from a true negative::
        sp_prob[6,p] <- (1-prev[p]) * ((1-sp[1])*sp[2]*(1-sp[3]) -covsp12 +covsp13 -covsp23)

        # Probability of observing NoseAG- ThroatAG+ ThroatPCR+ from a true positive::
        se_prob[7,p] <- prev[p] * ((1-se[1])*se[2]*se[3] -covse12 -covse13 +covse23)
        # Probability of observing NoseAG- ThroatAG+ ThroatPCR+ from a true negative::
        sp_prob[7,p] <- (1-prev[p]) * (sp[1]*(1-sp[2])*(1-sp[3]) -covsp12 -covsp13 +covsp23)

        # Probability of observing NoseAG+ ThroatAG+ ThroatPCR+ from a true positive::
        se_prob[8,p] <- prev[p] * (se[1]*se[2]*se[3] +covse12 +covse13 +covse23)
        # Probability of observing NoseAG+ ThroatAG+ ThroatPCR+ from a true negative::
        sp_prob[8,p] <- (1-prev[p]) * ((1-sp[1])*(1-sp[2])*(1-sp[3]) +covsp12 +covsp13 +covsp23)

    }


    ## Priors:

    # Prevalence in population 1:
    prev[1] ~ dbeta(1,1)

    # Prevalence in population 2:
    prev[2] ~ dbeta(1,1)


    # Sensitivity of NoseAG test:
    se[1] ~ dbeta(1,1)T(1-sp[1], )
    # Specificity of NoseAG test:
    sp[1] ~ dbeta(1,1)

    # Sensitivity of ThroatAG test:
    se[2] ~ dbeta(1,1)T(1-sp[2], )
    # Specificity of ThroatAG test:
    sp[2] ~ dbeta(1,1)

    # Sensitivity of ThroatPCR test:
    se[3] ~ dbeta(1,1)T(1-sp[3], )
    # Specificity of ThroatPCR test:
    sp[3] ~ dbeta(1,1)


    # Covariance in sensitivity between NoseAG and ThroatAG tests:
    covse12 ~ dunif( (se[1]-1)*(1-se[2]) , min(se[1],se[2]) - se[1]*se[2] )  ## if the sensitivity of these tests may be correlated
    # covse12 <- 0  ## if the sensitivity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corse12 <- ifelse(covse12 < 0, -covse12 / ((se[1]-1)*(1-se[2])), covse12 / (min(se[1],se[2]) - se[1]*se[2]))

    # Covariance in specificity between NoseAG and ThroatAG tests:
    covsp12 ~ dunif( (sp[1]-1)*(1-sp[2]) , min(sp[1],sp[2]) - sp[1]*sp[2] )  ## if the specificity of these tests may be correlated
    # covsp12 <- 0  ## if the specificity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corsp12 <- ifelse(covsp12 < 0, -covsp12 / ((sp[1]-1)*(1-sp[2])), covsp12 / (min(sp[1],sp[2]) - sp[1]*sp[2]))

    # Covariance in sensitivity between NoseAG and ThroatPCR tests:
    covse13 ~ dunif( (se[1]-1)*(1-se[3]) , min(se[1],se[3]) - se[1]*se[3] )  ## if the sensitivity of these tests may be correlated
    # covse13 <- 0  ## if the sensitivity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corse13 <- ifelse(covse13 < 0, -covse13 / ((se[1]-1)*(1-se[3])), covse13 / (min(se[1],se[3]) - se[1]*se[3]))

    # Covariance in specificity between NoseAG and ThroatPCR tests:
    covsp13 ~ dunif( (sp[1]-1)*(1-sp[3]) , min(sp[1],sp[3]) - sp[1]*sp[3] )  ## if the specificity of these tests may be correlated
    # covsp13 <- 0  ## if the specificity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corsp13 <- ifelse(covsp13 < 0, -covsp13 / ((sp[1]-1)*(1-sp[3])), covsp13 / (min(sp[1],sp[3]) - sp[1]*sp[3]))

    # Covariance in sensitivity between ThroatAG and ThroatPCR tests:
    covse23 ~ dunif( (se[2]-1)*(1-se[3]) , min(se[2],se[3]) - se[2]*se[3] )  ## if the sensitivity of these tests may be correlated
    # covse23 <- 0  ## if the sensitivity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corse23 <- ifelse(covse23 < 0, -covse23 / ((se[2]-1)*(1-se[3])), covse23 / (min(se[2],se[3]) - se[2]*se[3]))

    # Covariance in specificity between ThroatAG and ThroatPCR tests:
    covsp23 ~ dunif( (sp[2]-1)*(1-sp[3]) , min(sp[2],sp[3]) - sp[2]*sp[3] )  ## if the specificity of these tests may be correlated
    # covsp23 <- 0  ## if the specificity of these tests can be assumed to be independent
    # Calculated relative to the min/max for ease of interpretation:
    corsp23 <- ifelse(covsp23 < 0, -covsp23 / ((sp[2]-1)*(1-sp[3])), covsp23 / (min(sp[2],sp[3]) - sp[2]*sp[3]))

}

#monitor# se, sp, prev, covse12, corse12, covsp12, corsp12, covse13, corse13, covsp13, corsp13, covse23, corse23, covsp23, corsp23

## Inits:
inits{
"se" <- c(0.5, 0.99, 0.5)
"sp" <- c(0.99, 0.75, 0.99)
"prev" <- c(0.05, 0.95)
"covse12" <- 0
"covse13" <- 0
"covse23" <- 0
"covsp12" <- 0
"covsp13" <- 0
"covsp23" <- 0
}
inits{
"se" <- c(0.99, 0.5, 0.99)
"sp" <- c(0.75, 0.99, 0.75)
"prev" <- c(0.95, 0.05)
"covse12" <- 0
"covse13" <- 0
"covse23" <- 0
"covsp12" <- 0
"covsp13" <- 0
"covsp23" <- 0
}

## Data:
data{
"Populations" <- 2
"N_RRR" <- c(9823, 10177)
"Tally_RRR" <- structure(c(8693, 491, 451, 25, 91, 21, 20, 31, 8604, 539, 470, 27, 135, 80, 134, 188), .Dim = c(8, 2))
}

And can be run directly from R:

results <- run.jags('covidmodel.txt')
results
Lower95 Median Upper95 SSeff psrf
se[1] 0.458 0.575 0.672 679 1.002
se[2] 0.519 0.687 0.799 241 1.002
se[3] 0.710 0.932 1.000 155 1.002
sp[1] 0.940 0.944 0.949 1031 1.000
sp[2] 0.946 0.950 0.954 2352 1.000
sp[3] 0.987 0.990 0.992 2111 1.001
prev[1] 0.005 0.008 0.011 392 1.001
prev[2] 0.039 0.046 0.062 172 1.004
covse12 -0.035 -0.001 0.032 1036 1.003
corse12 -0.340 -0.005 0.206 1035 1.005
covsp12 -0.001 0.000 0.001 3098 1.000
corsp12 -0.166 0.001 0.021 3834 1.000
covse13 -0.027 0.001 0.043 649 1.001
corse13 -0.991 0.052 0.788 1607 1.000
covsp13 0.000 0.000 0.001 3298 1.000
corsp13 -0.276 0.040 0.140 4289 1.000
covse23 -0.012 0.014 0.071 227 1.001
corse23 -0.686 0.342 1.000 1502 1.000
covsp23 -0.001 0.000 0.000 3866 1.000
corsp23 -0.917 -0.238 0.059 4705 1.000

Template Hui-Walter

Activating covariance terms

Find the lines for the covariances that we want to activate (i.e. the two Throat tests):

You will also need to uncomment out the relevant initial values for BOTH chains:

Session Summary

Exercise 1

Use the template_huiwalter function to look at the simple 2-test 5-population example from the previous session. Use this data simulation code:

# Set a random seed so that the data are reproducible:
set.seed(2022-07-14)

sensitivity <- c(0.9, 0.6)
specificity <- c(0.95, 0.9)
N <- 1000

# Change the number of populations here:
Populations <- 5
# Change the variation in prevalence here:
(prevalence <- runif(Populations, min=0.1, max=0.9))
## [1] 0.7069277 0.5860048 0.2746717 0.2833154 0.2223198
data <- tibble(Population = sample(seq_len(Populations), N, replace=TRUE)) %>%
  mutate(Status = rbinom(N, 1, prevalence[Population])) %>%
  mutate(Test1 = rbinom(N, 1, sensitivity[1]*Status + (1-specificity[1])*(1-Status))) %>%
  mutate(Test2 = rbinom(N, 1, sensitivity[2]*Status + (1-specificity[2])*(1-Status))) %>%
  select(-Status)

(twoXtwoXpop <- with(data, table(Test1, Test2, Population)))
## , , Population = 1
## 
##      Test2
## Test1   0   1
##     0  47  19
##     1  48  76
## 
## , , Population = 2
## 
##      Test2
## Test1   0   1
##     0  68  15
##     1  47  68
## 
## , , Population = 3
## 
##      Test2
## Test1   0   1
##     0 117  13
##     1  24  38
## 
## , , Population = 4
## 
##      Test2
## Test1   0   1
##     0 141  23
##     1  34  30
## 
## , , Population = 5
## 
##      Test2
## Test1   0   1
##     0 142  14
##     1  22  14
(Tally <- matrix(twoXtwoXpop, ncol=Populations))
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   47   68  117  141  142
## [2,]   48   47   24   34   22
## [3,]   19   15   13   23   14
## [4,]   76   68   38   30   14
(TotalTests <- apply(Tally, 2, sum))
## [1] 190 198 192 228 192
template_huiwalter(data, outfile="template_2test.txt")
## The model and data have been written to template_2test.txt in the current working directory
## You should check and alter priors before running the model

Look at the model code and familiarise yourself with how the model is set out (there are some small differences, but the overall code is equivalent).Run the model.

Now activate the correlation terms between tests 1 and 2. Is anything different about the results?

Solution 1

There is no particular solution to the first part of this exercise, but please ask if you have any questions about the model code that template_huiwalter generates. Remember that re-running the template_huiwalter function will over-write your existing model including any changes you made, so be careful!

We can run the model as follows:

results_nocov <- run.jags("template_2test.txt")
results_nocov
## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                                
##          Lower95  Median Upper95    Mean       SD Mode      MCerr MC%ofSD SSeff
## se[1]    0.76802  0.8461 0.92014  0.8456 0.039191   --  0.0007934       2  2440
## se[2]    0.55836 0.63135 0.70492 0.63123 0.037549   -- 0.00071653     1.9  2746
## sp[1]    0.85586 0.91146 0.97151 0.91225 0.029347   -- 0.00061709     2.1  2262
## sp[2]    0.87501 0.91525 0.95441 0.91515 0.020252   --  0.0003679     1.8  3030
## prev[1]  0.64829 0.74905  0.8444 0.74805 0.050415   -- 0.00093379     1.9  2915
## prev[2]  0.53247 0.63352 0.73414 0.63312 0.051896   -- 0.00094638     1.8  3007
## prev[3]  0.22504 0.31825 0.41138 0.31934 0.047595   --  0.0008117     1.7  3438
## prev[4]  0.16672 0.25931 0.35834  0.2616 0.049737   -- 0.00098809       2  2534
## prev[5] 0.060407 0.13241 0.22357  0.1361  0.04236   -- 0.00079048     1.9  2872
## covse12        0       0       0       0        0    0         --      --    --
## corse12        0       0       0       0        0    0         --      --    --
## covsp12        0       0       0       0        0    0         --      --    --
## corsp12        0       0       0       0        0    0         --      --    --
##                         
##            AC.10    psrf
## se[1]    0.12917  1.0003
## se[2]    0.11116  1.0003
## sp[1]    0.14197  1.0009
## sp[2]   0.095571  1.0002
## prev[1] 0.097392 0.99996
## prev[2]   0.1083  1.0011
## prev[3] 0.091657 0.99999
## prev[4]  0.12731 0.99997
## prev[5]  0.10795  1.0004
## covse12       --      --
## corse12       --      --
## covsp12       --      --
## corsp12       --      --
## 
## Total time taken: 2.5 seconds

A shortcut for activating the covariance terms is to re-run template_huiwalter as follows:

template_huiwalter(data, outfile="template_2test_cov.txt", covariance=TRUE)
## The model and data have been written to template_2test_cov.txt in the current working directory
## You should check and alter priors before running the model
results_cov <- run.jags("template_2test_cov.txt")
results_cov
## 
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##                                                                               
##            Lower95   Median  Upper95     Mean       SD Mode      MCerr MC%ofSD
## se[1]      0.64334  0.78313  0.92503  0.78225 0.080113   --  0.0039838       5
## se[2]      0.46716  0.58033  0.69473  0.58065 0.061467   --  0.0027459     4.5
## sp[1]      0.78245   0.8699  0.95701  0.87066 0.045484   --  0.0017701     3.9
## sp[2]      0.81867  0.88325  0.94182  0.88219 0.032786   --  0.0011804     3.6
## prev[1]    0.65461  0.80202  0.99928   0.8094  0.10004   --  0.0047808     4.8
## prev[2]    0.50103  0.67093  0.87807   0.6814  0.09997   --  0.0044265     4.4
## prev[3]    0.16506  0.30641  0.46291  0.30739 0.075858   --  0.0027595     3.6
## prev[4]    0.09868  0.23953  0.39325  0.24133   0.0754   --   0.002858     3.8
## prev[5] 8.5597e-06 0.095068  0.20717 0.098895 0.060947   --  0.0021398     3.5
## covse12  -0.033542 0.031655 0.087755 0.029275 0.035117   --   0.001644     4.7
## corse12   -0.75389   0.2532  0.59616   0.1296  0.37299   --   0.014715     3.9
## covsp12 -0.0068164 0.022291 0.058063 0.023697 0.018691   -- 0.00070442     3.8
## corsp12   -0.54337  0.24481  0.55833  0.18583  0.27171   --  0.0071299     2.6
##                             
##         SSeff   AC.10   psrf
## se[1]     404 0.63444  1.001
## se[2]     501  0.5454 1.0014
## sp[1]     660 0.48627 1.0019
## sp[2]     771 0.43144 1.0012
## prev[1]   438 0.58728 1.0005
## prev[2]   510  0.5245 1.0004
## prev[3]   756 0.40901      1
## prev[4]   696  0.4083 1.0003
## prev[5]   811 0.40666 1.0006
## covse12   456 0.59882 1.0014
## corse12   642 0.44924 1.0028
## covsp12   704 0.46141  1.001
## corsp12  1452 0.19305 1.0003
## 
## Total time taken: 3.1 seconds

Activating the covariance terms with 2 tests has made the model less identifiable, and has therefore decreased the effective sample size and increased the width of the 95% CI for all of the parameters to the point that the model is no longer very useful. This is not something that we recommend you do in practice, even if the two tests are known to be correlated! We will revisit this issue tomorrow.