COVID-Testing Probabilities.

The last two weeks we discussed the problem and several solutions. Those include a closed form mathematical solution (using probability theory), simulation using R, and visualization using Shiny Web Apps and Plotly. For a rough guide of our work the last few weeks, check https://rpubs.com/hugomoises. In particular, for a motivation of the problem and the deployment of a Shiny Web App, check https://rpubs.com/hugomoises/RepeatedTesting1; and for a step-by-step guide to visualize the solution using the function plot_ly from the R package plotly, see https://rpubs.com/hugomoises/RepeatedTesting2.

Last week, we ended our session with an amazing discussion. I am grateful for your active participation and bringing to light so many interesting points. In particular, a few points were particularly important, in my opinion, or at least they stuck in my mind. First, brought by Dr. Bill Mongan, was the point about the size of the conference and how the likelihood of getting an outbreak grows quickly with the number of people involved. Second, along those lines and brought by Dr. Andrea Kauffman-Berry, was the point of calculating the number of tests necessary to guarantee certain safety. Finally, brought by Anna Albright, was the point of the conditions under which the tests need to be taken to guarantee that the manufacturer’s specificity and sensitivity are accurate and useful. Of course, I am paraphrasing these points the best I can, but if my own words are not accurate, please let me know.

This week, I would like to elaborate around the points mentioned above. This will provide even more room for discussion and brainstorm. If there are new points or ideas, please bring them on. Also, if you have points or ideas from last week, I would love to hear them and have a conversation about them too!

The New Problem (Part 1)

Following up with our previous problem (here). We have already done the math to calculate the probability $p_1$ that one person has the condition given the results of the test. Suppose now that $N$ people, all with probability $p_1$ of having the condition, meet. Assuming that when they meet there is transmission (assuming a highly transmissible virus such the Omicron variant of SARS-CoV-2), what’s the probability of an outbreak?

We actually calculated that probability, mathematically, last week. First, we defined the probability of interest as $p_N$. This is the probability that at least one person in the group of $N$ has the condition. We can use the inclusion-exclusion principle to formulate this probability directly. Link to Brilliant and link to Wikipedia. However, we can use the trick that $1-p_N$ is very easy to calculate because it is the probability of no one in the group of $N$ having the condition, which is $(1-p_1)^N$. Therefore, the probability we are looking for is: $p_N = 1-(1-p_1)^N$.

R Code solution.

Write a function in R that returns $p_N$ when given the parameters $p_1$ and $N$.

#Write a function in R that returns pN when given the parameters p1 and N.
pN_fun<-function(p1,N){
  pN = 1-(1-p1)^N
  return(pN)
}

#Test the function with p1=0.01 and N=100. 
pN_fun(0.01,100)

## [1] 0.6339677

Understand the function

To understand this function (and how R works in a vetorized way), suppose that the probability of any person having the condition is 1%, and the meeting size is a vector from 1 to 1000. 1) Graph the probabilities of an outbreak. 2) Find the meeting size such that the probability $P_N$ just passes 50%.

#1) Graph the probabilities of an outbreak as a function of N. 
#Caculate the probabilities
x<-1:1000
pN<-pN_fun(0.01,x)

#Graph: 
plot(x,pN)

#Add labels: 
plot(x,pN,
     xlab="Meeting Size (# of People)", 
     ylab="Outbreak Probability")

#2) Find the meeting size such that the probability $P_N$ just passes 50%. 
min(which(pN>0.5))

## [1] 69

Use Plotly

To make the previous graph more appealing and interactive, we can use plotly.

#Load the library
library(plotly)

## Loading required package: ggplot2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

#Make the graph: 
plot_ly(x=~x, y=~pN)

## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter

## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode

#Use examples from last week to add title and labels:

The New Problem (Part 2)

Now suppose we want the minimum number of tests to take so that we guarantee certain safety conditions. Before we answer this question directly, let’s try solve for $p_1$ and $N$ in our equation for $p_N$ above $1-p_N = (1-p_1)^N$, and create respective functions to calculate their values as functions of the other parameters. Solving for $N$ and $p_1$, we get, respectively:

$N=log(1-p_N)/log(1-p_1)$.

$p_1 = 1-(1-p_N)^{1/N}$.

R Code (Part 2)

Write functions in R that returns $p_1$ and $N$ as functions of the other parameters. Then test the functions with different paramters.

#Write functions in R that returns p1 and N as functions of the other parameters. 

N_fun<-function(p1,pN){
  N=log(1-pN)/log(1-p1)
return(N)
  }

p1_fun<-function(pN,N){
  p1 = 1-(1-p1)^(1/N)
  return(p1)
}

#Test the functions with different parameters.

R Code (Part 3)

Use the functions above, and the probabilities we know from the previous meetings, to calculate the minimum number of tests to take to guarantee some degree of safety.

First, find the posterior probabilities (from previous meetings) as functions of the sensitivity, specificity, and number of test taken.

posterior_prob<-function(n=30, prev=0.2, sens=0.7, spec=0.9){
#  n<-30 #Number of test taken. 
#prev<-0.2
#sens<-0.7
#spec<-0.9

  p<-dbinom(0:n, n, sens)*prev / 
  (dbinom(0:n, n, sens)*prev + dbinom(0:n, n, 1-spec)*(1-prev) )
  return(p)
}

posterior_prob()

##  [1] 1.214234e-15 2.549891e-14 5.354772e-13 1.124502e-11 2.361454e-10
##  [6] 4.959054e-09 1.041401e-07 2.186938e-06 4.592369e-05 9.635125e-04
## [11] 1.985123e-02 2.984026e-01 8.993123e-01 9.946968e-01 9.997462e-01
## [16] 9.999879e-01 9.999994e-01 1.000000e+00 1.000000e+00 1.000000e+00
## [21] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [26] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [31] 1.000000e+00

Second, let’s use our functions above for $p_N$ and $N$ to map our posterior probabilities to our social gathering setting with $N$ people. Interpret the results.

#Interpret these results. 
pN_fun(posterior_prob(),N=100)

##  [1] 1.221245e-13 2.553513e-12 5.354606e-11 1.124500e-09 2.361454e-08
##  [6] 4.959053e-07 1.041396e-05 2.186701e-04 4.581945e-03 9.189721e-02
## [11] 8.653519e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [16] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [21] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [26] 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## [31] 1.000000e+00

N_fun(posterior_prob(), pN=0.01)

##  [1] 8.229580e+12 3.935886e+11 1.876952e+10 8.937600e+08 4.255994e+07
##  [6] 2.026664e+06 9.650781e+04 4.595615e+03 2.188436e+02 1.042591e+01
## [11] 5.012409e-01 2.835909e-02 4.377837e-03 1.918206e-03 1.213969e-03
## [16] 8.875891e-04 6.995092e-04 5.772002e-04 4.912970e-04 4.276509e-04
## [21] 3.786035e-04 3.396528e-04 3.078915e-04 2.820103e-04 0.000000e+00
## [26] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [31] 0.000000e+00

Now, change the parameters of posterior_prob() and pN_fun and N_fun to calculate your probabilities in different settings: 1) An advisor-advisee meeting (N=2), 2) A typical CIE class (N capped at 16), 3) A typical classroom at Ursinus College (N capped at 25), 4) The whole Ursinus Student population (N about 1,500 students)

The New Problem (Part 3)

Finally, let’s review carefully the manufacturers’ claims to have a discussion. We want to know under which conditions the tests are valid. From the list of tests I provided last week, note that they mention that the some tests need to be taken within seven days of the symptom onset. So, what if a person does not have symptoms? What if the manufacturer’s claim have errors too? Would there be any bias? If so, in which direction? Let’s have a discussion about that.

Kind regards, Prof. Hugo M. Montesinos-Yufa

For convenience, here it is a list of the top 4 NYT’s recommended tests, their sensitivity, specificity, and their documentation:

Abbott BinaxNow COVID-19 Antigen Self Test. Sensitivity: 84.6% (PDF) within seven days of symptom onset. Specificity: 98.5% (PDF) within seven days of symptom onset. Tests included: two. App needed: no. Omicron variant detected: yes. Cost: about $20 at the time of publication. Availability: Amazon, CVS, Walgreens.
Access Bio CareStart COVID-19 Antigen Home Test. Sensitivity: 87% (PDF) within seven days of symptom onset. Specificity: 98% (PDF) within seven days of symptom onset. Tests included: two.App needed: no. Omicron variant detected: yes. Cost: about $20 at the time of publication. Availability: Target.
BD Veritor At-Home COVID-19 Test (app required). Sensitivity: 84.6% (PDF), Specificity: 99.8% (PDF). Tests included: two. App needed: yes (iOS and Android). Omicron variant detected: yes. Cost: about $30 at the time of publication. Availability: Amazon.
Acon Flowflex COVID-19 Antigen Home Test. Sensitivity: 93% (PDF), Specificity: 100% ((PDF)). Tests included: one. App needed: no. Omicron variant detected: yes. Cost: about $10 at the time of publication. Availability: CVS, Target, Walgreens, WellBefore.

Let’s use plot_ly to visualize the probabilities of each of the settings above, in each of our TV computer stations! A direct link to the plot_ly tutorial is here:

Note: For now, we are assuming that a person repeats testing using the same test. We can, however, improve our protocol by mixing up tests. Why is this better? Let’s have an in person conversation about that. The underlying probability distribution becomes a mixture of binomials with different probabilities of success, which is not binomial anymore, but we can still model it easily and use the Bayes Theorem in the same way we did above, we just need to know the results with each test.

COVID-Testing Probabilities 3. Ursinus College

Initiative for Inclusive Data Science and Innovative Research (IDSIRI)

Prof. Hugo Montesinos and Prof. Olga Nicoara

2022-Oct-10 (Monday, Week 7)