For a motivation, mathematical background, and the details of the problem, see the post from last week: https://rpubs.com/hugomoises/RepeatedTesting1.This week, we will elaborate on the details of how to create such interactive plots using plotly.
The following is a question, verbatim, from previous exams and homeworks of mine.
“(5 points) A medical condition occurs in 20% of the population. There is an inexpensive market test which reports 30% false negatives and 10% false positives. Different test applications can be considered independent. a) (2 points) Calculate the probability that a person who shows positive in the test actually suffers the condition? b) (2 points) A hypochondriac takes the test twice. What is the likelihood that he really suffers the condition if one of the two test results is positive, and the other is negative? c) (1 points) If two different people take the test and both come out positive, what is the probability that both suffer from the condition? Extra credit: (d) (1 point EC) The hypochondriac takes the test 100 times. What is the likelihood that s/he/they really suffers the condition if sixty times the test is positive and forty times the test is negative? (e) (1 point EC) Suppose the hypochondriac actually has the condition, if s/he/they takes the test 100 times, what is the expected number of positive results? (And if s/he/they does not have the condition?). Use that to explain your answer in (d). Leave all your answers indicated if necessary.”
We can visualize the solution to this problem using plotly. Let’s start using the parameters above, and we can then modify them. The advantage of plotly is that we can hover over with the mouse on the final figure, and see the calculated probabilities of interest.
Remember that if you don’t have plotly installed yet, you need to install it first. You can install it by typiting install.packages(“plotly”) in the console.
library(plotly)
n<-30
prev<-0.2
sens<-0.7
spec<-0.9
These are the green and red bar charts. They represent the conditional probabilities of the number of positive test results given the two possible states of the nature: Not having the condition, which we will graph in green, and having the condition, which we will graph in red. Note that the R package is plotly, but the funciton we are going to use is plot_ly.
# Add Distribution for people without COVID.
fig <- plot_ly(
x = 0:n,
y = dbinom(0:n, n, 1-spec),
name = "No COVID",
type = "bar",
marker = list(color = 'green')
)
# Add Distribution for people with COVID.
fig <- fig %>% add_bars(
y = dbinom(0:n, n, sens),
name = "COVID",
marker = list(color = 'red') )
fig
To calculate the posterior probability we are interested in (the probability of having the condition given the test results), we use the Bayes Theorem. Note that the posterior probability is significantly differently (both conceptually and numerically/mathematically) from the conditional probabilities above (the likelihoods, or the green and red bar graphs).
#Add Posterior Probability
fig <- fig %>% add_lines(
y = dbinom(0:n, n, sens)*prev /
(dbinom(0:n, n, sens)*prev + dbinom(0:n, n, 1-spec)*(1-prev) ) ,
name = "Prob",
marker = list(color = 'black') )
fig
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
Note that, for visualization purposes, it is convenient to scale the conditional probabilities. The posterior probability goes from 0 to 1 (using the full range), but the conditional probabilities sum to 1, therefore each one of them is much smaller, making it difficult to visualize them effectively. How do you scale the conditional probabilities?
#Notice why we need to scale the conditional probabilities:
bar1 = dbinom(0:n, n, 1-spec) #Green bar graph (When the person has No Covid)
bar2 = dbinom(0:n, n, sens) #Red bar graph (When the person has Covid)
#Note that it is convenient to calculate the densities and assign them to an object.
#That way we don't have to calculate them over and over again.
fig <- plot_ly(
x = 0:n,
y = bar1/max(bar1),
name = "No COVID",
type = "bar",
marker = list(color = 'green')
)
fig <- fig %>% add_bars(
y = bar2/max(bar2),
name = "COVID",
marker = list(color = 'red') )
fig
The corresponding scaled probabilities have the interpretation of the relative likelihood of each test result (relative to the most likely result, which is given a value of 100%). They are not probabilities in the sense that they do not sum to one, but they keep the same distribution and they have an ideal scale for visualization.
The goal here is to color-code the posterior probability. We want the marker for the probability of having Covid-19 given the test results to be if such probability is less than 50%. We want it to be when such probability is more than 50%. To do that, first, assign the posterior probabilities to an object (say, p), second, check when they are larger than 50%, and finally, assign the colors as a new object.
#Make the colors change after the probability jumps above 50%.
#5.1. Assign the posterior probabilities to an object.
p<-dbinom(0:n, n, sens)*prev /
(dbinom(0:n, n, sens)*prev + dbinom(0:n, n, 1-spec)*(1-prev) )
#5.2. Check when they are larger than 50%.
n1<-sum(p>1/2)
#5.3. Create a vector of colors.
colors<- c( rep("green",n-n1+1), rep("red",n1) )
#5.4 (finally). Add it to the graph.
fig <- fig %>% add_lines(
y = p ,
name = "P(COVID|+s)",
marker = list(color = colors, mode='markers') )
fig
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
If we want, we can add a horizontal line to denote the 50% probability. To do that, we just add it to the graph as another layer.
fig <- fig %>% add_lines(
y = 0.5 ,
name = "50% line",
line = list(color = 'grey', mode='lines', width = 1),
marker = list(color = 'black', mode='marker', size = 0.5))
fig
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
Finally, once we have our figure, we can make it a function by combining our steps above. Let’s call the function graph3, and give it the above parameters by default.
graph3 = function(n=30, sens=0.7, spec=0.9, prev=0.2){
#Calculate pmf's:
bar1 = dbinom(0:n, n, 1-spec)
bar2 = dbinom(0:n, n, sens)
#Calculate scaled probabilities
fig <- plot_ly(
x = 0:n,
y = bar1/max(bar1),
name = "No COVID",
type = "bar",
marker = list(color = 'green')
)
fig <- fig %>% add_bars(
y = bar2/max(bar2),
name = "COVID",
marker = list(color = 'red') )
#Calculate Posterior Probability
p<-bar2*prev /
(bar2*prev + bar1*(1-prev) )
#Make the colors change after the probability jumps above 50%.
n1<-sum(p>1/2);colors<- c( rep("green",n-n1+1), rep("red",n1) )
fig <- fig %>% add_lines(
y = p ,
name = "P(COVID|+s)",
marker = list(color = colors, mode='markers') )
#Add 50% probability Line
fig <- fig %>% add_lines(
y = 0.5 ,
name = "50% line",
line = list(color = 'grey', mode='lines', width = 1),
marker = list(color = 'black', mode='marker', size = 0.5))
#Add titles
fig <- fig %>% layout( title = 'Number of Tests Taken, Likelihood (bars),
and Posterior Probability of Having Covid-19',
xaxis =list(title='Number of Positive Test Results (+s)'),
yaxis = list(title ='Scaled Probability'))
fig
}
Now that you have your function, experiment with it by varying the parameters.
#Experiment
graph3()
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
graph3(n=10, spec=0.9, sens=0.95, prev=0.1)
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
graph3(n=20, spec=0.8, sens=0.85, prev=0.3)
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
graph3(spec=0.7, sens=0.8)
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
Let’s change the quality of the tests parameters to model more realistic COVID-19 scenarios. Information regarding the quality of the tests (sensitivity and specificity) can be obtained from the manufacturer, from the CDC or from the FDA. We can also find the information compiled by New York Times, for example, which list that information: https://www.nytimes.com/wirecutter/reviews/at-home-covid-test-kits/
For convenience, here it is a list of the top 4 NYT’s recommended tests, their sensitivity, specificity, and their documentation:
Abbott BinaxNow COVID-19 Antigen Self Test. Sensitivity: 84.6% (PDF) within seven days of symptom onset. Specificity: 98.5% (PDF) within seven days of symptom onset. Tests included: two. App needed: no. Omicron variant detected: yes. Cost: about $20 at the time of publication. Availability: Amazon, CVS, Walgreens.
Access Bio CareStart COVID-19 Antigen Home Test. Sensitivity: 87% (PDF) within seven days of symptom onset. Specificity: 98% (PDF) within seven days of symptom onset. Tests included: two.App needed: no. Omicron variant detected: yes. Cost: about $20 at the time of publication. Availability: Target.
BD Veritor At-Home COVID-19 Test (app required). Sensitivity: 84.6% (PDF), Specificity: 99.8% (PDF). Tests included: two. App needed: yes (iOS and Android). Omicron variant detected: yes. Cost: about $30 at the time of publication. Availability: Amazon.
Acon Flowflex COVID-19 Antigen Home Test. Sensitivity: 93% (PDF), Specificity: 100% ((PDF)). Tests included: one. App needed: no. Omicron variant detected: yes. Cost: about $10 at the time of publication. Availability: CVS, Target, Walgreens, WellBefore.
Let’s use plot_ly to visualize the probabilities of each of the settings above, in each of our TV computer stations! A direct link to the plot_ly tutorial is here:
Note: For now, we are assuming that a person repeats testing using the same test. We can, however, improve our protocol by mixing up tests. Why is this better? Let’s have an in person conversation about that. The underlying probability distribution becomes a mixture of binomials with different probabilities of success, which is not binomial anymore, but we can still model it easily and use the Bayes Theorem in the same way we did above, we just need to know the results with each test.