library(kableExtra, warn.conflicts=F, quietly=T)
library(dplyr, warn.conflicts=F, quietly=T)
library(ggplot2, warn.conflicts=F, quietly=T)
library(EpiEstim, warn.conflicts=F, quietly=T)
linelist <- read.csv("https://raw.githubusercontent.com/blopman/epi569/master/Rushmore_Noro_LineList.csv")
# Read in the data
# Changing the onset date variable into the R date format
linelist$onset_date <- as.Date(as.character(linelist$onset_date), format="%m/%d/%Y")
#code to answer Q1#
table (linelist$lab.confirmed)
##
## N Y
## 103 6
table (linelist$hospitalized)
##
## N
## 109
table(linelist$gender)
##
## F M
## 77 32
table (linelist$case_type)
##
## Staff Student_MPHyear1 Student_MPHyear2 Student_PhD
## 7 67 28 7
median (linelist$age)
## [1] 24
IQR (linelist$age)
## [1] 3
tapply(linelist$age, linelist$case_type, median)
## Staff Student_MPHyear1 Student_MPHyear2 Student_PhD
## 56 23 25 30
tapply(linelist$age, linelist$case_type, IQR)
## Staff Student_MPHyear1 Student_MPHyear2 Student_PhD
## 12.5 2.0 2.0 7.0
Start your analysis with a basic description of the outbreak by answering the following questions:
ANSWER FIRST: 10/16/2019 LAST: 11/27/2019
ANSWER 42 days
How many cases total cases were there?
ANSWER 109 cases
ANSWER 6/109 (5.5%)
What number and percent of cases were hospitalized? * ANSWER 0/109 (0%)
What number and percent of cases were female?
ANSWER 77/109 (70.6%)
What number and percent of cases were students? Staff?
ANSWER STUDENTS: 102/109 (93.6%) STAFF: 7/109 (6.4%)
Of student cases, what number and percent freshman, sophmores and juniors?
ANSWER Student_MPHyear1 = 67/102 (65.7%) Student_MPHyear2 = 28 /102 (27.5%) Student_PhD =7/102 (6.9%)
What was the median age of cases overall and by “case type”? Include the interquartile ranges (IQR).
ANSWER
Median (IQR) Age
all cases: 24 (3) years
Staff: 56 (12.5) years
Student_MPHyear1: 23 (2.0) years
Student_MPHyear2: 25 (2.0) years
Student_PhD: 30 (7.0) years
Next, use the code below to plot the epidemic (epi) curve for this outbreak. Symptom onset dates will be plotted on the x-axis and case counts on the y-axis.
# Plotting the epidemic curve
epicurve <- as.data.frame(linelist %>% group_by(onset_date) %>% summarize(incidence=n()))
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
plot1 <- ggplot(data=epicurve, aes(x=onset_date, y=incidence, labs=FALSE)) +
geom_bar(stat="identity", fill="gray45") +
scale_x_date(date_breaks = "2 days", date_labels = "%m/%d") +
scale_y_continuous(breaks = seq(0,15, by = 2)) +
theme(axis.text.x = element_text(angle = 45)) +
labs(x = "Symptom Onset Date", y = "Number of Cases", title = "Epi Curve for Rushmore Academy Norovirus Outbreak")
plot1
How many peaks are there?
ANSWER - THis looks like propogated waves with at least two peaks and possibly a tiny third peak that would likely be difficult to differentiate signal from the noise. The first large peak occurs around 10/23, the second one smaller occurs around 11/13 and third (ish) occurs around 11/21.
At what time(s) do cases start to decline?
ANSWER around the end of October (it appears to be October 25).
Does the epidemic curve support the theory that the outbreak began with the vomiting incident in the CNR auditorium on 10/16? Briefly explain why or why not.
ANSWER Yes. This is because there is an incubation period-sized pause in the cases after the first case, then all cases occur after the auditorium incident. Additionally, there are no retroactively discovered cases outlined in the data.
Does there appear to be sustained person-to-person transmission after the initial vomiting incidence? Briefly explain why or why not.
ANSWER Yes, this is because there are subsequent cases with smaller peaks.
Why is it difficult to determine who infected whom (i.e., the infector-infected pairs) from this epidemic curve?
# First, convert the data to a format that can be used in the EpiEstim wallinga_teunis function
# Data must be in the following format:
### 1 column for symptom onset dates in ascending order, including dates on which 0 cases were reported, titled "dates"
### 1 column for case counts (incidence) titled "I"
### Note: to calculate an Rt estimate for day day 1 of the outbreak, we must start our epi curve 2 days prior the first symptom onset date
epicurve2 <- epicurve %>% arrange(onset_date) %>% rename(dates = onset_date, I = incidence)
all.dates <- as.data.frame(seq(as.Date("2019-10-14"), by = "day", length.out = 45))
names(all.dates) <- "dates"
epicurve.epiestim <- merge(x=epicurve2, y=all.dates, by="dates", all="TRUE")
epicurve.epiestim <- epicurve.epiestim %>% mutate(I = ifelse(is.na(I), 0, I))
# Next, run the code below to estimate Rt, along with 95% confidence intervals for Rt estimates
# This requires that we specify the mean and standard deviation of the serial interval
# An offset gamma distribution will be used for the serial interval (by default)
mean_si <- 3.6
std_si <- 2.0
estimates <- wallinga_teunis(epicurve.epiestim$I,
method="parametric_si",
config = list(t_start = seq(3, 45),
t_end = seq(3, 45),
mean_si = mean_si,
std_si = std_si,
n_sim = 1000))
# You can examine the serial interval distribution using the code below
plot(estimates$si_distr)
# Lastly, use the code below to plot the Rt estimates and 95% CIs over the epi curve to examine trends
plot2.data <- cbind(epicurve, estimates$R$`Mean(R)`,
estimates$R$`Quantile.0.025(R)`, estimates$R$`Quantile.0.975(R)`)
names(plot2.data) <- c("dates", "I", "R", "lowerCI", "upperCI")
plot2 <- ggplot(data=plot2.data, aes(x=dates, y=I, labs=FALSE)) +
geom_bar(stat="identity", fill="gray45") +
scale_x_date(date_breaks = "2 days", date_labels = "%m/%d") +
scale_y_continuous(breaks = seq(0,15, by = 2)) +
theme(axis.text.x = element_text(angle = 45)) +
labs(x = "Symptom Onset Date",
y = "Number of Cases",
title = "Epi Curve for Rushmore Academy Norovirus Outbreak") +
geom_hline(aes(yintercept=1), colour="red", linetype="dashed", size=0.5) +
geom_errorbar(data=plot2.data, aes(ymax=upperCI, ymin=lowerCI, width=0.6),stat="identity", size=0.8, show.legend=FALSE) +
geom_line(data=plot2.data[!is.na(plot2.data$R),],aes(x=dates, y=R), color='blue', size=0.5) +
geom_point(data = plot2.data, aes(x=dates, y=R), size=1.2, show.legend=FALSE)
plot2
How does infectiousness change over the course of the outbreak? ANSWER When Rt is above 1, the disease is spreading and we are in an outbreak. The infectiousness peaks early with an Rt of 4 (meaning 4 people are infected on each of the first few days of the outbreak. Then it lowers to below one for a time and peaks back up to just shy of 3. right before the second wave. It does not appear to rise above one after 11/12.
Why does the individual with the last illness onset date (11/27) have a R(t) of 0 (95% CI: 0, 0)? ANSWER The individual with the last illness onset has an Rt of 0 because they do not infect anyone else.
Why do cases with onset dates on days with large case counts (e.g., 10/25 with n=15) have relatively small R(t) estimates, despite there being several cases that occur after these dates? ANSWER Because the days with large case counts are the number of cases that were infected about two days prior. When there is a smaller R(t) there will be fewer cases on subsequent days.
Why does the outbreak continue even after Rt falls below 1?
What happens to the R(t) estimate when you decrease the mean (e.g., to 2 days)?
ANSWER - When the mean serial interval is reduced, the maxiumums on the R(t) estimate are reduced. For example, if the mean Serial Interval is halved, the peak R(t) is also halved.
What happens to the R(t) estimate when you increase the mean (e.g., to 6 days)?
ANSWER - When the mean serial interval is increased, the peak R(t) is exponentially increased.
Briefly explain why you see these changes.
mean_si <- 3.6# Change this to answer Question 4
std_si <- 2.0
estimates <- wallinga_teunis(epicurve.epiestim$I,
method="parametric_si",
config = list(t_start = seq(3, 45),
t_end = seq(3, 45),
mean_si = mean_si,
std_si = std_si,
n_sim = 1000))
plot2.data <- cbind(epicurve, estimates$R$`Mean(R)`,
estimates$R$`Quantile.0.025(R)`, estimates$R$`Quantile.0.975(R)`)
names(plot2.data) <- c("dates", "I", "R", "lowerCI", "upperCI")
plot2 <- ggplot(data=plot2.data, aes(x=dates, y=I, labs=FALSE)) +
geom_bar(stat="identity", fill="gray45") +
scale_x_date(date_breaks = "4 days", date_labels = "%m/%d") +
scale_y_continuous(breaks = seq(0,15, by = 2)) +
theme(axis.text.x = element_text(angle = 45)) +
labs(x = "Symptom Onset Date",
y = "Number of Cases",
title = "Epi Curve for Rushmore Academy Norovirus Outbreak") +
geom_hline(aes(yintercept=1), colour="red", linetype="dashed", size=0.5) +
geom_errorbar(data=plot2.data, aes(ymax=upperCI, ymin=lowerCI, width=0.6),stat="identity", size=0.8, show.legend=FALSE) +
geom_line(data=plot2.data[!is.na(plot2.data$R),],aes(x=dates, y=R), color='blue', size=0.5) +
geom_point(data = plot2.data, aes(x=dates, y=R), size=1.2, show.legend=FALSE)
plot2
List 3 limitations of using the Wallinga-Teunis method to estimate R(t).
ANSWER One should be careful using the W-T method to estimate R(t), because it is sensitive to the assumptions about the serial interval. 1) it assumes transmission only occurs among the cases reported in the Epi Curve. (that is missing cases impact the results); 2) It assumes that asymptomatic transmission is not occuring; 3) that all reported cases are a part of the same outbreak (not belonging to another outbreak).
Why would this approach not work well for estimating R(t) for a disease that is not primarily spread person-to-person (e.g., vectorborne, waterborne, or foodborne)?
ANSWER
With vectorborne, waterborne, and foodborne cases, not all cases are captured as we don’t know where the vectors are going or how far the food or water contamination could have gone. For example, we will likely not have every case from a contaminated food recall. Additionally, there may be asymptomatic carriage/ transmission of these diseases. This will limit the use of W-T in these situations.