Abstract

With a full year of the pandemic behind us and vaccination rates on the rise, many workplaces are looking forward to a return to normalcy and in-person work. Employers have many mechanisms at their disposal when crafting a return to work scheme that balances the well-being of their employees with the friction that comes from implementing a system of this type. In order to help contextualize these mechanisms and assign weights to these variables, I utilized modeling to simplify this complex reality and determine how a proxy for COVID risk changes under various return to work schemes.

To speak to some of the specifics: our theoretical office and each of its employees were based in New York City, the testing protocol of choice was the Rapid Antigen Assay, and we were focused on the impact of two key variables: the interval of time between mandatory scheduled tests imposed by the employer, and the amount of time in days that an individual would wait between becoming symptomatic and seeking an additional, non-mandated test.

We found that the total days any individual was contagious in our office increased linearly as the average testing lag increased. The impact of mandated testing was more interesting: we saw that biweekly and triweekly testing resulted in nearly equivalent COVID presence, with weekly testing leading to a 25% decrease in contagious days.

Literature Review

Confirmed Cases

Through the last year, we have used a variety of metrics to contextualize the pandemic and answer questions like “How bad is COVID right now?” and “Are things getting better or worse?” Perhaps the most considered data are the quantity of new cases in a region. We will be using data of this sort as the basis for our model - specifically two tables from the New York City Department of Health: “Case-Rate-by-Modzcta” and “Cases-by-Day”. These are both publicly available and regularly updated on Github.

Each of these tables is limited in some way.

Case-Rate-by-Modzcta:

Rate of Cases per 100k residents
By Modzcta (Modzcta is essentially an adjusted ZIP Code)
Weekly average
Earliest Data: Aug 2020
Only confirmed cases

Cases-by-Day:

New Case Count
By borough
Daily Count
Earliest Data: March 2020
Values for both confirmed and probable cases

Shared Considerations:

Each table only considers each positive case one time - on the date in which the earliest positive molecular test result was delivered.
Limited to NYC residents
Reported by resident address.

In order to build an effective model, we require a table containing the percentage of individuals in a given area who received a positive test result on a given day, starting from the beginning of the pandemic. This requires us to merge these two tables with a few concessions. We can realistically select higher fidelity in one of two forms: we can either elect for the ZIP Code level data from the Case-Rate-by-Modzcta table, or the daily level data summarized by borough offered by the Cases-by-Day file. We will opt for the ZIP Code, by week, summarization.

Looking at the logistics of transforming our data: for dates between 3/1/2020 and 8/1/2020, for which we do not have Case-Rate-by-Modzcta entries, we will use population data from the US census to convert case count to case rate, and use the borough average rate for each ZIP Code in a given borough.

I elected to run this transformation on all data from the start of the pandemic through the week of 8/8. The week of 8/8 is redundant as it is first available week in our Case-Rate-by-Modzcta file, however, this can serve as a control check to ensure the transformation is accurate. We find that the adjustment from the by-day data is fairly close to the by week summary with an “All-City” rate of 20.11 vs 19.02. This variation is likely due to this note in the ReadMe file on the DOH GitHub: “Note that sum of counts in this file may not match values in citywide tables because of records with missing geographic information”. For our purposes, this transformation is acceptable.

# Load Packages
library(lubridate)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(ggthemes)
library(ggridges)
library(gridExtra)
library(cowplot)
library(formatR)
# Import Data

# Caserate by Week x <-
# 'https://raw.githubusercontent.com/ChristopherBloome/coronavirus-data/master/trends/caserate-by-modzcta.csv'
x <- "C:\\Users\\bloom\\Documents\\Thesis\\caserate.csv"
Caserate_by_week_Modzcta <- read.csv(x)


# Caserate by Day
x <- "Casecount_by_day_borough"
# x <-
# 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/trends/cases-by-day.csv'
Casecount_by_day_borough <- read.csv(x)

# Build DF for weeks prior to first entry in
# 'Case-Rate-by-Modzcta'
x <- seq(from = mdy("3/7/2020"), to = mdy("8/8/2020"), by = 7)
df = data.frame(matrix(vector(), length(x), 184), stringsAsFactors = F)
names(df) <- names(Caserate_by_week_Modzcta)
df$week_ending <- x


# Create df, a, with columns for each borough and city, that
# are summed for each week.
Casecount_by_day_borough$date_of_interest <- mdy(Casecount_by_day_borough$date_of_interest)
a <- Casecount_by_day_borough %>% filter(date_of_interest < mdy("3/8/2020")) %>% 
    summarise(CASERATE_CITY = sum(CASE_COUNT), CASERATE_BX = sum(BX_CASE_COUNT), 
        CASERATE_BK = sum(BK_CASE_COUNT), CASERATE_MN = sum(MN_CASE_COUNT), 
        CASERATE_QN = sum(QN_CASE_COUNT), CASERATE_SI = sum(SI_CASE_COUNT))

for (i in 2:nrow(df)) {
    x <- Casecount_by_day_borough %>% filter(date_of_interest <= 
        df$week_ending[i], date_of_interest > df$week_ending[i - 
        1]) %>% summarise(CASERATE_CITY = sum(CASE_COUNT), CASERATE_BX = sum(BX_CASE_COUNT), 
        CASERATE_BK = sum(BK_CASE_COUNT), CASERATE_MN = sum(MN_CASE_COUNT), 
        CASERATE_QN = sum(QN_CASE_COUNT), CASERATE_SI = sum(SI_CASE_COUNT))
    
    a <- rbind(a, x)
    
}

# Import Popultion figures from Census to convert Case Count
# to Case Rate
x <- "CensusData"
# x <-
# 'https://raw.githubusercontent.com/ChristopherBloome/Thesis/main/QuickFacts%20Mar-07-2021.csv'
CensusData <- read.csv(x, stringsAsFactors = FALSE)
CensusData <- CensusData %>% select(New.York.city..New.York, 
    Bronx.County..Bronx.Borough...New.York, Kings.County..Brooklyn.Borough...New.York, 
    New.York.County..Manhattan.Borough...New.York, Queens.County..Queens.Borough...New.York, 
    Richmond.County..Staten.Island.Borough...New.York)
CensusData <- CensusData[1, ]
names(CensusData) <- c("Total_Pop", "BX_Pop", "BK_POP", "MN_POP", 
    "QN_POP", "SI_POP")

# Use Census data to change Count to Rate, add to DF

df$CASERATE_CITY <- a$CASERATE_CITY/as.numeric(str_replace_all(CensusData$Total_Pop[1], 
    ",", "")) * 1e+05
df$CASERATE_BX <- a$CASERATE_BX/as.numeric(str_replace_all(CensusData$BX_Pop[1], 
    ",", "")) * 1e+05
df$CASERATE_BK <- a$CASERATE_BK/as.numeric(str_replace_all(CensusData$BK_POP[1], 
    ",", "")) * 1e+05
df$CASERATE_SI <- a$CASERATE_SI/as.numeric(str_replace_all(CensusData$SI_POP[1], 
    ",", "")) * 1e+05
df$CASERATE_QN <- a$CASERATE_QN/as.numeric(str_replace_all(CensusData$QN_POP[1], 
    ",", "")) * 1e+05
df$CASERATE_MN <- a$CASERATE_MN/as.numeric(str_replace_all(CensusData$MN_POP[1], 
    ",", "")) * 1e+05

# Import File mapping ZIP Codes to Borough
x <- "Zip_Modzcta"
# x <-
# 'https://raw.githubusercontent.com/ChristopherBloome/coronavirus-data/master/Geography-resources/ZCTA-to-MODZCTA.csv'
Zip_Modzcta <- read.csv(x)

x <- "Zip_Borough"
# x <-
# 'https://raw.githubusercontent.com/ChristopherBloome/Thesis/main/ZipToBorough.csv'
Zip_Borough <- read.csv(x)
names(Zip_Borough) <- c("X", "Borough", "Zip_Code")
Zip_Borough <- subset(Zip_Borough, select = -X)
Zip_Modzcta <- inner_join(Zip_Modzcta, Zip_Borough, by = c(MODZCTA = "Zip_Code"))
Zip_Modzcta <- distinct(subset(Zip_Modzcta, select = -ZCTA))
Zip_Modzcta$MODZCTA <- paste("CASERATE", Zip_Modzcta$MODZCTA, 
    sep = "_")

# Add string to make Borough match column names
BoroughList <- unique(Zip_Modzcta$Borough)
for (i in 1:length(BoroughList)) {
    x <- Zip_Modzcta %>% filter(Borough == BoroughList[i])
    # Uses Borough Total for Zip Totals
    for (j in 1:nrow(x)) {
        y <- paste("CASERATE", BoroughList[i], sep = "_")
        z <- x[j, 1]
        df[[z]] <- df[[y]]
    }
}

# Check Accuracy of pre Aug adjustment, merge into - Final
# File 'CaseRate_DF'
CaseRate_DF <- Caserate_by_week_Modzcta
CaseRate_DF$week_ending <- mdy(CaseRate_DF$week_ending)

t1 <- df %>% filter(week_ending == mdy("8/8/2020")) %>% select(week_ending, 
    CASERATE_CITY, CASERATE_BX, CASERATE_BK, CASERATE_MN, CASERATE_QN, 
    CASERATE_SI)


t2 <- CaseRate_DF %>% filter(week_ending == mdy("8/8/2020")) %>% 
    select(week_ending, CASERATE_CITY, CASERATE_BX, CASERATE_BK, 
        CASERATE_MN, CASERATE_QN, CASERATE_SI)

CaseRate_DF <- CaseRate_DF %>% filter(week_ending != mdy("8/8/2020"))
CaseRate_DF <- rbind(df, CaseRate_DF)


print("Aprox 8/8 Rates From Transformation")

## [1] "Aprox 8/8 Rates From Transformation"

print(t1[, 2:6])

##   CASERATE_CITY CASERATE_BX CASERATE_BK CASERATE_MN CASERATE_QN
## 1      20.11559    27.28798    19.88357    17.92834    18.10229

print("Actual 8/8 Rates")

## [1] "Actual 8/8 Rates"

print(t2[, 2:6])

##   CASERATE_CITY CASERATE_BX CASERATE_BK CASERATE_MN CASERATE_QN
## 1         19.02       26.51       18.75        15.9       17.44

Probable cases

The NYC Department of Health considers anyone who tested positive for COVID with a Rapid Antigen test a “probable case” until that diagnosis is confirmed with a PCR/Molecular test. For the purposes of identifying and removing duplicates, we note that an individual who tests with a rapid test that is later confirmed, is only counted once in our tables - on the date of their “confirmed” PCR/Molecular test diagnosis. Individuals which test first via a Rapid test and are later confirmed are later removed from the probable count, and attributed to the date of their PCR/Molecular result.

Again, our probable case data is only available with by-borough fidelity. That being noted, we can study the rate of probable and total cases and apply our findings to the more granular data set:

# Create DF with probably case
Prob_DF = ""
a <- Casecount_by_day_borough$CASE_COUNT
b <- Casecount_by_day_borough$PROBABLE_CASE_COUNT + Casecount_by_day_borough$CASE_COUNT

Prob_DF <- data.frame(a, b)
# Plot Total Cases over Time
p1 <- ggplot(Prob_DF, aes(x = seq(1, nrow(Prob_DF), by = 1), 
    y = a)) + geom_line() + labs(title = "NYC New Confirmed COVID Case / Day", 
    x = "Day of Pandemic", y = "Case Count") + theme_few()

# Above plot is noisey - take 7 day rolling average
Prob_DF$a1 <- ""
Prob_DF$b1 <- ""

for (i in 7:nrow(Prob_DF)) {
    x <- i - 6
    y <- i
    Prob_DF$a1[i] <- sum(Prob_DF$a[x:y])
    Prob_DF$b1[i] <- sum(Prob_DF$b[x:y])
}
# Plot Confirmed vs Total Cases over Time
p2 <- ggplot(Prob_DF) + geom_line(aes(x = seq(1, nrow(Prob_DF), 
    by = 1), y = as.numeric(a1), color = "Confirmed")) + geom_line(aes(x = seq(1, 
    nrow(Prob_DF), by = 1), y = as.numeric(b1), color = "Confirmed + Probable")) + 
    labs(title = "NYC New Confirmed COVID Case / 7 Day Avg", 
        x = "Day of Pandemic", y = "Case Count") + guides(color = guide_legend(title = "")) + 
    theme_few() + scale_color_tableau("Classic Blue-Red 6")


Prob_DF$Prob_Rate <- as.numeric(Prob_DF$b1)/as.numeric(Prob_DF$a1)

# Plot percentage of Probably Cases over Total
p3 <- ggplot(Prob_DF) + geom_line(aes(x = seq(1, nrow(Prob_DF), 
    by = 1), y = as.numeric(Prob_Rate))) + labs(title = "NYC Rate of Probable COVID Cases / Confirmed Cases ", 
    x = "Day of Pandemic", y = "(Confirmed + Probable) \n / Confirmed") + 
    theme_few()

grid.arrange(p1, p2, p3, nrow = 3)

Above we note two observations.

The day to day data is very noisy. We should use a 7 day rolling average to obtain a clearer understanding of case rate.
The rate of Probable Cases over time is increasing, relative to the total case count.

This second note implies that we cannot use a flat rate to find probable cases from confirmed cases, and instead need to consider at what point in the pandemic we are measuring confirmed cases, then use our probable case by day trend to approximate total cases.

As we have data by borough, lets see if there is considerable variation by borough within the city:

# Build DF with Prob and Total cases by Borough
a <- Casecount_by_day_borough$BK_PROBABLE_CASE_COUNT
b <- Casecount_by_day_borough$BK_PROBABLE_CASE_COUNT + Casecount_by_day_borough$BK_CASE_COUNT

c <- Casecount_by_day_borough$MN_PROBABLE_CASE_COUNT
d <- Casecount_by_day_borough$MN_PROBABLE_CASE_COUNT + Casecount_by_day_borough$MN_CASE_COUNT

e <- Casecount_by_day_borough$QN_PROBABLE_CASE_COUNT
f <- Casecount_by_day_borough$QN_PROBABLE_CASE_COUNT + Casecount_by_day_borough$QN_CASE_COUNT

g <- Casecount_by_day_borough$BX_PROBABLE_CASE_COUNT
h <- Casecount_by_day_borough$BX_PROBABLE_CASE_COUNT + Casecount_by_day_borough$BX_CASE_COUNT

i <- Casecount_by_day_borough$SI_PROBABLE_CASE_COUNT
j <- Casecount_by_day_borough$SI_PROBABLE_CASE_COUNT + Casecount_by_day_borough$SI_CASE_COUNT

k <- Casecount_by_day_borough$PROBABLE_CASE_COUNT
l <- Casecount_by_day_borough$PROBABLE_CASE_COUNT + Casecount_by_day_borough$CASE_COUNT

borough_Prob_DF <- data.frame(a, b, c, d, e, f, g, h, i, j, k, 
    l)

# Take rolling 7 day average
borough_Prob_DF1 <- borough_Prob_DF

for (i in 7:nrow(borough_Prob_DF)) {
    x <- i - 6
    y <- i
    borough_Prob_DF1[i, ] <- t(colSums(borough_Prob_DF[x:y, ]))
}

borough_Prob_DF1 <- borough_Prob_DF1[7:nrow(borough_Prob_DF1), 
    ]


names(borough_Prob_DF1) <- c("BK_Prob", "BK_All", "MN_Prob", 
    "MN_All", "QN_Prob", "QN_All", "BX_Prob", "BX_All", "SI_Prob", 
    "SI_All", "CITY_Prob", "CITY_All")

# Represent Probable cases as percentage of total
borough_Prob_DF2 <- data.frame(matrix(NA, nrow = nrow(borough_Prob_DF1), 
    ncol = 0))
borough_Prob_DF2$BKProbAvg <- borough_Prob_DF1$BK_Prob/borough_Prob_DF1$BK_All
borough_Prob_DF2$MNProbAvg <- borough_Prob_DF1$MN_Prob/borough_Prob_DF1$MN_All
borough_Prob_DF2$QNProbAvg <- borough_Prob_DF1$QN_Prob/borough_Prob_DF1$QN_All
borough_Prob_DF2$BXProbAvg <- borough_Prob_DF1$BX_Prob/borough_Prob_DF1$BX_All
borough_Prob_DF2$SIProbAvg <- borough_Prob_DF1$SI_Prob/borough_Prob_DF1$SI_All
borough_Prob_DF2$CITYProbAvg <- borough_Prob_DF1$CITY_Prob/borough_Prob_DF1$CITY_All

# Plot by Borough
ggplot(borough_Prob_DF2) + geom_line(aes(x = seq(1, nrow(borough_Prob_DF2), 
    by = 1), y = as.numeric(BKProbAvg), color = "BK")) + geom_line(aes(x = seq(1, 
    nrow(borough_Prob_DF2), by = 1), y = as.numeric(MNProbAvg), 
    color = "MN")) + geom_line(aes(x = seq(1, nrow(borough_Prob_DF2), 
    by = 1), y = as.numeric(QNProbAvg), color = "QN")) + geom_line(aes(x = seq(1, 
    nrow(borough_Prob_DF2), by = 1), y = as.numeric(BXProbAvg), 
    color = "BX")) + geom_line(aes(x = seq(1, nrow(borough_Prob_DF2), 
    by = 1), y = as.numeric(SIProbAvg), color = "SI")) + geom_line(aes(x = seq(1, 
    nrow(borough_Prob_DF2), by = 1), y = as.numeric(CITYProbAvg), 
    color = "CITY")) + labs(title = "Probable Cases / Total Cases", 
    x = "Day of Pandemic", y = "Case Percent") + guides(color = guide_legend(title = "Borough")) + 
    theme_few() + scale_color_tableau("Classic Blue-Red 6")

Above we find that while there is some variation by borough, in general the city moves as one in this regard. We will use the city total to convert confirmed to probable cases.

# Pull in dates
Prob_DF$Date <- Casecount_by_day_borough$date_of_interest

Prob_DF <- Prob_DF %>% select(Date, Prob_Rate)

Week_List <- CaseRate_DF$week_ending

Week_List <- as.data.frame(Week_List)

# Only include avg value per week
Prob_Rate_DF <- inner_join(Week_List, Prob_DF, by = c(Week_List = "Date"))

In our simulations, we will be using both Confirmed and Probable cases.

DataType <- "CP"

Unreported Cases

In addition to documented cases, we also need to consider cases which are not reported. The Center for Disease Control suggests that only 1 in 4.2 symptomatic COVID-19 infections were reported nationwide (“Estimated Disease Burden,” 2021). While this may seem high, it is conservative when compared to the work of some researchers. Liu (2020) leveraged SIR modeling to find that “there might exist 19.5 cases undiagnosed while one infection reported in US counties averagely” (p. 6). It is worth noting that this 19.5 figure applies to all cases, not just symptomatic cases as with the figure from the CDC. Additionally, Liu found that this Unreported Infection rate in New York state was significantly below the national average at 7.2 (p. 13).

Considering these sources, we will run our model with the assumption that 1 in every 4 cases is reported. As we are using using both probable and confirmed cases, as well as asymptomatic cases, this will likely result in a value between that of the CDC and the New York average from the work of Liu (2020).

# The convention of this variable is a little unituitive:
# Total_Rate = Reported_Rate * (100 + Unreported_Rate)/100

Unreported_Rate <- 300

Now that we have a handle on our data, we can write a function to produce a tidy table which can later be used to find the percentage of the population in a given area, that will receive a positive test result on a given day. We will pass the rate of unreported cases, as well as the type of cases (Probable + Confirmed, or Confirmed) as variables in a function. Furthermore, with this in place, we can run our model parameters, and nest the resulting data frame in a function to pull the rate for a given region.

CR_Week_Clean <- function(DataTypeX, Unreported_RateX) {
    # function takes following arugmenets: DataTypeX ('P' or
    # 'CP') as argument to indicate confirmed or
    # confirmed+probable cases Unreproted: (Unreported / Reported
    # Cases)*100.  Please note that 0 = all cases reported.  100
    # = unreported cases equal half total cases.
    
    # Function returns tidy table of percentage of popualtion to
    # test positive for COVID during given week.
    
    if (DataTypeX == "CP") {
        
        Rate <- as.data.frame(t(t(CaseRate_DF[2:184]) * as.numeric(Prob_Rate_DF$Prob_Rate)))
        
    } else {
        
        Rate <- CaseRate_DF[2:184]
        
    }
    
    Rate <- Rate * ((Unreported_RateX + 100)/100)
    Rate$week_ending <- CaseRate_DF$week_ending
    
    
    x <- Rate %>% pivot_longer(!week_ending, names_to = "Region", 
        values_to = "Rate")
    
    x$Rate <- x$Rate/1e+05/7
    x
    
}

# Generate DF
CR_Week_Clean_DF <- CR_Week_Clean(DataType, Unreported_Rate)

GetRate <- function(REGIONX, DATEX) {
    
    xx <- CR_Week_Clean_DF %>% filter(week_ending == DATEX, Region == 
        paste("CASERATE_", REGIONX, sep = "")) %>% select(Rate)
    
    return(as.numeric(xx))
    
}

Community Variables

The data above indicates the percent of the population in a given area that received a positive test result on a given day. We need to learn more about the nature of the broader community’s behavior around testing in order to use this to estimate what percentage of the population is contagious and would not expected to be aware of this status.

There are a few landmark days in the progression of a COVID Case. As a guide, we will be considering the following:

Exposure date: At some point, each eventual COVID positive individual is exposed to the virus. While there is some research on the incubation period of the virus, we are not concerned with this date, as there are more timely indicators that are perhaps more relevant for our purposes.
Symptomatic date: This is the date a COVID positive individual first presents with symptoms. We will be approximating other key events relative to this date in our model.
Contagious date: The date a COVID positive individual is first contagious.
Testing date: The date that an individual has their testing sample collected.
Results date: This is the date that a COVID Case is first reported, and the individual receives their results. This is also the date which their result will be reflected in the NYC Department of Health data.

Contagiousness

In a paper titled Presymptomatic Viral Shedding and Infective Ability of SARS-CoV-2; A Case Report we find that, unlike with the SARS-CoV and MARS-CoV, individuals presymptomatic with SARS-CoV-2 are often contagious (Nissen, 2020). This paper goes into great details about the specifics of the viral shedding and transmission of the pathogen, and highlights the limitations in antigen, PCR and antibody testing. As to when exactly one is contagious relative to the first day of symptom presentation, we have reason to believe that an individual is contagious for COVID two days prior to becoming symptomatic.

An article from the Journal of Virological Methods is concise in its findings that are relative to this point. The purpose of this paper is to outline the sensitivity and specificity of the Roche Antigen test at different cycle thresholds when compared to a SARS-CoV-2 PCR test. They then go on to touch on the antigen test’s ability to detect contagious individuals. After sharing how the antigen is less effective for cycle thresholds higher than 30, they go to add “differentiation between contagious and non-contagious individuals may be possible with this assay. Samples with Ct-values >30 usually do not allow culturing of the virus indicating low infectivity. Such individuals may be regarded non-contagious despite carrying low virus loads” (Krüttgen, 2021).

At the risk of oversimplifying, they seem to indicate that the Antigen test is less accurate at detecting those with higher cycle threshold but excels at detecting lower thresholds - which is useful as these individuals are contagious.

In a paper titled Put to the Test: use of Rapid Testing Technologies for COVID-19, Crozier has outlined various testing strategies in several nations. Additionally, they highlight best practices and limitations of Antigen testing, specifically sharing that Antigen testing can at best hope to identify COVID-19 1-2 days before an individual becomes symptomatic (Crozier, 2021). When you view this with our findings from the previous article, we can conclude that 2 days prior to becoming symptomatic, an individuals cycle thresholds are below 30, at a level that implies contagiousness and also allows them to be tested via a Rapid Antigen test.

While combining differing sources related to different antigen tests may increase the risk of inaccurate findings, it is worth noting that these findings are supported by the Center for Disease Control. In the appendix of their expanded screening protocol, they define a newly positives individual’s “close contacts” and note that “the infected person 48 hours before can spread SARS-CoV-2 starting from 2 days before they have any symptoms” (“Overview,” 2021).

# Symptom Lag = gap between becoming contagious and
# presenting symptoms. Hard Coded, not to be adjusted
Symptom_Lag = 2

Testing Lag

(Symptom Onset to Sample Collection Date) Using data from the NYC Department of Health, we can find how many people in a given area received positive tests results on a given day. The DOH also has figures related to testing processing time which allows us to approximate the testing date for each of these individuals. As our model is concerned with the total number of days an individual is contagious in the workplace, we need to find figures related to the median delay from symptom onset to testing date.

While this information is not available from the NYC DOH, it is provided in a national patient-level data set by the Center for Disease control. This data set, provided by CDC Case Surveillance Task Force and titled COVID-19 Case Surveillance Public Use Data has several elements related to all COVID-19 cases shared with the CDC, including most notably, symptom onset date and test-sample collection date.

CDCData <- read.csv("C:\\Users\\bloom\\Downloads\\CDCCOVID.csv")

CDCData2 <- subset(CDCData, onset_dt != "" & pos_spec_dt != "" & 
    cdc_report_dt != "")

CDCData2$diff <- ""

CDCData2$diff <- as.numeric(difftime(CDCData2$pos_spec_dt, CDCData2$onset_dt, 
    units = "days"))

CDCData2$diff2 <- as.numeric(difftime(CDCData2$cdc_report_dt, 
    CDCData2$onset_dt, units = "days"))



CDCData3 <- subset(CDCData2, diff >= 0 & diff < 100)

ggplot(CDCData3, aes(x = diff)) + geom_histogram(binwidth = 1) + 
    xlim(-1, 25) + theme_few() + theme(axis.text.y = element_blank(), 
    axis.ticks.y = element_blank(), plot.title = element_text(size = 8)) + 
    labs(title = "Test Day minus \n First Day of Symptoms", x = "Lag", 
        y = "Count")

A rough analysis shows that after filtering for patients whose symptom onset date was not an outlier or a null value indicating they were asymptomatic, that many individuals were tested on their first day of symptoms. This is problematic, as it is possible human error in reporting may have led to asymptomatic individuals being counted among the symptomatic population, with the date of testing used in place of the symptom onset date. With all individuals who were tested on their first date of symptoms included, our median delay from symptom onset to test date was two days. When we excluded this population, we saw this median delay increase to three days. Further review of the paperwork used to report this data to the CDC proved that the location of this question on the form was clearly in a section intended for symptomatic individuals, and the wording was quite clear this was to be ignored for asymptomatic individuals, giving us more confidence in the reported data and in using a 2-day delay for our median value.

# Test Lag = gap between becoming symptomatic and getting
# tested. This is Hard Coded as it represents community, not
# population

Test_Lag = 2

Results Lag

(Test Date to Report Date) Finally, we need to account for the delay between sample collection and reported result. This measure, in days, is available from the NYC DOH on a weekly basis. This is convenient as it is the same source and format as our case rate table.

x <- "LagTime"
# x <-
# 'https://raw.githubusercontent.com/ChristopherBloome/coronavirus-data/master/trends/testing-turnaround.csv'
LagTime <- read.csv(x)
LagTime$week_end <- mdy(LagTime$week_end)

Conclusions

It is worth taking a step back to summarize the above, for the average COVID case we can calculate the following relative to the date symptoms are first presented:

Exposure date: Not Relevant
Contagious date: Symptomatic Date - 2 days.
Symptomatic date: Day 0 for our purposes
Testing date: Symptomatic Date + 2
Results date: Symptomatic Date + 2 + value found in Lag Time DF.

Global Variables

Test Sensitivity

We will also incorporate a realistic measure of test sensitivity and account for false negatives. Test sensitivity is the rate at which True Positives are found in the total diseased population: TP / (TP + FP). Our workplace will be exclusively leveraging Rapid Antigen Assays, of which there were 4 approved for use in the United States in 2020: the Abbott BinaxNOW COVID-19 Ag Card, the Becton Dickinson Veritor System, the LumiraDx SARS-CoV-2 Ag test; and the Quidel Sofia SARS Antigen IFA (“SARS-CoV-2 Point of Care,” 2021). While each of these manufacturers publishes test sensitivity and specificity, peer reviewed studies verifying these are varied, in part due to how the accuracy of the test relates to the conditions of testing. We will want to consider the available data before landing on a value for this variable in our model.

Starting with the least sensitive assay, the BinaxNOW Card, Abbott publishes a specificity of 77% across all specimens, with a specificity as high as 83% on the first day of symptoms (“BinaxNOW,” p. 12). This is partially backed by peer review literature. One study found that for “participants who were within 7 days of symptom onset, the BinaxNOW antigen test sensitivity was 71.1%” (“Evaluation of Abbott,” 2020). Moving in order of increasing sensitivity, according to the documentation published by the manufacturer and made available by the FDA, the sensitivity of the BD Veritor System is 84% (“Quick Reference Instructions for BD Veritor,” 2020). The comparable documentation of this type related to the Quidel Sofia SARS Antigen IFA is less straight forward: its sensitivity is 96.7%, however, they include the following disclaimer: “the performance of this test has not yet been clinically validated for use in patients without signs and symptoms of respiratory infection or for serial screening applications, and performance may differ in these populations” (“Sofia SARS”, 2020). Perhaps the test with the best documentation and supporting evidence is the test from LumiraDx. Its published sensitivity is 100% for those in the study with symptom presentation fewer than 4 days before testing date (“SARS-CoV-2 Ag Test Strip Product Insert,” 2020). For the entire study, its average sensitive was 97.4%. This is only partially backed up in peer reviewed literature. In a study comparing several testing methods, Kohmer (2021) found the sensitivity of this test was 82.4%, those there was significant variation in testing timing when compared to the study published by the FDA and the study included asymptomatic individuals.

With such a range of sensitivity rates, it is difficult to feel confident in any one value. Thinking through the specifics of these testing studies, it is clear that even for the lower performing tests, the sensitivity rate decreases the longer one waits from symptom onset. With this in mind, we will use a 90% sensitivity rate in our model.

FN <- 0.9

Asymptomatic Rate

The correlation between asymptomatic and unreported cases provides a significant challenge when building a model and assigning values to variables. These variables, however, have unique purposes in our model. When thinking about the entire community – in our case, the population of New York City - the unreported case count scales the total positive rate. As this figure increases, so does the probability that any one of our office workers will become contagious on a given day. The asymptomatic rate does not impact our figures for community transmission or spread.

With that in mind, we note that the rate of asymptomatic cases is of concern within the office worker population when calculating the total number of days the workforce would be contagious. If an individual is asymptomatic, we would only expect them to undergo a COVID test if one were mandated. Additionally, as “asymptomatic infections have the same infectivity as symptomatic infections,” an asymptomatic office worker is likely to result in many additional infections within the workplace (Gao et al., 2020).

Regarding the proportion of asymptomatic individuals in the total infected population, there seems to be some agreement. An extensive analysis titled Proportion of Asymptomatic Coronavirus Disease 2019: A Systematic Review and Meta‐Analysis reviewed over 40 studies encompassing more than 50,000 cases and found that asymptomatic ratio to be 15.4% (He et al., 2020). This is largely in line with the CDC estimate of asymptomatic cases in the United States. Their preliminary estimate of the total disease burden from February to December 2020 indicates the US had approximately 83 million cases, with 70 million of these presenting symptoms at one point during the illness (“Estimated Disease Burden,” 2021). This translates to an asymptomatic ratio of 15.6%.

We will use 16% in our model.

AsymptomaticRate <- 0.16
# Translating for ease of use in model.
AsymptomaticRate <- 100 * round((1/(1 - AsymptomaticRate)) - 
    1, 2)

Maximum Contagiousness Period

We are additionally concerned with the duration in days of one’s communicability period: the period in which they are contagious. In our workplace, if one never tests for COVID, we assume they continue to arrive at work indefinitely, however, our response variable is the quantity of days one is contagious. In a review of 165 studies on the subject, Park (2021) found that the median communicability period was 9 days, with a maximum value of 32 days. This is further supported by other sources, one of which indicated that this period is “usually between 2-14 days” (Ayenigbara, 2020). We will use 10 days in our model.

Max_Contagious <- 10

Antibodies / Turnover

Currently, science suggests that once you have had COVID and recovered, you will not get the same strain a second time. While we could incorporate this type of thinking into our model, it would require us to consider employee turnover metrics. For simplicity sake, we can neglect this altogether, and simply consider the probability that each individual is among the infectious population in their home ZIP Code on a given day, regardless of feasibility. To make this more palatable, in some workplaces we could reasonably expect some turnover during a given year.

Population Variables

Now that we have data regarding the behavior of the community, we can move on to the behavior of our relevant population. Before moving forward - I want to highlight how these differ from the similar variables above. Through research, we found the community average for many timing-related variables: specifically, the average amount of time from when a person is first symptomatic till they receive a test result. This is helpful in calculating community spread and the prevalence of unaware-contagious individuals in a population. Below we will set parameters related to a workplace’s population. These can be impacted by way of workplace attitudes and HR policies around testing and time off.

Each of these variables below will be an adjustable input in our model. These can be thought of our as our independent variables, with the total count of infectious days a response variable.

Mandatory Testing Cadence

If testing is to be made mandatory, this variable will indicate how many total calendar days are to pass between testing intervals.

TestInterval <- 14

Testing Lag

(Symptom Onset to (Symptomatic) Sample Collection Date) This variable is largely reflective of company culture. Consider a workplace with strict PTO policies, which might require the classic “Doctor’s note” before taking a day off. While this might sound particularly barbaric in a post-COVID society, its easier to picture this when you consider that this might be coupled with regular mandatory testing. In any case, we would expect a workplace like this to have a significantly longer gap between symptom onset and test date than a more progressive workplace with more liberal time off policies. Additionally, many workplaces are leveraging a “badge system” to acquire testing data and prompt their employees to reflect on their own health. This typically consists of a series of questions posed at regular or daily intervals, regarding ones wellness. The hope here is that this, coupled with regular reminders about the dangers of coronavirus symptoms, will lead to more regular testing and early detection.

Pop_Lag <- 3

Testing PTO

Similarly, this variable (a 1 or 0) indicates if one is expected to work in office on a testing date, for both regularly scheduled and symptomatic testing. Effectively, this changes the convention from testing after work, to testing before work on the date for which the test is attributed. While the impact is negligible for routine/non-symptomatic tests, moving the test before work (or giving your team the morning off for a test on days which they are not feeling well) shortens the “symptomatic and contagious” period by a day.

# 1 = Test after work. Must work on 'test day'
Test_After_Work <- 0

Employee demographics

As mentioned above, we have testing data on the ZIP Code, borough and “all NYC” level. We will be passing the count of employees in each region as variables in our model. Logistically, this will appear as a list, with each employee represented by the presence of one instance of their region.

Employee_Demo <- c(rep("BX", 5), rep("BK", 5), rep("QN", 5), 
    rep("MN", 5), rep("SI", 5))

Start/End Date

Finally, we need to specify the period in which we want to analyse.

StartDate <- mdy("8/1/2020")
EndDate <- mdy("3/1/2021")

Methodology

With all of the above defined, we can write a function to simulate an office and return data on the contagiousness of its workers.

One Individual

We will start with a function that calculates this for one office worker.

# Build DF Our Function will leverage. Only need to create it
# once, hence it is outside funciton.

# Create Vector with all relevant dates
DateL = seq(from = StartDate, to = EndDate, by = 1)
# As we are revieing a period, we will calc odds FIRST
# contagious on each day.  Create Vector with date we would
# expect someone to be tested.
test_date <- DateL + Symptom_Lag + Test_Lag
# Create Vector containing date results returned To do this,
# we need Lag Date, df summarized by week. Vector below
# translates test date to date in report.
test_date_report_date <- 7 - wday(test_date) + test_date
# Vector with Lag days per test date.
lag_days_fx <- function(listx) {
    
    LagTime %>% filter(week_end == listx) %>% select(lag_median) %>% 
        as.numeric()
}


lag_days <- unlist(lapply(test_date_report_date, lag_days_fx))
# Vector with daet results returned (this is how result DB is
# sorted)
Result_DateL <- test_date + lag_days
Result_DateL <- 7 - wday(Result_DateL) + Result_DateL

# Build DF.
Model_DF <- data.frame(DateL, Result_DateL)

names(Model_DF) <- c("Date", "Result_Date")
# Remove dates for which results are not avail.
Model_DF <- Model_DF %>% filter(!is.na(Result_Date))




# Write Function which takes Region, Start and End date as
# explicit arguments, plus other parameters defined previous,
# runs monte carlo simulation, and returns DF containing
# various COVID related outcomes for entire duration.

MonteCarlo <- function(RegionX, StartDateX, EndDateX) {
    # Create/clear column containing probability of first
    # becoming contagious on given day.
    Model_DF$Prob <- ""
    # Run GetRate function once per row(day). Function supplies
    # probability.
    Prob <- unlist(lapply(Model_DF$Result_Date, GetRate, REGIONX = RegionX))
    Model_DF$Prob <- Prob
    # Rand Number
    Model_DF$Rand <- runif(nrow(Model_DF), 0, 1)
    
    # if Rand < Prob, COVID Contagious.
    Model_DF$Result <- Model_DF$Rand < as.numeric(Model_DF$Prob)
    # Filter our rows forwhich no data avail. Should be
    # redundant.
    Model_DF <- Model_DF %>% filter(!is.na(Prob))
    
    
    
    # First Test Date always a Monday. Ensures consistancy
    TestDay <- 1
    while (wday(Model_DF$Date[TestDay]) != 2) {
        TestDay = TestDay + 1
    }
    
    # Populate Test Column: indicates if day for regularly
    # scheduled test
    Model_DF$Test <- FALSE
    
    
    while (TestDay < nrow(Model_DF)) {
        Model_DF$Test[TestDay + Test_After_Work] <- TRUE
        TestDay <- TestDay + TestInterval
        
    }
    
    # Adds needed Columns with default values
    Model_DF$FalseNeg <- NA
    Model_DF$Notes <- NA
    Model_DF$ContagiousDay <- 0
    Model_DF$SX <- NA
    
    
    
    
    # Write Function to simulatate Test, determin if accuate of
    # False Negative.  Takes Row Number (of DF) and data related
    # to test type, returns DF with test result and notes column
    # updated.  Should overwrite DF.
    Test_Func <- function(RowNum, notex) {
        Model_DF$FalseNeg[RowNum] <- runif(1, 0, 1)
        
        # If not False Negative
        if (Model_DF$FalseNeg[RowNum] < FN) {
            Model_DF$ContagiousDay[RowNum] = 0
            
            # If IS False Negative
        } else {
            Model_DF$Notes[RowNum] = paste("False Negative -", 
                notex)
        }
        
        return(Model_DF)
    }
    
    # Write Function to see if Symptomatic.  Takes Symptomatic
    # Rate as input. Returns 'SX' or 'NSX'
    Sx_Func <- function(AsymptomaticRate) {
        x <- runif(1, 0, 100 + AsymptomaticRate)
        if (x < 100) {
            "SX"
        } else {
            "NSX"
        }
    }
    
    
    
    # If not positive tests, can skip remainder of function.
    if ({
        x <- Model_DF %>% filter(Result == TRUE) %>% nrow()
        
        x == 0
    }) {
        Model_DF$ContagiousDay <- 0
    } else {
        # If not, simulation continues:
        
        for (i in 1:nrow(Model_DF)) {
            
            # Start with setting contagious day Contagious Day = Days
            # since first contagious.  Later assesed and revised if
            # needed, ie in case of pre-work positive test result.  First
            # Day
            if (i == 1) {
                if (Model_DF$Result[1] == TRUE) {
                  Model_DF$ContagiousDay[1] <- 1
                  Model_DF$SX[i] <- Sx_Func(AsymptomaticRate)
                } else {
                  Model_DF$ContagiousDay[1] <- 0
                }
                # If already contagious
            } else if (Model_DF$ContagiousDay[i - 1] > 0) {
                Model_DF$ContagiousDay[i] <- Model_DF$ContagiousDay[i - 
                  1] + 1
                Model_DF$SX[i] <- Model_DF$SX[i - 1]
                # If Newly contagious
            } else if (Model_DF$Result[i] == TRUE) {
                Model_DF$ContagiousDay[i] <- 1
                Model_DF$SX[i] <- Sx_Func(AsymptomaticRate)
                # If not contagious
            } else {
                Model_DF$ContagiousDay[i] <- 0
            }
            
            
            # Scheduled test - SX irrelevant
            if (Model_DF$Test[i] == TRUE & Model_DF$ContagiousDay[i] > 
                0) {
                Model_DF$Notes[i] <- "Scheduled Test"
                Model_DF <- Test_Func(i, "Scheduled Test")
                
                # SX Test
            } else if (Model_DF$ContagiousDay[i] == (Symptom_Lag + 
                Pop_Lag) & Model_DF$SX[i] == "SX") {
                Model_DF$Notes[i] <- "SX Test"
                Model_DF <- Test_Func(i, "SX Test")
            }
            
            # If contagious more than maximum contagious days, stop
            # count.
            if (Model_DF$ContagiousDay[i] > Max_Contagious) {
                Model_DF$ContagiousDay[i] = 0
                Model_DF$Notes[i] = "Undetected Positive"
            }
            
            
        }
        
        # If simulation ends with someone still contagious, note.
        if (Model_DF$ContagiousDay[nrow(Model_DF)] > 0) {
            Model_DF$Notes[nrow(Model_DF)] <- "Currently Contagious - Undetected"
        }
        
    }
    # Remove all irrelevant data.  Keep any contagious days or
    # days with notes.
    return(filter(Model_DF, ContagiousDay > 0 | !is.na(Notes) | 
        !is.na(SX)))
}

Entire Office, One Simulation

With a function to simulate one person during the relevant period, we can now generate a function to run this for each person in the office. This will take our Employee Demo as an argument, with each person’s region listed once in the Employee Demo vector.

MonteMacro <- function(RegionL, StartDateX, EndDateX, SimX) {
    # Takes Region (List of regions) and period start and end
    # date, runs MonteCarlo Macro for each and merges values
    # Takes SimX as input, this is for running this macro several
    # times.
    
    # Generate Empty DF
    x <- data.frame(matrix(NA, nrow = 0, ncol = 10), stringsAsFactors = FALSE)
    # names(x) <-
    # c('Date','Result_Date','Prob','Rand','Result','Test','FalseNeg','Notes','ContagiousDay
    # SX') Create list of DFs from MonteCarlo Macro.  Generating
    # List of DFs and Rbind-ing via reduce is ~30% faster than
    # Rbind-ing after each.  This will likley be run several
    # times.
    xtest <- lapply(RegionL, MonteCarlo, StartDateX = StartDateX, 
        EndDateX = EndDateX)
    # print(xtest) Merge DFs into one
    rows <- Reduce(rbind, xtest)
    x <- rbind(x, rows)
    
    # From Merged DF, titled X, we need two sets of info.  Y
    # Summary: Tracks cumiliative contagious days in period.
    # Used in plotting Z Summary: Reports on testing data:
    # Scheduled vs Symptomatic tests Notes any False Negatives,
    # undetecteds Number of days individual was contagious
    
    
    # Create Y summary Calculates Cumiliative contagious days
    # across all office workers
    y <- x %>% filter(ContagiousDay > 0) %>% count(Date) %>% 
        mutate(cumsum = cumsum(n))
    # Add first and last day.
    y <- rbind(y, list(StartDateX, 0, 0), list(EndDateX, 0, max(y$cumsum)))
    
    y$SimNum <- SimX
    
    # Create Z summary a is vector of row numbers we want to keep
    # from x.
    a <- NA
    # keep if last day contagious in a row, or notes re: testing.
    for (i in 1:(nrow(x) - 1)) {
        if (x$ContagiousDay[i] > x$ContagiousDay[i + 1] | !is.na(x$Notes[i])) {
            a <- c(a, i)
        }
    }
    if (x$ContagiousDay[nrow(x)] > x$ContagiousDay[nrow(x) - 
        1] | !is.na(x$Notes[nrow(x)])) {
        a <- c(a, nrow(x))
        
    }
    # remove leading NA
    a <- a[-1]
    # pull rows in a from x
    z <- x[a, ]
    
    # Contagious Day column drops to zero on day of test.  We
    # want the number of total days contagious before test Below
    # we pull value from previous row if drop to zero This is to
    # report on testing stats.
    
    # e is vector of rows to remove.
    e = NA
    for (i in nrow(z):2) {
        if (z$Date[i] - 1 == z$Date[(i - 1)] & z$ContagiousDay[i] == 
            0 & is.na(z$Notes[(i - 1)])) {
            e <- c(e, i - 1)
            z$ContagiousDay[i] <- z$ContagiousDay[(i - 1)]
        }
    }
    
    e <- e[-1]
    
    # remove e from z DF. Select relevant columns
    z <- z[-e, ] %>% select(Date, Notes, ContagiousDay, SX)
    
    names(z) <- c("Date", "Type_Of_Test", "Quantity_Contagious_Days", 
        "SX")
    z$SimNum <- SimX
    row.names(z) <- NULL
    
    # Join y and z as x
    
    x <- full_join(y, z)
    
    return(x)
}

Entire Office, Repeated

With this in place, we can nest this function to run repeatedly and compare office contagious metrics under different conditions and/or office policies.

MonteSim_New <- function(RegionL, StartDateX, EndDateX, NumTimes) {
    # Takes Region (List of regions) and period start and end
    # date, Runs MonteCarlo Macro for each and merges values
    # Takes NumTimes as input, this is quantity of times we want
    # to simulate under identical conditions.
    
    # Generate Empty DF
    MonteSimDFX <- data.frame(matrix(NA, nrow = 0, ncol = 12), 
        stringsAsFactors = FALSE)
    # Run MonteMacro, increasing SimX input with each simulation.
    # Geneartes one DF with all results.
    for (i in 1:NumTimes) {
        x <- MonteMacro(RegionL, StartDateX, EndDateX, i)
        # print(x)
        MonteSimDFX <- rbind(MonteSimDFX, x)
        
    }
    
    return(MonteSimDFX)
}

Summary Metrics

Finally, we will write a function to clean up and contextualize our summary statistics. The output here will be a high level “executive summary” of how the office fared, on average, under the specified conditions. This function will be titled SummaryStats

SummaryStats <- function(SimDF, NumTimes) {
    # Takes DF as input and quantity of times, returns DF with
    # summary statistics.
    
    # Most straight forward appraoch is to calculate aggregate
    # statistics, and then divide by quantity of simulations to
    # find average.
    
    # As first input is quantity of similations, we will
    # initially list this as itself squared.
    
    SumDF <- data.frame(c("Quantity of simulations", "Total Contagious Days", 
        "Contagious Work Days (5-day week)", "Positive Sx Test Cnt", 
        "Positive Scheduled Test Cnt", "Undetected Positive Cases", 
        "False Negative Tests", "Avg Contagious Work Days per Test (inc Undetected)", 
        "Symptomatic Cases", "Asymptomatic Cases"), c(NumTimes^2, 
        sum(SimDF$Quantity_Contagious_Days, na.rm = TRUE), sum(SimDF$Quantity_Contagious_Days, 
            na.rm = TRUE) * 5/7, nrow(filter(SimDF, Type_Of_Test == 
            "SX Test")), nrow(filter(SimDF, Type_Of_Test == "Scheduled Test")), 
        nrow(filter(SimDF, Type_Of_Test == "Undetected Positive")), 
        nrow(filter(SimDF, Type_Of_Test == "False Negative")), 
        sum(SimDF$Quantity_Contagious_Days, na.rm = TRUE) * 5/7/(nrow(filter(SimDF, 
            Type_Of_Test == "SX Test")) + nrow(filter(SimDF, 
            Type_Of_Test == "Scheduled Test"))), nrow(filter(SimDF, 
            SX == "SX")), nrow(filter(SimDF, SX == "NSX"))))
    
    names(SumDF) <- c("a", "b")
    
    SumDF$b <- round(SumDF$b/NumTimes, digits = 2)
    
    names(SumDF) <- NULL
    row.names(SumDF) <- NULL
    
    return(SumDF)
    
}

Experimentation

Office Types

With our functions written above, we can see how various approaches to workforce management can be expected to impact COVID cases in the workplace. For the purposes of illustration, we will consider 3 different approaches:

Low Trust: I think we have all worked in, or can imagine a “low trust” workplace. This is a business which might require a “Doctors Note” when taking a day off due to illness. For our purposes, we will say that this office has mandatory testing once a week, and whose employees will not seek any additional testing if symptoms happen to develop off-cycle.
High Trust: This is meant to embody an emerging trend in workplace culture. With increased amenities (ping pong tables, catered food) and privileges such as unlimited PTO, comes trust on half of the employer in its workforce. This workplace has no mandatory testing, but gives its workforce the ability to take a day off for testing if they feel even remotely symptomatic.
Medium Trust: A blend of the above. This workforce has mandatory testing every other week. This mandating testing results in a slight delay over the national average in testing to 4 days (over the national average of 2).

# Standardize Variables
Test_After_Work <- 1

StartDate <- mdy("8/1/2020")

# Low Trust Office
TestInterval <- 7

Pop_Lag <- 10

LowTrust_T <- MonteSim_New(Employee_Demo, StartDate, EndDate, 
    30)


# 'Medium Trust' Office
TestInterval <- 14

Pop_Lag <- 4

MedTrust_T <- MonteSim_New(Employee_Demo, StartDate, EndDate, 
    30)

# 'High Trust' Office
TestInterval <- 30

Pop_Lag <- 1

HighTrust_T <- MonteSim_New(Employee_Demo, StartDate, EndDate, 
    30)


# Add Sim Type
LowTrust_T$SimType <- "Low"
MedTrust_T$SimType <- "Med"
HighTrust_T$SimType <- "High"

# Join, adjust
Office_Type_DF <- rbind(LowTrust_T, MedTrust_T, HighTrust_T)
Office_Type_DF$SimType <- factor(Office_Type_DF$SimType, levels = c("Low", 
    "Med", "High"))

p1 <- Office_Type_DF %>% filter(!is.na(cumsum)) %>% ggplot(aes(x = Date, 
    y = cumsum)) + geom_line(aes(color = as.factor(SimNum))) + 
    geom_smooth(method = "lm") + xlim(StartDate, EndDate) + ylim(0, 
    NA) + theme_few() + theme(legend.position = "none", axis.text.x = element_text(angle = 90, 
    vjust = 0.5, hjust = 1)) + labs(title = "Contagious Days \n as function of trust level.", 
    x = "Date", y = "Cumulative Contagious Days") + facet_wrap(~SimType)

Summary_DF_Office <- Office_Type_DF %>% group_by(SimNum, SimType) %>% 
    summarise(max(cumsum, na.rm = TRUE))

names(Summary_DF_Office) <- c("SumNum", "SimType", "Total_Cont_Days")

p2 <- Summary_DF_Office %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(SimType), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "Total Contagious Days \n 25 Person Office. Many Simulations", 
    y = "Office Trust Level", x = "Total Contagious Days") + 
    theme(legend.position = "none")


grid.arrange(p1, p2, ncol = 1)

Well, this was a bit of a surprise. I firmly believed that trust would be correlated with a decrease in COVID risk. That, we would find decreasing the lag between when a person becomes contagious and when they are tested would have a considerable impact on the our response variable, and that decreasing the interval periodic testing would have a marginal effect. Our initial findings do not appear to suggest this is true!

It is worth noting that this is a simple simulation, and in a real world situation, we would expect some asymptomatic testing in a high trust workplace, however, noting the increased rate and increased variation seems to indicate that mandated asymptomatic testing may be of value. Specifically, this initial result seems to indicate that increased scheduled testing comes a smaller standard deviation in the quantity of contagious days, and a lower mean count. We also see that the data tends to be bimodal across all trust environments, with a second peak on the higher end of the range well beyond the mean.

Full Study

Let’s take a deeper dive and isolate these variables. We will proceed by running our model under several different conditions.

# Run Simulation
    # Set Variables 
        StartDate <- mdy("1/1/2021")
        Test_After_Work <- 1
        
    # One Week Testing Intervals 
        TestInterval <- 7
        
            # Pop Lag 1
                Pop_Lag <- 1
                One_One <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)

            # Pop Lag 2
                Pop_Lag <- 2
                One_Two <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
              
            # Pop Lag 3
                Pop_Lag <- 3
                One_Three <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
                
            # Pop Lag 5
                Pop_Lag <- 5
                One_Five <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
  
            # Pop Lag 7
                Pop_Lag <- 7
                One_Seven <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)

                
        
    # Two Week Testing Intervals 
        TestInterval <- 14
        
            # Pop Lag 1
                Pop_Lag <- 1
                Two_One <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 2
                Pop_Lag <- 2
                Two_Two <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 3
                Pop_Lag <- 3
                Two_Three <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)

            # Pop Lag 5
                Pop_Lag <- 5
                Two_five <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 7
                Pop_Lag <- 7
                Two_Seven <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
                
    # Three Week Testing Intervals 
        TestInterval <- 21
        
            # Pop Lag 1
                Pop_Lag <- 1
                Three_One <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 2
                Pop_Lag <- 2
                Three_Two <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 3
                Pop_Lag <- 3
                Three_Three <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 5
                Pop_Lag <- 5
                Three_Five <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)

            # Pop Lag 7
                Pop_Lag <- 7
                Three_Seven <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
        
        
    # Four Week Testing Intervals 
        TestInterval <- 28
        
            # Pop Lag 1
                Pop_Lag <- 1
                Four_One <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 2
                Pop_Lag <- 2
                Four_Two <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 3
                Pop_Lag <- 3
                Four_Three <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 5
                Pop_Lag <- 5
                Four_Five <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)
                
            # Pop Lag 7
                Pop_Lag <- 7
                Four_Seven <- MonteSim_New(Employee_Demo,StartDate,EndDate, 30)



# Clean Data 
      # Add Sim Types
        One_One$SimType <- "One_One"
        One_Two$SimType <- "One_Two"
        One_Three$SimType <- "One_Three"
        One_Five$SimType <- "One_Five"
        One_Seven$SimType <- "One_Seven"
        
        Two_One$SimType <- "Two_One"
        Two_Two$SimType <- "Two_Two"
        Two_Three$SimType <- "Two_Three"
        Two_five$SimType <- "Two_five"
        Two_Seven$SimType <-"Two_Seven"
        
        Three_One$SimType <- "Three_One"
        Three_Two$SimType <- "Three_Two"
        Three_Three$SimType <- "Three_Three"
        Three_Five$SimType <- "Three_Five"
        Three_Seven$SimType <- "Three_Seven"
        
        Four_One$SimType <- "Four_One"
        Four_Two$SimType <- "Four_Two"
        Four_Three$SimType <- "Four_Three"
        Four_Five$SimType <- "Four_Five"
        Four_Seven$SimType <- "Four_Seven"
        
        
# Join
        Grid_Sim_DF2Final <- rbind(One_One, One_Two, One_Three, One_Five, One_Seven, Two_One, Two_Two, Two_Three, Two_five, Two_Seven, Three_One, Three_Two, Three_Three, Three_Five, Three_Seven, Four_One, Four_Two, Four_Three, Four_Five, Four_Seven)

# Factor 
        Grid_Sim_DF2Final$SimType <- factor(Grid_Sim_DF2Final$SimType, levels = c(
           "One_One",
           "One_Two",
           "One_Three",
           "One_Five",
           "One_Seven",
          
           "Two_One",
           "Two_Two",
           "Two_Three",
           "Two_Five",
          "Two_Seven",
          
           "Three_One",
           "Three_Two",
           "Three_Three",
           "Three_Five",
           "Three_Seven",
          
           "Four_One",
           "Four_Two",
           "Four_Three",
           "Four_Five",
           "Four_Seven"))

Grid_Sim_DF2Final %>% filter(!is.na(cumsum)) %>% ggplot(aes(x = Date, 
    y = cumsum)) + geom_line(aes(color = as.factor(SimNum))) + 
    geom_smooth(method = "lm") + xlim(StartDate, EndDate) + ylim(0, 
    NA) + theme_few() + theme(legend.position = "none", strip.background = element_blank(), 
    strip.text.x = element_blank(), axis.text.x = element_blank(), 
    axis.ticks.x = element_blank()) + labs(title = "", x = "", 
    y = "Cumulative Contagious Days") + facet_wrap(~SimType)

SummaryStats(Grid_Sim_DF2Final, 600)

##                                                             
## 1                             Quantity of simulations 600.00
## 2                               Total Contagious Days  25.37
## 3                   Contagious Work Days (5-day week)  18.12
## 4                                Positive Sx Test Cnt   2.58
## 5                         Positive Scheduled Test Cnt   2.32
## 6                           Undetected Positive Cases   0.47
## 7                                False Negative Tests   0.00
## 8  Avg Contagious Work Days per Test (inc Undetected)   0.01
## 9                                   Symptomatic Cases   5.14
## 10                                 Asymptomatic Cases   0.89

The above exhibit contains 20 charts, each representing the results of 30 instances of our model under different conditions.

Each row indicates that the office modeled with mandatory testing every x number of weeks for x = 1, 2, 3, 4. Row 1 = contains mandatory testing each week, Row 2 = every second week and so forth.

Similarly, each column indicates a different number of lapsed days from when the employees become symptomatic till when they are tested, for values of 1, 2, 3, 5, 7. Column 1 indicates the model was run where the employees are tested on the first day they are symptomatic, Column 2 after 2 days etc.

There is not anything too surprising here, we see our total case count increase as we increase the weeks between mandatory tests or the delay from symptom onset to symptomatic tests. Lets summarize this a little differently to see if one measure leads to greater risk mitigation than the other.

Summary_DF <- Grid_Sim_DF2Final %>% group_by(SimNum, SimType) %>% 
    summarise(max(cumsum, na.rm = TRUE))

names(Summary_DF) <- c("SumNum", "SimType", "Total_Cont_Days")

HelperDF <- data.frame(unique(Grid_Sim_DF0$SimType), c(1, 1, 
    1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4), c(1, 
    2, 3, 5, 7, 1, 2, 3, 5, 7, 1, 2, 3, 5, 7, 1, 2, 3, 5, 7))

names(HelperDF) <- c("SimType", "Week_Test", "SX_Lag")

Summary_DF <- inner_join(Summary_DF, HelperDF)

p1 <- Summary_DF %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(Week_Test), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "Total Contagious Days \n 25 Person Office. Many Simulations", 
    y = "Wks Between \n Scheduled Tests", x = "Total Contagious Days") + 
    theme(legend.position = "none")


p2 <- Summary_DF %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(SX_Lag), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "Total Contagious Days \n 25 Person Office. Many Simulations", 
    y = "Days from Symptom Presentation \n to Testing", x = "Total Contagious Days") + 
    theme(legend.position = "none")

grid.arrange(p1, p2, ncol = 1)

Grouping by mandatory testing intervals shows that there is a significant increase in contagious days going from the first to the second week, meanwhile, grouping by lag time in days show fairly linear increases. We must remember that our x axis is not linear through the sample, as we model for a lag of 1, 2, 3, 5 and 7 days.

Things become a little more clear when we show both variables on the same plot:

# Summary_DF %>% ggplot(aes(SX_Lag, Total_Cont_Days, group =
# SX_Lag, fill = as.factor(SX_Lag))) + geom_boxplot() +
# labs(title='Total Contagious Days \n Jan through March
# 2020. 25 Person Office', x ='Days from Symptom Presentation
# to Testing', y = 'Total Contagious Days') + theme_few() +
# scale_fill_tableau('Classic Cyclic') +
# theme(legend.position = 'none')


p1 <- Summary_DF %>% ggplot(aes(x = SX_Lag, y = Total_Cont_Days, 
    color = as.factor(Week_Test))) + geom_jitter() + geom_smooth(aes(color = as.factor(Week_Test)), 
    method = "lm") + guides(color = guide_legend(title = "Wks Between \n Scheduled Tests")) + 
    labs(title = "Total Contagious Days \n Jan through March 2020. 25 Person Office", 
        x = "Days from Symptom Presentation to Testing", y = "Total Contagious Days") + 
    theme_few() + scale_color_tableau("Classic Blue-Red 6")

p2 <- Summary_DF %>% ggplot(aes(x = Week_Test, y = Total_Cont_Days, 
    color = as.factor(SX_Lag))) + geom_jitter() + geom_smooth(aes(color = as.factor(SX_Lag)), 
    method = "lm") + guides(color = guide_legend(title = "Days from Symptom \n Presentation to Testing")) + 
    labs(title = "Total Contagious Days \n Jan through March 2020. 25 Person Office", 
        x = "Wks Between \n Scheduled Tests", y = "Total Contagious Days") + 
    theme_few() + scale_color_tableau("Classic Cyclic")

# Summary_DF %>% ggplot(aes(Week_Test, Total_Cont_Days, group
# = Week_Test, fill = as.factor(Week_Test))) + geom_boxplot()
# + theme_few() + labs(title='Total Contagious Days \n Jan
# through March 2020. 25 Person Office', x ='Wks Between \n
# Scheduled Tests', y = 'Total Contagious Days') +
# theme_few() + scale_fill_tableau('Classic Blue-Red 6') +
# theme(legend.position = 'none')

grid.arrange(p1, p2, ncol = 1)

Plotting the weeks between scheduled tests on the X axis and grouping by our testing lag shows that we see fairly linear improvements as the weeks between scheduled tests increases. Meanwhile, when this convention is reversed, we find that testing each week yields increasingly more favorable results over other scheduled testing cadences as the testing lag increases.

This is particularly clear in the split box-plot below:

Summary_DF %>% ggplot(aes(as.factor(SX_Lag), Total_Cont_Days, 
    fill = as.factor(Week_Test))) + geom_boxplot() + theme_few() + 
    scale_fill_tableau("Classic Blue-Red 6") + guides(fill = guide_legend(title = "Wks Between \n Scheduled Tests")) + 
    labs(title = "Total Contagious Days \n Jan through March 2020. 25 Person Office", 
        x = "Days from Symptom Presentation to Testing", y = "Total Contagious Days")

Findings and Analysis

While mandating testing each week seems to be a very impactful way to decrease COVID presence in the workplace, there seems to be a high degree of variation in our samples. We will next run statistical analyses to see if this could be attributed to sample variation. Additionally, we can contextualize how mandatory testing at a weekly cadence compares in terms of performance to other testing cadence/testing lag schema.

We will be focusing on a few different subsets: Weekly Mandatory Testing, all testing lag ; Mandatory Testing every 2 weeks, all testing lag ; Mandatory Testing every 3 weeks, all testing lag ; Weekly Mandatory Testing, 5 or 7 day testing lag ; Mandatory Tests every 2 weeks, 1 or 2 day testing lag

w4_df <- Summary_DF %>% filter(Week_Test == 4)

w4 <- unlist(w4_df$Total_Cont_Days)

w3_df <- Summary_DF %>% filter(Week_Test == 3)

w3 <- unlist(w3_df$Total_Cont_Days)

w2_df <- Summary_DF %>% filter(Week_Test == 2)

w2 <- unlist(w2_df$Total_Cont_Days)

w1_df <- Summary_DF %>% filter(Week_Test == 1)

w1 <- unlist(w1_df$Total_Cont_Days)


d7 <- Summary_DF %>% filter(SX_Lag == 7)

d7 <- unlist(d7$Total_Cont_Days)


d5 <- Summary_DF %>% filter(SX_Lag == 5)

d5 <- unlist(d5$Total_Cont_Days)

d3 <- Summary_DF %>% filter(SX_Lag == 3)

d3 <- unlist(d3$Total_Cont_Days)

d2 <- Summary_DF %>% filter(SX_Lag == 2)

d2 <- unlist(d2$Total_Cont_Days)

d1 <- Summary_DF %>% filter(SX_Lag == 1)

d1 <- unlist(d1$Total_Cont_Days)

w1_d5_d7_df <- Summary_DF %>% filter(SX_Lag == 5 | SX_Lag == 
    7) %>% filter(Week_Test == 1)
w1_d5_d7 <- unlist(w1_d5_d7_df$Total_Cont_Days)

w2_d5_d7_df <- Summary_DF %>% filter(SX_Lag == 5 | SX_Lag == 
    7) %>% filter(Week_Test == 2)
w2_d5_d7 <- unlist(w2_d5_d7_df$Total_Cont_Days)

w2_d3_d5_d7_df <- Summary_DF %>% filter(SX_Lag == 5 | SX_Lag == 
    7 | SX_Lag == 3) %>% filter(Week_Test == 2)
w2_d3_d5_d7 <- unlist(w2_d3_d5_d7_df$Total_Cont_Days)

w2_d1_d2_df <- Summary_DF %>% filter(SX_Lag == 1 | SX_Lag == 
    2) %>% filter(Week_Test == 2)
w2_d1_d2 <- unlist(w2_d1_d2_df$Total_Cont_Days)

d1_d2_df <- Summary_DF %>% filter(SX_Lag == 1 | SX_Lag == 2)
d1_d2 <- unlist(d1_d2_df$Total_Cont_Days)

w3_d1_d2_df <- Summary_DF %>% filter(SX_Lag == 1 | SX_Lag == 
    2) %>% filter(Week_Test == 3)
w3_d1_d2 <- unlist(w3_d1_d2_df$Total_Cont_Days)

Weekly Testing vs Other

First, we can check that the quantity of total contagious days in simulations run in offices with mandatory weekly testing is in fact statistically lower than the broader sample. We will compare against mandatory testing at all intervals (1-4 weeks) and against simulations when the office was tested every other week.

t.test(w1, c(w1, w2, w3, w4))

## 
##  Welch Two Sample t-test
## 
## data:  w1 and c(w1, w2, w3, w4)
## t = -6.8949, df = 336.68, p-value = 2.671e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.281537 -4.605129
## sample estimates:
## mean of x mean of y 
##  17.25333  23.69667

t.test(w1, w2)

## 
##  Welch Two Sample t-test
## 
## data:  w1 and w2
## t = -5.1911, df = 262.68, p-value = 4.191e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.517258 -4.282742
## sample estimates:
## mean of x mean of y 
##  17.25333  24.15333

In both cases, we reject the alternative hypothesis that the difference in means is 0. Weekly testing yields a statistically fewer number of contagious days.

Weekly Testing w/ Longer Lag vs Biweekly Testing

Digging a little deeper, we find that mandatory weekly testing in a population that waits equal parts 5 or 7 days from becoming symptomatic till undergoing non-mandated testing, yields statistically fewer contagious days than mandatory biweekly testing in a population that waits equal parts 1, 2, 3, 5 or 7 days before symptomatic testing. In other words: a population that is tested every week but waits on average 6 days from becoming symptomatic till undergoing additional testing has statistically fewer contagious days in our model than a population which is tested every other week but waits on average 3.6 days from becoming symptomatic till undergoing additional testing. This is particularly relevant to our study.

t.test(w1_d5_d7, w2)

## 
##  Welch Two Sample t-test
## 
## data:  w1_d5_d7 and w2
## t = -2.5253, df = 143.98, p-value = 0.01264
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.6715426 -0.9351241
## sample estimates:
## mean of x mean of y 
##  19.85000  24.15333

Biweekly Testing w/ Shorter Lag vs Weekly Testing

We can now study the inverse. When comparing the contagious days in a population with weekly mandatory testing that waits equal parts 1, 2, 3, 5 or 7 days from becoming symptomatic till undergoing symptomatic testing to that of a population that has biweekly mandatory testing but undergoes symptomatic testing only one or two days after becoming symptomatic, we see that the means are nearly equivalent. We cannot reject our alternative hypothesis.

This implies that biweekly testing in a workplace where everyone is tested within the first 2 days after becoming symptomatic, with equal distribution across these 2 days, is about equivalent to a workplace where everyone is tested each week, without this focus on additional symptomatic testing immediately after presenting with symptoms.

t.test(w1, w2_d1_d2)

## 
##  Welch Two Sample t-test
## 
## data:  w1 and w2_d1_d2
## t = 0.099886, df = 97.707, p-value = 0.9206
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.893107  3.199773
## sample estimates:
## mean of x mean of y 
##  17.25333  17.10000

Biweekly vs Triweekly testing

Finally, the contagious days from populations with testing every other week, and every third week, are statistically similar. We cannot reject this null hypothesis either.

t.test(w2, w3)

## 
##  Welch Two Sample t-test
## 
## data:  w2 and w3
## t = -1.1222, df = 297.9, p-value = 0.2627
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.846521  1.326521
## sample estimates:
## mean of x mean of y 
##  24.15333  25.91333

DF_Ttest_1 <- rbind(w1_d5_d7_df, w2_df)

testdf <- data.frame(c(1, 2), c("1 W; \n Lag 5,7D", "2 W"))
names(testdf) <- c("Week_Test", "Cadence")

DF_Ttest_1 <- inner_join(DF_Ttest_1, testdf)

p1 <- DF_Ttest_1 %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(Cadence), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "", 
    y = "", x = "") + theme(legend.position = "none", axis.title = element_text(), 
    plot.title = element_blank()) + xlim(-10, 80)






DF_Ttest_2 <- rbind(w1_df, w2_d1_d2_df)

testdf <- data.frame(c(1, 2), c("1 W", "2 W; \n Lag 1,2D"))
names(testdf) <- c("Week_Test", "Cadence")

DF_Ttest_2 <- inner_join(DF_Ttest_2, testdf)

p2 <- DF_Ttest_2 %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(Cadence), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "", 
    y = "", x = "") + theme(legend.position = "none", axis.title = element_text(), 
    plot.title = element_blank()) + xlim(-10, 80)





DF_Ttest_3 <- rbind(w2_df, w3_df)

testdf <- data.frame(c(2, 3), c("2 W", "3 W"))

names(testdf) <- c("Week_Test", "Cadence")

DF_Ttest_3 <- inner_join(DF_Ttest_3, testdf)

p3 <- DF_Ttest_3 %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(Cadence), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "", 
    y = "", x = "") + theme(legend.position = "none", axis.title = element_text(), 
    plot.title = element_blank()) + xlim(-10, 80)





DF_Ttest_0 <- rbind(w1_df, w2_df)

testdf <- data.frame(c(1, 2), c("1 W", "2 W"))

names(testdf) <- c("Week_Test", "Cadence")

DF_Ttest_0 <- inner_join(DF_Ttest_0, testdf)

p0 <- DF_Ttest_0 %>% ggplot(aes(x = Total_Cont_Days, y = as.factor(Cadence), 
    fill = factor(stat(quantile)))) + stat_density_ridges(geom = "density_ridges_gradient", 
    calc_ecdf = TRUE, quantiles = 4, quantile_lines = TRUE) + 
    scale_fill_tableau("Classic Cyclic") + theme_few() + labs(title = "", 
    y = "", x = "") + theme(legend.position = "none", axis.title = element_text(), 
    plot.title = element_blank()) + xlim(-10, 80)

grid.arrange(p0, p1, p3, p2, nrow = 2, ncol = 2)

Recommendations

In the end, in order to limit the presence of COVID in a workplace, testing is required. As expected, with more frequent testing comes fewer contagious days in a workplace. The purpose of our study was to find which was more impactful: increasing asymptomatic mandated testing, or decreasing the amount of time one waits from becoming symptomatic till undergoing a non-mandated test.

We can summarize our findings as follows: increasing mandated testing from bi-weekly to weekly is particularly impactful. Over a 3 month period, mandating testing each week decreases COVID by 25% percentage over a similar sample with testing every other week. Interestingly enough, we do not see this relationship hold for longer durations: going from a biweekly testing cadence to triweekly yielded a statistically insignificant increase in COVID presence.

Thinking through the benefits of decreasing the lag between symptom presentation and symptomatic testing, we do see some benefits. A population with biweekly testing who seek additional symptomatic testing after only one or two days had a statistically similar quantity of contagious days as a population with weekly mandatory testing.

With that in mind, one must consider how adherent a workforce would be to policies around symptomatic testing. The above only holds true if each individual undergoes symptomatic testing in equal parts after 1 or 2 days. This relies heavily on individual judgement and even the best intentioned employees may misunderstand or unintentionally ignore their symptoms. Meanwhile, a population with weekly mandatory testing, who waited in equal parts 5 or 7 days before testing out preformed a sample with biweekly tests and equal parts 1, 2, 3, 5 and 7 days before symptomatic testing.

In the end, there will always be a cost associated with a successful return to work program. This study suggests that it may be better to invest in a culture of frequent asymptomatic testing than one which relies on the judgement of the workforce.

Future Works

In trying to explain our findings, we note that increasing asymptomatic testing is likely more successful than decreasing testing lag, due in part to the quantity of asymptomatic carriers. The literature suggests two things for which our findings largely rely: (1) that asymptomatic carriers can spread COVID at the same rate as symptomatic carriers and (2) and that roughly 16% of COVID cases are not symptomatic. I would be curious to know the accuracy of these findings.

Additionally, I struggled with reconciling my findings around the wait time from becoming symptomatic till seeking symptomatic testing. We saw that the data from the CDC suggests that the median wait time is 2 days. I expected this to be significantly higher. Anecdotally, at the height of the pandemic, I personally was asked if I had been symptomatic for at least 5 days before seeking testing, and was told that insurance might not cover the test if this was not the case. If this figure were to increase, our model might respond differently.

When thinking through COVID transmission rates, we note that we have not considered true community spread, and that our rate of transmission considers those in congregate settings such as nursing homes. It is reasonable to conclude our study may be more accurate if we were to exclude this population, however, as this study in particular was focused on the difference in testing strategy, this was fine for our purposes.

Finally, this model ignores community spread within the office itself. An additional study of interest would be to consider weigh strong COVID protocols against strong COVID testing protocols and find how these variables might impact the health of the office population.

Key Words

COVID, Testing, Antigen Test, Community Spread, Pandemic Modeling.

Appendices

References

Abbott. (2020). BinaxNOW COVID-19 Ag CARD. https://www.fda.gov/media/141570/download

Ayenigbara, I. O., Adeleke, O. R., Ayenigbara, G. O., Adegboro, J. S., & Olofintuyi, O. O. (2020). COVID-19 (SARS-CoV-2) pandemic: Fears, facts and preventive measures. Germs, 10(3), 218-228. Retrieved from https://remote.baruch.cuny.edu/login?url=https://www-proquest-com.remote.baruch.cuny.edu/scholarly-journals/covid-19-sars-cov-2-pandemic-fears-facts/docview/2444522545/se-2?accountid=8500

Becton Dickinson. (2020). Quick Reference Instructions for BD Veritor. https://www.fda.gov/media/139755/download

Centers for Disease Control and Prevention. (2021, January 19). Estimated Disease Burden of COVID-19. Centers for Disease Control and Prevention. https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/burden.html.

Centers for Disease Control and Prevention. (2021, January 28). Evaluation of Abbott BinaxNOW Rapid Antigen Test for SARS-CoV-2 Infection at Two Community-Based Testing Sites - Pima County, Arizona, November 3–17, 2020. Centers for Disease Control and Prevention. https://www.cdc.gov/mmwr/volumes/70/wr/mm7003e3.htm.

Centers for Disease Control and Prevention. (2021, March 17). Overview of Testing for SARS-CoV-2. Centers for Disease Control and Prevention. https://www.cdc.gov/coronavirus/2019-ncov/php/testing/expanded-screening-testing.html.

Centers for Disease Control and Prevention. (2021, March 31). COVID-19 Case Surveillance Public Use Data with Geography. Centers for Disease Control and Prevention. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4.

COVID-19: Data. COVID-19: Latest Data - NYC Health. (n.d.). https://www1.nyc.gov/site/doh/covid/covid-19-data.page.

Crozier, A. (2021), Put to the test: use of rapid testing technologies for covid-19. BMJ 2021;372:n208 https://www.bmj.com/content/372/bmj.n208

Gao, Z., Xu, Y., Sun, C., Wang, X., Guo, Y., Qiu, S., & Ma, K. (2020, May 15). A systematic review of asymptomatic infections with COVID-19. Journal of Microbiology, Immunology and Infection. https://www.sciencedirect.com/science/article/pii/S1684118220301134.

He, J., Guo, Y., Mao, R., & Zhang, J. (2020, August 13). Proportion of asymptomatic coronavirus disease 2019: A systematic review and meta‐analysis. Wiley Online Library. https://onlinelibrary.wiley.com/doi/full/10.1002/jmv.26326.

Kohmer, N., Toptan, T., Pallas, C., Karaca, O., Pfeiffer, A., Westhaus, S., Widera, M., Berger, A., Hoehl, S., Kammel, M., Ciesek, S., & Rabenau, H. F. (2021). The Comparative Clinical Performance of Four SARS-CoV-2 Rapid Antigen Tests and Their Correlation to Infectivity In Vitro. Journal of clinical medicine, 10(2), 328. https://doi.org/10.3390/jcm10020328

Krüttgen, A., Cornelissen, C., Dreher, M., Hornef, M., Imöhl, M., Kleines, M. (2021). Comparison of the SARS-CoV-2 Rapid antigen test to the real star Sars-CoV-2 RT PCR kit. Journal of Virological Methods, Volume 288, 114024. ISSN 0166-0934, https://doi.org/10.1016/j.jviromet.2020.114024. (https://www.sciencedirect.com/science/article/pii/S0166093420302767)

Liu, L., Hu, T., Bao, S., Wu, H., Peng, Z., & Wang, R. (2020). Estimating unreported COVID-19 cases in the United States based on the tvSIRu model.

LumiraDx. (2020). SARS-CoV-2 Ag Test Strip Product Insert. https://www.fda.gov/media/141304/download

New York Department of Health. (2020). SARS-CoV-2 Point of Care (POC) Antigen Tests Frequently Asked Questions for Health Care Providers [White Paper]. New York Department of Health. https://coronavirus.health.ny.gov/system/files/documents/2020/10/sars-cov-2-antigen-tests-faq.pdf

Nissen, K., Hagbom, M., Krambrich, J., Akaberi, D., Sharma, S., Ling, J., Hoffman, T., Svensson, L., Bondeson, K., Salaneck, E. (2021).Presymptomatic viral shedding and infective ability of SARS-CoV-2; a case report Heliyon, Volume 7, Issue 2. ISSN 2405-8440, https://doi.org/10.1016/j.heliyon.2021.e06328. (https://www.sciencedirect.com/science/article/pii/S2405844021004333)

Park, M., Pawliuk, C., Nguyen, T., Griffitt, A., Dix-Cooper, L., Fourik, N., & Dawes, M. (2020, January 1). Determining the period of communicability of SARS-CoV-2: A rapid review of the literature. medRxiv. https://www.medrxiv.org/content/10.1101/2020.07.28.20163873v1.

Quidel. (2020). Sofia SARS Antigen FIA. https://www.fda.gov/media/137885/download

Shyu, D., Dorroh, J., Holtmeyer, C., Ritter, D., Upendran, A., Kannan, R., Dandachi, D., Rojas-Moreno, C., Whitt, S. P., & Regunath, H. (2020). Laboratory Tests for COVID-19: A Review of Peer-Reviewed Publications and Implications for Clinical Use. Missouri medicine, 117(3), 184–195.

Balancing Mandated Testing and Trust in a Pandemic Workplace

Christopher Bloome

4/1/2021

Abstract

Literature Review

Confirmed Cases

Probable cases

Unreported Cases

Community Variables

Contagiousness

Testing Lag

Results Lag

Conclusions

Global Variables

Test Sensitivity

Asymptomatic Rate

Maximum Contagiousness Period

Antibodies / Turnover

Population Variables

Mandatory Testing Cadence

Testing Lag

Testing PTO

Employee demographics

Start/End Date

Methodology

One Individual

Entire Office, One Simulation

Entire Office, Repeated

Summary Metrics

Experimentation

Office Types

Full Study

Findings and Analysis

Weekly Testing vs Other

Weekly Testing w/ Longer Lag vs Biweekly Testing

Biweekly Testing w/ Shorter Lag vs Weekly Testing

Biweekly vs Triweekly testing

Recommendations

Future Works

Key Words

Appendices

References