## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
## ✓ tibble 3.1.4 ✓ tsibble 1.1.1
## ✓ dplyr 1.0.7 ✓ tsibbledata 0.4.0
## ✓ tidyr 1.1.3 ✓ feasts 0.2.2
## ✓ lubridate 1.8.0 ✓ fable 0.3.1
## ✓ ggplot2 3.3.5
## Warning: package 'tsibbledata' was built under R version 4.1.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
## ✓ fma 2.4 ✓ expsmooth 2.3
##
##
## Attaching package: 'fpp2'
## The following object is masked from 'package:fpp3':
##
## insurance
The challenge is to predict the number of COVID-19 cases and fatalities around the world. The Kaggle contest uses data from the beginning of the pandemic (March-May) with the goal of predicting cases for for May and June. The key difficulty will be using a limited amount of data with a changing trend to predict into the future. This model could be used (or could have been used, should it have been made earlier) to predict near-term cases which could guide public policy.
The problem will continue to change. The goal here is to provide another potential model or avenue for predicting the spread of Covid, which could later be applied to a broader data for more current predictions.
The contest can be found here: https://www.kaggle.com/c/covid19-global-forecasting-week-5/data?select=submission.csv
This investigation is based on the following key questions:
Since this research is being done retroactively, it primarily serves to show how well time-series models can predict Covid cases, which could be useful for the prediction of future pandemics or future models of Covid-19. It is important to note that much has changed in terms of variants, public policy, and human behavior since the start of the pandemic. Thus, this research does not seek to assert the accuracy of particular values or predictions, but instead use the best performing models as a good reference for future research.
Below I converted the data into a time series
df<-read.csv("/Users/Luke/Documents/BC/Predictive Analytics/Midterm/covid19-global-forecasting-week-5/train.csv")
confirm <- df[which(df$Target == "ConfirmedCases"),]
fatal <- df[which(df$Target == "Fatalities"),]
confirm_2 <- aggregate(TargetValue ~ Date, data = confirm, FUN = sum)
confirm_2$Date<-as.Date(confirm_2$Date)
fatal_2 <- aggregate(TargetValue ~ Date, data = fatal, FUN = sum)
fatal_2$Date<-as.Date(fatal_2$Date)
c_ts<- tsibble(confirm_2, index = Date)
f_ts<- tsibble(fatal_2, index = Date)
Cases and deaths show similar patterns, with upticks in deaths lagging slightly behind cases. Logically, this occurs because the virus takes some time to cause deaths. Both deaths and fatalities do not spike until march, then both follow and up-and-down pattern with a relatively constant pattern through the spring. This presents challenges for trend-based or linear models which will assume an upward trend, but a general shift may have occurred with the onset of summer as some places experience outbreaks and other places see reduced cases.
autoplot(c_ts)+
labs( y= "Cases", title= "Worldwide Covid Cases Feb-June 2020")
## Plot variable not specified, automatically selected `.vars = TargetValue`
autoplot(f_ts)+
labs(y = "Deaths", title= "Worldwide Covid Deaths Feb-June2020")
## Plot variable not specified, automatically selected `.vars = TargetValue`
Additonally, when we examine the ACF and pACF plots for cases and fatalities. In the ‘cases’ plot, we can conclude that the data is nonstationary since we see a tailing pattern on on the acf plot and a spike at lag=1 on the pACF plot, which suggests that there are seasonal and nonseasonal components to this model. The fatalities shows general decay, but an up-and-down trend suggesting low levels of seasonality.
autoplot(acf(c_ts))
autoplot(pacf(c_ts))
autoplot(acf(f_ts))
autoplot(pacf(f_ts))
Below, I split the data into a test and training set, with slightly more than 80% of the series in the training set. This slightly larger training set was used to to accomodate for the change in trend accross the span, with the hopes that the models may better capture some of the oscillation that ocurrs after April 1.
c_train <- filter(c_ts, between(c_ts$Date, as.Date("2020-01-23"), as.Date("2020-05-01")))
c_test <- filter (c_ts, between(c_ts$Date, as.Date("2020-05-01"), as.Date("2020-06-10")))
f_train<- filter(f_ts, between(f_ts$Date, as.Date("2020-01-23"), as.Date("2020-05-01")))
f_test <- filter (f_ts, between(f_ts$Date, as.Date("2020-05-01"), as.Date("2020-06-10")))
The models to be used below are a linear model, a linear model using a fourier component, an Exponetional Smooth (ETS) model, and an AutoRegressive Integrated Moving Average (ARIMA) model.
The linear model was selected primarily as a benchmark. Since we are not provided with breadth of data about policies, demographics, and other regional data, a linear model will likely perform poorly but is worth examining to show if there is utility form other models. The Linear Model can be adjusted to use a fourier sequence, which could be useful given the up-and-down trend of the data after its uptick in April.
Additionally, I will use ETS models, which use averages between data points to smooth overall trends and often perform better than linear models. Finally, I will use the ARIMA model which uses similar features as ETS but – in short – performs autoregression between terms to come up with a model that has important differences from ETS.
Much of the literature on this topic has come since June 2020 and is therefore based on more complete data than this investigation will use. Thus, many of the models examined here may more accurately predict results, but will also serve to drive the research I will engage in here.
It is important to note here that many of the existing models acknowledge that changes in public policy can have a dramatic impact on results. Zhao et al proposed a method for short term predictions using incidence cases with a Poisson distributions, a gamma distribution for the time series interval, and future cases from posterior distributions (Zhao et. al, 2021). Their model – given a fixed set of policy and variant frameworks – predicted well over short internals which is useful on the regional level for policymaking. This study used similar data as will be applied here. Relatedly Wattanachit et al. found that ensemble forecasts could icnreased the robustness of existing forecasts.
Another strand of the research uses a broader set of predictors Barcellos, Fernandes and de Souza use a broader set of predictors to predict morbidity and death using linear-type models(Barcellos, 2021). Other research has used machine learning algorithms – XGBoost, Random Forest, etc – and found that the RF model predicts covid deaths very well.
Research that is more directly in line with the investigation here compared time series models – ARIMA, ETS, NNAR, TBATS – to the second wave of covid hospitalizations in Italy . The results find that in the case of hospitalizations, hybrid models functioned as the strongest predictors, but relevant to my investigation here, ARIMA models performed well and were present in all the best hybrid models. (Perone, 2021)
For each of the models below, I will use AIC, AICc, and BIC to evaluate model accuracy, and RMSE, MSAE, and MAE to evaluate fit. The ideal model will have a combination of low metrics for model fit and acuracy.
I will begin by going through each model for ‘cases’ then do the models in bulk for ‘fatalities.’
Below is the linear model. We can see on inspection that a simple linear model with just time and trend way overpredicts future outcomes. When we include the fourier series, it predicts more accurately, but the model still relies too heavily on trend and misses the mark in terms of prediction, with a high RMSE, MSE, and MSAE implying poor model fit when tested on the test set.
When we include data from cases (I’m not sure if this was allowed or intended), we can achieve a very low RMSE of approx 4000, the lowest of all the models. MASE and MAP are also quite lowe. In this case, allowing for the exogenous variable of cases allows us to predict a high level of accuracy compared to the models that exclude outside variables. The challenge with this approach is that cases and deaths are only lagged by 2-3 weeks, so a practical model would not have this wide six week window for prediction.
lin_cases<- c_train %>%
model(TSLM(TargetValue ~ trend()))%>%
report()
## Series: TargetValue
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -49986 -26173 5171 25326 52752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39642.4 5945.1 -6.668 0.00000000154 ***
## trend() 1911.6 102.2 18.704 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29500 on 98 degrees of freedom
## Multiple R-squared: 0.7812, Adjusted R-squared: 0.7789
## F-statistic: 349.8 on 1 and 98 DF, p-value: < 0.000000000000000222
c_train$Deaths<-f_train$TargetValue
De<-c_train %>%
model(TSLM(Deaths ~ TargetValue + trend()))%>%
report()
## Series: Deaths
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -3244.9 -431.6 165.0 353.9 6553.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -506.28392 373.98852 -1.354 0.179
## TargetValue 0.07032 0.00527 13.342 <0.0000000000000002 ***
## trend() 5.27297 11.39939 0.463 0.645
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1539 on 97 degrees of freedom
## Multiple R-squared: 0.8991, Adjusted R-squared: 0.897
## F-statistic: 432.1 on 2 and 97 DF, p-value: < 0.000000000000000222
de_predict <- forecast(De, new_data = c_test, h=40)
## Warning: Input forecast horizon `h` will be ignored as `new_data` has been
## provided.
de_predict
## # A fable: 41 x 5 [1D]
## # Key: .model [1]
## .model Date Deaths .mean TargetValue
## <chr> <date> <dist> <dbl> <int>
## 1 TSLM(Deaths ~ TargetValue + … 2020-05-01 N(11113, 2463898) 11113. 157734
## 2 TSLM(Deaths ~ TargetValue + … 2020-05-02 N(9952, 2469857) 9952. 141156
## 3 TSLM(Deaths ~ TargetValue + … 2020-05-03 N(9311, 2483757) 9311. 131963
## 4 TSLM(Deaths ~ TargetValue + … 2020-05-04 N(8676, 2504388) 8676. 122862
## 5 TSLM(Deaths ~ TargetValue + … 2020-05-05 N(9166, 2498585) 9166. 129748
## 6 TSLM(Deaths ~ TargetValue + … 2020-05-06 N(10041, 2487607) 10041. 142112
## 7 TSLM(Deaths ~ TargetValue + … 2020-05-07 N(10509, 2486416) 10509. 148698
## 8 TSLM(Deaths ~ TargetValue + … 2020-05-08 N(10453, 2492022) 10453. 147831
## 9 TSLM(Deaths ~ TargetValue + … 2020-05-09 N(9743, 2510780) 9743. 137659
## 10 TSLM(Deaths ~ TargetValue + … 2020-05-10 N(8344, 2562847) 8344. 117680
## # … with 31 more rows
augment(De) %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = Deaths, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
scale_colour_manual(
values = c(Data = "black", Fitted = "#D55E00")
) +
labs(y = "Cases",
title = "Linear Model with Cases as Predictor") +
guides(colour = guide_legend(title = "Series"))
new_f_test<-f_test
new_f_test$Deaths<-new_f_test$TargetValue
c_train%>%
autoplot(Deaths) +
autolayer(de_predict)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
acc_de=accuracy(de_predict, new_f_test)
acc_de%>%kbl(caption="Cases predicting Death")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| TSLM(Deaths ~ TargetValue + trend()) | Test | -3421.232 | 4039.478 | 3542.627 | -62.34604 | 63.47889 | NaN | NaN | 0.7743951 |
augment(De) %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = Deaths, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
scale_colour_manual(
values = c(Data = "black", Fitted = "#D55E00")
) +
labs(y = "Cases",
title = "Linear Model with Cases as Predictor") +
guides(colour = guide_legend(title = "Series"))
augment(lin_cases) %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = TargetValue, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
scale_colour_manual(
values = c(Data = "black", Fitted = "#D55E00")
) +
labs(y = "Cases",
title = "Linear Model with Trend") +
guides(colour = guide_legend(title = "Series"))
lin_cases2<- c_train %>%
model(TSLM(TargetValue ~ trend()+fourier(K = 1)))%>%
report()
## Series: TargetValue
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -54857 -23759 3230 22864 45961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39852.1 5924.2 -6.727 0.00000000125 ***
## trend() 1913.4 101.9 18.786 < 0.0000000000000002 ***
## fourier(K = 1)C1_7 4990.2 4141.1 1.205 0.231
## fourier(K = 1)S1_7 4792.0 4174.2 1.148 0.254
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 29380 on 96 degrees of freedom
## Multiple R-squared: 0.7874, Adjusted R-squared: 0.7807
## F-statistic: 118.5 on 3 and 96 DF, p-value: < 0.000000000000000222
augment(lin_cases2) %>%
ggplot(aes(x = Date)) +
geom_line(aes(y = TargetValue, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
scale_colour_manual(
values = c(Data = "black", Fitted = "#D55E00")
) +
labs(y = "Cases",
title = "Linear Model with Fourier") +
guides(colour = guide_legend(title = "Series"))
harmfc_predict <- forecast(lin_cases2, h=40)
c_train%>%
autoplot(TargetValue) +
autolayer(harmfc_predict)
library(kableExtra)
acc_harmfc=accuracy(harmfc_predict, c_test)
acc_harmfc%>%kbl(caption="ETS")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| TSLM(TargetValue ~ trend() + fourier(K = 1)) | Test | -44439.62 | 47040.24 | 44439.62 | -30.95849 | 30.95849 | NaN | NaN | 0.6472627 |
Ultimately, the linear model including cases as a predictor outperformed the other lienear models on measures of fit. In terms of accuracy, though, it seems to introduce a high level of bias (lower accuracy) and would therefore be stronger if it included other predictors.
For the ETS model, I used the ETS() function to identify the optimal components for the ets model. In this case, it chose an AAA model, implying an additive model for error, trend, seasonality. The resulting data for measures of fit and accuracy for the model shows that it dramatically outperforms the linear and fourier model.
#chose AAA model
c_ets <- c_train %>%
model(ETS(TargetValue))
report(c_ets)
## Series: TargetValue
## Model: ETS(A,A,A)
## Smoothing parameters:
## alpha = 0.1498886
## beta = 0.127196
## gamma = 0.3731849
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## -4377.998 1098.091 -1924.491 -3910.677 -6475.073 -1537.856 1833.689 6317.191
## s[-6]
## 5697.216
##
## sigma^2: 78724309
##
## AIC AICc BIC
## 2291.010 2294.596 2322.272
c_ets_predict <- forecast(c_ets, h=40)
c_train %>%
autoplot(TargetValue) +
autolayer(c_ets_predict)
library(kableExtra)
acc_ets=accuracy(c_ets_predict, c_test)
acc_ets%>%kbl(caption="ETS")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ETS(TargetValue) | Test | 40478.36 | 48771.27 | 40478.36 | 26.6126 | 26.6126 | NaN | NaN | 0.869517 |
de_predict <- forecast(De, new_data = c_test, h=40)
## Warning: Input forecast horizon `h` will be ignored as `new_data` has been
## provided.
de_predict
## # A fable: 41 x 5 [1D]
## # Key: .model [1]
## .model Date Deaths .mean TargetValue
## <chr> <date> <dist> <dbl> <int>
## 1 TSLM(Deaths ~ TargetValue + … 2020-05-01 N(11113, 2463898) 11113. 157734
## 2 TSLM(Deaths ~ TargetValue + … 2020-05-02 N(9952, 2469857) 9952. 141156
## 3 TSLM(Deaths ~ TargetValue + … 2020-05-03 N(9311, 2483757) 9311. 131963
## 4 TSLM(Deaths ~ TargetValue + … 2020-05-04 N(8676, 2504388) 8676. 122862
## 5 TSLM(Deaths ~ TargetValue + … 2020-05-05 N(9166, 2498585) 9166. 129748
## 6 TSLM(Deaths ~ TargetValue + … 2020-05-06 N(10041, 2487607) 10041. 142112
## 7 TSLM(Deaths ~ TargetValue + … 2020-05-07 N(10509, 2486416) 10509. 148698
## 8 TSLM(Deaths ~ TargetValue + … 2020-05-08 N(10453, 2492022) 10453. 147831
## 9 TSLM(Deaths ~ TargetValue + … 2020-05-09 N(9743, 2510780) 9743. 137659
## 10 TSLM(Deaths ~ TargetValue + … 2020-05-10 N(8344, 2562847) 8344. 117680
## # … with 31 more rows
Similar to above model, I allowed the ARIMA() function to automatically select specifications. The ARIMA model peforms better than the ETS, with a much lowers RMSE, MSE, and MAPE. Additionally, the accuracy values(AIC, AICc and BIC) are all lower, suggesting better fit and accuracy.
c_arima <- c_train %>%
model(ARIMA(TargetValue))
report(c_arima)
## Series: TargetValue
## Model: ARIMA(2,0,2)(0,1,2)[7]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## 0.0208 0.9496 0.5747 -0.2688 -0.3161 -0.3313
## s.e. 0.0469 0.0473 0.1191 0.1091 0.1329 0.1276
##
## sigma^2 estimated as 77334443: log likelihood=-975.94
## AIC=1965.89 AICc=1967.21 BIC=1983.62
c_arima_predict <- forecast(c_arima, h=40)
c_train %>%
autoplot(TargetValue) +
autolayer(c_arima_predict)
acc_arima=accuracy(c_arima_predict, c_test)
acc_arima%>%kbl(caption="ARIMA")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ARIMA(TargetValue) | Test | -3281.024 | 10773.56 | 8334.135 | -2.911194 | 5.845196 | NaN | NaN | 0.4839195 |
Below are the equivalent with outputs for ETS and ARIM models for deaths. We once again see that the predictive qualities of the ARIMA model are superior, with lower accuracy and fit values, hence it is a stronger model for fatalities.
f_ets <- f_train %>%
model(ETS(TargetValue))
report(f_ets)
## Series: TargetValue
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.3327147
## beta = 0.002484359
##
## Initial states:
## l[0] b[0]
## -12.89152 14.59331
##
## sigma^2: 0.1899
##
## AIC AICc BIC
## 1643.177 1643.815 1656.203
f_ets_predict <- forecast(f_ets, h=40)
f_train %>%
autoplot(TargetValue) +
autolayer(f_ets_predict)
acc_ets_f=accuracy(f_ets_predict, f_test)
acc_ets_f%>%kbl(caption="ETS Fatalities")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ETS(TargetValue) | Test | -4467.807 | 5033.8 | 4516.64 | -82.32491 | 82.7544 | NaN | NaN | 0.7271761 |
f_arima <- f_train %>%
model(ARIMA(TargetValue))
report(f_arima)
## Series: TargetValue
## Model: ARIMA(1,1,1)(1,0,0)[7]
##
## Coefficients:
## ar1 ma1 sar1
## 0.3749 -0.7218 0.4519
## s.e. 0.1658 0.1056 0.1068
##
## sigma^2 estimated as 1461947: log likelihood=-842.5
## AIC=1693 AICc=1693.42 BIC=1703.38
f_arima_predict <- forecast(f_arima, h=40)
f_train %>%
autoplot(TargetValue) +
autolayer(f_arima_predict)
acc_arima_f=accuracy(f_arima_predict, f_test)
acc_arima_f%>%kbl(caption="ARIMA Fatalities")%>%kable_classic(html_font="Cambria")
| .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|
| ARIMA(TargetValue) | Test | -1960.634 | 2554.761 | 2091.078 | -40.05277 | 41.27876 | NaN | NaN | 0.6184166 |
The primary limitations of these models relate to incomplete information. If we conclude, as I did, that the ARIMA model predicts future cases most accurately and the linear model using cases as in input is most accurate. The problem with the model that makes cases endogenous to deaths is that the time horizon used for the predictive model is not the same as the time horizon in real life. Thus, for any utility beyond 2-3 weeks, we would need predictions for cases. By combining the ARIMA predictions for cases and the cases-endogenous model for deaths, we could have a model that may be reliable for cases and thereby deaths. This could be used to anticipate needs in testing, deaths, and hospital beds/needs.
Additionally, this model does not account for public policy measures, behavior changes, or virus mutations. A model that has wider information would do a better job predicting, and the cases-endogenous trend model could easily be expanded to use more information. Finally, this does not use hybrid models (such as combined ETS and ARIMA), and therefore suffers in prediction accuracy compared to other models available.
In the future, the following research could be included to improve models and predictive accuracy: 1) Use data that spans a wider period, especially once public health measures are in place 2) include a wider set of predictor variables such as public policy, medications available, and vaccination. 3) Use the most advanced models for case predicts, such as hybrid models.
Ultimately, such a model could be used for predicting trends, especially if it was easy for local scientists to adapt the specific measure’s in their community and use as a predictive modeling tool.
The key learning here is that cases are a great predictor of deaths, and that ARIMA models can predict cases well. Additionally, the data presents a unique challenge given the flat and jump nature. Using either autoregressive models, or endogenizing cases functions to make a better predictive model.
Ultimately, the data are fully available, and we could measure these models over a long period of time. They would likely perform poorly because of the degree of omitted variable bias, such as public policy, variant, vaccine, and behavioral changes accross a longer timescale. These are essential to a strong model, but were not available in this time period.
Barcellos, D.d.S., Fernandes, G.M.K. & de Souza, F.T. Data based model for predicting COVID-19 morbidity and mortality in metropolis. Sci Rep 11, 24491 (2021). https://doi.org/10.1038/s41598-021-04029-6
Moulaei, K., Shanbehzadeh, M., Mohammadi-Taghiabad, Z. et al. Comparing machine learning algorithms for predicting COVID-19 mortality. BMC Med Inform Decis Mak 22, 2 (2022). https://doi.org/10.1186/s12911-021-01742-0
Evan L Ray, Nutcha Wattanachit, Jarad Niemi, Abdul Hannan Kanji,et al, Ensemble Forecasts of Coronavirus Disease 2019 (COVID-19) in the U.S., CDC(2021)https://doi.org/10.1101/2020.08.19.2017749 medRxiv 2020.08.19.20177493; doi: https://doi.org/10.1101/2020.08.19.20177493
Zhao, Hongwei, Mercent, N, McNulty, Alyssa, Radcliff, Tiffany, Cole, Murray, Fischer, Rebecca, Sang, Huiyan, Ory, Marci COVID-19: Short term prediction model using daily incidence data (2021)https://doi.org/10.1371/journal.pone.0250110