I did this brief analysis to answer a forecasting question (illustrated below) posed on the Good Judgment Open platform.
After remaining relatively COVID-free for most of the pandemic, an outbreak has hit Taiwan hard since April 2022 (AP, Focus Taiwan). The question will be suspended on 31 July 2022 and the outcome determined using data as reported by Our World in Data (OWiD) at approximately 5:00PM ET on 3 August 2022 (Our World in Data, parameters set in link).
My forecasting process follows two simple steps:
I start this forecast by loading a set of standard libraries and data.
Next, I plot missing data:
I then drop variables that have missing values in more than 40% of observations and select the variables of interest.
# keep variables observed in more than 60% of rows,
# then select the variables of interest
df <- data_processed[, which(colMeans(!is.na(data_processed)) > 0.6)] %>%
  select(date, total_cases, new_cases, new_cases_smoothed,
         total_cases_per_million, new_cases_per_million,
         new_cases_smoothed_per_million, reproduction_rate,
         total_tests, new_tests,
         new_tests_per_thousand, new_tests_smoothed, new_tests_smoothed_per_thousand,
         tests_per_case, positive_rate)
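To make the filtering step concrete, here is a minimal sketch of the same `colMeans(!is.na(...))` idiom on a toy data frame. The `toy` data and the `icu_patients` column are made up for illustration and are not taken from the OWiD extract used above.

```r
# Toy data frame standing in for the real data (all values are made up).
toy <- data.frame(
  date         = as.Date("2022-07-01") + 0:4,
  total_cases  = c(100, 120, NA, 150, 170),  # 1 of 5 missing -> 80% observed
  icu_patients = c(NA, NA, NA, NA, 3)        # 4 of 5 missing -> 20% observed
)

# Share of non-missing values in each column.
observed_share <- colMeans(!is.na(toy))

# Keep only the columns observed in more than 60% of rows.
kept <- toy[, observed_share > 0.6, drop = FALSE]
names(kept)
```

Note that `>` is strict, so a column observed in exactly 60% of rows would be dropped; `>=` would keep it.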
My aim is to estimate what the total number of cases is going to be on 31 July 2022. After exploring the data for a bit in esquisse, I decided that a time series forecast of the total case count would best suit the problem.
I used Facebook’s (Meta’s) prophet package for this, following its quick start tutorial to produce a baseline forecast.
##             ds    yhat yhat_lower yhat_upper
## 917 2022-07-26 4537076    4486821    4588949
## 918 2022-07-27 4563070    4514923    4613981
## 919 2022-07-28 4588434    4531235    4642415
## 920 2022-07-29 4612957    4562496    4667960
## 921 2022-07-30 4637751    4582504    4697902
## 922 2022-07-31 4662139    4598779    4719549
The model predicts that the number of cases on the 31st of July will be 4.66 million, with an 80% uncertainty interval of [4.60, 4.72] million.
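For readers who want to reproduce this kind of baseline, the following is a rough sketch of the prophet quick start pattern. It is not the exact code behind the table above: the `history` data frame is synthetic, and mapping `date` to `ds` and `total_cases` to `y` in the real analysis is my assumption about how the columns would be prepared.

```r
library(prophet)

# Synthetic cumulative series standing in for the real data (made-up values).
# prophet expects a date column named `ds` and a value column named `y`.
set.seed(1)
history <- data.frame(
  ds = seq(as.Date("2022-01-01"), by = "day", length.out = 200),
  y  = cumsum(round(runif(200, 10000, 30000)))
)

m        <- prophet(history)                        # fit with default settings
future   <- make_future_dataframe(m, periods = 13)  # extend 13 days past the data
forecast <- predict(m, future)

# Point forecast with the default 80% uncertainty interval.
tail(forecast[c("ds", "yhat", "yhat_lower", "yhat_upper")])
```

With default settings, `yhat_lower` and `yhat_upper` bound an 80% uncertainty interval; the `interval_width` argument of `prophet()` changes that level.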
I won’t use any case-specific information for this forecast, because the rate of new cases would have to change drastically within a very short amount of time (13 days) for the final outcome to lie outside the most likely bracket (see brackets below).
In light of this, I decided to make my forecast as follows:
The model’s interval is at the 80% level and is narrow. The cases bracket (between 4.3 and 5.3 million), by contrast, is quite wide, so the final outcome is much less likely to fall outside it. The 98% probability assigned to this bracket is therefore subjective. I guesstimated a 2% chance that a new variant will cause a sharp increase in the number of cases. As for the bracket below the 4.3 million mark: since the current total is 4.28 million, and some 17k new cases were reported the previous day (and even more on the days before), I think it is almost impossible that fewer than 4.3 million cases will be reported, unless there are some counting corrections :D.
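The bracket reasoning above can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the figures quoted in the text; holding the daily rate constant at 17k is a simplifying assumption of mine, not a model output.

```r
current_total <- 4.28e6  # total cases at the time of writing
days_left     <- 13      # days remaining until 31 July 2022
recent_daily  <- 17e3    # new cases reported the previous day
lower_edge    <- 4.3e6   # lower edge of the most likely bracket
upper_edge    <- 5.3e6   # upper edge of the most likely bracket

# Average daily cases needed to pass the upper edge by the deadline.
needed_for_upper <- (upper_edge - current_total) / days_left

# Days until the lower edge is crossed if the recent rate simply continues.
days_to_lower <- (lower_edge - current_total) / recent_daily

# Total on 31 July if the recent rate is held constant (an assumption).
flat_projection <- current_total + recent_daily * days_left

round(c(needed_for_upper = needed_for_upper,
        days_to_lower    = days_to_lower,
        flat_projection  = flat_projection), 1)
```

Passing 5.3 million would take roughly 78k cases per day on average, more than four times the most recent daily count, while at the recent rate the 4.3 million mark is crossed in under two days.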
Nothing else to note here.