Intro

I did this brief analysis to answer a forecasting question (shown below) posed on the Good Judgment Open platform.

What will be the total number of confirmed cases of COVID-19 in Taiwan as of 31 July 2022?

After remaining relatively COVID-free for most of the pandemic, an outbreak has hit Taiwan hard since April 2022 (AP, Focus Taiwan). The question will be suspended on 31 July 2022 and the outcome determined using data as reported by Our World in Data (OWiD) at approximately 5:00PM ET on 3 August 2022 (Our World in Data, parameters set in link).

Method

My forecasting process follows two simple steps:

  1. Examine historical evidence to create a baseline estimation.
  2. Examine other sources of evidence that might cause deviation from the baseline.

I start this forecast by loading a set of standard libraries and data.

Next, I plot missing data:

I then remove variables that have missing data in more than 40% of the cases, then I select the variables of interest.

# keep variables observed in more than 60% of rows,
# then select the variables of interest
df <- data_processed[, which(colMeans(!is.na(data_processed)) > 0.6)] %>%
  select(date, total_cases, new_cases, new_cases_smoothed,
         total_cases_per_million, new_cases_per_million,
         new_cases_smoothed_per_million, reproduction_rate,
         total_tests, new_tests,
         new_tests_per_thousand, new_tests_smoothed, new_tests_smoothed_per_thousand,
         tests_per_case, positive_rate)
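To make the missing-data filter concrete, here is a minimal sketch on a made-up data frame. The `toy` data and its column names are invented for illustration; only the `colMeans(!is.na(...)) > 0.6` idiom comes from the code above.

```r
# Toy data frame (hypothetical): column b is 75% missing, column c is 25% missing
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(NA, NA, NA, 4),
  c = c(1, NA, 3, 4)
)

# Share of non-missing values in each column
observed_share <- colMeans(!is.na(toy))
observed_share
# a = 1.00, b = 0.25, c = 0.75

# Keep only the columns observed in more than 60% of rows
kept <- toy[, observed_share > 0.6, drop = FALSE]
names(kept)
# "a" "c"
```

Because `!is.na()` yields a logical matrix, its column means are simply the fraction of observed values per variable, so the threshold reads directly as an observation rate.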

My aim is to estimate what the total number of cases is going to be on 31 July 2022. After exploring the data for a bit in esquisse, I decided that a time series model extrapolating the recent trend would best suit the problem.

I used Facebook’s (Meta’s) prophet package for this. Note that prophet fits an additive model of trend and seasonality rather than a strictly autoregressive one. I followed the package's quick start tutorial to produce a baseline forecast.

##             ds    yhat yhat_lower yhat_upper
## 917 2022-07-26 4537076    4486821    4588949
## 918 2022-07-27 4563070    4514923    4613981
## 919 2022-07-28 4588434    4531235    4642415
## 920 2022-07-29 4612957    4562496    4667960
## 921 2022-07-30 4637751    4582504    4697902
## 922 2022-07-31 4662139    4598779    4719549

The model predicts that the number of cases on the 31st of July will be 4.66 million, with an 80% uncertainty interval of [4.60, 4.72] million.
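For readers who want to reproduce this kind of baseline, the sketch below follows the general shape of the prophet quick start. It is not the exact code behind the table above: the `toy` data is randomly generated as a stand-in for the real case counts, and the 13-day horizon is an assumption taken from the forecast window mentioned in this post.

```r
library(prophet)

# Hypothetical stand-in for the real data: 120 days of cumulative case counts
set.seed(1)
toy <- data.frame(
  date = seq(as.Date("2022-03-21"), by = "day", length.out = 120)
)
toy$total_cases <- cumsum(round(runif(120, 10000, 30000)))

# prophet expects a data frame with columns named ds (date) and y (value)
history <- data.frame(ds = toy$date, y = toy$total_cases)

# interval.width = 0.8 is prophet's default, written out here for clarity
m <- prophet(history, interval.width = 0.8)

# Extend the time axis 13 days past the last observation, then predict
future <- make_future_dataframe(m, periods = 13)
forecast <- predict(m, future)

# Last few rows: point forecast plus lower and upper interval bounds
tail(forecast[c("ds", "yhat", "yhat_lower", "yhat_upper")])
```

With the real data, `history` would be built from the `date` and `total_cases` columns of `df` instead of the toy series.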

There is no case-specific information that I’ll use to make this forecast. This is because the rate of new cases would have to change drastically within a very short amount of time (13 days) for the final outcome to lie outside the most likely bracket (see brackets below).

In light of this, I decided to make my forecast as follows:

The confidence interval is at the 80% confidence level and is quite narrow. The cases bracket (between 4.3 and 5.3 million), by contrast, is quite wide, so the final outcome is much less likely to fall outside it. The 98% probability assigned to this bracket is therefore subjective. I guesstimated a 2% chance that a new variant will cause a sharp increase in the number of cases. As for the bracket below the 4.3 million mark: since the current total is 4.28 million, and there were some 17k new cases on the previous day (and even more in the days before), I think it is almost impossible that fewer than 4.3 million cases will be reported - unless there are some counting corrections :D.
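The bracket reasoning can be checked with some back-of-the-envelope arithmetic. The sketch below uses only the figures quoted in this post (4.28 million current cases, roughly 17k new cases on the previous day, 13 days left, and a bracket from 4.3 to 5.3 million). The variable names are my own, and treating the daily rate as a constant average is a simplifying assumption.

```r
current_total <- 4.28e6  # total confirmed cases at forecast time
days_left     <- 13      # days remaining until 31 July 2022
lower_edge    <- 4.3e6   # lower edge of the most likely bracket
upper_edge    <- 5.3e6   # upper edge of the most likely bracket
recent_daily  <- 17000   # approximate new cases on the previous day

# Average daily new cases needed for the total to end above the bracket
needed_above <- (upper_edge - current_total) / days_left

# Average daily new cases below which the total would stay under the bracket
needed_below <- (lower_edge - current_total) / days_left

round(needed_above)                    # 78462
round(needed_below)                    # 1538
round(needed_above / recent_daily, 1)  # 4.6
```

In other words, daily cases would have to average about 4.6 times the most recent figure to overshoot the bracket, or collapse to under a tenth of it to undershoot.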

Further investigation into prophet output

Nothing else to note here.