The topic that has been on everyone’s mind these last few months is the coronavirus and how drastically it has affected the world. The consensus among most countries has been that people need to shelter in place and practice social distancing to prevent the spread of the virus, in the hope of eventually containing it completely so that life can return to normal. This entire situation has been tragic in many ways. So far there have been 252,950 deaths worldwide, and unfortunately the end is likely not in the near future. The shelter-in-place orders have also taken a different toll on people, with unemployment skyrocketing. According to a recent article by The Guardian, “…in just six weeks an unprecedented 30 million Americans have now sought unemployment benefits and the numbers are still growing.” The situation has bankrupted many small businesses, and people are struggling to pay their bills now that they can no longer work. The question many want answered is: when will this be over? When will life return to normal, if ever? Attempts to forecast the future of the coronavirus are being made to answer this question, or at least to give an idea of what we could be seeing in the near future.
The Kalman Filter is an algorithm that uses a series of measurements observed over time to produce estimates of unknown variables by estimating a joint probability distribution over the variables for each time frame. It has been shown to be an effective method for forecasting coronavirus cases, deaths, and recoveries. Here is a look at Kalman predictions for U.S. confirmed cases alongside the actual data. This model was built by Ran Kremer of medium.com, and it is made up of one-day predictions that were continuously generated for the upcoming day.
The predicted data was extremely close to the actual data throughout the model. The number of confirmed cases between April 12th and April 13th increased by 25,306, and the projected increase for April 14th was 28,732, suggesting that the rate of increase was not slowing down.
Looking back, this was in fact the case. The increase in cases has followed a linear growth rate with roughly 30,000 new confirmed cases in the United States daily.
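To make the idea of continuously generated one-day predictions more concrete, here is a minimal sketch of a Kalman Filter in Python. It is not the model from the article. It assumes a simple two-part state (the cumulative count and its daily increase), and the function name, the noise parameters, and the sample numbers are all made up for illustration.

```python
def kalman_one_day_ahead(observations, process_var=1.0, measurement_var=1.0):
    """Return one-day-ahead predictions of a cumulative count.

    Assumed state: (level, daily increase). The level is assumed to grow
    by the current daily increase each day (a constant-velocity model).
    """
    level, increase = float(observations[0]), 0.0
    # Large initial covariance: very little is known about the state at first.
    p00, p01, p10, p11 = 1e6, 0.0, 0.0, 1e6
    predictions = []
    for z in observations:
        # Update step: correct the state using today's observed count.
        residual = z - level
        s = p00 + measurement_var
        k0, k1 = p00 / s, p10 / s  # Kalman gain for level and increase
        level += k0 * residual
        increase += k1 * residual
        p00, p01, p10, p11 = ((1 - k0) * p00, (1 - k0) * p01,
                              p10 - k1 * p00, p11 - k1 * p01)
        # Predict step: project the state one day ahead.
        level += increase
        p00, p01, p10, p11 = (p00 + p10 + p01 + p11 + process_var,
                              p01 + p11,
                              p10 + p11,
                              p11 + process_var)
        predictions.append(level)  # forecast for the following day
    return predictions


# Hypothetical cumulative counts, not real data.
sample_counts = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
forecasts = kalman_one_day_ahead(sample_counts)
print(round(forecasts[-1]))  # forecast for the day after the last observation
```

Each prediction only looks one day ahead, and the filter is corrected by the next observation as soon as it arrives. That is a large part of why this kind of forecast stays close to the actual data, and also why it says little about the longer term.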
Researchers from the Singapore University of Technology and Design made a predictive model for the coronavirus using a SIR model. The SIR model is a well-known mathematical model used in epidemiology to predict the spread of disease. SIR stands for Susceptible, Infected, and Recovered. In the case of the coronavirus, the entire world population is susceptible, the infected portion is the confirmed cases, and the recovered portion is the confirmed recoveries. This data was fed into a standard SIR model, which uses differential equations to describe how people move from one of these three groups to the next over time.
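For readers who want to see what a standard SIR model looks like in practice, here is a small Python sketch. It is only an illustration and not the researchers’ code. The infection rate `beta`, the recovery rate `gamma`, the population size, and the step size are assumed values chosen for the example, and the equations are solved with simple Euler steps rather than a dedicated solver.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Simulate a standard SIR model with simple Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]  # one (S, I, R) tuple per day, starting at day 0
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Assumed parameters for illustration only (not fitted to any country).
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
infected = [day[1] for day in history]
peak_day = infected.index(max(infected))
print("Infections peak on day", peak_day)
```

In a real forecast, `beta` and `gamma` would be estimated from the confirmed case and recovery data rather than picked by hand.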
In their predictions as of April 28th, the number of new cases per day should currently be about 1,000 for Italy and below 250 for Singapore. They listed a theoretical end date for all new cases of June 16th for Singapore and October 10th for Italy.
Singapore reported 573 new cases between May 3rd and May 4th, while Italy reported 1,219 new cases. Their 3-day moving averages are 559 and 1,503, respectively. While not exactly the same as the SIR model’s predictions, these figures are somewhat close, and both Italy and Singapore do seem to be on a downward trend. The researchers noted that this pandemic is expected to follow an S-shaped curve with a long tail to the right, and that seems to be accurate. The SIR model seems a bit optimistic, but comparing it to the actual data does provide an encouraging outlook.
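A 3-day moving average like the one quoted above is simply the mean of the three most recent daily counts, which smooths out day-to-day reporting noise. Here is a short Python sketch of the calculation. The function name and the sample numbers are made up for illustration and are not the actual reported figures.

```python
def moving_average(values, window=3):
    """Return the trailing moving average for each full window of values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


# Hypothetical daily new-case counts, not real data.
daily_new_cases = [600, 540, 570, 510, 480]
print(moving_average(daily_new_cases))
```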
The peer-reviewed scientific journal PLOS One published a research article titled “Forecasting the novel coronavirus COVID-19.” Their method of forecasting the number of total confirmed cases was “simple time series forecasting approaches” (Petropoulos, 2020). Here is what their forecast looks like:
They did five total forecasts, beginning on February 1st, February 11th, February 21st, March 2nd, and March 12th. Their first forecast was off the mark by quite a bit. The successive forecasts were much closer to the reported data. Their final forecast predicted that the number of confirmed cases would slightly increase over time in a linear fashion, similar to how it did throughout February. However, in early March the virus was only just beginning to reach other countries, which then resulted in a huge spike in new cases that has yet to slow down.
As we can see here, the spread of the virus to other countries made the number of total confirmed cases take a turn for the worse, with a significant number of those cases coming from the United States, which now has well over a million confirmed cases. That is nearly a third of all confirmed cases in the world.
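To illustrate why a forecast built only on past values can miss a sudden change like this, here is a small Python sketch of one simple time series approach, Holt’s linear trend method. This technique is swapped in purely as an example and is not necessarily the method used in the paper. The function name, the smoothing parameters `alpha` and `beta`, and the sample numbers are assumptions made for illustration.

```python
def holt_linear_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend (both in 0..1).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[0])
    trend = float(series[1] - series[0])
    for y in series[1:]:
        previous_level = level
        # New level: blend the observation with the previous projection.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # New trend: blend the latest change in level with the old trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level and trend in a straight line.
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical cumulative counts that have been growing steadily.
steady_growth = [10, 20, 30, 40, 50]
print(holt_linear_forecast(steady_growth, horizon=3))
```

Because the forecast just extends the most recent level and trend in a straight line, it has no way of anticipating a jump caused by something outside the data, such as the virus reaching new countries.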
In comparing the three methods, the Kalman Filter algorithm and the SIR model show the most promise. The Kalman Filter algorithm produces highly accurate forecasts, but it is only effective at making short-term forecasts. The SIR model is an established model in epidemiology, and its model for the coronavirus does show promise. It is well known that epidemic outbreaks follow a curve, either a normal distribution curve or one with a longer right tail. The simple time series forecast was far off the actual data in the first and last forecasts because it simply looks at moving averages of what has already happened and does not take major factors into account, such as the virus spreading to different countries.
While the SIR model gives us an optimistic timeline of when the coronavirus could be completely gone in some countries, it is important to remember that there are other factors that come into play. Two factors mentioned in the SIR article that limit the model are the quality of the data and changes in policy decisions. The data almost certainly does not account for all coronavirus infections that have gone undetected, and a policy change such as an abrupt end to social distancing measures could result in a second wave of infections, as we are already seeing in some countries. The Kalman Filter article also cautioned that its model is limited by the fact that “any movement of infected people to other regions can cause a rapid eruption in new areas, as seen in South Korea, Italy, and Iran,” and the model cannot account for this possibility.
The most similar viral outbreak we can compare COVID-19 to is the Spanish Flu of 1918. It lasted over a year, and is estimated to have infected over 500 million people and resulted in 50 million deaths. Here is a look at how social distancing measures affected the death toll:
From here we can see that social distancing measures were effective at bending the death rate downward, and that ending them did in fact result in a second spike in the death rate in those four U.S. cities. Thankfully, after 24 weeks the death rate was approaching 0. If the coronavirus follows that timeline, the outbreak should be nearly over roughly six months after it started in each city. That is very close to the theoretical endings provided by the SIR model as well. However, this depends heavily on how well countries are able to continue social distancing and limit international travel.