Combating the exponential growth bias through extrapolation

Introduction

Note: Official advice on the risk to public health is available from https://www.who.int/

The excellent https://ourworldindata.org/coronavirus analysis contains a section on the public misconceptions of the concept of exponential growth. There are many psychological studies that show that when faced with exponential growth we still tend to think in terms of linear growth processes (1,2,3,4) even when this is not appropriately describing the reality in front of our eyes (1,2,4,8). The bias – to “linearise exponential functions when assessing them intuitively” – is referred to as the exponential growth bias. This research also shows that “neither special instructions about the nature of exponential growth nor daily experience with growth processes” improves the failure to grasp exponential growth processes.

One problem is that the word “exponential” is frequently used loosely in order to describe a process that is growing rapidly, but is not necessarily increasingly exponentially in a mathematical sense. Ironically, exponential growth is not particularly rapid in its early stages. It begins slowly, then accelerates rapidly. This acceleration seems surprising and may be unexpected to those looking at the process from the perspective of the psychological exponential growth bias.

The other issue is that the dynamic of a real epidemic doesn’t (in fact it can’t possibly) follow an exponential trajectory for very long. The incipient stages can be closely approximated as exponential growth. Then the curve always flattens as the epidemic progresses, even without control measures. This is a simple consequences of a reduction in the susceptible population. Extrapolating exponential growth rates over more than a very limited time interval quickly leads to physically impossible projections, such as total mortality figures that exceed the world’s population!

There are two consequences of the exponential growth bias. They act in opposite directions.

In the initial stages, when the growth is indeed approximated as an exponential process, the real impact of the epidemic is underestimated by the bias. If the total number of cases increases from 10 to 20 in a day this would be extremely alarming if modelled as exponential growth. However it could easily be downplayed by the press “only 10 new cases reported yesterday in the UK”. The biased reporting looked at the process in terms of a linear function. This tends to lead to a delay in implementing the measures that would suppress an epidemic in the early stages.
In the later stages of the epidemic, when growth is no longer exponential, a reduction in the initial growth rate can be overlooked under the exponential growth bias. The bias encourages reporting of the absolute size of the numbers, not the proportionate increase. An increase from 100 thousand to 110 thousand in a day would represent a very substantial decrease in the initial growth rate. However it represents a much greater absolute number than found in the early stages. So a journalist thinking in linear terms (or exploiting the bias in public perceptions for effect) would tend to write up headlines such as “Numbers of new cases in the UK soar to ten thousand in one single day!”. This suggests that measures aimed at mitigating an epidemic are ineffectual, when they are in fact working. This could lead to failure to comply with control measures just when they are most effective.

The perceptual exponential growth bias is therefore a barrier to implementing effective public health measures. In order to combat the bias it is important to understand the consequences of exponential growth clearly.

Data download and reformatting

Data have been downloaded from the Johns Hopkins github site on 26 March, 2020 Expand the code tabs to see all the code needed to reproduce the analysis.

library(lubridate)
library(tidyverse)
library(RCurl)
library(ggplot2)

URL <- getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")
data <- read.csv(text = URL, check.names = F)

pivot_longer(data,cols=5:dim(data)[2],names_to = "Date") ->d
names(d)<-c("Province","Country","Lat","Long","Date","NCases")
d$Date<-as.Date(d$Date,format="%m/%d/%y")

Confirmed<-d

URL <- getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
data <- read.csv(text = URL, check.names = F)

pivot_longer(data,cols=5:dim(data)[2],names_to = "Date") ->d
names(d)<-c("Province","Country","Lat","Long","Date","NCases")

d$Date<-as.Date(d$Date,format="%m/%d/%y")

Deaths<-d

URL <- getURL("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")
data <- read.csv(text = URL, check.names = F)

pivot_longer(data,cols=5:dim(data)[2],names_to = "Date") ->d
names(d)<-c("Province","Country","Lat","Long","Date","NCases")

d$Date<-as.Date(d$Date,format="%m/%d/%y")
Recovered<-d


Confirmed %>% group_by(Country, Date) %>% summarise(NCases=sum(NCases)) -> confirmed_country
Deaths %>% group_by(Country, Date) %>% summarise(NDeaths=sum(NCases)) -> deaths_country
Recovered %>% group_by(Country, Date) %>% summarise(NRecovered=sum(NCases)) -> recovered_country

confirmed_country %>% left_join(deaths_country,by = c("Country", "Date")) %>% left_join(recovered_country, by = c("Country", "Date")) -> by_country
 by_country%>%arrange(Date) %>% mutate(New_cases = NCases - lag(NCases, default = first(NCases)), NActive=NCases-NDeaths-NRecovered) -> by_country

Intepretation

The aim of the analysis is to provide extrapolations that can help in avoiding the exponential growth bias.

Do not attempt to use the results to actually predict the future!.

An extrapolation is simply a statement about the present situation, not a prediction of the future. An extrapolation provides the numbers of cases that would be expected if the trajectory continues on its present path. It is highly unlikely to do that even over short time scales and will certainly not do so over longer time scales.

A simple statistical model has been fit to estimate the relationship between the date of observation and the logged number of observations of total confirmed cases, new cases per day and numbers of deaths. This does not represent a mechanistic model of the epidemic and cannot be used for long term prediction of the dynamic. The purpose is simply to allow a short term extrapolation of the current dynamic in order to understand the development of the epidemic.

Statistical confidence intervals are not included in this case as it would not be appropriate to base estimates of uncertainty on the fit of an inappropriate model.

Confirmed cases

United States

by_country %>% filter(Country == "US") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit()-> d

Model fit to current data

mod<- lm(data=d,log10(NCases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every 2.3 days"

ggplot(d,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=1.2,color="red") + ylab("Number of confirmed cases") + ggtitle("Total number of confirmed cases with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NCases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point()  + geom_line(lwd=1.2,color="red") + geom_label() + ylab("Number of confirmed cases") + ggtitle("Extrapolated number of cases. Labels show rounded numbers in thousands")

United Kingdom

by_country %>% filter(Country == "United Kingdom") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(NCases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every 3.1 days"

ggplot(d,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=1.2,color="red") + ylab("Number of confirmed cases") + ggtitle("Total number of confirmed cases with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NCases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point()  + geom_line(lwd=1.2,color="red") + geom_label() + ylab("Number of confirmed cases") + ggtitle("Extrapolated number of cases. Labels show rounded numbers in thousands")

Italy

by_country %>% filter(Country == "Italy") %>% filter( Date>= as.Date("2020-03-07"))  %>% na.omit()-> d

Model fit to current data

mod<- lm(data=d,log10(NCases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of confirmed cases is currently doubling every 4.7 days"

ggplot(d,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=1.2,color="red") + ylab("Number of confirmed cases") + ggtitle("Total number of confirmed cases with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NCases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NCases, label=round(NCases/1000,1))) + geom_point()  + geom_line(lwd=1.2,color="red") + geom_label() + ylab("Number of confirmed cases") + ggtitle("Extrapolated number of cases. Labels show rounded numbers in thousands")

Number of deaths

Note that the United States and the UK lag behind Italy by at least two weeks. The US has been slow to attribute mortality to covid-19. The extrapolation of currently recorded deaths reflects this.

United States

by_country %>% filter(Country == "US") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(NDeaths+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every 3.2 days"

ggplot(d,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of deaths") + ggtitle("Cumalative number of deaths with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NDeaths<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Cumulative number of deaths") + ggtitle("Extrapolated number of deaths.\nLabels show rounded numbers in thousands")

United Kingdom

by_country %>% filter(Country == "United Kingdom") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(NDeaths+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every 2.2 days"

ggplot(d,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of deaths") + ggtitle("Cumalative number of deaths with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NDeaths<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Cumulative number of deaths") + ggtitle("Extrapolated number of deaths.\nLabels show rounded numbers in thousands")

Italy

by_country %>% filter(Country == "Italy") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(NDeaths+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the cumulative number of deaths is currently doubling every 3.6 days"

ggplot(d,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of deaths") + ggtitle("Cumalative number of deaths with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$NDeaths<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=NDeaths, label=round(NDeaths/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Cumulative number of deaths") + ggtitle("Extrapolated number of deaths.\nLabels show rounded numbers in thousands")

New cases

The data for new cases is much more variable on a day to day basis than the cumulative cases. An exponential model is not a very good description of changes in the rate of change but is included to complete the analysis.

United States

by_country %>% filter(Country == "US") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit()-> d

Model fit to current data

mod<- lm(data=d,log10(New_cases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every 2.3 days"

ggplot(d,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of new cases") + ggtitle("Number of new cases per day with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$New_cases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Number of new cases per day") + ggtitle("Extrapolated number of new cases.\nLabels show rounded numbers in thousands")

United Kingdom

by_country %>% filter(Country == "United Kingdom") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(New_cases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every 2.9 days"

ggplot(d,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of new cases") + ggtitle("Number of new cases per day with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$New_cases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Number of new cases per day") + ggtitle("Extrapolated number of new cases.\nLabels show rounded numbers in thousands")

Italy

by_country %>% filter(Country == "Italy") %>% filter( Date>= as.Date("2020-03-07")) %>% na.omit() -> d

Model fit to current data

mod<- lm(data=d,log10(New_cases+1)~ Date)
doub<-round(log10(2)/coef(mod)[2],1)
sprintf("An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every %s days", doub)

## [1] "An exponential model fitted to current data suggests that the number of new cases per day is currently doubling every 4.6 days"

ggplot(d,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point() + geom_line()+ geom_line(aes(y=10^predict(mod)), lwd=2,color="red") + ylab("Number of new cases") + ggtitle("Number of new cases per day with fitted exponential curve")

Model extrapolated to April 7

x<-seq(as.Date("2020-3-22"),as.Date("2020-04-7"), by=1)
preds<-data.frame(Date=x)
preds$New_cases<-10^predict(mod,newdata=preds)

ggplot(preds,aes(x=Date,y=New_cases, label=round(New_cases/1000,1))) + geom_point()  + geom_line(lwd=2,color="red") + geom_label() + ylab("Number of new cases per day") + ggtitle("Extrapolated number of new cases.\nLabels show rounded numbers in thousands")

Combating the exponential growth bias through extrapolation

Duncan Golicher

26 March, 2020

Introduction

Data download and reformatting

Intepretation

Confirmed cases

United States

Model fit to current data

Model extrapolated to April 7

United Kingdom

Model fit to current data

Model extrapolated to April 7

Italy

Model fit to current data

Model extrapolated to April 7

Number of deaths

United States

Model fit to current data

Model extrapolated to April 7

United Kingdom

Model fit to current data

Model extrapolated to April 7

Italy

Model fit to current data

Model extrapolated to April 7

New cases

United States

Model fit to current data

Model extrapolated to April 7

United Kingdom

Model fit to current data

Model extrapolated to April 7

Italy

Model fit to current data

Model extrapolated to April 7