The Background

I have made a couple bar chart races now comparing COVID-19 deaths to other global causes of death. Those are available here and here. I think the second is my preferred of the two, because it is more complete and also attempts to project the global causes of death forward to 2020.

The second one came about because some people criticized the first one as misleading. They said I was only comparing COVID to other causes of death with annual deaths similar to what COVID is on track for. They said I was purposefully leaving out heart disease and cancer to make COVID more severe. I wasn’t purposefully doing so, but I understood the criticism. That is why I made the second one, to address that lack of completeness.

However, the second one comes with its own problems. Is it really fair to compare COVID to heart disease? To cancer? To diabetes? Not necessarily. COVID is a communicable disease. Not all diseases are communicable. Now, in a broad sense, non-communicable causes of death are the biggest killers today. That is a huge shift, historically speaking.

We have done a remarkable job at reducing the global disease burden from communicable disease. This is even seen in the data from the global burden of death from communicable diseases just from 1990 to 2017.

With that in mind, I will now compare the deaths from COVID to date in 2020 to estimated deaths from all other communicable diseases, using the Global Burden of Disease study again as a source of data.

Downloading the Data

First, I need to download all the data. I again use the GBD query tool. Here I look at global death rate for all years from 1990 to 2017, for all ages and both genders. I selected all communicable level 3 diseases (the levels just refer to how narrowly categorized the disease is; higher the level, the more narrow the category). I also want to show the trend over time just so you can get a sense of how much has changed since 1990.

#prior to this load libraries; dplyr,tidyr,matrixStats,lubridate
df<-read.csv("~/Downloads/commdisdeath.csv",stringsAsFactors = F)
df<-df%>%select(year,cause,val)
graph<-df%>%group_by(year)%>%summarise(sum=sum(val))
graph$sum<-graph$sum*78000
graph$sum<-graph$sum/1000000
#~7.8 billion on earth right now which is 78000 100,000s; death rates are deaths per 100,000
g<-ggplot(graph,aes(x=year,y=sum))+geom_line()+ggtitle("Global Deaths from Communicable Diseases",subtitle="Adjusted to 2020 Global Population")+labs(x="Year",y="Millions of Deaths")
g

This simple graph shows just how much deaths from communicable diseases have declined since 1990. And this is why I have to try and forecast how many deaths are likely in 2020 based on this trend, instead of just using averages, as averages will significantly overestimate the likely deaths from communicable diseases in 2020.

Adjusting for Influenza

Just as I did with my previous chart, I need to estimate how many deaths occurred from influenza. Influenza is not listed separately in the Global Burden of Disease study but is instead lumped into Lower Respiratory Infections. However, the same study group estimated global deaths from influenza, which I pulled from the Lancet. In it they give an estimate of 145,000 global deaths from influenza in 2017, with a 95% uncertainty interval of 99,000-200,000. As that is an uneven uncertainty interval, for the sake of simulation I’ll say the estimate is 145,000 and give the higher band of uncertainty of 55,000, which would give a standard deviation of 55000/1.96=28061.22.

#first convert data frame to wide format as previous coding is based on that
df<-df%>%spread(cause,val)
#convert all to numbers of people; Worldometers population clock has the world at 7,794,798,739 people in 2020
df[1:28,2:34]<-df[1:28,2:34]*(7794798739/100000)
df$Influenza<-rnorm(28,mean=145000,sd=28061.22)
df$`Lower respiratory infections`<-df$`Lower respiratory infections`-df$Influenza

Forecasting 2020

Now we can forecast where things might stand in 2020. For this I’ll just do a simple time series forecast.

tracking<-data.frame(disease=colnames(df[2:35]),mean=NA,sd=NA)

for (i in 2:35){
ts<-ts(data=df[i],start=1990,end=2017)
fit<-auto.arima(ts)
fc<-forecast(fit,h=3)
tracking[i-1,2]<-fc$mean[[3]]
tracking[i-1,3]<-(fc$upper[[3,2]]-fc$mean[[3]])/1.96
}

Seasonality of Mortality

Now I have a tracking data frame with the average and standard deviation of expected annual mortality for each disease in 2020. But, mortality tends to be somewhat seasonal, particularly as far as respiratory diseases are concerned. To see how I determine seasonality, see my entry on my previous post.

Simulation of Daily Deaths

I will simulate daily deaths in the same manner as the last post.

#this will be a slightly sloppy way of doing this but oh well
sim<-df[1:35]
names(sim)[1]<-"Date"
sim<-rbind(sim,sim,sim,sim,sim,sim,sim,sim,sim,sim,sim,sim,sim,sim)
sim<-sim[-c(367:392),]
sim$Date<-seq(ymd("2020-01-01"),ymd("2020-12-31"),by="day")
for (i in 2:35){
  sim[,i]<-rnorm(366,mean=tracking[i-1,2],sd=tracking[i-1,3])
}
#that gives you annuals; the averages are over multiple years in which some are leap years, some are not, so divide the annual totals by 365.25 to get dailies
sim[2:35]<-sim[2:35]/365.25
#anything that is below 0 we will just set to 0; I think the only one where this happens is terrorism because annual deaths are so variable for it
for (i in 2:35){
  sim[,i]<-ifelse(sim[,i]<0,0,sim[,i])
}
#and, finally, to apply the seasonality figures; and yes, I am aware that not all causes of death will have seasonality associated with them; this is in a way stacking the deck against COVID in the comparison to give a "fairer" comparison a la criticism received to date
season<-data.frame(month=1:12,adjustment=c(1.166,1.022,1.084,0.969,0.979,.924,.971,.962,.915,.986,.973,1.048))
sim$month<-month(sim$Date)
sim$seas_adj<-season$adjustment[match(sim$month,season$month)]
for (i in 2:35){
  sim[i]<-sim[i]*sim$seas_adj
}
#finally, to calculate cumulative figures for each column and round them to make it neater
for (i in 2:35){
  sim[i]<-cumsum(sim[i])
  sim[i]<-round(sim[i],0)
}
#remove month and seas_adj
sim$month<-NULL
sim$seas_adj<-NULL
#save as csv using whatever directory you wish
#write.csv(sim,"~/Desktop/allcod.csv")

Conclusion

From there, the data gets uploaded to Fluorish along with the COVID deaths and you have your bar chart race. You can see details in the previous posts on how I did this. Enjoy!