The Premise

On Wednesday, May 27th, Premier Jason Kenney of Alberta stated, in the provincial legislature, that “The average age of death from COVID in Alberta is 83, and I’ll remind the house that the average life expectancy in the province is 82”.

This is obviously an attempt to “normalize” COVID-19. I will address the fallacy of influenza-COVID comparisons he made in another post, as another attempt to normalize COVID. This though is an attempt to frame the deaths from COVID-19 as acceptable and not worth shutting down the economy over.

Aside from the political problems with his statement, it is a statistical abomination.

Why?

What is Life Expectancy?

The number Premier Kenney quoted, 82 years, is the life expectancy AT BIRTH of babies born in Alberta today. It is NOT the life expectancy of an individual who has already reached 83 years of age, obviously. The implication with juxtaposing the two numbers is that the 83 year old has already “crossed the finish line” and is living on borrowed time, so if COVID comes along and kills them, well, they had a good run and there was no loss of expected life.

But that is not the case.

Life Tables

Demographers, statisticians and others who study such issues use something called life tables to determine the likelihood of an individual at any given age living to any other given age. From that, they can also determine what the average number of years an individual at age X will live further. There is a great resource known as the Canadian Human Mortality Database where they keep just such data.

As an example, let’s use Mr. Kenney’s “average” COVID death age of 83. In 2016, the most recent year for which data is available, an 83 year old in Alberta would be expected to live, on average, another 8 years.

So if they died at 83 from COVID, a new disease that did not exist prior to December, it took, on average, 8 years away from them and their family.

Alberta’s Experience

The Government of Alberta helpfully makes the individual COVID case data available to the public, including age and gender of patients and whether or not they died. If you go to this link, then click on “Data export”, scroll down to the bottom and you will see a button to export a CSV file. Download that CSV file and read it into R.

#load in libraries
library(dplyr)
library(stringr)
library(car)
library(tidyr)
library(readr)
library(ggplot2)
#read in csv
df<-read.csv("C:/Users/tonynickonchuk/Downloads/covid19dataexport.csv",stringsAsFactors = F)
#select variables to keep, filter only to deaths, add count variable for each person who died
df<-df%>%select(Gender:Case.status)%>%filter(Case.status=="Died")%>%mutate(count=1)
#group by gender and age group and summarise number of deaths
summary<-df%>%group_by(Gender,Age.group)%>%summarise(deaths=sum(count))
print(summary)
## # A tibble: 11 x 3
## # Groups:   Gender [2]
##    Gender Age.group   deaths
##    <chr>  <chr>        <dbl>
##  1 Female 20-29 years      1
##  2 Female 50-59 years      1
##  3 Female 60-69 years      4
##  4 Female 70-79 years      9
##  5 Female 80+ years       53
##  6 Male   30-39 years      1
##  7 Male   40-49 years      1
##  8 Male   50-59 years      1
##  9 Male   60-69 years      6
## 10 Male   70-79 years     19
## 11 Male   80+ years       47

This gives you a data frame with the number of deaths by age and gender. How do we know the precise age of each patient? We don’t unfortunately. So, for the purposes of simulation, we can just simulate their age.

This data table from the 2016 Canadian Census lists the population in Alberta by single years and gender. From there, I will sample from each population group just to get a potential age for each person in the death dataset.StatsCan data tables are a bit messy when you download them, so I went and cleaned it up first in Excel so it was just single year populations for male and female, and turned “100 years and over” to “100” so all ages would be numeric.

#read in csv
pop<-read.csv("C:/Users/tonynickonchuk/Downloads/censuspop.csv",stringsAsFactors = F)

#lowest age group in the death data is 20-29, so can remove all populations <20
pop<-pop%>%filter(age>=20)

#add in age groupings from death data; for this I'll use the excellent "recode" function from the car package

pop$age_group<-recode(pop$age,"20:29='20-29 years';30:39='30-39 years';40:49='40-49 years';50:59='50-59 years';60:69='60-69 years';70:79='70-79 years';80:100='80+ years'")

#pull population data into longer dataframe by gender
pop<-pop%>%gather(gender,population,male:female)
#interaction for gender and age group and do same for df and summary
pop$age.gender<-paste(pop$age_group,pop$gender,sep=" ")
df$Gender<-str_replace(df$Gender,"Male","male")
df$Gender<-str_replace(df$Gender,"Female","female")
df$age.gender<-paste(df$Age.group,df$Gender,sep=" ")
summary$Gender<-str_replace(summary$Gender,"Male","male")
summary$Gender<-str_replace(summary$Gender,"Female","female")
summary$age.gender<-paste(summary$Age.group,summary$Gender,sep=" ")

#now I want each person in the population data to be one entry; can use the uncount function for this
#this will take some time as it generates 3 million observations
pop<-uncount(pop,weights=population)

#now need to sample ages
#can write a loop for this
#make placeholder column for ages in df
df$age<-NA
#arrange
df<-df%>%arrange(Gender,Age.group)
df<-df%>%group_by(age.gender)%>%mutate(count=row_number())%>%ungroup()

repeat{
for (i in 1:11){
  age.gender<-summary$age.gender[i]
  ages<-sample(pop$age[pop$age.gender==age.gender],size=summary$deaths[i],replace=F)
  ages<-data.frame(number=1:length(ages),age=ages)
  df$age<-ifelse(df$age.gender==age.gender,ages$age[match(df$count,ages$number)],df$age)
}
  if (mean(df$age)>81){
    break}
}

df<-df%>%select(Gender,age)
print(df)
## # A tibble: 143 x 2
##    Gender   age
##    <chr>  <int>
##  1 female    23
##  2 female    52
##  3 female    61
##  4 female    65
##  5 female    69
##  6 female    65
##  7 female    75
##  8 female    74
##  9 female    79
## 10 female    75
## # ... with 133 more rows

Now we have a tidy little dataframe of the genders and “simulated” ages of each of the 143 deaths in Alberta. In order to check if it’s reasonable, you can calculate the mean of the ages and see if it comes close to the provincial mean.

mean(df$age)
## [1] 81.18182

Years Lost

Now that we have “simulated” ages of the patients who died from COVID in Alberta, we can use life tables to predict the likelihood that they live to see their next birthday, and their next one, and so on and so forth.

First we need to read in the life tables for men and women in Alberta.

male<-read.csv("~/csv/male.csv",stringsAsFactors = F)
male$gender<-"male"
female<-read.csv("~/csv/female.csv",stringsAsFactors = F)
female$gender<-"female"
probs<-rbind(male,female)
probs$age.gender<-paste(probs$Age,".",probs$gender,sep="")

“qx” is the probability of death at a given age before they reach the next year. So, for a male aged 20, for example, qx=0.00103, so a 20 year old male in Alberta has a 0.1% probability of dying before he reaches age 21.

We’ll assign a number to each “patient” in our data frame and then attempt to simulate their survival. We will run 1000 simulations for each patient and determine their average age of death and range. In order to do that we will build a tracking frame for the simulations.

#develop loop to simulate age of death of each patient based on probabilities
df$number<-1:nrow(df)
tracking<-data.frame(pt=1:143)
tracking[2:1001]<-NA
for (i in 1:143){
for (f in 2:1001){
age<-df$age[i]-1
gender<-df$Gender[i]
repeat{
age<-age+1
age.gender<-paste(age,".",gender,sep="")
age.gender<-ifelse(age>110,paste("110",".",gender,sep=""),age.gender)
prob<-probs$qx[match(age.gender,probs$age.gender)]
die<-sample(c(1,0),size=1,prob=c(prob,1-prob))
if(die==1){
    break
  }}
  tracking[i,f]<-age}}
#now we have the ages of death; can import starting age and determine added years of life
tracking$age<-df$age
for (i in 2:1001){
  tracking[i]<-tracking[i]-tracking[1002]
}

Now we have simulated years of life left for each patient who died from COVID in Alberta over 1000 different simulations. From there we can grab the column sums.

sum<-colSums(tracking[2:1001])
mean<-mean(sum)
sd<-sd(sum)
quantile(sum,c(0.025,0.975))
##     2.5%    97.5% 
## 1234.975 1500.025

From that we can see that on average, 1372 person-years of expected life were lost due to COVID, 95% of the time falling between 1234.98 and 1500.03.

graph<-data.frame(x=1:1000,y=sum)
g<-ggplot(graph,aes(y=y))+geom_boxplot()+ggtitle("Boxplot of Expected Years of Life Lost")+labs(y="Years of Expected Life Lost")

What About the 83-year olds?

Jason Kenney implied in his comments in the Legislature that those who make it to 83 and die of COVID already passed the life expectancy finish line of 82. We’ve already discussed why that makes no sense. But what if those patients who already reached 83 did not die of COVID? As in, their lives continued on in an “average” timeline in which COVID did not exist?

To what age would they have been expected to live?

We can figure this out easily from the data we already generated.

old<-subset(tracking,age>=83)
sum<-colSums(old[2:1001])
mean<-mean(sum)
sd<-sd(sum)
quantile(sum,c(0.025,0.975))
##   2.5%  97.5% 
## 326.00 457.05

Even those who made it to 83, without COVID, would’ve lived on average another 5 years, ranging 95% of the time from 4.35-6.09 years, an average total of 390 person-years.

Conclusion

The fact is, COVID is a new disease to which the human species has never been exposed. Most evidence to date from countries where data is available suggests that COVID is not replacing other causes of death. It is adding to death tolls. In many cases, in a magnitude higher than would be expected from the COVID death counts alone.

It is stealing years of life from all ages, but mostly from our most elderly and vulnerable citizens those who, Mr. Kenney should remember, worked so hard during their younger years to make our society what it is today. They deserve better than indifference from the very leaders who should be doing what they can to protect them.