Covid Deaths Histogram

Author

Dr Andrew Dalby

Background

The following bar chart was published on Twitter which caused an intense discussion about the number of Covid deaths in children and the risks associated with sending children back to school.

This is a bar chart showing the percentage of Covid deaths that have occurred in each age group.

Bar Chart of the Number of Covid Deaths by Age Groups

The bar is highlighted for the 1-14 age group. This age group is also much wider than all of the others which also raises some questions. Why does that age range stand out?

The mortality rate for children aged 1-14 will in general be low and so a very small number of Covid deaths can have a significant impact on this percentage. In the original data the deaths are recorded for 1-4, 5-9 and 10-14 but often there are weeks with no Covid deaths for this age range and deaths overall are less than 10 which indicates that a single death would be 10% for that week. For this reason the person who prepared the figure decided to pool the data for children aged from 1-14 to reduce the variability of the percentage.

Looking at the ONS data I also suggest that there has been an update in the Covid deaths for this age group so that the peak is not quite as high as reported.

Reproducing the Chart

The data from the ONS converted to percentages for each of the age groups for the first 12 weeks of 2023 is available from the following link. You will need to cut and paste as RPubs seems to forbid links.

https://drive.google.com/file/d/1LooigmKxg3Gh9nvQz1-wKQjg-nD9-Xmu/view?usp=share_link

library(tidyr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)

data <- read.csv("Covid Deaths as a Percentage of Total Deaths.csv", header=TRUE)
data_long <- as.data.frame(gather(data, age, percentage, All.ages:X90.))

data_long <- data_long %>%
  mutate(age = recode(age,
                      "All.ages"="All",
                      "X.1"="<1",
                      "X01.04"="1-4",
                      "X05.09"="5-9",
                      "X10.14"="10-14",
                      "X01.14"="1-14",
                      "X15.19"="15-19",
                      "X20.24"="20-24",
                      "X25.29"="25-29",
                      "X30.34"="30-34",
                      "X35.39"="35-39",
                      "X40.44"="40-44",
                      "X45.49"="45-49",
                      "X50.54"="50-54",
                      "X55.59"="55-59",
                      "X60.64"="60-64",
                      "X65.69"="65-69",
                      "X70.74"="70-74",
                      "X75.79"="75-79",
                      "X80.84"="80-84",
                      "X85.89"="85-89",
                      "X90."="90+"
                      ))

data_long$age <- factor(data_long$age,levels=c("All","<1","1-14","1-4","5-9",
                                               "10-14","15-19",
                              "20-24","25-29","30-34","35-39","40-44","45-49",
                              "50-54","55-59","60-64","65-69","70-74","75-79",
                              "80-84","85-89","90+"))

total <- subset(data_long,data_long$Week=="Total")

ggplot(total, aes(x=age, y=percentage))+
  geom_point(col = "cornflowerblue", size=3)+
  geom_segment(
    aes(y=0, yend=percentage, x=age, xend=age),
    linewidth = 1,
    col="cornflowerblue")+
  geom_text(size=3,aes(label=round(percentage,2)), 
            position=position_dodge(width=0.9), vjust=-1)+
  ylim(0,5.5)+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(
    x= "Age Group",
    y= "Percentage of deaths due to Covid",
    title="Covid Deaths as a Percentage of Total Deaths by Age for England and Wales "
  )

Variability in Weekly Data for Children Aged 0-14

There is still a peak for ages 5-9 and 10-14 which look like they have the same mortality rate as 50 and 60 year olds respectively. As I said before this is mostly because the total number of deaths for these age groups are very low and a single Covid death can make up a large proportion of the total deaths. This can be seen more clearly as an effect when you look at the weekly death data.

children <- subset(data_long,data_long$age=="<1"|data_long$age=="1-4"|data_long$age=="5-9"|data_long$age=="10-14"|data_long$age=="15-19"|data_long$age=="All")

children$Week <- as.factor(children$Week)

children$Week <- factor(children$Week,levels=c("1","2","3","4","5","6","7","8","9","10","11","12","Total"))

ggplot(children, aes(x=Week, y=percentage, group= age, color=age)) +
  geom_point() +
  geom_line()+
  labs(
   x= "Week of 2023",
    y= "Percentage of deaths due to Covid",
    title="Weekly Covid Deaths as a Percentage of Total Deaths by Age"  
  )

There is considerable variability including one week where the percentage goes above 15% because of the very small number involved. In order to get a better picture you would have to collect the data for a much longer period of time. It is going to be very difficult to determine if the virus has mutated to a variant that seriously affects younger people because of this issue of small numbers. You would need to check the “excess deaths” for the age groups to get a better feel for the effects of Covid when looking at weekly or monthly data.

Calculating The Confidence Intervals

We can calculate the confidence intervals for the proportions using the normal approximation to the binomial distribution.

Confidence limits for p are:

\[ \dfrac{p+\dfrac{a^{2}}{2n}\pm\dfrac{a}{\sqrt{n}} \sqrt{pq+a^{2}/4n} }{1+a^{2}/n} \]

Where n is the number of observations and a is the tabulated value of z for the chosen level of alpha.

From the ONS data we have to add the total number of deaths for the weeks up until the 24th of March.

deaths <- c(160016, 604,103,54,91,248,216,300,428,689,1070,1513,1999,3516,5236,7335,9947,11428,21410,25502,29290,36285)
total1 <- data.frame(total,deaths)

From this we can them calculate the confidence intervals for an alpha value of 0.05 where a=1.96.

p <- total1$percentage/100
q <- 1-p
n <- total1$deaths
a <- 1.96
b <- a^2
lower <- 100*( (p + b/(2*n) - (a/(sqrt(n)))*(sqrt((p*q)+b/(4*n)))) /(1+(b/n)) )
upper <- 100*( (p + b/(2*n) + (a/(sqrt(n)))*(sqrt((p*q)+b/(4*n)))) /(1+(b/n)) )
ci <- data.frame(total1,lower,upper)

Then we can plot the percentages and the confidence intervals to show that the confidence intervals for children are very wide because of the large degree of variability in the data.

ggplot(ci, aes(x=age, y=percentage))+
  geom_errorbar(aes(ymin=lower, ymax=upper), colour ="cornflowerblue", width=0.3)+
  geom_point(size=2)+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(
    x= "Age Group",
    y= "Percentage of deaths due to Covid",
    title="Covid Deaths as a Percentage of Total Deaths by Age for England and Wales "
  )