Visit the website of Sciensano (https://epistat.wiv-isp.be/covid/) and download the dataset including the mortality data by age, sex and region over time (csv-file; Mortality by date, age, sex and region).
The analyses were done using the incidence2 package.
#load the incidence2 package, set working directory (path) and import COVID-19 mortality dataset (Sciensano)
library(incidence2)
setwd("C:/Users/naima/Documents/R_studio_files")
hosp_dat <- read.table("COVID19BE_MORT.csv", header = TRUE, sep = ",")
#view data
str(hosp_dat)
## 'data.frame': 10443 obs. of 5 variables:
## $ DATE : chr "2020-03-07" "2020-03-10" "2020-03-11" "2020-03-11" ...
## $ REGION : chr "Brussels" "Brussels" "Flanders" "Brussels" ...
## $ AGEGROUP: chr "75-84" "85+" "85+" "65-74" ...
## $ SEX : chr "M" "F" "M" "M" ...
## $ DEATHS : int 1 1 1 1 1 1 1 1 1 2 ...
summary(hosp_dat)
## DATE REGION AGEGROUP SEX
## Length:10443 Length:10443 Length:10443 Length:10443
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## DEATHS
## Min. : 1.000
## 1st Qu.: 1.000
## Median : 2.000
## Mean : 3.156
## 3rd Qu.: 3.000
## Max. :55.000
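The `str()` output shows that `DATE` is stored as character. The following is a minimal sketch, on a made-up data frame `toy` (hypothetical, only mimicking the structure of the Sciensano file), of how the column could be converted to class `Date` and how missing values could be counted before building incidence objects:

```r
# Toy data mimicking the structure of COVID19BE_MORT.csv (values are made up)
toy <- data.frame(
  DATE     = c("2020-03-07", "2020-03-10", "2020-03-11", NA),
  REGION   = c("Brussels", "Brussels", "Flanders", "Wallonia"),
  AGEGROUP = c("75-84", "85+", NA, "65-74"),
  SEX      = c("M", "F", "M", NA),
  DEATHS   = c(1L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Convert the character dates (yyyy-mm-dd) to class Date
toy$DATE <- as.Date(toy$DATE, format = "%Y-%m-%d")

# Number of missing values in each column
n_missing <- colSums(is.na(toy))
print(n_missing)

# Number of deaths (not rows) with a missing age group
deaths_missing_age <- sum(toy$DEATHS[is.na(toy$AGEGROUP)])
print(deaths_missing_age)
```

Counting deaths rather than rows matters here because each row of the dataset can represent more than one death.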
Produce a plot showing the daily incidence of deaths over time aggregated over age, sex and region and indicate on the x-axis the first day of each month.
daily_incidence <- incidence(hosp_dat, date_index = DATE, count = DEATHS, interval = "day")
plot(daily_incidence, title= "Daily incidence of deaths over time", xlab = "Time", ylab = "Daily incidence of death", fill= "blue", angle = 30, date_format = "%d/%m/%Y", date_breaks = "1 month")
This figure shows the number of daily deaths in the different COVID-19 waves between March 2020 and November 2022. The number of daily deaths was highest during the first two waves.
Note that incidence here refers to daily new deaths (incidence count) and not to the incidence rate (new deaths / population at risk).
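To make the distinction concrete, here is a minimal sketch using made-up daily counts and a hypothetical population at risk (`population` is a placeholder, not an official figure):

```r
# Made-up daily death counts, for illustration only
daily_deaths <- c(1, 3, 5, 2)

# Hypothetical population at risk (placeholder; replace with an official figure)
population <- 1e6

# Incidence count: new deaths per day (what the plots in this report show)
incidence_count <- daily_deaths

# Incidence rate: new deaths per day per 100,000 persons at risk
incidence_rate <- daily_deaths / population * 1e5
print(incidence_rate)
```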
Show the cumulative incidence of deaths over time.
cum_incidence <- cumulate(daily_incidence)
plot(cum_incidence, title= "Cumulative incidence of deaths over time", xlab = "Time", ylab = "Cumulative incidence of deaths", fill= "blue", angle = 30, date_format = "%d/%m/%Y", date_breaks = "1 month")
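As a cross-check that does not depend on incidence2, the cumulative curve is simply the running sum of the daily counts. A minimal base R sketch on made-up data (the data frame `toy` is hypothetical):

```r
# Made-up records; one date appears twice, as in the stratified source data
toy <- data.frame(
  DATE   = as.Date(c("2020-03-07", "2020-03-08", "2020-03-08", "2020-03-10")),
  DEATHS = c(1L, 2L, 1L, 4L)
)

# Total deaths per day (aggregating over all other variables)
daily <- aggregate(DEATHS ~ DATE, data = toy, FUN = sum)

# Order by date, then take the running total
daily <- daily[order(daily$DATE), ]
daily$CUM_DEATHS <- cumsum(daily$DEATHS)
print(daily)
```

The last value of `CUM_DEATHS` should equal the total number of deaths in the data, which is a quick sanity check on the cumulative plot.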
Produce the same plots for the daily number of deaths over time, by age group, region and sex. Briefly explain the key differences from an epidemiological point of view.
Initially I produced plots combining the subgroups in a single panel, but I switched to one panel per subgroup to allow for a better visual interpretation of the findings.
incidence_age <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(AGEGROUP), na_as_group = FALSE, interval = "day")
facet_plot(incidence_age, facets = AGEGROUP, title= "Daily incidence of deaths over time, by agegroup", xlab = "Time", ylab = "Daily incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: deaths with a missing age group are excluded (na_as_group = FALSE).
Key differences from an epidemiological point of view
The number of daily deaths differed between age groups and increased with age: higher absolute numbers of daily deaths were observed in the age groups of 65 and over (65-74y, 75-84y, and the highest in 85+). However, we should take the denominators (population size per age group) into account to better estimate the association between mortality and age. These mortality data are specifically linked to COVID-19, but there may be some overestimation (see also the publication in Eurosurveillance on the Sciensano website). Therefore, it is better to also look at excess mortality ("oversterfte") per age group.
incidence_region <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(REGION), interval= "day")
facet_plot(incidence_region, facets = REGION, title= "Daily incidence of deaths over time, by region", xlab = "Time", ylab = "Daily incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: no missing data.
Key differences from an epidemiological point of view
The absolute number of daily deaths seems to be lowest in Brussels and highest in Flanders. Here too, denominators (number of inhabitants per region) are key to comparing mortality across regions and to assessing whether there is an association between region of residence and mortality.
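A minimal sketch of how denominators could be applied per region. Both the death totals and the population sizes below are made-up placeholders (not Sciensano or official population figures), and the names `deaths_by_region`, `pop_by_region` and `rates` are hypothetical:

```r
# Made-up death totals per region (illustration only)
deaths_by_region <- data.frame(
  REGION = c("Brussels", "Flanders", "Wallonia"),
  DEATHS = c(100, 500, 300)
)

# Hypothetical population sizes (placeholders; replace with official figures)
pop_by_region <- data.frame(
  REGION     = c("Brussels", "Flanders", "Wallonia"),
  POPULATION = c(1e6, 6e6, 3e6)
)

# Join counts and denominators, then compute deaths per 100,000 inhabitants
rates <- merge(deaths_by_region, pop_by_region, by = "REGION")
rates$DEATHS_PER_100K <- rates$DEATHS / rates$POPULATION * 1e5
print(rates)
```

In this toy example the region with the highest absolute count has the lowest rate, which is why a ranking based on absolute counts can change once population size is taken into account.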
incidence_sex <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(SEX), na_as_group = FALSE, interval= "day")
facet_plot(incidence_sex, facets = SEX, title= "Daily incidence of deaths over time, by sex", xlab = "Time", ylab = "Daily incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: deaths with missing sex are excluded (na_as_group = FALSE).
Key differences from an epidemiological point of view
Overall, the two curves for daily mortality are very similar, except during the first wave of the pandemic, in which more women than men seem to have died. Further analysis is needed to assess whether sex is associated with death.
incidence_age <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(AGEGROUP), na_as_group = FALSE, interval= "day")
cum_incidence_age <- cumulate(incidence_age)
facet_plot(cum_incidence_age, facets = AGEGROUP, title= "Cumulative incidence of deaths over time, by agegroup", xlab = "Time", ylab = "Cumulative incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: deaths with a missing age group are excluded (na_as_group = FALSE).
Key differences from an epidemiological point of view
The highest cumulative incidence of deaths was observed in the age groups of 65 and over, and it increased with age (65-74y < 75-84y < 85+). Based on these data, increasing age seems to be associated with mortality.
incidence_region <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(REGION), interval= "day")
cum_incidence_region <- cumulate(incidence_region)
facet_plot(cum_incidence_region, facets = REGION, title= "Cumulative incidence of deaths over time, by region", xlab = "Time", ylab = "Cumulative incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: no missing data.
Key differences from an epidemiological point of view
The cumulative number of deaths seems to be lowest in Brussels and highest in Flanders. However, the numbers of inhabitants per region are very different. Here too, denominators (number of inhabitants per region) are key to comparing mortality across regions and to assessing whether there is an association between region of residence and mortality.
incidence_sex <-incidence(hosp_dat, date_index = DATE, count = DEATHS, groups =(SEX), na_as_group = FALSE, interval= "day")
cum_incidence_sex <- cumulate(incidence_sex)
facet_plot(cum_incidence_sex, facets = SEX, title= "Cumulative incidence of deaths over time, by sex", xlab = "Time", ylab = "Cumulative incidence of deaths", fill= "blue", angle = 90, date_format = "%d/%m/%Y", date_breaks = "1 month", size = 5)
Note: deaths with missing sex are excluded (na_as_group = FALSE).
Key differences from an epidemiological point of view
The cumulative incidence of deaths in females is initially similar to that in males. Over time, however, the cumulative number of deaths in males becomes higher. Further analysis is needed to assess whether sex is associated with death.
Create a new data frame containing the total number of deaths per month for the different age categories and by gender, aggregated over region. Write this file out as a .csv file. Show the first lines of the data frame in the R Markdown file.
monthly_incidence <- incidence(hosp_dat, date_index = DATE, count = DEATHS, interval= "month", groups = c(AGEGROUP,SEX))
head(monthly_incidence)
summary(monthly_incidence)
## date range: [2020-mrt] to [2022-nov]
## DEATHS: 32959
## interval: 1 month
## cumulative: FALSE
## timespan: 1005 days
##
## 2 grouped variables
##
## AGEGROUP DEATHS
## <chr> <int>
## 1 <NA> 33
## 2 0-24 16
## 3 25-44 205
## 4 45-64 2451
## 5 65-74 4808
## 6 75-84 9785
## 7 85+ 15661
##
##
## SEX DEATHS
## <chr> <int>
## 1 M 17191
## 2 F 15743
## 3 <NA> 25
write.csv(monthly_incidence,"COVID19BE_MORT_MONTH_AGE_SEX.csv", row.names = TRUE )
head(monthly_incidence, n=20)
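The monthly totals can also be cross-checked with base R only. The following is a minimal sketch on a made-up data frame `toy` (hypothetical); the column name `MONTH` and the temporary output file are assumptions made for illustration, not part of the analysis above:

```r
# Made-up records mimicking the structure of the mortality dataset
toy <- data.frame(
  DATE     = as.Date(c("2020-03-07", "2020-03-20", "2020-03-21", "2020-04-02")),
  REGION   = c("Brussels", "Flanders", "Wallonia", "Flanders"),
  AGEGROUP = c("85+", "85+", "75-84", "85+"),
  SEX      = c("F", "F", "M", "F"),
  DEATHS   = c(1L, 2L, 1L, 3L)
)

# Month label (yyyy-mm) derived from the date
toy$MONTH <- format(toy$DATE, "%Y-%m")

# Sum deaths per month, age group and sex, aggregating over region.
# Note: the formula interface drops rows with missing values by default.
monthly <- aggregate(DEATHS ~ MONTH + AGEGROUP + SEX, data = toy, FUN = sum)
monthly <- monthly[order(monthly$MONTH, monthly$AGEGROUP, monthly$SEX), ]
print(monthly)

# Write to a temporary file; row.names = FALSE avoids an extra index column
out_file <- tempfile(fileext = ".csv")
write.csv(monthly, out_file, row.names = FALSE)
```

Because rows with a missing age group or sex are dropped by the formula interface, totals from this approach can be lower than totals from a method that keeps missing values as a separate group.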