How has the principle causes of death in Singapore changed since 2006?

This dataset is retrieved from the Ministry of Health and shows the top 10 Causes of death in Singapore over the period of 14 years, according to the percentage of total death contributed by each cause. This publication aims to see how the top principal causes of death has changed overtime (i.e. any disease becoming more common and increasing in prevalence). The dataset can be retrieved here at Principal Causes of Death, where data for 2006-2011 is recorded in the first sheet and 2012-2019 in the second sheet.

Loading and cleaning the 2006 - 2011 Data

The data is not structured properly for analysis i.e. has empty columns and some principal causes of death are split into two rows (for example, Cerebrovascular Disease and (including stroke) are recorded in separate rows).

We will skip the first 7 rows of the sheet as they contain headings which are not necessary in our analysis. Further, we will not assign the topmost rows to be column names.

knitr::opts_chunk$set(message = FALSE, warning = FALSE)
setwd("C:/Users/X Lin/Desktop/R")
library(readxl)
df06 <- read_xlsx("PCoD.xlsx", skip = 7, col_names = FALSE)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
head(df06, n = 15)
## # A tibble: 15 x 9
##     ...1 ...2                          ...3   ...4  ...5  ...6  ...7  ...8  ...9
##    <dbl> <chr>                         <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1 Cancer                        NA     28.5  27.7  29.3  29.3  28.5  30  
##  2    NA [ ICD9 : 140-208 ]            NA     NA    NA    NA    NA    NA    NA  
##  3    NA <NA>                          NA     NA    NA    NA    NA    NA    NA  
##  4     2 Ischaemic Heart Disease       NA     18.5  19.8  20.1  19.2  18.7  16.4
##  5    NA [ ICD9 : 410-414 ]            NA     NA    NA    NA    NA    NA    NA  
##  6    NA <NA>                          NA     NA    NA    NA    NA    NA    NA  
##  7     3 Pneumonia                     NA     13.7  13.9  13.9  15.3  15.7  16  
##  8    NA [ ICD9 : 480-486 ]            NA     NA    NA    NA    NA    NA    NA  
##  9    NA <NA>                          NA     NA    NA    NA    NA    NA    NA  
## 10     4 Cerebrovascular Disease       NA      8.9   8.7   8.3   8     8.4   9  
## 11    NA (including stroke)            NA     NA    NA    NA    NA    NA    NA  
## 12    NA [ ICD9 : 430-438 ]            NA     NA    NA    NA    NA    NA    NA  
## 13    NA <NA>                          NA     NA    NA    NA    NA    NA    NA  
## 14     5 Accidents, Poisoning & Viole~ NA      6.3   6     5.8   5.7   5.5   5.5
## 15    NA [ ICD9 : E800-E999 ]          NA     NA    NA    NA    NA    NA    NA

Tidying df06

Remove columns 1 & 3, assign respective headings and rename Causes of death accordingly.

df06 <- df06[,-c(1,3)] #remove empty columns
colnames(df06) <- c("Cause", "2006", "2007","2008","2009","2010","2011") #assign column names
df06$Cause
##  [1] "Cancer"                           "[ ICD9 : 140-208 ]"              
##  [3] NA                                 "Ischaemic Heart Disease"         
##  [5] "[ ICD9 : 410-414 ]"               NA                                
##  [7] "Pneumonia"                        "[ ICD9 : 480-486 ]"              
##  [9] NA                                 "Cerebrovascular Disease"         
## [11] "(including stroke)"               "[ ICD9 : 430-438 ]"              
## [13] NA                                 "Accidents, Poisoning & Violence" 
## [15] "[ ICD9 : E800-E999 ]"             NA                                
## [17] "Other Heart Diseases"             "[ ICD9 : 393-398, 402, 415-429 ]"
## [19] NA                                 "Urinary Tract Infection"         
## [21] "[ ICD9 : 599.0 ]"                 NA                                
## [23] "Chronic Obstructive Lung"         "Disease"                         
## [25] "[ ICD9 : 490-493, 496 ]"          NA                                
## [27] "Nephritis, Nephrotic"             "Syndrome & Nephrosis"            
## [29] "[ ICD9 : 580-589 ]"               NA                                
## [31] "Diabetes Mellitus"                "[ ICD9 : 250 ]"
#Cleaning up column 'Cause'
df06[10,1] <- paste(df06[10,1], df06[11,1])
df06[23,1] <- paste(df06[23,1], df06[24,1])
df06[28,1] <- paste(df06[27,1], df06[28,1])
df06$Cause
##  [1] "Cancer"                                    
##  [2] "[ ICD9 : 140-208 ]"                        
##  [3] NA                                          
##  [4] "Ischaemic Heart Disease"                   
##  [5] "[ ICD9 : 410-414 ]"                        
##  [6] NA                                          
##  [7] "Pneumonia"                                 
##  [8] "[ ICD9 : 480-486 ]"                        
##  [9] NA                                          
## [10] "Cerebrovascular Disease (including stroke)"
## [11] "(including stroke)"                        
## [12] "[ ICD9 : 430-438 ]"                        
## [13] NA                                          
## [14] "Accidents, Poisoning & Violence"           
## [15] "[ ICD9 : E800-E999 ]"                      
## [16] NA                                          
## [17] "Other Heart Diseases"                      
## [18] "[ ICD9 : 393-398, 402, 415-429 ]"          
## [19] NA                                          
## [20] "Urinary Tract Infection"                   
## [21] "[ ICD9 : 599.0 ]"                          
## [22] NA                                          
## [23] "Chronic Obstructive Lung Disease"          
## [24] "Disease"                                   
## [25] "[ ICD9 : 490-493, 496 ]"                   
## [26] NA                                          
## [27] "Nephritis, Nephrotic"                      
## [28] "Nephritis, Nephrotic Syndrome & Nephrosis" 
## [29] "[ ICD9 : 580-589 ]"                        
## [30] NA                                          
## [31] "Diabetes Mellitus"                         
## [32] "[ ICD9 : 250 ]"
#extracting only rows of interest
df06 <- df06[c(1,4,7,10,14,17,20,23,28,31),]
df06
## # A tibble: 10 x 7
##    Cause                               `2006` `2007` `2008` `2009` `2010` `2011`
##    <chr>                                <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 Cancer                                28.5   27.7   29.3   29.3   28.5   30  
##  2 Ischaemic Heart Disease               18.5   19.8   20.1   19.2   18.7   16.4
##  3 Pneumonia                             13.7   13.9   13.9   15.3   15.7   16  
##  4 Cerebrovascular Disease (including~    8.9    8.7    8.3    8      8.4    9  
##  5 Accidents, Poisoning & Violence        6.3    6      5.8    5.7    5.5    5.5
##  6 Other Heart Diseases                   4.3    4.3    4      4.4    4.8    5  
##  7 Urinary Tract Infection                2      2.2    2.1    2.5    2.5    2.5
##  8 Chronic Obstructive Lung Disease       3.3    2.6    2.5    2.4    2.5    2.2
##  9 Nephritis, Nephrotic Syndrome & Ne~    1.7    2      2.1    2.3    2.2    2  
## 10 Diabetes Mellitus                      3.3    3.6    2.7    1.7    1      1.7

Loading and cleaning the 2012 - 2019 Data

Here, we do the same cleaning as we did for the first sheet above before binding the two tidied dataframes.

df12 <- read_xlsx("PCoD.xlsx", sheet = 2, skip = 7, col_names = FALSE)
df12_tidy <- df12[,-c(1,3)]
colnames(df12_tidy) <- c("Cause","2012", "2013","2014","2015","2016","2017","2018","2019")
df12_tidy[10,1] <- paste(df12_tidy[10,1], df12_tidy[11,1])
df12_tidy[18,1] <- paste(df12_tidy[17,1], df12_tidy[18,1])
df12_tidy[21,1] <- paste(df12_tidy[21,1], df12_tidy[22,1])

#Convert all causes to lower case
library(stringr)
df06$Cause<- str_to_lower(df06$Cause)
df12_tidy$Cause <- str_to_lower(df12_tidy$Cause)

df12_tidy <- df12_tidy[c(1,4,7,10,14,18,21,25,28,31),]
df12_tidy
## # A tibble: 10 x 9
##    Cause                 `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019`
##    <chr>                  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 cancer                  30.1   30.5   29.4   29.7   29.6   29.1  28.8    28.4
##  2 pneumonia               16.8   18.5   19     19.4   19.3   20.1  20.6    20.7
##  3 ischaemic heart dise~   16.1   15.5   16     16.7   17     18.5  18.1    18.8
##  4 cerebrovascular dise~    9.3    8.9    8.4    6.8    6.6    6.3   6       5.8
##  5 external causes of m~    5.6    4.9    4.7    4.5    4.4    4     4.3     4  
##  6 nephritis, nephrotic~    2.4    2.4    2      2.3    1.9    2.4   2.97    3.1
##  7 hypertensive disease~    2.8    3.1    3.6    3.9    4      3.4   2.97   NA  
##  8 urinary tract infect~    2.4    2.6    2.6    2.2    2.3    1.9   2.04    2.3
##  9 other heart diseases     1.9    2      1.9    2.2    1.9    1.9   2.09    2  
## 10 chronic obstructive ~    2.1    1.6    1.8    1.8    1.6    1.5   1.34    1.4
#missing value in Hypertensive diseases, 2019
df12_tidy[7,9] <- df12[22,11]
df12_tidy
## # A tibble: 10 x 9
##    Cause                 `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019`
##    <chr>                  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 cancer                  30.1   30.5   29.4   29.7   29.6   29.1  28.8    28.4
##  2 pneumonia               16.8   18.5   19     19.4   19.3   20.1  20.6    20.7
##  3 ischaemic heart dise~   16.1   15.5   16     16.7   17     18.5  18.1    18.8
##  4 cerebrovascular dise~    9.3    8.9    8.4    6.8    6.6    6.3   6       5.8
##  5 external causes of m~    5.6    4.9    4.7    4.5    4.4    4     4.3     4  
##  6 nephritis, nephrotic~    2.4    2.4    2      2.3    1.9    2.4   2.97    3.1
##  7 hypertensive disease~    2.8    3.1    3.6    3.9    4      3.4   2.97    2.6
##  8 urinary tract infect~    2.4    2.6    2.6    2.2    2.3    1.9   2.04    2.3
##  9 other heart diseases     1.9    2      1.9    2.2    1.9    1.9   2.09    2  
## 10 chronic obstructive ~    2.1    1.6    1.8    1.8    1.6    1.5   1.34    1.4

Slope graph of the top 10 Principal Causes of Death

2006 - 2011

Cancer is the leading cause of death in Singapore across all years, contributing to approximately one third of total deaths. The incidence of ischaemic heart diseas has fallen whilst pneumonia has contributed to greater deaths in 2011 as compared to 2006.Percentage of total deaths caused by Cerebrovascular diseases, including stroke, was rather similar in both 2006 and 2011. The percentage of deaths caused by the remaining causes remain fairly negligible, each contributing to less than 7% of total deaths in Singapore in both 2006 and 2011.

library(dplyr)
library(tidyr)
library(ggplot2)
df06_cleaned <- df06 %>% gather(key = 'Year', value = 'Percentage_death', -Cause)
df06_cleaned
## # A tibble: 60 x 3
##    Cause                                      Year  Percentage_death
##    <chr>                                      <chr>            <dbl>
##  1 cancer                                     2006              28.5
##  2 ischaemic heart disease                    2006              18.5
##  3 pneumonia                                  2006              13.7
##  4 cerebrovascular disease (including stroke) 2006               8.9
##  5 accidents, poisoning & violence            2006               6.3
##  6 other heart diseases                       2006               4.3
##  7 urinary tract infection                    2006               2  
##  8 chronic obstructive lung disease           2006               3.3
##  9 nephritis, nephrotic syndrome & nephrosis  2006               1.7
## 10 diabetes mellitus                          2006               3.3
## # ... with 50 more rows
df06_slope <- df06_cleaned %>% filter(Year == 2006 | Year == 2011)
df06_slope
## # A tibble: 20 x 3
##    Cause                                      Year  Percentage_death
##    <chr>                                      <chr>            <dbl>
##  1 cancer                                     2006              28.5
##  2 ischaemic heart disease                    2006              18.5
##  3 pneumonia                                  2006              13.7
##  4 cerebrovascular disease (including stroke) 2006               8.9
##  5 accidents, poisoning & violence            2006               6.3
##  6 other heart diseases                       2006               4.3
##  7 urinary tract infection                    2006               2  
##  8 chronic obstructive lung disease           2006               3.3
##  9 nephritis, nephrotic syndrome & nephrosis  2006               1.7
## 10 diabetes mellitus                          2006               3.3
## 11 cancer                                     2011              30  
## 12 ischaemic heart disease                    2011              16.4
## 13 pneumonia                                  2011              16  
## 14 cerebrovascular disease (including stroke) 2011               9  
## 15 accidents, poisoning & violence            2011               5.5
## 16 other heart diseases                       2011               5  
## 17 urinary tract infection                    2011               2.5
## 18 chronic obstructive lung disease           2011               2.2
## 19 nephritis, nephrotic syndrome & nephrosis  2011               2  
## 20 diabetes mellitus                          2011               1.7
plot1 <- ggplot(data = df06_slope, aes(x = Year, y = Percentage_death, group = Cause)) +
  geom_line(aes(color = Cause, alpha = 1), size = 2) +
  geom_point(aes(color = Cause, alpha = 1), size = 4) +
        geom_text(data =df06_slope %>% filter(Year == 2011), aes(label = Cause), hjust = -.05, 
            fontface = "bold", 
            size = 4) +
        theme(legend.position = "none") +
        scale_y_continuous(breaks = seq(0, max(df06_slope$Percentage_death), by = 5)) +
  labs(
    title = "Percentage of total deaths in Singapore, 2006-2011",
    subtitle = "(Principal causes of deaths)",
    caption = "Data source: Ministry of Health"
  ) + ylab("Percentage of total deaths")

 plot1 

2012 - 2019

Cancer remains the leading cause of death in Singapore even beyond 2011, although the percentage of deaths caused by Cancer was lower in 2019 compared to 2012.Similar to the 2006-2011 trend, the percentage of deaths caused by pneumonia has risen. Ischaemic heart diseases contribute more to total deaths in 2019 than in 2012, a reverse of what we see between 2006 and 2011. Notably, cerebrovascular diseases contribute less to total deaths in 2019 than in 2012, an improvement from 2006-2011.The remaining causes of death remains negligible, constituting less than 5% of deaths in 2019.

df12_cleaned <- df12_tidy %>% gather(key = 'Year', value = 'Percentage_death', -Cause)
df12_cleaned
## # A tibble: 80 x 3
##    Cause                                                  Year  Percentage_death
##    <chr>                                                  <chr>            <dbl>
##  1 cancer                                                 2012              30.1
##  2 pneumonia                                              2012              16.8
##  3 ischaemic heart diseases                               2012              16.1
##  4 cerebrovascular diseases (including stroke)            2012               9.3
##  5 external causes of morbidity and mortality             2012               5.6
##  6 nephritis, nephrotic syndrome & nephrosis              2012               2.4
##  7 hypertensive diseases (including hypertensive heart d~ 2012               2.8
##  8 urinary tract infection                                2012               2.4
##  9 other heart diseases                                   2012               1.9
## 10 chronic obstructive lung diseases                      2012               2.1
## # ... with 70 more rows
df12_slope <- df12_cleaned %>% filter(Year == 2012 | Year == 2019)
df12_slope
## # A tibble: 20 x 3
##    Cause                                                  Year  Percentage_death
##    <chr>                                                  <chr>            <dbl>
##  1 cancer                                                 2012              30.1
##  2 pneumonia                                              2012              16.8
##  3 ischaemic heart diseases                               2012              16.1
##  4 cerebrovascular diseases (including stroke)            2012               9.3
##  5 external causes of morbidity and mortality             2012               5.6
##  6 nephritis, nephrotic syndrome & nephrosis              2012               2.4
##  7 hypertensive diseases (including hypertensive heart d~ 2012               2.8
##  8 urinary tract infection                                2012               2.4
##  9 other heart diseases                                   2012               1.9
## 10 chronic obstructive lung diseases                      2012               2.1
## 11 cancer                                                 2019              28.4
## 12 pneumonia                                              2019              20.7
## 13 ischaemic heart diseases                               2019              18.8
## 14 cerebrovascular diseases (including stroke)            2019               5.8
## 15 external causes of morbidity and mortality             2019               4  
## 16 nephritis, nephrotic syndrome & nephrosis              2019               3.1
## 17 hypertensive diseases (including hypertensive heart d~ 2019               2.6
## 18 urinary tract infection                                2019               2.3
## 19 other heart diseases                                   2019               2  
## 20 chronic obstructive lung diseases                      2019               1.4
plot2 <- ggplot(data = df12_slope, aes(x = Year, y = Percentage_death, group = Cause)) +
  geom_line(aes(color = Cause, alpha = 1), size = 2) +
  geom_point(aes(color = Cause, alpha = 1), size = 4) +
        geom_text(data =df12_slope %>% filter(Year == 2019), aes(label = Cause), hjust = -.05, 
            fontface = "bold", 
            size = 4) +
        theme(legend.position = "none") +
        scale_y_continuous(breaks = seq(0, max(df12_slope$Percentage_death), by = 5)) +
  labs(
    title = "Percentage of total deaths in Singapore, 2012-2019",
    subtitle = "(Principal causes of deaths)",
    caption = "Data source: Ministry of Health"
  ) + ylab("Percentage of total deaths")
 plot2   

Side-by-side comparison of the two slope graphs

Compared to 2006-2011, we see that the following causes of deaths are no longer the top 10 principal causes of death between 2012 and 2019:
1. accidents, poisoning & violence
2. diabetes mellitus

Recent causes that have been added to the top 10 principal causes of deaths not seen between 2006-2011 are:
1. External causes of morbidity and mortality (accidents, poisoning and violence could have been recorded under this category)
2. Hypertensive diseases (including hypertensive heart disease)

Notably, Cancer, pneumonia, ischaemic heart disease, cerebrovascular disease are the leading causes of death in Singapore.

library(gridExtra)
grid.arrange(plot1, plot2, ncol = 2)

Joining the two datasets

To see the top principal causes of death across 2006-2019, an inner join is performed to deduce which diseases are in the top 10 principal causes of death in both datsets (2006-2011 and 2012-2019).

#check spelling of Cause in both datasets - we want to make sure they are the same.
df06$Cause
##  [1] "cancer"                                    
##  [2] "ischaemic heart disease"                   
##  [3] "pneumonia"                                 
##  [4] "cerebrovascular disease (including stroke)"
##  [5] "accidents, poisoning & violence"           
##  [6] "other heart diseases"                      
##  [7] "urinary tract infection"                   
##  [8] "chronic obstructive lung disease"          
##  [9] "nephritis, nephrotic syndrome & nephrosis" 
## [10] "diabetes mellitus"
df12_tidy$Cause #in df12, disease is spelled with an 's' i.e. diseases, unlike df06 (spelled as disease). We will correct this to match df06.
##  [1] "cancer"                                                      
##  [2] "pneumonia"                                                   
##  [3] "ischaemic heart diseases"                                    
##  [4] "cerebrovascular diseases (including stroke)"                 
##  [5] "external causes of morbidity and mortality"                  
##  [6] "nephritis, nephrotic syndrome & nephrosis"                   
##  [7] "hypertensive diseases (including hypertensive heart disease)"
##  [8] "urinary tract infection"                                     
##  [9] "other heart diseases"                                        
## [10] "chronic obstructive lung diseases"
class(df12_tidy$Cause)
## [1] "character"
df12_tidy$Cause[ grep("cerebrovascular",df12_tidy$Cause, ignore.case = TRUE)] <- "cerebrovascular disease (including stroke)"
df12_tidy$Cause[ grep("ischaemic",df12_tidy$Cause, ignore.case = TRUE)] <- "ischaemic heart disease"
df12_tidy$Cause[ grep("lung",df12_tidy$Cause, ignore.case = TRUE)] <- "chronic obstructive lung disease"

#Perform inner join on 2006-2011 and 2012-2019 dataset

df_full <- inner_join(df06,df12_tidy, by = "Cause")
df_full
## # A tibble: 8 x 15
##   Cause `2006` `2007` `2008` `2009` `2010` `2011` `2012` `2013` `2014` `2015`
##   <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 canc~   28.5   27.7   29.3   29.3   28.5   30     30.1   30.5   29.4   29.7
## 2 isch~   18.5   19.8   20.1   19.2   18.7   16.4   16.1   15.5   16     16.7
## 3 pneu~   13.7   13.9   13.9   15.3   15.7   16     16.8   18.5   19     19.4
## 4 cere~    8.9    8.7    8.3    8      8.4    9      9.3    8.9    8.4    6.8
## 5 othe~    4.3    4.3    4      4.4    4.8    5      1.9    2      1.9    2.2
## 6 urin~    2      2.2    2.1    2.5    2.5    2.5    2.4    2.6    2.6    2.2
## 7 chro~    3.3    2.6    2.5    2.4    2.5    2.2    2.1    1.6    1.8    1.8
## 8 neph~    1.7    2      2.1    2.3    2.2    2      2.4    2.4    2      2.3
## # ... with 4 more variables: `2016` <dbl>, `2017` <dbl>, `2018` <dbl>,
## #   `2019` <dbl>
df_full_tidy <- df_full %>% gather(key = 'Year', value = 'Percentage_deaths', -Cause)
df_full_tidy
## # A tibble: 112 x 3
##    Cause                                      Year  Percentage_deaths
##    <chr>                                      <chr>             <dbl>
##  1 cancer                                     2006               28.5
##  2 ischaemic heart disease                    2006               18.5
##  3 pneumonia                                  2006               13.7
##  4 cerebrovascular disease (including stroke) 2006                8.9
##  5 other heart diseases                       2006                4.3
##  6 urinary tract infection                    2006                2  
##  7 chronic obstructive lung disease           2006                3.3
##  8 nephritis, nephrotic syndrome & nephrosis  2006                1.7
##  9 cancer                                     2007               27.7
## 10 ischaemic heart disease                    2007               19.8
## # ... with 102 more rows

Top 5 principal causes of death, 2006-2019

A time series graph here shows the percentage of total deaths by each cause of death.Cancer, as we have already seen, is the leading cause of death in Singapore. However, there seems to have been a slight downward trend in the percentage of deaths caused by cancer in recent years. Pneumonia and ischaemic heart diseases, on the other hand, have been contributing to rising numbers of deaths.

library(RColorBrewer)
df_full_tidy %>% filter(Year ==2019) %>% arrange(desc(Percentage_deaths)) #we will focus on the top 5 principal causes of death in 2019 here
## # A tibble: 8 x 3
##   Cause                                      Year  Percentage_deaths
##   <chr>                                      <chr>             <dbl>
## 1 cancer                                     2019               28.4
## 2 pneumonia                                  2019               20.7
## 3 ischaemic heart disease                    2019               18.8
## 4 cerebrovascular disease (including stroke) 2019                5.8
## 5 nephritis, nephrotic syndrome & nephrosis  2019                3.1
## 6 urinary tract infection                    2019                2.3
## 7 other heart diseases                       2019                2  
## 8 chronic obstructive lung disease           2019                1.4
df_full_final <- df_full_tidy %>% filter(Cause != "urinary tract infection" & Cause != "other heart diseases" & Cause != "chronic obstructive lung disease")
unique(df_full_final$Cause) #check that only 5 causes are here
## [1] "cancer"                                    
## [2] "ischaemic heart disease"                   
## [3] "pneumonia"                                 
## [4] "cerebrovascular disease (including stroke)"
## [5] "nephritis, nephrotic syndrome & nephrosis"
df_full_final %>% ggplot(aes(x = Year, y = Percentage_deaths, group = Cause)) + geom_line(aes(color = Cause)) + geom_point(aes(color = Cause)) +
        scale_color_brewer(palette = "Dark2") + ylab("Percentage of total deaths")

Running a linear regression model over 2006-2019

A linear model (blue dashed line) with 95% confidence interval (depicted by the grey areas) is fitted into the data points to show the overall trend.

df_full_final %>% ggplot(aes(x = Year, y = Percentage_deaths, group = Cause)) + geom_line(aes(color = Cause)) + geom_point(aes(color = Cause)) +
        scale_color_brewer(palette = "Dark2") + ylab("Percentage of total deaths") + geom_smooth(method = "lm", linetype = 2, level = 0.95)