Most frequent causes in the world in 2016
The first graph illustrates the proportion of all deaths for all 32
causes.
# Downloading necessary libraries
library(dplyr)
library(shiny)
library(ggplot2)
library(tidyverse)
library(maps)
library(animation)
library(sf)
library(magrittr)
library(treemapify)
# Defining custom theme
my_theme <- function() {
theme(
panel.border = element_rect(colour = "gray", fill = NA),
panel.background = element_rect(fill = "white"),
panel.grid.major.x = element_line(colour = "gray", linetype = 3, size = 0.5),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = "gray", linetype = 3, size = 0.5),
panel.grid.minor.y = element_blank(),
axis.text = element_text(colour = "black", face = "italic"),
axis.title = element_text(colour = "black"),
axis.ticks = element_line(colour = "black"),
)
}
# Downloading the data and changing names for the map
mortality <- readxl::read_excel("global_mortality.xlsx")%>%
mutate(country = case_when(
country == 'Antigua and Barbuda' ~ 'Antigua',
country == 'United States' | country == 'United States and Virgin Islands' ~ 'USA',
country == 'United Kingdom' ~ 'UK',
country == 'England' ~ 'UK',
country == 'Scotland' ~ 'UK',
country == 'Trinidad and Tobago' ~ 'Trinidad',
country == 'Congo' ~ 'Republic of Congo',
country == 'Democratic Republic of Congo' ~ 'Democratic Republic of the Congo',
TRUE ~ country
))
# Deleting (%) from the causes of death
names(mortality) <- names(mortality) %>%
gsub("\\s\\(%\\)","", .)
# Changing the format to long
mortality.long <- mortality %>%
gather(cause, percent, 4:35) %>%
mutate(cause = as.factor(cause), country = as.factor(country), country_code = as.factor(country_code))
# Creating the dataset for the world
mortality_world <- mortality.long %>%
filter(country_code %in% ("OWID_WRL"), year %in% c(1990, 2016)) %>%
spread(year, percent) %>%
mutate(percentchange = `2016` - `1990`)
# Lollipop graph for the most frequent causes in 2016
ggplot (mortality_world, aes(x = reorder(cause, `2016`), y =`2016`)) +
geom_segment(aes(x = reorder(cause, `2016`), xend = reorder(cause, `2016`), y = 0, yend = `2016`),
color = "gray", lwd = 1) +
geom_point(size = 3, pch = 21, bg = "#990000", color="black") +
coord_flip() +
geom_text(aes(label = paste0(round(`2016`,2), "%")), color = "black", size = 2.7, hjust = -0.3) +
scale_y_continuous(limits = c(0, 40), breaks = c(0, 10, 20, 30, 40),
label = c("0%", "10%", "20%",
"30%", "40%")) +
labs(x = "Cause of Death", y = "Percentage of all deaths",
title = "Percentages by Cause of Death in 2016 (World)") +
theme(plot.title = element_text(hjust = 0, vjust = 1)) +
my_theme()

As we can see from the graph, ‘Cardiovascular diseases’ and ‘Cancers’
accounted for around 50% of all deaths in 2016. Other reasons that had
relatively high percentages (> 4%) are: ‘Respiratory diseases’,
‘Diabetes’, ‘Dementia’, and ‘Lower respiratory infections’.
Comparison of causes’ world shares in 1990 and 2016
The next plot presents the difference in the shares for 1990 and
2016.
# New columns for the changes in the percentages
mortality_world$percentchange <- round(mortality_world$percentchange, digits = 2)
mortality_world$percent_type <- ifelse(mortality_world$percentchange < 0, "increased", "decreased")
mortality_world <- mortality_world[order(mortality_world$percentchange),]
mortality_world$cause <- factor(mortality_world$cause, levels = mortality_world$cause)
mortality_world$r_2016 <- round(mortality_world$`2016`)
# Changes in the share 1990-2016
ggplot(mortality_world,aes(x = factor(cause), y = percentchange, label = percentchange)) +
geom_bar(stat = "identity", aes(fill = percent_type), width = .5) +
geom_text(color = "black", size = 2.5, nudge_y = 0.2) +
coord_flip() +
scale_fill_manual(name="Percent Change",
labels = c("Increased", "Decreased"),
values = c("increased"="#ff9966", "decreased"="#6699ff")) +
labs(y = NULL, x = NULL, title = "Difference in the Shares for 1990 and 2016 (World)") +
my_theme()
Turned out that ‘Cardiovascular diseases’ and ‘Cancers’ not only had the
highest shares in 2016 but also had the highest rise in compassion with
1990. ‘Diabetes’ and ‘Dementia’, which had fairly high values in 2016,
have grown quite significantly as well (by 2.4% and 2.29% respectively).
Additionally, we can notice that the ‘HIV/AIDS’ is also quite high in
this list. Compared to 1990, it grew by 1.27% (the highest rise relative
to its value in 2016 which was equal to 1.89%).
Conversely, ‘Neonatal deaths’, ‘Diarrheal diseases’, ‘Lower
respiratory diseases’, and ‘Tuberculosis’ had the highest drop in 2016
shares in comparison with 1990.
The changes in the world shares for the top 10 causes in 2016
through years
The graph below shows how the share for the 10 most frequent causes
changed through the years.
# Change through years for top 10 diseases
mortality.long %>% filter(country_code=='OWID_WRL') %>%
filter(cause %in% c('Cardiovascular diseases', 'Cancers', 'Respiratory diseases',
'Diabetes', 'Dementia', 'Lower respiratory infections',
'Neonatal deaths', 'Diarrheal diseases', 'Road accidents',
'Liver diseases')) %>%
ggplot(aes(x=year, y=percent, color=cause)) +
geom_line(size=1.2) +
geom_point(size=3) +
theme(legend.position="top") +
guides(color=guide_legend(nrow=2, byrow=TRUE)) +
labs(x = "Year", y = "Percentage of all deaths",
title = "Top 10 causes of death through years (World)") +
scale_x_continuous(breaks = seq(1990, 2016, by = 5)) +
scale_y_continuous(breaks = c(10, 20, 30),
label = c("10%", "20%", "30%")) +
scale_color_brewer(palette="Set1") +
my_theme()

We can observe that ‘Cardiovascular diseases’ and ‘Cancers’ steadily
rise with each year. As for the other 8 causes, ‘Diabetes’ and
‘Dementia’ are the only ones for which the shares increase consistently.
In 2016, they were among the five most frequent causes of death,
although in 1990 they were not. It is noticeable how a constant positive
trend has lifted ‘Diabetes’ and ‘Dementia’ up the list.
Changes in shares of ‘Cardiovascular diseases’ and ‘Cancers’ for
different regions through years
From the previous findings, it is evident that ‘Cardiovascular
diseases’ and ‘Cancers’ are the most frequent causes of death on earth
with ever-increasing proportions. Therefore, I decided to look at the
distribution of their shares in different world regions.
# Creating region variable
mortality.long$region <- case_when(mortality.long$country %in% c("Central Asia","East Asia","Southeast Asia","South Asia") ~ "Asia",
mortality.long$country == "Oceania" ~ "Oceania",
mortality.long$country == "Latin America and Caribbean" ~ "South America",
mortality.long$country == "North America" ~ "North America",
mortality.long$country %in% c("Central Europe", "Eastern Europe","Western Europe") ~ "Europe",
mortality.long$country %in% c("North Africa and Middle East","Sub-Saharan Africa") ~ "Africa",
TRUE ~ "Other")
# Boxplot for Cardiovascular diseases
p1 <- mortality.long %>%
filter(cause %in% c("Cardiovascular diseases")) %>%
filter(region != "Other") %>%
group_by(region, year) %>%
summarize(percent = mean(percent, na.rm=TRUE)) %>%
ggplot(aes(x=reorder(region, -percent), y=percent, fill=region)) +
geom_boxplot() +
scale_fill_brewer(palette = "RdBu") +
labs(y = "Percentage of all deaths",
title = "Cardiovascular diseases different world regions") +
scale_y_continuous(breaks = c(10, 20, 30, 40 ,50),
label = c("10%", "20%", "30%", "40%", "50%")) +
theme(axis.text.x = element_text(angle = 35, hjust = 1),
axis.title.x= element_blank(),
legend.position = "none",
text = element_text(size = 20)) +
my_theme()
# Boxplot for Cancers
p2 <- mortality.long %>%
filter(cause %in% c("Cancers")) %>%
filter(region != "Other") %>%
group_by(region, year) %>%
summarize(percent = mean(percent, na.rm=TRUE)) %>%
ggplot(aes(x=reorder(region, -percent), y=percent, fill=region)) +
geom_boxplot() +
scale_fill_brewer(palette = "RdBu") +
labs(y = "Percentage of all deaths",
title = "Cancers in different world regions") +
scale_y_continuous(breaks = c(10, 20, 30, 40 ,50),
label = c("10%", "20%", "30%", "40%", "50%")) +
theme(axis.text.x = element_text(angle = 35, hjust = 1),
axis.title.x= element_blank(),
legend.position = "none",
text = element_text(size = 20)) +
my_theme()
p1
p2


As we can see from the graph, the highest shares of ‘Cardiovascular
diseases’ through the years are in Europe with a mean of 50%. The
following regions are North America and Asia with an average of about
35%.
As for ‘Cancers’ the highest shares are for North America and Europe
with the mean values above 20%.
Noticeably, the shares for these two causes are the lowest in Oceania
and Africa.
Lastly, I checked the changes in shares for ‘Cardiovascular diseases’
on a country level using the animated map below.
# Creating dataset for the map
world_map <- st_as_sf(maps::map('world', plot = FALSE, fill = TRUE))
map_years <- world_map %>%
full_join(mortality, by = c('ID' = 'country'))
# Function for the animated map
gif_map <- function(data, cause_of_death) {
fill_lim <- range(data %$% get(cause_of_death), na.rm = T)
data_list <- split(data, data$year)
out <- lapply(data_list, function(data, cause_of_death, fill_lim){
p <- ggplot(data, aes(fill = get(cause_of_death))) +
geom_sf(size = .2, color = 'black') +
scale_fill_gradient(low = "#ffefef", high = "#720000", space = "Lab",
na.value = "#c6c6c6", guide = "colourbar",
limits = c(fill_lim[1], fill_lim[2]),
name = cause_of_death) +
ggthemes::theme_map() +
labs(title = paste(cause_of_death,"in", unique(data$year))) +
theme(panel.grid.major = element_line(colour = 'gray', size = .2, linetype = 'dashed'),
legend.position = 'bottom')
print(p)
}, cause_of_death, fill_lim)
}
img <- magick::image_graph(width = 800, height = 350, res = 96)
# gif for Cardiovascular diseases
gif_map(map_years, "Cardiovascular diseases")
dev.off()
animation <- magick::image_animate(img, fps = 1)
# Savinf gif
magick::image_write(animation, path = 'cardio_map.gif')

The highest values are observed in Russia and Eastern Europe.
Moreover, the share did not change much through the years and stayed at
a high level for the whole 1990-2016 period. Additionally, it is
noticeable that the share of ‘Cardiovascular diseases’ dropped for North
America, south part of Latin America and Australia.
Conclusion
The highest shares in 2016 were for ‘Cardiovascular diseases’ and
‘Cancers’ (32.26% and 16.32% respectively). Moreover, these causes had
the constant and the biggest rise throughout the whole available period.
The highest values of shares for these two causes are observed in Europe
and North America.
The project has potential limitations due to the availability of data
only until 2016. The pandemic in 2020 could substantially change the
results. Therefore, it can be further improved by the addition of
updated statistics.