Introduction

This project explores global trends in causes of death from 1990 to 2019, using a dataset from Kaggle (https://www.kaggle.com/datasets/iamsouravbanerjee/cause-of-deaths-around-the-world).

The main aim is to carry out statistical and visual analysis in R to answer the following questions.

1) How Have the Top Causes of Death Changed Globally from 1990 to 2019?

2) How Have Disease Burdens Changed Globally from 1990 to 2019?

3) Which Countries Have Seen the Largest Increase or Decrease in Mortality from 1990 to 2019?

4) Are Deaths from Communicable Diseases Increasing or Decreasing Compared to Non-Communicable Ones?

1) How Have the Top Causes of Death Changed Globally from 1990 to 2019?

To understand which diseases have caused the most deaths worldwide over time, I first checked the dataset for missing and negative values. I then reshaped the data from wide to long format so that deaths could be grouped by cause and year. Finally, I identified the five causes with the most deaths over the whole period, calculated their total global deaths for each year, and plotted them from 1990 to 2019 as a line plot with ggplot2.

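# Packages used throughout; `deaths` is assumed to already hold the Kaggle CSV as a data frame
library(dplyr)
library(tidyr)
library(ggplot2)

# Checking NA values
sum(is.na(deaths))
## [1] 0
# Checking negative values (numeric columns only)
any(deaths[sapply(deaths, is.numeric)] < 0)
## [1] FALSE
# Checking number of unique countries and years
length(unique(deaths$`Country/Territory`))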
## [1] 204
length(unique(deaths$Year))
## [1] 30
# Reshaping data from wide to long form for grouping by cause and year.
# The first three columns are kept as identifiers (assumed here to be country, code, and year).
long_deaths <- deaths %>%
  pivot_longer(
    cols = -(1:3),
    names_to = "Cause",
    values_to = "Deaths"
  )

#View Data
head(long_deaths,5)
#Finding Top 5 Causes of Death
top_5_cause<-long_deaths %>%
  group_by(Cause) %>%
  summarise(total_deaths = sum(Deaths, na.rm = TRUE)) %>%
  arrange(desc(total_deaths)) %>%
  slice(1:5)

#View
top_5_cause
# Calculate total deaths per year from 1990 to 2019 for the top 5 causes
top_causes <- long_deaths %>%
  filter(Cause %in% top_5_cause$Cause) %>%
  group_by(Year, Cause) %>%
  summarise(Total_Deaths = sum(Deaths, na.rm = TRUE), .groups = "drop")
# Plot
ggplot(top_causes, aes(x = Year, y = Total_Deaths, color = Cause)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Top 5 Causes of Death from 1990 to 2019",
       x = "Year",
       y = "Total Deaths",
       color = "Cause") +
  theme_minimal()
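
The reshape-and-rank steps above can be sketched on a small made-up table. The numbers, country labels, and the three cause columns below are invented for illustration only, and the column layout (country, code, year, then one column per cause) is assumed to mirror the Kaggle file.

```r
library(dplyr)
library(tidyr)

# Toy wide table: invented numbers, layout assumed to mirror the real data
toy_deaths <- tibble(
  `Country/Territory` = c("A", "A", "B", "B"),
  Code = c("AAA", "AAA", "BBB", "BBB"),
  Year = c(1990, 2019, 1990, 2019),
  Malaria = c(50, 20, 30, 10),
  Neoplasms = c(40, 90, 60, 120),
  Drowning = c(5, 4, 6, 3)
)

# Wide to long: one row per country, year, and cause
toy_long <- toy_deaths %>%
  pivot_longer(cols = -(1:3), names_to = "Cause", values_to = "Deaths")

# Rank causes by deaths summed over all countries and years, keep the top 2
toy_top <- toy_long %>%
  group_by(Cause) %>%
  summarise(total_deaths = sum(Deaths), .groups = "drop") %>%
  slice_max(total_deaths, n = 2)

# Yearly global totals for the selected causes only
toy_trend <- toy_long %>%
  filter(Cause %in% toy_top$Cause) %>%
  group_by(Year, Cause) %>%
  summarise(Total_Deaths = sum(Deaths), .groups = "drop")

toy_trend
```

Here slice_max() stands in for arrange(desc()) followed by slice(); the two select the same rows as long as there are no ties at the cutoff.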

Conclusion

Cardiovascular diseases consistently remained the top global cause of death, followed by other non-communicable diseases, showing increasing dominance over time.

2) How Have Disease Burdens Changed Globally from 1990 to 2019?

To identify how the global impact of specific diseases has changed over time, I compared total global deaths by cause in 1990 and 2019. I first filtered the dataset for each of these two years and calculated the total number of deaths for each cause. I then joined the two summaries by cause and calculated the change in deaths for each disease between 1990 and 2019. The results are visualized using a bar chart.

#Filter Data for 1990
deaths_1990 <- long_deaths %>%
  filter(Year == 1990) %>%
  group_by(Cause) %>%
  summarise(Deaths_1990 = sum(Deaths, na.rm = TRUE))

#Filter Data for 2019
deaths_2019 <- long_deaths %>%
  filter(Year == 2019) %>%
  group_by(Cause) %>%
  summarise(Deaths_2019 = sum(Deaths, na.rm = TRUE))

#Join the Datasets
death_change <- left_join(deaths_1990, deaths_2019, by = "Cause")

#Calculate the Change
death_change <- death_change %>%
  mutate(Change = Deaths_2019 - Deaths_1990)

# Largest decrease in deaths
head(death_change %>% arrange(Change), 5)
# Largest increase in deaths
head(death_change %>% arrange(desc(Change)), 5)
#Plot
ggplot(death_change, aes(x = Cause, y = Change, fill = Change > 0)) +
  geom_col() +
  coord_flip() +
  labs(title = "Change in Global Deaths by Cause (2019 vs 1990)",
       x = "Cause of Death",
       y = "Change in Deaths") +
  theme_minimal()
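
As an alternative to filtering each year separately and joining, the same comparison can be sketched in a single pipeline with pivot_wider(). This is only an illustrative variant on invented numbers, not the method used above; the object names toy_long and toy_change are hypothetical.

```r
library(dplyr)
library(tidyr)

# Invented long-format data: one row per year and cause
toy_long <- tibble(
  Year = c(1990, 1990, 1990, 2019, 2019, 2019),
  Cause = c("Malaria", "Neoplasms", "Drowning", "Malaria", "Neoplasms", "Drowning"),
  Deaths = c(80, 100, 11, 30, 210, 7)
)

toy_change <- toy_long %>%
  filter(Year %in% c(1990, 2019)) %>%
  group_by(Cause, Year) %>%
  summarise(Deaths = sum(Deaths), .groups = "drop") %>%
  # Spread the two years into columns Deaths_1990 and Deaths_2019
  pivot_wider(names_from = Year, values_from = Deaths, names_prefix = "Deaths_") %>%
  mutate(Change = Deaths_2019 - Deaths_1990) %>%
  # Largest decreases first, largest increases last
  arrange(Change)

toy_change
```

A cause that appears in only one of the two years would get NA in the other column here, which makes gaps visible instead of silently dropping rows.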

Conclusion

While some causes, such as malaria and HIV/AIDS, saw a major decrease in deaths, others, such as cardiovascular diseases, neoplasms, and diabetes, showed significant increases.

3) Which Countries Have Seen the Largest Increase or Decrease in Mortality from 1990 to 2019?

To examine how mortality has shifted across countries over time, I calculated the total number of deaths for each country in both 1990 and 2019. This analysis focuses on the five countries with the largest increases and the five with the largest decreases in deaths over this 30-year period. The results are visualized using a bar chart.

Note: This analysis measures the absolute change in deaths, which is influenced by differences in population size and other factors. My focus, however, was on the trend in mortality within each country over time.

# Filter Data for 1990
country_1990 <- long_deaths %>%
  filter(Year == 1990) %>%
  group_by(`Country/Territory`) %>%
  summarise(Deaths_1990 = sum(Deaths, na.rm = TRUE))

# Filter Data for 2019
country_2019 <- long_deaths %>%
  filter(Year == 2019) %>%
  group_by(`Country/Territory`) %>%
  summarise(Deaths_2019 = sum(Deaths, na.rm = TRUE))

# Join datasets
Country_change <- left_join(country_1990, country_2019, by = "Country/Territory")

# Calculate change in deaths within each country itself
Country_change <- Country_change %>%
  mutate(Change = Deaths_2019 - Deaths_1990)

# Top 5 increases 
top_increase <- Country_change %>% arrange(desc(Change)) %>% slice(1:5)

# Top 5 decreases
top_decrease <- Country_change %>% arrange(Change) %>% slice(1:5)

# Combine for Plot
top_changes <- bind_rows(top_increase, top_decrease)

# Plot
ggplot(top_changes, aes(x = reorder(`Country/Territory`, Change), y = Change, fill = Change > 0)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 5 Countries with Increase and Decrease in Deaths (2019 vs 1990)",
       x = "Country",
       y = "Change in Deaths") +
  theme_minimal()
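
Because absolute change tends to favour populous countries, a percent change against each country's own 1990 baseline can serve as a rough complement. The sketch below uses invented totals and hypothetical country labels. It does not adjust for population growth, which would require population figures that this analysis does not use.

```r
library(dplyr)

# Invented country totals; real values would come from Country_change
toy_country <- tibble(
  `Country/Territory` = c("Large", "Medium", "Small"),
  Deaths_1990 = c(1000000, 100000, 1000),
  Deaths_2019 = c(1100000, 90000, 2000)
)

toy_country <- toy_country %>%
  mutate(
    Change = Deaths_2019 - Deaths_1990,
    Pct_Change = 100 * Change / Deaths_1990
  )

# Ranking by absolute change puts the most populous country first
by_absolute <- toy_country %>% arrange(desc(Change))

# Ranking by percent change compares each country with its own baseline
by_relative <- toy_country %>% arrange(desc(Pct_Change))

by_relative
```

In this toy case the two rankings disagree: the large country leads on absolute change, while the small country leads on percent change.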

Conclusion

Countries like India and China experienced the largest increases in total deaths, while others showed significant reductions over the 30-year period.

Note: The observed decline or increase in deaths for countries like Bangladesh, Uganda, and Rwanda may not reflect a true change in overall mortality. Instead, it could be due to limitations in the dataset, which may not capture all possible causes of death uniformly across countries and years.

4) Are Deaths from Communicable Diseases Increasing or Decreasing Compared to Non-Communicable Ones?

To explore the global shifts in disease burdens, I categorized causes of death into three groups: communicable diseases (grouped here with maternal, neonatal, and nutritional conditions), non-communicable diseases, and injuries. Comparing the total deaths in each category from 1990 to 2019 shows the trend in the global burden of communicable diseases versus non-communicable diseases and injuries. The analysis is visualized with a line plot.

# Add a new, empty Category column
long_deaths$Category <- NA_character_

# Classify causes into 3 categories
# "Communicable" here also covers the maternal, neonatal, and nutritional causes listed below
long_deaths$Category[long_deaths$Cause %in% c(
  "HIV/AIDS", "Tuberculosis", "Malaria", "Meningitis",
  "Lower Respiratory Infections", "Neonatal Disorders",
  "Maternal Disorders", "Nutritional Deficiencies",
  "Protein-Energy Malnutrition", "Acute Hepatitis"
)] <- "Communicable"

long_deaths$Category[long_deaths$Cause %in% c(
  "Cardiovascular Diseases", "Neoplasms", "Diabetes Mellitus",
  "Alzheimer's Disease and Other Dementias", "Parkinson's Disease",
  "Chronic Respiratory Diseases", "Alcohol Use Disorders",
  "Drug Use Disorders", "Chronic Kidney Disease",
  "Cirrhosis and Other Chronic Liver Diseases", "Digestive Diseases"
)] <- "Non-Communicable"

long_deaths$Category[long_deaths$Cause %in% c(
  "Road Injuries", "Self-harm", "Falls", "Drowning",
  "Fire, Heat, and Hot Substances", "Poisonings",
  "Environmental Heat and Cold Exposure", "Interpersonal Violence"
)] <- "Injuries"

# Calculate total deaths per year for each category
# (causes left unclassified have Category NA and are excluded)
deaths_summary <- long_deaths %>%
  filter(!is.na(Category)) %>%
  group_by(Year, Category) %>%
  summarise(Total_Deaths = sum(Deaths, na.rm = TRUE), .groups = "drop")
#Plot
ggplot(deaths_summary, aes(x = Year, y = Total_Deaths, color = Category)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Deaths by Disease Category (1990–2019)",
       x = "Year",
       y = "Total Deaths",
       color = "Category") +
  theme_minimal()
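
Since any cause missing from the three lists is dropped by filter(!is.na(Category)), it can help to list the unclassified causes before summarising. The sketch below shows one way to do this with case_when(), using shortened, hypothetical lookup vectors and invented numbers. It is an illustrative variant, not the classification code used above.

```r
library(dplyr)

# Hypothetical, shortened lookup vectors; the full lists are those given above
communicable <- c("Malaria", "Tuberculosis")
non_communicable <- c("Neoplasms", "Diabetes Mellitus")
injuries <- c("Drowning", "Road Injuries")

# Invented data, including one cause that appears in no list
toy_long <- tibble(
  Cause = c("Malaria", "Neoplasms", "Drowning", "Unlisted Cause"),
  Deaths = c(30, 210, 7, 12)
)

toy_long <- toy_long %>%
  mutate(Category = case_when(
    Cause %in% communicable ~ "Communicable",
    Cause %in% non_communicable ~ "Non-Communicable",
    Cause %in% injuries ~ "Injuries",
    TRUE ~ NA_character_
  ))

# Causes that matched no list and would be excluded from the summary
unclassified <- toy_long %>%
  filter(is.na(Category)) %>%
  distinct(Cause) %>%
  pull(Cause)

unclassified
```

An empty result from the last step would indicate that every cause in the data has been assigned to a category.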

Conclusion

Communicable disease deaths have declined, while non-communicable diseases have risen steadily, reflecting a global shift in health burdens.
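
One way to make the comparison between categories more explicit is to express each category as a share of the classified deaths in a given year. The sketch below uses invented totals in the same shape as deaths_summary; the name toy_share is hypothetical.

```r
library(dplyr)

# Invented yearly totals per category, shaped like deaths_summary
toy_summary <- tibble(
  Year = c(1990, 1990, 1990, 2019, 2019, 2019),
  Category = rep(c("Communicable", "Non-Communicable", "Injuries"), 2),
  Total_Deaths = c(400, 500, 100, 300, 1500, 200)
)

# Share of each category within its year, in percent
toy_share <- toy_summary %>%
  group_by(Year) %>%
  mutate(Share = 100 * Total_Deaths / sum(Total_Deaths)) %>%
  ungroup()

toy_share
```

Shares sum to 100 within each year, so a falling share for one category reflects its weight relative to the others, not only its raw count.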

Project Summary

This project explored how the global burden of disease evolved from 1990 to 2019 by analyzing trends in causes of death across time, geography, and disease categories. The findings reflect a clear epidemiological transition, with a notable decline in communicable diseases and a rise in non-communicable, chronic conditions, indicating shifting healthcare priorities worldwide.

Note: This analysis is based solely on the causes of death included in the dataset and may not reflect the full spectrum of mortality causes.