Introduction

Air quality is a pressing environmental issue that has become increasingly important as countries industrialize and adopt technologies that emit more air pollution. Poor air quality can cause a number of health problems and even lead to death. Outdoor particulate matter is the leading cause of death due to air pollution. The main particulate pollutant is PM 2.5, which refers to particles with a diameter smaller than 2.5 micrometers.

Furthermore, there are large disparities between air quality in different countries, depending on how wealthy and developed the country is. In my project, I hope to address the problem of particulate matter air pollution and its effect on global health across countries with varying levels of wealth.

Materials and Methods

Load the necessary packages. For this project, I will be using tidyverse (dplyr and ggplot2) for data wrangling and plotting, and raster, maps, sf, sp, and spData to map spatial data.

library(knitr)
library(tidyverse)
library(raster)
library(maps)
library(sf)
library(sp)
library(spData)
library(readr)
knitr::opts_chunk$set(cache=TRUE)

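First, I want to produce a global map showing average air pollutant (PM 2.5) concentrations in different countries. The code below loads the world polygons from spData, filters the PM 2.5 data to 2016, renames countries to match the names used in the world dataset, and merges the two before plotting.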

data(world)
world=filter(world, continent != "Antarctica")
pm25data <- read_csv("~/Downloads/PM25-air-pollution.csv")
annual_pm25 <- filter(pm25data, Year==2016)
annual_pm25[51, 1] = "Democratic Republic of the Congo"
annual_pm25[182, 1] = "Russian Federation"
annual_pm25[46, 1] = "Côte d'Ivoire"
annual_pm25[44, 1] = "Republic of the Congo"
annual_pm25[161, 1] = "Dem. Rep. Korea"
annual_pm25[202, 1] = "Republic of Korea"
annual_pm25[115, 1] = "Lao PDR"
colnames(annual_pm25)[1] <- "name_long"
world_pm25 <- merge(world, annual_pm25, by="name_long")
colnames(world_pm25)[13] <- "PM2.5"
world_pm25 <- st_as_sf(world_pm25, sf_column_name = "geom") # assign the result so the conversion is kept
library(viridis) # load nice color themes
p1 = ggplot(world)+
  geom_sf()+
  geom_sf(world_pm25, mapping = aes(fill=PM2.5))+
  scale_fill_gradientn(colors = rev(inferno(1000)), name="PM 2.5 Concentration (µg/m3)")+
  theme_bw()+
  theme(legend.position = 'bottom')+
  labs(title = "Global Map of PM 2.5 Concentrations in 2016", caption="Source: World Bank - WDI")
p1
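Renaming countries by row position, as above, silently breaks if the CSV gains or loses a row. A minimal sketch of a name-based alternative follows, using a small made-up table rather than the real file. The source-side spellings in `name_fixes` (for example "Russia") are assumptions for illustration and should be checked against the actual `Entity` column.

```r
library(dplyr)

# Toy stand-ins for spData's `world` and the PM 2.5 table (values are made up).
world_names <- tibble(name_long = c("Russian Federation", "Lao PDR", "France"))
pm25_toy <- tibble(
  Entity = c("Russia", "Laos", "France"),
  PM2.5  = c(16, 25, 12)
)

# Hypothetical lookup: spelling in the data -> spelling used by `world`.
name_fixes <- c("Russia" = "Russian Federation", "Laos" = "Lao PDR")

# Names that have no match before harmonizing.
unmatched_before <- anti_join(pm25_toy, world_names, by = c("Entity" = "name_long"))

# Rename by value, not by row index; unlisted names are left unchanged.
pm25_fixed <- pm25_toy %>%
  mutate(name_long = recode(Entity, !!!name_fixes))

# After harmonizing, every row should find a partner in `world`.
unmatched_after <- anti_join(pm25_fixed, world_names, by = "name_long")
```

Running `anti_join()` before the merge also lists which countries an inner `merge()` would drop from the map.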

I want to find the ten countries with the highest PM 2.5 concentrations in each year of the dataset and display them in a bar graph faceted by year.

colnames(pm25data)[4] <- "PM2.5"
top_pm25 = pm25data %>% na.omit() %>% group_by(Year) %>% 
  arrange(desc(PM2.5)) %>% slice(1:10)

p2 = ggplot(data = top_pm25, aes(x = Entity, y = PM2.5))+
  geom_bar(stat="identity") +
  facet_wrap(~Year, nrow = 4, scales="free_y") +
  coord_flip() + theme_bw() + theme(axis.text = element_text(size = 4.5)) +
  labs(title = "Top 10 Countries With Highest PM 2.5 Concentrations", x = "Country (Top 10)", y = "PM 2.5 Concentration (µg/m3)", caption = "Source: World Bank - WDI")
p2
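The per-year ranking can also be written with `slice_max()`, which states the intent (top n within each group) directly. The sketch below uses made-up toy data with the same column names. It illustrates the idea and is not a replacement for the code above.

```r
library(dplyr)

# Toy data: three countries, two years (values are made up).
toy <- tibble(
  Entity = rep(c("A", "B", "C"), times = 2),
  Year   = rep(c(1990, 2016), each = 3),
  PM2.5  = c(10, 30, 20, 15, 5, 25)
)

# Keep the two highest-concentration countries within each year.
top2 <- toy %>%
  group_by(Year) %>%
  slice_max(PM2.5, n = 2, with_ties = FALSE) %>%
  ungroup()
```

Setting `with_ties = FALSE` guarantees exactly n rows per year even when two countries share a value.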

Next, I want to produce a global map displaying death rates due to particulate matter pollution for every country in the same year (2016). I followed the same steps as for the previous map, using data from the Global Burden of Disease Study (2017) by the Institute for Health Metrics and Evaluation at http://ghdx.healthdata.org/gbd-results-tool.

mortality <- read_csv("~/Downloads/outdoor-pollution-death-rate.csv")
mortality_2016 <- filter(mortality, Year==2016)
mortality_2016[56, 1] = "Democratic Republic of the Congo"
mortality_2016[168, 1] = "Russian Federation"
mortality_2016[51, 1] = "Côte d'Ivoire"
mortality_2016[49, 1] = "Republic of the Congo"
mortality_2016[150, 1] = "Dem. Rep. Korea"
mortality_2016[187, 1] = "Republic of Korea"
mortality_2016[111, 1] = "Lao PDR"
colnames(mortality_2016)[1] <- "name_long"
mortality_map <- merge(world, mortality_2016, by="name_long")
colnames(mortality_map)[13] <- "death"
mortality_map <- st_as_sf(mortality_map, sf_column_name = "geom") # assign the result so the conversion is kept
p3 = ggplot(world)+
  geom_sf()+
  geom_sf(mortality_map, mapping = aes(fill = death))+
  scale_fill_gradientn(colors = rev(inferno(10)), name="Deaths per 100,000 People")+
  theme_bw()+
  theme(legend.position = 'bottom')+
  labs(title = "Global Map of 2016 Death Rates From Air Pollution", caption="Source: GBD - IHME")
p3

Comparing the two maps, there seems to be an association between PM 2.5 concentration and death rate from particulate matter air pollution. I want to test this hypothesis by creating a scatterplot and linear regression of the two variables.

combined_data <- merge(filter(pm25data, Year==2016), filter(mortality, Year==2016), by="Entity")
colnames(combined_data)[c(4, 7)] <- c("PM2.5", "death")
popsize <- read_csv("~/Downloads/API_SP.POP.TOTL_DS2_en_csv_v2_821007/API_SP.POP.TOTL_DS2_en_csv_v2_821007.csv")
colnames(popsize)[1] <- "Entity"
final_data <- merge(combined_data, popsize, by = "Entity")
final_data <- final_data[-c(144, 148, 116, 171),]

p4 = ggplot(final_data, aes(x=PM2.5, y=death))+
  geom_point(aes(size=X5), color="red", alpha=0.6)+ # constant color belongs outside aes()
  geom_smooth(method="lm", se=FALSE, color="red")+ # fitted regression line; geom_abline() with no arguments draws y = x
  geom_text(data=subset(final_data, X5>100000000), aes(x=PM2.5, y=death, label=Entity), size=3)+ # label countries with over 100 million people
  labs(size="Population Size")+
  labs(title="Death Rate From Particulate Matter Air Pollution vs. PM2.5 Concentration", subtitle="2016", caption="Sources: GBD & World Bank")+
  xlab("average PM 2.5 exposure")+ylab("# deaths per 100,000 people")
p4
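A plot shows the direction of the relationship, but the strength of the association can also be put in numbers. The sketch below fits an ordinary least squares model with `lm()` and computes the Pearson correlation on a made-up toy data frame that borrows the column names `PM2.5` and `death`. The values are not from the real data.

```r
# Toy data standing in for final_data (values are made up).
toy <- data.frame(
  PM2.5 = c(5, 10, 20, 40, 80),
  death = c(8, 15, 22, 50, 85)
)

fit <- lm(death ~ PM2.5, data = toy)   # ordinary least squares fit
slope <- unname(coef(fit)["PM2.5"])    # change in death rate per 1 µg/m3
r <- cor(toy$PM2.5, toy$death)         # Pearson correlation coefficient
r_squared <- summary(fit)$r.squared    # share of variance explained
```

For a regression with a single predictor, `r_squared` equals `r^2`, so either number summarizes how tightly the points follow the line.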

I also want to test whether death rates due to air pollution are correlated with a country’s wealth, measured as GDP per capita. To do this, I will create a scatterplot with a smoothed (LOESS) trend line to look for a relationship between the two variables. From World Bank data, load GDP per capita (https://data.worldbank.org/indicator/NY.GDP.PCAP.KD) and merge it with the death rate data into one data frame.

gdp <- read_csv("~/Downloads/API_NY.GDP.PCAP.KD_DS2_en_csv_v2_820921/GDPpercapita.csv")
colnames(gdp)[1] <- "Entity"
gdp_death <- merge(filter(mortality, Year==2016), gdp, by="Entity")
gdp_death <- merge(gdp_death, popsize, by="Entity")
gdp_death <- gdp_death[-c(171),]
gdp_death <- na.omit(gdp_death)
colnames(gdp_death)[c(4, 8, 12)] <- c("death", "GDP", "population")

p5 = ggplot(data=gdp_death, aes(x=GDP, y=death))+
  geom_point(aes(size=population), color="red", alpha=0.6)+ # constant color belongs outside aes()
  geom_smooth(method="loess", se=FALSE, color="red")+ # smoothed trend line
  geom_text(data=subset(gdp_death, population>200000000), aes(x=GDP, y=death, label=Entity), size=3)+ # label countries with over 200 million people
  labs(size="Population Size")+
  labs(title="Death Rate From Particulate Matter Air Pollution vs. GDP per Capita", subtitle="2016", caption="Sources: GBD & World Bank")+
  xlab("GDP per capita")+ylab("# deaths per 100,000 people")
p5

Additionally, because air pollution varies between countries at different stages of economic development, I want to compare death rates due to particulate matter air pollution in countries with varying socio-demographic indices (SDI) and income levels, as well as target specific countries of interest. I will graph death rates (number of deaths per 100,000 people) from 1990 to 2017 in line graphs.

all_data <- read_csv("~/Downloads/IHME-GBD_2017_DATA-5600b99d-1/IHME-GBD_2017_DATA-5600b99d-1.csv")

sdi_groups <- c("High SDI", "Middle SDI", "Low SDI")
p6 = ggplot(subset(all_data, location %in% sdi_groups), aes(x=year, y=val, color=location))+
  geom_point()+
  geom_line()+
  ylab("# deaths per 100,000 people due to air pollution")+
  labs(title="Death Rate From Particulate Matter Air Pollution Over Time", subtitle="(Varying Socio-Demographic Indices)", caption="Source: GBD - IHME")
p6

income_groups <- c("World Bank High Income", "World Bank Upper Middle Income", "World Bank Lower Middle Income", "World Bank Low Income")
p7 = ggplot(subset(all_data, location %in% income_groups), aes(x=year, y=val, color=location))+
  geom_point()+
  geom_line()+
  ylab("# deaths per 100,000 people due to air pollution")+
  labs(title="Death Rate From Particulate Matter Air Pollution Over Time", subtitle="(Varying World Bank Income Levels)", caption="Source: GBD - IHME")
p7

places <- c("China", "India", "Sub-Saharan Africa", "United States", "Western Europe", "Global")
p8 = ggplot(subset(all_data, location %in% places), aes(x=year, y=val, color=location))+
  geom_point()+
  geom_line()+
  ylab("# deaths per 100,000 people due to air pollution")+
  labs(title="Death Rate From Particulate Matter Air Pollution Over Time", caption="Source: GBD - IHME")
p8
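To compare how much each group's death rate changed over the period, the line graphs can be complemented by a percent change between the first and last year for each location. This is a minimal sketch on made-up toy data that reuses the column names `location`, `year`, and `val` from `all_data`. It assumes one row per location and year.

```r
library(dplyr)

# Toy data with the same column names as all_data (values are made up).
toy <- tibble(
  location = rep(c("Group A", "Group B"), each = 3),
  year     = rep(c(1990, 2000, 2017), times = 2),
  val      = c(100, 80, 50, 40, 38, 36)
)

# Percent change in death rate between each location's first and last year.
change <- toy %>%
  group_by(location) %>%
  arrange(year, .by_group = TRUE) %>%
  summarise(
    first_val  = first(val),
    last_val   = last(val),
    pct_change = 100 * (last_val - first_val) / first_val,
    .groups = "drop"
  )
```

A negative `pct_change` indicates a decline, which makes statements such as "declined slightly" or "declined rapidly" directly comparable across groups.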

Results

In this section, I explain the results and what they reveal about particulate matter air pollution, addressing each graph along with the trends and variables involved.

Graph 1: Map showing average PM 2.5 concentrations for every country in the world in 2016

This map shows the countries and geographic areas with the highest and lowest PM 2.5 concentrations in 2016. Most of the darker colored countries, which have the highest PM 2.5 pollution, are in Africa, the Middle East, and Asia. At the other end of the spectrum, Western Europe and the Americas have far less particulate matter air pollution. These observations are expected, as Europe and the Americas consist mostly of developed countries, while many countries in Africa and Asia are still developing and thus emit more air pollution.

Graph 2: Multiplot bar graphs showing the top 10 countries with the highest concentrations of PM 2.5 from 1990, 1995, 2000, 2005, 2010 until 2016.

This bar graph shows which specific countries had the highest concentrations of particulate matter air pollution over a long-term range of 26 years. The top countries have changed little over the years, with Saudi Arabia, United Arab Emirates, Qatar, Niger, Mauritania, Egypt, and Cameroon consistently among the top polluters. Between 1990 and 2000, Pakistan and Nigeria were also in the top 10, along with Gambia and Tajikistan. Since 2000, countries including Kuwait, India, Iraq, Nepal, Bangladesh, and Burkina Faso have also appeared one or more times on the top 10 list. All of the countries on the list are in areas of industrialization and development, with most being in the Middle East and Africa. From 1990 to 2016, there is also an increase in overall PM 2.5 concentration, as the bars become longer across all the top countries. Specifically, in 1990 the maximum PM 2.5 concentrations are around or below 100 µg/m3, and by 2016 the maximum values exceed 100 and approach 200 µg/m3.

Graph 3: Map showing death rates due to particulate matter air pollution for every country in the world in 2016

This map shows the countries and areas with the highest and lowest death rates caused by particulate matter air pollution in 2016, with death rate measured as the number of deaths per 100,000 people. The darkest colored countries represent the highest death rates and are generally located in Asia, the Middle East, and North Africa. Death rates are extremely high in Egypt and India, which are strongly affected by air pollution. In general, the countries with higher death rates correspond to the countries with greater PM 2.5 concentrations in 2016. Additionally, the same areas that had low air pollution, namely Western and Northern Europe, the U.S., Canada, and Australia, also have very low death rates due to air pollution.

Graph 4: Scatterplot comparing death rates from air pollution and PM 2.5 concentrations in countries of all population sizes in 2016.

In this scatterplot, there is a strong positive correlation between death rates due to air pollution and particulate matter concentrations. The linear regression line has a positive slope, meaning that as average PM 2.5 exposure increases, the death rate does as well. Generally, countries with lower PM 2.5 exposures have lower death rates and vice versa. The plot also labels the countries with the largest populations as key points. The United States, along with the majority of countries, has both low average PM 2.5 exposure and a low death rate due to air pollution. Large countries with a lot of pollution, like India, have extremely high death rates, which is concerning.

Graph 5: Scatterplot comparing death rates from air pollution and GDP per capita in countries of all population sizes in 2016.

Unlike the previous scatterplot, there is no positive linear correlation between GDP per capita and death rate due to air pollution. However, the smooth trend line shows that countries with low GDP per capita start out with medium death rates, which rise to a peak as GDP per capita reaches the level of large industrializing countries like China and India, then gradually decrease as GDP per capita increases further. There are many countries with both low GDP per capita and a low death rate, due to having small economies that are not dependent on pollution-generating industries. Then there are countries like China and India that are still developing and currently industrializing, with many factories and a lot of air pollution. Finally, the United States and other developed countries with high GDP per capita have significantly lower death rates.
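The point that a linear correlation can miss a rise-then-fall pattern can be illustrated with a small synthetic example. The numbers below are made up and only mimic the shape described. They are not the GDP or death rate data.

```r
# Toy inverted-U relationship (values are made up, not the GDP data).
gdp_toy   <- 1:21
death_toy <- 100 - (gdp_toy - 11)^2   # rises to a peak at 11, then falls

pearson <- cor(gdp_toy, death_toy)    # essentially zero despite a clear pattern

# A LOESS fit, like the trend line in the plot, recovers the peak.
fit <- loess(death_toy ~ gdp_toy)
peak_at <- gdp_toy[which.max(predict(fit))]
```

Here the Pearson correlation is essentially zero because the rising and falling halves cancel out, while the smoothed fit still locates the peak. This is why a LOESS trend line is more informative than a straight line for this comparison.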

Graph 6: Line graph showing changes in death rate from particulate matter air pollution from 1990 to 2016 in countries with high, middle, and low socio-demographic indices.

This line graph shows the differences in death rate trends over time for areas of differing SDIs. The SDI, or socio-demographic index, measures the development of a country and is a composite average of rankings of income per capita, average educational attainment, and fertility rates. In this graph, it is clear that high SDI (developed) countries have low death rates, with a gradual decrease across the years. Low SDI (underdeveloped) countries had very high death rates initially, followed by a large, consistent decline over the years as industries and practices have become cleaner. However, middle SDI countries that are currently developing saw only a slight decline in death rate until the 2000s, after which death rates due to air pollution have stayed about the same. This is probably due to the need to keep industrializing and relying on economies that produce a lot of air pollution.

Graph 7: Line graph showing changes in death rate from particulate matter air pollution from 1990 to 2016 in countries with high, upper middle, lower middle, and low income levels.

This line graph shows the differences in death rate trends over time for countries grouped by income level. High income countries have lower death rates that are gradually decreasing. Lower and upper middle income countries started with comparatively moderate death rates, which have declined slightly. Low income countries started with extremely high death rates, which have declined dramatically and by 2016 are at or below the level of the middle income countries. This shows that death rates are now highest in middle income countries, which are probably still industrializing.

Graph 8: Line graph showing changes in death rate from particulate matter air pollution from 1990 to 2016 in select countries and the world.

This line graph shows the differences in death rate trends over time for select countries and areas of interest, specifically China, India, Sub-Saharan Africa, the United States, Western Europe, and the global average. As expected, Western Europe and the United States have comparatively low death rates with a gradual decrease. Sub-Saharan Africa started with high death rates, which then declined rapidly over time and are now fairly low. China and India also started with high death rates due to air pollution, which declined until the mid-2000s, when they leveled off and even increased slightly, probably due to increased industrialization and expansion of their economies. Their death rates are now above the global values, while the other three areas are below the global death rate.

Conclusions

From the results, I can conclude that PM 2.5 concentrations are associated with death rates due to air pollution, and do correspond to a country’s development stage and wealth.

I learned that the countries with the highest PM 2.5 concentrations over the past three decades are Saudi Arabia, United Arab Emirates, Qatar, Niger, Mauritania, Egypt, and Cameroon, and that this list has not changed much.

Additionally, I learned that a country’s wealth in GDP per capita is not linearly correlated with death rates due to air pollution, but follows a different pattern. Death rates are moderate when GDP per capita is low, increase with industrialization and development, and then decrease as the economy stabilizes and GDP per capita becomes high. This suggests that death rates due to air pollution depend on the level of development of the country, whether it is underdeveloped, developing, or developed.

More evidence of this is seen in the line graphs comparing death rates due to air pollution across countries with varying socio-demographic indices and income levels. In both graphs, lower SDI and wealth correspond to higher death rates, while higher SDI and wealth correspond to lower death rates throughout the years. Across the groups, death rates declined over the years, showing improvement in air quality and global health as time passes. However, for middle income and middle SDI countries, death rates have leveled off since the mid-2000s, which is concerning because it indicates that the air quality and health of citizens in these developing countries are no longer improving. These results suggest that while air quality and the risk of death are improving in most countries, air pollution still has a large adverse impact in large developing countries like India and China.

More broadly, the results imply that air pollution is a very significant issue for developing countries that depend on industries that release a lot of pollution into the air. They also imply that developed countries generally do not face the extreme negative health effects of air pollution, despite depending on the industries and products of developing countries. Finally, air pollution is a very important issue because it is a health risk and a cause of death for vulnerable people in developing countries.

References

Literature

Hannah Ritchie and Max Roser. (2020). “Air Pollution”. Retrieved from https://ourworldindata.org/air-pollution

Health Effects Institute. (2019). State of Global Air Special Report. Retrieved from https://www.stateofglobalair.org/sites/default/files/soga_2019_report.pdf

Data Sources

Brauer, M. et al. for the Global Burden of Disease Study. (2017). PM2.5 air pollution, mean annual exposure (micrograms per cubic meter) [Data]. Available from https://data.worldbank.org/indicator/EN.ATM.PM25.MC.M3

Institute for Health Metrics and Evaluation. (2017). Global Burden of Disease Study [Data]. Available from http://ghdx.healthdata.org/gbd-results-tool

World Bank. (2019). World Development Indicators [Data]. Available from https://datacatalog.worldbank.org/dataset/world-development-indicators

World Bank and OECD National Accounts. (2018). GDP per capita [Data]. Available from https://data.worldbank.org/indicator/NY.GDP.PCAP.KD