Hello, this is my first data visualisation. I am super excited but also…
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(WDI)
library(ggplot2)
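# PM2.5 air pollution, mean annual exposure (micrograms per cubic metre), 2017 only
wdi_AP = WDI(indicator = c('EN.ATM.PM25.MC.M3')) %>%
  rename(PM25 = EN.ATM.PM25.MC.M3) %>%
  filter(year == 2017)
# GDP per capita (current US$), 2017 only
wdi_GDPpc = WDI(indicator = c('NY.GDP.PCAP.CD')) %>%
  rename(GDPpc = NY.GDP.PCAP.CD) %>%
  filter(year == 2017)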
inc_lvl = data.frame(WDI_data$country) %>%
  select(country, income)
combined_data=left_join(wdi_AP,wdi_GDPpc)
## Joining, by = c("iso2c", "country", "year")
combined_data = left_join(combined_data, inc_lvl) %>%
  # %in% tests membership; == would recycle the vector and silently drop matching rows
  filter(income %in% c('High income', 'Low income', 'Lower middle income', 'Upper middle income'))
## Joining, by = "country"
ggplot(combined_data, aes(x = GDPpc, y = PM25)) +
  geom_point(aes(colour = income)) +
  geom_smooth(method = 'lm', se = FALSE, colour = 'orange') +
  theme_minimal() +
  labs(title = 'Exposure to air pollution vs. GDP per capita, 2017',
       caption = 'Source: World Bank, World Development Indicators',
       x = 'GDP per capita [current US$]',
       y = 'PM 2.5 pollution, mean annual exposure [µg/m³]') +
  scale_color_grey()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 7 rows containing non-finite values (stat_smooth).
## Warning: Removed 7 rows containing missing values (geom_point).
The data suggest a negative correlation between GDP per capita and exposure to air pollution: in 2017, the wealthier a country was, the lower the PM 2.5 levels its citizens tended to be exposed to.
However, as every good data magician knows, correlation does not imply causation.
Major sources of particulate matter (PM) include agriculture, industrial processes, and the combustion of fossil fuels. Low-income countries are more likely to be exposed to these activities than their richer counterparts.
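To put a number on the negative correlation described above, one option is a rank correlation. The sketch below is only an illustration and is not part of the original analysis: `rank_cor` is a hypothetical helper, the choice of Spearman's method is my assumption (it is less sensitive to the strong skew in GDP per capita than Pearson's), and `toy` is made-up data standing in for `combined_data`, not World Bank figures.

```r
# Made-up illustrative values, NOT World Bank data; stands in for combined_data.
toy <- data.frame(
  GDPpc = c(500, 1200, 4000, 9000, 25000, 60000, NA),
  PM25  = c(60, 45, 38, 30, 15, 8, 20)
)

# Hypothetical helper: Spearman rank correlation between two columns,
# using only rows where both values are present.
rank_cor <- function(df, x, y) {
  ok <- complete.cases(df[, c(x, y)])
  cor(df[[x]][ok], df[[y]][ok], method = "spearman")
}

rho <- rank_cor(toy, "GDPpc", "PM25")
print(rho)  # negative when PM25 falls as GDPpc rises
```

With the data from this post, the equivalent call would be `rank_cor(combined_data, "GDPpc", "PM25")`. A value close to -1 would indicate a strong negative association, which would still say nothing about causation.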
THE END.