The 2005 Environmental Sustainability Index (ESI) benchmarks the ability of the world’s nations to protect the environment over the next few decades. 146 countries were given an ESI score, built from 21 indicators covering several environmental areas. These 21 indicators were expressed on a z-score scale and grouped into five components: Environmental Systems, Reducing Stresses, Reducing Human Vulnerability, Social and Institutional Capacity, and Global Stewardship. Together these make up the ESI target variable.
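Every indicator in this dataset is on a z-score scale, so it helps to recall what that means. A z-score is (value - mean) / standard deviation, so 0 marks the average of whatever was standardized and negative values sit below it. Here is a minimal base-R sketch on made-up numbers (not ESI data). `to_z()` is a hypothetical helper name, not part of the dataset or any package:

```r
# Hypothetical helper: standardize a numeric vector to z-scores,
# i.e. how many (sample) standard deviations each value lies from the mean
to_z <- function(x) {
  (x - mean(x)) / sd(x)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy values for illustration only
z <- to_z(x)
print(round(z, 2))
```

Base R’s `scale()` does the same standardization, but it returns a one-column matrix instead of a plain vector.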
The idea was to look at which countries received higher ESI scores and compare and contrast this with some of the 21 indicators available in this dataset. Are there certain indicators that have a stronger linear relationship with ESI? Do countries with higher ESI scores come from a particular region? Do more developed countries have higher ESI? Or are other factors contributing to this metric? My first instinct is that countries with lower greenhouse gas emissions and strict environmental governance would have higher ESI. This doesn’t necessarily mean more or less developed. But let’s see what the data tells us!
library(readr)
esi <- read_csv("https://raw.githubusercontent.com/justinm0rgan/bridge-workshop/main/R/finalproject/esi.csv?token=GHSAT0AAAAAABPMFD5CVLEKEGIPS4J36IPIYPOB2FA",col_select = 2:30, show_col_types = FALSE)
New names:
* `` -> ...1
# preview dataset
head(esi)
# A tibble: 6 × 29
code country esi system stress vulner cap global sys_air sys_bio sys_lan
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ALB Albania 58.8 52.4 65.4 72.3 46.2 57.9 0.45 0.17 -0.31
2 DZA Algeria 46 43.1 66.3 57.5 31.8 21.1 -0.02 -0.08 1.34
3 AGO Angola 42.9 67.9 59.1 11.8 22.1 39.1 -0.77 0.77 0.77
4 ARG Argenti… 62.7 67.6 54.9 69.9 65.4 58.5 0.4 0.1 0.66
5 ARM Armenia 53.2 54.4 62.2 50.8 34.9 60.3 1.21 -0.02 -0.22
6 AUS Austral… 61 78.1 40.5 75.2 76.9 30.2 0.7 0.16 1.41
# … with 18 more variables: sys_wql <dbl>, sys_wqn <dbl>, str_air <dbl>,
# str_eco <dbl>, str_pop <dbl>, str_was <dbl>, str_wat <dbl>, str_nrm <dbl>,
# vul_hea <dbl>, vul_sus <dbl>, vul_dis <dbl>, cap_gov <dbl>, cap_eff <dbl>,
# cap_pri <dbl>, cap_st <dbl>, glo_col <dbl>, glo_ghg <dbl>, glo_tbp <dbl>
Get summary statistics of the target variable esi.
summary(esi$esi)
Min. 1st Qu. Median Mean 3rd Qu. Max.
29.20 44.62 49.55 49.88 54.15 75.10
esi
library(ggplot2)
# histogram of ESI scores, with a single dashed line at the mean
ggplot(esi, aes(x = esi)) +
  geom_histogram(color = "black", fill = "lightblue", binwidth = 1) +
  geom_vline(xintercept = mean(esi$esi), linetype = "dashed") +
  ggtitle("Environmental Sustainability Index (ESI)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("ESI") + ylab("Count")
We can see that the bulk of the esi distribution lies between 40 and 60, with a mean of ~50 (49.88).
Let’s visualize the quartiles!
# horizontal notched boxplot of ESI; the y axis carries no information, so hide it
ggplot(esi, aes(x = esi)) +
  geom_boxplot(outlier.colour = "black",
               outlier.shape = 8,
               outlier.size = 4,
               notch = TRUE,
               fill = "lightblue") +
  ggtitle("Environmental Sustainability Index (ESI)") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.y.left = element_blank(),
        axis.ticks.y.left = element_blank()) +
  xlab("ESI")
Here we can see the quartiles of esi, with a median of ~49.5 and some outliers below 30 and above 70.
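Which points get drawn as outliers follows a fixed rule. ggplot2’s `geom_boxplot()` extends its whiskers at most 1.5 × IQR beyond the box, and anything past them is plotted as an outlier. Plugging in the `summary()` cut points (Q1 = 44.62, Q3 = 54.15, so IQR = 9.53 and 1.5 × IQR ≈ 14.30) puts the fences at about 30.3 and 68.4. That is consistent with the minimum (29.20) and the top scorers (above 70) showing up as outliers. Below is a small base-R sketch of that rule. `tukey_outliers()` is a hypothetical helper, and it relies on `quantile()`’s default type 7, the same method `summary()` uses:

```r
# Hypothetical helper: flag values outside Tukey's fences
# (Q1 - k * IQR, Q3 + k * IQR); boxplot whiskers use k = 1.5
tukey_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # default type 7, as in summary()
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  x < lower | x > upper
}

# toy scores for illustration only (not the ESI data)
scores <- c(29, 44, 45, 47, 49, 50, 52, 54, 55, 72)
flags <- tukey_outliers(scores)
print(scores[flags])
```

On the real data, the call would look something like `esi$country[tukey_outliers(esi$esi)]`.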
Let’s create a categorical column labeling each record’s esi quartile.
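# label each country's quartile using the summary() cut points above
# (note: 49.88 is the mean, not the median of 49.55, so the
# Second/Third split sits at the mean)
esi$esiQuart <- ifelse((esi$esi >= 29.20) & (esi$esi < 44.62), "First",
                ifelse((esi$esi >= 44.62) & (esi$esi < 49.88), "Second",
                ifelse((esi$esi >= 49.88) & (esi$esi < 54.15), "Third",
                ifelse(esi$esi >= 54.15, "Fourth", NA))))
# convert to a factor with levels in low-to-high order
# (as.factor() would sort them alphabetically: First, Fourth, Second, Third)
esi$esiQuart <- factor(esi$esiQuart, levels = c("First", "Second", "Third", "Fourth"))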
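The cut points above are typed in by hand, and the Second/Third boundary (49.88) is the mean rather than the median (49.55), so those two groups are not exact quartiles. A sketch that derives the breaks from the data with `quantile()` and `cut()` avoids both problems. `label_quartiles()` is a hypothetical helper. Because its middle break is the median, its groups may differ slightly from the ones used in the rest of this analysis:

```r
# Hypothetical helper: label each value with its quartile, using breaks
# computed from the data rather than hard-coded numbers
label_quartiles <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, 0.25), names = FALSE)
  cut(x,
      breaks = breaks,
      labels = c("First", "Second", "Third", "Fourth"),
      include.lowest = TRUE,  # with right = FALSE, keeps the maximum in "Fourth"
      right = FALSE)          # intervals are [a, b), like the >= / < tests above
}

# toy input for illustration only
quarts <- label_quartiles(1:8)
print(table(quarts))
```

Applied to the real data, this would be `esi$esiQuart <- label_quartiles(esi$esi)`. The factor levels come out in low-to-high order automatically. `cut()` stops with an error if two breaks are identical, which can happen when a variable has many tied values.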
esiQuart
# boxplot of ESI within each quartile group; only fill is mapped,
# so the palette is set with a fill scale
ggplot(esi, aes(x = esiQuart, y = esi, fill = esiQuart)) +
  geom_boxplot() +
  scale_fill_brewer(palette = "Dark2", name = "Quartile") +
  ggtitle("Environmental Sustainability Index (ESI) by Quartile") +
  theme(plot.title = element_text(hjust = 0.5)) +
  ylab("ESI") + xlab("")
Here we can see how the esi metric divides into quartiles. The Fourth (highest-esi) group has an interquartile range from ~58 to ~63, and there appear to be about 4 or 5 outliers. This raises the question: who are these outliers?
# five countries with the highest ESI (columns 2:3 are country and esi)
head(esi[order(-esi$esi), 2:3], 5)
# A tibble: 5 × 2
country esi
<chr> <dbl>
1 Finland 75.1
2 Norway 73.4
3 Uruguay 71.8
4 Sweden 71.7
5 Iceland 70.8
Four of the top five are Nordic countries (Finland, Norway, Sweden and Iceland). The exception is Uruguay, in South America.
Let’s look at a scatter matrix, colored by quartile, to visualize the linear relationships between some of the variables and esi.
We will look at the following specific indicators:
Variable | Description |
---|---|
glo_ghg | Global greenhouse gases (GHGs) |
vul_hea | Environmental health |
cap_pri | Private sector responsiveness |
cap_gov | Environmental governance |
sys_wql | Water quality |
esi | Environmental Sustainability Index (ESI) (target variable) |
pairs(~glo_ghg + # greenhouse gas emissions
vul_hea + # environmental health
cap_pri + # private sector responsiveness
cap_gov + # environmental governance
sys_wql + # water quality
esi, # target variable
col = factor(esi$esiQuart), pch = 19, data = esi)
Of these five variables, sys_wql (water quality) seems to have the most linear relationship with esi: as esi goes up, so does water quality, and vice versa. I expected glo_ghg (GHGs) to have a stronger linear relationship with esi, but its points are scattered all over the place.
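To put a number on “most linear”, we could compute a Pearson correlation between each indicator and esi. Keep in mind that Pearson’s r only captures linear association. Here is a base-R sketch on a toy data frame with made-up values, so the numbers themselves mean nothing. `cor_with_target()` is a hypothetical helper name:

```r
# Hypothetical helper: Pearson correlation of each predictor column with a target,
# sorted from strongest positive to strongest negative
cor_with_target <- function(df, predictors, target) {
  r <- sapply(predictors, function(p) cor(df[[p]], df[[target]], use = "complete.obs"))
  sort(r, decreasing = TRUE)
}

# toy data for illustration only (not the ESI dataset)
toy <- data.frame(
  esi     = c(40, 45, 50, 55, 60),
  sys_wql = c(-0.8, -0.4, 0.1, 0.5, 0.9),  # rises with esi
  glo_ghg = c(0.3, -0.5, 0.6, -0.2, 0.1)   # no clear pattern
)
r <- cor_with_target(toy, c("sys_wql", "glo_ghg"), "esi")
print(round(r, 2))
```

On the real data, the call would be `cor_with_target(esi, c("glo_ghg", "vul_hea", "cap_pri", "cap_gov", "sys_wql"), "esi")`.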
OK, let’s isolate water quality (sys_wql) and GHGs (glo_ghg) and take a closer look.
Let’s look at the esi quartiles in relation to mean greenhouse gas emissions (glo_ghg).
library(dplyr)
esiAvgGhg <- esi %>%
group_by(esiQuart) %>%
summarise(AvgGhg=mean(glo_ghg)) %>%
arrange(AvgGhg)
esiAvgGhg
# A tibble: 4 × 2
esiQuart AvgGhg
<fct> <dbl>
1 First -0.187
2 Second -0.0316
3 Fourth 0.0659
4 Third 0.16
ggplot(esiAvgGhg, aes(y = esiQuart, x = AvgGhg, fill = esiQuart)) +
  geom_col(show.legend = FALSE) +
  labs(y = "", x = "Greenhouse Gas Emissions (mean z-score)",
       subtitle = "Mean Greenhouse Gas Emissions by ESI Quartile")
Contrary to what we might expect, and assuming a higher glo_ghg z-score means more emissions, countries in the first esi quartile have a lower (fewer GHGs) glo_ghg z-score, while those in the third and fourth quartiles have higher greenhouse gas emissions. However, the third quartile has the largest score (0.16), more than double that of the fourth (0.066).
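Group means on their own can hide how spread out each quartile is, which matters here given how scattered glo_ghg looked in the scatter matrix. The base-R sketch below reports the count, mean, median and standard deviation per group. `group_summary()` is a hypothetical helper, and the toy values are made up:

```r
# Hypothetical helper: per-group count, mean, median and standard deviation
group_summary <- function(values, groups) {
  stats <- function(v) c(n = length(v), mean = mean(v), median = median(v), sd = sd(v))
  out <- t(sapply(split(values, groups), stats))  # one row per group
  as.data.frame(out)
}

# toy values for illustration only (not the ESI data)
ghg   <- c(-0.5, 0.1, -0.2, 0.9, -1.2, 0.4, 0.0, 0.2)
quart <- factor(c("First", "First", "Second", "Second", "Third", "Third", "Fourth", "Fourth"),
                levels = c("First", "Second", "Third", "Fourth"))
res <- group_summary(ghg, quart)
print(res)
```

For the real data, this would be `group_summary(esi$glo_ghg, esi$esiQuart)`. A large sd relative to the gap between group means would suggest the quartile differences in glo_ghg are not very stable.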
Let’s look at the esi quartiles in relation to mean water quality (sys_wql).
esiAvgWql <- esi %>%
group_by(esiQuart) %>%
summarise(AvgWql=mean(sys_wql)) %>%
arrange(AvgWql)
esiAvgWql
# A tibble: 4 × 2
esiQuart AvgWql
<fct> <dbl>
1 Second -0.399
2 First -0.375
3 Third 0.137
4 Fourth 0.665
ggplot(esiAvgWql, aes(y = esiQuart, x = AvgWql, fill = esiQuart)) +
  geom_col(show.legend = FALSE) +
  labs(y = "", x = "Water Quality (mean z-score)",
       subtitle = "Mean Water Quality by ESI Quartile")
Water quality seems to correlate more strongly with esi. Countries in the third and fourth esi quartiles have much higher water quality than those in the first and second. In fact, if we plot glo_ghg on the x-axis and sys_wql on the y-axis, we can see that countries with higher water quality tend to have around-average (near-zero z-score) GHG emissions. By contrast, those in the first and second quartiles have a wider spread of GHGs and lower water quality.
# water quality against GHGs, colored by ESI quartile
ggplot(data = esi, aes(x = glo_ghg, y = sys_wql, color = esiQuart)) +
  geom_point(alpha = 0.7, size = 2) +
  labs(y = "Water Quality", x = "Greenhouse Gases (GHGs)",
       subtitle = "Water Quality vs. GHGs by ESI Quartile", color = "ESI Quartile") +
  scale_color_brewer(palette = "Dark2")
Let’s bring in a table of regions and join it by country code to see whether GHG emissions and water quality relate to region and esi quartile.
# import region data
region_df <- read_csv("https://raw.githubusercontent.com/justinm0rgan/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv", show_col_types = FALSE)
head(region_df)
# A tibble: 6 × 11
name `alpha-2` `alpha-3` `country-code` `iso_3166-2` region `sub-region`
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Afghanis… AF AFG 004 ISO 3166-2:… Asia Southern Asia
2 Åland Is… AX ALA 248 ISO 3166-2:… Europe Northern Eur…
3 Albania AL ALB 008 ISO 3166-2:… Europe Southern Eur…
4 Algeria DZ DZA 012 ISO 3166-2:… Africa Northern Afr…
5 American… AS ASM 016 ISO 3166-2:… Ocean… Polynesia
6 Andorra AD AND 020 ISO 3166-2:… Europe Southern Eur…
# … with 4 more variables: intermediate-region <chr>, region-code <chr>,
# sub-region-code <chr>, intermediate-region-code <chr>
# join with esi
esi_region <- left_join(esi, region_df, by = c("code" = "alpha-3"))
head(esi_region)
# A tibble: 6 × 40
code country esi system stress vulner cap global sys_air sys_bio sys_lan
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ALB Albania 58.8 52.4 65.4 72.3 46.2 57.9 0.45 0.17 -0.31
2 DZA Algeria 46 43.1 66.3 57.5 31.8 21.1 -0.02 -0.08 1.34
3 AGO Angola 42.9 67.9 59.1 11.8 22.1 39.1 -0.77 0.77 0.77
4 ARG Argenti… 62.7 67.6 54.9 69.9 65.4 58.5 0.4 0.1 0.66
5 ARM Armenia 53.2 54.4 62.2 50.8 34.9 60.3 1.21 -0.02 -0.22
6 AUS Austral… 61 78.1 40.5 75.2 76.9 30.2 0.7 0.16 1.41
# … with 29 more variables: sys_wql <dbl>, sys_wqn <dbl>, str_air <dbl>,
# str_eco <dbl>, str_pop <dbl>, str_was <dbl>, str_wat <dbl>, str_nrm <dbl>,
# vul_hea <dbl>, vul_sus <dbl>, vul_dis <dbl>, cap_gov <dbl>, cap_eff <dbl>,
# cap_pri <dbl>, cap_st <dbl>, glo_col <dbl>, glo_ghg <dbl>, glo_tbp <dbl>,
# esiQuart <fct>, name <chr>, alpha-2 <chr>, country-code <chr>,
# iso_3166-2 <chr>, region <chr>, sub-region <chr>,
# intermediate-region <chr>, region-code <chr>, sub-region-code <chr>, …
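A left join keeps every ESI row. However, any code with no match in region_df ends up with NA in the region columns, and those rows would then show up as NA (or drop out) in region-level plots. A quick check for unmatched keys is sketched below in base R. `unmatched_keys()` is hypothetical, and the toy codes are made up. dplyr’s `anti_join()` does the same job on whole data frames:

```r
# Hypothetical helper: keys in `left_keys` that have no match in `right_keys`
unmatched_keys <- function(left_keys, right_keys) {
  unique(left_keys[!left_keys %in% right_keys])
}

# toy keys for illustration only; "XYZ" is deliberately unmatched
esi_codes    <- c("ALB", "DZA", "XYZ")
region_codes <- c("ALB", "DZA", "AGO")
no_match <- unmatched_keys(esi_codes, region_codes)
print(no_match)
```

On the real tables, the check would be ``unmatched_keys(esi$code, region_df$`alpha-3`)``. An empty result means every ESI country picked up a region.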
Which countries have the top 10 and bottom 10 GHG scores, and how does that relate to their esi score and region?
# top 10 countries by glo_ghg, showing their ESI, filled by region
select(esi_region, country, esi, glo_ghg, esiQuart, region, "sub-region") %>%
  slice_max(glo_ghg, n = 10) %>%
  ggplot(aes(y = country, x = esi, fill = region)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "ESI", y = "", subtitle = "Top 10 GHG by Region and ESI")
# bottom 10 countries by glo_ghg, showing their ESI, filled by region
select(esi_region, country, esi, glo_ghg, esiQuart, region, "sub-region") %>%
  slice_min(glo_ghg, n = 10) %>%
  ggplot(aes(y = country, x = esi, fill = region)) +
  geom_col() +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "ESI", y = "", subtitle = "Bottom 10 GHG by Region and ESI")
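One detail is worth knowing here. `slice_max()` and `slice_min()` keep ties by default (`with_ties = TRUE`), so a “top 10” can return more than ten rows when several countries share a glo_ghg value. Passing `with_ties = FALSE` caps the result at n. The base-R sketch below shows the same idea without dplyr. `top_n_rows()` is a hypothetical helper, and the toy scores are made up:

```r
# Hypothetical helper: at most n rows with the largest (or smallest) values of `col`;
# ties are cut off rather than kept together
top_n_rows <- function(df, col, n, largest = TRUE) {
  ord <- order(df[[col]], decreasing = largest)
  df[head(ord, n), , drop = FALSE]
}

# toy data for illustration only: three countries tie at 0.9
toy <- data.frame(country = c("A", "B", "C", "D", "E"),
                  glo_ghg = c(0.9, 0.9, 0.9, 0.2, -0.4))
top2 <- top_n_rows(toy, "glo_ghg", 2)
bottom1 <- top_n_rows(toy, "glo_ghg", 1, largest = FALSE)
print(top2)
print(bottom1)
```

With dplyr, the equivalent is `slice_max(glo_ghg, n = 10, with_ties = FALSE)`.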
It looks like countries with higher GHG scores tend to be from South-Eastern Asia and Sub-Saharan Africa, while those with lower GHG scores are from Central Asia and Eastern Europe.
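These region patterns are read off twenty countries in two bar charts. Averaging glo_ghg and sys_wql over every country in each region would check whether the pattern holds more broadly. Below is a base-R sketch using `aggregate()` on a made-up toy table: the region names are real categories, but the values are not ESI data. On the joined data, the call would look something like `aggregate(cbind(glo_ghg, sys_wql) ~ region, data = esi_region, FUN = mean)`:

```r
# toy data for illustration only (values are made up)
toy_region <- data.frame(
  region  = c("Africa", "Africa", "Asia", "Asia", "Europe", "Europe"),
  glo_ghg = c(0.6, 0.4, 0.2, 0.0, -0.3, -0.5),
  sys_wql = c(-0.7, -0.5, -0.2, 0.0, 0.6, 0.8)
)

# mean of each indicator per region; rows with a missing region would be
# dropped by the formula interface's default na.action
region_means <- aggregate(cbind(glo_ghg, sys_wql) ~ region, data = toy_region, FUN = mean)

# sort regions from highest to lowest mean glo_ghg
region_means <- region_means[order(-region_means$glo_ghg), ]
print(region_means)
```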
Overall, countries with higher esi scores tend to have above-average water quality but emit more greenhouse gases than those with lower esi scores. Therefore, according to the 2005 Environmental Sustainability Index, curtailing greenhouse gas emissions is not necessarily an indicator of increased ability to protect the environment, whereas the ability to maintain higher water quality may be.