Environmental Sustainability Index 2005

Introduction

The 2005 Environmental Sustainability Index (ESI) benchmarks the ability of the world’s nations to protect the environment over the next few decades. 146 countries were given an ESI score, built from 21 indicators spanning several environmental areas. These 21 indicators were scored on a z-score scale and then grouped into 5 components: Environmental Systems, Reducing Stresses, Reducing Human Vulnerability, Social and Institutional Capacity, and Global Stewardship. Together these make up the ESI, the target variable in this analysis.
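To make the z-score scale concrete, here is a minimal sketch of standardizing one indicator in base R. The raw values are made-up numbers for illustration only, and the ESI's own construction may involve additional steps that are not shown here.

```r
# Hypothetical raw indicator values for five countries (made-up numbers)
raw <- c(12, 18, 25, 31, 44)

# z-score: distance from the mean, measured in standard deviations
z <- (raw - mean(raw)) / sd(raw)

# scale() performs the same centering and scaling; as.numeric() drops the matrix shape
z_scale <- as.numeric(scale(raw))

round(z, 2)
```

After standardizing, a value of 0 means "average for this indicator", which is how the z-score columns such as glo_ghg and sys_wql are read later in this analysis.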

The idea is to look at which countries received higher ESI scores and compare this with some of the 21 indicators available in this dataset. Are there certain indicators that have a stronger linear relationship with ESI? Do countries with higher ESI scores come from a particular region? Do more developed countries have higher ESI? Or are there other factors contributing to this metric? My first instinct is that countries with lower greenhouse gas emissions and strict environmental governance would have higher ESI. This doesn’t necessarily mean more or less developed. But let’s see what the data tells us!

library(readr)
esi <- read_csv("https://raw.githubusercontent.com/justinm0rgan/bridge-workshop/main/R/finalproject/esi.csv?token=GHSAT0AAAAAABPMFD5CVLEKEGIPS4J36IPIYPOB2FA",col_select = 2:30, show_col_types = FALSE)
New names:
* `` -> ...1
# preview dataset
head(esi)
# A tibble: 6 × 29
  code  country    esi system stress vulner   cap global sys_air sys_bio sys_lan
  <chr> <chr>    <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
1 ALB   Albania   58.8   52.4   65.4   72.3  46.2   57.9    0.45    0.17   -0.31
2 DZA   Algeria   46     43.1   66.3   57.5  31.8   21.1   -0.02   -0.08    1.34
3 AGO   Angola    42.9   67.9   59.1   11.8  22.1   39.1   -0.77    0.77    0.77
4 ARG   Argenti…  62.7   67.6   54.9   69.9  65.4   58.5    0.4     0.1     0.66
5 ARM   Armenia   53.2   54.4   62.2   50.8  34.9   60.3    1.21   -0.02   -0.22
6 AUS   Austral…  61     78.1   40.5   75.2  76.9   30.2    0.7     0.16    1.41
# … with 18 more variables: sys_wql <dbl>, sys_wqn <dbl>, str_air <dbl>,
#   str_eco <dbl>, str_pop <dbl>, str_was <dbl>, str_wat <dbl>, str_nrm <dbl>,
#   vul_hea <dbl>, vul_sus <dbl>, vul_dis <dbl>, cap_gov <dbl>, cap_eff <dbl>,
#   cap_pri <dbl>, cap_st <dbl>, glo_col <dbl>, glo_ghg <dbl>, glo_tbp <dbl>

Get summary statistics of target variable esi.

summary(esi$esi)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  29.20   44.62   49.55   49.88   54.15   75.10 

Distribution of esi

library(ggplot2)
ggplot(esi, aes(x=esi)) + 
  geom_histogram(color="black", fill="lightblue", binwidth = 1) +
  geom_vline(data=esi, aes(xintercept=mean(esi)), linetype='dashed') +
  ggtitle("Environmental Sustainability Index (ESI)") + theme(plot.title = element_text(hjust = 0.5)) +
  xlab("ESI") + ylab("Count")

We can see that the bulk of the esi distribution lies between 40 and 60, with a mean of ~50.

Let’s visualize the quartiles!

ggplot(esi, aes(x=esi)) +
  geom_boxplot(outlier.colour = 'black', 
               outlier.shape = 8, 
               outlier.size = 4,
               notch = TRUE,
               fill="lightblue") +
  ggtitle("Environmental Sustainability Index (ESI)") + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.y.left = element_blank(),
        axis.ticks.y.left = element_blank()) +
  xlab("ESI")

Here we can see the quartiles of esi, with the median at ~50 and some outliers below ~30 and above ~70.
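As a rough check on where the outlier cutoffs fall, the usual 1.5 × IQR rule (the default whisker rule in geom_boxplot) can be applied to the quartiles printed by summary(esi$esi) earlier. This is only a sketch based on those rounded summary values, so the fences are approximate; the variable names are illustrative.

```r
# Quartiles of esi as printed by summary(esi$esi) in this document
q1 <- 44.62
q3 <- 54.15

# Interquartile range and the 1.5 * IQR fences
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

c(lower = lower_fence, upper = upper_fence)
```

The fences come out at about 30.3 and 68.4, so the minimum (29.20) falls just below the lower fence and the maximum (75.10) sits well above the upper one.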

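Let’s create a categorical column labeling each record’s quartile in relation to esi. The cutoffs come from the summary statistics above; note that the Second/Third boundary used here is the mean (49.88) rather than the median (49.55), so the groups are approximate quartiles.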

esi$esiQuart <- ifelse((esi$esi >= 29.20) & (esi$esi < 44.62), "First",
                       ifelse((esi$esi >= 44.62) & (esi$esi < 49.88), "Second",
                              ifelse((esi$esi >=49.88) & (esi$esi < 54.15), "Third",
                                     ifelse(esi$esi>=54.15, "Fourth", NA))))

# convert to factor
esi$esiQuart <- as.factor(esi$esiQuart)
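The same kind of labeling can be sketched without hard-coded cutoffs by combining quantile() and cut() from base R. The toy vector below reuses the six esi values from the preview plus the minimum and maximum from the summary; esi_toy, breaks and quart are illustrative names. Because this version splits at the median, applying it to the full data could place a few countries in a different group than the ifelse() version does.

```r
# Toy vector: six esi values from the preview, plus the min and max from summary()
esi_toy <- c(58.8, 46.0, 42.9, 62.7, 53.2, 61.0, 29.2, 75.1)

# Quartile boundaries computed from the data itself
breaks <- quantile(esi_toy, probs = seq(0, 1, 0.25))

# right = FALSE gives [lower, upper) bins, matching the >= / < logic above;
# include.lowest = TRUE closes the last bin so the maximum is not dropped
quart <- cut(esi_toy,
             breaks = breaks,
             labels = c("First", "Second", "Third", "Fourth"),
             right = FALSE,
             include.lowest = TRUE)

table(quart)
```

A side effect of passing labels to cut() is that the factor levels run First, Second, Third, Fourth, whereas as.factor() on character labels sorts them alphabetically.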

Boxplot segmented by esiQuart

ggplot(esi, aes(x=esiQuart, y=esi, fill=esiQuart)) +
  geom_boxplot() +
  scale_fill_brewer(palette = 'Dark2', name = "Quartile") +
  ggtitle("Environmental Sustainability Index (ESI) by Quartile") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  ylab("ESI") + xlab("")

Here we can see how the esi metric divides by quartile. The Fourth quartile (the highest esi group) has an interquartile range from ~58 to ~63, and it looks like there are about 4 or 5 outliers. This raises the question: who are these outliers?

head(esi[order(-esi$esi),2:3],5)
# A tibble: 5 × 2
  country   esi
  <chr>   <dbl>
1 Finland  75.1
2 Norway   73.4
3 Uruguay  71.8
4 Sweden   71.7
5 Iceland  70.8

Four of the top 5 are Nordic countries in Northern Europe; the exception is Uruguay, in South America.

Scattermatrix by Quartile

Let’s look at a scatter matrix segmented by quartile to visualize some of the variables’ linear relationships with esi.

We will look at the following specific indicators:

Variable   Description
glo_ghg    Global greenhouse gases (GHGs)
vul_hea    Environmental health
cap_pri    Private sector responsiveness
cap_gov    Environmental governance
sys_wql    Water quality
esi        Environmental Sustainability Index (ESI) (target variable)

pairs(~glo_ghg + # greenhouse gas emissions
        vul_hea + # environmental health
        cap_pri + # private sector responsiveness
        cap_gov  + # environmental governance
        sys_wql + # water quality
        esi, # target variable
      col = factor(esi$esiQuart), pch = 19, data = esi) 

Of these 5 variables, sys_wql (water quality) seems to have the most linear relationship with esi: as esi goes up, so does water quality, and vice versa. A feature I expected to have a stronger linear relationship with esi was glo_ghg (GHGs), but its points are a bit all over the place.
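One way to put a number on how linear each relationship looks is the Pearson correlation with esi. The sketch below uses a small made-up data frame (toy) that only borrows the column names; the values are invented for illustration and are not taken from the ESI dataset.

```r
# Made-up values; only the column names mirror the dataset
toy <- data.frame(
  esi     = c(40, 45, 50, 55, 60, 65),
  sys_wql = c(-0.9, -0.5, -0.1, 0.2, 0.6, 1.0),
  glo_ghg = c(0.3, -0.8, 0.5, -0.2, 0.9, -0.4)
)

# Pearson correlation of each indicator with esi
cors <- sapply(toy[c("sys_wql", "glo_ghg")],
               function(x) cor(x, toy$esi))

round(cors, 2)
```

On the real data, the same call over the five indicator columns would give one coefficient per indicator, which can be compared directly instead of judging the scatter matrix by eye.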

OK, let’s isolate sys_wql (water quality) and glo_ghg and take a closer look.

Greenhouse Gases (GHGs) and Water Quality

Let’s look at esi quartiles in relation to average greenhouse gas emissions (glo_ghg).

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
esiAvgGhg <- esi %>% 
  group_by(esiQuart) %>% 
  summarise(AvgGhg=mean(glo_ghg)) %>% 
  arrange(AvgGhg)

esiAvgGhg
# A tibble: 4 × 2
  esiQuart  AvgGhg
  <fct>      <dbl>
1 First    -0.187 
2 Second   -0.0316
3 Fourth    0.0659
4 Third     0.16  
ggplot(esiAvgGhg, aes(y=esiQuart, x=AvgGhg, fill=esiQuart)) +
  geom_bar(stat="identity", show.legend = FALSE) +
  labs(y="",x="Greenhouse Gas Emissions (mean z-score)", fill="",
       subtitle="Mean Greenhouse Gas Emissions by ESI Quartile")

Contrary to what we might expect, countries in the first esi quartile have a lower glo_ghg z-score (fewer GHGs), while those in the third and fourth have higher greenhouse gas emissions. However, the third quartile has the largest score, more than double that of the fourth.

Let’s look at esi quartiles in relation to average water quality (sys_wql).

esiAvgWql <- esi %>% 
  group_by(esiQuart) %>% 
  summarise(AvgWql=mean(sys_wql)) %>% 
  arrange(AvgWql)

esiAvgWql
# A tibble: 4 × 2
  esiQuart AvgWql
  <fct>     <dbl>
1 Second   -0.399
2 First    -0.375
3 Third     0.137
4 Fourth    0.665
ggplot(esiAvgWql, aes(y=esiQuart, x=AvgWql, fill=esiQuart)) +
  geom_bar(stat="identity", show.legend = FALSE) +
  labs(y="",x="Water Quality (mean z-score)", fill="",
       subtitle="Mean Water Quality by ESI Quartile")

Water quality seems to correlate more strongly with esi. Countries in the third and fourth esi quartiles have much higher water quality than those in the first and second. In fact, if we plot glo_ghg on the x-axis and sys_wql on the y-axis, we can see that countries with higher water quality tend to have around average (near-zero z-score) GHG emissions, whereas those in the first and second quartiles show a wider spread of GHGs and lower water quality.

ggplot(data=esi, aes(x=glo_ghg, y=sys_wql, color=esiQuart)) +
  geom_point(alpha=0.7, size=2) +
  labs(y="Water Quality", x="Greenhouse Gases (GHGs)",
       subtitle="Water Quality vs. GHGs by ESI Quartile", color="ESI Quartile") +
  scale_color_brewer(palette = 'Dark2')

By region

Let’s bring in a table with region and join it by country code, to see whether GHG and water quality relate to region and esi quartile.

# import region data
region_df <- read_csv("https://raw.githubusercontent.com/justinm0rgan/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv", show_col_types = FALSE)
head(region_df)
# A tibble: 6 × 11
  name      `alpha-2` `alpha-3` `country-code` `iso_3166-2` region `sub-region` 
  <chr>     <chr>     <chr>     <chr>          <chr>        <chr>  <chr>        
1 Afghanis… AF        AFG       004            ISO 3166-2:… Asia   Southern Asia
2 Åland Is… AX        ALA       248            ISO 3166-2:… Europe Northern Eur…
3 Albania   AL        ALB       008            ISO 3166-2:… Europe Southern Eur…
4 Algeria   DZ        DZA       012            ISO 3166-2:… Africa Northern Afr…
5 American… AS        ASM       016            ISO 3166-2:… Ocean… Polynesia    
6 Andorra   AD        AND       020            ISO 3166-2:… Europe Southern Eur…
# … with 4 more variables: intermediate-region <chr>, region-code <chr>,
#   sub-region-code <chr>, intermediate-region-code <chr>
# join with esi
esi_region <- left_join(esi, region_df, by = c("code" = "alpha-3"))
head(esi_region)
# A tibble: 6 × 40
  code  country    esi system stress vulner   cap global sys_air sys_bio sys_lan
  <chr> <chr>    <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
1 ALB   Albania   58.8   52.4   65.4   72.3  46.2   57.9    0.45    0.17   -0.31
2 DZA   Algeria   46     43.1   66.3   57.5  31.8   21.1   -0.02   -0.08    1.34
3 AGO   Angola    42.9   67.9   59.1   11.8  22.1   39.1   -0.77    0.77    0.77
4 ARG   Argenti…  62.7   67.6   54.9   69.9  65.4   58.5    0.4     0.1     0.66
5 ARM   Armenia   53.2   54.4   62.2   50.8  34.9   60.3    1.21   -0.02   -0.22
6 AUS   Austral…  61     78.1   40.5   75.2  76.9   30.2    0.7     0.16    1.41
# … with 29 more variables: sys_wql <dbl>, sys_wqn <dbl>, str_air <dbl>,
#   str_eco <dbl>, str_pop <dbl>, str_was <dbl>, str_wat <dbl>, str_nrm <dbl>,
#   vul_hea <dbl>, vul_sus <dbl>, vul_dis <dbl>, cap_gov <dbl>, cap_eff <dbl>,
#   cap_pri <dbl>, cap_st <dbl>, glo_col <dbl>, glo_ghg <dbl>, glo_tbp <dbl>,
#   esiQuart <fct>, name <chr>, alpha-2 <chr>, country-code <chr>,
#   iso_3166-2 <chr>, region <chr>, sub-region <chr>,
#   intermediate-region <chr>, region-code <chr>, sub-region-code <chr>, …
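Since a left join keeps every esi row even when its code has no match in the region table, it can be worth checking for rows that came back without a region. Here is a minimal base R sketch on toy tables: esi_toy, region_toy, the alpha3 column name and the code "XXX" are all hypothetical, and merge(all.x = TRUE) stands in for left_join() so the example runs without extra packages.

```r
# Toy stand-ins for the two tables; "XXX" is a deliberately unmatched code
esi_toy <- data.frame(code = c("ALB", "DZA", "XXX"),
                      esi  = c(58.8, 46.0, 50.0),
                      stringsAsFactors = FALSE)
region_toy <- data.frame(alpha3 = c("ALB", "DZA", "AGO"),
                         region = c("Europe", "Africa", "Africa"),
                         stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of esi_toy, like a left join
joined <- merge(esi_toy, region_toy,
                by.x = "code", by.y = "alpha3", all.x = TRUE)

# Codes that found no region in the lookup table
unmatched <- joined$code[is.na(joined$region)]
unmatched
```

On the real tables, the equivalent check would be to look for NA values in esi_region$region after the join; any such rows would drop out of, or appear as NA in, the region plots.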

Which countries are in the top 10 and bottom 10 for GHG emissions, and how does that relate to esi score and region?

select(esi_region, country, esi, glo_ghg, esiQuart, region, "sub-region") %>% 
  slice_max(glo_ghg, n=10) %>% 
  ggplot(aes(y=country, x=esi, fill=region)) +
  geom_bar(stat='identity') +
  scale_fill_brewer(palette='Dark2') +
  labs(x="ESI", y="", subtitle = "Top 10 GHG by Region and ESI")

select(esi_region, country, esi, glo_ghg, esiQuart, region, "sub-region") %>% 
  slice_min(glo_ghg, n=10)%>% 
  ggplot(aes(y=country, x=esi, fill=region)) +
  geom_bar(stat='identity') +
  scale_fill_brewer(palette='Dark2') +
  labs(x="ESI", y="", subtitle = "Bottom 10 GHG by Region and ESI")

It looks like countries with higher GHG tend to be from South-eastern Asia and Sub-Saharan Africa, while those with lower GHG tend to be from Central Asia and Eastern Europe.

Conclusion

Countries with higher GHG emissions tend to be from Sub-Saharan Africa and South-eastern Asia, while those with lower emissions tend to be from Central Asia and Europe. Overall, countries with higher esi scores tend to have above-average water quality but produce more greenhouse gas emissions than those with lower esi scores. Therefore, according to the 2005 Environmental Sustainability Index, curtailing greenhouse gas emissions is not necessarily an indicator of increased ability to protect the environment, whereas the ability to maintain higher water quality may be.