Overview

The objective of this exercise is to identify sustainability “top-performers” at the country level by comparing countries’ performance on sustainability indicators. This exercise relies primarily on the World Bank’s Sovereign Environmental, Social and Governance Data, which provides historical country-level data on 67 sustainability indicators. The World Bank’s ESG dataset is available here: https://datatopics.worldbank.org/esg/index.html. The version of the data used in this exercise was downloaded in October, 2022.

The methodology of this exercise is to rank-order countries based on their performance on various sustainability indicators, such as emissions and pollution levels, natural resource management and energy use. Notably, countries will be assessed based on their performance relative to other countries in their peer groups, in order to control for capacity. As such, this exercise also utilizes a variety of IMF and World Bank country groupings to enable peer-group comparisons.
# Data Preparation

Data preparation plan

The data preparation for this exercise involves cleaning and merging 2 datasets: the World Bank Sovereign ESG dataset and the IMF WB Country Groups dataset. The steps are as follows. Step 1: Load each dataset and review the data. Step 2: convert the country_name column in the IMF WB Country Groups dataset into iso3c codes. Step 3: Join the two datasets using the iso3c codes. The steps above will yield a combined dataset that includes ESG indicator data by country, with country groupings.

Load the data

folder_path <- partial(here, "00_data_raw")

folder_path() %>% list.files()
## [1] "~$imf_wb_country_groups_exploration.xlsx"
## [2] "imf_wb_country_groups.csv"               
## [3] "imf_wb_country_groups_exploration.xlsx"  
## [4] "World Bank Sovereign ESG dataset"
WB_sovereign_ESG <- folder_path("World Bank Sovereign ESG dataset/ESGData2.csv") %>%
  read_csv()
## Rows: 16013 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Country Name, iso3c, Indicator Name, Indicator Code
## dbl (63): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
imf_wb_country_groups <- folder_path("imf_wb_country_groups.csv") %>%
  read_csv()
## Rows: 2587 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country_name, country_group, group_type
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Review the WB ESG dataset and comment on its data structure

glimpse(WB_sovereign_ESG)
## Rows: 16,013
## Columns: 67
## $ `Country Name`   <chr> "Arab World", "Arab World", "Arab World", "Arab World…
## $ iso3c            <chr> "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB…
## $ `Indicator Name` <chr> "Access to clean fuels and technologies for cooking (…
## $ `Indicator Code` <chr> "EG.CFT.ACCS.ZS", "EG.ELC.ACCS.ZS", "NY.ADJ.DRES.GN.Z…
## $ `1960`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `1961`           <dbl> NA, NA, NA, NA, 30.981194, NA, NA, NA, NA, NA, NA, NA…
## $ `1962`           <dbl> NA, NA, NA, NA, 30.982443, NA, NA, NA, NA, NA, NA, NA…
## $ `1963`           <dbl> NA, NA, NA, NA, 31.006834, NA, NA, NA, NA, NA, NA, NA…
## $ `1964`           <dbl> NA, NA, NA, NA, 31.01778, NA, NA, NA, NA, NA, NA, NA,…
## $ `1965`           <dbl> NA, NA, NA, NA, 31.042245, NA, NA, NA, NA, NA, NA, NA…
## $ `1966`           <dbl> NA, NA, NA, NA, 31.05018, NA, NA, NA, NA, NA, NA, NA,…
## $ `1967`           <dbl> NA, NA, NA, NA, 31.103003, NA, NA, NA, NA, NA, NA, NA…
## $ `1968`           <dbl> NA, NA, NA, NA, 31.133345, 14.660731, NA, NA, NA, NA,…
## $ `1969`           <dbl> NA, NA, NA, NA, 31.190209, 14.295950, NA, NA, NA, NA,…
## $ `1970`           <dbl> NA, NA, 6.8192031, 0.1745661, 31.2542726, 14.6376317,…
## $ `1971`           <dbl> NA, NA, 6.8625877, 0.1392865, 31.3863673, 13.9126712,…
## $ `1972`           <dbl> NA, NA, 8.9932942, 0.1320823, 31.4997277, 13.7760100,…
## $ `1973`           <dbl> NA, NA, 12.2730219, 0.1524257, 31.4965877, 12.3616551…
## $ `1974`           <dbl> NA, NA, 25.24123751, 0.08739206, 31.55058649, 7.77632…
## $ `1975`           <dbl> NA, NA, 17.17663885, 0.09875801, 31.52950124, 7.98857…
## $ `1976`           <dbl> NA, NA, 18.92687785, 0.06634076, 31.59973643, 7.44355…
## $ `1977`           <dbl> NA, NA, 19.2409638, 0.1024133, 31.6219972, 7.1412275,…
## $ `1978`           <dbl> NA, NA, 16.3069297, 0.1071046, 31.6660778, 7.1654123,…
## $ `1979`           <dbl> NA, NA, 32.75748927, 0.07422815, 31.67849388, 5.67074…
## $ `1980`           <dbl> NA, NA, 27.88409161, 0.05775981, 31.75886761, 4.75050…
## $ `1981`           <dbl> NA, NA, 20.08867813, 0.05511109, 31.45424086, 5.07641…
## $ `1982`           <dbl> NA, NA, 10.9940045, 0.1141440, 31.4801559, 5.9800629,…
## $ `1983`           <dbl> NA, NA, 9.65310503, 0.08148845, 31.52860052, 6.738849…
## $ `1984`           <dbl> NA, NA, 9.7577998, 0.0795408, 31.9421227, 7.3157059, …
## $ `1985`           <dbl> NA, NA, 8.32055276, 0.03830571, 32.44202971, 8.068573…
## $ `1986`           <dbl> NA, NA, 5.59763556, 0.08965615, 33.02639234, 9.621148…
## $ `1987`           <dbl> NA, NA, 7.36173263, 0.08506867, 33.58285192, 10.27551…
## $ `1988`           <dbl> NA, NA, 6.5508191, 0.0897014, 34.1868305, 10.5341524,…
## $ `1989`           <dbl> NA, NA, 9.05466681, 0.08732953, 34.69763715, 11.08920…
## $ `1990`           <dbl> NA, NA, 7.98566651, 0.06761466, 35.10930645, 10.38169…
## $ `1991`           <dbl> NA, NA, 8.43729462, 0.07275225, 35.15972003, 12.27688…
## $ `1992`           <dbl> NA, NA, 8.51357510, 0.05785795, 35.32097250, 9.225802…
## $ `1993`           <dbl> NA, NA, 8.14964549, 0.04460087, 36.09585061, 9.263672…
## $ `1994`           <dbl> NA, NA, 7.44516045, 0.04575975, 36.75422959, 10.22704…
## $ `1995`           <dbl> NA, NA, 7.67706491, 0.06196782, 37.38644379, 10.38116…
## $ `1996`           <dbl> NA, 76.61107378, 8.91043066, 0.05868878, 37.98259419,…
## $ `1997`           <dbl> NA, 77.25362114, 7.75296001, 0.05524078, 38.49337516,…
## $ `1998`           <dbl> NA, 78.11157597, 5.40270002, 0.07836069, 39.11826905,…
## $ `1999`           <dbl> NA, 78.69106128, 7.00074834, 0.03706546, 39.64700699,…
## $ `2000`           <dbl> 75.31875484, 80.73614121, 9.72850267, 0.02122214, 39.…
## $ `2001`           <dbl> 76.65447960, 81.58623137, 7.51233649, 0.02455567, 39.…
## $ `2002`           <dbl> 77.85548458, 81.54022203, 7.39429040, 0.02680793, 39.…
## $ `2003`           <dbl> 79.04554536, 82.50815948, 8.71632561, 0.03261723, 39.…
## $ `2004`           <dbl> 80.03416635, 82.50368475, 10.19516060, 0.02505279, 40…
## $ `2005`           <dbl> 81.02576797, 83.21298163, 12.94744287, 0.02041118, 40…
## $ `2006`           <dbl> 81.96581490, 85.45900327, 12.83170086, 0.02157661, 40…
## $ `2007`           <dbl> 82.78718917, 83.76443999, 11.44096571, 0.01803353, 40…
## $ `2008`           <dbl> 83.47573318, 83.38698965, 13.30778857, 0.02509408, 40…
## $ `2009`           <dbl> 84.15742348, 84.31348380, 8.26486020, 0.02885193, 40.…
## $ `2010`           <dbl> 84.68334427, 87.11486281, 9.45136425, 0.03003109, 40.…
## $ `2011`           <dbl> 85.1927423, 87.3326612, 13.4275913, 0.0309041, 40.172…
## $ `2012`           <dbl> 85.64421787, 87.03958839, 12.85922547, 0.03226624, 36…
## $ `2013`           <dbl> 85.93256689, 88.99261981, 11.63839653, 0.06262475, 36…
## $ `2014`           <dbl> 86.23238390, 88.01535618, 10.29789118, 0.08494782, 36…
## $ `2015`           <dbl> 86.47859724, 88.68188567, 6.23770324, 0.09978365, 36.…
## $ `2016`           <dbl> 86.72268517, 89.19506235, 5.20389537, 0.09505614, 36.…
## $ `2017`           <dbl> 86.93793290, 90.32465949, 6.48008284, 0.09549819, 36.…
## $ `2018`           <dbl> 87.04077373, 88.91074940, 8.47907008, 0.05105818, 36.…
## $ `2019`           <dbl> 87.23553889, 89.99994555, 7.43714482, 0.06219453, 36.…
## $ `2020`           <dbl> 87.30706752, 90.27773508, 4.37637424, 0.07908057, 36.…
## $ `2021`           <dbl> NA, NA, NA, NA, NA, 5.237110, NA, NA, NA, NA, NA, NA,…
## $ `2050`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

The WB ESG dataset includes 67 columns and 16,013 rows.

The first 2 columns are the country name and the iso3c code. The next 2 columns are the indicator name and unique indicator code. Columns 5-66 include the value of the indicator for each year between 1960 and 2021. The final column 67 includes forecast values for 2050. Notably, many of the observations in columns 5-67 are blank, likely due to a lack of data.

Each row represents a unique combination of country and indicator, with the values of the indicator for the specific country included in columns 5-67. The Country Name and iso3c columns 1-2 include 239 unique countries and the indicator name and unique indicator code columns 3-4 include 67 unique indicators. The multiplication of these 239 countries and 67 indicators exactly yields the total 16,013 rows in the dataset.

Notably, multiple rows have blank values, likely due to data limitations, with some indicator rows for countries being entirely blank for each year column. Finally, the Country Name and iso3c columns include a subset of country groupings (e.g. “Arab World,” “Euro area”).

Review the IMF WB Country Groups dataset and comment on its data structure

glimpse(imf_wb_country_groups)
## Rows: 2,587
## Columns: 3
## $ country_name  <chr> "Australia", "Austria", "Belgium", "Canada", "Switzerlan…
## $ country_group <chr> "Advanced Economies", "Advanced Economies", "Advanced Ec…
## $ group_type    <chr> "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", …

The IMF WB Country Groups dataset includes 3 columns and 2,587 rows.

The columns are country name, country group (including various country groupings) and group type (an indication of whether the row is corresponds to an IMF or WB grouping).

Each row is a unique combination of a country name, country group and group type. The dataset includes 218 unique country names, 66 unique country groups and 2 unique group types (IMF or WB). 19 of the country groups are tagged as IMF and 48 are tagged as WB (“Euro Area” is the only country tagged as both IMF and WB and thus duplicated for each country).

Notably, each country has a different number of rows, depending on the number of country groupings that they fall under (e.g. Malawi has 15 rows as it falls under 15 country groupings while Australia has only 9 rows as it only falls under 9 country groupings). Finally, the IMF WB Country Groups dataset does not include iso3c country codes.

Convert country names to iso3c in the IMF WB Country Groups dataset

country_name_regex_to_iso3c <- function(country_name) {
  country_name %>%
    countrycode(origin = "country.name", 
                                     destination = "iso3c",
                                     origin_regex = TRUE)
}

imf_wb_country_groups_iso3c <- imf_wb_country_groups %>%
  mutate(iso3c = country_name_regex_to_iso3c(country_name))
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Channel Islands, Kosovo
imf_wb_country_groups_iso3c
## # A tibble: 2,587 × 4
##    country_name country_group      group_type iso3c
##    <chr>        <chr>              <chr>      <chr>
##  1 Australia    Advanced Economies IMF        AUS  
##  2 Austria      Advanced Economies IMF        AUT  
##  3 Belgium      Advanced Economies IMF        BEL  
##  4 Canada       Advanced Economies IMF        CAN  
##  5 Switzerland  Advanced Economies IMF        CHE  
##  6 Cyprus       Advanced Economies IMF        CYP  
##  7 Czechia      Advanced Economies IMF        CZE  
##  8 Germany      Advanced Economies IMF        DEU  
##  9 Denmark      Advanced Economies IMF        DNK  
## 10 Spain        Advanced Economies IMF        ESP  
## # … with 2,577 more rows

Merge the two datasets using the iso3c country codes

The left join function, taking the WB ESG dataset as the “left” data frame, is most suitable in this case as this will add the country groups from the IMF WB Country Groups dataset to the WB ESG dataset as new columns, while preserving all rows in the ESG dataset (including the country groupings that are already included in the ESG dataset)

WB_sovereign_ESG_country_groups <- WB_sovereign_ESG %>% left_join(imf_wb_country_groups_iso3c, by = "iso3c")

glimpse(WB_sovereign_ESG_country_groups)
## Rows: 169,175
## Columns: 70
## $ `Country Name`   <chr> "Arab World", "Arab World", "Arab World", "Arab World…
## $ iso3c            <chr> "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB…
## $ `Indicator Name` <chr> "Access to clean fuels and technologies for cooking (…
## $ `Indicator Code` <chr> "EG.CFT.ACCS.ZS", "EG.ELC.ACCS.ZS", "NY.ADJ.DRES.GN.Z…
## $ `1960`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `1961`           <dbl> NA, NA, NA, NA, 30.981194, NA, NA, NA, NA, NA, NA, NA…
## $ `1962`           <dbl> NA, NA, NA, NA, 30.982443, NA, NA, NA, NA, NA, NA, NA…
## $ `1963`           <dbl> NA, NA, NA, NA, 31.006834, NA, NA, NA, NA, NA, NA, NA…
## $ `1964`           <dbl> NA, NA, NA, NA, 31.01778, NA, NA, NA, NA, NA, NA, NA,…
## $ `1965`           <dbl> NA, NA, NA, NA, 31.042245, NA, NA, NA, NA, NA, NA, NA…
## $ `1966`           <dbl> NA, NA, NA, NA, 31.05018, NA, NA, NA, NA, NA, NA, NA,…
## $ `1967`           <dbl> NA, NA, NA, NA, 31.103003, NA, NA, NA, NA, NA, NA, NA…
## $ `1968`           <dbl> NA, NA, NA, NA, 31.133345, 14.660731, NA, NA, NA, NA,…
## $ `1969`           <dbl> NA, NA, NA, NA, 31.190209, 14.295950, NA, NA, NA, NA,…
## $ `1970`           <dbl> NA, NA, 6.8192031, 0.1745661, 31.2542726, 14.6376317,…
## $ `1971`           <dbl> NA, NA, 6.8625877, 0.1392865, 31.3863673, 13.9126712,…
## $ `1972`           <dbl> NA, NA, 8.9932942, 0.1320823, 31.4997277, 13.7760100,…
## $ `1973`           <dbl> NA, NA, 12.2730219, 0.1524257, 31.4965877, 12.3616551…
## $ `1974`           <dbl> NA, NA, 25.24123751, 0.08739206, 31.55058649, 7.77632…
## $ `1975`           <dbl> NA, NA, 17.17663885, 0.09875801, 31.52950124, 7.98857…
## $ `1976`           <dbl> NA, NA, 18.92687785, 0.06634076, 31.59973643, 7.44355…
## $ `1977`           <dbl> NA, NA, 19.2409638, 0.1024133, 31.6219972, 7.1412275,…
## $ `1978`           <dbl> NA, NA, 16.3069297, 0.1071046, 31.6660778, 7.1654123,…
## $ `1979`           <dbl> NA, NA, 32.75748927, 0.07422815, 31.67849388, 5.67074…
## $ `1980`           <dbl> NA, NA, 27.88409161, 0.05775981, 31.75886761, 4.75050…
## $ `1981`           <dbl> NA, NA, 20.08867813, 0.05511109, 31.45424086, 5.07641…
## $ `1982`           <dbl> NA, NA, 10.9940045, 0.1141440, 31.4801559, 5.9800629,…
## $ `1983`           <dbl> NA, NA, 9.65310503, 0.08148845, 31.52860052, 6.738849…
## $ `1984`           <dbl> NA, NA, 9.7577998, 0.0795408, 31.9421227, 7.3157059, …
## $ `1985`           <dbl> NA, NA, 8.32055276, 0.03830571, 32.44202971, 8.068573…
## $ `1986`           <dbl> NA, NA, 5.59763556, 0.08965615, 33.02639234, 9.621148…
## $ `1987`           <dbl> NA, NA, 7.36173263, 0.08506867, 33.58285192, 10.27551…
## $ `1988`           <dbl> NA, NA, 6.5508191, 0.0897014, 34.1868305, 10.5341524,…
## $ `1989`           <dbl> NA, NA, 9.05466681, 0.08732953, 34.69763715, 11.08920…
## $ `1990`           <dbl> NA, NA, 7.98566651, 0.06761466, 35.10930645, 10.38169…
## $ `1991`           <dbl> NA, NA, 8.43729462, 0.07275225, 35.15972003, 12.27688…
## $ `1992`           <dbl> NA, NA, 8.51357510, 0.05785795, 35.32097250, 9.225802…
## $ `1993`           <dbl> NA, NA, 8.14964549, 0.04460087, 36.09585061, 9.263672…
## $ `1994`           <dbl> NA, NA, 7.44516045, 0.04575975, 36.75422959, 10.22704…
## $ `1995`           <dbl> NA, NA, 7.67706491, 0.06196782, 37.38644379, 10.38116…
## $ `1996`           <dbl> NA, 76.61107378, 8.91043066, 0.05868878, 37.98259419,…
## $ `1997`           <dbl> NA, 77.25362114, 7.75296001, 0.05524078, 38.49337516,…
## $ `1998`           <dbl> NA, 78.11157597, 5.40270002, 0.07836069, 39.11826905,…
## $ `1999`           <dbl> NA, 78.69106128, 7.00074834, 0.03706546, 39.64700699,…
## $ `2000`           <dbl> 75.31875484, 80.73614121, 9.72850267, 0.02122214, 39.…
## $ `2001`           <dbl> 76.65447960, 81.58623137, 7.51233649, 0.02455567, 39.…
## $ `2002`           <dbl> 77.85548458, 81.54022203, 7.39429040, 0.02680793, 39.…
## $ `2003`           <dbl> 79.04554536, 82.50815948, 8.71632561, 0.03261723, 39.…
## $ `2004`           <dbl> 80.03416635, 82.50368475, 10.19516060, 0.02505279, 40…
## $ `2005`           <dbl> 81.02576797, 83.21298163, 12.94744287, 0.02041118, 40…
## $ `2006`           <dbl> 81.96581490, 85.45900327, 12.83170086, 0.02157661, 40…
## $ `2007`           <dbl> 82.78718917, 83.76443999, 11.44096571, 0.01803353, 40…
## $ `2008`           <dbl> 83.47573318, 83.38698965, 13.30778857, 0.02509408, 40…
## $ `2009`           <dbl> 84.15742348, 84.31348380, 8.26486020, 0.02885193, 40.…
## $ `2010`           <dbl> 84.68334427, 87.11486281, 9.45136425, 0.03003109, 40.…
## $ `2011`           <dbl> 85.1927423, 87.3326612, 13.4275913, 0.0309041, 40.172…
## $ `2012`           <dbl> 85.64421787, 87.03958839, 12.85922547, 0.03226624, 36…
## $ `2013`           <dbl> 85.93256689, 88.99261981, 11.63839653, 0.06262475, 36…
## $ `2014`           <dbl> 86.23238390, 88.01535618, 10.29789118, 0.08494782, 36…
## $ `2015`           <dbl> 86.47859724, 88.68188567, 6.23770324, 0.09978365, 36.…
## $ `2016`           <dbl> 86.72268517, 89.19506235, 5.20389537, 0.09505614, 36.…
## $ `2017`           <dbl> 86.93793290, 90.32465949, 6.48008284, 0.09549819, 36.…
## $ `2018`           <dbl> 87.04077373, 88.91074940, 8.47907008, 0.05105818, 36.…
## $ `2019`           <dbl> 87.23553889, 89.99994555, 7.43714482, 0.06219453, 36.…
## $ `2020`           <dbl> 87.30706752, 90.27773508, 4.37637424, 0.07908057, 36.…
## $ `2021`           <dbl> NA, NA, NA, NA, NA, 5.237110, NA, NA, NA, NA, NA, NA,…
## $ `2050`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country_name     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country_group    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ group_type       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

The merged dataset has 70 columns and 169,175 rows.

The columns correspond to the 67 columns in the WB ESG dataset plus the 3 columns from the IMF WB Country Groupings dataset.

The total number of rows has increased substantially because each row now represents a unique combination of country name, indicator, country group and group type. In other words, the 67 rows for each country in the WB ESG dataset was duplicated by the number of country groups that the country falls into in the IMF WB Country Groupings dataset. As an example, Australia now has 603 rows in the merged dataset because each of its 67 rows in the WB ESG dataset was duplicated 9 times (67 * 9 = 603) to create separate rows for each of the 9 country groupings that Australia falls under in the IMF WB Country Groupings dataset.

In addition, 3082 of the rows in the merged dataset are blank for the new columns 68-70 because they correspond to one of 46 “country names” in the WB ESG dataset that are actually country groupings (e.g. “Arab World”). These observations are marked as “NA” in these columns because they did not match with any observations in the iso3c column in the IMF WB Country Groupings dataset.

Finally, the blank observations in the WB ESG dataset have now been populated with “NA” in the merged dataset.

Write merged dataset to .csv

write.csv(WB_sovereign_ESG_country_groups,"C:/Users/Gen Shiraishi/Desktop/Sustainable Finance - Application and Methods/Final project/03_data_processed\\WB_sovereign_ESG_country_groups.csv", row.names = FALSE)

Data analysis

Next steps: Update the dataset so that each country only appears once per indicator Show the country comparisons for my 10 selected indicators Show the country comparisons by peer group Build peer groups - high income, upper middle income, lower middle income, low income Build some kind of percentile rating system Identify “top performers” and “most investment needed” countries by indicator category and peer group