The objective of this exercise is to identify sustainability “top-performers” at the country level by comparing countries’ performance on sustainability indicators. This exercise relies primarily on the World Bank’s Sovereign Environmental, Social and Governance Data, which provides historical country-level data on 67 sustainability indicators. The World Bank’s ESG dataset is available here: https://datatopics.worldbank.org/esg/index.html. The version of the data used in this exercise was downloaded in October, 2022.
The methodology of this exercise is to rank-order countries based on
their performance on various sustainability indicators, such as
emissions and pollution levels, natural resource management and energy
use. Notably, countries will be assessed based on their performance
relative to other countries in their peer groups, in order to control
for capacity. As such, this exercise also utilizes a variety of IMF and
World Bank country groupings to enable peer-group comparisons.
# Data Preparation
The data preparation for this exercise involves cleaning and merging 2 datasets: the World Bank Sovereign ESG dataset and the IMF WB Country Groups dataset. The steps are as follows. Step 1: Load each dataset and review the data. Step 2: convert the country_name column in the IMF WB Country Groups dataset into iso3c codes. Step 3: Join the two datasets using the iso3c codes. The steps above will yield a combined dataset that includes ESG indicator data by country, with country groupings.
folder_path <- partial(here, "00_data_raw")
folder_path() %>% list.files()
## [1] "~$imf_wb_country_groups_exploration.xlsx"
## [2] "imf_wb_country_groups.csv"
## [3] "imf_wb_country_groups_exploration.xlsx"
## [4] "World Bank Sovereign ESG dataset"
WB_sovereign_ESG <- folder_path("World Bank Sovereign ESG dataset/ESGData2.csv") %>%
read_csv()
## Rows: 16013 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Country Name, iso3c, Indicator Name, Indicator Code
## dbl (63): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
imf_wb_country_groups <- folder_path("imf_wb_country_groups.csv") %>%
read_csv()
## Rows: 2587 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country_name, country_group, group_type
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(WB_sovereign_ESG)
## Rows: 16,013
## Columns: 67
## $ `Country Name` <chr> "Arab World", "Arab World", "Arab World", "Arab World…
## $ iso3c <chr> "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB…
## $ `Indicator Name` <chr> "Access to clean fuels and technologies for cooking (…
## $ `Indicator Code` <chr> "EG.CFT.ACCS.ZS", "EG.ELC.ACCS.ZS", "NY.ADJ.DRES.GN.Z…
## $ `1960` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `1961` <dbl> NA, NA, NA, NA, 30.981194, NA, NA, NA, NA, NA, NA, NA…
## $ `1962` <dbl> NA, NA, NA, NA, 30.982443, NA, NA, NA, NA, NA, NA, NA…
## $ `1963` <dbl> NA, NA, NA, NA, 31.006834, NA, NA, NA, NA, NA, NA, NA…
## $ `1964` <dbl> NA, NA, NA, NA, 31.01778, NA, NA, NA, NA, NA, NA, NA,…
## $ `1965` <dbl> NA, NA, NA, NA, 31.042245, NA, NA, NA, NA, NA, NA, NA…
## $ `1966` <dbl> NA, NA, NA, NA, 31.05018, NA, NA, NA, NA, NA, NA, NA,…
## $ `1967` <dbl> NA, NA, NA, NA, 31.103003, NA, NA, NA, NA, NA, NA, NA…
## $ `1968` <dbl> NA, NA, NA, NA, 31.133345, 14.660731, NA, NA, NA, NA,…
## $ `1969` <dbl> NA, NA, NA, NA, 31.190209, 14.295950, NA, NA, NA, NA,…
## $ `1970` <dbl> NA, NA, 6.8192031, 0.1745661, 31.2542726, 14.6376317,…
## $ `1971` <dbl> NA, NA, 6.8625877, 0.1392865, 31.3863673, 13.9126712,…
## $ `1972` <dbl> NA, NA, 8.9932942, 0.1320823, 31.4997277, 13.7760100,…
## $ `1973` <dbl> NA, NA, 12.2730219, 0.1524257, 31.4965877, 12.3616551…
## $ `1974` <dbl> NA, NA, 25.24123751, 0.08739206, 31.55058649, 7.77632…
## $ `1975` <dbl> NA, NA, 17.17663885, 0.09875801, 31.52950124, 7.98857…
## $ `1976` <dbl> NA, NA, 18.92687785, 0.06634076, 31.59973643, 7.44355…
## $ `1977` <dbl> NA, NA, 19.2409638, 0.1024133, 31.6219972, 7.1412275,…
## $ `1978` <dbl> NA, NA, 16.3069297, 0.1071046, 31.6660778, 7.1654123,…
## $ `1979` <dbl> NA, NA, 32.75748927, 0.07422815, 31.67849388, 5.67074…
## $ `1980` <dbl> NA, NA, 27.88409161, 0.05775981, 31.75886761, 4.75050…
## $ `1981` <dbl> NA, NA, 20.08867813, 0.05511109, 31.45424086, 5.07641…
## $ `1982` <dbl> NA, NA, 10.9940045, 0.1141440, 31.4801559, 5.9800629,…
## $ `1983` <dbl> NA, NA, 9.65310503, 0.08148845, 31.52860052, 6.738849…
## $ `1984` <dbl> NA, NA, 9.7577998, 0.0795408, 31.9421227, 7.3157059, …
## $ `1985` <dbl> NA, NA, 8.32055276, 0.03830571, 32.44202971, 8.068573…
## $ `1986` <dbl> NA, NA, 5.59763556, 0.08965615, 33.02639234, 9.621148…
## $ `1987` <dbl> NA, NA, 7.36173263, 0.08506867, 33.58285192, 10.27551…
## $ `1988` <dbl> NA, NA, 6.5508191, 0.0897014, 34.1868305, 10.5341524,…
## $ `1989` <dbl> NA, NA, 9.05466681, 0.08732953, 34.69763715, 11.08920…
## $ `1990` <dbl> NA, NA, 7.98566651, 0.06761466, 35.10930645, 10.38169…
## $ `1991` <dbl> NA, NA, 8.43729462, 0.07275225, 35.15972003, 12.27688…
## $ `1992` <dbl> NA, NA, 8.51357510, 0.05785795, 35.32097250, 9.225802…
## $ `1993` <dbl> NA, NA, 8.14964549, 0.04460087, 36.09585061, 9.263672…
## $ `1994` <dbl> NA, NA, 7.44516045, 0.04575975, 36.75422959, 10.22704…
## $ `1995` <dbl> NA, NA, 7.67706491, 0.06196782, 37.38644379, 10.38116…
## $ `1996` <dbl> NA, 76.61107378, 8.91043066, 0.05868878, 37.98259419,…
## $ `1997` <dbl> NA, 77.25362114, 7.75296001, 0.05524078, 38.49337516,…
## $ `1998` <dbl> NA, 78.11157597, 5.40270002, 0.07836069, 39.11826905,…
## $ `1999` <dbl> NA, 78.69106128, 7.00074834, 0.03706546, 39.64700699,…
## $ `2000` <dbl> 75.31875484, 80.73614121, 9.72850267, 0.02122214, 39.…
## $ `2001` <dbl> 76.65447960, 81.58623137, 7.51233649, 0.02455567, 39.…
## $ `2002` <dbl> 77.85548458, 81.54022203, 7.39429040, 0.02680793, 39.…
## $ `2003` <dbl> 79.04554536, 82.50815948, 8.71632561, 0.03261723, 39.…
## $ `2004` <dbl> 80.03416635, 82.50368475, 10.19516060, 0.02505279, 40…
## $ `2005` <dbl> 81.02576797, 83.21298163, 12.94744287, 0.02041118, 40…
## $ `2006` <dbl> 81.96581490, 85.45900327, 12.83170086, 0.02157661, 40…
## $ `2007` <dbl> 82.78718917, 83.76443999, 11.44096571, 0.01803353, 40…
## $ `2008` <dbl> 83.47573318, 83.38698965, 13.30778857, 0.02509408, 40…
## $ `2009` <dbl> 84.15742348, 84.31348380, 8.26486020, 0.02885193, 40.…
## $ `2010` <dbl> 84.68334427, 87.11486281, 9.45136425, 0.03003109, 40.…
## $ `2011` <dbl> 85.1927423, 87.3326612, 13.4275913, 0.0309041, 40.172…
## $ `2012` <dbl> 85.64421787, 87.03958839, 12.85922547, 0.03226624, 36…
## $ `2013` <dbl> 85.93256689, 88.99261981, 11.63839653, 0.06262475, 36…
## $ `2014` <dbl> 86.23238390, 88.01535618, 10.29789118, 0.08494782, 36…
## $ `2015` <dbl> 86.47859724, 88.68188567, 6.23770324, 0.09978365, 36.…
## $ `2016` <dbl> 86.72268517, 89.19506235, 5.20389537, 0.09505614, 36.…
## $ `2017` <dbl> 86.93793290, 90.32465949, 6.48008284, 0.09549819, 36.…
## $ `2018` <dbl> 87.04077373, 88.91074940, 8.47907008, 0.05105818, 36.…
## $ `2019` <dbl> 87.23553889, 89.99994555, 7.43714482, 0.06219453, 36.…
## $ `2020` <dbl> 87.30706752, 90.27773508, 4.37637424, 0.07908057, 36.…
## $ `2021` <dbl> NA, NA, NA, NA, NA, 5.237110, NA, NA, NA, NA, NA, NA,…
## $ `2050` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
The WB ESG dataset includes 67 columns and 16,013 rows.
The first 2 columns are the country name and the iso3c code. The next 2 columns are the indicator name and unique indicator code. Columns 5-66 include the value of the indicator for each year between 1960 and 2021. The final column 67 includes forecast values for 2050. Notably, many of the observations in columns 5-67 are blank, likely due to a lack of data.
Each row represents a unique combination of country and indicator, with the values of the indicator for the specific country included in columns 5-67. The Country Name and iso3c columns 1-2 include 239 unique countries and the indicator name and unique indicator code columns 3-4 include 67 unique indicators. The multiplication of these 239 countries and 67 indicators exactly yields the total 16,013 rows in the dataset.
Notably, multiple rows have blank values, likely due to data limitations, with some indicator rows for countries being entirely blank for each year column. Finally, the Country Name and iso3c columns include a subset of country groupings (e.g. “Arab World,” “Euro area”).
glimpse(imf_wb_country_groups)
## Rows: 2,587
## Columns: 3
## $ country_name <chr> "Australia", "Austria", "Belgium", "Canada", "Switzerlan…
## $ country_group <chr> "Advanced Economies", "Advanced Economies", "Advanced Ec…
## $ group_type <chr> "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", …
The IMF WB Country Groups dataset includes 3 columns and 2,587 rows.
The columns are country name, country group (including various country groupings) and group type (an indication of whether the row is corresponds to an IMF or WB grouping).
Each row is a unique combination of a country name, country group and group type. The dataset includes 218 unique country names, 66 unique country groups and 2 unique group types (IMF or WB). 19 of the country groups are tagged as IMF and 48 are tagged as WB (“Euro Area” is the only country tagged as both IMF and WB and thus duplicated for each country).
Notably, each country has a different number of rows, depending on the number of country groupings that they fall under (e.g. Malawi has 15 rows as it falls under 15 country groupings while Australia has only 9 rows as it only falls under 9 country groupings). Finally, the IMF WB Country Groups dataset does not include iso3c country codes.
country_name_regex_to_iso3c <- function(country_name) {
country_name %>%
countrycode(origin = "country.name",
destination = "iso3c",
origin_regex = TRUE)
}
imf_wb_country_groups_iso3c <- imf_wb_country_groups %>%
mutate(iso3c = country_name_regex_to_iso3c(country_name))
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Channel Islands, Kosovo
imf_wb_country_groups_iso3c
## # A tibble: 2,587 × 4
## country_name country_group group_type iso3c
## <chr> <chr> <chr> <chr>
## 1 Australia Advanced Economies IMF AUS
## 2 Austria Advanced Economies IMF AUT
## 3 Belgium Advanced Economies IMF BEL
## 4 Canada Advanced Economies IMF CAN
## 5 Switzerland Advanced Economies IMF CHE
## 6 Cyprus Advanced Economies IMF CYP
## 7 Czechia Advanced Economies IMF CZE
## 8 Germany Advanced Economies IMF DEU
## 9 Denmark Advanced Economies IMF DNK
## 10 Spain Advanced Economies IMF ESP
## # … with 2,577 more rows
The left join function, taking the WB ESG dataset as the “left” data frame, is most suitable in this case as this will add the country groups from the IMF WB Country Groups dataset to the WB ESG dataset as new columns, while preserving all rows in the ESG dataset (including the country groupings that are already included in the ESG dataset)
WB_sovereign_ESG_country_groups <- WB_sovereign_ESG %>% left_join(imf_wb_country_groups_iso3c, by = "iso3c")
glimpse(WB_sovereign_ESG_country_groups)
## Rows: 169,175
## Columns: 70
## $ `Country Name` <chr> "Arab World", "Arab World", "Arab World", "Arab World…
## $ iso3c <chr> "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB…
## $ `Indicator Name` <chr> "Access to clean fuels and technologies for cooking (…
## $ `Indicator Code` <chr> "EG.CFT.ACCS.ZS", "EG.ELC.ACCS.ZS", "NY.ADJ.DRES.GN.Z…
## $ `1960` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `1961` <dbl> NA, NA, NA, NA, 30.981194, NA, NA, NA, NA, NA, NA, NA…
## $ `1962` <dbl> NA, NA, NA, NA, 30.982443, NA, NA, NA, NA, NA, NA, NA…
## $ `1963` <dbl> NA, NA, NA, NA, 31.006834, NA, NA, NA, NA, NA, NA, NA…
## $ `1964` <dbl> NA, NA, NA, NA, 31.01778, NA, NA, NA, NA, NA, NA, NA,…
## $ `1965` <dbl> NA, NA, NA, NA, 31.042245, NA, NA, NA, NA, NA, NA, NA…
## $ `1966` <dbl> NA, NA, NA, NA, 31.05018, NA, NA, NA, NA, NA, NA, NA,…
## $ `1967` <dbl> NA, NA, NA, NA, 31.103003, NA, NA, NA, NA, NA, NA, NA…
## $ `1968` <dbl> NA, NA, NA, NA, 31.133345, 14.660731, NA, NA, NA, NA,…
## $ `1969` <dbl> NA, NA, NA, NA, 31.190209, 14.295950, NA, NA, NA, NA,…
## $ `1970` <dbl> NA, NA, 6.8192031, 0.1745661, 31.2542726, 14.6376317,…
## $ `1971` <dbl> NA, NA, 6.8625877, 0.1392865, 31.3863673, 13.9126712,…
## $ `1972` <dbl> NA, NA, 8.9932942, 0.1320823, 31.4997277, 13.7760100,…
## $ `1973` <dbl> NA, NA, 12.2730219, 0.1524257, 31.4965877, 12.3616551…
## $ `1974` <dbl> NA, NA, 25.24123751, 0.08739206, 31.55058649, 7.77632…
## $ `1975` <dbl> NA, NA, 17.17663885, 0.09875801, 31.52950124, 7.98857…
## $ `1976` <dbl> NA, NA, 18.92687785, 0.06634076, 31.59973643, 7.44355…
## $ `1977` <dbl> NA, NA, 19.2409638, 0.1024133, 31.6219972, 7.1412275,…
## $ `1978` <dbl> NA, NA, 16.3069297, 0.1071046, 31.6660778, 7.1654123,…
## $ `1979` <dbl> NA, NA, 32.75748927, 0.07422815, 31.67849388, 5.67074…
## $ `1980` <dbl> NA, NA, 27.88409161, 0.05775981, 31.75886761, 4.75050…
## $ `1981` <dbl> NA, NA, 20.08867813, 0.05511109, 31.45424086, 5.07641…
## $ `1982` <dbl> NA, NA, 10.9940045, 0.1141440, 31.4801559, 5.9800629,…
## $ `1983` <dbl> NA, NA, 9.65310503, 0.08148845, 31.52860052, 6.738849…
## $ `1984` <dbl> NA, NA, 9.7577998, 0.0795408, 31.9421227, 7.3157059, …
## $ `1985` <dbl> NA, NA, 8.32055276, 0.03830571, 32.44202971, 8.068573…
## $ `1986` <dbl> NA, NA, 5.59763556, 0.08965615, 33.02639234, 9.621148…
## $ `1987` <dbl> NA, NA, 7.36173263, 0.08506867, 33.58285192, 10.27551…
## $ `1988` <dbl> NA, NA, 6.5508191, 0.0897014, 34.1868305, 10.5341524,…
## $ `1989` <dbl> NA, NA, 9.05466681, 0.08732953, 34.69763715, 11.08920…
## $ `1990` <dbl> NA, NA, 7.98566651, 0.06761466, 35.10930645, 10.38169…
## $ `1991` <dbl> NA, NA, 8.43729462, 0.07275225, 35.15972003, 12.27688…
## $ `1992` <dbl> NA, NA, 8.51357510, 0.05785795, 35.32097250, 9.225802…
## $ `1993` <dbl> NA, NA, 8.14964549, 0.04460087, 36.09585061, 9.263672…
## $ `1994` <dbl> NA, NA, 7.44516045, 0.04575975, 36.75422959, 10.22704…
## $ `1995` <dbl> NA, NA, 7.67706491, 0.06196782, 37.38644379, 10.38116…
## $ `1996` <dbl> NA, 76.61107378, 8.91043066, 0.05868878, 37.98259419,…
## $ `1997` <dbl> NA, 77.25362114, 7.75296001, 0.05524078, 38.49337516,…
## $ `1998` <dbl> NA, 78.11157597, 5.40270002, 0.07836069, 39.11826905,…
## $ `1999` <dbl> NA, 78.69106128, 7.00074834, 0.03706546, 39.64700699,…
## $ `2000` <dbl> 75.31875484, 80.73614121, 9.72850267, 0.02122214, 39.…
## $ `2001` <dbl> 76.65447960, 81.58623137, 7.51233649, 0.02455567, 39.…
## $ `2002` <dbl> 77.85548458, 81.54022203, 7.39429040, 0.02680793, 39.…
## $ `2003` <dbl> 79.04554536, 82.50815948, 8.71632561, 0.03261723, 39.…
## $ `2004` <dbl> 80.03416635, 82.50368475, 10.19516060, 0.02505279, 40…
## $ `2005` <dbl> 81.02576797, 83.21298163, 12.94744287, 0.02041118, 40…
## $ `2006` <dbl> 81.96581490, 85.45900327, 12.83170086, 0.02157661, 40…
## $ `2007` <dbl> 82.78718917, 83.76443999, 11.44096571, 0.01803353, 40…
## $ `2008` <dbl> 83.47573318, 83.38698965, 13.30778857, 0.02509408, 40…
## $ `2009` <dbl> 84.15742348, 84.31348380, 8.26486020, 0.02885193, 40.…
## $ `2010` <dbl> 84.68334427, 87.11486281, 9.45136425, 0.03003109, 40.…
## $ `2011` <dbl> 85.1927423, 87.3326612, 13.4275913, 0.0309041, 40.172…
## $ `2012` <dbl> 85.64421787, 87.03958839, 12.85922547, 0.03226624, 36…
## $ `2013` <dbl> 85.93256689, 88.99261981, 11.63839653, 0.06262475, 36…
## $ `2014` <dbl> 86.23238390, 88.01535618, 10.29789118, 0.08494782, 36…
## $ `2015` <dbl> 86.47859724, 88.68188567, 6.23770324, 0.09978365, 36.…
## $ `2016` <dbl> 86.72268517, 89.19506235, 5.20389537, 0.09505614, 36.…
## $ `2017` <dbl> 86.93793290, 90.32465949, 6.48008284, 0.09549819, 36.…
## $ `2018` <dbl> 87.04077373, 88.91074940, 8.47907008, 0.05105818, 36.…
## $ `2019` <dbl> 87.23553889, 89.99994555, 7.43714482, 0.06219453, 36.…
## $ `2020` <dbl> 87.30706752, 90.27773508, 4.37637424, 0.07908057, 36.…
## $ `2021` <dbl> NA, NA, NA, NA, NA, 5.237110, NA, NA, NA, NA, NA, NA,…
## $ `2050` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country_group <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ group_type <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
The merged dataset has 70 columns and 169,175 rows.
The columns correspond to the 67 columns in the WB ESG dataset plus the 3 columns from the IMF WB Country Groupings dataset.
The total number of rows has increased substantially because each row now represents a unique combination of country name, indicator, country group and group type. In other words, the 67 rows for each country in the WB ESG dataset was duplicated by the number of country groups that the country falls into in the IMF WB Country Groupings dataset. As an example, Australia now has 603 rows in the merged dataset because each of its 67 rows in the WB ESG dataset was duplicated 9 times (67 * 9 = 603) to create separate rows for each of the 9 country groupings that Australia falls under in the IMF WB Country Groupings dataset.
In addition, 3082 of the rows in the merged dataset are blank for the new columns 68-70 because they correspond to one of 46 “country names” in the WB ESG dataset that are actually country groupings (e.g. “Arab World”). These observations are marked as “NA” in these columns because they did not match with any observations in the iso3c column in the IMF WB Country Groupings dataset.
Finally, the blank observations in the WB ESG dataset have now been populated with “NA” in the merged dataset.
write.csv(WB_sovereign_ESG_country_groups,"C:/Users/Gen Shiraishi/Desktop/Sustainable Finance - Application and Methods/Final project/03_data_processed\\WB_sovereign_ESG_country_groups.csv", row.names = FALSE)
Next steps: Update the dataset so that each country only appears once per indicator Show the country comparisons for my 10 selected indicators Show the country comparisons by peer group Build peer groups - high income, upper middle income, lower middle income, low income Build some kind of percentile rating system Identify “top performers” and “most investment needed” countries by indicator category and peer group