Overview

Data Cleaning Plan

For the initial draft, I will use 2 datasets: the World Bank Sovereign ESG dataset and the IMF WB Country Groups dataset. First, I will load each dataset and review the data. Next, I will convert the country_name column in the IMF WB Country Groups dataset into iso3c codes Then, I will join the two datasets using the iso3c codes Finally, I will have a combined dataset that includes ESG indicator data by country, with country group tags.

Data Clean-up

Load data

folder_path <- partial(here, "00_data_raw")

folder_path() %>% list.files()
## [1] "imf_wb_country_groups.csv"        "World Bank Sovereign ESG dataset"
WB_sovereign_ESG <- folder_path("World Bank Sovereign ESG dataset/ESGData2.csv") %>%
  read_csv()
## Rows: 16013 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Country Name, iso3c, Indicator Name, Indicator Code
## dbl (63): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
imf_wb_country_groups <- folder_path("imf_wb_country_groups.csv") %>%
  read_csv()
## Rows: 2587 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country_name, country_group, group_type
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Review the data

glimpse(WB_sovereign_ESG)
## Rows: 16,013
## Columns: 67
## $ `Country Name`   <chr> "Arab World", "Arab World", "Arab World", "Arab World…
## $ iso3c            <chr> "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB", "ARB…
## $ `Indicator Name` <chr> "Access to clean fuels and technologies for cooking (…
## $ `Indicator Code` <chr> "EG.CFT.ACCS.ZS", "EG.ELC.ACCS.ZS", "NY.ADJ.DRES.GN.Z…
## $ `1960`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `1961`           <dbl> NA, NA, NA, NA, 30.981194, NA, NA, NA, NA, NA, NA, NA…
## $ `1962`           <dbl> NA, NA, NA, NA, 30.982443, NA, NA, NA, NA, NA, NA, NA…
## $ `1963`           <dbl> NA, NA, NA, NA, 31.006834, NA, NA, NA, NA, NA, NA, NA…
## $ `1964`           <dbl> NA, NA, NA, NA, 31.01778, NA, NA, NA, NA, NA, NA, NA,…
## $ `1965`           <dbl> NA, NA, NA, NA, 31.042245, NA, NA, NA, NA, NA, NA, NA…
## $ `1966`           <dbl> NA, NA, NA, NA, 31.05018, NA, NA, NA, NA, NA, NA, NA,…
## $ `1967`           <dbl> NA, NA, NA, NA, 31.103003, NA, NA, NA, NA, NA, NA, NA…
## $ `1968`           <dbl> NA, NA, NA, NA, 31.133345, 14.660731, NA, NA, NA, NA,…
## $ `1969`           <dbl> NA, NA, NA, NA, 31.190209, 14.295950, NA, NA, NA, NA,…
## $ `1970`           <dbl> NA, NA, 6.8192031, 0.1745661, 31.2542726, 14.6376317,…
## $ `1971`           <dbl> NA, NA, 6.8625877, 0.1392865, 31.3863673, 13.9126712,…
## $ `1972`           <dbl> NA, NA, 8.9932942, 0.1320823, 31.4997277, 13.7760100,…
## $ `1973`           <dbl> NA, NA, 12.2730219, 0.1524257, 31.4965877, 12.3616551…
## $ `1974`           <dbl> NA, NA, 25.24123751, 0.08739206, 31.55058649, 7.77632…
## $ `1975`           <dbl> NA, NA, 17.17663885, 0.09875801, 31.52950124, 7.98857…
## $ `1976`           <dbl> NA, NA, 18.92687785, 0.06634076, 31.59973643, 7.44355…
## $ `1977`           <dbl> NA, NA, 19.2409638, 0.1024133, 31.6219972, 7.1412275,…
## $ `1978`           <dbl> NA, NA, 16.3069297, 0.1071046, 31.6660778, 7.1654123,…
## $ `1979`           <dbl> NA, NA, 32.75748927, 0.07422815, 31.67849388, 5.67074…
## $ `1980`           <dbl> NA, NA, 27.88409161, 0.05775981, 31.75886761, 4.75050…
## $ `1981`           <dbl> NA, NA, 20.08867813, 0.05511109, 31.45424086, 5.07641…
## $ `1982`           <dbl> NA, NA, 10.9940045, 0.1141440, 31.4801559, 5.9800629,…
## $ `1983`           <dbl> NA, NA, 9.65310503, 0.08148845, 31.52860052, 6.738849…
## $ `1984`           <dbl> NA, NA, 9.7577998, 0.0795408, 31.9421227, 7.3157059, …
## $ `1985`           <dbl> NA, NA, 8.32055276, 0.03830571, 32.44202971, 8.068573…
## $ `1986`           <dbl> NA, NA, 5.59763556, 0.08965615, 33.02639234, 9.621148…
## $ `1987`           <dbl> NA, NA, 7.36173263, 0.08506867, 33.58285192, 10.27551…
## $ `1988`           <dbl> NA, NA, 6.5508191, 0.0897014, 34.1868305, 10.5341524,…
## $ `1989`           <dbl> NA, NA, 9.05466681, 0.08732953, 34.69763715, 11.08920…
## $ `1990`           <dbl> NA, NA, 7.98566651, 0.06761466, 35.10930645, 10.38169…
## $ `1991`           <dbl> NA, NA, 8.43729462, 0.07275225, 35.15972003, 12.27688…
## $ `1992`           <dbl> NA, NA, 8.51357510, 0.05785795, 35.32097250, 9.225802…
## $ `1993`           <dbl> NA, NA, 8.14964549, 0.04460087, 36.09585061, 9.263672…
## $ `1994`           <dbl> NA, NA, 7.44516045, 0.04575975, 36.75422959, 10.22704…
## $ `1995`           <dbl> NA, NA, 7.67706491, 0.06196782, 37.38644379, 10.38116…
## $ `1996`           <dbl> NA, 76.61107378, 8.91043066, 0.05868878, 37.98259419,…
## $ `1997`           <dbl> NA, 77.25362114, 7.75296001, 0.05524078, 38.49337516,…
## $ `1998`           <dbl> NA, 78.11157597, 5.40270002, 0.07836069, 39.11826905,…
## $ `1999`           <dbl> NA, 78.69106128, 7.00074834, 0.03706546, 39.64700699,…
## $ `2000`           <dbl> 75.31875484, 80.73614121, 9.72850267, 0.02122214, 39.…
## $ `2001`           <dbl> 76.65447960, 81.58623137, 7.51233649, 0.02455567, 39.…
## $ `2002`           <dbl> 77.85548458, 81.54022203, 7.39429040, 0.02680793, 39.…
## $ `2003`           <dbl> 79.04554536, 82.50815948, 8.71632561, 0.03261723, 39.…
## $ `2004`           <dbl> 80.03416635, 82.50368475, 10.19516060, 0.02505279, 40…
## $ `2005`           <dbl> 81.02576797, 83.21298163, 12.94744287, 0.02041118, 40…
## $ `2006`           <dbl> 81.96581490, 85.45900327, 12.83170086, 0.02157661, 40…
## $ `2007`           <dbl> 82.78718917, 83.76443999, 11.44096571, 0.01803353, 40…
## $ `2008`           <dbl> 83.47573318, 83.38698965, 13.30778857, 0.02509408, 40…
## $ `2009`           <dbl> 84.15742348, 84.31348380, 8.26486020, 0.02885193, 40.…
## $ `2010`           <dbl> 84.68334427, 87.11486281, 9.45136425, 0.03003109, 40.…
## $ `2011`           <dbl> 85.1927423, 87.3326612, 13.4275913, 0.0309041, 40.172…
## $ `2012`           <dbl> 85.64421787, 87.03958839, 12.85922547, 0.03226624, 36…
## $ `2013`           <dbl> 85.93256689, 88.99261981, 11.63839653, 0.06262475, 36…
## $ `2014`           <dbl> 86.23238390, 88.01535618, 10.29789118, 0.08494782, 36…
## $ `2015`           <dbl> 86.47859724, 88.68188567, 6.23770324, 0.09978365, 36.…
## $ `2016`           <dbl> 86.72268517, 89.19506235, 5.20389537, 0.09505614, 36.…
## $ `2017`           <dbl> 86.93793290, 90.32465949, 6.48008284, 0.09549819, 36.…
## $ `2018`           <dbl> 87.04077373, 88.91074940, 8.47907008, 0.05105818, 36.…
## $ `2019`           <dbl> 87.23553889, 89.99994555, 7.43714482, 0.06219453, 36.…
## $ `2020`           <dbl> 87.30706752, 90.27773508, 4.37637424, 0.07908057, 36.…
## $ `2021`           <dbl> NA, NA, NA, NA, NA, 5.237110, NA, NA, NA, NA, NA, NA,…
## $ `2050`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
glimpse(imf_wb_country_groups)
## Rows: 2,587
## Columns: 3
## $ country_name  <chr> "Australia", "Austria", "Belgium", "Canada", "Switzerlan…
## $ country_group <chr> "Advanced Economies", "Advanced Economies", "Advanced Ec…
## $ group_type    <chr> "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", …

Convert country names to iso3c

country_name_regex_to_iso3c <- function(country_name) {
  country_name %>%
    countrycode(origin = "country.name", 
                                     destination = "iso3c",
                                     origin_regex = TRUE)
}

imf_wb_country_groups_iso3c <- imf_wb_country_groups %>%
  mutate(iso3c = country_name_regex_to_iso3c(country_name))
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Channel Islands, Kosovo
imf_wb_country_groups_iso3c
## # A tibble: 2,587 × 4
##    country_name country_group      group_type iso3c
##    <chr>        <chr>              <chr>      <chr>
##  1 Australia    Advanced Economies IMF        AUS  
##  2 Austria      Advanced Economies IMF        AUT  
##  3 Belgium      Advanced Economies IMF        BEL  
##  4 Canada       Advanced Economies IMF        CAN  
##  5 Switzerland  Advanced Economies IMF        CHE  
##  6 Cyprus       Advanced Economies IMF        CYP  
##  7 Czechia      Advanced Economies IMF        CZE  
##  8 Germany      Advanced Economies IMF        DEU  
##  9 Denmark      Advanced Economies IMF        DNK  
## 10 Spain        Advanced Economies IMF        ESP  
## # … with 2,577 more rows

Join the two datasets using the iso3c country codes

WB_sovereign_ESG_country_groups <- imf_wb_country_groups_iso3c %>% left_join(WB_sovereign_ESG, by = "iso3c")

WB_sovereign_ESG_country_groups
## # A tibble: 166,201 × 70
##    country_…¹ count…² group…³ iso3c Count…⁴ Indic…⁵ Indic…⁶ `1960` `1961` `1962`
##    <chr>      <chr>   <chr>   <chr> <chr>   <chr>   <chr>    <dbl>  <dbl>  <dbl>
##  1 Australia  Advanc… IMF     AUS   Austra… Access… EG.CFT…     NA   NA     NA  
##  2 Australia  Advanc… IMF     AUS   Austra… Access… EG.ELC…     NA   NA     NA  
##  3 Australia  Advanc… IMF     AUS   Austra… Adjust… NY.ADJ…     NA   NA     NA  
##  4 Australia  Advanc… IMF     AUS   Austra… Adjust… NY.ADJ…     NA   NA     NA  
##  5 Australia  Advanc… IMF     AUS   Austra… Agricu… AG.LND…     NA   61.8   62.0
##  6 Australia  Advanc… IMF     AUS   Austra… Agricu… NV.AGR…     NA   NA     NA  
##  7 Australia  Advanc… IMF     AUS   Austra… Annual… ER.H2O…     NA   NA     NA  
##  8 Australia  Advanc… IMF     AUS   Austra… Annual… SI.SPR…     NA   NA     NA  
##  9 Australia  Advanc… IMF     AUS   Austra… Cause … SH.DTH…     NA   NA     NA  
## 10 Australia  Advanc… IMF     AUS   Austra… Childr… SL.TLF…     NA   NA     NA  
## # … with 166,191 more rows, 60 more variables: `1963` <dbl>, `1964` <dbl>,
## #   `1965` <dbl>, `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
## #   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
## #   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
## #   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
## #   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
## #   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, …

Write joined dataset to .csv

write.csv(WB_sovereign_ESG_country_groups,"C:/Users/Gen Shiraishi/Desktop/Sustainable Finance - Application and Methods/Final project/03_data_processed\\WB_sovereign_ESG_country_groups.csv", row.names = FALSE)

Note to self: review the joined dataset and confirm that the country-level data of interest from the ESG dataset was kept through the left join function. Also check to see if the joined dataset should be pivoted to facilitate analysis.