Working With Sovereign Carbon Emissions

Author

Teal Emery

Introduction

This week we will introduce new datasets that will be useful for analyzing sovereign carbon accounting proposals. You’ll see why we spent so much time creating tidy datasets with standardized country names. Because now you can stick them together like Legos however you would like in order to come up with unique insights.

First, let’s load our packages.

library(tidyverse) # because, always
library(countrycode) # because, almost always
library(ggrepel) # great for making charts with labels that don't overlap

options(scipen=10) # forces regular notation vs scientific notation (ie5)

Reading in our data

We’re going to be using some data from the class GitHub Data repository.

Like all good programmers, we look for opportunities to do a little work that lets us be lazy. Here, all of the files are in the same folder on GitHub. So they all have the same URL except for the filename at the end: https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/emissions_dataset_full.csv

So let’s take a second to make a simple function to read in a dataset from that folder:

get_class_github_dataset <- function(file_name) {
  full_url <- paste0("https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/",file_name)
  full_url |> 
    read_csv()
}

Let’s test it out on the emissions dataset

emissions_dataset <- get_class_github_dataset("emissions_dataset_full.csv")

Rows: 6702 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): iso3c
dbl (30): year, gdp_usd_current_prices, gdp_ppp_current_prices, gdp_pc_usd_c...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

emissions_dataset

# A tibble: 6,702 × 31
   iso3c  year gdp_usd…¹ gdp_p…² gdp_p…³ gdp_p…⁴ popul…⁵ govt_…⁶ debt_…⁷ terri…⁸
   <chr> <dbl>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 AFG    1990        NA      NA      NA      NA      NA      NA      NA    2.60
 2 AFG    1991        NA      NA      NA      NA      NA      NA      NA    2.43
 3 AFG    1992        NA      NA      NA      NA      NA      NA      NA    1.38
 4 AFG    1993        NA      NA      NA      NA      NA      NA      NA    1.33
 5 AFG    1994        NA      NA      NA      NA      NA      NA      NA    1.28
 6 AFG    1995        NA      NA      NA      NA      NA      NA      NA    1.23
 7 AFG    1996        NA      NA      NA      NA      NA      NA      NA    1.16
 8 AFG    1997        NA      NA      NA      NA      NA      NA      NA    1.08
 9 AFG    1998        NA      NA      NA      NA      NA      NA      NA    1.03
10 AFG    1999        NA      NA      NA      NA      NA      NA      NA    0.81
# … with 6,692 more rows, 21 more variables: trade_co2 <dbl>,
#   consumption_co2 <dbl>, cumulative_co2 <dbl>, debt_usd <dbl>,
#   govt_expenditure_usd <dbl>, territorial_co2_per_capita <dbl>,
#   trade_co2_per_capita <dbl>, consumption_co2_per_capita <dbl>,
#   territorial_co2_per_gdp <dbl>, trade_co2_per_gdp <dbl>,
#   consumption_co2_per_gdp <dbl>, territorial_co2_per_debt <dbl>,
#   trade_co2_per_debt <dbl>, consumption_co2_per_debt <dbl>, …

This tibble contains the country weights for key sovereign debt indices.

bond_indices <- get_class_github_dataset("country_list_combined_bond_indices.csv")

Rows: 158 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_name, iso3c, index, em_dm
dbl (1): value_pct

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

bond_indices

# A tibble: 158 × 5
   country_name iso3c value_pct index             em_dm            
   <chr>        <chr>     <dbl> <chr>             <chr>            
 1 Australia    AUS      0.0469 Developed Markets Developed Markets
 2 Austria      AUT      0.0457 Developed Markets Developed Markets
 3 Belgium      BEL      0.0461 Developed Markets Developed Markets
 4 Canada       CAN      0.0480 Developed Markets Developed Markets
 5 Denmark      DNK      0.0471 Developed Markets Developed Markets
 6 Finland      FIN      0.0462 Developed Markets Developed Markets
 7 France       FRA      0.0816 Developed Markets Developed Markets
 8 Germany      DEU      0.0596 Developed Markets Developed Markets
 9 Ireland      IRL      0.0463 Developed Markets Developed Markets
10 Israel       ISR      0.0468 Developed Markets Developed Markets
# … with 148 more rows

This data gives the local & hard currency debt outstanding provided by the Bank for International Settlements (BIS).

bis_gov_securities <- get_class_github_dataset("country_list_bis_debt_securities.csv")

Rows: 55 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country_name, iso3c, em_dm
dbl (2): total_debt_securities, pct_of_total

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

bis_gov_securities

# A tibble: 55 × 5
   country_name   iso3c total_debt_securities pct_of_total em_dm            
   <chr>          <chr>                 <dbl>        <dbl> <chr>            
 1 United States  USA                  19115.       0.327  Developed Markets
 2 Japan          JPN                   9088.       0.155  Developed Markets
 3 China          CHN                   7056.       0.121  Emerging Markets 
 4 United Kingdom GBR                   2572.       0.0440 Developed Markets
 5 France         FRA                   2520.       0.0431 Developed Markets
 6 Italy          ITA                   2486.       0.0425 Developed Markets
 7 Germany        DEU                   2068.       0.0354 Developed Markets
 8 India          IND                   1450.       0.0248 Emerging Markets 
 9 Canada         CAN                   1411.       0.0241 Developed Markets
10 Spain          ESP                   1335.       0.0228 Developed Markets
# … with 45 more rows

This data gives you all of the country groupings from the World Bank and IMF.

imf_wb_country_groups <- get_class_github_dataset("imf_wb_country_groups.csv")

Rows: 2587 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country_name, country_group, group_type

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

imf_wb_country_groups

# A tibble: 2,587 × 3
   country_name country_group      group_type
   <chr>        <chr>              <chr>     
 1 Australia    Advanced Economies IMF       
 2 Austria      Advanced Economies IMF       
 3 Belgium      Advanced Economies IMF       
 4 Canada       Advanced Economies IMF       
 5 Switzerland  Advanced Economies IMF       
 6 Cyprus       Advanced Economies IMF       
 7 Czechia      Advanced Economies IMF       
 8 Germany      Advanced Economies IMF       
 9 Denmark      Advanced Economies IMF       
10 Spain        Advanced Economies IMF       
# … with 2,577 more rows

country_features <- get_class_github_dataset("country_features_2022-10.csv")

Rows: 217 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_name, iso3c, wb_income_group, wb_region
dbl (4): debt_gross_percent_of_gdp, nominal_gdp_bn_ppp, nominal_gdp_per_capi...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

country_features

# A tibble: 217 × 8
   country_name         iso3c wb_incom…¹ wb_re…² debt_…³ nomin…⁴ nomin…⁵ popul…⁶
   <chr>                <chr> <chr>      <chr>     <dbl>   <dbl>   <dbl>   <dbl>
 1 Aruba                ABW   High       Latin …   95.0     5.07  46309.   0.109
 2 Afghanistan          AFG   Low        South …    7.40   80.9    2456.  32.9  
 3 Angola               AGO   Lower Mid… Sub-Sa…   56.6   245.     7455.  32.9  
 4 Albania              ALB   Upper Mid… Europe…   70.3    51.2   17858.   2.87 
 5 Andorra              AND   High       Europe…   43.0     5.30  65372.   0.081
 6 United Arab Emirates ARE   High       Middle…   30.7   815.    77272.  10.5  
 7 Argentina            ARG   Upper Mid… Latin …   76.0  1207.    26074.  46.3  
 8 Armenia              ARM   Upper Mid… Europe…   52.3    49.8   16798    2.96 
 9 American Samoa       ASM   Upper Mid… East A…   NA      NA        NA   NA    
10 Antigua & Barbuda    ATG   High       Latin …   91.2     2.22  22070.   0.101
# … with 207 more rows, and abbreviated variable names ¹wb_income_group,
#   ²wb_region, ³debt_gross_percent_of_gdp, ⁴nominal_gdp_bn_ppp,
#   ⁵nominal_gdp_per_capita_ppp, ⁶population_mn

Taking a Look at Our Datasets

Emissions Datasets

The main emissions dataset is created from 2 other datasets.

Carbon data comes from OWID, and the population and macroeconomic data come from the IMF. Many of the variables you see below are “feature engineered” from the base data using the dply::mutate() function as we’ve done elsewhere. For example, consumption_co2_per_capita is caluclated by dividing the consumption carbon emissions figure from OWID by the population data provided by the IMF.

I’m a big fan of doing ones own calculation of figures like this from base data that you trust and know the source of. If you see differences in a per capita or per GDP figure, you can get a transparent sense of whether that difference is because of the numerator or the denominator.

Here are the numerators:

Territorial Emissions: the Co2 emissions created in a country’s physical territory.
Trade Emissions: Emissions imported or exported in trade with other countries.
Consumption Emissions: Emissions resulting from territorial emissions + Net trade emissions.
Cumulative Emissions: Cumulative territorial emissions.

Here are the denominators:

Population
GDP
Debt
Government Expenditure

Current GFANZ guidance for “Net Zero” investing in sovereign debt suggests using territorial emissions per capita.

This version of the emissions dataset has only the countries that are in the bond indices we’re using. We also have the other countries in emissions_dataset_full should you want to use them.

emissions_dataset |> glimpse()

Rows: 6,702
Columns: 31
$ iso3c                                <chr> "AFG", "AFG", "AFG", "AFG", "AFG"…
$ year                                 <dbl> 1990, 1991, 1992, 1993, 1994, 199…
$ gdp_usd_current_prices               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_ppp_current_prices               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_pc_usd_current_prices            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_pc_ppp_current_prices            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ population                           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_pct_gdp             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ debt_pct_gdp                         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2                      <dbl> 2.603, 2.427, 1.379, 1.333, 1.282…
$ trade_co2                            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cumulative_co2                       <dbl> 59.182, 61.610, 62.989, 64.322, 6…
$ debt_usd                             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_usd                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_capita           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_capita                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_capita           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_gdp              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_gdp                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_gdp              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_debt             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_debt                   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_debt             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ debt_usd_per_capita                  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_govt_expenditure <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_govt_expenditure       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_govt_expenditure     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_per_capita          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_pct_of_consumption_co2         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cumulative_co2_per_capita            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…

Bond Indices

This data comes from BlackRock government bond ETFs. It’s useful because it can help us clarify how different decisions would affect asset allocation decisions between asset classes.

bond_indices |> glimpse()

Rows: 158
Columns: 5
$ country_name <chr> "Australia", "Austria", "Belgium", "Canada", "Denmark", "…
$ iso3c        <chr> "AUS", "AUT", "BEL", "CAN", "DNK", "FIN", "FRA", "DEU", "…
$ value_pct    <dbl> 0.04692378933177, 0.04572291615032, 0.04605351394130, 0.0…
$ index        <chr> "Developed Markets", "Developed Markets", "Developed Mark…
$ em_dm        <chr> "Developed Markets", "Developed Markets", "Developed Mark…

One challenge with using real-world bond indices is that they generally don’t mix emerging markets and developed markets. And because part of what we’re looking at is the relative impact of different carbon intensity measures on EM and DM, it would be nice to have an index that covers both. Also, there are some countries, like China & India, for example, that are very large economies but have relatively low weights in government bond indices.

So we’re making a synthetic bond benchmark using the Bank for International Settlements (BIS) data on outstanding sovereign bonds. The logic is simple. Most investment benchmarks are based on market capitalization weighting (sometimes modified slightly to ensure diversification). We create an integrated EM/DM index using the total debt securities outstanding to determine the weights.

If the data you would like does not exist, figure out the best proxy. Keep it transparent and make sure you can justify to others why the assumptions are reasonable and the process adds value to the analysis.

bis_gov_securities |> glimpse()

Rows: 55
Columns: 5
$ country_name          <chr> "United States", "Japan", "China", "United Kingd…
$ iso3c                 <chr> "USA", "JPN", "CHN", "GBR", "FRA", "ITA", "DEU",…
$ total_debt_securities <dbl> 19115.4810, 9088.4047, 7056.0582, 2571.9160, 251…
$ pct_of_total          <dbl> 0.326874485, 0.155411606, 0.120658507, 0.0439797…
$ em_dm                 <chr> "Developed Markets", "Developed Markets", "Emerg…

Helpful Features

This dataset has all of the IMF & World Bank country groupings

imf_wb_country_groups |> glimpse()

Rows: 2,587
Columns: 3
$ country_name  <chr> "Australia", "Austria", "Belgium", "Canada", "Switzerlan…
$ country_group <chr> "Advanced Economies", "Advanced Economies", "Advanced Ec…
$ group_type    <chr> "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", …

imf_wb_country_groups |>
  group_by(country_group, group_type) |>
  count()

# A tibble: 67 × 3
# Groups:   country_group, group_type [67]
   country_group                    group_type     n
   <chr>                            <chr>      <int>
 1 Advanced Economies               IMF           39
 2 Advanced G20                     IMF            9
 3 Africa Eastern and Southern      World Bank    26
 4 Africa Western and Central       World Bank    22
 5 Arab World                       World Bank    22
 6 Caribbean small states           World Bank    13
 7 Central Europe and the Baltics   World Bank    11
 8 Early-demographic dividend       World Bank    62
 9 East Asia & Pacific              World Bank    38
10 East Asia & Pacific (IDA & IBRD) World Bank    23
# … with 57 more rows

And this is the country features dataset we looked at last week.

country_features |> glimpse()

Rows: 217
Columns: 8
$ country_name               <chr> "Aruba", "Afghanistan", "Angola", "Albania"…
$ iso3c                      <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARE", "…
$ wb_income_group            <chr> "High", "Low", "Lower Middle", "Upper Middl…
$ wb_region                  <chr> "Latin America & Caribbean", "South Asia", …
$ debt_gross_percent_of_gdp  <dbl> 95.042, 7.397, 56.564, 70.278, 43.050, 30.7…
$ nominal_gdp_bn_ppp         <dbl> 5.066, 80.912, 245.442, 51.189, 5.301, 814.…
$ nominal_gdp_per_capita_ppp <dbl> 46308.66, 2456.29, 7455.47, 17858.42, 65371…
$ population_mn              <dbl> 0.109, 32.941, 32.921, 2.866, 0.081, 10.544…

The Latest Data

The latest data is from 2019. It’s very common for international environmental data to have significant time lags. Let’s create a dataset that just focus on the latest data by filtering for year==2019. We’re also going to select just the data we want to look at here, and pivot it longer into tidy format.

emissions_dataset_2019 <- emissions_dataset |>
  filter(year == 2019) |>
  select(-year) |>
  pivot_longer(gdp_usd_current_prices:cumulative_co2_per_capita)
  

emissions_dataset_2019

# A tibble: 6,496 × 3
   iso3c name                        value
   <chr> <chr>                       <dbl>
 1 AFG   gdp_usd_current_prices      18.9 
 2 AFG   gdp_ppp_current_prices      81.9 
 3 AFG   gdp_pc_usd_current_prices  586.  
 4 AFG   gdp_pc_ppp_current_prices 2543.  
 5 AFG   population                  32.2 
 6 AFG   govt_expenditure_pct_gdp    28.0 
 7 AFG   debt_pct_gdp                 6.13
 8 AFG   territorial_co2             12.1 
 9 AFG   trade_co2                   NA   
10 AFG   consumption_co2             NA   
# … with 6,486 more rows

We use our friend left_join() to connect these datasets like legos.

bond_indices_w_data <- bond_indices |> 
  left_join(emissions_dataset_2019, by = c("iso3c")) |>
  rename(index_pct = value_pct, variable = name)

Warning in left_join(bond_indices, emissions_dataset_2019, by = c("iso3c")): Each row in `x` is expected to match at most 1 row in `y`.
ℹ Row 1 of `x` matches multiple rows.
ℹ If multiple matches are expected, set `multiple = "all"` to silence this
  warning.

bond_indices_w_data

# A tibble: 4,582 × 7
   country_name iso3c index_pct index             em_dm          varia…¹   value
   <chr>        <chr>     <dbl> <chr>             <chr>          <chr>     <dbl>
 1 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_us…  1392. 
 2 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pp…  1346. 
 3 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pc… 54477. 
 4 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pc… 52675. 
 5 Australia    AUS      0.0469 Developed Markets Developed Mar… popula…    25.6
 6 Australia    AUS      0.0469 Developed Markets Developed Mar… govt_e…    38.9
 7 Australia    AUS      0.0469 Developed Markets Developed Mar… debt_p…    46.6
 8 Australia    AUS      0.0469 Developed Markets Developed Mar… territ…   415. 
 9 Australia    AUS      0.0469 Developed Markets Developed Mar… trade_…   -39.1
10 Australia    AUS      0.0469 Developed Markets Developed Mar… consum…   375. 
# … with 4,572 more rows, and abbreviated variable name ¹variable

Calculated Benchmark Weighted Statistics

At the highest level, we’re trying to find the forest for the trees. Each investment benchmark has a lot of countries. How can we describe the characteristics of the entire benchmark? Most commonly, we do this using weighted averages. For an investment benchmark we use the sum of weight % multiplied by the figure in question (population, GDP, emissions, etc.).

First: Check The Data

bond_indices_w_data |>
  # filter for NA values)
  filter(is.na(value)) |>
  # filter for when the variable includes "trade" or "consumption"
  filter(str_detect(variable,pattern ="trade|consumption"))

# A tibble: 121 × 7
   country_name iso3c index_pct index               em_dm          varia…¹ value
   <chr>        <chr>     <dbl> <chr>               <chr>          <chr>   <dbl>
 1 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… trade_…    NA
 2 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… consum…    NA
 3 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… trade_…    NA
 4 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… consum…    NA
 5 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… trade_…    NA
 6 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… consum…    NA
 7 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… trade_…    NA
 8 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… consum…    NA
 9 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… trade_…    NA
10 Serbia       SRB      0.0451 EM - Local Currency Emerging Mark… consum…    NA
# … with 111 more rows, and abbreviated variable name ¹variable

missing_trade_data <- bond_indices_w_data |>
  filter(is.na(value)) |>
  filter(str_detect(variable,pattern ="trade|consumption")) |>
  select(country_name, index_pct, index, em_dm) |>
  unique() 

missing_trade_data

# A tibble: 11 × 4
   country_name index_pct index               em_dm           
   <chr>            <dbl> <chr>               <chr>           
 1 Serbia         0.0451  EM - Local Currency Emerging Markets
 2 Angola         0.0119  EM - Hard Currency  Emerging Markets
 3 Gabon          0.00146 EM - Hard Currency  Emerging Markets
 4 Iraq           0.00410 EM - Hard Currency  Emerging Markets
 5 Lebanon        0.00138 EM - Hard Currency  Emerging Markets
 6 Serbia         0.00157 EM - Hard Currency  Emerging Markets
 7 Angola         0.0141  EM - USD (sov only) Emerging Markets
 8 Gabon          0.00172 EM - USD (sov only) Emerging Markets
 9 Iraq           0.00485 EM - USD (sov only) Emerging Markets
10 Lebanon        0.00163 EM - USD (sov only) Emerging Markets
11 Serbia         0.00186 EM - USD (sov only) Emerging Markets

How much coverage do we have?

missing_trade_data |>
  group_by(index) |>
  summarize(pct_of_index_missing = sum(index_pct) * 100,
            pct_of_index_complete = 100 - pct_of_index_missing)

# A tibble: 3 × 3
  index               pct_of_index_missing pct_of_index_complete
  <chr>                              <dbl>                 <dbl>
1 EM - Hard Currency                  2.05                  98.0
2 EM - Local Currency                 4.51                  95.5
3 EM - USD (sov only)                 2.42                  97.6

We’ve got 95%+ coverage. This is pretty good! We can work with that.

What other data is missing?

bond_indices_w_data |>
  filter(is.na(value)) |>
  # use the ! (not) sign to search for anythinge except what we searched for above
  filter(!str_detect(variable,pattern ="trade|consumption")) |>
  select(country_name, index_pct, index, em_dm) |>
  unique()

# A tibble: 0 × 4
# … with 4 variables: country_name <chr>, index_pct <dbl>, index <chr>,
#   em_dm <chr>

And it looks like nothing else is missing.

Reweighting the indices

Do the current index weighst sum to 1? yep.

bond_indices_w_data |>
  select(country_name, index_pct, index) |>
  unique() |>
  group_by(index) |>
  summarize(index_total_weight = sum(index_pct))

# A tibble: 4 × 2
  index               index_total_weight
  <chr>                            <dbl>
1 Developed Markets                    1
2 EM - Hard Currency                   1
3 EM - Local Currency                  1
4 EM - USD (sov only)                  1

Take out the missing data, and re-calculate the weights

bond_indices_w_data_reweighted <- bond_indices_w_data |>
  filter(!is.na(value)) |>
  group_by(em_dm, index, variable) |>
  mutate(index_pct = index_pct/sum(index_pct)) |>
  ungroup()

bond_indices_w_data_reweighted

# A tibble: 4,461 × 7
   country_name iso3c index_pct index             em_dm          varia…¹   value
   <chr>        <chr>     <dbl> <chr>             <chr>          <chr>     <dbl>
 1 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_us…  1392. 
 2 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pp…  1346. 
 3 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pc… 54477. 
 4 Australia    AUS      0.0469 Developed Markets Developed Mar… gdp_pc… 52675. 
 5 Australia    AUS      0.0469 Developed Markets Developed Mar… popula…    25.6
 6 Australia    AUS      0.0469 Developed Markets Developed Mar… govt_e…    38.9
 7 Australia    AUS      0.0469 Developed Markets Developed Mar… debt_p…    46.6
 8 Australia    AUS      0.0469 Developed Markets Developed Mar… territ…   415. 
 9 Australia    AUS      0.0469 Developed Markets Developed Mar… trade_…   -39.1
10 Australia    AUS      0.0469 Developed Markets Developed Mar… consum…   375. 
# … with 4,451 more rows, and abbreviated variable name ¹variable

Calculated Benchmark Weighted Statistics

Now we can calculate the benchmark weighted for everything in just a few lines.

bond_indices_weighted_avg <- bond_indices_w_data_reweighted |>
  mutate(contribution = index_pct * value) |>
  group_by(index, variable) |>
  summarize(weighted_avg = sum(contribution)) |>
  ungroup() |>
  arrange(variable, weighted_avg)

`summarise()` has grouped output by 'index'. You can override using the
`.groups` argument.

bond_indices_weighted_avg

# A tibble: 116 × 3
   index               variable                         weighted_avg
   <chr>               <chr>                                   <dbl>
 1 Developed Markets   consumption_co2                     374.     
 2 EM - USD (sov only) consumption_co2                     401.     
 3 EM - Hard Currency  consumption_co2                     636.     
 4 EM - Local Currency consumption_co2                    1470.     
 5 Developed Markets   consumption_co2_govt_expenditure      0.00562
 6 EM - Local Currency consumption_co2_govt_expenditure      0.0172 
 7 EM - USD (sov only) consumption_co2_govt_expenditure      0.0186 
 8 EM - Hard Currency  consumption_co2_govt_expenditure      0.0193 
 9 EM - Local Currency consumption_co2_per_capita            4.92   
10 EM - USD (sov only) consumption_co2_per_capita            7.13   
# … with 106 more rows

bond_indices_weighted_avg |>
  summarize(n_index = n_distinct(index),
            n_variable = n_distinct(variable)) |>
  mutate(total_n = n_index * n_variable)

# A tibble: 1 × 3
  n_index n_variable total_n
    <int>      <int>   <int>
1       4         29     116

Let’s just stop to think about this for a minute. Tidy datasets like the one above are not intuitive to work with at first. But there are some great reasons why we learn to love them. And this is one of them. We just calculated 108 different weighted averages in just a few lines of code. Doing this in Excel would be unfun.

Comparing Numerators

We’ll use a workflow similar to what we’ve used before. We’ll pivot tidy data into wide data so that we can compare two groups.

numerator_compare <- bond_indices_weighted_avg |>
  filter(variable == "consumption_co2_per_capita"|
           variable == "territorial_co2_per_capita") |>
  pivot_wider(names_from = variable, values_from = weighted_avg) |>
  mutate(pct_delta_from_consumption = (territorial_co2_per_capita/consumption_co2_per_capita - 1) * 100)

numerator_compare

# A tibble: 4 × 4
  index               consumption_co2_per_capita territorial_co2_per_c…¹ pct_d…²
  <chr>                                    <dbl>                   <dbl>   <dbl>
1 EM - Local Currency                       4.92                    5.12    4.18
2 EM - USD (sov only)                       7.13                    7.69    7.82
3 EM - Hard Currency                        7.32                    7.92    8.14
4 Developed Markets                         9.93                    7.81  -21.3 
# … with abbreviated variable names ¹territorial_co2_per_capita,
#   ²pct_delta_from_consumption

numerator_compare |>
  ggplot(aes(x = territorial_co2_per_capita, y = consumption_co2_per_capita, size = abs(pct_delta_from_consumption), label = index, alpha = abs(pct_delta_from_consumption))) +
  geom_point() +
  geom_text_repel() +
  labs(size = "",
       alpha = "") +
  geom_abline(intercept = 0, lty = "dashed", alpha = .6) +
  scale_x_continuous(limits = c(0,10)) +
  scale_y_continuous(limits = c(0,10))

We’ve made the size aesthetic map to the absolute value of the percentage difference. What we can see is that

developed markets are much more carbon intensive from a consumption perspective than a territorial perspective
EM hard currency debt is the opposite.
EM local currency debt is about the same in both.

This kind of chart brings up a lot of questions that we can explore in the data later.

Squeezing Out More Insights

What percent of consumption emissions come from net imports?

Well, for Developed Markets its pretty high. This is relevant because the current policy proposals suggest using territorial emissions. We can see that the main policy implications of that decision is to make Developed Markets look a lot cleaner.

bond_indices_weighted_avg |>
  filter(variable == "trade_pct_of_consumption_co2") |>
  mutate(weighted_avg = weighted_avg * 100)

# A tibble: 4 × 3
  index               variable                     weighted_avg
  <chr>               <chr>                               <dbl>
1 EM - Hard Currency  trade_pct_of_consumption_co2       0.0733
2 EM - USD (sov only) trade_pct_of_consumption_co2       0.986 
3 EM - Local Currency trade_pct_of_consumption_co2       1.23  
4 Developed Markets   trade_pct_of_consumption_co2      20.3

The DM benchmark is 2-2.5x richer than the EM benchmarks

bond_indices_weighted_avg |>
  filter(variable == "gdp_pc_ppp_current_prices")

# A tibble: 4 × 3
  index               variable                  weighted_avg
  <chr>               <chr>                            <dbl>
1 EM - Local Currency gdp_pc_ppp_current_prices       22003.
2 EM - USD (sov only) gdp_pc_ppp_current_prices       26909.
3 EM - Hard Currency  gdp_pc_ppp_current_prices       27326.
4 Developed Markets   gdp_pc_ppp_current_prices       54770.

this matters because the only difference between per capita and per GDP calculations is that the denominator is multiplied by GDP per capita.

emission per capita = emissions/population

emission per GDP = emissions/GDP = emissions/(population * GDP per capita).

Think about what this means. Because it inflates the denominator by GDP per capita, it’s basically lowering the resulting emission intensity by a country’s level of income. Morally questionable? Hmmm.

BIS Securities

Now you’re going to play around with the BIS data we showed earlier.

bis_gov_securities |> glimpse()

Rows: 55
Columns: 5
$ country_name          <chr> "United States", "Japan", "China", "United Kingd…
$ iso3c                 <chr> "USA", "JPN", "CHN", "GBR", "FRA", "ITA", "DEU",…
$ total_debt_securities <dbl> 19115.4810, 9088.4047, 7056.0582, 2571.9160, 251…
$ pct_of_total          <dbl> 0.326874485, 0.155411606, 0.120658507, 0.0439797…
$ em_dm                 <chr> "Developed Markets", "Developed Markets", "Emerg…

Putting Together The Data

“Developed Markets” and “Emerging Markets” are ultimately marketing constructs to describe investment benchmarks. They are nebulous and somewhat arbitrary.

So let’s also look at the IMF’s classification of Advanced Economies and Emerging Market Economies. It overlaps mostly, but is less arbitrary. And it may be more meaningful to policymakers, who are another audience for this analysis.

imf_ae_eme <- imf_wb_country_groups |>
  filter(country_group %in% c("Advanced Economies", "Emerging Market Economies")) |>
  select(-group_type)

imf_ae_eme

# A tibble: 135 × 2
   country_name country_group     
   <chr>        <chr>             
 1 Australia    Advanced Economies
 2 Austria      Advanced Economies
 3 Belgium      Advanced Economies
 4 Canada       Advanced Economies
 5 Switzerland  Advanced Economies
 6 Cyprus       Advanced Economies
 7 Czechia      Advanced Economies
 8 Germany      Advanced Economies
 9 Denmark      Advanced Economies
10 Spain        Advanced Economies
# … with 125 more rows

We’ll add togeher a variety of datasets here so we have lots of features to play around with

bis_emissions <- bis_gov_securities |>
  left_join(imf_ae_eme) |>
  rename(imf_ae_eme = country_group) |>
  mutate(region = countrycode(iso3c, origin = "iso3c", destination = "un.region.name"),
         sub_region = countrycode(iso3c, origin = "iso3c", destination = "un.regionsub.name")) |>
  left_join(country_features) |>
  left_join(emissions_dataset_2019, by = c("iso3c")) |>
  rename(characteristic = name)

Joining with `by = join_by(country_name)`

Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `region = countrycode(iso3c, origin = "iso3c", destination =
  "un.region.name")`.
Caused by warning in `countrycode_convert()`:
! Some values were not matched unambiguously: TWN
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.

Joining with `by = join_by(country_name, iso3c)`

Warning in left_join(left_join(mutate(rename(left_join(bis_gov_securities, : Each row in `x` is expected to match at most 1 row in `y`.
ℹ Row 1 of `x` matches multiple rows.
ℹ If multiple matches are expected, set `multiple = "all"` to silence this
  warning.

bis_emissions

# A tibble: 1,595 × 16
   country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
   <chr>      <chr>   <dbl>   <dbl> <chr> <chr>   <chr>  <chr>   <chr>   <chr>  
 1 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 2 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 3 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 4 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 5 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 6 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 7 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 8 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 9 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
10 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
# … with 1,585 more rows, 6 more variables: debt_gross_percent_of_gdp <dbl>,
#   nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
#   population_mn <dbl>, characteristic <chr>, value <dbl>, and abbreviated
#   variable names ¹country_name, ²total_debt_securities, ³pct_of_total,
#   ⁴imf_ae_eme, ⁵sub_region, ⁶wb_income_group, ⁷wb_region

We get the following warning: Some values were not matched unambiguously: TWN

Country and territory names and classifications are geopolitical. These datasets often come from the UN System, and China doesn’t want Taiwan included as a country.

So for our analysis, we’ll add the region data manually.

bis_emissions_clean <- bis_emissions |>
  # case_when() is a more flexible if_else().  It's great.  learn it. 
  mutate(region = case_when(iso3c == "TWN" ~ "Asia",
                            TRUE ~ region),
         sub_region = case_when(iso3c == "TWN" ~ "Eastern Asia",
                            TRUE ~ sub_region))

bis_emissions_clean

# A tibble: 1,595 × 16
   country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
   <chr>      <chr>   <dbl>   <dbl> <chr> <chr>   <chr>  <chr>   <chr>   <chr>  
 1 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 2 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 3 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 4 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 5 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 6 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 7 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 8 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 9 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
10 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
# … with 1,585 more rows, 6 more variables: debt_gross_percent_of_gdp <dbl>,
#   nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
#   population_mn <dbl>, characteristic <chr>, value <dbl>, and abbreviated
#   variable names ¹country_name, ²total_debt_securities, ³pct_of_total,
#   ⁴imf_ae_eme, ⁵sub_region, ⁶wb_income_group, ⁷wb_region

Now instead of re-writing code. We can use the function we wrote in a previous exercise. Normally you might put this function in a functions.r file for easy future use. And eventually you can make your own R package.

add_rank_features <- function(data, value_var = value, rank_by_lowest = TRUE, 
                              quantile_n = 5) {
  

  # If rank_by_lowest is set to TRUE, as it is by default, then it will use this first code chunk 
if (rank_by_lowest)
  data |>
    mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
             sd({{ value_var }}),
         "{{value_var}}_rank" := rank({{ value_var }}), 
         "{{value_var}}_percentile" := percent_rank({{ value_var }}),
         "{{value_var}}_quantile_{{quantile_n}}" := ntile({{ value_var }}, 
                                                          {{ quantile_n }}),
         "{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }}))
  
   # If rank_by_lowest is set to FALSE, as it is by default, then it will use this second code chunk
else
  
  data |>
    mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
             sd({{ value_var }}) * -1, # multiply by -1
           # put negative sign
           "{{value_var}}_rank" := rank(-{{ value_var }}), 
           # put negative sign
           "{{value_var}}_percentile" := percent_rank(-{{ value_var }}),
           # put negative sign
           "{{value_var}}_quantile_{{quantile_n}}" := ntile(-{{ value_var }}, 
                                                            {{ quantile_n }}), 
           # no change, not relevant for pct_total
           "{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }})) 
    
}

bis_emissions_w_features <- bis_emissions_clean |>
  group_by(characteristic) |>
  add_rank_features(rank_by_lowest = FALSE) |>
  ungroup()

bis_emissions_w_features

# A tibble: 1,595 × 21
   country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
   <chr>      <chr>   <dbl>   <dbl> <chr> <chr>   <chr>  <chr>   <chr>   <chr>  
 1 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 2 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 3 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 4 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 5 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 6 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 7 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 8 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
 9 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
10 United St… USA    19115.   0.327 Deve… Advanc… Ameri… Northe… High    North …
# … with 1,585 more rows, 11 more variables: debt_gross_percent_of_gdp <dbl>,
#   nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
#   population_mn <dbl>, characteristic <chr>, value <dbl>,
#   value_z_score <dbl>, value_rank <dbl>, value_percentile <dbl>,
#   value_quantile_5 <int>, value_pct_total <dbl>, and abbreviated variable
#   names ¹country_name, ²total_debt_securities, ³pct_of_total, ⁴imf_ae_eme,
#   ⁵sub_region, ⁶wb_income_group, ⁷wb_region

bis_emissions_w_features |> glimpse()

Rows: 1,595
Columns: 21
$ country_name               <chr> "United States", "United States", "United S…
$ iso3c                      <chr> "USA", "USA", "USA", "USA", "USA", "USA", "…
$ total_debt_securities      <dbl> 19115.48, 19115.48, 19115.48, 19115.48, 191…
$ pct_of_total               <dbl> 0.3268745, 0.3268745, 0.3268745, 0.3268745,…
$ em_dm                      <chr> "Developed Markets", "Developed Markets", "…
$ imf_ae_eme                 <chr> "Advanced Economies", "Advanced Economies",…
$ region                     <chr> "Americas", "Americas", "Americas", "Americ…
$ sub_region                 <chr> "Northern America", "Northern America", "No…
$ wb_income_group            <chr> "High", "High", "High", "High", "High", "Hi…
$ wb_region                  <chr> "North America", "North America", "North Am…
$ debt_gross_percent_of_gdp  <dbl> 122.103, 122.103, 122.103, 122.103, 122.103…
$ nominal_gdp_bn_ppp         <dbl> 25035.16, 25035.16, 25035.16, 25035.16, 250…
$ nominal_gdp_per_capita_ppp <dbl> 75179.59, 75179.59, 75179.59, 75179.59, 751…
$ population_mn              <dbl> 333.005, 333.005, 333.005, 333.005, 333.005…
$ characteristic             <chr> "gdp_usd_current_prices", "gdp_ppp_current_…
$ value                      <dbl> 21372.6000000000, 21372.6000000000, 65051.8…
$ value_z_score              <dbl> -5.78563491, -4.45319392, -1.38198117, -1.0…
$ value_rank                 <dbl> 1, 2, 6, 6, 3, 32, 6, 2, 1, 2, 1, 1, 1, 3, …
$ value_percentile           <dbl> 0.00000000, 0.01851852, 0.09259259, 0.09259…
$ value_quantile_5           <int> 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3…
$ value_pct_total            <dbl> 0.2659056825, 0.1819383748, 0.0375239280, 0…

Now it’s your turn

You have a lot of features to play around with.

Get a sense for the data. What is the geographic dispersion of countries? What income groups?
What does the ranking of countries look like? How concentrated is the data.
Come up with interesting ways of illustrating (visually or in a factoid) how the changes in numerators and denominators creates winners and losers.

Sorted bar charts

bond_indices_weighted_avg |>
  filter(variable == "gdp_pc_ppp_current_prices") |>
  ggplot(aes(x = weighted_avg, y = fct_reorder(.f = index, .x = weighted_avg, .desc = TRUE))) +
  geom_bar(stat = "identity")