library(tidyverse) # because, always
library(countrycode) # because, almost always
library(ggrepel) # great for making charts with labels that don't overlap
options(scipen=10) # forces regular notation vs scientific notation (ie5)Working With Sovereign Carbon Emissions
Introduction
This week we will introduce new datasets that will be useful for analyzing sovereign carbon accounting proposals. You’ll see why we spent so much time creating tidy datasets with standardized country names. Because now you can stick them together like Legos however you would like in order to come up with unique insights.
First, let’s load our packages.
Reading in our data
We’re going to be using some data from the class GitHub Data repository.
Like all good programmers, we look for opportunities to do a little work that lets us be lazy. Here, all of the files are in the same folder on GitHub. So they all have the same URL except for the filename at the end: https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/emissions_dataset_full.csv
So let’s take a second to make a simple function to read in a dataset from that folder:
get_class_github_dataset <- function(file_name) {
full_url <- paste0("https://raw.githubusercontent.com/t-emery/sais-susfin_data/main/datasets/",file_name)
full_url |>
read_csv()
}Let’s test it out on the emissions dataset
emissions_dataset <- get_class_github_dataset("emissions_dataset_full.csv") Rows: 6702 Columns: 31
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): iso3c
dbl (30): year, gdp_usd_current_prices, gdp_ppp_current_prices, gdp_pc_usd_c...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
emissions_dataset# A tibble: 6,702 × 31
iso3c year gdp_usd…¹ gdp_p…² gdp_p…³ gdp_p…⁴ popul…⁵ govt_…⁶ debt_…⁷ terri…⁸
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AFG 1990 NA NA NA NA NA NA NA 2.60
2 AFG 1991 NA NA NA NA NA NA NA 2.43
3 AFG 1992 NA NA NA NA NA NA NA 1.38
4 AFG 1993 NA NA NA NA NA NA NA 1.33
5 AFG 1994 NA NA NA NA NA NA NA 1.28
6 AFG 1995 NA NA NA NA NA NA NA 1.23
7 AFG 1996 NA NA NA NA NA NA NA 1.16
8 AFG 1997 NA NA NA NA NA NA NA 1.08
9 AFG 1998 NA NA NA NA NA NA NA 1.03
10 AFG 1999 NA NA NA NA NA NA NA 0.81
# … with 6,692 more rows, 21 more variables: trade_co2 <dbl>,
# consumption_co2 <dbl>, cumulative_co2 <dbl>, debt_usd <dbl>,
# govt_expenditure_usd <dbl>, territorial_co2_per_capita <dbl>,
# trade_co2_per_capita <dbl>, consumption_co2_per_capita <dbl>,
# territorial_co2_per_gdp <dbl>, trade_co2_per_gdp <dbl>,
# consumption_co2_per_gdp <dbl>, territorial_co2_per_debt <dbl>,
# trade_co2_per_debt <dbl>, consumption_co2_per_debt <dbl>, …
This tibble contains the country weights for key sovereign debt indices.
bond_indices <- get_class_github_dataset("country_list_combined_bond_indices.csv")Rows: 158 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_name, iso3c, index, em_dm
dbl (1): value_pct
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bond_indices # A tibble: 158 × 5
country_name iso3c value_pct index em_dm
<chr> <chr> <dbl> <chr> <chr>
1 Australia AUS 0.0469 Developed Markets Developed Markets
2 Austria AUT 0.0457 Developed Markets Developed Markets
3 Belgium BEL 0.0461 Developed Markets Developed Markets
4 Canada CAN 0.0480 Developed Markets Developed Markets
5 Denmark DNK 0.0471 Developed Markets Developed Markets
6 Finland FIN 0.0462 Developed Markets Developed Markets
7 France FRA 0.0816 Developed Markets Developed Markets
8 Germany DEU 0.0596 Developed Markets Developed Markets
9 Ireland IRL 0.0463 Developed Markets Developed Markets
10 Israel ISR 0.0468 Developed Markets Developed Markets
# … with 148 more rows
This data gives the local & hard currency debt outstanding provided by the Bank for International Settlements (BIS).
bis_gov_securities <- get_class_github_dataset("country_list_bis_debt_securities.csv")Rows: 55 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country_name, iso3c, em_dm
dbl (2): total_debt_securities, pct_of_total
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bis_gov_securities# A tibble: 55 × 5
country_name iso3c total_debt_securities pct_of_total em_dm
<chr> <chr> <dbl> <dbl> <chr>
1 United States USA 19115. 0.327 Developed Markets
2 Japan JPN 9088. 0.155 Developed Markets
3 China CHN 7056. 0.121 Emerging Markets
4 United Kingdom GBR 2572. 0.0440 Developed Markets
5 France FRA 2520. 0.0431 Developed Markets
6 Italy ITA 2486. 0.0425 Developed Markets
7 Germany DEU 2068. 0.0354 Developed Markets
8 India IND 1450. 0.0248 Emerging Markets
9 Canada CAN 1411. 0.0241 Developed Markets
10 Spain ESP 1335. 0.0228 Developed Markets
# … with 45 more rows
This data gives you all of the country groupings from the World Bank and IMF.
imf_wb_country_groups <- get_class_github_dataset("imf_wb_country_groups.csv")Rows: 2587 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country_name, country_group, group_type
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
imf_wb_country_groups# A tibble: 2,587 × 3
country_name country_group group_type
<chr> <chr> <chr>
1 Australia Advanced Economies IMF
2 Austria Advanced Economies IMF
3 Belgium Advanced Economies IMF
4 Canada Advanced Economies IMF
5 Switzerland Advanced Economies IMF
6 Cyprus Advanced Economies IMF
7 Czechia Advanced Economies IMF
8 Germany Advanced Economies IMF
9 Denmark Advanced Economies IMF
10 Spain Advanced Economies IMF
# … with 2,577 more rows
country_features <- get_class_github_dataset("country_features_2022-10.csv")Rows: 217 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_name, iso3c, wb_income_group, wb_region
dbl (4): debt_gross_percent_of_gdp, nominal_gdp_bn_ppp, nominal_gdp_per_capi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
country_features# A tibble: 217 × 8
country_name iso3c wb_incom…¹ wb_re…² debt_…³ nomin…⁴ nomin…⁵ popul…⁶
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aruba ABW High Latin … 95.0 5.07 46309. 0.109
2 Afghanistan AFG Low South … 7.40 80.9 2456. 32.9
3 Angola AGO Lower Mid… Sub-Sa… 56.6 245. 7455. 32.9
4 Albania ALB Upper Mid… Europe… 70.3 51.2 17858. 2.87
5 Andorra AND High Europe… 43.0 5.30 65372. 0.081
6 United Arab Emirates ARE High Middle… 30.7 815. 77272. 10.5
7 Argentina ARG Upper Mid… Latin … 76.0 1207. 26074. 46.3
8 Armenia ARM Upper Mid… Europe… 52.3 49.8 16798 2.96
9 American Samoa ASM Upper Mid… East A… NA NA NA NA
10 Antigua & Barbuda ATG High Latin … 91.2 2.22 22070. 0.101
# … with 207 more rows, and abbreviated variable names ¹wb_income_group,
# ²wb_region, ³debt_gross_percent_of_gdp, ⁴nominal_gdp_bn_ppp,
# ⁵nominal_gdp_per_capita_ppp, ⁶population_mn
Taking a Look at Our Datasets
Emissions Datasets
The main emissions dataset is created from 2 other datasets.
Carbon data comes from OWID, and the population and macroeconomic data come from the IMF. Many of the variables you see below are “feature engineered” from the base data using the dply::mutate() function as we’ve done elsewhere. For example, consumption_co2_per_capita is caluclated by dividing the consumption carbon emissions figure from OWID by the population data provided by the IMF.
I’m a big fan of doing ones own calculation of figures like this from base data that you trust and know the source of. If you see differences in a per capita or per GDP figure, you can get a transparent sense of whether that difference is because of the numerator or the denominator.
Here are the numerators:
- Territorial Emissions: the Co2 emissions created in a country’s physical territory.
- Trade Emissions: Emissions imported or exported in trade with other countries.
- Consumption Emissions: Emissions resulting from territorial emissions + Net trade emissions.
- Cumulative Emissions: Cumulative territorial emissions.
Here are the denominators:
- Population
- GDP
- Debt
- Government Expenditure
Current GFANZ guidance for “Net Zero” investing in sovereign debt suggests using territorial emissions per capita.
This version of the emissions dataset has only the countries that are in the bond indices we’re using. We also have the other countries in emissions_dataset_full should you want to use them.
emissions_dataset |> glimpse()Rows: 6,702
Columns: 31
$ iso3c <chr> "AFG", "AFG", "AFG", "AFG", "AFG"…
$ year <dbl> 1990, 1991, 1992, 1993, 1994, 199…
$ gdp_usd_current_prices <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_ppp_current_prices <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_pc_usd_current_prices <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gdp_pc_ppp_current_prices <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ population <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_pct_gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ debt_pct_gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2 <dbl> 2.603, 2.427, 1.379, 1.333, 1.282…
$ trade_co2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cumulative_co2 <dbl> 59.182, 61.610, 62.989, 64.322, 6…
$ debt_usd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_usd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_debt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_debt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_per_debt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ debt_usd_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ territorial_co2_per_govt_expenditure <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_co2_per_govt_expenditure <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ consumption_co2_govt_expenditure <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ govt_expenditure_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trade_pct_of_consumption_co2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
$ cumulative_co2_per_capita <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
Bond Indices
This data comes from BlackRock government bond ETFs. It’s useful because it can help us clarify how different decisions would affect asset allocation decisions between asset classes.
bond_indices |> glimpse()Rows: 158
Columns: 5
$ country_name <chr> "Australia", "Austria", "Belgium", "Canada", "Denmark", "…
$ iso3c <chr> "AUS", "AUT", "BEL", "CAN", "DNK", "FIN", "FRA", "DEU", "…
$ value_pct <dbl> 0.04692378933177, 0.04572291615032, 0.04605351394130, 0.0…
$ index <chr> "Developed Markets", "Developed Markets", "Developed Mark…
$ em_dm <chr> "Developed Markets", "Developed Markets", "Developed Mark…
One challenge with using real-world bond indices is that they generally don’t mix emerging markets and developed markets. And because part of what we’re looking at is the relative impact of different carbon intensity measures on EM and DM, it would be nice to have an index that covers both. Also, there are some countries, like China & India, for example, that are very large economies but have relatively low weights in government bond indices.
So we’re making a synthetic bond benchmark using the Bank for International Settlements (BIS) data on outstanding sovereign bonds. The logic is simple. Most investment benchmarks are based on market capitalization weighting (sometimes modified slightly to ensure diversification). We create an integrated EM/DM index using the total debt securities outstanding to determine the weights.
If the data you would like does not exist, figure out the best proxy. Keep it transparent and make sure you can justify to others why the assumptions are reasonable and the process adds value to the analysis.
bis_gov_securities |> glimpse()Rows: 55
Columns: 5
$ country_name <chr> "United States", "Japan", "China", "United Kingd…
$ iso3c <chr> "USA", "JPN", "CHN", "GBR", "FRA", "ITA", "DEU",…
$ total_debt_securities <dbl> 19115.4810, 9088.4047, 7056.0582, 2571.9160, 251…
$ pct_of_total <dbl> 0.326874485, 0.155411606, 0.120658507, 0.0439797…
$ em_dm <chr> "Developed Markets", "Developed Markets", "Emerg…
Helpful Features
This dataset has all of the IMF & World Bank country groupings
imf_wb_country_groups |> glimpse()Rows: 2,587
Columns: 3
$ country_name <chr> "Australia", "Austria", "Belgium", "Canada", "Switzerlan…
$ country_group <chr> "Advanced Economies", "Advanced Economies", "Advanced Ec…
$ group_type <chr> "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", "IMF", …
imf_wb_country_groups |>
group_by(country_group, group_type) |>
count()# A tibble: 67 × 3
# Groups: country_group, group_type [67]
country_group group_type n
<chr> <chr> <int>
1 Advanced Economies IMF 39
2 Advanced G20 IMF 9
3 Africa Eastern and Southern World Bank 26
4 Africa Western and Central World Bank 22
5 Arab World World Bank 22
6 Caribbean small states World Bank 13
7 Central Europe and the Baltics World Bank 11
8 Early-demographic dividend World Bank 62
9 East Asia & Pacific World Bank 38
10 East Asia & Pacific (IDA & IBRD) World Bank 23
# … with 57 more rows
And this is the country features dataset we looked at last week.
country_features |> glimpse()Rows: 217
Columns: 8
$ country_name <chr> "Aruba", "Afghanistan", "Angola", "Albania"…
$ iso3c <chr> "ABW", "AFG", "AGO", "ALB", "AND", "ARE", "…
$ wb_income_group <chr> "High", "Low", "Lower Middle", "Upper Middl…
$ wb_region <chr> "Latin America & Caribbean", "South Asia", …
$ debt_gross_percent_of_gdp <dbl> 95.042, 7.397, 56.564, 70.278, 43.050, 30.7…
$ nominal_gdp_bn_ppp <dbl> 5.066, 80.912, 245.442, 51.189, 5.301, 814.…
$ nominal_gdp_per_capita_ppp <dbl> 46308.66, 2456.29, 7455.47, 17858.42, 65371…
$ population_mn <dbl> 0.109, 32.941, 32.921, 2.866, 0.081, 10.544…
The Latest Data
The latest data is from 2019. It’s very common for international environmental data to have significant time lags. Let’s create a dataset that just focus on the latest data by filtering for year==2019. We’re also going to select just the data we want to look at here, and pivot it longer into tidy format.
emissions_dataset_2019 <- emissions_dataset |>
filter(year == 2019) |>
select(-year) |>
pivot_longer(gdp_usd_current_prices:cumulative_co2_per_capita)
emissions_dataset_2019# A tibble: 6,496 × 3
iso3c name value
<chr> <chr> <dbl>
1 AFG gdp_usd_current_prices 18.9
2 AFG gdp_ppp_current_prices 81.9
3 AFG gdp_pc_usd_current_prices 586.
4 AFG gdp_pc_ppp_current_prices 2543.
5 AFG population 32.2
6 AFG govt_expenditure_pct_gdp 28.0
7 AFG debt_pct_gdp 6.13
8 AFG territorial_co2 12.1
9 AFG trade_co2 NA
10 AFG consumption_co2 NA
# … with 6,486 more rows
We use our friend left_join() to connect these datasets like legos.
bond_indices_w_data <- bond_indices |>
left_join(emissions_dataset_2019, by = c("iso3c")) |>
rename(index_pct = value_pct, variable = name)Warning in left_join(bond_indices, emissions_dataset_2019, by = c("iso3c")): Each row in `x` is expected to match at most 1 row in `y`.
ℹ Row 1 of `x` matches multiple rows.
ℹ If multiple matches are expected, set `multiple = "all"` to silence this
warning.
bond_indices_w_data# A tibble: 4,582 × 7
country_name iso3c index_pct index em_dm varia…¹ value
<chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_us… 1392.
2 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pp… 1346.
3 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pc… 54477.
4 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pc… 52675.
5 Australia AUS 0.0469 Developed Markets Developed Mar… popula… 25.6
6 Australia AUS 0.0469 Developed Markets Developed Mar… govt_e… 38.9
7 Australia AUS 0.0469 Developed Markets Developed Mar… debt_p… 46.6
8 Australia AUS 0.0469 Developed Markets Developed Mar… territ… 415.
9 Australia AUS 0.0469 Developed Markets Developed Mar… trade_… -39.1
10 Australia AUS 0.0469 Developed Markets Developed Mar… consum… 375.
# … with 4,572 more rows, and abbreviated variable name ¹variable
Calculated Benchmark Weighted Statistics
At the highest level, we’re trying to find the forest for the trees. Each investment benchmark has a lot of countries. How can we describe the characteristics of the entire benchmark? Most commonly, we do this using weighted averages. For an investment benchmark we use the sum of weight % multiplied by the figure in question (population, GDP, emissions, etc.).
First: Check The Data
bond_indices_w_data |>
# filter for NA values)
filter(is.na(value)) |>
# filter for when the variable includes "trade" or "consumption"
filter(str_detect(variable,pattern ="trade|consumption"))# A tibble: 121 × 7
country_name iso3c index_pct index em_dm varia…¹ value
<chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… trade_… NA
2 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… consum… NA
3 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… trade_… NA
4 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… consum… NA
5 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… trade_… NA
6 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… consum… NA
7 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… trade_… NA
8 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… consum… NA
9 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… trade_… NA
10 Serbia SRB 0.0451 EM - Local Currency Emerging Mark… consum… NA
# … with 111 more rows, and abbreviated variable name ¹variable
missing_trade_data <- bond_indices_w_data |>
filter(is.na(value)) |>
filter(str_detect(variable,pattern ="trade|consumption")) |>
select(country_name, index_pct, index, em_dm) |>
unique()
missing_trade_data# A tibble: 11 × 4
country_name index_pct index em_dm
<chr> <dbl> <chr> <chr>
1 Serbia 0.0451 EM - Local Currency Emerging Markets
2 Angola 0.0119 EM - Hard Currency Emerging Markets
3 Gabon 0.00146 EM - Hard Currency Emerging Markets
4 Iraq 0.00410 EM - Hard Currency Emerging Markets
5 Lebanon 0.00138 EM - Hard Currency Emerging Markets
6 Serbia 0.00157 EM - Hard Currency Emerging Markets
7 Angola 0.0141 EM - USD (sov only) Emerging Markets
8 Gabon 0.00172 EM - USD (sov only) Emerging Markets
9 Iraq 0.00485 EM - USD (sov only) Emerging Markets
10 Lebanon 0.00163 EM - USD (sov only) Emerging Markets
11 Serbia 0.00186 EM - USD (sov only) Emerging Markets
How much coverage do we have?
missing_trade_data |>
group_by(index) |>
summarize(pct_of_index_missing = sum(index_pct) * 100,
pct_of_index_complete = 100 - pct_of_index_missing)# A tibble: 3 × 3
index pct_of_index_missing pct_of_index_complete
<chr> <dbl> <dbl>
1 EM - Hard Currency 2.05 98.0
2 EM - Local Currency 4.51 95.5
3 EM - USD (sov only) 2.42 97.6
We’ve got 95%+ coverage. This is pretty good! We can work with that.
What other data is missing?
bond_indices_w_data |>
filter(is.na(value)) |>
# use the ! (not) sign to search for anythinge except what we searched for above
filter(!str_detect(variable,pattern ="trade|consumption")) |>
select(country_name, index_pct, index, em_dm) |>
unique() # A tibble: 0 × 4
# … with 4 variables: country_name <chr>, index_pct <dbl>, index <chr>,
# em_dm <chr>
And it looks like nothing else is missing.
Reweighting the indices
Do the current index weighst sum to 1? yep.
bond_indices_w_data |>
select(country_name, index_pct, index) |>
unique() |>
group_by(index) |>
summarize(index_total_weight = sum(index_pct))# A tibble: 4 × 2
index index_total_weight
<chr> <dbl>
1 Developed Markets 1
2 EM - Hard Currency 1
3 EM - Local Currency 1
4 EM - USD (sov only) 1
Take out the missing data, and re-calculate the weights
bond_indices_w_data_reweighted <- bond_indices_w_data |>
filter(!is.na(value)) |>
group_by(em_dm, index, variable) |>
mutate(index_pct = index_pct/sum(index_pct)) |>
ungroup()
bond_indices_w_data_reweighted # A tibble: 4,461 × 7
country_name iso3c index_pct index em_dm varia…¹ value
<chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
1 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_us… 1392.
2 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pp… 1346.
3 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pc… 54477.
4 Australia AUS 0.0469 Developed Markets Developed Mar… gdp_pc… 52675.
5 Australia AUS 0.0469 Developed Markets Developed Mar… popula… 25.6
6 Australia AUS 0.0469 Developed Markets Developed Mar… govt_e… 38.9
7 Australia AUS 0.0469 Developed Markets Developed Mar… debt_p… 46.6
8 Australia AUS 0.0469 Developed Markets Developed Mar… territ… 415.
9 Australia AUS 0.0469 Developed Markets Developed Mar… trade_… -39.1
10 Australia AUS 0.0469 Developed Markets Developed Mar… consum… 375.
# … with 4,451 more rows, and abbreviated variable name ¹variable
Calculated Benchmark Weighted Statistics
Now we can calculate the benchmark weighted for everything in just a few lines.
bond_indices_weighted_avg <- bond_indices_w_data_reweighted |>
mutate(contribution = index_pct * value) |>
group_by(index, variable) |>
summarize(weighted_avg = sum(contribution)) |>
ungroup() |>
arrange(variable, weighted_avg)`summarise()` has grouped output by 'index'. You can override using the
`.groups` argument.
bond_indices_weighted_avg # A tibble: 116 × 3
index variable weighted_avg
<chr> <chr> <dbl>
1 Developed Markets consumption_co2 374.
2 EM - USD (sov only) consumption_co2 401.
3 EM - Hard Currency consumption_co2 636.
4 EM - Local Currency consumption_co2 1470.
5 Developed Markets consumption_co2_govt_expenditure 0.00562
6 EM - Local Currency consumption_co2_govt_expenditure 0.0172
7 EM - USD (sov only) consumption_co2_govt_expenditure 0.0186
8 EM - Hard Currency consumption_co2_govt_expenditure 0.0193
9 EM - Local Currency consumption_co2_per_capita 4.92
10 EM - USD (sov only) consumption_co2_per_capita 7.13
# … with 106 more rows
bond_indices_weighted_avg |>
summarize(n_index = n_distinct(index),
n_variable = n_distinct(variable)) |>
mutate(total_n = n_index * n_variable)# A tibble: 1 × 3
n_index n_variable total_n
<int> <int> <int>
1 4 29 116
Let’s just stop to think about this for a minute. Tidy datasets like the one above are not intuitive to work with at first. But there are some great reasons why we learn to love them. And this is one of them. We just calculated 108 different weighted averages in just a few lines of code. Doing this in Excel would be unfun.
Comparing Numerators
We’ll use a workflow similar to what we’ve used before. We’ll pivot tidy data into wide data so that we can compare two groups.
numerator_compare <- bond_indices_weighted_avg |>
filter(variable == "consumption_co2_per_capita"|
variable == "territorial_co2_per_capita") |>
pivot_wider(names_from = variable, values_from = weighted_avg) |>
mutate(pct_delta_from_consumption = (territorial_co2_per_capita/consumption_co2_per_capita - 1) * 100)
numerator_compare# A tibble: 4 × 4
index consumption_co2_per_capita territorial_co2_per_c…¹ pct_d…²
<chr> <dbl> <dbl> <dbl>
1 EM - Local Currency 4.92 5.12 4.18
2 EM - USD (sov only) 7.13 7.69 7.82
3 EM - Hard Currency 7.32 7.92 8.14
4 Developed Markets 9.93 7.81 -21.3
# … with abbreviated variable names ¹territorial_co2_per_capita,
# ²pct_delta_from_consumption
numerator_compare |>
ggplot(aes(x = territorial_co2_per_capita, y = consumption_co2_per_capita, size = abs(pct_delta_from_consumption), label = index, alpha = abs(pct_delta_from_consumption))) +
geom_point() +
geom_text_repel() +
labs(size = "",
alpha = "") +
geom_abline(intercept = 0, lty = "dashed", alpha = .6) +
scale_x_continuous(limits = c(0,10)) +
scale_y_continuous(limits = c(0,10))We’ve made the size aesthetic map to the absolute value of the percentage difference. What we can see is that
- developed markets are much more carbon intensive from a consumption perspective than a territorial perspective
- EM hard currency debt is the opposite.
- EM local currency debt is about the same in both.
This kind of chart brings up a lot of questions that we can explore in the data later.
Squeezing Out More Insights
What percent of consumption emissions come from net imports?
Well, for Developed Markets its pretty high. This is relevant because the current policy proposals suggest using territorial emissions. We can see that the main policy implications of that decision is to make Developed Markets look a lot cleaner.
bond_indices_weighted_avg |>
filter(variable == "trade_pct_of_consumption_co2") |>
mutate(weighted_avg = weighted_avg * 100)# A tibble: 4 × 3
index variable weighted_avg
<chr> <chr> <dbl>
1 EM - Hard Currency trade_pct_of_consumption_co2 0.0733
2 EM - USD (sov only) trade_pct_of_consumption_co2 0.986
3 EM - Local Currency trade_pct_of_consumption_co2 1.23
4 Developed Markets trade_pct_of_consumption_co2 20.3
The DM benchmark is 2-2.5x richer than the EM benchmarks
bond_indices_weighted_avg |>
filter(variable == "gdp_pc_ppp_current_prices")# A tibble: 4 × 3
index variable weighted_avg
<chr> <chr> <dbl>
1 EM - Local Currency gdp_pc_ppp_current_prices 22003.
2 EM - USD (sov only) gdp_pc_ppp_current_prices 26909.
3 EM - Hard Currency gdp_pc_ppp_current_prices 27326.
4 Developed Markets gdp_pc_ppp_current_prices 54770.
this matters because the only difference between per capita and per GDP calculations is that the denominator is multiplied by GDP per capita.
emission per capita = emissions/population
emission per GDP = emissions/GDP = emissions/(population * GDP per capita).
Think about what this means. Because it inflates the denominator by GDP per capita, it’s basically lowering the resulting emission intensity by a country’s level of income. Morally questionable? Hmmm.
BIS Securities
Now you’re going to play around with the BIS data we showed earlier.
bis_gov_securities |> glimpse()Rows: 55
Columns: 5
$ country_name <chr> "United States", "Japan", "China", "United Kingd…
$ iso3c <chr> "USA", "JPN", "CHN", "GBR", "FRA", "ITA", "DEU",…
$ total_debt_securities <dbl> 19115.4810, 9088.4047, 7056.0582, 2571.9160, 251…
$ pct_of_total <dbl> 0.326874485, 0.155411606, 0.120658507, 0.0439797…
$ em_dm <chr> "Developed Markets", "Developed Markets", "Emerg…
Putting Together The Data
“Developed Markets” and “Emerging Markets” are ultimately marketing constructs to describe investment benchmarks. They are nebulous and somewhat arbitrary.
So let’s also look at the IMF’s classification of Advanced Economies and Emerging Market Economies. It overlaps mostly, but is less arbitrary. And it may be more meaningful to policymakers, who are another audience for this analysis.
imf_ae_eme <- imf_wb_country_groups |>
filter(country_group %in% c("Advanced Economies", "Emerging Market Economies")) |>
select(-group_type)
imf_ae_eme# A tibble: 135 × 2
country_name country_group
<chr> <chr>
1 Australia Advanced Economies
2 Austria Advanced Economies
3 Belgium Advanced Economies
4 Canada Advanced Economies
5 Switzerland Advanced Economies
6 Cyprus Advanced Economies
7 Czechia Advanced Economies
8 Germany Advanced Economies
9 Denmark Advanced Economies
10 Spain Advanced Economies
# … with 125 more rows
We’ll add togeher a variety of datasets here so we have lots of features to play around with
bis_emissions <- bis_gov_securities |>
left_join(imf_ae_eme) |>
rename(imf_ae_eme = country_group) |>
mutate(region = countrycode(iso3c, origin = "iso3c", destination = "un.region.name"),
sub_region = countrycode(iso3c, origin = "iso3c", destination = "un.regionsub.name")) |>
left_join(country_features) |>
left_join(emissions_dataset_2019, by = c("iso3c")) |>
rename(characteristic = name)Joining with `by = join_by(country_name)`
Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `region = countrycode(iso3c, origin = "iso3c", destination =
"un.region.name")`.
Caused by warning in `countrycode_convert()`:
! Some values were not matched unambiguously: TWN
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
Joining with `by = join_by(country_name, iso3c)`
Warning in left_join(left_join(mutate(rename(left_join(bis_gov_securities, : Each row in `x` is expected to match at most 1 row in `y`.
ℹ Row 1 of `x` matches multiple rows.
ℹ If multiple matches are expected, set `multiple = "all"` to silence this
warning.
bis_emissions# A tibble: 1,595 × 16
country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
2 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
3 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
4 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
5 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
6 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
7 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
8 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
9 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
10 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
# … with 1,585 more rows, 6 more variables: debt_gross_percent_of_gdp <dbl>,
# nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
# population_mn <dbl>, characteristic <chr>, value <dbl>, and abbreviated
# variable names ¹country_name, ²total_debt_securities, ³pct_of_total,
# ⁴imf_ae_eme, ⁵sub_region, ⁶wb_income_group, ⁷wb_region
We get the following warning: Some values were not matched unambiguously: TWN
Country and territory names and classifications are geopolitical. These datasets often come from the UN System, and China doesn’t want Taiwan included as a country.
So for our analysis, we’ll add the region data manually.
bis_emissions_clean <- bis_emissions |>
# case_when() is a more flexible if_else(). It's great. learn it.
mutate(region = case_when(iso3c == "TWN" ~ "Asia",
TRUE ~ region),
sub_region = case_when(iso3c == "TWN" ~ "Eastern Asia",
TRUE ~ sub_region))
bis_emissions_clean# A tibble: 1,595 × 16
country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
2 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
3 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
4 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
5 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
6 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
7 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
8 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
9 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
10 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
# … with 1,585 more rows, 6 more variables: debt_gross_percent_of_gdp <dbl>,
# nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
# population_mn <dbl>, characteristic <chr>, value <dbl>, and abbreviated
# variable names ¹country_name, ²total_debt_securities, ³pct_of_total,
# ⁴imf_ae_eme, ⁵sub_region, ⁶wb_income_group, ⁷wb_region
Now instead of re-writing code. We can use the function we wrote in a previous exercise. Normally you might put this function in a functions.r file for easy future use. And eventually you can make your own R package.
add_rank_features <- function(data, value_var = value, rank_by_lowest = TRUE,
quantile_n = 5) {
# If rank_by_lowest is set to TRUE, as it is by default, then it will use this first code chunk
if (rank_by_lowest)
data |>
mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
sd({{ value_var }}),
"{{value_var}}_rank" := rank({{ value_var }}),
"{{value_var}}_percentile" := percent_rank({{ value_var }}),
"{{value_var}}_quantile_{{quantile_n}}" := ntile({{ value_var }},
{{ quantile_n }}),
"{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }}))
# If rank_by_lowest is set to FALSE, as it is by default, then it will use this second code chunk
else
data |>
mutate("{{value_var}}_z_score" := ({{ value_var }}-mean({{ value_var }}))/
sd({{ value_var }}) * -1, # multiply by -1
# put negative sign
"{{value_var}}_rank" := rank(-{{ value_var }}),
# put negative sign
"{{value_var}}_percentile" := percent_rank(-{{ value_var }}),
# put negative sign
"{{value_var}}_quantile_{{quantile_n}}" := ntile(-{{ value_var }},
{{ quantile_n }}),
# no change, not relevant for pct_total
"{{value_var}}_pct_total" := {{ value_var }}/sum({{ value_var }}))
}bis_emissions_w_features <- bis_emissions_clean |>
group_by(characteristic) |>
add_rank_features(rank_by_lowest = FALSE) |>
ungroup()
bis_emissions_w_features# A tibble: 1,595 × 21
country_…¹ iso3c total…² pct_o…³ em_dm imf_a…⁴ region sub_r…⁵ wb_in…⁶ wb_re…⁷
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
2 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
3 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
4 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
5 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
6 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
7 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
8 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
9 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
10 United St… USA 19115. 0.327 Deve… Advanc… Ameri… Northe… High North …
# … with 1,585 more rows, 11 more variables: debt_gross_percent_of_gdp <dbl>,
# nominal_gdp_bn_ppp <dbl>, nominal_gdp_per_capita_ppp <dbl>,
# population_mn <dbl>, characteristic <chr>, value <dbl>,
# value_z_score <dbl>, value_rank <dbl>, value_percentile <dbl>,
# value_quantile_5 <int>, value_pct_total <dbl>, and abbreviated variable
# names ¹country_name, ²total_debt_securities, ³pct_of_total, ⁴imf_ae_eme,
# ⁵sub_region, ⁶wb_income_group, ⁷wb_region
bis_emissions_w_features |> glimpse()Rows: 1,595
Columns: 21
$ country_name <chr> "United States", "United States", "United S…
$ iso3c <chr> "USA", "USA", "USA", "USA", "USA", "USA", "…
$ total_debt_securities <dbl> 19115.48, 19115.48, 19115.48, 19115.48, 191…
$ pct_of_total <dbl> 0.3268745, 0.3268745, 0.3268745, 0.3268745,…
$ em_dm <chr> "Developed Markets", "Developed Markets", "…
$ imf_ae_eme <chr> "Advanced Economies", "Advanced Economies",…
$ region <chr> "Americas", "Americas", "Americas", "Americ…
$ sub_region <chr> "Northern America", "Northern America", "No…
$ wb_income_group <chr> "High", "High", "High", "High", "High", "Hi…
$ wb_region <chr> "North America", "North America", "North Am…
$ debt_gross_percent_of_gdp <dbl> 122.103, 122.103, 122.103, 122.103, 122.103…
$ nominal_gdp_bn_ppp <dbl> 25035.16, 25035.16, 25035.16, 25035.16, 250…
$ nominal_gdp_per_capita_ppp <dbl> 75179.59, 75179.59, 75179.59, 75179.59, 751…
$ population_mn <dbl> 333.005, 333.005, 333.005, 333.005, 333.005…
$ characteristic <chr> "gdp_usd_current_prices", "gdp_ppp_current_…
$ value <dbl> 21372.6000000000, 21372.6000000000, 65051.8…
$ value_z_score <dbl> -5.78563491, -4.45319392, -1.38198117, -1.0…
$ value_rank <dbl> 1, 2, 6, 6, 3, 32, 6, 2, 1, 2, 1, 1, 1, 3, …
$ value_percentile <dbl> 0.00000000, 0.01851852, 0.09259259, 0.09259…
$ value_quantile_5 <int> 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3…
$ value_pct_total <dbl> 0.2659056825, 0.1819383748, 0.0375239280, 0…
Now it’s your turn
You have a lot of features to play around with.
- Get a sense for the data. What is the geographic dispersion of countries? What income groups?
- What does the ranking of countries look like? How concentrated is the data.
- Come up with interesting ways of illustrating (visually or in a factoid) how the changes in numerators and denominators creates winners and losers.
Sorted bar charts
bond_indices_weighted_avg |>
filter(variable == "gdp_pc_ppp_current_prices") |>
ggplot(aes(x = weighted_avg, y = fct_reorder(.f = index, .x = weighted_avg, .desc = TRUE))) +
geom_bar(stat = "identity")