library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidyquant)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
##
## Attaching package: 'PerformanceAnalytics'
##
## The following object is masked from 'package:graphics':
##
## legend
##
## Loading required package: quantmod
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(lubridate)
library(dplyr)
library(readxl)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(esquisse)
library(here)
## here() starts at C:/Users/james/OneDrive/Documents/R Projects/Final Project
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggthemes)
library(ggrepel)
library(gt)
library(countrycode)
library(wbstats)
library(ggridges)
library(here)
library(countrycode)
we’re going to start with ND-GAINS data first and take a look at some of the most vulnerable countries to climate change. We’ll use these as our country group for sovereign debt analysis. Based on research and discussions, we would expect a high correlation between the ND-GAIN scores of the country and the type/amount of debt they take on. Once that data is prepped we’ll work on getting the International Debt Statistics (IDS) data cleaned and organized.
There are a few challenges with these two data sets. One is that the naming convention for both the country code and the country name need to be unified. Second is there’s no categorization for region or income group under the IDS data (GAIN does not categorize by region). Interestingly in IDS, each statistic includes data by region and income group (totals) which I think will be useful later. Another challenge is efficiently incorporating all of the data so that it doesn’t take me additional hours (it already dominated much of allotted weekend time and during week).
The GAIN dataset will serve as the frame and we’ll put the IDS data into it to create a unified DF. I’m going to look at the period from 2000-2020, which is as far as the gain data currently goes. One thing I might go back and modify is including additional years for the IDS indicators as there’s goes as far as 2022 and in some instances contains projections as far as 2028. This would be helpful for capturing the large borrowing increases we’ve seen during the pandemic years and might provide more concrete take aways. For now, though, we’re going to stick to 2000-2020 to at least get that DF in order.
This gets tricky as the IDS data set contains over 500 indicators to choose from. I’ll be including the following to start with:
These represent the indicators I think will be most useful at the moment. There’s one that brakes down external debt by currency, and we may go mess with that later, but for now we’re going to stick to these.
italics Note: This process became rather messy near the end. Though I had an idea of what I wanted to do, I hadn’t thought through all the processes and ended up having to go back and tweak stuff so it would all flow once published.
gain <- read.csv("gain.csv")
world_income <- read.csv("world_by_income.csv")
let’s get rid of the “x” next to the years and cut down the number of years so we’re starting around the year 2000 (I forgot until much later to change the columns to characters for easier use, hence this janky code below).
colnames(gain)<-gsub("X","", colnames(gain))
gain_2000 <- gain[-3:-7]
#import Teal's country features so we can use the iso3c, income group, and region columns to match up with the ND-GAIN dataset and categorize the countries.
country_features <- read.csv("country_features.csv")
This next portion has me creating separate DFs by income group and then doing some quick data vis on it. In hindsight I don’t think this is strictly necessary as once I have a unified DF with appropriate categories I should be able to recreate these without breaking them out. One thing that might happen in the future is creating a separate DF that is GAIN scores and select IDS data summarized by regions, just to make it easier to work with. More on that further down.
#joining the datasets (there's some highly redundant code here as I created a master list of joined GAIN and classifier data after the fact and don't feel like going back to correct it at the moment.)
iso3c_classifier <- country_features %>%
select(iso3c, wb_income_group, wb_region) %>%
rename(ISO3 = iso3c)
gain_w_classifiers <- gain_2000 %>%
right_join(iso3c_classifier, by = "ISO3") %>%
relocate(wb_income_group, wb_region, .after=Name) %>%
rename("Income Group" = wb_income_group) %>%
rename("Region" = wb_region) %>%
na.omit()
gain_low <- gain_w_classifiers %>%
filter(`Income Group` == "Low")
gain_low %>%
gather(key = Year, value = Value, "2000":"2020") %>%
group_by(ISO3) %>%
summarize(average = mean(Value)) %>%
arrange(desc(average)) %>%
ggplot(aes(x = reorder(ISO3, average), y = average)) +
geom_point(stat = "identity",fill = "#0099f9") +
theme_clean() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(
title = "Average ND-GAIN value among low income countries",
subtitle = "Years 2000 - 2020",
caption = "Source: Notre Dame Global Adaptation Index",
x = "Country",
y = "Average Value"
)
gain_lmid <- gain_2000 %>%
right_join(iso3c_classifier, by = "ISO3") %>%
relocate(wb_income_group, wb_region, .after=Name) %>%
rename("Income_Group" = wb_income_group) %>%
rename("Region" = wb_region) %>%
filter(Income_Group == "Lower Middle") %>%
na.omit()
gain_lmid %>%
gather(key = Year, value = Value, "2000":"2020") %>%
group_by(ISO3) %>%
summarize(average = mean(Value)) %>%
arrange(desc(average)) %>%
ggplot(aes(x = reorder(ISO3, average), y = average)) +
geom_point(stat = "identity",fill = "#0099f9") +
theme_clean() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(
title = "Average ND-GAIN value among lower-middle income countries",
subtitle = "Years 2000-2020",
caption = "Source: Notre Dame Global Adaptation Index",
x = "Country",
y = "Average Value"
)
gain_umid <- gain_2000 %>%
right_join(iso3c_classifier, by = "ISO3") %>%
relocate(wb_income_group, wb_region, .after=Name) %>%
rename("Income_Group" = wb_income_group) %>%
rename("Region" = wb_region) %>%
filter(Income_Group == "Upper Middle") %>%
na.omit()
gain_umid %>%
gather(key = Year, value = Value, "2000":"2020") %>%
group_by(ISO3) %>%
summarize(average = mean(Value)) %>%
arrange(desc(average)) %>%
ggplot(aes(x = reorder(ISO3, average), y = average)) +
geom_point(stat = "identity",fill = "#0099f9") +
theme_clean() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(
title = "Average ND-GAINS value among upper-middle income countries",
subtitle = "Years 200-2020",
caption = "Source: Notre Dame Global Adaptation Index",
x = "Country",
y = "Average Value"
)
gain_high <- gain_2000 %>%
right_join(iso3c_classifier, by = "ISO3") %>%
relocate(wb_income_group, wb_region, .after=Name) %>%
rename("Income_Group" = wb_income_group) %>%
rename("Region" = wb_region) %>%
filter(Income_Group == "High") %>%
na.omit()
gain_high %>%
gather(key = Year, value = Value, "2000":"2020") %>%
group_by(ISO3) %>%
summarize(average = mean(Value)) %>%
arrange(desc(average)) %>%
ggplot(aes(x = reorder(ISO3, average), y = average)) +
geom_point(stat = "identity",fill = "#0099f9") +
theme_clean() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(
title = "Average ND-GAINS value among high income countries",
subtitle = "Years 2000-2020",
caption = "Source: Notre Dame Global Adaptation Index",
x = "Country",
y = "Average Value"
)
The below graph I like as it does a good job of visualizing the distribution of GAIN scores among the different income groups. We can see that high income countries have much more distributed scores and there is significant overlap between upper/lower middle income countries. Low income countries tend to be much more concentrated.
gain_w_classifiers %>%
gather(key = Year, value = Value, "2000":"2020") %>%
filter(`Income Group` %in% c("Low", "Lower Middle","Upper Middle","High")) %>%
ggplot(aes(x = Value, y = `Income Group` , fill = factor(`Income Group`))) +
geom_density_ridges() +
labs(
x = "ND-GAIN Score Distribution",
y = "Income Group",
title = "Distribution of ND-GAIN Scores",
subtitle = "Years 2000-2020",
caption = "Source: Notre Dame Global Adaptation Index",
legend = ""
)+
scale_color_discrete(name="")+
theme_clean()+
theme(legend.position = "none")
## Picking joint bandwidth of 1.21
## IDS Data
This dataset is a beast. It contains over 500 different indicators to choose from, so sifting through this and deciding what’s relevant is a chore.
IDS_all <- read.csv("IDS_All.csv")
colnames(IDS_all)<-gsub("X","", colnames(IDS_all))
IDS_all_2000 <- IDS_all[-7:-36]
Stealing Teal’s CountryName function to easily match ISO3C between the two datasets
country_name_regex_to_iso3c <- function(country_name) {
country_name %>%
countrycode(origin = "country.name",
destination = "iso3c",
origin_regex = TRUE)
}
Using purr::partial to make it more versatile (still don’t actually understand what this is doing)
iso3c_to_x <- purrr::partial(countrycode, origin = "iso3c")
Here we start adding regions and income categories for the datasets. This code is a little all over the place because I realized after the fact that I wanted a category for income.
IDS_w_iso3c <- IDS_all_2000 %>%
mutate(ISO3 = country_name_regex_to_iso3c(Country.Name)) %>%
relocate(ISO3, .after=Country.Name) %>%
mutate(ISO3 = coalesce(ISO3,Country.Code)) %>%
mutate(region = iso3c_to_x(ISO3, destination = "region"))%>%
relocate(region, .after=ISO3)
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: , Data from database: International Debt Statistics, East Asia & Pacific (excluding high income), Europe & Central Asia (excluding high income), IDA only, Kosovo, Last Updated: 10/10/2021, Latin America & Caribbean (excluding high income), Least developed countries: UN classification, Low & middle income, Low income, Lower middle income, Middle East & North Africa (excluding high income), Middle income, South Asia, Sub-Saharan Africa (excluding high income), Upper middle income
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: , EAP, ECA, IDX, LAC, LDC, LIC, LMC, LMY, MIC, MNA, SAS, SSA, UMC, XKX
# I want to match the datasets by ISO3C to ensure consistency and appropriate matching of variables. So this creates that category and drops "Country" to ensure no confustion.
world_income <- world_income %>%
mutate(ISO3 = country_name_regex_to_iso3c(Country)) %>%
relocate(ISO3, .after=Country) %>%
mutate(ISO3 = coalesce(ISO3,Country)) %>%
select(-Country)
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Channel Islands, Kosovo
gain_2000 <- gain_2000 %>%
mutate(region = iso3c_to_x(ISO3, destination = "region"))%>%
relocate(region, .after=Name) %>%
left_join(world_income, by = "ISO3")
#separating out because R can't seem to figure out that Income.group exists when used in the above process...
gain_2000 <- gain_2000%>%
relocate(Income.group, .after=region)
# We can now join the income group with IDS DF. We do this because later we'll be importing region and income group along wit the indicators into the master DF.
IDS_w_iso3c <- IDS_w_iso3c %>%
left_join(world_income, by = "ISO3") %>%
relocate(Income.group, .after=region)
# Our first DF that we'll beind to the master set.
IDS_total_external_debt_stock <- IDS_w_iso3c %>%
filter(Series.Name == "External debt stocks, total (DOD, current US$)") %>%
select(-c(Country.Code, Series.Code, Counterpart.Area.Name, Counterpart.Area.Code)) %>%
rename("Name" = Country.Name) %>%
gather(key = Year, value = Value, "2000":"2020") %>%
mutate(across(.fns = as.character)) %>%
select(-c("2021","2022","2023","2024","2025","2026","2027","2028"))
#pivoting into longer format
gain_2000_long <- gain_2000 %>%
gather(key = Year, value = Value, "2000":"2020") %>%
add_column(Series.Name = "Gain Score") %>%
relocate(Series.Name, .after=Year) %>%
mutate(across(.fns = as.character))
# creating the master DF.
gain_IDS <- gain_2000_long %>%
bind_rows(IDS_total_external_debt_stock)
# creating a function that'll do all this cleaning for us as we grab individual indicators.
IDS_series_process <- function(series_name) {
IDS_w_iso3c %>%
filter(Series.Name == series_name) %>%
select(-c(Country.Code, Series.Code, Counterpart.Area.Name, Counterpart.Area.Code)) %>%
rename("Name" = Country.Name) %>%
gather(key = Year, value = Value, "2000":"2020") %>%
mutate(across(.fns = as.character)) %>%
select(-c("2021","2022","2023","2024","2025","2026","2027","2028"))
}
Below we use the function to grab the various indicators we want and add them to the master DF.
IDS_external_debt_variable_rate <- IDS_series_process(series_name = "External debt stocks, variable rate (DOD, current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_external_debt_variable_rate)
IDS_GG_bonds_DOD <- IDS_series_process(series_name = "GG, bonds (DOD, current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_GG_bonds_DOD)
IDS_official_creditor_DOD <- IDS_series_process(series_name = "GG, official creditors (DOD, current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_official_creditor_DOD)
IDS_official_creditor_INT <- IDS_series_process(series_name = "GG, official creditors (INT, current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_official_creditor_INT)
IDS_interest_arrears_private_creditors <- IDS_series_process(series_name = "Interest arrears, private creditors (current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_interest_arrears_private_creditors)
IDS_total_debt_rescheduled <- IDS_series_process(series_name = "Total amount of debt rescheduled (current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_total_debt_rescheduled)
IDS_total_change_external_debt <- IDS_series_process(series_name = "Total change in external debt stocks (current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_total_change_external_debt)
IDS_total_reserves_percent_debt <- IDS_series_process(series_name = "Total reserves (% of total external debt)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_total_reserves_percent_debt)
IDS_total_reserves <- IDS_series_process(series_name = "Total reserves (includes gold, current US$)")
gain_IDS <- gain_IDS %>%
bind_rows(IDS_total_reserves)
# replacing all NAs with 0
gain_IDS <- gain_IDS %>%
mutate_if(is.character, ~replace(., is.na(.), 0))
character_num_to_numeric <- function(character_num) {
character_num %>%
# take out the commas
str_remove_all(pattern = ",") %>%
# take out any blank spaces before or after the number
str_trim() %>%
# coerce to numeric
as.numeric()
}
Create a reference for our variables so we can easily copy and paste.
gain_IDS_categories <- unique(gain_IDS$Series.Name)
Here is a test graph looking at Zambia’s gain score vs its external debt, to see that the DF can actually produce a result.
gain_IDS %>%
filter(Name == "Zambia") %>%
filter(Series.Name %in% c("Gain Score", "External debt stocks, total (DOD, current US$)")) %>%
mutate(Value = character_num_to_numeric(Value)) %>%
mutate(Year = character_num_to_numeric(Year)) %>%
pivot_wider(names_from = "Series.Name", values_from = Value) %>%
ggplot(aes(x = Year)) +
geom_line(aes(y =`Gain Score`)) +
geom_line(aes(y = `External debt stocks, total (DOD, current US$)`/1000000000)) +
scale_y_continuous(name = "Gain Score",
sec.axis = sec_axis(trans=~.*1, name= "Total External Debt ($bn)")) +
labs(
title = "Zambia has taken on higher debt but its Gain Score has dipped",
subtitle = "GAIN Score vs Total External Debt",
caption = "Source: Notre Dame Global Adaptation Index and World Bank"
) +
theme_clean()