Since I’m analyzing the effect of green financing on carbon emission for each country, and the effectiveness of different type of financial instrument, I will start with finding data:
Carbon emission data, by country, by region: carbon emissions data based on production
Green financing data, by country, by region: Data on green bonds, loans, and investments
Project Outcome Data: Information on the estimated and actual carbon reduction achieved through financed projects - skipped (can’t access Bloomberg/climate bond initiative database). So I will run regression to predict the result.
Policy and Regulatory Data: Details on climate-related financial regulations (again, the database that contains such data is unavailable to students) so I will try to search for relevent results directly.
library(readxl)library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(writexl)library(dplyr)library(countrycode)library(here) # for using relative file paths, here() starts at /Users/ciel.wang/Desktop/JH/R/00_data_raw
Before cleaning and analyzing the data, we first standardized the country names and codes to make sure that they are matched in both data frames:
#design a function to standardize country name and code country_regex_to_iso3c <-function(country_string) { country_string |> countrycode::countrycode(origin ="country.name", destination ="iso3c", origin_regex =TRUE)}iso3c_to_country_name <-function(country_string) { country_string |> countrycode::countrycode(origin ="iso3c", destination ="country.name")}
Next, pivot longer each data sets and select the relevant columns:
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `continent = countrycode(Country_Name, origin = "country.name",
destination = "continent")`.
Caused by warning:
! Some values were not matched unambiguously: Int. Aviation, Int. Shipping, Serbia and Montenegro, Virgin Islands_USA
# A tibble: 863 × 4
ISO3 Country_Name year debt_issuance
<chr> <chr> <dbl> <dbl>
1 ARG Argentina 2017 0.974
2 ARG Argentina 2020 0.0500
3 ARG Argentina 2021 0.916
4 ARG Argentina 2022 0.207
5 AUS Australia 2014 0.526
6 AUS Australia 2015 0.413
7 AUS Australia 2016 0.531
8 AUS Australia 2017 2.53
9 AUS Australia 2018 2.22
10 AUS Australia 2019 1.98
# ℹ 853 more rows
Then we can try to standardize country variables so that the two data frames can be merged:
#### standardzied country names and codesemission_long |>mutate(ISO3 =country_regex_to_iso3c(Country_Name))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `ISO3 = country_regex_to_iso3c(Country_Name)`.
Caused by warning:
! Some values were not matched unambiguously: Int. Aviation, Int. Shipping, Netherlands Antilles, Serbia and Montenegro, Virgin Islands_USA
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `ISO3 = country_regex_to_iso3c(Country_Name)`.
Caused by warning:
! Some values were not matched unambiguously: World
# A tibble: 863 × 4
ISO3 Country_Name year debt_issuance
<chr> <chr> <dbl> <dbl>
1 ARG Argentina 2017 0.974
2 ARG Argentina 2020 0.0500
3 ARG Argentina 2021 0.916
4 ARG Argentina 2022 0.207
5 AUS Australia 2014 0.526
6 AUS Australia 2015 0.413
7 AUS Australia 2016 0.531
8 AUS Australia 2017 2.53
9 AUS Australia 2018 2.22
10 AUS Australia 2019 1.98
# ℹ 853 more rows
Join the data frames by country and year and calculate the financed emission based on the sovereign debt metric:
# A tibble: 863 × 6
ISO3 Country_Name year debt_issuance emission percentage_change_emission
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ARG Argentina 2017 0.974 379169. NA
2 ARG Argentina 2020 0.0500 359026. -5.31
3 ARG Argentina 2021 0.916 378420. 5.40
4 ARG Argentina 2022 0.207 382992. 1.21
5 AUS Australia 2014 0.526 585442. 52.9
6 AUS Australia 2015 0.413 593561. 1.39
7 AUS Australia 2016 0.531 594024. 0.0780
8 AUS Australia 2017 2.53 601065. 1.19
9 AUS Australia 2018 2.22 597774. -0.548
10 AUS Australia 2019 1.98 596032. -0.291
# ℹ 853 more rows
Folder Structure: Create 3 folders (and as many sub-folders as you would like)
00_data_raw: put your raw files in here.
01_data_cleaning: put your r notebooks in hear that you use to clean the data.
03_data_processed: write your processed files into this folder (as .csv, .rds, or whatever file format you prefer).
Include your data tidying plan in an initial section. Try to execute your plan from step 1. Use headers and text to narrate what you are trying to do in code chunks.
When you’ve finished tidying your data, write it to the folder 03_data_processed