SusFin Week 7: Find your data & clean it up

Understanding the PCAF Methodology

The PCAF Global GHG Accounting Standard provides guidance on how to measure and report greenhouse gas (GHG) emissions associated with financial activities. The standard includes a methodology for assessing the GHG emissions associated with sovereign debt.

Scope 1: Domestic GHG emissions from sources located within the country territory

Scope 2: GHG emissions occurring as a consequence of the domestic use of grid-supplied electricity, heat, steam and/or cooling which is imported from another territory

Scope 3: Emissions attributable to non-energy imports as a result of activities taking place within the country territory

To calculate the metrics in the sovereign debt methodology, I downloaded three data sets: World Total GHG Emissions (including LUCF) from Climate Watch; Carbon dioxide emissions embodied in international trade (2021 ed.) from the OECD; and standard macroeconomic metrics such as PPP, GDP, and population from the World Bank. My processing of these three data sets is described below.

Climate Watch Data

The Climate Watch data set shows GHG emissions for 193 countries from 1990 to 2019, in MtCO2e. For presentation purposes, I have selected the latest 10 years of data, i.e. 2010-2019. The data provided by Climate Watch are already relatively well standardized, so no further tidying is needed here.

(193 observations of 11 variables)

OECD Data

The OECD data set shows the CO2 emissions embodied in international trade for 84 countries and regions from 1995 to 2018, in million tonnes. As above, I have selected the latest 10 years of data, i.e. 2009-2018. In addition to individual countries, the OECD data also include “World”, “OECD countries”, “EU” and other country groups.

The biggest issue with this data set is that the “country” column does not hold a standard country name. Instead, each entry combines two pieces of information in a “country code: country name” format.

(84 observations of 11 variables)

To tidy this data set, the first step is to split the “country” column into “country name” and “country code” using the separate() function in the tidyr package.

(84 observations of 12 variables)
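
A minimal sketch of this split, not the exact code used for the report. The column names, the sample rows, and the values are made up for illustration, and the `": "` separator is an assumption based on the “country code: country name” format described above.

```r
library(tidyr)

# Hypothetical rows in the "country code: country name" format;
# the emissions values are placeholders, not real OECD figures.
oecd_raw <- data.frame(
  country = c("FRA: France", "USA: United States", "WLD: World"),
  value_2018 = c(1, 2, 3)
)

# Split on the first ": " only; extra = "merge" keeps any later
# separator inside the country name instead of dropping text.
oecd_split <- separate(
  oecd_raw,
  col = country,
  into = c("country_code", "country_name"),
  sep = ": ",
  extra = "merge"
)

print(oecd_split)
```

Splitting one column into two is why the table grows from 11 to 12 variables while the number of observations stays at 84.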

Then the country names need to be standardized (e.g., Türkiye to Turkey). Since I already have the country code, I use the “iso3c” argument in the countryname() function from the countrycode package.

The tidy OECD data are as follows:

(84 observations of 12 variables)
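
One possible way to derive standardized names from ISO3 codes is sketched below. It uses the countrycode() function from the same package rather than countryname(), so it is an assumption about how the step could be done, not the exact call used for the report. The sample codes are illustrative.

```r
library(countrycode)

# Illustrative ISO 3166-1 alpha-3 codes (not the full OECD list)
codes <- c("FRA", "USA", "DEU")

# Look up the package's standard English country name for each code
std_names <- countrycode(
  codes,
  origin = "iso3c",
  destination = "country.name"
)

print(data.frame(country_code = codes, country_name = std_names))
```

Codes that are not ISO3 country codes, such as labels for country groups, have no match in this lookup: countrycode() returns NA for them and issues a warning, so those rows would need separate handling.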

World Bank Data

The World Bank data combine three data sets available on the World Bank website for 266 countries and regions: total population since 1960, GDP in $ since 1960, and PPP in $ since 1990. For presentation purposes, I have chosen the last 5 years of data, i.e., population, GDP, and PPP for 2017-2021.

(266 observations of 17 variables)

Country names in the World Bank data are already standardized, so tidying focuses on turning the wide format into a long format, using tidyr's pivot functions (such as pivot_longer()). First, I separate the three indicators of population, GDP and PPP into their own rows, keeping five years of data for each.

(798 observations of 8 variables)
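
A sketch of how this first reshaping step could look with pivot_longer(). The column names (`population_2020` and so on), the two sample countries, and all values are hypothetical, and only two years are shown to keep the example short. The real column names in the combined World Bank table may differ.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per
# indicator-year pair. Values are placeholders.
wb_wide <- data.frame(
  country = c("France", "Germany"),
  code = c("FRA", "DEU"),
  population_2020 = c(1, 2),
  population_2021 = c(3, 4),
  gdp_2020 = c(5, 6),
  gdp_2021 = c(7, 8),
  ppp_2020 = c(9, 10),
  ppp_2021 = c(11, 12)
)

# The part before "_" becomes the "indicator" column; the ".value"
# sentinel keeps the part after "_" (the year) as column names.
wb_by_indicator <- pivot_longer(
  wb_wide,
  cols = -c(country, code),
  names_to = c("indicator", ".value"),
  names_sep = "_"
)

print(wb_by_indicator)
```

With 266 rows, three indicators and five years, the same pattern gives 266 x 3 = 798 rows and 3 + 5 = 8 columns, which matches the dimensions reported above.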

The data now look better, but they are not yet tidy, because each year still has its own column. I pivot further, collapsing the year columns into two variables, “Year” and “Value”.

(3990 observations of 5 variables)
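
A sketch of this second pivot, again with hypothetical column names and placeholder values and with only two years shown:

```r
library(tidyr)

# Hypothetical intermediate table: one row per country and indicator,
# one column per year. check.names = FALSE keeps "2020" as a name.
wb_by_indicator <- data.frame(
  country = c("France", "France"),
  code = c("FRA", "FRA"),
  indicator = c("population", "gdp"),
  `2020` = c(1, 2),
  `2021` = c(3, 4),
  check.names = FALSE
)

# Collapse the year columns into a Year / Value pair
wb_long <- pivot_longer(
  wb_by_indicator,
  cols = c(`2020`, `2021`),
  names_to = "Year",
  values_to = "Value"
)

print(wb_long)
```

Each of the 798 rows is repeated once per year, giving 798 x 5 = 3990 rows. Note that “Year” comes out of pivot_longer() as a character column, since it is built from column names.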

Finally, I use the as.Date() and as.numeric() functions to convert the “Year” and “Value” variables to date and numeric types, respectively.

(3990 observations of 5 variables)
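
A base R sketch of the type conversion, using placeholder values. Anchoring each year to 1 January is an assumption made here: as.Date() needs a full date, and parsing a bare year with `format = "%Y"` fills in the current month and day, which is rarely what is wanted.

```r
# Hypothetical long-format rows with both columns stored as text
wb_long <- data.frame(
  Year = c("2020", "2021"),
  Value = c("1.5", "2.5"),
  stringsAsFactors = FALSE
)

# Build an explicit ISO date for each year before converting
wb_long$Year <- as.Date(paste0(wb_long$Year, "-01-01"))

# Convert the text values to numbers
wb_long$Value <- as.numeric(wb_long$Value)

str(wb_long)
```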