Exploring PCAF Data

Author

Okung Obang

Exploring Processed PCAF Data

Code
# Install necessary packages
install.packages('rnaturalearth', repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/okung/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'rnaturalearth' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\okung\AppData\Local\Temp\Rtmp0CHV6L\downloaded_packages
Code
install.packages('wbstats', repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/okung/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'wbstats' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\okung\AppData\Local\Temp\Rtmp0CHV6L\downloaded_packages
Code
install.packages('here', repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/okung/AppData/Local/R/win-library/4.2'
(as 'lib' is unspecified)
package 'here' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\okung\AppData\Local\Temp\Rtmp0CHV6L\downloaded_packages
Code
install.packages('readxl', repos = "http://cran.us.r-project.org")
Code
# Load necessary library
library(tidyverse) 
Warning: package 'tidyverse' was built under R version 4.2.2
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.2     ✔ forcats 0.5.2
Warning: package 'ggplot2' was built under R version 4.2.2
Warning: package 'tidyr' was built under R version 4.2.2
Warning: package 'purrr' was built under R version 4.2.2
Warning: package 'dplyr' was built under R version 4.2.2
Warning: package 'stringr' was built under R version 4.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(janitor)
Warning: package 'janitor' was built under R version 4.2.2

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
Code
library(here)
Warning: package 'here' was built under R version 4.2.3
here() starts at C:/Users/okung/OneDrive/Johns Hopkins - SAIS/2022-2023/SP23/Sustainable Finance/Week 7
Code
library(readxl)
Warning: package 'readxl' was built under R version 4.2.2
Code
library(rnaturalearth)
Warning: package 'rnaturalearth' was built under R version 4.2.3
Code
library(countrycode)
Warning: package 'countrycode' was built under R version 4.2.2
Code
library(wbstats)
Warning: package 'wbstats' was built under R version 4.2.3
Code
# Open processed data

pcaf <- read_csv(here("ipcc_wb_imf_processed.csv"))
Rows: 16918 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): IPCC_annex, C_group_IM24_sh, Countrcode_A3, Country Name, Substance...
dbl (4): year, CO2 Emissions, GDP PPP, GDP, bn of USD

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
glimpse(pcaf)
Rows: 16,918
Columns: 12
$ IPCC_annex       <chr> "Non-Annex_I", "Non-Annex_I", "Non-Annex_I", "Non-Ann…
$ C_group_IM24_sh  <chr> "Rest Central America", "Rest Central America", "Rest…
$ Countrcode_A3    <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW…
$ `Country Name`   <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba",…
$ Substance        <chr> "CO2", "CO2", "CO2", "CO2", "CO2", "CO2", "CO2", "CO2…
$ year             <dbl> 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978,…
$ `CO2 Emissions`  <dbl> 95.06542, 94.50371, 105.84781, 109.86386, 99.97718, 1…
$ `Country Code`   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `Indicator Name` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `Indicator Code` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `GDP PPP`        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ `GDP, bn of USD` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
Code
pcaf <- pcaf |>
  mutate(region = countrycode(sourcevar = `Country Name`,
                              origin = "country.name",
                              destination = "region"))
Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `region = countrycode(...)`.
Caused by warning in `countrycode_convert()`:
! Some values were not matched unambiguously: Advanced economies, Africa (Region), Africa Eastern and Southern, Africa Western and Central, Arab World, ASEAN-5, Asia and Pacific, Australia and New Zealand, Caribbean, Caribbean small states, Central America, Central Asia and the Caucasus, Central Europe and the Baltics, Early-demographic dividend, East Asia, East Asia & Pacific, East Asia & Pacific (excluding high income), East Asia & Pacific (IDA & IBRD countries), Eastern Europe, Emerging and Developing Asia, Emerging and Developing Europe, Emerging market and developing economies, Euro area, Europe, Europe & Central Asia, Europe & Central Asia (excluding high income), Europe & Central Asia (IDA & IBRD countries), European Union, Fragile and conflict affected situations, Heavily indebted poor countries (HIPC), High income, IBRD only, IDA & IBRD total, IDA blend, IDA only, IDA total, Int. Aviation, Int. Shipping, Late-demographic dividend, Latin America & Caribbean, Latin America & Caribbean (excluding high income), Latin America & the Caribbean (IDA & IBRD countries), Latin America and the Caribbean, Least developed countries: UN classification, Low & middle income, Low income, Lower middle income, Major advanced economies (G7), Middle East & North Africa, Middle East & North Africa (excluding high income), Middle East & North Africa (IDA & IBRD countries), Middle East (Region), Middle East and Central Asia, Middle income, North Africa, North America, OECD members, Other advanced economies, Other small states, Pacific island small states, Pacific Islands, Post-demographic dividend, Pre-demographic dividend, Reunion, Saint Helena, Small states, South America, South Asia, South Asia (IDA & IBRD), Southeast Asia, Sub-Saharan Africa, Sub-Saharan Africa (excluding high income), Sub-Saharan Africa (IDA & IBRD countries), Sub-Saharan Africa (Region), Turkiye, Türkiye, Republic of, Upper middle income, Western Europe, Western Hemisphere (Region), Western Sahara, World
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
Code
pcaf
# A tibble: 16,918 × 13
   IPCC_…¹ C_gro…² Count…³ Count…⁴ Subst…⁵  year CO2 E…⁶ Count…⁷ Indic…⁸ Indic…⁹
   <chr>   <chr>   <chr>   <chr>   <chr>   <dbl>   <dbl> <chr>   <chr>   <chr>  
 1 Non-An… Rest C… ABW     Aruba   CO2      1970    95.1 <NA>    <NA>    <NA>   
 2 Non-An… Rest C… ABW     Aruba   CO2      1971    94.5 <NA>    <NA>    <NA>   
 3 Non-An… Rest C… ABW     Aruba   CO2      1972   106.  <NA>    <NA>    <NA>   
 4 Non-An… Rest C… ABW     Aruba   CO2      1973   110.  <NA>    <NA>    <NA>   
 5 Non-An… Rest C… ABW     Aruba   CO2      1974   100.  <NA>    <NA>    <NA>   
 6 Non-An… Rest C… ABW     Aruba   CO2      1975   115.  <NA>    <NA>    <NA>   
 7 Non-An… Rest C… ABW     Aruba   CO2      1976   105.  <NA>    <NA>    <NA>   
 8 Non-An… Rest C… ABW     Aruba   CO2      1977   113.  <NA>    <NA>    <NA>   
 9 Non-An… Rest C… ABW     Aruba   CO2      1978   116.  <NA>    <NA>    <NA>   
10 Non-An… Rest C… ABW     Aruba   CO2      1979   109.  <NA>    <NA>    <NA>   
# … with 16,908 more rows, 3 more variables: `GDP PPP` <dbl>,
#   `GDP, bn of USD` <dbl>, region <chr>, and abbreviated variable names
#   ¹​IPCC_annex, ²​C_group_IM24_sh, ³​Countrcode_A3, ⁴​`Country Name`, ⁵​Substance,
#   ⁶​`CO2 Emissions`, ⁷​`Country Code`, ⁸​`Indicator Name`, ⁹​`Indicator Code`
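
The long `countrycode()` warning above lists names that could not be mapped to a region: mostly World Bank and IMF aggregates (for example "World" or "High income"), plus a few spellings such as "Turkiye". A minimal sketch of one way to handle this, assuming a small hypothetical vector of names rather than the real `Country Name` column: supply known spellings through `custom_match`, silence the warning with `warn = FALSE`, and treat whatever is still `NA` as a candidate for filtering.

```r
library(countrycode)

# Hypothetical sample of values like those found in `Country Name`
names_toy <- c("Aruba", "Turkiye", "World", "High income")

region_toy <- countrycode(
  sourcevar    = names_toy,
  origin       = "country.name",
  destination  = "region",
  warn         = FALSE,  # suppress the "not matched unambiguously" warning
  # Assumption: map this spelling by hand to its World Bank region
  custom_match = c("Turkiye" = "Europe & Central Asia")
)

# Names that still have no region (aggregates such as "World")
unmatched <- names_toy[is.na(region_toy)]
unmatched
```

Names left in `unmatched` are not necessarily all aggregates; territories without a World Bank region would also appear there, so the list is worth inspecting before dropping rows.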

Annex vs. Non-Annex

Code
# How do annex and non-annex compare over the years in terms of emissions?

annex <- pcaf |>
  group_by(IPCC_annex, year) |>
  summarize(sum_emissions = sum(`CO2 Emissions`),
            avg_emissions = mean(`CO2 Emissions`)) |>
  ungroup()
`summarise()` has grouped output by 'IPCC_annex'. You can override using the
`.groups` argument.
Code
annex
# A tibble: 256 × 4
   IPCC_annex  year sum_emissions avg_emissions
   <chr>      <dbl>         <dbl>         <dbl>
 1 Annex_I     1970     13241630.       315277.
 2 Annex_I     1971     13127001.       312548.
 3 Annex_I     1972     13693147.       326027.
 4 Annex_I     1973     14406991.       343024.
 5 Annex_I     1974     14255888.       339426.
 6 Annex_I     1975     13983172.       332933.
 7 Annex_I     1976     14750293.       351197.
 8 Annex_I     1977     15018366.       357580.
 9 Annex_I     1978     15219078.       362359.
10 Annex_I     1979     15553188.       370314.
# … with 246 more rows
Code
annex |>
  ggplot(mapping = aes(x=year, y=sum_emissions, fill = IPCC_annex)) + geom_col() 
Warning: Removed 48 rows containing missing values (`position_stack()`).

Code
annex |>
  ggplot(mapping = aes(x=year, y=avg_emissions, fill = IPCC_annex)) + geom_col()
Warning: Removed 48 rows containing missing values (`position_stack()`).

Total CO2 emissions from Annex and Non-Annex nations have trended upward over the years, with most of the total originating from Non-Annex nations. When we look at average emissions, however, it is evident that Non-Annex nations have lower emissions per country. The largest average emissions come from international shipping and aviation, which are driven by trade, followed by Annex nations.
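
The "Removed 48 rows" warnings in the plots above most likely arise because `sum()` and `mean()` return `NA` for any group that contains a missing value. A minimal sketch of the alternative, using a small invented tibble (the column `co2` and all values are hypothetical, not taken from the PCAF file):

```r
library(dplyr)

# Hypothetical toy data standing in for pcaf; values are invented
toy <- tribble(
  ~IPCC_annex,   ~year, ~co2,
  "Annex_I",      1970,  100,
  "Annex_I",      1970,   NA,
  "Non-Annex_I",  1970,   40,
  "Non-Annex_I",  1970,   60
)

annex_toy <- toy |>
  group_by(IPCC_annex, year) |>
  summarize(
    sum_emissions = sum(co2, na.rm = TRUE),   # ignore missing values
    avg_emissions = mean(co2, na.rm = TRUE),
    n_missing     = sum(is.na(co2)),          # record how many were skipped
    .groups = "drop"                          # same effect as a trailing ungroup()
  )

annex_toy
```

Keeping a count such as `n_missing` makes it visible when a group total rests on incomplete data, which matters when comparing groups with very different coverage.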

Non-Annex Regional Countries

Code
# How do regional non-annex countries look?

regional <- pcaf |>
  filter( IPCC_annex == "Non-Annex_I") |>
  group_by(year, region) |>
  summarize(sum_emissions = sum(`CO2 Emissions`),
            avg_emissions = mean(`CO2 Emissions`)) |>
  ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Code
regional
# A tibble: 416 × 4
    year region                     sum_emissions avg_emissions
   <dbl> <chr>                              <dbl>         <dbl>
 1  1970 East Asia & Pacific             2319170.        79971.
 2  1970 Europe & Central Asia            415212.        29658.
 3  1970 Latin America & Caribbean       1098980.        24977.
 4  1970 Middle East & North Africa       382000.        20105.
 5  1970 North America                       549.          274.
 6  1970 South Asia                       935416.       116927.
 7  1970 Sub-Saharan Africa               776871.        16529.
 8  1970 <NA>                               2596.          865.
 9  1971 East Asia & Pacific             2325218.        80180.
10  1971 Europe & Central Asia            414992.        29642.
# … with 406 more rows
Code
regional |>
  ggplot(mapping = aes(x=year, y=sum_emissions, fill = region)) + geom_col()

Code
regional |>
  ggplot(mapping = aes(x=year, y=avg_emissions, fill = region)) + geom_col()

It was interesting to see how Annex and Non-Annex countries differ, and the regional view adds to that picture: most of the increase in total emissions has occurred in East Asia, particularly with the rise of China, followed by South Asia. When we look at the averages, however, East Asia and South Asia are on a much closer footing.

GDP Growth vs. Emissions

Code
# Is there a correlation between the rate of GDP growth in each region and its emissions? How about for Annex nations?

growth <- pcaf |>
  group_by(IPCC_annex,year, region) |>
  summarize(avg_gdp = mean(`GDP PPP`)) |>
  ungroup()
`summarise()` has grouped output by 'IPCC_annex', 'year'. You can override
using the `.groups` argument.
Code
growth
# A tibble: 1,028 × 4
   IPCC_annex  year region                     avg_gdp
   <chr>      <dbl> <chr>                        <dbl>
 1 Annex_I     1970 East Asia & Pacific             NA
 2 Annex_I     1970 Europe & Central Asia           NA
 3 Annex_I     1970 Middle East & North Africa      NA
 4 Annex_I     1970 North America                   NA
 5 Annex_I     1971 East Asia & Pacific             NA
 6 Annex_I     1971 Europe & Central Asia           NA
 7 Annex_I     1971 Middle East & North Africa      NA
 8 Annex_I     1971 North America                   NA
 9 Annex_I     1972 East Asia & Pacific             NA
10 Annex_I     1972 Europe & Central Asia           NA
# … with 1,018 more rows
Code
growth |>
  # `region` was previously passed to aes() unnamed, so it was not mapped to any aesthetic
  ggplot(mapping = aes(x=year, y = avg_gdp, fill = IPCC_annex)) + geom_col()
Warning: Removed 887 rows containing missing values (`position_stack()`).

Exports by Region over the Years

Code
# In terms of exports, what does it look like between regions?

exports <- pcaf |>
  group_by(year, region) |>
  summarize(sum_exp = sum(`GDP, bn of USD`),
            avg_exp = mean(`GDP, bn of USD`)) |>
  ungroup() 
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Code
  #drop_na()
exports
# A tibble: 464 × 4
    year region                     sum_exp avg_exp
   <dbl> <chr>                        <dbl>   <dbl>
 1  1970 East Asia & Pacific             NA      NA
 2  1970 Europe & Central Asia           NA      NA
 3  1970 Latin America & Caribbean       NA      NA
 4  1970 Middle East & North Africa      NA      NA
 5  1970 North America                   NA      NA
 6  1970 South Asia                      NA      NA
 7  1970 Sub-Saharan Africa              NA      NA
 8  1970 <NA>                            NA      NA
 9  1971 East Asia & Pacific             NA      NA
10  1971 Europe & Central Asia           NA      NA
# … with 454 more rows
Code
exports |>
  ggplot(mapping = aes(x=year, y=sum_exp, color = region)) + geom_point()
Warning: Removed 419 rows containing missing values (`geom_point()`).

Code
exports |>
  ggplot(mapping = aes(x=year, y=avg_exp, color = region)) + geom_point()
Warning: Removed 419 rows containing missing values (`geom_point()`).

Average GDP vs. Average CO2 Emissions for Annex and Non-Annex

Code
# How does average GDP (bn of USD) relate to average CO2 emissions for Annex and non-Annex nations?

exports <- pcaf |>
  group_by(year, IPCC_annex, region) |>
  drop_na() |>
  summarise(avg_gdp_bn = mean(`GDP, bn of USD`),
            avg_co2 = mean(`CO2 Emissions`))
`summarise()` has grouped output by 'year', 'IPCC_annex'. You can override
using the `.groups` argument.
Code
exports
# A tibble: 320 × 5
# Groups:   year, IPCC_annex [64]
    year IPCC_annex  region                     avg_gdp_bn  avg_co2
   <dbl> <chr>       <chr>                           <dbl>    <dbl>
 1  1990 Annex_I     East Asia & Pacific           1189.    523217.
 2  1990 Annex_I     Europe & Central Asia          400.    307124.
 3  1990 Annex_I     Middle East & North Africa       2.40    2352.
 4  1990 Annex_I     North America                 5963.   5435172.
 5  1990 Non-Annex_I East Asia & Pacific             27.0    59835.
 6  1990 Non-Annex_I Europe & Central Asia            2.22    8352.
 7  1990 Non-Annex_I Latin America & Caribbean       40.7    49530.
 8  1990 Non-Annex_I Middle East & North Africa      33.4    43227.
 9  1990 Non-Annex_I South Asia                      73.5   265533.
10  1990 Non-Annex_I Sub-Saharan Africa               9.69   32084.
# … with 310 more rows
Code
exports |>
  ggplot(mapping = aes(x=avg_gdp_bn, y=avg_co2, color = region)) + geom_point()

Looking at emissions by origin shows that developed, or Annex, nations produce the most emissions on average. The relationship between average emissions and average GDP, however, shows that North America is the outlier in average emissions.
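
The scatterplot above shows the relationship visually; a correlation coefficient per region would put a number on it. A minimal sketch using base `cor()` inside `summarize()`, on an invented tibble whose column names mirror `avg_gdp_bn` and `avg_co2` (the regions "A" and "B" and all values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; region A rises with GDP, region B falls
toy <- tribble(
  ~region, ~avg_gdp_bn, ~avg_co2,
  "A",     1,           10,
  "A",     2,           20,
  "A",     3,           30,
  "B",     1,           30,
  "B",     2,           20,
  "B",     3,           10
)

cor_toy <- toy |>
  group_by(region) |>
  summarize(
    r = cor(avg_gdp_bn, avg_co2),  # Pearson correlation within each region
    n = n(),                       # number of observations behind each r
    .groups = "drop"
  )

cor_toy
```

With only a few dozen yearly observations per region, a coefficient like this is descriptive rather than evidence of a causal link between GDP and emissions.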

CO2 Intensity by Region

Code
# How does CO2 Intensity look by region?

co2_intens <- pcaf |>
  group_by(year, region) |>
  drop_na() |>
  mutate(co2_intensity = `CO2 Emissions`/`GDP PPP`) |>
  summarize(avg_co2 = mean(co2_intensity),
            sum_co2 = sum(co2_intensity)) |>
  ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Code
co2_intens
# A tibble: 224 × 4
    year region                         avg_co2     sum_co2
   <dbl> <chr>                            <dbl>       <dbl>
 1  1990 East Asia & Pacific        0.000000708 0.0000120  
 2  1990 Europe & Central Asia      0.000000786 0.0000181  
 3  1990 Latin America & Caribbean  0.000000770 0.0000215  
 4  1990 Middle East & North Africa 0.000000528 0.00000528 
 5  1990 North America              0.000000911 0.000000911
 6  1990 South Asia                 0.00000122  0.00000732 
 7  1990 Sub-Saharan Africa         0.00000155  0.0000556  
 8  1991 East Asia & Pacific        0.000000696 0.0000118  
 9  1991 Europe & Central Asia      0.000000769 0.0000185  
10  1991 Latin America & Caribbean  0.000000727 0.0000204  
# … with 214 more rows
Code
co2_intens |>
  ggplot(mapping = aes(x=year, y = sum_co2, color = region)) + geom_line()

Code
co2_intens |>
  ggplot(mapping = aes(x=year, y = avg_co2, color = region)) + geom_line()

Although emissions appear to be increasing, the data suggest that developing nations have greater carbon intensity because of their low GDP PPP relative to developed nations. As the previous graph showed, North America has higher emissions but notably lower carbon intensity, which could be due to its relatively high GDP. Either way, carbon intensity is decreasing across all regions, which suggests that economies are emitting less CO2 per unit of output, though developing nations may be penalized as they attempt to grow their GDP PPP.
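
One caveat on the calculation above: `avg_co2` is a mean of country-level ratios, and `sum_co2` adds ratios together, which grows with the number of countries in a region. A region-level intensity is more commonly computed as total emissions divided by total GDP. A minimal sketch of the difference, on an invented tibble (columns `co2` and `gdp_ppp` and all values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: region A has one large and one small economy
toy <- tribble(
  ~region, ~year, ~co2, ~gdp_ppp,
  "A",     1990,  100,  1000,
  "A",     1990,   10,    20,
  "B",     1990,   50,   500
)

intensity_toy <- toy |>
  group_by(year, region) |>
  summarize(
    mean_of_ratios = mean(co2 / gdp_ppp),      # each country weighted equally
    ratio_of_sums  = sum(co2) / sum(gdp_ppp),  # weighted by economic size
    .groups = "drop"
  )

intensity_toy
```

In region A the small, carbon-intensive economy pulls the mean of ratios up to 0.3, while the ratio of sums stays near 0.108. Which measure is appropriate depends on whether the question is about the typical country or the region as a whole.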