Background

This data covers global carbon dioxide (CO2) emissions broken down by country from 1960 to 2014; the file also has columns for 2015 through 2018, but they are empty. The data includes the country name, the corresponding country code, CO2 emissions per year in metric tons per capita, an indicator name field specifying the units for CO2 emissions, and an indicator code field (EN.ATM.CO2E.PC) identifying the dataset.

This data was collected by the Carbon Dioxide Information Analysis Center under the Environmental Sciences Division at Oak Ridge National Laboratory in Tennessee. The data is made publicly available by the World Bank Group.

It’s important to understand CO2 emissions because carbon dioxide is a greenhouse gas that contributes to global climate change. As the volume of greenhouse gases in the atmosphere increases, average global temperatures rise faster. These changes in temperature can negatively impact ecosystems and even our quality of life. The annual per capita emissions by country presented in this data provide a straightforward way to compare how much CO2 each country emits relative to its population.

Sources: https://ourworldindata.org/co2-emissions https://data.worldbank.org/indicator/EN.ATM.CO2E.PC

Setup

Import libraries and the dataset. I will use tidyverse to clean up the data, countrycode to assign regions, and plotly to visualize and animate the data. I also want to test animations using ggplot and potentially the gganimate library. The CO2 Global Emissions data is contained in a CSV file that I will read in using default settings.

library(tidyverse)
library(gganimate)
library(countrycode)
#library(gifski)
library(plotly)

csvfile <- "co2_global_emissions.csv"
emissions <- read_csv(csvfile)

Data Cleanup

After looking at the columns in the data, I determined that Indicator Name and Indicator Code are not useful because they are the same for every row and only explain the units and source of the data.

summary(emissions)
##  Country Name       Country Code       Indicator Name     Indicator Code    
##  Length:264         Length:264         Length:264         Length:264        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       1960               1961               1962               1963         
##  Min.   : 0.00802   Min.   : 0.00789   Min.   : 0.00848   Min.   : 0.00938  
##  1st Qu.: 0.18213   1st Qu.: 0.18025   1st Qu.: 0.20020   1st Qu.: 0.19774  
##  Median : 0.62003   Median : 0.64923   Median : 0.65233   Median : 0.64797  
##  Mean   : 2.04418   Mean   : 2.15748   Mean   : 2.24880   Mean   : 2.76342  
##  3rd Qu.: 1.70291   3rd Qu.: 1.74622   3rd Qu.: 1.94327   3rd Qu.: 1.72018  
##  Max.   :36.68518   Max.   :36.58378   Max.   :42.24200   Max.   :99.46300  
##  NA's   :72         NA's   :71         NA's   :69         NA's   :68        
##       1964              1965               1966               1967        
##  Min.   : 0.0116   Min.   : 0.01191   Min.   : 0.01326   Min.   : 0.0118  
##  1st Qu.: 0.2176   1st Qu.: 0.23441   1st Qu.: 0.24922   1st Qu.: 0.2518  
##  Median : 0.7661   Median : 0.69654   Median : 0.74914   Median : 0.8029  
##  Mean   : 2.9127   Mean   : 3.03167   Mean   : 3.04470   Mean   : 3.1112  
##  3rd Qu.: 2.0244   3rd Qu.: 2.19005   3rd Qu.: 2.45582   3rd Qu.: 2.9155  
##  Max.   :92.8595   Max.   :85.45859   Max.   :78.62712   Max.   :77.5086  
##  NA's   :61        NA's   :61         NA's   :61         NA's   :61       
##       1968              1969                1970               1971        
##  Min.   :-0.0201   Min.   :  0.01612   Min.   : 0.01229   Min.   : 0.0119  
##  1st Qu.: 0.2700   1st Qu.:  0.32092   1st Qu.: 0.34980   1st Qu.: 0.3405  
##  Median : 0.9867   Median :  1.05863   Median : 1.00036   Median : 1.0981  
##  Mean   : 3.3093   Mean   :  3.91912   Mean   : 4.19749   Mean   : 4.4219  
##  3rd Qu.: 3.2576   3rd Qu.:  3.59743   3rd Qu.: 4.01242   3rd Qu.: 4.5024  
##  Max.   :75.9753   Max.   :100.69767   Max.   :69.11160   Max.   :76.6415  
##  NA's   :61        NA's   :61          NA's   :59         NA's   :58       
##       1972               1973               1974               1975         
##  Min.   : 0.01153   Min.   : 0.01117   Min.   : 0.00974   Min.   : 0.00975  
##  1st Qu.: 0.35436   1st Qu.: 0.36669   1st Qu.: 0.37185   1st Qu.: 0.38111  
##  Median : 1.11133   Median : 1.13637   Median : 1.23269   Median : 1.28549  
##  Mean   : 4.48812   Mean   : 4.80584   Mean   : 4.49946   Mean   : 4.36611  
##  3rd Qu.: 4.51785   3rd Qu.: 5.11695   3rd Qu.: 4.64417   3rd Qu.: 4.85223  
##  Max.   :82.61945   Max.   :87.65265   Max.   :68.23258   Max.   :66.64312  
##  NA's   :56         NA's   :56         NA's   :56         NA's   :56        
##       1976               1977               1978               1979         
##  Min.   : 0.00991   Min.   : 0.01019   Min.   : 0.00738   Min.   : 0.00433  
##  1st Qu.: 0.36318   1st Qu.: 0.38780   1st Qu.: 0.40225   1st Qu.: 0.43831  
##  Median : 1.36287   Median : 1.43705   Median : 1.51862   Median : 1.57835  
##  Mean   : 4.35662   Mean   : 4.48666   Mean   : 4.51104   Mean   : 4.56304  
##  3rd Qu.: 5.17443   3rd Qu.: 5.28561   3rd Qu.: 5.74670   3rd Qu.: 5.49269  
##  Max.   :61.29021   Max.   :54.40915   Max.   :54.82565   Max.   :69.94185  
##  NA's   :56         NA's   :56         NA's   :56         NA's   :56        
##       1980               1981               1982               1983         
##  Min.   : 0.03563   Min.   : 0.02982   Min.   : 0.02843   Min.   : 0.03099  
##  1st Qu.: 0.44931   1st Qu.: 0.46582   1st Qu.: 0.45183   1st Qu.: 0.45037  
##  Median : 1.52564   Median : 1.52441   Median : 1.47916   Median : 1.36581  
##  Mean   : 4.46439   Mean   : 3.99356   Mean   : 3.87247   Mean   : 3.72682  
##  3rd Qu.: 5.49203   3rd Qu.: 5.30724   3rd Qu.: 5.37670   3rd Qu.: 5.40872  
##  Max.   :58.53435   Max.   :51.82543   Max.   :44.53605   Max.   :36.41181  
##  NA's   :56         NA's   :56         NA's   :56         NA's   :56        
##       1984               1985               1986               1987         
##  Min.   : 0.04113   Min.   : 0.03529   Min.   : 0.03567   Min.   : 0.03662  
##  1st Qu.: 0.47515   1st Qu.: 0.46528   1st Qu.: 0.44069   1st Qu.: 0.48470  
##  Median : 1.44877   Median : 1.54159   Median : 1.55041   Median : 1.63990  
##  Mean   : 3.82439   Mean   : 3.91770   Mean   : 3.90545   Mean   : 3.94261  
##  3rd Qu.: 5.25908   3rd Qu.: 5.56298   3rd Qu.: 4.97794   3rd Qu.: 5.38631  
##  Max.   :36.11639   Max.   :35.89097   Max.   :33.41411   Max.   :30.55837  
##  NA's   :56         NA's   :56         NA's   :55         NA's   :55        
##       1988               1989              1990               1991         
##  Min.   : 0.01182   Min.   : 0.0178   Min.   : 0.02401   Min.   : 0.01073  
##  1st Qu.: 0.50639   1st Qu.: 0.4992   1st Qu.: 0.46026   1st Qu.: 0.45024  
##  Median : 1.75620   Median : 1.6438   Median : 1.67303   Median : 1.86103  
##  Mean   : 4.07731   Mean   : 4.2133   Mean   : 4.08245   Mean   : 4.12135  
##  3rd Qu.: 5.79625   3rd Qu.: 5.8379   3rd Qu.: 5.91487   3rd Qu.: 5.98891  
##  Max.   :29.21023   Max.   :31.0288   Max.   :27.95925   Max.   :36.31713  
##  NA's   :55         NA's   :55        NA's   :49         NA's   :47        
##       1992               1993               1994               1995         
##  Min.   : 0.01328   Min.   : 0.01398   Min.   : 0.01516   Min.   : 0.01571  
##  1st Qu.: 0.57081   1st Qu.: 0.52776   1st Qu.: 0.57165   1st Qu.: 0.58101  
##  Median : 2.27881   Median : 2.23531   Median : 2.19315   Median : 2.32266  
##  Mean   : 4.47999   Mean   : 4.50271   Mean   : 4.42463   Mean   : 4.47459  
##  3rd Qu.: 6.47191   3rd Qu.: 6.65546   3rd Qu.: 6.44669   3rd Qu.: 6.47766  
##  Max.   :54.08917   Max.   :61.25241   Max.   :59.60109   Max.   :61.91238  
##  NA's   :23         NA's   :23         NA's   :22         NA's   :21        
##       1996               1997               1998               1999         
##  Min.   : 0.01722   Min.   : 0.01909   Min.   : 0.01938   Min.   : 0.02006  
##  1st Qu.: 0.61819   1st Qu.: 0.68144   1st Qu.: 0.70258   1st Qu.: 0.74136  
##  Median : 2.39780   Median : 2.27434   Median : 2.25260   Median : 2.25969  
##  Mean   : 4.49417   Mean   : 4.49199   Mean   : 4.48218   Mean   : 4.44955  
##  3rd Qu.: 6.75816   3rd Qu.: 6.57679   3rd Qu.: 6.55386   3rd Qu.: 6.69472  
##  Max.   :61.83934   Max.   :70.13564   Max.   :58.86600   Max.   :55.15501  
##  NA's   :21         NA's   :20         NA's   :19         NA's   :19        
##       2000               2001               2002               2003         
##  Min.   : 0.01729   Min.   : 0.01728   Min.   : 0.01862   Min.   : 0.01919  
##  1st Qu.: 0.74018   1st Qu.: 0.76470   1st Qu.: 0.75710   1st Qu.: 0.80170  
##  Median : 2.33916   Median : 2.43634   Median : 2.50363   Median : 2.62574  
##  Mean   : 4.57853   Mean   : 4.63067   Mean   : 4.59742   Mean   : 4.72905  
##  3rd Qu.: 6.60642   3rd Qu.: 6.91775   3rd Qu.: 6.94779   3rd Qu.: 7.24342  
##  Max.   :58.63936   Max.   :67.10602   Max.   :63.35447   Max.   :60.29957  
##  NA's   :19         NA's   :19         NA's   :18         NA's   :18        
##       2004               2005               2006               2007         
##  Min.   : 0.02261   Min.   : 0.02075   Min.   : 0.02437   Min.   : 0.02356  
##  1st Qu.: 0.83543   1st Qu.: 0.85543   1st Qu.: 0.79841   1st Qu.: 0.89101  
##  Median : 2.65745   Median : 2.76730   Median : 2.91508   Median : 2.88561  
##  Mean   : 4.77632   Mean   : 4.82026   Mean   : 4.89865   Mean   : 4.92978  
##  3rd Qu.: 7.13041   3rd Qu.: 7.03361   3rd Qu.: 7.03593   3rd Qu.: 6.92544  
##  Max.   :56.59083   Max.   :58.91873   Max.   :62.82354   Max.   :53.19099  
##  NA's   :18         NA's   :17         NA's   :16         NA's   :15        
##       2008               2009               2010               2011         
##  Min.   : 0.02322   Min.   : 0.02246   Min.   : 0.02426   Min.   : 0.02676  
##  1st Qu.: 0.81606   1st Qu.: 0.82785   1st Qu.: 0.82081   1st Qu.: 0.83982  
##  Median : 3.02796   Median : 2.95356   Median : 2.93322   Median : 2.92997  
##  Mean   : 4.93589   Mean   : 4.72189   Mean   : 4.84474   Mean   : 4.80634  
##  3rd Qu.: 7.01056   3rd Qu.: 6.31907   3rd Qu.: 6.64135   3rd Qu.: 6.71538  
##  Max.   :46.67214   Max.   :43.51448   Max.   :40.74202   Max.   :41.20565  
##  NA's   :15         NA's   :15         NA's   :15         NA's   :15        
##       2012              2013               2014            2015        
##  Min.   : 0.0303   Min.   : 0.03018   Min.   : 0.04449   Mode:logical  
##  1st Qu.: 0.8280   1st Qu.: 0.84854   1st Qu.: 0.88172   NA's:264      
##  Median : 3.0259   Median : 3.00557   Median : 3.15330                 
##  Mean   : 4.9488   Mean   : 4.86222   Mean   : 4.87468                 
##  3rd Qu.: 6.6646   3rd Qu.: 6.71259   3rd Qu.: 6.36518                 
##  Max.   :44.6179   Max.   :37.78009   Max.   :45.42324                 
##  NA's   :13        NA's   :13         NA's   :14                       
##    2016           2017           2018        
##  Mode:logical   Mode:logical   Mode:logical  
##  NA's:264       NA's:264       NA's:264      
##                                              
##                                              
##                                              
##                                              
## 
e2 <- emissions %>%
    select(-c(`Indicator Name`, `Indicator Code`))
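
As a rough sketch, the claim that the two indicator columns hold a single value can be checked by counting distinct values per column. The `toy` tibble below is a hypothetical stand-in for `emissions` (the indicator name string is an assumption about what the file contains); on the real data the same `summarise()` call would be applied to `emissions`.

```r
library(dplyr)

# Hypothetical stand-in for `emissions`; only the column layout mirrors the real file.
toy <- tibble(
  `Country Name`   = c("Canada", "India", "Arab World"),
  `Indicator Name` = rep("CO2 emissions (metric tons per capita)", 3),
  `Indicator Code` = rep("EN.ATM.CO2E.PC", 3)
)

# Count the distinct values in each indicator column; a count of 1 means the column is constant.
distinct_counts <- toy %>%
  summarise(across(c(`Indicator Name`, `Indicator Code`), n_distinct))

distinct_counts
```

If both counts are 1, the columns carry no per-row information and can be dropped safely.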

I want to transform the dataframe into a tidy dataset with all of the years as entries in the rows. I pivot the table to move the emissions per year to individual rows for each country. I also use the countrycode library to assign a region to each country.

e3 <- e2 %>%
    pivot_longer(
        !c(`Country Name`, `Country Code`),
        names_to = "year",
        values_to = "co2_emissions",
        values_drop_na = TRUE
        )
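
To make the reshaping step concrete, here is a minimal sketch on a made-up two-country table (the names, codes, and values are hypothetical). It uses the same `pivot_longer()` arguments as above and shows that `values_drop_na = TRUE` removes the missing country-year pairs.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table with made-up values; one column per year, as in the real file.
wide <- tibble(
  `Country Name` = c("A", "B"),
  `Country Code` = c("AAA", "BBB"),
  `1960` = c(1.0, NA),
  `1961` = c(1.5, 0.2)
)

# Every year column becomes a (year, co2_emissions) row; NA values are dropped.
long <- wide %>%
  pivot_longer(
    !c(`Country Name`, `Country Code`),
    names_to = "year",
    values_to = "co2_emissions",
    values_drop_na = TRUE
  )

long
```

Note that `year` comes out as a character column because it is built from column names, which is why it has to be converted before it can be used as a numeric time variable.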

e4 <- e3 %>%
    mutate(
        # Inside mutate() the column can be referenced directly, without e3$
        region = countrycode(sourcevar = `Country Name`,
                            origin = "country.name",
                            destination = "region")
    )

I found that after assigning countries to regions, some entries in the Country Name column are already regions, such as Europe & Central Asia and Arab World. There are also special groups such as IBRD only, which covers member countries of the World Bank Group. Because countrycode cannot match these names to a single country, it returns NA for their region, so dropping rows with missing values removes them. I keep only the data for individual countries so I can choose how to group them by region.

e5 <- e4 %>%
    drop_na() %>%
    rename(country_name = `Country Name`, country_code = `Country Code`)
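
The sketch below illustrates that filtering idea on three made-up rows. It assumes a countrycode version in which `destination = "region"` returns World Bank regions, and that an aggregate label such as "IBRD only" matches no country and therefore comes back as NA; `warn = FALSE` only silences the unmatched-name warning.

```r
library(dplyr)
library(tidyr)
library(countrycode)

# Hypothetical input: two real countries and one World Bank aggregate label.
toy <- tibble(country_name = c("Canada", "India", "IBRD only"))

# Names that cannot be matched to a country are expected to get region = NA.
toy_regions <- toy %>%
  mutate(region = countrycode(country_name,
                              origin = "country.name",
                              destination = "region",
                              warn = FALSE))

# Dropping rows with a missing region keeps only the individual countries.
toy_clean <- toy_regions %>% drop_na(region)

toy_clean
```

Naming the column in `drop_na(region)` makes the intent explicit; a bare `drop_na()` would also remove rows that are missing any other value.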

Statistical Analysis

I look at the mean CO2 emissions per year along with the standard deviation and plot both to see how the world overall has been performing. Because emissions vary so widely between countries, the standard deviation is large, but the mean has remained roughly steady over the last 30 years included in the data.

stats <- e5 %>% 
    group_by(year) %>%
    summarize(co2_mean = mean(co2_emissions),  co2_stdev = sd(co2_emissions))
## `summarise()` ungrouping output (override with `.groups` argument)
p1 <- stats %>% 
    ggplot(aes(x = year, y = co2_mean)) +
    geom_line(group=1) +
    geom_errorbar(aes(ymin = co2_mean - co2_stdev, ymax = co2_mean + co2_stdev)) + 
    scale_x_discrete(name ="Year", breaks=c("1960","1975","1990", "2005")) +
    theme_bw() 

p1

CO2 Emissions per Year

I create a new dataframe that groups the data by year and region. I also look at the mean CO2 emissions per region. To help with formatting in the plots, I convert the years to numeric values and I convert the regions into ordered factors. The order is based on general knowledge of which regions emit more CO2.

df <- e5 %>% 
    group_by(year, region) %>%
    summarize(co2_mean = mean(co2_emissions), co2_stdev = sd(co2_emissions))
## `summarise()` regrouping output by 'year' (override with `.groups` argument)
df$year <- as.numeric(df$year)
df$region <- factor(df$region, ordered = TRUE, levels = c("North America", "Middle East & North Africa", "Europe & Central Asia", "East Asia & Pacific", "Latin America & Caribbean", "Sub-Saharan Africa", "South Asia"))
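
A compact variant of the same preparation, sketched on made-up numbers: passing `.groups = "drop"` to `summarise()` returns an ungrouped result and avoids the regrouping message shown above, and the type conversions can be done in the same pipeline. The two-region level order here is only for illustration.

```r
library(dplyr)

# Hypothetical long-format data with made-up emission values.
toy <- tibble(
  year = c("1960", "1960", "1961", "1961"),
  region = c("North America", "South Asia", "North America", "South Asia"),
  co2_emissions = c(15, 0.3, 16, 0.4)
)

toy_summary <- toy %>%
  group_by(year, region) %>%
  # .groups = "drop" returns a plain, ungrouped tibble
  summarise(co2_mean = mean(co2_emissions), .groups = "drop") %>%
  mutate(
    year = as.numeric(year),
    region = factor(region, ordered = TRUE,
                    levels = c("North America", "South Asia"))
  )

toy_summary
```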

I first wanted to create an animated plot using ggplot together with gganimate. This bar chart is then saved as an animated GIF.

# gganimate takes the time variable from transition_time(), so no frame aesthetic is needed
ggplot(df, aes(x = region, y = co2_mean, fill = region)) +
    geom_bar(stat="identity") +
    theme_bw() +
    labs(title = "Average CO2 Emissions per Region", subtitle = "Year: {as.integer(frame_time)}") + 
    ylab("CO2 Emissions (metric tons per capita)") +
    xlab("Global Regions") +
    #transition_states(year, transition_length = 2, state_length = 1) + 
    transition_time(year, range = c(1960, 2014)) +
    ease_aes('sine-in-out') +
    theme(axis.text.x = element_blank(), axis.ticks = element_blank())

anim_save("animated-barplot-co2_emissions.gif")
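
When `anim_save()` is called with only a file name, it saves the most recently rendered animation. In a script it can be more predictable to render explicitly and pass the result along. The following is a minimal sketch of that pattern; the toy data, the `render_gif()` helper, and the output file name are all hypothetical, and the gifski package must be installed for `gifski_renderer()` to work.

```r
library(ggplot2)
library(gganimate)

# Hypothetical data: two regions over three years with made-up means.
toy <- data.frame(
  region = rep(c("A", "B"), times = 3),
  year = rep(c(1960, 1961, 1962), each = 2),
  co2_mean = c(1, 2, 1.5, 2.5, 2, 3)
)

p <- ggplot(toy, aes(x = region, y = co2_mean, fill = region)) +
  geom_col() +
  labs(subtitle = "Year: {as.integer(frame_time)}") +
  transition_time(year)

# Render the plot to a GIF and save that specific animation object,
# rather than relying on whatever was rendered last.
render_gif <- function(plot, file, nframes = 30, fps = 10) {
  anim <- animate(plot, nframes = nframes, fps = fps,
                  renderer = gifski_renderer())
  anim_save(file, animation = anim)
  invisible(file)
}

# Example call (renders and writes the file):
# render_gif(p, "toy-animation.gif")
```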

Next, I remade the same animated bar chart using plotly.

fig <- df %>%
  plot_ly(
    x = ~region, 
    y = ~co2_mean, 
    color = ~region, 
    frame = ~year, 
    text = ~co2_stdev, 
    hovertemplate = '<br>CO2_mean: %{y:.2f}</br>',
    type = 'bar'
  )

fig <- fig %>% layout(title = "Average CO2 Emissions per Region",
         yaxis = list(title = "CO2 Emissions (metric tons per capita)"),
         # plotly expects the legend title as a list with a `text` field
         legend = list(title = list(text = "Region")))

fig
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Conclusions

Overall, the average CO2 emissions per region show a large early spike in the Middle East & North Africa, after which the top three regions remain consistent, as one would expect: North America leads, followed by the Middle East & North Africa and Europe & Central Asia. It should be noted that emissions in Sub-Saharan Africa and South Asia are steadily rising as those regions become more industrialized.