This data covers global carbon dioxide (CO2) emissions for each country from 1960 to 2014 (the file also contains columns for 2015 to 2018, but they are empty). The data includes the country name, the corresponding country code, annual CO2 emissions in metric tons per capita, an indicator name field specifying the units for CO2 emissions, and an indicator code field (EN.ATM.CO2E.PC) identifying the dataset.
This data was collected by the Carbon Dioxide Information Analysis Center under the Environmental Sciences Division at Oak Ridge National Laboratory in Tennessee. The data is made publicly available by the World Bank Group.
It’s important to understand CO2 emissions because carbon dioxide is a greenhouse gas that contributes to global climate change. The more greenhouse gases accumulate in the atmosphere, the faster average global temperatures rise. These temperature changes can harm ecosystems and even our quality of life. Because the values are per capita, the annual emissions by country in this data provide a straightforward way to identify which countries emit the most CO2 per person. On their own they do not show which countries contribute the most to total global emissions, since that also depends on population.
Sources: https://ourworldindata.org/co2-emissions https://data.worldbank.org/indicator/EN.ATM.CO2E.PC
Import the libraries and the dataset. I will use tidyverse to clean up the data, countrycode to assign each country to a region, and plotly to visualize and animate the data. I also want to test animations using ggplot2 with the gganimate library. The CO2 Global Emissions data is contained in a CSV file that I read in using the default settings of read_csv().
library(tidyverse)
library(gganimate)
library(countrycode)
#library(gifski)  # not loaded; gganimate only needs gifski installed to render GIFs
library(plotly)
csvfile <- "co2_global_emissions.csv"
emissions <- read_csv(csvfile)
After looking at the columns in the data, I determined that Indicator Name and Indicator Code are not useful. They hold the same value in every row and only describe the units and the dataset. The summary below also shows that the year columns 2015 through 2018 are entirely NA.
summary(emissions)
## Country Name Country Code Indicator Name Indicator Code
## Length:264 Length:264 Length:264 Length:264
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 1960 1961 1962 1963
## Min. : 0.00802 Min. : 0.00789 Min. : 0.00848 Min. : 0.00938
## 1st Qu.: 0.18213 1st Qu.: 0.18025 1st Qu.: 0.20020 1st Qu.: 0.19774
## Median : 0.62003 Median : 0.64923 Median : 0.65233 Median : 0.64797
## Mean : 2.04418 Mean : 2.15748 Mean : 2.24880 Mean : 2.76342
## 3rd Qu.: 1.70291 3rd Qu.: 1.74622 3rd Qu.: 1.94327 3rd Qu.: 1.72018
## Max. :36.68518 Max. :36.58378 Max. :42.24200 Max. :99.46300
## NA's :72 NA's :71 NA's :69 NA's :68
## 1964 1965 1966 1967
## Min. : 0.0116 Min. : 0.01191 Min. : 0.01326 Min. : 0.0118
## 1st Qu.: 0.2176 1st Qu.: 0.23441 1st Qu.: 0.24922 1st Qu.: 0.2518
## Median : 0.7661 Median : 0.69654 Median : 0.74914 Median : 0.8029
## Mean : 2.9127 Mean : 3.03167 Mean : 3.04470 Mean : 3.1112
## 3rd Qu.: 2.0244 3rd Qu.: 2.19005 3rd Qu.: 2.45582 3rd Qu.: 2.9155
## Max. :92.8595 Max. :85.45859 Max. :78.62712 Max. :77.5086
## NA's :61 NA's :61 NA's :61 NA's :61
## 1968 1969 1970 1971
## Min. :-0.0201 Min. : 0.01612 Min. : 0.01229 Min. : 0.0119
## 1st Qu.: 0.2700 1st Qu.: 0.32092 1st Qu.: 0.34980 1st Qu.: 0.3405
## Median : 0.9867 Median : 1.05863 Median : 1.00036 Median : 1.0981
## Mean : 3.3093 Mean : 3.91912 Mean : 4.19749 Mean : 4.4219
## 3rd Qu.: 3.2576 3rd Qu.: 3.59743 3rd Qu.: 4.01242 3rd Qu.: 4.5024
## Max. :75.9753 Max. :100.69767 Max. :69.11160 Max. :76.6415
## NA's :61 NA's :61 NA's :59 NA's :58
## 1972 1973 1974 1975
## Min. : 0.01153 Min. : 0.01117 Min. : 0.00974 Min. : 0.00975
## 1st Qu.: 0.35436 1st Qu.: 0.36669 1st Qu.: 0.37185 1st Qu.: 0.38111
## Median : 1.11133 Median : 1.13637 Median : 1.23269 Median : 1.28549
## Mean : 4.48812 Mean : 4.80584 Mean : 4.49946 Mean : 4.36611
## 3rd Qu.: 4.51785 3rd Qu.: 5.11695 3rd Qu.: 4.64417 3rd Qu.: 4.85223
## Max. :82.61945 Max. :87.65265 Max. :68.23258 Max. :66.64312
## NA's :56 NA's :56 NA's :56 NA's :56
## 1976 1977 1978 1979
## Min. : 0.00991 Min. : 0.01019 Min. : 0.00738 Min. : 0.00433
## 1st Qu.: 0.36318 1st Qu.: 0.38780 1st Qu.: 0.40225 1st Qu.: 0.43831
## Median : 1.36287 Median : 1.43705 Median : 1.51862 Median : 1.57835
## Mean : 4.35662 Mean : 4.48666 Mean : 4.51104 Mean : 4.56304
## 3rd Qu.: 5.17443 3rd Qu.: 5.28561 3rd Qu.: 5.74670 3rd Qu.: 5.49269
## Max. :61.29021 Max. :54.40915 Max. :54.82565 Max. :69.94185
## NA's :56 NA's :56 NA's :56 NA's :56
## 1980 1981 1982 1983
## Min. : 0.03563 Min. : 0.02982 Min. : 0.02843 Min. : 0.03099
## 1st Qu.: 0.44931 1st Qu.: 0.46582 1st Qu.: 0.45183 1st Qu.: 0.45037
## Median : 1.52564 Median : 1.52441 Median : 1.47916 Median : 1.36581
## Mean : 4.46439 Mean : 3.99356 Mean : 3.87247 Mean : 3.72682
## 3rd Qu.: 5.49203 3rd Qu.: 5.30724 3rd Qu.: 5.37670 3rd Qu.: 5.40872
## Max. :58.53435 Max. :51.82543 Max. :44.53605 Max. :36.41181
## NA's :56 NA's :56 NA's :56 NA's :56
## 1984 1985 1986 1987
## Min. : 0.04113 Min. : 0.03529 Min. : 0.03567 Min. : 0.03662
## 1st Qu.: 0.47515 1st Qu.: 0.46528 1st Qu.: 0.44069 1st Qu.: 0.48470
## Median : 1.44877 Median : 1.54159 Median : 1.55041 Median : 1.63990
## Mean : 3.82439 Mean : 3.91770 Mean : 3.90545 Mean : 3.94261
## 3rd Qu.: 5.25908 3rd Qu.: 5.56298 3rd Qu.: 4.97794 3rd Qu.: 5.38631
## Max. :36.11639 Max. :35.89097 Max. :33.41411 Max. :30.55837
## NA's :56 NA's :56 NA's :55 NA's :55
## 1988 1989 1990 1991
## Min. : 0.01182 Min. : 0.0178 Min. : 0.02401 Min. : 0.01073
## 1st Qu.: 0.50639 1st Qu.: 0.4992 1st Qu.: 0.46026 1st Qu.: 0.45024
## Median : 1.75620 Median : 1.6438 Median : 1.67303 Median : 1.86103
## Mean : 4.07731 Mean : 4.2133 Mean : 4.08245 Mean : 4.12135
## 3rd Qu.: 5.79625 3rd Qu.: 5.8379 3rd Qu.: 5.91487 3rd Qu.: 5.98891
## Max. :29.21023 Max. :31.0288 Max. :27.95925 Max. :36.31713
## NA's :55 NA's :55 NA's :49 NA's :47
## 1992 1993 1994 1995
## Min. : 0.01328 Min. : 0.01398 Min. : 0.01516 Min. : 0.01571
## 1st Qu.: 0.57081 1st Qu.: 0.52776 1st Qu.: 0.57165 1st Qu.: 0.58101
## Median : 2.27881 Median : 2.23531 Median : 2.19315 Median : 2.32266
## Mean : 4.47999 Mean : 4.50271 Mean : 4.42463 Mean : 4.47459
## 3rd Qu.: 6.47191 3rd Qu.: 6.65546 3rd Qu.: 6.44669 3rd Qu.: 6.47766
## Max. :54.08917 Max. :61.25241 Max. :59.60109 Max. :61.91238
## NA's :23 NA's :23 NA's :22 NA's :21
## 1996 1997 1998 1999
## Min. : 0.01722 Min. : 0.01909 Min. : 0.01938 Min. : 0.02006
## 1st Qu.: 0.61819 1st Qu.: 0.68144 1st Qu.: 0.70258 1st Qu.: 0.74136
## Median : 2.39780 Median : 2.27434 Median : 2.25260 Median : 2.25969
## Mean : 4.49417 Mean : 4.49199 Mean : 4.48218 Mean : 4.44955
## 3rd Qu.: 6.75816 3rd Qu.: 6.57679 3rd Qu.: 6.55386 3rd Qu.: 6.69472
## Max. :61.83934 Max. :70.13564 Max. :58.86600 Max. :55.15501
## NA's :21 NA's :20 NA's :19 NA's :19
## 2000 2001 2002 2003
## Min. : 0.01729 Min. : 0.01728 Min. : 0.01862 Min. : 0.01919
## 1st Qu.: 0.74018 1st Qu.: 0.76470 1st Qu.: 0.75710 1st Qu.: 0.80170
## Median : 2.33916 Median : 2.43634 Median : 2.50363 Median : 2.62574
## Mean : 4.57853 Mean : 4.63067 Mean : 4.59742 Mean : 4.72905
## 3rd Qu.: 6.60642 3rd Qu.: 6.91775 3rd Qu.: 6.94779 3rd Qu.: 7.24342
## Max. :58.63936 Max. :67.10602 Max. :63.35447 Max. :60.29957
## NA's :19 NA's :19 NA's :18 NA's :18
## 2004 2005 2006 2007
## Min. : 0.02261 Min. : 0.02075 Min. : 0.02437 Min. : 0.02356
## 1st Qu.: 0.83543 1st Qu.: 0.85543 1st Qu.: 0.79841 1st Qu.: 0.89101
## Median : 2.65745 Median : 2.76730 Median : 2.91508 Median : 2.88561
## Mean : 4.77632 Mean : 4.82026 Mean : 4.89865 Mean : 4.92978
## 3rd Qu.: 7.13041 3rd Qu.: 7.03361 3rd Qu.: 7.03593 3rd Qu.: 6.92544
## Max. :56.59083 Max. :58.91873 Max. :62.82354 Max. :53.19099
## NA's :18 NA's :17 NA's :16 NA's :15
## 2008 2009 2010 2011
## Min. : 0.02322 Min. : 0.02246 Min. : 0.02426 Min. : 0.02676
## 1st Qu.: 0.81606 1st Qu.: 0.82785 1st Qu.: 0.82081 1st Qu.: 0.83982
## Median : 3.02796 Median : 2.95356 Median : 2.93322 Median : 2.92997
## Mean : 4.93589 Mean : 4.72189 Mean : 4.84474 Mean : 4.80634
## 3rd Qu.: 7.01056 3rd Qu.: 6.31907 3rd Qu.: 6.64135 3rd Qu.: 6.71538
## Max. :46.67214 Max. :43.51448 Max. :40.74202 Max. :41.20565
## NA's :15 NA's :15 NA's :15 NA's :15
## 2012 2013 2014 2015
## Min. : 0.0303 Min. : 0.03018 Min. : 0.04449 Mode:logical
## 1st Qu.: 0.8280 1st Qu.: 0.84854 1st Qu.: 0.88172 NA's:264
## Median : 3.0259 Median : 3.00557 Median : 3.15330
## Mean : 4.9488 Mean : 4.86222 Mean : 4.87468
## 3rd Qu.: 6.6646 3rd Qu.: 6.71259 3rd Qu.: 6.36518
## Max. :44.6179 Max. :37.78009 Max. :45.42324
## NA's :13 NA's :13 NA's :14
## 2016 2017 2018
## Mode:logical Mode:logical Mode:logical
## NA's:264 NA's:264 NA's:264
##
##
##
##
##
e2 <- emissions %>%
select(-c(`Indicator Name`, `Indicator Code`))
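You can also check in code, rather than by eye, that Indicator Name and Indicator Code hold the same value in every row. Here is a minimal sketch on a hypothetical toy tibble. The country rows, codes and emission values are made up for illustration and are not taken from the file. The code flags every column with exactly one distinct value:

```r
library(dplyr)
library(tibble)

# Hypothetical toy data shaped like the raw file; values are illustrative only
toy <- tibble(
  `Country Name`   = c("Country A", "Country B"),
  `Country Code`   = c("AAA", "BBB"),
  `Indicator Name` = "CO2 emissions (metric tons per capita)",
  `Indicator Code` = "EN.ATM.CO2E.PC",
  `1960`           = c(0.5, NA)
)

# A column is constant if it holds a single distinct value
# (n_distinct() counts NA as a value, so c(0.5, NA) has two)
is_constant <- sapply(toy, function(col) n_distinct(col) == 1)
constant_cols <- names(toy)[is_constant]
constant_cols
```

An all-NA column, such as 2015 to 2018 in the real file, also has a single distinct value and would be flagged by the same check.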
I want to transform the dataframe into a tidy dataset in which each row holds one country-year observation. I pivot the year columns into rows with pivot_longer() and drop missing values, which also removes the empty 2015 to 2018 columns. I then use the countrycode library to assign a region to each country.
e3 <- e2 %>%
pivot_longer(
!c(`Country Name`, `Country Code`),
names_to = "year",
values_to = "co2_emissions",
values_drop_na = TRUE
)
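To see what pivot_longer() does with the empty year columns, here is a small sketch on hypothetical data. The country names, codes and values are invented. read_csv() reads the all-NA 2015 column as logical. pivot_longer() combines it with the numeric years, and values_drop_na = TRUE then removes it. The resulting year column is character, which is why it is converted to numeric later on.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: one column per year, as in the raw CSV
wide <- tibble(
  `Country Name` = c("Country A", "Country B"),
  `Country Code` = c("AAA", "BBB"),
  `2013` = c(1.5, NA),
  `2014` = c(1.6, 2.1),
  `2015` = c(NA, NA)  # all-NA logical column, like 2015-2018 in the real file
)

long <- pivot_longer(
  wide,
  !c(`Country Name`, `Country Code`),  # pivot every year column
  names_to = "year",                  # old column names become values of `year`
  values_to = "co2_emissions",
  values_drop_na = TRUE               # drop country-years with no measurement
)
long
```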
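e4 <- e3 %>%
  mutate(
    # Map each country name to its World Bank region; names that are not
    # countries (aggregates such as "Arab World") get NA and trigger a warning
    region = countrycode(sourcevar = `Country Name`,
                         origin = "country.name",
                         destination = "region")
  )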
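Matching on free-text country names relies on countrycode's built-in regular expressions. A possible alternative is to match on the Country Code column instead. This assumes that column holds ISO 3166-1 alpha-3 codes for individual countries; the World Bank's codes are close to, but not identical with, ISO codes.

In this sketch, ARB (the World Bank's code for the Arab World aggregate) has no ISO match and comes back as NA:

```r
library(countrycode)

# Two country codes and one World Bank aggregate code (ARB = Arab World)
codes <- c("USA", "IND", "ARB")

# warn = FALSE silences the "not matched" warning for the aggregate code
regions <- countrycode(codes, origin = "iso3c", destination = "region", warn = FALSE)
regions
```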
After assigning countries to regions, I found that some entries in the Country Name column are aggregates rather than countries, such as Europe & Central Asia and Arab World. countrycode cannot assign these a region. There are also special groups such as IBRD only, which is a World Bank lending-category aggregate. I remove these rows and keep only individual countries so that I can choose how to group them by region.
e5 <- e4 %>%
  # Drop rows whose name did not match a country (region is NA)
  drop_na(region) %>%
  rename(country_name = `Country Name`, country_code = `Country Code`)
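Before dropping rows, it can help to list exactly which names failed to match. This confirms that only aggregates are removed and that no real country is lost to a spelling mismatch. Here is a sketch on a hypothetical stand-in for e4; the values are placeholders:

```r
library(dplyr)
library(tibble)
library(tidyr)

# Hypothetical stand-in for e4: region is NA where countrycode found no country
e4_toy <- tibble(
  `Country Name` = c("India", "India", "Arab World", "IBRD only"),
  year = c("2013", "2014", "2014", "2014"),
  co2_emissions = c(1, 1, 2, 3),  # placeholder values
  region = c("South Asia", "South Asia", NA, NA)
)

# Names that will be removed because no region was assigned
unmatched <- e4_toy %>%
  filter(is.na(region)) %>%
  distinct(`Country Name`) %>%
  pull(`Country Name`)
unmatched

# Keep only rows that have a region
kept <- e4_toy %>% drop_na(region)
```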
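I compute the mean and standard deviation of per-capita CO2 emissions across countries for each year and plot them to see how the world as a whole has been trending. This is an unweighted mean over countries, not world emissions per person: each country counts equally regardless of population. Because emissions vary widely across countries, the standard deviation is large. The average emissions, however, have remained relatively steady over the last 30 years of the data.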
stats <- e5 %>%
group_by(year) %>%
summarize(co2_mean = mean(co2_emissions), co2_stdev = sd(co2_emissions))
## `summarise()` ungrouping output (override with `.groups` argument)
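p1 <- stats %>%
  ggplot(aes(x = year, y = co2_mean)) +
  # year is still a character column, so group = 1 joins the points into one line
  geom_line(group = 1) +
  # Error bars span one standard deviation above and below the mean
  geom_errorbar(aes(ymin = co2_mean - co2_stdev, ymax = co2_mean + co2_stdev)) +
  scale_x_discrete(name = "Year", breaks = c("1960", "1975", "1990", "2005")) +
  ylab("Mean CO2 Emissions (metric tons per capita)") +
  theme_bw()
p1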
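To make concrete what each point and error bar represents, here is the same per-year summary on three made-up countries. Each point is the year's mean, and each error bar extends one sample standard deviation (what sd() returns) above and below it.

```r
library(dplyr)
library(tibble)

# Hypothetical per-capita values for three countries in two years
toy <- tibble(
  year = c("1960", "1960", "1960", "1961", "1961", "1961"),
  co2_emissions = c(1, 2, 3, 2, 4, 6)
)

toy_stats <- toy %>%
  group_by(year) %>%
  summarize(co2_mean = mean(co2_emissions),
            co2_stdev = sd(co2_emissions),   # sample SD (n - 1 denominator)
            .groups = "drop") %>%
  # The error bar limits used in the plot
  mutate(ymin = co2_mean - co2_stdev, ymax = co2_mean + co2_stdev)
toy_stats
```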
Next, I create a new dataframe with the mean and standard deviation of per-capita CO2 emissions for each region in each year. To help with formatting in the plots, I convert the years to numeric values and the regions to ordered factors. The order is based on general knowledge of which regions emit more CO2 per capita.
df <- e5 %>%
  group_by(year, region) %>%
  summarize(co2_mean = mean(co2_emissions), co2_stdev = sd(co2_emissions),
            .groups = "drop") %>%
  # Numeric years for transition_time(); ordered regions fix the bar order
  mutate(year = as.numeric(year),
         region = factor(region, ordered = TRUE, levels = c("North America", "Middle East & North Africa", "Europe & Central Asia", "East Asia & Pacific", "Latin America & Caribbean", "Sub-Saharan Africa", "South Asia")))
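The region order above is set by hand. If you prefer an order driven by the data, one alternative (not what was done here) is forcats::fct_reorder(). It sorts the factor levels by a summary of another column. Here is a sketch with made-up values:

```r
library(dplyr)
library(tibble)
library(forcats)

# Hypothetical region-year means; values are illustrative only
toy <- tibble(
  year = c(1960, 1960, 1961, 1961),
  region = c("South Asia", "North America", "South Asia", "North America"),
  co2_mean = c(0.3, 15, 0.4, 16)
)

# Order regions by their mean over all years, highest first
toy <- toy %>%
  mutate(region = fct_reorder(region, co2_mean, .fun = mean, .desc = TRUE))
levels(toy$region)
```

ggplot2 draws a discrete axis in factor-level order, so with this approach the bars would follow the data rather than a fixed list.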
I first wanted to create an animated plot using ggplot2 with gganimate. The bar chart is then saved as an animated GIF.
anim <- ggplot(df, aes(x = region, y = co2_mean, fill = region)) +
  geom_col() +
  theme_bw() +
  labs(title = "Average CO2 Emissions per Region",
       subtitle = "Year: {as.integer(frame_time)}",
       x = "Global Regions",
       y = "CO2 Emissions (metric tons per capita)") +
  # Hide the x-axis region labels; the fill legend identifies each bar
  theme(axis.text.x = element_blank(), axis.ticks = element_blank()) +
  # Animate over the numeric year column between 1960 and 2014
  transition_time(year, range = c(1960, 2014)) +
  ease_aes("sine-in-out")

# Render the animation and save it as a GIF (needs the gifski package installed)
anim_save("animated-barplot-co2_emissions.gif", animation = anim)
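How smooth the GIF looks depends on how many frames are spread across the 55 years. This assumes gganimate's documented defaults for animate(), which are 100 frames at 10 frames per second. The arithmetic works out roughly as follows:

```r
# Years covered by transition_time(year, range = c(1960, 2014))
years <- 1960:2014
n_years <- length(years)          # 55 yearly values

# gganimate's default settings for animate()
nframes <- 100
fps <- 10

duration_s <- nframes / fps             # 10 seconds of animation
frames_per_year <- nframes / n_years    # roughly 1.8 frames per year on average
c(duration_s = duration_s, frames_per_year = round(frames_per_year, 2))

# To give each year about two frames, the gganim object could be rendered with:
# animate(plot, nframes = 2 * n_years, fps = 10)
```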
Next, I remade the same animated bar chart using plotly.
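fig <- df %>%
  plot_ly(
    x = ~region,
    y = ~co2_mean,
    color = ~region,
    frame = ~year,
    # Pre-format the standard deviation so the tooltip can display it
    text = ~sprintf("%.2f", co2_stdev),
    hovertemplate = "CO2 mean: %{y:.2f}<br>Std. dev.: %{text}<extra></extra>",
    type = "bar"
  )
fig <- fig %>% layout(title = "Average CO2 Emissions per Region",
                      xaxis = list(title = "Global Regions"),
                      yaxis = list(title = "CO2 Emissions (metric tons per capita)"),
                      legend = list(title = list(text = "Region")))
fig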
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Overall, the average per-capita CO2 emissions by region show a large spike in the Middle East & North Africa. After that, the top three regions remain consistent, as one would expect. North America leads, followed by the Middle East & North Africa and then Europe & Central Asia. Sub-Saharan Africa and South Asia are also rising steadily, likely as they become more industrialized.
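The observation that Sub-Saharan Africa and South Asia are rising could be quantified by fitting a linear trend to each region's yearly mean. Here is a minimal sketch using the ordinary least-squares slope, cov(year, y) / var(year), on made-up values:

```r
library(dplyr)
library(tibble)

# Hypothetical region-year means; values are illustrative only
toy <- tibble(
  region = rep(c("South Asia", "North America"), each = 3),
  year = rep(c(2000, 2001, 2002), times = 2),
  co2_mean = c(1.0, 1.1, 1.2, 20, 20, 20)
)

# OLS slope of co2_mean on year, in metric tons per capita per year
trends <- toy %>%
  group_by(region) %>%
  summarize(slope = cov(year, co2_mean) / var(year), .groups = "drop")
trends
```

Applied to df, a positive slope for a region would support the rising trend. The slope alone says nothing about the cause.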