This dataset contains sustainable energy indicators for 176 countries from 2000 to 2020. The indicators include access to electricity, renewable energy capacity, energy intensity (energy use per unit of GDP at purchasing power parity), and financial flows (aid from developed countries for clean energy projects). This project explored the evolving relationship between renewable energy adoption (measured by the share of renewable energy in a country's total final energy consumption) and CO2 emissions (recorded in the dataset as total kilotons per country), focusing on the four countries with the highest average emissions. The data and background information were primarily drawn from the online scientific publication Our World in Data, with additional references from the World Bank and the International Energy Agency.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
energy <- read_csv("global-data-on-sustainable-energy (1).csv")
## Rows: 3649 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Entity
## dbl (19): Year, Access to electricity (% of population), Access to clean fue...
## num (1): Density\n(P/Km2)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(energy) <- tolower(names(energy))
names(energy) <- gsub(" ", "_", names(energy))
head(energy)
## # A tibble: 6 × 21
## entity year access_to_electricity_(%_of_populat…¹ access_to_clean_fuel…²
## <chr> <dbl> <dbl> <dbl>
## 1 Afghanistan 2000 1.61 6.2
## 2 Afghanistan 2001 4.07 7.2
## 3 Afghanistan 2002 9.41 8.2
## 4 Afghanistan 2003 14.7 9.5
## 5 Afghanistan 2004 20.1 10.9
## 6 Afghanistan 2005 25.4 12.2
## # ℹ abbreviated names: ¹`access_to_electricity_(%_of_population)`,
## # ²access_to_clean_fuels_for_cooking
## # ℹ 17 more variables:
## # `renewable-electricity-generating-capacity-per-capita` <dbl>,
## # `financial_flows_to_developing_countries_(us_$)` <dbl>,
## # `renewable_energy_share_in_the_total_final_energy_consumption_(%)` <dbl>,
## # `electricity_from_fossil_fuels_(twh)` <dbl>, …
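After the two renaming steps above, several column names still contain parentheses, percent signs, or hyphens, which is why they have to be wrapped in backticks later on. A stricter cleaning step could look like the sketch below. Note that clean_names_sketch is a hypothetical helper, not part of the original analysis, and adopting it would change the column names used in the rest of the code.

```r
# Hypothetical helper: lowercases names and collapses every run of
# characters other than a-z and 0-9 into a single underscore.
clean_names_sketch <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  # Drop underscores left over at the start or end of a name.
  gsub("^_+|_+$", "", x)
}

# Sample names taken from the column listing above (original capitalization).
raw_names <- c(
  "Entity",
  "Access to electricity (% of population)",
  "Renewable-electricity-generating-capacity-per-capita",
  "Value_co2_emissions_kt_by_country"
)

cleaned_names <- clean_names_sketch(raw_names)
cleaned_names
```

Names cleaned this way are syntactic in R, so they can be used in select() or mutate() without backticks.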
energy |>
  group_by(entity) |>
  summarise(mean_co2 = mean(value_co2_emissions_kt_by_country, na.rm = TRUE)) |>
  arrange(desc(mean_co2))
## # A tibble: 176 × 2
## entity mean_co2
## <chr> <dbl>
## 1 China 7636642.
## 2 United States 5329539.
## 3 India 1633979.
## 4 Japan 1183734.
## 5 Germany 773645.
## 6 Canada 547645.
## 7 United Kingdom 470604.
## 8 Mexico 444619.
## 9 Indonesia 420334.
## 10 Saudi Arabia 416248.
## # ℹ 166 more rows
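The four leading countries are typed by hand in the next step. As an alternative, they could be pulled from the summary programmatically. The sketch below assumes the summary is stored in a tibble called mean_co2_by_entity (a hypothetical name). Here it is filled with the first six rows of the printed output, rounded as displayed.

```r
library(tidyverse)

# Stand-in for the summary table: first six rows of the output above.
mean_co2_by_entity <- tibble(
  entity = c("China", "United States", "India", "Japan", "Germany", "Canada"),
  mean_co2 = c(7636642, 5329539, 1633979, 1183734, 773645, 547645)
)

# Keep the four entities with the largest mean CO2 emissions.
top_entities <- mean_co2_by_entity |>
  slice_max(mean_co2, n = 4) |>
  pull(entity)

top_entities
```

The resulting vector could then replace the hard-coded country names, as in filter(entity %in% top_entities).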
co2_leaders <- energy |>
  filter(entity %in% c("China", "United States", "India", "Japan")) |>
  # The source column is in kilotons, so dividing by 10^6 gives billions of metric tons
  mutate(co2_emissions = value_co2_emissions_kt_by_country / 10^6) |>
  select(entity, year, `renewable_energy_share_in_the_total_final_energy_consumption_(%)`, co2_emissions)
co2_leaders
## # A tibble: 84 × 4
## entity year renewable_energy_share_in_the_total_final_energy…¹ co2_emissions
## <chr> <dbl> <dbl> <dbl>
## 1 China 2000 29.6 3.35
## 2 China 2001 28.4 3.53
## 3 China 2002 27 3.81
## 4 China 2003 23.9 4.42
## 5 China 2004 20.2 5.12
## 6 China 2005 17.4 5.82
## 7 China 2006 16.4 6.44
## 8 China 2007 14.9 6.99
## 9 China 2008 14.1 7.20
## 10 China 2009 13.4 7.72
## # ℹ 74 more rows
## # ℹ abbreviated name:
## # ¹`renewable_energy_share_in_the_total_final_energy_consumption_(%)`
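The code for the first plot, p1, which is passed to grid.arrange(p1, p2) further down, does not appear in this section. A minimal sketch of what it might look like follows, assuming it mirrors p2 with the renewable energy share on the y-axis. The axis label and title are guesses rather than the original text, and the three stand-in rows (copied from the printout above) are used only if co2_leaders is not already defined.

```r
library(tidyverse)

# Stand-in data, used only when the real co2_leaders object is missing.
if (!exists("co2_leaders")) {
  co2_leaders <- tibble(
    entity = "China",
    year = c(2000, 2001, 2002),
    `renewable_energy_share_in_the_total_final_energy_consumption_(%)` = c(29.6, 28.4, 27),
    co2_emissions = c(3.35, 3.53, 3.81)
  )
}

# Assumed reconstruction of p1: same layers and theme as p2.
p1 <- co2_leaders |>
  ggplot(aes(
    x = year,
    y = `renewable_energy_share_in_the_total_final_energy_consumption_(%)`,
    color = entity
  )) +
  geom_point() +
  geom_line() +
  labs(
    x = "Year",
    y = "Renewable Energy Share\n(% of total final energy consumption)",
    title = "Renewable Energy Share in the Highest-Emitting Countries"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1", name = "Country")
```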
p2 <- co2_leaders |>
  ggplot(aes(x = year, y = co2_emissions, color = entity)) +
  geom_point() +
  geom_line() +
  labs(
    x = "Year",
    # co2_emissions is total emissions in kt divided by 10^6, not a per-capita value
    y = "CO2 Emissions\n(billions of metric tons)",
    title = "CO2 Emissions in the Highest-Emitting Countries (2000-2019)"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Set1", name = "Country")
p2
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing missing values (`geom_line()`).
The grid.arrange() function shows both plots on the same page.
# install.packages("gridExtra")
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(p1, p2)
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
## Warning: Removed 4 rows containing missing values (`geom_line()`).
In this project, the data was cleaned by lowercasing all variable names and replacing spaces with underscores, and missing values were dropped from calculations. The first visualization shows yearly changes in the renewable share of total final energy consumption for the four countries with the highest mean total CO2 emissions over the 2000-2020 period covered by the data. The second visualization shows yearly changes in total CO2 emissions in those same countries. In China and India, the renewable energy share fell over the period while CO2 emissions rose, though not necessarily at the same rate. By contrast, Japan and the US lowered their CO2 emissions as they increased their renewable energy share, which suggests that adopting clean energy sources can help reduce carbon footprints and mitigate climate change. With more time, I would have liked to create a heatmap of all countries, color-coded by renewable electricity generating capacity or energy consumption per person.
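One way to put a number on the pattern described above is a per-country correlation between renewable share and CO2 emissions. The sketch below was not part of the original analysis. The names renewable_share and share_vs_co2 are introduced here for illustration, and the stand-in rows (copied from the co2_leaders printout) are used only if co2_leaders is not already defined. A correlation describes how the two series move together and does not show that one causes the other.

```r
library(tidyverse)

# Stand-in data, used only when the real co2_leaders object is missing.
if (!exists("co2_leaders")) {
  co2_leaders <- tibble(
    entity = "China",
    year = c(2000, 2001, 2002),
    `renewable_energy_share_in_the_total_final_energy_consumption_(%)` = c(29.6, 28.4, 27),
    co2_emissions = c(3.35, 3.53, 3.81)
  )
}

share_vs_co2 <- co2_leaders |>
  rename(renewable_share = `renewable_energy_share_in_the_total_final_energy_consumption_(%)`) |>
  group_by(entity) |>
  summarise(
    # Number of years with both values present
    n_years = sum(!is.na(renewable_share) & !is.na(co2_emissions)),
    # Pearson correlation over the complete pairs only
    correlation = cor(renewable_share, co2_emissions, use = "complete.obs")
  )

share_vs_co2
```

A strongly negative value for a country means that years with a higher renewable share tended to be years with lower emissions.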