About the Dataset
The dataset used here is titled “Sustainable Energy for All” and measures energy access and sustainable energy consumption. Its metrics range from access to electricity for the urban and rural populations of different countries to hydro energy consumption (TJ). From GitHub: “Beyond the raw metrics, this dataset offers a window into how nations are balancing growth with green initiatives, challenging us to visualize the actual momentum behind the global energy transition.” One of my guiding questions also comes from the GitHub page: What form of renewable energy has, on average, experienced the fastest rate of adoption? I also want to explore which countries have the largest renewable share of energy consumption.
library(ggplot2)
library(data.table)
library(RColorBrewer)
pal_set2 <- brewer.pal(8, "Set2")
pal_ylorrd <- brewer.pal(9, "YlOrRd")
pal_rdylgn <- brewer.pal(11, "RdYlGn")
library(scales)
library(forcats)
energy <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-05-26/energy_cleaned.csv')
## Rows: 5271 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country_name, country_code
## dbl (50): yr, access_non_solid_fuel_rural_pop_pct, access_non_solid_fuel_tot...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
setDT(energy)
cat("Rows: ", nrow(energy), "\n")
## Rows: 5271
cat("Columns: ", ncol(energy), "\n")
## Columns: 52
cat("Years: ", min(energy$yr, na.rm = TRUE), "–",
max(energy$yr, na.rm = TRUE), "\n")
## Years: 1990 – 2010
cat("Countries:", uniqueN(energy$country_name), "\n")
## Countries: 251
energy_clean <- energy[
yr >= 1990 & yr <= 2010 &
!is.na(access_electricity_total_pop_pct) &
!is.na(renewable_energy_consumption_tfec_pct)
]
cat("Rows after filtering:", nrow(energy_clean), "\n")
## Rows after filtering: 532
cat("Countries remaining: ", uniqueN(energy_clean$country_name), "\n")
## Countries remaining: 198
high_re <- energy_clean[renewable_energy_consumption_tfec_pct > 50]
cat("Country-year rows with >50% renewable share:", nrow(high_re), "\n")
## Country-year rows with >50% renewable share: 161
cat("Distinct countries: ",
uniqueN(high_re$country_name), "\n")
## Distinct countries: 74
With the data filtered, I count the country-years that are already heavily renewable, defined here as a renewable share above 50% of total final energy consumption (TFEC). That gives 161 of the 532 filtered rows, spread across 74 of the 198 countries, which is a rough measure of how many observations represent “renewable-dominant” energy systems.
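To put those counts in proportion, the printed figures can be turned into shares. This is only a small sketch: it hard-codes the counts from the output above rather than recomputing them from `energy_clean`, and the variable names are my own.

```r
# Counts copied from the printed output above (not recomputed from the data)
rows_total      <- 532  # country-year rows after filtering
rows_high       <- 161  # rows with renewable share > 50%
countries_total <- 198  # distinct countries after filtering
countries_high  <- 74   # countries with at least one row above 50%

pct_rows      <- round(100 * rows_high / rows_total, 1)
pct_countries <- round(100 * countries_high / countries_total, 1)

cat("Share of country-year rows above 50%:", pct_rows, "%\n")
cat("Share of countries with at least one such row:", pct_countries, "%\n")
```

The second share is higher than the first because a country counts as soon as any one of its years crosses the threshold.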
global_yr <- energy_clean[
,
.(
avg_elec_access = round(mean(access_electricity_total_pop_pct, na.rm = TRUE), 2),
avg_renewable_pct = round(mean(renewable_energy_consumption_tfec_pct, na.rm = TRUE), 2),
avg_solar_pct = round(mean(solar_energy_consumption_tfec_pct, na.rm = TRUE), 2),
avg_wind_pct = round(mean(wind_energy_consumption_tfec_pct, na.rm = TRUE), 2),
avg_hydro_pct = round(mean(hydro_energy_consumption_tfec_pct, na.rm = TRUE), 2),
n_countries = uniqueN(country_name)
),
by = yr
][order(yr)]
knitr::kable(global_yr,
caption = "Global annual averages",
col.names = c("Year", "Elec. access %", "Renewable %",
"Solar %", "Wind %", "Hydro %", "Countries")
)
| Year | Elec. access % | Renewable % | Solar % | Wind % | Hydro % | Countries |
|---|---|---|---|---|---|---|
| 1990 | 68.37 | 36.48 | 0.06 | 0.02 | 3.55 | 191 |
| 2000 | 71.77 | 35.07 | 0.28 | 0.14 | 5.53 | 193 |
| 2010 | 84.55 | 28.85 | 0.36 | 0.35 | 5.83 | 148 |
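The table shows electricity access rising at each step, while the average renewable share does not rise with it: it slips slightly between 1990 and 2000 and falls more clearly by 2010. This suggests that new electricity access was not primarily being met by renewables, although the 2010 row averages only 148 countries against 191 in 1990, so part of the shift may reflect which countries are included. Solar and wind shares stay well under 1% throughout. Hydro is the largest of the three sources shown, but it is still a small part of the total renewable share, so most of that share must come from sources not broken out in this table.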
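A quick check of the table values helps calibrate these readings. The sketch below copies the numbers from the table by hand and computes the percentage-point change from 1990 to 2010, along with hydro as a fraction of the overall renewable share. The ratio is rough, since it divides one unweighted country average by another.

```r
# Values copied from the "Global annual averages" table above
tbl <- data.frame(
  yr        = c(1990, 2000, 2010),
  elec      = c(68.37, 71.77, 84.55),
  renewable = c(36.48, 35.07, 28.85),
  hydro     = c(3.55, 5.53, 5.83)
)

# Percentage-point change between the first and last year
first <- tbl[tbl$yr == 1990, ]
last  <- tbl[tbl$yr == 2010, ]
pp_change <- c(
  elec      = last$elec - first$elec,
  renewable = last$renewable - first$renewable
)
print(round(pp_change, 2))

# Hydro as a percentage of the overall renewable share in each year
# (a ratio of averages, so treat it as an approximation)
tbl$hydro_of_renewable <- round(100 * tbl$hydro / tbl$renewable, 1)
print(tbl[, c("yr", "hydro_of_renewable")])
```

On these figures access gains about 16 percentage points, the renewable share loses between 7 and 8, and hydro never exceeds roughly a fifth of the renewable total.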
The chart below plots the electricity access averages for the three years that survive filtering (1990, 2000, and 2010): access rises at each step, with no reversal.
ggplot(global_yr, aes(x = yr, y = avg_elec_access)) +
geom_area(fill = pal_set2[2], alpha = 0.20) +
geom_line(colour = pal_set2[2], linewidth = 1.1) +
geom_point(colour = pal_set2[2], size = 2.5) +
scale_x_continuous(breaks = seq(1990, 2010, 2)) +
theme_bw()+
scale_y_continuous(limits = c(0, 100),
labels = function(x) paste0(x, "%")) +
labs(
title = "Global Average Electricity Access, 1990–2010",
x = "Year",
y = "Population with access (%)"
)
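One caveat on the averages plotted above: each year is averaged over a different set of countries (191, 193, and 148 in the summary table). A common way to check whether that matters is to recompute the means on a balanced panel, keeping only countries observed in every year. The sketch below uses made-up toy values and assumes one row per country-year; it is not run on the real data.

```r
# Toy data (hypothetical values): country C has no 2010 observation
toy <- data.frame(
  country_name = c("A", "A", "A", "B", "B", "B", "C", "C"),
  yr           = c(1990, 2000, 2010, 1990, 2000, 2010, 1990, 2000),
  access       = c(90, 95, 100, 40, 50, 70, 10, 20)
)

# Unbalanced yearly means: the 2010 mean is computed without country C
unbalanced <- tapply(toy$access, toy$yr, mean)

# Keep only countries observed in every year, then recompute
n_years      <- length(unique(toy$yr))
obs          <- table(toy$country_name)
complete     <- names(obs)[obs == n_years]
balanced_toy <- toy[toy$country_name %in% complete, ]
balanced     <- tapply(balanced_toy$access, balanced_toy$yr, mean)

print(round(unbalanced, 1))
print(round(balanced, 1))
```

In the toy data the low-access country drops out in 2010, so the unbalanced series overstates the gain. Whether the same thing happens in the real data would need to be checked against `energy_clean`.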
I pull each country’s change in electricity access from 2000 to 2010 and rank the top 20 gainers. This shifts the frame from who has the most access to who made the most progress, a development-focused lens that tends to surface different countries entirely.
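# NOTE: `panel` is not defined anywhere above. The three statements below are an
# assumed reconstruction that uses each country's 2000 value as the baseline,
# matching the plot subtitle; swap in the original definition if it differs.
baseline_2000 <- energy_clean[yr == 2000,
  .(country_name, elec_2000 = access_electricity_total_pop_pct)]
panel <- merge(energy_clean, baseline_2000, by = "country_name")
panel[, elec_access_change := access_electricity_total_pop_pct - elec_2000]
# Keep the 2010 rows and take the 20 largest gains
change_2010 <- panel[
yr == 2010 & !is.na(elec_access_change),
.(country_name, elec_access_change)
][order(-elec_access_change)][1:20]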
change_2010[, country_name := fct_reorder(country_name, elec_access_change)]
ggplot(change_2010,
aes(x = elec_access_change, y = country_name, fill = elec_access_change)) +
geom_col() +
scale_fill_gradientn(colours = pal_ylorrd[3:9],
name = "pp gain") +
theme_bw()+
scale_x_continuous(labels = function(x) paste0("+", x, " pp")) +
labs(
title = "Top 20 Countries: Electricity Access Gain, 2000–2010",
subtitle = "Percentage-point improvement from 2000 baseline to 2010",
x = "Gain in electricity access (percentage points)",
y = NULL
)
Next I want to break down which renewable sources are driving consumption. Rather than a single aggregate, this plot tracks solar, wind, hydro, biogas, modern biomass, and geothermal separately, which should reveal whether the global renewable share is hiding divergent source-level stories.
sources_yr <- energy_clean[
,
.(
Solar = sum(solar_energy_consumption_terajoules, na.rm = TRUE),
Wind = sum(wind_energy_consumption_terajoules, na.rm = TRUE),
Hydro = sum(hydro_energy_consumption_terajoules, na.rm = TRUE),
Biogas = sum(biogas_consumption_terajoules, na.rm = TRUE),
ModBiomass = sum(modern_biomass_consumption_terajoules, na.rm = TRUE),
Geothermal = sum(geothermal_energy_consumption_terajoules, na.rm = TRUE)
),
by = yr
][order(yr)]
sources_long <- melt(sources_yr, id.vars = "yr",
variable.name = "source",
value.name = "terajoules")
ggplot(sources_long, aes(x = yr, y = terajoules / 1e6,
colour = source, group = source)) +
geom_line(linewidth = 1.0) +
geom_point(size = 1.8) +
theme_bw()+
scale_colour_manual(values = pal_set2[1:6], name = "Source") +
scale_x_continuous(breaks = seq(1990, 2010, 2)) +
labs(
title = "Global Renewable Energy Consumption by Source",
subtitle = "*Excludes traditional biomass",
x = "Year",
y = "Energy consumed (million TJ)"
)
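The plot shows levels, but the guiding question is about the rate of adoption. One possible way to put a number on that is a compound annual growth rate (CAGR) per source. This is my own choice of metric, not something the dataset prescribes, and the sketch below uses hypothetical totals rather than `sources_long`.

```r
# Toy totals in terajoules (hypothetical values, not taken from the dataset)
toy_sources <- data.frame(
  source     = rep(c("Hydro", "Wind", "Solar"), each = 2),
  yr         = rep(c(1990, 2010), times = 3),
  terajoules = c(1000, 2000, 10, 160, 5, 20)
)

# CAGR between the first and last year of each source:
#   (end / start)^(1 / years) - 1
# This is undefined when the starting value is zero, so such sources
# would need a later starting year.
cagr <- sapply(split(toy_sources, toy_sources$source), function(d) {
  d <- d[order(d$yr), ]
  n_years <- d$yr[nrow(d)] - d$yr[1]
  (d$terajoules[nrow(d)] / d$terajoules[1])^(1 / n_years) - 1
})

# Growth rates in percent per year, fastest first
print(round(100 * sort(cagr, decreasing = TRUE), 2))
```

In the toy numbers Wind grows fastest even though Hydro is by far the largest, which is the kind of contrast a level plot can hide.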
The cross-sectional distribution of renewable share is just as informative as the mean trend. Plotting snapshots at three points in time shows whether the spread across countries is narrowing (convergence) or whether a small group of high-renewable countries is pulling the average up.
sel_years <- energy_clean[yr %in% c(1990, 2000, 2010)]
sel_years[, yr_fac := factor(yr)]
ggplot(sel_years,
aes(x = yr_fac,
y = renewable_energy_consumption_tfec_pct,
fill = yr_fac)) +
geom_boxplot(alpha = 0.7, outlier.size = 0.9, outlier.alpha = 0.25) +
geom_jitter(width = 0.14, size = 0.35, alpha = 0.07) +
theme_bw()+
scale_fill_manual(values = pal_set2[1:5], guide = "none") +
scale_y_continuous(labels = function(x) paste0(x, "%")) +
labs(
title = "Distribution of Renewable Energy Share (1990, 2000, 2010)",
x = "Year",
y = "Renewable share of TFEC (%)"
)
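The boxplots can be backed with numbers by computing a spread measure for each snapshot year. The sketch below uses hypothetical shares and reports the median, interquartile range, and standard deviation per year; a shrinking IQR or standard deviation with a stable median would point to convergence.

```r
# Toy renewable shares in percent (hypothetical values), five countries per year
toy_share <- data.frame(
  yr    = rep(c(1990, 2000, 2010), each = 5),
  share = c( 5, 20, 40, 60, 95,
            10, 25, 40, 55, 90,
            20, 30, 40, 50, 80)
)

# One row per year with three summary statistics
spread <- data.frame(
  yr     = sort(unique(toy_share$yr)),
  median = as.numeric(tapply(toy_share$share, toy_share$yr, median)),
  iqr    = as.numeric(tapply(toy_share$share, toy_share$yr, IQR)),
  sd     = as.numeric(tapply(toy_share$share, toy_share$yr, sd))
)

print(spread)
```

On the real data the same pattern would apply to `renewable_energy_consumption_tfec_pct` grouped by `yr`.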
To identify which countries lead on renewable energy, I aggregate each country’s average renewable share over 2000–2010 and pull the top 20.
top_re <- energy_clean[
yr >= 2000,
.(avg_re = round(mean(renewable_energy_consumption_tfec_pct, na.rm = TRUE), 1)),
by = country_name
][order(-avg_re)][1:20]
top_re[, country_name := fct_reorder(country_name, avg_re)]
ggplot(top_re, aes(x = avg_re, y = country_name, fill = avg_re)) +
geom_col() +
scale_fill_gradientn(colours = pal_rdylgn[c(3, 7, 10)],
name = "Avg RE\nshare (%)") +
scale_x_continuous(labels = function(x) paste0(x, "%")) +
theme_bw()+
labs(
title = "Top 20 Countries by Average Renewable Energy Share",
subtitle = "Average over 2000–2010; includes all renewable sources",
x = "Average renewable share of TFEC (%)",
y = NULL
)
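Because the 2010 summary row covers fewer countries than the 2000 row, some of these averages may rest on a single year. A hedged sketch of how to carry the observation count alongside each average, using hypothetical values in base R:

```r
# Toy data (hypothetical values): country B has only one observation
toy_re <- data.frame(
  country_name = c("A", "A", "B", "C", "C"),
  yr           = c(2000, 2010, 2000, 2000, 2010),
  share        = c(80, 70, 90, 30, 50)
)

# Average share and number of observations per country
avg_re <- tapply(toy_re$share, toy_re$country_name, mean)
n_obs  <- tapply(toy_re$share, toy_re$country_name, length)

ranking <- data.frame(
  country_name = names(avg_re),
  avg_re       = as.numeric(avg_re),
  n_obs        = as.integer(n_obs)
)

# Highest average first; n_obs shows how many years back each average
ranking <- ranking[order(-ranking$avg_re), ]
print(ranking, row.names = FALSE)
```

Here the top-ranked country is the one with a single observation, which is the situation the extra column is meant to flag.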
A simpler decade-level comparison strips away the year-by-year noise and asks a blunter question: did the distribution of electricity access meaningfully shift between the 1990s and 2000s? The jittered points show individual country-years so the boxplot summary doesn’t hide skew or outliers.
# Years from 2000 through 2010 are all labelled "2000s", so 2010 falls in that group
energy_clean[, decade := ifelse(yr < 2000, "1990s", "2000s")]
ggplot(energy_clean,
aes(x = decade, y = access_electricity_total_pop_pct, fill = decade)) +
geom_boxplot(alpha = 0.7, outlier.size = 0.9, outlier.alpha = 0.3) +
# height = 0 turns off vertical jitter, which otherwise nudges points at 0% or
# 100% outside the y limits so that ggplot2 drops them with a warning
geom_jitter(width = 0.12, height = 0, size = 0.35, alpha = 0.07) +
theme_bw()+
scale_fill_manual(values = pal_set2[3:4], guide = "none") +
scale_y_continuous(limits = c(0, 100),
labels = function(x) paste0(x, "%")) +
labs(
title = "Electricity Access Distribution: 1990s vs 2000s",
subtitle = "Each point is one country-year observation (jittered)",
x = "Decade",
y = "Population with electricity access (%)"
)
Given how small wind and solar appeared in the source breakdown, I isolate just those two to see their absolute growth more clearly. Stacking them shows their combined trajectory. They also deserve a closer look because solar and wind are what a broader audience typically has in mind when thinking of renewable energy.
wind_solar <- sources_long[source %in% c("Wind", "Solar")]
ggplot(wind_solar,
aes(x = yr, y = terajoules / 1e3, fill = source)) +
geom_area(position = "stack", alpha = 0.85) +
scale_fill_manual(values = c(Solar = pal_ylorrd[7],
Wind = pal_set2[2]),
name = "Source") +
scale_x_continuous(breaks = seq(1990, 2010, 2)) +
theme_bw()+
scale_y_continuous(labels = comma) +
labs(
title = "Global Wind and Solar Energy Consumption Growth",
x = "Year",
y = "Energy consumed (thousand TJ)"
)
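When two series start from very different levels, indexing each to its first year (first year = 100) makes their relative growth directly comparable. This is an optional companion view of my own, sketched with hypothetical values rather than the `wind_solar` table.

```r
# Toy totals in terajoules (hypothetical values)
toy_ws <- data.frame(
  yr    = c(1990, 2000, 2010),
  Wind  = c(10, 40, 400),
  Solar = c(20, 30, 60)
)

# Index each series to its own first-year value (first year = 100)
indexed <- data.frame(
  yr    = toy_ws$yr,
  Wind  = 100 * toy_ws$Wind / toy_ws$Wind[1],
  Solar = 100 * toy_ws$Solar / toy_ws$Solar[1]
)

print(indexed)
```

An index of 300 means the series has tripled since the first year, regardless of how small it was to begin with.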
Conclusion
To my surprise, a large share of the countries with the highest renewable energy shares are in Africa, as are many of the countries that adopted more renewable energy over the period. This likely reflects the dominance of traditional hydro and biomass sources rather than new solar or wind investment.
Globally, electricity access improved steadily across the two decades, but the average renewable share did not rise with it (it fell from 36.48% in 1990 to 28.85% in 2010), which suggests that growth in access was largely fueled by conventional sources. Wind and solar, while growing, remained marginal in absolute terms through 2010.
The most striking finding from the baseline panel is how concentrated the biggest electricity access gains were, with several countries making jumps of 40 or more percentage points in a single decade. Expansion that rapid plausibly reflects infrastructure investment driven by development aid and policy rather than market forces alone, although this dataset cannot confirm that. A natural next step would be to link those access gains to the energy source data and ask whether the countries that electrified fastest did so more or less sustainably than the rest.