This project analyses the European Energy dataset published through the TidyTuesday project (week 2020-08-04). The data originate from Eurostat and describe how electricity was generated in 37 European countries during 2016, 2017 and 2018, measured in gigawatt-hours (GWh).
The project ships as two tables that share the same
key (country):
energy_types — net electricity
production broken down by source (Conventional thermal,
Nuclear, Hydro, Wind, Solar, Geothermal, Other, and a Level-2
sub-component “Pumped hydro power”).country_totals — country-level
aggregates: total net production, imports,
exports, energy absorbed by pumping, and energy
supplied.Because electricity barely stores at grid scale, a country’s generation mix and its position as a net importer or exporter are two sides of the same coin. That makes these two tables a natural candidate for a merge.
A note on definitions. I treat renewables as
Hydro + Wind + Solar + Geothermal. “Pumped hydro power” is a Level-2 storage sub-item that is already counted inside Hydro, so it is excluded from every sum to avoid double-counting. The ambiguous “Other” category is kept separate rather than assumed to be green.
I work almost entirely in data.table
for the transformations (fast filtering, aggregation and reshaping) and
ggplot2 for the visuals. Colour palettes
come from colorbrewer2.org via the
RColorBrewer package.
library(data.table) # fast filtering / aggregation / reshaping
library(ggplot2) # grammar-of-graphics plots
library(RColorBrewer) # ColorBrewer palettes (colorbrewer2.org)The two CSVs are read straight from the TidyTuesday GitHub mirror
with fread(). The year columns are named
2016–2018, so I pass
header = TRUE explicitly (otherwise fread
mistakes the numeric header for data).
base_url <- paste0(
"https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
"master/data/2020/2020-08-04/"
)
energy_types <- fread(paste0(base_url, "energy_types.csv"), header = TRUE)
country_totals <- fread(paste0(base_url, "country_totals.csv"), header = TRUE)
dim(energy_types)## [1] 296 7
A few small clean-ups make the rest of the analysis painless.
The UK row has a blank country_name, and Eurostat codes
Greece as EL. I patch both with data.table’s
in-place := assignment.
for (dt in list(energy_types, country_totals)) {
dt[country == "UK", country_name := "United Kingdom"]
dt[country == "EL", country_name := "Greece"]
}
country_totals[country %in% c("UK", "EL"), .(country, country_name)][1:2]The years sit in three separate columns. melt() pivots
them into a tidy year / gwh pair, which is the
shape ggplot2 likes.
energy_long <- melt(
energy_types,
id.vars = c("country", "country_name", "type", "level"),
measure.vars = c("2016", "2017", "2018"),
variable.name = "year",
value.name = "gwh"
)
energy_long[, year := as.integer(as.character(year))]
totals_long <- melt(
country_totals,
id.vars = c("country", "country_name", "type", "level"),
measure.vars = c("2016", "2017", "2018"),
variable.name = "year",
value.name = "gwh"
)
totals_long[, year := as.integer(as.character(year))]
head(energy_long)To satisfy the “apply a theme” requirement and keep every chart consistent, I define one custom theme and a fixed ordering / palette for the energy sources.
theme_energy <- theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(colour = "grey35"),
panel.grid.minor = element_blank(),
legend.position = "bottom"
)
# A consistent source order, low-carbon sources grouped together
source_levels <- c("Conventional thermal", "Nuclear",
"Hydro", "Wind", "Solar", "Geothermal", "Other")
energy_long[, type := factor(type, levels = c(source_levels, "Pumped hydro power"))]
renewables <- c("Hydro", "Wind", "Solar", "Geothermal")data.tableI keep only the Level-1 rows (dropping the
pumped-hydro sub-item) for 2018 — a textbook
data.table filter — then
aggregate total production per country to find the
biggest generators.
# FILTER: Level-1 sources only, year 2018
mix_2018 <- energy_long[level == "Level 1" & year == 2018]
# AGGREGATE: total production per country, then rank
country_size <- mix_2018[, .(total_gwh = sum(gwh)),
by = .(country, country_name)][order(-total_gwh)]
head(country_size, 8)ggplot(mix_top12, aes(x = country_name, y = gwh / 1000, fill = type)) +
geom_col() + # geom 1: bar/column
scale_fill_brewer(palette = "Set2") + # ColorBrewer qualitative palette
labs(
title = "Electricity generation mix, 2018",
subtitle = "Twelve largest European producers, by source",
x = NULL, y = "Net generation (thousand GWh)", fill = "Source"
) +
theme_energy +
theme(axis.text.x = element_text(angle = 35, hjust = 1))Comment. Two production “superpowers” dominate: Germany and France each generate well over 500,000 GWh, but their colours could not be more different. France is overwhelmingly nuclear (the orange block), while Germany still leans heavily on conventional thermal (fossil) generation alongside a substantial green slice. The UK, Turkey, Italy and Spain form a second tier, each with a visibly different recipe — Italy and Spain show much larger renewable (green) bands than Poland, which is almost entirely thermal.
eu_by_year <- energy_long[level == "Level 1",
.(gwh = sum(gwh)), by = .(year, type)]
ggplot(eu_by_year, aes(x = factor(year), y = gwh / 1e6, fill = type)) +
geom_col(position = "stack") + # geom: column (stacked)
scale_fill_brewer(palette = "Set2") +
labs(
title = "Aggregate European generation mix over time",
subtitle = "All 37 countries combined, 2016-2018",
x = "Year", y = "Net generation (million GWh)", fill = "Source"
) +
theme_energyComment. At the aggregate level the mix is strikingly stable across the three years: conventional thermal and nuclear together still supply the bulk of European electricity, with renewables a steady but minority share. Three years is simply too short a window to see the structural energy transition — a useful reminder that decarbonisation is measured in decades, not single years.
ggplot(mix_2018[gwh > 0], aes(x = type, y = gwh, fill = type)) +
geom_boxplot(show.legend = FALSE) + # geom: boxplot
scale_y_log10(labels = scales::comma) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Distribution of generation by source across countries, 2018",
subtitle = "Log scale; each point behind the box is one country",
x = NULL, y = "Net generation (GWh, log scale)"
) +
theme_energy +
theme(axis.text.x = element_text(angle = 25, hjust = 1))Comment. Conventional thermal and nuclear show the highest medians — they remain the workhorses of European power. Solar and especially geothermal sit far lower and are far more dispersed: geothermal is a niche resource concentrated in a handful of geologically lucky countries (Iceland is absent here, but Italy and Turkey stand out), which is why its box stretches across several orders of magnitude.
trend <- energy_long[level == "Level 1",
.(gwh = sum(gwh)), by = .(year, type)]
major <- trend[, .(tot = sum(gwh)), by = type][order(-tot)][1:4, type]
ggplot(trend[type %in% major], aes(year, gwh / 1000, colour = type)) +
geom_line(linewidth = 1) + # layer 1: line
geom_point(size = 2.5) + # layer 2: point
scale_colour_brewer(palette = "Dark2") + # ColorBrewer qualitative
scale_x_continuous(breaks = 2016:2018) +
labs(
title = "European generation trend by source, 2016-2018",
subtitle = "Four largest sources, continent-wide totals",
x = "Year", y = "Net generation (thousand GWh)", colour = "Source"
) +
theme_energyComment. Wind is the clear riser — its line climbs steadily across all three years, the only source with an unambiguous upward trajectory. Conventional thermal dips slightly while nuclear and hydro wobble around flat lines (hydro is weather-driven, so year-to-year rainfall matters more than policy). Even in a three-year snapshot, the direction of travel — wind eating into the margin of fossil generation — is visible.
From country_totals I reshape Imports
and Exports back into columns with dcast,
derive a net-import balance, then
merge it onto the greenness table from Q2. One row per
country, keyed on country.
trade_2018 <- dcast(
totals_long[year == 2018 & type %in% c("Imports", "Exports")],
country + country_name ~ type, value.var = "gwh"
)
trade_2018[, net_import := Imports - Exports] # >0 = net importer
merged <- merge(green_2018, trade_2018,
by = c("country", "country_name"))
merged[, position := fifelse(net_import > 0, "Net importer", "Net exporter")]
merged[order(-total_gwh), .(country_name, total_gwh, renew_share,
net_import, position)][1:6]Three layers on one plot: points sized by net production, a linear trend, and text labels for the biggest grids.
big <- merged[total_gwh > 90000] # label only the large grids
ggplot(merged, aes(total_gwh / 1000, renew_share)) +
geom_point(aes(colour = position, size = total_gwh / 1000),
alpha = 0.8) + # layer 1: point
geom_smooth(method = "lm", se = FALSE,
colour = "grey40", linetype = "dashed") + # layer 2: smooth
geom_text(data = big, aes(label = country_name),
vjust = -1, size = 3, colour = "grey25") + # layer 3: text
scale_colour_brewer(palette = "Set1") +
scale_size_continuous(guide = "none") +
labs(
title = "Does size relate to greenness? 2018",
subtitle = "Bubble size = total net generation; dashed line = linear fit",
x = "Total net generation (thousand GWh)",
y = "Renewable share (%)", colour = NULL
) +
theme_energyComment. The fitted line is almost flat — scale and greenness are essentially unrelated. Large producers span the full range, from green-leaning Italy and Spain to thermal-heavy Poland and nuclear France. The big grids are a mix of net importers and exporters, which sets up the final question.
trade_rank <- merged[abs(net_import) > 500] # drop near-balanced minnows
ggplot(trade_rank,
aes(x = reorder(country_name, net_import), y = net_import / 1000,
fill = position)) +
geom_col() + # geom: column
geom_hline(yintercept = 0, colour = "grey30") +# reference layer
coord_flip() +
scale_fill_brewer(palette = "Set1") +
labs(
title = "Net electricity trade balance, 2018",
subtitle = "Imports minus exports (positive = net importer)",
x = NULL, y = "Net imports (thousand GWh)", fill = NULL
) +
theme_energyComment. France is Europe’s electricity battery: its nuclear surplus makes it the largest net exporter by a wide margin, with Germany and Sweden also exporting. On the import side, Italy and the UK are the hungriest buyers. The pattern lines up with the mix charts — countries with cheap, inflexible baseload (nuclear France, hydro Sweden/Norway) push power outward, while large demand centres without enough domestic generation pull it in.
ggplot(merged, aes(renew_share, net_import / 1000)) +
geom_hline(yintercept = 0, colour = "grey60", linetype = "dashed") + # layer 1
geom_point(aes(colour = position, size = total_gwh / 1000),
alpha = 0.8) + # layer 2
scale_colour_brewer(palette = "Set1") +
scale_size_continuous(guide = "none") +
labs(
title = "Renewable share vs trade position, 2018",
subtitle = "Above the line = net importer; bubble size = grid size",
x = "Renewable share (%)", y = "Net imports (thousand GWh)",
colour = NULL
) +
theme_energyComment. There is no tidy relationship between how green a country is and whether it imports or exports. Very green exporters (Norway, hydro) and very green importers both exist, as do fossil-heavy versions of each. Trade position is driven by the balance of domestic supply and demand, not by the colour of the electrons — a green grid that still under-produces (e.g. a small mountainous importer) must buy from neighbours regardless of how clean its own output is.
Using two Eurostat tables merged on a common country key, the analysis paints a clear picture of European electricity in 2016-2018:
Limitations. The window is short (three years), the “Other” category is undefined and was excluded from renewables, and generation is not the same as consumption (imports/exports and pumping losses sit in the second table only). A natural extension would be to bring in price data (e.g. EPEX SPOT day-ahead prices) to test whether cheap surplus generation actually drives the export flows seen here.
Data: Eurostat via the TidyTuesday
project, 2020-08-04. Palettes from colorbrewer2.org. Built in R with
data.table, ggplot2 and
RColorBrewer.