1 Introduction

This project analyses the European Energy dataset published through the TidyTuesday project (week 2020-08-04). The data originate from Eurostat and describe how electricity was generated in 37 European countries during 2016, 2017 and 2018, measured in gigawatt-hours (GWh).

The project ships as two tables that share the same key (country):

  • energy_types — net electricity production broken down by source (Conventional thermal, Nuclear, Hydro, Wind, Solar, Geothermal, Other, and a Level-2 sub-component “Pumped hydro power”).
  • country_totals — country-level aggregates: total net production, imports, exports, energy absorbed by pumping, and energy supplied.

Because electricity barely stores at grid scale, a country’s generation mix and its position as a net importer or exporter are two sides of the same coin. That makes these two tables a natural candidate for a merge.

1.1 Questions analysed

  1. What does Europe’s electricity mix look like, and who are the largest producers?
  2. How “green” is each country? — i.e. what share of generation comes from renewable sources, and how is that distributed across the continent?
  3. Did generation shift between 2016 and 2018?
  4. Does a country’s domestic production and renewable profile relate to whether it is a net importer or exporter of electricity? (answered using the merged dataset)

A note on definitions. I treat renewables as Hydro + Wind + Solar + Geothermal. “Pumped hydro power” is a Level-2 storage sub-item that is already counted inside Hydro, so it is excluded from every sum to avoid double-counting. The ambiguous “Other” category is kept separate rather than assumed to be green.


2 Setup and data loading

I work almost entirely in data.table for the transformations (fast filtering, aggregation and reshaping) and ggplot2 for the visuals. Colour palettes come from colorbrewer2.org via the RColorBrewer package.

library(data.table)   # fast filtering / aggregation / reshaping
library(ggplot2)      # grammar-of-graphics plots
library(RColorBrewer) # ColorBrewer palettes (colorbrewer2.org)

The two CSVs are read straight from the TidyTuesday GitHub mirror with fread(). The year columns are named 20162018, so I pass header = TRUE explicitly (otherwise fread mistakes the numeric header for data).

base_url <- paste0(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
  "master/data/2020/2020-08-04/"
)

energy_types   <- fread(paste0(base_url, "energy_types.csv"),   header = TRUE)
country_totals <- fread(paste0(base_url, "country_totals.csv"), header = TRUE)

dim(energy_types)
## [1] 296   7
head(energy_types)

3 Data transformation

A few small clean-ups make the rest of the analysis painless.

3.0.1 Fix country labels

The UK row has a blank country_name, and Eurostat codes Greece as EL. I patch both with data.table’s in-place := assignment.

for (dt in list(energy_types, country_totals)) {
  dt[country == "UK", country_name := "United Kingdom"]
  dt[country == "EL", country_name := "Greece"]
}
country_totals[country %in% c("UK", "EL"), .(country, country_name)][1:2]

3.0.2 Reshape from wide to long

The years sit in three separate columns. melt() pivots them into a tidy year / gwh pair, which is the shape ggplot2 likes.

energy_long <- melt(
  energy_types,
  id.vars       = c("country", "country_name", "type", "level"),
  measure.vars  = c("2016", "2017", "2018"),
  variable.name = "year",
  value.name    = "gwh"
)
energy_long[, year := as.integer(as.character(year))]

totals_long <- melt(
  country_totals,
  id.vars       = c("country", "country_name", "type", "level"),
  measure.vars  = c("2016", "2017", "2018"),
  variable.name = "year",
  value.name    = "gwh"
)
totals_long[, year := as.integer(as.character(year))]

head(energy_long)

3.0.3 A reusable house style

To satisfy the “apply a theme” requirement and keep every chart consistent, I define one custom theme and a fixed ordering / palette for the energy sources.

theme_energy <- theme_minimal(base_size = 12) +
  theme(
    plot.title    = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(colour = "grey35"),
    panel.grid.minor = element_blank(),
    legend.position  = "bottom"
  )

# A consistent source order, low-carbon sources grouped together
source_levels <- c("Conventional thermal", "Nuclear",
                    "Hydro", "Wind", "Solar", "Geothermal", "Other")
energy_long[, type := factor(type, levels = c(source_levels, "Pumped hydro power"))]

renewables <- c("Hydro", "Wind", "Solar", "Geothermal")

4 Q1 — What does Europe’s electricity mix look like?

4.0.1 Filtering and aggregating with data.table

I keep only the Level-1 rows (dropping the pumped-hydro sub-item) for 2018 — a textbook data.table filter — then aggregate total production per country to find the biggest generators.

# FILTER: Level-1 sources only, year 2018
mix_2018 <- energy_long[level == "Level 1" & year == 2018]

# AGGREGATE: total production per country, then rank
country_size <- mix_2018[, .(total_gwh = sum(gwh)),
                         by = .(country, country_name)][order(-total_gwh)]
head(country_size, 8)
top12 <- country_size[1:12, country]
mix_top12 <- mix_2018[country %in% top12]
# order the x-axis by total size
mix_top12[, country_name := factor(country_name,
            levels = country_size[country %in% top12, country_name])]

4.0.2 Plot 1 — Stacked generation mix of the 12 largest producers

ggplot(mix_top12, aes(x = country_name, y = gwh / 1000, fill = type)) +
  geom_col() +                                   # geom 1: bar/column
  scale_fill_brewer(palette = "Set2") +          # ColorBrewer qualitative palette
  labs(
    title    = "Electricity generation mix, 2018",
    subtitle = "Twelve largest European producers, by source",
    x = NULL, y = "Net generation (thousand GWh)", fill = "Source"
  ) +
  theme_energy +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))

Comment. Two production “superpowers” dominate: Germany and France each generate well over 500,000 GWh, but their colours could not be more different. France is overwhelmingly nuclear (the orange block), while Germany still leans heavily on conventional thermal (fossil) generation alongside a substantial green slice. The UK, Turkey, Italy and Spain form a second tier, each with a visibly different recipe — Italy and Spain show much larger renewable (green) bands than Poland, which is almost entirely thermal.

4.0.3 Plot 2 — Continent-wide mix, year by year

eu_by_year <- energy_long[level == "Level 1",
                          .(gwh = sum(gwh)), by = .(year, type)]

ggplot(eu_by_year, aes(x = factor(year), y = gwh / 1e6, fill = type)) +
  geom_col(position = "stack") +                 # geom: column (stacked)
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Aggregate European generation mix over time",
    subtitle = "All 37 countries combined, 2016-2018",
    x = "Year", y = "Net generation (million GWh)", fill = "Source"
  ) +
  theme_energy

Comment. At the aggregate level the mix is strikingly stable across the three years: conventional thermal and nuclear together still supply the bulk of European electricity, with renewables a steady but minority share. Three years is simply too short a window to see the structural energy transition — a useful reminder that decarbonisation is measured in decades, not single years.


5 Q2 — How green is each country?

5.0.1 Aggregating a renewable share

Another data.table aggregation: for each country I sum total production and renewable production in one pass using a conditional inside sum().

green_2018 <- mix_2018[, .(
  total_gwh = sum(gwh),
  renew_gwh = sum(gwh[type %in% renewables])
), by = .(country, country_name)]
green_2018[, renew_share := 100 * renew_gwh / total_gwh]

green_2018[order(-renew_share)][1:5]   # greenest
green_2018[order(renew_share)][1:5]    # least green

5.0.2 Plot 3 — Renewable share ranked by country (two geom layers)

This chart layers geom_col with geom_text labels on the same plot, and colours the bars with a diverging ColorBrewer palette so the eye reads “green = green”.

ggplot(green_2018,
       aes(x = reorder(country_name, renew_share), y = renew_share,
           fill = renew_share)) +
  geom_col() +                                                  # layer 1
  geom_text(aes(label = sprintf("%.0f%%", renew_share)),        # layer 2
            hjust = -0.15, size = 3, colour = "grey25") +
  coord_flip() +
  scale_fill_distiller(palette = "RdYlGn", direction = 1,
                       guide = "none") +     # ColorBrewer diverging
  scale_y_continuous(limits = c(0, 108), expand = c(0, 0)) +
  labs(
    title    = "Share of electricity from renewables, 2018",
    subtitle = "Hydro + Wind + Solar + Geothermal, as % of net generation",
    x = NULL, y = "Renewable share (%)"
  ) +
  theme_energy

Comment. The spread is enormous. Albania (100%) and Norway (~98%) run almost entirely on hydropower, with Georgia, Luxembourg and Croatia not far behind. At the other extreme Malta (0%), Hungary, Estonia and Czechia sit below 7%. Geography is destiny here: mountainous, water-rich countries decarbonise their grid almost for free, while small or fossil-locked economies start from near zero. Notably, the largest producers (Germany, France) sit in the middle of the pack — scale and greenness are not the same thing.

5.0.3 Plot 4 — How spread out is each source? (boxplot)

ggplot(mix_2018[gwh > 0], aes(x = type, y = gwh, fill = type)) +
  geom_boxplot(show.legend = FALSE) +            # geom: boxplot
  scale_y_log10(labels = scales::comma) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title    = "Distribution of generation by source across countries, 2018",
    subtitle = "Log scale; each point behind the box is one country",
    x = NULL, y = "Net generation (GWh, log scale)"
  ) +
  theme_energy +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

Comment. Conventional thermal and nuclear show the highest medians — they remain the workhorses of European power. Solar and especially geothermal sit far lower and are far more dispersed: geothermal is a niche resource concentrated in a handful of geologically lucky countries (Iceland is absent here, but Italy and Turkey stand out), which is why its box stretches across several orders of magnitude.

5.0.4 Plot 5 — A source-share heatmap (geom_tile)

# share of each source within each country (2018)
heat <- copy(mix_2018)
heat[, share := 100 * gwh / sum(gwh), by = country]
# order countries by renewable share for a readable gradient
heat[, country_name := factor(country_name,
        levels = green_2018[order(renew_share), country_name])]

ggplot(heat, aes(x = type, y = country_name, fill = share)) +
  geom_tile(colour = "white", linewidth = 0.3) + # geom: tile (heatmap)
  scale_fill_distiller(palette = "YlGnBu", direction = 1) +  # ColorBrewer seq.
  labs(
    title    = "Source intensity by country, 2018",
    subtitle = "Each cell = that source's % of the country's own generation",
    x = NULL, y = NULL, fill = "% of mix"
  ) +
  theme_energy +
  theme(axis.text.x = element_text(angle = 30, hjust = 1),
        panel.grid  = element_blank())

Comment. Reading the heatmap top-to-bottom (countries ordered from least to most renewable) the dark “Conventional thermal” column at the top gives way to a dark “Hydro” column at the bottom. The single brightest cells are the hydro-monocultures (Albania, Norway) and France’s nuclear cell. The chart makes the continent’s two decarbonisation routes — nuclear versus hydro — visible at a glance.


6 Q3 — Did generation shift between 2016 and 2018?

7 Q4 — Production, greenness and trade (the merged dataset)

7.0.1 Building a trade table and merging

From country_totals I reshape Imports and Exports back into columns with dcast, derive a net-import balance, then merge it onto the greenness table from Q2. One row per country, keyed on country.

trade_2018 <- dcast(
  totals_long[year == 2018 & type %in% c("Imports", "Exports")],
  country + country_name ~ type, value.var = "gwh"
)
trade_2018[, net_import := Imports - Exports]     # >0 = net importer

merged <- merge(green_2018, trade_2018,
                by = c("country", "country_name"))
merged[, position := fifelse(net_import > 0, "Net importer", "Net exporter")]

merged[order(-total_gwh), .(country_name, total_gwh, renew_share,
                            net_import, position)][1:6]

7.0.2 Plot 7 — Size vs greenness (scatter + smooth + labels)

Three layers on one plot: points sized by net production, a linear trend, and text labels for the biggest grids.

big <- merged[total_gwh > 90000]   # label only the large grids

ggplot(merged, aes(total_gwh / 1000, renew_share)) +
  geom_point(aes(colour = position, size = total_gwh / 1000),
             alpha = 0.8) +                                   # layer 1: point
  geom_smooth(method = "lm", se = FALSE,
              colour = "grey40", linetype = "dashed") +       # layer 2: smooth
  geom_text(data = big, aes(label = country_name),
            vjust = -1, size = 3, colour = "grey25") +        # layer 3: text
  scale_colour_brewer(palette = "Set1") +
  scale_size_continuous(guide = "none") +
  labs(
    title    = "Does size relate to greenness? 2018",
    subtitle = "Bubble size = total net generation; dashed line = linear fit",
    x = "Total net generation (thousand GWh)",
    y = "Renewable share (%)", colour = NULL
  ) +
  theme_energy

Comment. The fitted line is almost flat — scale and greenness are essentially unrelated. Large producers span the full range, from green-leaning Italy and Spain to thermal-heavy Poland and nuclear France. The big grids are a mix of net importers and exporters, which sets up the final question.

7.0.3 Plot 8 — Who imports and who exports? (diverging bar)

trade_rank <- merged[abs(net_import) > 500]   # drop near-balanced minnows

ggplot(trade_rank,
       aes(x = reorder(country_name, net_import), y = net_import / 1000,
           fill = position)) +
  geom_col() +                                   # geom: column
  geom_hline(yintercept = 0, colour = "grey30") +# reference layer
  coord_flip() +
  scale_fill_brewer(palette = "Set1") +
  labs(
    title    = "Net electricity trade balance, 2018",
    subtitle = "Imports minus exports (positive = net importer)",
    x = NULL, y = "Net imports (thousand GWh)", fill = NULL
  ) +
  theme_energy

Comment. France is Europe’s electricity battery: its nuclear surplus makes it the largest net exporter by a wide margin, with Germany and Sweden also exporting. On the import side, Italy and the UK are the hungriest buyers. The pattern lines up with the mix charts — countries with cheap, inflexible baseload (nuclear France, hydro Sweden/Norway) push power outward, while large demand centres without enough domestic generation pull it in.

7.0.4 Plot 9 — Greenness vs trade position (merged scatter)

ggplot(merged, aes(renew_share, net_import / 1000)) +
  geom_hline(yintercept = 0, colour = "grey60", linetype = "dashed") + # layer 1
  geom_point(aes(colour = position, size = total_gwh / 1000),
             alpha = 0.8) +                                            # layer 2
  scale_colour_brewer(palette = "Set1") +
  scale_size_continuous(guide = "none") +
  labs(
    title    = "Renewable share vs trade position, 2018",
    subtitle = "Above the line = net importer; bubble size = grid size",
    x = "Renewable share (%)", y = "Net imports (thousand GWh)",
    colour = NULL
  ) +
  theme_energy

Comment. There is no tidy relationship between how green a country is and whether it imports or exports. Very green exporters (Norway, hydro) and very green importers both exist, as do fossil-heavy versions of each. Trade position is driven by the balance of domestic supply and demand, not by the colour of the electrons — a green grid that still under-produces (e.g. a small mountainous importer) must buy from neighbours regardless of how clean its own output is.


8 Summary and conclusions

Using two Eurostat tables merged on a common country key, the analysis paints a clear picture of European electricity in 2016-2018:

  1. A two-superpower continent. Germany and France dwarf every other producer, but along opposite technological lines — German thermal versus French nuclear.
  2. Greenness is geographic, not economic. Renewable share ranges from Malta’s 0% to Albania’s 100%, driven almost entirely by access to hydropower. Country size and country greenness are statistically unrelated.
  3. The mix is stable, but wind is rising. Over three years the aggregate recipe barely moves, yet wind is the one source on a consistent upward path — the leading edge of a much longer transition.
  4. Trade follows surplus, not virtue. France’s nuclear surplus makes it the continent’s dominant exporter, while Italy and the UK are the largest importers. Crucially, a country’s renewable share tells you nothing about whether it imports or exports — that depends on the supply-demand balance, not the carbon profile.

Limitations. The window is short (three years), the “Other” category is undefined and was excluded from renewables, and generation is not the same as consumption (imports/exports and pumping losses sit in the second table only). A natural extension would be to bring in price data (e.g. EPEX SPOT day-ahead prices) to test whether cheap surplus generation actually drives the export flows seen here.


Data: Eurostat via the TidyTuesday project, 2020-08-04. Palettes from colorbrewer2.org. Built in R with data.table, ggplot2 and RColorBrewer.