Energy and Economic Growth: A Cross-National Analysis (2013–2023)
Introduction
This project explores the relationship between energy production, consumption, and economic growth across eight countries over the period of 2013 to 2023. The dataset was compiled using the World Bank’s data portal (databank.worldbank.org) through the World Development Indicators interface and includes seven key quantitative variables. These variables enable comparative and temporal analyses of national energy infrastructure, national dependencies, technological capacity, and economic output.
Dataset Overview
The dataset contains annual values for the following seven indicators:
Electric power consumption (kWh per capita): It measures the average kilowatt-hours of electricity consumed per person, and indicates access to and reliance on electric infrastructure; higher consumption often correlates with industrialization and improved living standards.
GDP growth (annual %): The annual percentage growth rate of gross domestic product at market prices, based on constant local currency. A key indicator of economic performance, capturing expansions or contractions in national economies.
GDP per unit of energy use (constant 2017 PPP $ per kg of oil equivalent): Measures energy efficiency, calculated as purchasing power parity GDP per kilogram of oil equivalent energy use. High values indicate greater economic output per unit of energy, reflecting efficiency and modernization.
Electricity production from nuclear sources (% of total): Share of electricity generated from nuclear fission processes. Often associated with stable, high-output energy strategies and national energy security policies.
Electricity production from hydroelectric sources (% of total): Share of electricity generated from hydropower. A renewable energy source that is climate-dependent but often used in sustainable energy strategies.
Electricity production from fossil fuels (% of total): Proportion of electricity generated from coal, oil, and natural gas. It is linked to carbon emissions and environmental sustainability concerns.
Electricity production from renewable sources, excluding hydroelectric (kWh): Includes solar, wind, geothermal, and biomass electricity generation, measured in total kilowatt-hours rather than as a percentage of total production. Indicates transitions toward cleaner, decentralized, and future-oriented energy systems.
Each indicator is recorded for eight countries (United States of America, United Mexican States, Canada, Republic of Honduras, Republic of Finland, Kingdom of Spain, French Republic, and Kingdom of Norway) across 11 years, allowing rich cross-national and time-series exploration. The sample includes globally significant economies and diverse development models across regions.
Data Source
All indicators are sourced from the World Bank Group’s World Development Indicators databank, accessed using their official data extraction tools (https://databank.worldbank.org/source/world-development-indicators/Series/EG.GDP.PUSE.KO.PP.KD). A separate .xlsx metadata file, provided by the World Bank, documents each indicator’s definition, methodology, source, and limitations.
Project Goal
The purpose of this project is to visualize how countries with different energy profiles perform economically and evolve over time. I am particularly interested in whether patterns of energy production and consumption drive economic growth, or if economic growth instead drives energy expansion. This dataset offers a unique opportunity to explore this question over an 11-year period and across countries with distinct development models and infrastructure priorities.
Specific questions explored include:
Does greater energy consumption translate into greater GDP growth?
Can the type of energy a country uses be linked to higher economic growth?
Which countries demonstrate higher energy efficiency (GDP per unit of energy)?
Are changes in electricity source composition linked to economic resilience or growth?
By answering these, the project contributes insight into how nations can balance development and sustainability in their energy policies. Additionally, it asks whether sustainability itself can align with profitability, competitive economic growth, and sound market practices. This raises a critical question: is the presence of more sustainable energy sources linked to stronger economic growth, and can variations in their proportion be meaningfully associated with higher or lower growth rates?
Setup and Loading Dataset
# Load necessary libraries
library(tidyverse) # Includes ggplot2, dplyr, readr, etc.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor) # For cleaning column names easily
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
library(lubridate) # For working with dates
library(scales) # For formatting axes and labels in ggplot
Attaching package: 'scales'
The following object is masked from 'package:purrr':
discard
The following object is masked from 'package:readr':
col_factor
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 69 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Series Name, Series Code, Country Name, Country Code, 2013 [YR2013...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
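The call that read the file is not shown in this output. The following is a minimal sketch of how such a load could be written, assuming a CSV export in the World Bank layout (Series Name, Series Code, Country Name, Country Code, then one column per year). The inline text, country names, and values are hypothetical stand-ins for the real file. Passing ".." in the na argument lets readr parse the year columns as numbers for these rows rather than as character.

```r
library(readr)

# Hypothetical two-row extract in the World Bank export layout.
# Country names and values are made up for illustration only.
csv_text <- I(
'Series Name,Series Code,Country Name,Country Code,2013 [YR2013],2014 [YR2014]
GDP growth (annual %),NY.GDP.MKTP.KD.ZG,Exampleland,EXL,1.5,..
GDP growth (annual %),NY.GDP.MKTP.KD.ZG,Samplestan,SMP,..,2.25
')

# Treat ".." (the missing-value marker used in the export) as NA while reading
demo <- read_csv(csv_text, na = c("", ".."), show_col_types = FALSE)

# Inspect the parsed column types: the year columns are numeric here
sapply(demo, class)
```

With a real export, the footer metadata rows may still trigger parsing messages, so the row filtering in the cleaning step would remain necessary.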
# Preview the dataset structure: a quick overview of column names and data types
glimpse(data)
# View the first few rows to understand the layout
head(data)
# A tibble: 6 × 15
`Series Name` `Series Code` `Country Name` `Country Code` `2013 [YR2013]`
<chr> <chr> <chr> <chr> <chr>
1 Electric power co… EG.USE.ELEC.… United States USA 12999.56871789…
2 Electric power co… EG.USE.ELEC.… Mexico MEX 2152.841878451…
3 Electric power co… EG.USE.ELEC.… Canada CAN 16517.09278916…
4 Electric power co… EG.USE.ELEC.… Honduras HND 647.2966645885…
5 Electric power co… EG.USE.ELEC.… Finland FIN 15512.12251138…
6 Electric power co… EG.USE.ELEC.… Spain ESP 5409.3898879763
# ℹ 10 more variables: `2014 [YR2014]` <chr>, `2015 [YR2015]` <chr>,
# `2016 [YR2016]` <chr>, `2017 [YR2017]` <chr>, `2018 [YR2018]` <chr>,
# `2019 [YR2019]` <chr>, `2020 [YR2020]` <chr>, `2021 [YR2021]` <chr>,
# `2022 [YR2022]` <chr>, `2023 [YR2023]` <chr>
# (commented to avoid clutter in Rpubs)
# summary(data)
# str(data)
Data Cleaning
# Clean column names
# The janitor package standardizes column names to lowercase and underscores.
clean1_data <- data %>%
  clean_names()

# Remove unnecessary or blank rows
# Rows 1-56 hold the data (8 countries x 7 indicators); everything from row 57 onward is blank or metadata
cleaned_data <- clean1_data %>%
  slice(1:56) %>% # Keep only the first 56 valid data rows
  filter(!is.na(country_name), !is.na(series_name)) %>% # Safeguard in case future versions contain NA identifiers
  select(-series_code, -country_name) # Keep 'country_code' instead of 'country_name' for easier handling in code

# Convert year columns to long format
# The year columns go from 2013 to 2023 and need to be reshaped into two columns: year and value
long_data <- cleaned_data %>%
  pivot_longer(
    cols = starts_with("x20"), # After clean_names(), columns like '2013 [YR2013]' become 'x2013_yr2013'
    # clean_names() adds the 'x' prefix because syntactic R names cannot begin with a digit
    names_to = "year",
    values_to = "value"
  ) %>%
  mutate(
    year = str_extract(year, "\\d{4}"), # Extract just the 4-digit year
    year = as.numeric(year), # Convert to numeric
    value = na_if(value, ".."), # Convert '..' (used in the dataset to mark missing values) into real NA values
    value = as.numeric(value) # Now the 'value' column is fully numeric and ready for calculations, plots, or summarization
  ) %>%
  filter(!is.na(value)) # Remove NA values completely; they represent missing data and would otherwise be excluded silently from plots or break summary stats
# Since this project focuses on visual comparisons and numeric insights, keeping only complete cases improves clarity and accuracy
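To make the reshaping step concrete, here is a small sketch of the same pivot and conversion on a toy table. The table, its country codes (EXL, SMP), and its values are hypothetical. It also counts missing values per country before dropping them, which is one way to see how much data the final filter removes.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wide data as it might look after clean_names(); values are made up
toy_wide <- tibble(
  series_name  = "GDP growth (annual %)",
  country_code = c("EXL", "SMP"),
  x2013_yr2013 = c("1.5", ".."),
  x2014_yr2014 = c("..", "2.25")
)

# Same reshaping logic as the cleaning step: wide year columns to (year, value) pairs
toy_long <- toy_wide %>%
  pivot_longer(cols = starts_with("x20"), names_to = "year", values_to = "value") %>%
  mutate(
    year  = as.numeric(str_extract(year, "\\d{4}")), # "x2013_yr2013" -> 2013
    value = as.numeric(na_if(value, ".."))           # ".." -> NA, then numeric
  )

# Count missing values per country before dropping them
missing_by_country <- toy_long %>%
  group_by(country_code) %>%
  summarise(n_missing = sum(is.na(value)), .groups = "drop")

# Keep only observed values, as in the main pipeline
toy_clean <- toy_long %>% filter(!is.na(value))
```

Here each toy country loses one of its two years, which is the kind of gap worth knowing about before comparing countries.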
Label simplification
# Create a named vector to rename series_name values to simplified codes
series_labels <- c(
  "GDP per unit of energy use (PPP $ per kg of oil equivalent)" = "GDP_per_energy",
  "GDP growth (annual %)" = "GDP_growth",
  "Electricity production from renewable sources, excluding hydroelectric (kWh)" = "Elec_prod_renew",
  "Electricity production from oil, gas and coal sources (% of total)" = "Elec_prod_fossil",
  "Electricity production from nuclear sources (% of total)" = "Elec_prod_nucl",
  "Electricity production from hydroelectric sources (% of total)" = "Elec_prod_hydro",
  "Electric power consumption (kWh per capita)" = "Elec_consmpt"
)

# Apply renaming and sort into a new dataset to preserve original long_data
long2_data <- long_data %>%
  mutate(series_name = recode(series_name, !!!series_labels)) %>% # Rename for quicker use
  arrange(series_name, country_code, year) # Arrange data for logical grouping in plots or summaries
Commentary: This chunk simplifies the series_name labels into shorter codes to make downstream filtering and plotting faster. It also arranges the dataset by series_name, country_code, and year so observations are grouped and ordered for clearer plotting and exploration.
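One caveat with this approach is that recode() leaves any value without a matching key unchanged, so a label that differs by even one character would pass through silently. A minimal sketch of a check for that, using a shortened label vector and a deliberately misspelled series name (both hypothetical):

```r
library(dplyr)

# Shortened, illustrative label vector (the full one has seven entries)
labels_demo <- c(
  "GDP growth (annual %)" = "GDP_growth",
  "Electric power consumption (kWh per capita)" = "Elec_consmpt"
)

# Hypothetical series names; the last one is missing a space on purpose
series_demo <- c(
  "GDP growth (annual %)",
  "Electric power consumption (kWh per capita)",
  "GDP growth (annual%)"
)

# Names present in the data that have no key in the label vector
unmatched <- setdiff(unique(series_demo), names(labels_demo))

# recode() keeps unmatched values as they are
recoded <- recode(series_demo, !!!labels_demo)
```

An empty unmatched vector would indicate that every series received its short code.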
Exploratory Plots
# 1. Grouped bar charts: Energy mix per country in 2013 and 2022 (using % of total units only)
# Filter for fossil, hydro, and nuclear electricity production by country for 2013
long2_data %>%
  filter(year == 2013, series_name %in% c("Elec_prod_fossil", "Elec_prod_hydro", "Elec_prod_nucl")) %>%
  ggplot(aes(x = country_code, y = value, fill = series_name)) +
  geom_bar(stat = "identity", position = position_dodge(preserve = "single"), width = 0.7) +
  labs(
    title = "Electricity Mix by Country in 2013",
    x = "Country",
    y = "% of Total Electricity Production",
    fill = "Source Type",
    caption = "World Bank (2013)"
  ) +
  theme_minimal() +
  scale_x_discrete(expand = expansion(add = 0.5)) +
  scale_y_continuous(limits = c(0, 100))

# Repeat for 2022
long2_data %>%
  filter(year == 2022, series_name %in% c("Elec_prod_fossil", "Elec_prod_hydro", "Elec_prod_nucl")) %>%
  ggplot(aes(x = country_code, y = value, fill = series_name)) +
  geom_bar(stat = "identity", position = position_dodge(preserve = "single"), width = 0.7) +
  labs(
    title = "Electricity Mix by Country in 2022",
    x = "Country",
    y = "% of Total Electricity Production",
    fill = "Source Type",
    caption = "World Bank (2022)"
  ) +
  theme_minimal() +
  scale_x_discrete(expand = expansion(add = 0.5)) +
  scale_y_continuous(limits = c(0, 100))
# 2. Time Series: Trends in renewable electricity production (excluding 2022 & 2023)
# We'll scale units (in kWh) to billions to improve readability
long2_data %>%
  filter(series_name == "Elec_prod_renew", year < 2022) %>%
  mutate(scaled_value = value / 1e9) %>% # Convert to billions of kWh
  ggplot(aes(x = year, y = scaled_value, color = country_code)) +
  geom_line(linewidth = 1) + # 'linewidth' replaces 'size' for lines since ggplot2 3.4.0
  scale_x_continuous(breaks = 2013:2021, labels = 2013:2021) +
  labs(
    title = "Renewable Electricity Production Trends (in Billion kWh)",
    x = "Year",
    y = "Billion kWh",
    caption = "World Bank (2013–2021)"
  ) +
  theme_minimal()
Commentary: These exploratory plots highlight changes in energy mix and production trends over time. The 2013 and 2022 bar charts show that a few countries reduced their fossil fuel share slightly in favor of hydro or nuclear sources, but the overall shift is limited. The time series plot shows that only Canada demonstrates a consistent upward trend in renewable electricity production, suggesting limited adoption growth elsewhere. These insights guide the direction for a more targeted final visualization.
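The shift between the two bar charts can also be expressed as a number per country. A minimal sketch, assuming data in the same long layout as long2_data; the country codes and values below are hypothetical and chosen only to show the calculation:

```r
library(dplyr)
library(tidyr)

# Hypothetical fossil shares (% of total) for two made-up countries
toy_fossil <- tibble(
  country_code = rep(c("AAA", "BBB"), each = 2),
  year         = rep(c(2013, 2022), times = 2),
  series_name  = "Elec_prod_fossil",
  value        = c(60, 50, 20, 25)
)

# One row per country, one column per year, plus the change in percentage points
fossil_change <- toy_fossil %>%
  filter(series_name == "Elec_prod_fossil", year %in% c(2013, 2022)) %>%
  pivot_wider(names_from = year, values_from = value, names_prefix = "y") %>%
  mutate(change_pp = y2022 - y2013)
```

A negative change_pp means the fossil share fell between 2013 and 2022; the size of the value indicates whether a shift is slight or substantial.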
Final Visualizations
The following transformation creates a new variable, the ratio of fossil electricity to total traditional electricity sources (fossil + hydro + nuclear), and joins it with GDP growth for each country-year. This prepares the data for the comparison in the final plots.
# Create a new dataset with Fossil-to-Total (% of total) ratio vs GDP growth
data_ratio_gdp <- long2_data %>%
  filter(series_name %in% c("Elec_prod_fossil", "Elec_prod_hydro", "Elec_prod_nucl", "GDP_growth")) %>%
  pivot_wider(names_from = series_name, values_from = value) %>%
  mutate(
    total_energy_percent = Elec_prod_fossil + Elec_prod_hydro + Elec_prod_nucl,
    fossil_ratio = Elec_prod_fossil / total_energy_percent
  )
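As a worked illustration of the ratio, a country with 60% fossil, 30% hydro, and 10% nuclear has fossil_ratio = 60 / (60 + 30 + 10) = 0.6. The sketch below reproduces that on toy data (hypothetical countries and values). It also shows that when one component is absent for a country-year, pivot_wider() fills it with NA and the ratio becomes NA, which is one possible reason for rows being dropped when the ratio is plotted.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: "EXL" has all three sources, "SMP" has no nuclear record
toy_long <- tibble(
  country_code = c("EXL", "EXL", "EXL", "EXL", "SMP", "SMP", "SMP"),
  year         = 2013,
  series_name  = c("Elec_prod_fossil", "Elec_prod_hydro", "Elec_prod_nucl", "GDP_growth",
                   "Elec_prod_fossil", "Elec_prod_hydro", "GDP_growth"),
  value        = c(60, 30, 10, 2.0, 40, 50, 1.0)
)

toy_ratio <- toy_long %>%
  pivot_wider(names_from = series_name, values_from = value) %>%
  mutate(
    total_energy_percent = Elec_prod_fossil + Elec_prod_hydro + Elec_prod_nucl,
    fossil_ratio = Elec_prod_fossil / total_energy_percent
  )
```

Whether a missing source should be treated as zero instead of NA is a modeling choice that depends on how the source data records countries without that technology.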
Two Final Plots
# 1. GDP Growth vs Fossil Dominance Ratio (Scatter with Regression)
# Explore how reliance on fossil fuels (within the energy mix) correlates with GDP growth
ggplot(data_ratio_gdp, aes(x = fossil_ratio, y = GDP_growth, color = country_code)) +
  geom_point(alpha = 0.6, size = 2.5) +
  geom_smooth(method = "lm", se = FALSE, color = "gray") + # Single pooled regression line across all countries
  labs(
    title = "GDP Growth vs Fossil Share of Energy Mix (2013–2023)",
    x = "Fossil Share of Fossil + Hydro + Nuclear (0 to 1)", # fossil_ratio is a fraction, not a percentage
    y = "GDP Growth (%)",
    color = "Country",
    caption = "World Bank (2013–2023)"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 4 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).
# 2. GDP Growth Over Time by Country (Line Plot)
long2_data %>%
  filter(series_name == "GDP_growth") %>%
  ggplot(aes(x = year, y = value, color = country_code)) +
  geom_line(linewidth = 1) + # 'linewidth' replaces 'size' for lines since ggplot2 3.4.0
  scale_x_continuous(breaks = 2013:2023, labels = 2013:2023) +
  labs(
    title = "GDP Growth Over Time by Country",
    x = "Year",
    y = "GDP Growth (%)",
    caption = "World Bank (2013–2023)"
  ) +
  theme_minimal()
Final Essay
Reflection
In this project, I explored how electricity production sources and economic performance (GDP growth) interact across eight countries from 2013 to 2023. The dataset, sourced from the World Bank, included seven key indicators ranging from electricity consumption and GDP growth to electricity production broken down by source (fossil, hydro, nuclear, renewables).
Data Cleaning:
The raw dataset required significant preprocessing:
Cleaned column names using janitor::clean_names()
Filtered out metadata rows beyond row 56 and removed invalid placeholders ("..")
Transformed the data to long format for easier plotting using pivot_longer()
Removed NA values to maintain clean, interpretable visual outputs
Created a new dataset with a custom metric: the fossil dominance ratio (fossil / [fossil + hydro + nuclear])
Visualizations:
Plot 1: GDP Growth vs Fossil “Dominance” Ratio (Scatter). This plot revealed no consistent correlation between fossil reliance and GDP growth: countries with both high and low fossil dominance showed a range of GDP outcomes. This suggests that fossil energy’s role in growth is neither uniformly beneficial nor uniformly harmful and is likely shaped by broader policy, innovation, or structural factors.
Plot 2: GDP Growth Over Time. This line plot shows national GDP fluctuations across the decade. It contextualizes country performance and suggests that GDP growth is influenced by broader economic events (e.g., global recessions, pandemics, recoveries) more than by energy mix alone.
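The visual reading of the scatter plot could be backed by a number. A minimal sketch, assuming a data frame with fossil_ratio and GDP_growth columns like data_ratio_gdp; the values below are hypothetical and were chosen so that the correlation is zero, so they say nothing about the real data:

```r
# Hypothetical values; NOT taken from the World Bank data
toy <- data.frame(
  fossil_ratio = c(0.1, 0.2, 0.3, 0.4, NA),
  GDP_growth   = c(1, 2, 2, 1, 3)
)

# Pearson correlation using only rows where both values are present
r <- cor(toy$fossil_ratio, toy$GDP_growth, use = "complete.obs")

# Slope of the pooled linear fit; lm() drops incomplete rows by default
fit   <- lm(GDP_growth ~ fossil_ratio, data = toy)
slope <- unname(coef(fit)["fossil_ratio"])
```

Applied to the real data, a correlation close to zero would be consistent with the interpretation of Plot 1, while pooling all countries into one fit would still leave country-level differences unaccounted for.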
Challenges
The first challenge was aligning time-series data across multiple countries while navigating a complex set of variables to establish a clear and insightful analytical direction. Second, the nature of this database led to NA issues due to incomplete country records. Third, interpreting percentage-based indicators alongside per-capita and absolute values required attention. Finally, it is difficult to create a meaningful comparison between energy mix and GDP without oversimplifying causality, especially for an indicator as broad as GDP.
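The incomplete records mentioned above can be located explicitly rather than discovered through plot warnings. A minimal sketch, assuming long-format data with series_name, country_code, year, and value columns; the toy rows and country codes are hypothetical:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data in which "SMP" has no 2014 observation
toy_long <- tibble(
  series_name  = "GDP_growth",
  country_code = c("EXL", "EXL", "EXL", "SMP", "SMP"),
  year         = c(2013L, 2014L, 2015L, 2013L, 2015L),
  value        = c(1.5, 2.0, 1.0, 0.5, 2.25)
)

# Expand to every series x country x year combination, then keep the gaps
gaps <- toy_long %>%
  complete(series_name, country_code, year = 2013:2015) %>%
  filter(is.na(value))
```

Listing the gaps this way shows which indicators and years are affected before any comparison across countries is made.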
Conclusion
The analysis shows that while changes in fossil energy usage are trackable, they do not directly dictate GDP outcomes in this dataset. The derived fossil dominance ratio showed no clear association with GDP growth, which is consistent with the idea that sustainable shifts do not inherently suppress growth. With eight countries and eleven years, this is descriptive evidence rather than a causal finding: shifts toward sustainable energy production may simply have little measurable effect on short-run growth.