Stage 1: Renewable Energy and Economic Growth

Authors

Ece Kurtoğlu and Halil Rıfat Başbuğ

Introduction

This project investigates whether renewable energy consumption can help predict economic growth across countries. The analysis uses country-level data from the World Bank World Development Indicators for the years 2000 to 2023. The dataset covers six indicators: GDP growth, renewable energy consumption (as a share of total final energy consumption), inflation, trade openness, population growth, and GDP per capita.

Economic Question

Can renewable energy consumption help predict GDP growth across countries?

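# Load packages: WDI for data access, tidyverse for wrangling and plotting,
# janitor for clean_names()
library(WDI)
library(tidyverse)
library(janitor)

# Import World Bank indicators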
indicators <- c(
  gdp_growth = "NY.GDP.MKTP.KD.ZG",
  renewable_energy = "EG.FEC.RNEW.ZS",
  population_growth = "SP.POP.GROW",
  trade_openness = "NE.TRD.GNFS.ZS",
  inflation = "FP.CPI.TOTL.ZG",
  gdp_per_capita = "NY.GDP.PCAP.CD"
)
raw_data <- WDI(
  country = "all",
  indicator = indicators,
  start = 2000,
  end = 2023,
  extra = TRUE
)

glimpse(raw_data)
Rows: 6,384
Columns: 18
$ country           <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghan…
$ iso2c             <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF"…
$ iso3c             <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AF…
$ year              <int> 2007, 2010, 2011, 2021, 2012, 2009, 2020, 2000, 2014…
$ status            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ lastupdated       <chr> "2026-04-08", "2026-04-08", "2026-04-08", "2026-04-0…
$ gdp_growth        <dbl> 13.8263195, 14.3624415, 0.4263548, -20.7388394, 12.7…
$ renewable_energy  <dbl> 28.8, 15.2, 12.6, 20.0, 15.4, 16.5, 18.2, 45.0, 19.1…
$ population_growth <dbl> 1.8925975, 2.9346867, 3.6915031, 2.3560978, 4.047862…
$ trade_openness    <dbl> NA, NA, NA, 51.41172, NA, NA, 46.70989, NA, NA, NA, …
$ inflation         <dbl> 8.6805708, 2.1785375, 11.8041858, 5.1332034, 6.44121…
$ gdp_per_capita    <dbl> 376.2232, 560.6215, 606.6947, 356.4962, 651.4171, 45…
$ region            <chr> "Middle East, North Africa, Afghanistan & Pakistan",…
$ capital           <chr> "Kabul", "Kabul", "Kabul", "Kabul", "Kabul", "Kabul"…
$ longitude         <chr> "69.1761", "69.1761", "69.1761", "69.1761", "69.1761…
$ latitude          <chr> "34.5228", "34.5228", "34.5228", "34.5228", "34.5228…
$ income            <chr> "Low income", "Low income", "Low income", "Low incom…
$ lending           <chr> "IDA", "IDA", "IDA", "IDA", "IDA", "IDA", "IDA", "ID…
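# Drop regional and income aggregates, keep the analysis variables,
# and remove rows with a missing value in any indicator
# (filter() also drops rows where region is NA)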
clean_data <- raw_data %>%
  clean_names() %>%
  filter(region != "Aggregates") %>%
  select(
    country,
    iso2c,
    year,
    gdp_growth,
    renewable_energy,
    population_growth,
    trade_openness,
    inflation,
    gdp_per_capita,
    income
  ) %>%
  drop_na(
    gdp_growth,
    renewable_energy,
    population_growth,
    trade_openness,
    inflation,
    gdp_per_capita
  )

glimpse(clean_data)
Rows: 3,472
Columns: 10
$ country           <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Albani…
$ iso2c             <chr> "AF", "AF", "AF", "AL", "AL", "AL", "AL", "AL", "AL"…
$ year              <int> 2021, 2020, 2022, 2003, 2019, 2020, 2005, 2021, 2018…
$ gdp_growth        <dbl> -20.7388394, -2.3511007, -6.2401720, 5.3332643, 2.06…
$ renewable_energy  <dbl> 20.0, 18.2, 20.0, 33.7, 40.1, 44.4, 36.8, 41.9, 37.8…
$ population_growth <dbl> 2.3560978, 3.1536092, 1.4357044, -0.3741492, -1.5431…
$ trade_openness    <dbl> 51.41172, 46.70989, 72.88547, 64.82322, 75.38213, 59…
$ inflation         <dbl> 5.13320341, 5.60188791, 13.71210237, 0.48400261, 1.4…
$ gdp_per_capita    <dbl> 356.4962, 510.7871, 357.2612, 1908.6990, 6069.4390, …
$ income            <chr> "Low income", "Low income", "Low income", "Upper mid…
nrow(clean_data)
[1] 3472

Dataset Description

The dataset was obtained from the World Bank World Development Indicators database using the WDI package in R. It contains country-level observations from 2000 to 2023. The raw download has 6,384 rows and 18 columns; after removing aggregates and rows with missing values, the final dataset contains 3,472 country-year observations and 10 variables.
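
To see which indicators drive the loss of rows during cleaning, missing values can be counted per column before rows are dropped. The sketch below uses a small hypothetical data frame (toy_data, with made-up values) instead of the real download, so it runs without network access; with the actual data, the same calls could be applied to raw_data.

```r
# Hypothetical toy data standing in for the WDI download (values are made up)
toy_data <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  year = c(2000, 2001, 2000, 2001, 2000),
  gdp_growth = c(2.1, NA, 3.4, 1.0, 0.5),
  renewable_energy = c(10.0, 12.5, NA, 40.2, 55.1),
  trade_openness = c(NA, 60.3, NA, 75.0, 48.9)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy_data))

# Rows that survive listwise deletion (the base-R analogue of drop_na())
complete_rows <- sum(complete.cases(toy_data))

print(missing_per_column)
print(complete_rows)
```

In the glimpse() output above, trade_openness already shows several NA values in the first rows, so it is a plausible candidate for much of the row loss; a per-column count on raw_data would confirm or refute this.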

This dataset is relevant because renewable energy and economic growth are important topics in modern economics. Governments increasingly try to balance economic growth with environmental sustainability.

Source: https://data.worldbank.org/

# Calculate summary statistics for renewable energy consumption
renewable_summary <- clean_data %>%
  summarise(
    mean = mean(renewable_energy),
    median = median(renewable_energy),
    sd = sd(renewable_energy),
    min = min(renewable_energy),
    q1 = quantile(renewable_energy, 0.25),
    q3 = quantile(renewable_energy, 0.75),
    max = max(renewable_energy)
  )

renewable_summary
      mean median       sd min  q1     q3  max
1 31.59453   23.1 28.75385   0 6.5 51.325 98.3

Interpretation of Summary Statistics

The mean renewable energy share is approximately 31.6 percent, while the median is 23.1 percent; a mean well above the median is a first sign of right skew. The standard deviation (about 28.8 percentage points) is nearly as large as the mean, indicating substantial variation across country-year observations. Values range from 0 to 98.3 percent: some countries use almost no renewable energy, while others rely on it almost entirely.
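
The degree of spread and skew can be put into numbers using only the summary statistics reported above. The sketch below computes the coefficient of variation and two standard skewness measures (Pearson's second coefficient and Bowley's quartile coefficient); the variable names are illustrative and not part of the original analysis.

```r
# Summary statistics copied from the renewable_summary output
mean_re <- 31.59453
median_re <- 23.1
sd_re <- 28.75385
q1_re <- 6.5
q3_re <- 51.325

# Coefficient of variation: spread relative to the mean
cv <- sd_re / mean_re

# Pearson's second skewness coefficient: 3 * (mean - median) / sd
pearson_skew <- 3 * (mean_re - median_re) / sd_re

# Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)
bowley_skew <- (q3_re + q1_re - 2 * median_re) / (q3_re - q1_re)

print(round(c(cv = cv, pearson = pearson_skew, bowley = bowley_skew), 2))
```

All three values are positive (roughly 0.91, 0.89, and 0.26), which is consistent with a right-skewed distribution.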

# Create histogram of renewable energy consumption
ggplot(clean_data, aes(x = renewable_energy)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Renewable Energy Consumption Distribution",
    x = "Renewable energy consumption (% of total final energy consumption)",
    y = "Number of observations"
  )

Histogram Interpretation

The histogram indicates that renewable energy consumption is not normally distributed. Most observations are concentrated at low and moderate renewable energy shares, while fewer observations show very high shares. The result is a right-skewed distribution with a long upper tail.

# Apply a log(x + 1) transformation; the + 1 keeps observations
# with a renewable share of 0 defined (log(0) would be -Inf)
clean_data <- clean_data %>%
  mutate(log_renewable_energy = log(renewable_energy + 1))

ggplot(clean_data, aes(x = log_renewable_energy)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Distribution of Log Renewable Energy Consumption",
    x = "log(renewable energy consumption + 1)",
    y = "Number of observations"
  )

Log Transformation Interpretation

After applying the log transformation, the distribution becomes less skewed and more balanced. The transformed variable appears closer to a normal distribution than the original variable.
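
The visual impression from the two histograms can be checked with the moment-based sample skewness, which is close to 0 for a symmetric distribution and positive for a right-skewed one. The sketch below defines a small helper, sample_skewness() (a hypothetical name, not from the original analysis), and demonstrates it on simulated right-skewed data; applied to the project data, the inputs would be clean_data$renewable_energy and clean_data$log_renewable_energy.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Simulated right-skewed data (an assumption for illustration, not WDI values)
set.seed(123)
simulated_share <- rlnorm(1000, meanlog = 3, sdlog = 0.8)

# Skewness before and after the same log(x + 1) transformation
skew_raw <- sample_skewness(simulated_share)
skew_log <- sample_skewness(log(simulated_share + 1))

print(c(raw = skew_raw, log = skew_log))
```

A skewness nearer to zero after the transformation would support the interpretation above; a clearly negative value would suggest that the transformation overcorrects.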

Theoretical Distribution

Based on the histogram, renewable energy consumption is right-skewed, so a log-normal distribution may approximate the data better than a normal distribution. The fit can only be approximate: the variable is a percentage share bounded between 0 and 100, and the sample includes zeros, which lie outside the support of a log-normal distribution. The more symmetric, closer-to-normal shape of the log-transformed variable is consistent with this choice.
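
One way to compare the two candidate distributions is to fit each by maximum likelihood and compare their AIC values (lower is better). The sketch below does this in base R on simulated positive data; the helper compare_normal_lognormal() is hypothetical and not part of the original analysis. It requires strictly positive input, so how to treat zero shares in the real data (dropping or shifting them) is an assumption the analyst would need to justify.

```r
# Compare normal and log-normal fits by AIC (both have 2 parameters)
compare_normal_lognormal <- function(x) {
  stopifnot(all(x > 0))  # the log-normal density is defined only for x > 0

  # Maximum likelihood estimates use the 1/n variance, not 1/(n - 1)
  mle_sd <- function(v) sqrt(mean((v - mean(v))^2))

  loglik_normal <- sum(dnorm(x, mean = mean(x), sd = mle_sd(x), log = TRUE))
  loglik_lognormal <- sum(dlnorm(
    x,
    meanlog = mean(log(x)),
    sdlog = mle_sd(log(x)),
    log = TRUE
  ))

  # AIC = 2 * (number of parameters) - 2 * log-likelihood
  c(
    normal = 2 * 2 - 2 * loglik_normal,
    lognormal = 2 * 2 - 2 * loglik_lognormal
  )
}

# Simulated log-normal data (for illustration only, not WDI values)
set.seed(42)
simulated_share <- rlnorm(500, meanlog = 3, sdlog = 0.8)

aic_values <- compare_normal_lognormal(simulated_share)
print(aic_values)
```

With the project data, the function could be called on the positive values only, for example clean_data$renewable_energy[clean_data$renewable_energy > 0], keeping in mind that this excludes the zero observations from the comparison.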

Conclusion

In Stage 1, a cross-country economic dataset was collected from the World Bank WDI database and cleaned. The analysis focused on renewable energy consumption and economic growth. Summary statistics and histograms showed that renewable energy consumption is right-skewed, while the log transformation produced a more balanced distribution. The transformed variable may therefore be better suited as a predictor in later predictive modeling stages.