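Stage 1: Renewable Energy and Economic Growth
# Load packages (assumed; the chunk that loads them is not shown in the source)
library(WDI)        # WDI() download function
library(tidyverse)  # dplyr, tidyr, ggplot2
library(janitor)    # clean_names()

# Import World Bank indicators
indicators <- c(
  gdp_growth = "NY.GDP.MKTP.KD.ZG",
  renewable_energy = "EG.FEC.RNEW.ZS",
  population_growth = "SP.POP.GROW",
  trade_openness = "NE.TRD.GNFS.ZS",
  inflation = "FP.CPI.TOTL.ZG",
  gdp_per_capita = "NY.GDP.PCAP.CD"
)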
Introduction
This project investigates whether renewable energy consumption can help predict economic growth across countries. The analysis uses data from the World Bank World Development Indicators between 2000 and 2023. The dataset includes economic and environmental variables such as GDP growth, renewable energy consumption, inflation, trade openness, population growth, and GDP per capita.
Economic Question
Can renewable energy consumption help predict GDP growth across countries?
raw_data <- WDI(
  country = "all",
  indicator = indicators,
  start = 2000,
  end = 2023,
  extra = TRUE
)
glimpse(raw_data)
Rows: 6,384
Columns: 18
$ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghan…
$ iso2c <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF"…
$ iso3c <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AF…
$ year <int> 2007, 2010, 2011, 2021, 2012, 2009, 2020, 2000, 2014…
$ status <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", …
$ lastupdated <chr> "2026-04-08", "2026-04-08", "2026-04-08", "2026-04-0…
$ gdp_growth <dbl> 13.8263195, 14.3624415, 0.4263548, -20.7388394, 12.7…
$ renewable_energy <dbl> 28.8, 15.2, 12.6, 20.0, 15.4, 16.5, 18.2, 45.0, 19.1…
$ population_growth <dbl> 1.8925975, 2.9346867, 3.6915031, 2.3560978, 4.047862…
$ trade_openness <dbl> NA, NA, NA, 51.41172, NA, NA, 46.70989, NA, NA, NA, …
$ inflation <dbl> 8.6805708, 2.1785375, 11.8041858, 5.1332034, 6.44121…
$ gdp_per_capita <dbl> 376.2232, 560.6215, 606.6947, 356.4962, 651.4171, 45…
$ region <chr> "Middle East, North Africa, Afghanistan & Pakistan",…
$ capital <chr> "Kabul", "Kabul", "Kabul", "Kabul", "Kabul", "Kabul"…
$ longitude <chr> "69.1761", "69.1761", "69.1761", "69.1761", "69.1761…
$ latitude <chr> "34.5228", "34.5228", "34.5228", "34.5228", "34.5228…
$ income <chr> "Low income", "Low income", "Low income", "Low incom…
$ lending <chr> "IDA", "IDA", "IDA", "IDA", "IDA", "IDA", "IDA", "ID…
# Clean dataset and remove missing values
clean_data <- raw_data %>%
  clean_names() %>%
  filter(region != "Aggregates") %>%
  select(
    country,
    iso2c,
    year,
    gdp_growth,
    renewable_energy,
    population_growth,
    trade_openness,
    inflation,
    gdp_per_capita,
    income
  ) %>%
  drop_na(
    gdp_growth,
    renewable_energy,
    population_growth,
    trade_openness,
    inflation,
    gdp_per_capita
  )
glimpse(clean_data)
Rows: 3,472
Columns: 10
$ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Albani…
$ iso2c <chr> "AF", "AF", "AF", "AL", "AL", "AL", "AL", "AL", "AL"…
$ year <int> 2021, 2020, 2022, 2003, 2019, 2020, 2005, 2021, 2018…
$ gdp_growth <dbl> -20.7388394, -2.3511007, -6.2401720, 5.3332643, 2.06…
$ renewable_energy <dbl> 20.0, 18.2, 20.0, 33.7, 40.1, 44.4, 36.8, 41.9, 37.8…
$ population_growth <dbl> 2.3560978, 3.1536092, 1.4357044, -0.3741492, -1.5431…
$ trade_openness <dbl> 51.41172, 46.70989, 72.88547, 64.82322, 75.38213, 59…
$ inflation <dbl> 5.13320341, 5.60188791, 13.71210237, 0.48400261, 1.4…
$ gdp_per_capita <dbl> 356.4962, 510.7871, 357.2612, 1908.6990, 6069.4390, …
$ income <chr> "Low income", "Low income", "Low income", "Upper mid…
nrow(clean_data)
[1] 3472
Dataset Description
The dataset was obtained from the World Bank World Development Indicators database using the WDI package in R. The raw download contains 6,384 rows and 18 columns covering 2000 to 2023. After excluding regional aggregates and dropping rows with a missing value in any of the six indicators, the final dataset contains 3,472 country-year observations and 10 variables.
This dataset is relevant because renewable energy and economic growth are important topics in modern economics. Governments increasingly try to balance economic growth with environmental sustainability.
Source: https://data.worldbank.org/
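As a quick sanity check on the row counts reported above, the arithmetic can be sketched in base R. The variable names (`raw_rows`, `clean_rows`, `n_entities`, `share_retained`) are illustrative and not part of the original analysis; the counts are copied from the `glimpse()` output, and the entity count assumes one row per entity per year.

```r
# Row counts copied from the glimpse() output above
raw_rows   <- 6384
clean_rows <- 3472

# The download covers 2000 to 2023 inclusive
n_years <- length(2000:2023)

# Entities in the raw download (countries plus World Bank aggregates),
# assuming one row per entity per year
n_entities <- raw_rows / n_years

# Share of raw rows that survive the aggregate filter and drop_na()
share_retained <- clean_rows / raw_rows

cat("Years:", n_years, "\n")
cat("Entities in raw data:", n_entities, "\n")
cat("Share of rows retained:", round(100 * share_retained, 1), "%\n")
```

Under that assumption the raw download holds 266 entities over 24 years, and a little over half of the raw rows remain after cleaning. This suggests the cleaned panel is unbalanced, with some countries contributing only a few years.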
# Calculate summary statistics for renewable energy consumption
renewable_summary <- clean_data %>%
  summarise(
    mean = mean(renewable_energy),
    median = median(renewable_energy),
    sd = sd(renewable_energy),
    min = min(renewable_energy),
    q1 = quantile(renewable_energy, 0.25),
    q3 = quantile(renewable_energy, 0.75),
    max = max(renewable_energy)
  )
renewable_summary
      mean median       sd min  q1     q3  max
1 31.59453   23.1 28.75385   0 6.5 51.325 98.3
Interpretation of Summary Statistics
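The average renewable energy share is approximately 31.6 percent of total final energy consumption, while the median is 23.1 percent. A mean above the median is a first sign of right skew. The standard deviation of about 28.8 percentage points is nearly as large as the mean, indicating substantial variation across country-year observations. Values range from 0 to 98.3 percent: some countries use almost no renewable energy, while others rely on it almost entirely.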
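The summary statistics alone already allow a rough check of spread and skewness before any plot is drawn. The sketch below is not part of the original analysis: it copies the values from the `renewable_summary` output into a hypothetical vector `stats` and computes three standard descriptive measures from them.

```r
# Summary statistics copied from the renewable_summary output above
stats <- c(mean = 31.59453, median = 23.1, sd = 28.75385,
           q1 = 6.5, q3 = 51.325)

# Coefficient of variation: spread relative to the mean
cv <- stats[["sd"]] / stats[["mean"]]

# Pearson's second skewness coefficient: 3 * (mean - median) / sd
pearson_skew <- 3 * (stats[["mean"]] - stats[["median"]]) / stats[["sd"]]

# Bowley (quartile) skewness: (q3 + q1 - 2 * median) / (q3 - q1)
bowley_skew <- (stats[["q3"]] + stats[["q1"]] - 2 * stats[["median"]]) /
  (stats[["q3"]] - stats[["q1"]])

print(round(c(cv = cv, pearson_skew = pearson_skew, bowley_skew = bowley_skew), 3))
```

All three measures come out positive (roughly 0.91, 0.89, and 0.26), which is consistent with a widely dispersed, right-skewed variable. These are descriptive indicators only, not formal tests.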
# Create histogram of renewable energy consumption
ggplot(clean_data, aes(x = renewable_energy)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Renewable Energy Consumption Distribution",
    x = "Renewable energy consumption (% of total final energy consumption)",
    y = "Number of observations"
  )
Histogram Interpretation
The histogram indicates that renewable energy consumption is not normally distributed. Most observations are concentrated at low and moderate renewable energy shares, while fewer observations show very high shares. This creates a right-skewed distribution with a long upper tail.
# Apply log transformation (adding 1 keeps zero values defined)
clean_data <- clean_data %>%
  mutate(log_renewable_energy = log(renewable_energy + 1))
ggplot(clean_data, aes(x = log_renewable_energy)) +
  geom_histogram(bins = 30) +
  labs(
    title = "Distribution of Log Renewable Energy Consumption",
    x = "Log of (renewable energy consumption + 1)",
    y = "Number of observations"
  )
Log Transformation Interpretation
After applying the log transformation, the distribution becomes less skewed and more balanced. The transformed variable appears closer to a normal distribution than the original variable.
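The transformation adds 1 before taking the log because the minimum observed share is 0, where a plain log is undefined. The following base R sketch illustrates this on the five-number summary reported earlier; the vector `x` and the other names are illustrative only, and `log1p()` and `expm1()` are shown as base R equivalents rather than as what the original code uses.

```r
# Values from the summary table above: min, q1, median, q3, max
x <- c(0, 6.5, 23.1, 51.325, 98.3)

# A plain log cannot handle the zero observation: it returns -Inf
plain_log_of_zero <- log(x[1])

# log(x + 1) maps zero to zero; log1p() is the base R equivalent
shifted <- log(x + 1)
same    <- all.equal(shifted, log1p(x))

# expm1() inverts the transformation and recovers the original scale
recovered <- expm1(log1p(x))

print(plain_log_of_zero)
print(shifted)
print(same)
print(all.equal(recovered, x))
```

Back-transforming with `expm1()` is useful later if model predictions made on the log scale need to be reported as percentage shares.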
Theoretical Distribution
Based on the histogram, renewable energy consumption appears to follow a right-skewed distribution. Therefore, a log-normal distribution may approximate the data better than a normal distribution. Because the variable includes zeros, the log-normal model applies to renewable_energy + 1, matching the transformation used above.
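One way to compare the two candidate distributions more formally is to fit each by maximum likelihood and compare log-likelihoods. The sketch below is a minimal illustration under stated assumptions: it uses simulated data as a stand-in (the vector `x` is hypothetical, not WDI values), and all object names are introduced here for illustration.

```r
set.seed(42)

# Simulated stand-in for renewable_energy (hypothetical data, not WDI values)
x <- rlnorm(500, meanlog = 3, sdlog = 1)

# Shift by one so that zeros in the real data would stay in the support
y <- x + 1

# Maximum likelihood estimates for a normal model of y
mu_hat    <- mean(y)
sigma_hat <- sqrt(mean((y - mu_hat)^2))

# Maximum likelihood estimates for a log-normal model of y
meanlog_hat <- mean(log(y))
sdlog_hat   <- sqrt(mean((log(y) - meanlog_hat)^2))

# Log-likelihood of each fitted model on the same data
loglik_normal    <- sum(dnorm(y, mean = mu_hat, sd = sigma_hat, log = TRUE))
loglik_lognormal <- sum(dlnorm(y, meanlog = meanlog_hat, sdlog = sdlog_hat,
                               log = TRUE))

cat("Normal log-likelihood:    ", loglik_normal, "\n")
cat("Log-normal log-likelihood:", loglik_lognormal, "\n")
```

Both models have two parameters, so the higher log-likelihood indicates the better fit. To apply this to the actual data, `x` would be replaced with `clean_data$renewable_energy`. Because the real variable is a share bounded at 100 percent, neither model can fit it exactly, so the comparison is only indicative.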
Conclusion
In Stage 1, a cross-country economic dataset was collected from the World Bank WDI database and cleaned. The analysis focused on renewable energy consumption and economic growth. Summary statistics and histograms showed that renewable energy consumption is right-skewed, while the log transformation produced a more balanced distribution. The transformed variable may therefore be better suited to the predictive models in later stages.