library(tidyverse)
library(readxl) # reads .xlsx files; used for the income classification dataset
DATA110 Final Project
The Resource Curse: A Data-Driven Exploration
Source: Daily JSTOR (2019) - Is the “Resource Curse” a Myth? By Madhuri Karak
The Resource Curse and Development Disparities: A Global Analysis Using World Bank Indicators (2012–2022)
Introduction
This project investigates the “Resource Curse”: a paradox in which countries rich in natural resources often experience slower economic growth, weaker institutions, and more political instability. While it might seem intuitive that nations with abundant resources would prosper, many face unique structural challenges that prevent sustainable development.
The dataset used for this project was compiled from the World Bank’s Data Portal. It includes economic and governance indicators for multiple countries between 2012 and 2022. A secondary dataset, also from the World Bank, was merged to include each country’s income classification (e.g., “High Income,” “Low Income”). This final combined dataset consists of 7 variables, covering resource wealth, governance, education spending, and income group across a global sample.
While the World Bank hosts and curates the dataset, some of the original data are sourced from other institutions, and other series are calculated in-house by World Bank analysts. For example, governance indicators such as Political Stability and Control of Corruption are aggregated from surveys and expert assessments conducted by over 30 organizations, while education expenditure figures are reported directly by national governments to UNESCO and combined with IMF data. In contrast, the Natural Resources Rent indicator is modeled by the World Bank itself based on economic estimations of extraction costs and market values. These sourcing details are documented in the accompanying metadata file.
Why I chose this Topic
The Resource Curse feels like one of the most overlooked, but deeply revealing dynamics of our global economy. I chose this topic because it brings together energy economics, governance, education, and inequality into one cohesive concept. It’s a concept that blends morality with statistical analysis, and I wanted to challenge myself and explore this concept through the data.
What it Means to Me
This was a concept I stumbled upon while researching ecological concepts for other projects and for my own interests. This project allows me to combine that curiosity with a more analytical approach. It’s also current, given how countries in the Global South are navigating post-colonial extraction while simultaneously being pressured to adopt green energy transitions.
Variable Description
Country: This variable identifies each nation included in the dataset. It is used to group observations and track changes across nations. (Categorical)
Year: Represents the calendar year of each observation, ranging from 2012 to 2022 (Quantitative).
Income Group: Classifies each country into one of four income levels as defined by the World Bank: “Low income,” “Lower-middle income,” “Upper-middle income,” or “High income.” (Categorical)
Total Natural Resources Rents (% of GDP): Measures the total value of rents from natural resources (oil, coal, minerals, forests, natural gas) expressed as a percentage of the country’s GDP. A higher value indicates greater economic dependence on resource extraction. (Quantitative)
Political Stability and Absence of Violence/Terrorism (Estimate): A governance indicator of political stability. Scores range from -2.5 (weak stability) to +2.5 (strong stability). (Quantitative)
Control of Corruption (Estimate): Another governance indicator, this measures perceptions of the extent to which public power is exercised for private gain, including petty and grand forms of corruption. It ranges from -2.5 (high corruption) to +2.5 (strong anti-corruption performance). (Quantitative)
Government Expenditure on Education (% of Government Spending): The percentage of a nation’s total public spending that is allocated to the education sector. This metric reflects the country’s investment in human capital and future productivity. (Quantitative)
Merging and Cleaning Data
Load Libraries
Clean and Merge Datasets
# Read the main dataset
resource_data <- read_csv("DATA110 - Resource Curse - Dataset only - 6 variables 2012-2022.csv")
Rows: 65 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Country Name, Country Code, Series Name, Series Code, 2012 [YR2012...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Read the income classification dataset
income_class <- read_excel("Income classification.xlsx")
# Preview column names to see the join key
colnames(resource_data)
 [1] "Country Name"  "Country Code"  "Series Name"   "Series Code"
 [5] "2012 [YR2012]" "2013 [YR2013]" "2014 [YR2014]" "2015 [YR2015]"
 [9] "2016 [YR2016]" "2017 [YR2017]" "2018 [YR2018]" "2019 [YR2019]"
[13] "2020 [YR2020]" "2021 [YR2021]" "2022 [YR2022]"
colnames(income_class)
[1] "Economy"          "Code"             "Region"           "Income group"
[5] "Lending category"
# Clean the join key (make sure both datasets use the same column name and format)
# Rename columns to match
resource_data <- resource_data %>%
  rename(Country = `Country Name`,
         CountryCode = `Country Code`)
income_class <- income_class %>%
  rename(Country = Economy,
         CountryCode = Code)
# Merge the two datasets
merged_data <- resource_data %>%
  left_join(income_class, by = c("Country", "CountryCode")) %>%
  select(-`Series Code`, -`Lending category`, -Region) # Drop unused columns early
# Check the result
glimpse(merged_data)
Rows: 65
Columns: 15
$ Country <chr> "United States", "United States", "United States", "Un…
$ CountryCode <chr> "USA", "USA", "USA", "USA", "AGO", "AGO", "AGO", "AGO"…
$ `Series Name` <chr> "Total natural resources rents (% of GDP)", "Political…
$ `2012 [YR2012]` <chr> "0.776694501385092", "0.632442116737366", "1.402794003…
$ `2013 [YR2013]` <chr> "0.750833995599957", "0.643072605133057", "1.305924892…
$ `2014 [YR2014]` <chr> "0.703153661885122", "0.582419633865356", "1.371378064…
$ `2015 [YR2015]` <chr> "0.234371770177788", "0.662890195846558", "1.357721209…
$ `2016 [YR2016]` <chr> "0.303325431060077", "0.385635316371918", "1.330895662…
$ `2017 [YR2017]` <chr> "0.427546323641111", "0.262138307094574", "1.340532541…
$ `2018 [YR2018]` <chr> "0.595678810382377", "0.38623908162117", "1.2927473783…
$ `2019 [YR2019]` <chr> "0.557098125905551", "0.118298575282097", "1.180357098…
$ `2020 [YR2020]` <chr> "0.329505971924439", "-0.0278997328132391", "1.0378319…
$ `2021 [YR2021]` <chr> "1.27994423022721", "-0.0144246527925134", "1.01927757…
$ `2022 [YR2022]` <chr> "..", "0.00870068650692701", "1.10414469242096", "..",…
$ `Income group` <chr> "High income", "High income", "High income", "High inc…
# Column names are case-sensitive: the joined column is `Income group`, not `Income Group`
summary(merged_data$`Income group`)
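A left join keeps every row of the indicator table, so a key that fails to match shows up only as an `NA` in the joined columns. The sketch below, which uses small invented tables in place of the two World Bank files, shows one way to list such rows with `anti_join()`. The object names (`indicators`, `classes`, `unmatched`) are hypothetical. Note that an `NA` income group can also come from a blank cell in the classification file itself, which this check would not flag as unmatched.

```r
library(dplyr)

# Toy stand-ins for the two World Bank tables (illustrative values only)
indicators <- tibble(
  Country     = c("Angola", "Norway", "Venezuela, RB"),
  CountryCode = c("AGO", "NOR", "VEN")
)
classes <- tibble(
  Country        = c("Angola", "Norway"),
  CountryCode    = c("AGO", "NOR"),
  `Income group` = c("Lower middle income", "High income")
)

# Rows of `indicators` that have no partner in `classes`
unmatched <- anti_join(indicators, classes, by = c("Country", "CountryCode"))

# After the left join, the same rows carry NA in the joined column
joined <- left_join(indicators, classes, by = c("Country", "CountryCode"))
missing_group <- joined %>%
  filter(is.na(`Income group`)) %>%
  pull(Country)

print(unmatched)
print(missing_group)
```

Running a check like this right after the merge makes it easier to decide whether a gap needs a key fix (for example, a differently spelled country name) or a documented manual fill.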
long_data <- merged_data %>%
  pivot_longer(
    cols = starts_with("20"), # all the year columns
    names_to = "Year",
    values_to = "Value"
  ) %>%
  mutate(
    Year = str_extract(Year, "\\d{4}"), # extract just the 4-digit year
    Year = as.integer(Year),
    Value = na_if(Value, ".."), # turn ".." into NA
    Value = as.numeric(Value) # convert character to number
  )
summary(long_data)
   Country          CountryCode        Series Name        Income group
 Length:715         Length:715         Length:715         Length:715
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character

      Year          Value
 Min.   :2012   Min.   :-2.6093
 1st Qu.:2014   1st Qu.:-0.4557
 Median :2017   Median : 1.0071
 Mean   :2017   Mean   : 5.3568
 3rd Qu.:2020   3rd Qu.:10.1586
 Max.   :2022   Max.   :49.8510
                NA's   :111
glimpse(long_data)
Rows: 715
Columns: 6
$ Country <chr> "United States", "United States", "United States", "Uni…
$ CountryCode <chr> "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",…
$ `Series Name` <chr> "Total natural resources rents (% of GDP)", "Total natu…
$ `Income group` <chr> "High income", "High income", "High income", "High inco…
$ Year <int> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2…
$ Value <dbl> 0.776694501, 0.750833996, 0.703153662, 0.234371770, 0.3…
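The reshaping step can be tried on a tiny table first. The sketch below assumes the same export layout as the World Bank file (year columns named like `2012 [YR2012]`, missing values written as `..`), but the table `wide` and its values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide table in the World Bank export layout (values are invented)
wide <- tibble(
  Country         = c("A", "B"),
  `2012 [YR2012]` = c("1.5", ".."),
  `2013 [YR2013]` = c("2.5", "0.5")
)

long <- wide %>%
  pivot_longer(cols = starts_with("20"), names_to = "Year", values_to = "Value") %>%
  mutate(
    Year  = as.integer(str_extract(Year, "\\d{4}")), # "2012 [YR2012]" becomes 2012
    Value = as.numeric(na_if(Value, ".."))           # ".." becomes NA, then numeric
  )

print(long)
```

Replacing `..` with `NA` before calling `as.numeric()` avoids the coercion warning that R would otherwise raise for the non-numeric strings.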
final_data <- long_data %>%
  filter(!is.na(`Series Name`) & `Series Name` != "") %>%
  group_by(Country, CountryCode, Year, `Series Name`, `Income group`) %>%
  summarise(Value = mean(Value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = `Series Name`, values_from = Value) %>%
  rename(
    ResourceRents = `Total natural resources rents (% of GDP)`,
    Stability = `Political Stability and Absence of Violence/Terrorism: Estimate`,
    Corruption = `Control of Corruption: Estimate`,
    EducationSpending = `Government expenditure on education, total (% of government expenditure)`
  )
final_data$EducationSpending <- as.numeric(final_data$EducationSpending)
glimpse(final_data)
Rows: 165
Columns: 8
$ Country <chr> "Algeria", "Algeria", "Algeria", "Algeria", "Algeria…
$ CountryCode <chr> "DZA", "DZA", "DZA", "DZA", "DZA", "DZA", "DZA", "DZ…
$ Year <int> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020…
$ `Income group` <chr> "Upper middle income", "Upper middle income", "Upper…
$ Corruption <dbl> -0.5215445, -0.4729669, -0.6131180, -0.6394628, -0.6…
$ EducationSpending <dbl> 15.983726, 17.646034, 16.469202, 16.084555, 16.11104…
$ Stability <dbl> -1.3250433, -1.2023715, -1.1905352, -1.0907866, -1.0…
$ ResourceRents <dbl> 26.252514, 24.223134, 22.612310, 14.925253, 12.81918…
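One detail of the summarise step is worth a note: `mean(x, na.rm = TRUE)` returns `NaN`, not `NA`, when every value in a group is missing. Most downstream functions treat the two alike, so this is a caveat more than a bug. The sketch below uses an invented long table (`Series` stands in for `Series Name`) and shows one optional way to convert `NaN` back to `NA` before widening.

```r
library(dplyr)
library(tidyr)

# Toy long table; the 2013 "Rents" value is missing (values are invented)
long <- tibble(
  Country = c("A", "A", "A", "A"),
  Year    = c(2012L, 2012L, 2013L, 2013L),
  Series  = c("Rents", "Stability", "Rents", "Stability"),
  Value   = c(10, -0.5, NA, -0.4)
)

# The mean of an all-missing group is NaN
all_missing_mean <- mean(NA_real_, na.rm = TRUE)

wide <- long %>%
  group_by(Country, Year, Series) %>%
  summarise(Value = mean(Value, na.rm = TRUE), .groups = "drop") %>%
  mutate(Value = if_else(is.nan(Value), NA_real_, Value)) %>% # NaN becomes NA
  pivot_wider(names_from = Series, values_from = Value)

print(wide)
```

Each indicator ends up in its own column, with one row per country and year, which is the shape the plots and the regression expect.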
Build the plots
# Load the animation package
# This plot requires gifski package to render frames using gifski_renderer(). If not already installed, R will prompt to install it.
library(gganimate)
Plot 1: Political Stability Over Time by Country
p1 <- ggplot(final_data, aes(x = Year, y = Stability, group = Country, color = `Income group`)) +
  geom_line(alpha = 0.8) +
  geom_point(aes(size = ResourceRents), alpha = 0.8) +
  geom_text(
    aes(label = Country),
    data = final_data %>% filter(Year == 2022),
    hjust = 0, vjust = 0.5, size = 3, show.legend = FALSE
  ) +
  scale_x_continuous(breaks = seq(2012, 2024, 1), limits = c(2012, 2024)) +
  labs(
    title = "Stability Over Time by Country",
    subtitle = "One country shown per frame, ordered by appearance",
    y = "Political Stability Score", x = "Year"
  ) +
  transition_states(Country, transition_length = 6, state_length = 3) + # slower & smoother
  ease_aes("cubic-in-out") +
  shadow_mark(alpha = 0.4, past = TRUE, exclude_layer = c(3)) # old lines gently fade in
p1
# Save the animation
anim_save("stability_by_country_transition.gif", p1)
Figure 1. Political stability score for each country over time (2012–2022). Countries are revealed one at a time and remain visible in the background. Larger dots indicate higher natural resource rents (% of GDP). Labels appear only at the final data point. Colors correspond to income group as classified by the World Bank.
This animated visualization shows how political stability scores have changed from 2012 to 2022 across different countries, one at a time. Each country’s trajectory is represented by a line, with dot size indicating the share of natural resource rents as a percentage of GDP. The income group of each country is color-coded, allowing for quick visual comparison.
The purpose of this plot is to explore whether higher resource dependence correlates with lower or more volatile political stability. Each country’s data appears sequentially, and the animation preserves the previous countries as faded lines for historical context.
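The two questions in that paragraph (lower stability, and more volatile stability) can also be checked numerically. The sketch below is one possible approach, not part of the original analysis: it summarises each country by its mean stability, the standard deviation of its stability, and its mean resource rents, then correlates the summaries. The `panel` data frame is invented, so the resulting numbers say nothing about real countries.

```r
# Toy panel: three hypothetical countries, four years each (values are invented)
panel <- data.frame(
  Country       = rep(c("A", "B", "C"), each = 4),
  Stability     = c(0.9, 1.0, 1.1, 1.0,  -0.5, -1.5, -0.2, -1.8,  0.1, 0.3, -0.1, 0.1),
  ResourceRents = c(1, 1, 2, 2,          30, 28, 35, 27,          10, 12, 9, 11)
)

# Per-country means of both variables
per_country <- aggregate(cbind(Stability, ResourceRents) ~ Country,
                         data = panel, FUN = mean)

# Per-country standard deviation of stability as a simple volatility measure
per_country$StabilitySD <- aggregate(Stability ~ Country,
                                     data = panel, FUN = sd)$Stability

# Correlation of resource dependence with the level and volatility of stability
level_cor      <- cor(per_country$ResourceRents, per_country$Stability)
volatility_cor <- cor(per_country$ResourceRents, per_country$StabilitySD)

print(per_country)
print(c(level = level_cor, volatility = volatility_cor))
```

With the project data, the same summary could be built from `final_data`, keeping in mind that a correlation across a small sample of countries is descriptive only.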
Animated Plot 2: Political Stability by Country and Income Group (2012–2022)
This animated plot aims to explore how political stability varies in relation to natural resource dependence (% of GDP). Countries are grouped by income level and then ordered by their average stability score across the years.
# Create a dedicated dataset for the third animated plot
final_data_anim3 <- final_data %>%
  mutate(`Income group` = factor(`Income group`, levels = c("Low income", "Lower middle income", "Upper middle income", "High income")))
final_data2_anim3 <- final_data %>%
  mutate(
    `Income group` = ifelse(is.na(`Income group`) & Country == "Venezuela, RB", "Upper middle income", `Income group`),
    ResourceRents = ifelse(Country == "Venezuela, RB" & is.na(ResourceRents), 18, ResourceRents)
  )
# Adding missing values from other documentation sourced from the World Bank
# Order income levels manually
final_data2_anim3$`Income group` <- factor(
  final_data2_anim3$`Income group`,
  levels = c("Low income", "Lower middle income", "Upper middle income", "High income")
)
# Reorder countries within income group by mean of Stability
avg_stability_by_group <- final_data2_anim3 %>%
  group_by(`Income group`, Country) %>%
  summarise(mean_stability = mean(Stability, na.rm = TRUE), .groups = "drop") %>%
  arrange(`Income group`, mean_stability)
# Apply order
final_data2_anim3$Country <- factor(final_data2_anim3$Country, levels = avg_stability_by_group$Country)
plot_data <- final_data2_anim3 %>%
  filter(Year <= 2021)
ggplot(plot_data, aes(
  x = Country,
  y = Stability,
  size = ResourceRents,
  color = `Income group`
)) +
  geom_point(alpha = 0.8) +
  labs(
    title = "Political Stability by Country Through the Years {frame_time}",
    subtitle = "Dot size represents Resource Rents (% of GDP)",
    x = "Country (Grouped by Income, Ordered by Stability)",
    y = "Political Stability Score (-2.5 to 2.5)",
    caption = "Data Source: World Bank (2012–2022)"
  ) +
  scale_size_continuous(range = c(2, 10)) +
  transition_time(Year) +
  ease_aes("cubic-in-out") +
  theme_minimal(base_size = 15) +
  theme(
    axis.text.x = element_text(angle = 45, size = 10, hjust = 1),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15),
    plot.title = element_text(size = 20, face = "bold"),
    plot.subtitle = element_text(size = 14))
# Save it
anim_save("stability_by_country_income_grouped.gif", last_animation(), width = 900, height = 600, res = 150, end_pause = 10, rewind = FALSE)
Figure 2. Animated scatterplot showing political stability scores for countries grouped by income level from 2012 to 2022. Dot size represents natural resource rents as a percentage of GDP.
Description
This animated visualization examines how countries’ political stability has changed over time (2012–2022), with countries grouped and colored by income level and sized by their dependence on natural resource rents. Political stability scores range from -2.5 (weak) to 2.5 (strong). Natural resource rents (% of GDP) indicate the share of national income derived from oil, minerals, forests, and other extractive sectors.
Countries are ordered by income level, from low to high, and then by average political stability within each group. Each dot represents a country in a specific year. The size of the dot reflects its resource wealth (larger means a higher % of GDP from natural resources), while the position on the Y-axis reflects its political stability score.
Venezuela, a notable outlier, had no recorded values for resource rents in recent years, so a representative value was used for consistency. This ensures that Venezuela remains visible throughout the animation and its movement can still be compared to others.
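When a placeholder is filled in like this, it can help to record which rows were filled so they can be told apart later (for example, by mapping the flag to point shape). The sketch below is a suggestion, not the method used for the plot above: the `rents` table and its numbers are invented, the column name `RentsImputed` is hypothetical, and only the placeholder value 18 is taken from the plotting code.

```r
library(dplyr)

# Toy table; the numbers are illustrative, not World Bank values
rents <- tibble(
  Country       = c("Venezuela, RB", "Venezuela, RB", "Norway"),
  Year          = c(2014L, 2019L, 2019L),
  ResourceRents = c(11.8, NA, 6.0)
)

placeholder <- 18 # stand-in value used in the plotting code

flagged <- rents %>%
  mutate(
    # TRUE only for the rows that receive the placeholder
    RentsImputed  = is.na(ResourceRents) & Country == "Venezuela, RB",
    ResourceRents = if_else(RentsImputed, placeholder, ResourceRents)
  )

print(flagged)
```

Keeping the flag in the data also makes it straightforward to exclude the filled rows from any statistical summary with `filter(!RentsImputed)`.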
This plot reveals a pattern: although high-income countries tend to have greater political stability, high resource wealth does not guarantee stability. While not definitive, this supports the idea of the “resource curse.”
Plot 3: Education Spending Plots
In resource-rich countries, governments often rely on extractive revenues instead of taxation, which reduces the pressure to raise revenue through taxes and creates dependence on wealth from extractive industries and international trade. This undermines the Taxation–Representation Nexus, the concept that taxpayers are more likely to demand transparency and accountability. An informed public is better equipped to scrutinize spending, demand oversight, and support long-term policies that reduce dependency on natural resource rents. This section explores whether education spending is associated with stronger corruption control and improved governance.
# Create new dataset specifically ordered for this histogram
final_data_hist <- final_data %>%
  group_by(Country) %>%
  mutate(MeanStability = mean(Stability, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(`Income group` = factor(`Income group`,
    levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))) %>%
  arrange(`Income group`, MeanStability, Country)
# Grouping and ordering by income level and secondarily by Stability mean
# Animated histogram
hist_plot <- ggplot(final_data_hist, aes(
  x = forcats::fct_inorder(Country), # keep the arranged row order: income group, then mean stability
  y = EducationSpending,
  fill = `Income group`
)) +
  geom_col() +
  scale_y_continuous(limits = c(0, 100)) +
  labs(
    title = "\nEducation Spending by Country\nYear:{closest_state}",
    subtitle = "Ordered by Income Group & Political Stability | Height = Education Spending",
    x = "Country",
    y = "Government Education Spending (% of Gov. Expenditure)",
    fill = "Income Group",
    caption = "Bar color = Income Group | Data: World Bank (2012–2022)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  transition_states(Year, transition_length = 2, state_length = 1) +
  ease_aes("linear")
hist_plot
anim_save("histogram_education.gif", hist_plot)
This animated histogram tracks government education spending across countries over time. Countries are grouped by income level and ordered by political stability. Education is shown as a proxy for long-term public investment. The plot reflects general governmental prioritization of education across time and income groups.
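On a discrete axis, ggplot2 follows the order of the factor levels, not the order of the rows, so sorting a data frame with `arrange()` does not by itself change where each country appears. The sketch below shows one way to turn a sorted row order into factor levels. The `toy` table is invented, and `toy_ordered` and `income_levels` are hypothetical names.

```r
library(dplyr)

# Toy country summary (values are invented)
toy <- tibble(
  Country        = c("A", "B", "C", "D"),
  `Income group` = c("High income", "Low income", "High income", "Low income"),
  MeanStability  = c(1.2, -0.8, 0.4, -1.5)
)

income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")

toy_ordered <- toy %>%
  mutate(`Income group` = factor(`Income group`, levels = income_levels)) %>%
  arrange(`Income group`, MeanStability) %>%
  # Freeze the sorted row order into the factor levels used by the axis
  mutate(Country = factor(Country, levels = unique(Country)))

print(levels(toy_ordered$Country))
```

Here the low-income countries come first, each group sorted from least to most stable, which is the ordering the plot subtitle describes.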
Analysis
This section performs a statistical analysis to explore whether countries that invest more in education also exhibit stronger control over corruption. A simple linear regression is used to quantify this potential relationship.
# Regression Model
# Run the regression
model <- lm(Corruption ~ EducationSpending, data = final_data)
# Summary of the model
summary(model)

Call:
lm(formula = Corruption ~ EducationSpending, data = final_data)

Residuals:
     Min       1Q   Median       3Q      Max
-2.49748 -0.66373 -0.09041  1.02504  1.68321

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        1.61012    0.32899   4.894 2.89e-06 ***
EducationSpending -0.09107    0.02181  -4.176 5.43e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.029 on 129 degrees of freedom
  (34 observations deleted due to missingness)
Multiple R-squared:  0.1191,	Adjusted R-squared:  0.1122
F-statistic: 17.44 on 1 and 129 DF,  p-value: 5.431e-05
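The figures in a summary like this can also be pulled out programmatically, which is handy when they need to be quoted in the text. The sketch below fits the same kind of model to a small invented data frame (`toy`), so its numbers are unrelated to the results above. It uses only base R accessors: `coef(summary(fit))`, `summary(fit)$r.squared`, and `nobs()`.

```r
# Toy data with one missing predictor value (all values are invented)
toy <- data.frame(
  EducationSpending = c(10, 12, NA, 16, 18, 20, 22),
  Corruption        = c(1.0, 0.7, 0.5, 0.2, 0.1, -0.3, -0.4)
)

fit <- lm(Corruption ~ EducationSpending, data = toy)

coefs     <- coef(summary(fit))                         # coefficient table as a matrix
slope     <- coefs["EducationSpending", "Estimate"]
p_value   <- coefs["EducationSpending", "Pr(>|t|)"]
r_squared <- summary(fit)$r.squared
n_used    <- nobs(fit)                                  # rows left after dropping NAs

print(c(slope = slope, p_value = p_value, r_squared = r_squared, n_used = n_used))
```

`nobs()` reports how many rows actually entered the fit, which is the counterpart of the "observations deleted due to missingness" line in the printed summary.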
education_corruption_plot <- ggplot(final_data, aes(x = EducationSpending, y = Corruption, color = `Income group`)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linetype = "dashed") +
  labs(
    title = "Control of Corruption vs. Education Spending",
    subtitle = "Regression line shows average trend across all countries (2012–2022)",
    x = "Education Spending (% of Government Budget)",
    y = "Corruption Control Score (-2.5 to 2.5)",
    caption = "Source: World Bank"
  ) +
  theme_minimal()
education_corruption_plot
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 34 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 34 rows containing missing values or values outside the scale range
(`geom_point()`).
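The regression suggests a negative relationship between education spending and the Control of Corruption score: each additional percentage point of the government budget devoted to education is associated with a score about 0.09 points lower (p < 0.001). Because higher scores mean better control, this points to more perceived corruption, not less. The fit is weak, however: education spending explains only about 12% of the variation in the score (R² = 0.119), and the pooled line does not distinguish between income groups, within which the pattern may differ. Contrary to conventional expectations, some high-income countries invest a smaller share of their government budgets in education than lower-income nations.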
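One way to look into differences between income groups is to fit the same regression separately within each group and compare the slopes. The sketch below illustrates the idea on invented data in which the two groups have opposite slopes. It is not a result from this project, and `toy`, `group_slopes`, and `pooled_slope` are hypothetical names.

```r
# Toy data: two income groups with opposite within-group trends (values are invented)
toy <- data.frame(
  Group             = rep(c("High income", "Low income"), each = 4),
  EducationSpending = c(10, 12, 14, 16, 10, 12, 14, 16),
  Corruption        = c(1.8, 1.6, 1.5, 1.2, -1.0, -0.9, -0.7, -0.6)
)

# Slope of Corruption on EducationSpending, fitted within each group
group_slopes <- sapply(split(toy, toy$Group), function(d) {
  coef(lm(Corruption ~ EducationSpending, data = d))[["EducationSpending"]]
})

# Slope from a single pooled fit, for comparison
pooled_slope <- coef(lm(Corruption ~ EducationSpending, data = toy))[["EducationSpending"]]

print(group_slopes)
print(pooled_slope)
```

A pooled slope can hide or even reverse the within-group pattern, so comparing the two is a useful check before reading much into a single regression line.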
Conclusion
In this project, I explored whether natural resource wealth correlates with political stability, and whether investment in education reflects stronger governance. While some expected patterns emerged, others challenged assumptions, suggesting the resource curse is not universal, and that institutional quality depends on a complex interplay of economic and civic factors. Future work could include more variables or broader timeframes.