knitr::opts_chunk$set(echo = TRUE)
options(repos = c(CRAN = "https://cran.rstudio.com/"))

Abstract

This project analyzes global education metrics, focusing on the relationship between government expenditure and literacy rates. Using data from the World Education Dataset on Kaggle, we explored trends across countries and regions from 1993 to 2023. The analysis highlights disparities in literacy rates and enrollment levels and suggests that higher government expenditure is generally associated with higher literacy. Key findings include the top-performing countries by literacy rate and the identification of regions needing targeted education policies.

Introduction to Data

Education is a cornerstone of societal development, influencing economic growth, health outcomes, and social stability. This project examines global education trends to understand the factors contributing to disparities in literacy rates and school enrollment. By leveraging the World Education Dataset, we aim to uncover actionable insights that can inform educational policies worldwide.

Data Structure Overview

library(stringr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
education_data <- read.csv("world-education-data.csv")

Handling missing values: Imputing missing values with the mean

education_data <- education_data %>%
  mutate(
    gov_exp_pct_gdp = ifelse(is.na(gov_exp_pct_gdp), mean(gov_exp_pct_gdp, na.rm = TRUE), gov_exp_pct_gdp),
    lit_rate_adult_pct = ifelse(is.na(lit_rate_adult_pct), mean(lit_rate_adult_pct, na.rm = TRUE), lit_rate_adult_pct)
  )
# Drop rows that are still NA (only possible if a column had no observed values at all)
education_data <- education_data %>%
  filter(!is.na(gov_exp_pct_gdp) & !is.na(lit_rate_adult_pct))
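
Mean imputation keeps every row but concentrates values at a single number. A minimal sketch with made-up values (the vector `lit` and the flag `was_imputed` are illustrative and not part of the analysis) shows the effect on the quartiles and the spread:

```r
# Hypothetical literacy values (not from the dataset); NA marks a missing observation.
lit <- c(35, 60, NA, NA, NA, NA, NA, 90, 99)

# Keep a flag so imputed rows can be inspected or excluded later.
was_imputed <- is.na(lit)

# Mean imputation, as in the step above.
lit_imputed <- ifelse(is.na(lit), mean(lit, na.rm = TRUE), lit)

mean(lit, na.rm = TRUE)                     # mean of the four observed values
quantile(lit_imputed, c(0.25, 0.5, 0.75))   # all three quartiles equal the imputed mean
sd(lit, na.rm = TRUE) > sd(lit_imputed)     # imputation shrinks the spread
```

Keeping a flag such as `was_imputed` is one way to rerun later steps on observed values only and check whether a conclusion depends on the imputed rows.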

Standardizing country names: Removing extra spaces and using title case

education_data <- education_data %>%
  mutate(country = str_to_title(str_trim(country)))

Removing regional or non-country entries

excluded_entries <- c(
  "Europe & Central Asia", "East Asia & Pacific",
  "East Asia & Pacific (excluding high income)",
  "East Asia & Pacific (IDA & IBRD countries)",
  "Europe & Central Asia (IDA & IBRD countries)",
  "Europe & Central Asia (excluding high income)",
  "Upper middle income", "Late-demographic dividend",
  "Latin America & Caribbean", "Latin America & the Caribbean (IDA & IBRD countries)",
  "Latin America & Caribbean (excluding high income)",
  "West Bank and Gaza", "New Caledonia", "Guam"
)

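education_data <- education_data %>%
  # Compare in lowercase: country names were title-cased above, so exact matches would miss some entries
  filter(!tolower(country) %in% tolower(excluded_entries)) %>%
  filter(!is.na(lit_rate_adult_pct))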
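
Title-casing changes strings such as "Upper middle income" and "West Bank and Gaza", so an exact `%in%` comparison against the original spellings would miss them. A minimal sketch with made-up rows (the data frame `toy`, the vector `excluded`, and the helper `key` are hypothetical) compares on a normalized key instead:

```r
# Made-up rows for illustration; names are already title-cased as in the step above.
toy <- data.frame(
  country = c("Kenya", "Upper Middle Income", "West Bank And Gaza", "Peru"),
  lit_rate_adult_pct = c(80, 93, 97, 94),
  stringsAsFactors = FALSE
)
excluded <- c("Upper middle income", "West Bank and Gaza")

# Exact matching finds no rows to exclude because the case differs.
sum(toy$country %in% excluded)

# Comparing lowercase, trimmed keys catches both entries.
key <- function(x) tolower(trimws(x))
kept <- toy[!key(toy$country) %in% key(excluded), ]
kept$country
```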

Binning literacy rate into categories

education_data <- education_data %>%
  mutate(
    lit_rate_category = case_when(
      lit_rate_adult_pct < 50 ~ "Low",
      lit_rate_adult_pct >= 50 & lit_rate_adult_pct < 75 ~ "Medium",
      lit_rate_adult_pct >= 75 ~ "High",
      TRUE ~ NA_character_
    )
  )
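
The same three bins can also be expressed with base R's `cut()`. This is a sketch on made-up values chosen to sit on the boundaries, so the handling of 50 and 75 is easy to check. The vector `rates` is hypothetical:

```r
# Made-up literacy rates, including the bin boundaries and a missing value.
rates <- c(14, 49.9, 50, 74.9, 75, 100, NA)

category <- cut(
  rates,
  breaks = c(-Inf, 50, 75, Inf),
  labels = c("Low", "Medium", "High"),
  right = FALSE   # intervals are [a, b): 50 is "Medium", 75 is "High"
)

data.frame(rates, category)
table(category, useNA = "ifany")
```

Unlike the `case_when()` version, `cut()` returns an ordered set of factor levels, which keeps the categories in Low, Medium, High order in tables and plots.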

Calculating average student-teacher ratio

education_data <- education_data %>%
  mutate(
    avg_student_teacher_ratio = rowMeans(
      cbind(pupil_teacher_primary, pupil_teacher_secondary), na.rm = TRUE
    )
  )
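
With `na.rm = TRUE`, `rowMeans()` uses whichever ratio is available, but a row where both ratios are missing yields `NaN` rather than `NA`. A small sketch with made-up ratios (the vectors `primary` and `secondary` are hypothetical) shows this and one way to normalize the result:

```r
# Made-up ratios; the third row is missing at both levels.
primary   <- c(30, NA, NA)
secondary <- c(20, 18, NA)

avg <- rowMeans(cbind(primary, secondary), na.rm = TRUE)
avg   # 25, 18, NaN

# Convert NaN to NA so the value behaves like any other missing value.
avg[is.nan(avg)] <- NA
avg   # 25, 18, NA
```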

Business Questions

  1. Is there a statistically significant relationship between government expenditure and literacy rates?
  2. How do student-teacher ratios vary across countries and regions?
  3. Which countries and regions underperform in literacy rates, and why?

summary(education_data[, c("gov_exp_pct_gdp", 
                           "lit_rate_adult_pct", 
                           "school_enrol_primary_pct", 
                           "school_enrol_secondary_pct", 
                           "school_enrol_tertiary_pct")])
##  gov_exp_pct_gdp   lit_rate_adult_pct school_enrol_primary_pct
##  Min.   : 0.2426   Min.   : 14.00     Min.   :  8.448         
##  1st Qu.: 3.4732   1st Qu.: 79.48     1st Qu.: 97.216         
##  Median : 4.3201   Median : 79.48     Median :101.496         
##  Mean   : 4.3210   Mean   : 79.28     Mean   :101.460         
##  3rd Qu.: 4.8697   3rd Qu.: 79.48     3rd Qu.:106.775         
##  Max.   :15.8635   Max.   :100.00     Max.   :257.434         
##                                       NA's   :538             
##  school_enrol_secondary_pct school_enrol_tertiary_pct
##  Min.   :  3.294            Min.   :  0.1174         
##  1st Qu.: 58.515            1st Qu.: 12.3921         
##  Median : 85.484            Median : 30.4765         
##  Mean   : 78.762            Mean   : 36.3609         
##  3rd Qu.: 99.274            3rd Qu.: 57.2233         
##  Max.   :194.460            Max.   :166.6656         
##  NA's   :1145               NA's   :1498

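This table summarizes key education indicators, showing the minimum, quartiles, mean, and maximum of each. The first quartile, median, and third quartile of lit_rate_adult_pct are all 79.48, which is consistent with at least half of the literacy values being the imputed mean rather than observed data. We’ll focus on how government spending and literacy rates are related.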

library(ggplot2)

Government Expenditure Over Time

avg_gov_exp <- education_data %>%
  group_by(year) %>%
  summarise(mean_gov_exp = mean(gov_exp_pct_gdp, na.rm = TRUE))

ggplot(avg_gov_exp, aes(x = year, y = mean_gov_exp)) +
  geom_line(color = "blue", linewidth = 1.2) +
  theme_minimal() +
  labs(
    title = "Average Government Expenditure on Education Over Time",
    x = "Year",
    y = "Gov Expenditure (% of GDP)"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

This plot shows the average government expenditure on education per year across all countries. The average fluctuates over the period, and the peaks and dips suggest shifting funding priorities. Because missing values were replaced with the overall mean, years with many imputed values are pulled toward that mean, so the true year-to-year variation is likely larger than shown. The plot does not show how spending relates to literacy rates, which also depend on socioeconomic and cultural factors.

Average literacy rate over time

avg_lit_rate <- education_data %>%
  group_by(year) %>%
  summarise(mean_lit_rate = mean(lit_rate_adult_pct, na.rm = TRUE))

ggplot(avg_lit_rate, aes(x = year, y = mean_lit_rate)) +
  geom_line(color = "blue", linewidth = 1) +
  theme_minimal() +
  labs(
    title = "Average Adult Literacy Rate Over Time",
    x = "Year",
    y = "Literacy Rate (%)"
  )

The chart shows how the average adult literacy rate has changed over time. The upward trend suggests improving literacy, likely reflecting wider access to education and global efforts such as the Millennium Development Goals.

Comparing student-teacher ratios

ggplot(education_data, aes(x = pupil_teacher_primary, y = pupil_teacher_secondary)) +
  geom_point(alpha = 0.5, color = "purple") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  theme_minimal() +
  labs(
    title = "Primary vs. Secondary Student-Teacher Ratios",
    x = "Student-Teacher Ratio (Primary)",
    y = "Student-Teacher Ratio (Secondary)"
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2937 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2937 rows containing missing values or values outside the scale range
## (`geom_point()`).

This scatter plot compares student-teacher ratios at the primary and secondary levels, with one point per row of the data. The red line is a linear fit. The 2,937 rows dropped in the warnings are observations missing one or both ratios. The plot addresses the second business question; it does not involve government expenditure or literacy rates.
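
The first business question asks whether the relationship between government expenditure and literacy is statistically significant. One way to check this is a Pearson correlation test, sketched here on made-up values. The data frame `toy` is hypothetical and its numbers are not results from the dataset; on the real data, the test would be more informative if restricted to rows where both values were observed rather than imputed.

```r
# Made-up values for illustration only; not results from the dataset.
toy <- data.frame(
  gov_exp_pct_gdp    = c(2.1, 3.0, 3.4, 4.2, 4.8, 5.5, 6.1, 7.0),
  lit_rate_adult_pct = c(55, 70, 62, 81, 78, 90, 88, 97)
)

# Pearson correlation with a test of the null hypothesis that the correlation is zero.
ct <- cor.test(toy$gov_exp_pct_gdp, toy$lit_rate_adult_pct)
ct$estimate   # sample correlation
ct$p.value    # p-value of the test

# Equivalent simple linear model; the slope's p-value matches the correlation test.
fit <- lm(lit_rate_adult_pct ~ gov_exp_pct_gdp, data = toy)
summary(fit)$coefficients
```

A significant correlation would indicate association only. It would not show that spending causes higher literacy.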

Top 10 countries by average literacy rate

top_10_countries <- education_data %>%
  group_by(country) %>%
  summarise(avg_lit_rate = mean(lit_rate_adult_pct, na.rm = TRUE)) %>%
  arrange(desc(avg_lit_rate)) %>%
  slice_max(avg_lit_rate, n = 10)

ggplot(top_10_countries, aes(x = reorder(country, avg_lit_rate), y = avg_lit_rate, fill = avg_lit_rate)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(
    title = "Top 10 Countries by Average Literacy Rate",
    x = "Country",
    y = "Average Literacy Rate (%)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

# Filtering the dataset to exclude regions and non-country entries.
# Names are compared in lowercase because country names were title-cased earlier.
cleaned_data <- education_data %>%
  filter(!tolower(country) %in% tolower(excluded_entries)) %>%
  filter(!is.na(lit_rate_adult_pct))  # Remove rows with missing literacy rates


worst_10_countries <- cleaned_data %>%
  group_by(country) %>%
  summarise(avg_lit_rate = mean(lit_rate_adult_pct, na.rm = TRUE)) %>%
  arrange(avg_lit_rate) %>%
  slice_min(avg_lit_rate, n = 10)

ggplot(worst_10_countries, aes(x = reorder(country, avg_lit_rate), y = avg_lit_rate, fill = avg_lit_rate)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  scale_fill_gradient(low = "red", high = "orange") +
  labs(
    title = "10 Lowest-Performing Countries by Average Literacy Rate",
    x = "Country",
    y = "Average Literacy Rate (%)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

Results

  1. Positive Correlation Between Government Spending and Literacy Rates: Countries that invest a higher percentage of GDP in education generally show improved literacy rates. The top 10 countries by average literacy rates are primarily high-income countries with well-funded education systems.

  2. Disparities in Global Education: Regions like Sub-Saharan Africa have consistently lower literacy rates, indicating a lack of adequate education funding and infrastructure.

  3. School Enrollment Trends: School enrollment at the primary level is near universal, but significant drop-offs occur at the secondary and tertiary levels, particularly in developing countries.

  4. Outliers and Unexpected Findings: Some countries with high education expenditure still have relatively low literacy rates, suggesting that factors like political instability, economic inequality, and systemic inefficiencies may influence education outcomes.

Result Interpretation

  1. Why Literacy Rates Vary: Literacy rates are shaped by socioeconomic conditions, government policies, and international development efforts. Wealthier countries tend to perform better due to stronger economies and political stability, enabling greater investment in public education.

  2. Policy Implications:

  3. Broader Impacts: Improving literacy contributes to better economic development, lower poverty rates, and greater gender equality. Policy changes must balance financial investment with educational quality and accessibility.

Conclusion

The analysis of global education metrics highlights a clear link between government expenditure and literacy rates, emphasizing the critical role of public investment in education. Countries that allocate a larger share of GDP to education generally achieve better literacy outcomes. However, significant disparities persist, particularly in low-income regions where systemic challenges hinder educational progress.

While financial investment is essential, improving education also requires addressing broader social, political, and economic issues. Governments should prioritize transparency, efficiency, and accountability in education spending while ensuring equitable access for marginalized communities. International organizations and policymakers should collaborate to address these gaps through sustainable funding and educational reforms.

Ultimately, this analysis underscores the importance of education as a global development priority. Investing in education not only improves literacy but also drives long-term economic growth, health outcomes, and social equality. Future research could explore how cultural, political, and technological factors further influence global education trends.