This study uses a dataset from Our World in Data that combines information from UNESCO, the World Bank, and other international sources. The dataset contains country-level observations of adult literacy rates and GDP per capita across multiple years. This analysis uses data from 2013 to 2023, and each observation represents one country in one year. After rows with missing values were removed, the final dataset contains 458 country-year observations. Adult literacy rate is measured as the percentage of people aged 15 and older who can read and write.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
literacy_data <- read_csv("https://ourworldindata.org/grapher/literacy-rate-vs-gdp-per-capita.csv?v=1&csvType=full&useColumnShortNames=true")
## Rows: 59712 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Entity, Code, owid_region
## dbl (4): Year, adult_literacy_rate__population_15plus_years__both_sexes__pct...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(literacy_data)
## # A tibble: 6 × 7
##   Entity      Code   Year adult_literacy_rate__population_15…¹ ny_gdp_pcap_pp_kd
##   <chr>       <chr> <dbl>                                <dbl>             <dbl>
## 1 Afghanistan AFG    1979                                 18                 NA
## 2 Afghanistan AFG    2011                                 31               2757.
## 3 Afghanistan AFG    2015                                 33.8             2968.
## 4 Afghanistan AFG    2021                                 37               2144.
## 5 Afghanistan AFG    2000                                 NA               1618.
## 6 Afghanistan AFG    2001                                 NA               1454.
## # ℹ abbreviated name:
## #   ¹adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99
## # ℹ 2 more variables: population_historical <dbl>, owid_region <chr>
colnames(literacy_data)
## [1] "Entity"
## [2] "Code"
## [3] "Year"
## [4] "adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99"
## [5] "ny_gdp_pcap_pp_kd"
## [6] "population_historical"
## [7] "owid_region"
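# Restrict to 2013-2023, keep the two analysis variables under shorter names,
# and drop rows where any selected column is missing
literacy_2013_2023 <- literacy_data %>%
  filter(Year >= 2013 & Year <= 2023) %>%
  select(
    Entity,
    Year,
    literacy_rate = adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99,
    gdp_per_capita = ny_gdp_pcap_pp_kd
  ) %>%
  drop_na()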
glimpse(literacy_2013_2023)
## Rows: 458
## Columns: 4
## $ Entity <chr> "Afghanistan", "Afghanistan", "Albania", "Angola", "Ang…
## $ Year <dbl> 2015, 2021, 2017, 2014, 2015, 2016, 2017, 2020, 2022, 2…
## $ literacy_rate <dbl> 33.75384, 37.00000, 98.81623, 66.00000, 66.23586, 100.0…
## $ gdp_per_capita <dbl> 2967.692, 2144.166, 14155.040, 10250.593, 9980.930, 131…
The dataset was filtered to the years 2013 to 2023, and the two key variables were selected and renamed for clarity (literacy_rate and gdp_per_capita). Observations with missing values were removed prior to analysis, leaving 458 rows.
model <- lm(literacy_rate ~ gdp_per_capita, data = literacy_2013_2023)
summary(model)
##
## Call:
## lm(formula = literacy_rate ~ gdp_per_capita, data = literacy_2013_2023)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -56.232  -7.832   6.236  11.021  20.242
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    7.727e+01  9.504e-01   81.30   <2e-16 ***
## gdp_per_capita 3.656e-04  3.308e-05   11.05   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.46 on 456 degrees of freedom
## Multiple R-squared: 0.2113, Adjusted R-squared: 0.2095
## F-statistic: 122.1 on 1 and 456 DF, p-value: < 2.2e-16
# Show the four default lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
This analysis examined the relationship between GDP per capita and adult literacy rates using country-year observations from 2013 to 2023. The results show a positive relationship between economic prosperity and literacy: countries with higher GDP per capita tend to have higher adult literacy rates.
GDP per capita was a statistically significant predictor of literacy rates (t = 11.05, p < 0.001). On average, a $1,000 increase in GDP per capita is associated with an increase of about 0.37 percentage points in adult literacy (a slope of 3.656e-04 per dollar). This suggests that higher national income levels are generally linked to better educational outcomes.
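The per-$1,000 figure is the fitted slope rescaled from dollars to thousands of dollars. As a minimal sketch, the snippet below reuses the rounded estimates printed in the regression output rather than the fitted model object; the helper name predict_literacy is hypothetical and introduced only for illustration.

```r
# Rounded estimates copied from the summary(model) output
intercept <- 7.727e+01   # fitted literacy rate (%) at a GDP per capita of 0
slope     <- 3.656e-04   # change in literacy rate (pct. points) per $1 of GDP per capita

# Rescale the slope to a more readable unit: per $1,000 of GDP per capita
slope_per_1000 <- slope * 1000
round(slope_per_1000, 2)

# Hypothetical helper (not part of the original analysis):
# fitted literacy rate at a given GDP per capita
predict_literacy <- function(gdp) intercept + slope * gdp

# Fitted values at two illustrative income levels
predict_literacy(c(5000, 20000))
```

With the fitted object in the session, coef(model)["gdp_per_capita"] * 1000 should give the same quantity without the rounding introduced by copying from the printed output.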
The model explains about 21% of the variation in adult literacy rates across country-year observations (R-squared = 0.2113). While this indicates that economic development plays an important role, it also means that most of the variation is left unexplained and that literacy is influenced by other factors such as education policy, access to schooling, and historical conditions.
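The reported R-squared can be cross-checked against the other numbers in the summary output. The sketch below uses only the rounded values printed there: it recovers R-squared from the F-statistic and its degrees of freedom, then applies the usual adjustment for the number of predictors. Small differences in the last digit are expected because the inputs are rounded.

```r
f_stat <- 122.1           # F-statistic from the summary output
df1    <- 1               # numerator degrees of freedom (one predictor)
df2    <- 456             # residual degrees of freedom
n      <- df1 + df2 + 1   # number of observations

# For a least-squares fit with an intercept: R^2 = (F * df1) / (F * df1 + df2)
r_squared <- (f_stat * df1) / (f_stat * df1 + df2)

# Adjusted R^2 penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df2

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 3)
```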
Overall, the regression model is statistically significant (F = 122.1 on 1 and 456 degrees of freedom, p < 0.001), meaning that GDP per capita provides meaningful information about adult literacy rates across countries.
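In a regression with a single predictor, the overall F-test and the t-test on the slope are the same test, with F equal to the square of t. The following quick check is a sketch using the rounded values from the output, not part of the original analysis; it illustrates that equivalence and recomputes the upper-tail p-value with base R's pf().

```r
t_slope  <- 11.05   # t value for gdp_per_capita
f_stat   <- 122.1   # overall F-statistic
df_resid <- 456     # residual degrees of freedom

# Squaring the slope's t value should reproduce the F-statistic (up to rounding)
t_slope^2

# Upper-tail p-value of the F-statistic on 1 and 456 degrees of freedom
p_value <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)

# TRUE if the p-value is below the threshold R prints as "< 2.2e-16"
p_value < 2.2e-16
```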