P4

This study uses a dataset from Our World in Data that combines information from UNESCO, the World Bank, and other international sources. The dataset includes country-level observations on adult literacy rates and GDP per capita for multiple years. This analysis uses data from 2013 to 2023, and each observation represents one country in a given year. After removing rows with missing values, the final dataset contains 458 country-year observations. Adult literacy rate is measured as the percentage of people aged 15 and older who can read and write.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

literacy_data <- read_csv("https://ourworldindata.org/grapher/literacy-rate-vs-gdp-per-capita.csv?v=1&csvType=full&useColumnShortNames=true")

## Rows: 59712 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Entity, Code, owid_region
## dbl (4): Year, adult_literacy_rate__population_15plus_years__both_sexes__pct...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(literacy_data)

## # A tibble: 6 × 7
##   Entity      Code   Year adult_literacy_rate__population_15…¹ ny_gdp_pcap_pp_kd
##   <chr>       <chr> <dbl>                                <dbl>             <dbl>
## 1 Afghanistan AFG    1979                                 18                 NA 
## 2 Afghanistan AFG    2011                                 31               2757.
## 3 Afghanistan AFG    2015                                 33.8             2968.
## 4 Afghanistan AFG    2021                                 37               2144.
## 5 Afghanistan AFG    2000                                 NA               1618.
## 6 Afghanistan AFG    2001                                 NA               1454.
## # ℹ abbreviated name:
## #   ¹adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99
## # ℹ 2 more variables: population_historical <dbl>, owid_region <chr>

colnames(literacy_data)

## [1] "Entity"                                                                   
## [2] "Code"                                                                     
## [3] "Year"                                                                     
## [4] "adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99"
## [5] "ny_gdp_pcap_pp_kd"                                                        
## [6] "population_historical"                                                    
## [7] "owid_region"

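literacy_2013_2023 <- literacy_data %>%
  # Keep only the 2013-2023 study window
  filter(Year >= 2013, Year <= 2023) %>%
  # Keep the identifiers and the two analysis variables, with shorter names
  select(
    Entity,
    Year,
    literacy_rate = adult_literacy_rate__population_15plus_years__both_sexes__pct__lr_ag15t99,
    gdp_per_capita = ny_gdp_pcap_pp_kd
  ) %>%
  # Drop country-years missing either literacy or GDP per capita
  drop_na()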

glimpse(literacy_2013_2023)

## Rows: 458
## Columns: 4
## $ Entity         <chr> "Afghanistan", "Afghanistan", "Albania", "Angola", "Ang…
## $ Year           <dbl> 2015, 2021, 2017, 2014, 2015, 2016, 2017, 2020, 2022, 2…
## $ literacy_rate  <dbl> 33.75384, 37.00000, 98.81623, 66.00000, 66.23586, 100.0…
## $ gdp_per_capita <dbl> 2967.692, 2144.166, 14155.040, 10250.593, 9980.930, 131…

The dataset was filtered to include data from 2013 to 2023, and key variables were selected and renamed for clarity. Observations with missing values were removed prior to analysis.

model <- lm(literacy_rate ~ gdp_per_capita, data = literacy_2013_2023)
summary(model)

## 
## Call:
## lm(formula = literacy_rate ~ gdp_per_capita, data = literacy_2013_2023)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.232  -7.832   6.236  11.021  20.242 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.727e+01  9.504e-01   81.30   <2e-16 ***
## gdp_per_capita 3.656e-04  3.308e-05   11.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.46 on 456 degrees of freedom
## Multiple R-squared:  0.2113, Adjusted R-squared:  0.2095 
## F-statistic: 122.1 on 1 and 456 DF,  p-value: < 2.2e-16

# Show the four standard lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)

This analysis examined the relationship between GDP per capita and adult literacy rates using 458 country-year observations from 2013 to 2023. The results show a positive, statistically significant relationship: countries with higher GDP per capita tend to have higher adult literacy rates.

GDP per capita was a statistically significant predictor of literacy rates (p < 0.001). The estimated slope is 3.656e-04 percentage points per dollar, so on average an increase of $1,000 in GDP per capita is associated with an increase of about 0.37 percentage points in adult literacy. This suggests that higher national income is generally linked to better educational outcomes, although the model describes an association and does not by itself establish causation.
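The rescaling behind the 0.37 figure can be checked with a short sketch. It copies the slope, its standard error, and the residual degrees of freedom by hand from the summary output above rather than re-fitting the model; the variable names are illustrative only. The confidence interval is an approximation that assumes independent observations, which repeated country-years may not satisfy.

```r
# Values copied by hand from the summary(model) output (not re-estimated here)
slope    <- 3.656e-04   # percentage points of literacy per $1 of GDP per capita
slope_se <- 3.308e-05   # standard error of the slope
resid_df <- 456         # residual degrees of freedom

# Rescale the slope to a $1,000 change in GDP per capita
change_per_1000 <- slope * 1000

# Approximate 95% confidence interval for the rescaled slope,
# assuming independent observations
t_crit <- qt(0.975, df = resid_df)
ci_per_1000 <- (slope + c(-1, 1) * t_crit * slope_se) * 1000

round(change_per_1000, 2)
round(ci_per_1000, 2)
```

Under these assumptions the interval runs from roughly 0.30 to 0.43 percentage points per $1,000, so the point estimate of 0.37 is fairly precisely estimated.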

The model explains about 21% of the variation in adult literacy rates across the country-year observations (R-squared = 0.2113). While this indicates that economic development is related to literacy, roughly 79% of the variation is left unexplained, which is consistent with literacy also depending on other factors such as education policy, access to schooling, and historical conditions.
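As a consistency check on the 21% figure, R-squared can be recovered from the F-statistic and its degrees of freedom. The sketch below uses only numbers copied from the summary output above; the variable names are illustrative, and small differences from the reported values come from the F-statistic being rounded to one decimal place.

```r
# Values copied by hand from the summary(model) output
f_stat   <- 122.1   # overall F-statistic
df_model <- 1       # numerator degrees of freedom (one predictor)
df_resid <- 456     # residual degrees of freedom

# R-squared implied by the F-statistic
r_squared <- (f_stat * df_model) / (f_stat * df_model + df_resid)

# Adjusted R-squared, which penalizes for the number of predictors
n <- df_resid + df_model + 1   # number of observations
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

round(c(r_squared = r_squared, adj_r_squared = adj_r_squared), 3)
```

Both values agree with the reported 0.2113 and 0.2095 up to rounding, and `n` recovers the 458 observations used in the fit.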

Overall, the regression model is statistically significant (F = 122.1 on 1 and 456 degrees of freedom, p < 0.001), meaning that GDP per capita provides meaningful information about adult literacy rates across countries.
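Because the model has a single predictor, the overall F-test and the t-test on the slope are the same test: the F-statistic equals the square of the slope's t value. A minimal sketch, again using values copied from the summary output above with illustrative variable names:

```r
# Values copied by hand from the summary(model) output
t_slope  <- 11.05   # t value for gdp_per_capita
f_stat   <- 122.1   # overall F-statistic
df_resid <- 456     # residual degrees of freedom

# In simple linear regression, F equals t squared (up to rounding)
t_slope^2

# Upper-tail p-value of the overall F-test
p_value <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
p_value < 2.2e-16
```

Squaring 11.05 gives 122.1025, which matches the reported F-statistic of 122.1, so the overall significance of the model carries the same information as the significance of the GDP per capita coefficient.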

2025-12-21