Introduction

Research question. Do countries with higher internet access (individuals using the internet, % of population) have higher GDP per capita (PPP)?

Dataset. I analyze a 2019 cross-section from the World Bank’s World Development Indicators (WDI), retrieved via the WDI R package from the approved source list. The working file includes identifiers and context fields (iso3c, country, region, income, and year) and two primary indicators: GDP per capita, PPP (constant 2017 international $; WDI code NY.GDP.PCAP.PP.KD) and Individuals using the Internet (% of population; WDI code IT.NET.USER.ZS). I also construct log_gdp_pcap_ppp, the natural log of GDP per capita, for interpretability in plots and models. After rows missing either indicator are dropped, the file has 214 rows and 8 columns. Note that 24 of those rows are World Bank aggregates (for example, Africa Eastern and Southern) rather than individual countries, and 10 more have no income classification. Together, these variables allow me to describe distributions, visualize the bivariate relationship, and estimate a simple association between connectivity and economic prosperity.

Source. World Bank (2024), World Development Indicators (DataBank); Bache & Wickham (2024), WDI R package.

library(WDI)
library(dplyr)

# Download WDI (2019 cross-section)
raw <- WDI(
  country   = "all",
  indicator = c(
    gdp      = "NY.GDP.PCAP.PP.KD",   # GDP per capita, PPP (constant 2017 intl $)
    internet = "IT.NET.USER.ZS"       # Individuals using the Internet (% of population)
  ),
  start = 2019, end = 2019, extra = TRUE
)

# Keep needed columns, remove NAs, add log GDP 
wdi19 <- raw |>
  select(
    iso3c, country, region, income, year,
    gdp_pcap_ppp       = gdp,
    internet_users_pct = internet
  ) |>
  filter(!is.na(gdp_pcap_ppp), !is.na(internet_users_pct)) |>
  mutate(log_gdp_pcap_ppp = log(gdp_pcap_ppp))

# Quick checks 
dim(wdi19)
## [1] 214   8
head(wdi19, 10)
##    iso3c                     country                     region
## 1    AFG                 Afghanistan                 South Asia
## 2    AFE Africa Eastern and Southern                 Aggregates
## 3    AFW  Africa Western and Central                 Aggregates
## 4    ALB                     Albania      Europe & Central Asia
## 5    DZA                     Algeria Middle East & North Africa
## 6    AND                     Andorra      Europe & Central Asia
## 7    AGO                      Angola         Sub-Saharan Africa
## 8    ATG         Antigua and Barbuda  Latin America & Caribbean
## 9    ARG                   Argentina  Latin America & Caribbean
## 10   ARM                     Armenia      Europe & Central Asia
##                 income year gdp_pcap_ppp internet_users_pct log_gdp_pcap_ppp
## 1           Low income 2019     2927.245            17.6000         7.981817
## 2           Aggregates 2019     4073.654            22.4000         8.312296
## 3           Aggregates 2019     4822.310            28.7000         8.481008
## 4  Upper middle income 2019    15079.374            68.5504         9.621083
## 5  Upper middle income 2019    15199.199            55.7907         9.628998
## 6          High income 2019    63215.900            90.7187        11.054311
## 7  Lower middle income 2019     8274.543            32.1294         9.020939
## 8          High income 2019    29651.864            73.9792        10.297280
## 9  Upper middle income 2019    26629.553            79.9470        10.189777
## 10 Upper middle income 2019    16215.361            66.5439         9.693714
summary(wdi19[, c("gdp_pcap_ppp", "internet_users_pct", "log_gdp_pcap_ppp")])
##   gdp_pcap_ppp      internet_users_pct log_gdp_pcap_ppp
##  Min.   :   855.7   Min.   : 6.10      Min.   : 6.752  
##  1st Qu.:  5830.7   1st Qu.:36.35      1st Qu.: 8.671  
##  Median : 15719.5   Median :63.85      Median : 9.663  
##  Mean   : 24055.3   Mean   :58.67      Mean   : 9.555  
##  3rd Qu.: 34208.0   3rd Qu.:81.04      3rd Qu.:10.440  
##  Max.   :133549.2   Max.   :99.70      Max.   :11.802
table(wdi19$income)
## 
##          Aggregates         High income          Low income Lower middle income 
##                  24                  62                  20                  49 
## Upper middle income 
##                  49
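
The income table above shows 24 rows labeled Aggregates, and the grouped summary later in this report shows 10 rows with a missing income value. A minimal sketch of how those rows could be excluded before analysis is given here. The helper name drop_aggregates and the object wdi19_countries are hypothetical and not part of the original workflow. Dropping the rows with missing income is an assumption; they may be worth inspecting by hand first.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis): keep only rows with a
# World Bank income classification, dropping rows labeled "Aggregates".
# Assumption: rows with a missing income value are dropped as well.
drop_aggregates <- function(df) {
  df |>
    filter(!is.na(income), income != "Aggregates")
}

# Apply the helper only if the working file exists in the current session.
if (exists("wdi19")) {
  wdi19_countries <- drop_aggregates(wdi19)
  print(dim(wdi19_countries))
  print(table(wdi19_countries$income))
}
```

Going by the counts printed above, this filter would be expected to leave the four income groups only (62 + 20 + 49 + 49 rows). The rest of the report continues to use the unfiltered wdi19.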

Data Analysis

I examine how internet access relates to economic prosperity using the 2019 cross-section from the World Bank’s WDI. The analysis uses the cleaned wdi19 data frame built above: needed columns selected, rows with missing values removed, and a log transform of GDP per capita added for interpretability. I describe distributions (summary statistics and a histogram of internet access), visualize the bivariate relationship (scatter plot with a linear fit), and aggregate by income group to see whether the pattern holds across development levels.

# Summary statistics for key variables
summary(wdi19[, c("gdp_pcap_ppp", "internet_users_pct", "log_gdp_pcap_ppp")])
##   gdp_pcap_ppp      internet_users_pct log_gdp_pcap_ppp
##  Min.   :   855.7   Min.   : 6.10      Min.   : 6.752  
##  1st Qu.:  5830.7   1st Qu.:36.35      1st Qu.: 8.671  
##  Median : 15719.5   Median :63.85      Median : 9.663  
##  Mean   : 24055.3   Mean   :58.67      Mean   : 9.555  
##  3rd Qu.: 34208.0   3rd Qu.:81.04      3rd Qu.:10.440  
##  Max.   :133549.2   Max.   :99.70      Max.   :11.802
# EDA metrics
mean_internet <- mean(wdi19$internet_users_pct)  
max_gdp <- max(wdi19$gdp_pcap_ppp)               
mean_internet; max_gdp
## [1] 58.66679
## [1] 133549.2
hist(wdi19$internet_users_pct,
     breaks = 30,
     main  = "Distribution of Internet Use (% of Population), 2019",
     xlab  = "Internet users (% of population)",
     ylab  = "Number of countries")

plot(wdi19$internet_users_pct, wdi19$gdp_pcap_ppp,
     xlab = "Internet users (% of population)",
     ylab = "GDP per capita, PPP (2017 intl $)",
     main = "Internet Access vs. GDP per Capita (PPP), 2019",
     pch  = 19, cex = 0.7)
fit <- lm(gdp_pcap_ppp ~ internet_users_pct, data = wdi19)
abline(fit, lwd = 2)
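
The fit above uses GDP per capita in levels, although the working file also carries log_gdp_pcap_ppp. Since the summary statistics show a mean GDP per capita (24,055) well above the median (15,720), a log-scale fit may be a useful complement. The following is a sketch under that assumption, not the specification used in this report. The names fit_log_model and fit_log are hypothetical.

```r
# Hypothetical helper: regress log GDP per capita on internet use (%).
fit_log_model <- function(df) {
  lm(log_gdp_pcap_ppp ~ internet_users_pct, data = df)
}

# Run against the working file only if it exists in the current session.
if (exists("wdi19")) {
  fit_log <- fit_log_model(wdi19)

  plot(wdi19$internet_users_pct, wdi19$log_gdp_pcap_ppp,
       xlab = "Internet users (% of population)",
       ylab = "Log GDP per capita, PPP (2017 intl $)",
       main = "Internet Access vs. Log GDP per Capita (PPP), 2019",
       pch  = 19, cex = 0.7)
  abline(fit_log, lwd = 2)

  # Slope b on the log scale: 100 * (exp(b) - 1) is the percent difference in
  # GDP per capita associated with one more percentage point of internet use.
  b <- coef(fit_log)[["internet_users_pct"]]
  print(100 * (exp(b) - 1))
}
```

In a log-linear model the slope reads as an approximate percent difference rather than a dollar difference, which is often easier to compare across income levels.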

# Average internet use and GDP per capita by World Bank income group
by_income <- wdi19 |>
  group_by(income) |>
  summarize(
    n             = n(),
    mean_internet = mean(internet_users_pct, na.rm = TRUE),
    mean_gdp_ppp  = mean(gdp_pcap_ppp,       na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(mean_gdp_ppp))

by_income
## # A tibble: 6 × 4
##   income                  n mean_internet mean_gdp_ppp
##   <chr>               <int>         <dbl>        <dbl>
## 1 High income            62          85.6       53574.
## 2 Upper middle income    49          65.6       18109.
## 3 <NA>                   10          50.2       18057.
## 4 Aggregates             24          47.4       15679.
## 5 Lower middle income    49          41.8        6851.
## 6 Low income             20          17.0        2317.

Conclusion

Across the 2019 sample, higher rates of internet use are associated with higher GDP per capita (PPP). The histogram shows wide dispersion in connectivity (internet use ranges from 6.1% to 99.7% of the population, with a median of about 64%), while the scatter plot and fitted linear trend indicate a clear positive relationship between connectivity and income levels. Summary statistics by income group reinforce this pattern: high-income economies average 85.6% internet use and about $53,574 in GDP per capita, compared with 17.0% and about $2,317 for low-income economies. These results are consistent with the idea that digital access supports productivity and market integration, but they remain correlational, and the sample still includes World Bank aggregate rows alongside individual countries.

Looking ahead, a richer design would help clarify mechanisms and directionality. Future work could (a) add controls for education, urbanization, governance quality, and regional effects; (b) model non-linearities or thresholds in the connectivity–income link; (c) extend to a multi-year panel with country fixed effects to study within-country changes; and (d) replace or complement GDP with broader outcomes (e.g., poverty rates, labor productivity, or human capital indices). Together, these steps would strengthen causal interpretation and policy relevance.
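
As one illustration of items (a) and (b), the sketch below adds a squared internet-use term and region indicators to a log-GDP regression. It is a hypothetical specification, not one estimated in this report. The name fit_extended_model is an assumption, and the other controls mentioned (education, urbanization, governance) are not in the working file and would need additional WDI indicators.

```r
# Hypothetical extended specification:
#   - I(internet_users_pct^2) allows a non-linear connectivity-income link
#   - factor(region) absorbs level differences between regions
fit_extended_model <- function(df) {
  lm(log_gdp_pcap_ppp ~ internet_users_pct + I(internet_users_pct^2) +
       factor(region),
     data = df)
}

# Estimate on the working file only if it exists in the current session.
if (exists("wdi19")) {
  fit_ext <- fit_extended_model(wdi19)
  print(summary(fit_ext))
}
```

If the Aggregates rows are kept in the data, factor(region) treats them as their own region level, so filtering them out first would likely give a cleaner comparison.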