Research question. Do countries with higher internet access (individuals using the internet, % of population) have higher GDP per capita (PPP)?
Dataset. I analyze a 2019 cross-section from the World Bank’s World Development Indicators (WDI), retrieved via the WDI R package from the approved source list. The working file includes identifier and context fields (iso3c, country, region, income, and year) and two primary indicators: GDP per capita, PPP (constant 2017 international $; WDI code NY.GDP.PCAP.PP.KD) and Individuals using the Internet (% of population; WDI code IT.NET.USER.ZS). I also construct log_gdp_pcap_ppp, the natural log of GDP per capita, to compress the right-skewed income distribution. Because the download uses country = "all", the file contains World Bank regional and income-group aggregates alongside individual countries (24 rows labelled Aggregates in the income table below). Together, these variables allow me to describe distributions, visualize the bivariate relationship, and estimate a simple association between connectivity and economic prosperity.
Source. World Bank (2024), World Development Indicators (DataBank); Arel-Bundock (2024), WDI R package.
library(WDI)
library(dplyr)
# Download WDI (2019 cross-section)
raw <- WDI(
  country = "all",
  indicator = c(
    gdp = "NY.GDP.PCAP.PP.KD",   # GDP per capita
    internet = "IT.NET.USER.ZS"  # Internet users
  ),
  start = 2019, end = 2019, extra = TRUE
)
# Keep needed columns, remove NAs, add log GDP
wdi19 <- raw |>
  select(
    iso3c, country, region, income, year,
    gdp_pcap_ppp = gdp,
    internet_users_pct = internet
  ) |>
  filter(!is.na(gdp_pcap_ppp), !is.na(internet_users_pct)) |>
  mutate(log_gdp_pcap_ppp = log(gdp_pcap_ppp))
# Quick checks
dim(wdi19)
## [1] 214 8
head(wdi19, 10)
## iso3c country region
## 1 AFG Afghanistan South Asia
## 2 AFE Africa Eastern and Southern Aggregates
## 3 AFW Africa Western and Central Aggregates
## 4 ALB Albania Europe & Central Asia
## 5 DZA Algeria Middle East & North Africa
## 6 AND Andorra Europe & Central Asia
## 7 AGO Angola Sub-Saharan Africa
## 8 ATG Antigua and Barbuda Latin America & Caribbean
## 9 ARG Argentina Latin America & Caribbean
## 10 ARM Armenia Europe & Central Asia
## income year gdp_pcap_ppp internet_users_pct log_gdp_pcap_ppp
## 1 Low income 2019 2927.245 17.6000 7.981817
## 2 Aggregates 2019 4073.654 22.4000 8.312296
## 3 Aggregates 2019 4822.310 28.7000 8.481008
## 4 Upper middle income 2019 15079.374 68.5504 9.621083
## 5 Upper middle income 2019 15199.199 55.7907 9.628998
## 6 High income 2019 63215.900 90.7187 11.054311
## 7 Lower middle income 2019 8274.543 32.1294 9.020939
## 8 High income 2019 29651.864 73.9792 10.297280
## 9 Upper middle income 2019 26629.553 79.9470 10.189777
## 10 Upper middle income 2019 16215.361 66.5439 9.693714
summary(wdi19[, c("gdp_pcap_ppp", "internet_users_pct", "log_gdp_pcap_ppp")])
## gdp_pcap_ppp internet_users_pct log_gdp_pcap_ppp
## Min. : 855.7 Min. : 6.10 Min. : 6.752
## 1st Qu.: 5830.7 1st Qu.:36.35 1st Qu.: 8.671
## Median : 15719.5 Median :63.85 Median : 9.663
## Mean : 24055.3 Mean :58.67 Mean : 9.555
## 3rd Qu.: 34208.0 3rd Qu.:81.04 3rd Qu.:10.440
## Max. :133549.2 Max. :99.70 Max. :11.802
table(wdi19$income)
##
## Aggregates High income Low income Lower middle income
## 24 62 20 49
## Upper middle income
## 49
Data Analysis
I examine how internet access relates to economic prosperity using the 2019 WDI cross-section prepared above, in which the needed columns are selected, rows with missing values are removed, and a log transform of GDP per capita is added. I describe distributions (summary statistics and a histogram of internet access), visualize the bivariate relationship (scatter plot with a linear fit), and aggregate by income group to see whether the pattern holds across development levels.
# Summary statistics for key variables
summary(wdi19[, c("gdp_pcap_ppp", "internet_users_pct", "log_gdp_pcap_ppp")])
## gdp_pcap_ppp internet_users_pct log_gdp_pcap_ppp
## Min. : 855.7 Min. : 6.10 Min. : 6.752
## 1st Qu.: 5830.7 1st Qu.:36.35 1st Qu.: 8.671
## Median : 15719.5 Median :63.85 Median : 9.663
## Mean : 24055.3 Mean :58.67 Mean : 9.555
## 3rd Qu.: 34208.0 3rd Qu.:81.04 3rd Qu.:10.440
## Max. :133549.2 Max. :99.70 Max. :11.802
# EDA metrics
mean_internet <- mean(wdi19$internet_users_pct)
max_gdp <- max(wdi19$gdp_pcap_ppp)
mean_internet; max_gdp
## [1] 58.66679
## [1] 133549.2
# Histogram of internet use across the sample
hist(wdi19$internet_users_pct,
     breaks = 30,
     main = "Distribution of Internet Use (% of Population), 2019",
     xlab = "Internet users (% of population)",
     ylab = "Number of countries")
# Scatter plot of GDP per capita (in levels) against internet use
plot(wdi19$internet_users_pct, wdi19$gdp_pcap_ppp,
     xlab = "Internet users (% of population)",
     ylab = "GDP per capita, PPP (2017 intl $)",
     main = "Internet Access vs. GDP per Capita (PPP), 2019",
     pch = 19, cex = 0.7)
# Overlay the ordinary least squares line
fit <- lm(gdp_pcap_ppp ~ internet_users_pct, data = wdi19)
abline(fit, lwd = 2)
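The fit above uses GDP per capita in levels, although log_gdp_pcap_ppp was constructed as well. A minimal sketch of the log-linear alternative follows. It uses only the ten rows printed by head() so that it runs without a download; on the full data one would pass data = wdi19 instead. The names demo, fit_log, slope, and pct_per_point are hypothetical, and the ten-row subset (which includes two aggregate rows) is for illustration only, not a result.

```r
# Ten rows copied from the head() output (illustration only)
demo <- data.frame(
  internet_users_pct = c(17.6000, 22.4000, 28.7000, 68.5504, 55.7907,
                         90.7187, 32.1294, 73.9792, 79.9470, 66.5439),
  log_gdp_pcap_ppp   = c(7.981817, 8.312296, 8.481008, 9.621083, 9.628998,
                         11.054311, 9.020939, 10.297280, 10.189777, 9.693714)
)

# Pearson correlation between connectivity and log income
r <- cor(demo$internet_users_pct, demo$log_gdp_pcap_ppp)

# Log-linear model: the slope is the difference in log GDP per capita
# associated with a one-percentage-point difference in internet use
fit_log <- lm(log_gdp_pcap_ppp ~ internet_users_pct, data = demo)
slope <- unname(coef(fit_log)["internet_users_pct"])

# Implied percent difference in GDP per capita per percentage point
pct_per_point <- 100 * (exp(slope) - 1)

summary(fit_log)
```

Reporting summary(fit_log) alongside the plot would give the slope, its standard error, and R-squared, which the levels plot alone does not show.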
# Mean internet use and GDP per capita by World Bank income group
income_summary <- wdi19 |>
  group_by(income) |>
  summarize(
    n = n(),
    mean_internet = mean(internet_users_pct, na.rm = TRUE),
    mean_gdp_ppp = mean(gdp_pcap_ppp, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(mean_gdp_ppp))
income_summary
## # A tibble: 6 × 4
## income n mean_internet mean_gdp_ppp
## <chr> <int> <dbl> <dbl>
## 1 High income 62 85.6 53574.
## 2 Upper middle income 49 65.6 18109.
## 3 <NA> 10 50.2 18057.
## 4 Aggregates 24 47.4 15679.
## 5 Lower middle income 49 41.8 6851.
## 6 Low income 20 17.0 2317.
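The grouped table still contains an Aggregates row (24 regional and income-group composites) and an <NA> row (10 entries with no income classification), so the sample is not purely country-level. One possible way to restrict it is sketched here in base R, on the first five rows of the head() output. The names sample_rows and countries_only are hypothetical, and whether the unclassified rows should be dropped or inspected one by one is an assumption left to the analyst.

```r
# First five rows of the head() output, re-entered by hand for illustration
sample_rows <- data.frame(
  iso3c  = c("AFG", "AFE", "AFW", "ALB", "DZA"),
  income = c("Low income", "Aggregates", "Aggregates",
             "Upper middle income", "Upper middle income"),
  gdp_pcap_ppp       = c(2927.245, 4073.654, 4822.310, 15079.374, 15199.199),
  internet_users_pct = c(17.6000, 22.4000, 28.7000, 68.5504, 55.7907)
)

# Keep only rows that carry a real income classification:
# drop World Bank aggregates and rows where income is missing
countries_only <- subset(
  sample_rows,
  !is.na(income) & income != "Aggregates"
)

countries_only$iso3c
```

Applied to wdi19 before the plots and the grouped summary, the same condition would leave only classified countries in the analysis.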
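Conclusion
Across the 2019 cross-section, higher rates of internet use are associated with higher GDP per capita (PPP). The histogram shows wide dispersion in connectivity (from 6.1% to 99.7% of the population, with a median of about 64%), and the scatter plot with its fitted line indicates a positive relationship between connectivity and income levels. Summary statistics by income group show the same ordering: mean internet use rises from 17.0% in the low-income group to 41.8%, 65.6%, and 85.6% in the lower-middle, upper-middle, and high-income groups, while mean GDP per capita rises from about $2,300 to about $53,600. Two caveats apply. First, the 214-row sample still includes 24 regional and income-group aggregates and 10 entries with no income classification, so it is not purely country-level. Second, while the results are consistent with the idea that digital access supports productivity and market integration, they remain correlational.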
Looking ahead, a richer design would help clarify mechanisms and directionality. Future work could (a) add controls for education, urbanization, governance quality, and regional effects; (b) model non-linearities or thresholds in the connectivity–income link; (c) extend to a multi-year panel with country fixed effects to study within-country changes; and (d) replace or complement GDP with broader outcomes (e.g., poverty rates, labor productivity, or human capital indices). Together, these steps would strengthen causal interpretation and policy relevance.
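Item (b) could be explored by adding a squared term and comparing nested models. The sketch below does this on the ten rows printed by head(), purely to show the mechanics; the names demo, fit_linear, and fit_quad are hypothetical, and ten rows are far too few to support any conclusion about curvature.

```r
# Ten rows copied from the head() output (illustration only)
demo <- data.frame(
  internet_users_pct = c(17.6000, 22.4000, 28.7000, 68.5504, 55.7907,
                         90.7187, 32.1294, 73.9792, 79.9470, 66.5439),
  log_gdp_pcap_ppp   = c(7.981817, 8.312296, 8.481008, 9.621083, 9.628998,
                         11.054311, 9.020939, 10.297280, 10.189777, 9.693714)
)

# Baseline: log GDP per capita as a linear function of internet use
fit_linear <- lm(log_gdp_pcap_ppp ~ internet_users_pct, data = demo)

# Alternative: allow curvature through a squared term
fit_quad <- lm(log_gdp_pcap_ppp ~ internet_users_pct + I(internet_users_pct^2),
               data = demo)

# F-test of the nested models: does the squared term reduce residual variance?
anova(fit_linear, fit_quad)
```

On the full dataset, the same comparison with data = wdi19 would indicate whether a straight line is an adequate description of the connectivity–income relationship.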