Research question. Do countries with higher internet access (individuals using the internet, % of population) have higher GDP per capita (PPP)?

Dataset. I analyze a 2019 country-level cross-section built by merging two indicators from the World Bank’s World Development Indicators (WDI): GDP per capita, PPP (NY.GDP.PCAP.PP.KD) and Individuals using the Internet (% of population) (IT.NET.USER.ZS). The merged file is saved locally as a CSV for this project.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# 1) Read the two files
gdp <- read.csv("WB_WDI_NY_GDP_PCAP_PP_KD.csv", stringsAsFactors = FALSE)
net <- read.csv("WB_WDI_IT_NET_USER_ZS.csv",    stringsAsFactors = FALSE)

# 2) Keep 2019 rows and needed columns, then rename
gdp2019 <- gdp[gdp$TIME_PERIOD == 2019, c("REF_AREA","REF_AREA_LABEL","TIME_PERIOD","OBS_VALUE")]
names(gdp2019) <- c("iso3c","country","year","gdp")

net2019 <- net[net$TIME_PERIOD == 2019, c("REF_AREA","REF_AREA_LABEL","TIME_PERIOD","OBS_VALUE")]
names(net2019) <- c("iso3c","country","year","internet")

# 3) Merge and save
combined <- merge(gdp2019, net2019, by = c("iso3c","country","year"), all = TRUE)
write.csv(combined, "wdi_2019_internet_gdp.csv", row.names = FALSE)

# quick check
dim(combined); head(combined, 8)
## [1] 252   5
##   iso3c                     country year       gdp internet
## 1   ABW                       Aruba 2019 38435.427       NA
## 2   AFE Africa Eastern and Southern 2019  4073.654  22.4000
## 3   AFG                 Afghanistan 2019  2927.245  17.6000
## 4   AFW  Africa Western and Central 2019  4822.310  28.7000
## 5   AGO                      Angola 2019  8274.543  32.1294
## 6   ALB                     Albania 2019 15079.374  68.5504
## 7   AND                     Andorra 2019 63215.900  90.7187
## 8   ARB                  Arab World 2019 16697.636       NA
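
The preview printed above mixes World Bank regional aggregates (AFE, AFW, ARB) with individual countries, and two of the eight rows lack an internet value. A minimal sketch of how such rows might be flagged and how missing values might be counted follows. It rebuilds the eight preview rows by hand rather than reading the CSV, and `aggregate_codes` is an assumption: it lists only the three aggregates visible in the preview, not the full set of WDI aggregate codes.

```r
# Rebuild the eight preview rows (values copied from the printed output).
preview <- data.frame(
  iso3c    = c("ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB"),
  gdp      = c(38435.427, 4073.654, 2927.245, 4822.310,
               8274.543, 15079.374, 63215.900, 16697.636),
  internet = c(NA, 22.4, 17.6, 28.7, 32.1294, 68.5504, 90.7187, NA),
  stringsAsFactors = FALSE
)

# Assumption: only the aggregates seen in the preview; the real list is longer.
aggregate_codes <- c("AFE", "AFW", "ARB")

# Flag aggregate rows, then keep the individual countries.
preview$is_aggregate <- preview$iso3c %in% aggregate_codes
countries_only <- preview[!preview$is_aggregate, ]

# Count missing values per indicator before any rows are dropped.
na_counts <- colSums(is.na(preview[, c("gdp", "internet")]))

nrow(countries_only)
na_counts
```

Applied to the full merged file, the same two checks would show how many of the 252 rows are aggregates and how many are lost to missing values, provided a complete list of aggregate codes is supplied.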

Data Analysis

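I first import the saved CSV. Next, I clean it by keeping the identifier columns and the two indicators (renamed gdp_pcap_ppp and internet_users_pct), removing rows with a missing value in either indicator, and creating log_gdp_pcap_ppp, the natural log of GDP per capita. I also add two logical flags, above_mean_internet and is_max_gdp_country. I then show the dimensions and a preview of the cleaned data to make the result of the load explicit. For EDA, I report summary statistics for internet_users_pct, gdp_pcap_ppp, and the log transform; then I produce a histogram of internet access to show its distribution and a scatterplot of internet access against GDP per capita in levels.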

raw <- read.csv("wdi_2019_internet_gdp.csv", stringsAsFactors = FALSE)

# Keep only the needed columns (renaming the two indicators), drop rows with NAs, add derived columns
wdi19 <- raw |>
  select(
    iso3c, country, year,
    gdp_pcap_ppp       = gdp,
    internet_users_pct = internet
  ) |>
  filter(!is.na(gdp_pcap_ppp), !is.na(internet_users_pct)) |>
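  mutate(
    # natural log of GDP per capita
    log_gdp_pcap_ppp    = log(gdp_pcap_ppp),
    # TRUE when internet use exceeds the mean of the filtered sample
    above_mean_internet = internet_users_pct > mean(internet_users_pct, na.rm = TRUE),
    # TRUE only for the row with the highest GDP per capita
    is_max_gdp_country  = gdp_pcap_ppp == max(gdp_pcap_ppp, na.rm = TRUE)
  )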

# Summaries (summary + mean + max explicitly shown)
dim(wdi19)
## [1] 214   8
head(wdi19, 10)
##    iso3c                     country year gdp_pcap_ppp internet_users_pct
## 1    AFE Africa Eastern and Southern 2019     4073.654            22.4000
## 2    AFG                 Afghanistan 2019     2927.245            17.6000
## 3    AFW  Africa Western and Central 2019     4822.310            28.7000
## 4    AGO                      Angola 2019     8274.543            32.1294
## 5    ALB                     Albania 2019    15079.374            68.5504
## 6    AND                     Andorra 2019    63215.900            90.7187
## 7    ARE        United Arab Emirates 2019    68887.845            99.1500
## 8    ARG                   Argentina 2019    26629.553            79.9470
## 9    ARM                     Armenia 2019    16215.361            66.5439
## 10   ATG         Antigua and Barbuda 2019    29651.864            73.9792
##    log_gdp_pcap_ppp above_mean_internet is_max_gdp_country
## 1          8.312296               FALSE              FALSE
## 2          7.981817               FALSE              FALSE
## 3          8.481008               FALSE              FALSE
## 4          9.020939               FALSE              FALSE
## 5          9.621083                TRUE              FALSE
## 6         11.054311                TRUE              FALSE
## 7         11.140235                TRUE              FALSE
## 8         10.189777                TRUE              FALSE
## 9          9.693714                TRUE              FALSE
## 10        10.297280                TRUE              FALSE
summary(wdi19[, c("gdp_pcap_ppp", "internet_users_pct", "log_gdp_pcap_ppp")])
##   gdp_pcap_ppp      internet_users_pct log_gdp_pcap_ppp
##  Min.   :   855.7   Min.   : 6.10      Min.   : 6.752  
##  1st Qu.:  5830.7   1st Qu.:36.35      1st Qu.: 8.671  
##  Median : 15719.5   Median :63.85      Median : 9.663  
##  Mean   : 24055.3   Mean   :58.67      Mean   : 9.555  
##  3rd Qu.: 34208.0   3rd Qu.:81.04      3rd Qu.:10.440  
##  Max.   :133549.2   Max.   :99.70      Max.   :11.802
mean_internet <- mean(wdi19$internet_users_pct, na.rm = TRUE)
max_gdp       <- max(wdi19$gdp_pcap_ppp,       na.rm = TRUE)

mean_internet
## [1] 58.66679
max_gdp
## [1] 133549.2
hist(
  wdi19$internet_users_pct,
  breaks = 30,
  main   = "Distribution of Internet Use (% of Population), 2019",
  xlab   = "Internet users (% of population)",
  ylab   = "Number of countries"
)
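
The cleaning step creates above_mean_internet, but the flag is not summarized anywhere. A minimal sketch of a grouped summary built on it follows, using only the ten rows shown in the preview of the cleaned data. The threshold is the full-sample mean printed above (58.66679). The names `head_rows`, `group_counts`, and `group_median_gdp` are illustrative, and the numbers describe this ten-row subset, not the 214-row sample.

```r
# Ten preview rows of the cleaned data (values copied from the printed output).
head_rows <- data.frame(
  iso3c = c("AFE", "AFG", "AFW", "AGO", "ALB",
            "AND", "ARE", "ARG", "ARM", "ATG"),
  gdp_pcap_ppp = c(4073.654, 2927.245, 4822.310, 8274.543, 15079.374,
                   63215.900, 68887.845, 26629.553, 16215.361, 29651.864),
  internet_users_pct = c(22.4, 17.6, 28.7, 32.1294, 68.5504,
                         90.7187, 99.15, 79.947, 66.5439, 73.9792),
  stringsAsFactors = FALSE
)

# Full-sample mean of internet use, as printed in the report.
full_sample_mean <- 58.66679

# Recreate the flag against the full-sample mean.
head_rows$above_mean_internet <- head_rows$internet_users_pct > full_sample_mean

# Rows per group and median GDP per capita per group (base R, no extra packages).
group_counts     <- table(head_rows$above_mean_internet)
group_median_gdp <- tapply(head_rows$gdp_pcap_ppp,
                           head_rows$above_mean_internet,
                           median)

group_counts
group_median_gdp
```

The median is used instead of the mean because GDP per capita is right-skewed in the summary statistics; running the same two calls on wdi19 would give the full-sample comparison.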

plot(
  wdi19$internet_users_pct,
  wdi19$gdp_pcap_ppp,
  xlab = "Internet users (% of population)",
  ylab = "GDP per capita, PPP (2017 intl $)",
  main = "Internet Access vs. GDP per Capita (PPP), 2019",
  pch  = 19, cex = 0.7
)
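
The scatterplot uses GDP per capita in levels, although log_gdp_pcap_ppp is also available. One way the visible association could be quantified is sketched below with a Pearson correlation on the log scale, a rank-based Spearman correlation, and a simple linear fit. This is an illustration on the ten preview rows only, so the coefficients are not results for the full sample; the object names are hypothetical.

```r
# Ten preview rows of the cleaned data (values copied from the printed output).
internet <- c(22.4, 17.6, 28.7, 32.1294, 68.5504,
              90.7187, 99.15, 79.947, 66.5439, 73.9792)
gdp      <- c(4073.654, 2927.245, 4822.310, 8274.543, 15079.374,
              63215.900, 68887.845, 26629.553, 16215.361, 29651.864)

# Natural log, matching log_gdp_pcap_ppp in the cleaning step.
log_gdp <- log(gdp)

# Linear association between internet use and log GDP per capita.
pearson_log <- cor(internet, log_gdp)

# Rank-based association; unchanged by the log transform.
spearman <- cor(internet, gdp, method = "spearman")

# Descriptive slope: change in log GDP per one-point rise in internet use.
fit <- lm(log_gdp ~ internet)

pearson_log
spearman
coef(fit)
```

On the full data the same calls would take wdi19$internet_users_pct and wdi19$log_gdp_pcap_ppp as inputs. As with the plot, the output would describe an association and would not support a causal reading.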

Conclusion

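Key findings. In 2019, countries with higher shares of internet users generally show higher GDP per capita (PPP). The histogram indicates wide dispersion in connectivity across countries: internet use ranges from 6.1% to 99.7%, with a mean of 58.7% and a median of 63.9%. The scatterplot reveals a visible positive association. GDP per capita is right-skewed (a mean of about 24,055 against a median of about 15,720), which motivates the log transform. The 214 observations include World Bank regional aggregates (for example, Africa Eastern and Southern) alongside individual countries, so not every point is a country. These results are descriptive, not causal.

Implications and next steps. The pattern is consistent with the idea that digital connectivity and economic prosperity move together. Summaries by World Bank income group, which are not computed here, would be a natural next check on whether higher-income groups have both greater internet penetration and higher average GDP per capita.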

References

World Bank. (2019). World Development Indicators (WDI) [Data set; variables NY.GDP.PCAP.PP.KD and IT.NET.USER.ZS, 2019 cross-section]. The World Bank Group. DataBank: World Development Indicators.