Data 607 Final Project: Population Internet Access and Economic Growth

Author

Emily El Mouaquite, Long Lin, Pascal Hermann Kouogang Tafo, Zihao Yu

Introduction

For this project, we want to investigate the relationship between internet access and economic opportunity through the following research question:

Is higher internet access associated with stronger economic growth?

To answer this question, we will integrate internet access data from the online dataset Share of the population using the Internet (CSV file) with economic indicators, including GDP (current US$) and annual GDP growth, retrieved through the World Bank Indicators API, which provides JSON/XML data for all countries. Our motivation is to examine whether, in a world where digital connectivity has become essential for education, commerce, and innovation, unequal access to the internet reinforces existing economic inequalities. By analyzing long‑term trends in internet usage alongside GDP growth, this project aims to quantify how strongly digital access correlates with national economic performance.

The workflow for this project includes cleaning the internet access data from the CSV, retrieving the GDP data from the API, and conducting statistical analyses of the correlation between internet access and GDP growth over time. Possible hurdles include misalignment between the CSV data and the API data, missing data for some countries, and the need to filter the very large API responses.

Packages and Libraries

library(httr2) #used for building and sending World Bank API requests
library(xml2) #used for parsing the XML responses

Attaching package: 'xml2'
The following object is masked from 'package:httr2':

    url_parse
library(tidyverse) #used for data cleaning, joins, and plotting
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()   masks stats::filter()
✖ dplyr::lag()      masks stats::lag()
✖ xml2::url_parse() masks httr2::url_parse()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tmap) #used for creating maps
library(sf) #used for tmap coordinates 
Linking to GEOS 3.14.1, GDAL 3.12.1, PROJ 9.7.1; sf_use_s2() is TRUE
library(rnaturalearth) #provides map vectors for creating tmaps

Fetching the GDP Data from the World Bank API

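# NY.GDP.MKTP.KD.ZG = GDP growth (annual %); per_page is set high enough
# that every country-year for 1995-2023 comes back in a single page
req <- request("https://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.KD.ZG") |>
  req_url_query(
    date = "1995:2023", 
    per_page = 32500
  )
resp <- req |> req_perform()
xml_data <- resp |> resp_body_xml()

# each observation is a <wb:data> node with an <wb:indicator> child;
# local-name() sidesteps the "wb" XML namespace prefix
records <- xml_find_all(xml_data, "//*[local-name()='data' and *[local-name()='indicator']]")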

GDP_growth_df <- tibble(
  country     = records |> xml_find_first(".//*[local-name()='country']") |> xml_text(),
  country_iso = records |> xml_find_first(".//*[local-name()='countryiso3code']") |> xml_text(),
  date        = records |> xml_find_first(".//*[local-name()='date']") |> xml_integer(),
  value       = records |> xml_find_first(".//*[local-name()='value']") |> xml_double(),
  indicator   = records |> xml_find_first(".//*[local-name()='indicator']") |> xml_text()
)
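The same request-and-parse steps are repeated below for the 1995 and 2023 GDP levels. A minimal sketch of how they could be factored into reusable functions, assuming the XML layout used above (one `<wb:data>` record per country-year with `country`, `countryiso3code`, `date`, `value`, and `indicator` children). `parse_wb_records()` and `fetch_wb_indicator()` are hypothetical helper names, not functions from httr2 or xml2.

```r
library(httr2)
library(xml2)
library(tibble)

# Hypothetical helper: turn a World Bank API XML document into a tibble.
# value_col lets the caller name the value column (e.g. "value_1995").
parse_wb_records <- function(xml_doc, value_col = "value") {
  # one record per <wb:data> node that has an <wb:indicator> child
  records <- xml_find_all(xml_doc, "//*[local-name()='data' and *[local-name()='indicator']]")
  # first descendant of each record with the given local name
  field <- function(name) {
    xml_find_first(records, paste0(".//*[local-name()='", name, "']"))
  }
  out <- tibble(
    country     = xml_text(field("country")),
    country_iso = xml_text(field("countryiso3code")),
    date        = xml_integer(field("date")),
    value       = xml_double(field("value")),
    indicator   = xml_text(field("indicator"))
  )
  names(out)[names(out) == "value"] <- value_col
  out
}

# Hypothetical helper: request one indicator for a year or range
# (e.g. "1995" or "1995:2023") and parse it; needs network access.
fetch_wb_indicator <- function(indicator, date, value_col = "value") {
  request(paste0("https://api.worldbank.org/v2/country/all/indicator/", indicator)) |>
    req_url_query(date = date, per_page = 32500) |>
    req_perform() |>
    resp_body_xml() |>
    parse_wb_records(value_col = value_col)
}
```

With these helpers, the 1995 request would become `fetch_wb_indicator("NY.GDP.MKTP.CD", "1995", value_col = "value_1995")`.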

head(GDP_growth_df)
# A tibble: 6 × 5
  country                     country_iso  date value indicator            
  <chr>                       <chr>       <int> <dbl> <chr>                
1 Africa Eastern and Southern AFE          2023  1.93 GDP growth (annual %)
2 Africa Eastern and Southern AFE          2022  3.72 GDP growth (annual %)
3 Africa Eastern and Southern AFE          2021  4.58 GDP growth (annual %)
4 Africa Eastern and Southern AFE          2020 -2.82 GDP growth (annual %)
5 Africa Eastern and Southern AFE          2019  2.03 GDP growth (annual %)
6 Africa Eastern and Southern AFE          2018  2.71 GDP growth (annual %)

Internet Access Data Cleaning and Exploration

#read csv
internet_data <- read.csv("https://raw.githubusercontent.com/emilye5/607-final-project/refs/heads/main/share-of-individuals-using-the-internet.csv")
#clean dataset
internet_data_clean <- internet_data %>%
  #rename columns
  rename(
    country = Entity,
    country_iso = Code,
    date = Year,
    internet_usage_share = Share.of.the.population.using.the.Internet
  ) %>%
  #drop rows with a missing internet usage share
  drop_na(internet_usage_share) %>%
  # keep only the years 1995 through 2023 (the range requested from the API)
  filter(date >= 1995, date < 2024)
#take a look at the cleaned dataset
glimpse(internet_data_clean)
Rows: 5,804
Columns: 4
$ country              <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afg…
$ country_iso          <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", …
$ date                 <int> 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2…
$ internet_usage_share <dbl> 0.00472257, 0.00456140, 0.08789130, 0.10580900, 1…
#check how many distinct countries are included in the data
n_distinct(internet_data_clean$country)
[1] 222

Distribution of Internet Usage Over Time

#boxplot showing the distribution of internet usage by year
internet_data_clean %>%
  ggplot(aes(x = factor(date), y = internet_usage_share)) +
  geom_boxplot() +
  labs(
    title = "Distribution of Internet Usage Across Countries Over Time",
    x = "Year",
    y = "Internet Usage (%)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

As a sanity check on our internet usage data, we created the boxplot above, which shows internet usage rising over time: each year, the median and the middle half of the distribution (the interquartile range) move toward higher percentages, and there are fewer outliers over time.
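The boxplot can also be backed by the numbers it draws. A minimal sketch, assuming the column names of `internet_data_clean`; `summarise_usage_by_year()` is a hypothetical helper and the toy data are made up only to show the output shape.

```r
library(dplyr)

# One row per year with the quantities a boxplot draws:
# first quartile, median, third quartile, and the interquartile range (IQR).
summarise_usage_by_year <- function(df) {
  df |>
    group_by(date) |>
    summarise(
      n_countries = n(),
      q1          = quantile(internet_usage_share, 0.25, na.rm = TRUE, names = FALSE),
      median      = median(internet_usage_share, na.rm = TRUE),
      q3          = quantile(internet_usage_share, 0.75, na.rm = TRUE, names = FALSE),
      iqr         = q3 - q1,
      .groups     = "drop"
    )
}

# Made-up toy data
toy <- tibble::tibble(
  date = rep(c(2000L, 2010L), each = 4),
  internet_usage_share = c(1, 2, 3, 4, 20, 40, 60, 80)
)
summarise_usage_by_year(toy)
```

On the real data this would be `summarise_usage_by_year(internet_data_clean)`.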

Merging Internet & GDP Data

final_data <- internet_data_clean |>
  left_join(GDP_growth_df, by = c("country_iso", "date"))

head(final_data)
    country.x country_iso date internet_usage_share   country.y     value
1 Afghanistan         AFG 2001           0.00472257 Afghanistan -9.431974
2 Afghanistan         AFG 2002           0.00456140 Afghanistan 28.600001
3 Afghanistan         AFG 2003           0.08789130 Afghanistan  8.832278
4 Afghanistan         AFG 2004           0.10580900 Afghanistan  1.414118
5 Afghanistan         AFG 2005           1.22415000 Afghanistan 11.229715
6 Afghanistan         AFG 2006           2.10712000 Afghanistan  5.357403
              indicator
1 GDP growth (annual %)
2 GDP growth (annual %)
3 GDP growth (annual %)
4 GDP growth (annual %)
5 GDP growth (annual %)
6 GDP growth (annual %)
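Because `left_join()` keeps every row of the internet data, country-years with no World Bank match simply get `NA` in `value`. A minimal sketch for listing those rows, which speaks to the CSV/API misalignment hurdle mentioned in the introduction. `unmatched_rows()` is a hypothetical helper and the toy inputs are made up. It only catches missing keys: rows that match but where the API returned an empty value still need `drop_na(value)`.

```r
library(dplyr)

# Internet rows with no World Bank row for the same (country_iso, date),
# counted per country so the worst mismatches appear first.
unmatched_rows <- function(internet_df, gdp_df) {
  internet_df |>
    anti_join(gdp_df, by = c("country_iso", "date")) |>
    count(country, country_iso, name = "n_unmatched_years") |>
    arrange(desc(n_unmatched_years))
}

# Made-up toy inputs: "ZZZ" has no GDP rows
internet_toy <- tibble::tibble(
  country              = c("Albania", "Albania", "Testland"),
  country_iso          = c("ALB", "ALB", "ZZZ"),
  date                 = c(2000L, 2001L, 2000L),
  internet_usage_share = c(0.1, 0.3, 0.5)
)
gdp_toy <- tibble::tibble(
  country     = "Albania",
  country_iso = "ALB",
  date        = c(2000L, 2001L),
  value       = c(6.9, 8.3)
)
unmatched_rows(internet_toy, gdp_toy)
```

On the real data this would be `unmatched_rows(internet_data_clean, GDP_growth_df)`.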

Fetching GDP Data for 1995 and 2023

1995:

# NY.GDP.MKTP.CD = GDP (current US$), requested for a single year
req <- request("https://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD") |>
  req_url_query(
    date = "1995", 
    per_page = 32500
  )
resp <- req |> req_perform()
xml_data <- resp |> resp_body_xml()

records <- xml_find_all(xml_data, "//*[local-name()='data' and *[local-name()='indicator']]")

GDP_1995_df <- tibble(
  country     = records |> xml_find_first(".//*[local-name()='country']") |> xml_text(),
  country_iso = records |> xml_find_first(".//*[local-name()='countryiso3code']") |> xml_text(),
  date        = records |> xml_find_first(".//*[local-name()='date']") |> xml_integer(),
  value_1995  = records |> xml_find_first(".//*[local-name()='value']") |> xml_double(),
  indicator   = records |> xml_find_first(".//*[local-name()='indicator']") |> xml_text()
)

print(GDP_1995_df)
# A tibble: 266 × 5
   country                                country_iso  date value_1995 indicator
   <chr>                                  <chr>       <int>      <dbl> <chr>    
 1 Africa Eastern and Southern            AFE          1995    2.73e11 GDP (cur…
 2 Africa Western and Central             AFW          1995    2.08e11 GDP (cur…
 3 Arab World                             ARB          1995    5.28e11 GDP (cur…
 4 Caribbean small states                 CSS          1995    1.57e10 GDP (cur…
 5 Central Europe and the Baltics         CEB          1995    3.95e11 GDP (cur…
 6 Early-demographic dividend             EAR          1995    2.62e12 GDP (cur…
 7 East Asia & Pacific                    EAS          1995    8.43e12 GDP (cur…
 8 East Asia & Pacific (excluding high i… EAP          1995    1.33e12 GDP (cur…
 9 East Asia & Pacific (IDA & IBRD count… TEA          1995    1.32e12 GDP (cur…
10 Euro area                              EMU          1995    7.54e12 GDP (cur…
# ℹ 256 more rows
# drop the first 49 rows: the aggregates the API lists before individual countries
GDP_1995_Remove_Regions_df <- GDP_1995_df[-(1:49), ]
print(GDP_1995_Remove_Regions_df)
# A tibble: 217 × 5
   country             country_iso  date    value_1995 indicator        
   <chr>               <chr>       <int>         <dbl> <chr>            
 1 Afghanistan         AFG          1995           NA  GDP (current US$)
 2 Albania             ALB          1995   2905092799. GDP (current US$)
 3 Algeria             DZA          1995  41764291672. GDP (current US$)
 4 American Samoa      ASM          1995           NA  GDP (current US$)
 5 Andorra             AND          1995   1178745283. GDP (current US$)
 6 Angola              AGO          1995   5538749260. GDP (current US$)
 7 Antigua and Barbuda ATG          1995    616051852. GDP (current US$)
 8 Argentina           ARG          1995 258031750000  GDP (current US$)
 9 Armenia             ARM          1995   1468317435. GDP (current US$)
10 Aruba               ABW          1995   1320670391. GDP (current US$)
# ℹ 207 more rows

2023:

# NY.GDP.MKTP.CD = GDP (current US$), requested for a single year
req <- request("https://api.worldbank.org/v2/country/all/indicator/NY.GDP.MKTP.CD") |>
  req_url_query(
    date = "2023", 
    per_page = 32500
  )
resp <- req |> req_perform()
xml_data <- resp |> resp_body_xml()

records <- xml_find_all(xml_data, "//*[local-name()='data' and *[local-name()='indicator']]")

GDP_2023_df <- tibble(
  country     = records |> xml_find_first(".//*[local-name()='country']") |> xml_text(),
  country_iso = records |> xml_find_first(".//*[local-name()='countryiso3code']") |> xml_text(),
  date        = records |> xml_find_first(".//*[local-name()='date']") |> xml_integer(),
  value_2023  = records |> xml_find_first(".//*[local-name()='value']") |> xml_double(),
  indicator   = records |> xml_find_first(".//*[local-name()='indicator']") |> xml_text()
)

print(GDP_2023_df)
# A tibble: 266 × 5
   country                                country_iso  date value_2023 indicator
   <chr>                                  <chr>       <int>      <dbl> <chr>    
 1 Africa Eastern and Southern            AFE          2023    1.18e12 GDP (cur…
 2 Africa Western and Central             AFW          2023    9.38e11 GDP (cur…
 3 Arab World                             ARB          2023    3.62e12 GDP (cur…
 4 Caribbean small states                 CSS          2023    7.95e10 GDP (cur…
 5 Central Europe and the Baltics         CEB          2023    2.27e12 GDP (cur…
 6 Early-demographic dividend             EAR          2023    1.49e13 GDP (cur…
 7 East Asia & Pacific                    EAS          2023    3.14e13 GDP (cur…
 8 East Asia & Pacific (excluding high i… EAP          2023    2.16e13 GDP (cur…
 9 East Asia & Pacific (IDA & IBRD count… TEA          2023    2.16e13 GDP (cur…
10 Euro area                              EMU          2023    1.59e13 GDP (cur…
# ℹ 256 more rows
# drop the first 49 rows: the aggregates the API lists before individual countries
GDP_2023_Remove_Regions_df <- GDP_2023_df[-(1:49), ]
print(GDP_2023_Remove_Regions_df)
# A tibble: 217 × 5
   country             country_iso  date    value_2023 indicator        
   <chr>               <chr>       <int>         <dbl> <chr>            
 1 Afghanistan         AFG          2023  17152234637. GDP (current US$)
 2 Albania             ALB          2023  23491242727. GDP (current US$)
 3 Algeria             DZA          2023 247923887215. GDP (current US$)
 4 American Samoa      ASM          2023           NA  GDP (current US$)
 5 Andorra             AND          2023   3785065525. GDP (current US$)
 6 Angola              AGO          2023 107167747140. GDP (current US$)
 7 Antigua and Barbuda ATG          2023   2005785185. GDP (current US$)
 8 Argentina           ARG          2023 649461687959. GDP (current US$)
 9 Armenia             ARM          2023  24185982216. GDP (current US$)
10 Aruba               ABW          2023   3834729616. GDP (current US$)
# ℹ 207 more rows
Changed_GDP <- GDP_1995_Remove_Regions_df |>
  left_join(GDP_2023_Remove_Regions_df, by = c("country_iso"))
print(Changed_GDP)
# A tibble: 217 × 9
   country.x          country_iso date.x value_1995 indicator.x country.y date.y
   <chr>              <chr>        <int>      <dbl> <chr>       <chr>      <int>
 1 Afghanistan        AFG           1995   NA       GDP (curre… Afghanis…   2023
 2 Albania            ALB           1995    2.91e 9 GDP (curre… Albania     2023
 3 Algeria            DZA           1995    4.18e10 GDP (curre… Algeria     2023
 4 American Samoa     ASM           1995   NA       GDP (curre… American…   2023
 5 Andorra            AND           1995    1.18e 9 GDP (curre… Andorra     2023
 6 Angola             AGO           1995    5.54e 9 GDP (curre… Angola      2023
 7 Antigua and Barbu… ATG           1995    6.16e 8 GDP (curre… Antigua …   2023
 8 Argentina          ARG           1995    2.58e11 GDP (curre… Argentina   2023
 9 Armenia            ARM           1995    1.47e 9 GDP (curre… Armenia     2023
10 Aruba              ABW           1995    1.32e 9 GDP (curre… Aruba       2023
# ℹ 207 more rows
# ℹ 2 more variables: value_2023 <dbl>, indicator.y <chr>

We removed the first 49 rows from each World Bank data frame because the API lists 49 regional and other aggregates (such as "Arab World" and "Euro area") before the individual countries, and these aggregates are not part of our internet usage data.
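Dropping rows by position depends on the API always listing exactly 49 aggregates first. A minimal sketch of a position-independent alternative, assuming the goal is simply to keep World Bank rows whose ISO3 code also appears in `internet_data_clean`. Note that it also drops any World Bank country missing from the CSV, so the row count may differ from 217. `keep_countries_in()` is a hypothetical helper, and the toy rows mirror values printed in the tables above.

```r
library(dplyr)

# Keep only World Bank rows whose country_iso also appears in the internet data.
keep_countries_in <- function(gdp_df, internet_df) {
  gdp_df |>
    semi_join(distinct(internet_df, country_iso), by = "country_iso")
}

# Toy inputs: two aggregates and two countries
gdp_toy <- tibble::tibble(
  country     = c("Arab World", "Euro area", "Albania", "Algeria"),
  country_iso = c("ARB", "EMU", "ALB", "DZA"),
  value_1995  = c(5.28e11, 7.54e12, 2.91e9, 4.18e10)
)
internet_toy <- tibble::tibble(country_iso = c("ALB", "DZA", "ALB"))
keep_countries_in(gdp_toy, internet_toy)
```

On the real data this would be `keep_countries_in(GDP_1995_df, internet_data_clean)`, and likewise for 2023.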

Calculating the Average Change in GDP by Country

Changed_GDP_Average <- Changed_GDP |>
  mutate(
    change_in_gdp = value_2023 - value_1995,
    # 2023 - 1995 = 28 years between the two observations
    average_change = change_in_gdp / 28
  )
print(Changed_GDP_Average)
# A tibble: 217 × 11
   country.x          country_iso date.x value_1995 indicator.x country.y date.y
   <chr>              <chr>        <int>      <dbl> <chr>       <chr>      <int>
 1 Afghanistan        AFG           1995   NA       GDP (curre… Afghanis…   2023
 2 Albania            ALB           1995    2.91e 9 GDP (curre… Albania     2023
 3 Algeria            DZA           1995    4.18e10 GDP (curre… Algeria     2023
 4 American Samoa     ASM           1995   NA       GDP (curre… American…   2023
 5 Andorra            AND           1995    1.18e 9 GDP (curre… Andorra     2023
 6 Angola             AGO           1995    5.54e 9 GDP (curre… Angola      2023
 7 Antigua and Barbu… ATG           1995    6.16e 8 GDP (curre… Antigua …   2023
 8 Argentina          ARG           1995    2.58e11 GDP (curre… Argentina   2023
 9 Armenia            ARM           1995    1.47e 9 GDP (curre… Armenia     2023
10 Aruba              ABW           1995    1.32e 9 GDP (curre… Aruba       2023
# ℹ 207 more rows
# ℹ 4 more variables: value_2023 <dbl>, indicator.y <chr>, change_in_gdp <dbl>,
#   average_change <dbl>
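`average_change` is the average absolute change per year, (value_2023 − value_1995) / 28, so it naturally ranks the largest economies first. NY.GDP.MKTP.CD is measured in current US dollars, so these differences also include inflation and exchange-rate movements. A size-neutral alternative that we did not use in this analysis is the compound annual growth rate (CAGR), (value_2023 / value_1995)^(1/28) − 1. A minimal sketch comparing the two on a made-up example:

```r
# Average absolute change per year (the measure used above)
avg_annual_change <- function(start, end, years = 2023 - 1995) {
  (end - start) / years
}

# Compound annual growth rate: the constant yearly rate that turns start into end
cagr <- function(start, end, years = 2023 - 1995) {
  (end / start)^(1 / years) - 1
}

# Made-up example: an economy that doubles from 100 to 200 over 28 years
avg_annual_change(100, 200)  # about 3.57 units per year
cagr(100, 200)               # about 0.0251, i.e. roughly 2.5% per year
```

Applied to the table above, this would be `Changed_GDP_Average |> mutate(cagr = cagr(value_1995, value_2023))`; countries with a missing 1995 or 2023 value get `NA`.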

Top 20 Countries Ranked by Average Annual GDP Change

top_avg_growth_df <- Changed_GDP_Average |>
  arrange(desc(average_change))

top_20_avg_gdp_growth_df <- head(top_avg_growth_df, 20)
top_20_avg_gdp_growth_df
# A tibble: 20 × 11
   country.x          country_iso date.x value_1995 indicator.x country.y date.y
   <chr>              <chr>        <int>      <dbl> <chr>       <chr>      <int>
 1 United States      USA           1995    7.64e12 GDP (curre… United S…   2023
 2 China              CHN           1995    7.38e11 GDP (curre… China       2023
 3 India              IND           1995    3.60e11 GDP (curre… India       2023
 4 United Kingdom     GBR           1995    1.35e12 GDP (curre… United K…   2023
 5 Germany            DEU           1995    2.59e12 GDP (curre… Germany     2023
 6 Russian Federation RUS           1995    3.96e11 GDP (curre… Russian …   2023
 7 Canada             CAN           1995    6.06e11 GDP (curre… Canada      2023
 8 France             FRA           1995    1.60e12 GDP (curre… France      2023
 9 Brazil             BRA           1995    7.69e11 GDP (curre… Brazil      2023
10 Mexico             MEX           1995    3.80e11 GDP (curre… Mexico      2023
11 Australia          AUS           1995    3.69e11 GDP (curre… Australia   2023
12 Korea, Rep.        KOR           1995    5.86e11 GDP (curre… Korea, R…   2023
13 Indonesia          IDN           1995    2.02e11 GDP (curre… Indonesia   2023
14 Italy              ITA           1995    1.18e12 GDP (curre… Italy       2023
15 Saudi Arabia       SAU           1995    1.43e11 GDP (curre… Saudi Ar…   2023
16 Spain              ESP           1995    6.14e11 GDP (curre… Spain       2023
17 Turkiye            TUR           1995    2.35e11 GDP (curre… Turkiye     2023
18 Netherlands        NLD           1995    4.53e11 GDP (curre… Netherla…   2023
19 Poland             POL           1995    1.43e11 GDP (curre… Poland      2023
20 Switzerland        CHE           1995    3.53e11 GDP (curre… Switzerl…   2023
# ℹ 4 more variables: value_2023 <dbl>, indicator.y <chr>, change_in_gdp <dbl>,
#   average_change <dbl>
top_20_avg_gdp_growth_clean <- top_20_avg_gdp_growth_df %>%
  mutate(
    "Average Annual Change in USD (billions)" = average_change / 1e9
  ) %>%
  rename(Country = country.x) %>%
  # keep only the country, its ISO code, and the average change in billions
  # (the dates and indicator names are identical for every row)
  select(Country, country_iso, `Average Annual Change in USD (billions)`)

top_20_avg_gdp_growth_clean
# A tibble: 20 × 3
   Country            country_iso `Average Annual Change in USD (billions)`
   <chr>              <chr>                                           <dbl>
 1 United States      USA                                             702. 
 2 China              CHN                                             626. 
 3 India              IND                                             117. 
 4 United Kingdom     GBR                                              74.0
 5 Germany            DEU                                              70.3
 6 Russian Federation RUS                                              59.9
 7 Canada             CAN                                              56.0
 8 France             FRA                                              52.2
 9 Brazil             BRA                                              50.8
10 Mexico             MEX                                              50.6
11 Australia          AUS                                              48.8
12 Korea, Rep.        KOR                                              44.9
13 Indonesia          IDN                                              41.8
14 Italy              ITA                                              40.7
15 Saudi Arabia       SAU                                              38.4
16 Spain              ESP                                              35.9
17 Turkiye            TUR                                              32.4
18 Netherlands        NLD                                              24.4
19 Poland             POL                                              23.9
20 Switzerland        CHE                                              19.3

Bottom 20 Countries Ranked by Average Annual GDP Change

bottom_avg_growth_df <- Changed_GDP_Average |>
  arrange(average_change)
bottom_20_avg_gdp_growth_df <- head(bottom_avg_growth_df, 20)
bottom_20_avg_gdp_growth_df
# A tibble: 20 × 11
   country.x          country_iso date.x value_1995 indicator.x country.y date.y
   <chr>              <chr>        <int>      <dbl> <chr>       <chr>      <int>
 1 Japan              JPN           1995    5.55e12 GDP (curre… Japan       2023
 2 Tuvalu             TUV           1995    1.19e 7 GDP (curre… Tuvalu      2023
 3 Nauru              NRU           1995    4.00e 7 GDP (curre… Nauru       2023
 4 Marshall Islands   MHL           1995    1.20e 8 GDP (curre… Marshall…   2023
 5 Palau              PLW           1995    1.21e 8 GDP (curre… Palau       2023
 6 Kiribati           KIR           1995    6.86e 7 GDP (curre… Kiribati    2023
 7 Micronesia, Fed. … FSM           1995    2.20e 8 GDP (curre… Micrones…   2023
 8 Tonga              TON           1995    2.09e 8 GDP (curre… Tonga       2023
 9 Dominica           DMA           1995    2.75e 8 GDP (curre… Dominica    2023
10 Sao Tome and Prin… STP           1995    1.04e 8 GDP (curre… Sao Tome…   2023
11 St. Kitts and Nev… KNA           1995    3.13e 8 GDP (curre… St. Kitt…   2023
12 St. Vincent and t… VCT           1995    3.16e 8 GDP (curre… St. Vinc…   2023
13 Samoa              WSM           1995    2.25e 8 GDP (curre… Samoa       2023
14 Vanuatu            VUT           1995    2.49e 8 GDP (curre… Vanuatu     2023
15 Comoros            COM           1995    3.93e 8 GDP (curre… Comoros     2023
16 Grenada            GRD           1995    3.42e 8 GDP (curre… Grenada     2023
17 Solomon Islands    SLB           1995    4.69e 8 GDP (curre… Solomon …   2023
18 Lesotho            LSO           1995    1.00e 9 GDP (curre… Lesotho     2023
19 Antigua and Barbu… ATG           1995    6.16e 8 GDP (curre… Antigua …   2023
20 Central African R… CAF           1995    1.12e 9 GDP (curre… Central …   2023
# ℹ 4 more variables: value_2023 <dbl>, indicator.y <chr>, change_in_gdp <dbl>,
#   average_change <dbl>
# note: this cleans the full ascending ranking (all 217 countries), so only its
# first 10 rows are printed; use bottom_20_avg_gdp_growth_df to keep just the bottom 20
bottom_avg_gdp_growth_clean <- bottom_avg_growth_df %>%
  mutate(
    "Average Annual Change in USD (billions)" = average_change / 1e9
  ) %>%
  rename(Country = country.x) %>%
  # keep only the country, its ISO code, and the average change in billions
  select(Country, country_iso, `Average Annual Change in USD (billions)`)

bottom_avg_gdp_growth_clean
# A tibble: 217 × 3
   Country               country_iso `Average Annual Change in USD (billions)`
   <chr>                 <chr>                                           <dbl>
 1 Japan                 JPN                                         -47.6    
 2 Tuvalu                TUV                                           0.00180
 3 Nauru                 NRU                                           0.00398
 4 Marshall Islands      MHL                                           0.00513
 5 Palau                 PLW                                           0.00556
 6 Kiribati              KIR                                           0.00786
 7 Micronesia, Fed. Sts. FSM                                           0.00791
 8 Tonga                 TON                                           0.0137 
 9 Dominica              DMA                                           0.0137 
10 Sao Tome and Principe STP                                           0.0210 
# ℹ 207 more rows

Calculating the Correlation Between GDP Growth and Internet Usage

correlation_data <- internet_data_clean |>
  inner_join(GDP_growth_df, by = c("country_iso", "date")) |>
  rename(country = country.x) |>  # resolve the .x/.y conflict after join
  drop_na(internet_usage_share, value)

overall_cor <- cor(
  correlation_data$internet_usage_share,
  correlation_data$value,
  method = "pearson",
  use = "complete.obs"
)
print(overall_cor)
[1] -0.159345
country_correlations <- correlation_data |>
  group_by(country, country_iso) |>
  filter(n() >= 10) |>
  summarise(
    correlation = cor(internet_usage_share, value, method = "pearson"),
    n_years     = n(),
    .groups = "drop"
  ) |>
  arrange(desc(correlation))

The overall Pearson correlation coefficient is r = -0.159, which indicates a weak inverse (negative) linear relationship between internet usage and annual GDP growth. Squaring it gives r² ≈ 0.025, so internet usage accounts for only about 2.5% of the variation in annual GDP growth across country-years; the effect is too small to be practically meaningful.
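The coefficient above is a point estimate. A minimal sketch of attaching a confidence interval and p-value with base R's `cor.test()`, shown on simulated data (the variables `x` and `y` are made up). One caveat: the test assumes independent observations, while the pooled data repeat the same countries across many years, so its p-value is likely too optimistic.

```r
# Simulated stand-ins for internet usage share (%) and GDP growth (%)
set.seed(607)
x <- runif(200, 0, 100)
y <- 3 - 0.02 * x + rnorm(200, mean = 0, sd = 4)

res <- cor.test(x, y, method = "pearson")
res$estimate    # the same value as cor(x, y)
res$conf.int    # 95% confidence interval for r
res$p.value     # two-sided test of r = 0
res$estimate^2  # r squared: share of variance explained by a linear fit
```

On the real data this would be `cor.test(correlation_data$internet_usage_share, correlation_data$value)`.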

Countries with the 10 strongest positive and 10 strongest negative correlations between internet usage and GDP growth

# Top 10 positive correlations
cat("\n--- Top 10 Positive Correlations ---\n")

--- Top 10 Positive Correlations ---
print(head(country_correlations, 10))
# A tibble: 10 × 4
   country                      country_iso correlation n_years
   <chr>                        <chr>             <dbl>   <int>
 1 Guyana                       GUY               0.725      28
 2 Guinea                       GIN               0.433      29
 3 Benin                        BEN               0.397      28
 4 Kenya                        KEN               0.379      26
 5 Cayman Islands               CYM               0.354      11
 6 Cote d'Ivoire                CIV               0.352      29
 7 Bangladesh                   BGD               0.341      27
 8 Democratic Republic of Congo COD               0.248      28
 9 Senegal                      SEN               0.193      29
10 Zimbabwe                     ZWE               0.192      29
# Top 10 negative correlations
cat("\n--- Top 10 Negative Correlations ---\n")

--- Top 10 Negative Correlations ---
print(tail(country_correlations, 10))
# A tibble: 10 × 4
   country             country_iso correlation n_years
   <chr>               <chr>             <dbl>   <int>
 1 Bermuda             BMU              -0.604      23
 2 Angola              AGO              -0.629      28
 3 Yemen               YEM              -0.637      22
 4 Mozambique          MOZ              -0.644      28
 5 Sri Lanka           LKA              -0.673      23
 6 Trinidad and Tobago TTO              -0.673      29
 7 China               CHN              -0.689      29
 8 Laos                LAO              -0.704      26
 9 Sudan               SDN              -0.760      17
10 Myanmar             MMR              -0.803      24
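Each country-level correlation rests on only 11 to 29 yearly observations, so moderate values can arise by chance. A minimal sketch of the smallest |r| that reaches significance at the 5% level for a given number of years, using the standard t-test for a Pearson correlation, t = r·√(n − 2) / √(1 − r²), solved for r. It assumes independent observations, which yearly series often violate. For n = 28 it gives roughly 0.37, so correlations such as Senegal's 0.193 and Zimbabwe's 0.192 (29 years each) would not be distinguishable from zero under this test. `critical_r()` is a hypothetical helper name.

```r
# Smallest |r| significant at level alpha (two-sided) for n paired observations.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(n - 2 + t_crit^2)
}

# Thresholds for some of the n_years values in the tables above
critical_r(c(11, 17, 28, 29))
```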

Scatter plot: internet usage vs GDP growth (all countries, all years)

# Scatter plot

correlation_data |>
  ggplot(aes(x = internet_usage_share, y = value)) +
  geom_point(alpha = 0.2, color = "steelblue") +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Internet Usage vs GDP Growth (1995–2023)",
    subtitle = paste("Overall Pearson r =", round(overall_cor, 3)),
    x = "Internet Usage Share (%)",
    y = "GDP Growth Rate (%)"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

To answer our main research question, we plotted every country-year observation and computed the Pearson correlation to quantify how strongly internet usage and annual GDP growth move together in a linear direction across all countries. The red line is a least-squares linear fit, and its shaded band is the 95% confidence band drawn by geom_smooth(se = TRUE).
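For reference, the Pearson coefficient in the plot subtitle is computed over the n country-year pairs (x_i, y_i), where x_i is the internet usage share and y_i the GDP growth rate. The slope of the red least-squares line is the same coefficient rescaled by the ratio of the sample standard deviations s_y and s_x, which is why a weak r produces a nearly flat line:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\hat{\beta}_1 = r \, \frac{s_y}{s_x}
```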

Correlation Between Internet Usage and GDP Growth by Year

yearly_correlation <- final_data |>
  drop_na(internet_usage_share, value) |>
  group_by(date) |>
  summarize(
    correlation = cor(internet_usage_share, value, method = "pearson"),
    n_countries = n(),
    .groups = "drop"
  )

yearly_correlation
# A tibble: 29 × 3
    date correlation n_countries
   <int>       <dbl>       <int>
 1  1995     -0.0383         132
 2  1996     -0.0747         166
 3  1997     -0.0471         174
 4  1998     -0.0265         181
 5  1999      0.0410         188
 6  2000      0.124          190
 7  2001     -0.108          192
 8  2002     -0.163          194
 9  2003     -0.103          190
10  2004     -0.140          193
# ℹ 19 more rows
yearly_correlation |>
  ggplot(aes(x = date, y = correlation)) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Yearly Correlation Between Internet Usage and GDP Growth",
    x = "Year",
    y = "Pearson Correlation"
  ) +
  theme_minimal()

The yearly correlations show that the relationship between internet usage and GDP growth is not stable over time. Most years show weak or negative correlations (between -0.163 and 0.124 in the ten years printed above, 1995 to 2004), so higher internet usage is not consistently associated with higher GDP growth. Correlation alone also cannot establish causation, so these results do not indicate a causal relationship in either direction.

Mapping Internet Usage Over Time

world <- ne_countries(scale = "medium", returnclass = "sf")

selected_years <- c(1995, 2005, 2015, 2023)

world_years <- world[rep(seq_len(nrow(world)), times = length(selected_years)), ]
world_years$date <- rep(selected_years, each = nrow(world))

map_internet_over_years <- world_years |>
  left_join(
    internet_data_clean |>
      filter(date %in% selected_years) |>
      select(country_iso, date, internet_usage_share),
    by = c("iso_a3" = "country_iso", "date" = "date")
  )

tmap_mode("plot")
ℹ tmap modes "plot" - "view"
ℹ toggle with `tmap::ttm()`
tmap_options(component.autoscale = FALSE)

tm_shape(map_internet_over_years) +
  tm_polygons(
    "internet_usage_share",
    title = "Internet Usage (%)",
    style = "quantile",
    palette = "Blues",
    colorNA = "brown",
    textNA = "No data"
  ) +
  tm_facets(
    by = "date",
    ncol = 2
  ) +
  tm_layout(
    main.title = "Internet Usage by Country Over Time",
    main.title.size = 1.2,
    main.title.position = "center",
    panel.label.size = 1.1,
    legend.outside = TRUE,
    legend.outside.position = "right",
    legend.title.size = 0.9,
    legend.text.size = 0.8,
    inner.margins = c(0.01, 0.01, 0.01, 0.01),
    outer.margins = c(0.02, 0.02, 0.02, 0.02),
    frame = FALSE
  )

── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
`tm_scale_intervals()`.
ℹ Migrate the argument(s) 'style', 'palette' (rename to 'values'), 'colorNA'
  (rename to 'value.na'), 'textNA' (rename to 'label.na') to
  'tm_scale_intervals(<HERE>)'
For small multiples, specify a 'tm_scale_' for each multiple, and put them in a
list: 'fill'.scale = list(<scale1>, <scale2>, ...)'
[v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
[tip] Consider a suitable map projection, e.g. by adding `+ tm_crs("auto")`.
[cols4all] color palettes: use palettes from the R package cols4all. Run
`cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
"brewer.blues"

These maps show the global spread of internet usage from 1995 to 2023. Internet usage was limited in many countries in 1995, but by 2023, most regions show much higher levels of internet access.
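The map join matches on Natural Earth's `iso_a3` column. In some Natural Earth releases, a few countries (France and Norway are commonly reported examples) carry `iso_a3 = "-99"`, which would make them appear as "No data" even when the CSV has values; we have not verified this for the version used here. A minimal sketch of a fallback join key, assuming the `sf` object from `ne_countries()` also has an `iso_a3_eh` column. `add_join_iso()` is a hypothetical helper and the toy rows are made up.

```r
library(dplyr)

# Use iso_a3 when it holds a real code; fall back to iso_a3_eh when it is "-99".
add_join_iso <- function(world_df) {
  world_df |>
    mutate(join_iso = if_else(iso_a3 == "-99", iso_a3_eh, iso_a3))
}

# Made-up stand-in for a few attribute rows of ne_countries()
world_toy <- tibble::tibble(
  name      = c("Country A", "Country B"),
  iso_a3    = c("-99", "BBB"),
  iso_a3_eh = c("AAA", "BBB")
)
add_join_iso(world_toy)
```

On the map data this would be `world_years |> add_join_iso()` followed by a join on `c("join_iso" = "country_iso", "date" = "date")`.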

Mapping Internet Access and GDP Growth Over Time

world <- ne_countries(scale = "medium", returnclass = "sf")

selected_years <- c(1995, 2005, 2015, 2023)

world_years <- world[rep(seq_len(nrow(world)), times = length(selected_years)), ]
world_years$date <- rep(selected_years, each = nrow(world))

map_data_over_years <- world_years |>
  left_join(
    final_data |>
      filter(date %in% selected_years) |>
      select(country_iso, date, internet_usage_share, value),
    by = c("iso_a3" = "country_iso", "date" = "date")
  ) |>
  group_by(date) |>
  mutate(
    internet_group = ifelse(
      internet_usage_share >= median(internet_usage_share, na.rm = TRUE),
      "High Internet",
      "Low Internet"
    ),
    gdp_growth_group = ifelse(
      value >= median(value, na.rm = TRUE),
      "High GDP Growth",
      "Low GDP Growth"
    ),
    internet_gdp_group = ifelse(
      is.na(internet_usage_share) | is.na(value),
      NA,
      paste(internet_group, gdp_growth_group, sep = " + ")
    )
  ) |>
  ungroup()

tmap_mode("plot")
ℹ tmap modes "plot" - "view"
tmap_options(component.autoscale = FALSE)

tm_shape(map_data_over_years) +
  tm_polygons(
    "internet_gdp_group",
    title = "Internet Access and GDP Growth",
    colorNA = "brown",
    textNA = "No data"
  ) +
  tm_facets(
    by = "date",
    ncol = 2
  ) +
  tm_layout(
    main.title = "Internet Access and GDP Growth Over Time",
    main.title.size = 1.2,
    main.title.position = "center",
    panel.label.size = 1.1,
    legend.outside = TRUE,
    legend.outside.position = "right",
    legend.title.size = 0.9,
    legend.text.size = 0.8,
    inner.margins = c(0.01, 0.01, 0.01, 0.01),
    outer.margins = c(0.02, 0.02, 0.02, 0.02),
    frame = FALSE
  )

── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_tm_polygons()`: migrate the argument(s) related to the scale of
the visual variable `fill` namely 'colorNA' (rename to 'value.na'), 'textNA'
(rename to 'label.na') to fill.scale = tm_scale(<HERE>).
ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
  a list: 'fill.scale = list(<scale1>, <scale2>, ...)'
[v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`

These maps classify each country, in each selected year, as at or above versus below that year's median for internet usage and for GDP growth. They show that the relationship between internet access and GDP growth varies across countries and years rather than following one clear global pattern.
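The maps can be summarized by counting countries in each of the four groups per year. With median splits on both variables, each group would hold roughly 25% of countries if internet usage and GDP growth were unrelated, so large departures from 25% in the "High + High" and "Low + Low" groups would signal an association. A minimal sketch, assuming the column names of `final_data`; unlike the map code, it computes the medians only over countries that have both values, so its groups can differ slightly from the map's. `quadrant_shares()` is a hypothetical helper and the toy data are made up.

```r
library(dplyr)

# Share of countries in each internet / GDP-growth group, per year
quadrant_shares <- function(df) {
  df |>
    filter(!is.na(internet_usage_share), !is.na(value)) |>
    group_by(date) |>
    mutate(
      internet_group = if_else(internet_usage_share >= median(internet_usage_share),
                               "High Internet", "Low Internet"),
      gdp_growth_group = if_else(value >= median(value),
                                 "High GDP Growth", "Low GDP Growth")
    ) |>
    ungroup() |>
    count(date, internet_group, gdp_growth_group, name = "n_countries") |>
    group_by(date) |>
    mutate(share = n_countries / sum(n_countries)) |>
    ungroup()
}

# Made-up toy data: the row with a missing usage share is dropped
toy <- tibble::tibble(
  date                 = 2000L,
  internet_usage_share = c(10, 20, 30, 40, NA),
  value                = c(1, 2, 3, 4, 5)
)
quadrant_shares(toy)
```

On the real data this would be `quadrant_shares(final_data |> filter(date %in% selected_years))`.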

Conclusion

Our findings suggest a weak inverse linear relationship between internet usage and GDP growth. The overall Pearson correlation is -0.159, which is close to 0 and indicates only a weak association; the yearly correlations are also small and change sign over time, and country-level correlations range from 0.725 (Guyana) to -0.803 (Myanmar). Internet usage by itself therefore explains little of the variation in countries' GDP growth, and this correlation analysis does not provide evidence that higher internet access is clearly associated with higher GDP growth.