---
title: "Internet Use and Life Satisfaction"
author: "Hadiyah Sumter"
date: "2025-11-16"
output: html_document
---

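# Set the working directory to the folder that holds both CSV files (adjust the path for your machine)
setwd("~/Desktop/DATA-101")

# Read the internet usage dataset
internet_data <- read.csv("share-of-individuals-using-the-internet.csv")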

# Read the happiness dataset
happiness_data <- read.csv("happiness-cantril-ladder.csv")

# Check the first few rows
head(internet_data)

##        Entity Code Year Individuals.using.the.Internet....of.population.
## 1 Afghanistan  AFG 1990                                                0
## 2 Afghanistan  AFG 1991                                                0
## 3 Afghanistan  AFG 1992                                                0
## 4 Afghanistan  AFG 1993                                                0
## 5 Afghanistan  AFG 1994                                                0
## 6 Afghanistan  AFG 1995                                                0

head(happiness_data)

##        Entity Code Year Cantril.ladder.score
## 1 Afghanistan  AFG 2011                4.258
## 2 Afghanistan  AFG 2012                4.040
## 3 Afghanistan  AFG 2014                3.575
## 4 Afghanistan  AFG 2015                3.360
## 5 Afghanistan  AFG 2016                3.794
## 6 Afghanistan  AFG 2017                3.632

Introduction

The purpose of this project is to explore the relationship between internet usage and overall happiness across countries. Specifically, this analysis investigates the following research question:

Is there a difference in average life satisfaction between countries with high internet use and countries with low internet use?

The data come from Our World in Data and combine two datasets: one on the share of individuals using the internet and one on life satisfaction, each covering many countries over several years (178 countries and regional aggregates after merging). This project keeps the most recent year available for each country in each dataset, forming a cross-sectional snapshot; as a result, a country's internet and life satisfaction values may come from different years. The combined data include the country name, the percentage of individuals using the internet, and the average life satisfaction score (Cantril ladder, on a scale from 0 to 10). Two main variables are used:

Internet Use Group (High if more than 70% of the population uses the internet, otherwise Low)

Life Satisfaction (quantitative happiness score)

Dataset source: https://ourworldindata.org/happiness-and-life-satisfaction

Key variables used in this analysis:

Country: country name (nominal)

Year: year of observation (numeric)

Individuals using the Internet (% of population): continuous ratio variable

Cantril ladder life satisfaction score: continuous ratio variable

Data Analysis

In this analysis, I began by cleaning and merging the two datasets and conducting exploratory data analysis (EDA) to better understand the variables involved in answering the research question. I used functions such as summary(), head(), and names() to examine the structure of the data, along with several dplyr functions, including group_by(), filter(), left_join(), rename(), select(), and mutate(), to prepare it for analysis. Specifically, I created a categorical variable that groups countries into “High” or “Low” internet usage based on a 70% threshold. To visualize the data, I generated histograms to examine the distribution of life satisfaction scores and boxplots to compare these scores between the two internet-use groups. These exploratory steps helped identify patterns in the data and provided a foundation for choosing an appropriate statistical test for comparing the two groups.
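A histogram of life satisfaction scores can be sketched in base R as follows. The scores are made-up placeholder values, not the project data; in the report, happiness_clean$LifeSatisfaction would take their place. Fixed one-point bins over the 0 to 10 scale keep the shape comparable across groups.

```r
# Made-up life satisfaction scores (placeholder values, not the OWID data)
scores <- c(3.2, 4.1, 4.4, 5.0, 5.6, 5.8, 6.1, 6.3, 6.9, 7.4)

# One-point bins covering the full 0-10 Cantril ladder scale
bins <- seq(0, 10, by = 1)

# Compute the histogram without drawing it so the bin counts can be inspected
h <- hist(scores, breaks = bins, plot = FALSE)
h$counts

# Draw the histogram
hist(scores, breaks = bins,
     main = "Distribution of Life Satisfaction (example data)",
     xlab = "Life Satisfaction (0-10)", col = "pink")
```

By default hist() uses right-closed bins, so a score of exactly 5.0 is counted in the (4, 5] bin.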

library(dplyr)


library(ggplot2)
library(broom)

# Internet data: keep the latest year per country
internet_latest <- internet_data |>
  group_by(Entity, Code) |>                 
  filter(Year == max(Year, na.rm = TRUE)) |> 
  ungroup()                                  
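The group_by() plus filter(Year == max(Year)) step keeps, for each country, only the row from its most recent year. A minimal base-R equivalent on a hypothetical toy table makes the logic explicit: ave() computes each country's maximum year and repeats it on every row belonging to that country.

```r
# Toy long-format data: one row per country-year (hypothetical values)
toy <- data.frame(
  Entity = c("A", "A", "A", "B", "B"),
  Year   = c(2019, 2021, 2023, 2020, 2022),
  Value  = c(50, 60, 70, 30, 35)
)

# For every row, the maximum Year within that row's Entity
latest_year <- ave(toy$Year, toy$Entity, FUN = max)

# Keep only rows whose Year equals their country's latest year
toy_latest <- toy[toy$Year == latest_year, ]
toy_latest
```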

# Happiness data: keep the latest year per country
happiness_latest <- happiness_data |>
  group_by(Entity, Code) |>
  filter(Year == max(Year, na.rm = TRUE)) |>
  ungroup()

# Merge the two datasets: keep every country in the happiness data and attach its internet usage (NA if no match)
happiness_clean <- left_join(
  happiness_latest,
  internet_latest,
  by = c("Entity", "Code"),
  suffix = c(".happiness", ".internet")  
)
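Because left_join() keeps every row of happiness_latest, a country with no match in internet_latest stays in the data with a missing InternetUse, which is one way missing values such as the 9 NAs reported by summary() can arise. The base-R sketch below reproduces that behavior with merge(all.x = TRUE) on hypothetical toy tables; its suffixes argument plays the role of suffix in left_join().

```r
# Hypothetical latest-year tables (toy values)
happy <- data.frame(Entity = c("A", "B", "C"),
                    Code   = c("AAA", "BBB", "CCC"),
                    Year   = c(2024, 2024, 2017),
                    Score  = c(6.1, 4.4, 5.0))
internet <- data.frame(Entity = c("A", "B"),
                       Code   = c("AAA", "BBB"),
                       Year   = c(2023, 2022),
                       Use    = c(90, 40))

# Left join: keep every row of `happy`; unmatched rows get NA for `Use`
# The shared non-key column Year is split into Year.happiness and Year.internet
merged <- merge(happy, internet, by = c("Entity", "Code"),
                all.x = TRUE, suffixes = c(".happiness", ".internet"))
merged
```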

# Rename columns for clarity
happiness_clean <- happiness_clean |>
  rename(
    Country = Entity,
    LifeSatisfaction = Cantril.ladder.score,
    InternetUse = Individuals.using.the.Internet....of.population.,
    Year = Year.happiness   
  )

# Keep only needed columns
happiness_clean <- happiness_clean |>
  select(Country, Code, Year, LifeSatisfaction, InternetUse)

# Check the results
head(happiness_clean)

## # A tibble: 6 × 5
##   Country     Code   Year LifeSatisfaction InternetUse
##   <chr>       <chr> <int>            <dbl>       <dbl>
## 1 Afghanistan "AFG"  2024             1.36        18.4
## 2 Africa      ""     2024             4.39        NA  
## 3 Albania     "ALB"  2024             5.41        83.1
## 4 Algeria     "DZA"  2024             5.57        71.2
## 5 Angola      "AGO"  2017             3.80        39.3
## 6 Argentina   "ARG"  2024             6.40        89.2
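The second row of this output, Africa, is a regional aggregate rather than a country. In Our World in Data files such aggregates usually have an empty Code, so one possible extra cleaning step, not part of the original analysis, is to drop rows with an empty Code. This is an assumption to verify against the data (some aggregates may carry a non-empty code), and applying it would change the counts and test results reported in this document. The toy rows below copy values from the head() output.

```r
# Toy rows mimicking the merged table (values copied from the head() output)
merged_toy <- data.frame(
  Country = c("Afghanistan", "Africa", "Albania"),
  Code    = c("AFG", "", "ALB"),
  LifeSatisfaction = c(1.36, 4.39, 5.41)
)

# Assumption: rows with an empty or missing Code are regional aggregates
is_aggregate <- is.na(merged_toy$Code) | merged_toy$Code == ""
countries_only <- merged_toy[!is_aggregate, ]
countries_only$Country
```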

names(happiness_clean)

## [1] "Country"          "Code"             "Year"             "LifeSatisfaction"
## [5] "InternetUse"

summary(happiness_clean)

##    Country              Code                Year      LifeSatisfaction
##  Length:178         Length:178         Min.   :2011   Min.   :1.364   
##  Class :character   Class :character   1st Qu.:2024   1st Qu.:4.545   
##  Mode  :character   Mode  :character   Median :2024   Median :5.821   
##                                        Mean   :2023   Mean   :5.495   
##                                        3rd Qu.:2024   3rd Qu.:6.404   
##                                        Max.   :2024   Max.   :7.736   
##                                                                       
##   InternetUse    
##  Min.   : 10.00  
##  1st Qu.: 44.50  
##  Median : 79.22  
##  Mean   : 69.12  
##  3rd Qu.: 89.90  
##  Max.   :100.00  
##  NA's   :9

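# Drop countries with no internet data, then create UseGroup using a 70% threshold
happiness_clean <- happiness_clean |>
  filter(!is.na(InternetUse)) |>
  mutate(
    UseGroup = ifelse(InternetUse > 70, "High", "Low")  # strictly above 70% counts as "High"
  )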
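ifelse() returns NA wherever InternetUse is NA, so any country without internet data would receive a missing UseGroup rather than being placed in either group. Checking group sizes with useNA = "ifany" makes such rows visible before testing. The sketch below uses a hypothetical vector of internet-use percentages; note that the rule > 70 places a country at exactly 70% in the Low group.

```r
# Hypothetical internet-use percentages, including one missing value
internet_use <- c(18.4, 83.1, 71.2, 39.3, 89.2, 70, NA)

# Same rule as the report: strictly above 70% is "High", otherwise "Low"
use_group <- ifelse(internet_use > 70, "High", "Low")

# Group sizes, with missing values counted explicitly
table(use_group, useNA = "ifany")
```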

To visualize the relationship between the categorical and quantitative variables, I created a boxplot comparing life satisfaction scores between high- and low-internet-use countries. The plot shows the distribution of scores in each group and makes differences in central tendency easy to compare.

# Boxplot showing Life Satisfaction by Internet Use group
ggplot(happiness_clean, aes(x = UseGroup, y = LifeSatisfaction)) +
  geom_boxplot(fill = "pink") +                     
  labs(
    title = "Life Satisfaction by Internet Use Group",
    x = "Internet Use Group",
    y = "Life Satisfaction (0–10)"
  ) +
  theme_minimal()

Statistical Analysis

To determine whether life satisfaction differs between high- and low-internet-use countries, I compared the mean LifeSatisfaction of the two groups created previously with a two-sample t-test. Before running the test, I checked that each group contained an adequate number of observations and examined the distributions using histograms and boxplots. Because the two groups have unequal sample sizes and may not have equal variances, the appropriate choice is Welch's two-sample t-test, which does not assume equal population variances and is suitable for comparing the means of two independent groups. At a significance level of α = 0.05, the Welch test gave t = 11.594, df = 115.43, and p < 2.2e-16, with a 95% confidence interval for the difference in means of approximately 1.39 to 1.97. The code below calls t.test() with var.equal = TRUE, which instead runs the pooled-variance (Student) test; its printed output (t = 12.263, df = 167, 95% CI 1.416 to 1.959) leads to the same conclusion. Since the p-value is far below 0.05, I reject the null hypothesis: there is a statistically significant difference in mean life satisfaction between the two groups, and countries with high internet usage score, on average, roughly 1.4 to 2.0 points higher than countries with low internet usage.
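For reference, Welch's test divides the difference in sample means by a standard error built from each group's own variance and uses the Welch-Satterthwaite approximation for its degrees of freedom, which is why its df (115.43) is not a whole number. Here \(\bar{x}\), \(s^2\), and \(n\) denote each group's sample mean, sample variance, and size:

\[
t = \frac{\bar{x}_{\text{High}} - \bar{x}_{\text{Low}}}{\sqrt{\dfrac{s^2_{\text{High}}}{n_{\text{High}}} + \dfrac{s^2_{\text{Low}}}{n_{\text{Low}}}}},
\qquad
\nu \approx \frac{\left(\dfrac{s^2_{\text{High}}}{n_{\text{High}}} + \dfrac{s^2_{\text{Low}}}{n_{\text{Low}}}\right)^2}{\dfrac{\left(s^2_{\text{High}}/n_{\text{High}}\right)^2}{n_{\text{High}} - 1} + \dfrac{\left(s^2_{\text{Low}}/n_{\text{Low}}\right)^2}{n_{\text{Low}} - 1}}
\]

The pooled-variance (Student) test instead uses \(n_{\text{High}} + n_{\text{Low}} - 2\) degrees of freedom, so its reported df = 167 corresponds to 169 countries with a non-missing group, matching the 178 rows minus the 9 missing InternetUse values in the summary.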

Hypotheses

Null Hypothesis (H₀): There is no difference in mean life satisfaction between high and low internet use countries.

\(H_0: \mu_{\text{High}} = \mu_{\text{Low}}\)

Alternative Hypothesis (H₁): There is a difference in mean life satisfaction between high and low internet use countries.

\(H_1: \mu_{\text{High}} \neq \mu_{\text{Low}}\)

This is a two-tailed two-sample t-test because the alternative hypothesis uses “≠”.

# Create separate vectors for the t-test
high <- subset(happiness_clean, UseGroup == "High")$LifeSatisfaction
low  <- subset(happiness_clean, UseGroup == "Low")$LifeSatisfaction

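# var.equal = TRUE runs the pooled-variance (Student) t-test; omitting it runs R's default Welch t-test
t_test_result <- t.test(high, low, var.equal = TRUE)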

t_test_result

## 
##  Two Sample t-test
## 
## data:  high and low
## t = 12.263, df = 167, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.415780 1.959125
## sample estimates:
## mean of x mean of y 
##  6.101812  4.414359
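The output above comes from the pooled-variance (Student) test because the call uses var.equal = TRUE; omitting that argument gives R's default Welch test. The sketch below, on simulated data rather than the project data, runs both versions and checks the Welch statistic and degrees of freedom against the formulas computed by hand. The group sizes, means, and standard deviations are arbitrary assumptions chosen only to make the variances unequal.

```r
set.seed(42)  # reproducible simulated data

# Simulated groups with unequal sizes and spreads (placeholders, not OWID data)
high_sim <- rnorm(100, mean = 6.1, sd = 0.8)
low_sim  <- rnorm(60,  mean = 4.4, sd = 1.2)

welch  <- t.test(high_sim, low_sim)                    # Welch test (R's default)
pooled <- t.test(high_sim, low_sim, var.equal = TRUE)  # Student's pooled-variance test

# Welch statistic and Welch-Satterthwaite degrees of freedom computed by hand
v1 <- var(high_sim) / length(high_sim)
v2 <- var(low_sim)  / length(low_sim)
t_manual  <- (mean(high_sim) - mean(low_sim)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(high_sim) - 1) + v2^2 / (length(low_sim) - 1))

c(welch_t   = unname(welch$statistic),  welch_df  = unname(welch$parameter),
  pooled_t  = unname(pooled$statistic), pooled_df = unname(pooled$parameter))
```

The pooled degrees of freedom are always n1 + n2 - 2 (158 here), while the Welch value is usually fractional and smaller when the group variances differ.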

Decision rule (α = 0.05): if p-value < 0.05, reject H₀; if p-value ≥ 0.05, fail to reject H₀.
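The decision rule can be written as a small function. The name decide() is a hypothetical helper for illustration, not part of the original analysis.

```r
# Hypothetical helper applying the decision rule at a chosen significance level
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Fail to reject H0"
}

decide(2.2e-16)  # upper bound on the p-value printed by t.test()
decide(0.05)     # p equal to alpha is not below alpha, so H0 is retained
```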

Conclusion and Future Directions

The results of this project show a clear and statistically significant difference in average life satisfaction between countries with high internet usage and those with low internet usage. Given the extremely small p-value (p < 2.2e-16), we rejected the null hypothesis and concluded that differences in internet access are associated with differences in national well-being. Countries with high internet usage reported meaningfully higher life satisfaction, with mean scores approximately 1.7 points higher than low-usage countries.
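The 1.7-point figure follows directly from the sample means in the t-test output, and it sits at the center of the pooled 95% confidence interval, as expected for a t interval:

\[
\bar{x}_{\text{High}} - \bar{x}_{\text{Low}} = 6.101812 - 4.414359 = 1.687453 \approx 1.7,
\qquad
\frac{1.415780 + 1.959125}{2} = 1.6874525.
\]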

These findings are consistent with the idea that internet access may contribute to quality of life, possibly through increased economic opportunity, social connectivity, and access to information. Because this is an observational, cross-sectional comparison, however, the results show an association rather than a causal effect; internet use and life satisfaction may both reflect underlying factors such as national income. Future research could strengthen these conclusions by controlling for additional factors such as GDP per capita, education levels, healthcare access, or population age distribution. Regression models, or a larger dataset with more recent and consistently timed observations, could offer deeper insight into whether the relationship between internet access and well-being is causal.

Reference:

Our World in Data. (n.d.). Happiness and life satisfaction. Retrieved November 18, 2025, from https://ourworldindata.org/happiness-and-life-satisfaction