---
title: "Internet Use and Life Satisfaction"
author: "Hadiyah Sumter"
date: "2025-11-16"
output: html_document
---
setwd("~/Desktop/DATA-101")
internet_data <- read.csv("share-of-individuals-using-the-internet.csv")
# Read the happiness dataset
happiness_data <- read.csv("happiness-cantril-ladder.csv")
# Check the first few rows
head(internet_data)
## Entity Code Year Individuals.using.the.Internet....of.population.
## 1 Afghanistan AFG 1990 0
## 2 Afghanistan AFG 1991 0
## 3 Afghanistan AFG 1992 0
## 4 Afghanistan AFG 1993 0
## 5 Afghanistan AFG 1994 0
## 6 Afghanistan AFG 1995 0
head(happiness_data)
## Entity Code Year Cantril.ladder.score
## 1 Afghanistan AFG 2011 4.258
## 2 Afghanistan AFG 2012 4.040
## 3 Afghanistan AFG 2014 3.575
## 4 Afghanistan AFG 2015 3.360
## 5 Afghanistan AFG 2016 3.794
## 6 Afghanistan AFG 2017 3.632
The purpose of this project is to explore the relationship between global internet usage and overall happiness. Specifically, this analysis investigates the following research question:

## Is there a difference in average life satisfaction between countries with high internet use and countries with low internet use?

The data used in this analysis come from Our World in Data: one dataset on the share of individuals using the internet and one on life satisfaction (Cantril ladder scores), both reported by country and year. This project uses the most recent year available for each country in each dataset, forming a cross-sectional snapshot. The merged data include country name, percentage of individuals using the internet, and average life satisfaction (on a scale from 0 to 10). Two main variables drive the analysis: internet use and life satisfaction.
Dataset source: the data can be accessed at https://ourworldindata.org/happiness-and-life-satisfaction
Key variables used in this analysis:

- Country: country name (nominal)
- Year: year of observation (numeric)
- Individuals using the Internet (% of population): continuous ratio variable
- Cantril ladder life satisfaction score: continuous ratio variable
In this analysis, I began by cleaning the data and conducting exploratory data analysis (EDA) to better understand the variables involved in answering the research question. I used functions such as summary(), head(), and names() to examine the structure of the data, along with several dplyr functions, including filter(), mutate(), and select(), to prepare the data for analysis. Specifically, I created a categorical variable that groups countries into “High” or “Low” internet usage based on a 70% threshold. To visualize the data, I generated histograms to examine the distribution of life satisfaction scores and boxplots to compare these scores between the two internet-use groups. These exploratory steps helped identify patterns in the data and provided a foundation for selecting the appropriate statistical test for comparing the two groups.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(broom)
# Internet data: keep the latest year per country
internet_latest <- internet_data |>
group_by(Entity, Code) |>
filter(Year == max(Year, na.rm = TRUE)) |>
ungroup()
# Happiness data: keep the latest year per country
happiness_latest <- happiness_data |>
group_by(Entity, Code) |>
filter(Year == max(Year, na.rm = TRUE)) |>
ungroup()
# Merge 2 datasets
happiness_clean <- left_join(
happiness_latest,
internet_latest,
by = c("Entity", "Code"),
suffix = c(".happiness", ".internet")
)
# Rename columns for clarity
happiness_clean <- happiness_clean |>
rename(
Country = Entity,
LifeSatisfaction = Cantril.ladder.score,
InternetUse = Individuals.using.the.Internet....of.population.,
Year = Year.happiness
)
# Keep only needed columns
happiness_clean <- happiness_clean |>
select(Country, Code, Year, LifeSatisfaction, InternetUse)
# Check results
head(happiness_clean)
## # A tibble: 6 × 5
## Country Code Year LifeSatisfaction InternetUse
## <chr> <chr> <int> <dbl> <dbl>
## 1 Afghanistan "AFG" 2024 1.36 18.4
## 2 Africa "" 2024 4.39 NA
## 3 Albania "ALB" 2024 5.41 83.1
## 4 Algeria "DZA" 2024 5.57 71.2
## 5 Angola "AGO" 2017 3.80 39.3
## 6 Argentina "ARG" 2024 6.40 89.2
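The `NA` for Africa in the output above is the kind of row a left join leaves behind when the right-hand table has no matching key. As a rough sketch of how unmatched rows might be listed before merging, the example below uses two small toy tables (hypothetical stand-ins for `happiness_latest` and `internet_latest`, with a simplified `InternetUse` column name) and `dplyr::anti_join()`. Whether a missing match is the actual reason for the `NA` in the real data is an assumption.

```r
library(dplyr)

# Toy stand-ins for happiness_latest and internet_latest (hypothetical rows)
happiness_toy <- data.frame(
  Entity = c("Albania", "Africa", "Angola"),
  Code = c("ALB", "", "AGO"),
  Cantril.ladder.score = c(5.41, 4.39, 3.80)
)
internet_toy <- data.frame(
  Entity = c("Albania", "Angola"),
  Code = c("ALB", "AGO"),
  InternetUse = c(83.1, 39.3)
)

# Rows in the happiness table that have no partner in the internet table.
# After left_join() these are the rows whose InternetUse would be NA.
unmatched <- anti_join(happiness_toy, internet_toy, by = c("Entity", "Code"))
unmatched$Entity
```

Rows with an empty `Code`, such as the regional aggregate here, may deserve a separate look, since the research question is about countries.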
names(happiness_clean)
## [1] "Country" "Code" "Year" "LifeSatisfaction"
## [5] "InternetUse"
summary(happiness_clean)
## Country Code Year LifeSatisfaction
## Length:178 Length:178 Min. :2011 Min. :1.364
## Class :character Class :character 1st Qu.:2024 1st Qu.:4.545
## Mode :character Mode :character Median :2024 Median :5.821
## Mean :2023 Mean :5.495
## 3rd Qu.:2024 3rd Qu.:6.404
## Max. :2024 Max. :7.736
##
## InternetUse
## Min. : 10.00
## 1st Qu.: 44.50
## Median : 79.22
## Mean : 69.12
## 3rd Qu.: 89.90
## Max. :100.00
## NA's :9
# Create UseGroup variable based on 70% threshold
happiness_clean <- happiness_clean |>
mutate(
UseGroup = ifelse(InternetUse > 70, "High", "Low")
)
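Because `ifelse()` returns `NA` when `InternetUse` is missing, it can help to count the groups right after creating them. The sketch below repeats the same grouping rule on a small toy data frame (hypothetical values) and uses `dplyr::count()`, which reports `NA` as its own row. It also shows that a country at exactly 70% falls into "Low", since the rule is strictly greater than 70.

```r
library(dplyr)

# Toy data (hypothetical): one missing value and one value exactly at the threshold
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  InternetUse = c(83.1, 39.3, NA, 70)
)

toy <- toy |>
  mutate(UseGroup = ifelse(InternetUse > 70, "High", "Low"))

# count() keeps NA as a separate group, so missing values stay visible
group_sizes <- count(toy, UseGroup)
group_sizes
```

On the real data, the same call would be `count(happiness_clean, UseGroup)`.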
To visualize the relationship between the categorical and quantitative variables, I created a boxplot comparing life satisfaction scores between high- and low-internet-use countries. This plot provides a clear comparison of the distributions and differences in central tendency across the two groups.
# Boxplot showing Life Satisfaction by Internet Use group
ggplot(happiness_clean, aes(x = UseGroup, y = LifeSatisfaction)) +
geom_boxplot(fill = "pink") +
labs(
title = "Life Satisfaction by Internet Use Group",
x = "Internet Use Group",
y = "Life Satisfaction (0–10)"
) +
theme_minimal()
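A histogram per group complements the boxplot by showing the shape of each distribution. The sketch below is one possible way to draw it with `ggplot2`, using simulated data: the group means are taken from the sample estimates in this report, while the group sizes, standard deviations, and bin width are assumptions chosen only for illustration.

```r
library(ggplot2)

# Simulated stand-in for happiness_clean (sizes and spreads are hypothetical)
set.seed(1)
toy <- data.frame(
  LifeSatisfaction = c(rnorm(50, mean = 6.1, sd = 0.8),
                       rnorm(30, mean = 4.4, sd = 0.9)),
  UseGroup = rep(c("High", "Low"), times = c(50, 30))
)

# One histogram panel per internet-use group, stacked for easy comparison
p <- ggplot(toy, aes(x = LifeSatisfaction)) +
  geom_histogram(binwidth = 0.5, fill = "pink", color = "white") +
  facet_wrap(~ UseGroup, ncol = 1) +
  labs(
    title = "Distribution of Life Satisfaction by Internet Use Group",
    x = "Life Satisfaction (0-10)",
    y = "Number of countries"
  ) +
  theme_minimal()
p
```

With the real data, `toy` would be replaced by `happiness_clean`, ideally after dropping rows where `UseGroup` is `NA`.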
To determine whether life satisfaction differs between high- and low-internet-use countries, I conducted a two-sample t-test comparing the mean LifeSatisfaction values of the two groups created previously. Before running the test, I checked that each group contained an adequate number of observations and examined the distributions using histograms and boxplots. Because the two groups have unequal sample sizes and may not have equal variances, I considered Welch's two-sample t-test, which does not assume equal population variances and is appropriate for comparing the means of two independent groups. Using a significance level of α = 0.05, the Welch test gave t = 11.594, df = 115.43, and a p-value below 2.2e-16, with a 95% confidence interval for the difference of approximately 1.39 to 1.97. The code and output shown in this section use the pooled-variance version (var.equal = TRUE), which gave t = 12.263, df = 167, a p-value below 2.2e-16, and a 95% confidence interval of about 1.42 to 1.96. In both versions the p-value is far less than 0.05, so I reject the null hypothesis. The results show a statistically significant difference in mean life satisfaction between high- and low-internet-use countries: countries with higher internet usage report, on average, life satisfaction scores roughly 1.4 to 2 points higher than countries with lower internet usage.
Null Hypothesis (H₀): There is no difference in mean life satisfaction between high- and low-internet-use countries.
\(H_0: \mu_{\text{High}} = \mu_{\text{Low}}\)
Alternative Hypothesis (Hₐ): There is a difference in mean life satisfaction between high- and low-internet-use countries.
\(H_a: \mu_{\text{High}} \neq \mu_{\text{Low}}\)
This is a two-tailed two-sample t-test because the alternative hypothesis uses “≠”.
# Create separate vectors for the t-test
high <- subset(happiness_clean, UseGroup == "High")$LifeSatisfaction
low <- subset(happiness_clean, UseGroup == "Low")$LifeSatisfaction
t_test_result <- t.test(high, low, var.equal = TRUE)
t_test_result
##
## Two Sample t-test
##
## data: high and low
## t = 12.263, df = 167, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.415780 1.959125
## sample estimates:
## mean of x mean of y
## 6.101812 4.414359
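The choice between the pooled and Welch versions of the test comes down to the `var.equal` argument of `t.test()`. The sketch below runs both on simulated data to show how the degrees of freedom differ. The group sizes (100 and 69) and standard deviations are assumptions chosen so that the pooled df comes out to 167; they are not the actual group sizes from this analysis.

```r
# Simulated groups (hypothetical sizes and spreads, means taken from the output above)
set.seed(42)
high_sim <- rnorm(100, mean = 6.1, sd = 0.8)
low_sim  <- rnorm(69,  mean = 4.4, sd = 1.0)

# Pooled-variance test: assumes equal variances, df = n1 + n2 - 2
pooled <- t.test(high_sim, low_sim, var.equal = TRUE)

# Welch test (the t.test() default): no equal-variance assumption,
# df is estimated from the data and is usually fractional
welch <- t.test(high_sim, low_sim, var.equal = FALSE)

c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
```

When the two groups have similar variances the two versions give nearly the same answer; the Welch version is the more cautious choice when that is in doubt.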
The results of this project show a clear and statistically significant difference in average life satisfaction between countries with high internet usage and those with low internet usage. Given the extremely small p-value (p < 2.2e-16), I rejected the null hypothesis and concluded that differences in internet access are associated with differences in national well-being. Countries with high internet usage reported meaningfully higher life satisfaction, with a mean score (6.10) approximately 1.7 points higher than that of low-usage countries (4.41).
These findings suggest that internet accessibility may play an important role in improving quality of life, possibly through increased economic opportunity, social connectivity, and access to information. However, this comparison is cross-sectional and shows an association, not a cause. Future research could strengthen these conclusions by controlling for additional factors such as GDP per capita, education levels, healthcare access, or population age distribution. Expanding the analysis to include regression models or a larger dataset could offer deeper insight into whether the relationship between internet access and well-being is causal.
Our World in Data. (n.d.). Happiness and life satisfaction. Retrieved November 18, 2025, from https://ourworldindata.org/happiness-and-life-satisfaction