Final

Introduction

For my final project, I chose to analyze the World Happiness Report 2026 dataset. This dataset focuses on happiness scores across different countries. In this project, I will use variables such as happiness score, GDP per capita, social support, healthy life expectancy, freedom, generosity, and perceptions of corruption. I want to explore which factors appear to be most strongly related to happiness and whether richer countries are always happier.

The original data source is the World Happiness Report 2026, which is based primarily on survey data collected by Gallup through the Gallup World Poll. Respondents in each country evaluate their lives on the Cantril Ladder scale, rating them from 0 to 10. The report is published by the Wellbeing Research Centre at the University of Oxford, in partnership with Gallup and the UN Sustainable Development Solutions Network. I accessed the dataset through Kaggle: https://www.kaggle.com/datasets/hassanali789/world-happiness-report-2026-official-rankings.

I chose this topic because I am interested in understanding whether money, health, freedom, or social support matter more when comparing happiness across countries.

## Load libraries
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)

# Loading the World Happiness dataset using read_csv()
happiness_raw <- read_csv("world_happiness_2026.csv")

Rows: 130 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): country, region
dbl (8): rank, score, gdp_per_capita, social_support, healthy_life_expectanc...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## Explore the Dataset
head(happiness_raw)

# A tibble: 6 × 10
   rank country    region                    score gdp_per_capita social_support
  <dbl> <chr>      <chr>                     <dbl>          <dbl>          <dbl>
1     1 Finland    Western Europe             7.76           1.89           1.58
2     2 Iceland    Western Europe             7.70           1.87           1.61
3     3 Denmark    Western Europe             7.69           1.89           1.56
4     4 Costa Rica Latin America and Caribb…  7.44           1.25           1.42
5     5 Sweden     Western Europe             7.40           1.88           1.50
6     6 Norway     Western Europe             7.39           1.96           1.55
# ℹ 4 more variables: healthy_life_expectancy <dbl>, freedom <dbl>,
#   generosity <dbl>, corruption <dbl>

glimpse(happiness_raw)

Rows: 130
Columns: 10
$ rank                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ country                 <chr> "Finland", "Iceland", "Denmark", "Costa Rica",…
$ region                  <chr> "Western Europe", "Western Europe", "Western E…
$ score                   <dbl> 7.764, 7.701, 7.688, 7.439, 7.401, 7.392, 7.31…
$ gdp_per_capita          <dbl> 1.892, 1.874, 1.887, 1.254, 1.878, 1.959, 1.87…
$ social_support          <dbl> 1.581, 1.611, 1.562, 1.421, 1.501, 1.553, 1.48…
$ healthy_life_expectancy <dbl> 0.952, 0.959, 0.948, 0.891, 0.952, 0.952, 0.94…
$ freedom                 <dbl> 0.670, 0.662, 0.665, 0.631, 0.658, 0.661, 0.64…
$ generosity              <dbl> 0.186, 0.269, 0.211, 0.178, 0.224, 0.210, 0.26…
$ corruption              <dbl> 0.498, 0.512, 0.495, 0.312, 0.481, 0.491, 0.41…

## Check Missing Values
happiness_raw %>%
  summarize(across(everything(), ~ sum(is.na(.))))

# A tibble: 1 × 10
   rank country region score gdp_per_capita social_support
  <int>   <int>  <int> <int>          <int>          <int>
1     0       0      0     0              0              0
# ℹ 4 more variables: healthy_life_expectancy <int>, freedom <int>,
#   generosity <int>, corruption <int>

## Select and Prepare Variables
happiness_project <- happiness_raw %>%
  select(
    rank, country, region, score, gdp_per_capita, social_support,
    healthy_life_expectancy, freedom, generosity, corruption
  )

## Create a GDP Group Variable
happiness_project <- happiness_project %>%
  mutate(
    gdp_group = case_when(
      gdp_per_capita >= quantile(gdp_per_capita, 0.75, na.rm = TRUE) ~ "High GDP",
      gdp_per_capita >= quantile(gdp_per_capita, 0.50, na.rm = TRUE) ~ "Medium GDP",
      TRUE ~ "Lower GDP"
    )
  )
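
To show how these cut points split the data, here is a minimal sketch on a made-up vector of eight GDP values (the vector `gdp` is hypothetical, not taken from the dataset). It uses base R `ifelse()` in place of `case_when()`, but the thresholds follow the same logic: the top quarter is "High GDP", the next quarter is "Medium GDP", and the bottom half is "Lower GDP".

```r
# Hypothetical GDP values, used only to illustrate the grouping rule
gdp <- c(0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 1.9)

# Same cut points as in the project code: the median and the 75th percentile
q50 <- quantile(gdp, 0.50)
q75 <- quantile(gdp, 0.75)

# Conditions are checked in order, as in case_when()
gdp_group <- ifelse(gdp >= q75, "High GDP",
                    ifelse(gdp >= q50, "Medium GDP", "Lower GDP"))

# Count how many values land in each group
print(table(gdp_group))
```

Because the cut points are the 50th and 75th percentiles, the groups are not equal in size: "Lower GDP" contains about half of the values, while "High GDP" and "Medium GDP" each contain about a quarter.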

## Filter Important Variables
# Keep only rows with non-missing values for every model variable
happiness_project <- happiness_project %>%
  filter(
    !is.na(score),
    !is.na(gdp_per_capita),
    !is.na(social_support),
    !is.na(healthy_life_expectancy),
    !is.na(freedom),
    !is.na(generosity),
    !is.na(corruption)
  )

## Summary by Region
region_summary <- happiness_project %>%
  group_by(region) %>%
  summarize(
    average_score = mean(score, na.rm = TRUE),
    average_gdp = mean(gdp_per_capita, na.rm = TRUE),
    number_of_countries = n()
  ) %>%
  arrange(desc(average_score))

region_summary

# A tibble: 10 × 4
   region                          average_score average_gdp number_of_countries
   <chr>                                   <dbl>       <dbl>               <int>
 1 Western Europe                           6.99       1.86                   18
 2 North America and ANZ                    6.97       1.84                    4
 3 Central and Eastern Europe               6.55       1.54                   11
 4 East Asia                                6.10       1.66                    5
 5 Latin America and Caribbean              5.87       1.20                   15
 6 Commonwealth of Independent St…          5.76       1.24                    5
 7 Southeast Asia                           5.69       1.27                    9
 8 Middle East and North Africa             5.36       1.18                   14
 9 South Asia                               4.82       0.816                   6
10 Sub-Saharan Africa                       4.20       0.564                  43

Multiple Linear Regression

For my statistical analysis, I used a multiple linear regression model. I want to see how different social and economic variables predict the happiness score of a country.

The model equation is: Score = β0 + β1(GDP per capita) + β2(Social support) + β3(Healthy life expectancy) + β4(Freedom) + β5(Generosity) + β6(Corruption) + ε, where ε is the error term.

# Creating a multiple linear regression model
happiness_model <- lm(
  score ~ gdp_per_capita + social_support + healthy_life_expectancy +
    freedom + generosity + corruption,
  data = happiness_project
)

summary(happiness_model)


Call:
lm(formula = score ~ gdp_per_capita + social_support + healthy_life_expectancy + 
    freedom + generosity + corruption, data = happiness_project)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.66244 -0.12033  0.00094  0.10373  0.56797 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              -0.0775     0.1507  -0.514 0.607988    
gdp_per_capita           -0.6714     0.1214  -5.530 1.83e-07 ***
social_support            5.5534     0.3597  15.438  < 2e-16 ***
healthy_life_expectancy  -1.5123     0.3623  -4.174 5.62e-05 ***
freedom                   1.6205     0.5594   2.897 0.004464 ** 
generosity               -1.8395     0.5209  -3.531 0.000583 ***
corruption                2.1673     0.3970   5.459 2.53e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2052 on 123 degrees of freedom
Multiple R-squared:  0.9745,    Adjusted R-squared:  0.9733 
F-statistic: 784.8 on 6 and 123 DF,  p-value: < 2.2e-16
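
To make the fitted equation concrete, the sketch below plugs Finland's values (the first entry of each column in the glimpse() output) into the rounded coefficients printed above. The vectors `coefs` and `finland` are typed in by hand for illustration, so the result is only approximate.

```r
# Rounded coefficients copied from the summary(happiness_model) output
coefs <- c(
  intercept = -0.0775, gdp_per_capita = -0.6714, social_support = 5.5534,
  healthy_life_expectancy = -1.5123, freedom = 1.6205,
  generosity = -1.8395, corruption = 2.1673
)

# Finland's predictor values copied from the glimpse(happiness_raw) output;
# the intercept is multiplied by 1
finland <- c(
  intercept = 1, gdp_per_capita = 1.892, social_support = 1.581,
  healthy_life_expectancy = 0.952, freedom = 0.670,
  generosity = 0.186, corruption = 0.498
)

# Predicted score = sum of coefficient * value, matched by name
predicted <- sum(coefs * finland[names(coefs)])
observed <- 7.764
residual <- observed - predicted

print(round(c(predicted = predicted, observed = observed, residual = residual), 3))
```

The predicted score comes out slightly above Finland's observed score of 7.764, so the residual is small and negative, well inside the residual range reported in the summary. With the fitted model object available, predict(happiness_model, newdata = ...) performs the same calculation with unrounded coefficients.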

## Diagnostic Plots
par(mfrow = c(2, 2))
plot(happiness_model)

Regression Analysis

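The multiple linear regression model shows that, holding the other predictors constant, social support (5.55), corruption (2.17), and freedom (1.62) have positive coefficients, while GDP per capita (-0.67), healthy life expectancy (-1.51), and generosity (-1.84) have negative coefficients. The negative GDP coefficient does not mean that richer countries are less happy: Visualization 1 shows a positive overall relationship between GDP per capita and happiness score. A likely explanation is that the predictors are strongly correlated with one another, so each coefficient reflects only the part of a variable's relationship with happiness that is not already captured by the other variables.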

The adjusted R² of 0.9733 indicates that the model explains about 97% of the variation in happiness scores across the 130 countries. All six predictors have p-values below 0.01, so each is a statistically significant predictor of happiness score in this model. Only the intercept is not significantly different from zero (p = 0.608).

The diagnostic plots show that the model assumptions are reasonably satisfied, although there may be a few outliers among countries with unusually high or low happiness scores. Overall, the model provides useful insight into the factors associated with happiness around the world.
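
One check that the diagnostic plots do not cover is multicollinearity, which matters here because predictors such as GDP per capita and healthy life expectancy are likely to be correlated with each other. A common measure is the variance inflation factor (VIF), defined for each predictor as 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated data. The data frame `toy` and the helper `vif_manual()` are hypothetical names used only for illustration, and the numbers say nothing about the happiness dataset itself.

```r
# Simulated stand-in data (an assumption, NOT the happiness dataset):
# x2 is built to be strongly correlated with x1, x3 is independent
set.seed(1)
n  <- 130
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF for each predictor = 1 / (1 - R^2),
# where R^2 is from regressing that predictor on the remaining predictors
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_manual(toy, c("x1", "x2", "x3"))
print(round(vifs, 2))
```

In this simulated example the two correlated predictors get large VIF values while the independent one stays close to 1. Applying the same helper to the six predictors in happiness_project would show whether multicollinearity is a plausible reason for any unexpected coefficient signs. The car package also provides a vif() function that works directly on a fitted lm object.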

## Visualization 1: GDP per Capita and Happiness Score

ggplot(happiness_project, aes(x = gdp_per_capita, y = score, color = region)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  scale_color_manual(values = c(
    "#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e",
    "#e6ab02", "#a6761d", "#1f78b4", "#b2df8a", "#fb9a99"
  )) +
  labs(
    title = "Relationship Between GDP per Capita and Happiness Score",
    x = "GDP per Capita",
    y = "Happiness Score",
    color = "Region",
    caption = "Data source: World Happiness Report 2026"
  ) +
  theme_minimal()

`geom_smooth()` using formula = 'y ~ x'

This visualization shows the relationship between GDP per capita and happiness score. In general, countries with higher GDP per capita tend to have higher happiness scores. However, the plot also shows that GDP is not the only factor because countries in the same GDP range can still have different happiness scores.

## Visualization 2: Average Happiness Score by Region

ggplot(region_summary, aes(x = reorder(region, average_score), y = average_score, fill = region)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c(
    "#264653", "#2a9d8f", "#e9c46a", "#f4a261", "#e76f51",
    "#8ab17d", "#6d597a", "#457b9d", "#a8dadc", "#ffafcc"
  )) +
  labs(
    title = "Average Happiness Score by Region",
    x = "Region",
    y = "Average Happiness Score",
    fill = "Region",
    caption = "Data source: World Happiness Report 2026"
  ) +
  theme_classic()

This visualization compares the average happiness score across regions. It helps show which regions have higher or lower average happiness levels. The chart makes it easier to compare regional patterns instead of looking at each country individually.
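
The regional pattern can also be summarized with a single number. As a rough sketch, the code below retypes the ten rounded regional averages from the region_summary output into a small data frame (the name `region_avgs` is my own) and computes the Pearson correlation between average GDP and average happiness score.

```r
# Rounded regional averages copied from the region_summary output,
# in the same order (Western Europe first, Sub-Saharan Africa last)
region_avgs <- data.frame(
  average_score = c(6.99, 6.97, 6.55, 6.10, 5.87, 5.76, 5.69, 5.36, 4.82, 4.20),
  average_gdp   = c(1.86, 1.84, 1.54, 1.66, 1.20, 1.24, 1.27, 1.18, 0.816, 0.564)
)

# Pearson correlation between average GDP and average happiness score
region_cor <- cor(region_avgs$average_gdp, region_avgs$average_score)
print(round(region_cor, 2))
```

The correlation is strongly positive, but it should be read with caution: it is based on only ten unweighted regional averages, and correlations between group averages are usually stronger than correlations between individual countries.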

Tableau Visualization

https://public.tableau.com/shared/JR7YCG8QM?:display_count=n&:origin=viz_share_link

The Tableau dashboard provides an interactive view of global happiness scores across countries and regions. Users can hover over countries on the map to view additional information such as happiness score and region, making it easier to compare patterns between different parts of the world.

Conclusion

This project explored global happiness levels and the factors that may influence them using data from the World Happiness Report 2026. By analyzing variables such as GDP per capita, social support, healthy life expectancy, freedom to make life choices, generosity, and perceptions of corruption, I was able to better understand how economic and social conditions are connected to happiness across countries.

One interesting pattern shown in the visualizations is that countries with higher GDP per capita and stronger social support systems generally tend to report higher happiness scores. Another noticeable trend is that some countries with similar economic conditions still have different happiness levels, suggesting that social and cultural factors may also play an important role.

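The regression analysis showed that all six variables were statistically significant predictors of happiness, although some coefficients, including the one for GDP per capita, were negative once the other variables were held constant. The diagnostic plots also suggested that the model fit the data reasonably well, although a few countries appeared as potential outliers.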

If I had more time, I would like to explore changes in happiness over time using data from multiple years. Overall, this project helped me improve my skills in data cleaning, visualization, regression analysis, and data interpretation using R and Tableau.