Research question: How do a country’s social support systems, freedom of life choices, generosity, and perceptions of corruption predict national happiness scores?
Dependent variable: Happiness score (0–10 scale)
Independent variables: Social support, freedom to make life choices, generosity index, corruption perception score
Why the question is of your interest Because I always concern what can decide a happy country as a happy country would have more good people and the more good people the better the world.
Dataset: World Happiness Report 2021 - Kaggle - https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021
About:
The World Happiness Report is a landmark survey of the state of global happiness
The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions.
Leading experts across fields-economics, psycology, surevy analysis, national statistics, health, public policy and more-describe how measurements of well-being can be used effectively to assess the progress of nations.
Source: - The happiness scores and rankings use data from the Gallup World Poll - a set of nationally representative surveys undertaken in more than 160 countries in over 140 languages.
Variables: Main variable
var<-tribble(
~variable, ~meaning,
"Ladder Score (Happiness Score)", "Based on the Cantril Ladder of Life Scale — respondents imagine a ladder from 0 (worst possible life) to 10 (best possible life) and rate where they currently stand. This is the national average of those responses.",
"Log GDP per capita", "Acountry's economic output per person, log-transformed",
"Social support", "Average answer to 'Do you have someone to count on in times of trouble?' (0 or 1)",
"Healthy life expectancy", "Average number of years a person can expect to live in good health",
"Freedom to make life choices", "Average response to 'Are you satisfied with your freedom to choose what to do with your life?'",
"Generosity", "Residual of regressing national average of donation behavior on GDP per capita",
"Perceptions of corruption", "Average of two binary questions about corruption in government and business",
"Regional indicator", "Region",
"Upper and lower confidence interval bounds"," Confidence interval bounds for the ladder score"
)
library(knitr)
kable(var)
| variable | meaning |
|---|---|
| Ladder Score (Happiness Score) | Based on the Cantril Ladder of Life Scale — respondents imagine a ladder from 0 (worst possible life) to 10 (best possible life) and rate where they currently stand. This is the national average of those responses. |
| Log GDP per capita | Acountry’s economic output per person, log-transformed |
| Social support | Average answer to ‘Do you have someone to count on in times of trouble?’ (0 or 1) |
| Healthy life expectancy | Average number of years a person can expect to live in good health |
| Freedom to make life choices | Average response to ‘Are you satisfied with your freedom to choose what to do with your life?’ |
| Generosity | Residual of regressing national average of donation behavior on GDP per capita |
| Perceptions of corruption | Average of two binary questions about corruption in government and business |
| Regional indicator | Region |
| Upper and lower confidence interval bounds | Confidence interval bounds for the ladder score |
view(var)
Table of the Dataset:
wh <- read.csv("~/Downloads/School material/D-Visual/archive/world-happiness-report-2021.csv", header=TRUE)
glimpse(wh)
## Rows: 149
## Columns: 20
## $ Country.name <chr> "Finland", "Denmark", "Swit…
## $ Regional.indicator <chr> "Western Europe", "Western …
## $ Ladder.score <dbl> 7.842, 7.620, 7.571, 7.554,…
## $ Standard.error.of.ladder.score <dbl> 0.032, 0.035, 0.036, 0.059,…
## $ upperwhisker <dbl> 7.904, 7.687, 7.643, 7.670,…
## $ lowerwhisker <dbl> 7.780, 7.552, 7.500, 7.438,…
## $ Logged.GDP.per.capita <dbl> 10.775, 10.933, 11.117, 10.…
## $ Social.support <dbl> 0.954, 0.954, 0.942, 0.983,…
## $ Healthy.life.expectancy <dbl> 72.000, 72.700, 74.400, 73.…
## $ Freedom.to.make.life.choices <dbl> 0.949, 0.946, 0.919, 0.955,…
## $ Generosity <dbl> -0.098, 0.030, 0.025, 0.160…
## $ Perceptions.of.corruption <dbl> 0.186, 0.179, 0.292, 0.673,…
## $ Ladder.score.in.Dystopia <dbl> 2.43, 2.43, 2.43, 2.43, 2.4…
## $ Explained.by..Log.GDP.per.capita <dbl> 1.446, 1.502, 1.566, 1.482,…
## $ Explained.by..Social.support <dbl> 1.106, 1.108, 1.079, 1.172,…
## $ Explained.by..Healthy.life.expectancy <dbl> 0.741, 0.763, 0.816, 0.772,…
## $ Explained.by..Freedom.to.make.life.choices <dbl> 0.691, 0.686, 0.653, 0.698,…
## $ Explained.by..Generosity <dbl> 0.124, 0.208, 0.204, 0.293,…
## $ Explained.by..Perceptions.of.corruption <dbl> 0.481, 0.485, 0.413, 0.170,…
## $ Dystopia...residual <dbl> 3.253, 2.868, 2.839, 2.967,…
Visualize:
wh %>%
slice_max(Ladder.score, n = 20) %>%
ggplot(aes(x = reorder(Country.name, Ladder.score), y = Ladder.score)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(
title = "Top 20 Happiest Countries - 2021",
x = "Country",
y = "Ladder Score"
) +
theme_minimal()
Dependent variable: Ladder.score (Happiness) Independent variables: GDP, Social support, Life expectancy,Freedom, Generosity, Corruption
We are predicting happiness using the 6 factors that the World Happiness Report uses in their research: GDP, social support, healthy life expectancy, freedom, generosity, and perceptions of corruption
This is because the studies and theory both support that these things affect how happy people feel.
# Fit the full model
model <- lm( Ladder.score ~ Logged.GDP.per.capita +
Social.support +
Healthy.life.expectancy +
Freedom.to.make.life.choices +
Generosity +
Perceptions.of.corruption,
data = wh)
#View
summary(model)
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + Freedom.to.make.life.choices +
## Generosity + Perceptions.of.corruption, data = wh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.85049 -0.30026 0.05735 0.33368 1.04878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.23722 0.63049 -3.548 0.000526 ***
## Logged.GDP.per.capita 0.27953 0.08684 3.219 0.001595 **
## Social.support 2.47621 0.66822 3.706 0.000301 ***
## Healthy.life.expectancy 0.03031 0.01333 2.274 0.024494 *
## Freedom.to.make.life.choices 2.01046 0.49480 4.063 7.98e-05 ***
## Generosity 0.36438 0.32121 1.134 0.258541
## Perceptions.of.corruption -0.60509 0.29051 -2.083 0.039058 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5417 on 142 degrees of freedom
## Multiple R-squared: 0.7558, Adjusted R-squared: 0.7455
## F-statistic: 73.27 on 6 and 142 DF, p-value: < 2.2e-16
#Anova
anova(model)
## Analysis of Variance Table
##
## Response: Ladder.score
## Df Sum Sq Mean Sq F value Pr(>F)
## Logged.GDP.per.capita 1 106.463 106.463 362.7575 < 2.2e-16 ***
## Social.support 1 8.320 8.320 28.3503 3.869e-07 ***
## Healthy.life.expectancy 1 3.476 3.476 11.8455 0.0007596 ***
## Freedom.to.make.life.choices 1 8.769 8.769 29.8807 2.009e-07 ***
## Generosity 1 0.713 0.713 2.4306 0.1212116
## Perceptions.of.corruption 1 1.273 1.273 4.3383 0.0390577 *
## Residuals 142 41.674 0.293
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Reduced model
model_r <- lm( Ladder.score ~ Logged.GDP.per.capita +
Social.support +
Healthy.life.expectancy +
Freedom.to.make.life.choices +
Perceptions.of.corruption,
data = wh)
summary(model_r)
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + Freedom.to.make.life.choices +
## Perceptions.of.corruption, data = wh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.93303 -0.29768 0.06863 0.33924 1.02304
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.11039 0.62112 -3.398 0.000880 ***
## Logged.GDP.per.capita 0.26400 0.08584 3.075 0.002518 **
## Social.support 2.50670 0.66835 3.751 0.000256 ***
## Healthy.life.expectancy 0.02936 0.01332 2.204 0.029095 *
## Freedom.to.make.life.choices 2.13266 0.48342 4.412 2.01e-05 ***
## Perceptions.of.corruption -0.66778 0.28549 -2.339 0.020718 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5423 on 143 degrees of freedom
## Multiple R-squared: 0.7536, Adjusted R-squared: 0.745
## F-statistic: 87.49 on 5 and 143 DF, p-value: < 2.2e-16
# Compare full vs reduced model
anova(model_r, model)
## Analysis of Variance Table
##
## Model 1: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Freedom.to.make.life.choices + Perceptions.of.corruption
## Model 2: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 143 42.052
## 2 142 41.674 1 0.37767 1.2868 0.2585
# Residual and normality plots (4 plots in one)
par(mfrow = c(2,2))
plot(model)
# Check collinearity using VIF
# VIF > 5 = moderate concern, VIF > 10 = serious problem
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:openintro':
##
## densityPlot
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
vif(model)
## Logged.GDP.per.capita Social.support
## 5.104890 2.972200
## Healthy.life.expectancy Freedom.to.make.life.choices
## 4.099348 1.585807
## Generosity Perceptions.of.corruption
## 1.180982 1.367122
wh_selected <- wh %>%
dplyr:: select(Ladder.score,
Logged.GDP.per.capita,
Social.support,
Healthy.life.expectancy,
Freedom.to.make.life.choices,
Generosity,
Perceptions.of.corruption)
cor(wh_selected)
## Ladder.score Logged.GDP.per.capita Social.support
## Ladder.score 1.00000000 0.7897597 0.7568876
## Logged.GDP.per.capita 0.78975970 1.0000000 0.7852987
## Social.support 0.75688765 0.7852987 1.0000000
## Healthy.life.expectancy 0.76809946 0.8594606 0.7232561
## Freedom.to.make.life.choices 0.60775307 0.4323235 0.4829298
## Generosity -0.01779928 -0.1992864 -0.1149459
## Perceptions.of.corruption -0.42114000 -0.3423374 -0.2032070
## Healthy.life.expectancy
## Ladder.score 0.7680995
## Logged.GDP.per.capita 0.8594606
## Social.support 0.7232561
## Healthy.life.expectancy 1.0000000
## Freedom.to.make.life.choices 0.4614939
## Generosity -0.1617503
## Perceptions.of.corruption -0.3643735
## Freedom.to.make.life.choices Generosity
## Ladder.score 0.6077531 -0.01779928
## Logged.GDP.per.capita 0.4323235 -0.19928640
## Social.support 0.4829298 -0.11494585
## Healthy.life.expectancy 0.4614939 -0.16175028
## Freedom.to.make.life.choices 1.0000000 0.16943737
## Generosity 0.1694374 1.00000000
## Perceptions.of.corruption -0.4013630 -0.16396173
## Perceptions.of.corruption
## Ladder.score -0.4211400
## Logged.GDP.per.capita -0.3423374
## Social.support -0.2032070
## Healthy.life.expectancy -0.3643735
## Freedom.to.make.life.choices -0.4013630
## Generosity -0.1639617
## Perceptions.of.corruption 1.0000000
With a correlations higher than 0.85, we can tell the correlations between GDP and Health is quite high which show that rich country will live longer.
Wealthier countries here tend to have stronger social systems because our correlation between GDP and social is about 0.78
Check all the residual plot
wh_diag <- wh %>%
mutate(residuals = resid(model))
predictors <- c("Logged.GDP.per.capita", "Social.support",
"Healthy.life.expectancy", "Freedom.to.make.life.choices",
"Generosity", "Perceptions.of.corruption")
for (var in predictors) {
p <- ggplot(wh_diag, aes_string(x = var, y = "residuals")) +
geom_point(alpha = 0.6) +
geom_smooth(method = "loess", se = FALSE, color = "steelblue") +
geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
labs(title = paste("Residuals vs", var), y = "Residuals") +
theme_minimal()
print(p)
}
Residual plots were examined for all six predictors to assess whether higher-order or interaction terms were necessary.
Freedom to make life choices showed a near-flat pattern around zero which mean there is no need of adjustment.
The remaining variables showed varying degrees of non-linearity:
Healthy life expectancy displayed the strongest pattern which provide a clear justification of a quadratic term
Social support showed a bit curve in the mid-range
GDP showed only a mild S-curve
Perceptions of corruption and Generosity showed minor deviations.
Based on this visual evidence, quadratic terms for Healthy.life.expectancy and Social.support were tested using partial F-tests via anova()
# strongest candidate
model_quad_health <- lm(Ladder.score ~ Logged.GDP.per.capita +
Social.support +
Healthy.life.expectancy +
I(Healthy.life.expectancy^2) +
Freedom.to.make.life.choices +
Generosity +
Perceptions.of.corruption,
data = wh)
# Second candidate
model_quad_social <- lm(Ladder.score ~ Logged.GDP.per.capita +
Social.support +
I(Social.support^2) +
Healthy.life.expectancy +
Freedom.to.make.life.choices +
Generosity +
Perceptions.of.corruption,
data = wh)
# Compare each to base model
anova(model, model_quad_health) # p < 0.05 = keep it
## Analysis of Variance Table
##
## Model 1: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption
## Model 2: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## I(Healthy.life.expectancy^2) + Freedom.to.make.life.choices +
## Generosity + Perceptions.of.corruption
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 142 41.674
## 2 141 39.705 1 1.9694 6.9939 0.009106 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model, model_quad_social) # p < 0.05 = keep it
## Analysis of Variance Table
##
## Model 1: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption
## Model 2: Ladder.score ~ Logged.GDP.per.capita + Social.support + I(Social.support^2) +
## Healthy.life.expectancy + Freedom.to.make.life.choices +
## Generosity + Perceptions.of.corruption
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 142 41.674
## 2 141 40.343 1 1.331 4.652 0.03271 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Also check adjusted R² — did it improve?
summary(model)$adj.r.squared
## [1] 0.7455308
summary(model_quad_health)$adj.r.squared
## [1] 0.7558371
summary(model_quad_social)$adj.r.squared
## [1] 0.7519113
#both quadratic terms together
model_quad_both <- lm(Ladder.score ~ Logged.GDP.per.capita +
Social.support +
I(Social.support^2) +
Healthy.life.expectancy +
I(Healthy.life.expectancy^2) +
Freedom.to.make.life.choices +
Generosity +
Perceptions.of.corruption,
data = wh)
summary(model_quad_both)$adj.r.squared
## [1] 0.7601577
anova(model_quad_health, model_quad_both)
## Analysis of Variance Table
##
## Model 1: Ladder.score ~ Logged.GDP.per.capita + Social.support + Healthy.life.expectancy +
## I(Healthy.life.expectancy^2) + Freedom.to.make.life.choices +
## Generosity + Perceptions.of.corruption
## Model 2: Ladder.score ~ Logged.GDP.per.capita + Social.support + I(Social.support^2) +
## Healthy.life.expectancy + I(Healthy.life.expectancy^2) +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 141 39.705
## 2 140 38.726 1 0.97922 3.54 0.06198 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since adding Social.support^2 on top of Healthy.life.expectancy^2 does not significantly improve fit because p = 0.062 > 0.05 threshold
So that we keep my final model would be Healthy.life.expectancy^2.
# Final
model_final <- lm(Ladder.score ~ Logged.GDP.per.capita +
Social.support +
Healthy.life.expectancy +
I(Healthy.life.expectancy^2) +
Freedom.to.make.life.choices +
Generosity +
Perceptions.of.corruption,
data = wh)
summary(model_final)
##
## Call:
## lm(formula = Ladder.score ~ Logged.GDP.per.capita + Social.support +
## Healthy.life.expectancy + I(Healthy.life.expectancy^2) +
## Freedom.to.make.life.choices + Generosity + Perceptions.of.corruption,
## data = wh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.73154 -0.24981 0.06797 0.33892 1.12600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.758067 3.829640 2.026 0.044672 *
## Logged.GDP.per.capita 0.225680 0.087467 2.580 0.010897 *
## Social.support 2.561439 0.655343 3.909 0.000144 ***
## Healthy.life.expectancy -0.297325 0.124576 -2.387 0.018328 *
## I(Healthy.life.expectancy^2) 0.002650 0.001002 2.645 0.009106 **
## Freedom.to.make.life.choices 2.274823 0.494882 4.597 9.45e-06 ***
## Generosity 0.353807 0.314667 1.124 0.262760
## Perceptions.of.corruption -0.312564 0.305309 -1.024 0.307701
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5307 on 141 degrees of freedom
## Multiple R-squared: 0.7674, Adjusted R-squared: 0.7558
## F-statistic: 66.45 on 7 and 141 DF, p-value: < 2.2e-16
The final model explains 75.6% of the variance in national happiness scores (adjusted R² = 0.7558), indicating a strong fit.
For log GDP per capita, it is associated with a 0.23 point increase in happiness. This is statistically significant with p value <0.05, confirming that wealthier nations tend to be happier.
For Social support, and Freedom, these are the strongest significant predictor since a one-unit increase in Social support is associated with a 2.56 point increase in happiness and a one-unit increase in freedom of life choices is associated with a 2.27 point increase in happiness. And for my research question : How do a country’s social support systems, freedom of life choices, generosity, and perceptions of corruption predict national happiness scores, we can tell that Social support and freedom of life have the most influent on the national happiness scores.
For Healthy life expectancy, we can’t interpreted as a single number because of the quadratic term. The relationship is curved, not straight
For Generosity and Corruption, these two variables are not significant since their p value are larger than 0.05.
Potential Model Improvements: I can add regional indicator to capture more information
Additional Variables or Transformations:
We can add Unemployment rate and Income inequality to show how Financial insecurity affects happiness beyond average GDP and how Two countries can have the same GDP but very different happiness if wealth is unevenly distributed
We can also change it from just in 2021 to 2015-2021 or make it update until now since the year of 2021 is the year of Pandemic that it might not be representative.
Limitation:
2021 was affected by the pandemic, which may have changed the happiness scores, social support measures, and life expectancy in ways that are not representative of normal conditions.
This data set might be biased since all variables are national averages, which hides enormous within-country variation. For example: A person in rural China and urban China may have completely different happiness drivers, but they contribute to the same data point.