The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
Our mission with this report is to analyze variation in reported national happiness levels. We will see whether happiness levels correlate with the other variables in the data, analyze changes in happiness levels over the years, and compare happiness levels between countries.
## Country name year Life Ladder Log GDP per capita
## Length:2363 Min. :2005 Min. :1.281 Min. : 5.527
## Class :character 1st Qu.:2011 1st Qu.:4.647 1st Qu.: 8.507
## Mode :character Median :2015 Median :5.449 Median : 9.503
## Mean :2015 Mean :5.484 Mean : 9.400
## 3rd Qu.:2019 3rd Qu.:6.324 3rd Qu.:10.393
## Max. :2023 Max. :8.019 Max. :11.676
## NA's :28
## Social support Healthy life expectancy at birth Freedom to make life choices
## Min. :0.2280 Min. : 6.72 Min. :0.2280
## 1st Qu.:0.7440 1st Qu.:59.20 1st Qu.:0.6610
## Median :0.8345 Median :65.10 Median :0.7710
## Mean :0.8094 Mean :63.40 Mean :0.7503
## 3rd Qu.:0.9040 3rd Qu.:68.55 3rd Qu.:0.8620
## Max. :0.9870 Max. :74.60 Max. :0.9850
## NA's :13 NA's :63 NA's :36
## Generosity Perceptions of corruption Positive affect Negative affect
## Min. :-0.34000 Min. :0.0350 Min. :0.1790 Min. :0.0830
## 1st Qu.:-0.11200 1st Qu.:0.6870 1st Qu.:0.5720 1st Qu.:0.2090
## Median :-0.02200 Median :0.7985 Median :0.6630 Median :0.2620
## Mean : 0.00010 Mean :0.7440 Mean :0.6519 Mean :0.2732
## 3rd Qu.: 0.09375 3rd Qu.:0.8678 3rd Qu.:0.7370 3rd Qu.:0.3260
## Max. : 0.70000 Max. :0.9830 Max. :0.8840 Max. :0.7050
## NA's :81 NA's :125 NA's :24 NA's :16
Now that we have our set of variables, an obvious first question is how they correlate with one another.
Let’s do that now. First we’ll extract only the values of interest from our data set: every variable except year and country name. We also want to avoid rows that contain missing data (NA values), so we’ll build a new data set with those properties:
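A minimal sketch of that subsetting step, run on a small made-up data frame rather than the real report data. The object `toy` and its values are hypothetical; only the column names and the names `Corr_Data` and `cor_matrix` come from this report, and the approach shown (drop the two identifier columns, then `na.omit()`) is an assumption about how the step could be done in base R.

```r
# Hypothetical stand-in for the raw data: made-up values, real column names.
toy <- data.frame(
  `Country name` = c("A", "A", "B", "B", "C"),
  year = c(2010, 2011, 2010, 2011, 2010),
  `Life Ladder` = c(5.1, 5.3, 7.2, 7.4, 4.0),
  `Log GDP per capita` = c(9.0, 9.1, 10.8, NA, 7.9),
  `Social support` = c(0.80, 0.82, 0.95, 0.94, 0.55),
  check.names = FALSE
)

# Keep every column except the two identifiers, then drop rows with any NA.
keep <- setdiff(names(toy), c("Country name", "year"))
Corr_Data <- na.omit(toy[, keep])

# Pairwise Pearson correlations of the remaining numeric columns.
cor_matrix <- cor(Corr_Data)
print(dim(Corr_Data))
```

Dropping incomplete rows up front means every correlation is computed on the same set of observations, at the cost of discarding country-years that are missing even a single variable.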
Let’s look at a statistical breakdown of the correlations in this data set:
## Life Ladder Log GDP per capita Social support
## Life Ladder 1.0000000 0.78711919 0.72492636
## Log GDP per capita 0.7871192 1.00000000 0.69899085
## Social support 0.7249264 0.69899085 1.00000000
## Healthy life expectancy at birth 0.7252396 0.83214199 0.60211977
## Freedom to make life choices 0.5281304 0.34997346 0.39383489
## Generosity 0.1625579 -0.02495393 0.05582004
## Perceptions of corruption -0.4515752 -0.35246405 -0.22334152
## Positive affect 0.5021954 0.22298279 0.42611799
## Negative affect -0.3455397 -0.26952643 -0.46178104
## Healthy life expectancy at birth
## Life Ladder 0.72523962
## Log GDP per capita 0.83214199
## Social support 0.60211977
## Healthy life expectancy at birth 1.00000000
## Freedom to make life choices 0.36645489
## Generosity 0.01164712
## Perceptions of corruption -0.30766870
## Positive affect 0.21207360
## Negative affect -0.14516448
## Freedom to make life choices Generosity
## Life Ladder 0.5281304 0.16255789
## Log GDP per capita 0.3499735 -0.02495393
## Social support 0.3938349 0.05582004
## Healthy life expectancy at birth 0.3664549 0.01164712
## Freedom to make life choices 1.0000000 0.31295798
## Generosity 0.3129580 1.00000000
## Perceptions of corruption -0.4741101 -0.27270434
## Positive affect 0.5808449 0.30989854
## Negative affect -0.2664939 -0.06939231
## Perceptions of corruption Positive affect
## Life Ladder -0.4515752 0.5021954
## Log GDP per capita -0.3524641 0.2229828
## Social support -0.2233415 0.4261180
## Healthy life expectancy at birth -0.3076687 0.2120736
## Freedom to make life choices -0.4741101 0.5808449
## Generosity -0.2727043 0.3098985
## Perceptions of corruption 1.0000000 -0.2876335
## Positive affect -0.2876335 1.0000000
## Negative affect 0.2740036 -0.3279926
## Negative affect
## Life Ladder -0.34553967
## Log GDP per capita -0.26952643
## Social support -0.46178104
## Healthy life expectancy at birth -0.14516448
## Freedom to make life choices -0.26649389
## Generosity -0.06939231
## Perceptions of corruption 0.27400358
## Positive affect -0.32799261
## Negative affect 1.00000000
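To read the `Life Ladder` column of the matrix above at a glance, one option is to sort it by absolute value. The sketch below retypes that column by hand (rounded to four decimals) instead of recomputing it; the vector name `ladder_cor` is only illustrative.

```r
# Life Ladder column of the correlation matrix, copied from the output above
# and rounded to four decimal places.
ladder_cor <- c(
  "Log GDP per capita"               =  0.7871,
  "Social support"                   =  0.7249,
  "Healthy life expectancy at birth" =  0.7252,
  "Freedom to make life choices"     =  0.5281,
  "Generosity"                       =  0.1626,
  "Perceptions of corruption"        = -0.4516,
  "Positive affect"                  =  0.5022,
  "Negative affect"                  = -0.3455
)

# Rank by strength of association, ignoring sign.
ranked <- ladder_cor[order(abs(ladder_cor), decreasing = TRUE)]
print(ranked)
```

With the real matrix in memory, `cor_matrix[, "Life Ladder"]` would give the same column, plus the trivial 1.0 for `Life Ladder` itself.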
A statistician may be able to look at the matrix above and draw insight, but let’s make it easier to digest with a heat map:
corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black",
         title = "Correlation Matrix Heatmap", mar = c(0, 0, 2, 0))
We can break down this correlation matrix in two ways: (1) looking at correlations with Ladder Score (the overall happiness score, `Life Ladder` in the data), and (2) looking at correlations with Positive and Negative affect.
From the heat map and the correlation matrix, we can see that the factor with the highest correlation to Ladder Score is Log GDP per capita, with a correlation of 0.79. In second and third place, nearly tied at about 0.725 each, are Healthy life expectancy and Social support. With that said, let’s take a closer look at how happiness scores relate to Log GDP with a scatter plot. Each point in our plot represents a given country’s ladder score in a distinct year from 2005 to 2023.
ggplot(Corr_Data, aes(x = `Life Ladder`, y = `Log GDP per capita`)) +
  geom_point() +
  geom_smooth(color = "green") +
  labs(title = "Happiness Score vs Log GDP per Capita",
       x = "Happiness Score",
       y = "Log GDP per Capita",
       caption = "Source: World Happiness Report 2024") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
As expected with a correlation score of .79, there is a generally tight
and noticeable fit of the points along our generated line of fit, with
not many notable outliers present in our graph.
With that said, let’s look at a more digestible display of Log GDP levels with a histogram:
ggplot(Corr_Data, aes(x = `Log GDP per capita`)) +
  # after_stat(density) replaces the ..density.. notation deprecated in ggplot2 3.4.0
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "darkgreen", color = "black", alpha = 0.5) +
  geom_density(color = "red") +
  labs(title = "Distribution of Log GDP per Capita", x = "Log GDP", y = "Density") +
  theme_minimal()
Interestingly enough, the distribution of our Log GDP data is skewed left (negatively skewed): most observations sit at or above the average, while a small number of low-end values form a longer lower tail and pull the average down. This is consistent with the summary table, where the mean (9.400) sits below the median (9.503).
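One way to check the direction of skew numerically is to compare the mean with the median and to compute a moment-based skewness coefficient. The sketch below uses a made-up vector `x`, not the report data, and the helper `skewness_coef` is hypothetical; it is written out by hand because base R has no built-in skewness function.

```r
# Made-up values with a few low outliers (NOT the report data).
x <- c(6.0, 7.5, 9.0, 9.4, 9.6, 9.8, 10.0, 10.2, 10.4, 10.6)

# Hypothetical helper: moment-based sample skewness.
# Negative -> longer left (lower) tail; positive -> longer right (upper) tail.
skewness_coef <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

print(c(mean = mean(x), median = median(x), skewness = skewness_coef(x)))
```

Applied to the real column, the call would be `skewness_coef(Corr_Data[["Log GDP per capita"]])`.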
Turning to positive and negative affect, we see a particularly high correlation between positive affect and freedom to make life choices (0.58). One might expect freedom to make life choices to correlate with negative affect to a similar degree in the opposite direction, but that correlation is only -0.27; social support is the factor most strongly correlated with negative affect (-0.46).
Let’s take a deeper look with scatter plots of reported positive and negative affect against these two variables:
# Freedom on the x-axis and positive affect on the y-axis, matching the axis labels
ggplot(Corr_Data, aes(x = `Freedom to make life choices`, y = `Positive affect`)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Freedom of Choice Levels vs Positive Affect Score",
       x = "Freedom of choice",
       y = "Positive Affect Score",
       caption = "Source: World Happiness Report 2024") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
# Social support on the x-axis and negative affect on the y-axis, matching the axis labels
ggplot(Corr_Data, aes(x = `Social support`, y = `Negative affect`)) +
  geom_point() +
  geom_smooth(color = "red") +
  labs(title = "Social Support Levels vs Negative Affect Score",
       x = "Social support",
       y = "Negative Affect Score",
       caption = "Source: World Happiness Report 2024") +
  theme_minimal()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
In general, there is more scatter around the fitted lines here than in the GDP plot, in line with the weaker correlations (0.58 and -0.46 versus 0.79). So, what’s our conclusion, all things considered?
We see Log GDP per capita to be the factor most correlated with our general happiness score. This makes sense: with a higher GDP, a country can more feasibly provide its citizens with the means to live a higher-quality life, such as better access to health care, cleaner food and nicer homes.
Although GDP holds the crown as the best single correlate of happiness scores, it is only weakly reflected in the positive or negative emotions reported by a country’s citizens (correlations of 0.22 and -0.27). Those are most strongly associated with freedom to make life choices and social support, respectively. How might this work? One possible explanation is that freedom of decision making and social support are felt much closer to home and may vary between communities within the same nation far more than GDP does. If so, it is easy to see how these variables could have a stronger effect on reported positive or negative emotions than something like GDP.
Let’s break down global happiness levels over time from 2006 to 2023 and analyze any changes. Moving forward, we will omit the data from 2005 because a notable amount of information is missing for that year.
# Yearly average of Life Ladder, using complete rows only and excluding 2005
m <- WH_data_raw %>%
  filter(year != 2005) %>%
  na.omit() %>%
  group_by(year) %>%
  summarise(avg_happiness = mean(`Life Ladder`))

# Same filtered rows, kept at the country-year level for the box plot
j <- WH_data_raw %>%
  filter(year != 2005) %>%
  na.omit() %>%
  group_by(year)

ggplot(m, aes(x = year, y = avg_happiness)) +
  geom_line(color = "purple") +
  geom_point() +
  labs(title = "Global Happiness Levels over the Years",
       x = "Year",
       y = "Happiness Levels") +
  theme_minimal()
We can see a general incline in world happiness levels over the past
several years, with notable dips around 2020 and 2021, possibly
caused by the effects of the COVID-19 pandemic.
Let’s take a deeper dive into these statistics with a box plot:
ggplot(j, aes(x = year, y = `Life Ladder`)) +
  geom_boxplot(aes(group = year, fill = factor(year))) +
  labs(title = "Global Happiness Levels over the Years",
       x = "Year",
       y = "Happiness Levels") +
  theme_minimal()
We can see that globally the spread of happiness levels has not varied much. Following the medians marked in the middle of each box, we can see that their direction roughly matches the track of our line graph from earlier.
Now that we have seen that happiness levels have been quite consistent globally, let’s narrow our analysis down to a single country to see if this trend holds.
Our country of choice will be Spain. We will look at four factors of interest: Life Ladder, Log GDP per capita, Healthy life expectancy at birth, and Social support, and analyze changes in each at 5-year intervals (2008, 2013, 2018 and 2023).
# Spain only, four snapshot years, reshaped to long format for faceting
Spain <- WH_data_raw %>%
  na.omit() %>%
  select(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`, `year`, `Country name`) %>%
  filter(`Country name` == "Spain", year %in% c(2008, 2013, 2018, 2023)) %>%
  pivot_longer(cols = c(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`),
               names_to = "Variable", values_to = "Value")

ggplot(Spain, aes(x = factor(year), y = Value)) +
  geom_col(aes(fill = factor(year))) +
  facet_wrap(. ~ Variable, scales = "free_y") +
  labs(x = "Year", y = "Value", title = "Comparison of Variables in Spain by Year") +
  theme_minimal()
At the global level, happiness levels have changed only slightly over the period covered. Even in our single selected country, Spain, not much change has occurred. With this in mind, we can dig a level deeper and compare multiple countries to see any apparent changes in happiness levels over time.
Let’s take a step closer and look at changes in happiness levels for six countries of interest: Spain (where I am currently studying), the United States (where I am from), Afghanistan and Togo (lowest average Life Ladder among countries with complete data), and Denmark and Finland (highest average Life Ladder).
# Complete rows for the six countries of interest, excluding 2005
six <- WH_data_raw %>%
  filter(year != 2005) %>%
  na.omit() %>%
  filter(`Country name` %in% c("United States", "Togo", "Denmark", "Finland", "Spain", "Afghanistan"))

# Per-country mean of every remaining (numeric) column
six_average <- six %>%
  group_by(`Country name`) %>%
  summarise_all(mean)

ggplot(six_average, aes(x = `Country name`, y = `Life Ladder`)) +
  geom_col(aes(fill = `Country name`)) +
  labs(title = "Life Ladder of Six Countries of Interest",
       x = "Country",
       y = "Happiness score") +
  theme_minimal()
Here is a general breakdown of average happiness scores by country. Next, let’s look at how they change over the years.
ggplot(six, aes(x = year, y = `Life Ladder`)) +
  # linewidth replaces the size aesthetic for lines, deprecated in ggplot2 3.4.0
  geom_line(aes(color = `Country name`), linewidth = 1) +
  geom_point() +
  labs(title = "Happiness Levels of Six Countries over the Years",
       x = "Year",
       y = "Happiness Levels") +
  theme_minimal()
One thing to point out in this graph is the volatility of happiness levels. For the countries higher up in the graph, happiness levels appear rather stable over the period. Conversely, the countries with lower average happiness scores show rather sharp changes from year to year.
Let’s see if this apparent variability is present across the board for other variables in our countries.
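The visual impression of volatility can be put into a number by computing, for each country, the standard deviation of its yearly `Life Ladder` scores. The sketch below uses a made-up data frame `toy_six` with two fictional countries, so its figures say nothing about the real data; only the column names follow this report.

```r
# Made-up yearly scores for two fictional countries (NOT the report data).
toy_six <- data.frame(
  `Country name` = rep(c("Stableland", "Swingland"), each = 4),
  year = rep(c(2008, 2013, 2018, 2023), times = 2),
  `Life Ladder` = c(7.5, 7.6, 7.5, 7.6, 3.0, 4.2, 2.6, 3.8),
  check.names = FALSE
)

# Standard deviation of Life Ladder within each country.
volatility <- tapply(toy_six[["Life Ladder"]], toy_six[["Country name"]], sd)
print(round(volatility, 3))
```

On the real subset, the same call with `six` in place of `toy_six` would give one value per country, which could then be compared against each country's mean score.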
# Five variables at four snapshot years, reshaped to long format for faceting
six_updated <- six %>%
  select(year, `Country name`, `Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`, `Perceptions of corruption`) %>%
  filter(year %in% c(2008, 2013, 2018, 2023)) %>%
  pivot_longer(cols = c(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`, `Perceptions of corruption`),
               names_to = "Variable", values_to = "Value")

ggplot(six_updated, aes(x = `Country name`, y = Value)) +
  geom_boxplot(aes(fill = `Country name`)) +
  facet_wrap(. ~ Variable, scales = "free_y") +
  labs(x = "", y = "Value", title = "Comparison of Variables across Six Countries") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90))
We can see that although globally there isn’t much year-to-year variability in happiness levels, variability does appear to be related to a country’s average score: countries that rate their happiness lower on average show more year-to-year variability than those that rate it higher.