Introduction

Context

The World Happiness Report is a landmark survey of the state of global happiness. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

Goal

Our goal in this report is to analyze variation in reported national happiness levels. We will check whether happiness levels correlate with the other variables in the data set, analyze how happiness levels have changed over the years, and compare happiness levels between countries.

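library(tidyverse)  # read_csv(), dplyr verbs, pivot_longer(), ggplot2
library(corrplot)   # corrplot() for the correlation heat map
WH_data_raw <- read_csv("World-happiness-report-updated_2024.csv")
summary(WH_data_raw)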

##  Country name            year       Life Ladder    Log GDP per capita
##  Length:2363        Min.   :2005   Min.   :1.281   Min.   : 5.527    
##  Class :character   1st Qu.:2011   1st Qu.:4.647   1st Qu.: 8.507    
##  Mode  :character   Median :2015   Median :5.449   Median : 9.503    
##                     Mean   :2015   Mean   :5.484   Mean   : 9.400    
##                     3rd Qu.:2019   3rd Qu.:6.324   3rd Qu.:10.393    
##                     Max.   :2023   Max.   :8.019   Max.   :11.676    
##                                                    NA's   :28        
##  Social support   Healthy life expectancy at birth Freedom to make life choices
##  Min.   :0.2280   Min.   : 6.72                    Min.   :0.2280              
##  1st Qu.:0.7440   1st Qu.:59.20                    1st Qu.:0.6610              
##  Median :0.8345   Median :65.10                    Median :0.7710              
##  Mean   :0.8094   Mean   :63.40                    Mean   :0.7503              
##  3rd Qu.:0.9040   3rd Qu.:68.55                    3rd Qu.:0.8620              
##  Max.   :0.9870   Max.   :74.60                    Max.   :0.9850              
##  NA's   :13       NA's   :63                       NA's   :36                  
##    Generosity       Perceptions of corruption Positive affect  Negative affect 
##  Min.   :-0.34000   Min.   :0.0350            Min.   :0.1790   Min.   :0.0830  
##  1st Qu.:-0.11200   1st Qu.:0.6870            1st Qu.:0.5720   1st Qu.:0.2090  
##  Median :-0.02200   Median :0.7985            Median :0.6630   Median :0.2620  
##  Mean   : 0.00010   Mean   :0.7440            Mean   :0.6519   Mean   :0.2732  
##  3rd Qu.: 0.09375   3rd Qu.:0.8678            3rd Qu.:0.7370   3rd Qu.:0.3260  
##  Max.   : 0.70000   Max.   :0.9830            Max.   :0.8840   Max.   :0.7050  
##  NA's   :81         NA's   :125               NA's   :24       NA's   :16

Breakdown of Variables

Country name: The name of the country for the given data entry.
Year: The year corresponding to the data entry.
Life Ladder: A measure of overall life satisfaction or happiness in the country.
Log GDP per capita: The logarithm of the country’s GDP per capita, indicating economic strength.
Social support: A measure of the extent to which people have support from family or friends during times of need.
Healthy life expectancy at birth: The number of years a newborn is expected to live in good health.
Freedom to make life choices: A measure of how free people feel to make choices about their lives.
Generosity: The perceived generosity in the country, typically through donations and helping others.
Perceptions of corruption: A measure of the perceived level of corruption in the government and businesses.
Positive affect: The average frequency of positive emotions experienced by individuals.
Negative affect: The average frequency of negative emotions experienced by individuals.

Questions to be Answered

General Correlation between variables
Change in Happiness over time
Country Vs Country

1) Correlation between variables.

With the variables listed, a natural first question is how strongly they correlate with one another.

To find out, we first extract the variables of interest: every column except year and country name. We also drop any rows that contain missing data (NA values) and store the result in a new data set:

Corr_Data <- WH_data_raw %>%
  select(-year, -`Country name`)%>%
  na.omit()
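
Before relying on the filtered data, it can help to check how many rows the NA filter actually discards. The following is a minimal base R sketch of that check; the data frame `toy` and its values are made up for illustration and stand in for WH_data_raw.

```r
# Toy stand-in for WH_data_raw; the values are invented for illustration.
toy <- data.frame(
  life_ladder = c(5.1, 6.3, NA, 4.8, 7.0),
  log_gdp     = c(9.2, NA, 8.1, 8.9, 10.7),
  generosity  = c(0.10, -0.05, 0.02, NA, 0.21)
)

# Missing values per column (the counts summary() reports as NA's).
na_per_column <- colSums(is.na(toy))

# na.omit() keeps only rows with no NA in any column.
complete_rows <- nrow(na.omit(toy))
dropped_rows  <- nrow(toy) - complete_rows

print(na_per_column)
cat("kept", complete_rows, "of", nrow(toy), "rows\n")
```

Because a row is dropped if any single column is missing, the number of discarded rows can be much larger than the NA count of any one column.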

Let's compute the correlation matrix for this data set:

cor_matrix <- cor(Corr_Data, use = "complete.obs")
print(cor_matrix)

##                                  Life Ladder Log GDP per capita Social support
## Life Ladder                        1.0000000         0.78711919     0.72492636
## Log GDP per capita                 0.7871192         1.00000000     0.69899085
## Social support                     0.7249264         0.69899085     1.00000000
## Healthy life expectancy at birth   0.7252396         0.83214199     0.60211977
## Freedom to make life choices       0.5281304         0.34997346     0.39383489
## Generosity                         0.1625579        -0.02495393     0.05582004
## Perceptions of corruption         -0.4515752        -0.35246405    -0.22334152
## Positive affect                    0.5021954         0.22298279     0.42611799
## Negative affect                   -0.3455397        -0.26952643    -0.46178104
##                                  Healthy life expectancy at birth
## Life Ladder                                            0.72523962
## Log GDP per capita                                     0.83214199
## Social support                                         0.60211977
## Healthy life expectancy at birth                       1.00000000
## Freedom to make life choices                           0.36645489
## Generosity                                             0.01164712
## Perceptions of corruption                             -0.30766870
## Positive affect                                        0.21207360
## Negative affect                                       -0.14516448
##                                  Freedom to make life choices  Generosity
## Life Ladder                                         0.5281304  0.16255789
## Log GDP per capita                                  0.3499735 -0.02495393
## Social support                                      0.3938349  0.05582004
## Healthy life expectancy at birth                    0.3664549  0.01164712
## Freedom to make life choices                        1.0000000  0.31295798
## Generosity                                          0.3129580  1.00000000
## Perceptions of corruption                          -0.4741101 -0.27270434
## Positive affect                                     0.5808449  0.30989854
## Negative affect                                    -0.2664939 -0.06939231
##                                  Perceptions of corruption Positive affect
## Life Ladder                                     -0.4515752       0.5021954
## Log GDP per capita                              -0.3524641       0.2229828
## Social support                                  -0.2233415       0.4261180
## Healthy life expectancy at birth                -0.3076687       0.2120736
## Freedom to make life choices                    -0.4741101       0.5808449
## Generosity                                      -0.2727043       0.3098985
## Perceptions of corruption                        1.0000000      -0.2876335
## Positive affect                                 -0.2876335       1.0000000
## Negative affect                                  0.2740036      -0.3279926
##                                  Negative affect
## Life Ladder                          -0.34553967
## Log GDP per capita                   -0.26952643
## Social support                       -0.46178104
## Healthy life expectancy at birth     -0.14516448
## Freedom to make life choices         -0.26649389
## Generosity                           -0.06939231
## Perceptions of corruption             0.27400358
## Positive affect                      -0.32799261
## Negative affect                       1.00000000

A statistician may be able to draw insight from the matrix above, but let's make it easier to digest with a heat map:

corrplot(cor_matrix, method = "color", type = "upper", tl.col = "black",
title = "Correlation Matrix Heatmap", mar = c(0, 0, 2, 0))
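
A heat map is easy to scan, but an exact ranking is easier to read from a sorted column. Here is a small base R sketch that ranks the variables by the strength of their correlation with Life Ladder. The numbers are copied from the printed matrix above; with the real matrix one would start from `cor_matrix[, "Life Ladder"]` and drop the Life Ladder entry itself.

```r
# Correlations with Life Ladder, copied from the printed matrix above.
ladder_cor <- c(
  "Log GDP per capita"               =  0.7871192,
  "Social support"                   =  0.7249264,
  "Healthy life expectancy at birth" =  0.7252396,
  "Freedom to make life choices"     =  0.5281304,
  "Generosity"                       =  0.1625579,
  "Perceptions of corruption"        = -0.4515752,
  "Positive affect"                  =  0.5021954,
  "Negative affect"                  = -0.3455397
)

# Rank by strength of association, ignoring the sign.
ranked <- ladder_cor[order(abs(ladder_cor), decreasing = TRUE)]
print(round(ranked, 2))
```

Sorting on the absolute value keeps strong negative correlations, such as perceptions of corruption, from being pushed to the bottom of the list.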

We can break down this correlation matrix in two ways: (1) by how each variable correlates with Life Ladder (the overall happiness score), and (2) by how each variable correlates with positive and negative affect.

Overall Happiness Levels

From the heat map and the correlation matrix, the factor most strongly correlated with Life Ladder is Log GDP per capita, with a correlation of 0.79; healthy life expectancy and social support come second and third, both at about 0.73. Let's take a closer look at the relationship between happiness and Log GDP with a scatter plot. Each point in the plot represents one country's Life Ladder score in a single year between 2005 and 2023.

ggplot(Corr_Data, aes(x= `Life Ladder`, y = `Log GDP per capita`))+
  geom_point()+
  geom_smooth(color = "green")+
  labs(title = "Happiness Score vs Log GDP per Capita",
         x = "Happiness Score",
         y = "Log GDP per Capita",
         caption = "Source: World Happiness Report 2024") +
  theme_minimal()

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

As expected with a correlation of 0.79, the points cluster fairly tightly around the fitted curve, with few notable outliers.

Next, let's look at the distribution of Log GDP levels with a histogram:

ggplot(Corr_Data, aes(x = `Log GDP per capita`)) + 
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "darkgreen", color = "black", alpha = 0.5) +
  geom_density(color = 'red')+
  labs(title = "Distribution of Log GDP per Capita", x = "Log GDP", y = "Density")+
  theme_minimal()

Interestingly, the distribution of Log GDP per capita is skewed left rather than symmetric: most country-year observations sit at or above the average, while a longer tail of low-end values pulls the mean (9.400 in the summary above) below the median (9.503).
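
The direction of a skew can also be checked numerically instead of by eye. The sketch below compares the mean with the median and computes the sample skewness (the third standardized moment) in base R. The helper `sample_skewness` and the vector `toy_log_gdp` are hypothetical, and the values are invented for illustration rather than taken from the report.

```r
# Sample skewness: the third standardized moment.
# Negative values indicate a longer tail on the low end,
# positive values a longer tail on the high end.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Invented values with a few low-end outliers, for illustration only.
toy_log_gdp <- c(6.0, 7.5, 9.0, 9.4, 9.6, 9.8, 10.0, 10.2, 10.5, 10.8)

skew <- sample_skewness(toy_log_gdp)
cat("mean:", mean(toy_log_gdp),
    " median:", median(toy_log_gdp),
    " skewness:", skew, "\n")
```

A mean below the median together with a negative skewness points to a tail on the low end. Applied to the real data, the same function could be called on the Log GDP per capita column of Corr_Data.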

Positive and Negative affects

Positive affect correlates most strongly with freedom to make life choices (0.58). One might expect freedom to make life choices to show a similarly strong correlation with negative affect in the opposite direction, but social support is in fact the factor most strongly correlated with negative affect (-0.46).

Let's take a deeper look with scatter plots of reported positive and negative affect levels against these two variables:

ggplot(Corr_Data, aes(x = `Freedom to make life choices`, y = `Positive affect`))+
  geom_point()+
  geom_smooth()+
  labs(title = "Freedom of Choice Levels vs Positive Affect Score",
         x = "Freedom of choice",
         y = "Positive Affect Score",
         caption = "Source: World Happiness Report 2024") +
  theme_minimal()

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

ggplot(Corr_Data, aes(x = `Social support`, y = `Negative affect`))+
  geom_point()+
  geom_smooth(color = "red")+
  labs(title = "Social Support Levels vs Negative Affect Score",
         x = "Social support",
         y = "Negative Affect Score",
         caption = "Source: World Happiness Report 2024") +
  theme_minimal()

## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

There is generally more scatter in these relationships than in the GDP plot, as expected from the weaker correlations (0.58 and -0.46 versus 0.79). So what is our conclusion, all things considered?

Correlation Conclusion

GDP per capita is the factor most strongly correlated with the overall happiness score. This makes sense: with a higher GDP, a country can more feasibly provide its citizens with the means to live a higher-quality life, such as better access to health care, cleaner food, and nicer homes.

Although GDP is the strongest single correlate of happiness scores, it is only weakly reflected in the positive or negative emotions reported by a country's citizens (correlations of 0.22 and -0.27). Those are most strongly correlated with freedom to make life choices and social support, respectively. Why might this be? One possible explanation is that freedom to make life choices and social support vary heavily between communities, even within the same nation, far more than GDP does. Variables that are felt closer to home could plausibly have a stronger effect on the positive or negative emotions people report than something like GDP.

2) Change in Happiness Levels

Let's break down global happiness levels over time from 2006 to 2023 and analyze any changes. From here on we omit the 2005 data because a notable amount of it is missing.

Global change in Happiness Levels

m <- WH_data_raw%>%
  filter(year != 2005)%>%
  na.omit()%>%
  group_by(year)%>%
  summarise(avg_happiness = mean(`Life Ladder`))

j <- WH_data_raw%>%
  filter(year != 2005)%>%
  na.omit()%>%
  group_by(year)

ggplot(m, aes(x = year, y = avg_happiness))+
  geom_line(color = "purple")+
  geom_point()+
  labs(title = "Global Happiness Levels over the Years",
       x = "Year",
       y = "Happiness Levels")+
  theme_minimal()

We can see a general rise in world happiness levels over the past several years, with notable dips around 2020 and 2021, possibly caused by the effects of the COVID-19 pandemic.
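
Dips like these can be located programmatically by looking at the change from one year to the next. The following base R sketch shows the idea on a hypothetical table of yearly averages; the years and values are invented for illustration and are not taken from the report. With the real data, the summary table m would take the place of `yearly`.

```r
# Hypothetical yearly averages, invented for illustration.
yearly <- data.frame(
  year          = 2010:2014,
  avg_happiness = c(5.50, 5.57, 5.52, 5.49, 5.60)
)

# Change from the previous year; the first year has no predecessor.
yearly$change <- c(NA, diff(yearly$avg_happiness))

# Years in which the average fell compared with the year before.
dip_years <- yearly$year[!is.na(yearly$change) & yearly$change < 0]

print(yearly)
print(dip_years)
```

Note that `diff()` assumes the rows are sorted by year with no gaps, so the table should be ordered before the change column is computed.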

Let's take a deeper dive into these statistics with a box plot:

ggplot(j, aes(x = year, y = `Life Ladder`))+
  geom_boxplot(aes(group = year, fill = factor(year)))+
  labs(title = "Global Happiness Levels over the Years",
       x = "Year",
       y = "Happiness Levels")+
  theme_minimal()

Globally, the spread of happiness levels has not varied much. Following the medians marked in the middle of each box, we can see that their direction roughly matches the track of the line graph from earlier.

Now that we have seen that happiness levels have been remarkably consistent globally, let's narrow the analysis to a single country to see if this trend holds.

Change in Happiness levels: Spain

Our country of choice is Spain. We will look at four main factors of interest: Life Ladder, Log GDP per capita, healthy life expectancy at birth, and social support, and analyze how each has changed at five-year intervals (2008, 2013, 2018 and 2023).

Spain <- WH_data_raw%>%
  na.omit()%>%
  select(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`,`year`, `Country name`)%>%
  filter(`Country name` == 'Spain', year %in% c(2008, 2013, 2018, 2023))%>%
  pivot_longer(cols = c(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`), names_to = "Variable", values_to = "Value")



ggplot(Spain, aes(x = factor(year), y = Value)) +
  geom_bar(stat = "identity", aes(fill = factor(year))) +
  facet_wrap(~ Variable, scales = "free_y") + 
  labs(x = "Year", y = "Value", title = "Comparison of Variables in Spain by Year") +
  theme_minimal()

Change in Happiness levels Conclusion

At the global level, happiness levels have changed only slightly over the period studied. Even for our single selected country, Spain, not much change has occurred. With this in mind, we can dig a level deeper and compare multiple countries to see any apparent changes in happiness levels over time.

3) Comparing Countries

Let's take a closer look at changes in happiness levels for six countries of interest: Spain (where I am currently studying), the United States (where I am from), Afghanistan and Togo (lowest average Life Ladder among countries with complete data), and Denmark and Finland (highest average Life Ladder).

six <- WH_data_raw%>%
  filter(year != 2005)%>%
  na.omit()%>%
  filter( `Country name` %in% c("United States", "Togo", "Denmark", "Finland", "Spain", "Afghanistan"))

six_average <- six%>%
  group_by(`Country name`)%>%
  summarise_all(mean)


ggplot(six_average, aes(x =`Country name`, y = `Life Ladder`))+
  geom_bar(stat = "identity", aes(fill = `Country name`))+
  labs(title = "Life Ladder of Six countries of interest",
       x = "Country",
       y = "Happiness score")+
  theme_minimal()

Here is a general breakdown of average happiness scores by country. Now let's look at how they change over the years.

ggplot(six, aes(x = year, y = `Life Ladder`))+
  geom_line(aes(color = `Country name`), linewidth = 1)+
  geom_point()+
  labs(title = "Happiness Levels of Six Countries over the Years",
       x = "Year",
       y = "Happiness Levels")+
  theme_minimal()

One thing this graph brings out is the volatility of happiness levels. Countries higher up the graph have had fairly stable happiness levels across the years shown, whereas countries with lower average happiness scores show rather sharp changes.
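
Volatility read off a line graph can be backed with a number. One simple measure, sketched below in base R, is the standard deviation of each country's Life Ladder score across years. The data frame `toy`, the country labels "High" and "Low", and all values are invented for illustration; with the real data, the same `tapply()` calls could be run on the Life Ladder and Country name columns of six.

```r
# Invented scores for two hypothetical countries, for illustration only.
toy <- data.frame(
  country     = rep(c("High", "Low"), each = 5),
  year        = rep(2015:2019, times = 2),
  life_ladder = c(7.5, 7.6, 7.5, 7.7, 7.6,
                  3.9, 3.1, 4.2, 2.8, 3.6)
)

# Standard deviation of the score within each country:
# a simple measure of year-to-year volatility.
volatility <- tapply(toy$life_ladder, toy$country, sd)

# Mean score per country, to set volatility against the overall level.
level <- tapply(toy$life_ladder, toy$country, mean)

print(round(cbind(level, volatility), 2))
```

The standard deviation treats a steady trend and erratic jumps alike, so it is best read alongside the line graph rather than in place of it.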

Let's see if this apparent variability is present across the board for other variables in these countries.

six_updated <- six %>%
  select(year, `Country name`, `Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`, `Perceptions of corruption`)%>%
  filter(year %in% c(2008, 2013, 2018, 2023))%>%
  pivot_longer(cols = c(`Life Ladder`, `Log GDP per capita`, `Healthy life expectancy at birth`, `Social support`, `Perceptions of corruption`), names_to = "Variable", values_to = "Value")


ggplot(six_updated, aes(x = `Country name`, y = Value)) +
  geom_boxplot( aes(fill = `Country name`)) +
  facet_wrap(~ Variable, scales = "free_y") + 
  labs(x = "", y = "Value", title = "Comparison of Variables across Six Countries") +
  theme_minimal()+
  theme(axis.text.x = element_text(angle=90))

Conclusion on Country Vs Country

Although there is little year-to-year variability in happiness levels globally, variability does appear to be related to a country's average happiness level: countries that rate their happiness lower on average show larger year-to-year swings than those that rate it higher.

World Happiness Dataset Analysis

Jeffrey Fernandez

2024-01-24