Introduction:

The World Happiness Report (https://worldhappiness.report/ed/2019/) is a study that analyzed the happiness of each country in the world, and ranked them based on multiple factors such as income, social support, healthy life expectancy, freedom, corruption, and generosity. This specific one was undertaken in 2019, in which Finland was the top-ranked country, followed by Denmark, Norway, Iceland, and the Netherlands. The report is supported by the Sustainable Development Solutions Network (SDSN), which is a global initiative that promotes sustainable development via the mobilization of scientific and technological expertise. I want to understand the change in certain variables, and how that affects ranking amongst happiness across the different countries.

#Calling libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)

#Calling data
h2019 <- read.csv("happiness2019(1).csv")
head(h2019)
##   Overall.rank Country.or.region Score GDP.per.capita Social.support
## 1            1           Finland 7.769          1.340          1.587
## 2            2           Denmark 7.600          1.383          1.573
## 3            3            Norway 7.554          1.488          1.582
## 4            4           Iceland 7.494          1.380          1.624
## 5            5       Netherlands 7.488          1.396          1.522
## 6            6       Switzerland 7.480          1.452          1.526
##   Healthy.life.expectancy Freedom.to.make.life.choices Generosity
## 1                   0.986                        0.596      0.153
## 2                   0.996                        0.592      0.252
## 3                   1.028                        0.603      0.271
## 4                   1.026                        0.591      0.354
## 5                   0.999                        0.557      0.322
## 6                   1.052                        0.572      0.263
##   Perceptions.of.corruption
## 1                     0.393
## 2                     0.410
## 3                     0.341
## 4                     0.118
## 5                     0.298
## 6                     0.343
# Fixing names
names(h2019) <- tolower(names(h2019))
names(h2019) <- gsub("\\.", "_", names(h2019)) 

#Data preview
head(h2019)
##   overall_rank country_or_region score gdp_per_capita social_support
## 1            1           Finland 7.769          1.340          1.587
## 2            2           Denmark 7.600          1.383          1.573
## 3            3            Norway 7.554          1.488          1.582
## 4            4           Iceland 7.494          1.380          1.624
## 5            5       Netherlands 7.488          1.396          1.522
## 6            6       Switzerland 7.480          1.452          1.526
##   healthy_life_expectancy freedom_to_make_life_choices generosity
## 1                   0.986                        0.596      0.153
## 2                   0.996                        0.592      0.252
## 3                   1.028                        0.603      0.271
## 4                   1.026                        0.591      0.354
## 5                   0.999                        0.557      0.322
## 6                   1.052                        0.572      0.263
##   perceptions_of_corruption
## 1                     0.393
## 2                     0.410
## 3                     0.341
## 4                     0.118
## 5                     0.298
## 6                     0.343
#Selecting variables
h2019_select <- h2019 |>
  select(score, generosity, social_support, healthy_life_expectancy, freedom_to_make_life_choices, gdp_per_capita, country_or_region, overall_rank) |>
  drop_na()
#Fitting the PCA
pca_fit <- h2019_select |> 
  select(where(is.numeric)) |> 
  scale() |>                   
  prcomp()  

pca_fit
## Standard deviations (1, .., p=7):
## [1] 2.1258351 1.0890760 0.7477258 0.5363595 0.5280881 0.3987725 0.1003317
## 
## Rotation (n x k) = (7 x 7):
##                                      PC1         PC2          PC3         PC4
## score                        -0.44926715  0.04567267 -0.006484705 -0.06026107
## generosity                   -0.01919916  0.84067922 -0.529610020  0.08410161
## social_support               -0.40793159 -0.10884585 -0.004036933  0.87381899
## healthy_life_expectancy      -0.41455725 -0.13754991 -0.262930741 -0.40220565
## freedom_to_make_life_choices -0.28654886  0.47450189  0.775973661 -0.13025078
## gdp_per_capita               -0.41989547 -0.18704397 -0.219012214 -0.18913069
## overall_rank                  0.44844406 -0.01635733  0.014860400  0.10597562
##                                     PC5         PC6         PC7
## score                        -0.5321077  0.03804883 -0.71261020
## generosity                    0.0593037 -0.03889216  0.01733370
## social_support                0.2066825  0.12099500  0.02847880
## healthy_life_expectancy       0.4453037  0.61636965 -0.01065141
## freedom_to_make_life_choices  0.2688316 -0.03482139  0.01242415
## gdp_per_capita                0.3233112 -0.77478380 -0.01206284
## overall_rank                  0.5436276 -0.03139890 -0.70047261
#Scatterplot graph of PCA 1 & 2
pca_fit |>
  augment(h2019_select) |>
  ggplot(aes(.fittedPC1, .fittedPC2)) +
  geom_point(aes(size = overall_rank),alpha = 0.35, color = "palevioletred4") +
  theme_light() +
  xlab ("PC 1") +
  ylab ("PC 2") +
  geom_segment(aes(x = -3.7, y = 0, xend = 5.4, yend = 0), 
               arrow = arrow(type = "closed", length = unit(0.1, "inches")),
               color = "black") +  
  geom_segment(aes(x = 0.85, y = -3.2, xend = 0.85, yend = 4.15), 
               arrow = arrow(type = "closed", length = unit(0.1, "inches")),
               color = "black") +  
  geom_text(aes(x = 1.3, y = -3, label = "PC1"), 
            vjust = -0.5, color = "black") +  
  geom_text(aes(x = -3, y = -1.5, label = "PC2"), 
            hjust = -0.5, color = "black")

Higher overall rank is positive on PC1, opposite for lower ranks. Majoirty points are condensed closer to positive portion of PC2.

#Rotation Arrow graph of the PCA
arrow_style <- arrow(
  angle = 20, length = grid::unit(9, "pt"),
  ends = "first", type = "closed"
)
pca_fit |>
  tidy(matrix = "rotation") |>
  pivot_wider(
    names_from = "PC", values_from = "value",
    names_prefix = "PC"
  ) |>
  ggplot(aes(PC1, PC2)) +
  geom_segment(
    xend = 0, yend = 0,
    arrow = arrow_style
  ) +
  geom_text(aes(label = column), hjust = 1) +
   xlim(-0.75, 0.5) + ylim(-0.3, 1)

PCA Analysis

PC1 seems to be representative of rankings- the more positive the points are, the higher ranked they seem to be. There is no specific visually obtainable reason as to what PC2 represents, but it seems as if a lot of the lower ranked countries lean to the positive side of it, whereas higher ranks float between the two.

Social support, healthy life expectancy, and gdp per capita are all relevantly close and therefor correlated, while freedom to make choices branches off slightly and generosity follows suit, far more significant

#PCA Variance bar plot
pca_fit |>
  tidy(matrix = "eigenvalues") |>
  ggplot(aes(PC, percent)) + 
  geom_col(fill= "hotpink3") + 
  scale_x_continuous(breaks = 1:6) +
  scale_y_continuous(
    name = "variance explained",
    breaks = seq(0, 1, by = 0.1), 
    label = scales::label_percent(accuracy = 1)
  )+
  xlab("Principal Component (PC)") +
  theme_minimal()

#PCA Variance calculation
pca_fit |>
  tidy(matrix = "eigenvalues")
## # A tibble: 7 × 4
##      PC std.dev percent cumulative
##   <dbl>   <dbl>   <dbl>      <dbl>
## 1     1   2.13  0.646        0.646
## 2     2   1.09  0.169        0.815
## 3     3   0.748 0.0799       0.895
## 4     4   0.536 0.0411       0.936
## 5     5   0.528 0.0398       0.976
## 6     6   0.399 0.0227       0.999
## 7     7   0.100 0.00144      1

PCA Variation

About 81.5% of variance can be explained, as per the PC2 cumulative. PC 1 can express 64.56% alone, and PC 2 can do about 16.94%.