---
title: "Happiness in 2019"
author: "Haley Karp"
date: "3/4/2021"
output: html_document
---

World Happiness Report

The World Happiness Report is a survey-based report first published in 2012, with editions through 2019. I chose to analyze the most recent of these, the 2019 report. The rankings and data come from the Gallup World Poll. Scores are based on the Cantril Ladder, on which the best possible life is a 10 and the worst possible life is a 0. One interesting note from the report: “Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as ‘Dystopia,’ in contrast to Utopia.”

Load needed libraries

# Make sure all packages are installed before loading them
library(treemap)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(RColorBrewer)
library(ggplot2)
library(tibble)

Load data file and print contents

Read the CSV file containing the 2019 World Happiness Report data and assign it to the variable happy2019.

# Machine-specific folder; change this path to wherever the CSV is saved
setwd("C:/Users/Haley/Desktop/dataset")
happy2019 <- read_csv("2019.csv - Sheet1")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   `Overall rank` = col_double(),
##   `Country or region` = col_character(),
##   Score = col_double(),
##   `GDP per capita` = col_double(),
##   `Social support` = col_double(),
##   `Healthy life expectancy` = col_double(),
##   `Freedom to make life choices` = col_double(),
##   Generosity = col_double(),
##   `Perceptions of corruption` = col_double()
## )

Summary of Data

summary(happy2019)

##   Overall rank    Country or region      Score       GDP per capita  
##  Min.   :  1.00   Length:156         Min.   :2.853   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.545   1st Qu.:0.6028  
##  Median : 78.50   Mode  :character   Median :5.380   Median :0.9600  
##  Mean   : 78.50                      Mean   :5.407   Mean   :0.9051  
##  3rd Qu.:117.25                      3rd Qu.:6.184   3rd Qu.:1.2325  
##  Max.   :156.00                      Max.   :7.769   Max.   :1.6840  
##  Social support  Healthy life expectancy Freedom to make life choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.056   1st Qu.:0.5477          1st Qu.:0.3080              
##  Median :1.272   Median :0.7890          Median :0.4170              
##  Mean   :1.209   Mean   :0.7252          Mean   :0.3926              
##  3rd Qu.:1.452   3rd Qu.:0.8818          3rd Qu.:0.5072              
##  Max.   :1.624   Max.   :1.1410          Max.   :0.6310              
##    Generosity     Perceptions of corruption
##  Min.   :0.0000   Min.   :0.0000           
##  1st Qu.:0.1087   1st Qu.:0.0470           
##  Median :0.1775   Median :0.0855           
##  Mean   :0.1848   Mean   :0.1106           
##  3rd Qu.:0.2482   3rd Qu.:0.1412           
##  Max.   :0.5660   Max.   :0.4530

Initial cleaning of variable names

# Clean the column names of happy2019:
# convert them to lowercase, then replace spaces with underscores
names(happy2019) <- tolower(names(happy2019))
names(happy2019) <- gsub(" ", "_", names(happy2019))
head(happy2019)

## # A tibble: 6 x 9
##   overall_rank country_or_region score gdp_per_capita social_support
##          <dbl> <chr>             <dbl>          <dbl>          <dbl>
## 1            1 Finland            7.77           1.34           1.59
## 2            2 Denmark            7.6            1.38           1.57
## 3            3 Norway             7.55           1.49           1.58
## 4            4 Iceland            7.49           1.38           1.62
## 5            5 Netherlands        7.49           1.40           1.52
## 6            6 Switzerland        7.48           1.45           1.53
## # ... with 4 more variables: healthy_life_expectancy <dbl>,
## #   freedom_to_make_life_choices <dbl>, generosity <dbl>,
## #   perceptions_of_corruption <dbl>

# Display the column names of the dataset
names(happy2019)

## [1] "overall_rank"                 "country_or_region"           
## [3] "score"                        "gdp_per_capita"              
## [5] "social_support"               "healthy_life_expectancy"     
## [7] "freedom_to_make_life_choices" "generosity"                  
## [9] "perceptions_of_corruption"
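The two renaming steps above can be checked without the CSV by applying them to the original column names listed in the column specification. This is a minimal base R sketch; the helper `clean_names_simple()` is a hypothetical name used only for illustration (it is not the janitor package's `clean_names()`).

```r
# Hypothetical helper: lowercase the names, then replace every space with "_"
clean_names_simple <- function(x) {
  gsub(" ", "_", tolower(x))
}

# Original column names, as reported by read_csv()'s column specification
original_names <- c("Overall rank", "Country or region", "Score",
                    "GDP per capita", "Social support",
                    "Healthy life expectancy",
                    "Freedom to make life choices", "Generosity",
                    "Perceptions of corruption")

cleaned_names <- clean_names_simple(original_names)
print(cleaned_names)
```

Because `gsub()` replaces every match rather than only the first, multi-word names such as "Freedom to make life choices" come out fully joined with underscores.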

Create a treemap

# Treemap of happy2019: one tile per country or region, sized by score and colored by overall rank
treemap(happy2019, index = "country_or_region", vSize = "score",  vColor = "overall_rank", type = "manual", palette = "RdYlBu")

Treemap: Since this is my first time using R, I have been interested in trying a variety of visual displays. This treemap shows the names of the countries or regions in the dataset, progressing from the “happiest” places on the left to the least happy places on the right. I like how you can see all the different countries and regions in one visual, although some labels are harder to read than others. For a future version, I would like to try grouping the countries by score band (0-1, 1-2, and so on). Once these groups were created, I could use a treemap to compare the size of the highest-scoring group with the lowest-scoring group. Looking closely at the data, I found it interesting that the top-ranked places for happiness were Finland, Denmark, Norway, Iceland, the Netherlands, Switzerland, and Sweden. At the opposite end, Malawi, Yemen, Rwanda, and Tanzania are near the bottom, with Afghanistan, the Central African Republic, and South Sudan ranked lowest. With this information, it would be interesting to know specific statistics about these places to determine why they are viewed as happy or not. A researcher could then examine the categories numerically, along with population income, government regulations, and other factors that could explain why these places rank where they do.
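The score-band grouping described above could be sketched with base R's `cut()`, which assigns each score to a one-point interval. This is an illustration under assumptions: the six rows below are copied (rounded as printed) from the tables later in this report rather than the full 156-row dataset, the column name `score_band` is hypothetical, and the nested treemap only runs if the treemap package is installed.

```r
# A few rows copied from the tables printed in this report (not the full data)
sample_scores <- data.frame(
  country_or_region = c("Finland", "Denmark", "Costa Rica",
                        "Botswana", "Afghanistan", "South Sudan"),
  score = c(7.77, 7.6, 7.17, 3.49, 3.20, 2.85)
)

# Assign each score to a one-point band: [0,1), [1,2), ..., [9,10],
# then drop bands that contain no countries
sample_scores$score_band <- droplevels(cut(sample_scores$score,
                                           breaks = 0:10,
                                           right = FALSE,
                                           include.lowest = TRUE))

# Number of countries in each band
band_counts <- table(sample_scores$score_band)
print(band_counts)

# Nested treemap: one rectangle per band, subdivided into its countries
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(sample_scores,
                   index = c("score_band", "country_or_region"),
                   vSize = "score",
                   type = "index")
}
```

Run on happy2019 itself, the same `cut()` call would place every country in a band, and the two-level `index` lets the treemap show each band's total size alongside its member countries.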

Format the dataframe as a matrix

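# Create the data for the heatmap
# Sort ascending by score so the least happy countries come first
happy2019 <- happy2019[order(happy2019$score), ]

# Keep columns 2 to 8 (country_or_region through generosity);
# this drops overall_rank and perceptions_of_corruption
happy2019 <- happy2019[, 2:8]

# Tibbles do not keep row names, and data.matrix() would turn the character
# country column into integer codes, so convert only the numeric columns
# and label the matrix rows with the country names
happy2019_matrix <- data.matrix(happy2019[, -1])
rownames(happy2019_matrix) <- happy2019$country_or_region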

#?heatmap
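Two behaviors matter when turning a tibble into a matrix for `heatmap()`: `data.matrix()` converts character columns to integer factor codes, and tibbles do not keep row names. The sketch below uses two rows copied (rounded) from the printed output to show the difference between converting the whole table and converting only the numeric columns and then attaching the country names as matrix row names. The object names `toy`, `whole`, and `numeric_only` are illustrative only.

```r
# Two rows copied from the printed output of this report (rounded)
toy <- data.frame(
  country_or_region = c("Finland", "Denmark"),
  score = c(7.77, 7.60),
  gdp_per_capita = c(1.34, 1.38)
)

# Converting everything: the country column becomes integer factor codes
# (levels are sorted alphabetically, so Denmark = 1 and Finland = 2)
whole <- data.matrix(toy)
print(whole)

# Converting only the numeric columns and labeling rows with country names
numeric_only <- data.matrix(toy[, -1])
rownames(numeric_only) <- toy$country_or_region
print(numeric_only)
```

With the second form, `heatmap()` labels each row with its country instead of a row number, and no alphabetical-code column is mixed in with the happiness factors.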

Create a heatmap

happy2019_heatmap <- heatmap(happy2019_matrix, Rowv = NA, Colv= NA, col = cm.colors(243), scale = "column", margins = c(5,10),  ylab = "Country or Region", main= "Overall Happiness Ranking per Country or Region", cexCol = .5)

This is one of my favorite visuals because it shows where each country stands on every category in comparison to the others. Because the rows are sorted by score, the score column acts as the “baseline” for reading the rest of the map. Two details matter when reading the colors: cm.colors() runs from cyan (low values) through white to magenta (high values), and heatmap() draws the first row of the matrix at the bottom, so with rows sorted by ascending score the happiest countries sit at the top and the least happy at the bottom. The top of the map has a lot of pink, which makes sense because countries with high scores are also more likely to have high social support, high healthy life expectancy, and so on, while the rows near the bottom are mostly blue, following the low end of the score column. One important note that will become clearer later on is that the generosity column is sporadic and does not show any clear relationship with the scores.

plot1 <- happy2019 %>%
  ggplot(aes(score, healthy_life_expectancy))+
  geom_point()+
  labs (x = "Score", y = "Healthy Life Expectancy", title = "Score vs. Healthy Life Expectancy")
plot1

Plot 1

Looking further into the data, I created a scatterplot comparing score with healthy life expectancy. There is a general positive linear relationship, which further supports the idea that countries or regions with a higher healthy life expectancy tend to have a higher happiness score.

plot2 <- happy2019 %>%
  ggplot(aes(score, social_support))+
  geom_point()+
  labs (x = "Score", y = "Social Support", title = "Score vs. Social Support")
plot2

Plot 2

This plot shows another positive linear relationship: countries or regions with higher social support tend to have higher scores.

plot3 <- happy2019 %>%
  ggplot(aes(score, freedom_to_make_life_choices))+
  geom_point()+
  labs (x = "Score", y = "Freedom to make life choices", title = "Score vs. Freedom")
plot3

Plot 3

Although this graph is more scattered and has a few outliers, I still see a generally positive linear relationship between score and freedom to make life choices.

plot4 <- happy2019 %>%
  ggplot(aes(score, generosity))+
  geom_point()+
  labs (x = "Score", y = "Generosity", title = "Score vs. Generosity")
plot4

Plot 4

This plot was extremely interesting to me because I expected generosity to have the same relationship with score as the other factors. However, I could not identify either a positive or a negative relationship. The data suggest that generosity is not closely associated with a country's happiness score. I found this surprising because I assumed that generosity toward others and the people around you would make a population happier. When the data showed otherwise, I was shocked and wanted to look further into the cultures of different places to see whether people in some places are less likely to interact with others, or vice versa.
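The four scatterplot impressions can be checked numerically with Pearson correlation coefficients, where values near 1 indicate a strong positive linear relationship and values near 0 indicate little linear relationship. This is a hedged sketch: the helper `score_correlations()` is hypothetical and is meant to be called on the happy2019 data as loaded above; no correlation values are claimed here.

```r
# Hypothetical helper: Pearson correlation between score and each factor column.
# Works on a data frame or tibble that has a "score" column plus the named factors.
score_correlations <- function(df,
                               factors = c("healthy_life_expectancy",
                                           "social_support",
                                           "freedom_to_make_life_choices",
                                           "generosity")) {
  # One coefficient per factor; rows with missing values are skipped
  sapply(factors, function(col) cor(df$score, df[[col]], use = "complete.obs"))
}
```

Called as `sort(score_correlations(happy2019), decreasing = TRUE)`, it returns one named coefficient per factor, so a value near zero for generosity would agree with the impression from Plot 4. Correlation only measures linear association, so it cannot by itself show that a factor causes higher happiness.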

happiest <- happy2019 %>%
  filter(score >= 7.0)

happiest [order(happiest$score),]

## # A tibble: 16 x 7
##    country_or_region score gdp_per_capita social_support healthy_life_expectancy
##    <chr>             <dbl>          <dbl>          <dbl>                   <dbl>
##  1 Ireland            7.02           1.50           1.55                   0.999
##  2 United Kingdom     7.05           1.33           1.54                   0.996
##  3 Luxembourg         7.09           1.61           1.48                   1.01 
##  4 Israel             7.14           1.28           1.46                   1.03 
##  5 Costa Rica         7.17           1.03           1.44                   0.963
##  6 Australia          7.23           1.37           1.55                   1.04 
##  7 Austria            7.25           1.38           1.48                   1.02 
##  8 Canada             7.28           1.36           1.50                   1.04 
##  9 New Zealand        7.31           1.30           1.56                   1.03 
## 10 Sweden             7.34           1.39           1.49                   1.01 
## 11 Switzerland        7.48           1.45           1.53                   1.05 
## 12 Netherlands        7.49           1.40           1.52                   0.999
## 13 Iceland            7.49           1.38           1.62                   1.03 
## 14 Norway             7.55           1.49           1.58                   1.03 
## 15 Denmark            7.6            1.38           1.57                   0.996
## 16 Finland            7.77           1.34           1.59                   0.986
## # ... with 2 more variables: freedom_to_make_life_choices <dbl>,
## #   generosity <dbl>

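# Build the matrix from the numeric columns score through
# freedom_to_make_life_choices; tibbles do not keep row names,
# so label the matrix rows with the country names directly
happiest_matrix <- data.matrix(happiest[, 2:6])
rownames(happiest_matrix) <- happiest$country_or_region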

happiest_heatmap <- heatmap(happiest_matrix, Rowv = NA, Colv= NA, col = cm.colors(243), scale = "column", margins = c(5,10),  ylab = "Country or Region", main= "Overall Happiness Ranking per Country or Region", cexCol = .5)

Heatmap 2

My goal for this heatmap was to show the top-scoring countries and regions: every place with a score of 7 or higher is included. What surprised me is how many shades of pink appear. However, it is important to note that this heatmap includes only the “happiest” countries and regions, and because scale = "column" rescales each column within this subset, the colors compare these countries only with one another, not with the full dataset.
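With scale = "column", heatmap() centers each column on its mean and divides by its standard deviation before choosing colors, which is the same transformation base R's `scale()` applies. The sketch below applies it to the 16 scores printed for the happiest group (rounded as printed) to show that the colors are relative within the subset: even Ireland, at 7.02, ends up with a negative value, toward the cyan end of the palette. The names `happiest_scores` and `z` are illustrative only.

```r
# Scores of the 16 happiest countries, rounded as printed in this report
happiest_scores <- c(Ireland = 7.02, "United Kingdom" = 7.05, Luxembourg = 7.09,
                     Israel = 7.14, "Costa Rica" = 7.17, Australia = 7.23,
                     Austria = 7.25, Canada = 7.28, "New Zealand" = 7.31,
                     Sweden = 7.34, Switzerland = 7.48, Netherlands = 7.49,
                     Iceland = 7.49, Norway = 7.55, Denmark = 7.6, Finland = 7.77)

# Same transformation as scale = "column": subtract the mean, divide by the SD
z <- as.vector(scale(happiest_scores))
names(z) <- names(happiest_scores)
print(round(z, 2))
```

Scores above this group's own mean map toward magenta and scores below it toward cyan, even though every country here scores above 7.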

nothappiest <- happy2019 %>%
  filter(score <= 4.0)

nothappiest [order(nothappiest$score),]

## # A tibble: 16 x 7
##    country_or_region     score gdp_per_capita social_support healthy_life_expec~
##    <chr>                 <dbl>          <dbl>          <dbl>               <dbl>
##  1 South Sudan            2.85          0.306          0.575               0.295
##  2 Central African Repu~  3.08          0.026          0                   0.105
##  3 Afghanistan            3.20          0.35           0.517               0.361
##  4 Tanzania               3.23          0.476          0.885               0.499
##  5 Rwanda                 3.33          0.359          0.711               0.614
##  6 Yemen                  3.38          0.287          1.16                0.463
##  7 Malawi                 3.41          0.191          0.56                0.495
##  8 Syria                  3.46          0.619          0.378               0.44 
##  9 Botswana               3.49          1.04           1.14                0.538
## 10 Haiti                  3.60          0.323          0.688               0.449
## 11 Zimbabwe               3.66          0.366          1.11                0.433
## 12 Burundi                3.78          0.046          0.447               0.38 
## 13 Lesotho                3.80          0.489          1.17                0.168
## 14 Madagascar             3.93          0.274          0.916               0.555
## 15 Comoros                3.97          0.274          0.757               0.505
## 16 Liberia                3.98          0.073          0.922               0.443
## # ... with 2 more variables: freedom_to_make_life_choices <dbl>,
## #   generosity <dbl>

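# Build the matrix from the numeric columns score through
# freedom_to_make_life_choices; tibbles do not keep row names,
# so label the matrix rows with the country names directly
nothappiest_matrix <- data.matrix(nothappiest[, 2:6])
rownames(nothappiest_matrix) <- nothappiest$country_or_region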

nothappiest_heatmap <- heatmap(nothappiest_matrix, Rowv = NA, Colv= NA, col = cm.colors(150), scale = "column", margins = c(5,10),  ylab = "Country or Region", main= "Overall Happiness Ranking per Country or Region", cexCol = .5)

Heatmap 3

In this next visual, I isolated the least happy countries and regions, those with a score of 4.0 or lower, to compare their categories with their scores. A maximum score of 4.0 is considerably low compared with the scores of 7.0 and higher in the happiest group.

barplot(happy2019$score, names.arg = happy2019$country_or_region, las = 2, cex.names = 0.6, main = "Country or Region and Happiness Score")

I know this barplot is overwhelming, and I would like to take more time to expand my coding skills so I can make it more aesthetically appealing. However, I think it pairs nicely with the other general maps because you can see each country by name along with where it falls on the score scale.
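One way to make the full barplot easier to read, offered as a sketch rather than as the method used above, is to draw horizontal bars sorted by score so the country names run down the side. The helper `plot_scores_horizontal()` is hypothetical; it uses only standard base graphics arguments (`horiz`, `las`, `cex.names`).

```r
# Hypothetical helper: horizontal bars sorted so the highest score is at the top
plot_scores_horizontal <- function(scores, labels, cex_names = 0.5) {
  ord <- order(scores)                  # ascending: the first bar is drawn at the bottom
  old_par <- par(mar = c(4, 9, 3, 1))   # widen the left margin for country names
  on.exit(par(old_par))                 # restore the previous margins afterwards
  barplot(scores[ord],
          names.arg = labels[ord],
          horiz = TRUE,
          las = 1,                      # keep the axis labels horizontal
          cex.names = cex_names,
          xlab = "Score",
          main = "Country or Region and Happiness Score")
}
```

It would be called as `plot_scores_horizontal(happy2019$score, happy2019$country_or_region)`, ideally on a tall output device (for example a tall `fig.height` in the R Markdown chunk) so that 156 labels have room.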

barplot(happiest$score, names.arg = happiest$social_support, las = 2, xlab = "Social Support", ylab = "Score", col = "lightblue", main = "Happiest vs Social Support")

barplot(nothappiest$score, names.arg = nothappiest$social_support, las = 2, xlab = "Social Support", ylab = "Score", col = "lightpink", main = "Not Happiest vs Social Support")

barplot(happiest$score, names.arg = happiest$healthy_life_expectancy, las = 2, xlab = "Healthy Life Expectancy", ylab = "Score", col = "lightblue", main = "Happiest Life Expectancy vs Score")

barplot(nothappiest$score, names.arg = nothappiest$healthy_life_expectancy, las = 2, xlab = "Healthy Life Expectancy", ylab = "Score", col = "lightpink", main = "Not Happiest Life Expectancy vs Score")
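The four barplots compare the two groups one bar at a time; a compact complement is to compare group means. The sketch below uses the social support and healthy life expectancy values printed for the 16 happiest and 16 least happy countries (rounded as printed); it illustrates the calculation and is not a substitute for running the same aggregation on the full happiest and nothappiest tibbles. The data frame name `groups` is illustrative only.

```r
# Values copied from the two tables printed above (rounded as printed)
groups <- data.frame(
  group = rep(c("happiest", "least happy"), each = 16),
  social_support = c(1.55, 1.54, 1.48, 1.46, 1.44, 1.55, 1.48, 1.50,
                     1.56, 1.49, 1.53, 1.52, 1.62, 1.58, 1.57, 1.59,
                     0.575, 0, 0.517, 0.885, 0.711, 1.16, 0.56, 0.378,
                     1.14, 0.688, 1.11, 0.447, 1.17, 0.916, 0.757, 0.922),
  healthy_life_expectancy = c(0.999, 0.996, 1.01, 1.03, 0.963, 1.04, 1.02, 1.04,
                              1.03, 1.01, 1.05, 0.999, 1.03, 1.03, 0.996, 0.986,
                              0.295, 0.105, 0.361, 0.499, 0.614, 0.463, 0.495, 0.44,
                              0.538, 0.449, 0.433, 0.38, 0.168, 0.555, 0.505, 0.443)
)

# Mean of each factor within each group
group_means <- aggregate(cbind(social_support, healthy_life_expectancy) ~ group,
                         data = groups, FUN = mean)
print(group_means)
```

A two-row table like this states the gap between the groups directly, which the bars labeled with individual social support values only show indirectly.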

Summary

When I first saw “world happiness” as a dataset, I was extremely interested to see what it was about, and since we were on a bit of a time crunch, I decided to use it. The one thing I wish the dataset's documentation included was how these statistics were determined; more specifically, I’d love to know how the pollsters asked their questions. I also would have loved to see data on the people who were asked: whether they were primarily male or female, older or younger, and how long each person had actually lived there. Questions like these about the population could open up an entirely new body of information to collect from this dataset. Although much of the data showed a positive relationship between score and social support, healthy life expectancy, and other factors, it was interesting to see which countries were considered the happiest. Finland, the #1 “happiest country,” has a GDP per capita value of about 1.34, social support among the highest at 1.587, a healthy life expectancy value of 0.986, high freedom to make life choices at 0.596, generosity below the dataset mean at 0.153, and a perceptions of corruption value of 0.393, well above the dataset mean of 0.1106. Compare these values with the lowest-scoring place, South Sudan: a score of 2.853, GDP per capita of 0.306, social support of 0.575, healthy life expectancy of 0.295, freedom to make life choices of 0.010, generosity of 0.202, and perceptions of corruption of 0.091. The general pattern I took away is that the least happy places, most of them in Africa (12 of the 16 countries with a score of 4.0 or lower), also rate low on most of the variables associated with happier places. I also came across a much more advanced visual analysis of this dataset that intrigued me, but I need more time to practice and understand my coding first.
Website: https://web.stanford.edu/~kjytay/courses/stats32-aut2018/projects/world_happiness_analysis-1.html
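The Finland versus South Sudan comparison in the summary can be laid out as a small subtraction, using the values quoted there (Finland's GDP per capita value of 1.34 is taken from the head() output earlier in the report). This is only arithmetic on those quoted numbers; sorting the gaps shows which factors differ most between the top- and bottom-ranked countries. The object names are illustrative only.

```r
factors <- c("gdp_per_capita", "social_support", "healthy_life_expectancy",
             "freedom_to_make_life_choices", "generosity",
             "perceptions_of_corruption")

# Values quoted in the summary for the top- and bottom-ranked countries
finland     <- c(1.34, 1.587, 0.986, 0.596, 0.153, 0.393)
south_sudan <- c(0.306, 0.575, 0.295, 0.010, 0.202, 0.091)
names(finland) <- factors
names(south_sudan) <- factors

# Positive gap: Finland is higher; negative gap: South Sudan is higher
gap <- sort(finland - south_sudan, decreasing = TRUE)
print(round(gap, 3))
```

Generosity is the only factor where South Sudan's value is higher, which fits the earlier impression from Plot 4 that generosity does not track the happiness score.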