World Happiness Report

The World Happiness Report is a survey first published in 2012, with editions available up to 2019. I chose to cover the 2019 edition, the most recent of these. The rankings and data are from the Gallup World Poll. Scores are based on the Cantril Ladder, where the best possible life is a 10 and the worst possible life is a 0. Interesting note: “Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as ‘Dystopia,’ in contrast to Utopia.”

Load Needed Libraries

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(treemap)
library(RColorBrewer)
library(highcharter)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use

Load Data File and Display Contents

Read the CSV file containing the 2019 World Happiness Report data and assign it to the variable happiness2019.

setwd("/cloud/project/BaiData101Summer2022")
happiness2019 <- read_csv("happiness2019.csv")

## Rows: 156 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country or region
## dbl (8): Overall rank, Score, GDP per capita, Social support, Healthy life e...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Summary of the 2019 World Happiness dataset

summary(happiness2019)

##   Overall rank    Country or region      Score       GDP per capita  
##  Min.   :  1.00   Length:156         Min.   :2.853   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.545   1st Qu.:0.6028  
##  Median : 78.50   Mode  :character   Median :5.380   Median :0.9600  
##  Mean   : 78.50                      Mean   :5.407   Mean   :0.9051  
##  3rd Qu.:117.25                      3rd Qu.:6.184   3rd Qu.:1.2325  
##  Max.   :156.00                      Max.   :7.769   Max.   :1.6840  
##  Social support  Healthy life expectancy Freedom to make life choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.056   1st Qu.:0.5477          1st Qu.:0.3080              
##  Median :1.272   Median :0.7890          Median :0.4170              
##  Mean   :1.209   Mean   :0.7252          Mean   :0.3926              
##  3rd Qu.:1.452   3rd Qu.:0.8818          3rd Qu.:0.5072              
##  Max.   :1.624   Max.   :1.1410          Max.   :0.6310              
##    Generosity     Perceptions of corruption
##  Min.   :0.0000   Min.   :0.0000           
##  1st Qu.:0.1087   1st Qu.:0.0470           
##  Median :0.1775   Median :0.0855           
##  Mean   :0.1848   Mean   :0.1106           
##  3rd Qu.:0.2482   3rd Qu.:0.1412           
##  Max.   :0.5660   Max.   :0.4530

Initial cleaning of the column names

Convert the column names to lowercase, replace spaces with underscores, and display the first rows of the dataset

# Convert the column names of happiness2019 to lowercase
# Replace spaces with underscores so the names are easier to use in code
names(happiness2019) <- tolower(names(happiness2019))
names(happiness2019) <- gsub(" ","_",names(happiness2019))
head(happiness2019)

## # A tibble: 6 × 9
##   overall_rank country_o…¹ score gdp_p…² socia…³ healt…⁴ freed…⁵ gener…⁶ perce…⁷
##          <dbl> <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1            1 Finland      7.77    1.34    1.59   0.986   0.596   0.153   0.393
## 2            2 Denmark      7.6     1.38    1.57   0.996   0.592   0.252   0.41 
## 3            3 Norway       7.55    1.49    1.58   1.03    0.603   0.271   0.341
## 4            4 Iceland      7.49    1.38    1.62   1.03    0.591   0.354   0.118
## 5            5 Netherlands  7.49    1.40    1.52   0.999   0.557   0.322   0.298
## 6            6 Switzerland  7.48    1.45    1.53   1.05    0.572   0.263   0.343
## # … with abbreviated variable names ¹country_or_region, ²gdp_per_capita,
## #   ³social_support, ⁴healthy_life_expectancy, ⁵freedom_to_make_life_choices,
## #   ⁶generosity, ⁷perceptions_of_corruption

Display the column names of the dataset

names(happiness2019)

## [1] "overall_rank"                 "country_or_region"           
## [3] "score"                        "gdp_per_capita"              
## [5] "social_support"               "healthy_life_expectancy"     
## [7] "freedom_to_make_life_choices" "generosity"                  
## [9] "perceptions_of_corruption"

Definition of Terms

Detailed information about each of the predictors in Table 2.1:

1. GDP per capita is in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars, taken from the World Development Indicators (WDI) released by the World Bank on November 14, 2019. GDP data for 2018 are not yet available, so we extend the GDP time series from 2017 to 2018 using country-specific forecasts of real GDP growth from the OECD Economic Outlook No. 104 (Edition November 2018) and the World Bank’s Global Economic Prospects (Last Updated: 06/07/2019), after adjustment for population growth. The equation uses the natural log of GDP per capita, as this form fits the data significantly better than GDP per capita.

2. The time series of healthy life expectancy at birth are constructed based on data from the World Health Organization (WHO) Global Health Observatory data repository, with data available for 2005, 2010, 2015, and 2019. To match this report’s sample period, interpolation and extrapolation are used. See Statistical Appendix for more details below.

3. Social support is the national average of the binary responses (either 0 or 1) to the Gallup World Poll (GWP) question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

4. Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

5. Generosity is the residual of regressing the national average of GWP responses to the question “Have you donated money to a charity in the past month?” on GDP per capita.

6. Perceptions of corruption are the average of binary answers to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.

7. Positive affect is defined as the average of previous-day affect measures for happiness, laughter, and enjoyment for GWP waves 3-7 (years 2008 to 2012, and some in 2013). It is defined as the average of laughter and enjoyment for other waves where the happiness question was not asked. The general form for the affect questions is: Did you experience the following feelings during a lot of the day yesterday? See pp. 1-2 of Statistical Appendix 1 for more details.

8. Negative affect is defined as the average of previous-day affect measures for worry, sadness, and anger for all waves.

Source: worldhappiness.report

Creating a Scatterplot Showing a Linear Relationship

Plot 1

plot1 <- happiness2019 %>%
  ggplot(aes(healthy_life_expectancy, score))+
  geom_point()+ 
  geom_smooth(method = 'lm') +
  labs (x = "healthy_life_expectancy", y =  "score",  title = "Score vs. Healthy Life Expectancy") 
plot1

## `geom_smooth()` using formula 'y ~ x'

Summary of the above visualization

This is a short summary of how happiness scores relate to healthy life expectancy. I created a general scatterplot comparing the score to the level of healthy life expectancy before showing the treemap of the dataset. There is a general positive linear relationship, which supports the idea that countries or regions with a higher healthy life expectancy tend to have a higher happiness score. In the chart, countries with a score of about 4.5 or above tend to have a high healthy life expectancy.

Creating a treemap

Plot 2

# Create a treemap for happiness2019: tile size is healthy life expectancy and tile color is the happiness score
treemap(happiness2019, index = "country_or_region", vSize = "healthy_life_expectancy",  vColor = "score", type = "manual", palette = "RdYlBu")

During my time using R, I have been really interested in using a variety of visual displays. This treemap displays the names of the countries or regions in the dataset, with a progression from the “happiest” places on the left to the “least happy” places on the right. I like how you can see the different countries and regions on this visual, although some are more difficult to read than others. For a future version, I would like to group the countries by score level (between 0-1, 1-2, and so on). Once these groups were created, I could use a treemap to compare the size of the highest-scoring group to the lowest-scoring group.

Looking closely at the data, I thought it was interesting that the top-ranked places for happiness were Finland, Denmark, Norway, Iceland, Netherlands, Switzerland, and Sweden. On the opposite end, Malawi, Yemen, Rwanda, and Tanzania are among the lowest. With this information, it would be interesting to see specific statistics on these places to determine why they are viewed as happy or not. A researcher could then look at numerical measures such as population income, government regulations, and other factors that could explain why these places rank where they do in this dataset.
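The grouping idea described above could be sketched with base R's `cut()`, which assigns each score to a one-point band. This is only an illustration under stated assumptions: the vector `scores` is a hypothetical hand-typed sample using values printed elsewhere in this report, not the full dataset, and the names `band` and `band_counts` are made up for the example.

```r
# A small sample of scores copied from the tables printed in this report.
scores <- c(Finland = 7.77, Denmark = 7.60, Norway = 7.55,
            Ireland = 7.02, `South Sudan` = 2.853)

# Assign each score to a one-point band: [0,1), [1,2), ..., [9,10].
# right = FALSE closes each band on the left; include.lowest = TRUE
# closes the final band on the right so a score of exactly 10 is kept.
band <- cut(scores, breaks = 0:10, right = FALSE, include.lowest = TRUE)

# Count how many countries fall in each band and show the non-empty ones.
band_counts <- table(band)
band_counts[band_counts > 0]
```

With the full dataset, the same `cut()` call could be applied to `happiness2019$score`, and the resulting counts per band could then be used as the tile sizes of a grouped treemap.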

Plot 3

plot3 <- happiness2019 %>%
  ggplot(aes(x = social_support, y= score ))+
  geom_point()+ 
    geom_smooth(method = 'lm') +
  labs (x = "Social Support", y = "Score", title = "Score vs. Social Support")
plot3

## `geom_smooth()` using formula 'y ~ x'

fit <-lm(score~social_support, data = happiness2019)
summary(fit)

## 
## Call:
## lm(formula = score ~ social_support, data = happiness2019)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89465 -0.45762 -0.01993  0.54720  1.70721 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.9124     0.2349    8.14 1.25e-13 ***
## social_support   2.8910     0.1887   15.32  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7029 on 154 degrees of freedom
## Multiple R-squared:  0.6038, Adjusted R-squared:  0.6012 
## F-statistic: 234.7 on 1 and 154 DF,  p-value: < 2.2e-16

pred<-predict(fit)
ssds<-tibble(ss=happiness2019$social_support, sc= happiness2019$score, pred)
ssds

## # A tibble: 156 × 3
##       ss    sc  pred
##    <dbl> <dbl> <dbl>
##  1  1.59  7.77  6.50
##  2  1.57  7.6   6.46
##  3  1.58  7.55  6.49
##  4  1.62  7.49  6.61
##  5  1.52  7.49  6.31
##  6  1.53  7.48  6.32
##  7  1.49  7.34  6.21
##  8  1.56  7.31  6.41
##  9  1.50  7.28  6.26
## 10  1.48  7.25  6.18
## # … with 146 more rows
## # ℹ Use `print(n = ...)` to see more rows

ssds %>% ggplot() + 
geom_point(aes(x = ss, y = sc)) + 
geom_smooth(aes(x = ss, y = sc), method = 'lm', se = FALSE) +
ggtitle("score vs social support") +
geom_point(aes(x = ss, y = pred), shape = 1, size = 3, color = 'red', alpha = 0.5)

## `geom_smooth()` using formula 'y ~ x'

This plot shows another positive linear relationship: higher social support is associated with a higher score across countries or regions. The red circles mark the scores predicted by the linear model, so they fall along the fitted line.
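As a worked check on the fitted line, a predicted score can be reproduced by hand from the coefficients printed by `summary(fit)` for the social support model. This is a minimal sketch in base R; the names `intercept`, `slope`, `finland_support`, and `finland_pred` are illustrative, and the social support value of 1.587 for Finland is the one quoted in the closing discussion of this report.

```r
# Coefficients as printed by summary(fit) for score ~ social_support.
intercept <- 1.9124
slope     <- 2.8910

# Finland's social support value as quoted in this report.
finland_support <- 1.587

# Predicted score = intercept + slope * social support.
finland_pred <- intercept + slope * finland_support
round(finland_pred, 2)

# Residual: observed score (7.77 as printed) minus the prediction.
round(7.77 - finland_pred, 2)
```

The prediction rounds to 6.50, which agrees with the first row of the `pred` column shown above. The positive residual of about 1.27 means Finland scores higher than the social support model alone would predict.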

Plot 4

plot4 <- happiness2019 %>%
  ggplot(aes(score, freedom_to_make_life_choices))+
  geom_point()+
  labs (x = "Score", y = "Freedom to make life choices", title = "Score vs. Freedom")
plot4

Although this graph is a bit more scattered, with a few outliers, I still see a generally positive linear relationship between score and freedom to make life choices.

Linear regression and predictive graph

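# Fit the freedom model first so the red points show its own predictions
fit <- lm(score ~ freedom_to_make_life_choices, data = happiness2019)
pred <- predict(fit)

happiness2019 %>% ggplot() + 
geom_point(aes(x = freedom_to_make_life_choices , y = score)) + 
geom_smooth(aes(x = freedom_to_make_life_choices , y = score), method = 'lm', se = FALSE) +
ggtitle("score vs freedom to make life choices") +
geom_point(aes(x = freedom_to_make_life_choices, y = pred), shape = 1, size = 3, color = 'red', alpha = 0.5)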

## `geom_smooth()` using formula 'y ~ x'

fit<- lm(score~freedom_to_make_life_choices, data = happiness2019)
pred = predict(fit)
resid = residuals(fit)
summary(fit)

## 
## Call:
## lm(formula = score ~ freedom_to_make_life_choices, data = happiness2019)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7882 -0.5838  0.0149  0.7029  1.8269 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    3.6788     0.2155  17.075  < 2e-16 ***
## freedom_to_make_life_choices   4.4026     0.5158   8.536 1.24e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9201 on 154 degrees of freedom
## Multiple R-squared:  0.3212, Adjusted R-squared:  0.3168 
## F-statistic: 72.87 on 1 and 154 DF,  p-value: 1.238e-14

mean(resid)

## [1] 1.058574e-16

ggplot(happiness2019, aes(x = freedom_to_make_life_choices, y = resid)) +
geom_point() + 
geom_hline(yintercept = 0, color = "red")
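The near-zero value of `mean(resid)` is expected: for a least-squares fit that includes an intercept, the residuals sum to zero up to floating-point error, so this check confirms the fit behaved as it should rather than saying anything about model quality. The sketch below illustrates the property on a small subset, using the freedom and score values of the six rows printed by `head(happiness2019)`; the names `top6` and `small_fit` are hypothetical.

```r
# Freedom and score values for the six countries shown by head(), as printed.
top6 <- data.frame(
  freedom = c(0.596, 0.592, 0.603, 0.591, 0.557, 0.572),
  score   = c(7.77, 7.60, 7.55, 7.49, 7.49, 7.48)
)

# Ordinary least squares with an intercept (the default in lm()).
small_fit <- lm(score ~ freedom, data = top6)

# Residuals of an intercept model average to zero up to rounding error.
mean(residuals(small_fit))

# all.equal() compares with a numerical tolerance instead of exact equality.
isTRUE(all.equal(mean(residuals(small_fit)), 0))
```

Because the mean is zero by construction, the more informative part of the residual plot is whether the points scatter evenly around the red line across the range of freedom values.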

Plot 5

plot5 <- happiness2019 %>%
  ggplot(aes(score, generosity))+
  geom_point()+
  labs (x = "Score", y = "Generosity", title = "Score vs. Generosity")
plot5

This plot was extremely interesting to me because I expected generosity to have the same relationship to the score as the other factors. However, I could not call this a positive or a negative relationship. The data suggest that generosity, as measured here, has little relationship to the happiness score. I found this surprising because I expected generosity toward others to make a population happier. When the data showed this wasn’t the case, I wanted to look further into the cultures of different places to see whether people there are more or less likely to interact with others.

Filter scores of 7 or higher

happiest <- happiness2019 %>%
  filter(score >= 7.0)

happiest %>% arrange(score)

## # A tibble: 16 × 9
##    overall_rank country_…¹ score gdp_p…² socia…³ healt…⁴ freed…⁵ gener…⁶ perce…⁷
##           <dbl> <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1           16 Ireland     7.02    1.50    1.55   0.999   0.516   0.298   0.31 
##  2           15 United Ki…  7.05    1.33    1.54   0.996   0.45    0.348   0.278
##  3           14 Luxembourg  7.09    1.61    1.48   1.01    0.526   0.194   0.316
##  4           13 Israel      7.14    1.28    1.46   1.03    0.371   0.261   0.082
##  5           12 Costa Rica  7.17    1.03    1.44   0.963   0.558   0.144   0.093
##  6           11 Australia   7.23    1.37    1.55   1.04    0.557   0.332   0.29 
##  7           10 Austria     7.25    1.38    1.48   1.02    0.532   0.244   0.226
##  8            9 Canada      7.28    1.36    1.50   1.04    0.584   0.285   0.308
##  9            8 New Zeala…  7.31    1.30    1.56   1.03    0.585   0.33    0.38 
## 10            7 Sweden      7.34    1.39    1.49   1.01    0.574   0.267   0.373
## 11            6 Switzerla…  7.48    1.45    1.53   1.05    0.572   0.263   0.343
## 12            5 Netherlan…  7.49    1.40    1.52   0.999   0.557   0.322   0.298
## 13            4 Iceland     7.49    1.38    1.62   1.03    0.591   0.354   0.118
## 14            3 Norway      7.55    1.49    1.58   1.03    0.603   0.271   0.341
## 15            2 Denmark     7.6     1.38    1.57   0.996   0.592   0.252   0.41 
## 16            1 Finland     7.77    1.34    1.59   0.986   0.596   0.153   0.393
## # … with abbreviated variable names ¹country_or_region, ²gdp_per_capita,
## #   ³social_support, ⁴healthy_life_expectancy, ⁵freedom_to_make_life_choices,
## #   ⁶generosity, ⁷perceptions_of_corruption

Bar Plot

library(ggplot2)

barplot(happiness2019$score, names.arg = happiness2019$country_or_region, las = 2, cex.names = 0.6, main = "Country or Region and Happiness Score")

I know this barplot is overwhelming, and I would like to take more time to expand my coding skills and make it more visually appealing. However, I think it pairs nicely with the other plots because you can see each country’s name alongside its position on the score scale.

Barplot showing Happiest vs Social support

barplot(happiest$score, names.arg = happiest$social_support, las = 2, xlab = "Social Support", ylab = "Score", col = "lightblue", main = "Happiest vs Social Support")

Happiest vs Life Expectancy

barplot(happiest$score, names.arg = happiest$healthy_life_expectancy, las = 2, xlab = "Life Expectancy", ylab = "Score", col = "lightblue", main = "Happiest Life Expectancy vs Score")

Overall,

When I first saw “world happiness” as a dataset, I was extremely interested to see what it was about. Since we were on a bit of a time crunch, I went ahead and decided to use it. The one thing I wish the dataset details had listed was how these statistics were determined. More specifically, I’d love to know how the pollsters went about asking questions to the populations. I also would have loved to see data on the people who were asked the questions: were they primarily male or female, older or younger, and how long had each person actually lived there? I think questions like these could open up a whole new set of information to collect from this dataset.

Although much of the data showed a positive relationship between score and social support, healthy life expectancy, and other factors, it was interesting to see which countries were considered the happiest. Finland is the #1 “happiest country”: its GDP per capita value is around 1.34, social support is among the highest at 1.587, healthy life expectancy is high at 0.986, freedom to make life choices is also high at 0.596, generosity is below average at 0.153, and perceptions of corruption is well above average at 0.393. Compare these values to the lowest-scoring place, South Sudan, which has a score of 2.853, GDP per capita of 0.306, social support of 0.575, healthy life expectancy of 0.295, freedom to make life choices of 0.01, generosity of 0.202, and perceptions of corruption of 0.091. My general takeaway is that many areas in Africa specifically are not as happy because they rate low on the variables that quantify a happier place.

The website below has a much more advanced visual representation of this dataset that really intrigued me, but I need more time to practice and understand my coding first. :)
Website: https://web.stanford.edu/~kjytay/courses/stats32-aut2018/projects/world_happiness_analysis-1.html
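The Finland and South Sudan comparison above can also be laid out as a small table so the gaps are easier to see. This is a rough sketch in base R: the data frame `comparison` and its column names are made up for the example, and the values are typed in by hand from the figures quoted in this report rather than pulled from `happiness2019`.

```r
# Factor values for the top- and bottom-ranked countries, as quoted in this report.
comparison <- data.frame(
  variable = c("gdp_per_capita", "social_support", "healthy_life_expectancy",
               "freedom_to_make_life_choices", "generosity",
               "perceptions_of_corruption"),
  finland     = c(1.340, 1.587, 0.986, 0.596, 0.153, 0.393),
  south_sudan = c(0.306, 0.575, 0.295, 0.010, 0.202, 0.091)
)

# Gap between the two countries on each factor (positive = Finland is higher).
comparison$gap <- comparison$finland - comparison$south_sudan

# Sort so the factors with the largest gaps come first.
comparison[order(comparison$gap, decreasing = TRUE), ]
```

In this comparison the largest gaps are in GDP per capita and social support, and generosity is the only factor where South Sudan has the higher value.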

Final Project DATA 101

Bai Sesay

2022-08-18