The World Happiness Report is a survey-based report first published in 2012. This dataset covers reports from 2012 up to 2019, and I chose to focus on the most recent year in it, 2019. The rankings and data come from the Gallup World Poll. Scores are based on the Cantril ladder, on which the best possible life is a 10 and the worst possible life is a 0. One interesting note from the report: “Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as ‘Dystopia,’ in contrast to Utopia.”
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(treemap)
library(RColorBrewer)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
Read the CSV file containing the 2019 World Happiness Report data and assign it to the variable happiness2019.
setwd("/cloud/project/BaiData101Summer2022")
happiness2019 <- read_csv("happiness2019.csv")
## Rows: 156 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country or region
## dbl (8): Overall rank, Score, GDP per capita, Social support, Healthy life e...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(happiness2019)
## Overall rank Country or region Score GDP per capita
## Min. : 1.00 Length:156 Min. :2.853 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.545 1st Qu.:0.6028
## Median : 78.50 Mode :character Median :5.380 Median :0.9600
## Mean : 78.50 Mean :5.407 Mean :0.9051
## 3rd Qu.:117.25 3rd Qu.:6.184 3rd Qu.:1.2325
## Max. :156.00 Max. :7.769 Max. :1.6840
## Social support Healthy life expectancy Freedom to make life choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.056 1st Qu.:0.5477 1st Qu.:0.3080
## Median :1.272 Median :0.7890 Median :0.4170
## Mean :1.209 Mean :0.7252 Mean :0.3926
## 3rd Qu.:1.452 3rd Qu.:0.8818 3rd Qu.:0.5072
## Max. :1.624 Max. :1.1410 Max. :0.6310
## Generosity Perceptions of corruption
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1087 1st Qu.:0.0470
## Median :0.1775 Median :0.0855
## Mean :0.1848 Mean :0.1106
## 3rd Qu.:0.2482 3rd Qu.:0.1412
## Max. :0.5660 Max. :0.4530
Clean up the column names (lowercase, with underscores instead of spaces), then display the first rows and the full list of column names.
# Standardize the column names of happiness2019:
# convert them to lowercase, then replace spaces with underscores
names(happiness2019) <- tolower(names(happiness2019))
names(happiness2019) <- gsub(" ","_",names(happiness2019))
head(happiness2019)
## # A tibble: 6 × 9
## overall_rank country_o…¹ score gdp_p…² socia…³ healt…⁴ freed…⁵ gener…⁶ perce…⁷
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 Finland 7.77 1.34 1.59 0.986 0.596 0.153 0.393
## 2 2 Denmark 7.6 1.38 1.57 0.996 0.592 0.252 0.41
## 3 3 Norway 7.55 1.49 1.58 1.03 0.603 0.271 0.341
## 4 4 Iceland 7.49 1.38 1.62 1.03 0.591 0.354 0.118
## 5 5 Netherlands 7.49 1.40 1.52 0.999 0.557 0.322 0.298
## 6 6 Switzerland 7.48 1.45 1.53 1.05 0.572 0.263 0.343
## # … with abbreviated variable names ¹country_or_region, ²gdp_per_capita,
## # ³social_support, ⁴healthy_life_expectancy, ⁵freedom_to_make_life_choices,
## # ⁶generosity, ⁷perceptions_of_corruption
names(happiness2019)
## [1] "overall_rank" "country_or_region"
## [3] "score" "gdp_per_capita"
## [5] "social_support" "healthy_life_expectancy"
## [7] "freedom_to_make_life_choices" "generosity"
## [9] "perceptions_of_corruption"
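The two renaming steps above can be wrapped in one small helper so the same cleanup is applied the same way every time. This is a minimal base-R sketch; clean_col_names is a hypothetical name, and the input vector is the original CSV header as reconstructed from the column specification and the cleaned names printed above.

```r
# Hypothetical helper: lowercase the column names, then replace each space
# with an underscore (the same tolower() + gsub() steps used above).
clean_col_names <- function(x) {
  x <- tolower(x)
  gsub(" ", "_", x, fixed = TRUE)
}

# Original column names, as they appear in the CSV header.
raw_names <- c("Overall rank", "Country or region", "Score",
               "GDP per capita", "Social support", "Healthy life expectancy",
               "Freedom to make life choices", "Generosity",
               "Perceptions of corruption")

clean_col_names(raw_names)
```

Because gsub(" ", "_", ...) replaces each space individually, a name containing two spaces in a row would get a double underscore; a pattern such as "\\s+" (without fixed = TRUE) would collapse runs of whitespace if that ever becomes an issue.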
Detailed information about each of the predictors in Table 2.1 of the World Happiness Report:
1. GDP per capita is in terms of Purchasing Power Parity (PPP) adjusted to constant 2011 international dollars, taken from the World Development Indicators (WDI) released by the World Bank on November 14, 2019. GDP data for 2018 are not yet available, so we extend the GDP time series from 2017 to 2018 using country-specific forecasts of real GDP growth from the OECD Economic Outlook No. 104 (Edition November 2018) and the World Bank’s Global Economic Prospects (Last Updated: 06/07/2019), after adjustment for population growth. The equation uses the natural log of GDP per capita, as this form fits the data significantly better than GDP per capita.
2. Healthy life expectancy: the time series of healthy life expectancy at birth are constructed based on data from the World Health Organization (WHO) Global Health Observatory data repository, with data available for 2005, 2010, 2015, and 2019. To match this report’s sample period, interpolation and extrapolation are used. See the report’s Statistical Appendix for more details.
3. Social support is the national average of the binary responses (either 0 or 1) to the Gallup World Poll (GWP) question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
4. Freedom to make life choices is the national average of binary responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
5. Perceptions of corruption are the average of binary answers to two GWP questions: “Is corruption widespread throughout the government or not?” and “Is corruption widespread within businesses or not?” Where data for government corruption are missing, the perception of business corruption is used as the overall corruption-perception measure.
6. Positive affect is defined as the average of previous-day affect measures for happiness, laughter, and enjoyment for GWP waves 3-7 (years 2008 to 2012, and some in 2013). It is defined as the average of laughter and enjoyment for other waves where the happiness question was not asked. The general form for the affect questions is: “Did you experience the following feelings during a lot of the day yesterday?” See pp. 1-2 of Statistical Appendix 1 for more details.
7. Negative affect is defined as the average of previous-day affect measures for worry, sadness, and anger for all waves.
Source: worldhappiness.report
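Several of these definitions are simple averages of 0/1 survey answers with a fallback rule, which is easier to see in code. The sketch below works on made-up responses for a single hypothetical country. The function names are hypothetical, and giving the two corruption questions equal weight at the national level is an assumption about how “the average of binary answers to two questions” is computed, not the report’s documented procedure.

```r
# Hypothetical binary responses (1 = yes, 0 = no) from one country's sample
# to the social support question.
social_support_responses <- c(1, 1, 0, 1, 1, 0, 1, 1)

# Social support: the national average of the 0/1 answers, i.e. the share
# of respondents answering "yes".
social_support <- mean(social_support_responses, na.rm = TRUE)
social_support  # 0.75

# Perceptions of corruption (assumption: both questions weighted equally at
# the national level). If the government question is missing, the business
# question is used on its own, as the definition above describes.
corruption_perception <- function(gov, business) {
  if (all(is.na(gov))) {
    return(mean(business, na.rm = TRUE))
  }
  mean(c(mean(gov, na.rm = TRUE), mean(business, na.rm = TRUE)))
}

# Positive affect from national shares for each item; when the happiness
# item was not asked (NA), only laughter and enjoyment are averaged.
positive_affect <- function(happiness, laughter, enjoyment) {
  if (is.na(happiness)) {
    return(mean(c(laughter, enjoyment)))
  }
  mean(c(happiness, laughter, enjoyment))
}

corruption_perception(gov = c(1, 1, 0, 0), business = c(1, 0, 0, 0))  # 0.375
corruption_perception(gov = c(NA, NA), business = c(1, 0))            # 0.5
positive_affect(happiness = NA, laughter = 0.8, enjoyment = 0.6)      # 0.7
```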
plot1 <- happiness2019 %>%
ggplot(aes(healthy_life_expectancy, score))+
geom_point()+
geom_smooth(method = 'lm') +
labs (x = "healthy_life_expectancy", y = "score", title = "Score vs. Healthy Life Expectancy")
plot1
## `geom_smooth()` using formula 'y ~ x'
This plot gives a quick look at how happiness score relates to healthy life expectancy. I created it to compare the two before showing the treemap of all the countries and regions in the dataset. There is a general positive linear relationship: countries or regions with higher healthy life expectancy tend to have higher happiness scores. In the chart, countries with scores of about 4.5 and above tend to have relatively high healthy life expectancy.
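The observation about scores of 4.5 and above can be checked numerically by comparing average healthy life expectancy on either side of that cutoff. This is a minimal sketch; mean_by_score_cutoff is a hypothetical helper, and the toy vectors are made up purely to show the shape of the result.

```r
# Hypothetical helper: average of a factor x for countries scoring below
# the cutoff versus at or above it (4.5 is the value mentioned in the text).
mean_by_score_cutoff <- function(score, x, cutoff = 4.5) {
  group <- ifelse(score >= cutoff, "at_or_above", "below")
  sapply(split(x, group), mean)
}

# Toy example with made-up values.
mean_by_score_cutoff(score = c(3, 4, 5, 6), x = c(0.2, 0.4, 0.8, 1.0))
```

On the real data, mean_by_score_cutoff(happiness2019$score, happiness2019$healthy_life_expectancy) would give the two group means, and cor(happiness2019$score, happiness2019$healthy_life_expectancy) would summarize the strength of the linear relationship in a single number.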
# Treemap of the 2019 data: one rectangle per country or region, sized by healthy life expectancy and colored by happiness score
treemap(happiness2019, index = "country_or_region", vSize = "healthy_life_expectancy", vColor = "score", type = "manual", palette = "RdYlBu")
Over my time using R, I have become interested in using a variety of visual displays. This treemap shows each country or region in the dataset as a rectangle whose area reflects its healthy life expectancy and whose color reflects its happiness score, from red for lower scores to blue for higher scores. I like how you can see the different countries and regions at a glance, although some of the smaller rectangles are harder to read than others. For a future version, I would like to group the countries by score band (0-1, 1-2, and so on) and then use a treemap to compare the size of the highest-scoring group with the lowest-scoring group. Looking closely at the data, I found it interesting that the top-ranked places for happiness were Finland, Denmark, Norway, Iceland, the Netherlands, Switzerland, and Sweden. At the opposite end, Malawi, Yemen, Rwanda, and Tanzania are near the bottom of the rankings. It would be interesting to look at more specific statistics for these places to understand why they are rated as happier or less happy. A researcher could, for example, compare population income, government regulations, and other factors that might explain where these places fall in this dataset.
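The score-band grouping described above can be done with base R’s cut(). This is a minimal sketch; score_band is a hypothetical helper name, and the example scores are values quoted elsewhere in this report (Finland, Denmark, South Sudan) plus the dataset median from summary().

```r
# Assign each happiness score to a one-point band: [0,1), [1,2), ..., [9,10].
# right = FALSE makes each band include its lower edge; include.lowest = TRUE
# then lets a score of exactly 10 fall into the last band, [9,10].
score_band <- function(score) {
  cut(score, breaks = 0:10, right = FALSE, include.lowest = TRUE)
}

example_scores <- c(7.769, 7.6, 2.853, 5.380)
band <- score_band(example_scores)
band

# Count how many scores fall in each band, dropping empty bands.
counts <- table(band)
counts[counts > 0]
```

With the real data, adding happiness2019$score_band <- score_band(happiness2019$score) and passing index = c("score_band", "country_or_region") to treemap() should nest the countries inside their score bands, since treemap() accepts several index columns for a hierarchy.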
plot3 <- happiness2019 %>%
ggplot(aes(x = social_support, y= score ))+
geom_point()+
geom_smooth(method = 'lm') +
labs (x = "Social Support", y = "Score", title = "Score vs. Social Support")
plot3
## `geom_smooth()` using formula 'y ~ x'
fit <-lm(score~social_support, data = happiness2019)
summary(fit)
##
## Call:
## lm(formula = score ~ social_support, data = happiness2019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.89465 -0.45762 -0.01993 0.54720 1.70721
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9124 0.2349 8.14 1.25e-13 ***
## social_support 2.8910 0.1887 15.32 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7029 on 154 degrees of freedom
## Multiple R-squared: 0.6038, Adjusted R-squared: 0.6012
## F-statistic: 234.7 on 1 and 154 DF, p-value: < 2.2e-16
pred<-predict(fit)
ssds<-tibble(ss=happiness2019$social_support, sc= happiness2019$score, pred)
ssds
## # A tibble: 156 × 3
## ss sc pred
## <dbl> <dbl> <dbl>
## 1 1.59 7.77 6.50
## 2 1.57 7.6 6.46
## 3 1.58 7.55 6.49
## 4 1.62 7.49 6.61
## 5 1.52 7.49 6.31
## 6 1.53 7.48 6.32
## 7 1.49 7.34 6.21
## 8 1.56 7.31 6.41
## 9 1.50 7.28 6.26
## 10 1.48 7.25 6.18
## # … with 146 more rows
## # ℹ Use `print(n = ...)` to see more rows
ssds %>% ggplot() +
geom_point(aes(x = ss, y = sc)) +
geom_smooth(aes(x = ss, y = sc), method = 'lm', se = FALSE) +
ggtitle("score vs social support") +
geom_point(aes(x = ss, y = pred), shape = 1, size = 3, color = 'red', alpha = 0.5)
## `geom_smooth()` using formula 'y ~ x'
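This plot shows another positive linear relationship: countries or regions with higher social support tend to have higher scores. The model explains about 60% of the variation in score (R-squared = 0.6038), and the open red circles show the model’s fitted (predicted) scores.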
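The red fitted points come straight from the two coefficients in summary(fit): each prediction is the intercept plus the slope times the social support value. This short sketch reproduces the first row of the ssds table by hand, using Finland’s social support value (1.587) quoted later in this report; b0, b1, and predict_score are hypothetical names.

```r
# Intercept and slope copied from the summary(fit) output above.
b0 <- 1.9124
b1 <- 2.8910

# Fitted score for a given social support value: score_hat = b0 + b1 * x.
predict_score <- function(social_support) b0 + b1 * social_support

predict_score(1.587)   # Finland; about 6.50, matching pred in row 1 of ssds
b1 * 0.1               # change in fitted score per 0.1 increase in social support

# For a model with one predictor, the correlation between x and y is the
# square root of R-squared, with the sign of the slope.
sqrt(0.6038)           # about 0.777
```

Finland’s actual score (7.769) sits well above its fitted value, which is why the top-ranked countries appear above the line in the plot.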
plot4 <- happiness2019 %>%
ggplot(aes(score, freedom_to_make_life_choices))+
geom_point()+
labs (x = "Score", y = "Freedom to make life choices", title = "Score vs. Freedom")
plot4
Although this graph is more scattered, with a few outliers, I still see a generally positive linear relationship between score and freedom to make life choices.
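# Fit the freedom model first so the red points show this model's own fitted values
fit_freedom <- lm(score ~ freedom_to_make_life_choices, data = happiness2019)
happiness2019 %>%
mutate(pred_freedom = predict(fit_freedom)) %>%
ggplot() +
geom_point(aes(x = freedom_to_make_life_choices, y = score)) +
geom_smooth(aes(x = freedom_to_make_life_choices, y = score), method = 'lm', se = FALSE) +
ggtitle("score vs freedom to make life choices") +
geom_point(aes(x = freedom_to_make_life_choices, y = pred_freedom), shape = 1, size = 3, color = 'red', alpha = 0.5)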
## `geom_smooth()` using formula 'y ~ x'
fit<- lm(score~freedom_to_make_life_choices, data = happiness2019)
pred = predict(fit)
resid = residuals(fit)
summary(fit)
##
## Call:
## lm(formula = score ~ freedom_to_make_life_choices, data = happiness2019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.7882 -0.5838 0.0149 0.7029 1.8269
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6788 0.2155 17.075 < 2e-16 ***
## freedom_to_make_life_choices 4.4026 0.5158 8.536 1.24e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9201 on 154 degrees of freedom
## Multiple R-squared: 0.3212, Adjusted R-squared: 0.3168
## F-statistic: 72.87 on 1 and 154 DF, p-value: 1.238e-14
mean(resid)
## [1] 1.058574e-16
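tibble(freedom = happiness2019$freedom_to_make_life_choices, resid = resid) %>%
ggplot(aes(x = freedom, y = resid)) +
geom_point() +
geom_hline(yintercept = 0, color = "red") +
labs(x = "Freedom to make life choices", y = "Residual", title = "Residuals vs. Freedom")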
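The mean residual of about 1e-16 above is zero up to floating-point rounding: whenever a least-squares model includes an intercept, its residuals sum to zero by construction, so the residual plot is useful for spotting patterns and outliers rather than for checking the mean. The sketch below shows this on made-up data and adds a hypothetical helper, worst_fit, that returns the positions of the largest residuals.

```r
# Made-up data with a noisy linear trend, standing in for the freedom model.
set.seed(1)
x <- seq(0, 0.6, length.out = 30)
y <- 3.7 + 4.4 * x + rnorm(30, sd = 0.5)

fit_toy <- lm(y ~ x)
res <- residuals(fit_toy)

# With an intercept in the model, the residuals sum to zero; any nonzero
# mean is floating-point rounding.
mean(res)

# Hypothetical helper: positions of the k observations with the largest
# absolute residuals, i.e. the points the fitted line describes worst.
worst_fit <- function(res, k = 3) {
  order(abs(res), decreasing = TRUE)[seq_len(k)]
}
worst_fit(res)
```

Applied to the freedom model, happiness2019$country_or_region[worst_fit(resid)] would name the three countries that sit farthest from the fitted line, which is one way to identify the outliers mentioned above.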
#Plot 5
plot5 <- happiness2019 %>%
ggplot(aes(score, generosity))+
geom_point()+
labs (x = "Score", y = "Generosity", title = "Score vs. Generosity")
plot5
This plot was especially interesting to me because I expected generosity to relate to score the same way the other factors did. However, I could not see a clear positive or negative relationship here. At least as measured in this dataset, generosity appears to be only weakly related to happiness score, if at all. This surprised me, because I expected that generosity toward others and the people around you would make a population happier. When the data did not show this, I wanted to look further into the cultures of different places, for example to see whether people in some places are less likely to interact with others, or vice versa.
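One way to put the visual comparison on a common footing is to fit a one-predictor model for each factor and compare their R-squared values. This is a minimal sketch; r_squared_by_predictor is a hypothetical helper, and the toy data frame is made up so the example runs on its own.

```r
# Hypothetical helper: R-squared from a separate one-predictor linear model
# of `outcome` on each column named in `predictors`.
r_squared_by_predictor <- function(data, outcome, predictors) {
  sapply(predictors, function(p) {
    model <- lm(reformulate(p, response = outcome), data = data)
    summary(model)$r.squared
  })
}

# Toy data: y tracks `a` closely and is only weakly related to `b`.
toy <- data.frame(
  a = 1:10,
  b = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6),
  y = 2 * (1:10) + rep(c(0.1, -0.1), 5)
)
r_squared_by_predictor(toy, "y", c("a", "b"))
```

On the real data, calling it with outcome "score" and the six factor columns (gdp_per_capita through perceptions_of_corruption) should reproduce the R-squared values of 0.6038 for social support and 0.3212 for freedom reported above, and shows where generosity ranks among them. R-squared does not show direction, so cor() is still needed to see whether a relationship is positive or negative.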
happiest <- happiness2019 %>%
filter(score >= 7.0)
happiest [order(happiest$score),]
## # A tibble: 16 × 9
## overall_rank country_…¹ score gdp_p…² socia…³ healt…⁴ freed…⁵ gener…⁶ perce…⁷
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 16 Ireland 7.02 1.50 1.55 0.999 0.516 0.298 0.31
## 2 15 United Ki… 7.05 1.33 1.54 0.996 0.45 0.348 0.278
## 3 14 Luxembourg 7.09 1.61 1.48 1.01 0.526 0.194 0.316
## 4 13 Israel 7.14 1.28 1.46 1.03 0.371 0.261 0.082
## 5 12 Costa Rica 7.17 1.03 1.44 0.963 0.558 0.144 0.093
## 6 11 Australia 7.23 1.37 1.55 1.04 0.557 0.332 0.29
## 7 10 Austria 7.25 1.38 1.48 1.02 0.532 0.244 0.226
## 8 9 Canada 7.28 1.36 1.50 1.04 0.584 0.285 0.308
## 9 8 New Zeala… 7.31 1.30 1.56 1.03 0.585 0.33 0.38
## 10 7 Sweden 7.34 1.39 1.49 1.01 0.574 0.267 0.373
## 11 6 Switzerla… 7.48 1.45 1.53 1.05 0.572 0.263 0.343
## 12 5 Netherlan… 7.49 1.40 1.52 0.999 0.557 0.322 0.298
## 13 4 Iceland 7.49 1.38 1.62 1.03 0.591 0.354 0.118
## 14 3 Norway 7.55 1.49 1.58 1.03 0.603 0.271 0.341
## 15 2 Denmark 7.6 1.38 1.57 0.996 0.592 0.252 0.41
## 16 1 Finland 7.77 1.34 1.59 0.986 0.596 0.153 0.393
## # … with abbreviated variable names ¹country_or_region, ²gdp_per_capita,
## # ³social_support, ⁴healthy_life_expectancy, ⁵freedom_to_make_life_choices,
## # ⁶generosity, ⁷perceptions_of_corruption
library(ggplot2)
barplot(happiness2019$score, names.arg = happiness2019$country_or_region, las = 2, cex.names = 0.6, main = "Country or Region and Happiness Score")
I know this barplot is overwhelming, and I would like to take more time to build my coding skills so I can make it more readable and visually appealing. Still, I think it pairs nicely with the other plots, because you can see each country’s name alongside where it falls on the score scale.
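A common way to make a barplot with 156 labels readable is to show only the highest- and lowest-scoring countries as horizontal bars. This is a minimal base-R sketch; top_and_bottom is a hypothetical helper, and the small data frame stands in for happiness2019 using scores printed in this report.

```r
# Hypothetical helper: the n highest- and n lowest-scoring rows,
# ordered from highest to lowest score.
top_and_bottom <- function(df, n = 10) {
  stopifnot(nrow(df) >= 2 * n)  # avoid repeating rows when n is too large
  ranked <- df[order(df$score, decreasing = TRUE), ]
  rbind(head(ranked, n), tail(ranked, n))
}

# Small stand-in for happiness2019.
toy <- data.frame(
  country_or_region = c("Finland", "Denmark", "Norway", "Iceland", "South Sudan"),
  score = c(7.769, 7.6, 7.55, 7.49, 2.853),
  stringsAsFactors = FALSE
)
sub <- top_and_bottom(toy, n = 2)

# Horizontal bars with readable labels; rev() puts the happiest at the top.
old_par <- par(mar = c(4, 8, 3, 1))
barplot(rev(sub$score), names.arg = rev(sub$country_or_region), horiz = TRUE,
        las = 1, cex.names = 0.8, xlab = "Score",
        main = "Highest and Lowest Happiness Scores")
par(old_par)
```

With the real data, top_and_bottom(happiness2019, 10) would give the 20 rows to plot; the wider left margin (the second value in mar) leaves room for long country names.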
barplot(happiest$score, names.arg = happiest$healthy_life_expectancy, las = 2, xlab = "Healthy life expectancy", ylab = "Score", col = "lightblue", main = "Happiest Countries: Healthy Life Expectancy vs. Score")
When I first saw “world happiness” as a dataset, I was very interested to see what it was about, and since we were on a bit of a time crunch, I decided to go ahead and use it. The one thing I wish the dataset’s documentation included is how these statistics were determined. More specifically, I would love to know how the pollsters asked the questions of each population. I also would have liked to see data on who was surveyed: whether respondents were mostly male or female, older or younger, and how long each person had actually lived there. Information like this about the respondents could open up a whole new set of questions to explore with this dataset.
Although many factors, such as social support and healthy life expectancy, had a positive relationship with score, it was interesting to see which countries were considered the happiest. Finland, ranked #1, has a GDP per capita value of about 1.34, one of the highest social support values at 1.587, a high healthy life expectancy value of 0.986, and a high freedom to make life choices value of 0.596. Its generosity value of 0.153 is below the dataset median (0.1775), while its perceptions of corruption value of 0.393 is well above the third quartile (0.1412). Compare these with the lowest-scoring country, South Sudan, which has a score of 2.853, GDP per capita of 0.306, social support of about 0.575, healthy life expectancy of 0.295, freedom to make life choices of 0.01, generosity of 0.202, and perceptions of corruption of 0.091. In general, many countries in Africa in particular have lower happiness scores along with lower values on the factors associated with happier places.
Finally, I came across a much more advanced analysis and visualization of this dataset that really intrigued me, but I need more time to practice and understand my own coding first. Website: https://web.stanford.edu/~kjytay/courses/stats32-aut2018/projects/world_happiness_analysis-1.html
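The Finland versus South Sudan comparison can be made concrete by computing the gap on each factor. This sketch simply reuses the values quoted in the paragraph above (rounded as given there, with Finland’s GDP value taken from the head() output); the vector names are hypothetical.

```r
# Factor values quoted in the comparison above.
factors <- c("gdp_per_capita", "social_support", "healthy_life_expectancy",
             "freedom_to_make_life_choices", "generosity",
             "perceptions_of_corruption")
finland     <- c(1.340, 1.587, 0.986, 0.596, 0.153, 0.393)
south_sudan <- c(0.306, 0.575, 0.295, 0.010, 0.202, 0.091)

# Positive gap = Finland higher; negative gap = South Sudan higher.
gap <- setNames(finland - south_sudan, factors)
round(sort(gap, decreasing = TRUE), 3)

# Gap in overall score.
7.769 - 2.853
```

The largest gaps are in GDP per capita and social support, and generosity is the only factor where South Sudan’s value is higher, which fits the weak generosity relationship seen in the generosity plot.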