I. Introduction

National happiness is often used as a metric to gauge a population’s satisfaction with its quality of life and to measure the effectiveness of public policies. Reported happiness can serve as a more holistic measure of a population’s life experience than field-specific measures such as life expectancy and disposable income. Because it is a subjective measure, based on respondents’ perceptions of their own happiness, it depends largely on the factors respondents consider most relevant to their sense of happiness. One of these factors captured by the 2019 World Happiness Report is respondents’ perception of the prevalence of corruption in their country.

This analysis will explore the relationship between corruption perception and national happiness by answering the following questions:

What does the relationship between corruption perception and national happiness look like?
How is this relationship affected by variables such as a country’s GDP (gross domestic product) and national health?
How significant is the effect of corruption perception on national happiness compared to other factors?
How do changes in other factors impact the effect of corruption perception on national happiness?

II. Data

Dataset

2019.csv contains data from the 2019 World Happiness Report, ranking 156 countries by happiness. Overall reported happiness, the Score column (renamed happyness below), is measured on a scale of 0 (worst) to 10 (best). Six component variables (GDP.per.capita, Social.support, Healthy.life.expectancy, Freedom.to.make.life.choices, Generosity, and Perceptions.of.corruption) are shown on different scales, reflecting each component’s contribution to the overall happiness score.

The dataset contains 156 rows and 9 variables. Variable scores are based solely on country means of respondents’ answers, with the exception of GDP and Healthy.life.expectancy. These are composite scores derived from respondent responses as well as a country’s performance on Purchasing Power Parity and World Health Organization data, respectively, each scaled to that variable’s impact on the overall happiness score.

Source: Sustainable Development Solutions Network (2022, June 27). World Happiness Report. Retrieved from https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv

Import & Cleaning

Load the necessary packages.

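library(readr)     # file readers (also attached by tidyverse)
library(tidyverse) # attaches dplyr, ggplot2, tibble and other core packages
library(dplyr)     # already attached by tidyverse; harmless to load again
library(hexbin)    # needed by geom_hex() below
library(distill)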

Set working directory and import downloaded dataset.

setwd("/Users/kendocekal/Documents/DACSS601/DACSS601Prj")

# The working directory is set above, so a relative path is enough
happy <- read.csv("2019.csv")

Review dataset.

view(happy)

as_tibble(happy)

# A tibble: 156 × 9
   Overall.rank Country.or.region Score GDP.per.capita Social.support
          <int> <chr>             <dbl>          <dbl>          <dbl>
 1            1 Finland            7.77           1.34           1.59
 2            2 Denmark            7.6            1.38           1.57
 3            3 Norway             7.55           1.49           1.58
 4            4 Iceland            7.49           1.38           1.62
 5            5 Netherlands        7.49           1.40           1.52
 6            6 Switzerland        7.48           1.45           1.53
 7            7 Sweden             7.34           1.39           1.49
 8            8 New Zealand        7.31           1.30           1.56
 9            9 Canada             7.28           1.36           1.50
10           10 Austria            7.25           1.38           1.48
# … with 146 more rows, and 4 more variables:
#   Healthy.life.expectancy <dbl>,
#   Freedom.to.make.life.choices <dbl>, Generosity <dbl>,
#   Perceptions.of.corruption <dbl>

head(happy)

  Overall.rank Country.or.region Score GDP.per.capita Social.support
1            1           Finland 7.769          1.340          1.587
2            2           Denmark 7.600          1.383          1.573
3            3            Norway 7.554          1.488          1.582
4            4           Iceland 7.494          1.380          1.624
5            5       Netherlands 7.488          1.396          1.522
6            6       Switzerland 7.480          1.452          1.526
  Healthy.life.expectancy Freedom.to.make.life.choices Generosity
1                   0.986                        0.596      0.153
2                   0.996                        0.592      0.252
3                   1.028                        0.603      0.271
4                   1.026                        0.591      0.354
5                   0.999                        0.557      0.322
6                   1.052                        0.572      0.263
  Perceptions.of.corruption
1                     0.393
2                     0.410
3                     0.341
4                     0.118
5                     0.298
6                     0.343

dim(happy)

[1] 156   9

colnames(happy)

[1] "Overall.rank"                 "Country.or.region"           
[3] "Score"                        "GDP.per.capita"              
[5] "Social.support"               "Healthy.life.expectancy"     
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "Perceptions.of.corruption"

Rename variables for clarity.

Score to happyness

happy <- rename(happy, "happyness" = Score)

GDP.per.capita to GDP

happy <- rename(happy, "GDP" = GDP.per.capita)

Perceptions.of.corruption to corrupt

happy <- rename(happy, "corrupt" = Perceptions.of.corruption)

Healthy.life.expectancy to health

happy <- rename(happy, "health" = Healthy.life.expectancy)

Confirm name changes

colnames(happy)

[1] "Overall.rank"                 "Country.or.region"           
[3] "happyness"                    "GDP"                         
[5] "Social.support"               "health"                      
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "corrupt"
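The four rename() calls above can also be collapsed into one step: dplyr’s rename() accepts several new = old pairs in a single call. The sketch below is a base-R equivalent, written so it runs without extra packages and stops with an error if an expected column is missing. The helper name rename_columns() and the six-row data frame (copied from the head(happy) output above, with the original column names) are illustrative assumptions, not part of the original analysis.

```r
# Six rows copied from the head(happy) output above, original column names
happy_raw <- data.frame(
  Country.or.region = c("Finland", "Denmark", "Norway",
                        "Iceland", "Netherlands", "Switzerland"),
  Score = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP.per.capita = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  Healthy.life.expectancy = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052),
  Perceptions.of.corruption = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)
)

# Lookup vector: names are the old column names, values are the new ones
rename_map <- c(
  Score = "happyness",
  GDP.per.capita = "GDP",
  Perceptions.of.corruption = "corrupt",
  Healthy.life.expectancy = "health"
)

# Hypothetical helper: rename every column listed in `map`, failing loudly
# if any old name is absent instead of silently skipping it
rename_columns <- function(df, map) {
  idx <- match(names(map), names(df))
  if (anyNA(idx)) {
    stop("missing columns: ", paste(names(map)[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- unname(map)
  df
}

happy_demo <- rename_columns(happy_raw, rename_map)
print(names(happy_demo))
```

Applied to the freshly imported data, rename_columns(happy, rename_map) would replace the four separate rename() calls.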

To focus on the variables of interest, we keep the happiness score and corruption perception, together with GDP per capita (gross domestic product, a measure of national income) and healthy life expectancy, which were selected to capture objective differences in national resources and conditions. The results are limited to the 20 happiest countries.

# Keep the four variables of interest and show the 20 happiest countries
happy %>%
  select(happyness, GDP, corrupt, health) %>%
  arrange(desc(happyness)) %>%
  slice(1:20)

   happyness   GDP corrupt health
1      7.769 1.340   0.393  0.986
2      7.600 1.383   0.410  0.996
3      7.554 1.488   0.341  1.028
4      7.494 1.380   0.118  1.026
5      7.488 1.396   0.298  0.999
6      7.480 1.452   0.343  1.052
7      7.343 1.387   0.373  1.009
8      7.307 1.303   0.380  1.026
9      7.278 1.365   0.308  1.039
10     7.246 1.376   0.226  1.016
11     7.228 1.372   0.290  1.036
12     7.167 1.034   0.093  0.963
13     7.139 1.276   0.082  1.029
14     7.090 1.609   0.316  1.012
15     7.054 1.333   0.278  0.996
16     7.021 1.499   0.310  0.999
17     6.985 1.373   0.265  0.987
18     6.923 1.356   0.210  0.986
19     6.892 1.433   0.128  0.874
20     6.852 1.269   0.036  0.920
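The arrange() and slice() steps above amount to sorting the rows by decreasing score and keeping the first n. dplyr also offers slice_max() for this pattern (for example, slice_max(happy, happyness, n = 20)), which keeps tied rows by default. The base-R sketch below spells the step out. The helper name top_n_rows() is hypothetical, and the demo data are the six rows printed by head(happy) above, ranked by corrupt rather than happyness to show a different ordering.

```r
# Six rows from the head(happy) output above (renamed columns)
demo <- data.frame(
  country = c("Finland", "Denmark", "Norway", "Iceland", "Netherlands", "Switzerland"),
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  corrupt = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)
)

# Hypothetical helper: sort by `col`, highest first, and keep the first n rows
top_n_rows <- function(df, col, n) {
  ord <- order(df[[col]], decreasing = TRUE)
  head(df[ord, , drop = FALSE], n)
}

# The three countries with the highest corrupt scores among these six
top3 <- top_n_rows(demo, "corrupt", 3)
print(top3)
```

Unlike slice_max(), this helper cuts ties arbitrarily at the n-th row.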

III. Visualization

Mean, Median, and Standard Deviation

Here we review each variable’s central tendency (mean and median) and spread (standard deviation).

Happyness

summarise(happy, mean(happyness, na.rm = TRUE), median(happyness, na.rm = TRUE), sd(happyness, na.rm = TRUE))

  mean(happyness, na.rm = TRUE) median(happyness, na.rm = TRUE)
1                      5.407096                          5.3795
  sd(happyness, na.rm = TRUE)
1                     1.11312

GDP

summarise(happy, mean(GDP, na.rm = TRUE), median(GDP, na.rm = TRUE), sd(GDP, na.rm = TRUE))

  mean(GDP, na.rm = TRUE) median(GDP, na.rm = TRUE)
1               0.9051474                      0.96
  sd(GDP, na.rm = TRUE)
1             0.3983895

Corrupt

summarise(happy, mean(corrupt, na.rm = TRUE), median(corrupt, na.rm = TRUE), sd(corrupt, na.rm = TRUE))

  mean(corrupt, na.rm = TRUE) median(corrupt, na.rm = TRUE)
1                   0.1106026                        0.0855
  sd(corrupt, na.rm = TRUE)
1                0.09453784

Health

summarise(happy, mean(health, na.rm = TRUE), median(health, na.rm = TRUE), sd(health, na.rm = TRUE))

  mean(health, na.rm = TRUE) median(health, na.rm = TRUE)
1                  0.7252436                        0.789
  sd(health, na.rm = TRUE)
1                 0.242124
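The four summarise() calls above repeat the same three statistics. A compact base-R sketch can compute them for every variable in one table and add the mean-minus-median gap as a rough skewness signal: a positive gap suggests a longer right tail, a negative one a longer left tail. From the full-data statistics above, the gaps are about +0.03 for happyness, -0.05 for GDP, +0.03 for corrupt and -0.06 for health. The helper name describe() is hypothetical, and the demo uses only the six rows printed by head(happy).

```r
# Six rows from the head(happy) output above (renamed columns)
demo <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# Hypothetical helper: mean, median, sd and (mean - median) for one vector
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    gap = mean(x, na.rm = TRUE) - median(x, na.rm = TRUE))
}

# One column per variable, one row per statistic
stats <- sapply(demo, describe)
print(round(stats, 4))
```

With the full data, sapply(happy[, c("happyness", "GDP", "corrupt", "health")], describe) gives the statistics above in a single table.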

Variable Distribution

Looking at each variable’s distribution shows how it varies across all countries and where scores are concentrated.

The happiness score distribution is roughly bell-shaped, with two peaks near the center.

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = happyness), binwidth = 0.5)+
    theme_bw()+
  ggtitle("Happyness Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

GDP per capita is skewed to the left: more countries fall in the upper half of scores, although there is also a sizeable group of lower-middle-income countries.

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = GDP), binwidth = 0.1)+
    theme_bw()+
  ggtitle("GDP Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

Corruption perception scores are heavily concentrated at the low end of the histogram, with relatively few countries scoring high.

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = corrupt), binwidth = 0.05)+
    theme_bw()+
  ggtitle("Corrupt Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

Health scores are concentrated toward the upper end of the observed range, although the values themselves remain low.

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = health), binwidth = 0.05)+
    theme_bw()+
  ggtitle("Health Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))
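To put numbers on where scores concentrate, the histogram counts can be tabulated directly. This base-R sketch uses hist(plot = FALSE) with the same 0.05 bin width as the Corrupt histogram. The bin edges here start at 0, which is an assumption: ggplot2 places its default bins differently, so individual counts may not match the plotted bars exactly. The six corrupt values come from the head(happy) output above.

```r
# corrupt values for the six countries in the head(happy) output above
corrupt <- c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)

# Bin edges every 0.05, starting at 0 and extending past the largest value
width <- 0.05
breaks <- seq(0, ceiling(max(corrupt) / width) * width, by = width)

# Count values per bin without drawing anything
h <- hist(corrupt, breaks = breaks, plot = FALSE)
counts <- setNames(h$counts, sprintf("(%.2f, %.2f]", head(breaks, -1), breaks[-1]))
print(counts)
```

With the full data, replacing corrupt with happy$corrupt gives the counts behind the histogram; the same pattern works for the other variables with their own bin widths.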

Happyness and Corrupt

Here we look at country happiness by reported corruption score. There is an overall positive linear trend, but most countries show low corruption perception across a wide range of happiness, while all high corrupt scorers also score high on happiness.

# Scatter plot of the raw country scores; no grouping or summarising is needed
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) +
  geom_point() +
  # Least-squares trend line; formula stated explicitly to avoid a message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_bw() +
  ggtitle("Positive Linear Trend") +
  theme(plot.title = element_text(face = "bold", colour = "blue")) +
  theme(axis.title = element_text(face = "bold"))
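geom_smooth(method = "lm") draws an ordinary least-squares line. The sketch below computes that line’s slope together with the Pearson correlation behind it, which expresses the strength of the trend as a number rather than a visual impression. For a single predictor the slope equals r × sd(y) / sd(x), which the code checks. The demo uses only the six rows from head(happy), so it illustrates the calculation rather than the full-sample result.

```r
# Six rows from the head(happy) output above
demo <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  corrupt = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)
)

# Pearson correlation between corruption perception and happiness
r <- cor(demo$corrupt, demo$happyness)

# The same least-squares fit that geom_smooth(method = "lm") draws
fit <- lm(happyness ~ corrupt, data = demo)
slope <- coef(fit)[["corrupt"]]

# For one predictor, slope = r * sd(y) / sd(x)
implied <- r * sd(demo$happyness) / sd(demo$corrupt)

print(c(r = r, slope = slope, implied_slope = implied))
```

With the full data, cor(happy$corrupt, happy$happyness) and lm(happyness ~ corrupt, data = happy) give the values for the plotted line.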

Comparing corruption perception and happiness scores reveals similar ranges of perception among most low- and middle-happiness countries (with some extreme outliers) but substantially higher perception scores among higher-happiness countries.

ggplot(data = happy) +
  geom_hex(mapping = aes(x = corrupt, y = happyness))+
    theme_bw()+
  ggtitle("Concentration Around Low Corrupt Scores")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

Happyness and GDP

A hexagonal bin plot of happiness score against GDP shows a strongly linear positive relationship.

ggplot(data = happy) +
  geom_hex(mapping = aes(x = GDP, y = happyness))+
    theme_bw()+
   ggtitle("Strongly Linear Positive Trend")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

Grouping countries by happiness (bands of width 1) and comparing the corresponding GDP with box plots shows skewness and outliers: middle happiness scores are more skewed, while high scores show much less variation in GDP, apart from some outliers.

ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) + 
  geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
    theme_bw()+
   ggtitle("Greater Spread for Mid GDP Scores")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

Happyness and Health

A scatter plot of country happiness against health shows a strong positive correlation between national health levels and reported happiness.

# Scatter plot of the raw country scores; no grouping or summarising is needed
ggplot(data = happy, mapping = aes(x = health, y = happyness)) +
  geom_point() +
  # Least-squares trend line; formula stated explicitly to avoid a message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_bw() +
  ggtitle("Strongly Linear Positive Trend") +
  theme(plot.title = element_text(face = "bold", colour = "blue")) +
  theme(axis.title = element_text(face = "bold"))
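The scatter plots above each compare happiness with one factor at a time. A correlation matrix puts these pairwise relationships side by side and also shows how strongly GDP, health and corruption perception move together, which bears on the question of how GDP and health affect the corruption-happiness relationship. This is a sketch on the six rows from head(happy); the names cm, with_happy and ranked are illustrative, and correlations describe association only.

```r
# Six rows from the head(happy) output above (renamed columns)
demo <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# Pearson correlations for every pair of columns
cm <- cor(demo)
print(round(cm, 2))

# Each factor's correlation with happiness, largest absolute value first
with_happy <- cm["happyness", colnames(cm) != "happyness"]
ranked <- with_happy[order(abs(with_happy), decreasing = TRUE)]
print(round(ranked, 2))
```

With the full data, cor(happy[, c("happyness", "GDP", "corrupt", "health")]) gives the matrix for all 156 countries.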

GDP and Corrupt

Plotting perceived corruption against GDP shows most countries reporting low corruption perception, except for a group of high-GDP countries whose scores are much less concentrated and include high corrupt scores.

ggplot(data = happy) + 
  geom_point(mapping = aes(x = GDP, y = corrupt))+
    theme_bw()+
  ggtitle("High Corrupt Only for High GDP Scores and Outliers")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

When comparing corruption perception across GDP bands (width 0.25), perception levels remain relatively similar until the highest GDP band, which shows a substantial increase, though with a wide range.

ggplot(data = happy, mapping = aes(x = GDP, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(GDP, .25)))+
    theme_bw()+
  ggtitle("Greater Corrupt Spread at Low and High GDP Scores")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))
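The box plots summarise corrupt within GDP bands of width 0.25. The same medians and spreads can be printed as a table, which makes claims such as a wide range in the highest GDP band easy to check. In this sketch the bands start at multiples of the width; ggplot2’s cut_width() may place its bins differently by default, so the bands will not necessarily match the plotted boxes. The helper name spread_by_bin() is hypothetical. The demo uses GDP and corrupt for the 20 countries in the top-20 table above, so almost all of them fall in a single band.

```r
# GDP and corrupt for the 20 countries in the top-20 table above
gdp <- c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452, 1.387, 1.303, 1.365, 1.376,
         1.372, 1.034, 1.276, 1.609, 1.333, 1.499, 1.373, 1.356, 1.433, 1.269)
corrupt <- c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343, 0.373, 0.380, 0.308, 0.226,
             0.290, 0.093, 0.082, 0.316, 0.278, 0.310, 0.265, 0.210, 0.128, 0.036)

# Hypothetical helper: bin x into width-wide bands and summarise y in each
spread_by_bin <- function(x, y, width) {
  lo <- floor(min(x) / width) * width
  hi <- ceiling(max(x) / width) * width
  if (hi == lo) hi <- lo + width  # guard: every x sits on one band edge
  band <- cut(x, breaks = seq(lo, hi, by = width), include.lowest = TRUE)
  # One row of summaries per non-empty band
  rows <- lapply(split(y, band, drop = TRUE), function(v) {
    c(n = length(v), median = median(v), iqr = IQR(v), range = diff(range(v)))
  })
  out <- do.call(rbind, rows)
  data.frame(band = rownames(out), out, row.names = NULL)
}

res <- spread_by_bin(gdp, corrupt, 0.25)
print(res)
```

With the full data, spread_by_bin(happy$GDP, happy$corrupt, 0.25) gives the table for all countries.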

Health and Corrupt

Plotting perceived corruption against health shows high corruption perception scores among high health scorers.

ggplot(data = happy) + 
  geom_point(mapping = aes(x = health, y = corrupt))+
    theme_bw()+
  ggtitle("High Corrupt for High Health Scores and Outliers")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))

When comparing corruption perception across health bands (width 0.25), the range of corruption perception also widens substantially for higher health countries.

ggplot(data = happy, mapping = aes(x = health, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(health, .25)))+
    theme_bw()+
   ggtitle("Corrupt Spread Highest for High Health Scores")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))+
  theme(axis.title = element_text(face = "bold"))
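The widening of the corrupt range among high-health countries can also be checked numerically by splitting countries at a health threshold and comparing the spread of corrupt on each side. This is a sketch: the default threshold (the median health score) is an illustrative choice rather than one taken from the analysis, and compare_spread() is a hypothetical helper. The demo uses the six rows printed by head(happy) above, all of which are high-health countries, so it shows the mechanics only.

```r
# health and corrupt for the six countries in the head(happy) output above
health <- c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
corrupt <- c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)

# Hypothetical helper: compare corrupt spread above vs at-or-below a threshold
compare_spread <- function(health, corrupt, threshold = median(health)) {
  group <- ifelse(health > threshold, "higher_health", "lower_health")
  # One column per group: count, standard deviation and range of corrupt
  sapply(split(corrupt, group), function(v) {
    c(n = length(v), sd = sd(v), range = diff(range(v)))
  })
}

res <- compare_spread(health, corrupt)
print(round(res, 3))
```

With the full data, compare_spread(happy$health, happy$corrupt) compares the two halves of the sample.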

IV. Reflection

Working on the project was an interesting experience that challenged me to learn how to use R to overcome particularities of my data set and pursue my research questions. I began by selecting a dataset that could provide insight into interesting questions about the determinants of national happiness. After reviewing the data in different forms, I developed a better understanding of the types of variables included and the data manipulation techniques I could use. I then had to decide which variables to focus my analysis on and which visualizations would best represent relationships and reveal trends.

At times it was a challenge to customize each visualization adequately, as many of my variables were on different scales. I wish I had known more about visualization techniques to improve my representation of the data set. To continue the project, I would look toward better understanding outliers and data subgroups that do not follow the overall linear trend, such as the wider range of corruption responses among higher GDP scorers. I would also broaden the research by adding new content such as additional variables, visualizations, and statistical analysis.

V. Conclusion

We can conclude that there is a positive relationship between national happiness and corruption perception scores, although this relationship is strongest for countries reporting high happiness. There are also linear relationships between happiness and health and between happiness and GDP, suggesting these factors are strongly interrelated. The GDP and Corrupt and the Health and Corrupt comparisons also show that high-scoring countries relate to corruption perception differently. This could indicate greater sensitivity towards corruption perception among wealthy countries: the same wealth that enables them to achieve higher GDP and health scores may also allow them to maintain high happiness despite higher corruption perception scores.

More research is necessary to explore this relationship further and to address the unanswered questions: how significant corruption perception’s effect on national happiness is compared to other relevant factors, and how changes in those factors would affect corruption perception’s effect.
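A multiple regression is one way to begin on the first of these questions. The sketch below assumes a linear model is adequate and, because the variables are on different scales, standardises each one with scale() so the coefficients are in standard-deviation units and can be compared directly. The demo uses only the 20 happiest countries from the top-20 table above, which restricts the range of every variable, so its estimates illustrate the method and should not be read as findings; coefficients describe associations, not causal effects.

```r
# The 20 countries from the top-20 table above
top20 <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480, 7.343, 7.307, 7.278, 7.246,
                7.228, 7.167, 7.139, 7.090, 7.054, 7.021, 6.985, 6.923, 6.892, 6.852),
  GDP = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452, 1.387, 1.303, 1.365, 1.376,
          1.372, 1.034, 1.276, 1.609, 1.333, 1.499, 1.373, 1.356, 1.433, 1.269),
  corrupt = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343, 0.373, 0.380, 0.308, 0.226,
              0.290, 0.093, 0.082, 0.316, 0.278, 0.310, 0.265, 0.210, 0.128, 0.036),
  health = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052, 1.009, 1.026, 1.039, 1.016,
             1.036, 0.963, 1.029, 1.012, 0.996, 0.999, 0.987, 0.986, 0.874, 0.920)
)

# Put every variable on a mean-0, sd-1 scale so coefficients are comparable
z <- as.data.frame(scale(top20))

# Happiness regressed on all three factors at once
fit <- lm(happyness ~ GDP + corrupt + health, data = z)

# Estimates, standard errors, t values and p-values
print(round(coef(summary(fit)), 3))
```

With the full data, scale(happy[, c("happyness", "GDP", "corrupt", "health")]) replaces top20; the intercept is then essentially zero and each slope is the change in happiness, in standard deviations, per standard deviation of that factor, holding the others fixed.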

VI. Bibliography

Battaglia, C. (2021). RPubs Quick Guide. RStudio. http://rpubs.com/clairebattaglia/RPubs-quick-guide

Conway, S. (2022). Data Analytics and Computational Social Science: RMarkdown Demo. https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomspconway909277/

Navarro, D. J. (2015). Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners, Version 0.6.

RStudio (2014). R Markdown Cheat Sheet. https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Sustainable Development Solutions Network (2022). World Happiness Report 2019. https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv

Wickham, H., & Grolemund, G. (2016). R for data science: Visualize, model, transform, tidy, and import data. O’Reilly Media.

Wickham, H. (2019). Advanced R. Chapman and Hall/CRC.

Wickham, H. (2010). A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1), 3-28.

Final Project