HW5

National Happiness and Corruption Perception

Ken Docekal
2022-07-26

I. Introduction

National happiness is often used as a metric to gauge a population’s satisfaction with quality of life and as a measurement of the effectiveness of various public policies. Reported happiness can serve as a more holistic measure of a population’s life experience than field-specific measures such as life expectancy and disposable income. Because it is a subjective measure based on respondents’ perception of their own happiness, it depends largely on the factors respondents deem most relevant to their sense of happiness. One of these determining factors captured by the 2019 World Happiness Report is respondents’ perception of the prevalence of corruption in their country.

This analysis will explore the relationship between corruption perception and national happiness by answering the following questions:

What does the relationship between corruption perception and national happiness look like?

How is this relationship affected by variables such as location, GDP, etc.?

How significant is the effect of corruption perception on national happiness compared to other factors?

How do changes in other factors impact the effect of corruption perception on national happiness?

II. Data

Dataset

2019.csv contains data ranking 156 countries by happiness from the 2019 World Happiness Report. Respondents rated 6 components of happiness (GDP, Social.support, Healthy.life.expectancy, Freedom.to.make.life.choices, Generosity, and Perceptions.of.corruption) on a scale of 0 (worst) to 10 (best).

The dataset contains 156 rows and 9 variables. Variable scores are based solely on the mean of respondents’ answers by country, with the exception of the GDP and Healthy.life.expectancy variables. These are composite scores derived from respondents’ answers as well as a country’s performance on Purchasing Power Parity and World Health Organization data, respectively.

Source: Sustainable Development Solutions Network (2022, June 27). World Happiness Report. Retrieved from https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv

Import & Cleaning

  1. Ensure necessary packages are present
    library(tidyverse)  # provides rename(), select(), %>%, ggplot(), as_tibble(), view()
  1. Set working directory and import downloaded dataset
    setwd("/Users/kendocekal/Documents/DACSS601/DACSS601Prj")
    happy <- read.csv("/Users/kendocekal/Documents/DACSS601/DACSS601Prj/2019.csv")
  1. Review dataset
    view(happy)
as_tibble(happy)
# A tibble: 156 × 9
   Overall.rank Country.or.region Score GDP.per.capita Social.support
          <int> <chr>             <dbl>          <dbl>          <dbl>
 1            1 Finland            7.77           1.34           1.59
 2            2 Denmark            7.6            1.38           1.57
 3            3 Norway             7.55           1.49           1.58
 4            4 Iceland            7.49           1.38           1.62
 5            5 Netherlands        7.49           1.40           1.52
 6            6 Switzerland        7.48           1.45           1.53
 7            7 Sweden             7.34           1.39           1.49
 8            8 New Zealand        7.31           1.30           1.56
 9            9 Canada             7.28           1.36           1.50
10           10 Austria            7.25           1.38           1.48
# … with 146 more rows, and 4 more variables:
#   Healthy.life.expectancy <dbl>,
#   Freedom.to.make.life.choices <dbl>, Generosity <dbl>,
#   Perceptions.of.corruption <dbl>
head(happy)
  Overall.rank Country.or.region Score GDP.per.capita Social.support
1            1           Finland 7.769          1.340          1.587
2            2           Denmark 7.600          1.383          1.573
3            3            Norway 7.554          1.488          1.582
4            4           Iceland 7.494          1.380          1.624
5            5       Netherlands 7.488          1.396          1.522
6            6       Switzerland 7.480          1.452          1.526
  Healthy.life.expectancy Freedom.to.make.life.choices Generosity
1                   0.986                        0.596      0.153
2                   0.996                        0.592      0.252
3                   1.028                        0.603      0.271
4                   1.026                        0.591      0.354
5                   0.999                        0.557      0.322
6                   1.052                        0.572      0.263
  Perceptions.of.corruption
1                     0.393
2                     0.410
3                     0.341
4                     0.118
5                     0.298
6                     0.343
dim(happy)
[1] 156   9
colnames(happy)
[1] "Overall.rank"                 "Country.or.region"           
[3] "Score"                        "GDP.per.capita"              
[5] "Social.support"               "Healthy.life.expectancy"     
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "Perceptions.of.corruption"   
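Before renaming and plotting, it can help to confirm that the data has no missing values or duplicated countries. The sketch below is only an illustration: it rebuilds the six rows printed by head(happy) above as a stand-in data frame (the name happy_head is hypothetical), so the same checks would need to be run on the full happy data frame to say anything about all 156 rows.

```r
# Stand-in for the full dataset: the six rows shown by head(happy) above.
happy_head <- data.frame(
  Overall.rank = 1:6,
  Country.or.region = c("Finland", "Denmark", "Norway",
                        "Iceland", "Netherlands", "Switzerland"),
  Score = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP.per.capita = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  Social.support = c(1.587, 1.573, 1.582, 1.624, 1.522, 1.526),
  Healthy.life.expectancy = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052),
  Freedom.to.make.life.choices = c(0.596, 0.592, 0.603, 0.591, 0.557, 0.572),
  Generosity = c(0.153, 0.252, 0.271, 0.354, 0.322, 0.263),
  Perceptions.of.corruption = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(happy_head))

# Number of country names that appear more than once
dup_countries <- sum(duplicated(happy_head$Country.or.region))

# TRUE when ranks are strictly increasing (no ties, no reordering)
ranks_ok <- !is.unsorted(happy_head$Overall.rank, strictly = TRUE)

print(na_counts)
print(dup_countries)
print(ranks_ok)
```

Replacing happy_head with happy would apply the same three checks to the imported file.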
  1. Rename variables for clarity
    # Score -> happyness, GDP.per.capita -> GDP,
    # Perceptions.of.corruption -> corrupt, Healthy.life.expectancy -> health
    happy <- rename(happy,
                    happyness = Score,
                    GDP = GDP.per.capita,
                    corrupt = Perceptions.of.corruption,
                    health = Healthy.life.expectancy)
colnames(happy)
[1] "Overall.rank"                 "Country.or.region"           
[3] "happyness"                    "GDP"                         
[5] "Social.support"               "health"                      
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "corrupt"                     

Focus on the variables of interest: happiness score and corruption perception, along with GDP per capita (gross domestic product, a measure of national income) and healthy life expectancy, which are selected to capture objective differences in national resources and conditions. Results are limited to the 20 happiest countries.

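# Keep the four variables of interest, sort by happiness (highest first),
# and show the top 20 countries
happy %>%
  select(happyness, GDP, corrupt, health) %>%
  arrange(desc(happyness)) %>%
  slice(1:20)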
   happyness   GDP corrupt health
1      7.769 1.340   0.393  0.986
2      7.600 1.383   0.410  0.996
3      7.554 1.488   0.341  1.028
4      7.494 1.380   0.118  1.026
5      7.488 1.396   0.298  0.999
6      7.480 1.452   0.343  1.052
7      7.343 1.387   0.373  1.009
8      7.307 1.303   0.380  1.026
9      7.278 1.365   0.308  1.039
10     7.246 1.376   0.226  1.016
11     7.228 1.372   0.290  1.036
12     7.167 1.034   0.093  0.963
13     7.139 1.276   0.082  1.029
14     7.090 1.609   0.316  1.012
15     7.054 1.333   0.278  0.996
16     7.021 1.499   0.310  0.999
17     6.985 1.373   0.265  0.987
18     6.923 1.356   0.210  0.986
19     6.892 1.433   0.128  0.874
20     6.852 1.269   0.036  0.920

III. Visualization

Mean, Median, and Standard Deviation - Review variable characteristics

mean(happy$happyness, na.rm = FALSE)
[1] 5.407096
mean(happy$corrupt, na.rm = FALSE)
[1] 0.1106026
mean(happy$GDP, na.rm = FALSE)
[1] 0.9051474
mean(happy$health, na.rm = FALSE)
[1] 0.7252436
median(happy$happyness, na.rm = FALSE)
[1] 5.3795
median(happy$corrupt, na.rm = FALSE)
[1] 0.0855
median(happy$GDP, na.rm = FALSE)
[1] 0.96
median(happy$health, na.rm = FALSE)
[1] 0.789
sd(happy$happyness, na.rm = FALSE)
[1] 1.11312
sd(happy$corrupt, na.rm = FALSE)
[1] 0.09453784
sd(happy$GDP, na.rm = FALSE)
[1] 0.3983895
sd(happy$health, na.rm = FALSE)
[1] 0.242124
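The twelve separate calls above can also be collapsed into one summary table. The following is a minimal sketch, assuming a data frame with the four renamed columns; to stay self-contained it uses only the six rows printed by head(happy) earlier (the name happy_head6 is hypothetical), so its numbers will differ from the full-dataset values reported above.

```r
# Six rows from head(happy), using the renamed column names.
happy_head6 <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# Apply the same three statistics to every column;
# the result is a matrix with one row per statistic and one column per variable.
summary_stats <- sapply(happy_head6, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})

print(round(summary_stats, 3))
```

Running the same sapply() call on happy[, c("happyness", "GDP", "corrupt", "health")] would produce the full-dataset table in one step.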

Distribution of each variable - what each variable looks like amongst all countries as well as the concentration of scores.

Happiness scores show a roughly normal-looking distribution, though with two peaks near the center
ggplot(data = happy) +
  geom_histogram(mapping = aes(x = happyness), binwidth = 0.5) +
  theme_bw() +
  ggtitle("Happiness Distribution") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))

GDP per capita is somewhat left-skewed, with most countries in the upper half of the score range
ggplot(data = happy) +
  geom_histogram(mapping = aes(x = GDP), binwidth = 0.1)+
    theme_bw()+
  ggtitle("GDP Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Perception of corruption shows a strong concentration at the lower end of the histogram
ggplot(data = happy) +
  geom_histogram(mapping = aes(x = corrupt), binwidth = 0.1)+
    theme_bw()+
  ggtitle("Corruption Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Health shows skewing towards the upper end of the spectrum
ggplot(data = happy) +
  geom_histogram(mapping = aes(x = health), binwidth = 0.1)+
    theme_bw()+
  ggtitle("Health Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))
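The skew visible in the histograms can be cross-checked against the summary statistics already reported in this section. The sketch below uses Pearson’s second skewness coefficient, 3 * (mean - median) / sd, which is only a rough indicator and is a technique added here for illustration, not part of the original analysis. The input values are the means, medians, and standard deviations printed above; the data frame name reported is hypothetical.

```r
# Means, medians, and standard deviations as reported earlier in this section.
reported <- data.frame(
  variable = c("happyness", "corrupt", "GDP", "health"),
  mean     = c(5.407096, 0.1106026, 0.9051474, 0.7252436),
  median   = c(5.3795, 0.0855, 0.96, 0.789),
  sd       = c(1.11312, 0.09453784, 0.3983895, 0.242124),
  stringsAsFactors = FALSE
)

# Pearson's second skewness coefficient: positive means a longer right tail,
# negative means a longer left tail.
reported$skew <- 3 * (reported$mean - reported$median) / reported$sd
reported$direction <- ifelse(reported$skew > 0, "right", "left")

print(reported[, c("variable", "skew", "direction")])
```

A mean above the median (corruption perception) points to a right tail, while a mean below the median (GDP and health) points to a left tail, which agrees with the histogram descriptions.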

Happiness’ relationship with corruption, GDP, and health

Happiness by corruption - Shows the distribution of country happiness and reported corruption score

# One point per country; the line is a simple linear fit.
# The earlier group_by()/summarise() step left every row unchanged, so it is dropped.
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happiness and Corruption") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))

The hex plot of corruption perception and happiness score shows only a weakly positive relationship, with some outliers as well as a concentration of countries at happiness scores around 5
ggplot(data = happy) +
  geom_hex(mapping = aes(x = corrupt, y = happyness))+
    theme_bw()

Comparing corruption perception and happiness scores reveals similar ranges of perception amongst most low and middle happiness countries (with some extreme outliers) but substantially higher perception scores amongst higher happiness countries
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) + 
  geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
    theme_bw()

Happiness by GDP - Shows the distribution of country happiness and GDP scores.

# One point per country; the line is a simple linear fit.
ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happiness and GDP") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Plotting happiness score against GDP in a hex plot shows a roughly linear relationship
ggplot(data = happy) +
  geom_hex(mapping = aes(x = GDP, y = happyness))+
    theme_bw()

Grouping countries by happiness score and plotting the corresponding GDP as boxplots shows skewness and outliers: middle happiness scores are more skewed, while high scores show much less variation in GDP, though with some outliers
ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) + 
  geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
    theme_bw()

Happiness by health - Shows the distribution of country happiness and health

# One point per country; the line is a simple linear fit.
ggplot(data = happy, mapping = aes(x = health, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happiness and Health") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))
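The scatter plots in this subsection describe the relationships visually; correlation coefficients and a multiple regression are one way to put numbers on them and to compare corruption perception with the other factors. The sketch below is an illustration under stated assumptions, not the analysis performed in this report: it uses only the 20 happiest countries listed in the Data section (the name top20 is hypothetical), so its estimates need not resemble those for all 156 countries.

```r
# The 20 happiest countries, transcribed from the table in the Data section.
top20 <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480, 7.343, 7.307, 7.278, 7.246,
                7.228, 7.167, 7.139, 7.090, 7.054, 7.021, 6.985, 6.923, 6.892, 6.852),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452, 1.387, 1.303, 1.365, 1.376,
                1.372, 1.034, 1.276, 1.609, 1.333, 1.499, 1.373, 1.356, 1.433, 1.269),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343, 0.373, 0.380, 0.308, 0.226,
                0.290, 0.093, 0.082, 0.316, 0.278, 0.310, 0.265, 0.210, 0.128, 0.036),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052, 1.009, 1.026, 1.039, 1.016,
                1.036, 0.963, 1.029, 1.012, 0.996, 0.999, 0.987, 0.986, 0.874, 0.920)
)

# Pearson correlation of happiness with each of the three predictors
cor_with_happy <- cor(top20)["happyness", c("corrupt", "GDP", "health")]
print(round(cor_with_happy, 2))

# Multiple regression: the effect of each predictor while holding the others fixed
fit <- lm(happyness ~ corrupt + GDP + health, data = top20)
print(coef(summary(fit)))
```

Swapping top20 for the full happy data frame would give estimates for all countries, and a term such as corrupt:GDP could be added to the formula to explore how GDP changes the effect of corruption perception.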

Relationship between corruption and other variables

GDP by Perceptions of Corruption - Shows distribution of countries based on perceived corruption by GDP

ggplot(data = happy) + 
  geom_point(mapping = aes(x = GDP, y = corrupt))+
    theme_bw()+
  ggtitle("GDP and Corruption")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

When comparing corruption perception and GDP, perception levels remain relatively similar until the highest GDP group, which shows a substantial increase, though with a wide range
ggplot(data = happy, mapping = aes(x = GDP, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(GDP, .25)))+
    theme_bw()

Health by corruption - Shows distribution of countries based on perceived corruption by health

ggplot(data = happy) + 
  geom_point(mapping = aes(x = health, y = corrupt))+
    theme_bw()+
  ggtitle("Health and Corruption")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

When comparing corruption perception and health, the range of corruption perception increases significantly for higher health countries
ggplot(data = happy, mapping = aes(x = health, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(health, .25)))+
    theme_bw()

Effect of health and GDP changes

Rate change of corruption and happiness by GDP

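HPerGDP is a country’s happiness score divided by its GDP score, and CPerGDP is its corruption perception score divided by its GDP score. Both are simple ratios, so they become very large for countries with GDP scores near zero. The table is sorted by ascending happiness and therefore lists the 20 lowest-scoring countries.
happy %>%
  select(happyness, GDP, Generosity, corrupt) %>%
  mutate(HPerGDP = happyness / GDP,
         CPerGDP = corrupt / GDP) %>%
  arrange(happyness) %>%   # ascending: the 20 lowest-scoring countries
  slice(1:20)
   happyness   GDP Generosity corrupt    HPerGDP    CPerGDP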
1      2.853 0.306      0.202   0.091   9.323529 0.29738562
2      3.083 0.026      0.235   0.035 118.576923 1.34615385
3      3.203 0.350      0.158   0.025   9.151429 0.07142857
4      3.231 0.476      0.276   0.147   6.787815 0.30882353
5      3.334 0.359      0.217   0.411   9.286908 1.14484680
6      3.380 0.287      0.108   0.077  11.777003 0.26829268
7      3.410 0.191      0.218   0.089  17.853403 0.46596859
8      3.462 0.619      0.331   0.141   5.592892 0.22778675
9      3.488 1.041      0.025   0.100   3.350624 0.09606148
10     3.597 0.323      0.419   0.110  11.136223 0.34055728
11     3.663 0.366      0.151   0.089  10.008197 0.24316940
12     3.775 0.046      0.176   0.180  82.065217 3.91304348
13     3.802 0.489      0.107   0.093   7.775051 0.19018405
14     3.933 0.274      0.169   0.041  14.354015 0.14963504
15     3.973 0.274      0.275   0.078  14.500000 0.28467153
16     3.975 0.073      0.233   0.033  54.452055 0.45205479
17     4.015 0.755      0.200   0.085   5.317881 0.11258278
18     4.085 0.275      0.177   0.085  14.854545 0.30909091
19     4.107 0.578      0.247   0.087   7.105536 0.15051903
20     4.166 0.913      0.076   0.067   4.562979 0.07338445
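A ratio such as happiness divided by GDP is dominated by the denominator when GDP scores are close to zero, which is why one country in the table above reaches a value of about 118.58. A regression slope is one alternative way to express change in happiness or corruption perception per unit of GDP. The sketch below is an assumption-laden illustration rather than the method used in this report: it uses only the 20 rows shown in the table above (the names low20, slope_h, and slope_c are hypothetical).

```r
# The 20 rows shown in the ratio table above.
low20 <- data.frame(
  happyness = c(2.853, 3.083, 3.203, 3.231, 3.334, 3.380, 3.410, 3.462, 3.488, 3.597,
                3.663, 3.775, 3.802, 3.933, 3.973, 3.975, 4.015, 4.085, 4.107, 4.166),
  GDP       = c(0.306, 0.026, 0.350, 0.476, 0.359, 0.287, 0.191, 0.619, 1.041, 0.323,
                0.366, 0.046, 0.489, 0.274, 0.274, 0.073, 0.755, 0.275, 0.578, 0.913),
  corrupt   = c(0.091, 0.035, 0.025, 0.147, 0.411, 0.077, 0.089, 0.141, 0.100, 0.110,
                0.089, 0.180, 0.093, 0.041, 0.078, 0.033, 0.085, 0.085, 0.087, 0.067)
)

# Ratio approach: one value per country, unstable when GDP is near zero
low20$HPerGDP <- low20$happyness / low20$GDP
print(range(low20$HPerGDP))

# Slope approach: a single estimate of the change in each score
# per one-unit change in the GDP score, fitted across all rows
slope_h <- coef(lm(happyness ~ GDP, data = low20))[["GDP"]]
slope_c <- coef(lm(corrupt ~ GDP, data = low20))[["GDP"]]
print(c(happiness_per_GDP = slope_h, corruption_per_GDP = slope_c))
```

The slopes summarise the whole group in one number each, whereas the ratios describe individual countries, so the two answer slightly different questions.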

IV. Reflection

This analysis is missing more information about the relationship between other variables in the dataset. We are also lacking information on other differences between countries which may correlate with the results such as country location or economic development level.

We can currently conclude that corruption perception is reported with a wider range for countries with higher scores in GDP and health. The overall relationship between the key determinants of happiness is positive. Readers unfamiliar with the data would need more background on what each variable measures and how it is reported to better understand the graphs. The analysis could better answer questions about the relationship by including other factors that might be relevant, such as metrics on media freedom, age distribution, and population concentration.