HW6

project draft

Ken Docekal
2022-08-09

I. Introduction

National happiness is often used to gauge a population’s satisfaction with its quality of life and to measure the effectiveness of public policies. Reported happiness can serve as a more holistic measure of a population’s life experience than field-specific measures such as life expectancy and disposable income. Because it is subjective, based on respondents’ perception of their own happiness, it depends largely on the factors respondents deem most relevant to their sense of happiness. One of these determining factors captured by the 2019 World Happiness Report is respondents’ perception of the prevalence of corruption in their country.

This analysis will explore the relationship between corruption perception and national happiness by answering the following questions:

II. Data

Dataset

2019.csv contains data ranking 156 countries by happiness from the 2019 World Happiness Report. Respondents rated 6 components of happiness (GDP, Social.support, Healthy.life.expectancy, Freedom.to.make.life.choices, Generosity, and Perceptions.of.corruption) on a scale of 0 (worst) to 10 (best).

The dataset contains 156 rows and 9 variables. Variable scores are based solely on the mean of respondent-provided responses by country, with the exception of the GDP and Healthy.life.expectancy variables. These are composite scores derived from respondent responses as well as a country’s performance based on Purchasing Power Parity and World Health Organization data, respectively.

Source: Sustainable Development Solutions Network (2022, June 27). World Happiness Report. Retrieved from https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv

Import & Cleaning

  1. Ensure necessary packages are present
# tidyverse supplies dplyr (rename, select, arrange), tibble (as_tibble, view), and ggplot2
library(tidyverse)
  1. Set working directory and import downloaded dataset
setwd("/Users/kendocekal/Documents/DACSS601/DACSS601Prj")

happy <- read.csv("/Users/kendocekal/Documents/DACSS601/DACSS601Prj/2019.csv")
  1. Review dataset
view(happy)

as_tibble(happy)
# A tibble: 156 × 9
   Overall.rank Country.or.region Score GDP.per.capita Social.support
          <int> <chr>             <dbl>          <dbl>          <dbl>
 1            1 Finland            7.77           1.34           1.59
 2            2 Denmark            7.6            1.38           1.57
 3            3 Norway             7.55           1.49           1.58
 4            4 Iceland            7.49           1.38           1.62
 5            5 Netherlands        7.49           1.40           1.52
 6            6 Switzerland        7.48           1.45           1.53
 7            7 Sweden             7.34           1.39           1.49
 8            8 New Zealand        7.31           1.30           1.56
 9            9 Canada             7.28           1.36           1.50
10           10 Austria            7.25           1.38           1.48
# … with 146 more rows, and 4 more variables:
#   Healthy.life.expectancy <dbl>,
#   Freedom.to.make.life.choices <dbl>, Generosity <dbl>,
#   Perceptions.of.corruption <dbl>
head(happy)
  Overall.rank Country.or.region Score GDP.per.capita Social.support
1            1           Finland 7.769          1.340          1.587
2            2           Denmark 7.600          1.383          1.573
3            3            Norway 7.554          1.488          1.582
4            4           Iceland 7.494          1.380          1.624
5            5       Netherlands 7.488          1.396          1.522
6            6       Switzerland 7.480          1.452          1.526
  Healthy.life.expectancy Freedom.to.make.life.choices Generosity
1                   0.986                        0.596      0.153
2                   0.996                        0.592      0.252
3                   1.028                        0.603      0.271
4                   1.026                        0.591      0.354
5                   0.999                        0.557      0.322
6                   1.052                        0.572      0.263
  Perceptions.of.corruption
1                     0.393
2                     0.410
3                     0.341
4                     0.118
5                     0.298
6                     0.343
dim(happy)
[1] 156   9
colnames(happy)
[1] "Overall.rank"                 "Country.or.region"           
[3] "Score"                        "GDP.per.capita"              
[5] "Social.support"               "Healthy.life.expectancy"     
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "Perceptions.of.corruption"   
  1. Rename variables for clarity

Score to happyness

happy <- rename(happy, "happyness" = Score)

GDP.per.capita to GDP

happy <- rename(happy, "GDP" = GDP.per.capita)

Perceptions.of.corruption to corrupt

happy <- rename(happy, "corrupt" = Perceptions.of.corruption)

Healthy.life.expectancy to health

happy <- rename(happy, "health" = Healthy.life.expectancy)

Confirm name changes

colnames(happy)
[1] "Overall.rank"                 "Country.or.region"           
[3] "happyness"                    "GDP"                         
[5] "Social.support"               "health"                      
[7] "Freedom.to.make.life.choices" "Generosity"                  
[9] "corrupt"                     
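
The four rename() calls above could also be collapsed into one, since dplyr's rename() accepts any number of new_name = old_name pairs. The sketch below is illustrative only: happy_raw and happy_renamed are hypothetical names, and the toy data frame holds just the first three rows printed earlier rather than the full dataset.

```r
library(dplyr)

# Toy stand-in for the imported data (assumption: first three rows of the
# head(happy) output, restricted to the four columns being renamed)
happy_raw <- data.frame(
  Score = c(7.769, 7.600, 7.554),
  GDP.per.capita = c(1.340, 1.383, 1.488),
  Healthy.life.expectancy = c(0.986, 0.996, 1.028),
  Perceptions.of.corruption = c(0.393, 0.410, 0.341)
)

# rename() takes new_name = old_name pairs; column order is preserved
happy_renamed <- rename(
  happy_raw,
  happyness = Score,
  GDP = GDP.per.capita,
  health = Healthy.life.expectancy,
  corrupt = Perceptions.of.corruption
)

colnames(happy_renamed)
```

Applied to the full data frame, the same call would replace the four separate assignments with one.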

Focus on the variables of interest by limiting results to the happiness score and corruption perception for the 20 happiest countries. GDP per capita (gross domestic product, a measure of national income) and healthy life expectancy are also selected to capture objective divergence in national resources and conditions

# Keep the four variables of interest, sort by happiness, and take the top 20
happy %>%
  select(happyness, GDP, corrupt, health) %>%
  arrange(desc(happyness)) %>%
  slice(1:20)
   happyness   GDP corrupt health
1      7.769 1.340   0.393  0.986
2      7.600 1.383   0.410  0.996
3      7.554 1.488   0.341  1.028
4      7.494 1.380   0.118  1.026
5      7.488 1.396   0.298  0.999
6      7.480 1.452   0.343  1.052
7      7.343 1.387   0.373  1.009
8      7.307 1.303   0.380  1.026
9      7.278 1.365   0.308  1.039
10     7.246 1.376   0.226  1.016
11     7.228 1.372   0.290  1.036
12     7.167 1.034   0.093  0.963
13     7.139 1.276   0.082  1.029
14     7.090 1.609   0.316  1.012
15     7.054 1.333   0.278  0.996
16     7.021 1.499   0.310  0.999
17     6.985 1.373   0.265  0.987
18     6.923 1.356   0.210  0.986
19     6.892 1.433   0.128  0.874
20     6.852 1.269   0.036  0.920

III. Visualization

Mean, Median, and Standard Deviation - Review variable characteristics

Happyness

summarise(happy, mean(happyness, na.rm = TRUE), median(happyness, na.rm = TRUE), sd(happyness, na.rm = TRUE))
  mean(happyness, na.rm = TRUE) median(happyness, na.rm = TRUE)
1                      5.407096                          5.3795
  sd(happyness, na.rm = TRUE)
1                     1.11312

GDP

summarise(happy, mean(GDP, na.rm = TRUE), median(GDP, na.rm = TRUE), sd(GDP, na.rm = TRUE))
  mean(GDP, na.rm = TRUE) median(GDP, na.rm = TRUE)
1               0.9051474                      0.96
  sd(GDP, na.rm = TRUE)
1             0.3983895

Corrupt

summarise(happy, mean(corrupt, na.rm = TRUE), median(corrupt, na.rm = TRUE), sd(corrupt, na.rm = TRUE))
  mean(corrupt, na.rm = TRUE) median(corrupt, na.rm = TRUE)
1                   0.1106026                        0.0855
  sd(corrupt, na.rm = TRUE)
1                0.09453784

Health

summarise(happy, mean(health, na.rm = TRUE), median(health, na.rm = TRUE), sd(health, na.rm = TRUE))
  mean(health, na.rm = TRUE) median(health, na.rm = TRUE)
1                  0.7252436                        0.789
  sd(health, na.rm = TRUE)
1                 0.242124
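
The four summaries above repeat the same three statistics for each variable. As a rough sketch, assuming dplyr 1.0.0 or later, across() can compute them in a single call. The data frame happy_head is a hypothetical stand-in built from the six rows printed by head(happy), so its values will not match the full-dataset figures above.

```r
library(dplyr)

# Stand-in data (assumption): the six rows shown by head(happy),
# using the renamed columns
happy_head <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# One summarise() call: each listed function is applied to each column,
# producing columns named <variable>_<statistic>
stats <- happy_head %>%
  summarise(across(
    c(happyness, GDP, corrupt, health),
    list(
      mean   = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      sd     = ~ sd(.x, na.rm = TRUE)
    )
  ))

stats
```

Swapping happy_head for happy would produce the twelve full-dataset statistics in one row with readable column names.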

Distribution of each variable - what each variable looks like across all countries, as well as the concentration of scores.

Happiness scores show a roughly symmetric distribution with two peaks near the center

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = happyness), binwidth = 0.5)+
    theme_bw()+
  ggtitle("Happyness Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

GDP per capita is somewhat skewed, with most countries in the upper half of scores

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = GDP), binwidth = 0.1)+
    theme_bw()+
  ggtitle("GDP Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Perception of corruption shows significant concentration at the lower end of the histogram, but counts per bin are low overall

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = corrupt), binwidth = 0.001)+
    theme_bw()+
  ggtitle("Corruption Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Health skews towards the upper end of the range shown, but counts per bin are again low overall

ggplot(data = happy) +
  geom_histogram(mapping = aes(x = health), binwidth = 0.001)+
    theme_bw()+
  ggtitle("Health Distribution")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))
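
The corruption and health histograms use a bin width of 0.001, which matches the precision of the scores, so most bins hold at most one country. One common way to pick a wider bin is the Freedman-Diaconis rule, sketched below. This is a suggestion rather than part of the original analysis: fd_binwidth is a hypothetical helper, and the input is only the six corruption scores printed by head(happy).

```r
# First six corruption scores from the head(happy) output (illustrative only)
corrupt_head <- c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)

# Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]                # drop missing values before counting n
  2 * IQR(x) / length(x)^(1 / 3)
}

bw <- fd_binwidth(corrupt_head)
bw
```

With the full dataset, the result of fd_binwidth(happy$corrupt) could be passed to the binwidth argument of geom_histogram() in place of 0.001.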

Happyness and corrupt

Shows the distribution of country happiness and reported corruption score

# Scatter plot of happiness against corruption perception with a linear trend line
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happyness and Corruption") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))
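
The trend line drawn by geom_smooth(method = "lm") corresponds to an ordinary least squares fit, which can also be inspected directly with lm(). The sketch below is only a pattern: happy_head is a hypothetical stand-in holding the six rows printed by head(happy), so its slope and R-squared say nothing reliable about the full dataset.

```r
# Stand-in data (assumption): the six rows shown by head(happy)
happy_head <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343)
)

# Ordinary least squares fit of happiness on corruption perception
fit <- lm(happyness ~ corrupt, data = happy_head)

coef(fit)                 # intercept and slope of the fitted line
summary(fit)$r.squared    # share of variance in happyness explained by corrupt
```

Running the same two lines with data = happy would give the slope and fit quality behind the plotted line.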

The hexbin plot of corruption perception and happiness score shows only a weak positive relationship, with some outliers as well as a concentration of countries with happiness scores around 5

ggplot(data = happy) +
  geom_hex(mapping = aes(x = corrupt, y = happyness))+
    theme_bw()

Comparing corruption perception and happiness scores reveals similar ranges of perception among most low- and middle-happiness countries (with some extreme outliers) but substantially increased perception among higher-happiness countries

ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) + 
  geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
    theme_bw()

Happyness and GDP

Shows the distribution of country happiness and GDP scores.

# Scatter plot of happiness against GDP with a linear trend line
ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happyness and GDP") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))

Plotting happiness score against GDP shows a relatively linear relationship

ggplot(data = happy) +
  geom_hex(mapping = aes(x = GDP, y = happyness))+
    theme_bw()

Grouping countries by happiness score and plotting the corresponding GDP as boxplots shows skewness and outliers: middle happiness groups are more skewed, while high-scoring groups show much less variation in GDP, although with some outliers

ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) + 
  geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
    theme_bw()

Happyness and health

Distribution of country happiness and health

# Scatter plot of happiness against health with a linear trend line
ggplot(data = happy, mapping = aes(x = health, y = happyness)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  ggtitle("Happyness and Health") +
  theme(plot.title = element_text(face = "bold", colour = "blue"))

GDP and Perceptions of Corruption

Distribution of countries based on perceived corruption by GDP

ggplot(data = happy) + 
  geom_point(mapping = aes(x = GDP, y = corrupt))+
    theme_bw()+
  ggtitle("GDP and Corruption")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

When comparing corruption perception and GDP, perception levels remain relatively similar until the highest GDP group, which sees a substantial increase, though with a wide range

ggplot(data = happy, mapping = aes(x = GDP, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(GDP, .25)))+
    theme_bw()

Health and corrupt

Distribution of countries based on perceived corruption by health

ggplot(data = happy) + 
  geom_point(mapping = aes(x = health, y = corrupt))+
    theme_bw()+
  ggtitle("Health and Corruption")+
  theme(plot.title = element_text(face = "bold", colour = "blue"))

When comparing corruption perception and health, the range of corruption perception increases significantly for higher health countries

ggplot(data = happy, mapping = aes(x = health, y = corrupt)) + 
  geom_boxplot(mapping = aes(group = cut_width(health, .25)))+
    theme_bw()
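
The pairwise relationships plotted in this section can be summarised numerically with a correlation matrix. The sketch below shows the pattern with base R's cor(). It is illustrative only: happy_head is a hypothetical stand-in holding the six rows printed by head(happy), so the coefficients it produces are not representative of all 156 countries.

```r
# Stand-in data (assumption): the six rows shown by head(happy),
# using the renamed columns
happy_head <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# Pearson correlations between every pair of variables;
# use = "complete.obs" drops rows with any missing value
cor_matrix <- cor(happy_head, use = "complete.obs")

round(cor_matrix, 2)
```

On the full data, the same call on the happyness, GDP, corrupt, and health columns of happy would put one number on each scatter plot above.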

IV. Reflection & Conclusion

This project needs further refinement and improved presentation. In addition to the formatting improvements required, I think the explanations can be improved, and additional visualizations may help in exploring the relationships between variables. Existing visualizations can also be tweaked to better display the data, and a bibliography should be added.

I will continue to develop project content and improve presentation, as well as further review my data and current visualizations. I hope to increase the legibility and clarity of my project, to continue gaining insights from my dataset, and to improve my analysis through new visualizations. The project can be further developed by adding new content and by deepening exploration of existing aspects of the research.