project draft
National happiness is often used as a metric to gauge a population’s satisfaction with quality of life and to measure the effectiveness of various public policies. Reported happiness can serve as a more holistic measure of a population’s life experience than more field-specific measures such as life expectancy and disposable income. Because it is a subjective measure, based on respondents’ perception of their own happiness, it depends largely on how respondents perceive the factors they deem most relevant to their sense of happiness. One of these determining factors captured by the 2019 World Happiness Report is respondents’ perception of the prevalence of corruption in their country.
This analysis will explore the relationship between corruption perception and national happiness by answering the following questions:
What does the relationship between corruption perception and national happiness look like?
How is this relationship affected by variables such as location, GDP, etc.?
How significant is the effect of corruption perception on national happiness compared to other factors?
How do changes in other factors impact the effect of corruption perception on national happiness?
2019.csv contains data ranking 156 countries by happiness from the 2019 World Happiness Report. Respondents rated 6 components of happiness (GDP, Social.support, Healthy.life.expectancy, Freedom.to.make.life.choices, Generosity, and Perceptions.of.corruption) on a scale of 0 (worst) to 10 (best).
The dataset contains 156 rows and 9 variables. Variable scores are based solely on the mean of respondent-provided responses by country, with the exception of the GDP and Healthy.life.expectancy variables. These are composite scores derived from respondent responses as well as a country’s performance based on Purchasing Power Parity and World Health Organization data, respectively.
Source: Sustainable Development Solutions Network (2022, June 27). World Happiness Report. Retrieved from https://www.kaggle.com/datasets/unsdsn/world-happiness?select=2019.csv
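The draft does not show how the data is loaded, so the following is only a minimal sketch of one way to do it. The helper name load_happy and the file path "2019.csv" are assumptions for illustration, not part of the original analysis.

```r
# Packages that provide the calls used later in this draft
# (rename, summarise, %>% from dplyr; ggplot from ggplot2).
library(dplyr)
library(ggplot2)

# Hypothetical helper: read the downloaded file into a data frame.
# The default path is an assumption about where the file was saved.
load_happy <- function(path = "2019.csv") {
  # read.csv() (with its default check.names = TRUE) replaces spaces in
  # column headers with dots, giving names like GDP.per.capita.
  read.csv(path, stringsAsFactors = FALSE)
}

# Only attempt the load when the file is actually present.
if (file.exists("2019.csv")) {
  happy <- load_happy()
  as_tibble(happy)  # compact preview of the first rows
}
```

Reading with read.csv rather than readr::read_csv is a guess based on the dotted column names in the printed output.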
# A tibble: 156 × 9
Overall.rank Country.or.region Score GDP.per.capita Social.support
<int> <chr> <dbl> <dbl> <dbl>
1 1 Finland 7.77 1.34 1.59
2 2 Denmark 7.6 1.38 1.57
3 3 Norway 7.55 1.49 1.58
4 4 Iceland 7.49 1.38 1.62
5 5 Netherlands 7.49 1.40 1.52
6 6 Switzerland 7.48 1.45 1.53
7 7 Sweden 7.34 1.39 1.49
8 8 New Zealand 7.31 1.30 1.56
9 9 Canada 7.28 1.36 1.50
10 10 Austria 7.25 1.38 1.48
# … with 146 more rows, and 4 more variables:
# Healthy.life.expectancy <dbl>,
# Freedom.to.make.life.choices <dbl>, Generosity <dbl>,
# Perceptions.of.corruption <dbl>
head(happy)
Overall.rank Country.or.region Score GDP.per.capita Social.support
1 1 Finland 7.769 1.340 1.587
2 2 Denmark 7.600 1.383 1.573
3 3 Norway 7.554 1.488 1.582
4 4 Iceland 7.494 1.380 1.624
5 5 Netherlands 7.488 1.396 1.522
6 6 Switzerland 7.480 1.452 1.526
Healthy.life.expectancy Freedom.to.make.life.choices Generosity
1 0.986 0.596 0.153
2 0.996 0.592 0.252
3 1.028 0.603 0.271
4 1.026 0.591 0.354
5 0.999 0.557 0.322
6 1.052 0.572 0.263
Perceptions.of.corruption
1 0.393
2 0.410
3 0.341
4 0.118
5 0.298
6 0.343
dim(happy)
[1] 156 9
colnames(happy)
[1] "Overall.rank" "Country.or.region"
[3] "Score" "GDP.per.capita"
[5] "Social.support" "Healthy.life.expectancy"
[7] "Freedom.to.make.life.choices" "Generosity"
[9] "Perceptions.of.corruption"
Score to happyness
happy <- rename(happy, "happyness" = Score)
GDP.per.capita to GDP
happy <- rename(happy, "GDP" = GDP.per.capita)
Perceptions.of.corruption to corrupt
happy <- rename(happy, "corrupt" = Perceptions.of.corruption)
Healthy.life.expectancy to health
happy <- rename(happy, "health" = Healthy.life.expectancy)
Confirm name changes
colnames(happy)
[1] "Overall.rank" "Country.or.region"
[3] "happyness" "GDP"
[5] "Social.support" "health"
[7] "Freedom.to.make.life.choices" "Generosity"
[9] "corrupt"
Focus on the variables of interest. The table below limits results to the 20 happiest countries and shows happiness score and corruption perception, together with GDP (gross domestic product, a measure of national income) per capita and healthy life expectancy. The last two variables are included to capture objective differences in national resources and conditions.
happyness GDP corrupt health
1 7.769 1.340 0.393 0.986
2 7.600 1.383 0.410 0.996
3 7.554 1.488 0.341 1.028
4 7.494 1.380 0.118 1.026
5 7.488 1.396 0.298 0.999
6 7.480 1.452 0.343 1.052
7 7.343 1.387 0.373 1.009
8 7.307 1.303 0.380 1.026
9 7.278 1.365 0.308 1.039
10 7.246 1.376 0.226 1.016
11 7.228 1.372 0.290 1.036
12 7.167 1.034 0.093 0.963
13 7.139 1.276 0.082 1.029
14 7.090 1.609 0.316 1.012
15 7.054 1.333 0.278 0.996
16 7.021 1.499 0.310 0.999
17 6.985 1.373 0.265 0.987
18 6.923 1.356 0.210 0.986
19 6.892 1.433 0.128 0.874
20 6.852 1.269 0.036 0.920
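The code that produced the table above is not shown. A minimal sketch of one way to get it with dplyr follows. The helper name top_countries is hypothetical, and the small data frame only copies the six rows printed by head(happy) so that the example runs on its own. In the real analysis happy would be the full renamed data frame.

```r
library(dplyr)

# Six rows copied from the head(happy) output, using the renamed columns.
happy <- data.frame(
  Overall.rank = 1:6,
  Country.or.region = c("Finland", "Denmark", "Norway",
                        "Iceland", "Netherlands", "Switzerland"),
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# Hypothetical helper: keep the four variables of interest
# for the n highest-ranked countries.
top_countries <- function(df, n = 20) {
  df %>%
    arrange(Overall.rank) %>%                    # rank 1 is the happiest
    select(happyness, GDP, corrupt, health) %>%  # drop the other columns
    head(n)                                      # first n rows
}

top_countries(happy)
```

With the full dataset, top_countries(happy) would return 20 rows. With the six-row sample it simply returns all six.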
Happyness
summarise(happy, mean(happyness, na.rm = TRUE), median(happyness, na.rm = TRUE), sd(happyness, na.rm = TRUE))
mean(happyness, na.rm = TRUE) median(happyness, na.rm = TRUE)
1 5.407096 5.3795
sd(happyness, na.rm = TRUE)
1 1.11312
GDP
summarise(happy, mean(GDP, na.rm = TRUE), median(GDP, na.rm = TRUE), sd(GDP, na.rm = TRUE))
mean(GDP, na.rm = TRUE) median(GDP, na.rm = TRUE)
1 0.9051474 0.96
sd(GDP, na.rm = TRUE)
1 0.3983895
Corrupt
summarise(happy, mean(corrupt, na.rm = TRUE), median(corrupt, na.rm = TRUE), sd(corrupt, na.rm = TRUE))
mean(corrupt, na.rm = TRUE) median(corrupt, na.rm = TRUE)
1 0.1106026 0.0855
sd(corrupt, na.rm = TRUE)
1 0.09453784
Health
summarise(happy, mean(health, na.rm = TRUE), median(health, na.rm = TRUE), sd(health, na.rm = TRUE))
mean(health, na.rm = TRUE) median(health, na.rm = TRUE)
1 0.7252436 0.789
sd(health, na.rm = TRUE)
1 0.242124
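The four summarise calls above repeat the same three statistics. As a sketch of a more compact alternative, dplyr's across() can compute them for all four variables in one call. The data frame happy_sample is an assumption made for illustration: it copies only the six rows printed by head(happy), so its results will not match the full-data values reported above.

```r
library(dplyr)

# Illustrative sample: six rows copied from the head(happy) output.
happy_sample <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480),
  GDP       = c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343),
  health    = c(0.986, 0.996, 1.028, 1.026, 0.999, 1.052)
)

# One row, with columns named <variable>_<statistic>, e.g. corrupt_median.
summary_stats <- happy_sample %>%
  summarise(across(
    c(happyness, GDP, corrupt, health),
    list(
      mean   = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      sd     = ~ sd(.x, na.rm = TRUE)
    )
  ))

summary_stats
```

Replacing happy_sample with the full happy data frame should reproduce the means, medians, and standard deviations listed above, in a single wide row.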
Happiness scores show a roughly normal distribution with two peaks near the center
ggplot(data = happy) +
geom_histogram(mapping = aes(x = happyness), binwidth = 0.5)+
theme_bw()+
ggtitle("Happiness Distribution")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
GDP per capita shows some skew, with most countries in the upper half of scores
ggplot(data = happy) +
geom_histogram(mapping = aes(x = GDP), binwidth = 0.1)+
theme_bw()+
ggtitle("GDP Distribution")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
Perception of corruption is heavily concentrated at the lower end of the histogram, but counts per bin are low overall because the bins are very narrow (binwidth = 0.001)
ggplot(data = happy) +
geom_histogram(mapping = aes(x = corrupt), binwidth = 0.001)+
theme_bw()+
ggtitle("Corruption Distribution")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
Health is concentrated toward the upper end of the range shown, but counts per bin are again low overall
ggplot(data = happy) +
geom_histogram(mapping = aes(x = health), binwidth = 0.001)+
theme_bw()+
ggtitle("Health Distribution")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
Scatter plot of country happiness against reported corruption perception score, with a fitted linear trend line
# Plot directly from happy: a scatter plot needs no grouping or summarising
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) +
geom_point() +
geom_smooth(method="lm", se = FALSE)+
theme_bw()+
ggtitle("Happiness and Corruption")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
The graph of corruption perception and happiness score shows only a weakly positive relationship, with some outliers as well as a concentration of countries with happiness scores around 5
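The line drawn by geom_smooth(method = "lm") is a simple linear regression of happiness on corruption perception, so its slope can also be inspected numerically. The sketch below does this with lm() and cor(). The data frame sample20 is an assumption made for illustration: it copies only the 20 happiest countries from the table earlier in this draft, so the numbers it gives describe that subset and not the full 156-country relationship.

```r
# Illustrative subset: the 20 rows from the earlier table (top 20 countries only).
sample20 <- data.frame(
  happyness = c(7.769, 7.600, 7.554, 7.494, 7.488, 7.480, 7.343, 7.307,
                7.278, 7.246, 7.228, 7.167, 7.139, 7.090, 7.054, 7.021,
                6.985, 6.923, 6.892, 6.852),
  corrupt   = c(0.393, 0.410, 0.341, 0.118, 0.298, 0.343, 0.373, 0.380,
                0.308, 0.226, 0.290, 0.093, 0.082, 0.316, 0.278, 0.310,
                0.265, 0.210, 0.128, 0.036)
)

# Same model that geom_smooth(method = "lm") fits for the plotted line.
fit <- lm(happyness ~ corrupt, data = sample20)

coef(fit)                                   # intercept and slope of the line
cor(sample20$corrupt, sample20$happyness)   # strength of the linear association
```

Running the same two calls on the full happy data frame would give the slope and correlation behind the plotted trend line, which is one way to put a number on how weak or strong the relationship is.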
Comparing corruption perception and happiness scores reveals similar ranges of perception among most low- and middle-happiness countries (with some extreme outliers) but substantially higher perception scores among higher-happiness countries
ggplot(data = happy, mapping = aes(x = corrupt, y = happyness)) +
geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
theme_bw()
Scatter plot of country happiness against GDP scores, with a fitted linear trend line
# Plot directly from happy: a scatter plot needs no grouping or summarising
ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) +
geom_point() +
geom_smooth(method="lm", se = FALSE)+
theme_bw()+
ggtitle("Happiness and GDP")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
The scatter plot of happiness score against GDP shows a roughly linear, positive relationship
Grouping countries by happiness score and drawing a boxplot of the corresponding GDP shows skewness and outliers: middle happiness scores are more skewed, while high scores show much less variation in GDP, though with some outliers
ggplot(data = happy, mapping = aes(x = GDP, y = happyness)) +
geom_boxplot(mapping = aes(group = cut_width(happyness, 1)))+
theme_bw()
Distribution of country happiness and health
# Plot directly from happy: a scatter plot needs no grouping or summarising
ggplot(data = happy, mapping = aes(x = health, y = happyness)) +
geom_point() +
geom_smooth(method="lm", se = FALSE)+
theme_bw()+
ggtitle("Happiness and Health")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
Distribution of countries based on perceived corruption by GDP
ggplot(data = happy) +
geom_point(mapping = aes(x = GDP, y = corrupt))+
theme_bw()+
ggtitle("GDP and Corruption")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
When comparing corruption perception and GDP, perception levels remain relatively similar until the highest GDP group, which sees a substantial increase, though with a wide range
ggplot(data = happy, mapping = aes(x = GDP, y = corrupt)) +
geom_boxplot(mapping = aes(group = cut_width(GDP, .25)))+
theme_bw()
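The boxplots in this draft turn a continuous variable into groups with ggplot2's cut_width(), which assigns each value to a bin of fixed width. As a small sketch of what that grouping does, the example below bins a handful of GDP scores copied from the earlier top-20 table (the vector name gdp is just illustrative) and counts how many values land in each bin.

```r
library(ggplot2)

# A few GDP scores copied from the top-20 table, for illustration only.
gdp <- c(1.340, 1.383, 1.488, 1.380, 1.396, 1.452, 1.034, 1.609)

# cut_width() returns a factor whose levels are intervals 0.25 wide;
# geom_boxplot(aes(group = ...)) draws one box per level.
bins <- cut_width(gdp, 0.25)

levels(bins)  # the interval labels that define the groups
table(bins)   # how many countries fall into each interval
```

A smaller width gives more, narrower boxes with fewer countries each, so the choice of 0.25 (or 1 for the happiness-based boxplots) directly affects how much spread each box appears to have.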
Distribution of countries based on perceived corruption by health
ggplot(data = happy) +
geom_point(mapping = aes(x = health, y = corrupt))+
theme_bw()+
ggtitle("Health and Corruption")+
theme(plot.title = element_text(face = "bold", colour = "blue"))
When comparing corruption perception and health, the range of corruption perception increases significantly for higher health countries
ggplot(data = happy, mapping = aes(x = health, y = corrupt)) +
geom_boxplot(mapping = aes(group = cut_width(health, .25)))+
theme_bw()
This project needs further refinement and improved presentation. In addition to the formatting improvements required, I think the explanations can be improved, and additional visualizations may help in exploring the relationships between variables. Existing visualizations can also be tweaked to display the data better, and a bibliography needs to be added.
I will continue to develop project content and improve presentation, as well as further review my data and current visualizations. I hope to increase the legibility and clarity of my project, continue gaining insights from my dataset, and improve my analysis through new visualizations. The project can be further developed by adding new content and by deepening the exploration of existing aspects of the research.