The dataset Aline and I chose to work on records levels of happiness collected from 153 countries across 10 regions of the world. We would love to know how people feel about living in their own country, and we are curious how happiness factors affect people’s life expectancy.
There are several variables in the dataset, but I am personally interested in analyzing the regional scores for Freedom and Social support and how they correlate with healthy life expectancy. I believe the two variables “Freedom to make life choices” and “Social support” are important features that strongly reflect happiness levels. The ladder score in this dataset is each country’s happiness score, based on life evaluation questions in the Gallup World Poll (GWP) that use the Cantril Ladder, and is published by the Sustainable Development Solutions Network.
Therefore, I will create a main visualization to demonstrate that relationship. I will also generate two other visualizations that show a similar trend using different factors, to support my assumption about world happiness.
Happiness can change depending on the quality of the society in which one lives.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
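library(ggplot2)   # already attached by tidyverse above; harmless to load again
setwd("/Users/Linh/Desktop/DATASETS ")   # author's local folder; change to your own path
happiness <- read_csv("happiness2020.csv")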
## Rows: 153 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country name, Regional indicator
## dbl (18): Ladder score, Standard error of ladder score, upperwhisker, lowerw...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The plot explores how two factors, Freedom to make life choices and Social support, affect healthy life expectancy across regions in 2020. For the y-axis, I summed “Freedom to make life choices” and “Social support”. Because the dataset has 10 regional indicators and ggplot’s default shape palette can only handle a maximum of 6 discrete values, I manually specified 10 shapes so that every region gets its own marker in the legend. I also switched from ggplot’s default theme to the light theme and added a title to the plot.
ggplot(happiness, aes(x=`Healthy life expectancy`, y=`Freedom to make life choices` + `Social support`)) +
geom_point(aes(shape =`Regional indicator`, color =`Regional indicator`), size=2, alpha=1) +
# one shape and one colour per region (10 regions; the default shape palette stops at 6)
scale_shape_manual(values = c(0, 1, 2, 5, 6, 7, 10, 11, 12, 13)) +
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9', "red", "green", "pink", "cyan", "black", "purple", "brown")) +
# readable y-axis label instead of the raw column expression
labs(y = "Freedom to make life choices + Social support") +
theme_light() +
ggtitle("How Freedom to Make Life Choices and Social Support
affect Healthy Life Expectancy",
subtitle = "in 2020")
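Since all three plots repeat the same manual scales, one option is to define them once. The sketch below is a hypothetical refactor, not the code used for the figures in this report: it precomputes the summed y variable as a named column so the axis label reads cleanly, and stores both manual scales in a list, which ggplot2 adds to a plot component by component. The names `toy`, `region_scales`, and `Freedom + Social support`, and every number in the three-row `toy` tibble, are illustrative placeholders, not values from happiness2020.csv.

```r
library(tidyverse)

# Hypothetical toy rows standing in for happiness2020.csv;
# region names and numbers are placeholders, not report values.
toy <- tibble(
  `Regional indicator` = c("Region A", "Region B", "Region C"),
  `Healthy life expectancy` = c(73, 55, 60),
  `Freedom to make life choices` = c(0.90, 0.70, 0.60),
  `Social support` = c(0.92, 0.65, 0.55)
)

# Precompute the summed y variable so the axis label is readable
toy <- toy %>%
  mutate(`Freedom + Social support` =
           `Freedom to make life choices` + `Social support`)

# Define the manual shape and colour scales once; a list of ggplot
# components can be added to any plot with `+`
region_scales <- list(
  scale_shape_manual(values = c(0, 1, 2, 5, 6, 7, 10, 11, 12, 13)),
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9", "red", "green",
                                "pink", "cyan", "black", "purple", "brown"))
)

p <- ggplot(toy, aes(x = `Healthy life expectancy`,
                     y = `Freedom + Social support`)) +
  geom_point(aes(shape = `Regional indicator`, color = `Regional indicator`),
             size = 2) +
  region_scales +
  theme_light()
```

The same `region_scales` object could then be added to the ladder-score and GDP plots instead of repeating the two scale calls.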
## Create another plot with a different variable
Here the Ladder score shows a similar pattern of how happiness levels reflect healthy life expectancy. More importantly, it echoes the relationship between happiness levels and the two factors, Freedom and Social support, that people have in each country and region.
ggplot(happiness, aes(x=`Healthy life expectancy`, y=`Ladder score`)) +
geom_point(aes(shape =`Regional indicator`, color =`Regional indicator`), size=2, alpha=0.7) +
scale_shape_manual(values = c(0, 1, 2, 5, 6, 7, 10, 11, 12, 13)) +
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9', "red", "green", "pink", "cyan", "black", "purple", "brown")) +
theme_linedraw(base_size = 11) +
ggtitle("How Happiness Levels reflect Healthy Life Expectancy")
## A similar trend with Logged GDP per capita and Social Support
ggplot(happiness, aes(x=`Social support`, y=`Logged GDP per capita`)) +
geom_point(aes(shape =`Regional indicator`, color =`Regional indicator`), size=2, alpha=0.7) +
scale_shape_manual(values = c(0, 1, 2, 5, 6, 7, 10, 11, 12, 13)) +
scale_color_manual(values=c('#999999','#E69F00', '#56B4E9', "red", "green", "pink", "cyan", "black", "purple", "brown")) +
theme_light() +
ggtitle("GDP is highly correlated with Social Support")
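The title’s claim that GDP is “highly correlated” with social support could be backed by a number. A minimal sketch, assuming Pearson correlation is the measure wanted; `column_cor()` is a hypothetical helper, and the four-row `toy` tibble holds made-up values only so the snippet runs on its own.

```r
library(tidyverse)

# Hypothetical helper: Pearson correlation between two named columns,
# skipping rows where either value is missing
column_cor <- function(df, x, y) {
  cor(df[[x]], df[[y]], use = "complete.obs", method = "pearson")
}

# Placeholder values, not taken from the World Happiness Report
toy <- tibble(
  `Social support` = c(0.55, 0.65, 0.80, 0.92),
  `Logged GDP per capita` = c(7.5, 8.0, 9.5, 10.8)
)

r <- column_cor(toy, "Social support", "Logged GDP per capita")
r
```

On the real data, `column_cor(happiness, "Social support", "Logged GDP per capita")` would give the coefficient across all countries; `cor.test()` additionally reports a confidence interval and p-value.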
The happiness score can be explained by various factors. With this data, I can look closely at each country to analyze its quality of life.
I created similar plots with different variables to show the upward trend that reflects social structures in different regions around the world. I find it interesting to learn how those variables correlate, how they affect happiness levels, and, ultimately, how they affect life expectancy.
Looking at the three visualizations, I clearly see clusters for Sub-Saharan Africa and Western Europe. With low levels of freedom and social support, people in the Sub-Saharan Africa region barely reach a healthy life expectancy of 60. Meanwhile, Western Europe is doing well, with among the highest life expectancy, close to 75 years, alongside high GDP per capita, social support, and freedom. In the third visualization, which uses two different variables, I can see that GDP per capita is highly correlated with social support, another factor that is important to the well-being of society. The trend runs upward, from the lowest GDP and social support in Sub-Saharan Africa to the highest GDP and strongest social support in Western Europe.
What I found surprising were the two outliers in my first visualization: Afghanistan in South Asia and Singapore in Southeast Asia.
Unfortunately, Afghanistan has very low levels of freedom and social support, and with them comes a very short healthy life expectancy. This is likely explained by the 20-year war that the people there have lived through.
The second outlier is Singapore. As someone who has lived in Southeast Asia all my life, I was impressed by how far Singapore has come. Its healthy life expectancy is higher than that of Western European countries. The country is doing well, with strong support from the government, and its GDP per capita is even higher than that of many Western European countries.
My very first idea when I started working on this dataset was to create a stacked bar chart of three factors, Freedom to make life choices, Social support, and Healthy life expectancy, across the ten regional indicators. However, I realized it did not look right: some regions contain many more countries than others, so stacking summed scores would make those regions look bigger regardless of how their countries actually score, and the chart would not reflect reality.
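One way around the country-count problem is to compare per-country averages instead of stacked sums. A sketch, assuming dplyr’s `group_by()`/`summarise()` and a made-up `toy` tibble whose region names and scores are placeholders: Region A has three countries and Region B only one, so a sum would inflate Region A simply because it has more rows, while the mean does not.

```r
library(tidyverse)

# Placeholder data: Region A has three countries, Region B has one
toy <- tibble(
  `Regional indicator` = c("Region A", "Region A", "Region A", "Region B"),
  `Social support` = c(0.8, 0.9, 0.7, 0.6),
  `Freedom to make life choices` = c(0.7, 0.8, 0.9, 0.5)
)

# Average each factor per region and keep the country count for context
region_summary <- toy %>%
  group_by(`Regional indicator`) %>%
  summarise(
    n_countries = n(),
    mean_social_support = mean(`Social support`, na.rm = TRUE),
    mean_freedom = mean(`Freedom to make life choices`, na.rm = TRUE),
    .groups = "drop"
  )

# Long format so both factors can sit side by side in a dodged bar chart
region_long <- region_summary %>%
  pivot_longer(c(mean_social_support, mean_freedom),
               names_to = "factor", values_to = "mean_score")

p <- ggplot(region_long, aes(x = `Regional indicator`, y = mean_score, fill = factor)) +
  geom_col(position = "dodge") +
  theme_light()
```

Healthy life expectancy is measured in years rather than on the 0 to 1 scale of the two scores, so it would fit better in a separate panel than in the same bars.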
I slowly came to understand the data better and changed direction. I started by creating a simple scatterplot and gradually figured out how to make the legends appear. I worked on different layers of the plot, then manually colored and shaped the points. It was a journey of frustration, fear, thrill, and excitement. What I am eager to do next is use Plotly to create an interactive plot where hovering over a point shows more information, such as the country name, Logged GDP per capita, Generosity, Perceptions of corruption, and Dystopia. It took me all of Spring break to get to where I am in this project, but I will keep exploring Plotly to make my visualizations interactive and informative for the audience.
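A possible starting point for the interactive version, assuming the plotly R package is installed: `ggplotly()` converts an existing ggplot object, and `tooltip = "text"` limits the hover box to a custom `text` aesthetic. The two-row `toy` tibble (country names, regions, and scores) is invented for illustration; on the real data the same columns from `happiness` would be mapped, and Generosity or Perceptions of corruption could be pasted into the same string.

```r
library(tidyverse)
library(plotly)

# Invented placeholder rows, not values from happiness2020.csv
toy <- tibble(
  `Country name` = c("Country X", "Country Y"),
  `Regional indicator` = c("Region A", "Region B"),
  `Healthy life expectancy` = c(70, 58),
  `Ladder score` = c(7.0, 4.5),
  `Logged GDP per capita` = c(10.5, 7.8)
)

p <- ggplot(toy, aes(x = `Healthy life expectancy`, y = `Ladder score`,
                     color = `Regional indicator`,
                     # `text` is not drawn by ggplot2; ggplotly() uses it for hover labels
                     text = paste0(`Country name`,
                                   "<br>Logged GDP per capita: ", `Logged GDP per capita`))) +
  geom_point(size = 2) +
  theme_light()

# Convert to an interactive htmlwidget that shows only the custom text on hover
p_interactive <- ggplotly(p, tooltip = "text")
```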
Citation: Helliwell, John F., Richard Layard, Jeffrey Sachs, and Jan-Emmanuel De Neve, eds. 2020. World Happiness Report 2020. New York: Sustainable Development Solutions Network.