World Happiness Report

The World Happiness Report is a survey first published in 2012, with editions up to 2019 considered here. I chose to cover the most recent of these, the 2019 happiness survey. The rankings and data are from the Gallup World Poll. Scores are based on the Cantril Ladder, where the best possible life is a 10 and the worst possible life is a 0. Interesting note: “Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as ‘Dystopia,’ in contrast to Utopia.”

Load the needed libraries

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(treemap)
library(RColorBrewer)

Load data file and display contents

Read the CSV file containing the World Happiness Report data and assign it to the variable happiness2019

setwd("C:/Users/baise/OneDrive/Desktop/Baidata110summer")
happiness2019 <- read_csv("happiness2019.csv")

## Rows: 156 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country or region
## dbl (8): Overall rank, Score, GDP per capita, Social support, Healthy life e...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

This is a summary of the World Happiness 2019 dataset

summary(happiness2019)

##   Overall rank    Country or region      Score       GDP per capita  
##  Min.   :  1.00   Length:156         Min.   :2.853   Min.   :0.0000  
##  1st Qu.: 39.75   Class :character   1st Qu.:4.545   1st Qu.:0.6028  
##  Median : 78.50   Mode  :character   Median :5.380   Median :0.9600  
##  Mean   : 78.50                      Mean   :5.407   Mean   :0.9051  
##  3rd Qu.:117.25                      3rd Qu.:6.184   3rd Qu.:1.2325  
##  Max.   :156.00                      Max.   :7.769   Max.   :1.6840  
##  Social support  Healthy life expectancy Freedom to make life choices
##  Min.   :0.000   Min.   :0.0000          Min.   :0.0000              
##  1st Qu.:1.056   1st Qu.:0.5477          1st Qu.:0.3080              
##  Median :1.272   Median :0.7890          Median :0.4170              
##  Mean   :1.209   Mean   :0.7252          Mean   :0.3926              
##  3rd Qu.:1.452   3rd Qu.:0.8818          3rd Qu.:0.5072              
##  Max.   :1.624   Max.   :1.1410          Max.   :0.6310              
##    Generosity     Perceptions of corruption
##  Min.   :0.0000   Min.   :0.0000           
##  1st Qu.:0.1087   1st Qu.:0.0470           
##  Median :0.1775   Median :0.0855           
##  Mean   :0.1848   Mean   :0.1106           
##  3rd Qu.:0.2482   3rd Qu.:0.1412           
##  Max.   :0.5660   Max.   :0.4530

Initial cleaning of the variable names

# Standardize the column names in the dataset happiness2019:
# convert them to lowercase and replace spaces with underscores
names(happiness2019) <- tolower(names(happiness2019))
names(happiness2019) <- gsub(" ","_",names(happiness2019))
head(happiness2019)

## # A tibble: 6 × 9
##   overall_rank country_or_region score gdp_per_capita social_support
##          <dbl> <chr>             <dbl>          <dbl>          <dbl>
## 1            1 Finland            7.77           1.34           1.59
## 2            2 Denmark            7.6            1.38           1.57
## 3            3 Norway             7.55           1.49           1.58
## 4            4 Iceland            7.49           1.38           1.62
## 5            5 Netherlands        7.49           1.40           1.52
## 6            6 Switzerland        7.48           1.45           1.53
## # … with 4 more variables: healthy_life_expectancy <dbl>,
## #   freedom_to_make_life_choices <dbl>, generosity <dbl>,
## #   perceptions_of_corruption <dbl>
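The two cleaning steps above can also be wrapped in a small reusable function. The sketch below is only an illustration: the helper name `clean_column_names` is hypothetical and not part of the original analysis, and the input names are copied from the raw column headers shown in the summary output.

```r
# Hypothetical helper (not part of the original analysis): lowercases
# column names and replaces every space with an underscore.
clean_column_names <- function(x) {
  gsub(" ", "_", tolower(x), fixed = TRUE)
}

# Raw column names as they appear in the summary output of the CSV.
raw_names <- c(
  "Overall rank", "Country or region", "Score", "GDP per capita",
  "Social support", "Healthy life expectancy",
  "Freedom to make life choices", "Generosity",
  "Perceptions of corruption"
)

cleaned <- clean_column_names(raw_names)
print(cleaned)
```

With the real data, the same helper could be applied as `names(happiness2019) <- clean_column_names(names(happiness2019))`, which should give the same result as the two separate steps.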

Displaying the column names (country or region, score, etc.) of the dataset

# Display the name of each column in the dataset
names(happiness2019)

## [1] "overall_rank"                 "country_or_region"           
## [3] "score"                        "gdp_per_capita"              
## [5] "social_support"               "healthy_life_expectancy"     
## [7] "freedom_to_make_life_choices" "generosity"                  
## [9] "perceptions_of_corruption"

Creating a scatterplot of happiness score against healthy life expectancy, where each point is a country or region of the world

plot1 <- happiness2019 %>%
  ggplot(aes(score, healthy_life_expectancy))+
  geom_point()+
  labs (x = "Score", y = "Healthy Life Expectancy",  title = "Score vs. Healthy Life Expectancy")
plot1

Summary of the scatterplot

This is a short summary of how happier countries tend to have a higher healthy life expectancy. I created a general plot comparing the happiness score with the level of healthy life expectancy before showing the treemap of the dataset. There is a generally positive linear relationship, which supports the idea that countries or regions with a higher healthy life expectancy tend to have a higher happiness score. In the chart, countries with a score of about 4.5 and above tend to have a good healthy life expectancy.
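One way to put a number on the "positive linear relationship" is a correlation coefficient and a fitted straight line. The sketch below uses only base R. The helper name `summarise_linear_relationship` is hypothetical, and the small `toy` data frame contains made-up values for illustration only; it is not the happiness data and no result from it should be read as a finding.

```r
# Hypothetical helper: Pearson correlation plus the slope and intercept
# of a simple linear fit of y on x.
summarise_linear_relationship <- function(df, x = "score",
                                          y = "healthy_life_expectancy") {
  stopifnot(all(c(x, y) %in% names(df)))
  r <- cor(df[[x]], df[[y]], use = "complete.obs")
  fit <- lm(reformulate(x, response = y), data = df)
  list(
    correlation = r,
    slope = unname(coef(fit)[x]),
    intercept = unname(coef(fit)["(Intercept)"])
  )
}

# Made-up illustrative values (NOT taken from the 2019 report).
toy <- data.frame(
  score = c(3.0, 4.0, 5.0, 6.0, 7.0),
  healthy_life_expectancy = c(0.30, 0.55, 0.70, 0.85, 1.00)
)

result <- summarise_linear_relationship(toy)
print(result)
```

On the real data the call would be `summarise_linear_relationship(happiness2019)`, assuming the cleaned column names shown earlier. A fitted line could also be drawn on the scatterplot by adding ggplot2's `geom_smooth(method = "lm")` layer.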

## Creating a map

Create a treemap for happiness 2019, using a color index that displays the countries and regions in order of numerical ranking.

# Create treemap for happiness 2019, use color index that displays the countries and regions in order of numerical ranking
treemap(happiness2019, index = "country_or_region", vSize = "score",  vColor = "overall_rank", type = "manual", palette = "RdYlBu")

## Summary of Data

I am really interested in using a variety of visual displays for this dataset. This display shows the names of the countries or regions in the dataset, progressing from the “happiest” places on the left to the “least happy” places on the right. I like how you can see the different countries and regions on this visual, although some are more difficult to read than others.

Looking at this chart, the happiest countries appear to be among the 20 wealthiest countries in the world. To me, the countries in the first thick red column (Finland, Denmark, Norway, Iceland, the Netherlands, Switzerland and Sweden) are all ranked among the 20 wealthiest countries in the world. I wonder why the United States is not in that thick red column. Most of the top 10 happiest countries are also in Europe, and I wonder why. I am curious to know what determines the happiness of a country: good health facilities and good infrastructure? Does a lower population also help a country to be happier?

Looking at the least happy countries on the chart, most are in Africa and Asia. I assume that is because these are the poorest continents in the world, especially Africa. I can give an example from my own experience. I come from Africa, where there are limited or no job opportunities, poor health care facilities, poor infrastructure, poor education and so on. There is no hope where I come from. People die from acute illnesses because they cannot afford health care expenses, and others cannot go to school because they cannot afford tuition. In most countries in Africa, the highest salary that people with a master's degree can make is around $100 a month.

All these factors contribute to a country not being happy. It would be difficult to find some countries in North and South America at the bottom of the list, because they are far better off than the majority of African countries and some countries in Asia, such as India, which is one of the most populated countries in the world.

What I would like to work on for a future representation is possibly grouping the countries by score band: 0-1, 1-2 and so on. Once these groups are created, I could use a treemap to compare the size of the highest-scoring group with that of the lowest-scoring group. Looking closely at the data, I thought it was interesting that the top-ranked places for happiness were Finland, Denmark, Norway, Iceland, the Netherlands, Switzerland and Sweden, while Malawi, Yemen, Rwanda and Tanzania are among those at the lowest end. With this information, it would be interesting to find specific statistics on these places to determine why they are viewed as happy or not. A researcher could then look at numerical measures such as population, income, way of living, cost of living, government regulations and other factors that could explain why these places rank where they do in this dataset.
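The grouping idea could be sketched with base R's `cut()`, which assigns each score to a one-point band. This is only a sketch under stated assumptions: the helper name `add_score_band` is hypothetical, the first six rows reuse the rounded scores printed by `head()` earlier, and the last two rows are made-up placeholders added so that more than one band appears.

```r
# Hypothetical helper: adds a factor column with one-point score bands,
# labelled [0,1), [1,2), ..., [9,10].
add_score_band <- function(df, score_col = "score") {
  stopifnot(score_col %in% names(df))
  df$score_band <- cut(df[[score_col]], breaks = 0:10,
                       include.lowest = TRUE, right = FALSE)
  df
}

example <- data.frame(
  country_or_region = c("Finland", "Denmark", "Norway", "Iceland",
                        "Netherlands", "Switzerland",
                        "Hypothetical A", "Hypothetical B"),
  # First six scores are the rounded values shown by head();
  # the last two are made up for illustration only.
  score = c(7.77, 7.6, 7.55, 7.49, 7.49, 7.48, 4.2, 2.9)
)

banded <- add_score_band(example)

# Count how many countries fall into each band, dropping empty bands.
band_counts <- as.data.frame(table(score_band = banded$score_band),
                             responseName = "n_countries")
band_counts <- band_counts[band_counts$n_countries > 0, ]
print(band_counts)
```

A table like `band_counts` could then be passed to `treemap()` with `index = "score_band"` and `vSize = "n_countries"` to compare the sizes of the groups.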

This project really widened my research and thinking about this topic.

Data Science110 Project 1

Bai Sesay

2022-06-19