Project 2

Introduction

Are people happy? What makes them happy? And how satisfied are they? These are questions that we all ask ourselves. The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2020, which ranks 153 countries by their ladder score (happiness level), was released at the United Nations at an event celebrating the International Day of Happiness. The reports review the state of happiness in the world and show how the new science of happiness explains aspects of life that can contribute to people’s happiness, such as GDP per capita, although we always hear that money does not make people happy (Esteban Ortiz-Ospina and Max Roser, first published in 2013; substantive revision May 2017).

I chose the Happiness dataset because I have always wanted to know which aspects of life contribute to people’s happiness. Some people say that happiness is only about mindset: it comes from within people, not from their outside circumstances. Other people see happiness in material terms: the more money people have, the happier they are. By analyzing this dataset, which covers both economic and social aspects, I can better understand some of the factors that contribute to happiness.

Throughout this analysis, we will clean up the dataset by checking for missing data and by changing the format of the variable names. We will explore the dataset by looking at its top and bottom rows, structure, dimensions, and summary statistics. We will then use multiple regression analysis to measure the strength of the relationship between the dependent variable, the Ladder score, and the predictor variables: Logged GDP per capita, Social support, Generosity, Perceptions of corruption, Healthy life expectancy, and Freedom to make life choices. Finally, we will use graphs to get a clear picture of what the data means.

Metadata List

Country name (153 countries)
Regional indicator
Ladder score: Life evaluation score (ranked from 0 to 10)
Logged GDP per capita: GDP per capita shows how much economic production value can be attributed to each citizen; this variable is its logarithm
Social support: Having someone to support and count on (ranked from 0 to 1)
Healthy life expectancy: Healthy life expectancy (HALE) at birth adds up expectations of life for different health states
Freedom to make life choices: the national average of Gallup World Poll (GWP) responses to the question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”
Generosity: is a function of the national average of GWP responses to the question “Have you donated money to a charity in the past month?”
Perceptions of corruption: is an index that ranks countries “by their perceived levels of public sector corruption”

Load library

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2

## Warning: package 'dplyr' was built under R version 4.2.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Import the Happiness Dataset

library(readr)
happiness2020 <- read_csv("C:/Users/Mitcheyla$/Desktop/DATA110 -VISUALISATION/happiness2020.csv")

## Rows: 153 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Country name, Regional indicator
## dbl (18): Ladder score, Standard error of ladder score, upperwhisker, lowerw...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(happiness2020)

Explore the dataset

See the first and last 10 rows of the dataset

head(happiness2020,10)

## # A tibble: 10 × 20
##    Country nam…¹ Regio…² Ladde…³ Stand…⁴ upper…⁵ lower…⁶ Logge…⁷ Socia…⁸ Healt…⁹
##    <chr>         <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Finland       Wester…    7.81  0.0312    7.87    7.75    10.6   0.954    71.9
##  2 Denmark       Wester…    7.65  0.0335    7.71    7.58    10.8   0.956    72.4
##  3 Switzerland   Wester…    7.56  0.0350    7.63    7.49    11.0   0.943    74.1
##  4 Iceland       Wester…    7.50  0.0596    7.62    7.39    10.8   0.975    73  
##  5 Norway        Wester…    7.49  0.0348    7.56    7.42    11.1   0.952    73.2
##  6 Netherlands   Wester…    7.45  0.0278    7.50    7.39    10.8   0.939    72.3
##  7 Sweden        Wester…    7.35  0.0362    7.42    7.28    10.8   0.926    72.6
##  8 New Zealand   North …    7.30  0.0395    7.38    7.22    10.5   0.949    73.2
##  9 Austria       Wester…    7.29  0.0334    7.36    7.23    10.7   0.928    73.0
## 10 Luxembourg    Wester…    7.24  0.0309    7.30    7.18    11.5   0.907    72.6
## # … with 11 more variables: `Freedom to make life choices` <dbl>,
## #   Generosity <dbl>, `Perceptions of corruption` <dbl>,
## #   `Ladder score in Dystopia` <dbl>, `Explained by: Log GDP per capita` <dbl>,
## #   `Explained by: Social support` <dbl>,
## #   `Explained by: Healthy life expectancy` <dbl>,
## #   `Explained by: Freedom to make life choices` <dbl>,
## #   `Explained by: Generosity` <dbl>, …

The first 10 rows list the countries with the highest ladder scores, together with their logged GDP per capita, social support, and healthy life expectancy.

tail(happiness2020, 10)

## # A tibble: 10 × 20
##    Country nam…¹ Regio…² Ladde…³ Stand…⁴ upper…⁵ lower…⁶ Logge…⁷ Socia…⁸ Healt…⁹
##    <chr>         <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 India         South …    3.57  0.0277    3.63    3.52    8.85   0.592    60.2
##  2 Malawi        Sub-Sa…    3.54  0.0703    3.68    3.40    7.06   0.544    57.6
##  3 Yemen         Middle…    3.53  0.0542    3.63    3.42    7.76   0.818    56.7
##  4 Botswana      Sub-Sa…    3.48  0.0605    3.60    3.36    9.71   0.779    58.9
##  5 Tanzania      Sub-Sa…    3.48  0.0632    3.60    3.35    7.97   0.689    57.5
##  6 Central Afri… Sub-Sa…    3.48  0.115     3.70    3.25    6.63   0.319    45.2
##  7 Rwanda        Sub-Sa…    3.31  0.0524    3.42    3.21    7.60   0.541    61.1
##  8 Zimbabwe      Sub-Sa…    3.30  0.0587    3.41    3.18    7.87   0.763    55.6
##  9 South Sudan   Sub-Sa…    2.82  0.108     3.03    2.61    7.43   0.554    51  
## 10 Afghanistan   South …    2.57  0.0313    2.63    2.51    7.46   0.470    52.6
## # … with 11 more variables: `Freedom to make life choices` <dbl>,
## #   Generosity <dbl>, `Perceptions of corruption` <dbl>,
## #   `Ladder score in Dystopia` <dbl>, `Explained by: Log GDP per capita` <dbl>,
## #   `Explained by: Social support` <dbl>,
## #   `Explained by: Healthy life expectancy` <dbl>,
## #   `Explained by: Freedom to make life choices` <dbl>,
## #   `Explained by: Generosity` <dbl>, …

The last 10 rows list the countries with the lowest ladder scores, together with their logged GDP per capita, social support, and healthy life expectancy.

Look at the dimension of the dataset

dim(happiness2020)

## [1] 153  20

Verify if we have any missing data in happiness2020

sum(is.na(happiness2020))

## [1] 0

Because the count is 0, there are no missing values to remove.
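The introduction also mentions duplicate rows, which the check above does not cover. Here is a minimal sketch of how that could be verified. It uses a small made-up data frame (`toy` and its values are illustrative, not taken from happiness2020):

```r
# Toy data frame standing in for happiness2020 (values are made up).
toy <- data.frame(
  countryname = c("A", "B", "B", "C"),
  ladderscore = c(7.1, 5.4, 5.4, 3.2)
)

# duplicated() flags every row that repeats an earlier row.
n_duplicates <- sum(duplicated(toy))
n_duplicates

# unique() keeps the first copy of each row and drops the repeats.
toy_clean <- unique(toy)
nrow(toy_clean)
```

Running the same two calls on happiness2020 would show whether any country appears twice.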

Summarize the dataset

summary(happiness2020)

##  Country name       Regional indicator  Ladder score  
##  Length:153         Length:153         Min.   :2.567  
##  Class :character   Class :character   1st Qu.:4.724  
##  Mode  :character   Mode  :character   Median :5.515  
##                                        Mean   :5.473  
##                                        3rd Qu.:6.229  
##                                        Max.   :7.809  
##  Standard error of ladder score  upperwhisker    lowerwhisker  
##  Min.   :0.02590                Min.   :2.628   Min.   :2.506  
##  1st Qu.:0.04070                1st Qu.:4.826   1st Qu.:4.603  
##  Median :0.05061                Median :5.608   Median :5.431  
##  Mean   :0.05354                Mean   :5.578   Mean   :5.368  
##  3rd Qu.:0.06068                3rd Qu.:6.364   3rd Qu.:6.139  
##  Max.   :0.12059                Max.   :7.870   Max.   :7.748  
##  Logged GDP per capita Social support   Healthy life expectancy
##  Min.   : 6.493        Min.   :0.3195   Min.   :45.20          
##  1st Qu.: 8.351        1st Qu.:0.7372   1st Qu.:58.96          
##  Median : 9.456        Median :0.8292   Median :66.31          
##  Mean   : 9.296        Mean   :0.8087   Mean   :64.45          
##  3rd Qu.:10.265        3rd Qu.:0.9067   3rd Qu.:69.29          
##  Max.   :11.451        Max.   :0.9747   Max.   :76.80          
##  Freedom to make life choices   Generosity       Perceptions of corruption
##  Min.   :0.3966               Min.   :-0.30091   Min.   :0.1098           
##  1st Qu.:0.7148               1st Qu.:-0.12701   1st Qu.:0.6830           
##  Median :0.7998               Median :-0.03366   Median :0.7831           
##  Mean   :0.7834               Mean   :-0.01457   Mean   :0.7331           
##  3rd Qu.:0.8777               3rd Qu.: 0.08543   3rd Qu.:0.8492           
##  Max.   :0.9750               Max.   : 0.56066   Max.   :0.9356           
##  Ladder score in Dystopia Explained by: Log GDP per capita
##  Min.   :1.972            Min.   :0.0000                  
##  1st Qu.:1.972            1st Qu.:0.5759                  
##  Median :1.972            Median :0.9185                  
##  Mean   :1.972            Mean   :0.8688                  
##  3rd Qu.:1.972            3rd Qu.:1.1692                  
##  Max.   :1.972            Max.   :1.5367                  
##  Explained by: Social support Explained by: Healthy life expectancy
##  Min.   :0.0000               Min.   :0.0000                       
##  1st Qu.:0.9867               1st Qu.:0.4954                       
##  Median :1.2040               Median :0.7598                       
##  Mean   :1.1556               Mean   :0.6929                       
##  3rd Qu.:1.3871               3rd Qu.:0.8672                       
##  Max.   :1.5476               Max.   :1.1378                       
##  Explained by: Freedom to make life choices Explained by: Generosity
##  Min.   :0.0000                             Min.   :0.0000          
##  1st Qu.:0.3815                             1st Qu.:0.1150          
##  Median :0.4833                             Median :0.1767          
##  Mean   :0.4636                             Mean   :0.1894          
##  3rd Qu.:0.5767                             3rd Qu.:0.2555          
##  Max.   :0.6933                             Max.   :0.5698          
##  Explained by: Perceptions of corruption Dystopia + residual
##  Min.   :0.00000                         Min.   :0.2572     
##  1st Qu.:0.05580                         1st Qu.:1.6299     
##  Median :0.09844                         Median :2.0463     
##  Mean   :0.13072                         Mean   :1.9723     
##  3rd Qu.:0.16306                         3rd Qu.:2.3503     
##  Max.   :0.53316                         Max.   :3.4408

The summary confirms that there is no missing data: no column reports any NA's.

Clean Up the Dataset

Make all headers lowercase and remove spaces

names(happiness2020) <- tolower(names(happiness2020))
names(happiness2020) <- gsub(" ","",names(happiness2020))
str(happiness2020)

## spc_tbl_ [153 × 20] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ countryname                         : chr [1:153] "Finland" "Denmark" "Switzerland" "Iceland" ...
##  $ regionalindicator                   : chr [1:153] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ ladderscore                         : num [1:153] 7.81 7.65 7.56 7.5 7.49 ...
##  $ standarderrorofladderscore          : num [1:153] 0.0312 0.0335 0.035 0.0596 0.0348 ...
##  $ upperwhisker                        : num [1:153] 7.87 7.71 7.63 7.62 7.56 ...
##  $ lowerwhisker                        : num [1:153] 7.75 7.58 7.49 7.39 7.42 ...
##  $ loggedgdppercapita                  : num [1:153] 10.6 10.8 11 10.8 11.1 ...
##  $ socialsupport                       : num [1:153] 0.954 0.956 0.943 0.975 0.952 ...
##  $ healthylifeexpectancy               : num [1:153] 71.9 72.4 74.1 73 73.2 ...
##  $ freedomtomakelifechoices            : num [1:153] 0.949 0.951 0.921 0.949 0.956 ...
##  $ generosity                          : num [1:153] -0.0595 0.0662 0.1059 0.2469 0.1345 ...
##  $ perceptionsofcorruption             : num [1:153] 0.195 0.168 0.304 0.712 0.263 ...
##  $ ladderscoreindystopia               : num [1:153] 1.97 1.97 1.97 1.97 1.97 ...
##  $ explainedby:loggdppercapita         : num [1:153] 1.29 1.33 1.39 1.33 1.42 ...
##  $ explainedby:socialsupport           : num [1:153] 1.5 1.5 1.47 1.55 1.5 ...
##  $ explainedby:healthylifeexpectancy   : num [1:153] 0.961 0.979 1.041 1.001 1.008 ...
##  $ explainedby:freedomtomakelifechoices: num [1:153] 0.662 0.665 0.629 0.662 0.67 ...
##  $ explainedby:generosity              : num [1:153] 0.16 0.243 0.269 0.362 0.288 ...
##  $ explainedby:perceptionsofcorruption : num [1:153] 0.478 0.495 0.408 0.145 0.434 ...
##  $ dystopia+residual                   : num [1:153] 2.76 2.43 2.35 2.46 2.17 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Country name` = col_character(),
##   ..   `Regional indicator` = col_character(),
##   ..   `Ladder score` = col_double(),
##   ..   `Standard error of ladder score` = col_double(),
##   ..   upperwhisker = col_double(),
##   ..   lowerwhisker = col_double(),
##   ..   `Logged GDP per capita` = col_double(),
##   ..   `Social support` = col_double(),
##   ..   `Healthy life expectancy` = col_double(),
##   ..   `Freedom to make life choices` = col_double(),
##   ..   Generosity = col_double(),
##   ..   `Perceptions of corruption` = col_double(),
##   ..   `Ladder score in Dystopia` = col_double(),
##   ..   `Explained by: Log GDP per capita` = col_double(),
##   ..   `Explained by: Social support` = col_double(),
##   ..   `Explained by: Healthy life expectancy` = col_double(),
##   ..   `Explained by: Freedom to make life choices` = col_double(),
##   ..   `Explained by: Generosity` = col_double(),
##   ..   `Explained by: Perceptions of corruption` = col_double(),
##   ..   `Dystopia + residual` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
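The output above shows that a few names, such as explainedby:loggdppercapita and dystopia+residual, still contain characters that are not valid in plain R names, so they would need backticks. One possible extra step is to drop every character that is not a letter or a digit. The sketch below applies this to a hypothetical vector, `raw_names`, which copies four headers from the column specification:

```r
# Four of the original headers, copied from the column specification above.
raw_names <- c("Country name", "Ladder score",
               "Explained by: Log GDP per capita", "Dystopia + residual")

# Lowercase first, then remove anything that is not a-z or 0-9
# (spaces, colons, plus signs, and so on).
clean_names <- gsub("[^a-z0-9]", "", tolower(raw_names))
clean_names
```

The analysis that follows only uses columns whose names are already clean, so this step is optional here.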

Load Libraries

library(plotly)

## Warning: package 'plotly' was built under R version 4.2.2

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(tmap)

## Warning: package 'tmap' was built under R version 4.2.2

library(sf)

## Warning: package 'sf' was built under R version 4.2.2

## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE

library(tidyverse)
library(ggthemes)
library(heatmaply)

## Warning: package 'heatmaply' was built under R version 4.2.2

## Loading required package: viridis

## Loading required package: viridisLite

## Registered S3 methods overwritten by 'registry':
##   method               from 
##   print.registry_field proxy
##   print.registry_entry proxy

## 
## ======================
## Welcome to heatmaply version 1.4.0
## 
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
## 
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## You may ask questions at stackoverflow, use the r and heatmaply tags: 
##   https://stackoverflow.com/questions/tagged/heatmaply
## ======================

library(RColorBrewer)

library(ggplot2)

Create a new data frame

I will select the variables that I want to focus on and fit a multiple regression to see how those variables relate to the ladder score.

happiness2020_df <- happiness2020 %>%
  select('countryname', 'regionalindicator','ladderscore', 'loggedgdppercapita', 'socialsupport', 'healthylifeexpectancy', 'freedomtomakelifechoices', 'generosity', 'perceptionsofcorruption')

Make a multiple regression model

model1<- lm(ladderscore ~ loggedgdppercapita + socialsupport + generosity + freedomtomakelifechoices + perceptionsofcorruption + healthylifeexpectancy, data = happiness2020_df)
summary(model1)

## 
## Call:
## lm(formula = ladderscore ~ loggedgdppercapita + socialsupport + 
##     generosity + freedomtomakelifechoices + perceptionsofcorruption + 
##     healthylifeexpectancy, data = happiness2020_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75647 -0.31792  0.06653  0.37230  1.48375 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -2.05938    0.63984  -3.219 0.001588 ** 
## loggedgdppercapita        0.22908    0.08208   2.791 0.005960 ** 
## socialsupport             2.72332    0.66118   4.119 6.35e-05 ***
## generosity                0.41057    0.33704   1.218 0.225126    
## freedomtomakelifechoices  1.77682    0.49752   3.571 0.000481 ***
## perceptionsofcorruption  -0.62816    0.31480  -1.995 0.047857 *  
## healthylifeexpectancy     0.03531    0.01297   2.721 0.007293 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5693 on 146 degrees of freedom
## Multiple R-squared:  0.7483, Adjusted R-squared:  0.738 
## F-statistic: 72.36 on 6 and 146 DF,  p-value: < 2.2e-16

par(mfrow = c(2, 2))
plot(model1)

In this model, every predictor except generosity is statistically significant at the 5% level. The adjusted R-squared is 0.738, so the model explains about 73.8% of the variance in the ladder score. Generosity has a much higher p-value (0.225) than the other predictors. Let's re-run the model without generosity.

model2<- lm(ladderscore ~ loggedgdppercapita + socialsupport + freedomtomakelifechoices + perceptionsofcorruption + healthylifeexpectancy, data = happiness2020_df)
summary(model2)

## 
## Call:
## lm(formula = ladderscore ~ loggedgdppercapita + socialsupport + 
##     freedomtomakelifechoices + perceptionsofcorruption + healthylifeexpectancy, 
##     data = happiness2020_df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.85079 -0.34528  0.06273  0.38041  1.47120 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -1.93892    0.63320  -3.062 0.002614 ** 
## loggedgdppercapita        0.21370    0.08124   2.631 0.009434 ** 
## socialsupport             2.74190    0.66209   4.141  5.8e-05 ***
## freedomtomakelifechoices  1.92196    0.48384   3.972 0.000111 ***
## perceptionsofcorruption  -0.72755    0.30455  -2.389 0.018165 *  
## healthylifeexpectancy     0.03470    0.01299   2.672 0.008394 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5703 on 147 degrees of freedom
## Multiple R-squared:  0.7458, Adjusted R-squared:  0.7371 
## F-statistic: 86.25 on 5 and 147 DF,  p-value: < 2.2e-16

par(mfrow = c(2, 2))
plot(model2)

We can see that the adjusted R-squared went down only slightly (from 73.8% to 73.71%), so removing generosity costs the model almost no explanatory power.
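To see where the adjusted R-squared comes from, it can be recomputed by hand from the multiple R-squared, the number of observations n, and the number of predictors p, as 1 - (1 - R²)(n - 1)/(n - p - 1). The helper `adj_r2` below is a hypothetical name introduced for illustration; the inputs are the rounded values printed in the two summaries.

```r
# Adjusted R-squared from the multiple R-squared (r2),
# the number of rows (n), and the number of predictors (p).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_r2(0.7483, n = 153, p = 6)  # model1: six predictors
adj_r2(0.7458, n = 153, p = 5)  # model2: generosity removed
```

Both results agree with the printed summaries to about three decimal places. The small remaining difference comes from using the rounded R-squared values as inputs.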

Correlation Plot

I will use the heatmaply package to make an interactive plot, cor() to compute the correlation coefficient between each pair of variables, and the RColorBrewer package to add color.

happiness2020_corr <- as.data.frame(happiness2020_df)

heatmaply_cor(
  cor(happiness2020_corr[, 3:9]),
  xlab = "Variables", 
  ylab = "Variables",
  colors = colorRampPalette(brewer.pal(3, "Set3"))(256),
  k_col = 2, 
  k_row = 2 
)

In this plot, we can see that social support is highly correlated with the ladder score (0.77), healthy life expectancy (0.77), and logged GDP per capita (0.78). Logged GDP per capita is highly correlated with the ladder score (0.78), healthy life expectancy (0.85), and social support (0.78).

On the other hand, some pairs of variables are not positively correlated, such as generosity and social support (-0.06). Perceptions of corruption is not positively correlated with any of the other variables. The three variables with a strong positive correlation with the ladder score are social support, logged GDP per capita, and healthy life expectancy. This suggests that social, health, and economic stability can help people feel happy.
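The same coefficients can also be read as numbers rather than from the heat map. The sketch below uses a made-up data frame (`toy`, with hypothetical values and a shortened column name `loggedgdp`) to show that cor() returns a square matrix, and that one column of it can be sorted to rank the variables by their correlation with the ladder score:

```r
# Toy data (made up) with three numeric columns.
toy <- data.frame(
  ladderscore = c(7.8, 6.9, 5.5, 4.7, 3.5),
  loggedgdp   = c(10.6, 10.4, 9.4, 8.5, 7.5),
  generosity  = c(-0.06, 0.10, -0.15, 0.05, 0.02)
)

# Pearson correlation between every pair of columns (3 x 3 matrix).
corr_matrix <- cor(toy)

# Take the ladderscore column and sort it from highest to lowest.
ladder_corr <- sort(corr_matrix[, "ladderscore"], decreasing = TRUE)
round(ladder_corr, 2)
```

The first entry is always the ladder score itself, with a correlation of 1. Applying the same steps to columns 3 to 9 of happiness2020_corr would list the values shown in the heat map.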

happiness2020plot <-happiness2020_df[c("regionalindicator", "ladderscore","generosity")]

ggplot(happiness2020plot, 
  aes(x = ladderscore, 
      y = generosity)) +
  geom_point(aes(colour = regionalindicator),
             size = 2) +
  geom_smooth(method="lm") +
  labs(x = "Ladder Score",
       y = "Generosity",
       title = "Relationship Between Ladder Score and Generosity") +
  scale_color_viridis(discrete = T) +
  theme_minimal() +
  theme(text = element_text(size=16))

## `geom_smooth()` using formula = 'y ~ x'

In this graph, we can see that generosity shows little relationship with the ladder score.

Data Visualization

Ladder Score Vs Logged GDP Per Capita

p <- ggplot(happiness2020_df, 
  aes(x = ladderscore, y = loggedgdppercapita, 
      colour = regionalindicator, text = paste("country:", countryname))) +
  geom_point(show.legend = FALSE, alpha = 0.8) +
  scale_colour_brewer(type = "seq", palette = "Spectral") + 
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  theme_fivethirtyeight(base_size = 12) +
  # every layer is chained with "+" so that labs() and theme() are part of p
  ggtitle("Relationship between Ladder Score and Logged GDP Per Capita") +
  labs(x = "Ladder score", y = "Logged GDP Per Capita",
       caption = "Data source:  UN Sustainable Development Solutions Network") +
  # theme_fivethirtyeight() hides axis titles by default, so restore them here
  theme(axis.title = element_text(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))

figure <- ggplotly(p)

## Warning: plotly.js does not (yet) support horizontal legend items 
## You can track progress here: 
## https://github.com/plotly/plotly.js/issues/53

figure

In this visualization, we can see that the greater the logged GDP per capita is, the greater the ladder score is. Logged GDP per capita also has the highest correlation with the ladder score (0.78), which suggests that it is one of the strongest influences on a country’s overall happiness.
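Judging which predictor has the most influence is harder from the regression table alone. The coefficients are in different units: social support runs from 0 to 1, while healthy life expectancy is in years. One common approach, which the analysis above does not use, is to standardize every variable before fitting so that the coefficients are on the same scale. The sketch below does this on a made-up data frame (`toy`, `toy_scaled`, and `model_std` are hypothetical names, and the values are illustrative only):

```r
# Toy data (made up): one outcome and two predictors on different scales.
toy <- data.frame(
  ladderscore   = c(7.8, 7.2, 6.4, 5.9, 5.1, 4.6, 4.0, 3.3),
  loggedgdp     = c(10.8, 10.5, 10.1, 9.2, 9.4, 8.3, 8.0, 7.4),
  socialsupport = c(0.95, 0.93, 0.90, 0.84, 0.80, 0.71, 0.74, 0.55)
)

# scale() centers each column at 0 and rescales it to standard deviation 1.
toy_scaled <- as.data.frame(scale(toy))

# Coefficients are now in standard-deviation units, so they can be compared.
model_std <- lm(ladderscore ~ loggedgdp + socialsupport, data = toy_scaled)
round(coef(model_std), 3)
```

Because every column is centered, the intercept is zero up to rounding error. Each slope reads as the change in ladder score, in standard deviations, for a one standard deviation change in that predictor.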

Visualization: Ladder Score versus Regional Indicator

figure2 <- ggplot(happiness2020_df, aes(x = regionalindicator, y = ladderscore, fill = regionalindicator)) +
  geom_boxplot() +
  theme_minimal() +
  # the boxes are colored through the fill aesthetic, so use the fill scale
  scale_fill_brewer(palette = "Spectral") +
  # mark the mean ladder score of each region with a blue point
  stat_summary(geom = 'point', fun = 'mean', color = 'blue') + 
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black")) +
  ggtitle("Regional Indicator VS Ladder Score")

ggplotly(figure2)

In this plot, we can see that the countries with high ladder scores are in North America and ANZ, and the ones with the lowest ladder scores are in Sub-Saharan Africa. We can also see that there is an outlier.

Visualization of the Top 10 and Bottom 10 Countries by Ladder Score

happiness2020_df <- happiness2020_df %>% arrange(desc(ladderscore)) 

# the data is sorted in descending order, so the first 10 rows are the top 10
top10 <- head(happiness2020_df, 10) 

p2 <- ggplot(top10, aes(x = reorder(countryname, -ladderscore), 
                        y = ladderscore, fill = regionalindicator)) +
  geom_point(color = "#00AFBB", size = 4, shape = 18) +
  geom_segment(aes(x = reorder(countryname, -ladderscore), 
                   xend = reorder(countryname, -ladderscore), 
                   y = 0, yend = ladderscore), color = "grey") +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5),
    panel.grid.major.x = element_blank(),
    panel.border = element_blank(),
    axis.ticks.x = element_blank()
  ) +
  scale_fill_brewer(palette = "Spectral") +
  xlab("") +
  ylab("Ladder score") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) +
  ggtitle("Top 10 Countries with high Ladder Scores")

ggplotly(p2)

# the data is sorted in descending order, so the last 10 rows are the bottom 10
bottom10 <- tail(happiness2020_df, 10) 

plot3 <- ggplot(bottom10, aes(x = reorder(countryname, -ladderscore),
                              y = ladderscore, fill = regionalindicator)) +
  geom_point(color = "#00AFBB", size = 4, shape = 18) +
  geom_segment(aes(x = reorder(countryname, -ladderscore),
                   xend = reorder(countryname, -ladderscore), 
                   y = 0, yend = ladderscore), color = "purple") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    panel.grid.major.x = element_blank(),
    panel.border = element_blank(),
    axis.ticks.x = element_blank()
  ) +
  scale_fill_brewer(palette = "Set2") +
  xlab("Country") +
  ylab("Ladder score") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 20)) + 
  ggtitle("Bottom 10 Countries with low Ladder Scores")

ggplotly(plot3)

Essay

The Happiness dataset comes from a report published by the United Nations Sustainable Development Solutions Network. The report ranks countries by how happy their citizens perceive themselves to be. It focuses on economic and social factors that might contribute to a citizen’s happiness, such as Gross Domestic Product (GDP) per capita, freedom to make life choices, and healthy life expectancy.

By doing this analysis, I was amazed at how strongly GDP per capita is positively correlated with the level of happiness, although some people think that money has nothing to do with happiness. However, a high GDP per capita is not the only factor that contributes to a citizen’s happiness level. Additionally, countries with a high GDP per capita tend to have a higher healthy life expectancy.

Finally, I wanted to add a map, but I could not figure out how to include one. However, I believe these visualizations can give people an overall idea of some aspects of the level of happiness in the world.

Reference

https://worldhappiness.report/ed/2020/