Happiness Score 2020

Data 101

Spring Semester

Happiness Score Report:

The World Happiness Report is a publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness, based on respondent ratings of their own lives,[1] which the report also correlates with various (quality of) life factors. The report primarily uses data from the Gallup World Poll. Each annual report is available to the public to download on the World Happiness Report website. I collected this dataset from https://www.kaggle.com/

Methods to collect the data:

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale.[13] The report correlates the life evaluation results with various life factors. The life factor variables used in the reports are reflective of determinants that explain national-level differences in life evaluations across research literature. However, certain variables, such as unemployment or inequality, are not considered as comparable data is not yet available across all countries. The variables used illustrate important correlations rather than causal estimates. The use of subjective measurements of wellbeing is meant to be a bottom-up approach which emancipates respondents to evaluate their own wellbeing.[15] In this context, the value of the Cantril Ladder is the fact that a respondent can self-anchor themselves based on their perspective. In the reports, experts in fields including economics, psychology, survey analysis, and national statistics, describe how measurements of well-being can be used effectively to assess the progress of nations, and other topics. Each report is organized by chapters that delve deeper into issues relating to happiness, including mental illness, the objective benefits of happiness, the importance of ethics, policy implications, and links with the Organisation for Economic Co-operation and Development’s (OECD) approach to measuring subjective well-being and other international and national efforts. (https://en.wikipedia.org/wiki/World_Happiness_Report)

Data Sources and Variable Definitions:

• Happiness score or subjective well-being (variable name ladder): The survey measure of SWB is from the Feb 28, 2020 release of the Gallup World Poll (GWP) covering years from 2005 to 2019. Unless stated otherwise, it is the national average response to the question of life evaluations. The English wording of the question is “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?” This measure is also referred to as Cantril life ladder, or just life ladder in our analysis.

• The statistics of GDP per capita (variable name gdp) in purchasing power parity (PPP) at constant 2011 international dollar prices are from the November 28, 2019 update of the World Development Indicators (WDI). The GDP figures for Taiwan, Syria, Palestine, Venezuela, and Djibouti, up to 2017, are from the Penn World Table 9.1.

• Healthy Life Expectancy (HLE). Healthy life expectancies at birth are based on the data extracted from the World Health Organization’s (WHO) Global Health Observatory data repository. The data at the source are available for the years 2000, 2005, 2010, 2015 and 2016. To match this report’s sample period (2005-2019), interpolation and extrapolation are used.

• Social support (or having someone to count on in times of trouble) is the national average of the binary responses (either 0 or 1) to the GWP question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

• Freedom to make life choices is the national average of responses to the GWP question “Are you satisfied or dissatisfied with your freedom to choose what you do with your life?”

• Generosity is the residual of regressing national average of response to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.

• Corruption Perception: The measure is the national average of the survey responses to two questions in the GWP: “Is corruption widespread throughout the government or not” and “Is corruption widespread within businesses or not?” The overall perception is just the average of the two 0-or-1 responses. In case the perception of government corruption is missing, we use the perception of business corruption as the overall perception. The corruption perception at the national level is just the average response of the overall perception at the individual level.

• Positive affect is defined as the average of three positive affect measures in GWP: happiness, laugh and enjoyment in the Gallup World Poll waves 3-7. These measures are the responses to the following three questions, respectively: “Did you experience the following feelings during A LOT OF THE DAY yesterday? How about Happiness?”, “Did you smile or laugh a lot yesterday?”, and “Did you experience the following feelings during A LOT OF THE DAY yesterday? How about Enjoyment?” Waves 3-7 cover years 2008 to 2012 and a small number of countries in 2013. For waves 1-2 and those from wave 8 on, positive affect is defined as the average of laugh and enjoyment only, due to the limited availability of happiness.

• Negative affect is defined as the average of three negative affect measures in GWP. They are worry, sadness and anger, respectively the responses to “Did you experience the following feelings during A LOT OF THE DAY yesterday? How about Worry?”, “Did you experience the following feelings during A LOT OF THE DAY yesterday? How about Sadness?”, and “Did you experience the following feelings during A LOT OF THE DAY yesterday? How about Anger?” (https://happiness-report.s3.amazonaws.com/2020/WHR20_Ch2_Statistical_Appendix.pdf)
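Two of the definitions above are small computations, so a sketch may help. The block below is only an illustration in base R on invented numbers: the toy data frames, their column names, and the use of log GDP as the regressor are assumptions, not the report's actual procedure or data.

```r
# Toy country-level data; every number here is invented for illustration.
toy <- data.frame(
  country       = c("A", "B", "C", "D", "E"),
  log_gdp       = c(7.2, 8.5, 9.4, 10.3, 11.0),       # assumed regressor
  donated_share = c(0.20, 0.35, 0.25, 0.45, 0.40),    # national average of 0/1 donation answers
  stringsAsFactors = FALSE
)

# Generosity: what is left of the donation share after regressing it on GDP per capita.
fit <- lm(donated_share ~ log_gdp, data = toy)
toy$generosity <- resid(fit)

# Corruption perception for individual respondents: average of the two 0/1 answers,
# falling back to the business answer when the government answer is missing.
resp <- data.frame(
  gov = c(1, 0, NA, 1),
  biz = c(1, 1, 1, 0)
)
resp$overall <- ifelse(is.na(resp$gov), resp$biz, (resp$gov + resp$biz) / 2)

# National level: the mean of the individual overall perceptions.
national_corruption <- mean(resp$overall)
```

Because generosity is a regression residual, it is centered near zero, which is why the Generosity column in the data takes both negative and positive values.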

Loading the libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(dplyr)
library(ggcorrplot)
library(RColorBrewer)
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(corrplot)
## corrplot 0.88 loaded

Setting the working directory

setwd("~/Desktop/Pankti _ Data Science")
happiness_2020 <- read_csv("~/Desktop/Pankti _ Data Science/WHR_2020.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   Country = col_character(),
##   Region = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
str(happiness_2020)
## spec_tbl_df [153 × 20] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Country                                   : chr [1:153] "Finland" "Denmark" "Switzerland" "Iceland" ...
##  $ Region                                    : chr [1:153] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ Ladder_score                              : num [1:153] 7.81 7.65 7.56 7.5 7.49 ...
##  $ Standard_error_of_ladder_score            : num [1:153] 0.0312 0.0335 0.035 0.0596 0.0348 ...
##  $ upperwhisker                              : num [1:153] 7.87 7.71 7.63 7.62 7.56 ...
##  $ lowerwhisker                              : num [1:153] 7.75 7.58 7.49 7.39 7.42 ...
##  $ GDP_per_capita                            : num [1:153] 10.6 10.8 11 10.8 11.1 ...
##  $ Social_support                            : num [1:153] 0.954 0.956 0.943 0.975 0.952 ...
##  $ Life_expectancy                           : num [1:153] 71.9 72.4 74.1 73 73.2 ...
##  $ Freedom                                   : num [1:153] 0.949 0.951 0.921 0.949 0.956 ...
##  $ Generosity                                : num [1:153] -0.0595 0.0662 0.1059 0.2469 0.1345 ...
##  $ Perceptions_of_corruption                 : num [1:153] 0.195 0.168 0.304 0.712 0.263 ...
##  $ Ladder_score_in_Dystopia                  : num [1:153] 1.97 1.97 1.97 1.97 1.97 ...
##  $ Explained by: Log GDP per capita          : num [1:153] 1.29 1.33 1.39 1.33 1.42 ...
##  $ Explained by: Social support              : num [1:153] 1.5 1.5 1.47 1.55 1.5 ...
##  $ Explained by: Healthy life expectancy     : num [1:153] 0.961 0.979 1.041 1.001 1.008 ...
##  $ Explained by: Freedom to make life choices: num [1:153] 0.662 0.665 0.629 0.662 0.67 ...
##  $ Explained by: Generosity                  : num [1:153] 0.16 0.243 0.269 0.362 0.288 ...
##  $ Explained by: Perceptions of corruption   : num [1:153] 0.478 0.495 0.408 0.145 0.434 ...
##  $ Dystopia + residual                       : num [1:153] 2.76 2.43 2.35 2.46 2.17 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   Region = col_character(),
##   ..   Ladder_score = col_double(),
##   ..   Standard_error_of_ladder_score = col_double(),
##   ..   upperwhisker = col_double(),
##   ..   lowerwhisker = col_double(),
##   ..   GDP_per_capita = col_double(),
##   ..   Social_support = col_double(),
##   ..   Life_expectancy = col_double(),
##   ..   Freedom = col_double(),
##   ..   Generosity = col_double(),
##   ..   Perceptions_of_corruption = col_double(),
##   ..   Ladder_score_in_Dystopia = col_double(),
##   ..   `Explained by: Log GDP per capita` = col_double(),
##   ..   `Explained by: Social support` = col_double(),
##   ..   `Explained by: Healthy life expectancy` = col_double(),
##   ..   `Explained by: Freedom to make life choices` = col_double(),
##   ..   `Explained by: Generosity` = col_double(),
##   ..   `Explained by: Perceptions of corruption` = col_double(),
##   ..   `Dystopia + residual` = col_double()
##   .. )
setwd("~/Desktop/Pankti _ Data Science")
happiness_all <- read_csv("~/Desktop/Pankti _ Data Science/WHR_all.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Country = col_character(),
##   year = col_double(),
##   Ladder_score = col_double(),
##   `Log GDP per capita` = col_double(),
##   `Social support` = col_double(),
##   `Healthy life expectancy at birth` = col_double(),
##   `Freedom to make life choices` = col_double(),
##   Generosity = col_double(),
##   `Perceptions of corruption` = col_double(),
##   `Positive affect` = col_double(),
##   `Negative affect` = col_double()
## )
#Viewing stats about each attribute
summary(happiness_2020)
##    Country             Region           Ladder_score  
##  Length:153         Length:153         Min.   :2.567  
##  Class :character   Class :character   1st Qu.:4.724  
##  Mode  :character   Mode  :character   Median :5.515  
##                                        Mean   :5.473  
##                                        3rd Qu.:6.229  
##                                        Max.   :7.809  
##  Standard_error_of_ladder_score  upperwhisker    lowerwhisker  
##  Min.   :0.02590                Min.   :2.628   Min.   :2.506  
##  1st Qu.:0.04070                1st Qu.:4.826   1st Qu.:4.603  
##  Median :0.05061                Median :5.608   Median :5.431  
##  Mean   :0.05354                Mean   :5.578   Mean   :5.368  
##  3rd Qu.:0.06068                3rd Qu.:6.364   3rd Qu.:6.139  
##  Max.   :0.12059                Max.   :7.870   Max.   :7.748  
##  GDP_per_capita   Social_support   Life_expectancy    Freedom      
##  Min.   : 6.493   Min.   :0.3195   Min.   :45.20   Min.   :0.3966  
##  1st Qu.: 8.351   1st Qu.:0.7372   1st Qu.:58.96   1st Qu.:0.7148  
##  Median : 9.456   Median :0.8292   Median :66.31   Median :0.7998  
##  Mean   : 9.296   Mean   :0.8087   Mean   :64.45   Mean   :0.7834  
##  3rd Qu.:10.265   3rd Qu.:0.9067   3rd Qu.:69.29   3rd Qu.:0.8777  
##  Max.   :11.451   Max.   :0.9747   Max.   :76.80   Max.   :0.9750  
##    Generosity       Perceptions_of_corruption Ladder_score_in_Dystopia
##  Min.   :-0.30091   Min.   :0.1098            Min.   :1.972           
##  1st Qu.:-0.12701   1st Qu.:0.6830            1st Qu.:1.972           
##  Median :-0.03366   Median :0.7831            Median :1.972           
##  Mean   :-0.01457   Mean   :0.7331            Mean   :1.972           
##  3rd Qu.: 0.08543   3rd Qu.:0.8492            3rd Qu.:1.972           
##  Max.   : 0.56066   Max.   :0.9356            Max.   :1.972           
##  Explained by: Log GDP per capita Explained by: Social support
##  Min.   :0.0000                   Min.   :0.0000              
##  1st Qu.:0.5759                   1st Qu.:0.9867              
##  Median :0.9185                   Median :1.2040              
##  Mean   :0.8688                   Mean   :1.1556              
##  3rd Qu.:1.1692                   3rd Qu.:1.3871              
##  Max.   :1.5367                   Max.   :1.5476              
##  Explained by: Healthy life expectancy
##  Min.   :0.0000                       
##  1st Qu.:0.4954                       
##  Median :0.7598                       
##  Mean   :0.6929                       
##  3rd Qu.:0.8672                       
##  Max.   :1.1378                       
##  Explained by: Freedom to make life choices Explained by: Generosity
##  Min.   :0.0000                             Min.   :0.0000          
##  1st Qu.:0.3815                             1st Qu.:0.1150          
##  Median :0.4833                             Median :0.1767          
##  Mean   :0.4636                             Mean   :0.1894          
##  3rd Qu.:0.5767                             3rd Qu.:0.2555          
##  Max.   :0.6933                             Max.   :0.5698          
##  Explained by: Perceptions of corruption Dystopia + residual
##  Min.   :0.00000                         Min.   :0.2572     
##  1st Qu.:0.05580                         1st Qu.:1.6299     
##  Median :0.09844                         Median :2.0463     
##  Mean   :0.13072                         Mean   :1.9723     
##  3rd Qu.:0.16306                         3rd Qu.:2.3503     
##  Max.   :0.53316                         Max.   :3.4408
#missing values 
sum(is.na(happiness_2020))
## [1] 0
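`sum(is.na(...))` gives a single count for the whole table. When that count is not zero it helps to know where the gaps are; the sketch below shows two base R follow-ups on an invented three-row data frame (the toy values are assumptions, not taken from the dataset).

```r
# Toy data frame with two missing cells (values invented for illustration)
toy <- data.frame(
  Country      = c("A", "B", "C"),
  Ladder_score = c(7.1, NA, 4.2),
  Freedom      = c(0.9, 0.7, NA),
  stringsAsFactors = FALSE
)

total_missing     <- sum(is.na(toy))             # one number for the whole table
missing_by_column <- colSums(is.na(toy))         # named vector, one count per column
complete_rows     <- toy[complete.cases(toy), ]  # rows without any missing value
```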

Creating a summary table for the numeric variables

library(fBasics) #library for the summary table
## Loading required package: timeDate
## Loading required package: timeSeries
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
#subsetting the dataset to only include numeric values
num_tab <- happiness_2020[, c("Ladder_score","Standard_error_of_ladder_score","GDP_per_capita","Social_support","Life_expectancy","Freedom","Generosity","Perceptions_of_corruption")]

#keeping only the statistics listed below in the summary table
#(named summary_tab so that base R's summary() function is not masked)
summary_tab <- basicStats(num_tab)[c("Mean", "Stdev", "Median", "Minimum", "Maximum"),]
#styling the summary table
kable(summary_tab)%>%
  kable_styling(bootstrap_options = c("striped", "hover"), font_size = 12)
Statistic  Ladder_score  Standard_error_of_ladder_score  GDP_per_capita  Social_support  Life_expectancy   Freedom  Generosity  Perceptions_of_corruption
Mean            5.47324                        0.053538        9.295706        0.808721        64.445529  0.783360   -0.014568                   0.733120
Stdev           1.11227                        0.018183        1.201588        0.121453         7.057848  0.117786    0.151809                   0.175172
Median          5.51500                        0.050606        9.456313        0.829204        66.305145  0.799805   -0.033665                   0.783122
Minimum         2.56690                        0.025902        6.492642        0.319460        45.200001  0.396573   -0.300907                   0.109784
Maximum         7.80870                        0.120590       11.450681        0.974670        76.804581  0.974998    0.560664                   0.935585

Creating a boxplot of the happiness score for each region.

library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
box <- ggplot(happiness_2020, aes(x = Region, y = Ladder_score, color = Region)) +
  geom_boxplot() + 
  geom_jitter(aes(color=Country), size = 0.5) +
  ggtitle("Happiness Score for Regions and Countries for 2020") + 
  coord_flip() + 
  theme(legend.position="none")
ggplotly(box)

The graph shows that Western Europe is the happiest region for 2020, followed by Australia and North America. Sub-Saharan Africa is the least happy region, followed by South Asia.

Average happiness for the region

avg_happiness_region <- happiness_2020 %>%
        group_by(Region) %>%
        #mean ladder score per region, rounded to one decimal
        summarise(avg_happiness = round(mean(Ladder_score), 1))


#Plotting the average happiness scores to compare regions using plotly
p_avg_happiness_region <- plot_ly(avg_happiness_region, x = ~Region,
                                  y = ~avg_happiness, 
                                  type = 'bar', 
                                  name = 'Average Happiness',
                                  marker = list(color = 'rgb(158,202,225)')) %>% 
  add_lines(y = ~mean(happiness_2020$Ladder_score), name = 'world average')%>%
  layout(title="Average Happiness per Region in 2020", yaxis = list(title = "avg. happiness score"))
htmltools::tagList(list(p_avg_happiness_region))
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...

The world average happiness score is 5.5. Surprisingly, the Commonwealth of Independent States has exactly the same average score. It is important to note that North America and ANZ each include only 2 countries, whereas Western Europe has 21. Additionally, all of these regions include countries with developed economies.

The saddest region is Sub-Saharan Africa, which includes 40 different countries. The second saddest region is South Asia.
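Since the regions differ so much in size, it can help to put the number of countries next to each regional average. The following is a minimal base R sketch on invented data (the two toy regions and their scores are assumptions); it also shows that the world average over countries is not the same thing as the average of the regional averages.

```r
# Toy data: two regions of unequal size (values invented for illustration)
toy <- data.frame(
  Region       = c("West", "West", "West", "South", "South"),
  Ladder_score = c(7.5, 7.0, 6.5, 4.0, 5.0),
  stringsAsFactors = FALSE
)

region_mean <- tapply(toy$Ladder_score, toy$Region, mean)  # mean score per region
region_n    <- table(toy$Region)                           # countries per region

region_summary <- data.frame(
  Region        = names(region_mean),
  n_countries   = as.integer(region_n[names(region_mean)]),
  avg_happiness = round(as.numeric(region_mean), 1)
)

world_avg            <- mean(toy$Ladder_score)  # every country counts once
mean_of_region_means <- mean(region_mean)       # every region counts once
```

In the toy data the two numbers differ (6 versus 5.75) because the larger region pulls the country-level average toward its own score.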

Let's see the correlation between all the variables

num_hap <- happiness_2020[, c("Ladder_score","Standard_error_of_ladder_score","GDP_per_capita","Social_support","Life_expectancy","Freedom","Generosity","Perceptions_of_corruption")]

Correlation plot

m <- cor(num_hap) #creating correlation matrix
corrplot(m, method="circle", type='upper', tl.cex=0.8, tl.col = 'black')

Factors with a positive correlation of about +0.7 or more have a strong relationship with the ladder score: GDP, social support, and life expectancy. Freedom comes next, with a moderate correlation of about +0.6. Interestingly, generosity is the least correlated factor, followed by the perception of corruption.
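Reading correlations off a circle plot is approximate, so one option is to pull the ladder-score column out of the correlation matrix and sort it. This is a sketch on invented numbers; the toy data frame only borrows three column names from the dataset for readability.

```r
# Toy numeric data (values invented for illustration)
toy <- data.frame(
  Ladder_score   = c(7.8, 7.1, 6.3, 5.5, 4.9, 4.1, 3.2),
  GDP_per_capita = c(10.9, 10.6, 10.1, 9.4, 8.8, 8.1, 7.0),
  Generosity     = c(0.10, -0.05, 0.20, -0.10, 0.05, 0.15, -0.02)
)

m <- cor(toy)  # full correlation matrix

# Correlation of every other variable with the ladder score
ladder_cor <- m[rownames(m) != "Ladder_score", "Ladder_score"]

# Order by absolute size so strong negative correlations are not pushed to the bottom
ranked <- ladder_cor[order(abs(ladder_cor), decreasing = TRUE)]
```

Applied to the real data, the same three lines would use `m <- cor(num_hap)`.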

Let's plot the top 4 factors against the happiness score.

Scatter plot of Happiness Score & GDP per Capita (2020)

ggplot(happiness_2020, aes(x=GDP_per_capita, y=Ladder_score))+ 
  geom_point(aes(color = Region)) +
  geom_smooth(method="lm") + 
  xlab("GDP per Capita") + 
  ylab("Happiness Score") + 
  labs(colour="Region") +
  ggtitle("All Regions: Happiness Score & GDP per Capita (2020)")
## `geom_smooth()` using formula 'y ~ x'

GDP per capita is a crucial element of happiness. Here we see that Sub-Saharan Africa has the lowest GDP and happiness scores. Again, Western Europe has the highest scores.

Calculating the correlation between happiness score and GDP

cor(happiness_2020$Ladder_score,happiness_2020$GDP_per_capita)
## [1] 0.7753744

There is a strong uphill (positive) relationship between the two variables.
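The straight line drawn by `geom_smooth(method="lm")` is the least-squares line that `lm()` fits, and for a regression with a single predictor the R-squared of that fit equals the squared correlation. The sketch below checks this on invented data (the toy values are assumptions) and then squares the coefficient reported above.

```r
# Toy data (values invented for illustration)
toy <- data.frame(
  GDP_per_capita = c(7.0, 8.1, 8.8, 9.4, 10.1, 10.6, 10.9),
  Ladder_score   = c(3.9, 4.1, 5.6, 5.0, 6.3, 6.6, 7.4)
)

r   <- cor(toy$Ladder_score, toy$GDP_per_capita)
fit <- lm(Ladder_score ~ GDP_per_capita, data = toy)  # same model the smoother draws

r_squared <- summary(fit)$r.squared          # share of variance explained by the line
slope     <- coef(fit)[["GDP_per_capita"]]   # change in score per unit of the predictor

# Squaring the correlation reported in the text
reported_r     <- 0.7753744
reported_share <- reported_r^2
```

For the reported coefficient, the squared value is about 0.60, so a straight line in GDP per capita accounts for roughly 60% of the variation in the ladder score. That is a statement about association, not causation.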

Scatter plot of Happiness Score & Social Support (2020)

ggplot(happiness_2020, aes(x=Social_support, y=Ladder_score))+ 
  geom_point(aes(color = Region)) +
  geom_smooth(method="lm") + 
  xlab("Social Support") + 
  ylab("Happiness Score") + 
  labs(colour="Region") +
  ggtitle("All Regions: Happiness Score & Social Support (2020)")
## `geom_smooth()` using formula 'y ~ x'

Social support is vital to the happiness score. Here as well, Western Europe has the highest scores on social support.

Calculating the correlation between happiness score and social support

cor(happiness_2020$Ladder_score,happiness_2020$Social_support)
## [1] 0.7650008

There is a strong uphill (positive) relationship between the two variables.

Scatter plot of Happiness Score & Life Expectancy (2020)

ggplot(happiness_2020, aes(x=Life_expectancy, y=Ladder_score))+ 
  geom_point(aes(color = Region)) +
  geom_smooth(method="lm") + 
  xlab("Life Expectancy") + 
  ylab("Happiness Score") + 
  labs(colour="Region") +
  ggtitle("All Regions: Happiness Score & Life Expectancy (2020)")
## `geom_smooth()` using formula 'y ~ x'

Life expectancy in the Sub-Saharan African region is at the low end of the slope. The highest life expectancy is in Southeast Asia.

Calculating the correlation between happiness score and life expectancy

cor(happiness_2020$Ladder_score,happiness_2020$Life_expectancy)
## [1] 0.7703163

There is a positive uphill relationship, with the exception of the Sub-Saharan region.

Scatter plot of Happiness Score & Freedom (2020)

ggplot(happiness_2020, aes(x=Freedom, y=Ladder_score))+ 
  geom_point(aes(color = Region)) +
  geom_smooth(method="lm") + 
  xlab("Freedom") + 
  ylab("Happiness Score") + 
  labs(colour="Region") +
  ggtitle("All Regions: Happiness Score & Freedom (2020)")
## `geom_smooth()` using formula 'y ~ x'

Freedom is the weakest of the top 4 factors. Western Europe has the highest freedom to make life choices and to live accordingly, which goes along with its high happiness score.

Except for life expectancy, where Southeast Asia leads, the other 3 factors (GDP, social support and freedom) make Western Europe the happiest region.

Calculating correlation between happiness score and freedom

cor(happiness_2020$Ladder_score,happiness_2020$Freedom)
## [1] 0.5905968

There is a moderate uphill (positive) relationship between the two variables.
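The words "strong" and "moderate" used above can be made explicit with a small helper. The cut-offs in this sketch (0.7, 0.5, 0.3) are an assumed rule of thumb, not something defined by the report, and `cor_strength` is a hypothetical helper name; the four coefficients are the ones printed by the `cor()` calls above.

```r
# Hypothetical helper: label a correlation by its absolute size.
# The thresholds are an assumed rule of thumb.
cor_strength <- function(r) {
  a <- abs(r)
  ifelse(a >= 0.7, "strong",
         ifelse(a >= 0.5, "moderate",
                ifelse(a >= 0.3, "weak", "negligible")))
}

# Coefficients reported in this analysis
reported <- c(
  GDP_per_capita  = 0.7753744,
  Social_support  = 0.7650008,
  Life_expectancy = 0.7703163,
  Freedom         = 0.5905968
)

labels <- setNames(cor_strength(reported), names(reported))
```

Under these assumed cut-offs, GDP, social support and life expectancy are labelled strong and freedom moderate, which matches the wording used in the text.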

Let's find the top 10 happiest countries of the world

top10_2020<-happiness_2020 %>% select(Country,Region,Ladder_score) %>% head(n=10)

g1 <- ggplot(top10_2020, aes(x = factor(Country, levels = Country), y = Ladder_score)) +
  geom_bar(stat = "identity", width = 0.5, fill = "navyblue") +
  geom_hline(yintercept = mean(top10_2020$Ladder_score), linetype = "dashed", color = "orange", size = 1) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.6)) +
  labs(title = "Top10 Happiest Countries-2020", x = "Country", y = "Score") +
  ggplot2::annotate("text", x = "Finland", y = 7.3, label = "Mean Line", vjust = -0.4, color = "lightblue")
g1

fig<-ggplotly(g1)
fig

Except for New Zealand, all 9 of the other countries are from the Western Europe region, which is consistent with the correlation between the happiness score and the factors affecting it.

Let's find the 10 saddest countries of the world

bottom10_2020<-happiness_2020 %>% select(Country,Region,Ladder_score) %>% tail(n=10)

g2 <- ggplot(bottom10_2020, aes(x = factor(Country, levels = Country), y = Ladder_score)) +
  geom_bar(stat = "identity", width = 0.5, fill = "firebrick") +
  geom_hline(yintercept = mean(bottom10_2020$Ladder_score), linetype = "dashed", color = "orange", size = 1) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.6)) +
  labs(title = "10 Saddest Countries-2020", x = "Country", y = "Score") +
  ggplot2::annotate("text", x = "India", y = 3.3, label = "Mean Line", vjust = -0.4, color = "lightblue")
g2

fig<-ggplotly(g2)
fig

Seven of the 10 least happy countries are in the Sub-Saharan region, which is again consistent with the correlation between the happiness score and the factors affecting it.
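Taking the first and last 10 rows gives the happiest and saddest countries only because the file is already ordered by ladder score. If that ordering cannot be relied on, the table can be sorted first. A minimal base R sketch on invented data (country labels and scores are assumptions):

```r
# Toy data in arbitrary order (values invented for illustration)
toy <- data.frame(
  Country      = c("C", "A", "E", "B", "D"),
  Ladder_score = c(5.1, 7.8, 3.2, 6.4, 4.0),
  stringsAsFactors = FALSE
)

# Sort from happiest to saddest before taking the ends of the table
ranked  <- toy[order(toy$Ladder_score, decreasing = TRUE), ]
top2    <- head(ranked, n = 2)  # highest scores
bottom2 <- tail(ranked, n = 2)  # lowest scores
```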

Now let's see the difference between the happiness scores around the world for 2019 and 2020

library(ggalt)
## Registered S3 methods overwritten by 'ggalt':
##   method                  from   
##   grid.draw.absoluteGrob  ggplot2
##   grobHeight.absoluteGrob ggplot2
##   grobWidth.absoluteGrob  ggplot2
##   grobX.absoluteGrob      ggplot2
##   grobY.absoluteGrob      ggplot2
d20<-happiness_2020 %>% select(Country,HS20=Ladder_score)
d19<-happiness_all %>% dplyr::filter(year == 2019)%>% select(Country,HS19=Ladder_score)

score<-inner_join(d20,d19)%>% mutate(score_diff= HS20-HS19)%>% dplyr::filter(score_diff>0)
## Joining, by = "Country"
score$Country <- factor(score$Country, levels=as.character(score$Country))
gg <- ggplot(score, aes(x=HS19, xend=HS20, y=Country, group=Country)) + 
  geom_dumbbell(size=2, color="#e3e2e1", 
                colour_x = "yellow", colour_xend = "#edae52",
                dot_guide=TRUE, dot_guide_size=0.25) + 
      labs(x='', y=NULL, title="Happiness: from pre-Covid (2019) to amidst-Covid (2020)",
       subtitle = 'Despite Covid, some countries see increases in happiness.',
       caption= 'Source: World Happiness Report (2020)')+
  
  theme(plot.title = element_text(hjust=0.5, face="bold"),
        plot.subtitle = element_text(face = "italic", hjust = 0.6),
        plot.background=element_rect(fill="#f7f7f7"),
        panel.background=element_rect(fill="#f7f7f7"),
        panel.grid.minor=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.major.x=element_line(),
        axis.ticks=element_blank(),
        legend.position="top",
        panel.border=element_blank())

plot(gg)

The above graph shows that there are still countries whose happiness score increased from 2019 to 2020 despite Covid. I did not expect so many countries to show up on the graph. It is surprising that these countries did well during Covid when the whole world was suffering. The only explanation I can think of is that the initial lockdowns imposed by these countries helped families spend time together, work from home with less stress, and so on.
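It is worth keeping in mind how the chart's input is built: an inner join keeps only countries present in both years, and the filter on a positive difference hides every country whose score fell. The sketch below reproduces those two steps in base R on invented data (country labels and scores are assumptions; `merge()` stands in for the `inner_join()` used above).

```r
# Toy scores for two years (values invented for illustration)
d20 <- data.frame(Country = c("A", "B", "C"), HS20 = c(7.0, 5.2, 4.1),
                  stringsAsFactors = FALSE)
d19 <- data.frame(Country = c("A", "B", "D"), HS19 = c(6.8, 5.5, 3.9),
                  stringsAsFactors = FALSE)

# Inner join: only countries that appear in both years survive
score <- merge(d20, d19, by = "Country")
score$score_diff <- score$HS20 - score$HS19

# Countries whose score went up; everything else is left off the chart
improved <- score[score$score_diff > 0, ]

# Countries lost in the join because one year is missing
dropped_countries <- setdiff(union(d20$Country, d19$Country), score$Country)
```

So the number of countries on the chart shows how many improved, but not what share of all countries they make up.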

I found a paper, “A tale of three countries: How did Covid-19 lockdown impact happiness?”, in which the authors report the following: “The main idea is to determine, notwithstanding the characteristics of a country or the lockdown regulations, whether a lockdown negatively affects happiness. Secondly, we compare the effect size of the lockdown on happiness between these countries. We make use of Difference-in-Difference estimations to determine the causal effect of the lockdown and Least Squares Dummy Variable estimations to study the heterogeneity in the effect size of the lockdown by country. Our results show that, regardless of the characteristics of the country, or the type or duration of the lockdown regulations; a lockdown causes a decline in happiness. Furthermore, the negative effect differs between countries, seeming that the more stringent the stay-at-home regulations are, the greater the negative effect” (https://www.econstor.eu/bitstream/10419/221748/1/GLO-DP-0584.pdf)

Now let's see what percentage of countries had an increase in happiness score from 2018 to 2019 and from 2019 to 2020

country_region = happiness_2020 %>% select(Country,Region) %>% unique()

df_happy_increase <- happiness_all %>% 
    dplyr::filter(year >= 2018) %>%
    left_join(country_region, by = c('Country')) %>%
    select(Country , year, Ladder_score)  %>%
    pivot_wider(names_from = 'year', names_prefix = 'year', values_from = 'Ladder_score') %>%
    mutate(increase_in_2019 = ifelse(year2019>year2018, 1, 0),
          increase_in_2020 = ifelse(year2020>year2019,1,0))

df_increase_in_2019 <- df_happy_increase %>% summarize(pct = mean(increase_in_2019, na.rm = TRUE))
df_increase_in_2020 <- df_happy_increase %>% summarize(pct = mean(increase_in_2020, na.rm = TRUE))

Increase in happiness score from 2018-2019 and from 2019-2020

library(cowplot)
donut_plot <- function(df, title = '', subtitle = '', caption = '') {
ggplot(df) +
    geom_rect(aes(ymax = 1, ymin = 0, xmax = 2, xmin=1.2, fill = "base"))  +
    geom_rect(aes(ymax = pct, ymin = 0, xmax = 2.2, xmin = 1.2, fill = 'main')) +
    geom_text(x = 0, y = 0, label = paste0(round(df$pct*100,0),'%'), size = 16) + 
    coord_polar(theta = 'y') +
    xlim(c(0,2.2)) + 
    scale_fill_manual(values = c("light grey", "yellow")) +
    labs(title = title, subtitle = subtitle, caption = caption) +
    theme_void() + 
    theme(plot.title = element_text(size=18, face = 'bold'),
          plot.subtitle = element_text(size = 14, hjust = 0.5),
          plot.caption = element_text(size = 12),
          legend.position = 'None')}

p1 <- donut_plot(df_increase_in_2019, title = 'Percent of countries with increased happiness\n', subtitle = '2018 -> 2019')
p2 <- donut_plot(df_increase_in_2020, title = '\n', subtitle = '2019 -> 2020', caption = '\nSource: World Happiness Report (2021)')
plot_grid(p1, p2)

From 2018 to 2019, when Covid-19 had no impact, 57% of countries saw an increase in happiness. From 2019 to 2020 that share is about 10% lower, a period in which the previously unknown factor of Covid-19 came into play.
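The number in each donut is the share of countries whose score rose, computed as the mean of a 0/1 indicator with missing years left out. A minimal base R sketch on invented data (country labels and scores are assumptions) shows the calculation:

```r
# Toy wide table, one row per country (values invented for illustration)
toy <- data.frame(
  Country  = c("A", "B", "C", "D"),
  year2018 = c(5.0, 6.1, 4.4, NA),
  year2019 = c(5.3, 6.0, 4.9, 5.5),
  year2020 = c(5.1, 6.2, NA, 5.4),
  stringsAsFactors = FALSE
)

# 1 if the score went up, 0 if not, NA if either year is missing
inc_2019 <- ifelse(toy$year2019 > toy$year2018, 1, 0)
inc_2020 <- ifelse(toy$year2020 > toy$year2019, 1, 0)

# The mean of a 0/1 indicator is the share of countries with an increase
pct_2019 <- mean(inc_2019, na.rm = TRUE)
pct_2020 <- mean(inc_2020, na.rm = TRUE)

# Difference between the two shares, in percentage points
change_pp <- (pct_2020 - pct_2019) * 100
```

The result describes how many countries improved, not by how much the scores themselves changed.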

Abstract:

Global happiness scores are not a commonly studied variable, but given the situation during Covid I decided to pursue this topic. I used a correlation matrix, scatterplots, bar graphs, and donut graphs to answer these questions:

1. What is the average happiness score in 2020?
2. Which factors correlate with the happiness score, including those showing an inverse relation?
3. Since my research was organized by continent and region, which regions are the happiest and the saddest?
4. How strong is the correlation between happiness and the other variables (negative, moderate, strong)?
5. Which countries increased their happiness score?
6. What percentage of countries increased their happiness score from 2018 to 2019 and from 2019 to 2020?

Problem or Topic Statement:

The world changed when Covid hit. We went from roaming freely in the streets and parks to being confined to our own space, and from commuting to work to working from home. We saw loved ones suffer, and we had to educate ourselves and the people around us. These are a few examples of how our lives were affected out of the blue and how the situation changed globally. Because these events took a physical and mental toll, I chose this topic to calculate an overall happiness index and to look at the impact Covid left on the minds of people as well as on governments. While searching for a dataset for this project, I came across the happiness report dataset on Kaggle and decided to do my own analysis.

Analysis:

In this report I used correlation analysis, a linear model, and other visualisations to evaluate the happiness score across the globe. I first worked with the happiness data from 2020. Then, to find the difference in the happiness index over time, I merged the dataset covering the years up to 2019 with the 2020 dataset.

To look for dependencies between the variables, I started by creating a correlation matrix comparing the happiness score with the other variables: GDP, freedom to make life choices, life expectancy, social support, dystopia, trust in government, and the upper and lower score bounds. The correlation matrix shows that social support, GDP, and life expectancy are the three variables most strongly correlated with the happiness score. Where all three are high, the happiness score also tends to be high (for example, in the Western Europe region). The saddest region is Sub-Saharan Africa, where trust in government is also an important variable for the happiness score.
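
As a rough illustration of the correlation step described above, the base R sketch below builds a correlation matrix, ranks the factors by how strongly they correlate with the score, and fits a simple linear model. The data are synthetic and the column names (`ladder_score`, `gdp`, `social_support`) are assumptions made for this example. They are not the actual World Happiness Report values or the exact code used in this report:

```r
# Synthetic data for illustration only; not the World Happiness Report values.
set.seed(42)
n <- 50
gdp            <- rnorm(n, mean = 9, sd = 1)
social_support <- 0.5 + 0.03 * gdp + rnorm(n, sd = 0.05)
ladder_score   <- 1 + 0.5 * gdp + 2 * social_support + rnorm(n, sd = 0.4)

toy <- data.frame(ladder_score, gdp, social_support)

# Pairwise Pearson correlations between all numeric columns.
cor_matrix <- cor(toy, use = "pairwise.complete.obs")
print(round(cor_matrix, 2))

# Rank the factors by the strength of their correlation with the score.
score_cor <- cor_matrix["ladder_score", colnames(cor_matrix) != "ladder_score"]
print(sort(abs(score_cor), decreasing = TRUE))

# A simple linear model of the score on the same factors.
fit <- lm(ladder_score ~ gdp + social_support, data = toy)
print(coef(fit))
```

A correlation coefficient only describes how closely two variables move together, so a high value here should be read as an association rather than proof that the factor causes happiness.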

Research:

To find out why Western Europe is the happiest region, I looked for research on the subject and found a chapter from the World Happiness Report 2020 stating: “What exactly makes Nordic citizens so exceptionally satisfied with their lives? This is the question that this chapter aims to answer. Through reviewing the existing studies, theories, and data behind the World Happiness Report, we find that the most prominent explanations include factors related to the quality of institutions, such as reliable and extensive welfare benefits, low corruption, and well-functioning democracy and state institutions. Furthermore, Nordic citizens experience a high sense of autonomy and freedom, as well as high levels of social trust towards each other, which play an important role in determining life satisfaction. On the other hand, we show that a few popular explanations for Nordic happiness such as the small population and homogeneity of the Nordic countries, and a few counterarguments against Nordic happiness such as the cold weather and the suicide rates, actually don’t seem to have much to do with Nordic happiness.” https://worldhappiness.report/ed/2020/the-nordic-exceptionalism-what-explains-why-the-nordic-countries-are-constantly-among-the-happiest-in-the-world/

Bibliography:

https://happiness-report.s3.amazonaws.com/2020/WHR20_Ch2_Statistical_Appendix.pdf
https://worldhappiness.report/faq/
https://www.ipsos.com/en/global-happiness-study-2020
https://worldhappiness.report/ed/2020/the-nordic-exceptionalism-what-explains-why-the-nordic-countries-are-constantly-among-the-happiest-in-the-world/
https://www.dummies.com/education/math/statistics/how-to-interpret-a-correlation-coefficient-r/
https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf
https://worldhappiness.report/
https://www.kaggle.com/
https://www.econstor.eu/bitstream/10419/221748/1/GLO-DP-0584.pdf
https://worldhappiness.report/ed/2021/happiness-trust-and-deaths-under-covid-19/