The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles, and rankings of national happiness based on respondent ratings of their own lives, which the report also correlates with various life factors. Each year, ~1,000 individuals are sampled from each of more than 150 countries.
It is fair to say that not many people are happy in the current COVID pandemic; my analysis here may help in looking at, or shaping, our post-COVID world. It may not do so fully, but it at least aims to look at the brighter side of life.
The research questions I hope to gain some insight into are:
- Does a higher Life Expectancy (years) lead to greater happiness?
- Does being Generous make people happy?

We will check the above for the calendar year 2019.
Data collection:
The data was downloaded from Kaggle; the original source of the data is World Bank Data. The data was collected from the sites below:
Cases:
Each case in the Happiness dataset is for a given country and shows its rank, its score, and its scores on the other parameters. There are 156 cases in each of the 2018 and 2019 datasets.
Each case in the Life Expectancy dataset is for a given country and year and shows the life expectancy for that year. There are 19028 cases in the dataset.
Response variable:
The response variable is Score, and it is quantitative (numerical).
Explanatory variables:
One explanatory variable is Life expectancy (years) and the other is Generosity in the happiness dataset; both are quantitative (numerical).
Type of study:
This is an observational study. We are looking at data for the years 2018 and 2019 for the countries of the world.
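# Load the packages used throughout the analysis:
library(readr)    # read_csv()
library(dplyr)    # %>%, filter(), top_n(), left_join()
library(ggplot2)  # ggplot()
library(knitr)    # kable()
# Load data from the life-expectancy csv file: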
theUrl.lifeExpectancy <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/life-expectancy.csv'
lifeExp.df <- read_csv(theUrl.lifeExpectancy, na = c("", "NA","N/A"))
# Load data from the world-happiness csv file for 2018 and 2019:
theUrl.happ.2018 <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/world-happiness-2018.csv'
happ.2018 <- read_csv(theUrl.happ.2018, na = c("", "NA","N/A"))
theUrl.happ.2019 <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/world-happiness-2019.csv'
happ.2019 <- read_csv(theUrl.happ.2019, na = c("", "NA","N/A"))
## [1] 19028 4
| Entity | Code | Year | Life expectancy (years) |
|---|---|---|---|
| Afghanistan | AFG | 1950 | 27.638 |
| Afghanistan | AFG | 1951 | 27.878 |
| Afghanistan | AFG | 1952 | 28.361 |
| Afghanistan | AFG | 1953 | 28.852 |
| Afghanistan | AFG | 1954 | 29.350 |
| Afghanistan | AFG | 1955 | 29.854 |
| Afghanistan | AFG | 1956 | 30.365 |
| Afghanistan | AFG | 1957 | 30.882 |
| Afghanistan | AFG | 1958 | 31.403 |
| Afghanistan | AFG | 1959 | 31.925 |
## [1] 156 9
| Overall rank | Country or region | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 7.632 | 1.305 | 1.592 | 0.874 | 0.681 | 0.202 | 0.393 |
| 2 | Norway | 7.594 | 1.456 | 1.582 | 0.861 | 0.686 | 0.286 | 0.340 |
| 3 | Denmark | 7.555 | 1.351 | 1.590 | 0.868 | 0.683 | 0.284 | 0.408 |
| 4 | Iceland | 7.495 | 1.343 | 1.644 | 0.914 | 0.677 | 0.353 | 0.138 |
| 5 | Switzerland | 7.487 | 1.420 | 1.549 | 0.927 | 0.660 | 0.256 | 0.357 |
| 6 | Netherlands | 7.441 | 1.361 | 1.488 | 0.878 | 0.638 | 0.333 | 0.295 |
| 7 | Canada | 7.328 | 1.330 | 1.532 | 0.896 | 0.653 | 0.321 | 0.291 |
| 8 | New Zealand | 7.324 | 1.268 | 1.601 | 0.876 | 0.669 | 0.365 | 0.389 |
| 9 | Sweden | 7.314 | 1.355 | 1.501 | 0.913 | 0.659 | 0.285 | 0.383 |
| 10 | Australia | 7.272 | 1.340 | 1.573 | 0.910 | 0.647 | 0.361 | 0.302 |
## [1] 156 9
| Overall rank | Country or region | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 7.769 | 1.340 | 1.587 | 0.986 | 0.596 | 0.153 | 0.393 |
| 2 | Denmark | 7.600 | 1.383 | 1.573 | 0.996 | 0.592 | 0.252 | 0.410 |
| 3 | Norway | 7.554 | 1.488 | 1.582 | 1.028 | 0.603 | 0.271 | 0.341 |
| 4 | Iceland | 7.494 | 1.380 | 1.624 | 1.026 | 0.591 | 0.354 | 0.118 |
| 5 | Netherlands | 7.488 | 1.396 | 1.522 | 0.999 | 0.557 | 0.322 | 0.298 |
| 6 | Switzerland | 7.480 | 1.452 | 1.526 | 1.052 | 0.572 | 0.263 | 0.343 |
| 7 | Sweden | 7.343 | 1.387 | 1.487 | 1.009 | 0.574 | 0.267 | 0.373 |
| 8 | New Zealand | 7.307 | 1.303 | 1.557 | 1.026 | 0.585 | 0.330 | 0.380 |
| 9 | Canada | 7.278 | 1.365 | 1.505 | 1.039 | 0.584 | 0.285 | 0.308 |
| 10 | Austria | 7.246 | 1.376 | 1.475 | 1.016 | 0.532 | 0.244 | 0.226 |
Six key variables are used to build each country’s happiness score: GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, and Perceptions of corruption.
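As a quick arithmetic check on how the six variables relate to the score, the sketch below adds up Finland's 2019 values from the table above and compares the total with its Score. The object names are illustrative and not part of the original analysis. Reading the remaining gap as a baseline-plus-residual term is an assumption based on how the World Happiness Report describes its score decomposition; that term is not a column in the datasets used here.

```r
# Finland's 2019 component values, copied from the 2019 table above.
finland.2019 <- c(
  "GDP per capita"               = 1.340,
  "Social support"               = 1.587,
  "Healthy life expectancy"      = 0.986,
  "Freedom to make life choices" = 0.596,
  "Generosity"                   = 0.153,
  "Perceptions of corruption"    = 0.393
)
finland.score <- 7.769

# Sum of the six components, and the part of the Score they do not cover
# (assumed here to be the report's baseline-plus-residual term).
component.total <- sum(finland.2019)
unexplained <- finland.score - component.total

print(round(c(total = component.total, gap = unexplained), 3))
```

For Finland the six columns add up to 5.055, leaving 2.714 of the Score outside these six variables, so the columns should not be expected to sum to the Score on their own.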
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 19028 obs. of 4 variables:
## $ Entity : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Code : chr "AFG" "AFG" "AFG" "AFG" ...
## $ Year : num 1950 1951 1952 1953 1954 ...
## $ Life expectancy (years): num 27.6 27.9 28.4 28.9 29.4 ...
## - attr(*, "spec")=
## .. cols(
## .. Entity = col_character(),
## .. Code = col_character(),
## .. Year = col_double(),
## .. `Life expectancy (years)` = col_double()
## .. )
## Entity Code Year Life expectancy (years)
## Length:19028 Length:19028 Min. :1543 Min. :17.76
## Class :character Class :character 1st Qu.:1961 1st Qu.:52.31
## Mode :character Mode :character Median :1980 Median :64.71
## Mean :1975 Mean :61.75
## 3rd Qu.:2000 3rd Qu.:71.98
## Max. :2019 Max. :86.75
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 156 obs. of 9 variables:
## $ Overall rank : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Country or region : chr "Finland" "Norway" "Denmark" "Iceland" ...
## $ Score : num 7.63 7.59 7.55 7.5 7.49 ...
## $ GDP per capita : num 1.3 1.46 1.35 1.34 1.42 ...
## $ Social support : num 1.59 1.58 1.59 1.64 1.55 ...
## $ Healthy life expectancy : num 0.874 0.861 0.868 0.914 0.927 0.878 0.896 0.876 0.913 0.91 ...
## $ Freedom to make life choices: num 0.681 0.686 0.683 0.677 0.66 0.638 0.653 0.669 0.659 0.647 ...
## $ Generosity : num 0.202 0.286 0.284 0.353 0.256 0.333 0.321 0.365 0.285 0.361 ...
## $ Perceptions of corruption : num 0.393 0.34 0.408 0.138 0.357 0.295 0.291 0.389 0.383 0.302 ...
## - attr(*, "spec")=
## .. cols(
## .. `Overall rank` = col_double(),
## .. `Country or region` = col_character(),
## .. Score = col_double(),
## .. `GDP per capita` = col_double(),
## .. `Social support` = col_double(),
## .. `Healthy life expectancy` = col_double(),
## .. `Freedom to make life choices` = col_double(),
## .. Generosity = col_double(),
## .. `Perceptions of corruption` = col_double()
## .. )
## Overall rank Country or region Score GDP per capita
## Min. : 1.00 Length:156 Min. :2.905 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.454 1st Qu.:0.6162
## Median : 78.50 Mode :character Median :5.378 Median :0.9495
## Mean : 78.50 Mean :5.376 Mean :0.8914
## 3rd Qu.:117.25 3rd Qu.:6.168 3rd Qu.:1.1978
## Max. :156.00 Max. :7.632 Max. :2.0960
##
## Social support Healthy life expectancy Freedom to make life choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.067 1st Qu.:0.4223 1st Qu.:0.3560
## Median :1.255 Median :0.6440 Median :0.4870
## Mean :1.213 Mean :0.5973 Mean :0.4545
## 3rd Qu.:1.463 3rd Qu.:0.7772 3rd Qu.:0.5785
## Max. :1.644 Max. :1.0300 Max. :0.7240
##
## Generosity Perceptions of corruption
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.1095 1st Qu.:0.051
## Median :0.1740 Median :0.082
## Mean :0.1810 Mean :0.112
## 3rd Qu.:0.2390 3rd Qu.:0.137
## Max. :0.5980 Max. :0.457
## NA's :1
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 156 obs. of 9 variables:
## $ Overall rank : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Country or region : chr "Finland" "Denmark" "Norway" "Iceland" ...
## $ Score : num 7.77 7.6 7.55 7.49 7.49 ...
## $ GDP per capita : num 1.34 1.38 1.49 1.38 1.4 ...
## $ Social support : num 1.59 1.57 1.58 1.62 1.52 ...
## $ Healthy life expectancy : num 0.986 0.996 1.028 1.026 0.999 ...
## $ Freedom to make life choices: num 0.596 0.592 0.603 0.591 0.557 0.572 0.574 0.585 0.584 0.532 ...
## $ Generosity : num 0.153 0.252 0.271 0.354 0.322 0.263 0.267 0.33 0.285 0.244 ...
## $ Perceptions of corruption : num 0.393 0.41 0.341 0.118 0.298 0.343 0.373 0.38 0.308 0.226 ...
## - attr(*, "spec")=
## .. cols(
## .. `Overall rank` = col_double(),
## .. `Country or region` = col_character(),
## .. Score = col_double(),
## .. `GDP per capita` = col_double(),
## .. `Social support` = col_double(),
## .. `Healthy life expectancy` = col_double(),
## .. `Freedom to make life choices` = col_double(),
## .. Generosity = col_double(),
## .. `Perceptions of corruption` = col_double()
## .. )
## Overall rank Country or region Score GDP per capita
## Min. : 1.00 Length:156 Min. :2.853 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.545 1st Qu.:0.6028
## Median : 78.50 Mode :character Median :5.380 Median :0.9600
## Mean : 78.50 Mean :5.407 Mean :0.9051
## 3rd Qu.:117.25 3rd Qu.:6.184 3rd Qu.:1.2325
## Max. :156.00 Max. :7.769 Max. :1.6840
## Social support Healthy life expectancy Freedom to make life choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.056 1st Qu.:0.5477 1st Qu.:0.3080
## Median :1.272 Median :0.7890 Median :0.4170
## Mean :1.209 Mean :0.7252 Mean :0.3926
## 3rd Qu.:1.452 3rd Qu.:0.8818 3rd Qu.:0.5072
## Max. :1.624 Max. :1.1410 Max. :0.6310
## Generosity Perceptions of corruption
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1087 1st Qu.:0.0470
## Median :0.1775 Median :0.0855
## Mean :0.1848 Mean :0.1106
## 3rd Qu.:0.2482 3rd Qu.:0.1412
## Max. :0.5660 Max. :0.4530
From the summary of the 2019 happiness data above, we can see that the minimum score is 2.853, the maximum is 7.769, and the median score across countries is 5.380.
lifeExp.df.2019 <- lifeExp.df %>%
  filter(Year == 2019)
# Keep the 10 rows with the highest life expectancy:
top.10.life <- top_n(lifeExp.df.2019, 10, `Life expectancy (years)`)
# We use a left join here because not all countries' data is available in happ.2019:
df.main <- left_join(top.10.life, happ.2019, by = c("Entity" = "Country or region"))
kable(df.main)

| Entity | Code | Year | Life expectancy (years) | Overall rank | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Andorra | AND | 2019 | 83.732 | NA | NA | NA | NA | NA | NA | NA | NA |
| Cayman Islands | CYM | 2019 | 83.924 | NA | NA | NA | NA | NA | NA | NA | NA |
| Hong Kong | HKG | 2019 | 84.857 | 76 | 5.430 | 1.438 | 1.277 | 1.122 | 0.440 | 0.258 | 0.287 |
| Japan | JPN | 2019 | 84.629 | 58 | 5.886 | 1.327 | 1.419 | 1.088 | 0.445 | 0.069 | 0.140 |
| Macao | MAC | 2019 | 84.244 | NA | NA | NA | NA | NA | NA | NA | NA |
| Monaco | MCO | 2019 | 86.751 | NA | NA | NA | NA | NA | NA | NA | NA |
| San Marino | SMR | 2019 | 84.972 | NA | NA | NA | NA | NA | NA | NA | NA |
| Singapore | SGP | 2019 | 83.620 | 34 | 6.262 | 1.572 | 1.463 | 1.141 | 0.556 | 0.271 | 0.453 |
| Spain | ESP | 2019 | 83.565 | 30 | 6.354 | 1.286 | 1.484 | 1.062 | 0.362 | 0.153 | 0.079 |
| Switzerland | CHE | 2019 | 83.779 | 6 | 7.480 | 1.452 | 1.526 | 1.052 | 0.572 | 0.263 | 0.343 |
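The NA rows in the joined table come from entities that have no matching name in the happiness data. A minimal base-R sketch of how those names could be listed is shown below; the two vectors are typed in from the table above for illustration, whereas in the actual analysis they would come from `top.10.life$Entity` and `happ.2019$"Country or region"`.

```r
# Entities in the top-10 life-expectancy table above.
top.entities <- c("Andorra", "Cayman Islands", "Hong Kong", "Japan", "Macao",
                  "Monaco", "San Marino", "Singapore", "Spain", "Switzerland")

# The subset of those names that received a happiness rank in the joined table.
# (Illustrative stand-in for the full country column of happ.2019.)
happ.countries <- c("Hong Kong", "Japan", "Singapore", "Spain", "Switzerland")

# Names on the left side of the join with no partner on the right side;
# these are the rows that a left join fills with NA.
unmatched <- setdiff(top.entities, happ.countries)
print(unmatched)
```

Listing the unmatched names first makes it easier to tell whether a country is truly absent from the happiness data or is simply spelled differently in the two files.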
# Refer to columns by name inside aes() rather than with df.main$...:
ggplot(df.main) +
  geom_point(aes(x = `Overall rank`, y = `Life expectancy (years)`,
                 colour = factor(Entity)))
## Warning: Removed 5 rows containing missing values (geom_point).
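=> Only 5 of the 10 entities with the highest life expectancy appear in the happiness data, and their ranks range from 6 (Switzerland) to 76 (Hong Kong). By checking the Rank and Score for these 5 countries, we can say that higher life expectancy does not necessarily lead to higher happiness.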
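One way to put a number on this observation is a rank correlation between life expectancy and Score for the five matched countries. The sketch below uses the values from the joined table above. The data frame and its column names are illustrative, and with only five points the result is suggestive at best.

```r
# The five matched rows from the joined table above.
matched <- data.frame(
  entity   = c("Hong Kong", "Japan", "Singapore", "Spain", "Switzerland"),
  life.exp = c(84.857, 84.629, 83.620, 83.565, 83.779),
  score    = c(5.430, 5.886, 6.262, 6.354, 7.480)
)

# Spearman's rho compares the two orderings rather than the raw values.
rho <- cor(matched$life.exp, matched$score, method = "spearman")
print(rho)
```

On these five rows rho is -0.7: within this small group, the countries with the longest life expectancy tend to have the lower scores. Running the same correlation on all countries present in both datasets would be a fairer test of the research question.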
Hypothesis:
H(Null): There is no association between being generous (Generosity) and happiness (Score).
H(Alternate): There is an association between being generous (Generosity) and happiness (Score).
Conditions:
The sample size is greater than 30. The data follow a unimodal, approximately normal distribution. The samples are random.
Hence, the conditions for inference seem to be satisfied.
ANOVA:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.853 4.545 5.380 5.407 6.184 7.769
# Scatter plots of Score against each of the six factors:
plot(happ.2019$Score ~ happ.2019$"Freedom to make life choices" + happ.2019$"Generosity"
     + happ.2019$"Social support" + happ.2019$"GDP per capita" + happ.2019$"Perceptions of corruption"
     + happ.2019$"Healthy life expectancy")
# Fit a multiple linear regression of Score on the six factors:
m_bty <- lm(happ.2019$Score~happ.2019$"Freedom to make life choices"+ happ.2019$"Generosity"
            + happ.2019$"Social support"+ happ.2019$"GDP per capita" + happ.2019$"Perceptions of corruption"
            + happ.2019$"Healthy life expectancy")
plot(jitter(happ.2019$Score) ~ jitter(happ.2019$"Freedom to make life choices"
     + happ.2019$"Generosity" + happ.2019$"Social support"
     + happ.2019$"GDP per capita" + happ.2019$"Perceptions of corruption"
     + happ.2019$"Healthy life expectancy"))
# Note: for a model with several predictors, abline() draws a line using only
# the intercept and the first slope.
abline(m_bty)
summary(m_bty)
##
## Call:
## lm(formula = happ.2019$Score ~ happ.2019$"Freedom to make life choices" +
## happ.2019$Generosity + happ.2019$"Social support" + happ.2019$"GDP per capita" +
## happ.2019$"Perceptions of corruption" + happ.2019$"Healthy life expectancy")
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.75304 -0.35306 0.05703 0.36695 1.19059
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7952 0.2111 8.505 1.77e-14
## happ.2019$"Freedom to make life choices" 1.4548 0.3753 3.876 0.000159
## happ.2019$Generosity 0.4898 0.4977 0.984 0.326709
## happ.2019$"Social support" 1.1242 0.2369 4.745 4.83e-06
## happ.2019$"GDP per capita" 0.7754 0.2182 3.553 0.000510
## happ.2019$"Perceptions of corruption" 0.9723 0.5424 1.793 0.075053
## happ.2019$"Healthy life expectancy" 1.0781 0.3345 3.223 0.001560
##
## (Intercept) ***
## happ.2019$"Freedom to make life choices" ***
## happ.2019$Generosity
## happ.2019$"Social support" ***
## happ.2019$"GDP per capita" ***
## happ.2019$"Perceptions of corruption" .
## happ.2019$"Healthy life expectancy" **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5335 on 149 degrees of freedom
## Multiple R-squared: 0.7792, Adjusted R-squared: 0.7703
## F-statistic: 87.62 on 6 and 149 DF, p-value: < 2.2e-16
anova(m_bty)
## Analysis of Variance Table
##
## Response: happ.2019$Score
## Df Sum Sq Mean Sq F value Pr(>F)
## happ.2019$"Freedom to make life choices" 1 61.686 61.686 216.7138 < 2e-16
## happ.2019$Generosity 1 1.230 1.230 4.3199 0.03938
## happ.2019$"Social support" 1 64.819 64.819 227.7193 < 2e-16
## happ.2019$"GDP per capita" 1 17.774 17.774 62.4442 5.6e-13
## happ.2019$"Perceptions of corruption" 1 1.174 1.174 4.1248 0.04404
## happ.2019$"Healthy life expectancy" 1 2.956 2.956 10.3863 0.00156
## Residuals 149 42.412 0.285
##
## happ.2019$"Freedom to make life choices" ***
## happ.2019$Generosity *
## happ.2019$"Social support" ***
## happ.2019$"GDP per capita" ***
## happ.2019$"Perceptions of corruption" *
## happ.2019$"Healthy life expectancy" **
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
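=> From the ANOVA result, we can see that the p-value for Generosity is 0.03938, which is less than 0.05, indicating an association with the response variable. Therefore, we reject the null hypothesis in favor of the alternative hypothesis. Note that this ANOVA table is sequential: Generosity is tested after only Freedom to make life choices has entered the model. In the regression summary above, where Generosity is adjusted for all five other factors, its p-value is 0.3267, so the evidence for an association weakens once the other factors are taken into account.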
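The p-value for a term in `anova()` output on an `lm` fit depends on where the term appears in the formula, whereas the t-test in `summary()` adjusts for every other predictor. The sketch below illustrates this with `mtcars`, a dataset that ships with R and is used here only as a stand-in; it is not the happiness data, and the object names are illustrative.

```r
# mtcars ships with R; it is used here only as a stand-in dataset.
fit.hp.first <- lm(mpg ~ hp + wt, data = mtcars)
fit.hp.last  <- lm(mpg ~ wt + hp, data = mtcars)

# Sequential F statistic for hp when it enters the model first vs. last.
f.first <- anova(fit.hp.first)["hp", "F value"]
f.last  <- anova(fit.hp.last)["hp", "F value"]

# t statistic for hp from the coefficient table (adjusted for wt).
t.hp <- summary(fit.hp.last)$coefficients["hp", "t value"]

# For the term entered last, the sequential F equals the squared t statistic.
print(c(F.first = f.first, F.last = f.last, t.squared = t.hp^2))
```

Applied to the happiness model, this suggests that moving Generosity to the end of the formula would make its ANOVA p-value agree with the one in the regression summary.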
# Residuals against each of the six factors:
plot(m_bty$residuals ~ happ.2019$"Freedom to make life choices" + happ.2019$"Generosity"
     + happ.2019$"Social support" + happ.2019$"GDP per capita" + happ.2019$"Perceptions of corruption"
     + happ.2019$"Healthy life expectancy")
=> The residuals are nearly normal; their histogram looks approximately normal, although it is left-skewed. Based on the diagnostic plots, we can conclude that the conditions for this model are reasonable and that the variables are approximately linearly related to the outcome.
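The usual residual checks (histogram, normal Q-Q plot, and residuals against fitted values) can be sketched as follows. To keep the example self-contained it fits a simple regression of Score on Generosity using only the ten 2019 rows shown in the table earlier, so it illustrates the technique rather than reproducing the diagnostics of `m_bty`; the object names are illustrative.

```r
# Score and Generosity for the ten 2019 rows shown in the table earlier.
top10.2019 <- data.frame(
  score      = c(7.769, 7.600, 7.554, 7.494, 7.488,
                 7.480, 7.343, 7.307, 7.278, 7.246),
  generosity = c(0.153, 0.252, 0.271, 0.354, 0.322,
                 0.263, 0.267, 0.330, 0.285, 0.244)
)

fit <- lm(score ~ generosity, data = top10.2019)
res <- residuals(fit)

# Three standard views of the residuals, side by side.
op <- par(mfrow = c(1, 3))
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)
qqline(res)
plot(fitted(fit), res, xlab = "Fitted score", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
par(op)
```

For the full model, the same three plots could be drawn from `m_bty$residuals` and `fitted(m_bty)`.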
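We know that the different attributes collectively explain much of the variation in the score (adjusted R-squared of 0.7703). Of these attributes, Generosity shows a statistically significant association with the score in the ANOVA table, although its coefficient is not significant in the full regression model (p = 0.3267).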
As a next step, it would be worth examining neighboring countries whose scores differ widely to see what other factors influence happiness, or comparing countries within a continent. It would also be interesting to extend this analysis with data from the 2020 COVID pandemic.