Life Expectancy and Happiness; We will use the following data-sets from Kaggle on relating happiness with life expectancy. Life expectancy could be linked to many factors like monetary or physical needs or living condition or external factors with politics or like. Over here we will aim to correlate life expectancy with happiness as a key factor.
Life Expectancy (WHO)
World Happiness Report
# Load data from the life-expectancy csv file:
theUrl.lifeExpectancy <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/life-expectancy.csv'
lifeExp.df <- read_csv(theUrl.lifeExpectancy, na = c("", "NA","N/A"))
# Load data from the world-happiness csv file for 2018 and 2019:
theUrl.happ.2018 <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/world-happiness-2018.csv'
happ.2018 <- read_csv(theUrl.happ.2018, na = c("", "NA","N/A"))
theUrl.happ.2019 <- 'https://raw.githubusercontent.com/kamathvk1982/Data606-FinalProject/master/data/world-happiness-2019.csv'
happ.2019 <- read_csv(theUrl.happ.2019, na = c("", "NA","N/A"))## [1] 19028 4
| Entity | Code | Year | Life expectancy (years) |
|---|---|---|---|
| Afghanistan | AFG | 1950 | 27.638 |
| Afghanistan | AFG | 1951 | 27.878 |
| Afghanistan | AFG | 1952 | 28.361 |
| Afghanistan | AFG | 1953 | 28.852 |
| Afghanistan | AFG | 1954 | 29.350 |
| Afghanistan | AFG | 1955 | 29.854 |
| Afghanistan | AFG | 1956 | 30.365 |
| Afghanistan | AFG | 1957 | 30.882 |
| Afghanistan | AFG | 1958 | 31.403 |
| Afghanistan | AFG | 1959 | 31.925 |
## [1] 156 9
| Overall rank | Country or region | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 7.632 | 1.305 | 1.592 | 0.874 | 0.681 | 0.202 | 0.393 |
| 2 | Norway | 7.594 | 1.456 | 1.582 | 0.861 | 0.686 | 0.286 | 0.340 |
| 3 | Denmark | 7.555 | 1.351 | 1.590 | 0.868 | 0.683 | 0.284 | 0.408 |
| 4 | Iceland | 7.495 | 1.343 | 1.644 | 0.914 | 0.677 | 0.353 | 0.138 |
| 5 | Switzerland | 7.487 | 1.420 | 1.549 | 0.927 | 0.660 | 0.256 | 0.357 |
| 6 | Netherlands | 7.441 | 1.361 | 1.488 | 0.878 | 0.638 | 0.333 | 0.295 |
| 7 | Canada | 7.328 | 1.330 | 1.532 | 0.896 | 0.653 | 0.321 | 0.291 |
| 8 | New Zealand | 7.324 | 1.268 | 1.601 | 0.876 | 0.669 | 0.365 | 0.389 |
| 9 | Sweden | 7.314 | 1.355 | 1.501 | 0.913 | 0.659 | 0.285 | 0.383 |
| 10 | Australia | 7.272 | 1.340 | 1.573 | 0.910 | 0.647 | 0.361 | 0.302 |
## [1] 156 9
| Overall rank | Country or region | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|
| 1 | Finland | 7.769 | 1.340 | 1.587 | 0.986 | 0.596 | 0.153 | 0.393 |
| 2 | Denmark | 7.600 | 1.383 | 1.573 | 0.996 | 0.592 | 0.252 | 0.410 |
| 3 | Norway | 7.554 | 1.488 | 1.582 | 1.028 | 0.603 | 0.271 | 0.341 |
| 4 | Iceland | 7.494 | 1.380 | 1.624 | 1.026 | 0.591 | 0.354 | 0.118 |
| 5 | Netherlands | 7.488 | 1.396 | 1.522 | 0.999 | 0.557 | 0.322 | 0.298 |
| 6 | Switzerland | 7.480 | 1.452 | 1.526 | 1.052 | 0.572 | 0.263 | 0.343 |
| 7 | Sweden | 7.343 | 1.387 | 1.487 | 1.009 | 0.574 | 0.267 | 0.373 |
| 8 | New Zealand | 7.307 | 1.303 | 1.557 | 1.026 | 0.585 | 0.330 | 0.380 |
| 9 | Canada | 7.278 | 1.365 | 1.505 | 1.039 | 0.584 | 0.285 | 0.308 |
| 10 | Austria | 7.246 | 1.376 | 1.475 | 1.016 | 0.532 | 0.244 | 0.226 |
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Life Expectancy (years) ?Life Expectancy (years) countries are also most Happy (For 2019)?Generous make people happy ?Freedom to make life choices make people happy ?What are the cases, and how many are there?
Each case in the Happiness dataset is for a given Country and shows its rank, score and other parameter score. There are 156 cases in each 2018 and 2019 dataset.
Each case in the Life Expectancy dataset is for a Given country and year and shows its life expectancy for the year. There are 19028 cases in the dataset.
Describe the method of data collection.
The data was collected from Kaggle, but the original source for the data is World Bank Data.
What type of study is this (observational/experiment)?
This is an observational study. We are looking at data for year 2018 and 2019 for the countries in the world.
If you collected the data, state self-collected. If not, provide a citation/link.
The data was collected from below sites:
Life Expectancy (WHO) data is found here.
World Happiness Report data is found here.
What is the response variable? Is it quantitative or qualitative?
The response variable is the Score variable and its quantitative (numerical).
You should have two independent variables, one quantitative and one qualitative.
The independent variables are the Life expectancy (years) variable and the other variable is Generosity in the happiness dataset; Both these are quantitative (numerical).
Provide summary statistics for each the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
## Entity Code Year Life expectancy (years)
## Length:19028 Length:19028 Min. :1543 Min. :17.76
## Class :character Class :character 1st Qu.:1961 1st Qu.:52.31
## Mode :character Mode :character Median :1980 Median :64.71
## Mean :1975 Mean :61.75
## 3rd Qu.:2000 3rd Qu.:71.98
## Max. :2019 Max. :86.75
## Overall rank Country or region Score GDP per capita
## Min. : 1.00 Length:156 Min. :2.905 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.454 1st Qu.:0.6162
## Median : 78.50 Mode :character Median :5.378 Median :0.9495
## Mean : 78.50 Mean :5.376 Mean :0.8914
## 3rd Qu.:117.25 3rd Qu.:6.168 3rd Qu.:1.1978
## Max. :156.00 Max. :7.632 Max. :2.0960
##
## Social support Healthy life expectancy Freedom to make life choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.067 1st Qu.:0.4223 1st Qu.:0.3560
## Median :1.255 Median :0.6440 Median :0.4870
## Mean :1.213 Mean :0.5973 Mean :0.4545
## 3rd Qu.:1.463 3rd Qu.:0.7772 3rd Qu.:0.5785
## Max. :1.644 Max. :1.0300 Max. :0.7240
##
## Generosity Perceptions of corruption
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.1095 1st Qu.:0.051
## Median :0.1740 Median :0.082
## Mean :0.1810 Mean :0.112
## 3rd Qu.:0.2390 3rd Qu.:0.137
## Max. :0.5980 Max. :0.457
## NA's :1
## Overall rank Country or region Score GDP per capita
## Min. : 1.00 Length:156 Min. :2.853 Min. :0.0000
## 1st Qu.: 39.75 Class :character 1st Qu.:4.545 1st Qu.:0.6028
## Median : 78.50 Mode :character Median :5.380 Median :0.9600
## Mean : 78.50 Mean :5.407 Mean :0.9051
## 3rd Qu.:117.25 3rd Qu.:6.184 3rd Qu.:1.2325
## Max. :156.00 Max. :7.769 Max. :1.6840
## Social support Healthy life expectancy Freedom to make life choices
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.056 1st Qu.:0.5477 1st Qu.:0.3080
## Median :1.272 Median :0.7890 Median :0.4170
## Mean :1.209 Mean :0.7252 Mean :0.3926
## 3rd Qu.:1.452 3rd Qu.:0.8818 3rd Qu.:0.5072
## Max. :1.624 Max. :1.1410 Max. :0.6310
## Generosity Perceptions of corruption
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.1087 1st Qu.:0.0470
## Median :0.1775 Median :0.0855
## Mean :0.1848 Mean :0.1106
## 3rd Qu.:0.2482 3rd Qu.:0.1412
## Max. :0.5660 Max. :0.4530
# We will get the Top 10 ranked Happy countries in 2018:
top.10.2018 <- happ.2018 %>%
filter(happ.2018$"Overall rank" < 11 ) %>%
select("Overall rank", "Country or region") %>%
mutate(Year = 2018)
# Next, We will get the Top 10 ranked Happy countries in 2019:
top.10.2019 <- happ.2019 %>%
filter(happ.2018$"Overall rank" < 11 ) %>%
select("Overall rank", "Country or region")%>%
mutate(Year = 2019)
# We will bind the above tow dataset into one:
top.10.new <- rbind(top.10.2018, top.10.2019)
# Next, we work on the Life Expectancy Dataset to get 2018 and 2019 data:
lifeExp.df.new <- lifeExp.df %>%
filter(lifeExp.df$Year == 2018 | lifeExp.df$Year == 2019 )
# Now, joining the above tow final data sets of Happieness and Life Expectancy
# we can check the life expectany for happier countries:
top.10.hap.life <- inner_join(lifeExp.df.new, top.10.new,
by = c(Entity="Country or region" , "Year" = "Year") ) %>%
arrange(Entity, Year)
kable(top.10.hap.life)| Entity | Code | Year | Life expectancy (years) | Overall rank |
|---|---|---|---|---|
| Australia | AUS | 2018 | 83.281 | 10 |
| Austria | AUT | 2019 | 81.544 | 10 |
| Canada | CAN | 2018 | 82.315 | 7 |
| Canada | CAN | 2019 | 82.434 | 9 |
| Denmark | DNK | 2018 | 80.784 | 3 |
| Denmark | DNK | 2019 | 80.898 | 2 |
| Finland | FIN | 2018 | 81.736 | 1 |
| Finland | FIN | 2019 | 81.908 | 1 |
| Iceland | ISL | 2018 | 82.855 | 4 |
| Iceland | ISL | 2019 | 82.993 | 4 |
| Netherlands | NLD | 2018 | 82.143 | 6 |
| Netherlands | NLD | 2019 | 82.283 | 5 |
| New Zealand | NZL | 2018 | 82.145 | 8 |
| New Zealand | NZL | 2019 | 82.288 | 8 |
| Norway | NOR | 2018 | 82.271 | 2 |
| Norway | NOR | 2019 | 82.404 | 3 |
| Sweden | SWE | 2018 | 82.654 | 9 |
| Sweden | SWE | 2019 | 82.797 | 7 |
| Switzerland | CHE | 2018 | 83.630 | 5 |
| Switzerland | CHE | 2019 | 83.779 | 6 |
=> We can see that the happier the country is the higher the life expectancy it has.
# Getting the 2019 data and then getting top 10 life expentancy countries:
lifeExp.df.2019 <- lifeExp.df %>%
filter(Year == 2019)
top.10.life <- top_n(lifeExp.df.2019, 10, lifeExp.df.2019$"Life expectancy (years)")
# We are using Left join here as not all countries data is avialble in Happ.2019:
df.main <- left_join(top.10.life,happ.2019, by=c( "Entity"="Country or region" ))
kable(df.main)| Entity | Code | Year | Life expectancy (years) | Overall rank | Score | GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Andorra | AND | 2019 | 83.732 | NA | NA | NA | NA | NA | NA | NA | NA |
| Cayman Islands | CYM | 2019 | 83.924 | NA | NA | NA | NA | NA | NA | NA | NA |
| Hong Kong | HKG | 2019 | 84.857 | 76 | 5.430 | 1.438 | 1.277 | 1.122 | 0.440 | 0.258 | 0.287 |
| Japan | JPN | 2019 | 84.629 | 58 | 5.886 | 1.327 | 1.419 | 1.088 | 0.445 | 0.069 | 0.140 |
| Macao | MAC | 2019 | 84.244 | NA | NA | NA | NA | NA | NA | NA | NA |
| Monaco | MCO | 2019 | 86.751 | NA | NA | NA | NA | NA | NA | NA | NA |
| San Marino | SMR | 2019 | 84.972 | NA | NA | NA | NA | NA | NA | NA | NA |
| Singapore | SGP | 2019 | 83.620 | 34 | 6.262 | 1.572 | 1.463 | 1.141 | 0.556 | 0.271 | 0.453 |
| Spain | ESP | 2019 | 83.565 | 30 | 6.354 | 1.286 | 1.484 | 1.062 | 0.362 | 0.153 | 0.079 |
| Switzerland | CHE | 2019 | 83.779 | 6 | 7.480 | 1.452 | 1.526 | 1.052 | 0.572 | 0.263 | 0.343 |
ggplot(df.main) +
geom_point(aes(x=df.main$"Overall rank", y=df.main$"Life expectancy (years)"
, colour = factor(Entity)) ) ## Warning: Removed 5 rows containing missing values (geom_point).
=> By checking the Rank and Score for these 5 countries we can say that higher life expectancy does not lead to Higher Happiness.
# We can check this by comparing score of the country in 2018 vs that in 2019 and see
# if they dropped by a margin of more then 0.25 basis:
df.main1 <- inner_join(happ.2018 , happ.2019, by=c("Country or region" = "Country or region")
, suffix=c(".2018",".2019") ) %>%
filter(Score.2018 - Score.2019 >= 0.25) %>%
select ("Country or region", "Overall rank.2018","Score.2018", "Overall rank.2019", "Score.2019") %>%
mutate(Margin.Diff = Score.2018 - Score.2019) %>%
arrange(desc(Margin.Diff))
kable(df.main1)| Country or region | Overall rank.2018 | Score.2018 | Overall rank.2019 | Score.2019 | Margin.Diff |
|---|---|---|---|---|---|
| Malaysia | 35 | 6.322 | 80 | 5.339 | 0.983 |
| Afghanistan | 145 | 3.632 | 154 | 3.203 | 0.429 |
| South Sudan | 154 | 3.254 | 156 | 2.853 | 0.401 |
| Turkmenistan | 68 | 5.636 | 87 | 5.247 | 0.389 |
| Somalia | 98 | 4.982 | 112 | 4.668 | 0.314 |
| Argentina | 29 | 6.388 | 47 | 6.086 | 0.302 |
| Zambia | 125 | 4.377 | 138 | 4.107 | 0.270 |
| Jordan | 90 | 5.161 | 101 | 4.906 | 0.255 |
| Egypt | 122 | 4.419 | 137 | 4.166 | 0.253 |
barplot(df.main1$Margin.Diff, names.arg=df.main1$"Country or region", las=2,cex.names=0.7
, main="Less happy in 2019 compared to 2018") => There are 9 (out of 156) countries that were less happy in 2019 compared to 2018.
Generosity :# We can get the min, max and mean of the Score for first 10 ranked Countries
# and compare aginst the same for first 10 `Generous` countries:
# Min, Max and Mean for top 10 ranked countries:
top_n( happ.2018, -10, happ.2018$"Overall rank" ) %>%
select(Score) %>%
summary(Score)## Score
## Min. :7.272
## 1st Qu.:7.325
## Median :7.464
## Mean :7.444
## 3rd Qu.:7.540
## Max. :7.632
# Min, Max and Mean for top 10 `Generous` countries:
top_n( happ.2018, 10, Generosity ) %>%
select(Score) %>%
summary(Score)## Score
## Min. :3.462
## 1st Qu.:4.502
## Median :5.582
## Mean :5.564
## 3rd Qu.:6.767
## Max. :7.324
boxplot(c(top_n( happ.2018, -10, happ.2018$"Overall rank" ) %>%
select(Score), top_n( happ.2018, 10, Generosity ) %>%
select(Score) ) )=> By comparing the three attributes Min, Max and mean, we can say that being Generous does not lead to being Happy.
Freedom to make life choices :# We can get the min, max and mean of the Score for first 10 ranked Countries
# and compare aginst the same for first 10 `Freedom to make life choices` countries:
# Min, Max and Mean for top 10 ranked countries:
top_n( happ.2018, -10, happ.2018$"Overall rank" ) %>%
select(Score) %>%
summary(Score)## Score
## Min. :7.272
## 1st Qu.:7.325
## Median :7.464
## Mean :7.444
## 3rd Qu.:7.540
## Max. :7.632
# Min, Max and Mean for top 10 `Freedom to make life choices` countries:
top_n( happ.2018, 10, happ.2018$"Freedom to make life choices" ) %>%
select(Score) %>%
summary(Score)## Score
## Min. :4.433
## 1st Qu.:6.401
## Median :7.405
## Mean :6.791
## 3rd Qu.:7.540
## Max. :7.632
boxplot(c(top_n( happ.2018, -10, happ.2018$"Overall rank" ) %>%
select(Score), top_n( happ.2018, 10, happ.2018$"Freedom to make life choices" ) %>%
select(Score) ) )=> By comparing the three attributes Min, Max and mean, we can say that having Freedom to make life choices does lead to being Happy. The mean is close to the top ranked mean and max score matches.
We can make a lot of linear regression comparisons between various variables, comparing it to happiness Score, and use various other statistical inference techniques to determine best variables to use.