Analysis on Happiness across Countries

Packages Required

During the course of this analysis, we will be using the following R packages:

library(readr)        ## To import data from a csv file
library(tidyr)        ## For Data Cleaning
library(dplyr)        ## For Data manipulation
library(ggplot2)      ## For Plotting techniques
library(rworldmap)    ## For Visualisation on a map
library(DT)           ## To Display R data objects as tables on HTML page
library(ggcorrplot)   ## For visualisation of correlation matrix
library(viridis)      ## To choose better color scales

Data Preparation

Data Source

The World Happiness Report data is obtained from kaggle.com. Click here to view original data source.

Purpose of Collected Data

The World Happiness Report surveys happiness across different countries. The report has been published five times, the first one being in 2012. The survey included a sample of nearly 1000 people from each country for every year of the survey. The citizens surveyed were asked to rate their current lives on a scale of 0 to 10. The extent to which each of six factors (per capita GDP, life expectancy, social support, freedom, generosity and trust, i.e. perceived government corruption) contributes to the happiness score was then calculated for each country.

We have data sets based on three reports, for the years 2015, 2016 and 2017.

The data set for 2015 contains 158 observations (one for each of 158 countries) and 12 variables. The data set for 2016 contains 157 observations (one per country) and 13 variables. The data set for 2017 contains 155 observations (one per country) and 12 variables.

There are no missing values in any of these data sets. These data sets are clean individually.
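The statement about missing values can be checked directly once the files are loaded. A minimal sketch in base R, using a small made-up data frame `toy` in place of an imported yearly file (the names and values are illustrative assumptions, not taken from the report):

```r
# Toy stand-in for one imported yearly file; all values are made up.
toy <- data.frame(
  Country = c("A", "B", "C"),
  Score   = c(7.1, 5.4, NA),   # one deliberately missing value
  Family  = c(1.2, 0.9, 0.7),
  stringsAsFactors = FALSE
)

# Rows = countries, columns = variables.
dims <- dim(toy)

# Missing values per column and in total.
na_per_column <- colSums(is.na(toy))
na_total <- sum(is.na(toy))

print(dims)
print(na_per_column)
print(na_total)
```

Applied to the real imports, a total of zero for each file would confirm the claim above.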

Data Importing, Cleaning and Manipulation

Importing data sets as csv in R:

#Importing data for the World Happiness Report 2015 
imp15 <- read_csv("C:/Users/Devanshu/Documents/Data Wrangling/Project/world-happiness-report (1)/2015.csv")

#Importing data for the World Happiness Report 2016  
imp16 <- read_csv("C:/Users/Devanshu/Documents/Data Wrangling/Project/world-happiness-report (1)/2016.csv")

#Importing data for the World Happiness Report 2017  
imp17 <- read_csv("C:/Users/Devanshu/Documents/Data Wrangling/Project/world-happiness-report (1)/2017.csv")

When we look at the data sets, we see a different set of variables in each. As a first step, we select only the variables of interest that are common to all data sets. In total, 11 variables of interest were identified.
Of these 11 variables, the data set for 2017 contains only 10: the variable Region is missing. Region is very important to our analysis, since we want to find the happiest regions across the globe. Hence, a new data set is created as a tibble that maps each country in the 2015 data set to its Region. This lookup is then joined to the 2017 data set, which maps 149 countries to their regions. Countries with a missing region are omitted to give a cleaner data set.
Columns representing the same variable have different names across the data sets, so the column names are standardised using the rename() function.
To compare the change in happiness scores across the three yearly reports, we need to combine the three data sets. We can keep only the countries common to all three, since we need happiness scores from all three reports. We join the cleaned 2016 data set to the cleaned 2015 data set and omit the missing values, then join the result to the cleaned 2017 data set and omit the missing values again. This leaves records for 146 countries for which we can compare the change in happiness scores across the 3 years.
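Before dropping countries without a region, it helps to see which ones fail to match. The sketch below reproduces the lookup-and-join step in base R on hypothetical miniature tables (`ref_toy`, `std17_toy`, the country names and the scores are all invented for illustration):

```r
# Hypothetical miniature versions of the 2015 region lookup and the 2017 data.
ref_toy <- data.frame(
  Country = c("Alphaland", "Betaland", "Gammaland"),
  Region  = c("Western Europe", "Southern Asia", "Eastern Asia"),
  stringsAsFactors = FALSE
)
std17_toy <- data.frame(
  Country = c("Alphaland", "Betaland", "Deltaland"),  # "Deltaland" has no 2015 entry
  Score   = c(7.5, 4.3, 5.0),
  stringsAsFactors = FALSE
)

# Left join: keep every 2017 row and fill Region where a match exists.
joined <- merge(std17_toy, ref_toy, by = "Country", all.x = TRUE)

# Countries that na.omit() would drop because no region was found.
unmatched <- joined$Country[is.na(joined$Region)]

# Rows that survive the cleaning step.
clean_toy <- joined[!is.na(joined$Region), ]

print(unmatched)
print(nrow(clean_toy))
```

Listing the unmatched names first makes it easier to tell a genuinely new country from one whose spelling merely differs between reports.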

####Data Preparation and Cleaning

## Standardising number of variables in each data set
# Not all the data sets are identical in number of variables they represent. Here we standardise the number of variables in each dataset

#Selecting all variables except Standard Error from data set for year 2015
names(imp15)
std15 <- imp15 %>% select(Country, Region,'Happiness Rank','Happiness Score','Economy (GDP per Capita)',Family,  
                          'Health (Life Expectancy)',Freedom,'Trust (Government Corruption)',Generosity,'Dystopia Residual')
names(std15)

#Selecting all variables except Lower Confidence Interval and Upper Confidence Interval for data set of year 2016
names(imp16)
std16 <- imp16 %>% select(Country,Region,'Happiness Rank','Happiness Score','Economy (GDP per Capita)',
                          Family,'Health (Life Expectancy)' ,Freedom,'Trust (Government Corruption)',
                          Generosity,'Dystopia Residual')
names(std16)

# Selecting all variables except 'Whisker.high' and 'Whisker.low' for data set of year 2017
names(imp17)
std17 <- imp17 %>% select(Country,Happiness.Rank,Happiness.Score,Economy..GDP.per.Capita.,Family,Health..Life.Expectancy.,
                          Freedom,Generosity,Trust..Government.Corruption.,Dystopia.Residual)
names(std17)

### Data set for year 2017 does not contain the variable "Region". We will use the data set of year 2015 and reference region names from that

#Create a new dataset containing only countries and regions from data set for year 2015
ref <- select(std15,Country,Region)
ref

newstd17 <- left_join(std17,ref, by = "Country")
dim(newstd17)
str(newstd17)
#Finding missing values of "Region" in new data set
colSums(is.na(newstd17))
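#Row indices of countries with no matching region (avoid naming a variable "c", which masks base::c)
missing_region_rows <- which(rowSums(is.na(newstd17)) == 1)
newstd17$Country[missing_region_rows]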
#Omitting the countries for which no region was specified in the data set 2017
clean17 <- na.omit(newstd17)
clean17



###Renaming the variable names across the 3 data sets
#We observe that same variables are represented by different names in the data sets. We standardise these names across all datasets.

## Standardise variable names in data set for year 2015 
summary(std15)
#Renaming variables to be standardised across all data sets
final15 <- std15 %>% rename("HappinessRank" = "Happiness Rank","HappinessScore" = "Happiness Score",
                            "Economy_GDPperCapita" = "Economy (GDP per Capita)",
                            "Health_LifeExpectancy" = "Health (Life Expectancy)",
                            "Trust_GovernmentCorruption" = "Trust (Government Corruption)",
                            "Dystopia_Residual" = "Dystopia Residual")
                      

#Checking for renamed variable names in data set for year 2015
summary(final15)


## Standardise variable names in data set for year 2016
summary(std16)
#Renaming variables to be standardised across all data sets
final16 <- std16 %>% rename("HappinessRank" = "Happiness Rank","HappinessScore" = "Happiness Score",
                            "Economy_GDPperCapita" = "Economy (GDP per Capita)",
                            "Health_LifeExpectancy" = "Health (Life Expectancy)",
                            "Trust_GovernmentCorruption" = "Trust (Government Corruption)",
                            "Dystopia_Residual" = "Dystopia Residual")


#Checking for renamed variable names in data set for Year 2016
summary(final16)

## Standardise variable names in data set for year 2017
summary(clean17)
#Renaming variables to be standardised across all data sets
final17 <- clean17 %>% rename(HappinessRank = Happiness.Rank,HappinessScore = Happiness.Score,
                              Economy_GDPperCapita = Economy..GDP.per.Capita.,Health_LifeExpectancy = Health..Life.Expectancy.,
                              Trust_GovernmentCorruption = Trust..Government.Corruption.,
                              Dystopia_Residual = Dystopia.Residual)


#Checking for renamed variable names in data set for Year 2017
summary(final17)


### Checking for missing values in the data sets
sum(is.na(final15))
sum(is.na(final16))
sum(is.na(final17))

### Common data set for comparison of Happiness Scores over 3 years
happiness <- final15 %>% 
  left_join(final16, by=c("Country","Region")) %>%
  na.omit() %>%
  left_join(final17,by=c("Country","Region")) %>%
  na.omit() %>%
  select(Country,Region,HappinessRank.x,HappinessScore.x,HappinessRank.y,HappinessScore.y,HappinessRank,HappinessScore) %>%
  rename(HappinessRank_2015 = HappinessRank.x,HappinessScore_2015 = HappinessScore.x,HappinessRank_2016 = HappinessRank.y,
         HappinessScore_2016 = HappinessScore.y,HappinessRank_2017 = HappinessRank,HappinessScore_2017 = HappinessScore)

happiness$HappinessScore_2017 <- round(happiness$HappinessScore_2017,3)

#Checking for missing values in the common data set
sum(is.na(happiness))

### For predicting happiness, we combine all three data sets and introduce a new variable for year
master <- inner_join(final15,final16, by = c("Country","Region")) %>%
          inner_join(final17,by = c("Country","Region"))

score.df <- select(happiness, Country = Country,Region = Region,'2015' = 'HappinessScore_2015','2016' = 'HappinessScore_2016','2017' = 'HappinessScore_2017')
score.comb <- gather(score.df,Year,HappinessScore,3:5)

rank.df <- select(happiness, Country = Country,Region = Region,'2015' = 'HappinessRank_2015','2016' = 'HappinessRank_2016','2017' = 'HappinessRank_2017')
rank.comb <- gather(rank.df,Year,HappinessRank,3:5)

Economy.df <- select(master, Country = Country,Region = Region,'2015' = 'Economy_GDPperCapita.x','2016' = 'Economy_GDPperCapita.y','2017' = 'Economy_GDPperCapita')
Economy.comb <- gather(Economy.df,Year,'GDPperCapita',3:5)

Family.df <- select(master, Country = Country,Region = Region,'2015' = 'Family.x','2016' = 'Family.y','2017' = 'Family')
Family.comb <- gather(Family.df,Year,'Family',3:5)

Health.df <- select(master, Country = Country,Region = Region,'2015' = 'Health_LifeExpectancy.x','2016' = 'Health_LifeExpectancy.y','2017' = 'Health_LifeExpectancy')
Health.comb <- gather(Health.df,Year,'LifeExpectancy',3:5)

Trust.df <- select(master, Country = Country,Region = Region,'2015' = 'Trust_GovernmentCorruption.x','2016' = 'Trust_GovernmentCorruption.y','2017' = 'Trust_GovernmentCorruption')
Trust.comb <- gather(Trust.df,Year,'Corruption',3:5)

Freedom.df <- select(master, Country = Country,Region = Region,'2015' = 'Freedom.x','2016' = 'Freedom.y','2017' = 'Freedom')
Freedom.comb <- gather(Freedom.df,Year,'Freedom',3:5)

Generosity.df <- select(master, Country = Country,Region = Region,'2015' = 'Generosity.x','2016' = 'Generosity.y','2017' = 'Generosity')
Generosity.comb <- gather(Generosity.df,Year,'Generosity',3:5)

# Combining datasets for all variables on the variable for Year

# Each step joins exactly one long-format table; passing rank.comb as an extra
# positional argument (as before) would be taken as inner_join's "copy" argument.
combined <- inner_join(score.comb,rank.comb,by = c("Country","Region","Year")) %>%
            inner_join(Economy.comb,by = c("Country","Region","Year")) %>%
            inner_join(Family.comb,by = c("Country","Region","Year")) %>%
            inner_join(Health.comb,by = c("Country","Region","Year")) %>%
            inner_join(Trust.comb,by = c("Country","Region","Year")) %>%
            inner_join(Freedom.comb,by = c("Country","Region","Year")) %>%
            inner_join(Generosity.comb,by = c("Country","Region","Year"))
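An alternative way to reach the same long-format table, shown here only as a sketch and not as the method used in this report, is to tag each cleaned yearly table with its year and stack the rows. The tables `y15`, `y16`, `y17` and their values are hypothetical stand-ins:

```r
# Hypothetical cleaned yearly tables with identical column names (made-up values).
y15 <- data.frame(Country = c("A", "B", "C"), HappinessScore = c(7.0, 5.0, 4.0),
                  stringsAsFactors = FALSE)
y16 <- data.frame(Country = c("A", "B"),      HappinessScore = c(7.1, 4.9),
                  stringsAsFactors = FALSE)
y17 <- data.frame(Country = c("A", "B", "D"), HappinessScore = c(7.2, 4.8, 6.0),
                  stringsAsFactors = FALSE)

# Tag each table with its year, then stack them row-wise.
y15$Year <- "2015"
y16$Year <- "2016"
y17$Year <- "2017"
stacked <- rbind(y15, y16, y17)

# Keep only countries that appear in all three years.
in_all_years <- Reduce(intersect, list(y15$Country, y16$Country, y17$Country))
combined_toy <- stacked[stacked$Country %in% in_all_years, ]

print(combined_toy)
```

This avoids the `.x`/`.y` suffixes produced by repeated joins, at the cost of requiring the yearly tables to share column names and order beforehand.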

Clean Data sets

Cleaned data set for year 2015

datatable(head(final15,160),caption = "Table 1: Cleaned data set for Year 2015")

Cleaned data set for year 2016

datatable(head(final16,160),caption = "Table 2: Cleaned data set for Year 2016")

Cleaned data set for year 2017

datatable(head(final17,160),caption = "Table 3: Cleaned data set for Year 2017")

Combined data set for the happiness scores and happiness ranks for years 2015,2016 and 2017

datatable(head(happiness,160))

Master data set for all years containing all major parameters

datatable(head(combined,160))

Summary

We use three yearly data sets (one per year), one data set for happiness scores and ranks, and one master data set containing data for all years in the exploratory data analysis. We will mostly use the master data set. The summary for each data set is below.

Data set combined

This master data set will be used for most of the exploratory analysis. We have joined the data sets for all three years and have also included a new variable called Year to indicate which year the record belongs to.

Number of variables : 11
Number of countries : 146

Data set final15

This data set will be used for analysis of happiness score and other variables for the year 2015. From the original data set for year 2015, we have omitted the variable Standard Error and standardised the column names during the cleaning process.

Number of variables : 11
Number of countries : 158
Maximum happiness score : 7.587
Minimum happiness score : 2.839
Mean happiness score : 5.3757
Median happiness score : 5.2325

Happiness score of Bhutan is 5.253.

Data set final16

This data set will be used for analysis of happiness score and other variables for the year 2016. From the original data set for year 2016, we have omitted the variables Lower Confidence Interval and Upper Confidence Interval, and standardised the column names during the cleaning process.

Number of variables : 11
Number of countries : 157
Maximum happiness score : 7.526
Minimum happiness score : 2.905
Mean happiness score : 5.3821
Median happiness score : 5.314

Happiness score of Bhutan is 5.196.

Data set final17

This data set will be used for analysis of happiness score and other variables for the year 2017. From the original data set for year 2017, we have omitted the variables Whisker.high and Whisker.low, and standardised the column names during the cleaning process. We have also added the column Region by matching the key Country against the data set for year 2015 and taking the regions from that data set.

Number of variables : 11
Number of countries : 149
Maximum happiness score : 7.537
Minimum happiness score : 2.693
Mean happiness score : 5.36045
Median happiness score : 5.279

Happiness score of Bhutan is 5.011.
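The figures listed for each yearly data set can be reproduced with a few base R calls. A minimal sketch on a made-up table (`toy`, its country names and its scores are illustrative only and are not the report's values):

```r
# Made-up scores standing in for one cleaned yearly data set.
toy <- data.frame(
  Country        = c("A", "B", "C", "D", "E", "F"),
  HappinessScore = c(7.5, 6.2, 5.3, 5.1, 4.4, 2.9),
  stringsAsFactors = FALSE
)

# The same four statistics reported for final15, final16 and final17.
score_summary <- c(
  max    = max(toy$HappinessScore),
  min    = min(toy$HappinessScore),
  mean   = mean(toy$HappinessScore),
  median = median(toy$HappinessScore)
)

# Score of a single country, looked up by name.
score_c <- toy$HappinessScore[toy$Country == "C"]

print(score_summary)
print(score_c)
```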

Set of variables in final15, final16 and final17

The three data sets contain the following variables:
Country - Name of the Country
Region - Region the country belongs to. Example : East Asia
HappinessRank - Rank of the Country based on Happiness Score
HappinessScore - On a scale of 0 to 10, indicates the happiness of the country
Economy_GDPperCapita - The extent to which GDP contributes to the calculation of the Happiness Score
Family - The extent to which Family contributes to the calculation of the Happiness Score
Health_LifeExpectancy - The extent to which Life Expectancy contributes to the calculation of the Happiness Score
Freedom - The extent to which Freedom in a country contributes to the calculation of the Happiness Score
Generosity - The extent to which Generosity contributes to the calculation of the Happiness Score
Trust_GovernmentCorruption - The extent to which Perception of Corruption contributes to the calculation of the Happiness Score
Dystopia_Residual - The extent to which Dystopia Residual contributes to the calculation of the Happiness Score
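As far as we understand the Kaggle data description, the six factor columns and the Dystopia Residual are constructed so that they add up to the Happiness Score. That is an assumption here, not something verified in this report. The sketch below shows how such a check could look, on a single made-up row:

```r
# One made-up row using the standardised column names; values are illustrative.
row <- data.frame(
  HappinessScore             = 7.40,
  Economy_GDPperCapita       = 1.40,
  Family                     = 1.30,
  Health_LifeExpectancy      = 0.90,
  Freedom                    = 0.60,
  Generosity                 = 0.30,
  Trust_GovernmentCorruption = 0.40,
  Dystopia_Residual          = 2.50
)

# Columns assumed to sum to the score.
parts <- c("Economy_GDPperCapita", "Family", "Health_LifeExpectancy",
           "Freedom", "Generosity", "Trust_GovernmentCorruption",
           "Dystopia_Residual")

# Rebuild the score from its parts and measure the gap.
reconstructed <- rowSums(row[, parts])
gap <- abs(reconstructed - row$HappinessScore)

print(reconstructed)
print(gap)
```

On the real data, a gap larger than rounding error in any row would indicate that the assumption does not hold for that file.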

Data set happiness

This data set contains the happiness scores and happiness ranks from the data sets final15, final16 and final17, along with the country and region. It is built to analyze the changes in happiness scores and happiness ranks over 3 years. This will help us identify the countries that are getting happier each year and the countries that need reforms so that their citizens become happier. It contains only those countries that feature in all three data sets.

Number of variables : 8
Number of countries : 146

Exploratory Data Analysis

Predicting Happiness

In this section, we perform the following analysis:

Visualize the Happiness Scores for the three years on world maps and check for any patterns.
Identify countries whose happiness increased from 2015 to 2016 and from 2016 to 2017, and compare these with the Happiness Rank to find those ranked within the top 40 in all years. We call this group the Happiness Zone.
Among the countries with increasing happiness, identify those with a Happiness Rank between 40 and 80 in all years and call them the High Potential Zone for happiness.
Identify countries with consistently decreasing happiness scores over the years. We call this group the Alarming Zone.
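The three zone definitions can be written as a single rule set. The sketch below is one possible encoding in base R; the function name `classify_zone`, the short column names and the toy values are hypothetical and are not part of the analysis code that follows:

```r
# Classify a country from its three scores (s15..s17) and three ranks (r15..r17).
# Hypothetical helper; thresholds follow the zone definitions in the text.
classify_zone <- function(s15, s16, s17, r15, r16, r17) {
  rising  <- s16 > s15 & s17 > s16
  falling <- s16 < s15 & s17 < s16
  top40   <- r15 <= 40 & r16 <= 40 & r17 <= 40
  mid     <- r15 > 40 & r16 > 40 & r17 > 40 &
             r15 <= 80 & r16 <= 80 & r17 <= 80
  ifelse(rising & top40, "Happiness Zone",
  ifelse(rising & mid,   "High Potential Zone",
  ifelse(falling,        "Alarming Zone", "Unclassified")))
}

# Made-up countries, one per outcome.
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  s15 = c(7.0, 5.5, 6.0, 5.0),
  s16 = c(7.1, 5.6, 5.9, 5.2),
  s17 = c(7.2, 5.7, 5.8, 5.1),
  r15 = c(5, 60, 20, 90),
  r16 = c(4, 58, 22, 85),
  r17 = c(3, 55, 25, 88),
  stringsAsFactors = FALSE
)

toy$Zone <- with(toy, classify_zone(s15, s16, s17, r15, r16, r17))
print(toy[, c("Country", "Zone")])
```

Country D rises and then falls, so it belongs to none of the three zones, which is the case for most countries under these definitions.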

Year Wise Happiness

We have Happiness Scores for three years, covering at least 146 countries in each year. Let us visualize the scores on a world map and see if we can observe any patterns.

 #Worldmap2015 happiness scores
world15 <- joinCountryData2Map(final15, joinCode = "NAME", nameJoinColumn = "Country")
mapCountryData(world15,nameColumnToPlot = "HappinessScore",mapTitle = "Happiness Scores across the Globe - 2015")

 #Worldmap2016 happiness scores
world16 <- joinCountryData2Map(final16, joinCode = "NAME", nameJoinColumn = "Country")
mapCountryData(world16,nameColumnToPlot = "HappinessScore",mapTitle = "Happiness Scores across the Globe - 2016",colourPalette = "terrain")

 #Worldmap2017 happiness scores
world17 <- joinCountryData2Map(final17, joinCode = "NAME", nameJoinColumn = "Country")
mapCountryData(world17,nameColumnToPlot = "HappinessScore",mapTitle = "Happiness Scores across the Globe - 2017",colourPalette = "rainbow")

We can see that Africa and parts of Asia have very low happiness scores across all three years.

Trend for top 10 countries in 2017

We take the top 10 happy countries in 2017 and see how they have fared over the years.

#Line chart of variation over three years
# Names of the countries ranked in the top 10 in 2017
top10 <- final17 %>% filter(HappinessRank <= 10) %>% pull(Country)

# Keep the rows of the master data set that belong to those countries
viz1 <- combined %>% filter(Country %in% top10)

ggplot(viz1,aes(Year,HappinessScore,color = Country)) + 
  geom_line(aes(group = Country)) + geom_text(aes(label = Country),size = 2) +
  geom_point() + theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank())

Finland is the only country which has been improving in happiness scores for the past three years.

Canada, Switzerland and Sweden have seen a decline in happiness scores over the past three years.

Zone Wise Happiness

Happiness Zone

The Happiness Zone comprises countries that show a steady increase in happiness score from 2015 to 2016 and from 2016 to 2017, and that are ranked among the top 40 countries by happiness score in all three years. Under the present scenario, these countries would be ideal to live in.

  hapzone <- happiness %>%
              filter(HappinessScore_2016 > HappinessScore_2015 & HappinessScore_2017 > HappinessScore_2016) %>%
  filter(HappinessRank_2015 <= 40 & HappinessRank_2016 <= 40 & HappinessRank_2017<= 40)

  datatable(head(hapzone,10))

Finland is the only country in the top 10 ranked countries which shows an increase in happiness score over all three years and is a part of the Happiness Zone. It is interesting to see that all the countries in this zone are from the European regions. All other countries have seen a drop in their happiness scores at least once over the course of the three years.

High Potential Zone for Happiness

Outside the top 40 ranked countries, there are also countries showing a positive change in happiness scores with every passing year. These countries comprise the High Potential Zone for Happiness.

  potential <- happiness %>%
  filter(HappinessScore_2016 > HappinessScore_2015 & HappinessScore_2017 > HappinessScore_2016)    %>%
  filter(HappinessRank_2015 <= 80 & HappinessRank_2016 <= 80 & HappinessRank_2017 <= 80 & HappinessRank_2015 > 40 & HappinessRank_2016 > 40 & HappinessRank_2017 > 40)

  datatable(head(potential,50))

 ##Potential Zone visualization
potentialviz <- joinCountryData2Map(potential, joinCode = "NAME", nameJoinColumn = "Country")

mapCountryData(potentialviz,nameColumnToPlot = "Country",mapTitle = "High Potential Zone for Happiness",borderCol = "black",colourPalette="rainbow")

Alarming Zone

Countries which show a continuous trend of decreasing happiness score comprise the Alarming Zone.

  alarmzone <- happiness %>%
              filter(HappinessScore_2016 < HappinessScore_2015 & HappinessScore_2017 < HappinessScore_2016)

  datatable(head(alarmzone,50))

We can see there are a total of 50 countries in the alarming Zone. Let us visualize them on a map.

 ##Alarming Zone visualization
alarmviz <- joinCountryData2Map(alarmzone, joinCode = "NAME", nameJoinColumn = "Country")

mapCountryData(alarmviz,nameColumnToPlot = "Country",mapTitle = "Alarming Zone",borderCol = "black",colourPalette="heat",addLegend=FALSE)

An interesting observation is that the United States is also a part of this zone.

Correlation of contributing factors

In this section, we study how the happiness score is correlated with the variables for GDP per Capita, Life Expectancy, Freedom, Generosity, Corruption and Social Support.

We will try to find answers to questions like:
How does the GDP per Capita contribute to happiness score?
Does high contribution of GDP per Capita also imply high contribution of life expectancy in calculation of happiness score?

First we check the correlation between the variables for the three years and check if we observe any trend.

We have generated the correlation matrix for 3 years below.

cor.matrix.2015 <- cor(final15[,4:10])
ggcorrplot(cor.matrix.2015,method = "circle", colors = c("brown","pink","blue")) + ggtitle("Correlation Matrix for 2015")

cor.matrix.2016 <- cor(final16[,4:10])
ggcorrplot(cor.matrix.2016,method = "circle", colors = c("blue","black","yellow")) + ggtitle("Correlation Matrix for 2016")

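# In final17, Region was appended as the last column, so HappinessScore and the
# six factors sit in columns 3 to 9 (columns 4 to 10 would skip HappinessScore).
cor.matrix.2017 <- cor(final17[,3:9])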
ggcorrplot(cor.matrix.2017,method = "circle", colors = c("red","white","orange")) + ggtitle("Correlation Matrix for 2017")

We can see that Happiness Score has a strong positive correlation with GDP per Capita, Life Expectancy, Freedom and Social Support. Generosity has a weak effect on happiness scores.
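Selecting the columns for the correlation matrix by name rather than by position keeps the three yearly matrices comparable even when the column order differs between data sets. A minimal sketch on made-up values (the data frame `toy` and its numbers are illustrative assumptions):

```r
# Made-up values using three of the standardised column names.
toy <- data.frame(
  HappinessScore       = c(7.2, 6.5, 5.8, 5.0, 4.1, 3.5),
  Economy_GDPperCapita = c(1.4, 1.3, 1.0, 0.9, 0.5, 0.3),
  Generosity           = c(0.30, 0.10, 0.35, 0.15, 0.25, 0.20)
)

# Pick the variables by name so that column order does not matter.
vars <- c("HappinessScore", "Economy_GDPperCapita", "Generosity")
cor_toy <- cor(toy[, vars])   # Pearson correlation by default

print(round(cor_toy, 2))
```

The same `vars` vector, extended to all six factors, could be reused for `final15`, `final16` and `final17`.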

We are now interested in seeing how Happiness Scores vary with these factors over the three years.

Happiness Scores vs. GDP per Capita

combined$Year <- factor(combined$Year)

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = GDPperCapita, y = HappinessScore, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

We can see that Happiness Scores increase with GDP per Capita for all three years.

Happiness Scores vs. Life Expectancy

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = LifeExpectancy, y = HappinessScore, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

For countries with very low life expectancy, we see that Happiness Scores decrease with increasing life expectancy. For a life expectancy index greater than 0.25, Happiness Scores increase with the life expectancy index.

Happiness Scores vs. Freedom

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = Freedom, y = HappinessScore, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

Higher Freedom indicates higher happiness score here.

Happiness Scores vs. Corruption

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = Corruption, y = HappinessScore, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

For countries with higher corruption index, happiness scores decrease with increase in corruption. However, for countries with moderate corruption index, happiness scores tend to increase with increasing corruption.

Factors like life expectancy and GDP per Capita might also be correlated with each other, and it will be interesting to see whether high GDP per Capita also means high life expectancy or high freedom in a country. We check these below.

Life Expectancy vs. GDP per Capita

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = GDPperCapita, y = LifeExpectancy, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

Life Expectancy increases for the most part with increasing GDP per Capita. This might be due to access to better medical facilities and higher affordability.

Freedom vs. GDP per Capita

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = GDPperCapita, y = Freedom, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

For a GDP per Capita index below 1, freedom does not change much with GDP per Capita. For a larger GDP per Capita index, however, freedom increases with increasing GDP per Capita, implying that citizens in prosperous economies feel a stronger sense of freedom.

Corruption vs. GDP per Capita

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = GDPperCapita, y = Corruption, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

For countries with a GDP per Capita index lower than 1, corruption shows a slight decreasing trend with increasing GDP per Capita. However, for countries with a GDP per Capita index greater than 1, corruption increases sharply with increasing GDP per Capita. This indicates that prosperous economies also run a constant risk of higher corruption.

Generosity vs. Freedom

# Year is mapped to colour, so the viridis scale must be the colour (not fill) scale
ggplot(data = combined, aes(x = Freedom, y = Generosity, color = Year)) +
  geom_point(alpha = 0.5) +
  scale_color_viridis(discrete = TRUE) +
  geom_smooth(se = FALSE)

We see that Freedom has a very weak effect on generosity, implying that a country with higher sense of freedom does not necessarily mean that people will be more generous as well.

Comparison by region

In this section, we will study how the average happiness score varies from region to region. This will help us identify the regions with the highest and the lowest average happiness score.

Happiness Scores By Year

We check how the happiness scores have varied for the three years by region.

#2015
line15 <- aggregate(x = final15,by = list(as.factor(final15$Region)),FUN = "mean")
line15new <- rename(line15,RegionName = Group.1)

theme_set(theme_bw())
ggplot(line15new,aes(RegionName,HappinessScore)) + geom_point(aes(color = RegionName),size = 3) + geom_segment(aes(x = RegionName,xend = RegionName,y = 0,yend = HappinessScore)) + theme(axis.text.x = element_text(angle = 90)) + ggtitle("Happiness Score by Region for 2015")

For 2015, we can see that Australia and New Zealand along with North America exhibit high average happiness score, while Sub-Saharan Africa and Southeastern Asia have the lowest scores.

#2016
line16 <- aggregate(x = final16,by = list(as.factor(final16$Region)),FUN = "mean")
line16new <- rename(line16,RegionName = Group.1)

theme_set(theme_bw())
ggplot(line16new,aes(RegionName,HappinessScore)) + geom_point(aes(color = RegionName),size = 3) + geom_segment(aes(x = RegionName,xend = RegionName,y = 0,yend = HappinessScore)) + theme(axis.text.x = element_text(angle = 90)) + ggtitle("Happiness Score by Region for 2016")

For 2016, we can see that Australia and New Zealand along with North America exhibit high average happiness score, while Sub-Saharan Africa and Southeastern Asia have the lowest scores.

#2017
line17 <- aggregate(x = final17,by = list(as.factor(final17$Region)),FUN = "mean")
line17new <- rename(line17,RegionName = Group.1)

theme_set(theme_bw())
ggplot(line17new,aes(RegionName,HappinessScore)) + geom_point(aes(color = RegionName),size = 3) + geom_segment(aes(x = RegionName,xend = RegionName,y = 0,yend = HappinessScore)) + theme(axis.text.x = element_text(angle = 90)) + ggtitle("Happiness Score by Region for 2017")

The average happiness scores for each region follow the same trend in 2017 as in the previous two years.
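The regional averages can also be computed with the formula interface of `aggregate()`, which averages only the named numeric column and therefore does not attempt to take the mean of text columns such as Country. A minimal sketch on made-up data (the table `toy` and the region labels are illustrative):

```r
# Made-up countries in two hypothetical regions.
toy <- data.frame(
  Country        = c("A", "B", "C", "D", "E"),
  Region         = c("North", "North", "South", "South", "South"),
  HappinessScore = c(7.0, 6.0, 4.0, 5.0, 3.0),
  stringsAsFactors = FALSE
)

# Mean HappinessScore per Region; only the named column is aggregated.
region_means <- aggregate(HappinessScore ~ Region, data = toy, FUN = mean)

# Order regions from happiest to least happy.
region_means <- region_means[order(-region_means$HappinessScore), ]

print(region_means)
```

The result has one row per region and can be passed to the same lollipop plot as `line15new`, with Region in place of RegionName.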

Happiness Score Trend by Region

###Overall Happiness Fluctuation
overallhap <- inner_join(line15new,line16new,by = "RegionName") %>%
              inner_join(line17new, by = "RegionName") %>%
              select(RegionName,'2015' = HappinessScore.x,'2016'= HappinessScore.y,'2017'= HappinessScore) %>%
              gather(Year,AverageHappinessScore,2:4)

theme_set(theme_bw())
ggplot(overallhap,aes(Year,AverageHappinessScore,color = RegionName)) + 
  geom_line(aes(group = RegionName)) + geom_text(aes(label = RegionName),size = 2)+
  geom_point() + theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank())

Regions showing increase in happiness scores over the years are Australia and New Zealand, Southern Asia and Southeastern Asia.

Regions showing a slight drop in happiness scores over the years are North America, Latin America and Caribbean, Eastern Asia, Middle East and Northern Africa, and Sub-Saharan Africa.

The drop in the happiness score of the Sub-Saharan Africa region is alarming because it is already at the bottom of the happiness score table and is going down further.

Outliers in happiness scores by Region

#Boxplots
theme_set(theme_bw())

final15$Region <- factor(final15$Region)
# outlier.color and alpha are fixed settings, so they go outside aes()
ggplot(final15,aes(Region,HappinessScore)) + geom_boxplot(aes(fill = Region),outlier.color = "red",alpha = 0.3) +  
  scale_fill_viridis(discrete = T) + theme(axis.text.x = element_text(angle = 90)) + theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.ticks = element_blank()) + ggtitle("Boxplot for Happiness Scores For 2015")

We can see that there is one country in the Latin America and Caribbean region which has a happiness score far lower than the mean happiness score for that region.

#Boxplots
theme_set(theme_bw())

final16$Region <- factor(final16$Region)
# outlier.color and alpha are fixed settings, so they go outside aes()
ggplot(final16,aes(Region,HappinessScore)) + geom_boxplot(aes(fill = Region),outlier.color = "red",alpha = 0.3) +  
  scale_fill_viridis(discrete = T) + theme(axis.text.x = element_text(angle = 90)) + theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.ticks = element_blank()) + ggtitle("Boxplot for Happiness Scores For 2016")

We can see that there is one country each in the Latin America and Caribbean and the Southern Asia regions which has a happiness score far lower than the mean happiness score for its region. One country in Sub-Saharan Africa is doing far better than the rest of the region in terms of happiness.

#Boxplots
theme_set(theme_bw())

final17$Region <- factor(final17$Region)
# outlier.color and alpha are fixed settings, so they go outside aes()
ggplot(final17,aes(Region,HappinessScore)) + geom_boxplot(aes(fill = Region),outlier.color = "red",alpha = 0.3) +  
  scale_fill_viridis(discrete = T) + theme(axis.text.x = element_text(angle = 90)) + theme(panel.grid.major = element_blank(),panel.grid.minor = element_blank(),axis.ticks = element_blank()) + ggtitle("Boxplot for Happiness Scores For 2017")

We can see that there is one country each in the Latin America and Caribbean and the Central and Eastern Europe regions which has a happiness score far lower than the mean happiness score for its region. One country in Sub-Saharan Africa is doing far better than the rest of the region in terms of happiness.
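The boxplots show that outliers exist but not which countries they are. One way to name them is to apply the 1.5 x IQR rule per region with base R's `boxplot.stats()`. This is a sketch on made-up data; the helper `find_outliers` and the table `toy` are hypothetical, and `boxplot.stats()` uses Tukey hinges, which can differ slightly from the quartiles that ggplot2 draws:

```r
# Made-up scores for two hypothetical regions; C6 sits far below its region.
toy <- data.frame(
  Country        = paste0("C", 1:12),
  Region         = rep(c("R1", "R2"), each = 6),
  HappinessScore = c(6.0, 6.1, 6.2, 6.3, 6.4, 3.0,
                     4.0, 4.1, 4.2, 4.3, 4.4, 4.5),
  stringsAsFactors = FALSE
)

# Rows whose score lies beyond 1.5 times the hinge spread within one region.
find_outliers <- function(df) {
  out_vals <- boxplot.stats(df$HappinessScore)$out
  df[df$HappinessScore %in% out_vals, ]
}

# Apply the rule region by region and stack the results.
outliers <- do.call(rbind, lapply(split(toy, toy$Region), find_outliers))

print(outliers)
```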

Happier than Bhutan?

Average Happiness Score of Bhutan is 5.15 over the three years.

We compare Bhutan’s Average Happiness Score with that of countries whose GDP per Capita index is within 0.2 units of Bhutan’s. There are 40 such countries. This gives us a better idea of Bhutan’s happiness compared to countries of its own economic size.

This will help us find an answer to the question -
“Should one consider living in Bhutan taking into account the steps their government has undertaken to measure and promote happiness?”

bhutan.df <- mutate(score.df,AvgHappinessScore=(score.df$`2015` + score.df$`2016`+ score.df$`2017`)/3)
avg.bhutan <- bhutan.df %>% filter(Country == 'Bhutan') %>% select(AvgHappinessScore)

bhutan.Economy <- mutate(Economy.df,AvgGDPperCapita =(Economy.df$`2015` + Economy.df$`2016`+ Economy.df$`2017`)/3)
avg_GDP<- bhutan.Economy %>% filter(Country == 'Bhutan') %>% select(AvgGDPperCapita)
avg.GDP <- as.numeric(avg_GDP)

count1 <- avg.GDP + 0.2
count2 <- avg.GDP - 0.2

bhutan.df.filter <- bhutan.Economy %>% filter(AvgGDPperCapita < count1 & AvgGDPperCapita > count2)

View(bhutan.df.filter)

bhutan.df.GDP <- inner_join(bhutan.df,bhutan.df.filter, by= c("Country","Region")) %>%
                  select(Country,Region,AvgHappinessScore,AvgGDPperCapita)

View(bhutan.df.GDP)

avg <- as.numeric(avg.bhutan)  
bhutan.df.GDP$diff <- round((bhutan.df.GDP$AvgHappinessScore - avg), 2)
bhutan.df.GDP$check <- ifelse((bhutan.df.GDP$diff>0),"above","below")
bhutan.df.GDP <- bhutan.df.GDP[order(bhutan.df.GDP$diff),] 
bhutan.df.GDP$Country <- factor(bhutan.df.GDP$Country,levels = bhutan.df.GDP$Country)

ggplot(bhutan.df.GDP, aes(x = Country, y = diff, label = diff)) + 
  geom_bar(stat = 'identity', aes(fill = check), width = 0.5)  +
  scale_fill_manual(name = "Average Happiness Score", 
                    labels = c("Above Bhutan", "Below Bhutan"), 
                    values = c("above" = "black", "below" = "blue")) + coord_flip() + labs(y="Difference in Average Happiness Score from Bhutan")

Within the 41 countries of its own economic size, Bhutan lies nearly in the middle when it comes to Average Happiness Score.
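The statement that Bhutan lies near the middle of its peer group can be made concrete by computing its position in the sorted list. A minimal sketch on made-up data (the peer names, all values and the variable names `target`, `band` and `position` are illustrative assumptions, not the report's figures):

```r
# Made-up averages; "Bhutan" here carries invented values, not the real ones.
toy <- data.frame(
  Country           = c("Bhutan", "P1", "P2", "P3", "Rich"),
  AvgGDPperCapita   = c(0.80, 0.70, 0.95, 0.65, 1.50),
  AvgHappinessScore = c(5.2, 4.6, 5.9, 5.4, 7.0),
  stringsAsFactors = FALSE
)

target <- toy[toy$Country == "Bhutan", ]
band <- 0.2   # half-width of the GDP per Capita window

# Peer group: countries whose GDP index is within the band (includes Bhutan).
peers <- toy[abs(toy$AvgGDPperCapita - target$AvgGDPperCapita) < band, ]

# Difference from Bhutan's average score, sorted from lowest to highest.
peers$diff <- round(peers$AvgHappinessScore - target$AvgHappinessScore, 2)
peers <- peers[order(peers$diff), ]

# Bhutan's position in the sorted peer group (1 = least happy).
position <- which(peers$Country == "Bhutan")

print(peers[, c("Country", "diff")])
print(c(position = position, group_size = nrow(peers)))
```

A position close to half the group size corresponds to the "nearly in the middle" reading of the bar chart.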

Summary

We performed an exploratory analysis of the World Happiness Report data sets for the years 2015 to 2017 to understand how happiness varies across the globe and what affects it the most. We also looked at how Bhutan, which is highly proactive in promoting happiness, fares when compared globally.

Here’s summarising what we found:

Finland is the only country among the top 10 of the World Happiness Report 2017 that has seen an increase in happiness.
The United States has been witnessing a decrease in happiness scores over the past 3 years. So has North America as a region.
The four countries that are a part of the Happiness Zone are all from the European regions.
Sub-Saharan Africa as a region has the lowest happiness scores for all three years. It is also witnessing a decrease in happiness with each passing year.
The Australia and New Zealand region has the highest happiness scores across the three years.
GDP per Capita as a factor has a positive effect on happiness. Life Expectancy and Freedom have similar effects on happiness.
Countries with a high corruption index have a low happiness score.
Countries with higher GDP per Capita tend to show a higher corruption index.

Based on this analysis, we can also answer the question - “Should one consider living in Bhutan taking into account the steps their government has undertaken to measure and promote happiness?”

Bhutan, for all its efforts in promoting happiness through constant measuring of Gross National Happiness Index, still has a long way to go in this direction. Even within economies of its own size, it lies in the middle, which indicates that more efforts are required to elevate Bhutan to the position of the happiest country.

Analysis on Happiness across Countries

Devanshu Awasthi

December 3, 2017

Living in Bhutan vs Rest of the World
