1 Overview

The Happiness Index is based on answers to the main life evaluation question asked in the Gallup World Poll. The technique used is called the Cantril ladder: respondents are asked to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The Happiness Index focuses on six factors that are estimated to contribute to making life evaluations higher in each country than they are in Dystopia (a hypothetical country that has the world’s lowest national averages for each of the six factors):

  • Level of GDP per Capita
  • Life Expectancy
  • Generosity
  • Social Support
  • Freedom
  • Corruption

However, some research suggests that happier countries have higher suicide rates than less happy countries. There could be many reasons behind a higher suicide rate. Moreover, the Happiness Index does not include the psychological and mental state of the citizens.

Therefore, the purpose of this assignment is to visualise the World Happiness Index and analyse whether a country is truly “happy” using factors that measure the mental state of its citizens.

1.1 Data Collection

There are a total of four datasets used in this assignment.

Dataset                  Description
World Happiness Index    World Happiness Index of 157 countries from 2015-2019
Depression rates         Percentage of the population suffering from depression for 231 countries from 1990-2017
Crude Suicide Rates      Crude suicide rate per 100,000 people for 183 countries from 2000-2019
World Country Dataset    A shapefile of world country boundaries

As the latest year available in the Depression rates dataset is 2017, we will standardise the visualisation by using 2017 data for World Happiness and Crude Suicide Rates as well.

1.2 Challenges Faced

1.2.1 Deciding on factors that measure “Unhappiness”

There are many factors that determine unhappiness within a country. Based on my research, suicide and depression rates can be used as indicators of the mental state of a country. Although they may not be perfect measures, differences in their values reflect differences in the mental state of citizens. Hence, the factors used to determine the mental state of a country are suicide and depression rates.

1.2.2 Inconsistent naming of countries throughout the dataset

Because a different dataset is used for each measure, the naming of countries is bound to be inconsistent. For example, in the Crude Suicide Rates dataset, Vietnam is named “Viet Nam”. The country names therefore need to be standardised, as inconsistent naming will result in many “NA” countries when the datasets are joined.

The World Country Dataset will be the main dataset, so the country names in the other three datasets will be renamed to match it.
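
As a minimal sketch of this renaming step, a named lookup vector can be passed to dplyr::recode. The table suicide_demo, its values and the lookup name_map are hypothetical and only illustrate the idea; the assignment itself renames each country with direct assignment.

```r
library(dplyr)

# Hypothetical excerpt of the suicide dataset; the rate values are made up.
suicide_demo <- tibble(
  Country = c("Viet Nam", "Russian Federation", "Finland"),
  Overall = c(1, 2, 3)
)

# Lookup of source name -> name assumed to be used in the World Country Dataset.
name_map <- c(
  "Viet Nam" = "Vietnam",
  "Russian Federation" = "Russia"
)

# recode() replaces names found in the lookup and leaves all others unchanged.
suicide_demo <- suicide_demo %>%
  mutate(Country = recode(Country, !!!name_map))

suicide_demo$Country
```

Keeping the mapping in one vector makes it easier to review which names were changed.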

1.2.3 Country data are missing for some datasets

Even after standardising the country names, some countries are still missing data because the datasets come from different sources. As shown in the data collection table, the World Happiness Index has data for only 157 countries, while the Depression and Suicide datasets cover 231 and 183 countries respectively. Since the assignment is focused on visualising countries with a Happiness Index, countries without one will be excluded from the visualisation and analysis.
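
One way to see which countries would still be missing after a join is dplyr::anti_join, which returns the rows of the first table that have no match in the second. The sketch below uses hypothetical tables and made-up values, not the real datasets.

```r
library(dplyr)

# Hypothetical excerpts; scores and rates are made up for illustration.
happiness_demo <- tibble(
  Country = c("Finland", "Vietnam", "Kosovo"),
  Happiness.Score = c(7, 5, 5)
)
depression_demo <- tibble(
  Entity = c("Finland", "Vietnam"),
  depression = c(4, 3)
)

# Rows of happiness_demo with no matching Entity in depression_demo.
# These countries would receive NA for depression after a left_join.
missing <- anti_join(happiness_demo, depression_demo,
                     by = c("Country" = "Entity"))

missing$Country
```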

1.2.4 Crude Suicide Rate Dataset

1. High range of Crude Suicide Rate between countries

Data summary
Name suicide
Number of rows 183
Number of columns 5
_______________________
Column type frequency:
character 1
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Country 0 1 4 52 0 183 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Year 0 1 2017.00 0.00 2017 2017.00 2017.0 2017.00 2017.0 ▁▁▇▁▁
Overall 0 1 9.67 8.44 0 4.70 7.7 12.45 78.3 ▇▁▁▁▁
Male 0 1 15.03 14.01 0 6.80 11.4 18.95 127.2 ▇▁▁▁▁
Female 0 1 4.43 3.58 0 2.05 3.5 6.10 30.8 ▇▂▁▁▁

Referring to the data summary of the suicide dataset, the range of Overall is very large, with a maximum value of 78.3 per 100,000 people. However, this could be due to differences in population between countries, as seen in the scatter plot of population against crude suicide rate. Based on the graph, Lesotho has a high crude suicide rate but a small population compared with other countries. While plotting the map for Crude Suicide Rate, Lesotho’s values also overlapped with South Africa’s because the two countries are so close geographically. Therefore, it would be better to exclude Lesotho from the visualisation.

2. Removal of data that is not useful for analysis

Other than the crude suicide rate for each country, the dataset also provides an error range for each rate. However, for simplicity, it would be better to remove this additional information as it might affect the plotting of the map visualisation. In the raw data, the error range appears in square brackets beside each crude suicide rate, and it needs to be removed for Both sexes, Male and Female. Therefore, dplyr::mutate_at and substring will be used to extract the rate and drop the error range for the three columns.

3. Difficult to display Male and Female in one map

The Crude Suicide Rate dataset provides the rate for Overall, Male and Female. However, it would be difficult to display both Male and Female in one map as they share the same coordinates. They would overlap each other and defeat the purpose of the visualisation. Although this can be solved by using tmap::tm_facets, it does not add much insight to the analysis. Therefore, the visualisation will display only the overall crude suicide rate.

2 Sketch of Proposed Design

Proposed Design Sketch

  • Overview map visualisation on World Happiness Index

The map visualisation provides an overview of the differences in happiness score around the world, which can be analysed further with other factors such as depression and suicide rates.

  • Top 10 Happiest & Unhappiest Countries

This chart acts as an introduction to the World Happiness Index by displaying the top 10 happiest and unhappiest countries based on happiness score. It also shows the user which countries to focus the analysis on and the spread of the Happiness Index dataset.

  • Visualising the relationship between Happiness Score and Depression/Suicide

The scatter plots provide insights into the correlation between Happiness Score and the two factors used.

3 Step-by-Step Data Visualisation

3.1 Install and Load R Packages

  • sf reads geospatial data into simple features that are suitable for visualisation.
  • tidyverse contains a set of essential packages for data manipulation and exploration.
  • dplyr provides functions for data manipulation such as mutate and filter.
  • skimr helps to create summary statistics about variables in dataframes, tibbles, data tables and vectors.
  • plotly creates interactive web graphics from ‘ggplot2’ graphs.
  • leaflet creates interactive maps with many useful features.
  • viridis provides colour scales for plotting, including palettes that are colourblind-friendly and readable in grey scale.

# rmapshaper and htmltools are included because they are called later via ::
packages = c('tidyverse','dplyr','skimr','plotly','sf','leaflet','viridis','rmapshaper','htmltools')

# install any package that is not yet available, then load it
for (p in packages){
  if (!require(p,character.only = TRUE)){
    install.packages(p)
  }
  library(p,character.only = TRUE)
}

if (!require(devtools)) {
    install.packages('devtools')
}

3.2 Load the Dataset

world <- st_read(dsn = "data/geospatial", layer = "world_countries_boundaries")
happiness <- read_csv('data/happiness-index/2017.csv')
depression <- read_csv('data/share-with-depression.csv')
suicide <- read_csv('data/crude_suicide.csv')

3.3 Data Preparation

The data needs to be transformed to provide a more insightful visualisation. Data preparation consists of two steps: data wrangling, and merging the attribute and geospatial data.

3.3.1 Data Wrangling

Data wrangling consists of three steps: standardising the country names in all datasets, filtering the data needed for visualisation, and renaming columns to more readable names.

The data transformation for the Suicide dataset differs from the others because of the additional information within the Crude Suicide Rate. A function, extract_num, is therefore created to extract the Crude Suicide Rate from the original string, which includes the error range. The function uses substring and regexpr to find the position of the first space, keeps the text before it, and uses as.numeric to convert the result from character to numeric.
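
To illustrate the extraction in isolation, the sketch below applies the same substring and regexpr logic to two sample strings. The function name extract_rate and the sample values are hypothetical; the strings only assume the "rate [low-high]" layout described above.

```r
# Hypothetical stand-in for the extraction logic described above.
extract_rate <- function(string) {
  # Position of the first space, which separates the rate from the bracketed range
  cut <- regexpr(" ", string, fixed = TRUE)
  # Keep the characters before the space and convert them to numeric
  as.numeric(substring(string, 1, cut - 1))
}

# Made-up sample strings in the assumed "rate [low-high]" layout
raw <- c("9.7 [6.5-14.2]", "15.0 [10.1-21.3]")

rates <- extract_rate(raw)
rates
```

A string without a space yields NA under this logic, so it is worth checking for unexpected NA values after the conversion.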

# Renaming the countries to be consistent to the main dataset (world dataset)
happiness$Country[happiness$Country == "Taiwan Province of China"] <- "Taiwan"
happiness$Country[happiness$Country == "Myanmar"] <- "Myanmar (Burma)"
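# Caution: this maps South Sudan onto "Sudan"; if the file also has a Sudan row, the joins will yield two "Sudan" records
happiness$Country[happiness$Country == "South Sudan"] <- "Sudan"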
happiness$Country[happiness$Country == "Tanzania"] <- "Tanzania, United Republic of"

depression$Entity[depression$Entity == "Czechia"] <- "Czech Republic"
depression$Entity[depression$Entity == "Tanzania"] <- "Tanzania, United Republic of"

# Trim stray whitespace first so that exact name comparisons match
suicide$Country <- trimws(suicide$Country)
suicide$Country[suicide$Country == "United Kingdom of Great Britain and Northern Ireland"] <- "United Kingdom"
suicide$Country[suicide$Country == "United States of America"] <- "United States"
suicide$Country[suicide$Country == "Viet Nam"] <- "Vietnam"
suicide$Country[suicide$Country == "Venezuela (Bolivarian Republic of)"] <- "Venezuela"
suicide$Country[suicide$Country == "Bolivia (Plurinational State of)"] <- "Bolivia"
suicide$Country[suicide$Country == "Russian Federation"] <- "Russia"
suicide$Country[suicide$Country == "United Republic of Tanzania"] <- "Tanzania, United Republic of"
suicide$Country[suicide$Country == "Democratic People's Republic of Korea"] <- "North Korea"
suicide$Country[suicide$Country == "Republic of Korea"] <- "South Korea"
suicide$Country[suicide$Country == "Iran (Islamic Republic of)"] <- "Iran"
suicide$Country[suicide$Country == "Myanmar"] <- "Myanmar (Burma)"
suicide$Country[suicide$Country == "Syrian Arab Republic"] <- "Syria"

# Extract the data needed from column, rename the column and remove outliers
# extract_num keeps the text before the first space (the rate) and converts it to numeric
extract_num <- function(string) {
  as.numeric(substring(string, 1, regexpr(" ", string) - 1))
}

suicide <- suicide %>%
  mutate_at(c("Both sexes", "Male", "Female"), extract_num) %>%
  rename("Overall" = "Both sexes") %>%
  filter(Year == 2017 & Overall < 70 & Country != "Lesotho")

# Lesotho is also dropped from the happiness data to keep the datasets consistent
happiness <- happiness %>% filter(Country != "Lesotho")

# Extract the data from 2017 and rename the column to a shorter form
depression <- depression %>%
  filter(Year == 2017) %>%
  rename("depression" = "Prevalence - Depressive disorders - Sex: Both - Age: Age-standardized (Percent)")

3.3.2 Merging attribute and geospatial data

Using the transformed data, we first merge the three attribute datasets and then merge the result with the geospatial data. For the attribute data, the World Happiness dataset is the primary dataset, so left_join is used to merge the datasets. Countries without a happiness index score will be excluded from the visualisation.

Once all the attribute data have been merged into one dataset, it is merged with the geospatial data using inner_join so that the data can be visualised on a geographical map.
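
The difference between the two joins can be sketched on small hypothetical tables (all names and values below are made up). left_join keeps every row of the primary table and fills gaps with NA, while inner_join keeps only rows that match in both tables.

```r
library(dplyr)

# Hypothetical attribute and boundary tables; values are made up.
scores <- tibble(Country = c("A", "B", "C"), Happiness.Score = c(7, 5, 3))
rates  <- tibble(Country = c("A", "C", "D"), suicide = c(12, 4, 9))
shapes <- tibble(CNTRY_NAME = c("A", "B"))

# left_join keeps every row of `scores`; B has no rate, so it gets NA.
# D is dropped because it has no happiness score.
attrs <- left_join(scores, rates, by = "Country")

# inner_join keeps only countries present in both tables.
merged <- inner_join(shapes, attrs, by = c("CNTRY_NAME" = "Country"))

merged
```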

# merge the three attribute datasets, keeping every country in the happiness data
HND <- left_join(happiness, depression[c('Entity','depression')], by = c('Country' = 'Entity'))
HDS <- left_join(HND, suicide[c('Country','Overall')], by = 'Country') %>%
  rename("suicide" = "Overall")

# merge with geo-data, keeping only countries present in both
final_dat <- inner_join(world, HDS, by = c("CNTRY_NAME" = "Country"))

# simplify the geometry
final_dat <- rmapshaper::ms_simplify(final_dat, keep = 0.05, keep_shapes = TRUE) 
## Registered S3 method overwritten by 'geojsonlint':
##   method         from 
##   print.location dplyr

3.4 Visualising the Bar Chart

To visualise the top 10 happiest and unhappiest countries, some data wrangling is needed to extract the top 10 and bottom 10 records from the dataset.

top <- head(happiness,10)[c('Country','Happiness.Score')]
top$Rank <- 'Top 10'

bottom <- tail(happiness,10)[c('Country','Happiness.Score')]
bottom$Rank <- 'Bottom 10'

newdf <- rbind(top,bottom)
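
The head() and tail() calls above rely on the file already being ordered by happiness score. As a hedged alternative that does not depend on row order, dplyr::slice_max and slice_min can pick the extremes directly. The table below is hypothetical and uses n = 2 to keep the example short.

```r
library(dplyr)

# Hypothetical, deliberately unsorted scores.
scores <- tibble(
  Country = c("C", "A", "D", "B"),
  Happiness.Score = c(4.2, 7.1, 3.0, 6.5)
)

# Highest and lowest scores, regardless of the original row order
top    <- scores %>% slice_max(Happiness.Score, n = 2) %>% mutate(Rank = "Top 2")
bottom <- scores %>% slice_min(Happiness.Score, n = 2) %>% mutate(Rank = "Bottom 2")

ranked <- bind_rows(top, bottom)
ranked
```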

ggplot and plotly are then used to visualise the top 10 happiest and unhappiest countries in the world.

library(plotly)
ggplotly(
  ggplot(newdf) +
    geom_bar(aes(x = reorder(Country, Happiness.Score), y = Happiness.Score, fill = Rank,
                 text = paste0(Country, ": ", round(Happiness.Score, 2))),
             stat = "identity") +
    coord_flip() +
    scale_y_continuous(breaks = 0:9) +
    labs(y = 'Happiness Score', x = 'Country',
         title = "Top 10 Happiest and Unhappiest Countries (2017)") +
    theme_minimal(),
  tooltip = 'text')

3.5 Visualising the Scatter Plots

Similarly, ggplot and plotly are also used to visualise the relationship between the factors.

library(plotly)
library(viridis)
## Loading required package: viridisLite
# Happiness Score against Crude Suicide Rate
ggplotly(
  ggplot(final_dat, aes(y = Happiness.Score, x = suicide, color = CNTRY_NAME)) +
    geom_point() +
    scale_color_viridis(discrete = TRUE) +
    theme_minimal(base_family = 'Tahoma') +
    theme(axis.title.x = element_text(size = 9),
          axis.title.y = element_text(size = 9)) +
    labs(x = "Crude Suicide Rate (per 100,000)", y = "Happiness Score",
         title = "Happiness Index & Crude Suicide Rate by Country"))

# Happiness Score against Depression Rate
ggplotly(
  ggplot(final_dat, aes(y = Happiness.Score, x = depression, color = CNTRY_NAME)) +
    geom_point() +
    scale_color_viridis(discrete = TRUE) +
    theme_minimal(base_family = 'Tahoma') +
    theme(axis.title.x = element_text(size = 9),
          axis.title.y = element_text(size = 9)) +
    labs(x = "Depression Rate", y = "Happiness Score",
         title = "Happiness Index & Depression Rate by Country"))

3.6 Visualising Map Visualisation

There are four steps in visualising the map:

  1. Visualising the World Happiness Index with Choropleth Mapping
  2. Visualising the Depression Rate using Bubbles
  3. Visualising the Crude Suicide Rate using Bubbles
  4. Combining all visualisations using LayersControl

3.6.1 Visualising World Happiness Index with Choropleth Mapping

The first step is to visualise the World Happiness Index using choropleth mapping, so leaflet and addPolygons are used in the code.

library(leaflet)
library(tidyverse)

# prepare map labels
labels <- sprintf(
  "<strong>%s</strong><br/>Happiness Score: %g",
  final_dat$CNTRY_NAME, final_dat$Happiness.Score
) %>% lapply(htmltools::HTML)

# declare colour palette (bins = 5 requests about five colour bins)
pal <- colorBin(
  palette = 'RdYlGn',
  domain = final_dat$Happiness.Score, bins = 5, na.color = "lightgrey")

# visualise the data on map
leaflet(final_dat) %>%
  addTiles(options = tileOptions(opacity = 0.6)) %>%
  addPolygons(weight = 1, color = "grey", smoothFactor = 0.3,
              fillOpacity = 0.8, fillColor = ~pal(Happiness.Score),
              highlightOptions = highlightOptions(color = "white", weight = 2), label = labels) %>%
  addLegend(pal = pal, values = ~Happiness.Score, opacity = 1.0,
            title = "Happiness Score", position = "bottomright")

3.6.2 Visualising Depression Rate with Bubbles Mapping

Bubble mapping requires a different function, addCircles, which needs a latitude and longitude for each circle. Therefore, st_centroid is used to convert each country’s multi-polygon to a single centroid point. Although the centroid is not always an accurate representation of a country’s location, it is sufficient for visualisation.
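
As a small illustration of what st_centroid returns, the sketch below builds a hypothetical 2 x 2 square (standing in for a country polygon, with no coordinate reference system set) and extracts the centroid coordinates with st_coordinates.

```r
library(sf)

# Hypothetical 2 x 2 square standing in for a country polygon.
# The ring is closed by repeating the first point at the end.
square <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
shape  <- st_sf(name = "demo", geometry = st_sfc(square))

# Centroid of the geometry column, returned as a point geometry
centre <- st_centroid(st_geometry(shape))

# Matrix with columns X and Y, one row per point
coords <- st_coordinates(centre)
coords
```

For countries made of several separate polygons, the centroid can fall outside the main landmass, which is the inaccuracy mentioned above.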

Other than the centroid, a radius is also required to plot circles on the map. Hence, the depression rate is multiplied by 50,000 so that the circles are large enough for the user to compare countries.
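
The radius passed to addCircles is measured in metres, so the multiplier converts a percentage into a distance on the ground. A quick worked example, using the two depression rates quoted in the insights section of this report:

```r
# Depression rates (percent) quoted later in this report
depression_rate <- c(Finland = 4.79, `Central African Republic` = 4.21)

# Radius in metres, using the same multiplier as the map code
radius_m <- depression_rate * 50000

# Same radii expressed in kilometres for readability
radius_km <- radius_m / 1000
radius_km
```

With this multiplier a depression rate near 4.8 gives a circle of roughly 240 km radius, which is visible at world zoom without covering whole regions.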

labels <- sprintf(
  "<strong>%s</strong><br/>Depression : %g",
  final_dat$CNTRY_NAME, final_dat$depression) %>% lapply(htmltools::HTML)

final_dat$centroid = st_centroid(final_dat)

leaflet(final_dat) %>%
  addTiles(options=tileOptions(opacity=0.6)) %>%
  addCircles(data = final_dat$centroid, fill = TRUE, stroke = TRUE, color = "blue", 
             radius = final_dat$depression*50000, weight = 1, label = labels, 
             highlightOptions = highlightOptions(color = "white", weight = 2))

3.6.3 Visualising Suicide Rates with Bubbles Mapping

The visualisation of Crude Suicide Rates is similar to that of Depression Rate as both are bubble maps. The only difference is that the Crude Suicide Rate uses a smaller radius multiplier (5,000 instead of 50,000) because its values span a wider range.

labels <- sprintf(
  "<strong>%s</strong><br/>Crude suicide Rate : %g",
  final_dat$CNTRY_NAME, final_dat$suicide
) %>% lapply(htmltools::HTML)

leaflet(final_dat) %>%
  addTiles(group = "OpenStreetMap.Default", options=tileOptions(opacity=0.6)) %>%
  addCircles(data = final_dat$centroid, fill = TRUE, stroke = TRUE, 
             radius = final_dat$suicide*5000,  weight = 1, label = labels,
             highlightOptions = highlightOptions(color = "white", weight = 2))

3.6.4 Combination of map charts using LayersControl

As there are two bubble charts to visualise, addLayersControl is used to let the user switch between Depression and Suicide when analysing their relationship with the Happiness Index. There are two layers in the visualisation, Crude Suicide Rate and Depression Rate, which can be selected in the top-left corner of the map. Both layers show the choropleth map of the World Happiness Index, with either the Crude Suicide Rate or the Depression Rate added in the form of bubbles.

labels <- sprintf(
  "<strong>%s</strong><br/>Happiness Score: %g<br/>Depression: %g",
  final_dat$CNTRY_NAME, final_dat$Happiness.Score, final_dat$depression
) %>% lapply(htmltools::HTML)

slabels <- sprintf(
  "<strong>%s</strong><br/>Happiness Score: %g<br/>Crude Suicide Rate: %g",
  final_dat$CNTRY_NAME, final_dat$Happiness.Score, final_dat$suicide
) %>% lapply(htmltools::HTML)

hpal <- colorBin(
  palette = 'RdYlGn',
  domain = final_dat$Happiness.Score,5, na.color="lightgrey")

final_dat$centroid = st_centroid(final_dat)

leaflet(final_dat) %>%
  addTiles(options=tileOptions(opacity=0.6)) %>%
  setView(lng = 5.55224, lat = 38.95747, zoom = 2) %>%
  
  # Depression layer
  addPolygons(weight = 1, color = "grey", smoothFactor = 0.3,
              fillOpacity = 0.8, fillColor = ~hpal(final_dat$Happiness.Score), 
              highlightOptions = highlightOptions(color = "white", weight = 2),
              label=labels, group="Depression Rate") %>%
  addCircles(data = final_dat$centroid, fillColor = '#32a8a4', 
             fillOpacity = 0.9, stroke = TRUE, radius = final_dat$depression*25000,
             weight = 1, label = labels,
             highlightOptions = highlightOptions(color = "white", weight = 2), 
             group = "Depression Rate") %>%
  
  # Suicide layer
  addPolygons(weight = 1, color = "grey", smoothFactor = 0.3,
              fillOpacity = 0.8, fillColor = ~hpal(final_dat$Happiness.Score), 
              highlightOptions = highlightOptions(color = "white", weight = 2),
              label=slabels,group="Crude Suicide Rate") %>%
  addCircles(data = final_dat$centroid, fillColor = 'purple', 
             fillOpacity = 0.9,radius = final_dat$suicide*5000,  
             weight = 1,label=slabels,
             group='Crude Suicide Rate') %>%
  
  addLegend(pal = hpal, values = ~(final_dat$Happiness.Score), opacity = 1.0,
            title = "Happiness Index",position = "bottomright") %>%
  addLayersControl(
    baseGroups = c("Depression Rate","Crude Suicide Rate"),
    position = "topleft",
    options = layersControlOptions(collapsed = FALSE))%>%
  addEasyButton(easyButton(
    icon="fa-globe", title="Zoom to Level 1",
    onClick=JS("function(btn, map){ map.setZoom(1); }")))

4 Final Visualisation & Insights

4.1 Final Visualisation

4.2 Insight #1

Most of the happiest countries are in Europe, while most of the unhappiest countries are in Africa. This could be due to the difference in GDP per capita and standard of living between the two regions. Many African countries remain less developed and continue to suffer from corruption and civil unrest, which may explain why Africa is the unhappiest region in the world.

4.3 Insight #2

The scatter plot shows that the average Crude Suicide Rate is below 10. However, the Happiness Index and Crude Suicide Rate show a slightly positive correlation: countries with a happiness index of 7 and above tend to have a Crude Suicide Rate of more than 10.

Similarly, the depression rate is also higher for countries with a happiness index of 7 and above. This suggests that even though a country is considered happy by world standards, it does not mean that its citizens are happy living there.
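
The strength of such a relationship can be checked numerically with cor. The sketch below uses hypothetical values only, with an NA standing in for a country that has no suicide data; it is not a result from the assignment’s datasets.

```r
# Hypothetical scores and rates; NA marks a country with no suicide data.
happiness_score <- c(3.2, 4.5, 5.1, 6.3, 7.4)
suicide_rate    <- c(6.0, NA, 8.5, 9.0, 12.1)

# Pearson correlation over the pairs where both values are present
r <- cor(happiness_score, suicide_rate, use = "complete.obs")
r
```

On the merged data, the same call could be applied to final_dat$Happiness.Score and final_dat$suicide, assuming those columns exist as created in section 3.3.2.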

4.4 Insight #3

Despite being the happiest countries, countries with a happiness index of 7 and above have a higher crude suicide rate. The unhappiest country has a lower Crude Suicide Rate of 14.3, while Finland has a Crude Suicide Rate of 15.9. Most of the unhappiest countries also have a lower crude suicide rate.

Similarly, the happiest countries also have a higher depression percentage than less happy countries. The top 5 happiest countries have a depression rate of at least 3.5, with Finland having the highest at 4.79. By comparison, the unhappiest country, the Central African Republic, has a depression rate of 4.21, which is lower than Finland’s. The visualisation suggests that being the happiest country does not necessarily make a country the best place to live.

In Asia, the happiest countries are Singapore, Malaysia, and Thailand, each with a happiness index above 6. The crude suicide rate for Asia’s happiest countries is significantly lower than that of the less happy countries in the region.

(328 words)

5 References

Leaflet

Research