2. DataViz Step-by-step

2.1. Install and Load R packages

packages = c('tidyverse', 'dplyr', 'plotly', 'countrycode', 'highcharter', 'DT', 'RColorBrewer', 'broom', 'ggplot2', 'gridExtra', "htmlwidgets")

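for (p in packages) {
  # Install the package only if it cannot be loaded yet
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  # Attach the package (a no-op if require() already succeeded)
  library(p, character.only = TRUE)
}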

2.2. Dataset Overview

After installing and loading the necessary libraries, we can load the dataset and then check the data frame's dimensions, variable data types, and number of missing values.

data <- read_csv("datasets_2020.csv")


print(paste('DataFrame contains ', dim(data)[1], 'rows and ', dim(data)[2], 'columns.'))

## [1] "DataFrame contains  153 rows and  20 columns."

print(paste('There are ', sum(is.na(data)), ' missing values representing ', 
            round(100*sum(is.na(data))/(nrow(data)*ncol(data)), 5), '% of the total.'))

## [1] "There are  0  missing values representing  0 % of the total."

print('Variable names are as follows:')

## [1] "Variable names are as follows:"

print(names(data))

##  [1] "Country name"                              
##  [2] "Regional indicator"                        
##  [3] "Ladder score"                              
##  [4] "Standard error of ladder score"            
##  [5] "upperwhisker"                              
##  [6] "lowerwhisker"                              
##  [7] "Logged GDP per capita"                     
##  [8] "Social support"                            
##  [9] "Healthy life expectancy"                   
## [10] "Freedom to make life choices"              
## [11] "Generosity"                                
## [12] "Perceptions of corruption"                 
## [13] "Ladder score in Dystopia"                  
## [14] "Explained by: Log GDP per capita"          
## [15] "Explained by: Social support"              
## [16] "Explained by: Healthy life expectancy"     
## [17] "Explained by: Freedom to make life choices"
## [18] "Explained by: Generosity"                  
## [19] "Explained by: Perceptions of corruption"   
## [20] "Dystopia + residual"

print('data source: https://worldhappiness.report/ed/2020/')

## [1] "data source: https://worldhappiness.report/ed/2020/"

names(data) <- c("Country", "Region", "Score", "Standard_error_of_score", "upperwhisker", "lowerwhisker", "Logged_GDP_per_capita",  "Social_support",  "Life_expectancy", "Freedom_to_make_life_choices", "Generosity", "Perceptions_of_corruption", "Ladder_score_in_Dystopia", "Explained by: Log GDP per capita", "Explained by: Social support", "Explained by: Healthy life expectancy", "Explained by: Freedom to make life choices", "Explained by: Generosity", "Explained by: Perceptions of corruption", "Dystopia + residual")

data <- data[with(data, order(-Score)),]

datatable(data, rownames = FALSE, class = 'compact')
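The overview step mentions variable data types and missing values, but the code above only reports the total count of missing values. A per-column view makes both explicit. The following is a minimal sketch on a small hypothetical data frame (`toy` and all of its values are made up for illustration and are not taken from the dataset):

```r
# Hypothetical stand-in for the happiness data frame (values are made up)
toy <- data.frame(
  Country    = c("A", "B", "C"),
  Score      = c(7.1, NA, 4.3),
  Generosity = c(0.10, -0.05, NA),
  stringsAsFactors = FALSE
)

# Class of every column: shows which variables are character vs. numeric
col_types <- sapply(toy, class)

# Number of missing values in each column
na_per_col <- colSums(is.na(toy))

print(col_types)
print(na_per_col)
```

Applied to the real data frame, the same two calls would be `sapply(data, class)` and `colSums(is.na(data))`.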

The dataset contains 20 variables for 153 countries. The data table shows that only ‘Country’ and ‘Region’ are character variables; the rest are numeric. The happiness score is explained here by six factors: “Logged GDP per capita”, “Social support”, “Life expectancy”, “Freedom to make life choices”, “Generosity”, and “Perceptions of corruption”. The variables with the prefix “Explained by” represent the estimated extent to which each of the six factors contributes to the happiness score. Since there are no missing values, the data frame is already close to being ready for visualization.

Before analysing the continents, I assigned each country to one of six continents: “Africa”, “Asia”, “Europe”, “North America”, “Oceania”, and “South America”.

country_continent <- read_csv("country_continent.csv") 
names(country_continent) <- c("Country", "Continent")
data <- merge(data, country_continent, by="Country", all=FALSE)
print(levels(as.factor(data$Continent)))

## [1] "Africa"        "Asia"          "Europe"        "North America"
## [5] "Oceania"       "South America"

data <- data[with(data, order(-Score)),]
datatable(data, rownames = FALSE, class = 'compact')
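Because `merge(..., all = FALSE)` is an inner join, any country whose name is spelled differently in the two files is dropped without a warning. A quick way to see what would be lost is to compare the key columns before merging. This is a sketch only; `scores` and `lookup` are hypothetical miniature tables with made-up values:

```r
# Hypothetical miniature versions of the two tables (values are made up)
scores <- data.frame(Country = c("A", "B", "C"),
                     Score   = c(7.8, 6.3, 6.4),
                     stringsAsFactors = FALSE)
lookup <- data.frame(Country   = c("A", "C"),
                     Continent = c("X", "Y"),
                     stringsAsFactors = FALSE)

# Countries present in the scores table but absent from the lookup table;
# an inner merge (all = FALSE) drops these rows silently
dropped <- setdiff(scores$Country, lookup$Country)

merged <- merge(scores, lookup, by = "Country", all = FALSE)

print(dropped)
print(nrow(scores) - nrow(merged))  # number of rows lost in the merge
```

If `dropped` is not empty for the real files, the country names in the lookup file can be aligned before merging.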

2.3. Data Wrangling

2.3.1. Box Plot of Happiness Score by Continent

We first take a global view of the happiness scores of countries in different continents. The box plot shows that happiness scores differ greatly among the continents. Overall, countries in Africa have lower happiness scores than countries in other continents. North America and Oceania have the highest scores, which means these four countries score highly: Canada, United States, Australia, and New Zealand. Countries in Europe have much more diverse happiness scores, and most of the happiest countries are in Europe. To conclude, the happiest countries in each continent are Israel, Singapore, Finland, Canada, New Zealand, and Costa Rica.

plot_ly(data, x=~Continent,
              y=~Score,
              type="box",
              boxpoints="all",
              color=~Continent,
        text = ~paste("<b>Continent:</b> ", Continent,
                      "<br><b>Country:</b> ", Country,
                      "<br><b>Happiness Score:</b>", Score))%>%
  layout(xaxis=list(showticklabels = FALSE),
         yaxis=list(title="Happiness Score"),
         legend=list(x = 100, y = 1),
         margin=list(b = 100),
         title="Box Plot of Happiness Score by Continent")
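The observations read off the box plot (the typical score per continent and the happiest country in each) can also be computed directly. The sketch below uses a hypothetical data frame `toy` with made-up countries, continents, and scores; only the column names follow the report:

```r
# Hypothetical data with the same column names as the report (values made up)
toy <- data.frame(
  Country   = c("A", "B", "C", "D", "E"),
  Continent = c("X", "X", "Y", "Y", "Y"),
  Score     = c(5.0, 6.5, 4.0, 7.2, 3.1),
  stringsAsFactors = FALSE
)

# Median score per continent (the centre line of each box)
med <- tapply(toy$Score, toy$Continent, median)

# Highest-scoring country within each continent
top <- do.call(rbind, lapply(split(toy, toy$Continent), function(d) {
  d[which.max(d$Score), c("Continent", "Country", "Score")]
}))

print(med)
print(top)
```

On the real data, replacing `toy` with `data` should give the numbers behind the box plot.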

2.3.2. Correlation Matrix

From the correlation matrix, we can discover some relationships between countries’ happiness and the different factors. The result shows that economic development, social support, life expectancy, and freedom to make life choices are positively correlated with people’s happiness, while perceived government corruption is negatively correlated with it. We can also see that a country’s GDP is positively correlated with its people’s life expectancy.

corrplot <- data %>% select("Score", "Logged_GDP_per_capita",  "Social_support",  "Life_expectancy", "Freedom_to_make_life_choices", "Generosity", "Perceptions_of_corruption")
hchart(cor(corrplot))%>%
  hc_title(text = 'Correlation Matrix Colormap')
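To read the heat map numerically, the row of the correlation matrix that belongs to `Score` can be extracted and sorted. This is a sketch on simulated data, not the report's result: the toy columns are random numbers, and `cor()` computes Pearson correlations by default, which measure linear association rather than causation.

```r
set.seed(1)  # reproducible toy data
n   <- 50
gdp <- rnorm(n)

# Toy data: Score depends on GDP by construction, Generosity is pure noise
toy <- data.frame(
  Score                 = 0.8 * gdp + rnorm(n, sd = 0.5),
  Logged_GDP_per_capita = gdp,
  Generosity            = rnorm(n)
)

cm <- cor(toy)  # Pearson correlation matrix

# Correlation of every factor with Score, strongest positive first
score_cor <- sort(cm["Score", colnames(cm) != "Score"], decreasing = TRUE)
print(round(score_cor, 2))
```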

2.3.3. Scatter plot of happiness score

We can further plot happiness scores against individual factors to understand these relationships better. Countries with higher happiness scores clearly tend to have higher GDP per capita and longer life expectancy. In the plot of happiness score by corruption perception, the relationship is less clear: most countries with a corruption perception score above 0.6 show no linear relationship between happiness score and the corruption perception index. Overall, a country’s happiness score appears to be mainly related to GDP per capita and life expectancy. I also use different colors to represent the continents. Most European countries have high GDP per capita and life expectancy, while most African countries have lower values. Luxembourg has the highest GDP per capita, followed by Singapore. Singapore also has the highest life expectancy, at around 76.8 years, and the lowest corruption perception index, at around 0.11.

plot_ly(data=data, x=~Logged_GDP_per_capita, 
        y=~Score,
        color=~Continent,
        type="scatter", mode = "markers",
        marker = list(size = 15),
        # Hover text:
        text = ~paste("<b>Continent:</b> ", Continent,
                      "<br><b>Country:</b> ", Country,
                      "<br><b>Happiness Score:</b>", Score,
                      "<br><b>GDP per capita:</b>", Logged_GDP_per_capita)) %>%
  layout(xaxis=list(title="Logged_GDP per Capita"),
         yaxis=list(title="Happiness Score"),
         title="Scatter plot of happiness score by GDP per capita")

plot_ly(data=data, x=~Life_expectancy, 
        y=~Score,
        color=~Continent,
        type="scatter", mode = "markers",
        marker = list(size = 15),
        # Hover text:
        text = ~paste("<b>Continent:</b> ", Continent,
                      "<br><b>Country:</b> ", Country,
                      "<br><b>Happiness Score:</b>", Score,
                      "<br><b>Life Expectancy:</b>", Life_expectancy)) %>%
  layout(xaxis=list(title="Life Expectancy"),
         yaxis=list(title="Happiness Score"),
         title="Scatter plot of happiness score by life expectancy")

plot_ly(data=data, x=~Perceptions_of_corruption, 
        y=~Score,
        color=~Continent,
        type="scatter", mode = "markers",
        marker = list(size = 15),
        # Hover text:
        text = ~paste("<b>Continent:</b> ", Continent,
                      "<br><b>Country:</b> ", Country,
                      "<br><b>Happiness Score:</b>", Score,
                      "<br><b>Perceptions of Corruption Index:</b>", Perceptions_of_corruption)) %>%
  layout(xaxis=list(title="Perceptions of corruption"),
         yaxis=list(title="Happiness Score"),
         title="Scatter plot of happiness score by perceptions of corruption")
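The visual impression that the corruption relationship weakens above 0.6 can be checked by computing the correlation separately on each side of that threshold. The sketch below uses simulated data that is deliberately built to have a trend below 0.6 and none above it, so it only illustrates the technique and says nothing about the real dataset:

```r
set.seed(42)  # reproducible toy data
n <- 120
corruption <- runif(n, 0.1, 0.95)

# Toy scores: a negative trend below 0.6 and a flat level above it, plus noise
score <- ifelse(corruption < 0.6, 8 - 3 * corruption, 5.5) + rnorm(n, sd = 0.4)
toy <- data.frame(Perceptions_of_corruption = corruption, Score = score)

# Split at the threshold mentioned in the text
high <- toy$Perceptions_of_corruption > 0.6

# Pearson correlation within each subset
r_low  <- cor(toy$Perceptions_of_corruption[!high], toy$Score[!high])
r_high <- cor(toy$Perceptions_of_corruption[high],  toy$Score[high])

print(c(below_0.6 = r_low, above_0.6 = r_high))
```

A correlation near zero in the upper subset of the real data would support the reading of the scatter plot.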

2.3.3. Choropleth Map

The choropleth map gives us a geographical view of the distribution of happiness scores around the world. You can zoom in to look at a specific location on the map. It agrees with the conclusion above that happiness scores differ greatly among continents. It is also clear that most countries in Africa have low GDP per capita, low life expectancy, and low happiness scores. On the global map of happiness scores, Afghanistan stands out with a much darker color: it has a much lower happiness score, likely because the country is at war.

data$iso3 <- countrycode(data$Country, 'country.name', 'iso3c')
data[data$Country=="Kosovo","iso3"] <- "XKX"

data(worldgeojson, package = "highcharter")

highchart() %>%
    hc_add_series_map(worldgeojson, data, value = 'Score', joinBy = 'iso3') %>%
    hc_title(text = 'World Happiness Map, Year 2020') %>%
    hc_subtitle(text = "") %>%
    hc_mapNavigation(enabled = TRUE) %>%
    hc_legend(valueDecimals = 0) %>%
    hc_colorAxis(stops = color_stops()) %>%  
    hc_tooltip(useHTML = TRUE, headerFormat = "",
        pointFormat = "<b>Country:</b> {point.Country}
                      <br><b>Happiness Score:</b> {point.value}")

highchart() %>%
    hc_add_series_map(worldgeojson, data, value = 'Logged_GDP_per_capita', joinBy = 'iso3') %>%
    hc_title(text = 'World Map for Logged GDP per capita, Year 2020') %>%
    hc_subtitle(text = "") %>%
    hc_mapNavigation(enabled = TRUE) %>%
    hc_legend(valueDecimals = 0) %>%
    hc_colorAxis(stops = color_stops()) %>%  
    hc_tooltip(useHTML = TRUE, headerFormat = "",
        pointFormat = "<b>Country:</b> {point.Country}
                      <br><b>GDP per capita:</b> {point.value}")

highchart() %>%
    hc_add_series_map(worldgeojson, data, value = 'Life_expectancy', joinBy = 'iso3') %>%
    hc_title(text = 'World Map for Life Expectancy Score, Year 2020') %>%
    hc_subtitle(text = "") %>%
    hc_mapNavigation(enabled = TRUE) %>%
    hc_legend(valueDecimals = 0) %>%
    hc_colorAxis(stops = color_stops()) %>%  
    hc_tooltip(useHTML = TRUE, headerFormat = "",
        pointFormat = "<b>Country:</b> {point.Country}
                      <br><b>Life Expectancy Score:</b> {point.value}")
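The maps above join the data to the map geometry on the `iso3` column, so any country without a code is left blank on the map. The manual Kosovo patch suggests that the lookup can miss some names. A general check is to list all rows where the code is missing. In the sketch below, the named vector `codes` is a hypothetical stand-in for the `countrycode()` lookup, and the country names and codes are made up:

```r
# Hypothetical country list (names are made up)
toy <- data.frame(Country = c("A", "B", "C"), stringsAsFactors = FALSE)

# Hypothetical lookup standing in for countrycode(); "B" has no entry
codes <- c(A = "AAA", C = "CCC")
toy$iso3 <- unname(codes[toy$Country])  # unmatched names become NA

# Countries that would not appear on the map
unmatched <- toy$Country[is.na(toy$iso3)]
print(unmatched)

# Manual patch for the unmatched country, mirroring the Kosovo fix
toy[toy$Country == "B", "iso3"] <- "BBB"
print(anyNA(toy$iso3))
```

On the real data, the equivalent check would be `data$Country[is.na(data$iso3)]` right after the `countrycode()` call.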

2.3.4. Polar Charts of Countries’ Difference

I use polar charts here to compare countries. Every circular sector represents a country, and the radius of the sector is the feature value. The sector’s color also encodes the value: a lighter color represents a larger value. I sort countries by happiness score, so we can see that the happiest country is Finland while the least happy country is Afghanistan. From the charts below, we see that the countries with the highest happiness scores also tend to have the highest GDP per capita, the highest life expectancy, and the lowest corruption perception index.

x <- c("Country", "Happiness Score")
y <- sprintf("{point.%s}", c("Country", "Score"))
tltip <- tooltip_table(x, y)

hchart(data, type = "columnrange",
       hcaes(x = Country, low = 0, high = Score,
             color = Score)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 8, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Countries' Happiness Scores") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")

x <- c("Country", "Logged GDP per capita")
y <- sprintf("{point.%s}", c("Country", "Logged_GDP_per_capita"))
tltip <- tooltip_table(x, y)

hchart(data, type = "columnrange",
       hcaes(x = Country, low = 0, high = Logged_GDP_per_capita,
             color = Logged_GDP_per_capita)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 11, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Countries' Logged GDP per Capita") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")

x <- c("Country", "Life expectancy")
y <- sprintf("{point.%s}", c("Country", "Life_expectancy"))
tltip <- tooltip_table(x, y)

hchart(data, type = "columnrange",
       hcaes(x = Country, low = 0, high = Life_expectancy,
             color = Life_expectancy)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 100, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Countries' People's Life Expectancy") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")

x <- c("Country", "Perceptions of corruption" )
y <- sprintf("{point.%s}", c("Country", "Perceptions_of_corruption" ))
tltip <- tooltip_table(x, y)

hchart(data, type = "columnrange",
       hcaes(x = Country, low = 0, high = Perceptions_of_corruption,
             color = Perceptions_of_corruption)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 1, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Countries' Corruption Perception Index") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")

2.3.5. Polar Chart of Happiness Score’s Change

After combining the data with the 2015 happiness scores, we can plot the change in happiness score for each country. From the graph, we see that the top 10 happiest countries remain almost the same from 2015 to 2020; the greatest changes happened in countries with happiness scores below the median. Venezuela’s happiness score decreased the most, while Benin’s increased the most over the five years.

data2015 <- read_csv("datasets_2015.csv")
data2015 <- data2015 %>% select(Country, "Happiness Score")
names(data2015) <- c("Country", "Score2015")
trend <- merge(data, data2015, by="Country", all=FALSE)
trend$diff <- trend$Score - trend$Score2015
trend <- trend[with(trend, order(-Score)),]
  
x <- c("Country", "2015 Happiness Score", "2020 Happiness Score", "Change")
y <- sprintf("{point.%s}", c("Country", "Score2015", "Score", "diff"))
tltip <- tooltip_table(x, y)

hchart(trend, type = "columnrange",
       hcaes(x = Country, low = Score2015, high = Score,
             color = diff)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 8, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Countries' Change of Happiness Score, 2015-2020") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")
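The countries with the largest decrease and increase can be read from the `diff` column directly instead of from the chart. A minimal sketch, assuming a data frame shaped like `trend`; `trend_toy` and its values are hypothetical:

```r
# Hypothetical data with the same columns as `trend` (values are made up)
trend_toy <- data.frame(
  Country   = c("A", "B", "C", "D"),
  Score2015 = c(6.0, 4.0, 5.5, 3.5),
  Score     = c(5.0, 5.2, 5.6, 3.4),
  stringsAsFactors = FALSE
)

# Change in score between the two years, as in the chunk above
trend_toy$diff <- trend_toy$Score - trend_toy$Score2015

# Countries with the most negative and most positive change
largest_drop <- trend_toy$Country[which.min(trend_toy$diff)]
largest_gain <- trend_toy$Country[which.max(trend_toy$diff)]

print(c(drop = largest_drop, gain = largest_gain))
```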

2.3.6. Polar Chart of Singapore’s Influential Factors to Happiness

Finally, I take a closer look at the factors that influence Singapore’s happiness. Singapore has a happiness score of around 6.38, which ranks 31st in the world and first in Asia. You can hover over the circular sectors to see the difference between Singapore’s value and the world’s median value. Singapore’s happiness score is mainly explained by GDP per capita, social support, and life expectancy. Singapore has higher GDP per capita, longer life expectancy, more freedom to make life choices, and a lower corruption perception index than 80% of other countries, but a slightly lower generosity score.

data_sg <- data[data$Country=='Singapore',]
t1 <- t(data.frame(data_sg, row.names= "value"))
feature <- c("Country", "Region", "Score", "Standard_error_of_score", "upperwhisker", "lowerwhisker", "Logged_GDP_per_capita",  "Social_support",  "Life_expectancy", "Freedom_to_make_life_choices", "Generosity", "Perceptions_of_corruption", "Ladder_score_in_Dystopia", "Explained by: Log GDP per capita", "Explained by: Social support", "Explained by: Healthy life expectancy", "Explained by: Freedom to make life choices", "Explained by: Generosity", "Explained by: Perceptions of corruption", "Dystopia + residual", "Continent", "iso3")
t2 <- as.data.frame(t1,row.names=F)
t3 <- as.data.frame(cbind(feature, t2))
t4 <- t3[14:19,]

t4$median <- c(median(data$Logged_GDP_per_capita), median(data$Social_support), median(data$Life_expectancy), median(data$Freedom_to_make_life_choices), median(data$Generosity), median(data$Perceptions_of_corruption))

t4$quantile80 <- c(quantile(data$Logged_GDP_per_capita, 0.8), quantile(data$Social_support, 0.8), quantile(data$Life_expectancy, 0.8), quantile(data$Freedom_to_make_life_choices, 0.8), quantile(data$Generosity, 0.8), quantile(data$Perceptions_of_corruption, 0.2))

t4$original_Value <- c(t3[7:12,]$value)

x <- c("The extent of Singapore's happiness Score explained by the factor", "Singapore's value", "World Median value", "World Quantile 0.8")
y <- sprintf("{point.%s}", c("value", "original_Value" , "median", "quantile80"))
tltip <- tooltip_table(x, y)

hchart(t4, type = "columnrange",
       hcaes(x = feature, low = 0, high = value, 
             color = value)) %>%
  hc_chart(polar = TRUE) %>%
  hc_yAxis( max = 2, min = 0, labels = list(format = "{value}"),
            showFirstLabel = FALSE) %>%
  hc_xAxis(
    title = list(text = ""), gridLineWidth = 0.5,
    labels = list(format = "{value: %b}")) %>%
  hc_title(text = "Polar Chart of Singapore's Influential Factors to Happiness") %>%
  hc_tooltip(useHTML = TRUE, pointFormat = tltip,
             headerFormat = "")
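The statement that a country is above 80% of other countries on a factor corresponds to comparing its value with the 0.8 quantile, as the tooltip does. The share of countries at or below a value can also be computed directly. The sketch below uses hypothetical numbers (`world` and `value` are made up):

```r
# Hypothetical values of one factor across ten countries (made up)
world <- 1:10
value <- 9  # hypothetical value for the country of interest

# Share of countries whose value is at or below the country's value
pct <- mean(world <= value)

# 80th percentile of the factor (default quantile type 7)
q80 <- quantile(world, 0.8)

# TRUE if the country is above the 80th percentile
above_q80 <- unname(value > q80)

print(pct)
print(above_q80)
```

For a factor where lower is better, such as the corruption perception index, the comparison is reversed: the value is compared with the 0.2 quantile using `<`, which matches the `quantile(..., 0.2)` call in the chunk above.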

ISSS608 Visual Analytics and Applications | DataViz Makeover 5 - World Happiness Analysis

Liu Cuiyi

August 7, 2020

1. Overview

1.1. Design Challenges

1.2. Proposed Sketched Design