Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The data visualisation above aims to convey a correlation between homework time, GDP and happiness ranking across different countries. Specifically, it tries to show the audience the impact of large amounts of homework on a country, with reference to GDP per capita and the country's happiness index rating. For example, the visualisation tries to show that students in Australia receiving 6 hours of homework per week results in greater happiness (happiness index rank 11) and higher GDP per capita ($55,060.33) in the country. This visualisation is particularly targeted towards those in education, to highlight the potential effects that large amounts of homework can have on a country.
The visualisation chosen had the following three main issues:
Firstly, the most striking issue with this visualisation is its failure to convey any findings to an audience. The spiral-like layout makes it very difficult to understand, especially when comparing two countries. As stated in its title, it is trying to demonstrate how homework time correlates with GDP and happiness in different countries, but a stacked bar chart is not well suited to investigating correlation. Additionally, the outermost, middle and innermost variables on the circular stacked bar chart are not in the same order for each country, making it very cluttered and hard to read. Overall, presenting the data in this form fails both to answer whether the variables are correlated and to give the audience a clear visualisation.
Another notable issue with this visualisation is that it does not specify the timespan the data covers. This lack of context can prevent us from taking anything away from the visualisation: for all the audience knows, the data could have been sampled 70 years ago or 1 year ago. Simply specifying when the data was collected would significantly improve the value of the visualisation. Upon researching, it was found that the ‘Homework Time’ data used to construct the original visualisation was from the year 2012. From this, we can assume the visualisation was intended to represent the year 2012.
Finally, no Happiness index value is shown for ‘Macao’ in the visualisation. The missing data makes this country redundant in the overall visualisation. Macao should not be included in a visualisation whose purpose is to find correlation between variables, given that we cannot in any way compute a correlation with ‘NA’ values.
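The effect of a missing value on a correlation can be illustrated with a minimal sketch. The data frame below is a hypothetical toy example (the country labels "A" to "D" and all values are made up and are not the real data); it only shows that base R's `cor()` returns `NA` when any input is missing, and that dropping incomplete rows first gives a usable result.

```r
# Hypothetical toy data: labels and values are illustrative only
toy <- data.frame(
  Country   = c("A", "B", "C", "D"),
  Homework  = c(6, 3, 9, 5),
  Happiness = c(7.3, 6.5, 5.0, NA)   # country "D" has no happiness score
)

# With the default settings, a single NA makes the whole correlation NA
r_with_na <- cor(toy$Homework, toy$Happiness)

# Keep only rows with no missing values, then recompute
complete <- toy[complete.cases(toy), ]
r_complete <- cor(complete$Homework, complete$Happiness)

print(r_with_na)
print(r_complete)
```

An alternative under the same assumptions is `cor(toy$Homework, toy$Happiness, use = "complete.obs")`, which skips incomplete pairs without creating a second data frame.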
Reference
The following code was used to fix the issues identified in the original.
## Packages used ##
library(countrycode)
library(ggplot2)
library(ggrepel)
library(ggeasy)
library(dplyr)
library(gapminder)
setwd("C:/Users/Caleb/OneDrive/Documents/Data Visualisation 2")
## Source Data ##
#Homework Time Data
Homework_Time <- read.csv("Homework Time.csv")
#Adjust data so it is appropriate for merging
Homework_Time <- Homework_Time %>% mutate(Country = recode(Country, UAE = 'United Arab Emirates'))
Homework_Time <- Homework_Time[,(names(Homework_Time) != "Rank")]
#GDP Data
GDP <- read.csv("gdp-per-capita-maddison-2020.csv")
#Adjust data so it is appropriate for merging
GDP <- GDP[,(names(GDP) != "Code")]
GDP <- GDP %>% rename(Country = Entity)
#Happiness Data
#fileEncoding strips the UTF-8 byte-order mark that otherwise garbles the first column name
Happiness <- read.csv("Happiness.csv", fileEncoding = "UTF-8-BOM")
#Adjust data so it is appropriate for merging
names(Happiness)[1] <- "Country"
#Merge All data
df_list <- list(Homework_Time , GDP, Happiness)
Full_Data <- Reduce(function(x, y) merge(x, y, by = "Country"), df_list)
Full_Data$Continent <- countrycode(sourcevar = Full_Data[, "Country"],
                                   origin = "country.name",
                                   destination = "continent")
## Use ggplot2 and other packages to produce a better-structured data visualisation ##
Data_2012 <- Full_Data %>% filter(Year == 2012)
Plot <- Data_2012 %>%
  ggplot(aes(x = Average.Homework.Time.Per.Week, y = Happiness.Score,
             size = GDP.per.capita, color = Continent)) +
  ylab("Happiness Score") +
  xlab("Average Homework Time per Week (Hours)") +
  geom_point(alpha = 0.5) +
  # Label each point with its country; check_overlap is not a geom_text_repel argument, so it is dropped
  geom_text_repel(aes(label = Country), size = 3, colour = "black") +
  scale_size(range = c(1, 15), name = "GDP per capita") +
  ggtitle("How Homework Time Around The World Correlates With GDP and Happiness (2012)") +
  theme_minimal() +
  # linewidth replaces size for line elements in ggplot2 >= 3.4.0; use size on older versions
  theme(axis.line.x.bottom = element_line(linewidth = 1),
        axis.line.y.left = element_line(linewidth = 1))
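Because `merge()` performs an inner join by default, any country whose name is spelled differently across the source files is silently dropped from `Full_Data`. This is why `UAE` is recoded before merging. A minimal sketch of how such drops could be checked is given below, using hypothetical toy data frames (the column names `Hours` and `Score` and all values are made up for illustration and are not the real data).

```r
# Hypothetical toy frames: values are illustrative only
homework  <- data.frame(Country = c("A", "B", "UAE"),
                        Hours   = c(6, 3, 5))
happiness <- data.frame(Country = c("A", "B", "United Arab Emirates"),
                        Score   = c(7.0, 6.0, 7.1))

# Inner join by default: only countries present in both frames survive
merged <- merge(homework, happiness, by = "Country")

# Countries that were in the homework data but were lost in the merge
dropped <- setdiff(homework$Country, merged$Country)
print(dropped)
```

Running a check like this on the real data frames before plotting would show whether any further names need recoding.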
Data Reference
Chepkemoi, J. (2017). Countries Who Spend The Most Time Doing Homework. Retrieved April 24, 2022, from World Atlas website: https://www.worldatlas.com/articles/countries-who-spend-the-most-time-doing-homework.html
World Happiness by Country - Happiness Country Rankings 2010-2012. (2014). Retrieved April 24, 2022, from Countries of the World website: https://photius.com/rankings/happiness_country_rankings_2012.html
GDP per capita, 1820 to 2018. (n.d.). Retrieved April 24, 2022, from Our World in Data website: https://ourworldindata.org/grapher/gdp-per-capita-maddison-2020
The following plot fixes the main issues in the original.