Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
Explain the objective of the original data visualisation and the targeted audience.
The visualisation chosen had the following three main issues:
Issue 1: Wrong choice of plot. The original uses a histogram, which does not directly show how the number of board games published each year changes from 1980 to 2020. A scatter plot paired with a trendline answers the question more clearly and precisely for the audience.
Issue 2: Too much information. The large number of y-axis labels makes it hard to identify the peak year, that is, the year with the highest number of board games published. Labelling the peak year directly on the plot avoids this problem.
Issue 3: Colour problem. The author uses a single colour to show the trend, which makes the plot look monotonous.
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(readxl)
library(dplyr)
BGGdata <- read_excel("BGG_Data_Set.xlsx")
print(names(BGGdata))
## [1] "ID" "Name" "Year Published"
## [4] "Min Players" "Max Players" "Play Time"
## [7] "Min Age" "Users Rated" "Rating Average"
## [10] "BGG Rank" "Complexity Average" "Owned Users"
## [13] "Mechanics" "Domains"
class(BGGdata$`Year Published`)
## [1] "character"
# Convert the year column from character to numeric so it can be filtered and plotted
BGGdata$`Year Published` <- as.numeric(BGGdata$`Year Published`)
class(BGGdata$`Year Published`)
## [1] "numeric"
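# Keep games published between 1980 and 2020 that satisfy the rank filter
pctdata <- BGGdata %>% filter(`Year Published` >= 1980 & `Year Published` <= 2020 & `BGG Rank` < 10000)
# Store the total before grouping; it is the denominator for the yearly percentages
total_games <- nrow(pctdata)
# Plain assignment replaces %<>%, which comes from magrittr and is not exported by dplyr
pctdata <- pctdata %>% group_by(`Year Published`) %>%
summarise(Count = n(), `Percentage%` = n() / total_games * 100)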
peakdata <- pctdata %>% arrange(desc(`Percentage%`)) %>% head(1)
p1 <- ggplot(pctdata, aes(x = `Year Published`, y = `Percentage%`)) +
geom_point(color = "black", size = 4, alpha = 0.5) +
stat_smooth(method="lm", formula = y ~ poly(x, 10), se=FALSE, size=1.4) +
theme_classic() +
xlab("Year Published") + ylab("Percentage (%)") +
scale_y_continuous(limits = c(0,10), expand = c(0,0.1)) +
# Use the one-row peakdata so the label is drawn once, not once per row of pctdata
geom_text(data = peakdata, aes(x = `Year Published`, y = `Percentage%`),
label = "peak year: 2017\nvalue: 8.39%", vjust = -0.3) +
theme(plot.title = element_text(hjust = 0.5, size = 12, margin = margin(b = 20)),
axis.title = element_text(size = 10, margin = margin(t = 20)),
axis.text = element_text(size = 10),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(color = "black", size = 0.5)) +
labs(title = "Trendline of number of games published across the period of years 1980-2020\nas a percentage of the total number of games (9,504 games) in the top 10,000\nranked games published over this period.", subtitle = NULL, caption = NULL)
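The peak-year label in the code above is typed in by hand, so it would go stale if the data or the filter changed. As a minimal sketch, the label text can instead be derived from the data. The example below uses base R only and a small made-up vector of years (not the BGG dataset); the names years, pct, peak and peak_label are illustrative assumptions rather than part of the original code.

```r
# Toy data: hypothetical publication years, NOT the BGG dataset
years <- c(2015, 2016, 2016, 2017, 2017, 2017, 2018, 2018)

# Count games per year and express each count as a percentage of all games
counts <- table(years)
pct <- data.frame(
  year = as.integer(names(counts)),
  percentage = as.numeric(counts) / length(years) * 100
)

# Select the row with the largest share
peak <- pct[which.max(pct$percentage), ]

# Build the annotation text from the data instead of typing it by hand
peak_label <- sprintf("peak year: %d\nvalue: %.2f%%", peak$year, peak$percentage)
cat(peak_label, "\n")
```

A string built this way could be passed as the label argument of geom_text in place of the fixed text, so the annotation follows the data.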
Data Reference
Dilini Samarasinghe, July 5, 2021, “BoardGameGeek Dataset on Board Games”, IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.
website: https://ieee-dataport.org/open-access/boardgamegeek-dataset-board-games
The following plot fixes the main issues in the original.