Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Microsiervos (2007).


Objective

The aim of the visualization was to show how often each number was drawn in the Primitive Lottery and the Bonoloto in Spain from 1988 to 2006. Its objective was to determine whether there was any trend between the winning (drawn) lottery numbers and the year. The target audience was prospective lottery players.

The visualisation chosen had the following three main issues:

  • There was no legend. One cannot determine how many numbers (variables) are in the dataset, nor which colour represents which winning number.
  • The visualization is cluttered and cannot be interpreted visually (no story can be told from it).
  • The web-like structure (black crossed lines with a percentage scale) cannot visually depict percentage. It is extremely difficult to determine the proportion of a given number per year, as the colour representing that number cannot be deciphered.

Reference

Code

The following code was used to fix the issues identified in the original.

# Load readr and ggplot2, then read the dataset into `lotto`
library(readr)
library(ggplot2)
lotto <- read_csv("~/lotto.csv")
# Convert year and numbers to factors so ggplot2 treats them as discrete categories
lotto$year <- as.factor(lotto$year)
lotto$numbers <- as.factor(lotto$numbers)
str(lotto)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 760 obs. of  4 variables:
##  $ X1        : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ year      : Factor w/ 19 levels "year1988","year1989",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ numbers   : Factor w/ 40 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Proportion: num  0.0216 0.0278 0.0278 0.0463 0.0278 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   year = col_character(),
##   ..   numbers = col_double(),
##   ..   Proportion = col_double()
##   .. )
# Stacked horizontal bars: one bar per year, one coloured segment per lottery number
p1 <- ggplot(data = lotto, aes(x = year, y = Proportion, fill = numbers)) +
  geom_bar(stat = "identity", position = "stack", width = 0.7, colour = "black", lwd = 0.2) +
  coord_flip() +
  labs(title = "Proportion of Winning Lottery Numbers Per Year",
       y = "Proportion of Winning Lottery Numbers") +
  theme(legend.position = "right", legend.direction = "vertical")
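
A stacked bar per year is easiest to read when the segments in each bar add up to 1. The sketch below shows one way to check that before plotting. It uses a small toy data frame invented for illustration (three years, four numbers, made-up counts), not the real lotto.csv, and it assumes that Proportion means a number's share of the draws within its year, which the original data does not confirm. The names `toy`, `count` and `year_totals` are hypothetical.

```r
# Toy data, invented for illustration only: 3 years x 4 lottery numbers.
# expand.grid varies the first argument fastest, so the rows run
# number 1..4 for year1988, then 1..4 for year1989, and so on.
toy <- expand.grid(numbers = 1:4,
                   year = c("year1988", "year1989", "year1990"))
toy$count <- c(5, 7, 6, 6,
               8, 4, 6, 6,
               3, 9, 6, 6)

# Assumed definition: a number's share of all draws within its year.
toy$Proportion <- ave(toy$count, toy$year, FUN = function(x) x / sum(x))

# Same factor conversion as in the plotting code.
toy$year <- as.factor(toy$year)
toy$numbers <- as.factor(toy$numbers)

# Sum the proportions within each year; every total should be 1
# (within floating-point tolerance) if the assumption holds.
year_totals <- tapply(toy$Proportion, toy$year, sum)
stopifnot(all(abs(year_totals - 1) < 1e-8))
```

Running the same `tapply` call on `lotto$Proportion` and `lotto$year` would show whether the real bars reach 1. If they do not, the bars would have different lengths and the y-axis label would need to say what the proportion is relative to.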

Data Reference

Reconstruction

The following plot fixes the main issues in the original.