Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


(Curtice, 2019) Graph showing the top 6 important issues to voters of the 2019 general election in the UK, on date ranges 30 Oct - 2 Nov and 27 Nov - 30 Nov, surveyed by 3 different polling agencies (Deltapoll, Opinium and Panelbase).
Source: Curtice (2019).


Objective

This graph is the first of two in an article by Sir John Curtice, published by the BBC, that attempts to discern whether the divide caused by the announcement of Brexit in 2016 influences how the population will vote in the 2019 UK general election.

The first goal of the data visualisation made by Curtice (2019) was to show how important an issue Brexit was compared with the other common issues that concerned voters in the 2019 UK general election. The second goal was to observe whether the importance of this issue to voters changed towards the end of the election.

The target audience was the general public of the UK, particularly citizens interested in the likely outcome of the general election. Internationally, people interested in world politics could also be considered part of the target audience.

The visualisation chosen had the following three main issues:

  • Inappropriate scaling of the y-axis: Considering that a goal of the visualisation was to compare how the importance of Brexit (as well as other issues) to voters changed, the visualisation failed to show this change because the scale intervals were too big compared to the actual change in percentage. Furthermore, the bars that were meant to be compared are apart from each other, with other bars in between, making it difficult to see whether there was any change at all.

  • Lack of Information - Dates: If the graph were to stand alone (without its source), the audience would not be able to decipher when exactly ‘4 weeks ago’ or ‘Now’ is. Even with the sources, which do give date ranges, the audience would not be able to figure out which year the data refers to, and readers have to look down and away from the graph to get the dates right. Furthermore, the graph does not show what happens to voters' opinions between the beginning and end of the election, leaving space for assumption or bias.

  • Unclear what the y-axis represents: There is no context for what the y-axis represents, as it lacks an axis label. Moreover, the percentages are not explained in the article containing the graph. Do the percentages represent the UK population? The number of eligible voters? Leaving the variable undefined undermines the integrity of the data and may lead readers to misinterpret it or to underestimate crucial data.

While there are many other issues in the data visualisation, the reconstruction will focus on the three main issues above.

Reference

Code

The following code was used to fix the issues identified in the original.

#Add necessary packages for the assignment
library(ggplot2)
library(dplyr)
library(stringr)
library(scales)

#Adjust height and width of plot for publication
knitr::opts_chunk$set(fig.height=12)
knitr::opts_chunk$set(fig.width=9)

#Read the results of the three surveys, which were extracted into CSV files

deltapoll <- read.csv("Deltapoll.csv", fileEncoding="UTF-8-BOM")
opinium <- read.csv("Opinium.csv", fileEncoding="UTF-8-BOM")
panelbase <- read.csv("Panelbase.csv", fileEncoding="UTF-8-BOM")

#The date ranges of the weekly surveys differ between agencies, so each range is replaced with the date of the Friday of the week in which the survey was conducted and reported. This keeps the dates consistent across agencies in the graph.
#Deltapoll
deltapoll$Date.Range <- str_replace(deltapoll$Date.Range, "31 Oct - 2 Nov", "1 November")
deltapoll$Date.Range <- str_replace(deltapoll$Date.Range, "6 Nov - 9 Nov", "8 November")
deltapoll$Date.Range <- str_replace(deltapoll$Date.Range, "14 Nov - 16 Nov", "15 November")
deltapoll$Date.Range <- str_replace(deltapoll$Date.Range, "21 Nov - 23 Nov", "22 November")
deltapoll$Date.Range <- str_replace(deltapoll$Date.Range, "28 Nov - 30 Nov", "29 November")

#Opinium
opinium$Date.Range <- str_replace(opinium$Date.Range, "30 Oct - 1 Nov", "1 November")
opinium$Date.Range <- str_replace(opinium$Date.Range, "6 Nov - 8 Nov", "8 November")
opinium$Date.Range <- str_replace(opinium$Date.Range, "13 Nov - 15 Nov", "15 November")
opinium$Date.Range <- str_replace(opinium$Date.Range, "20 Nov - 22 Nov", "22 November")
opinium$Date.Range <- str_replace(opinium$Date.Range, "27 Nov - 29 Nov", "29 November")

#Panelbase
panelbase$Date.Range <- str_replace(panelbase$Date.Range, "30 Oct - 31 Oct", "1 November")
panelbase$Date.Range <- str_replace(panelbase$Date.Range, "6 Nov - 8 Nov", "8 November")
panelbase$Date.Range <- str_replace(panelbase$Date.Range, "13 Nov - 14 Nov", "15 November")
panelbase$Date.Range <- str_replace(panelbase$Date.Range, "20 Nov - 22 Nov", "22 November")
panelbase$Date.Range <- str_replace(panelbase$Date.Range, "27 Nov - 29 Nov", "29 November")
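
The replacements above could also be driven by a lookup table. The sketch below is one possible alternative, not the code used for the reconstruction: `friday_lookup` and `to_friday` are hypothetical names, and the input vector is made up to mirror the Deltapoll labels. It matches whole labels with a named character vector, so a label that is missing from the table comes back as NA instead of passing through unchanged.

```r
# Hypothetical lookup: survey date range -> Friday of that survey week.
# The ranges are copied from the Deltapoll replacements above.
friday_lookup <- c(
  "31 Oct - 2 Nov"  = "1 November",
  "6 Nov - 9 Nov"   = "8 November",
  "14 Nov - 16 Nov" = "15 November",
  "21 Nov - 23 Nov" = "22 November",
  "28 Nov - 30 Nov" = "29 November"
)

# Hypothetical helper: map each range label to its Friday label.
# unname() drops the range names that the indexing carries over.
to_friday <- function(date_range) {
  unname(friday_lookup[date_range])
}

# Made-up input mirroring the Date.Range column; the last label is
# deliberately absent from the lookup table.
example_ranges <- c("31 Oct - 2 Nov", "28 Nov - 30 Nov", "1 Dec - 3 Dec")
example_fridays <- to_friday(example_ranges)
print(example_fridays)
```

Each agency would need its own table under this approach, because the same Friday corresponds to different ranges for Deltapoll, Opinium and Panelbase.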

#Label each data set with its polling agency and combine them vertically
deltapoll$Agency <- "Deltapoll"
opinium$Agency <- "Opinium"
panelbase$Agency <- "Panelbase"


poll_results <- rbind(deltapoll, opinium, panelbase)


# Rename "Date range" to "Date" as it is no longer a range of dates

names(poll_results)[names(poll_results) == 'Date.Range'] <- 'Date'
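
Because the plot converts Date into a factor with five fixed levels, any label that was not rewritten to a Friday turns into NA. A quick check before plotting can catch this. The sketch below uses made-up miniature tables (`deltapoll_demo`, `opinium_demo` and their values are hypothetical; only the column names follow the code above) and lists the rows whose Date is not one of the five Fridays.

```r
# Hypothetical miniature agency tables; the percentages are made up
deltapoll_demo <- data.frame(
  Issue = c("Brexit", "Health"),
  Percentage = c(50, 40),
  Date.Range = c("1 November", "1 November")
)
opinium_demo <- data.frame(
  Issue = c("Brexit", "Health"),
  Percentage = c(45, 42),
  Date.Range = c("1 November", "3 Nov - 5 Nov")  # deliberately left unmapped
)

# Label and stack the tables, then rename the column, as in the code above
deltapoll_demo$Agency <- "Deltapoll"
opinium_demo$Agency <- "Opinium"
demo_results <- rbind(deltapoll_demo, opinium_demo)
names(demo_results)[names(demo_results) == "Date.Range"] <- "Date"

# The five Friday labels used as factor levels in the plot
fridays <- c("1 November", "8 November", "15 November", "22 November", "29 November")

# Rows whose Date is not one of the expected Fridays
unmapped <- demo_results[!(demo_results$Date %in% fridays), ]
print(unmapped)
```

An empty result for the real `poll_results` would indicate that every date range was replaced.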

# Create Data Visualization
poll_graph <- ggplot(data = poll_results,
                     aes(x = factor(Date, levels = c("1 November", "8 November", "15 November", "22 November", "29 November")),
                         y = Percentage,
                         group = Issue)) +
  geom_line(aes(color = Issue), size = 1) +
  geom_point(aes(color = Issue), size = 3) +
  facet_grid(Agency ~ .)

# Aesthetic Improvements for Data Visualisation: Colours, Labels, x and y axis adjustment
final_graph <- poll_graph +
  scale_color_manual(values = c("Brexit" = "#e41a1c", "Health" = "#377eb8", "Crime" = "#f781bf",
                                "Economy" = "#4daf4a", "Environment" = "#984ea3", "Immigration" = "#ff7f00")) +
  labs(title = "Most Important Issues To Voters Of The 2019 UK General Election",
       subtitle = "Based on the percentage of the number of voters who participated in surveys from \nthree polling agencies. Each voter was allowed to pick up to three issues.",
       x = "Date (Every Friday of the month of the election)",
       y = "Percentage Of Survey Sample Of Voters (%)") +
  scale_y_continuous(breaks = seq(0, 75, by = 10))
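
The y-axis breaks are generated with `seq(0, 75, by = 10)`. As a small base R check of what that call requests (the name `y_breaks` is only for illustration), the sequence can be printed on its own:

```r
# Tick positions requested by scale_y_continuous(breaks = seq(0, 75, by = 10))
y_breaks <- seq(0, 75, by = 10)
print(y_breaks)

# The sequence stops at the last value that does not exceed 75
print(max(y_breaks))
```

The ticks therefore fall every 10 percentage points from 0 to 70, and 75 itself is not a tick. The breaks only set where ticks are drawn; the axis limits are still taken from the data unless they are set separately.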

Data Reference

Reconstruction

The following plot fixes the main issues in the original.