Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: Wikipedia 2020.


Objective

The visualisation was created based on the interviews and tracking surveys from the adults including likely voters and registered voters. This allows the targeted audience - adults with voting rights visually compare the approval and disapproval rating for Donald Trump Presidential Approval Ratings. There are four variables which are Date, Approve%, Disapprove% and No opinion% in the original dataset. In order to create less distraction for audience, I have excluded the no opinion% variable from the data in my reconstructed visualisation, so it helps audience to focus on the comparison of Donald Trump presidential approval ratings from 20th January 2017 to 13th September 2020.

The visualisation chosen had the following three main issues:

  • Misleading (biased) visualisation - According to the original data source from Gallup, the trend of original visualisation is not representing the data. One of the reasons for this claim is that the chains become less dense after 2018 which provides less genuineness of data.
  • Data Integrity and Transparency - Considering the type of data we are observing, sensitivity and accuracy of these data are crucial to the future of a country. It is our responsibility to uphold the integrity of data in the message we are trying to convey. The visualisation was found on the wikipedia. However, the author did not identify the source in the visualisation. After further investigation, the data came from Gallup which they have reported on presidential job approval since 1938. The transparency of data has to be improved by showing sufficient information without tampering the data.
  • Graphic Integrity - The titles and legends are missing which caused confusion to audience as to what the interest of variables and purpose are. Preattentive processing in leveraging with colour was not done well. The intent of the visualisation was to compare between disapproval and approval ratings, not to show unclear information with excessively rich colours which might cause distraction and misleading information to audience.

Reference

Code

The following code was used to fix the issues identified in the original.

library(ggplot2)
library(dplyr)
library(tidyr)
library(stringi)
library(readr)
theme_set(theme_light()) # This is tweak the display of an existing theme which directs more attention towards the data.


# Read data
Poll <- read_csv("gallup_poll.csv")

# In this case, we do not need no opinion% variable to present the time series of Donald Trump's approval rating so we filter it out with select function
Trumppoll <- select(Poll, Date, "Approve %", "Disapprove %")

Trumppoll$ID <- 1:nrow(Trumppoll) # Generate New ID for inner join later

# Data Prepocessing for Date variable to - Trumppoll1 table
Poll_yr <- Trumppoll$Date %>% substr(1, 4)

Trumppoll1 <- Trumppoll %>% separate(Date, c("Left", "right"), sep = "-")

Trumppoll1$Date <- paste(Poll_yr, Trumppoll1$right, sep=" ")

Trumppoll1 <- select(Trumppoll1, Date, "Approve %", "Disapprove %")

Trumppoll1$Date <- as.Date(Trumppoll1$Date, format="%Y %b %d")

Trumppoll1.narm <- Trumppoll1[!is.na(Trumppoll1$Date), ]

# Date Preprocessing for Date variable - Trumppoll Table
Poll_yrmo <- Trumppoll$Date %>% substr(1, 8)

Poll <- Trumppoll %>% separate(Date, c("Left", "right"), sep = "-")

as.numeric(Poll$right)
##   [1] NA NA 23 30 NA 13 28 14 22 13 28 16 29 15 15 14 31 13 30 15 30 14 31 12 30
##  [26] 16 30 12 30  9 10 28 10 27 10 22 16  9 NA 25 18 11 NA 28 21 14  7 30 23 16
##  [51]  9 NA 26 19 12 NA 29 22 15  8 NA 24 17 10 NA 27 20 13 NA 29 22 15  8 NA 25
##  [76] 18 11 NA 25 18 11 NA 28 21 14  7 31 24 17 10 NA 26 19 12 NA 29 22 15  8 NA
## [101] 24 17 10 NA 27 20 13 NA 30 23 16  9 NA 25 18 11 NA 28 21 14  7 30 23 16  9
## [126] NA 26 19 12 NA 26 19 12 NA 29
Poll$Date <- paste(Poll_yrmo, Poll$right, sep=" ")

Poll$Date <- as.Date(Poll$Date, format="%Y %b %d")

Poll <- select(Poll, Date, "Approve %", "Disapprove %")

Poll.narm <- Poll[!is.na(Poll$Date), ]

Poll.narm
## # A tibble: 112 x 3
##    Date       `Approve %` `Disapprove %`
##    <date>     <chr>       <chr>         
##  1 2020-07-23 41%         56%           
##  2 2020-06-30 38%         57%           
##  3 2020-05-13 49%         48%           
##  4 2020-04-28 49%         47%           
##  5 2020-04-14 43%         54%           
##  6 2020-03-22 49%         45%           
##  7 2020-03-13 44%         52%           
##  8 2020-02-28 47%         51%           
##  9 2020-02-16 49%         48%           
## 10 2020-01-29 49%         50%           
## # ... with 102 more rows
# Combine two tables with removed NA values
Poll_final <- bind_rows(Poll.narm, Trumppoll1.narm) 

View(Poll_final)

# Data Prepocessing and other Data Type Coercion for Poll_final
Poll_final$`Approve %` <- gsub("%", "", Poll_final$`Approve %`)

Poll_final$`Approve %` <- as.numeric(Poll_final$`Approve %`)

as.numeric(Poll_final$`Approve %`)
##   [1] 41 38 49 49 43 49 44 47 49 49 44 45 43 41 39 40 43 39 41 42 44 41 43 40 42
##  [26] 46 45 39 43 44 37 37 39 38 40 38 43 38 40 44 44 43 42 40 38 40 41 42 39 40
##  [51] 42 43 41 41 45 42 40 42 43 42 38 39 41 39 40 39 39 37 40 38 36 38 37 39 37
##  [76] 35 36 37 38 38 35 36 37 38 38 38 37 35 37 36 38 37 39 38 39 38 37 41 38 38
## [101] 42 41 41 40 40 39 40 42 42 40 41 45 42 42 39 40 40 41 41 42 41 42 39 39 40
## [126] 35 38 37 36 37 39 38 38 43 43
Poll_final$`Disapprove %` <- gsub("%", "", Poll_final$`Disapprove %`)

Poll_final$`Disapprove %` <- as.numeric(Poll_final$`Disapprove %`)

as.numeric(Poll_final$`Disapprove %`)
##   [1] 56 57 48 47 54 45 52 51 48 50 53 51 54 57 57 56 54 57 54 54 51 54 55 55 52
##  [26] 50 51 57 54 52 59 59 55 57 56 60 53 56 54 50 51 53 53 56 56 54 54 52 56 55
##  [51] 54 52 56 55 50 54 55 54 52 53 57 55 54 55 56 56 56 59 57 57 59 57 58 55 57
##  [76] 60 59 56 57 56 60 58 57 56 55 57 57 60 58 58 57 58 56 57 56 57 58 54 56 56
## [101] 53 54 52 54 53 56 55 52 53 54 53 47 56 55 57 56 54 53 54 53 55 52 56 55 57
## [126] 59 58 58 59 58 56 56 57 51 52
Poll <- Poll_final %>% gather(`Approve %`, `Disapprove %`, key = "variable", value = "value")

str(Poll)
## Classes 'tbl_df', 'tbl' and 'data.frame':    270 obs. of  3 variables:
##  $ Date    : Date, format: "2020-07-23" "2020-06-30" ...
##  $ variable: chr  "Approve %" "Approve %" "Approve %" "Approve %" ...
##  $ value   : num  41 38 49 49 43 49 44 47 49 49 ...
# plot
plot <- ggplot(Poll, aes(x = Date, y = value)) + 
  geom_line(aes(color = variable)) + 
  labs(title="Time Series of Donald Trump Approval Rating", 
       subtitle="20th January 2017 - 13th September 2020", 
       caption="Source: Gallup 2020", y="Disapproval vs Approval Ratings %")+
  scale_color_manual(values = c("#00ba38", "#f8766d"))+
  theme(legend.title=element_blank())

Data Reference

Reconstruction

The following plot fixes the main issues in the original.