Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.
Objective
The visualisation was created based on the interviews and tracking surveys from the adults including likely voters and registered voters. This allows the targeted audience - adults with voting rights visually compare the approval and disapproval rating for Donald Trump Presidential Approval Ratings. There are four variables which are Date, Approve%, Disapprove% and No opinion% in the original dataset. In order to create less distraction for audience, I have excluded the no opinion% variable from the data in my reconstructed visualisation, so it helps audience to focus on the comparison of Donald Trump presidential approval ratings from 20th January 2017 to 13th September 2020.
The visualisation chosen had the following three main issues:
Reference
The following code was used to fix the issues identified in the original.
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringi)
library(readr)
theme_set(theme_light()) # This is tweak the display of an existing theme which directs more attention towards the data.
# Read data
Poll <- read_csv("gallup_poll.csv")
# In this case, we do not need no opinion% variable to present the time series of Donald Trump's approval rating so we filter it out with select function
Trumppoll <- select(Poll, Date, "Approve %", "Disapprove %")
Trumppoll$ID <- 1:nrow(Trumppoll) # Generate New ID for inner join later
# Data Prepocessing for Date variable to - Trumppoll1 table
Poll_yr <- Trumppoll$Date %>% substr(1, 4)
Trumppoll1 <- Trumppoll %>% separate(Date, c("Left", "right"), sep = "-")
Trumppoll1$Date <- paste(Poll_yr, Trumppoll1$right, sep=" ")
Trumppoll1 <- select(Trumppoll1, Date, "Approve %", "Disapprove %")
Trumppoll1$Date <- as.Date(Trumppoll1$Date, format="%Y %b %d")
Trumppoll1.narm <- Trumppoll1[!is.na(Trumppoll1$Date), ]
# Date Preprocessing for Date variable - Trumppoll Table
Poll_yrmo <- Trumppoll$Date %>% substr(1, 8)
Poll <- Trumppoll %>% separate(Date, c("Left", "right"), sep = "-")
as.numeric(Poll$right)
## [1] NA NA 23 30 NA 13 28 14 22 13 28 16 29 15 15 14 31 13 30 15 30 14 31 12 30
## [26] 16 30 12 30 9 10 28 10 27 10 22 16 9 NA 25 18 11 NA 28 21 14 7 30 23 16
## [51] 9 NA 26 19 12 NA 29 22 15 8 NA 24 17 10 NA 27 20 13 NA 29 22 15 8 NA 25
## [76] 18 11 NA 25 18 11 NA 28 21 14 7 31 24 17 10 NA 26 19 12 NA 29 22 15 8 NA
## [101] 24 17 10 NA 27 20 13 NA 30 23 16 9 NA 25 18 11 NA 28 21 14 7 30 23 16 9
## [126] NA 26 19 12 NA 26 19 12 NA 29
Poll$Date <- paste(Poll_yrmo, Poll$right, sep=" ")
Poll$Date <- as.Date(Poll$Date, format="%Y %b %d")
Poll <- select(Poll, Date, "Approve %", "Disapprove %")
Poll.narm <- Poll[!is.na(Poll$Date), ]
Poll.narm
## # A tibble: 112 x 3
## Date `Approve %` `Disapprove %`
## <date> <chr> <chr>
## 1 2020-07-23 41% 56%
## 2 2020-06-30 38% 57%
## 3 2020-05-13 49% 48%
## 4 2020-04-28 49% 47%
## 5 2020-04-14 43% 54%
## 6 2020-03-22 49% 45%
## 7 2020-03-13 44% 52%
## 8 2020-02-28 47% 51%
## 9 2020-02-16 49% 48%
## 10 2020-01-29 49% 50%
## # ... with 102 more rows
# Combine two tables with removed NA values
Poll_final <- bind_rows(Poll.narm, Trumppoll1.narm)
View(Poll_final)
# Data Prepocessing and other Data Type Coercion for Poll_final
Poll_final$`Approve %` <- gsub("%", "", Poll_final$`Approve %`)
Poll_final$`Approve %` <- as.numeric(Poll_final$`Approve %`)
as.numeric(Poll_final$`Approve %`)
## [1] 41 38 49 49 43 49 44 47 49 49 44 45 43 41 39 40 43 39 41 42 44 41 43 40 42
## [26] 46 45 39 43 44 37 37 39 38 40 38 43 38 40 44 44 43 42 40 38 40 41 42 39 40
## [51] 42 43 41 41 45 42 40 42 43 42 38 39 41 39 40 39 39 37 40 38 36 38 37 39 37
## [76] 35 36 37 38 38 35 36 37 38 38 38 37 35 37 36 38 37 39 38 39 38 37 41 38 38
## [101] 42 41 41 40 40 39 40 42 42 40 41 45 42 42 39 40 40 41 41 42 41 42 39 39 40
## [126] 35 38 37 36 37 39 38 38 43 43
Poll_final$`Disapprove %` <- gsub("%", "", Poll_final$`Disapprove %`)
Poll_final$`Disapprove %` <- as.numeric(Poll_final$`Disapprove %`)
as.numeric(Poll_final$`Disapprove %`)
## [1] 56 57 48 47 54 45 52 51 48 50 53 51 54 57 57 56 54 57 54 54 51 54 55 55 52
## [26] 50 51 57 54 52 59 59 55 57 56 60 53 56 54 50 51 53 53 56 56 54 54 52 56 55
## [51] 54 52 56 55 50 54 55 54 52 53 57 55 54 55 56 56 56 59 57 57 59 57 58 55 57
## [76] 60 59 56 57 56 60 58 57 56 55 57 57 60 58 58 57 58 56 57 56 57 58 54 56 56
## [101] 53 54 52 54 53 56 55 52 53 54 53 47 56 55 57 56 54 53 54 53 55 52 56 55 57
## [126] 59 58 58 59 58 56 56 57 51 52
Poll <- Poll_final %>% gather(`Approve %`, `Disapprove %`, key = "variable", value = "value")
str(Poll)
## Classes 'tbl_df', 'tbl' and 'data.frame': 270 obs. of 3 variables:
## $ Date : Date, format: "2020-07-23" "2020-06-30" ...
## $ variable: chr "Approve %" "Approve %" "Approve %" "Approve %" ...
## $ value : num 41 38 49 49 43 49 44 47 49 49 ...
# plot
plot <- ggplot(Poll, aes(x = Date, y = value)) +
geom_line(aes(color = variable)) +
labs(title="Time Series of Donald Trump Approval Rating",
subtitle="20th January 2017 - 13th September 2020",
caption="Source: Gallup 2020", y="Disapproval vs Approval Ratings %")+
scale_color_manual(values = c("#00ba38", "#f8766d"))+
theme(legend.title=element_blank())
Data Reference
The following plot fixes the main issues in the original.