Tea Break R
A quick tutorial on Presidential Approval Rating
So today (Nov 6th, 2018) is Midterm Election Day. I can’t think of a more fitting event to do a quick R tutorial to show you how to visualize Presidential Approval Rating with just a few lines of code.
In this tutorial, I will show you how to compare the Presidential Approval Rating for President Trump and Obama during the first two years of their Presidency. I obtained data from the American Presidency Project.
All you have to do is to download the two datasets for President Trump and President Obama. The approval rating shows the percentage of respondents to the opinion poll conducted by Gallup who approve of the sitting President.
References
library(tidyverse) # For data wrangling and data manipulation
## Warning: package 'tidyverse' was built under R version 3.4.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.7
## ✔ tidyr 0.8.0 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'tidyr' was built under R version 3.4.3
## Warning: package 'purrr' was built under R version 3.4.4
## Warning: package 'dplyr' was built under R version 3.4.4
## Warning: package 'forcats' was built under R version 3.4.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate) # For date manipulation
## Warning: package 'lubridate' was built under R version 3.4.4
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
trump = read.csv('trump_approval.csv')
# Data structure
glimpse(trump)
## Observations: 90
## Variables: 5
## $ Start.Date <fct> 10/08/2018, 10/01/2018, 09/24/2018, 09/17/2018, ...
## $ End.Date <fct> 10/14/2018, 10/07/2018, 09/30/2018, 09/23/2018, ...
## $ Approving <int> 44, 43, 42, 40, 38, 40, 41, 41, 42, 39, 41, 40, ...
## $ Disapproving <int> 51, 53, 53, 56, 56, 54, 53, 54, 52, 56, 54, 55, ...
## $ Unsure.NoData <int> 5, 4, 5, 4, 6, 6, 6, 5, 6, 5, 5, 5, 4, 5, 3, 5, ...
trump$Start.Date = mdy(trump$Start.Date) # change 'Start.Date' to 'Date' format
## Warning in as.POSIXlt.POSIXct(x, tz): unknown timezone 'zone/tz/2018g.1.0/
## zoneinfo/America/Chicago'
trump$End.Date = mdy(trump$End.Date) # change 'End.Date' to 'Date' format
trump = trump %>%
arrange(-desc(Start.Date)) # arrange dates in ascending order
## Warning: package 'bindrcpp' was built under R version 3.4.4
glimpse(trump) # take a look at the new dataset
## Observations: 90
## Variables: 5
## $ Start.Date <date> 2017-01-23, 2017-01-30, 2017-02-06, 2017-02-13,...
## $ End.Date <date> 2017-01-29, 2017-02-05, 2017-02-12, 2017-02-19,...
## $ Approving <int> 45, 43, 41, 40, 42, 43, 42, 40, 39, 38, 40, 40, ...
## $ Disapproving <int> 47, 52, 53, 54, 53, 51, 52, 55, 56, 57, 53, 54, ...
## $ Unsure.NoData <int> 8, 5, 6, 6, 5, 6, 6, 5, 5, 5, 7, 6, 7, 5, 5, 6, ...
summary(trump)
## Start.Date End.Date Approving Disapproving
## Min. :2017-01-23 Min. :2017-01-29 Min. :35.00 Min. :47.00
## 1st Qu.:2017-06-27 1st Qu.:2017-07-03 1st Qu.:38.00 1st Qu.:54.00
## Median :2017-11-30 Median :2017-12-06 Median :39.00 Median :56.00
## Mean :2017-11-30 Mean :2017-12-06 Mean :39.37 Mean :55.41
## 3rd Qu.:2018-05-05 3rd Qu.:2018-05-11 3rd Qu.:41.00 3rd Qu.:57.00
## Max. :2018-10-08 Max. :2018-10-14 Max. :45.00 Max. :60.00
## Unsure.NoData
## Min. :3.000
## 1st Qu.:5.000
## Median :5.000
## Mean :5.222
## 3rd Qu.:6.000
## Max. :8.000
obama_full = read.csv('obama_approval.csv') # read the dataset in
obama_full$Start.Date = mdy(obama_full$Start.Date) # format date-related columns
obama_full$End.Date = mdy(obama_full$End.Date) # format date-related columns
glimpse(obama_full) # take a look at the data structure
## Observations: 2,787
## Variables: 5
## $ Start.Date <date> 2017-01-17, 2017-01-15, 2017-01-14, 2017-01-13,...
## $ End.Date <date> 2017-01-19, 2017-01-18, 2017-01-17, 2017-01-15,...
## $ Approving <int> 59, 58, 57, 57, 57, 58, 57, 57, 55, 56, 56, 56, ...
## $ Disapproving <int> 37, 38, 39, 38, 39, 37, 40, 40, 42, 40, 41, 40, ...
## $ Unsure.NoData <int> 4, 4, 4, 5, 4, 5, 3, 3, 3, 0, 3, 4, 3, 2, 3, 4, ...
glimpse(obama_full)
## Observations: 618
## Variables: 5
## $ Start.Date <date> 2009-01-23, 2009-01-24, 2009-01-25, 2009-01-26,...
## $ End.Date <date> 2009-01-25, 2009-01-26, 2009-01-27, 2009-01-28,...
## $ Approving <int> 67, 65, 64, 64, 66, 67, 67, 66, 66, 65, 65, 63, ...
## $ Disapproving <int> 14, 15, 16, 17, 17, 17, 18, 19, 19, 20, 20, 21, ...
## $ Unsure.NoData <int> 19, 20, 20, 19, 17, 16, 15, 15, 15, 15, 15, 16, ...
obama = obama_full[seq(1, nrow(obama_full), 7),] # get data for every 7 days
rownames(obama) = seq(length = nrow(obama)) # reset index
Extract the average approval rating for Obama
obama_mean_approval =
obama %>%
select(Start.Date, Approving) %>%
summarize(mean_approval = mean(Approving))
obama_mean_approval
Extract the average approval rating for Trump
trump_mean_approval =
trump %>%
select(Start.Date, Approving) %>%
summarize(mean_approval = mean(Approving))
trump_mean_approval
Combine ‘obama_mean_approval’ and ‘trump_mean_approval’
mean_approval = rbind(obama_mean_approval, trump_mean_approval) # bind the two dataframes by row
mean_approval$President = c('Obama', 'Trump') # rename the two columns
glimpse(mean_approval) # take a look at the data structure of the new dataframe
## Observations: 2
## Variables: 2
## $ mean_approval <dbl> 52.83146, 39.36667
## $ President <chr> "Obama", "Trump"
Plot a bar plot to compare the average approval rating for two Presidents
ggplot(mean_approval,
aes(x = President, y = mean_approval)) +
geom_bar(stat = 'identity', width = 0.4, fill = 'chocolate') + # set width & color
ggtitle('Average Presidential Approval Rating (%) \n after the First Two Years in Office') + # add title
coord_flip() + # change vertical bar plot to horizontal bar plot
theme_minimal()+ # choose the minimal theme
theme(axis.title.x = element_blank(), # remove x axis title
axis.title.y = element_blank(), # remove y axis title
plot.title = element_text(color = 'chocolate', size = 15, face = 'bold', hjust = 0.5), # set color and size for title
axis.text.y = element_text(color = 'chocolate', size = 15, face = 'bold'), # set color and size for texts on y axis
axis.text.x = element_text(color = 'chocolate', size = 15, face = 'bold'), # set color and size for texts on x axis
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank()) + # remove minor grid lines
geom_label(aes(label = round(mean_approval,1)), size = 5) # add labels (average approval rating score)
Obama’s average approval rating is 52.8% while Trump’s average approval rating is 39.4%. It will be interesting to look at the fluctuation in approval rating of the two Presidents over the first two years. Let’s proceed with our second plot.
colnames(obama) = c('o_start_date', 'o_end_date', 'o_approving',
'o_disapproving', 'o_no_data')
colnames(trump) = c('t_start_date', 't_end_date', 't_approving',
't_disapproving', 't_no_data')
Notice we have 89 observations for ‘obama’ and 90 observations for ‘trump.’ Let’s remove the last observation in the ‘trump’ dataframe
trump = trump[c(1:89),]
obama_trump = cbind(obama, trump) # combine two dataframes
glimpse(obama_trump) # take a look at the data structure
## Observations: 89
## Variables: 10
## $ o_start_date <date> 2009-01-23, 2009-01-30, 2009-02-06, 2009-02-14...
## $ o_end_date <date> 2009-01-25, 2009-02-01, 2009-02-08, 2009-02-16...
## $ o_approving <int> 67, 66, 66, 62, 61, 61, 62, 61, 63, 60, 60, 62,...
## $ o_disapproving <int> 14, 19, 21, 25, 24, 28, 27, 28, 27, 29, 28, 30,...
## $ o_no_data <int> 19, 15, 13, 13, 15, 11, 11, 11, 10, 11, 12, 8, ...
## $ t_start_date <date> 2017-01-23, 2017-01-30, 2017-02-06, 2017-02-13...
## $ t_end_date <date> 2017-01-29, 2017-02-05, 2017-02-12, 2017-02-19...
## $ t_approving <int> 45, 43, 41, 40, 42, 43, 42, 40, 39, 38, 40, 40,...
## $ t_disapproving <int> 47, 52, 53, 54, 53, 51, 52, 55, 56, 57, 53, 54,...
## $ t_no_data <int> 8, 5, 6, 6, 5, 6, 6, 5, 5, 5, 7, 6, 7, 5, 5, 6,...
ggplot(obama_trump) + # data frame
geom_line(aes(o_start_date, y = o_approving), color = 'blue') + # Obama line
geom_line(aes(o_start_date, y = t_approving), color = 'red') + # Trump line
ggtitle('Weekly Average Presidential Approval Rating (%) during the First Two Years \n January 2009 - October 2010 and January 2017 - October 2018') + # add title
theme_minimal() + # set thme
# set color, font, and size for title, texts for x axis, and texts for y axis
theme(plot.title = element_text(size = 13, face = 'bold', hjust = 0.5),
axis.text.y = element_text(size = 12, face = 'bold'),
axis.text.x = element_text(size = 12, face = 'bold'),
axis.title.y = element_blank(), # remove y-axis title
axis.title.x = element_blank(), # remove x-axis title
panel.grid.minor = element_blank()) + # remove minor grid
scale_y_continuous(breaks = seq(0, 80, 5)) + # set y-scale # interval of 5 for easy representation
scale_x_date(date_breaks = "2 month", # set date break every 2 months and extract the month only
date_labels = "%b") +
annotate("text", x = as.Date('2010-10-08'), # annotation
label = 'Obama', y = 50, size = 6, fontface = 'bold') +
annotate("text", x = as.Date('2010-10-08'), # annotation
label = 'Trump', y = 37, size = 6, fontface = 'bold')
There we have it. We can see that President Obama had a very high approval rating (more than 65%) at the start of his term and President Trump started off having only 45% approval rating. Obama’s approval rating declined sharply within the first 6 months of his presidency and steadily declined thereafter. For President Trump, his approval rating has never been higher than 45%. Right before the Midterm Election, the two Presidents have very comparable approval rating: a little below 45%.