Tea Break R

A quick tutorial on Presidential Approval Rating

So today (Nov 6th, 2018) is Midterm Election Day. I can’t think of a more fitting event to do a quick R tutorial to show you how to visualize Presidential Approval Rating with just a few lines of code.

In this tutorial, I will show you how to compare the Presidential Approval Rating for President Trump and Obama during the first two years of their Presidency. I obtained data from the American Presidency Project.

All you have to do is to download the two datasets for President Trump and President Obama. The approval rating shows the percentage of respondents to the opinion poll conducted by Gallup who approve of the sitting President.

Data sources

https://docs.google.com/spreadsheets/d/1iEl565M1mICTubTtoxXMdxzaHzAcPTnb3kpRndsrfyY/edit?ts=5bd7f609#gid=671375968

References

Load the necessary libraries

library(tidyverse) # For data wrangling and data manipulation
## Warning: package 'tidyverse' was built under R version 3.4.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.0     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## Warning: package 'ggplot2' was built under R version 3.4.4
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'tidyr' was built under R version 3.4.3
## Warning: package 'purrr' was built under R version 3.4.4
## Warning: package 'dplyr' was built under R version 3.4.4
## Warning: package 'forcats' was built under R version 3.4.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate) # For date manipulation
## Warning: package 'lubridate' was built under R version 3.4.4
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
Read in President Trump’s approval rating for the first two years and take a look at the data structure
trump = read.csv('trump_approval.csv')

# Data structure
glimpse(trump)
## Observations: 90
## Variables: 5
## $ Start.Date    <fct> 10/08/2018, 10/01/2018, 09/24/2018, 09/17/2018, ...
## $ End.Date      <fct> 10/14/2018, 10/07/2018, 09/30/2018, 09/23/2018, ...
## $ Approving     <int> 44, 43, 42, 40, 38, 40, 41, 41, 42, 39, 41, 40, ...
## $ Disapproving  <int> 51, 53, 53, 56, 56, 54, 53, 54, 52, 56, 54, 55, ...
## $ Unsure.NoData <int> 5, 4, 5, 4, 6, 6, 6, 5, 6, 5, 5, 5, 4, 5, 3, 5, ...
Format the dates (currently in ‘factor’ format) and arrange the dates in ascending order
trump$Start.Date = mdy(trump$Start.Date) # change 'Start.Date' to 'Date' format
## Warning in as.POSIXlt.POSIXct(x, tz): unknown timezone 'zone/tz/2018g.1.0/
## zoneinfo/America/Chicago'
trump$End.Date = mdy(trump$End.Date) # change 'End.Date' to 'Date' format

trump = trump %>%
  arrange(-desc(Start.Date)) # arrange dates in ascending order
## Warning: package 'bindrcpp' was built under R version 3.4.4
glimpse(trump) # take a look at the new dataset
## Observations: 90
## Variables: 5
## $ Start.Date    <date> 2017-01-23, 2017-01-30, 2017-02-06, 2017-02-13,...
## $ End.Date      <date> 2017-01-29, 2017-02-05, 2017-02-12, 2017-02-19,...
## $ Approving     <int> 45, 43, 41, 40, 42, 43, 42, 40, 39, 38, 40, 40, ...
## $ Disapproving  <int> 47, 52, 53, 54, 53, 51, 52, 55, 56, 57, 53, 54, ...
## $ Unsure.NoData <int> 8, 5, 6, 6, 5, 6, 6, 5, 5, 5, 7, 6, 7, 5, 5, 6, ...
Take a look at the summary of our dataframe
summary(trump)
##    Start.Date            End.Date            Approving      Disapproving  
##  Min.   :2017-01-23   Min.   :2017-01-29   Min.   :35.00   Min.   :47.00  
##  1st Qu.:2017-06-27   1st Qu.:2017-07-03   1st Qu.:38.00   1st Qu.:54.00  
##  Median :2017-11-30   Median :2017-12-06   Median :39.00   Median :56.00  
##  Mean   :2017-11-30   Mean   :2017-12-06   Mean   :39.37   Mean   :55.41  
##  3rd Qu.:2018-05-05   3rd Qu.:2018-05-11   3rd Qu.:41.00   3rd Qu.:57.00  
##  Max.   :2018-10-08   Max.   :2018-10-14   Max.   :45.00   Max.   :60.00  
##  Unsure.NoData  
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :5.000  
##  Mean   :5.222  
##  3rd Qu.:6.000  
##  Max.   :8.000
Read in President Obama’s approval rating for the first two years and take a look at the data structure. Since President Obama was in office for 8 years, we will filter the dataset to look at the first two years he was in office to make it more comparable to the dataset for President Trump
obama_full = read.csv('obama_approval.csv') # read the dataset in 

obama_full$Start.Date = mdy(obama_full$Start.Date) # format date-related columns
obama_full$End.Date = mdy(obama_full$End.Date) # format date-related columns

glimpse(obama_full) # take a look at the data structure
## Observations: 2,787
## Variables: 5
## $ Start.Date    <date> 2017-01-17, 2017-01-15, 2017-01-14, 2017-01-13,...
## $ End.Date      <date> 2017-01-19, 2017-01-18, 2017-01-17, 2017-01-15,...
## $ Approving     <int> 59, 58, 57, 57, 57, 58, 57, 57, 55, 56, 56, 56, ...
## $ Disapproving  <int> 37, 38, 39, 38, 39, 37, 40, 40, 42, 40, 41, 40, ...
## $ Unsure.NoData <int> 4, 4, 4, 5, 4, 5, 3, 3, 3, 0, 3, 4, 3, 2, 3, 4, ...
Take a look at the data structure again
glimpse(obama_full)
## Observations: 618
## Variables: 5
## $ Start.Date    <date> 2009-01-23, 2009-01-24, 2009-01-25, 2009-01-26,...
## $ End.Date      <date> 2009-01-25, 2009-01-26, 2009-01-27, 2009-01-28,...
## $ Approving     <int> 67, 65, 64, 64, 66, 67, 67, 66, 66, 65, 65, 63, ...
## $ Disapproving  <int> 14, 15, 16, 17, 17, 17, 18, 19, 19, 20, 20, 21, ...
## $ Unsure.NoData <int> 19, 20, 20, 19, 17, 16, 15, 15, 15, 15, 15, 16, ...
Select the data recorded every 7 days for put them in a new dataframe called ‘obama.’ This will be comparable to the ‘trump’ dataframe
obama = obama_full[seq(1, nrow(obama_full), 7),] # get data for every 7 days

rownames(obama) = seq(length = nrow(obama)) # reset index
Now, we are ready to make our first plot. Compare the average approval rating during the first two years in office of Obama and Trump. The process is as follows:
  • Extract the mean approval rating for each president
  • Combine the two dataframes
  • Make a bar plot

Extract the average approval rating for Obama

obama_mean_approval = 
  obama %>% 
  select(Start.Date, Approving) %>%
  summarize(mean_approval = mean(Approving))

obama_mean_approval

Extract the average approval rating for Trump

trump_mean_approval = 
  trump %>% 
  select(Start.Date, Approving) %>%
  summarize(mean_approval = mean(Approving))

trump_mean_approval

Combine ‘obama_mean_approval’ and ‘trump_mean_approval’

mean_approval = rbind(obama_mean_approval, trump_mean_approval) # bind the two dataframes by row

mean_approval$President = c('Obama', 'Trump') # rename the two columns

glimpse(mean_approval) # take a look at the data structure of the new dataframe
## Observations: 2
## Variables: 2
## $ mean_approval <dbl> 52.83146, 39.36667
## $ President     <chr> "Obama", "Trump"

Plot a bar plot to compare the average approval rating for two Presidents

ggplot(mean_approval, 
       aes(x = President, y = mean_approval)) +
  geom_bar(stat = 'identity', width = 0.4, fill = 'chocolate') + # set width & color
  ggtitle('Average Presidential Approval Rating (%) \n after the First Two Years in Office') + # add title
  coord_flip() + # change vertical bar plot to horizontal bar plot
  theme_minimal()+ # choose the minimal theme
  theme(axis.title.x = element_blank(), # remove x axis title
  axis.title.y = element_blank(),  # remove y axis title
  plot.title = element_text(color = 'chocolate', size = 15, face = 'bold', hjust = 0.5), # set color and size for title
  axis.text.y = element_text(color = 'chocolate', size = 15, face = 'bold'), # set color and size for texts on y axis
  axis.text.x = element_text(color = 'chocolate', size = 15, face = 'bold'), # set color and size for texts on x axis
  panel.grid.major = element_blank(), # remove major grid lines
  panel.grid.minor = element_blank()) + # remove minor grid lines
  geom_label(aes(label = round(mean_approval,1)), size = 5) # add labels (average approval rating score)

Obama’s average approval rating is 52.8% while Trump’s average approval rating is 39.4%. It will be interesting to look at the fluctuation in approval rating of the two Presidents over the first two years. Let’s proceed with our second plot.

For our second plot, we are interested in plotting two line plots indicating Trump’s and Obama’s approval rating over the first two years. We will proceed with the following steps:
  • Rename the columns in two dataframes: ‘obama’ and ‘trump’
  • Combine the two dataframes
  • Plot the two line plots
Step 1: Rename the columns
colnames(obama) = c('o_start_date', 'o_end_date', 'o_approving',
                    'o_disapproving', 'o_no_data')

colnames(trump) = c('t_start_date', 't_end_date', 't_approving',
                    't_disapproving', 't_no_data')

Notice we have 89 observations for ‘obama’ and 90 observations for ‘trump.’ Let’s remove the last observation in the ‘trump’ dataframe

trump = trump[c(1:89),]
Step 2: Combine the two dataframes
obama_trump = cbind(obama, trump) # combine two dataframes


glimpse(obama_trump) # take a look at the data structure
## Observations: 89
## Variables: 10
## $ o_start_date   <date> 2009-01-23, 2009-01-30, 2009-02-06, 2009-02-14...
## $ o_end_date     <date> 2009-01-25, 2009-02-01, 2009-02-08, 2009-02-16...
## $ o_approving    <int> 67, 66, 66, 62, 61, 61, 62, 61, 63, 60, 60, 62,...
## $ o_disapproving <int> 14, 19, 21, 25, 24, 28, 27, 28, 27, 29, 28, 30,...
## $ o_no_data      <int> 19, 15, 13, 13, 15, 11, 11, 11, 10, 11, 12, 8, ...
## $ t_start_date   <date> 2017-01-23, 2017-01-30, 2017-02-06, 2017-02-13...
## $ t_end_date     <date> 2017-01-29, 2017-02-05, 2017-02-12, 2017-02-19...
## $ t_approving    <int> 45, 43, 41, 40, 42, 43, 42, 40, 39, 38, 40, 40,...
## $ t_disapproving <int> 47, 52, 53, 54, 53, 51, 52, 55, 56, 57, 53, 54,...
## $ t_no_data      <int> 8, 5, 6, 6, 5, 6, 6, 5, 5, 5, 7, 6, 7, 5, 5, 6,...
Step 3: Plot two line plots
ggplot(obama_trump) + # data frame
  geom_line(aes(o_start_date, y = o_approving), color = 'blue') + # Obama line
  geom_line(aes(o_start_date, y = t_approving), color = 'red') + # Trump line
  ggtitle('Weekly Average Presidential Approval Rating (%) during the First Two Years \n    January 2009 - October 2010 and January 2017 - October 2018') + # add title
  theme_minimal() + # set thme
    # set color, font, and size for title, texts for x axis, and texts for y axis
  theme(plot.title = element_text(size = 13, face = 'bold', hjust = 0.5),
        axis.text.y = element_text(size = 12, face = 'bold'),
        axis.text.x = element_text(size = 12, face = 'bold'),
        axis.title.y = element_blank(), # remove y-axis title
        axis.title.x = element_blank(), # remove x-axis title
        panel.grid.minor = element_blank()) + # remove minor grid
        scale_y_continuous(breaks = seq(0, 80, 5)) + # set y-scale # interval of 5 for easy representation
  scale_x_date(date_breaks = "2 month", # set date break every 2 months and extract the month only 
               date_labels = "%b") +
  annotate("text", x = as.Date('2010-10-08'), # annotation
           label = 'Obama', y = 50, size = 6, fontface = 'bold') +
  annotate("text", x = as.Date('2010-10-08'), # annotation
           label = 'Trump', y = 37, size = 6, fontface = 'bold')

There we have it. We can see that President Obama had a very high approval rating (more than 65%) at the start of his term and President Trump started off having only 45% approval rating. Obama’s approval rating declined sharply within the first 6 months of his presidency and steadily declined thereafter. For President Trump, his approval rating has never been higher than 45%. Right before the Midterm Election, the two Presidents have very comparable approval rating: a little below 45%.