2021-Virginia-Gubernatorial-Debate-Text-Analysis.R


Adam Jockle

Loaded Packages

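These are the packages loaded into R for this project.

library(tidyverse)   # dplyr, ggplot2, readr and other core packages
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()
library(genius)
library(wordcloud2)  # interactive word clouds
library(readr)
library(ggplot2)
library(ggthemes)
library(gridExtra)   # grid.arrange() for side-by-side plots
library(rmarkdown)
# Note: get_sentiments("afinn") also needs the textdata package installed
# to download the Afinn lexicon; the Bing lexicon ships with tidytext.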

Introduction

For my text project, I analyze the first debate of the 2021 Virginia gubernatorial race. Specifically, I examine the words each candidate used to see whether their language carries a more positive or a more negative sentiment. The two candidates running to replace Democratic incumbent Ralph Northam are Glenn Youngkin, a Republican, and Terry McAuliffe, a Democrat. Because the outgoing administration and the sitting president are both Democrats, I predict that the Republican candidate, Youngkin, will use more negative language. As the candidate of the party out of power, Youngkin will likely want to portray the Democratic administration as failing and focus on its mistakes, which should produce a more negative sentiment. The analysis uses several methods. The first part is more general and focuses on the most common words; the second part goes deeper, using two sentiment lexicons (Afinn and Bing).

Data

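The transcript was taken from the following URL: https://www.wtvr.com/news/local-news/full-transcription-of-virginias-first-gubernatorial-debate

Each candidate's remarks were saved to a separate text file (McAuliffe.txt and youngkinww.txt), which are read in below.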

Overall Word Analysis

The first part of the project examines the overall word count for each candidate and takes a closer look at the words each candidate used most often.

First, I look at Terry McAuliffe.

McAuliffe Analysis

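library(readr)
# Read McAuliffe's remarks: each line of the text file becomes one row in a
# single column (this assumes the lines themselves contain no semicolons)
McAuliffe <- read_delim("McAuliffe.txt", 
                        delim = ";", escape_double = FALSE, col_names = FALSE, 
                        trim_ws = TRUE)
colnames(McAuliffe)[1] <- "text"  # name the column so unnest_tokens() can find it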

McAuliffe %>% 
  unnest_tokens(word, text) -> mcauliffe_words

mcauliffe_words %>% 
  count()

## # A tibble: 1 × 1
##       n
##   <int>
## 1  3725

The total word count for McAuliffe is 3725.

mcauliffe_words %>% 
  anti_join(stop_words) %>% 
  count(word) %>% 
  arrange(desc(n))

## Joining, by = "word"

## # A tibble: 568 × 2
##    word             n
##    <chr>        <int>
##  1 virginia        29
##  2 governor        23
##  3 jobs            15
##  4 trump           14
##  5 people          13
##  6 america         12
##  7 plan            12
##  8 day             11
##  9 education       10
## 10 commonwealth     9
## # … with 558 more rows

After removing stop words (common words such as "I" and "the"), McAuliffe used 568 distinct words.

Youngkin Analysis

youngkinww <- read_delim("youngkinww.txt", 
                         delim = ";", escape_double = FALSE, col_names = FALSE, 
                         trim_ws = TRUE)

## Rows: 25 Columns: 1


colnames(youngkinww)[1] <- "text"

youngkinww %>% 
  unnest_tokens(word, text) -> youngkin_words

youngkin_words %>% 
  count()

## # A tibble: 1 × 1
##       n
##   <int>
## 1  3193

The total word count for Youngkin is 3193.

youngkin_words %>% 
  anti_join(stop_words) %>% 
  count(word) %>% 
  arrange(desc(n))

## Joining, by = "word"

## # A tibble: 578 × 2
##    word            n
##    <chr>       <int>
##  1 virginia       29
##  2 opponent       20
##  3 virginians     17
##  4 terry          12
##  5 vaccine        12
##  6 governor       11
##  7 education      10
##  8 law            10
##  9 life           10
## 10 enforcement     8
## # … with 568 more rows

After removing stop words, Youngkin used 578 distinct words.

Terry McAuliffe used considerably more words overall than Glenn Youngkin (3,725 vs. 3,193) but slightly fewer distinct words once stop words were removed (568 vs. 578). This suggests that while McAuliffe spent more time talking, not all of that talk may have been substantive.
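The difference between total words and distinct words is easy to blur, because `count(word)` returns one row per distinct word. The sketch below uses a made-up two-line transcript (not the debate data) to compute both numbers side by side; it assumes tidytext's `stop_words` table treats "the", "of", and "for" as stop words.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line transcript, used only for illustration
toy <- tibble(text = c("Jobs jobs jobs for Virginia",
                       "The people of Virginia"))

toy_summary <- toy %>%
  unnest_tokens(word, text) %>%           # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drops "for", "the", "of"
  summarise(tokens   = n(),               # every remaining word occurrence
            distinct = n_distinct(word))  # rows that count(word) would return

toy_summary
# tokens = 6 (jobs x3, virginia x2, people), distinct = 3
```

Applied to `mcauliffe_words` and `youngkin_words`, the `tokens` column would give each candidate's total non-stop-word count, a figure not reported above.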

Overall Frequency Word Clouds

Next, I build a word cloud for each candidate, along with a bar chart of the 15 words each candidate used most often. Two transcript artifacts, "crosstalk" and "00", are filtered out before plotting.

First, McAuliffe.

mcauliffe_words %>% 
  anti_join(stop_words) %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% c("crosstalk", "00")) -> filter_mcauliffewords

## Joining, by = "word"

filter_mcauliffewords %>% 
  wordcloud2()

filter_mcauliffewords %>% 
  head(15) %>% 
  ggplot(aes(reorder(word, n), n)) + 
  geom_col(fill = "#000099") +  # single bar layer; geom_col() already uses stat = "identity"
  coord_flip() + 
  theme_dark() + 
  ggtitle("McAuliffe's 15 Most Frequent Words:\n2021 Virginia Gubernatorial Debate") + 
  xlab("Word") + ylab("Count") + 
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5)

Now for Youngkin.

youngkin_words %>% 
  anti_join(stop_words) %>% 
  count(word, sort = TRUE) %>% 
  filter(!word %in% c("crosstalk", "00")) -> filter_youngkinwords

## Joining, by = "word"

filter_youngkinwords %>% 
  wordcloud2()

filter_youngkinwords %>% 
  head(15) %>% 
  ggplot(aes(reorder(word, n), n)) + 
  geom_col(fill = "#000099") +  # single bar layer; geom_col() already uses stat = "identity"
  coord_flip() + 
  theme_dark() + 
  ggtitle("Youngkin's 15 Most Frequent Words:\n2021 Virginia Gubernatorial Debate") + 
  xlab("Word") + ylab("Count") + 
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5)

Some generic words, such as "virginia" and "governor", appear on both candidates' lists. One notable difference is that "donald" and "trump" appear among McAuliffe's 15 most frequent words but not among Youngkin's. Virginia has leaned Democratic in recent statewide elections, so McAuliffe may have been trying to tie Youngkin to Trump and paint him as more radical in order to win over moderate voters. Youngkin, for his part, may have been distancing himself from Trump, who lost Virginia in both the 2016 and 2020 presidential elections.

Afinn Analysis

This part of the project examines the sentiment of each candidate's words using the Afinn lexicon, which assigns each word an integer score from -5 (most negative) to +5 (most positive). Words that are not in the lexicon receive no score and drop out of the calculation.
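Because `inner_join()` keeps only words that appear in the lexicon, each candidate's mean is taken over scored words only, not over everything they said. The sketch below illustrates this with a hypothetical four-word lexicon in the same two-column (`word`, `value`) layout that `get_sentiments("afinn")` returns; the scores are illustrative assumptions, not copied from the real Afinn list.

```r
library(dplyr)

# Hypothetical mini-lexicon in Afinn's word/value layout (illustrative scores)
toy_afinn <- tibble(word  = c("hate", "crime", "support", "proud"),
                    value = c(-3, -3, 2, 2))

# Hypothetical tokens after stop-word removal
toy_words <- tibble(word = c("hate", "crime", "support", "jobs", "virginia"))

toy_score <- toy_words %>%
  inner_join(toy_afinn, by = "word") %>%  # "jobs" and "virginia" have no score and drop out
  summarise(n_scored = n(), mean_value = mean(value))

toy_score
# n_scored = 3, mean_value = (-3 - 3 + 2) / 3, about -1.33
```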

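mcauliffe_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("afinn")) -> mcauliffe_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(mcauliffe_afinn$value)

## [1] -0.1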

McAuliffe's mean Afinn score was -0.1, slightly below neutral (0).

youngkin_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("afinn")) -> youngkin_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(youngkin_afinn$value)

## [1] 0.1965812

Youngkin's mean Afinn score was about 0.197, a slightly positive sentiment.

Surprisingly, then, the Afinn analysis gives Youngkin a more positive sentiment than McAuliffe.

Next, I examine each candidate's most positive and most negative words according to the Afinn lexicon.

McAuliffe Afinn Analysis

First, the negative words.

# Tally McAuliffe's words with a negative Afinn score;
# count(sort = TRUE) already orders them by frequency
mcauliffe_afinn %>% 
  filter(value < 0) %>% 
  count(word, sort = TRUE) -> mcauliffe_negative

mcauliffe_negative %>% 
  wordcloud2()

mcauliffe_negative

## # A tibble: 38 × 2
##    word             n
##    <chr>        <int>
##  1 ban              7
##  2 cut              6
##  3 hate             6
##  4 lowest           6
##  5 pay              5
##  6 crime            4
##  7 cuts             3
##  8 unemployment     3
##  9 cancer           2
## 10 chaos            2
## # … with 28 more rows

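Now for McAuliffe's most positive words.

mcauliffe_afinn %>% 
  filter(value > 0) %>% 
  count(word, sort = TRUE) -> mcauliffe_positive

mcauliffe_positive %>% 
  wordcloud2()

mcauliffe_positive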

## # A tibble: 40 × 2
##    word          n
##    <chr>     <int>
##  1 support       6
##  2 endorsed      5
##  3 united        5
##  4 clean         4
##  5 integrity     4
##  6 proud         4
##  7 care          3
##  8 huge          3
##  9 agree         2
## 10 benefits      2
## # … with 30 more rows

One of the most common negative words Terry McAuliffe used throughout the debate was ‘hate’.

The following quote illustrates this. He says: “His education plan would cut 43,000 teachers. I really hate like, he’s talking about critical race theory that’s not taught in our schools. But what I hate is this is a big dog whistle. I really hate it.”

McAuliffe uses ‘hate’ three times in this short passage, half of the six times he used the word during the debate.
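The count can be checked by tokenizing the quote the same way as the full transcript. This small sketch runs `unnest_tokens()` on the quoted passage alone (typed with straight apostrophes) and counts the rows equal to "hate".

```r
library(dplyr)
library(tidytext)

# The McAuliffe quote above, stored as a one-row tibble
quote <- tibble(text = paste(
  "His education plan would cut 43,000 teachers. I really hate like,",
  "he's talking about critical race theory that's not taught in our schools.",
  "But what I hate is this is a big dog whistle. I really hate it."))

hate_count <- quote %>%
  unnest_tokens(word, text) %>%  # lowercase word tokens
  filter(word == "hate") %>%
  nrow()

hate_count
# 3
```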

Youngkin Afinn Analysis

First, Youngkin's negative words.

youngkin_afinn %>% 
  filter(value < 0) %>% 
  count(word, sort = TRUE) -> youngkin_negative

youngkin_negative %>% 
  wordcloud2()

youngkin_negative

## # A tibble: 33 × 2
##    word          n
##    <chr>     <int>
##  1 difficult     5
##  2 anti          3
##  3 murder        3
##  4 cut           2
##  5 death         2
##  6 rape          2
##  7 risk          2
##  8 shortages     2
##  9 stalled       2
## 10 terribly      2
## # … with 23 more rows

Next, his positive words.

youngkin_afinn %>% 
  filter(value > 0) %>% 
  count(word, sort = TRUE) -> youngkin_positive

youngkin_positive %>% 
  wordcloud2()

youngkin_positive

## # A tibble: 38 × 2
##    word            n
##    <chr>       <int>
##  1 encourage       5
##  2 clean           4
##  3 comfortable     4
##  4 trust           4
##  5 embrace         3
##  6 protect         3
##  7 safe            3
##  8 support         3
##  9 ability         2
## 10 authority       2
## # … with 28 more rows

Youngkin appears to have made law enforcement a major theme, warning about what he framed as the dangers of certain Democratic attitudes toward the police. Many of his negative words relate to this theme, such as "rape", "death", and "murder", and both "law" and "enforcement" were among his ten most common words overall.

When policing came up during the debate, Youngkin said: “Thank you for this most important question, because Virginia today is at a 20-year high murder rate. Now, my opponent’s not surprised by that, because when he was governor, the murder rate went up 43%. We see across law enforcement agencies in the Commonwealth of Virginia, an absolute depletion of resources, manpower, shortages, equipment shortages, we have a funding problem, but we also have a morale problem.”

Clearly, public safety was a central part of Youngkin's message.

Bing Analysis

This part of the project uses the Bing lexicon to analyze sentiment. The Bing lexicon is binary: it labels each word as either positive or negative, with no intensity score.
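Since Bing gives each word only a "positive" or "negative" label, the natural summary is a count per label rather than a mean. The sketch below shows that step with a hypothetical five-word lexicon in the same `word`/`sentiment` layout as `get_sentiments("bing")`; the labels and tokens are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical mini-lexicon in Bing's word/sentiment layout
toy_bing <- tibble(word      = c("hate", "crime", "difficult", "support", "proud"),
                   sentiment = c("negative", "negative", "negative", "positive", "positive"))

# Hypothetical tokens after stop-word removal
toy_words <- tibble(word = c("difficult", "crime", "support", "jobs", "difficult"))

toy_counts <- toy_words %>%
  inner_join(toy_bing, by = "word") %>%  # unlabeled words such as "jobs" drop out
  count(sentiment)                       # one row per label with its word count

toy_counts
# negative 3, positive 1
```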

# Count Youngkin's non-stop words per Bing label and plot the totals
youngkin_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("bing")) %>% 
  count(sentiment) %>%  # one row per label: negative, positive
  ggplot(aes(sentiment, n, fill = sentiment)) + 
  geom_col() + 
  coord_flip() +
  ylab("Count") + 
  ggtitle("Youngkin Bing Sentiment Analysis") +
  geom_text(aes(label = n), hjust = 2.5, vjust = 0, color = "black", size = 3.5) -> youngkinBing

## Joining, by = "word"
## Joining, by = "word"

# Same count and plot for McAuliffe
mcauliffe_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("bing")) %>% 
  count(sentiment) %>%  # one row per label: negative, positive
  ggplot(aes(sentiment, n, fill = sentiment)) + 
  geom_col() + 
  coord_flip() +
  ylab("Count") + 
  ggtitle("McAuliffe Bing Sentiment Analysis") +
  geom_text(aes(label = n), hjust = 2.5, vjust = 0, color = "black", size = 3.5) -> mcauliffeBing

## Joining, by = "word"
## Joining, by = "word"

grid.arrange(youngkinBing, mcauliffeBing, ncol = 2)

The Bing analysis differs noticeably from the Afinn analysis: it suggests that Youngkin's sentiment was considerably more negative than McAuliffe's. McAuliffe's sentiment appears roughly neutral, since he used nearly equal numbers of positive and negative words, whereas Youngkin used noticeably more negative words than positive ones.

Conclusion

Overall, the results were interesting and somewhat contradictory. The Afinn lexicon showed Youngkin with a more positive sentiment than McAuliffe, while the Bing lexicon showed McAuliffe as roughly neutral and Youngkin as clearly more negative. The two lexicons measure sentiment differently: Afinn assigns each word a score from -5 to +5, so a few strongly scored words can pull the mean in one direction, while Bing is binary and counts every positive or negative word equally. McAuliffe's sentiment looked fairly neutral under both methods, while Youngkin's varied considerably depending on the lexicon.
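One way the two lexicons can disagree is that Afinn weights words by intensity while Bing counts every labeled word equally. The toy example below uses hypothetical words, scores, and labels (not the candidates' actual data): three mildly negative words and one strongly positive word give a positive Afinn-style mean but a three-to-one negative Bing-style count.

```r
library(dplyr)

# Hypothetical scored words: three mildly negative, one strongly positive
toy <- tibble(word      = c("stalled", "risk", "difficult", "outstanding"),
              value     = c(-1, -1, -1, 5),                                  # Afinn-style scores
              sentiment = c("negative", "negative", "negative", "positive")) # Bing-style labels

afinn_mean  <- mean(toy$value)         # (-1 - 1 - 1 + 5) / 4 = 0.5
bing_counts <- count(toy, sentiment)   # negative 3, positive 1

afinn_mean
bing_counts
```

Whether this mechanism explains the debate results would require checking which words drive each candidate's scores; the example only shows that such a disagreement is possible.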

In the introduction, I predicted that Youngkin would have a more negative sentiment because the incumbent administration is Democratic. Another factor that may have contributed is that Terry McAuliffe had already served as governor of Virginia from 2014 to 2018, so Youngkin likely wanted to cast McAuliffe's first term in a negative light.

That said, no clear conclusion emerges that either supports or contradicts my hypothesis. The Bing lexicon makes a compelling case that Youngkin was more negative overall, but the Afinn lexicon shows a more positive sentiment for him. Further analysis would be needed to draw firm conclusions.

Sources

https://rpubs.com/emilyrogers/albumanalysis

https://rpubs.com/paigeminsky/presidential_debate_analysis

Date: 2021-10-20