These are the packages loaded into R for this project:
library(tidyverse)
library(tidytext)
library(genius)
library(wordcloud2)
library(readr)
library(ggplot2)
library(ggthemes)
library(gridExtra)
library(rmarkdown)
For my text project, I plan to analyze the first 2021 Virginia gubernatorial debate. More specifically, I will examine the words used during the debate to see whether their overall sentiment is more negative or more positive. The two candidates running to replace the Democratic incumbent, Ralph Northam, are Glenn Youngkin, a Republican, and Terry McAuliffe, a Democrat. Since the outgoing administration is Democratic and the president is a Democrat, I predict that the Republican candidate (Youngkin) will have the more negative sentiment. As the candidate of the party out of power, Youngkin will likely want to paint a picture of a Democratic administration that is not doing well and focus on its mistakes, which should lead to a more negative sentiment. Several methods are used in this analysis. The first part is more general and focuses on the most common words. The second part does a deeper analysis using different sentiment lexicons.
The transcript was pulled from this url: https://www.wtvr.com/news/local-news/full-transcription-of-virginias-first-gubernatorial-debate
The first part of the project will examine the overall word counts for each candidate, and take a closer look at the most common words that were said.
First, I will look at Terry McAuliffe.
library(readr)
McAuliffe <- read_delim("McAuliffe.txt",
delim = ";", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
colnames(McAuliffe)[1] <- "text"
McAuliffe %>%
unnest_tokens(word, text) -> mcauliffe_words
mcauliffe_words %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 3725
The total word count for McAuliffe is 3725.
## Joining, by = "word"
## # A tibble: 568 × 2
## word n
## <chr> <int>
## 1 virginia 29
## 2 governor 23
## 3 jobs 15
## 4 trump 14
## 5 people 13
## 6 america 12
## 7 plan 12
## 8 day 11
## 9 education 10
## 10 commonwealth 9
## # … with 558 more rows
However, once stop words (such as "I" or "the") are removed, 568 distinct words remain.
youngkinww <- read_delim("youngkinww.txt",
delim = ";", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
## Rows: 25 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(youngkinww)[1] <- "text"
youngkinww%>%
unnest_tokens(word, text) -> youngkin_words
youngkin_words %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 3193
The total word count for Youngkin is 3193.
youngkin_words %>%
anti_join(stop_words) %>%
count(word) %>%
arrange(desc(n))
## Joining, by = "word"
## # A tibble: 578 × 2
## word n
## <chr> <int>
## 1 virginia 29
## 2 opponent 20
## 3 virginians 17
## 4 terry 12
## 5 vaccine 12
## 6 governor 11
## 7 education 10
## 8 law 10
## 9 life 10
## 10 enforcement 8
## # … with 568 more rows
After removing stop words, Youngkin used 578 distinct words.
As we can see, Terry McAuliffe spoke considerably more words than Glenn Youngkin overall (3725 versus 3193), but used slightly fewer distinct words once stop words were removed (568 versus 578). This suggests that while McAuliffe may have spent more time talking, not all of it may have been substantive.
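The two numbers reported for each candidate measure different things: `count()` on the token table gives the total number of tokens, while the number of rows returned by `count(word)` gives the number of distinct words. A minimal sketch on made-up text shows the difference. The sentences and the short stop list `toy_stop` are hypothetical and stand in for the transcript and for `stop_words`, so the result can be checked by hand.

```r
library(dplyr)
library(tidytext)

# Made-up text for illustration only; not from the debate transcript
toy <- tibble(text = c("Virginia needs jobs and jobs need Virginia",
                       "I have a plan for the jobs"))

# Hypothetical stop list, used instead of tidytext's stop_words
toy_stop <- tibble(word = c("and", "i", "have", "a", "for", "the"))

toy_words <- toy %>%
  unnest_tokens(word, text)            # one lowercase token per row

content_words <- toy_words %>%
  anti_join(toy_stop, by = "word")     # drop the stop words

total_tokens   <- nrow(toy_words)                 # every token spoken
content_tokens <- nrow(content_words)             # tokens left after stop words
distinct_words <- n_distinct(content_words$word)  # unique words left
```

Here `total_tokens` is 14, `content_tokens` is 8, and `distinct_words` is 5, because "jobs" and "virginia" are repeated. Comparing `content_tokens` between speakers measures how much non-stop-word talk there was, while `distinct_words` measures vocabulary variety.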
Next, a word cloud will be developed for each candidate, along with the 15 most frequent words used during the debate.
First, for McAuliffe.
mcauliffe_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filter_mcauliffewords
## Joining, by = "word"
filter_mcauliffewords %>%
wordcloud2()
# Bar chart of McAuliffe's 15 most frequent words
filter_mcauliffewords %>%
  head(15) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col(fill = "#000099") +   # one bar layer, filled dark blue
  coord_flip() +
  theme_dark() +
  ggtitle("McAuliffe's 15 Most Frequent Words:\n2021 Virginia Gubernatorial Debate") +
  xlab("Word") +
  ylab("Count") +
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5)
Now for Youngkin.
youngkin_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filter_youngkinwords
## Joining, by = "word"
filter_youngkinwords %>%
wordcloud2()
# Bar chart of Youngkin's 15 most frequent words
filter_youngkinwords %>%
  head(15) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col(fill = "#000099") +   # one bar layer, filled dark blue
  coord_flip() +
  theme_dark() +
  ggtitle("Youngkin's 15 Most Frequent Words:\n2021 Virginia Gubernatorial Debate") +
  xlab("Word") +
  ylab("Count") +
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5)
Some generic words make it into the list of most common words, such as "virginia" and "governor". One interesting thing, however, is that "donald" and "trump" make it into the top 15 words for McAuliffe, but neither word makes it into Youngkin's 15 most frequent. Virginia has turned into a reliably Democratic state, so it is possible that McAuliffe is trying to paint Youngkin as more radical in an attempt to win over moderate voters. Youngkin may be attempting to distance himself from Trump, since Trump lost Virginia in both the 2016 and 2020 elections.
This part of the project examines the sentiment of the candidates' words using the AFINN lexicon. This method assigns each word an integer score from -5 to +5 based on how negative or positive it is.
## Joining, by = "word"
## Joining, by = "word"
## [1] -0.1
McAuliffe had a mean value of -0.1. This is slightly more negative than neutral.
youngkin_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> youngkin_afinn
## Joining, by = "word"
## Joining, by = "word"
mean(youngkin_afinn$value)
## [1] 0.1965812
Youngkin had a mean sentiment of about 0.197, a more positive sentiment.
Based on the AFINN analysis, Youngkin surprisingly had a more positive sentiment than McAuliffe.
The next part of the project looks at the most frequent positive and negative words for each candidate based on the AFINN lexicon.
First, the negative words.
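To make the mean score concrete, here is a minimal sketch of the same join-then-average step on a hand-made lexicon. `toy_afinn` and its scores are hypothetical stand-ins with the same shape as `get_sentiments("afinn")` (a `word` column and an integer `value` column), and `toy_tokens` is made-up text, not the transcript.

```r
library(dplyr)

# Hypothetical mini-lexicon shaped like get_sentiments("afinn")
toy_afinn <- tibble(word  = c("hate", "cut", "support", "proud"),
                    value = c(-3L, -1L, 2L, 2L))

# Made-up tokens; "virginia" and "plan" are not in the lexicon
toy_tokens <- tibble(word = c("hate", "hate", "cut", "support",
                              "proud", "virginia", "plan"))

scored <- toy_tokens %>%
  inner_join(toy_afinn, by = "word")    # keeps only words found in the lexicon

mean_score   <- mean(scored$value)              # average over matched tokens only
share_scored <- nrow(scored) / nrow(toy_tokens) # fraction of tokens that were scored
```

In this toy case `mean_score` is -0.6, computed from the five matched tokens. The two unmatched tokens do not count as zeros, so the mean describes only the part of the speech that the lexicon covers, which is worth keeping in mind when comparing two speakers.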
mcauliffe_afinn %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
wordcloud2()
mcauliffe_afinn %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE)
## # A tibble: 38 × 2
## word n
## <chr> <int>
## 1 ban 7
## 2 cut 6
## 3 hate 6
## 4 lowest 6
## 5 pay 5
## 6 crime 4
## 7 cuts 3
## 8 unemployment 3
## 9 cancer 2
## 10 chaos 2
## # … with 28 more rows
Now I will look at the most frequent positive words for McAuliffe.
## # A tibble: 40 × 2
## word n
## <chr> <int>
## 1 support 6
## 2 endorsed 5
## 3 united 5
## 4 clean 4
## 5 integrity 4
## 6 proud 4
## 7 care 3
## 8 huge 3
## 9 agree 2
## 10 benefits 2
## # … with 30 more rows
One of the most common negative words Terry McAuliffe used throughout the debate was ‘hate’.
This quote sums it up well. He says, “His education plan would cut 43,000 teachers. I really hate like, he’s talking about critical race theory that’s not taught in our schools. But what I hate is this is a big dog whistle. I really hate it.”
McAuliffe uses the word ‘hate’ three times in just this short quote.
youngkin_afinn %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
wordcloud2()
youngkin_afinn %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE)
## # A tibble: 33 × 2
## word n
## <chr> <int>
## 1 difficult 5
## 2 anti 3
## 3 murder 3
## 4 cut 2
## 5 death 2
## 6 rape 2
## 7 risk 2
## 8 shortages 2
## 9 stalled 2
## 10 terribly 2
## # … with 23 more rows
youngkin_afinn %>%
filter(value > 0) %>%
arrange(desc(+value)) %>%
count(word, sort= TRUE) %>%
wordcloud2()
youngkin_afinn %>%
filter(value > 0) %>%
arrange(desc(+value)) %>%
count(word, sort= TRUE)
## # A tibble: 38 × 2
## word n
## <chr> <int>
## 1 encourage 5
## 2 clean 4
## 3 comfortable 4
## 4 trust 4
## 5 embrace 3
## 6 protect 3
## 7 safe 3
## 8 support 3
## 9 ability 2
## 10 authority 2
## # … with 28 more rows
Youngkin seems to have made a strong point of talking about law enforcement and the dangers he associates with certain Democratic attitudes toward the police. Many of the negative words he used relate to this theme, such as “rape”, “death”, and “murder”, and “law” was among his ten most common words overall.
One quote illustrates this, from the point in the debate when policing was brought up. Youngkin says, “Thank you for this most important question, because Virginia today is at a 20-year high murder rate. Now, my opponent’s not surprised by that, because when he was governor, the murder rate went up 43%. We see across law enforcement agencies in the Commonwealth of Virginia, an absolute depletion of resources, manpower, shortages, equipment shortages, we have a funding problem, but we also have a morale problem.”
Clearly, this was an important topic during the debate.
This part of the project uses the Bing lexicon to analyze the sentiments. The Bing lexicon is binary, meaning it labels each word as either positive or negative.
# Count Bing-positive and Bing-negative words for Youngkin and plot them
youngkin_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  count(sentiment) %>%   # one row per sentiment with its count n
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  ylab("Count") +
  ggtitle("Youngkin Bing Sentiment Analysis") +
  geom_text(aes(label = n), hjust = 2.5, vjust = 0, color = "black", size = 3.5) -> youngkinBing
## Joining, by = "word"
## Joining, by = "word"
# Same count and plot for McAuliffe
mcauliffe_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  count(sentiment) %>%   # one row per sentiment with its count n
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  ylab("Count") +
  ggtitle("McAuliffe Bing Sentiment Analysis") +
  geom_text(aes(label = n), hjust = 2.5, vjust = 0, color = "black", size = 3.5) -> mcauliffeBing
## Joining, by = "word"
## Joining, by = "word"
grid.arrange(youngkinBing, mcauliffeBing, ncol = 2)
The Bing analysis is quite different from the AFINN analysis. It shows that Youngkin had a much more negative sentiment than McAuliffe. Overall, Terry McAuliffe appears to have had a neutral sentiment, as he used almost the same number of positive and negative words. Glenn Youngkin, however, used considerably more negative words than positive ones.
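Because the two candidates spoke different numbers of words, raw counts are easier to compare once they are turned into a net count or a share. The sketch below shows one way to do that. The lexicon `toy_bing` and the tokens are hypothetical, hand-made stand-ins for `get_sentiments("bing")` and the transcript.

```r
library(dplyr)

# Hypothetical mini-lexicon shaped like get_sentiments("bing")
toy_bing <- tibble(
  word      = c("murder", "risk", "difficult", "trust", "safe"),
  sentiment = c("negative", "negative", "negative", "positive", "positive")
)

# Made-up tokens; "virginia" is not in the lexicon
toy_tokens <- tibble(word = c("murder", "murder", "risk", "difficult",
                              "trust", "safe", "virginia"))

bing_counts <- toy_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  count(sentiment)                       # one row per sentiment

n_neg <- bing_counts$n[bing_counts$sentiment == "negative"]
n_pos <- bing_counts$n[bing_counts$sentiment == "positive"]

net_sentiment  <- n_pos - n_neg              # below zero means more negative words
share_negative <- n_neg / (n_pos + n_neg)    # comparable across speakers
```

With these toy tokens there are 4 negative and 2 positive matches, so `net_sentiment` is -2 and `share_negative` is about 0.67.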
Overall, the results of the analysis were fairly interesting and somewhat contradictory. The AFINN lexicon showed that Youngkin had a more positive sentiment than McAuliffe, while the Bing lexicon showed McAuliffe with a neutral sentiment and Youngkin with a very negative one. One reason the two can differ is that the AFINN lexicon assigns a numeric value to each word, while the Bing lexicon is binary. Terry McAuliffe seemed to have a more neutral sentiment overall, while Glenn Youngkin's sentiment varied a lot between the two methods.
In my introduction I predicted that Youngkin would most likely have a more negative sentiment, since the incumbent administration is Democratic. An additional factor that may have influenced this is that Terry McAuliffe was himself governor of Virginia from 2014 to 2018. Youngkin would certainly have wanted to paint a negative picture of McAuliffe's first term as governor.
With that being said, no clear conclusion can be drawn that would support or contradict my hypothesis. The Bing lexicon provides a compelling case for Youngkin being more negative overall. However, the AFINN lexicon shows a more positive sentiment for Youngkin. Further analysis would be needed to draw any firm conclusions.
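A small made-up example shows how a count-based lexicon and a score-based lexicon can point in opposite directions for the same words. The words and scores in `toy_scores` are hypothetical and only illustrate the arithmetic. They are not taken from either lexicon or from the transcript.

```r
library(dplyr)

# Hypothetical scored tokens: three mildly negative, two strongly positive
toy_scores <- tibble(
  word  = c("difficult", "risk", "stalled", "love", "outstanding"),
  value = c(-1L, -2L, -2L, 3L, 5L)
)

comparison <- toy_scores %>%
  summarise(
    n_negative = sum(value < 0),   # count-based view, like a binary lexicon
    n_positive = sum(value > 0),
    mean_value = mean(value)       # score-based view, like a weighted lexicon
  )
```

Here the counts lean negative (3 negative words against 2 positive) while the mean score is 0.6, which is positive, because the positive words carry larger weights. The two real lexicons also cover different word lists, so the set of matched words can differ as well.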
Sources:
https://rpubs.com/emilyrogers/albumanalysis
https://rpubs.com/paigeminsky/presidential_debate_analysis
title: Text-debate.R
author: rstudio-user
date: '2021-10-20'