Text Analysis of the 2022 Texas Governor Debate

Introduction

As the United States nears the 2022 midterm elections, citizens are increasingly aware of the importance of national, state, and local races. Although midterm elections often have lower voter turnout, the most recent midterms in 2018 saw a rise in turnout, particularly among Gen Z voters. Political polarization has increased significantly in the 21st century, especially since 2016, which is why many people have stressed the importance of voting in this election. This election cycle includes races at the national, state, county, and local levels. The data-journalism site FiveThirtyEight reports that 44 percent of the country’s population currently lives in one of the 25 states controlled by Republicans, while 38 percent lives in one of the 16 states controlled by Democrats. However, with 88 state legislative chambers and 36 governorships on the ballot, control at the state level could change drastically.

Texas is one state with a polarizing governor’s race this cycle. Democrat Beto O’Rourke is challenging incumbent Republican Governor Greg Abbott. According to FiveThirtyEight, Republicans currently control the Texas House of Representatives, the Texas Senate, and the governorship. If O’Rourke were to win, control of state government would be split between the two parties.

As a Texan who voted absentee in the 2022 midterm elections, I was interested in analyzing the debate between Abbott and O’Rourke. The debate took place on September 30, 2022, and featured questions on a variety of polarizing issues for Texans: gun safety, border control, education, environmental protection, and abortion rights. I decided to analyze how each candidate responded to these issues and the sentiment of each candidate’s language throughout the debate.

Hypothesis

By comparing the word count, word choice, and sentiment of each candidate in the Texas governor’s debate held before the 2022 midterm elections, I predict that the overall sentiment of the debate will be negative.

Process

First, I searched for a transcript of the Texas gubernatorial (governor’s) debate. Because no published transcript was available, I used Otter.ai to transcribe a video of the full debate. Once Otter transcribed the video, I entered each candidate’s words into Excel and exported the sheet as a CSV file called “Texas Debate Data - Sheet1.csv”. Then, I imported the CSV file into R and loaded the packages needed for the analysis.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(tidytext)
library(ggthemes)
library(wordcloud2)
library(readr)
library(gridExtra)

## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

Texas_Debate_Data_Sheet1 <- read_csv("~/Desktop/governor debate project/Texas Debate Data - Sheet1.csv")

## Rows: 2 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Speaker, Text
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
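The import message confirms that the sheet has just two rows and two character columns: one row per speaker, with that speaker’s entire transcript in a single Text cell. As a minimal base-R sketch of that layout (the two short quotes are placeholders standing in for the full transcript, and the temporary file is hypothetical), the same shape can be written to CSV and read back:

```r
# Placeholder rows mimicking the layout of "Texas Debate Data - Sheet1.csv":
# one row per speaker, with all of that speaker's words in one Text cell.
debate <- data.frame(
  Speaker = c("O'Rourke", "Abbott"),
  Text = c("What we just heard from the governor is what we'll likely hear.",
           "Beto would take us down."),
  stringsAsFactors = FALSE
)

# Round-trip through a temporary CSV file, as the real sheet was exported
path <- tempfile(fileext = ".csv")
write.csv(debate, path, row.names = FALSE)
debate_in <- read.csv(path, stringsAsFactors = FALSE)

str(debate_in)  # 2 rows; Speaker and Text are both character columns
```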

Word Count

After loading the required packages and importing the data, I split each candidate’s text into individual words (tokens) with unnest_tokens(), which also lowercases the words and strips punctuation. Then, I calculated the total word count for each candidate.
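For readers without tidytext, the tokenize-and-count step can be approximated in base R. This is a simplified stand-in, not what unnest_tokens() does internally (it relies on the tokenizers package, so edge cases such as numbers or hyphenated words may split differently); the helper tokenize() is hypothetical and the text is placeholder rather than the debate transcript:

```r
# Hypothetical helper approximating unnest_tokens(word, Text): lowercase the
# text, split on anything that is not a letter or apostrophe, drop empty strings.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[words != ""]
}

# Placeholder text, not the real transcript
debate <- data.frame(
  Speaker = c("O'Rourke", "Abbott"),
  Text = c("What we just heard from the governor is what we'll likely hear.",
           "Beto would take us down."),
  stringsAsFactors = FALSE
)

# One vector of tokens per speaker, then the word count per speaker
tokens <- lapply(setNames(debate$Text, debate$Speaker), tokenize)
word_counts <- sapply(tokens, length)
word_counts  # O'Rourke 12, Abbott 5
```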

Texas_Debate_Data_Sheet1 %>% unnest_tokens(word, Text) %>% 
  filter(Speaker %in% "O'Rourke")-> o_rourke_words

count(o_rourke_words)

## # A tibble: 1 × 1
##       n
##   <int>
## 1  3484

Texas_Debate_Data_Sheet1 %>% unnest_tokens(word, Text) %>% 
  filter(Speaker %in% "Abbott")-> abbott_words

count(abbott_words)

## # A tibble: 1 × 1
##       n
##   <int>
## 1  3293

The word counts show that the Democratic candidate, Beto O’Rourke, spoke 3,484 words, while the incumbent Republican, Greg Abbott, spoke 3,293 words. In a text analysis of the 2020 presidential debate, analyst and author Paige Minsky argued that differences in word count in a debate could be due to candidates speaking out of turn. O’Rourke’s extra 191 words could reflect him speaking out of turn or continuing past his allotted time. However, both candidates spoke out of turn on multiple occasions, and a gap of under 200 words is small, so that argument is hard to make here. The only firm conclusion from this data is that O’Rourke spoke more words than Abbott; speaking out of turn may have contributed, but the word counts alone cannot show it.
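To put the difference in proportion, the arithmetic on the two totals reported above is short: O’Rourke spoke 191 more words, about 5.8% more than Abbott, or roughly 51% of all the words the two candidates spoke.

```r
# Word totals reported by count() above
orourke_n <- 3484
abbott_n  <- 3293

gap     <- orourke_n - abbott_n                            # 191 words
gap_pct <- round(100 * gap / abbott_n, 1)                  # 5.8 (% more than Abbott)
share   <- round(orourke_n / (orourke_n + abbott_n), 3)    # 0.514 of all words spoken
```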

Next, we removed the “stop words” from the text: common function words, such as conjunctions, pronouns, and articles, that complete a thought or sentence but carry little meaning on their own. Examples include “the”, “and”, “or”, and “is”. Removing these words lets the analysis focus on the substantive content of the debate. We then created a word cloud for each candidate from the remaining words.
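The stop-word step is an anti-join: keep every token that does not appear in the stop-word list, then count what remains. A base-R sketch of the same idea, using a tiny hypothetical stop list (tidytext’s stop_words table is much larger) and placeholder tokens:

```r
# Hypothetical, tiny stop-word list; tidytext's stop_words table is far larger
stop_list <- c("the", "and", "or", "is", "a", "to", "of", "we", "what")

# Placeholder tokens, already lowercased
tokens <- c("the", "governor", "is", "what", "we", "heard",
            "and", "the", "governor", "and", "texas")

# anti_join(stop_words) equivalent: keep tokens NOT in the stop list
content <- tokens[!tokens %in% stop_list]

# count(word, sort = TRUE) %>% head(10) equivalent
top <- head(sort(table(content), decreasing = TRUE), 10)
top  # "governor" appears twice; no stop words remain
```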

First, we created a word cloud of O’Rourke’s words with the stop words removed. We also listed the 10 words he used most often.

o_rourke_words %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words) %>% 
  wordcloud2()

## Joining, by = "word"

o_rourke_words %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words) %>% 
  head(10)

## Joining, by = "word"

## # A tibble: 10 × 2
##    word         n
##    <chr>    <int>
##  1 governor    25
##  2 texas       22
##  3 greg        12
##  4 property    12
##  5 people      11
##  6 abbott      10
##  7 care        10
##  8 law         10
##  9 taxes       10
## 10 billion      9

Next, we repeated the process for Governor Abbott.

abbott_words %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words) %>% 
  wordcloud2()

## Joining, by = "word"

abbott_words %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words) %>% 
  head(10)

## Joining, by = "word"

## # A tibble: 10 × 2
##    word            n
##    <chr>       <int>
##  1 texas          29
##  2 beto           14
##  3 property       14
##  4 border         12
##  5 law            12
##  6 people         12
##  7 issue          10
##  8 police         10
##  9 power          10
## 10 enforcement     9

We can also visualize this data in bar graph form. We’ve used blue and red to match the colors associated with each political party.

o_rourke_words %>% 
  anti_join(stop_words) %>% 
  count(word, sort = TRUE) %>% 
  head(12) %>% 
  ggplot(aes(reorder(word, n), n)) + 
  geom_col(fill = "blue") +   # a single bar layer, filled with the Democratic party color
  coord_flip() +
  theme_economist() + ggtitle("O'Rourke's Most Frequent Words") + 
  xlab("Word") + ylab("Count")

## Joining, by = "word"

O’Rourke’s most common word is “governor”, which makes sense given that he is running to be the next governor of Texas. In the debate, he often uses the word to criticize Governor Abbott’s actions and to discuss his own plans if elected. For example, in the first few minutes of the debate, O’Rourke said, “What we just heard from the governor is what we’ll likely hear throughout this debate; he’s going to blame people like President Biden.” Other frequent words include “property”, “taxes”, and “children”, all of which point to top issues for Texans in this year’s midterm elections.

Next, we repeated the process for Abbott.

abbott_words %>% 
  anti_join(stop_words) %>% 
  count(word, sort = TRUE) %>% 
  head(12) %>% 
  ggplot(aes(reorder(word, n), n)) + 
  geom_col(fill = "red") +   # a single bar layer, filled with the Republican party color
  coord_flip() +
  theme_economist() + ggtitle("Abbott's Most Frequent Words") + 
  xlab("Word") + ylab("Count")

## Joining, by = "word"

Just as O’Rourke refers to his opponent by title, Governor Abbott often refers to his opponent as “Beto”, O’Rourke’s first name, which makes “Beto” one of Abbott’s most frequent words. For example, Abbott criticizes his opponent in the first few minutes of the debate by saying, “Beto would take us down.” It is interesting how much each candidate focuses on his opponent, although in a polarizing election such as this one, it is not surprising. Abbott’s other most frequent words focus on law enforcement, police, and the border, all of which are priority issues for the Republican Party.

Sentiment

Now we’ll examine the sentiment of each candidate’s language throughout the debate. Sentiment refers to a word’s positive or negative value. This analysis measures the sentiment of each candidate’s words with two lexicons, or dictionaries: “afinn” and “nrc”. Each lexicon reveals different information about the words spoken during the debate. Sentiment helps characterize the tone of this election, and it may also reflect the attitude of each candidate and their supporters.

AFINN Lexicon

First, let’s find the average sentiment value of each candidate’s words. We can do this with the afinn lexicon, which assigns each word an integer score from -5 (most negative) to +5 (most positive). Only words that appear in the lexicon receive a score; because inner_join() drops every other word, the average is taken over sentiment-bearing words only, not over everything the candidate said. Let’s start with O’Rourke’s words, minus the stop words, and find their average sentiment.
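The join-then-average logic can be sketched in base R with merge(), which keeps only rows whose word appears in both tables, as inner_join() does. The mini-lexicon below is hypothetical: its scores are placeholders in AFINN’s -5 to +5 format, not AFINN’s published values.

```r
# Hypothetical mini-lexicon in AFINN's format (integer scores from -5 to +5);
# these scores are illustrative placeholders, not AFINN's published values.
lexicon <- data.frame(
  word  = c("failure", "killed", "care", "safe"),
  value = c(-2, -3, 2, 1),
  stringsAsFactors = FALSE
)

# Placeholder tokens for one speaker
tokens <- data.frame(
  word = c("governor", "failure", "texas", "care", "killed", "property", "failure"),
  stringsAsFactors = FALSE
)

# inner_join() equivalent: only words found in the lexicon survive,
# so unscored words like "governor" or "texas" drop out of the average
scored <- merge(tokens, lexicon, by = "word")

nrow(scored)        # 4 scored tokens out of 7
mean(scored$value)  # (-2 - 2 + 2 - 3) / 4 = -1.25
```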

o_rourke_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("afinn")) -> orourke_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(orourke_afinn$value)

## [1] -0.398773

The average afinn score for Beto O’Rourke is about -0.399, which means the scored words he used during the debate had a slightly negative average sentiment. The lean toward negative is most likely because this election is very polarizing and many of its issues, including abortion rights, border security, and high taxes, are somber ones with negative emotional associations.

abbott_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("afinn")) -> abbott_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(abbott_afinn$value)

## [1] -0.2844037

The average afinn score for Governor Greg Abbott is about -0.284. Abbott’s words also had a slightly negative average sentiment, but not quite as negative as O’Rourke’s. This may be because O’Rourke, as the challenger trying to replace the incumbent, is very critical of Abbott’s policies and behavior in office.
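To gauge how large that difference is, subtract the two means reported above and compare the gap with the width of AFINN’s -5 to +5 scale. The gap is about 0.11 points, roughly 1% of the scale’s range, which is why both candidates read as only slightly negative.

```r
# Mean afinn scores reported by mean() above
orourke_mean <- -0.398773
abbott_mean  <- -0.2844037

gap    <- orourke_mean - abbott_mean   # about -0.114: O'Rourke slightly more negative
scaled <- gap / (5 - (-5))             # about -0.011, i.e. ~1% of the -5..+5 range
```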

The afinn lexicon also lets us look at each candidate’s most common negative and positive words. We can visualize this data with word clouds and a list of each candidate’s most frequent positive and negative words. First, we’ll look at Beto O’Rourke’s most frequent negative words.
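The filter-and-count steps used for these lists boil down to keeping rows by the sign of the score and tallying the words that remain. A base-R sketch with placeholder words and scores (not the debate data):

```r
# Placeholder scored tokens; the scores are illustrative only
scored <- data.frame(
  word  = c("failure", "care", "failure", "killed", "safe", "care", "failure"),
  value = c(-2, 2, -2, -3, 1, 2, -2),
  stringsAsFactors = FALSE
)

# filter(value < 0) %>% count(word, sort = TRUE) equivalent
neg_counts <- sort(table(scored$word[scored$value < 0]), decreasing = TRUE)

# filter(value > 0) %>% count(word, sort = TRUE) equivalent
pos_counts <- sort(table(scored$word[scored$value > 0]), decreasing = TRUE)

neg_counts  # failure 3, killed 1
pos_counts  # care 2, safe 1
```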

orourke_afinn %>% 
  filter(value < 0) %>%           # keep negative-scored words only
  count(word, sort = TRUE) %>% 
  wordcloud2()

orourke_afinn %>%
  filter(value < 0) %>% 
  count(word, sort = TRUE) %>%    # stop words were already removed when orourke_afinn was built
  head(10)

## # A tibble: 10 × 2
##    word        n
##    <chr>   <int>
##  1 failure     8
##  2 failed      5
##  3 pay         5
##  4 rape        4
##  5 blame       3
##  6 death       3
##  7 gun         3
##  8 killed      3
##  9 lost        3
## 10 victims     3

Many of these words relate to violence or harm. Because O’Rourke is highly critical of Abbott, his frequent use of “failure” and “failed” likely refers to how he believes Governor Abbott has failed Texans. Other words, such as “rape”, “death”, and “gun”, relate directly to some of the most pressing issues Texans face, like abortion rights and gun safety.

Next, let’s use the afinn lexicon to review Governor Abbott’s most frequent negative words.

abbott_afinn %>% 
  filter(value < 0) %>%           # keep negative-scored words only
  count(word, sort = TRUE) %>% 
  wordcloud2()

abbott_afinn %>%
  filter(value < 0) %>% 
  count(word, sort = TRUE) %>%    # stop words were already removed when abbott_afinn was built
  head(10)

## # A tibble: 10 × 2
##    word          n
##    <chr>     <int>
##  1 pay           4
##  2 angry         2
##  3 crime         2
##  4 criminals     2
##  5 emergency     2
##  6 failure       2
##  7 gun           2
##  8 illegal       2
##  9 killed        2
## 10 lowest        2

Several of the negative words in these lists, such as “failure”, “gun”, “killed”, and “pay”, are used by both Abbott and O’Rourke. However, each candidate’s most common words reveal different priorities for Texas. This data suggests that Abbott is more focused on border security and “illegal” immigration, while O’Rourke is more focused on protecting Texans from “rape” and “gun” violence. These readings rest partly on each party’s known issue priorities in Texas, and most of Abbott’s top negative words appear only twice, so they are suggestive rather than conclusive.
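The overlap between the two negative-word lists can be checked directly with set operations on the top-ten tables above. Because many words tie at two or three uses, which words make the top ten is partly arbitrary, so treat the overlap as a rough indicator.

```r
# Top ten negative afinn words for each candidate, copied from the tables above
orourke_neg <- c("failure", "failed", "pay", "rape", "blame",
                 "death", "gun", "killed", "lost", "victims")
abbott_neg  <- c("pay", "angry", "crime", "criminals", "emergency",
                 "failure", "gun", "illegal", "killed", "lowest")

shared       <- intersect(orourke_neg, abbott_neg)  # "failure" "pay" "gun" "killed"
orourke_only <- setdiff(orourke_neg, abbott_neg)    # e.g. "rape", "victims"
abbott_only  <- setdiff(abbott_neg, orourke_neg)    # e.g. "illegal", "crime"
```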

Next, let’s review the most frequent positive words used by Beto O’Rourke.

orourke_afinn %>% 
  filter(value > 0) %>%           # keep positive-scored words only
  count(word, sort = TRUE) %>% 
  wordcloud2()

orourke_afinn %>%
  filter(value > 0) %>% 
  count(word, sort = TRUE) %>%    # stop words were already removed when orourke_afinn was built
  head(10)

## # A tibble: 10 × 2
##    word          n
##    <chr>     <int>
##  1 care         10
##  2 solutions     6
##  3 share         5
##  4 safe          4
##  5 ensure        3
##  6 justice       3
##  7 ability       2
##  8 expand        2
##  9 fair          2
## 10 honor         2

Finally, let’s review the most frequent positive words used by Greg Abbott.

abbott_afinn %>% 
  filter(value > 0) %>%           # keep positive-scored words only
  count(word, sort = TRUE) %>% 
  wordcloud2()

abbott_afinn %>%
  filter(value > 0) %>% 
  count(word, sort = TRUE) %>%    # stop words were already removed when abbott_afinn was built
  head(10)

## # A tibble: 10 × 2
##    word           n
##    <chr>      <int>
##  1 care           8
##  2 support        4
##  3 secure         3
##  4 supports       3
##  5 ability        2
##  6 determined     2
##  7 easy           2
##  8 matter         2
##  9 united         2
## 10 won            2

Only two words, “care” and “ability”, appear in both candidates’ top ten positive words, but “care” is the most frequent positive word for both O’Rourke and Abbott, likely because both want to show viewers that they care about Texans. Each candidate’s language also points to different priorities. O’Rourke’s use of words like “justice” and “honor” may reflect his focus on reproductive rights following the reversal of Roe v. Wade in June 2022. Abbott’s use of words like “secure” suggests that he is more focused on border control than on reproductive rights, which aligns with the priorities of his campaign.

Next, we merged the afinn data for O’Rourke and Abbott to view the sentiment value of their most frequent words. This allows us to easily compare the sentiment of each candidate.

# Stack the two candidates' scored words into one table
# (bind_rows() needs no join keys, so no "Joining, by" message is printed)
bind_rows(orourke_afinn, abbott_afinn) -> merged

merged %>% 
  count(word, Speaker, value, sort = TRUE) %>% 
  filter(n > 2) %>% 
  ggplot(aes(reorder(word, n), n, fill = value)) + geom_col() + coord_flip() + facet_wrap(~Speaker, scales = "free_y")

This graph uses the merged afinn data to show each frequent word’s afinn value and how often each candidate used it. Because of the filter(n > 2) step, it includes only words that a candidate used three or more times. The color scale on the right-hand side represents each word’s sentiment score, so the graph reflects the focus of each candidate. Abbott’s frequent words lean somewhat more positive, most likely because he wants to frame his time in office positively. O’Rourke’s frequent words include more strongly negative ones, such as “rape”, “victims”, and “killed”. As discussed above, O’Rourke’s more negative sentiment is most likely due to his criticism of Abbott’s time in office and his focus on protecting reproductive rights.
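Combining the two candidates’ tables amounts to stacking their rows; the plot then counts each word per speaker and keeps counts above two, so only words used three or more times appear. A base-R sketch of that stack-count-filter sequence, with placeholder words and illustrative (not AFINN) scores:

```r
# Placeholder scored tokens for each candidate (scores are illustrative only)
orourke <- data.frame(Speaker = "O'Rourke",
                      word  = c("failure", "failure", "failure", "care", "care"),
                      value = c(-2, -2, -2, 2, 2),
                      stringsAsFactors = FALSE)
abbott  <- data.frame(Speaker = "Abbott",
                      word  = c("care", "care", "care", "pay", "pay"),
                      value = c(2, 2, 2, -1, -1),
                      stringsAsFactors = FALSE)

# Stack the rows of both tables
merged <- rbind(orourke, abbott)

# count(word, Speaker, value) equivalent: sum a column of 1s per group
counts <- aggregate(n ~ word + Speaker + value,
                    data = transform(merged, n = 1), FUN = sum)

# filter(n > 2): keeps words used three or more times, not two
frequent <- counts[counts$n > 2, ]
frequent
```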

AFINN Conclusion

Overall, each candidate’s sentiment revealed their party’s and campaign’s priorities. Although O’Rourke’s language was more negative than Abbott’s, this is likely due to his criticism of Abbott’s time as governor and the political direction in which Texas is heading.

NRC Lexicon

The NRC lexicon takes another approach to sentiment: it assigns words to eight emotions (joy, trust, surprise, anticipation, anger, fear, sadness, and disgust) and also tags words as positive or negative. A single word can belong to several categories, so the positive and negative categories tend to have large counts: many words tagged with an emotion are also tagged positive or negative.
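Because one word can carry several NRC tags, joining words to the lexicon can produce more rows than there are matched words. A base-R sketch with hypothetical tags (placeholders, not the NRC lexicon’s actual entries):

```r
# Hypothetical word-to-category pairs in the NRC layout (one row per
# word-category pair); these tags are placeholders, not NRC's actual entries.
lexicon <- data.frame(
  word      = c("justice", "justice", "justice", "safe", "safe", "failure"),
  sentiment = c("positive", "trust", "anticipation", "positive", "trust", "negative"),
  stringsAsFactors = FALSE
)
tokens <- data.frame(word = c("justice", "safe", "failure", "texas"),
                     stringsAsFactors = FALSE)

# inner_join() equivalent: a word with several tags yields several rows,
# and untagged words such as "texas" drop out
tagged <- merge(tokens, lexicon, by = "word")

nrow(tagged)             # 6 rows from only 3 matched words
table(tagged$sentiment)  # positive and trust 2 each; anticipation and negative 1 each
```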

We first use “inner_join()” to attach the NRC categories to each candidate’s words. Once this step is complete, we can compare the emotions expressed in each candidate’s language.

o_rourke_words %>% 
  inner_join(get_sentiments("nrc")) -> orourke_nrc

## Joining, by = "word"

orourke_nrc %>% 
  count(sentiment, sort = TRUE) %>%   # count() groups by sentiment itself
  head(10) -> orourke_top_ten

abbott_words %>% 
  inner_join(get_sentiments("nrc")) -> abbott_nrc

## Joining, by = "word"

abbott_nrc %>% 
  count(sentiment, sort = TRUE) %>%   # count() groups by sentiment itself
  head(10) -> abbott_top_ten

ggplot(orourke_top_ten, aes(reorder(sentiment, n), n)) + 
  geom_col(fill = "blue") + coord_flip() + 
  xlab("Sentiment") + ylab("Count") +
  ggtitle("O'Rourke NRC") + theme_economist() +
  # print each count in white just inside the end of its bar
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5) -> orourke_nrc_plot

ggplot(abbott_top_ten, aes(reorder(sentiment, n), n)) + 
  geom_col(fill = "red") + coord_flip() + 
  xlab("Sentiment") + ylab("Count") +
  ggtitle("Abbott NRC") + theme_economist() +
  geom_text(aes(label = n), hjust = 1.5, vjust = 0, color = "white", size = 3.5) -> abbott_nrc_plot

grid.arrange(orourke_nrc_plot, abbott_nrc_plot, ncol = 2)

This graph shows the NRC breakdown of sentiment for each candidate. Both O’Rourke and Abbott have “positive” as their largest category, which is expected because many words tagged with emotions such as trust or joy are also tagged positive. The six largest categories are the same for both candidates: positive, trust, negative, fear, anticipation, and sadness. “Trust” being the second-largest category for both candidates may reflect that each is trying to present himself as trustworthy to Texans. O’Rourke’s seventh-largest category is “anger” rather than “joy”, likely due to his ongoing frustration with Abbott’s administration.

NRC Conclusion

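O’Rourke and Abbott had similar results with the NRC lexicon, which gives a more detailed picture of sentiment. Overall, both candidates used more words tagged positive than words tagged negative. This appears to contradict the negative average sentiment from the afinn lexicon in the previous section, but the two measures differ: NRC counts how many words fall into each category, while the afinn average weights each word by its intensity, and the two lexicons cover different sets of words.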
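Counting words by category and averaging intensity-weighted scores can disagree. A minimal sketch with made-up scores: positive words outnumber negative ones, as in the NRC counts, yet the mean is negative, as with afinn, because the negative words are more intense.

```r
# Made-up afinn-style scores for five words: three mildly positive,
# two strongly negative
scores <- c(1, 1, 2, -4, -3)

n_positive <- sum(scores > 0)   # 3 positive words
n_negative <- sum(scores < 0)   # 2 negative words
avg        <- mean(scores)      # (1 + 1 + 2 - 4 - 3) / 5 = -0.6
```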

Conclusion

The 2022 midterm elections will shape the future course of the United States, and the race for governor of Texas is one of the most closely watched. The purpose of this project was to analyze the language Beto O’Rourke and Greg Abbott used during the 2022 Texas governor’s debate.

We first found the total word count for each candidate. Then, after removing stop words, we identified the most common words spoken by Abbott and O’Rourke.

We then analyzed the sentiment of each candidate’s words using two lexicons: afinn and NRC. The afinn lexicon showed that both candidates’ language had a slightly negative average sentiment, with O’Rourke’s words more negative than Abbott’s. The NRC lexicon analyzed the sentiment of Abbott’s and O’Rourke’s language across eight emotions in addition to positive and negative sentiment, and it showed that the candidates’ emotional profiles largely overlapped, with trust the second-largest category for both.

This analysis sheds light on each candidate’s attitude toward the debate, Texans, and the election. Each candidate focused his language on issues aligned with his political agenda: Abbott devoted much of his language to border security and immigration, while O’Rourke focused more on abortion rights and gun safety. The afinn results support our hypothesis that the overall sentiment would be negative, and this slightly negative sentiment is in line with the polarization of the 2022 midterm elections. The NRC results, in which positive words outnumbered negative ones, show that this conclusion depends on how sentiment is measured.