Currently the United States is going through an election year, with Joe Biden and Donald Trump as the Democratic and Republican candidates respectively. I previously conducted a text analysis of the first Presidential Debate, which took place on September 29th, 2020. Because I completed that analysis in early October, I was unable to include the second Presidential Debate, which had not yet occurred. With the second Presidential Debate taking place on October 22nd, 2020, I wanted to use my first analysis as a stepping stone to create one cumulative project that includes both debates. Expanding my previous report allows for a more complete analysis of the sentiment of both candidates and a comparison of how the candidates spoke in each debate. To see my previous analysis of the first 2020 Presidential Debate, click the button below.
In both Presidential Debates, Donald Trump spoke more when compared to Joe Biden and Joe Biden had a stronger negative sentiment.
Similar to the original analysis, I first had to locate a transcript of the second Presidential Debate and transform it into text that I could use in RStudio. The transcript of the second debate came from Rev. Once I had the entire debate transcript, I converted it into a text file, cleaned the data, and created two separate transcripts, one each for Joe Biden and Donald Trump. It is important to note that I removed the words spoken by the debate moderator, Kristen Welker, since they are unnecessary for a text analysis of the candidates; eliminating the moderator's words allows for an analysis of solely the 2020 Presidential candidates. Once I had a text file for each of Joe Biden and Donald Trump, I imported those text files into RStudio and began my analysis. I also imported the debate transcripts from the first debate so I could compare how each candidate spoke across the two debates.
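As a rough sketch of this preparation step (the raw file name and the exact "Speaker: text" line format of the Rev transcript are assumptions here), the speaker-specific transcripts could be built with the tidyverse like this:
library(tidyverse)
# Read the raw transcript, one line per row; the file name is hypothetical.
raw_lines <- read_lines("second_debate_transcript.txt")
debate_raw <- tibble(X1 = raw_lines) %>%
filter(X1 != "")
# Keep only Joe Biden's turns and strip the speaker label so the remaining
# text matches the X1 column used throughout this analysis.
Biden_SecondDebate <- debate_raw %>%
filter(str_detect(X1, "^Joe Biden")) %>%
mutate(X1 = str_remove(X1, "^Joe Biden.*?:\\s*"))
Trump_SecondDebate <- debate_raw %>%
filter(str_detect(X1, "^Donald Trump")) %>%
mutate(X1 = str_remove(X1, "^Donald Trump.*?:\\s*"))
# Kristen Welker's lines are simply never selected, which removes the moderator.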
To begin my analysis, I first needed to load the required packages.
library(tidyverse)
library(tidytext)
library(ggthemes)
library(wordcloud2)
library(textdata)
library(gridExtra)
library(readr)
library(reshape2)
library(ggplot2)
First, I want to examine the word count for each candidate in the second debate. I do this by separating the lines of text into individual words, which lets me count the number of words spoken by each candidate. To do this, I used the ‘unnest_tokens()’ function as well as the ‘count()’ function.
Biden_SecondDebate %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word)) -> biden_words2
count(biden_words2)
| n |
|---|
| 7009 |
Trump_SecondDebate %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word)) -> trump_words2
count(trump_words2)
| n |
|---|
| 7808 |
Counting each word the candidates spoke allows us to say that Donald Trump spoke more words at the second Presidential Debate than Joe Biden. Donald Trump spoke a total of 7,808 words, while Joe Biden spoke 7,009; Trump spoke nearly 800 more words than Biden.
Let’s compare this to the first Presidential Debate and see if the same results occurred.
Expanding on the work from my previous analysis of the first debate and applying it to the second debate, we can now compare the two.
Comparing the word count of each candidate for each debate, we can see that both candidates spoke more in the second debate than in the first. One possible reason is the change in microphone usage. According to the NY Times, “during the first two minutes each candidate speaks in each of the six 15-minute segments, his opponent’s microphone will be muted”. This change in how the debate was conducted could account for the increase in words both candidates said, due to the decreased amount of crosstalk. Again, this cannot be confirmed as the reason, but it is worth considering.
When comparing the two presidential debates, it is also worth noting that Donald Trump had a higher word count than Joe Biden in each debate. However, in the first debate the gap between word counts was 594 words, while in the second debate it was larger, at 799 words. In other words, the difference between how much each candidate spoke grew in the second debate, which is an interesting finding given the muting of the microphones.
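To make this comparison concrete, a small summary table and bar chart can be built from the tokenized words. This is only a sketch: the first-debate word data frames are assumed here to be named 'biden_words' and 'trump_words', mirroring the second-debate names used above.
word_counts <- tribble(
~candidate, ~debate, ~words,
"Joe Biden", "1st Debate", nrow(biden_words),
"Donald Trump", "1st Debate", nrow(trump_words),
"Joe Biden", "2nd Debate", nrow(biden_words2),
"Donald Trump", "2nd Debate", nrow(trump_words2)
)
# One bar per candidate per debate, colored to match the rest of the report.
ggplot(word_counts, aes(debate, words, fill = candidate)) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("Donald Trump" = "#8b0000", "Joe Biden" = "#000099")) +
theme_economist() + xlab("Debate") + ylab("Total Word Count") +
ggtitle("Total Words Spoken per Debate")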
Now that we have compared overall word counts for each candidate, it is important to filter out filler words, or ‘stop words’, since they do not provide much substance for sentiment analysis. Like in the previous project, we can use ‘anti_join()’ to filter out stop words. It is also important to filter the words ‘crosstalk’ and ‘00’ out of both candidates’ transcripts. Even though these appear less often than in the first debate, they represent transcription artifacts of the crosstalk that happened at the debate and are not useful, so we filter them out of the overall analysis.
biden_words2 %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filtered_biden2
trump_words2 %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filtered_trump2
Now, if we count these filtered words for each candidate, we get a different number. Because the filtered data frames contain one row per word, counting their rows gives the number of distinct words each candidate said, which can also be viewed as examining whether either candidate repeats words. Here we can use the ‘n_distinct()’ function to examine the number of distinct words each candidate said in the second debate.
n_distinct(filtered_biden2)
| n |
|---|
| 993 |
n_distinct(filtered_trump2)
| n |
|---|
| 864 |
Comparing both debates, we can see that Joe Biden spoke more distinct words than Donald Trump in each debate. Again, note that even though Donald Trump spoke more words in total, Joe Biden still spoke more distinct words. One could draw the conclusion that Donald Trump repeats words multiple times while speaking, hence the higher total word count but fewer distinct words overall. Again, this is only a theory as to why Trump used fewer distinct words, but it is interesting to notice and think about.
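One quick way to put a number on this repetition, using only objects already created above, is to divide each candidate's total non-stop words by his distinct non-stop words for the second debate; a higher ratio means more repetition.
# Average number of times each non-stop word is repeated (second debate).
sum(filtered_biden2$n) / nrow(filtered_biden2)
sum(filtered_trump2$n) / nrow(filtered_trump2)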
Wordclouds are a way to visualize the words both candidates spoke during each debate. Words with a greater frequency appear larger, and words that were spoken less often appear smaller. Using the ‘wordcloud2’ package, we can easily create these for each candidate.
colorVec = rep(c('blue'), length.out = nrow(filtered_biden2))  # one color per word in the frequency table
wordcloud2(filtered_biden2, color = colorVec, fontWeight = "bold",size = 2, minRotation = -pi/6, maxRotation = -pi/6, rotateRatio = 1)
| word | n |
|---|---|
| people | 42 |
| president | 28 |
| china | 22 |
| plan | 16 |
| time | 15 |
| money | 14 |
| american | 13 |
| talking | 13 |
| united | 12 |
| country | 11 |
colorVec = rep(c('red'), length.out = nrow(filtered_trump2))  # one color per word in the frequency table
wordcloud2(filtered_trump2, color = colorVec, fontWeight = "bold",size = 2, minRotation = -pi/6, maxRotation = -pi/6, rotateRatio = 1)
| word | n |
|---|---|
| people | 47 |
| joe | 31 |
| money | 28 |
| china | 20 |
| president | 20 |
| russia | 20 |
| country | 19 |
| million | 17 |
| lot | 15 |
| ago | 14 |
These wordclouds allow us to examine all the words each candidate said, but we can take a closer look at the top 10 words each candidate said using a bar graph.
Then, using the ‘ggplot()’ function, we can visualize the top ten words each candidate said. Again, we can compare this to the first debate to see if the candidates’ top ten words changed.
filter_bidenwords %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle(label = "Biden's 10 Most Frequent Words", subtitle = "1st Debate" ) +
xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filter_trumpwords %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Trump's 10 Most Frequent Words", subtitle = "1st Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filtered_biden2 %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Biden's 10 Most Frequent Words", subtitle = "2nd Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filtered_trump2 %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Trump's 10 Most Frequent Words", subtitle = "2nd Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
For both candidates in both debates, the word ‘people’ is the most common word. For a presidential debate this finding makes sense, since both candidates are addressing the American people.
When examining the differences between the first and second debate, it is interesting to see the word ‘china’ appear in both candidates’ top 10 most frequent word lists. In the first debate, China was not among either candidate’s most frequent words. This suggests that topics regarding China were discussed in the second debate but not in the first.
For both candidates, many of their frequent words from the first debate were repeated in the second. Examining Biden’s top 10 most frequent words, 4 words appeared in both debates: ‘people’, ‘president’, ‘plan’, and ‘american’. For Donald Trump, 6 words appeared in both debates: ‘people’, ‘joe’, ‘country’, ‘million’, ‘president’, and ‘lot’.
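These overlaps can also be pulled out in code rather than read off the graphs. The sketch below assumes the first-debate frequency tables (‘filter_bidenwords’ and ‘filter_trumpwords’) are sorted by descending count, just like the second-debate tables.
# Words appearing in a candidate's top 10 in both debates.
intersect(head(filter_bidenwords$word, 10), head(filtered_biden2$word, 10))
intersect(head(filter_trumpwords$word, 10), head(filtered_trump2$word, 10))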
Looking at the most popular words is interesting in itself, but going one step further, we can look at the words surrounding these popular words to analyze the context in which they were used. To do this, we can conduct a bi-gram analysis, examining the text in pairs of words. In order to examine the context of common words said by both candidates, I first combined each candidate’s debate transcripts into one file so that I could examine what each candidate said over both debates (a sketch of this step is shown below). Once I had a single combined text file for each candidate, I could conduct the bi-gram analysis.
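The combination step itself is simple: ‘bind_rows()’ stacks the two transcripts for each candidate. The first-debate object names used here (‘Biden_FirstDebate’ and ‘Trump_FirstDebate’) are assumptions; the combined names match those used in the bi-gram code below.
# Stack both debates into one transcript per candidate.
bind_rows(Biden_FirstDebate, Biden_SecondDebate) -> Biden_Total_DebateText
bind_rows(Trump_FirstDebate, Trump_SecondDebate) -> Trump_Total_DebateText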
To complete this analysis, we will use the ‘unnest_tokens()’ function and the ‘separate()’ function to look at only 2 words of text at a time. In the code below, note that I filtered out the word “crosstalk”, since it is frequently repeated and would skew the results.
Biden_Total_DebateText %>%
unnest_tokens(bigram, X1, token = "ngrams", n=2) %>%
count(bigram, sort = TRUE) %>%
separate(bigram, c("word1", "word2"), sep=" ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% "crosstalk") %>%
filter(!word2 %in% "crosstalk") -> biden_bigram
biden_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "president" | word2 == "president") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'President'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "american" | word2 == "american") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'American'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
For this bi-gram analysis, I specifically examined words that were frequent in both debates for each candidate. For example, as shown in ‘Common Words Across Debates’, 4 of Biden’s top 10 most frequent words were repeated in each debate: ‘people’, ‘president’, ‘plan’, and ‘american’. Because of this finding, I thought it would be interesting to analyze what words were said before and after these common words. Conducting a bi-gram analysis of these words allows us to see the context in which Joe Biden used them across both debates.
It is interesting to see what words Joe Biden said around the word ‘plan’. This bi-gram analysis allows one to see what types of plans Joe Biden was discussing in both debates. Here we can see that Joe Biden discussed the ‘Biden plan’ the most, followed by a ‘socialist plan’, ‘infrastructure plan’, ‘healthcare plan’ and ‘economic plan’. These findings make sense, since these are the kinds of topics and questions raised during Presidential Debates.
The high frequency with which Joe Biden says ‘American people’ also seems reasonable, given that he is addressing the American people in each debate.
Looking at the words Joe Biden said surrounding ‘people’, we can assume many of these instances relate to the Covid-19 pandemic. The word pair ‘people died’ presumably refers to the high number of American people who died during the pandemic. We can also assume that the numbers Joe Biden pairs with ‘people’ refer to pandemic statistics, for example the number of people in hospitals, the number of people who have died, or the number of people who have lost their jobs.
Now, we will conduct the same bi-gram analysis for Donald Trump. Again, I used the common words said across both debates that were identified in the ‘Common Words Across Debates’ section above; in this case, I examined the words around "people", "Joe", "president", and "million".
Trump_Total_DebateText %>%
unnest_tokens(bigram, X1, token = "ngrams", n=2) %>%
count(bigram, sort = TRUE) %>%
separate(bigram, c("word1", "word2"), sep=" ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% "crosstalk") %>%
filter(!word2 %in% "crosstalk") -> trump_bigram
trump_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "joe" | word2 == "joe") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Joe'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "president" | word2 == "president") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'President'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "million" | word2 == "million") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Million'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Similar to how Joe Biden used the word ‘people’ across debates, we can theorize that Donald Trump used ‘people’ in much the same way, generalizing how people were affected by the pandemic. For example, ‘people recover’ could refer to American people recovering from Covid-19, and we can also assume Donald Trump is talking about Covid-19 when he uses the word pair ‘people died’ across debates.
It is interesting to see the high frequency of the word pairing ‘vice president’ for Donald Trump. We know from his frequent use of the word ‘joe’ that he tends to refer to his opponent by first name; this finding shows that he also frequently calls Joe Biden ‘vice president’, since Biden is a former Vice President.
In this analysis, we can also see how Donald Trump references Joe Biden by first name. It is interesting that the most frequent word pairing for ‘joe’ is the word ‘cages’, which refers to the portion of the second debate in which the moderator, Kristen Welker, discusses immigration and how children were separated from their families at the border.
biden_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Looking at each candidate’s use of the word ‘people’ and the context in which they said it, we can see Joe Biden’s greater use of numbers and assumed statistics in both debates. Seeing this contrast, we can theorize that Joe Biden cited more statistics about the American people than Donald Trump did across both debates.
When looking at the bi-gram analysis of the word ‘plan’ for each candidate, it is interesting to see the differences between them. Joe Biden uses the word ‘plan’ to discuss the plans he would have for his presidency, whereas Donald Trump does not appear to use the word in the same way; he seems to use more adjectives to describe plans rather than naming types of plans.
When examining text, and political speeches in general, it is important to analyze the sentiment of the speakers. A sentiment analysis can be used to determine how positively or negatively each candidate spoke. Since there are multiple opinions on the sentiment of a single word, there are three major lexicons, or dictionaries, we can use to complete a sentiment analysis. Using all three lexicons, ‘Afinn’, ‘NRC’, and ‘Bing’, will give us a well-rounded idea of the sentiment of each speaker.
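Before joining, it can help to glance at how each lexicon encodes sentiment: Afinn assigns a numeric score, Bing a positive/negative label, and NRC one or more emotion or polarity labels per word.
# Preview the three lexicons (textdata may prompt to download Afinn and NRC the first time).
get_sentiments("afinn") %>% head()
get_sentiments("bing") %>% head()
get_sentiments("nrc") %>% head()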
The Afinn lexicon rates words on a range from -5, being the most negative, to +5, being the most positive, with 0 representing a neutral sentiment. Using the ‘inner_join()’ and ‘get_sentiments()’ functions, we can attach an Afinn sentiment value to each word that the candidates spoke.
biden_words2 %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> biden_afinn2
trump_words2 %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> trump_afinn2
biden_afinn2 %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Biden's Most Frequent Negative Words \n Afinn Lexicon") +
xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n),
hjust =1.5,vjust=0, color="white", size=3.5)
trump_afinn2 %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Trump's Most Frequent Negative Words \n Afinn Lexicon") +
xlab("Word") +geom_bar(stat="identity", fill="#8b0000")+ ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
It is interesting to see Donald Trump’s heavy use of the word ‘excuse’. We can then question the context in which he used it: is he blaming someone and calling something an ‘excuse’? Either way, it is notable that ‘excuse’ is his most frequent negative word according to the Afinn lexicon.
It is also interesting to see Joe Biden use words expressing negative feelings, such as ‘worry’ and ‘anxious’. Biden could be using these words to describe his concerns about, and his questioning of, the future of America.
biden_afinn2 %>%
filter(value > 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
geom_bar(stat="identity", fill="#000099") +
ggtitle(label = "Biden's Most Frequent Positive Words \ Afinn Lexicon") +
xlab("Word") + ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_afinn2 %>%
filter(value > 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Trump's Most Frequent Positive Words \n Afinn Lexicon") +
xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+ ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Through this analysis, we can see that Joe Biden’s most frequent positive word in this lexicon is ‘united’. We can assume he is referring to the idea that the country needs to become more united in the future. Donald Trump does not frequently use the word ‘united’ in the second debate; his most frequent positive word is ‘love’.
It is interesting to see that the Afinn lexicon categorizes the word ‘clean’ as positive. We can assume that Joe Biden uses this word to refer to an environmental plan for the future.
Comparing each candidate’s most common negative and positive words is important for determining the sentiment of the speakers. Next, we will use two other lexicons to examine sentiment.
The next lexicon we can examine is the NRC lexicon. This lexicon sorts words into 8 different emotions: anger, fear, anticipation, trust, surprise, sadness, joy and disgust. In addition to these emotions, words are also sorted into negative and positive sentiments. Since a word can be categorized as negative or positive as well as into several of the 8 emotions, the counts for the negative and positive categories will generally be larger than the counts for the individual emotions, which is important to remember when examining the results of an NRC sentiment analysis. Similar to the Afinn analysis, we will use the ‘inner_join()’ function to join the candidates’ words with the NRC lexicon.
filtered_biden2 %>%
inner_join(get_sentiments("nrc")) -> biden_nrc2
biden_nrc2 %>%
group_by(sentiment) %>%
count(sentiment, sort = TRUE) %>%
head(10) -> biden_nrc_topten
filtered_trump2 %>%
inner_join(get_sentiments("nrc")) -> trump_nrc2
trump_nrc2 %>%
group_by(sentiment) %>%
count(sentiment, sort = TRUE) %>%
head(10) -> trump_nrc_top_ten
ggplot(biden_nrc_topten, aes(reorder(sentiment, n),n)) + geom_col() +
coord_flip() + xlab("Sentiment") + ylab("Count") +
geom_bar(stat="identity", fill="#000099")+
ggtitle("Biden NRC") + theme_economist() +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5) -> biden2_nrc_plot
ggplot(trump_nrc_top_ten, aes(reorder(sentiment, n),n)) + geom_col() +
coord_flip() + xlab("Sentiment") + ylab("Count") +
geom_bar(stat="identity", fill="#8b0000")+
ggtitle("Trump NRC") + theme_economist() +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5) -> trump2_nrc_plot
grid.arrange(biden2_nrc_plot, trump2_nrc_plot, ncol=1)
Through the NRC sentiment analysis, we can see that Joe Biden spoke more words with a positive sentiment, while Donald Trump spoke more words with a negative sentiment. Again, these findings are based on the NRC lexicon. We can also see that the difference in positive word frequency between the candidates is 38 words, a notable lead in positive words for Joe Biden compared to Donald Trump.
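That 38-word difference can be computed directly from the NRC summaries created above, rather than read off the charts.
# Difference in positive word counts between the candidates (NRC lexicon, second debate).
biden_positive <- biden_nrc_topten %>% filter(sentiment == "positive") %>% pull(n)
trump_positive <- trump_nrc_top_ten %>% filter(sentiment == "positive") %>% pull(n)
biden_positive - trump_positive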
Again, one must remember that in the NRC lexicon words are categorized as negative or positive in addition to the emotion categories, hence the large counts in those two categories. Among the 8 emotions (anger, fear, anticipation, trust, surprise, sadness, joy and disgust), ‘trust’ is the most frequent emotion for both candidates in the second debate. We can assume this is because the candidates are asking the American people to trust that they will accomplish what they promise in a possible future Presidency.
The last lexicon we can use to examine the sentiment of the candidates is the Bing lexicon. This lexicon categorizes words in a binary fashion into positive and negative categories. Similar to the past two lexicons, we will use the ‘inner_join()’ function to join the words with the corresponding Bing sentiments and then use the ‘count()’ function. For this lexicon, I also decided to graph the sentiment over the course of the debate to visualize how the sentiment of the candidates changed as the debate progressed. In order to represent time, I added a row number to each word using the ‘mutate()’ function; then, using ‘count()’, I grouped the words so that each column on the graph represents 30 words from the debate transcript. I referenced Tidy Text Mining in order to show a Bing analysis over time.
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill= 0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> biden2_sentiment
ggplot(biden2_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = "Bing Sentiment Over Second Presidential Debate \n",
subtitle = "JOE BIDEN") -> biden2_sentiment_graph
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill= 0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> trump2_sentiment
ggplot(trump2_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = NULL, subtitle = "DONALD TRUMP") -> trump2_sentiment_graph
grid.arrange(biden2_sentiment_graph, trump2_sentiment_graph, ncol = 1)
This Bing sentiment analysis allows us to examine the sentiment of the candidates over the course of the debate timeline. Looking at both candidates, we can see that both show a more negative sentiment than positive: a majority of the columns, which represent the net sentiment of each 30-word block, fall below the axis for both candidates. This suggests that both candidates spoke with more negative than positive sentiment throughout the second debate.
For the second debate, we can see that Donald Trump shows two extreme positive values of 5.0, compared to Joe Biden’s single extreme positive value of 4.0.
Looking at the negative sentiment values, Joe Biden shows one extreme value of -6.0, while Trump shows 3 extreme values of -5.0.
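Rather than reading these extreme values off the graphs, the most positive and most negative 30-word blocks can be pulled straight from the sentiment data frames created above.
# Highest and lowest net-sentiment blocks for each candidate (second debate).
biden2_sentiment %>% slice_max(sentiment, n = 1)
biden2_sentiment %>% slice_min(sentiment, n = 1)
trump2_sentiment %>% slice_max(sentiment, n = 1)
trump2_sentiment %>% slice_min(sentiment, n = 1)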
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> biden_sentiment
ggplot(biden_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = "Bing Sentiment Over First Presidential Debate \n",
subtitle = "JOE BIDEN") -> biden_sentiment_graph
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> trump_sentiment
ggplot(trump_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = NULL,
subtitle = "DONALD TRUMP") -> trump_sentiment_graph
grid.arrange(biden_sentiment_graph, trump_sentiment_graph, ncol =1)
Again, similar to the second debate, this Bing analysis shows that both candidates were, overall, more negative in the first debate. As before, we can see this in the graph above, since most columns, which represent sentiment, fall below the axis.
Looking at the sentiment of each candidate over the course of the first debate, we can see Joe Biden has one extreme positive value of 3.5 and Donald Trump shows one extreme positive value of 5.0.
If we look at the negative sentiment values over the course of the first debate, Joe Biden shows one extreme negative value of -6.0, while Donald Trump shows 3 extreme negative values of -5.0. Based on this analysis, we can suggest that Donald Trump was more negative in the first debate than Joe Biden.
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> biden2_afinn_overview
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) -> biden2_nrc_overview
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> trump2_afinn_overview
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) -> trump2_nrc_overview
bind_rows(biden2_afinn_overview, biden2_nrc_overview,
biden2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 2nd 2020 Presidential Debate")
bind_rows(trump2_afinn_overview, trump2_nrc_overview,
trump2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 2nd 2020 Presidential Debate")
Above, we can visualize all three lexicons, Afinn, Bing, and NRC, over the course of the 2nd Presidential Debate. Looking at the three lexicons side by side for each candidate, we can conclude that in the 2nd Presidential Debate Donald Trump showed a more negative sentiment than Joe Biden: across all three lexicons, Donald Trump has more columns below the axis.
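As a rough numeric check on this visual conclusion, the net sentiment for each lexicon can be summed into a single number per candidate; a more negative total corresponds to more columns below the axis.
# Net sentiment per lexicon for each candidate in the second debate.
bind_rows(biden2_afinn_overview, biden2_nrc_overview, biden2_sentiment) %>%
group_by(method) %>%
summarise(net_sentiment = sum(sentiment))
bind_rows(trump2_afinn_overview, trump2_nrc_overview, trump2_sentiment) %>%
group_by(method) %>%
summarise(net_sentiment = sum(sentiment))
Since Afinn sums word scores while Bing and NRC count words, the sign of each total is more meaningful than its magnitude.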
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> biden_afinn_overview
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(sentiment = positive - negative) -> biden_nrc_overview
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> trump_afinn_overview
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(sentiment = positive - negative) -> trump_nrc_overview
bind_rows(biden_afinn_overview, biden_nrc_overview,
biden_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 1st 2020 Presidential Debate")
bind_rows(trump_afinn_overview, trump_nrc_overview,
trump_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 1st 2020 Presidential Debate")
bind_rows(biden2_afinn_overview, biden2_nrc_overview,
biden2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 2nd 2020 Presidential Debate")
bind_rows(trump2_afinn_overview, trump2_nrc_overview,
trump2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 2nd 2020 Presidential Debate")
Now, re-examining the first presidential debate across the three lexicons, we can see that Donald Trump also had a stronger negative sentiment than Joe Biden in the first debate. As before, we make this conclusion because Donald Trump shows more negative sentiment values, that is, more columns below the axis.
Revisiting the original hypothesis, “In both Presidential Debates, Donald Trump spoke more when compared to Joe Biden and Joe Biden had a stronger negative sentiment”, we can now confirm some aspects and refute others. We can confirm that in both debates Donald Trump spoke more total words than Joe Biden, meaning Donald Trump said more to the American public over the two Presidential Debates. However, after filtering out stop words, we found that Joe Biden spoke more distinct words in both debates, suggesting that Donald Trump repeated words more often. Next, through the sentiment analyses using three different lexicons, Afinn, Bing and NRC, we concluded that, over both debates, Donald Trump spoke with a more negative sentiment, refuting the second part of the hypothesis. Again, it is important to remember that each lexicon measures sentiment in a different way, hence the use of all three to build a more complete analysis. Overall, it is important to critique and analyze Presidential Debates because one of the candidates will become the American President. Given the intense and critical political climate and the pandemic we faced in 2020, I found it necessary to complete this in-depth analysis of both Presidential Debates in the 2020 election.
[1] https://rpubs.com/paigeminsky/presidential_debate_analysis
[2] https://www.rev.com/blog/transcripts/donald-trump-joe-biden-final-presidential-debate-transcript-2020
[3] https://www.nytimes.com/2020/10/22/us/politics/muted-mics-social-distancing-debate.html
[4] https://www.tidytextmining.com/sentiment.html