Currently the United States is going through an election year, with Joe Biden and Donald Trump as the Democratic and Republican candidates respectively. I previously conducted a text analysis of the first Presidential Debate, which took place on September 29th, 2020. Because I completed that analysis in early October, I was unable to include the second Presidential Debate, which had not yet occurred. With the second Presidential Debate taking place on October 22nd, 2020, I wanted to use my first analysis as a stepping stone to create one cumulative project that includes both debates. Expanding my previous report allows for a more complete analysis of the sentiment of both candidates and a comparison of how the candidates spoke in each debate. To see my previous analysis of the first 2020 Presidential Debate, click the button below.
In both Presidential Debates, Donald Trump spoke more when compared to Joe Biden and Joe Biden had a stronger negative sentiment.
Similar to the original analysis, I first had to locate a transcript of the second Presidential Debate and transform it into text that I could use in RStudio. The transcript of the second debate came from Rev. Once I had the entire debate transcript, I converted it into a text file, cleaned the data, and created two separate transcripts, one each for Joe Biden and Donald Trump. It is important to note that I removed the words spoken by the debate moderator, Kristen Welker, since they are unnecessary for a text analysis of the candidates; eliminating the moderator's words allows for an analysis of solely the 2020 Presidential candidates. Once I had a text file for each of Joe Biden and Donald Trump, I imported those text files into RStudio and began my analysis. I also imported the debate transcripts from the first debate so I could compare how each candidate spoke across the two debates.
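As a rough sketch of this preparation step (the raw file name and the exact "Speaker: text" line format of the Rev transcript are assumptions here), the speaker-specific transcripts could be built with the tidyverse like this:
library(tidyverse)
# Read the raw transcript, one line per row; the file name is hypothetical.
raw_lines <- read_lines("second_debate_transcript.txt")
debate_raw <- tibble(X1 = raw_lines) %>%
filter(X1 != "")
# Keep only Joe Biden's turns and strip the speaker label so the remaining
# text matches the X1 column used throughout this analysis.
Biden_SecondDebate <- debate_raw %>%
filter(str_detect(X1, "^Joe Biden")) %>%
mutate(X1 = str_remove(X1, "^Joe Biden.*?:\\s*"))
Trump_SecondDebate <- debate_raw %>%
filter(str_detect(X1, "^Donald Trump")) %>%
mutate(X1 = str_remove(X1, "^Donald Trump.*?:\\s*"))
# Kristen Welker's lines are simply never selected, which removes the moderator.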
To begin my analysis, I first needed to load the required packages.
library(tidyverse)
library(tidytext)
library(ggthemes)
library(wordcloud2)
library(textdata)
library(gridExtra)
library(readr)
library(reshape2)
library(ggplot2)
First, I want to examine the word count for each candidate in the second debate. I do this by separating the lines of text into individual words, which lets me count the number of words spoken by each candidate. To do this, I used the ‘unnest_tokens()’ function as well as the ‘count()’ function.
Biden_SecondDebate %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word)) -> biden_words2
count(biden_words2)
| n |
|---|
| 7009 |
Trump_SecondDebate %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word)) -> trump_words2
count(trump_words2)
| n |
|---|
| 7808 |
Counting each word the candidates spoke allows us to say that Donald Trump spoke more words at the second Presidential Debate than Joe Biden. Donald Trump spoke a total of 7,808 words, while Joe Biden spoke 7,009; Trump spoke nearly 800 more words than Biden.
Let’s compare this to the first Presidential Debate and see if the same results occurred.
Expanding on the work from my previous analysis of the first debate and applying it to the second debate, we can now compare the two.
Comparing the word count of each candidate for each debate, we can see that both candidates spoke more in the second debate than in the first. One possible reason is the change in microphone usage. According to the NY Times, “during the first two minutes each candidate speaks in each of the six 15-minute segments, his opponent’s microphone will be muted”. This change in how the debate was conducted could account for the increase in words both candidates said, due to the decreased amount of crosstalk. Again, this cannot be confirmed as the reason, but it is worth considering.
When comparing the two presidential debates, it is also worth noting that Donald Trump had a higher word count than Joe Biden in each debate. However, in the first debate the gap between word counts was 594 words, while in the second debate it was larger, at 799 words. In other words, the difference between how much each candidate spoke grew in the second debate, which is an interesting finding given the muting of the microphones.
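To make this comparison concrete, a small summary table and bar chart can be built from the tokenized words. This is only a sketch: the first-debate word data frames are assumed here to be named 'biden_words' and 'trump_words', mirroring the second-debate names used above.
word_counts <- tribble(
~candidate, ~debate, ~words,
"Joe Biden", "1st Debate", nrow(biden_words),
"Donald Trump", "1st Debate", nrow(trump_words),
"Joe Biden", "2nd Debate", nrow(biden_words2),
"Donald Trump", "2nd Debate", nrow(trump_words2)
)
# One bar per candidate per debate, colored to match the rest of the report.
ggplot(word_counts, aes(debate, words, fill = candidate)) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("Donald Trump" = "#8b0000", "Joe Biden" = "#000099")) +
theme_economist() + xlab("Debate") + ylab("Total Word Count") +
ggtitle("Total Words Spoken per Debate")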
Now that we have compared overall word counts for each candidate, it is important to filter out filler words, or ‘stop words’, since they do not provide much substance for sentiment analysis. Like in the previous project, we can use ‘anti_join()’ to filter out stop words. It is also important to filter the words ‘crosstalk’ and ‘00’ out of both candidates’ transcripts. Even though these appear less often than in the first debate, they represent transcription artifacts of the crosstalk that happened at the debate and are not useful, so we filter them out of the overall analysis.
biden_words2 %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filtered_biden2
trump_words2 %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("crosstalk", "00")) -> filtered_trump2
Now, if we count these filtered words for each candidate, we get a different number. Because the filtered data frames contain one row per word, counting their rows gives the number of distinct words each candidate said, which can also be viewed as examining whether either candidate repeats words. Here we can use the ‘n_distinct()’ function to examine the number of distinct words each candidate said in the second debate.
n_distinct(filtered_biden2)
| n |
|---|
| 993 |
n_distinct(filtered_trump2)
| n |
|---|
| 864 |
Comparing both debates, we can see that Joe Biden spoke more distinct words than Donald Trump in each debate. Again, note that even though Donald Trump spoke more words in total, Joe Biden still spoke more distinct words. One could draw the conclusion that Donald Trump repeats words multiple times while speaking, hence the higher total word count but fewer distinct words overall. Again, this is only a theory as to why Trump used fewer distinct words, but it is interesting to notice and think about.
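One quick way to put a number on this repetition, using only objects already created above, is to divide each candidate's total non-stop words by his distinct non-stop words for the second debate; a higher ratio means more repetition.
# Average number of times each non-stop word is repeated (second debate).
sum(filtered_biden2$n) / nrow(filtered_biden2)
sum(filtered_trump2$n) / nrow(filtered_trump2)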
Wordclouds are a way to visualize the words both candidates spoke during each debate. Words with a greater frequency appear larger, and words that were spoken less often appear smaller. Using the ‘wordcloud2’ package, we can easily create these for each candidate.
colorVec = rep(c('blue'), length.out = nrow(filtered_biden2))  # one color per word in the frequency table
wordcloud2(filtered_biden2, color = colorVec, fontWeight = "bold",size = 2, minRotation = -pi/6, maxRotation = -pi/6, rotateRatio = 1)
| word | n |
|---|---|
| people | 42 |
| president | 28 |
| china | 22 |
| plan | 16 |
| time | 15 |
| money | 14 |
| american | 13 |
| talking | 13 |
| united | 12 |
| country | 11 |
colorVec = rep(c('red'), length.out = nrow(filtered_trump2))  # one color per word in the frequency table
wordcloud2(filtered_trump2, color = colorVec, fontWeight = "bold",size = 2, minRotation = -pi/6, maxRotation = -pi/6, rotateRatio = 1)
| word | n |
|---|---|
| people | 47 |
| joe | 31 |
| money | 28 |
| china | 20 |
| president | 20 |
| russia | 20 |
| country | 19 |
| million | 17 |
| lot | 15 |
| ago | 14 |
These wordclouds allow us to examine all the words each candidate said, but we can take a closer look at the top 10 words each candidate said using a bar graph.
Then, using the ‘ggplot()’ function, we can visualize the top ten words each candidate said. Again, we can compare this to the first debate to see if the candidates’ top ten words changed.
filter_bidenwords %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle(label = "Biden's 10 Most Frequent Words", subtitle = "1st Debate" ) +
xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filter_trumpwords %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Trump's 10 Most Frequent Words", subtitle = "1st Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filtered_biden2 %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Biden's 10 Most Frequent Words", subtitle = "2nd Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
filtered_trump2 %>%
head(10) %>%
ggplot(aes(reorder(word,n), n))+ geom_col() + coord_flip() +
theme_economist() + ggtitle("Trump's 10 Most Frequent Words", subtitle = "2nd Debate" ) + xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
For both candidates in both debates, the word ‘people’ is the most common word. For a presidential debate this finding makes sense, since both candidates are addressing the American people.
When examining the differences between the first and second debate, it is interesting to see the word ‘china’ appear in both candidates’ top 10 most frequent word lists. In the first debate, China was not among either candidate’s most frequent words. This suggests that topics regarding China were discussed in the second debate but not in the first.
For both candidates, many of their frequent words from the first debate were repeated in the second. Examining Biden’s top 10 most frequent words, 4 words appeared in both debates: ‘people’, ‘president’, ‘plan’, and ‘american’. For Donald Trump, 6 words appeared in both debates: ‘people’, ‘joe’, ‘country’, ‘million’, ‘president’, and ‘lot’.
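These overlaps can also be pulled out in code rather than read off the graphs. The sketch below assumes the first-debate frequency tables (‘filter_bidenwords’ and ‘filter_trumpwords’) are sorted by descending count, just like the second-debate tables.
# Words appearing in a candidate's top 10 in both debates.
intersect(head(filter_bidenwords$word, 10), head(filtered_biden2$word, 10))
intersect(head(filter_trumpwords$word, 10), head(filtered_trump2$word, 10))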
Looking at the most popular words is interesting in itself, but going one step further, we can look at the words surrounding these popular words to analyze the context in which they were used. To do this, we can conduct a bi-gram analysis, examining the text in pairs of words. In order to examine the context of common words said by both candidates, I first combined each candidate’s debate transcripts into one file so that I could examine what each candidate said over both debates (a sketch of this step is shown below). Once I had a single combined text file for each candidate, I could conduct the bi-gram analysis.
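The combination step itself is simple: ‘bind_rows()’ stacks the two transcripts for each candidate. The first-debate object names used here (‘Biden_FirstDebate’ and ‘Trump_FirstDebate’) are assumptions; the combined names match those used in the bi-gram code below.
# Stack both debates into one transcript per candidate.
bind_rows(Biden_FirstDebate, Biden_SecondDebate) -> Biden_Total_DebateText
bind_rows(Trump_FirstDebate, Trump_SecondDebate) -> Trump_Total_DebateText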
To complete this analysis, we will use the ‘unnest_tokens()’ function and the ‘separate()’ function to look at only 2 words of text at a time. In the code below, note that I filtered out the word “crosstalk”, since it is frequently repeated and would skew the results.
Biden_Total_DebateText %>%
unnest_tokens(bigram, X1, token = "ngrams", n=2) %>%
count(bigram, sort = TRUE) %>%
separate(bigram, c("word1", "word2"), sep=" ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% "crosstalk") %>%
filter(!word2 %in% "crosstalk") -> biden_bigram
biden_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "president" | word2 == "president") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'President'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "american" | word2 == "american") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'American'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
For this bi-gram analysis, I specifically examined words that were frequent in both debates for each candidate. For example, as shown in ‘Common Words Across Debates’, 4 of Biden’s top 10 most frequent words were repeated in each debate: ‘people’, ‘president’, ‘plan’, and ‘american’. Because of this finding, I thought it would be interesting to analyze what words were said before and after these common words. Conducting a bi-gram analysis of these words allows us to see the context in which Joe Biden used them across both debates.
It is interesting to see what words Joe Biden said around the word ‘plan’. This bi-gram analysis allows one to see what types of plans Joe Biden was discussing in both debates. Here we can see that Joe Biden discussed the ‘Biden plan’ the most, followed by a ‘socialist plan’, ‘infrastructure plan’, ‘healthcare plan’ and ‘economic plan’. These findings make sense, since these are the kinds of topics and questions raised during Presidential Debates.
The high frequency with which Joe Biden says ‘American people’ also seems reasonable, given that he is addressing the American people in each debate.
Looking at the words Joe Biden said surrounding ‘people’, we can assume many of these instances relate to the Covid-19 pandemic. The word pair ‘people died’ presumably refers to the high number of American people who died during the pandemic. We can also assume that the numbers Joe Biden pairs with ‘people’ refer to pandemic statistics, for example the number of people in hospitals, the number of people who have died, or the number of people who have lost their jobs.
Now, we will conduct the same bi-gram analysis for Donald Trump. Again, I used the common words said across both debates that were identified in the ‘Common Words Across Debates’ section above; in this case, I examined the words around "people", "Joe", "president", and "million".
Trump_Total_DebateText %>%
unnest_tokens(bigram, X1, token = "ngrams", n=2) %>%
count(bigram, sort = TRUE) %>%
separate(bigram, c("word1", "word2"), sep=" ") %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word) %>%
filter(!word1 %in% "crosstalk") %>%
filter(!word2 %in% "crosstalk") -> trump_bigram
trump_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "joe" | word2 == "joe") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Joe'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "president" | word2 == "president") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'President'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "million" | word2 == "million") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Million'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Similar to how Joe Biden used the word ‘people’ across debates, we can theorize that Donald Trump used ‘people’ in much the same way, generalizing how people were affected by the pandemic. For example, ‘people recover’ could refer to American people recovering from Covid-19, and we can also assume Donald Trump is talking about Covid-19 when he uses the word pair ‘people died’ across debates.
It is interesting to see the high frequency of the word pairing ‘vice president’ for Donald Trump. We know from his frequent use of the word ‘joe’ that he tends to refer to his opponent by first name; this finding shows that he also frequently calls Joe Biden ‘vice president’, since Biden is a former Vice President.
In this analysis, we can also see how Donald Trump references Joe Biden by first name. It is interesting that the most frequent word pairing for ‘joe’ is the word ‘cages’, which refers to the portion of the second debate in which the moderator, Kristen Welker, discusses immigration and how children were separated from their families at the border.
biden_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "people" |word2 == "people") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'People'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
biden_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_bigram %>%
filter(word1 == "plan" | word2 == "plan") %>%
head(10) %>%
mutate(wordPair = paste(word1, word2, sep=" ")) %>%
ggplot(aes(reorder(wordPair,n), n)) + geom_col() + coord_flip() +
theme_economist() + ggtitle(label = NULL, subtitle ="Word: 'Plan'") +
xlab("Word Pair") + ylab("Frequency/Count") + geom_bar(stat="identity", fill="#8b0000")+
ylab("Count") + geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Looking at each candidate’s use of the word ‘people’ and the context in which they said it, we can see Joe Biden’s greater use of numbers and assumed statistics in both debates. Seeing this contrast, we can theorize that Joe Biden cited more statistics about the American people than Donald Trump did across both debates.
When looking at the bi-gram analysis of the word ‘plan’ for each candidate, it is interesting to see the differences between them. Joe Biden uses the word ‘plan’ to discuss the plans he would have for his presidency, whereas Donald Trump does not appear to use the word in the same way; he seems to use more adjectives to describe plans rather than naming types of plans.
When examining text, and political speeches in general, it is important to analyze the sentiment of the speakers. A sentiment analysis can be used to determine how positively or negatively each candidate spoke. Since there are multiple opinions on the sentiment of a single word, there are three major lexicons, or dictionaries, we can use to complete a sentiment analysis. Using all three lexicons, ‘Afinn’, ‘NRC’, and ‘Bing’, will give us a well-rounded idea of the sentiment of each speaker.
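Before joining, it can help to glance at how each lexicon encodes sentiment: Afinn assigns a numeric score, Bing a positive/negative label, and NRC one or more emotion or polarity labels per word.
# Preview the three lexicons (textdata may prompt to download Afinn and NRC the first time).
get_sentiments("afinn") %>% head()
get_sentiments("bing") %>% head()
get_sentiments("nrc") %>% head()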
The Afinn lexicon rates words on a range from -5, being the most negative, to +5, being the most positive, with 0 representing a neutral sentiment. Using the ‘inner_join()’ and ‘get_sentiments()’ functions, we can attach an Afinn sentiment value to each word that the candidates spoke.
biden_words2 %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> biden_afinn2
trump_words2 %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) -> trump_afinn2
biden_afinn2 %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Biden's Most Frequent Negative Words \n Afinn Lexicon") +
xlab("Word") + geom_bar(stat="identity", fill="#000099")+
ylab("Count") + geom_text(aes(label=n),
hjust =1.5,vjust=0, color="white", size=3.5)
trump_afinn2 %>%
filter(value < 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Trump's Most Frequent Negative Words \n Afinn Lexicon") +
xlab("Word") +geom_bar(stat="identity", fill="#8b0000")+ ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
It is interesting to see Donald Trump’s heavy use of the word ‘excuse’. We can then question the context in which he used it: is he blaming someone and calling something an ‘excuse’? Either way, it is notable that ‘excuse’ is his most frequent negative word according to the Afinn lexicon.
It is also interesting to see Joe Biden use words expressing negative feelings, such as ‘worry’ and ‘anxious’. Biden could be using these words to describe his concerns about, and his questioning of, the future of America.
biden_afinn2 %>%
filter(value > 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
geom_bar(stat="identity", fill="#000099") +
ggtitle(label = "Biden's Most Frequent Positive Words \ Afinn Lexicon") +
xlab("Word") + ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
trump_afinn2 %>%
filter(value > 0) %>%
arrange(desc(-value)) %>%
count(word, sort= TRUE) %>%
head(10) %>%
ggplot(aes(reorder(word,n),n)) + geom_col() + theme_economist() + coord_flip()+
ggtitle(label = "Trump's Most Frequent Positive Words \n Afinn Lexicon") +
xlab("Word") + geom_bar(stat="identity", fill="#8b0000")+ ylab("Count") +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5)
Through this analysis, we can see that Joe Biden’s most frequent positive word in this lexicon is ‘united’. We can assume he is referring to the idea that the country needs to become more united in the future. Donald Trump does not frequently use the word ‘united’ in the second debate; his most frequent positive word is ‘love’.
It is interesting to see that the Afinn lexicon categorizes the word ‘clean’ as positive. We can assume that Joe Biden uses this word to refer to an environmental plan for the future.
Comparing each candidate’s most common negative and positive words is important for determining the sentiment of the speakers. Next, we will use two other lexicons to examine sentiment.
The next lexicon we can examine is the NRC lexicon. This lexicon sorts words into 8 different emotions: anger, fear, anticipation, trust, surprise, sadness, joy and disgust. In addition to these emotions, words are also sorted into negative and positive sentiments. Since a word can be categorized as negative or positive as well as into several of the 8 emotions, the counts for the negative and positive categories will generally be larger than the counts for the individual emotions, which is important to remember when examining the results of an NRC sentiment analysis. Similar to the Afinn analysis, we will use the ‘inner_join()’ function to join the candidates’ words with the NRC lexicon.
filtered_biden2 %>%
inner_join(get_sentiments("nrc")) -> biden_nrc2
biden_nrc2 %>%
group_by(sentiment) %>%
count(sentiment, sort = TRUE) %>%
head(10) -> biden_nrc_topten
filtered_trump2 %>%
inner_join(get_sentiments("nrc")) -> trump_nrc2
trump_nrc2 %>%
group_by(sentiment) %>%
count(sentiment, sort = TRUE) %>%
head(10) -> trump_nrc_top_ten
ggplot(biden_nrc_topten, aes(reorder(sentiment, n),n)) + geom_col() +
coord_flip() + xlab("Sentiment") + ylab("Count") +
geom_bar(stat="identity", fill="#000099")+
ggtitle("Biden NRC") + theme_economist() +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5) -> biden2_nrc_plot
ggplot(trump_nrc_top_ten, aes(reorder(sentiment, n),n)) + geom_col() +
coord_flip() + xlab("Sentiment") + ylab("Count") +
geom_bar(stat="identity", fill="#8b0000")+
ggtitle("Trump NRC") + theme_economist() +
geom_text(aes(label=n), hjust =1.5,vjust=0, color="white", size=3.5) -> trump2_nrc_plot
grid.arrange(biden2_nrc_plot, trump2_nrc_plot, ncol=1)
Through the NRC sentiment analysis, we can see that Joe Biden spoke more words with a positive sentiment, while Donald Trump spoke more words with a negative sentiment. Again, these findings are based on the NRC lexicon. We can also see that the difference in positive word frequency between the candidates is 38 words, a notable lead in positive words for Joe Biden compared to Donald Trump.
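That 38-word difference can be computed directly from the NRC summaries created above, rather than read off the charts.
# Difference in positive word counts between the candidates (NRC lexicon, second debate).
biden_positive <- biden_nrc_topten %>% filter(sentiment == "positive") %>% pull(n)
trump_positive <- trump_nrc_top_ten %>% filter(sentiment == "positive") %>% pull(n)
biden_positive - trump_positive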
Again, one must remember that in the NRC lexicon words are categorized as negative or positive in addition to the emotion categories, hence the large counts in those two categories. Among the 8 emotions (anger, fear, anticipation, trust, surprise, sadness, joy and disgust), ‘trust’ is the most frequent emotion for both candidates in the second debate. We can assume this is because the candidates are asking the American people to trust that they will accomplish what they promise in a possible future Presidency.
The last lexicon we can use to examine the sentiment of the candidates is the Bing lexicon. This lexicon categorizes words in a binary fashion into positive and negative categories. Similar to the past two lexicons, we will use the ‘inner_join()’ function to join the words with the corresponding Bing sentiments and then use the ‘count()’ function. For this lexicon, I also decided to graph the sentiment over the course of the debate to visualize how the sentiment of the candidates changed as the debate progressed. In order to represent time, I added a row number to each word using the ‘mutate()’ function; then, using ‘count()’, I grouped the words so that each column on the graph represents 30 words from the debate transcript. I referenced Tidy Text Mining in order to show a Bing analysis over time.
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill= 0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> biden2_sentiment
ggplot(biden2_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = "Bing Sentiment Over Second Presidential Debate \n",
subtitle = "JOE BIDEN") -> biden2_sentiment_graph
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill= 0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> trump2_sentiment
ggplot(trump2_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = NULL, subtitle = "DONALD TRUMP") -> trump2_sentiment_graph
grid.arrange(biden2_sentiment_graph, trump2_sentiment_graph, ncol = 1)
This Bing sentiment analysis allows us to examine the sentiment of the candidates over the course of the debate timeline. Looking at both candidates, we can see that both show a more negative sentiment than positive: a majority of the columns, which represent the net sentiment of each 30-word block, fall below the axis for both candidates. This suggests that both candidates spoke with more negative than positive sentiment throughout the second debate.
For the second debate, we can see that Donald Trump shows two extreme positive values of 5.0, compared to Joe Biden’s single extreme positive value of 4.0.
Looking at the negative sentiment values, Joe Biden shows one extreme value of -6.0, while Trump shows 3 extreme values of -5.0.
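Rather than reading these extreme values off the graphs, the most positive and most negative 30-word blocks can be pulled straight from the sentiment data frames created above.
# Highest and lowest net-sentiment blocks for each candidate (second debate).
biden2_sentiment %>% slice_max(sentiment, n = 1)
biden2_sentiment %>% slice_min(sentiment, n = 1)
trump2_sentiment %>% slice_max(sentiment, n = 1)
trump2_sentiment %>% slice_min(sentiment, n = 1)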
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> biden_sentiment
ggplot(biden_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = "Bing Sentiment Over First Presidential Debate \n",
subtitle = "JOE BIDEN") -> biden_sentiment_graph
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("bing")) %>%
count(candidate, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(method = "Bing") %>%
mutate(sentiment = positive - negative) -> trump_sentiment
ggplot(trump_sentiment, aes(index, sentiment, fill = method)) +
geom_col() +
ggtitle(label = NULL,
subtitle = "DONALD TRUMP") -> trump_sentiment_graph
grid.arrange(biden_sentiment_graph, trump_sentiment_graph, ncol =1)
Again, similar to the second debate, this Bing analysis shows that both candidates were, overall, more negative in the first debate. As before, we can see this in the graph above, since most columns, which represent sentiment, fall below the axis.
Looking at the sentiment of each candidate over the course of the first debate, we can see Joe Biden has one extreme positive value of 3.5 and Donald Trump shows one extreme positive value of 5.0.
If we look at the negative sentiment values over the course of the first debate, Joe Biden shows one extreme negative value of -6.0, while Donald Trump shows 3 extreme negative values of -5.0. Based on this analysis, we can suggest that Donald Trump was more negative in the first debate than Joe Biden.
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> biden2_afinn_overview
filtered_biden2 %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) -> biden2_nrc_overview
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> trump2_afinn_overview
filtered_trump2 %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative) -> trump2_nrc_overview
bind_rows(biden2_afinn_overview, biden2_nrc_overview,
biden2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 2nd 2020 Presidential Debate")
bind_rows(trump2_afinn_overview, trump2_nrc_overview,
trump2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 2nd 2020 Presidential Debate")
Above, we can visualize all three lexicons, Afinn, Bing, and NRC, over the course of the 2nd Presidential Debate. Looking at the three lexicons side by side for each candidate, we can conclude that in the 2nd Presidential Debate Donald Trump showed a more negative sentiment than Joe Biden: across all three lexicons, Donald Trump has more columns below the axis.
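As a rough numeric check on this visual conclusion, the net sentiment for each lexicon can be summed into a single number per candidate; a more negative total corresponds to more columns below the axis.
# Net sentiment per lexicon for each candidate in the second debate.
bind_rows(biden2_afinn_overview, biden2_nrc_overview, biden2_sentiment) %>%
group_by(method) %>%
summarise(net_sentiment = sum(sentiment))
bind_rows(trump2_afinn_overview, trump2_nrc_overview, trump2_sentiment) %>%
group_by(method) %>%
summarise(net_sentiment = sum(sentiment))
Since Afinn sums word scores while Bing and NRC count words, the sign of each total is more meaningful than its magnitude.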
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> biden_afinn_overview
filter_bidenwords %>%
mutate(idNum = row_number(), candidate = "Joe Biden") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(sentiment = positive - negative) -> biden_nrc_overview
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("afinn")) %>%
group_by(index = idNum %/% 30) %>%
summarise(sentiment = sum(value)) %>%
mutate(method = "AFINN") -> trump_afinn_overview
filter_trumpwords %>%
mutate(idNum = row_number(), candidate = "Donald Trump") %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("negative", "positive")) %>%
mutate(method = "NRC") %>%
count(method, index = idNum %/% 30, sentiment) %>%
spread(sentiment, n, fill =0) %>%
mutate(sentiment = positive - negative) -> trump_nrc_overview
bind_rows(biden_afinn_overview, biden_nrc_overview,
biden_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 1st 2020 Presidential Debate")
bind_rows(trump_afinn_overview, trump_nrc_overview,
trump_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 1st 2020 Presidential Debate")
bind_rows(biden2_afinn_overview, biden2_nrc_overview,
biden2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Joe Biden 2nd 2020 Presidential Debate")
bind_rows(trump2_afinn_overview, trump2_nrc_overview,
trump2_sentiment) %>%
ggplot(aes(index, sentiment, fill =method)) +
geom_col() + facet_wrap(~method, ncol = 1 ) +
ggtitle(label="Donald Trump 2nd 2020 Presidential Debate")
Now, re-examining the first presidential debate across the three lexicons, we can see that Donald Trump also had a stronger negative sentiment than Joe Biden in the first debate. As before, we make this conclusion because Donald Trump shows more negative sentiment values, that is, more columns below the axis.
Revisiting the original hypothesis, “In both Presidential Debates, Donald Trump spoke more when compared to Joe Biden and Joe Biden had a stronger negative sentiment”, we can now confirm some aspects and refute others. We can confirm that in both debates Donald Trump spoke more total words than Joe Biden, meaning Donald Trump said more to the American public over the two Presidential Debates. However, after filtering out stop words, we found that Joe Biden spoke more distinct words in both debates, suggesting that Donald Trump repeated words more often. Next, through the sentiment analyses using three different lexicons, Afinn, Bing and NRC, we concluded that, over both debates, Donald Trump spoke with a more negative sentiment, refuting the second part of the hypothesis. Again, it is important to remember that each lexicon measures sentiment in a different way, hence the use of all three to build a more complete analysis. Overall, it is important to critique and analyze Presidential Debates because one of the candidates will become the American President. Given the intense and critical political climate and the pandemic we faced in 2020, I found it necessary to complete this in-depth analysis of both Presidential Debates in the 2020 election.
[1] https://rpubs.com/paigeminsky/presidential_debate_analysis
[2] https://www.rev.com/blog/transcripts/donald-trump-joe-biden-final-presidential-debate-transcript-2020
[3] https://www.nytimes.com/2020/10/22/us/politics/muted-mics-social-distancing-debate.html
[4] https://www.tidytextmining.com/sentiment.html