Introduction: In 2016, Hillary Clinton and Donald Trump met for their first presidential debate, a highly anticipated event that had America on its toes. Many topics were predicted to come up, and the matchup was expected to be intense, with a negative tone running through the candidates' remarks given the build-up before the debate. I chose this debate because it was an important first meeting between the candidates, and many viewers were shocked after watching it live.

Hypothesis: I expect to see unifying and positive words from Clinton, and more negative words, along with mentions of other countries, from Trump.

Process: First, I found a transcript of the first debate and separated Trump's and Clinton's words, deleting the lines spoken by the moderator, Lester Holt, anchor of NBC Nightly News. Holt played a large role in keeping the conversation going and making sure each candidate had enough time to make their points, even with interruptions. Next, I saved each candidate's words as a .txt file and loaded them into R. The transcript came from https://www.politico.com/story/2016/09/full-transcript-first-2016-presidential-debate-228761.
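This separation can also be scripted. Below is a minimal sketch, assuming a hypothetical raw file debate.txt in which every speaking turn starts with a HOLT:, CLINTON:, or TRUMP: label; the actual transcript layout may differ.

# Sketch: split a raw transcript into per-candidate files and drop the moderator.
# Assumes a hypothetical debate.txt where each turn begins with "HOLT:",
# "CLINTON:", or "TRUMP:".
library(stringr)

raw <- readLines("~/Downloads/debate.txt")

# Tag each line with its speaker, carrying the label forward over lines
# that continue the previous turn.
speaker <- str_extract(raw, "^(HOLT|CLINTON|TRUMP)(?=:)")
for (i in seq_along(speaker)) {
  if (is.na(speaker[i]) && i > 1) speaker[i] <- speaker[i - 1]
}

# Strip the labels and write each candidate's lines to their own file.
text <- str_remove(raw, "^(HOLT|CLINTON|TRUMP):\\s*")
writeLines(text[which(speaker == "CLINTON")], "~/Downloads/hillary.txt")
writeLines(text[which(speaker == "TRUMP")], "~/Downloads/trump-2.txt")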

Next, I imported the necessary packages to use for this project.

#install.packages('tidytext')
library(tidyverse)
library(tidytext)
library(textdata)
#install.packages('ggplot2')
library(ggplot2)

library(readr)
# Read each candidate's transcript (a single column of text, no header).
hillary <- read_delim("~/Downloads/hillary.txt",
                      delim = "\t", escape_double = FALSE,
                      col_names = FALSE, trim_ws = TRUE)

trump_2 <- read_delim("~/Downloads/trump-2.txt",
                      delim = "\t", escape_double = FALSE,
                      col_names = FALSE, trim_ws = TRUE)

An important part of preparing the data was making sure every significant word Hillary and Trump said could be analyzed. The unnest_tokens() function breaks each line of the transcript into a one-word-per-row format.

# Tokenize to one word per row and normalize curly apostrophes to straight ones
# so contractions match the stop-word and sentiment lexicons.
trump_2_words <- trump_2 %>%
  unnest_tokens(word, X1) %>%
  mutate(word = gsub("\u2019", "'", word))

hillary_words <- hillary %>%
  unnest_tokens(word, X1) %>%
  mutate(word = gsub("\u2019", "'", word))

I wanted to see how many words each candidate spoke during the debate. Clinton said about 6,321 words while Trump said about 8,373, a drastic difference given that each candidate had the same amount of time to make their case.

hillary_words %>%
  count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1  6321
trump_2_words %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  8373

Word clouds show the most frequently used words. By filtering out stop words, we can see the most frequent meaningful words each candidate used during the debate. The larger a word appears in the cloud, the more frequently it was used.

library(wordcloud2)
# Word cloud of Trump's 200 most frequent non-stop words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(200) %>%
  wordcloud2()
# Table of his 20 most frequent non-stop words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(20) %>%
  knitr::kable()
|word      |  n|
|:---------|--:|
|country   | 51|
|people    | 37|
|secretary | 26|
|clinton   | 21|
|companies | 20|
|jobs      | 19|
|bad       | 17|
|money     | 17|
|bring     | 15|
|leaving   | 15|
|lot       | 15|
|tax       | 15|
|time      | 15|
|agree     | 14|
|deal      | 14|
|isis      | 14|
|war       | 14|
|world     | 14|
|stop      | 13|
|countries | 12|

After looking at Trump's word cloud with filler words filtered out, we can see that his most frequent words were "country," "people," "secretary," "Clinton," and "jobs." The word cloud also shows other interesting words such as "isis," "money," and "Russia," and there are many mentions of other countries. This makes me wonder whether one of Clinton's most frequent words was "Trump" or "Donald," and whether she brings up other countries as frequently.

# Word cloud of Clinton's 200 most frequent non-stop words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(200) %>%
  wordcloud2()
# Table of her 20 most frequent non-stop words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(20) %>%
  knitr::kable()
|word        |  n|
|:-----------|--:|
|people      | 33|
|donald      | 27|
|lot         | 18|
|jobs        | 17|
|country     | 16|
|tax         | 16|
|business    | 15|
|american    | 13|
|economy     | 13|
|nuclear     | 11|
|police      | 11|
|communities | 10|
|debt        | 10|
|million     | 10|
|deal        |  9|
|middle      |  9|
|information |  8|
|iran        |  8|
|isis        |  8|
|plan        |  8|

After looking at Clinton's word cloud and filtering out filler words, we can see that her most frequent words were "people," "Donald," "jobs," "country," "tax," and "business," so "Donald" did in fact rank near the top. Hillary also had far fewer mentions of other countries than Trump did.
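One way to check that directly is to count how often each candidate names specific countries. This is a small sketch; the list of country names here is illustrative rather than exhaustive.

# Sketch: compare how often each candidate names specific countries.
countries <- c("china", "mexico", "russia", "iran", "iraq", "japan", "cuba")

bind_rows(
  trump_2_words %>% mutate(speaker = "Trump"),
  hillary_words %>% mutate(speaker = "Clinton")
) %>%
  filter(word %in% countries) %>%
  count(speaker, word, sort = TRUE)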

Next, I wanted to see the most frequent negative and positive words each candidate used.

From the chart below of Trump's most frequent negative and positive words, we can see that his most commonly used negative words were "bad" and "wrong."

# Plot the AFINN sentiment scores of Trump's 20 most frequent scored words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments('afinn')) %>%
  arrange(desc(n)) %>%
  head(20) %>%
  ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
  coord_flip() + ggtitle("Trump's Negative and Positive Words - Most Frequent") +
  xlab("Negative and Positive Words") + ylab("AFINN Sentiment Score")

I then created a table of Trump's most frequently used words and their sentiment scores to better see the data behind the graph.

# Table of Trump's 20 most frequent words that appear in both the Bing and AFINN lexicons.
trump_2_words %>%
  select(word) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments("bing")) %>%
  inner_join(get_sentiments("afinn")) %>%
  filter(!word %in% c("tahm", "igh")) %>%  # drop garbled transcription tokens
  head(20) %>%
  knitr::kable()
|word      |  n|sentiment | value|
|:---------|--:|:---------|-----:|
|bad       | 17|negative  |    -3|
|wrong     | 11|negative  |    -2|
|losing    |  7|negative  |    -3|
|worst     |  7|negative  |    -3|
|endorsed  |  6|positive  |     2|
|mess      |  6|negative  |    -2|
|debt      |  5|negative  |    -2|
|killed    |  5|negative  |    -3|
|terrible  |  5|negative  |    -3|
|advantage |  4|positive  |     2|
|disaster  |  4|negative  |    -2|
|love      |  4|positive  |     3|
|proud     |  4|positive  |     2|
|strong    |  4|positive  |     2|
|wealthy   |  4|positive  |     2|
|badly     |  3|negative  |    -3|
|beautiful |  3|positive  |     3|
|excuse    |  3|negative  |    -1|
|fine      |  3|positive  |     2|
|happy     |  3|positive  |     3|

From the chart below of Clinton's most frequent negative and positive words, we can see that "criminal" and "crime" were among her most strongly negative words. This is interesting because, given Trump's tax scandal, it makes you wonder who she is calling a criminal throughout this debate.

# Plot the AFINN sentiment scores of Clinton's 20 most frequent scored words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments('afinn')) %>%
  arrange(desc(n)) %>%
  head(20) %>%
  ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
  coord_flip() + ggtitle("Clinton's Negative and Positive Words - Most Frequent") +
  xlab("Negative and Positive Words") + ylab("AFINN Sentiment Score")

I created the same kind of table for Clinton to better see the data behind her graph.

# Table of Clinton's 20 most frequent AFINN-scored words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments("afinn")) %>%
  filter(!word %in% c("tahm", "igh")) %>%  # drop garbled transcription tokens
  head(20) %>%
  knitr::kable()
|word          |  n| value|
|:-------------|--:|-----:|
|debt          | 10|    -2|
|support       |  8|     2|
|wealthy       |  8|     2|
|justice       |  7|     2|
|pay           |  7|    -1|
|gun           |  6|    -1|
|top           |  6|     2|
|criminal      |  5|    -3|
|fair          |  5|     2|
|hope          |  5|     2|
|united        |  5|     1|
|worst         |  5|    -3|
|benefit       |  4|     2|
|clean         |  4|     2|
|growth        |  4|     2|
|hard          |  4|    -1|
|crime         |  3|    -3|
|matter        |  3|     1|
|opportunities |  3|     2|
|prepared      |  3|     1|

Keeping count of the negative words each candidate used is important to get a full understanding of the difference between the candidates' speeches.

## Joining, by = "word"
## # A tibble: 1 × 1
##       n
##   <int>
## 1   170
## Joining, by = "word"
## # A tibble: 1 × 1
##       n
##   <int>
## 1   236

From these counts we can see that Trump said 236 negative words while Hillary said 170, a vast difference in sentiment between the candidates.
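A count like this can be produced by joining each candidate's words with the Bing lexicon and keeping only the words labeled negative. The following is a sketch; the exact filtering used above may differ slightly.

# Sketch: count negative words per candidate using the Bing lexicon.
hillary_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  filter(sentiment == "negative") %>%
  count()

trump_2_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  filter(sentiment == "negative") %>%
  count()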

Lastly, I wanted to compare their sentiment means and see which candidate was mathematically more negative.

# Keep only Trump's AFINN-scored words, then average the scores.
trump_2_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments('afinn')) -> trump_sentiment

mean(trump_sentiment$value)
## [1] -0.3607955
# Keep only Clinton's AFINN-scored words, then average the scores.
hillary_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments('afinn')) -> hillary_sentiment

mean(hillary_sentiment$value)
## [1] -0.02302632

After looking at their sentiment means, we can see that Hillary's is about -0.02 while Trump's is about -0.36. This shows that Trump's speech had a much more negative connotation to it.
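The two means can also be placed side by side for easier comparison; a quick sketch:

# Sketch: show both candidates' mean AFINN scores in one tibble.
tibble(
  candidate = c("Clinton", "Trump"),
  mean_afinn = c(mean(hillary_sentiment$value), mean(trump_sentiment$value))
)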

In conclusion, after reviewing Trump's and Clinton's most frequently used negative and positive words in their first presidential debate, and comparing their frequency graphs, we can see that Clinton had a more positive sentiment overall. My hypothesis was correct: Hillary used more positive and unifying words while Trump used more negative words, and, as the word clouds and frequency tables show, Trump mentioned other countries rather frequently. This matters because simply watching the debate does not give a true sense of each candidate's sentiment throughout; without this analysis, it could seem that Hillary was just as negative as Trump.