Introduction: In 2016, Hillary Clinton and Donald Trump met for their first presidential debate, a highly anticipated event that had America on its toes. Many topics were predicted to come up, and the matchup was expected to be intense, with a negative tone running through the candidates' remarks given the build-up before the debate. I chose this debate because it was an important first meeting between the candidates, and many viewers were shocked after watching it live.

Hypothesis: I expect to see unifying and positive words from Clinton, and more negative words, along with mentions of other countries, from Trump.

Process: First, I found a transcript of the first debate and separated Trump's and Clinton's words, deleting the lines spoken by the moderator, Lester Holt, anchor of NBC Nightly News. Holt played a large role in keeping the conversation going and making sure each candidate had enough time to make their points, even with interruptions. Next, I saved each candidate's words as a .txt file and loaded them into R. The transcript came from https://www.politico.com/story/2016/09/full-transcript-first-2016-presidential-debate-228761.
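This separation can also be scripted. Below is a minimal sketch, assuming a hypothetical raw file debate.txt in which every speaking turn starts with a HOLT:, CLINTON:, or TRUMP: label; the actual transcript layout may differ.

# Sketch: split a raw transcript into per-candidate files and drop the moderator.
# Assumes a hypothetical debate.txt where each turn begins with "HOLT:",
# "CLINTON:", or "TRUMP:".
library(stringr)

raw <- readLines("~/Downloads/debate.txt")

# Tag each line with its speaker, carrying the label forward over lines
# that continue the previous turn.
speaker <- str_extract(raw, "^(HOLT|CLINTON|TRUMP)(?=:)")
for (i in seq_along(speaker)) {
  if (is.na(speaker[i]) && i > 1) speaker[i] <- speaker[i - 1]
}

# Strip the labels and write each candidate's lines to their own file.
text <- str_remove(raw, "^(HOLT|CLINTON|TRUMP):\\s*")
writeLines(text[which(speaker == "CLINTON")], "~/Downloads/hillary.txt")
writeLines(text[which(speaker == "TRUMP")], "~/Downloads/trump-2.txt")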

Next, I imported the necessary packages to use for this project.

#install.packages('tidytext')
library(tidyverse)
library(tidytext)
library(textdata)
#install.packages('ggplot2')
library(ggplot2)

library(readr)
# Read each candidate's transcript (a single column of text, no header).
hillary <- read_delim("~/Downloads/hillary.txt",
                      delim = "\t", escape_double = FALSE,
                      col_names = FALSE, trim_ws = TRUE)

trump_2 <- read_delim("~/Downloads/trump-2.txt",
                      delim = "\t", escape_double = FALSE,
                      col_names = FALSE, trim_ws = TRUE)

An important part of preparing the data was making sure every significant word Hillary and Trump said could be analyzed. The unnest_tokens() function breaks each line of the transcript into a one-word-per-row format.

# Tokenize to one word per row and normalize curly apostrophes to straight ones
# so contractions match the stop-word and sentiment lexicons.
trump_2_words <- trump_2 %>%
  unnest_tokens(word, X1) %>%
  mutate(word = gsub("\u2019", "'", word))

hillary_words <- hillary %>%
  unnest_tokens(word, X1) %>%
  mutate(word = gsub("\u2019", "'", word))

I wanted to see how many words each candidate spoke during the debate. Clinton said about 6,321 words while Trump said about 8,373, a drastic difference given that each candidate had the same amount of time to make their case.

hillary_words %>%
  count() 
## # A tibble: 1 × 1
##       n
##   <int>
## 1  6321
trump_2_words %>%
  count()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  8373

Word clouds show the most frequently used words. By filtering out stop words, we can see the most frequent meaningful words each candidate used during the debate. The larger a word appears in the cloud, the more frequently it was used.

library(wordcloud2)
# Word cloud of Trump's 200 most frequent non-stop words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(200) %>%
  wordcloud2()
# Table of his 20 most frequent non-stop words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(20) %>%
  knitr::kable()
|word      |  n|
|:---------|--:|
|country   | 51|
|people    | 37|
|secretary | 26|
|clinton   | 21|
|companies | 20|
|jobs      | 19|
|bad       | 17|
|money     | 17|
|bring     | 15|
|leaving   | 15|
|lot       | 15|
|tax       | 15|
|time      | 15|
|agree     | 14|
|deal      | 14|
|isis      | 14|
|war       | 14|
|world     | 14|
|stop      | 13|
|countries | 12|

After looking at Trump's word cloud with filler words filtered out, we can see that his most frequent words were "country," "people," "secretary," "Clinton," and "jobs." The word cloud also shows other interesting words such as "isis," "money," and "Russia," and there are many mentions of other countries. This makes me wonder whether one of Clinton's most frequent words was "Trump" or "Donald," and whether she brings up other countries as frequently.

# Word cloud of Clinton's 200 most frequent non-stop words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(200) %>%
  wordcloud2()
# Table of her 20 most frequent non-stop words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c("that's", "it's", "we're")) %>%
  head(20) %>%
  knitr::kable()
|word        |  n|
|:-----------|--:|
|people      | 33|
|donald      | 27|
|lot         | 18|
|jobs        | 17|
|country     | 16|
|tax         | 16|
|business    | 15|
|american    | 13|
|economy     | 13|
|nuclear     | 11|
|police      | 11|
|communities | 10|
|debt        | 10|
|million     | 10|
|deal        |  9|
|middle      |  9|
|information |  8|
|iran        |  8|
|isis        |  8|
|plan        |  8|

After looking at Clinton's word cloud and filtering out filler words, we can see that her most frequent words were "people," "Donald," "jobs," "country," "tax," and "business," so "Donald" did in fact rank near the top. Hillary also had far fewer mentions of other countries than Trump did.
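One way to check that directly is to count how often each candidate names specific countries. This is a small sketch; the list of country names here is illustrative rather than exhaustive.

# Sketch: compare how often each candidate names specific countries.
countries <- c("china", "mexico", "russia", "iran", "iraq", "japan", "cuba")

bind_rows(
  trump_2_words %>% mutate(speaker = "Trump"),
  hillary_words %>% mutate(speaker = "Clinton")
) %>%
  filter(word %in% countries) %>%
  count(speaker, word, sort = TRUE)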

Next, I wanted to see the most frequent negative and positive words each candidate used.

From the chart below of Trump's most frequent negative and positive words, we can see that his most commonly used negative words were "bad" and "wrong."

# Plot the AFINN sentiment scores of Trump's 20 most frequent scored words.
trump_2_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments('afinn')) %>%
  arrange(desc(n)) %>%
  head(20) %>%
  ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
  coord_flip() + ggtitle("Trump's Negative and Positive Words - Most Frequent") +
  xlab("Negative and Positive Words") + ylab("AFINN Sentiment Score")

I then created a table of Trump's most frequently used words and their sentiment scores to better see the data behind the graph.

# Table of Trump's 20 most frequent words that appear in both the Bing and AFINN lexicons.
trump_2_words %>%
  select(word) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments("bing")) %>%
  inner_join(get_sentiments("afinn")) %>%
  filter(!word %in% c("tahm", "igh")) %>%  # drop garbled transcription tokens
  head(20) %>%
  knitr::kable()
|word      |  n|sentiment | value|
|:---------|--:|:---------|-----:|
|bad       | 17|negative  |    -3|
|wrong     | 11|negative  |    -2|
|losing    |  7|negative  |    -3|
|worst     |  7|negative  |    -3|
|endorsed  |  6|positive  |     2|
|mess      |  6|negative  |    -2|
|debt      |  5|negative  |    -2|
|killed    |  5|negative  |    -3|
|terrible  |  5|negative  |    -3|
|advantage |  4|positive  |     2|
|disaster  |  4|negative  |    -2|
|love      |  4|positive  |     3|
|proud     |  4|positive  |     2|
|strong    |  4|positive  |     2|
|wealthy   |  4|positive  |     2|
|badly     |  3|negative  |    -3|
|beautiful |  3|positive  |     3|
|excuse    |  3|negative  |    -1|
|fine      |  3|positive  |     2|
|happy     |  3|positive  |     3|

From the chart below of Clinton's most frequent negative and positive words, we can see that "criminal" and "crime" were among her most strongly negative words. This is interesting because, given Trump's tax scandal, it makes you wonder who she is calling a criminal throughout this debate.

# Plot the AFINN sentiment scores of Clinton's 20 most frequent scored words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments('afinn')) %>%
  arrange(desc(n)) %>%
  head(20) %>%
  ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
  coord_flip() + ggtitle("Clinton's Negative and Positive Words - Most Frequent") +
  xlab("Negative and Positive Words") + ylab("AFINN Sentiment Score")

I created the same kind of table for Clinton to better see the data behind her graph.

# Table of Clinton's 20 most frequent AFINN-scored words.
hillary_words %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  inner_join(get_sentiments("afinn")) %>%
  filter(!word %in% c("tahm", "igh")) %>%  # drop garbled transcription tokens
  head(20) %>%
  knitr::kable()
|word          |  n| value|
|:-------------|--:|-----:|
|debt          | 10|    -2|
|support       |  8|     2|
|wealthy       |  8|     2|
|justice       |  7|     2|
|pay           |  7|    -1|
|gun           |  6|    -1|
|top           |  6|     2|
|criminal      |  5|    -3|
|fair          |  5|     2|
|hope          |  5|     2|
|united        |  5|     1|
|worst         |  5|    -3|
|benefit       |  4|     2|
|clean         |  4|     2|
|growth        |  4|     2|
|hard          |  4|    -1|
|crime         |  3|    -3|
|matter        |  3|     1|
|opportunities |  3|     2|
|prepared      |  3|     1|

Keeping count of the negative words each candidate used is important to get a full understanding of the difference between the candidates' speeches.

## Joining, by = "word"
## # A tibble: 1 × 1
##       n
##   <int>
## 1   170
## Joining, by = "word"
## # A tibble: 1 × 1
##       n
##   <int>
## 1   236

From these counts we can see that Trump said 236 negative words while Hillary said 170, a vast difference in sentiment between the candidates.
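A count like this can be produced by joining each candidate's words with the Bing lexicon and keeping only the words labeled negative. The following is a sketch; the exact filtering used above may differ slightly.

# Sketch: count negative words per candidate using the Bing lexicon.
hillary_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  filter(sentiment == "negative") %>%
  count()

trump_2_words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing")) %>%
  filter(sentiment == "negative") %>%
  count()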

Lastly, I wanted to compare their sentiment means and see which candidate was mathematically more negative.

# Keep only Trump's AFINN-scored words, then average the scores.
trump_2_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments('afinn')) -> trump_sentiment

mean(trump_sentiment$value)
## [1] -0.3607955
# Keep only Clinton's AFINN-scored words, then average the scores.
hillary_words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments('afinn')) -> hillary_sentiment

mean(hillary_sentiment$value)
## [1] -0.02302632

After looking at their sentiment means, we can see that Hillary's is about -0.02 while Trump's is about -0.36. This shows that Trump's speech had a much more negative connotation to it.
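The two means can also be placed side by side for easier comparison; a quick sketch:

# Sketch: show both candidates' mean AFINN scores in one tibble.
tibble(
  candidate = c("Clinton", "Trump"),
  mean_afinn = c(mean(hillary_sentiment$value), mean(trump_sentiment$value))
)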

In conclusion, after reviewing Trump's and Clinton's most frequently used negative and positive words in their first presidential debate, and comparing their frequency graphs, we can see that Clinton had a more positive sentiment overall. My hypothesis was correct: Hillary used more positive and unifying words while Trump used more negative words, and, as the word clouds and frequency tables show, Trump mentioned other countries rather frequently. This matters because simply watching the debate does not give a true sense of each candidate's sentiment throughout; without this analysis, it could seem that Hillary was just as negative as Trump.