In 2016, Hillary Clinton and Donald Trump held their first presidential debate. It was a highly anticipated event that had America on the edge of its seat. Many topics were predicted to come up, along with the expectation that this would be an intense matchup. Despite assumptions that Trump would keep his cool during this first debate, he became agitated almost immediately. I am interested to see how many times Trump tells Hillary she is “wrong,” and I would also like to track the number of times Trump interrupted Clinton, although this might be impossible from the transcript alone. I am expecting to see unifying words from Clinton, and more aggressive words, along with mentions of other countries, from Trump.
I want to see Trump and Clinton’s most frequent negative words and compare them.
To begin, I imported the necessary packages for this project.
#install.packages('tidytext')
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidytext)
library(textdata)
#install.packages('ggplot2')
library(ggplot2)
library(readr)
hillary <- read_delim("~/Downloads/hillary.txt",
delim = "\t", escape_double = FALSE,
col_names = FALSE, trim_ws = TRUE)
## Rows: 142 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trump_2 <- read_delim("~/Downloads/trump-2.txt",
delim = "\t", escape_double = FALSE,
col_names = FALSE, trim_ws = TRUE)
## Rows: 191 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): X1
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
An important part of working with this data was making sure filler words and other words that carry no meaning for the analysis were removed, so filtering them out with a stop-word list was crucial. Before that, each transcript needed to be split into individual word tokens.
# Tokenize each transcript into one lowercase word per row and normalize curly apostrophes
trump_2_words <- trump_2 %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word))
hillary_words <- hillary %>%
unnest_tokens(word, X1) %>%
mutate(word = gsub("\u2019", "'", word))
I wanted to see how many words each candidate spoke during the debate. Clinton said about 6,321 words while Trump said about 8,373 words. This is quite a drastic difference given that each candidate had the same amount of time to get their points across. This could mean Trump used more filler words, or that there were a large number of interruptions from him.
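For reference, a minimal way to reproduce these totals (a sketch, assuming one token per row after unnest_tokens() and no stop-word filtering yet) is simply to count the rows of each tokenized data frame.
# One row per token, so the row count is the word total for each candidate
nrow(hillary_words)  # roughly 6,321 for Clinton
nrow(trump_2_words)  # roughly 8,373 for Trump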
By filtering out stop words, we are able to see the most frequent important words that each candidate used during the debate. The larger the word on the word cloud, the more frequently it is used.
library(wordcloud2)
trump_2_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("that's", "it's", "we're")) %>%
head(200) %>%
wordcloud2()
## Joining, by = "word"
After looking at Trump’s word cloud and filtering out filler words, we can see his most frequent words were “country,” “people,” “Clinton,” “companies,” and so on. This makes me wonder whether one of Clinton’s most frequent words was “Trump” or “Donald.”
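As a quick check on that hunch, here is a small sketch on the tokenized Clinton data that counts her direct references to her opponent (unnest_tokens() lowercases the tokens, so we match lowercase words).
# How often Clinton says "donald" or "trump"
hillary_words %>%
filter(word %in% c("donald", "trump")) %>%
count(word, sort = TRUE)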
hillary_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
filter(!word %in% c("that's", "it's", "we're")) %>%
head(200) %>%
wordcloud2()
## Joining, by = "word"
After looking at Clinton’s word cloud and filtering out filler words, we can see her most frequent words were “country,” “people,” “Donald,” etc. This shows that both of them shared several of their most frequent words.
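To check that overlap directly, a small sketch can join each candidate’s ten most frequent non-stop words; any rows that survive the join rank highly for both speakers.
# Words that appear in both candidates' top ten non-stop words
trump_top <- trump_2_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
head(10)
hillary_top <- hillary_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
head(10)
inner_join(trump_top, hillary_top, by = "word", suffix = c("_trump", "_clinton"))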
Next I wanted to see the most frequent negative and positive words they both used.
trump_2_words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments('afinn'))
## Joining, by = "word"
## Joining, by = "word"
## # A tibble: 352 × 2
## word value
## <chr> <dbl>
## 1 fight -1
## 2 fight -1
## 3 winning 4
## 4 fight -1
## 5 losing -3
## 6 sophisticated 2
## 7 united 1
## 8 care 2
## 9 agree 1
## 10 stop -1
## # … with 342 more rows
The chart below plots the AFINN sentiment values of Trump’s twenty most frequent sentiment-bearing words. His most commonly used negative word was “bad,” with “wrong” next; the table further down puts them at 17 and 11 uses respectively.
trump_2_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
inner_join(get_sentiments('afinn')) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
coord_flip() + ggtitle("Trump's Negative and Positive Words - Most Frequent") +
xlab("Negative and Positive Words") + ylab("AFINN Sentiment Value")
## Joining, by = "word"
## Joining, by = "word"
I wanted to create a list of the most frequently used words and their sentiment in order to better see the data shown in the graph for Trump.
trump_2_words %>%
select(word) %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
inner_join(get_sentiments("bing")) %>%
inner_join(get_sentiments("afinn")) %>%
filter(!word %in% c("tahm", "igh")) %>%
head(20) %>%
knitr::kable()
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
| word | n | sentiment | value |
|---|---|---|---|
| bad | 17 | negative | -3 |
| wrong | 11 | negative | -2 |
| losing | 7 | negative | -3 |
| worst | 7 | negative | -3 |
| endorsed | 6 | positive | 2 |
| mess | 6 | negative | -2 |
| debt | 5 | negative | -2 |
| killed | 5 | negative | -3 |
| terrible | 5 | negative | -3 |
| advantage | 4 | positive | 2 |
| disaster | 4 | negative | -2 |
| love | 4 | positive | 3 |
| proud | 4 | positive | 2 |
| strong | 4 | positive | 2 |
| wealthy | 4 | positive | 2 |
| badly | 3 | negative | -3 |
| beautiful | 3 | positive | 3 |
| excuse | 3 | negative | -1 |
| fine | 3 | positive | 2 |
| happy | 3 | positive | 3 |
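Coming back to the introduction’s question about how many times Trump told Clinton she was “wrong”: the table above already shows 11 uses after stop words are removed, and a direct count on the raw tokens (a quick sketch with no stop-word filtering) is an easy cross-check.
# Count every occurrence of "wrong" in Trump's tokens
trump_2_words %>%
filter(word == "wrong") %>%
count(word)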
After looking at Clinton’s most frequently used negative words, “debt” tops her list, with “criminal” and “worst” close behind. Seeing “criminal” is interesting; given the controversy over Trump’s taxes, it makes you wonder whether she was directing that word at him throughout the debate.
hillary_words %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
inner_join(get_sentiments('afinn')) %>%
arrange(desc(n)) %>%
head(20) %>%
ggplot(aes(reorder(word, value), value, fill = value)) + geom_col() +
coord_flip() + ggtitle("Clinton's Negative and Positive Words - Most Frequent") +
xlab("Negative and Positive Words") + ylab("AFINN Sentiment Value")
## Joining, by = "word"
## Joining, by = "word"
I wanted to create a list of the most frequently used words and their sentiment in order to better see the data shown in the graph for Clinton.
hillary_words %>%
select(word) %>%
anti_join(stop_words) %>%
count(word, sort = TRUE) %>%
inner_join(get_sentiments("bing")) %>%
inner_join(get_sentiments("afinn")) %>%
filter(!word %in% c("tahm", "igh")) %>%
head(20) %>%
knitr::kable()
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
| word | n | sentiment | value |
|---|---|---|---|
| debt | 10 | negative | -2 |
| support | 8 | positive | 2 |
| wealthy | 8 | positive | 2 |
| top | 6 | positive | 2 |
| criminal | 5 | negative | -3 |
| fair | 5 | positive | 2 |
| worst | 5 | negative | -3 |
| benefit | 4 | positive | 2 |
| clean | 4 | positive | 2 |
| hard | 4 | negative | -1 |
| crime | 3 | negative | -3 |
| racist | 3 | negative | -3 |
| recession | 3 | negative | -2 |
| attack | 2 | negative | -1 |
| attacks | 2 | negative | -1 |
| bad | 2 | negative | -3 |
| bias | 2 | negative | -1 |
| collapse | 2 | negative | -2 |
| crisis | 2 | negative | -3 |
| difficult | 2 | negative | -1 |
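Before concluding, one rough way to compare the overall tone of the two candidates is to sum the AFINN values of their sentiment-bearing words. This is only a sketch that reuses the tokenized data frames from above; a higher net score would suggest a more positive overall tone.
# Net AFINN sentiment per speaker
bind_rows(
trump_2_words %>% mutate(speaker = "Trump"),
hillary_words %>% mutate(speaker = "Clinton")
) %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) %>%
group_by(speaker) %>%
summarise(net_sentiment = sum(value), sentiment_words = n())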
In conclusion, after reviewing Trump’s and Clinton’s most frequently used negative words in their first presidential debate, we can see they are very similar. Looking at both of their most frequent negative and positive word graphs, it is apparent that Clinton’s talking points carried a more positive tone, so my assumption about Clinton’s words was correct. My assumption about Trump was also correct, as he brought up war and other countries rather frequently.