This is a continuation of BIAS AND CONTEXT IN PRESIDENTIAL DEBATE TEXTS, which focused on a “Bag of Words” approach to analyzing the text of presidential debates.
This analysis shows a “Heat Map” of frequent words. It is not really a new analysis, just a better way of visualizing the data.
The text of the presidential debates was downloaded from the UCSB Presidency Project. Transcripts were pasted into Apple Pages and stored as unformatted .txt files.
##FILTER TEXT
word.filter <- "terror"
## Start and Stop Word Frequency Rank
n.s <- 1 ## Start
n.w <- 20 ## Number of words
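
As a rough illustration of how these three settings might work together, the sketch below keeps only the sentences that contain `word.filter`, counts the words in them, and then returns ranks `n.s` through `n.s + n.w - 1` of the sorted counts. The sample sentences and the helper name `rank_window` are made up for the example, and `n.w` is reduced to 3 to suit the toy data. The original script is not shown here, so this is an assumption about its behavior, not a reproduction of it.

```r
word.filter <- "terror"  # filter text, as in the settings above
n.s <- 1                 # start rank
n.w <- 3                 # number of words (kept small for the toy data)

# Made-up sentences standing in for a debate transcript
sentences <- c(
  "We must defeat radical terrorism now",
  "Terrorism is an international issue",
  "Taxes are too high",
  "Terrorists must be stopped now"
)

# Keep only sentences that contain the filter text (case-insensitive substring match)
keep <- grepl(tolower(word.filter), tolower(sentences), fixed = TRUE)

# Tokenize the kept sentences into lowercase words and count them
tokens <- unlist(strsplit(tolower(sentences[keep]), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]
freq <- table(tokens)

# Hypothetical helper: return ranks start .. start + n - 1 of a sorted frequency table
rank_window <- function(freq, start, n) {
  sorted <- sort(freq, decreasing = TRUE)
  if (start > length(sorted)) return(sorted[0])
  sorted[start:min(start + n - 1, length(sorted))]
}

top <- rank_window(freq, n.s, n.w)
print(top)
```

Because the filter is a substring match, "terror" also catches "terrorism" and "terrorists", which is consistent with both words appearing in the results.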
We can check word frequency directly by tokenizing the text and counting single words. (Note: this partially duplicates the work done in the first analysis. Because the word vector analysis below relies on some of this output, it is reproduced here in a slightly different format as a quality check.)
There are a total of 925 words in the combined vocabulary of the candidates.
word | trump | sanders | clinton | rubio | cruz | all |
---|---|---|---|---|---|---|
isis | 0 | 4 | 6 | 7 | 14 | 31 |
terrorism | 1 | 8 | 7 | 1 | 12 | 29 |
terrorists | 0 | 2 | 5 | 6 | 12 | 25 |
radical | 1 | 0 | 1 | 3 | 18 | 23 |
islamic | 1 | 0 | 0 | 0 | 18 | 19 |
need | 0 | 2 | 7 | 1 | 8 | 18 |
will | 0 | 0 | 4 | 1 | 13 | 18 |
think | 0 | 4 | 10 | 0 | 2 | 16 |
people | 2 | 1 | 6 | 5 | 0 | 14 |
now | 0 | 1 | 1 | 5 | 6 | 13 |
going | 0 | 5 | 3 | 2 | 0 | 10 |
international | 0 | 7 | 1 | 0 | 0 | 8 |
got | 0 | 5 | 1 | 0 | 1 | 7 |
issue | 0 | 6 | 0 | 1 | 0 | 7 |
records | 0 | 0 | 0 | 6 | 0 | 6 |
say | 2 | 1 | 2 | 0 | 1 | 6 |
things | 2 | 0 | 0 | 0 | 1 | 3 |
opened | 2 | 0 | 0 | 0 | 0 | 2 |
SUM | 46 | 310 | 402 | 330 | 821 | 1909 |
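
A table of this shape can be built in base R by cross-tabulating words against speakers. The snippet below is a minimal sketch under stated assumptions: the `tokens` data frame is invented for illustration (a real run would hold one row per token from the tokenized transcripts), and it is not the code that produced the table above.

```r
# Hypothetical token/speaker pairs; not taken from the real transcripts
tokens <- data.frame(
  word    = c("isis", "isis", "terrorism", "radical", "isis", "terrorism"),
  speaker = c("cruz", "rubio", "sanders", "cruz", "cruz", "clinton"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per word, one column per speaker
counts <- unclass(table(tokens$word, tokens$speaker))

# Add the "all" column and sort rows by overall frequency, highest first
counts <- cbind(counts, all = rowSums(counts))
counts <- counts[order(-counts[, "all"]), , drop = FALSE]

# Append a SUM row with per-speaker totals
counts <- rbind(counts, SUM = colSums(counts))
print(counts)
```

In the table above, the `SUM` row counts every word a candidate spoke, not only the rows displayed. In this toy data the two are the same because no rows are hidden.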
- Hillary Clinton spoke 402 total words, with a vocabulary of 289 words.
- Bernie Sanders spoke 310 total words, with a vocabulary of 211 words.
- Donald Trump spoke 46 total words, with a vocabulary of 42 words.
- Ted Cruz spoke 821 total words, with a vocabulary of 469 words.
- Marco Rubio spoke 330 total words, with a vocabulary of 218 words.
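
The difference between total words and vocabulary is the difference between counting every token and counting distinct tokens. A minimal sketch, assuming each candidate's tokens are already held in a named list (the list `tokens_by_speaker` and its contents are invented for illustration):

```r
# Hypothetical token vectors per candidate (not the real transcripts)
tokens_by_speaker <- list(
  clinton = c("we", "need", "to", "think", "we", "need"),
  trump   = c("people", "say", "things")
)

# Total words: every token counts
total_words <- vapply(tokens_by_speaker, length, integer(1))

# Vocabulary: only distinct tokens count
vocab_size <- vapply(tokens_by_speaker, function(x) length(unique(x)), integer(1))

summary_tbl <- data.frame(total = total_words, vocabulary = vocab_size)
print(summary_tbl)

# Combined vocabulary: distinct words across all speakers together
combined_vocab <- length(unique(unlist(tokens_by_speaker)))
print(combined_vocab)
```

Words shared between candidates are counted once in the combined vocabulary, which is why the combined figure of 925 is smaller than the sum of the five individual vocabularies (1229).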
A “heat map” of frequent words shows several interesting patterns. For instance, all candidates but one use the word “people” with high frequency. Conversely, only one candidate mentions the word “tax” frequently.
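
For readers who want to draw a similar picture, here is one possible sketch using base R's `heatmap()`. The counts are copied from three rows of the table above. The function name `plot_word_heatmap` and the styling choices (no clustering, no scaling, reversed `heat.colors` palette) are assumptions made for this example and may differ from the plot the analysis actually used.

```r
# Counts copied from three rows of the frequency table
counts <- matrix(
  c(0, 4, 6, 7, 14,
    1, 8, 7, 1, 12,
    2, 1, 6, 5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("isis", "terrorism", "people"),
    c("trump", "sanders", "clinton", "rubio", "cruz")
  )
)

# Hypothetical helper: draw raw counts as a heat map without reordering rows or columns
plot_word_heatmap <- function(m) {
  heatmap(
    m,
    Rowv = NA, Colv = NA,        # keep the table's own ordering, no dendrograms
    scale = "none",              # show raw counts, not row-scaled values
    col = rev(heat.colors(16)),  # darker cells mean higher counts
    margins = c(6, 8),
    xlab = "candidate", ylab = "word"
  )
}

# Only draw when run in an interactive session
if (interactive()) plot_word_heatmap(counts)
```

Raw counts favor candidates who spoke more overall, so dividing each column by that candidate's total word count before plotting may give a fairer comparison.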
Word choices vary from candidate to candidate. Filtering for specific text and word counts reveals interesting and potentially exploitable patterns.