Perception is influenced, in part, by how frequently we hear about an issue and by the context in which we hear it. This small study is an experiment to see whether systematic differences in language, reflecting political priorities and biases, can be detected. Using standard NLP (Natural Language Processing) techniques, I explore this question by looking for differences in the texts of recent Republican and Democratic presidential debates. Key findings are:
1. “Wordcloud” visualization reveals stylistic differences between candidates but gives no clarity on specific positions.
2. Word frequencies of selected “key-words” suggest differences in position. A z-statistic and a coefficient of variation can be used to highlight significant differences between candidates.
3. Initial results for bigram tokenization reveal some differences in key-word context.
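To make the second finding concrete, here is a minimal sketch of how a z-statistic and a coefficient of variation could be computed for one key-word. It assumes a two-proportion z-test on key-word counts relative to total words spoken; the exact statistic used in this study may be defined differently. The candidate names, counts, and the helper `two_prop_z` are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical key-word counts and total word counts per candidate
# (illustrative numbers only, not taken from the transcripts).
counts <- c(candidate_a = 30, candidate_b = 12, candidate_c = 18)
totals <- c(candidate_a = 6000, candidate_b = 4000, candidate_c = 5000)

# Relative frequency of the key-word for each candidate
rates <- counts / totals

# Two-proportion z-statistic (assumed form): compares the key-word rate
# of two candidates using a pooled estimate of the proportion.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pooled <- (x1 + x2) / (n1 + n2)
  se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
  (p1 - p2) / se
}

z_ab <- two_prop_z(counts[["candidate_a"]], totals[["candidate_a"]],
                   counts[["candidate_b"]], totals[["candidate_b"]])

# Coefficient of variation of the key-word rate across all candidates:
# sample standard deviation divided by the mean.
cv <- sd(rates) / mean(rates)

print(z_ab)
print(cv)
```

A large absolute z value flags a key-word that one candidate uses at a clearly different rate than another, while a large coefficient of variation flags a key-word whose usage varies widely across the whole field.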


The texts of the presidential debates were downloaded from the UCSB Presidency Project. Transcripts were pasted into Apple Pages and stored as unformatted .txt files. From that point on, all processing was done in R using the capabilities of {tm} and associated libraries.
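The loading and cleaning step can be sketched roughly as follows. This is not the exact script used here: the folder, file names, and sample sentences are made up so that the example runs on its own, and the particular cleaning steps (lower-casing, removing punctuation, numbers, and English stop words) are an assumption about a typical {tm} workflow.

```r
library(tm)

# Stand-in for the transcript folder: two tiny made-up files in a temp dir.
debate_dir <- file.path(tempdir(), "debates")
dir.create(debate_dir, showWarnings = FALSE)
writeLines("The people of this country care about jobs and taxes.",
           file.path(debate_dir, "democratic.txt"))
writeLines("People in this country are worried about taxes, too.",
           file.path(debate_dir, "republican.txt"))

# Read every .txt file in the folder into a corpus, one document per file.
corpus <- VCorpus(DirSource(debate_dir, pattern = "\\.txt$", encoding = "UTF-8"))

# Typical cleaning steps (assumed, not confirmed by the original analysis).
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix: rows are terms, columns are transcripts.
tdm <- TermDocumentMatrix(corpus)
inspect(tdm)
```

The resulting term-document matrix is the starting point for both the word clouds and the key-word frequency comparisons.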


Wordclouds are a quick and visually appealing method to compare texts. The {wordcloud} package in R is used here. Not surprisingly, word choices vary between candidates. However, there are also some striking and surprising similarities.

Let’s first compare the word clouds of candidates using the {wordcloud} package.
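A word cloud for a single candidate can be drawn along these lines. The text snippet below is a hypothetical stand-in for one candidate's cleaned transcript, and the plotting parameters are illustrative choices rather than the settings used for the figures in this post.

```r
library(wordcloud)

# Hypothetical stand-in for one candidate's cleaned transcript text.
text <- "people country going people jobs country people"

# Count how often each word occurs and sort from most to least frequent.
words <- strsplit(text, "\\s+")[[1]]
tab <- table(words)
freq <- sort(setNames(as.numeric(tab), names(tab)), decreasing = TRUE)

# Draw the cloud; more frequent words are drawn larger.
set.seed(1)  # makes the layout reproducible
wordcloud(words = names(freq), freq = freq,
          min.freq = 1, max.words = 100, random.order = FALSE)
```

Setting `random.order = FALSE` places the most frequent words near the center, which makes clouds from different candidates easier to compare side by side.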


Bernie’s word cloud is larger than Donald’s because he spoke more words in total (there were three major candidates at the Democratic debate and ten at the Republican one). What I find most surprising is the similarity of the clouds; words like “people”, “country”, and “going” are common to both. Despite the candidates' strong differences in policy, the word clouds reveal little about those differences.