Today, the New York Times released a compiled list of Donald Trump’s tweets, grouped by the person each tweet was directed against. Since we have already looked at acceptance speeches in an earlier article, we thought it might be worthwhile to spend 2 minutes of analysis* on this particular dataset. This quick look only considers his tweets against his opponent, Secretary Hillary Clinton.

There are really three parts to doing text analysis; they are:

  1. Getting the data
  2. Putting the data in a usable format
  3. Making the picture you want to show.

In this very quick analysis, putting the data in a usable format was the challenge. Step by step:

  1. Getting the data. I used the tried-and-true method of Ctrl+A and Ctrl+C on the source website.

  2. Putting it in a usable format. This was a dance that involved doing analysis in R, some formatting in Notepad, and… MS Word (!?!?). That’s right: the raw data had a lot of " marks, which were easily converted to whitespace in Word using the ‘search and replace’ function. This isn’t elegant, but it was fast and effective.

  3. Making the picture. This was straightforward, and used code that was previously developed for the Acceptance Speeches article, with some minor tweaks.
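For readers who would rather stay in R for step 2, the quote clean-up can also be sketched with base R's gsub(). This is only an illustration and not what was done for this article; the `raw` vector below is invented stand-in text, and `raw` and `cleaned` are hypothetical names.

```r
# Invented stand-in lines; the article's actual text is not reproduced here.
raw <- c('"Crooked Hillary" is at it again', 'She said "no" twice')

# Replace every literal double-quote character with a space.
# fixed = TRUE treats the pattern as plain text, not a regular expression.
cleaned <- gsub('"', " ", raw, fixed = TRUE)

# Collapse the runs of whitespace left behind and trim both ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

The same idea extends to any other stray character: swap the first argument of gsub() for the character to remove.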

After loading the text document with the read_delim function from the readr package, we proceed as follows. Note the copious use of the %<>% operator from the magrittr package, which pipes an object into a function and assigns the result back to that object. This really streamlines code that applies a number of transforms to the same dataset.

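# Text mining, stemming, and word cloud packages
library(tm); library(SnowballC); library(wordcloud)
library(magrittr)  # provides %<>% and %>%
# Build a corpus from the imported text, then clean it step by step
TC = Corpus(VectorSource(Trump))
TC %<>% tm_map(PlainTextDocument)
TC %<>% tm_map(removePunctuation)
TC %<>% tm_map(removeWords, stopwords('english'))
TC %<>% tm_map(stemDocument)  # reduce words to their stems
# Term-document matrix: one row per term, one column per document
dtm = TermDocumentMatrix(TC)
m = as.matrix(dtm)
v = sort(rowSums(m), decreasing = TRUE)  # total count per term, largest first
d = data.frame(word = names(v), freq = v)
wordcloud(words = d$word, freq = d$freq, min.freq = 3, random.order = FALSE, rot.per = .35, colors = brewer.pal(8, "Dark2"))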

Figure 1: Donald Trump’s Tweets targeting Hillary Clinton. For some reason, they look better in pastel
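As a small aside on the %<>% operator used in the pipeline above, here is a minimal sketch of what it does, run on an invented character vector rather than the tweet corpus. The names `words` and `lowered` are hypothetical.

```r
library(magrittr)

# Invented example vector, not the article's data.
words <- c("Crooked", "Hillary", "Crooked")

# Without the compound operator: pipe, then assign the result back by hand.
lowered <- words
lowered <- lowered %>% tolower()

# With %<>%: pipe the object in and overwrite it with the result in one step.
words %<>% tolower()

# Both routes should give the same vector.
print(identical(words, lowered))
```

The payoff is mostly readability: a chain of transforms on one object reads as one transform per line, without repeating the object name on both sides of each assignment.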

Having the term-document matrix (stored in dtm) allows other interesting things to be done quickly as well. For example, the stem ‘crook’ is featured prominently in the center of the word cloud. We can quickly examine measures of interest, such as the proportion of all counted terms that ‘crook’ accounts for:

(d$freq[which(d$word == "crook")]/sum(d$freq)) %>% multiply_by(100) %>% format(digits = 0) %>% paste("%") %>% print()
[1] "23 %"

Wow. Fully 23% of the non-stopword terms in this set of tweets are the stem ‘crook’.
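To make the arithmetic behind that percentage explicit, here is the same calculation in base R on toy counts. The numbers are invented for illustration and are not the article's data; `toy`, `share`, and `pct` are hypothetical names.

```r
# Toy term counts, invented for illustration only.
toy <- data.frame(word = c("crook", "hillari", "presid"),
                  freq = c(5, 3, 2),
                  stringsAsFactors = FALSE)

# Share of all counted terms taken by one stem: its count over the total.
share <- toy$freq[toy$word == "crook"] / sum(toy$freq)

# Express as a whole-number percentage string.
pct <- paste(round(100 * share), "%")
print(pct)
```

Note that the denominator is the total of the counted terms only, so the result depends on which words were removed before counting.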

In fact, his use of ‘crook’ dwarfs all other terms in his tweets, as shown:

bp = barplot(d$freq[1:5], main = "Five most common words", xaxt = "n")  # bp holds the bar midpoints
axis(1, labels = d$word[1:5], at = bp)  # place each label under its bar

Figure 2: Barplot of the 5 most common words in Donald Trump’s Tweets about Secretary Clinton. Note that ‘presid’ is the stem for “President”, “Presidency” and “Presidential”
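One detail worth knowing when labelling a base R barplot by hand: with default settings the bars are not centered at 1, 2, 3, and so on. barplot() returns the midpoints it uses, and those are the positions to hand to axis(). A minimal sketch on invented counts (`freq`, `words`, and `mids` are hypothetical names):

```r
# Toy counts and labels, invented for illustration only.
freq  <- c(5, 3, 2)
words <- c("crook", "hillari", "presid")

# plot = FALSE returns the bar midpoints without drawing anything.
mids <- barplot(freq, plot = FALSE)
print(as.numeric(mids))

# To draw the labelled chart, drop plot = FALSE and add the axis:
#   mids <- barplot(freq, xaxt = "n")
#   axis(1, at = mids, labels = words)
```

Passing names.arg = words to barplot() is a simpler alternative when no custom axis is needed.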

*Why 2 minutes? Because we reused a lot of code from the last Five Minute Analyst.
