The Analytics Edge - Visual

Load the data

tweets <- read.csv("tweets.csv", stringsAsFactors=FALSE)

Clean up the data. Stemming is skipped because the word cloud is easier to read and understand when it shows full words.

library(tm)

library(SnowballC)

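corpus <- Corpus(VectorSource(tweets$Tweet))
# Wrap tolower in content_transformer so each document keeps its tm class
# (this makes the old tm_map(corpus, PlainTextDocument) workaround unnecessary)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# Document-term matrix: one row per tweet, one column per word, cells are counts
frequencies <- DocumentTermMatrix(corpus)
allTweets <- as.data.frame(as.matrix(frequencies))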
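To see what the cleaning steps and the later colSums() call produce, here is a minimal sketch on two made-up tweets (the `toy` vector and the `toy*` names are hypothetical, not from tweets.csv). It uses VCorpus rather than Corpus so the result does not depend on tm's SimpleCorpus tokenizer, and it assumes a tm release recent enough to need content_transformer().

```r
library(tm)

# Two hypothetical tweets, used only for illustration
toy <- c("I LOVE my iPhone!", "The new iPhone, the best phone.")

toyCorpus <- VCorpus(VectorSource(toy))
toyCorpus <- tm_map(toyCorpus, content_transformer(tolower))
toyCorpus <- tm_map(toyCorpus, removePunctuation)
# Drops "i", "my" and "the", which are in tm's English stop-word list
toyCorpus <- tm_map(toyCorpus, removeWords, stopwords("english"))

# Rows are documents, columns are terms, cells are counts
toyDtm <- DocumentTermMatrix(toyCorpus)

# Total count of each term across both tweets
toyFreq <- colSums(as.matrix(toyDtm))
print(toyFreq)
```

"iphone" appears once in each tweet, so its total is 2, while the stop words never reach the matrix.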

Word Cloud

Because we are plotting a large number of words, you might get warnings that some of the words could not be fit on the page and were therefore not plotted – this is especially likely if you are using a smaller screen. You can address these warnings by plotting the words smaller. From ?wordcloud, we can see that the “scale” parameter controls the sizes of the plotted words. By default, the sizes range from 4 for the most frequent words to 0.5 for the least frequent, as denoted by the parameter “scale=c(4, 0.5)”. We could obtain a much smaller plot with, for instance, parameter “scale=c(2, 0.25)”.
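As a rough illustration of why shrinking `scale` helps, the sketch below assumes wordcloud() maps each word's frequency linearly onto the range given by `scale`, using freq / max(freq). This is an approximation of the package's behavior, not its actual code; `approx_word_size` and the frequencies are hypothetical.

```r
# Hypothetical helper: approximates wordcloud()'s text size for each word,
# assuming a linear rescaling of freq / max(freq) between scale[2] and scale[1]
approx_word_size <- function(freq, scale = c(4, 0.5)) {
  normed <- freq / max(freq)
  (scale[1] - scale[2]) * normed + scale[2]
}

# Hypothetical word frequencies
freq <- c(apple = 100, iphone = 50, ipad = 10)

defaultSizes <- approx_word_size(freq)                     # scale=c(4, 0.5)
smallSizes   <- approx_word_size(freq, scale = c(2, 0.25)) # scale=c(2, 0.25)
print(rbind(defaultSizes, smallSizes))
```

Under this linear assumption, halving both ends of `scale` halves every word's size, which is why scale=c(2, 0.25) leaves more room on the page.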

library(wordcloud)

wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25))
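To check which word dominates the cloud before removing it, you can sort the column sums. With the real data this is `head(sort(colSums(allTweets), decreasing=TRUE))`; the sketch below uses a small hypothetical stand-in data frame so it runs on its own.

```r
# Hypothetical stand-in for allTweets: rows are tweets, columns are word counts
allTweetsDemo <- data.frame(apple = c(1, 1, 2), iphone = c(1, 0, 0), ipad = c(0, 1, 1))

# Total count of each word across all tweets, largest first
wordFreq <- sort(colSums(allTweetsDemo), decreasing = TRUE)
print(head(wordFreq, 3))

# The single most frequent word
topWord <- names(wordFreq)[1]
print(topWord)
```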

Remove the most frequent word, ‘apple’, and regenerate the word cloud

corpus <- Corpus(VectorSource(tweets$Tweet))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
# Add "apple" to the stop-word list so it is removed along with the English stop words
corpus <- tm_map(corpus, removeWords, c("apple", stopwords("english")))
frequencies <- DocumentTermMatrix(corpus)
allTweets <- as.data.frame(as.matrix(frequencies))
wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25))

Word cloud of negative tweets only (average sentiment score Avg of -1 or less)

negTweets <- subset(allTweets, tweets$Avg <= -1)
wordcloud(colnames(negTweets), colSums(negTweets), scale=c(2, 0.25))
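The subset() call works because row i of allTweets holds the word counts of tweet i in tweets, so a logical vector built from tweets$Avg selects the matching rows. A minimal sketch with hypothetical stand-in data frames:

```r
# Hypothetical stand-ins: sentiment scores and the matching word-count rows
tweetsDemo <- data.frame(Avg = c(-2, 0.5, -1, 1))
countsDemo <- data.frame(apple = c(1, 1, 0, 1), hate = c(1, 0, 1, 0))

# Row i of countsDemo belongs to tweet i, so this keeps tweets 1 and 3
negDemo <- countsDemo[tweetsDemo$Avg <= -1, ]

# Word totals among the negative tweets only
negFreq <- colSums(negDemo)
print(negFreq)
```

Plain `[` indexing gives the same rows as subset() here; the difference is that subset() silently drops rows where the condition is NA, while `[` would return them as rows of NAs.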

Show only words that appear at least 10 times (min.freq=10), with the most frequent words in the center and the rest spreading outward (random.order=FALSE):

wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25),
          min.freq=10, random.order=FALSE)
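The effect of the two arguments can be sketched on a hypothetical frequency vector: min.freq drops every word below the threshold, and random.order=FALSE places the remaining words in decreasing order of frequency, so the first (most frequent) ones land near the center.

```r
# Hypothetical word frequencies
freq <- c(apple = 30, iphone = 12, ipad = 10, ipod = 4, mac = 9)

# min.freq = 10: words with frequency below 10 are not plotted
kept <- freq[freq >= 10]

# random.order = FALSE: placement order is decreasing frequency
placementOrder <- names(sort(kept, decreasing = TRUE))
print(placementOrder)
```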

Rotate about 70% of the words by 90 degrees (rot.per=0.7) and plot at most 100 words (max.words=100)

wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25),
          rot.per=0.7, max.words=100)
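A sketch of what these two arguments do, on hypothetical data: max.words keeps only the most frequent words and drops the rest, and rot.per is assumed here to rotate each word independently with probability rot.per, so the rotated share is about 70% on average rather than exactly 70%. The limit of 3 words stands in for 100 so the toy example stays small.

```r
set.seed(1)

# Hypothetical word frequencies
freq <- c(apple = 30, iphone = 12, ipad = 10, ipod = 4, mac = 9)

# max.words: keep only the most frequent words (3 here instead of 100)
maxWords <- 3
plotted <- head(sort(freq, decreasing = TRUE), maxWords)
print(names(plotted))

# rot.per: assumed independent 90-degree rotation with probability 0.7 per word;
# over many words the rotated share approaches 0.7
rotatedShare <- mean(runif(10000) < 0.7)
print(rotatedShare)
```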

Displaying a color palette

display.brewer.pal(7, "Greys")
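display.brewer.pal() only draws the palette. To get the colors themselves, brewer.pal() returns a character vector of hex codes (lightest first for sequential palettes such as "Greys" and "Blues"), and brewer.pal.info lists how many colors each palette offers.

```r
library(RColorBrewer)

# Maximum number of colors and category of two sequential palettes
print(brewer.pal.info[c("Greys", "Blues"), ])

# Seven shades of grey as hex color codes
greys <- brewer.pal(7, "Greys")
print(greys)
```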

Plotting the word cloud using a color palette

wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25),
          min.freq=10, colors=brewer.pal(9,"Blues"))

Removing the four lightest shades of the palette, which are hard to see on a white background

wordcloud(colnames(allTweets), colSums(allTweets), scale=c(2, 0.25),
          min.freq=10, colors=brewer.pal(9,"Blues")[c(5,6,7,8,9)])
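The index vector c(5,6,7,8,9) keeps the five darkest of the nine "Blues" shades. A short sketch showing equivalent, more compact ways to write the same selection:

```r
library(RColorBrewer)

blues <- brewer.pal(9, "Blues")

# Three equivalent ways to keep shades 5 to 9 (the darkest five)
darkA <- blues[c(5, 6, 7, 8, 9)]
darkB <- blues[5:9]
darkC <- blues[-(1:4)]
print(darkB)
```

Any of these can be passed as `colors=` to wordcloud().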

The Analytics Edge - Visual - Tweets

Andy

Saturday, July 18, 2015
