This document searches for tweets containing the word “weather”. The query focuses on Illinois, the state with the most reports submitted through the mPING app. Common words in those tweets are then identified and shown in a word cloud.
To start, we need to load some packages:
library("twitteR") # To get twitter data.
## Loading required package: ROAuth
## Loading required package: RCurl
## Loading required package: bitops
## Loading required package: digest
## Loading required package: rjson
library("tm") # For text Mining
## Loading required package: NLP
library("wordcloud") # To build the wordcloud
## Loading required package: RColorBrewer
library("RColorBrewer") # To get palettes for drawing nice plots.
We also need to load the authentication file stored in the working directory.
load("twitterauthentication.Rdata")
registerTwitterOAuth(cred)
## [1] TRUE
The following single call retrieves up to 179 English-language tweets containing the word “weather”, posted from 2014-06-01 up to 2014-07-15 within 50 km of a point in Illinois.
weather <- searchTwitter("weather", n = 179, lang = "en", since = "2014-06-01", until = "2014-07-15", geocode = "39.739262,-89.504089,50km")
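The geocode argument in the call above is a single string of the form "latitude,longitude,radius". As a minimal sketch, the string could be assembled from its parts as shown below; `make_geocode` is a hypothetical helper introduced here for illustration and is not part of twitteR.

```r
# Hypothetical helper (not part of twitteR): formats a search centre and a
# radius in kilometres as the "latitude,longitude,radius" string used above.
make_geocode <- function(lat, lon, radius_km) {
  sprintf("%.6f,%.6f,%dkm", lat, lon, as.integer(radius_km))
}

# Same centre and radius as in the search call above.
geo <- make_geocode(39.739262, -89.504089, 50)
print(geo)
```

Building the string this way makes it easier to repeat the search for other locations by changing only the numeric inputs.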
Taking a look at the tweets:
head(weather)
## [[1]]
## [1] "dus_tyBro: Hell of the headache these days stupid weather conditions causing this, its in order to close!! feels as though my minds going to appear"
##
## [[2]]
## [1] "awesomeShortnes: RT @RicKearbey: October-like weather tomorrow that could break 2 records (record cool high and record cool low). Jacket time? :) http://t.…"
##
## [[3]]
## [1] "ILConnected: WICS: Ric Kearbey has been talking about the BIG CHILL of October weather coming in Ju...: Ric Kearbey has bee... http://t.co/2RHb7UMjaB"
##
## [[4]]
## [1] "amuhs: My jackets are still in Maine. :) MT @RicKearbey: Oct-like weather tomorrow could break 2 records. Jacket time?http://t.co/r9NzOqR1yg"
##
## [[5]]
## [1] "wics_abc20: RT @RicKearbey: October-like weather tomorrow that could break 2 records (record cool high and record cool low). Jacket time? :) http://t.…"
##
## [[6]]
## [1] "RicKearbey: October-like weather tomorrow that could break 2 records (record cool high and record cool low). Jacket time? :) http://t.co/Js96hrX8Ei"
Next, the tweets are converted to a data frame and stored in a CSV file.
weather.df <- do.call(rbind, lapply(weather, as.data.frame))
write.csv(weather.df, "/home/msuarez/Documents/UCSB/2014/Summer/R/weather.csv")
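The conversion above turns each tweet into a one-row data frame and stacks the rows with rbind. The sketch below shows the same pattern on toy records. The plain lists, field names and temporary file are assumptions made for illustration; real twitteR status objects carry many more fields.

```r
# Toy stand-ins for tweet objects (invented for illustration).
toy_tweets <- list(
  list(screenName = "user_a", text = "Nice weather today"),
  list(screenName = "user_b", text = "Stormy weather tonight")
)

# Turn each record into a one-row data frame, then stack the rows.
toy_df <- do.call(
  rbind,
  lapply(toy_tweets, as.data.frame, stringsAsFactors = FALSE)
)

# Write to a temporary file and read it back to check the round trip.
tmp <- tempfile(fileext = ".csv")
write.csv(toy_df, tmp, row.names = FALSE)
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```

Passing row.names = FALSE is a design choice for this sketch: it keeps the row labels out of the file so that reading it back gives the same two columns.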
From here on, we will start to play around with the data.
Extracting the text from the tweets in a vector:
weather_list <- sapply(weather, function(x) x$getText())
Constructing the lexical Corpus:
weather_corpus <- Corpus(VectorSource(weather_list))
Constructing the Term Document Matrix and applying some transformations:
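# Note: stemDocument and stripWhitespace are tm transformation functions rather
# than documented control options (the stemming option is named "stemming"),
# so they likely have no effect in this call.
tdm = TermDocumentMatrix(weather_corpus,
  control = list(removePunctuation = TRUE,
                 stopwords = c("weather", stopwords("english")),
                 removeNumbers = TRUE, tolower = TRUE,
                 stripWhitespace = TRUE, stemDocument = TRUE))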
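To make the transformations requested above more concrete, here is a minimal base R sketch of lower-casing, punctuation removal, number removal and stop-word filtering on two toy tweets. It is not how tm implements these steps internally, and the tweets and the short stop-word list are invented for illustration.

```r
# Toy tweets (invented for illustration).
toy <- c("Weather is GREAT today!!", "Cold weather, 2 records broken")

# A small illustrative stop-word list; tm's stopwords("english") is much longer.
stop_words <- c("weather", "is", "the", "a")

clean <- tolower(toy)                       # lower-case everything
clean <- gsub("[[:punct:]]", "", clean)     # drop punctuation
clean <- gsub("[[:digit:]]", "", clean)     # drop numbers
tokens <- strsplit(clean, "[[:space:]]+")   # split on runs of whitespace

# Remove empty strings and stop words from each tweet's tokens.
tokens <- lapply(tokens, function(w) w[nchar(w) > 0 & !(w %in% stop_words)])

print(tokens)
```

Adding "weather" to the stop words mirrors the call above: every tweet contains the search term, so it would otherwise dominate the word cloud.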
Converting the term-document matrix to an ordinary matrix:
m = as.matrix(tdm)
Getting word counts in decreasing order:
word_freqs = sort(rowSums(m), decreasing=TRUE)
Creating a data frame with words and their frequencies:
dm = data.frame(word=names(word_freqs), freq=word_freqs)
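The three steps above (matrix, row sums, data frame) can be illustrated without tm. The sketch below builds a tiny term-document matrix by hand from toy documents and derives the same kind of frequency table. The documents and all object names are assumptions for illustration only.

```r
# Toy documents, already cleaned and tokenised (invented for illustration).
docs <- list(
  c("cold", "records", "jacket"),
  c("cold", "jacket"),
  c("cold", "storm")
)

# Build a small term-document matrix: one row per term, one column per document.
terms <- sort(unique(unlist(docs)))
toy_m <- vapply(
  docs,
  function(d) as.integer(table(factor(d, levels = terms))),
  integer(length(terms))
)
rownames(toy_m) <- terms

# Total count of each term across all documents, most frequent first.
toy_freqs <- sort(rowSums(toy_m), decreasing = TRUE)

# Same layout as the data frame built above: one row per word with its count.
toy_dm <- data.frame(word = names(toy_freqs), freq = toy_freqs)
print(toy_dm)
```

Each row of the matrix holds one term's counts per document, so summing across a row gives that term's total frequency, which is what the word cloud uses to size the words.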
Plotting the word cloud:
wordcloud(dm$word, dm$freq, scale = c(4, .3), random.order = FALSE, rot.per = .15,
          colors = brewer.pal(8, "Dark2"), font = 1, family = "serif")