Twitter Analysis

Setup

This is an R Markdown document for Twitter analysis. In it, we set up Twitter credentials, run a connection test, search Twitter, gather tweets, and analyze them.

Twitter uses OAuth for authentication. We use the setup_twitter_oauth() function from the twitteR package, passing the consumer key, consumer secret, access token, and access token secret. These credentials are issued when you create an app under a Twitter developer account.

library(twitteR)
setup_twitter_oauth(key, secret, token, accesstoken)  # consumer key, consumer secret, access token, access token secret

## [1] "Using direct authentication"
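Hard-coding the four credentials in the document makes them easy to leak when the file is shared. A minimal sketch, assuming the credentials are kept in environment variables, reads them with Sys.getenv() and stops early if any are missing. The helper name read_twitter_credentials() and the variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical choices, not a twitteR convention.

```r
# Hypothetical helper: read Twitter credentials from environment variables.
# The environment variable names are an assumption, not a twitteR convention.
read_twitter_credentials <- function(
    vars = c(consumer_key    = "TWITTER_CONSUMER_KEY",
             consumer_secret = "TWITTER_CONSUMER_SECRET",
             access_token    = "TWITTER_ACCESS_TOKEN",
             access_secret   = "TWITTER_ACCESS_SECRET")) {
  # Look up every variable; unset ones come back as ""
  values <- Sys.getenv(vars, unset = "")
  names(values) <- names(vars)
  # Fail loudly, naming the variables that are not set
  absent <- vars[values == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  as.list(values)
}
```

With the variables set, the setup call would become creds <- read_twitter_credentials() followed by setup_twitter_oauth(creds$consumer_key, creds$consumer_secret, creds$access_token, creds$access_secret).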

Connection Check

Once OAuth is set up, we can test the connection by running a search that returns only a small number of tweets (here, 10).

searchTwitter("#rstats", n = 10)

## [[1]]
## [1] "thatdnaguy: RT @JennyBryan: googledrive is a new #rstats package that lets you find, create, delete, download &amp; share files. Joint w/ @LucyStats https:…"
## 
## [[2]]
## [1] "esoteroc: RT @PyData: #Python overtakes #Rstats, becomes the leader in #DataScience and #MachineLearning https://t.co/nAAjHXPNeM"
## 
## [[3]]
## [1] "Tom_Drake1: Great poster! Worth a visit. Stats have never been taught better! #AMEE2017 #rstats https://t.co/0xzQVgz1Xi"
## 
## [[4]]
## [1] "nurmal2017: RT @jenitive_case: Worked on this #rstats #crossstitch over the weekend. Soon to be listed at https://t.co/u5lVh3D8cl! #magrittr #tidyverse…"
## 
## [[5]]
## [1] "xgirouxb: Very little #rstats SO traffic from developing countries, it seems the #rstats world is not as flat as one would ho… https://t.co/pV0D5ORCXG"
## 
## [[6]]
## [1] "pteetor: RT @JennyBryan: googledrive is a new #rstats package that lets you find, create, delete, download &amp; share files. Joint w/ @LucyStats https:…"
## 
## [[7]]
## [1] "millerdl: oh yeah, btw #rstats I wrote @mgcv_changelog a while ago to ping us with mgcv news..."
## 
## [[8]]
## [1] "sfermigier: RT @PyData: #Python overtakes #Rstats, becomes the leader in #DataScience and #MachineLearning https://t.co/nAAjHXPNeM"
## 
## [[9]]
## [1] "thomas_sandmann: \"Learning in leaps and bounds: my 10 favorite data science books\" https://t.co/6cL6Lv98cT on @LinkedIn #rstats"
## 
## [[10]]
## [1] "ajantriks: RT @LucyStats: 😍 Love #rstats?\n❤️ Love Google Drive?\n📦 Check out googledrive, brand new &amp; sparkly ✨ on CRAN! https://t.co/j6LRrPxygP"
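A bare searchTwitter() call either prints tweets or stops with an error. A minimal sketch of a more explicit check wraps the search in tryCatch() and reports whether any tweets came back. The helper name check_connection() is hypothetical; the search function is passed in as an argument (defaulting to twitteR's searchTwitter) so it can be replaced by a stub when no credentials are available.

```r
# Hypothetical helper: run a small test search and report success or failure.
# The default `search` is only looked up if no other function is supplied.
check_connection <- function(search = twitteR::searchTwitter,
                             query = "#rstats", n = 10) {
  result <- tryCatch(search(query, n = n), error = function(e) e)
  if (inherits(result, "error")) {
    message("Connection test failed: ", conditionMessage(result))
    return(FALSE)
  }
  message("Connection test returned ", length(result), " tweet(s)")
  # An empty result usually means the query matched nothing recently
  length(result) > 0
}
```

Calling check_connection() with no arguments runs the same "#rstats" search as above and returns TRUE if at least one tweet is returned.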

Query Twitter

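Next, we use the same searchTwitter() function to query Twitter for up to 1,000 English-language tweets mentioning Tesla. Note that Twitter's standard search API only returns tweets from roughly the past seven days, so the since date has little practical effect here.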

tweets <- searchTwitter("Tesla", since="2011-07-01", lang = "en", n = 1000)

Understanding Structure

Let’s inspect the structure of the tweets object: a plain list of status objects, one per tweet.

head(tweets)

## [[1]]
## [1] "sci_tek: Is Elon Musk Going To Leave Tesla..?\n\nhttps://t.co/bVEppdEZo1\n#Tesla #ElonMusk https://t.co/e465hfnM0C"
## 
## [[2]]
## [1] "JackieMendez: Tesla Just Revealed How Much Its Solar Roof Will Cost https://t.co/RloB2KJD9l"
## 
## [[3]]
## [1] "nachojuarez: RT @KrapelsMarco: Ignore North Korea nukes.  Just watch my Tesla and super cool wind turbine: https://t.co/EwN3MJTgcv"
## 
## [[4]]
## [1] "tarkikturna: RT @Steeler_Stud: Tesla's robo-factory where machines put the ‘wings' onto its latest Model X | Daily Mail Online https://t.co/dMBYrOaZ70"
## 
## [[5]]
## [1] "teslacars1: Solar News: #Tesla Unveils Standard Solar Panels, NREL Report Shows Price Inflation By Large ... https://t.co/ob4yMQ9VQx"
## 
## [[6]]
## [1] "uxstephen: @CoralineAda I think most devs would prefer to play Tesla if we can. But IME business demands put lots of pressure… https://t.co/yVYwHJN6Mc"

class(tweets)

## [1] "list"

length(tweets)

## [1] 1000

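Next, we extract the text of each tweet into a character vector and replace every non-graphical character (such as newlines and other control characters) with a space. The commented-out twListToDF() call shows how the list could instead be converted to a data frame with one row per tweet.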

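library(stringr)
#tweetsDF <- twListToDF(tweets)  # optional: one row per tweet
# Extract the text of each tweet, then replace characters outside [:graph:] with a space
tweets_vector <- sapply(tweets, function(x) x$getText())
tweets_vector <- str_replace_all(tweets_vector, "[^[:graph:]]", " ")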
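To see what the replacement does, here is a small example on a made-up string modelled on the first Tesla tweet above. It uses base R's gsub() with the same POSIX class so that it runs without stringr. Every character that is not graphical (here the newlines; ordinary spaces also match but are simply replaced by spaces) becomes a single space.

```r
# A made-up tweet containing newlines, similar to the first Tesla tweet above
x <- "Is Elon Musk Going To Leave Tesla..?\n\nhttps://t.co/abc\n#Tesla"

# Replace every character that is not in [:graph:] (letters, digits, punctuation)
cleaned <- gsub("[^[:graph:]]", " ", x)
cleaned
# [1] "Is Elon Musk Going To Leave Tesla..?  https://t.co/abc #Tesla"
```

Because the replacement is one-for-one, runs of whitespace (the double space above) remain; the stripWhitespace step in the pre-processing function collapses them later.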

Pre-processing Tweets

Step 1: Convert the tweet text vector into a tm source object, in which each element becomes one document.

library(tm)
tweets_source <- VectorSource(tweets_vector)

Step 2: Create a volatile (in-memory) corpus from the source object.

tweets_corpus <- VCorpus(tweets_source)

Cleaning Tweets

Step 1: Build a function that applies the basic pre-processing steps.

preprocess <- function(x) {
  x <- tm_map(x, removePunctuation)                                 # drop punctuation
  x <- tm_map(x, content_transformer(tolower))                      # lower-case all text
  x <- tm_map(x, removeWords, c(stopwords("en"), "tesla", "will"))  # stop words plus the query term
  x <- tm_map(x, removeNumbers)                                     # drop digits
  x <- tm_map(x, stripWhitespace)                                   # collapse gaps left by the steps above
  x
}

Now let us apply this function to the corpus to remove punctuation, convert all characters to lower case, remove English stop words (plus “tesla” and “will”), remove numbers, and collapse extra whitespace.

tweets_corpus <- preprocess(tweets_corpus)
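For intuition, the transformations performed by preprocess() can be approximated in base R on a plain character vector. This is a sketch, not how tm implements these functions: it uses regular expressions, the function name preprocess_text() is hypothetical, the short stop-word list is a hypothetical stand-in for stopwords("en"), and the final trimws() call has no counterpart in the tm pipeline.

```r
# Base-R approximation of preprocess() for a character vector.
# stop_words is a tiny hypothetical list standing in for stopwords("en").
preprocess_text <- function(x, stop_words = c("the", "is", "to", "its", "will", "tesla")) {
  x <- gsub("[[:punct:]]+", "", x)                         # like removePunctuation
  x <- tolower(x)                                          # like content_transformer(tolower)
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)                   # like removeWords (whole words only)
  x <- gsub("[[:digit:]]+", "", x)                         # like removeNumbers
  x <- gsub("[[:space:]]+", " ", x)                        # like stripWhitespace
  trimws(x)                                                # extra step: trim leading/trailing spaces
}

preprocess_text("Tesla Just Revealed How Much Its Solar Roof Will Cost, 2017!")
# [1] "just revealed how much solar roof cost"
```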

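Create a term-document matrix (TDM), in which each row is a term, each column is a document (tweet), and each cell counts how often that term appears in that tweet. By default, TermDocumentMatrix() drops terms shorter than three characters.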

tweets_tdm <- TermDocumentMatrix(tweets_corpus)
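To make the layout concrete, a term-document matrix for a few made-up, already-cleaned documents can be built in base R by splitting each document on whitespace and cross-tabulating terms against document IDs with table(). This only illustrates the structure; TermDocumentMatrix() stores its result as a sparse matrix and applies its own tokenizer and filters.

```r
# Three made-up, already-cleaned documents
docs <- c("new model car", "model car solar roof", "new solar roof")

# Split each document into terms and record which document each term came from
terms  <- strsplit(docs, "\\s+")
doc_id <- rep(seq_along(docs), lengths(terms))

# Cross-tabulate: rows are terms, columns are documents, cells are counts
tdm <- table(term = unlist(terms), doc = doc_id)
tdm
```

Here the row for "model" holds 1, 1, 0: the term appears once in each of the first two documents and not in the third.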

Convert the TDM into a dense matrix and compute each word’s total frequency across all documents with rowSums().

tweets_tdm_m <- as.matrix(tweets_tdm)
freq <- rowSums(tweets_tdm_m)
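Row sums collapse the matrix into one total per term. Using a small made-up matrix (not the Tesla data), a term that appears once in each of two documents gets a total of 2:

```r
# A small made-up term-document matrix: 3 terms x 3 documents
m <- matrix(c(1, 1, 0,
              2, 0, 1,
              0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("model", "car", "roof"),
                            Docs  = c("1", "2", "3")))

# Total count of each term across all documents: model = 2, car = 3, roof = 2
freq_example <- rowSums(m)
freq_example
```

For a much larger corpus, slam::row_sums(tweets_tdm) computes the same totals directly on the sparse TDM, avoiding the dense as.matrix() conversion.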

Create a data frame (tibble) of words and their frequencies, sorted from most to least frequent.

library(tibble)
library(dplyr)
freq_df <- tibble(words = names(freq), value = freq)
freq_df <- freq_df %>% arrange(desc(value))
head(freq_df)

## # A tibble: 6 x 2
##   words value
##   <chr> <dbl>
## 1 model   114
## 2   wsj   112
## 3   new    90
## 4   car    89
## 5   get    89
## 6   amp    74
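The tibble and arrange() step can also be written in base R with data.frame() and order(). The counts below are invented for illustration; the sketch shows the same descending sort without dplyr.

```r
# Made-up word counts as a named vector, like the output of rowSums()
freq_example <- c(solar = 3, model = 5, roof = 1, car = 4)

# Build a data frame and sort it by decreasing frequency
freq_example_df <- data.frame(words = names(freq_example),
                              value = unname(freq_example),
                              stringsAsFactors = FALSE)
freq_example_df <- freq_example_df[order(-freq_example_df$value), ]
rownames(freq_example_df) <- NULL   # renumber rows 1..n after sorting
head(freq_example_df, 3)
```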

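Create a word cloud of the 100 most frequent words using the wordcloud2 package. The data is converted to a plain data frame first: in the wordcloud2 version used here, the function tests class(data) == "table" inside if(), which produces a warning (and, from R 4.2.0 onward, an error) when given a tibble, whose class vector has three elements.

library(wordcloud2)
wordcloud2(data = as.data.frame(freq_df[1:100, ]))  # plain data frame rather than a tibble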
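Passing a tibble to code that tests class(data) == "table" inside if() goes wrong because the comparison is vectorised over the whole class vector. A small base-R sketch shows the difference: the object below is a stand-in built with structure() so the example runs without the tibble package (its two counts are taken from the frequency table above), and inherits() is shown as the check that always returns a single TRUE or FALSE.

```r
# A stand-in for a tibble: same class vector, built without the tibble package
fake_tbl <- structure(list(word = c("model", "wsj"), freq = c(114, 112)),
                      class = c("tbl_df", "tbl", "data.frame"),
                      row.names = c(NA, -2L))

class(fake_tbl) == "table"      # three logicals: FALSE FALSE FALSE
inherits(fake_tbl, "table")     # a single FALSE, safe inside if()

# Converting to a plain data frame leaves a single-element class vector
plain <- as.data.frame(fake_tbl)
class(plain)                    # "data.frame"
```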