This is an R Markdown document for Twitter analysis. In this document, we set up Twitter credentials, run a connection test, search Twitter, gather tweets, and analyze them.
Twitter uses OAuth, so we use the "setup_twitter_oauth" function to authenticate by passing the consumer key, consumer secret, access token, and access token secret. These credentials can be obtained by setting up a Twitter developer account.
library(twitteR)
setup_twitter_oauth(key, secret, token, accesstoken)
## [1] "Using direct authentication"
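The credentials passed above have to come from somewhere, and hardcoding them in a shared document is risky. The following is a minimal sketch, assuming the four values are stored in environment variables. The variable names and the helper `read_twitter_credentials` are hypothetical and not part of twitteR. The list names mirror the variables used in the call above.

```r
# Hypothetical environment variable names; choose names that suit your setup.
credential_vars <- c(
  key         = "TWITTER_CONSUMER_KEY",
  secret      = "TWITTER_CONSUMER_SECRET",
  token       = "TWITTER_ACCESS_TOKEN",
  accesstoken = "TWITTER_ACCESS_SECRET"
)

# Read each credential and stop early if any of them is unset or empty.
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = NA)
  names(values) <- names(vars)
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}

# Usage (requires the twitteR package and valid credentials):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$key, creds$secret, creds$token, creds$accesstoken)
```

Failing early with the names of the missing variables tends to be easier to debug than an authentication error from the API.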
Once OAuth is set up successfully, we can test the connection by searching Twitter for a small number of tweets.
searchTwitter("#rstats", n = 10)
## [[1]]
## [1] "thatdnaguy: RT @JennyBryan: googledrive is a new #rstats package that lets you find, create, delete, download & share files. Joint w/ @LucyStats https:…"
##
## [[2]]
## [1] "esoteroc: RT @PyData: #Python overtakes #Rstats, becomes the leader in #DataScience and #MachineLearning https://t.co/nAAjHXPNeM"
##
## [[3]]
## [1] "Tom_Drake1: Great poster! Worth a visit. Stats have never been taught better! #AMEE2017 #rstats https://t.co/0xzQVgz1Xi"
##
## [[4]]
## [1] "nurmal2017: RT @jenitive_case: Worked on this #rstats #crossstitch over the weekend. Soon to be listed at https://t.co/u5lVh3D8cl! #magrittr #tidyverse…"
##
## [[5]]
## [1] "xgirouxb: Very little #rstats SO traffic from developing countries, it seems the #rstats world is not as flat as one would ho… https://t.co/pV0D5ORCXG"
##
## [[6]]
## [1] "pteetor: RT @JennyBryan: googledrive is a new #rstats package that lets you find, create, delete, download & share files. Joint w/ @LucyStats https:…"
##
## [[7]]
## [1] "millerdl: oh yeah, btw #rstats I wrote @mgcv_changelog a while ago to ping us with mgcv news..."
##
## [[8]]
## [1] "sfermigier: RT @PyData: #Python overtakes #Rstats, becomes the leader in #DataScience and #MachineLearning https://t.co/nAAjHXPNeM"
##
## [[9]]
## [1] "thomas_sandmann: \"Learning in leaps and bounds: my 10 favorite data science books\" https://t.co/6cL6Lv98cT on @LinkedIn #rstats"
##
## [[10]]
## [1] "ajantriks: RT @LucyStats: 😍 Love #rstats?\n❤️ Love Google Drive?\n📦 Check out googledrive, brand new & sparkly ✨ on CRAN! https://t.co/j6LRrPxygP"
We use the same searchTwitter function as before to query Twitter for tweets about Tesla.
tweets <- searchTwitter("Tesla", since="2011-07-01", lang = "en", n = 1000)
Let's examine the structure of the tweets object
head(tweets)
## [[1]]
## [1] "sci_tek: Is Elon Musk Going To Leave Tesla..?\n\nhttps://t.co/bVEppdEZo1\n#Tesla #ElonMusk https://t.co/e465hfnM0C"
##
## [[2]]
## [1] "JackieMendez: Tesla Just Revealed How Much Its Solar Roof Will Cost https://t.co/RloB2KJD9l"
##
## [[3]]
## [1] "nachojuarez: RT @KrapelsMarco: Ignore North Korea nukes. Just watch my Tesla and super cool wind turbine: https://t.co/EwN3MJTgcv"
##
## [[4]]
## [1] "tarkikturna: RT @Steeler_Stud: Tesla's robo-factory where machines put the ‘wings' onto its latest Model X | Daily Mail Online https://t.co/dMBYrOaZ70"
##
## [[5]]
## [1] "teslacars1: Solar News: #Tesla Unveils Standard Solar Panels, NREL Report Shows Price Inflation By Large ... https://t.co/ob4yMQ9VQx"
##
## [[6]]
## [1] "uxstephen: @CoralineAda I think most devs would prefer to play Tesla if we can. But IME business demands put lots of pressure… https://t.co/yVYwHJN6Mc"
class(tweets)
## [1] "list"
length(tweets)
## [1] 1000
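Extract the tweet text as a character vector and replace every non-graphical character (such as newlines, tabs, and other control characters) with a space. The conversion to a data frame with twListToDF is left commented out.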
library(stringr)
# tweetsDF <- twListToDF(tweets)
tweets_vector <- sapply(tweets, function(x) x$getText())
tweets_vector <- str_replace_all(tweets_vector, "[^[:graph:]]", " ")
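To see what the pattern "[^[:graph:]]" does, here is a small sketch on a made-up string. It uses base R's gsub as a stand-in for str_replace_all so that it runs without extra packages. The two functions use different regex engines, so results may differ for some non-ASCII characters.

```r
# A made-up tweet text containing two newlines and a tab.
raw_text <- "Is Elon Musk\n\nleaving?\tTesla"

# Replace every character outside [:graph:] with a single space.
# Ordinary spaces are also non-graphical, so they are replaced by themselves.
cleaned <- gsub("[^[:graph:]]", " ", raw_text)
print(cleaned)
# [1] "Is Elon Musk  leaving? Tesla"
```

Each matched character becomes one space, so the two newlines leave a double space behind.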
Step 1: Convert the tweets into a source object
library(tm)
tweets_source <- VectorSource(tweets_vector)
Step 2: Create a corpus object from the source object
tweets_corpus <- VCorpus(tweets_source)
Step 3: Build a function that performs the basic preprocessing steps
preprocess <- function(x) {
  x <- tm_map(x, removePunctuation)
  x <- tm_map(x, content_transformer(tolower))
  # Remove stop words after lowercasing so that capitalized forms also match
  x <- tm_map(x, removeWords, c(stopwords("en"), "tesla", "will"))
  x <- tm_map(x, removeNumbers)
  # Strip whitespace last to collapse the gaps left by the removals above
  x <- tm_map(x, stripWhitespace)
  x
}
Now let us apply this preprocessing function to convert all characters to lowercase and to remove punctuation, stop words, numbers, and extra whitespace
tweets_corpus <- preprocess(tweets_corpus)
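The order of the preprocessing steps affects the result. The sketch below illustrates two such effects on a made-up sentence: stop-word matching is case-sensitive, and removing words leaves gaps that whitespace stripping can collapse. The helpers `remove_words` and `strip_whitespace` are simplified base-R stand-ins written for this illustration, not the tm functions themselves.

```r
# Simplified base-R stand-ins for word removal and whitespace stripping.
remove_words <- function(text, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", text, perl = TRUE)
}
strip_whitespace <- function(text) gsub("[[:space:]]+", " ", text)

text <- "The car will go"
stop_words <- c("the", "will")  # a tiny, made-up stop-word list

# Without lowercasing first, "The" does not match "the" and survives.
not_lowered <- remove_words(text, stop_words)
print(not_lowered)
# [1] "The car  go"

# Lowercasing first lets both stop words match; the removals leave gaps.
gapped <- remove_words(tolower(text), stop_words)
print(gapped)
# [1] " car  go"

# Stripping whitespace afterwards collapses the repeated spaces.
collapsed <- strip_whitespace(gapped)
print(collapsed)
# [1] " car go"
```

For building a term-document matrix the leftover spaces are harmless, but collapsing them keeps the cleaned text readable when inspected.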
Create a term-document matrix (TDM) in which each row is a term and each column is a document (here, a single tweet)
tweets_tdm <- TermDocumentMatrix(tweets_corpus)
Convert the TDM into a regular matrix and compute each word's total frequency across all documents using row sums
tweets_tdm_m <- as.matrix(tweets_tdm)
freq <- rowSums(tweets_tdm_m)
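To make the shape of a term-document matrix concrete, here is a hand-built sketch on two made-up documents. It uses only base R and is not how tm constructs its matrix. The names `docs`, `tokens`, and `terms` are illustrative.

```r
# Two made-up documents, already lowercased and cleaned.
docs <- c(doc1 = "model car model", doc2 = "new car")

# Split each document into words and collect the sorted vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Count each term per document: rows are terms, columns are documents.
tdm <- vapply(
  tokens,
  function(tok) tabulate(match(tok, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(tdm) <- terms
print(tdm)

# Row sums give each term's total frequency across all documents.
freq <- rowSums(tdm)
print(freq)
# car = 2, model = 2, new = 1
```

Note that as.matrix on a large TDM produces a dense matrix, which can use a lot of memory when there are many tweets and terms.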
Create a data frame of words and their frequencies, sorted from most to least frequent
library(tibble)
library(dplyr)
freq_df <- tibble(words = names(freq), value = freq)
freq_df <- freq_df %>% arrange(desc(value))
head(freq_df)
## # A tibble: 6 x 2
##   words value
##   <chr> <dbl>
## 1 model   114
## 2 wsj     112
## 3 new      90
## 4 car      89
## 5 get      89
## 6 amp      74
Create a word cloud of the 100 most frequent words using the wordcloud2 package
library(wordcloud2)
wordcloud2(data = freq_df[1:100, ])
## Warning in if (class(data) == "table") {: the condition has length > 1 and
## only the first element will be used
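The warning text shows the condition that triggered it: class(data) == "table". A tibble carries three classes, so that comparison returns three values where `if` expects one. The sketch below reproduces this with base R only. It mimics a tibble's class vector instead of loading tibble, and the object `tbl_like` is purely illustrative.

```r
# Mimic the class vector of a tibble without needing the tibble package.
tbl_like <- structure(
  data.frame(words = c("model", "wsj"), value = c(114, 112)),
  class = c("tbl_df", "tbl", "data.frame")
)

# The comparison yields one logical per class, which is what `if` warned about.
comparison <- class(tbl_like) == "table"
print(comparison)
# [1] FALSE FALSE FALSE

# inherits() always returns a single TRUE or FALSE.
print(inherits(tbl_like, "table"))
# [1] FALSE

# A plain data frame has a single class, so the comparison has length one.
plain <- as.data.frame(tbl_like)
print(class(plain))
# [1] "data.frame"
```

Under that reading, passing as.data.frame(freq_df[1:100, ]) to wordcloud2 should avoid the warning, though this depends on the installed wordcloud2 version. From R 4.2.0 onward a condition of length greater than one is an error, not a warning.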