Introduction

In this course, I import tweets from Twitter using the twitteR package. The purpose is to see how people are reacting to California’s recent school shooting.

library(tm)
library(dplyr)
library(twitteR)
library(wordcloud)
library(tidyverse)
library(knitr)
library(tidytext)

Connecting to Twitter through a developer app

api_key <- "xxx"
api_secret <- "xxx"

access_token <- "xxx"
access_secret <- "xxx"

setup_twitter_oauth(api_key, api_secret, access_token, access_secret)

## [1] "Using direct authentication"

The API key, API secret, access token, and access secret from the Twitter app were entered to authenticate before extracting tweets. For security, the keys are hidden here.
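Rather than pasting the keys into the script and editing them out before sharing, one option is to keep them in environment variables (for example in a `~/.Renviron` file) and read them at run time. The sketch below assumes hypothetical variable names such as `TWITTER_API_KEY`; the helper `read_twitter_keys()` is illustrative and not part of twitteR.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables so they never appear in the script or the knitted report.
# The variable names (TWITTER_API_KEY, ...) are assumptions; use your own.
read_twitter_keys <- function(prefix = "TWITTER_") {
  names_needed <- c("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
  values <- Sys.getenv(paste0(prefix, names_needed))  # "" when a variable is unset
  missing <- names_needed[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ",
         paste0(prefix, missing, collapse = ", "))
  }
  setNames(as.list(values), tolower(names_needed))
}

# Usage (not run here, since it contacts Twitter):
# keys <- read_twitter_keys()
# twitteR::setup_twitter_oauth(keys$api_key, keys$api_secret,
#                              keys$access_token, keys$access_secret)
```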

Getting a user’s tweets

usertweet <- getUser('wesmckinn') # Accessing the user
usertweet
## [1] "wesmckinn"
wesmckinney <- userTimeline(usertweet, n=10) # Request up to 10 of Wes McKinney's most recent tweets

wesmckinney_df <- twListToDF(wesmckinney) # Convert the list of tweets into a data frame with twitteR's twListToDF()
wesmckinney_df
##                                                                                                                                           text
## 1                           @minrk You should talk with @KrisztianSzucs  about our Crossbow system for @ApacheArrow wheels, it could be a help
## 2 At this point the Trumpian defense to what happened is like saying it's not murder unless you say "I am murdering y… https://t.co/YTbeMuHMsw
## 3                                                        Note: since I wrote that GitHub now acks code reviews as contributions, which is good
## 4 The next time you consider measuring someone's productivity based on their GitHub contribution calendar, I refer yo… https://t.co/tx5pjwzeup
## 5                                                                                            Reminder to get upgraded! https://t.co/4lo7TcSPm2
## 6                                                                                                        @jonathankennell Clutter is excellent
## 7                                                                                                                @BrianInLaw hm, don't recall!
##   favorited favoriteCount       replyToSN             created truncated
## 1     FALSE             2           minrk 2019-11-14 11:05:34     FALSE
## 2     FALSE            47            <NA> 2019-11-13 17:38:05      TRUE
## 3     FALSE             3       wesmckinn 2019-11-07 16:22:50     FALSE
## 4     FALSE            36            <NA> 2019-11-07 16:22:19      TRUE
## 5     FALSE            16            <NA> 2019-11-06 17:29:44     FALSE
## 6     FALSE             0 jonathankennell 2019-11-06 03:22:29     FALSE
## 7     FALSE             0      BrianInLaw 2019-10-31 17:29:25     FALSE
##            replyToSID                  id replyToUID
## 1 1194911841509621760 1194934391010734080   15423006
## 2                <NA> 1194670783336833025       <NA>
## 3 1192477388036288513 1192477517875220480  115494880
## 4                <NA> 1192477388036288513       <NA>
## 5                <NA> 1192131967208378372       <NA>
## 6 1191912471726034944 1191918749953273857   15580417
## 7 1189866000633810944 1189957562151645184  213019436
##                                                                          statusSource
## 1             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 2             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 3             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 4             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 5 <a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>
## 6             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
## 7             <a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>
##   screenName retweetCount isRetweet retweeted longitude latitude
## 1  wesmckinn            0     FALSE     FALSE        NA       NA
## 2  wesmckinn            1     FALSE     FALSE        NA       NA
## 3  wesmckinn            0     FALSE     FALSE        NA       NA
## 4  wesmckinn            7     FALSE     FALSE        NA       NA
## 5  wesmckinn            9     FALSE     FALSE        NA       NA
## 6  wesmckinn            0     FALSE     FALSE        NA       NA
## 7  wesmckinn            0     FALSE     FALSE        NA       NA

Extracting tweets

Using the twitteR package's searchTwitter() function, I searched for tweets matching the keywords and limited the results to at most 1,000 English-language tweets.

# Extract tweets about the school shooting using two trending hashtags

tweets <- searchTwitter('#schoolshooting, #guncontrol', n=1000, lang = 'en')

# Converting tweets into dataframe

school_shooting <- twListToDF(tweets)
school_shooting_text <- school_shooting$text # Character vector containing only the tweet text
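To make the intended combination of hashtags explicit, the query string can be built with Twitter's search operators: terms separated by spaces must all appear in a tweet, while `OR` matches tweets containing either term. The helper below is a hypothetical sketch that only builds the string; the searchTwitter() call is left commented out because it needs the authenticated session from earlier.

```r
# Hypothetical helper: join hashtags into a Twitter search query.
# Space-separated terms are combined with AND; "OR" matches any of them.
build_hashtag_query <- function(tags, operator = c("OR", "AND")) {
  operator <- match.arg(operator)
  tags <- ifelse(startsWith(tags, "#"), tags, paste0("#", tags))  # add missing "#"
  sep <- if (operator == "OR") " OR " else " "
  paste(tags, collapse = sep)
}

q <- build_hashtag_query(c("schoolshooting", "#guncontrol"))
q
# Not run here (needs authentication):
# tweets <- twitteR::searchTwitter(q, n = 1000, lang = "en")
```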

Saving the tweets to a local directory

With free (standard) API access, searchTwitter() only reaches tweets from roughly the past seven days, so I saved the results to the local directory to keep a fixed copy for analysis. I wrote two files: the first contains the full data frame, and the second contains only the tweet text.

# Writing both files into csv 

write.csv(school_shooting, file='shooting_file.csv', row.names=FALSE)
write.csv(school_shooting_text, file='shooting_tweetsonly.csv', row.names=FALSE)
# Reading the csv file from local directory
shooting_tweets <- read.csv('shooting_tweetsonly.csv', row.names=NULL, stringsAsFactors = FALSE)
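Because school_shooting_text is a plain character vector, write.csv() first wraps it in a data frame whose single column gets the default name x, which is why the next section reads shooting_tweets$x. A small round trip with made-up text (a stand-in for the real tweets) shows this:

```r
# Made-up example text standing in for the tweet vector
texts <- c("First example tweet", "Second example, with a comma")

path <- tempfile(fileext = ".csv")
write.csv(texts, file = path, row.names = FALSE)  # vector becomes one column named "x"
back <- read.csv(path, stringsAsFactors = FALSE)

names(back)               # "x"
identical(back$x, texts)  # TRUE: write.csv quotes strings, so commas survive
```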

Data cleaning and text mining

I used the tm package to build a corpus from the tweets and clean it: converting the text to lowercase and removing numbers, punctuation, extra whitespace, and common English stop words. The cleaned corpus was then converted into a term-document matrix and a word-frequency table for the word cloud.
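The tm cleaning steps can be approximated in base R, which makes each transformation easy to inspect on a few strings. This sketch is a stand-in rather than tm's own code: clean_tweets() and the short stop-word list are hypothetical (the analysis itself uses stopwords("english")), and it additionally removes links and the "&amp;" entity before stripping punctuation, which the tm pipeline here does not do.

```r
# Rough base-R stand-in for the tm cleaning pipeline (an approximation,
# not tm's implementation). Links and "&amp;" are removed before punctuation,
# otherwise fragments such as "httpstco..." and "amp" would remain.
clean_tweets <- function(x, stop_words) {
  x <- tolower(x)
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # drop links
  x <- gsub("&amp;", " ", x, fixed = TRUE)         # drop HTML-escaped "&"
  x <- gsub("[[:digit:]]+", "", x)                 # similar to removeNumbers
  x <- gsub("[[:punct:]]+", "", x)                 # similar to removePunctuation
  words <- strsplit(x, "[[:space:]]+")
  words <- lapply(words, function(w) w[w != "" & !(w %in% stop_words)])
  vapply(words, paste, character(1), collapse = " ")  # also normalises spaces
}

# Tiny hypothetical stop-word list and made-up tweets
stops <- c("the", "a", "is", "and", "will")
tweets <- c("Another #SchoolShooting &amp; the NRA will do nothing https://t.co/abc123",
            "Enough!! Call your 2 senators today")
clean_tweets(tweets, stops)
# "another schoolshooting nra do nothing" "enough call your senators today"
```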

words <- Corpus(VectorSource(shooting_tweets$x)) # Build a corpus from the tweet text; write.csv() named the single column "x" by default

words <- tm_map(words, tolower)
## Warning in tm_map.SimpleCorpus(words, tolower): transformation drops
## documents
words <- tm_map(words, removeNumbers)
## Warning in tm_map.SimpleCorpus(words, removeNumbers): transformation drops
## documents
words <- tm_map(words, removePunctuation)
## Warning in tm_map.SimpleCorpus(words, removePunctuation): transformation
## drops documents
words <- tm_map(words, stripWhitespace)
## Warning in tm_map.SimpleCorpus(words, stripWhitespace): transformation
## drops documents
words <- tm_map(words, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(words, removeWords, stopwords("english")):
## transformation drops documents
words <- tm_map(words, removeWords, c("will")) # Custom stop words; add any other unhelpful terms to this vector
## Warning in tm_map.SimpleCorpus(words, removeWords, c("will")):
## transformation drops documents
# Build a term-document matrix and a data frame of word frequencies for the word cloud

tdm <- TermDocumentMatrix(words)
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing=TRUE)
d <- data.frame(word= names(v), freq=v)
head(d,20)
##                          word freq
## guncontrol         guncontrol  173
## schoolshooting schoolshooting  150
## another               another   62
## amp                       amp   57
## guncontrolnow   guncontrolnow   52
## gun                       gun   51
## school                 school   45
## santa                   santa   43
## clarita               clarita   41
## nra                       nra   37
## shooting             shooting   36
## today                   today   29
## california         california   27
## enough                 enough   27
## now                       now   25
## yet                       yet   25
## saugus                 saugus   24
## kids                     kids   24
## santaclarita     santaclarita   22
## government         government   22

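The table above shows that guncontrol, schoolshooting, another, amp, and guncontrolnow are the most frequently used words. Note that amp is not a real word: it is what remains of "&amp;", the HTML-escaped ampersand in tweet text, after punctuation is removed.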
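The frequency table is the row sums of the term-document matrix, that is, how often each term occurs across all tweets. The sketch below computes the same kind of table directly from already-cleaned strings with table(); word_freq() and the toy input are hypothetical. One difference to keep in mind: TermDocumentMatrix() by default ignores words shorter than three characters, which this sketch does not.

```r
# Hypothetical helper: count each token across all cleaned tweets,
# similar in spirit to sort(rowSums(as.matrix(tdm)), decreasing = TRUE).
word_freq <- function(cleaned) {
  tokens <- unlist(strsplit(cleaned, " ", fixed = TRUE))
  tokens <- tokens[tokens != ""]
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(counts), freq = as.integer(counts),
             row.names = NULL, stringsAsFactors = FALSE)
}

cleaned <- c("guncontrol schoolshooting another", "guncontrol nra", "guncontrol another")
word_freq(cleaned)  # guncontrol 3, another 2, then the words seen once
```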

Data visualization

To visualize the data, I use a word cloud and a sentiment analysis chart.

Wordcloud

set.seed(3321)
wordcloud(words=d$word, freq=d$freq, min.freq=10, max.words=200, random.order=FALSE, rot.per=0.05, colors=brewer.pal(8, "Dark2")) # The Dark2 palette has at most 8 colours

The word cloud above shows the most frequently used words, the same ones seen in the table, with larger text for more frequent words, which makes the dominant terms easier to spot.
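Only part of d reaches the plot: min.freq = 10 drops rare words and max.words = 200 caps how many are drawn. A rough base-R sketch of that selection follows; select_cloud_words() is hypothetical, and wordcloud may break ties between equally frequent words differently. The toy input reuses a few counts from the table above plus one made-up rare word.

```r
# Hypothetical sketch of which words wordcloud(min.freq, max.words) keeps
select_cloud_words <- function(d, min_freq = 10, max_words = 200) {
  kept <- d[d$freq >= min_freq, , drop = FALSE]                      # drop rare words
  kept <- kept[order(kept$freq, decreasing = TRUE), , drop = FALSE]  # most frequent first
  head(kept, max_words)                                              # cap the number drawn
}

d_toy <- data.frame(word = c("guncontrol", "schoolshooting", "saugus", "rareword"),
                    freq = c(173, 150, 24, 3), stringsAsFactors = FALSE)
select_cloud_words(d_toy, min_freq = 10, max_words = 2)  # guncontrol, schoolshooting
```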

# Using sentiment analysis to see people's reaction
ss_tdm <- tidy(tdm)
ss_senti <- ss_tdm %>% 
  inner_join(get_sentiments("bing"), by=c(term="word"))

ss_senti %>% 
  count(sentiment, term, wt = count) %>%                  # total occurrences per term
  filter(n >= 5) %>%                                      # keep terms used at least 5 times
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%  # negative terms point the other way
  mutate(term = reorder(term, n)) %>% 
  ggplot(aes(term, n, fill = sentiment)) +
  geom_col() +
  labs(y = "Contribution to sentiment (word count)", title = "Sentiment analysis on school shooting") +
  coord_flip()

I used sentiment analysis to gauge how people reacted to the recent school shooting in California. In the chart, enough and supporting count as positive words. I believe they appear mostly in calls for stronger gun control laws and background checks, so they do not necessarily reflect positive feelings about the event. Negative words such as excuse and sad express the users' anger and sadness. Overall, people are angry about the incident and want better gun control laws.
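The join-and-count logic behind the chart can be reproduced in base R on a toy example. The four-word lexicon below is a hypothetical stand-in for get_sentiments("bing") that only uses the words discussed above, and the counts are made up in the same term/document/count shape that tidy() returns for a term-document matrix. Terms absent from the lexicon, such as gun here, drop out of the inner join, and negative terms get negative values so their bars point the other way.

```r
# Hypothetical mini-lexicon standing in for get_sentiments("bing")
lexicon <- data.frame(word = c("enough", "supporting", "excuse", "sad"),
                      sentiment = c("positive", "positive", "negative", "negative"),
                      stringsAsFactors = FALSE)

# Made-up counts in the shape of tidy(tdm): term, document, count
counts <- data.frame(term = c("enough", "enough", "sad", "excuse", "gun"),
                     document = c("1", "2", "2", "3", "3"),
                     count = c(1, 2, 1, 1, 1),
                     stringsAsFactors = FALSE)

# Inner join on term == word; "gun" is not in the lexicon, so it is dropped
joined <- merge(counts, lexicon, by.x = "term", by.y = "word")

# Total count per term and sentiment, like count(sentiment, term, wt = count)
scored <- aggregate(count ~ term + sentiment, data = joined, FUN = sum)

# Negative terms get negative values, as in the bar chart
scored$n <- ifelse(scored$sentiment == "negative", -scored$count, scored$count)
scored[order(scored$n), c("term", "sentiment", "n")]
```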