Seize the trend

Want to see what’s popular? Twitter is a good place to start. Twitter is a public platform that allows users to post what they are up to. In recent years it has been increasingly used by organizations and government officials to broadcast news and opinions; a typical example is the common saying ‘rule by Twitter’ that followed Donald Trump’s rise to office. Over the years, tweets have become a valuable resource for data mining experiments such as sentiment analysis. The twitteR package is intended to provide access to the Twitter API from R, allowing users to grab interesting subsets of Twitter data for their analyses.

What is included in this Vignette

This vignette gives a brief introduction to three ways of scraping a chunk of tweets:

  1. Search for up to 3200 tweets from any individual account.
  2. Search for up to 3200 tweets using a hashtag of your choosing.
  3. Search for up to 3200 tweets sent to a certain user.

Getting Twitter access

First, to get access to bulk data from Twitter, we need to create an application at https://apps.twitter.com. To do that, we register as a Twitter developer and fill in some information:

Figure 1. Register as a Twitter developer

Then create an application using the developer account we just signed up for:

Figure 2. Create an app

The last step of the preparation phase is to get the keys and tokens:

Figure 3. Keys and tokens

Now the Twitter account is all set up, and access to Twitter data should be granted!

Load packages

Besides twitteR, a few other packages are useful for scraping Twitter data.

library(twitteR)
library(purrr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:twitteR':
## 
##     id, location
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Authentication with OAuth

Use the keys and tokens acquired from the Twitter developer account:

consumerKey = "2UBwrhKMuLRvKxBw1dwCfJfs2"  
consumerSecret = "dWnVwl593uB9RyJsCScCRePY6yv6Ogs6lfSqguGOLGcEgiEl3X"
accessToken = "1231894513439756289-miuF5QtVJ3o1sQXHMdTElx7UuldNKk"
accessSecret = "0afZHGFHvGER56QNVxjZK4jSkEgefpmghS79RGf3Pm4Ka"
options(httr_oauth_cache=TRUE)
setup_twitter_oauth(consumer_key = consumerKey, consumer_secret = consumerSecret,
                    access_token = accessToken, access_secret = accessSecret)
## [1] "Using direct authentication"

This process authenticates via httr. Personal keys and tokens should stay confidential, so the ones shown above were regenerated before this vignette was published.
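One way to keep the keys out of the vignette source altogether is to read them from environment variables. The sketch below is only an illustration: the helper read_twitter_credentials() and the variable names TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET, TWITTER_ACCESS_TOKEN and TWITTER_ACCESS_SECRET are hypothetical choices, not something twitteR requires.

```r
# Hypothetical helper: collect the four OAuth values from environment
# variables (for example set in ~/.Renviron) instead of hard-coding them.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  values <- Sys.getenv(vars, unset = NA)
  # Treat unset and empty variables alike.
  absent <- vars[is.na(values) | values == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  as.list(setNames(values, names(vars)))
}

# Usage (needs the twitteR package and valid credentials):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(consumer_key = creds$consumer_key,
#                     consumer_secret = creds$consumer_secret,
#                     access_token = creds$access_token,
#                     access_secret = creds$access_secret)
```

The helper stops with an error that names any missing variable, so a forgotten key shows up before the authentication call is made.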

Searching and scraping

The basic workflow is to search for certain tweets, collect them in a data frame, and then write the data frame to a file.

To create a list of Donald J. Trump’s most recent 3200 tweets:

trumptweets <- userTimeline("realDonaldTrump", n = 3200)

trumptweets_df <- tbl_df(map_df(trumptweets, as.data.frame))

write.csv(trumptweets_df, "trumptweets.csv")

The file ‘trumptweets.csv’ should now be in the working directory, ready to be viewed and analysed.
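Two details of the CSV round trip are worth knowing. By default write.csv() also stores row names, which read.csv() brings back as an extra column named X, and long tweet ids are parsed as floating-point numbers, which cannot represent all 19 digits exactly. The sketch below uses a made-up one-row data frame and a temporary file to show one possible way around both; the toy values are assumptions for illustration, not real tweets.

```r
# Toy stand-in for a tweet data frame (values are made up for illustration).
toy_tweets <- data.frame(
  text = "Example tweet text",
  id   = "1246689000000000001",  # id kept as a character string
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids the extra "X" column on re-import.
write.csv(toy_tweets, path, row.names = FALSE)

# Read the id column back as character so that no digits are lost.
toy_back <- read.csv(path, colClasses = c(id = "character"),
                     stringsAsFactors = FALSE)
str(toy_back)
```

The same two arguments could be added to the write.csv() and read.csv() calls used in this vignette if exact ids matter for a later analysis.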

To get a feel for what the output data frame looks like:

str(read.csv("trumptweets.csv"))
## 'data.frame':    33 obs. of  17 variables:
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ text         : Factor w/ 33 levels ".....Could be as high as 15 Million Barrels. Good (GREAT) news for everyone!",..: 14 9 25 3 32 31 8 18 21 5 ...
##  $ favorited    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ favoriteCount: int  42123 30321 67392 78175 138629 70737 124494 39623 111329 219394 ...
##  $ replyToSN    : Factor w/ 1 level "realDonaldTrump": NA NA NA 1 NA NA NA NA NA NA ...
##  $ created      : Factor w/ 32 levels "2020-04-02 12:57:41",..: 32 31 30 29 28 27 26 25 24 23 ...
##  $ truncated    : logi  FALSE FALSE TRUE TRUE TRUE TRUE ...
##  $ replyToSID   : num  NA NA NA 1.25e+18 NA ...
##  $ id           : num  1.25e+18 1.25e+18 1.25e+18 1.25e+18 1.25e+18 ...
##  $ replyToUID   : int  NA NA NA 25073877 NA NA NA NA NA NA ...
##  $ statusSource : Factor w/ 1 level "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>": 1 1 1 1 1 1 1 1 1 1 ...
##  $ screenName   : Factor w/ 1 level "realDonaldTrump": 1 1 1 1 1 1 1 1 1 1 ...
##  $ retweetCount : int  11957 8503 17854 14016 24598 15692 24361 10140 19821 44219 ...
##  $ isRetweet    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ retweeted    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ longitude    : logi  NA NA NA NA NA NA ...
##  $ latitude     : logi  NA NA NA NA NA NA ...

Although some columns do not seem useful, the data still contains valuable information in columns such as text, favoriteCount, created, and retweetCount.
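As a rough sketch of how those four columns could be pulled out for analysis, the base R helper below keeps them and converts created to a date-time. The helper name select_tweet_fields() and the toy rows are hypothetical, and treating the timestamps as UTC is an assumption.

```r
# Hypothetical helper: keep the four columns of interest and parse `created`.
select_tweet_fields <- function(df) {
  out <- df[, c("text", "favoriteCount", "created", "retweetCount")]
  # read.csv() may have turned text into a factor; convert it back.
  out$text <- as.character(out$text)
  # Assumption: timestamps are stored as "YYYY-MM-DD HH:MM:SS" in UTC.
  out$created <- as.POSIXct(as.character(out$created),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  out
}

# Toy rows that mimic the structure of trumptweets.csv (values are made up).
toy <- data.frame(
  X = 1:2,
  text = c("first example", "second example"),
  favoriteCount = c(10L, 25L),
  created = c("2020-04-02 12:57:41", "2020-04-03 08:15:00"),
  retweetCount = c(3L, 7L),
  screenName = "example_user",
  stringsAsFactors = TRUE
)

fields <- select_tweet_fields(toy)
fields[order(-fields$favoriteCount), ]  # most-liked tweets first
```

With the real file, the toy data frame would be replaced by the result of read.csv("trumptweets.csv").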

To create a list of up to 3200 tweets containing a certain hashtag:

daylightsavings <- searchTwitter("#daylightsavings exclude:retweets", n=3200)
## Warning in doRppAPICall("search/tweets", n, params = params, retryOnRateLimit =
## retryOnRateLimit, : 3200 tweets were requested but the API can only return 609
daylightsavings_df <- tbl_df(map_df(daylightsavings, as.data.frame))

write.csv(daylightsavings_df, "daylightsavings.csv")

Oops, it looks like only a portion of the tweets were returned. To dig a little deeper, look at the tail and head of the output data frame:

tail(read.csv("daylightsavings.csv"))
##       X
## 604 604
## 605 605
## 606 606
## 607 607
## 608 608
## 609 609
##                                                                                                                                             text
## 604              Spring is almost here! Set your clocks forward by one hour with the Rado True Secret. #DaylightSavings… https://t.co/rhHeYxLit0
## 605 As a reminder that the clocks go forward tonight, here’s 81-year-old Evan Davies, a retired watchmaker from the Pon… https://t.co/gGdrJvJQFT
## 606         Don't forget to put your clocks forward this Sunday \U0001f337\U0001f423\n\n#springforward\n#daylightsavings https://t.co/Qe2CiyaeyG
## 607 Say “NO To UK Daylight Saving Time!” Less Sleep Less Health! MORE Accidents, Heart Attacks and Less Immunity from l… https://t.co/PEwFa27kBx
## 608   Daylight savings day is tomorrow, don't forget the clocks go forward tonight! #daylightsavingsday #daylightsavings https://t.co/YDwrpzp0Lp
## 609                               @ST0NEHENGE Bet this is easier than doing my cooker clock!\n\nhttps://t.co/WlfNy2eHRg\n\n#BST #DaylightSavings
##     favorited favoriteCount  replyToSN             created truncated replyToSID
## 604     FALSE            29       <NA> 2020-03-28 16:00:11      TRUE         NA
## 605     FALSE             1       <NA> 2020-03-28 15:55:11      TRUE         NA
## 606     FALSE             5       <NA> 2020-03-28 15:50:02     FALSE         NA
## 607     FALSE             0       <NA> 2020-03-28 15:31:05      TRUE         NA
## 608     FALSE             0       <NA> 2020-03-28 15:00:26     FALSE         NA
## 609     FALSE            42 ST0NEHENGE 2020-03-28 14:45:12     FALSE 1.2438e+18
##               id replyToUID
## 604 1.243931e+18         NA
## 605 1.243930e+18         NA
## 606 1.243928e+18         NA
## 607 1.243924e+18         NA
## 608 1.243916e+18         NA
## 609 1.243912e+18   45553126
##                                                                             statusSource
## 604          <a href="https://prod1.sprinklr.com" rel="nofollow">Sprinklr Publishing</a>
## 605                <a href="https://www.hootsuite.com" rel="nofollow">Hootsuite Inc.</a>
## 606                               <a href="https://buffer.com" rel="nofollow">Buffer</a>
## 607   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 608                               <a href="https://buffer.com" rel="nofollow">Buffer</a>
## 609 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
##          screenName retweetCount isRetweet retweeted longitude latitude
## 604            rado            6     FALSE     FALSE        NA       NA
## 605      Census2021            1     FALSE     FALSE        NA       NA
## 606    BishopBurton            1     FALSE     FALSE        NA       NA
## 607   thesleepcoach            2     FALSE     FALSE        NA       NA
## 608 lemonpressprint            0     FALSE     FALSE        NA       NA
## 609       stuart180            6     FALSE     FALSE        NA       NA
head(read.csv("daylightsavings.csv"))
##   X
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
##                                                                                                                                        text
## 1 Any chance we can save this hour for like Dec 2020 when we can actually use it for anything other than playing scrabble? #daylightsavings
## 2                                                                                                         ห่างกัน3ชมแล้วนะปทท #daylightsavings
## 3                                        Follow up. Dogs now think dinner is late.\n#daylightsavings #dogsoftwitter https://t.co/8at5qsTOel
## 4     Remember to turn back time tonight \u23f0 #DayLightSavings #MoreTimeAtHome #lockdown @ Waitakere, New Zealand https://t.co/HI8oewcU1O
## 5                                                 I keep telling my dog it’s only 4pm, and dinner is at 5pm. As per usual. #daylightsavings
## 6                                                                         Same, same not different #daylightsavings https://t.co/EtbeXsTLFm
##   favorited favoriteCount       replyToSN             created truncated
## 1     FALSE             1            <NA> 2020-04-05 06:38:08     FALSE
## 2     FALSE             0            <NA> 2020-04-05 06:25:08     FALSE
## 3     FALSE             0 pupsnpopculture 2020-04-05 06:11:10     FALSE
## 4     FALSE             1            <NA> 2020-04-05 06:09:48     FALSE
## 5     FALSE             0            <NA> 2020-04-05 06:01:04     FALSE
## 6     FALSE             0            <NA> 2020-04-05 05:59:01     FALSE
##    replyToSID           id   replyToUID
## 1          NA 1.246689e+18           NA
## 2          NA 1.246685e+18           NA
## 3 1.24656e+18 1.246682e+18 8.936359e+17
## 4          NA 1.246681e+18           NA
## 5          NA 1.246679e+18           NA
## 6          NA 1.246679e+18           NA
##                                                                           statusSource
## 1   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 2   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 3 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
## 4                          <a href="http://instagram.com" rel="nofollow">Instagram</a>
## 5   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 6   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
##        screenName retweetCount isRetweet retweeted longitude latitude
## 1   laurathistle9            0     FALSE     FALSE        NA       NA
## 2           nmukp            0     FALSE     FALSE        NA       NA
## 3 pupsnpopculture            0     FALSE     FALSE        NA       NA
## 4 ToniTalijancich            0     FALSE     FALSE  174.5439 -36.8491
## 5       TobyBayer            0     FALSE     FALSE        NA       NA
## 6   adamwesterink            0     FALSE     FALSE        NA       NA

Comparing the posting times of the first and last tweets in this data frame (2020-04-05 and 2020-03-28) suggests that the API only returns tweets from roughly the past week.
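The size of that window can be checked directly from the two timestamps shown in the output. A minimal sketch, assuming both times are in the same time zone (UTC is used here as an assumption):

```r
# Creation times of the newest (row 1) and oldest (row 609) tweets
# in the output above.
newest <- as.POSIXct("2020-04-05 06:38:08", tz = "UTC")
oldest <- as.POSIXct("2020-03-28 14:45:12", tz = "UTC")

# Length of the window covered by the 609 returned tweets, in days.
span_days <- as.numeric(difftime(newest, oldest, units = "days"))
round(span_days, 1)
```

The span comes out at a little under eight days, which fits the idea of a roughly one-week search window.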

To create a list of tweets sent to a user:

tweetstotrump <- searchTwitter("@realDonaldTrump exclude:retweets", n=3200)

tweetstotrump_df <- tbl_df(map_df(tweetstotrump, as.data.frame))

write.csv(tweetstotrump_df, "tweetstotrump.csv")

What to do next

With the data scraped from Twitter using the methods above, we can analyse how often a word appears, in which tweets it appears, and when. By analysing ‘favoriteCount’ and ‘retweetCount’, we can shed some light on social trends and preferences. twitteR offers many more functions; see https://cran.r-project.org/web/packages/twitteR/twitteR.pdf for more comprehensive documentation.
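As a starting point for the word-frequency idea, here is a small base R sketch. The helper word_frequencies() and the toy tweets are hypothetical, and the tokenisation is deliberately crude: it only keeps the letters a to z, so non-English text such as the Thai tweet in the output above would be dropped.

```r
# Hypothetical helper: count how often each word occurs in a vector of tweets.
word_frequencies <- function(texts) {
  cleaned <- tolower(texts)
  # Drop links before splitting so URL fragments are not counted as words.
  cleaned <- gsub("https?://\\S+", " ", cleaned, perl = TRUE)
  # Split on anything that is not a lowercase ASCII letter.
  words <- unlist(strsplit(cleaned, "[^a-z]+"))
  words <- words[nzchar(words)]
  counts <- table(words)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Toy tweets (made up for illustration).
toy_texts <- c(
  "Clocks go forward tonight! #daylightsavings https://t.co/example",
  "Do not forget: clocks go forward #daylightsavings"
)

freq <- word_frequencies(toy_texts)
head(freq, 5)
```

On real data, the text column of one of the data frames built earlier would take the place of toy_texts.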