Retrieving Tweets with R

Eduardo de Andrade Rodrigues

2017-04-09

Introduction

This vignette briefly explains how to use R to retrieve tweets through the Twitter API, which authenticates requests with OAuth. For further information on OAuth, see https://en.wikipedia.org/wiki/OAuth. For Twitter's specific implementation of it, see https://dev.twitter.com/oauth.

As an example, this vignette retrieves tweets that contain the term “data science” and performs a basic exploration of them. Some thoughts on how data from Twitter and other social media can be used are also briefly presented.

Requirements

To access tweets from R, one has to:

  1. Create a Twitter Account
  2. Create a Twitter Application
  3. Install the required R package

Create a Twitter Account

This is a basic step, but it is still a requirement. You will use your account to create your Twitter applications. To create a Twitter account, go to https://twitter.com.

Create a Twitter Application

When a new application is created, Twitter issues a Consumer Key and a Consumer Secret (the API key and API secret used later in this vignette). The user must also create an Access Token and an Access Token Secret to enable external access to the application. Remember not to share these tokens with untrusted parties.

Install the twitteR R Package

The call below installs the required package, provided you have a working internet connection. It also installs the packages that twitteR depends on.

install.packages("twitteR", repos = 'http://cran.us.r-project.org')

Initializing R Environment

The following packages are used in this vignette and should be loaded as shown below.

library(dplyr)
library(twitteR)
library(knitr)
library(ggplot2)

Setting Credentials

The first thing to do is to set the credential tokens. You will have to copy the API key, API secret, access token and access token secret from the Twitter apps web page where you created your application. To keep the code readable, it is a good idea to store them in variables instead of writing them directly in the setup_twitter_oauth() call you will see below.

api_key <- "API KEY of the Twitter Application"
api_secret <- "API SECRET of the Twitter Application"
token <- "TOKEN of the Twitter Application"
token_secret <- "TOKEN SECRET of the Twitter Application"
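
Because these values are secrets, one option is to keep them out of the script altogether. The sketch below is not part of the original vignette: it reads a credential from an environment variable with Sys.getenv() and stops early if it is missing. The helper read_credential() and the variable name TWITTER_API_KEY are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: fetch one credential from an environment variable
# and fail early with a clear message if it is missing or empty.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var_name, " is not set", call. = FALSE)
  }
  value
}

# For demonstration only: a placeholder value is set here so the sketch runs.
# In practice the variable would be defined outside the script.
Sys.setenv(TWITTER_API_KEY = "demo-key")

api_key <- read_credential("TWITTER_API_KEY")
```

In a real session the Sys.setenv() line would be dropped and the variables defined elsewhere, for example in a ~/.Renviron file. The other three credentials could be read the same way.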

Connecting to Twitter Application by OAuth

Next, you must connect to the Twitter application you created. The function setup_twitter_oauth() does this when you pass it the credentials for your application.

setup_twitter_oauth(consumer_key = api_key,
                    consumer_secret = api_secret,
                    access_token = token,
                    access_secret = token_secret)

Retrieving Tweets

It is now possible to retrieve tweets. The call below performs this operation. We will search for popular tweets that contain the term “data science”, from March 1, 2017 to April 7, 2017.

tweets <- searchTwitter(searchString = "data science", n = 100, lang = "en",
                        since = "2017-03-01", until = "2017-04-07",
                        resultType = "popular")
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 100 tweets were requested but the
## API can only return 22

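Note the following characteristics of the arguments:

  1. searchString is the query sent to Twitter.
  2. n is the maximum number of tweets to return; as the warning above shows, the API may return fewer (22 here).
  3. lang restricts the results to one language, given as an ISO 639-1 code such as "en".
  4. since and until restrict the results by date and must be formatted as YYYY-MM-DD.
  5. resultType selects "popular", "recent" or "mixed" results.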

More details about this function can be found at https://www.rdocumentation.org/packages/twitteR/versions/1.1.9/topics/searchTwitter
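
The since and until arguments in the call above are plain strings in YYYY-MM-DD form. As a small sketch, the search window can be built from Date objects so that an inverted range is caught before any request is sent. The helper search_window() is hypothetical and is not part of twitteR.

```r
# Hypothetical helper: validate a date range and return it as the
# YYYY-MM-DD strings used for the since and until arguments.
search_window <- function(from, to) {
  # as.Date() raises an error for strings it cannot parse;
  # NA inputs are caught by the check below.
  from <- as.Date(from)
  to <- as.Date(to)
  if (is.na(from) || is.na(to)) {
    stop("Both dates must be provided", call. = FALSE)
  }
  if (from > to) {
    stop("'from' must not be later than 'to'", call. = FALSE)
  }
  list(since = format(from, "%Y-%m-%d"), until = format(to, "%Y-%m-%d"))
}

window <- search_window("2017-03-01", "2017-04-07")
print(window)
```

The result could then be passed on as since = window$since and until = window$until in the searchTwitter() call.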

Viewing Tweets Variables

The tweets object is a list of twitteR status objects. Each of them holds a good deal of information about a tweet, most of which is not relevant to this vignette. The function twListToDF(), as used below, converts the list into a data frame that is easier to read and work with. The resulting data frame contains the following variables:

tweetsDF <- twListToDF(tweets)
names(tweetsDF)
##  [1] "text"          "favorited"     "favoriteCount" "replyToSN"    
##  [5] "created"       "truncated"     "replyToSID"    "id"           
##  [9] "replyToUID"    "statusSource"  "screenName"    "retweetCount" 
## [13] "isRetweet"     "retweeted"     "longitude"     "latitude"
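
The chart in the next section uses a data frame named retweetsByAuthor, whose construction is not shown in this vignette. The following is a minimal sketch of one way to build it, assuming it holds the total retweetCount per screenName, sorted in descending order. The helper name rank_retweets_by_author() and the toy data are hypothetical. The sketch uses base R so that it runs without extra packages, although the same aggregation could be written with dplyr, which is loaded above.

```r
# Hypothetical helper: total retweets per author, highest first.
# Assumes df has the screenName and retweetCount columns listed above.
rank_retweets_by_author <- function(df) {
  totals <- aggregate(retweetCount ~ screenName, data = df, FUN = sum)
  totals <- totals[order(-totals$retweetCount), ]
  rownames(totals) <- NULL
  totals
}

# Toy stand-in for tweetsDF so the sketch runs without Twitter access.
toyDF <- data.frame(
  screenName = c("alice", "bob", "alice", "carol"),
  retweetCount = c(10, 5, 7, 20),
  stringsAsFactors = FALSE
)

toyRanking <- rank_retweets_by_author(toyDF)
print(toyRanking)
```

With real data, retweetsByAuthor <- rank_retweets_by_author(tweetsDF) would give an object with the two columns the chart expects.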

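Top 20 Bar Chart

The ranking of authors by retweet count can be shown as a figure. The bar chart below depicts the 20 authors with the most retweets among the tweets retrieved by our search. It expects retweetsByAuthor to be a data frame of retweet counts per screenName, sorted in descending order.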

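# Plot the 20 most retweeted authors, bars ordered by descending retweet count.
ggplot(head(retweetsByAuthor, 20), aes(reorder(screenName, -retweetCount), retweetCount)) +
  geom_bar(stat = "identity") +
  labs(x = "Author (screenName)", y = "Retweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))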

Further Thoughts

Social media are a rich source of information about people’s behavior, as a huge number of individuals publicly express their thoughts and feelings about an enormous variety of subjects online. Knowledge can be derived from this “bag” of information by applying data science techniques. The example above could be expanded so that a company could identify popular tweeters and use their tweets to promote its brand or products. Furthermore, by applying text mining techniques, an enterprise could measure how positively or negatively social media users evaluate its brand, actions or products. Thus, social media are a lively medium, full of information for data scientists to work with.