Getting started with the Twitter API in R

Introduction
Keys and secrets
Retrieving tweets from your timeline
Search tweets about a topic

1. Introduction

REST APIs

The Twitter Platform is made up of a number of APIs and tools, including the REST APIs. If you’re not sure what that means, Wikipedia explains:

“RESTful systems typically, but not always, communicate over Hypertext Transfer Protocol (HTTP) with the same HTTP verbs (GET, POST, PUT, DELETE, etc.) that web browsers use to retrieve web pages and to send data to remote servers. REST systems interface with external systems as web resources identified by Uniform Resource Identifiers (URIs), for example /people/tom, which can be operated upon using standard verbs such as DELETE /people/tom.”

Rate limiting

The Twitter APIs are rate-limited, meaning that in every 15-minute window there is a maximum number of requests that you can make with any single command.

For the GET commands, the limit is either 15 or 180 requests per window, depending on the individual command. The Search API can be used to query tweets; however, the API documentation notes that relevancy is prioritised over completeness, so if you need every tweet matching a certain query, a Streaming API may be more appropriate.
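
As a rough illustration of the arithmetic, the two limits translate into a minimum average gap between requests. The sketch below also reads the `x-rate-limit-remaining` and `x-rate-limit-reset` response headers sent by version 1.1 of the API. The helper names `min_spacing_secs` and `rate_limit_status` are made up for this example, and the header values are invented sample data, not a real response.

```r
# Minimum average gap (in seconds) between requests for a given limit
# within one 15-minute window.
min_spacing_secs <- function(limit, window_mins = 15) {
  window_mins * 60 / limit
}

min_spacing_secs(15)   # 60 seconds between requests
min_spacing_secs(180)  # 5 seconds between requests

# Summarise the rate-limit headers of a response (hypothetical helper).
# `hdrs` is a named list of header values; `now` is the time in epoch seconds.
rate_limit_status <- function(hdrs, now = as.numeric(Sys.time())) {
  names(hdrs) <- tolower(names(hdrs))
  remaining <- as.integer(hdrs[["x-rate-limit-remaining"]])
  reset_at  <- as.numeric(hdrs[["x-rate-limit-reset"]])
  list(
    remaining = remaining,
    # no need to wait while requests remain; otherwise wait until the reset
    wait_secs = if (remaining > 0) 0 else max(0, reset_at - now)
  )
}

# Invented sample headers: no requests left, window resets at t = 1000.
sample_hdrs <- list("x-rate-limit-limit"     = "15",
                    "x-rate-limit-remaining" = "0",
                    "x-rate-limit-reset"     = "1000")
status <- rate_limit_status(sample_hdrs, now = 940)
status$wait_secs  # 60
```

With httr, the headers of a response object `resp` would come from `headers(resp)`.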

2. Keys and secrets

When you sign up to use the Twitter APIs, you are given four values which you need in order to use the API from R:

consumer key
consumer secret
token
token secret

You can think of the consumer key and consumer secret as a username and password that identify your app, and the token and token secret as a username and password that let that app act on behalf of your Twitter account.

If you wish to share your scripts with others, remove these values first to keep your account secure. One simple way to keep them out of the script is to use the gWidgets package, which prompts you with a dialog box for each key and secret when the script runs.

library(gWidgets)
options(guiToolkit = "tcltk")
consKey <- ginput("Enter your consumer key:")
consSecret <- ginput("Enter your consumer secret:")
token <- ginput("Enter your token:")
tokenSecret <- ginput("Enter your token secret:")
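
An alternative that avoids typing the values on every run is to keep them in environment variables (for example in your `.Renviron` file) and read them with `Sys.getenv`. This is a minimal sketch: the variable name `TWITTER_CONSUMER_KEY` and the helper `get_secret` are assumptions chosen for illustration, not names that the API or any package requires.

```r
# Read a required value from an environment variable, failing loudly
# if it has not been set. (Hypothetical helper.)
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# For demonstration only: set a dummy value in this session.
# In practice the real value would live in .Renviron, outside the script.
Sys.setenv(TWITTER_CONSUMER_KEY = "dummy-key")

consKey <- get_secret("TWITTER_CONSUMER_KEY")
```

The other three values can be read the same way, so the shared script never contains them.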

The httr package is useful for working with URLs and HTTP. We start by taking our keys and secrets and creating objects which will allow us to interact with the API.

library(httr)

# start the authorisation process
myapp <- oauth_app("twitter", key = consKey, secret = consSecret)

# sign using token and token secret
sig <- sign_oauth1.0(myapp, token = token, token_secret = tokenSecret)

3. Retrieving tweets from your timeline

One of the tasks you might wish to do is to retrieve tweets from your timeline.

To accomplish this, use the GET command, passing it the URL below and the signature object created earlier.

my_timeline <- GET("https://api.twitter.com/1.1/statuses/home_timeline.json", sig)

The URL above points to one endpoint of the Twitter API. It is documented in the Twitter API reference under GET statuses/home_timeline.

If you browse the sidebar of the documentation site, you can also see other examples of things you can retrieve using GET.
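
Before working with a response, it is worth checking that the request succeeded. httr's `status_code()` returns the HTTP status of a response as an integer. The sketch below maps a few common codes to messages; `explain_status` is a hypothetical helper written for this example, and the list of codes is not exhaustive.

```r
# Translate an HTTP status code into a short message (hypothetical helper).
explain_status <- function(code) {
  switch(as.character(code),
    "200" = "OK: the request succeeded",
    "401" = "Unauthorized: check your keys, secrets and tokens",
    "429" = "Too Many Requests: the rate limit for this window is used up",
    paste("Unexpected status:", code)  # fallback for any other code
  )
}

explain_status(200L)
explain_status(429L)
```

For the request above, this would be called as `explain_status(status_code(my_timeline))`.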

Examining the content

OK, so we’ve retrieved content from Twitter, but what do we do with it now?

We can use the content function to get the JSON data as a structured R object. As this can be hard to read, we use the jsonlite package to reformat it as a data.frame.

library(jsonlite)

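# parse the response body into nested R lists
json1 <- content(my_timeline)
# convert back to JSON and re-parse so that jsonlite builds a data.frame
json2 <- jsonlite::fromJSON(toJSON(json1))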

Let’s take a look at the data. For reference purposes, the first 4 columns are as follows:

created_at - when the tweet was sent
id - ID as a numeric value
id_str - ID as a string
text - the content of the tweet

There are more columns than this, although for the purpose of simplicity, here we will just examine the first 4.

Here are the 3 most recent tweets on my timeline:

json2[1:3,1:4]

##                       created_at           id             id_str
## 1 Thu Apr 07 10:50:11 +0000 2016 7.180281e+17 718028088119001088
## 2 Thu Apr 07 10:49:01 +0000 2016 7.180278e+17 718027797948575744
## 3 Thu Apr 07 10:47:06 +0000 2016 7.180273e+17 718027314550992896
##                                                                                                                                      text
## 1                               Cash is king in business: Make sure you apply these analytics - Forbes #Analytics https://t.co/cPrqzdwGit
## 2 'Package Development in R' workshop at #EARL2016 - open to non-conference goers https://t.co/Yh0MU3ir1B #rstats https://t.co/crjTmcmDKH
## 3                                       This is a great resource for anyone with young children learning to read. https://t.co/MVOFCCz3Cs
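
The id column prints in scientific notation because R stores it as a double, which can represent integers exactly only up to 2^53. The tweet IDs shown here are larger than that, so two different IDs can collapse to the same numeric value, which makes id_str the safer column for matching tweets. A small base-R sketch of the effect follows; the second ID is invented for the illustration by adding 1 to the first id_str in the output above.

```r
# Doubles cannot distinguish neighbouring integers above 2^53.
2^53 + 1 == 2^53   # TRUE

id_a <- "718028088119001088"  # id_str of the first tweet above
id_b <- "718028088119001089"  # hypothetical ID, one higher

# As strings the IDs differ; as numeric values they are identical.
id_a == id_b                          # FALSE
as.numeric(id_a) == as.numeric(id_b)  # TRUE
```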

Retweets

Let’s have a look at another example. Here I use the API to look at which of my recent tweets have been retweeted by others.

retweets <- GET("https://api.twitter.com/1.1/statuses/retweets_of_me.json", sig)
json_rtw <- content(retweets)
json_rtw2 <- jsonlite::fromJSON(toJSON(json_rtw))
json_rtw2[1:5, 4]

## [[1]]
## [1] "6 (OK, 7) #BigData and Analytics Learning Resources That Business People Can Understand - Forbes https://t.co/5Ek9Wpn4Lw"
## 
## [[2]]
## [1] "At a fantastic workshop by @MangoTheCat on using GitHub for R development - really comprehensive coverage of topics :) #rstats #DataScience"
## 
## [[3]]
## [1] "For #BigData, It's 'Show Me The Money' Time - Forbes https://t.co/ieG2b20mfw"
## 
## [[4]]
## [1] "UCAS offers Data Scientist Internships for graduates https://t.co/h1Fn8wJVw7 #DataScience"
## 
## [[5]]
## [1] "Finally braving it and learning proper matrix notation; turns out I already knew a lot &amp; it's not nearly as hard as I expected! #DataScience"
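
The `[[1]]`, `[[2]]` printing above suggests that the text column came back as a list rather than a character vector, which is a likely side effect of converting the parsed list back to JSON and re-parsing it. One alternative, sketched here on an invented two-record JSON string and not on a real response, is to pass the raw JSON text straight to `fromJSON`.

```r
library(jsonlite)

# Invented sample data standing in for the body of an API response.
raw_json <- '[{"id_str":"1","text":"first tweet"},
              {"id_str":"2","text":"second tweet"}]'

# Parsing the text directly gives a data frame with plain character columns.
tweets <- fromJSON(raw_json)
tweets$text
```

For an httr response such as `retweets`, the raw text could be obtained with `content(retweets, as = "text", encoding = "UTF-8")`.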

4. Search tweets about a topic

Finally, we may want to take a look at tweets about a certain topic. Keep in mind my previous comments about the difference in results returned by the Search API and the Streaming API.

Let’s have a look at the content of recent tweets about Shiny:

shiny_tweets <- GET("https://api.twitter.com/1.1/search/tweets.json?q=rshiny", sig)
json_shiny <- content(shiny_tweets)
json_shiny2 <- jsonlite::fromJSON(toJSON(json_shiny))
statuses <- json_shiny2$statuses
statuses[1:10, 4]

## [[1]]
## [1] "@kylehamilton likewise, we are #shiny people!\n#useR2016 \n#rshiny"
## 
## [[2]]
## [1] "Still super excited about my upcoming lightning talk at #user2016 first #shinydevcon and #APS2016 now this! #rshiny #rstats #psychsci"
## 
## [[3]]
## [1] "@airamoigroig looking forward to seeing your talk at #user2016 #rshiny #rstats"
## 
## [[4]]
## [1] "RT @cole_brokamp: new #LinnStrument scale explorer I made using #RShiny https://t.co/FqYd9bbY1p #rstats"
## 
## [[5]]
## [1] "RT @cole_brokamp: new #LinnStrument scale explorer I made using #RShiny https://t.co/FqYd9bbY1p #rstats"
## 
## [[6]]
## [1] "new #LinnStrument scale explorer I made using #RShiny https://t.co/FqYd9bbY1p #rstats"
## 
## [[7]]
## [1] "RT @GlioVis: 12  pediatric datasets are now available on GlioVis, https://t.co/xHLRmUuJDX  \n#rshiny #braintumor https://t.co/PzeqIiK69h"
## 
## [[8]]
## [1] "New Shiny App: Shiny App to Calculate Home Loan EMI (Equated Monthly     https://t.co/V9b9MAJn6X #rshiny https://t.co/pUdFtSv1xy"
## 
## [[9]]
## [1] "New Shiny App: shiny regression toy https://t.co/6t91nRLWpY #rshiny https://t.co/3iG7ih5Xs3"
## 
## [[10]]
## [1] "RT @overfitting_es: overfitting.es makes #DataScience based on  @RDataMining @rstudio @rstudiotips and #Rshiny which gives interactivity to"
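
The search term used above is a plain word, so it can be pasted into the URL as it is. Terms containing characters such as `#` or spaces need to be percent-encoded first. Here is a minimal sketch using base R's `URLencode`; the helper name `search_url` is made up for this example.

```r
# Build a search URL with a percent-encoded query term (hypothetical helper).
search_url <- function(term) {
  base <- "https://api.twitter.com/1.1/search/tweets.json"
  paste0(base, "?q=", utils::URLencode(term, reserved = TRUE))
}

search_url("rshiny")
search_url("#rshiny")   # the "#" becomes %23
```

The resulting URL can be passed to `GET` together with `sig`, in the same way as the hard-coded URL above.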

Thanks to the Getting and Cleaning Data course by Johns Hopkins University on Coursera for providing the basis for much of this content.