Scraping the Web

Assignment 6 - Using the Twitter API in R

Coen Dekker, Dirrik Emmen & Sam Verkoelen

17 Dec 2014

Learning Goals

  • Show how to install the 'twitteR' package, including the authorization of the Twitter API.
  • Provide a quick step-by-step tutorial on how to get data from Twitter into R.
  • Explain how this can be visualized in a heatmap.

Case study: Tweets about the week

As an example for this lecture we will be using a research question:

  • Which day of the week is tweeted about the most on each day of the week?

From here on we will work through, step by step, how to answer this research question.

Step 1: Setting up Twitter API 1/3

To receive data from Twitter we first need authorization. We do this by logging on to Twitter and going to the following page to create a new application: https://apps.twitter.com. Fill in the required fields.

Step 1: Setting up Twitter API 2/3

Tick the box…

Step 1: Setting up Twitter API 3/3

Click on 'Keys and Access Tokens' to view your Consumer Key and Consumer Secret.

We will be needing the keys in Step 2.

Step 2: Authorization of 'twitteR' 1/2

Getting the 'twitteR' package

install.packages("twitteR")
library(twitteR)

Now we set up the authorization request with OAuthFactory$new() (provided by the 'ROAuth' package), entering the keys from Step 1.

reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
# Replace these with your own Consumer Key and Consumer Secret from Step 1
consumerKey <- "gmNtSEIVuZAYxd4v8LZMfmHiU"
consumerSecret <- "jO68bIJLuRcozfe686lz1VAgcooM5IqA72VXR9fLptsGTBDdLD"
twitCred <- OAuthFactory$new(consumerKey=consumerKey,
                             consumerSecret=consumerSecret,
                             requestURL=reqURL,
                             accessURL=accessURL,
                             authURL=authURL)
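
Keys written directly into a script are easy to leak when the script is shared. A minimal sketch of one alternative, assuming the keys are stored in environment variables; the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET and the helper read_key() are hypothetical and not part of 'twitteR'.

```r
# Hypothetical helper: read a required key from an environment variable.
read_key <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set")
  }
  value
}

# For demonstration only: normally these are set outside the script,
# for example in ~/.Renviron. The values below are placeholders.
Sys.setenv(TWITTER_CONSUMER_KEY = "my-consumer-key",
           TWITTER_CONSUMER_SECRET = "my-consumer-secret")

consumerKey <- read_key("TWITTER_CONSUMER_KEY")
consumerSecret <- read_key("TWITTER_CONSUMER_SECRET")
```

The two variables can then be passed to OAuthFactory$new() exactly as before.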

Step 2: Authorization of 'twitteR' 2/2

Next we will ask Twitter for authorization using these credentials.

twitCred$handshake()

This will open a webpage where we finalize the authorization. Click 'Authorize app' to receive a PIN code, then enter this PIN in the console…

Lastly we register the credentials with 'twitteR', which also tells us whether the authorization was successful.

registerTwitterOAuth(twitCred)

If this returns TRUE, it means that the 'twitteR' package is ready to be used!

Step 3: Setting up the data structure

Now that we can work with Twitter, let us prepare a data structure using the days of the week.

days.en <- c("Monday", "Tuesday", "Wednesday", "Thursday", 
             "Friday", "Saturday", "Sunday")
days.nl <- c("maandag", "dinsdag", "woensdag", "donderdag", 
             "vrijdag", "zaterdag", "zondag")

Next we create a 7-by-7 matrix of zeros, which we will fill in later.

occurrence <- matrix(0, ncol=7, nrow = 7)
colnames(occurrence) <- days.nl
rownames(occurrence) <- days.nl
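
Because the rows and columns are named, a cell can be addressed by day name as well as by position. A small sketch of how a single tweet would be counted, reading rows as the day a tweet was posted and columns as the day it mentions (the same layout the loop in Step 4 fills in).

```r
days.nl <- c("maandag", "dinsdag", "woensdag", "donderdag",
             "vrijdag", "zaterdag", "zondag")
occurrence <- matrix(0, nrow = 7, ncol = 7,
                     dimnames = list(days.nl, days.nl))

# One tweet posted on a Monday (row) that mentions Friday (column)
occurrence["maandag", "vrijdag"] <- occurrence["maandag", "vrijdag"] + 1

# Position-based indexing addresses the same cell: row 1, column 5
occurrence[1, 5]
```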

Step 4: Requesting data from Twitter

Now, for each day of the week, we ask Twitter for tweets that mention that day and count on which weekday they were posted.

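n <- 650  # number of tweets to request per day name

for (col in 1:7) {
  print(paste("Now searching", days.en[col]))
  # Search for the Dutch day name, excluding retweets and keeping
  # only tweets that were retweeted at least twice
  searchterm <- paste(days.nl[col], "-filter:retweets min_retweets:2")

  s <- searchTwitter(searchterm, n = n)

  # Twitter may return fewer than n tweets, so loop over what we received
  for (i in seq_along(s)) {
    # weekdays() is locale-dependent; this match assumes an English locale
    row <- which(days.en %in% weekdays(as.Date(s[[i]]$created)))
    occurrence[row, col] <- occurrence[row, col] + 1
  }
}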
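
The counting step can be tried without a Twitter connection. A sketch under assumptions: the three timestamps are made up and stand in for s[[i]]$created, and the weekday is taken from format(x, "%u") (1 = Monday to 7 = Sunday) instead of weekdays(), so the result does not depend on the locale.

```r
days.nl <- c("maandag", "dinsdag", "woensdag", "donderdag",
             "vrijdag", "zaterdag", "zondag")

# Made-up creation times standing in for s[[i]]$created
created <- as.POSIXct(c("2014-12-15 09:00:00",
                        "2014-12-15 18:30:00",
                        "2014-12-17 12:00:00"), tz = "UTC")

# ISO weekday number: 1 = Monday ... 7 = Sunday
day_number <- as.integer(format(created, "%u"))

# Count tweets per weekday; this would fill one column of 'occurrence'
counts <- tabulate(day_number, nbins = 7)
names(counts) <- days.nl
counts
```

Using tabulate() with nbins = 7 keeps days without any tweets in the result as zeros, so the vector always lines up with the seven rows of the matrix.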

Step 5: Visualizing our data 1/3

The data looks like this:

occurrence
##           maandag dinsdag woensdag donderdag vrijdag zaterdag zondag
## maandag       122      80      167       125     169      170    122
## dinsdag        55      82       91       199     157      173    101
## woensdag       17      21      132       117      65       67     52
## donderdag      12      55       61        80       0        0      0
## vrijdag       188     130       54        58     132        0     13
## zaterdag      164     105       50        32      50      146    112
## zondag         92     177       95        39      77       94    250
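
The research question can already be answered from these counts. A sketch that rebuilds the matrix from the printed output above and, for each actual day (row), picks the day name with the highest count; most_mentioned is a name introduced here for illustration.

```r
days.nl <- c("maandag", "dinsdag", "woensdag", "donderdag",
             "vrijdag", "zaterdag", "zondag")

# Counts copied from the printed matrix: rows = actual day, columns = day mentioned
occurrence <- matrix(c(
  122,  80, 167, 125, 169, 170, 122,
   55,  82,  91, 199, 157, 173, 101,
   17,  21, 132, 117,  65,  67,  52,
   12,  55,  61,  80,   0,   0,   0,
  188, 130,  54,  58, 132,   0,  13,
  164, 105,  50,  32,  50, 146, 112,
   92, 177,  95,  39,  77,  94, 250
), nrow = 7, byrow = TRUE, dimnames = list(days.nl, days.nl))

# For every actual day, find the column with the largest count
most_mentioned <- days.nl[apply(occurrence, 1, which.max)]
names(most_mentioned) <- days.nl
most_mentioned
```

In this sample only woensdag, donderdag and zondag are mentioned most on the day itself.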

Step 5: Visualizing our data 2/3

Let's make it fancy using a heatmap:

library(RColorBrewer)  # provides brewer.pal()

heatmap(t(occurrence), 
        symm = TRUE, 
        Rowv = NA, 
        ylab = "Talking about", 
        xlab = "Actual day", 
        scale = "row",
        cexCol = .8, 
        cexRow = .8,
        col = brewer.pal(9, "Blues"))
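
With scale = "row", heatmap() centers each row on its mean and divides it by its standard deviation before coloring, so every searched day is compared on its own scale. A sketch of that step on two rows of t(occurrence), using counts from the output shown in Step 5; the variable names are illustrative.

```r
days.nl <- c("maandag", "dinsdag", "woensdag", "donderdag",
             "vrijdag", "zaterdag", "zondag")

# Two rows of t(occurrence): tweets mentioning maandag and dinsdag,
# split by the actual day they were posted
x <- rbind(maandag = c(122, 55, 17, 12, 188, 164, 92),
           dinsdag = c(80, 82, 21, 55, 130, 105, 177))
colnames(x) <- days.nl

# Center each row, then divide by the row's standard deviation
centered <- sweep(x, 1, rowMeans(x))
scaled <- sweep(centered, 1, apply(centered, 1, sd), "/")

round(rowMeans(scaled), 10)  # each row now has mean 0
apply(scaled, 1, sd)         # and standard deviation 1
```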

Step 5: Visualizing our data 3/3

It will look like this:

Recap

You are now able to:

  • Access the Twitter API
  • Request data from Twitter
  • Visualize Twitter data in a heatmap

There is still a lot more you can do with Twitter; just keep in mind that it can only handle 2400 requests each day.