Extracting data through the Facebook Graph API

Daniel Booth

2017-04-04

Overview

This vignette provides an introduction to using R to extract data through the Facebook Graph API. For our example we will extract data from the UTS Facebook page, using the httr R package to interact with the API.

A brief introduction to APIs

An API, or Application Programming Interface, makes a website’s data digestible for a computer. This is useful because it means we can interact with a website as a human would, but do so programmatically with code.

First step: Register a Facebook application

Before you can use the API, Facebook requires you to register an Application.

To register an application:

  1. Navigate to https://developers.facebook.com/apps/
  2. Sign in with your Facebook account
  3. Click on the Add a new app button
  4. Fill in the Create a New App ID form and click on Create App ID. You should be redirected to the Product Setup page
  5. In the left side menu click Dashboard
  6. Make note of the App ID and App Secret values shown on the Dashboard, as you’ll need these in the Third step
  7. In the left side menu click Settings
  8. Click the + Add Platform button and choose Website
  9. In the Site URL field enter the value http://localhost:1410/. This is a configuration the httr package needs
  10. Click the Save Changes button on the bottom right to save

Second step: Install the required R packages

The rest of this tutorial will be in R. Open up RStudio. You’ll need the R packages httr, jsonlite, dplyr and lubridate. If you don’t already have them, install them with the R command:

install.packages(c('httr', 'jsonlite', 'dplyr', 'lubridate'))

Load these packages with:

library(httr)
library(jsonlite)
library(dplyr)
library(lubridate)

Third step: Authenticate with the Facebook Graph API in R

The API we are going to use is called the Facebook Graph API.

Before you can request any data, you need to authenticate with Facebook to receive an Access token. This provides temporary, secure access to the Facebook APIs. There are three main types of tokens:

Access token type | Description | Further reading
User Access Token | This is the most commonly used token. It lets you read, modify or write a specific person’s Facebook data on their behalf. User access tokens are generally obtained via a login dialog and require a person to permit your app to obtain one. | User Access Tokens
App Access Token | This token is needed to modify and read your app settings. It can also be used to read publicly available Facebook content like Pages. | App Access Tokens
Page Access Token | This token is similar to a user access token, except that it lets you read, write or modify the data belonging to a Facebook Page. | Page Access Tokens

User Access Token

To get a user access token, run the following, replacing <your_app_id> and <your_app_secret> with the values obtained in the First step:

# Define keys
app_id = '<your_app_id>'
app_secret = '<your_app_secret>'

# Define the app
fb_app <- oauth_app(appname = "facebook",
                    key = app_id,
                    secret = app_secret)

# Get OAuth user access token
fb_token <- oauth2.0_token(oauth_endpoints("facebook"),
                           fb_app,
                           scope = 'public_profile',
                           type = "application/x-www-form-urlencoded",
                           cache = TRUE)

Since you created the Facebook app with your own user, you will be prompted in your browser to confirm the authentication. Click Continue.

If authentication is successful you will see the following message in your browser:

Authentication complete. Please close this page and return to R.

In R you will see the following output:

Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.

Check your authentication worked:

fb_token
#> <Token>
#> <oauth_endpoint>
#>  authorize: https://www.facebook.com/dialog/oauth
#>  access:    https://graph.facebook.com/oauth/access_token
#> <oauth_app> facebook
#>   key:    916070815201948
#>   secret: <hidden>
#> <credentials> {"access_token":"EAANBKVuHepwBAEnNg79kLibagCMiMs9BgYXmFY4ZBPcsZCziYsLa4E9ZAnf9WCHckgCfZCgfM79ZCbojXHpZBKTc9BhypiRAvK28f58c7OlTYuendk7wOUJc5ZApS80uGaGbbnEtJZCZBZCzsq9NwrEwrJRyaw5z3Do6UpIp29UJtCNwZDZD","token_type":"bearer","expires_in":5142013}
#> ---

You can test the token works with a basic API call:

# GET request for your user information
response <- GET("https://graph.facebook.com",
                path = "/me",
                config = config(token = fb_token))

# Show content returned
content(response)

OAuth scope

Before we continue, note the scope argument in the oauth2.0_token function above. This is where we define which permissions to grant our application. With scope = 'public_profile' we’ve only asked for the lowest level of permissions; many more are available (see the full list). If you need to request more, provide them as a character vector. For example, for the public profile and user friends permissions you’d have: scope = c('public_profile', 'user_friends')

App Access Token

To get an app access token, run the following R block, replacing <your_app_id> and <your_app_secret> with the values obtained in the first step above:

# Define keys
app_id = '<your_app_id>'
app_secret = '<your_app_secret>'

# Define the API node and query arguments
node <- '/oauth/access_token'
query_args <- list(client_id = app_id,
                   client_secret = app_secret,
                   grant_type = 'client_credentials',
                   redirect_uri = 'http://localhost:1410/')

# GET request to generate the token
response <- GET('https://graph.facebook.com',
                path = node,
                query = query_args)

# Save the token to an object for use
app_access_token <- content(response)$access_token

Check your authentication worked:

app_access_token
#> [1] "916070815201948|rXvai414g3tzEsXHZTW8TLJwPzA"

Then test:

# GET request for UTS Facebook page info
response <- GET("https://graph.facebook.com",
                path = "/UTSEngage",
                query = list(access_token = app_access_token))

# Check response content
content(response)
#> $name
#> [1] "UTS: University of Technology Sydney"
#> 
#> $id
#> [1] "254319736002"

Fourth step: Query the API

GET request structure

All nodes and edges in the API can be read with an HTTP GET request to the relevant endpoint. The structure of this request is:

GET graph.facebook.com
  /{node-id}?
    fields=<first-level>{<second-level>}

And if you want to add edges:

GET graph.facebook.com
  /{node-id}/{edge-type}?
    fields=<first-level>{<second-level>}

The full list of node types (and their corresponding edge types) can be found in the Graph API Reference page.
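
The request structure above can be sketched as a small base R helper. This is only an illustration: build_graph_url is a hypothetical name, not part of httr or the Graph API, and it leaves out the access token and URL encoding, both of which httr's GET handles for us in the real requests later in this vignette.

```r
# Hypothetical helper: assemble a Graph API URL from a node, an optional
# edge and an optional vector of field names. No encoding is performed and
# no access token is attached.
build_graph_url <- function(node_id, edge = NULL, fields = NULL) {
  url <- paste0("https://graph.facebook.com/", node_id)
  if (!is.null(edge)) {
    url <- paste0(url, "/", edge)
  }
  if (!is.null(fields)) {
    # Fields are sent as a single comma-separated string
    url <- paste0(url, "?fields=", paste(fields, collapse = ","))
  }
  url
}

# A node on its own
build_graph_url("UTSEngage", fields = c("id", "name"))

# A node plus an edge, with a second-level field on `from`
build_graph_url("UTSEngage", edge = "feed", fields = c("id", "from{name}"))
```

The first call mirrors the node-only form of the request and the second mirrors the form with an edge added.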

The response you receive will take the general json form:

{
   "fieldname": {field-value},
   ....
}
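
When a request fails (for example because a token has expired), the Graph API instead returns a JSON object with a single error element that carries a message. A minimal sketch of guarding against this, assuming the response has already been parsed to an R list, might look like the following. The name check_graph_error is hypothetical and the helper is not part of httr.

```r
# Hypothetical helper: stop with the API's message if the parsed response
# contains an `error` element, otherwise return the response unchanged.
check_graph_error <- function(parsed) {
  if (!is.null(parsed$error)) {
    stop("Graph API error: ", parsed$error$message, call. = FALSE)
  }
  parsed
}

# A successful response passes straight through
ok <- check_graph_error(list(name = "UTS: University of Technology Sydney"))

# An error response raises an R error; here we capture its message
msg <- tryCatch(
  check_graph_error(list(error = list(message = "Invalid OAuth access token."))),
  error = function(e) conditionMessage(e)
)
```

In the requests below you could wrap the parsed content in a check like this before using it.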

Implementing in R

Let’s implement this in R. We’ll start by requesting a Page node using the UTS Facebook page. We can use the UTSEngage username as the node-id value in the GET request. We’ll define the fields to return as username, id, name, category, fan_count and link. Finally, since this is a public page, we don’t need the full functionality of a User Access Token and so will use the App Access Token to sign the request. To achieve this we run:

# Define the node and fields
path <- '/UTSEngage'
query_args <- list(fields = 'username,id,name,category,fan_count,link',
                   access_token = app_access_token)

# GET request
response <- GET('https://graph.facebook.com',
                path = path,
                query = query_args)

We see the response is json as expected:

http_type(response)
#> [1] "application/json"

To inspect the content of the response we can use the content() function, which automatically parses the json. We’ll wrap this with str() to inspect:

str(content(response))
#> List of 6
#>  $ username : chr "UTSEngage"
#>  $ id       : chr "254319736002"
#>  $ name     : chr "UTS: University of Technology Sydney"
#>  $ category : chr "College & University"
#>  $ fan_count: int 91919
#>  $ link     : chr "https://www.facebook.com/UTSEngage/"

Alternatively, to access the json use:

content(response, as = 'text')
#> [1] "{\"username\":\"UTSEngage\",\"id\":\"254319736002\",\"name\":\"UTS: University of Technology Sydney\",\"category\":\"College & University\",\"fan_count\":91919,\"link\":\"https:\\/\\/www.facebook.com\\/UTSEngage\\/\"}"

Fifth step: Parsing the JSON response to a data.frame

In most cases you will want to parse the json the API returns into a data.frame. Let’s do this for the posts on the page:

# == Construct the GET request
# Define the node, edge and fields
path <- '/UTSEngage/feed'
query_args <- list(fields = 'id,created_time,from,message,type,place,permalink_url,shares,likes.summary(true),comments.summary(true)',
                   access_token = app_access_token)

# GET request
response <- GET('https://graph.facebook.com',
                path = path,
                query = query_args)

Use the jsonlite package to parse the json to a list:

# Convert json to a list
response_parsed <- fromJSON(content(response, "text"))

Now response_parsed$data contains the API data as a data.frame:

glimpse(response_parsed$data)
#> Observations: 25
#> Variables: 9
#> $ id            <chr> "254319736002_10154349053406003", "254319736002_...
#> $ created_time  <chr> "2017-04-04T10:00:00+0000", "2017-04-03T10:06:00...
#> $ from          <data.frame> c("UTS: University of Technology Sydney",...
#> $ message       <chr> "Monday 10 April is the last day you can withdra...
#> $ type          <chr> "video", "video", "link", "link", "video", "link...
#> $ permalink_url <chr> "https://www.facebook.com/UTSEngage/videos/10154...
#> $ likes         <data.frame> c("1354372497959722, 1046647338813500, 27...
#> $ comments      <data.frame> c("NULL", "2017-04-03T10:36:18+0000, Pete...
#> $ shares        <data.frame> c("NA", "1", "3", "65", "2", "2", "NA", "...

You will notice that only 25 results were returned. This is by design. To balance load, Facebook deliberately returns the results of our request in paginated chunks. The response_parsed$paging list tells us how to traverse the rest of the paginated results:

str(response_parsed$paging)
#> List of 2
#>  $ previous: chr "https://graph.facebook.com/v2.8/254319736002/feed?fields=id,created_time,from,message,type,place,permalink_url,shares,likes.sum"| __truncated__
#>  $ next    : chr "https://graph.facebook.com/v2.8/254319736002/feed?fields=id,created_time,from,message,type,place,permalink_url,shares,likes.sum"| __truncated__

To get the next 25 results we’d run:

response_next <- GET(response_parsed$paging$`next`)

In practice we would use a while loop to continue this until paging$next is NULL.
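
A minimal sketch of such a loop is shown below. It is written against a generic fetch_page function so the logic can be tried without calling the API. The names fetch_all_pages, fetch_page and max_pages are hypothetical, and the max_pages cap is an added safety assumption rather than anything Facebook requires.

```r
# Hypothetical helper: follow paging$`next` links until none remain.
# `fetch_page` is any function that takes a URL and returns the parsed
# response, i.e. a list with `data` and (optionally) `paging`.
fetch_all_pages <- function(first_page, fetch_page, max_pages = 10) {
  pages <- list(first_page$data)
  next_url <- first_page$paging$`next`
  while (!is.null(next_url) && length(pages) < max_pages) {
    page <- fetch_page(next_url)
    # An empty data element means there is nothing more to read
    if (length(page$data) == 0) break
    pages[[length(pages) + 1]] <- page$data
    next_url <- page$paging$`next`
  }
  pages
}

# With httr and jsonlite the fetcher could be (not run here):
# fetch_page <- function(url) fromJSON(content(GET(url), "text"))
# all_pages <- fetch_all_pages(response_parsed, fetch_page)

# Offline demonstration with three fake pages
fake_pages <- list(
  u2 = list(data = data.frame(id = "b"), paging = list(`next` = "u3")),
  u3 = list(data = data.frame(id = "c"))
)
first_page <- list(data = data.frame(id = "a"), paging = list(`next` = "u2"))
all_pages <- fetch_all_pages(first_page, function(url) fake_pages[[url]])
length(all_pages)
```

The result is a list with one data.frame per page. Because pages can differ in which columns they contain, it is usually easier to clean each page as shown next and combine the cleaned results afterwards.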

Our final step is to clean the results:

posts <- tibble(id = response_parsed$data$id,
                created_time = with_tz(ymd_hms(response_parsed$data$created_time,
                                               tz = 'UTC'),
                                       tz = 'Australia/Sydney'),
                from_id = response_parsed$data$from$id,
                from_name = response_parsed$data$from$name,
                message = response_parsed$data$message,
                type = response_parsed$data$type,
                permalink_url = response_parsed$data$permalink_url,
                shares_count = response_parsed$data$shares$count,
                likes_count = response_parsed$data$likes$summary$total_count,
                comments_count = response_parsed$data$comments$summary$total_count)

# Inspect
glimpse(posts)
#> Observations: 25
#> Variables: 10
#> $ id             <chr> "254319736002_10154349053406003", "254319736002...
#> $ created_time   <dttm> 2017-04-04 20:00:00, 2017-04-03 20:06:00, 2017...
#> $ from_id        <chr> "254319736002", "254319736002", "254319736002",...
#> $ from_name      <chr> "UTS: University of Technology Sydney", "UTS: U...
#> $ message        <chr> "Monday 10 April is the last day you can withdr...
#> $ type           <chr> "video", "video", "link", "link", "video", "lin...
#> $ permalink_url  <chr> "https://www.facebook.com/UTSEngage/videos/1015...
#> $ shares_count   <int> NA, 1, 3, 65, 2, 2, NA, NA, NA, NA, 5, 8, 4, 7,...
#> $ likes_count    <int> 8, 29, 64, 453, 24, 42, 22, 0, 37, 0, 63, 143, ...
#> $ comments_count <int> 0, 1, 10, 553, 0, 0, 1, 0, 1, 0, 23, 4, 2, 8, 2...
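
The created_time conversion is the least obvious part of this cleaning step: the API returns timestamps as text in UTC, and we display them in Sydney time. As a rough base R equivalent of the with_tz(ymd_hms(...)) call above, shown only to make the two stages explicit, we could write:

```r
# Timestamp text in the form the API returns it (taken from the first post)
raw_time <- "2017-04-04T10:00:00+0000"

# Stage 1: parse the text as a UTC date-time
utc_time <- as.POSIXct(raw_time, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Stage 2: print the same instant in Sydney local time
sydney_text <- format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "Australia/Sydney")
sydney_text
```

This should agree with the first created_time value in the glimpse output above (2017-04-04 20:00:00). The underlying instant never changes, only the time zone it is displayed in.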

You are now ready to analyse this data in R. For example, which post received the most likes?

posts %>%
  filter(likes_count == max(likes_count)) %>% 
  select(from_name, message, likes_count, comments_count, permalink_url) %>% 
  glimpse()
#> Observations: 1
#> Variables: 5
#> $ from_name      <chr> "UTS: University of Technology Sydney"
#> $ message        <chr> "Great news for all our single students - UTS i...
#> $ likes_count    <int> 453
#> $ comments_count <int> 553
#> $ permalink_url  <chr> "https://www.facebook.com/UTSEngage/posts/10154...

Further exploration - Rfacebook

The Rfacebook package provides a wrapper around many other parts of the Facebook APIs, with dedicated functions such as getPage and getPost. I deliberately didn’t demonstrate it here because it hides the customisation of the API requests that I have shown you with httr.