This vignette provides an introduction to using R to extract data through the Facebook Graph API. For our example we will extract data from the UTS Facebook page, using the httr R package to interact with the API.
An API (Application Programming Interface) makes a website’s data digestible for a computer. This is useful because it lets us interact with a website as a human would, but programmatically, with code.
Before you can use the API, Facebook requires you to register an Application.
To register an application, create one in the Facebook developer portal and configure it to use http://localhost:1410/. This is a configuration the httr package needs.
The rest of this tutorial will be in R. Open up RStudio. You’ll need the R packages httr, jsonlite, dplyr and lubridate. If you don’t already have them, install them with the R command:
install.packages(c('httr', 'jsonlite', 'dplyr', 'lubridate'))
Load these packages with:
library(httr)
library(jsonlite)
library(dplyr)
library(lubridate)
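The http://localhost:1410/ address configured on the Facebook app is where httr listens for the OAuth redirect. As a quick sanity check, assuming httr’s default callback settings have not been overridden in your environment, httr’s oauth_callback() function reports the address it will use:

```r
library(httr)

# Local address httr's OAuth flow listens on for the redirect from Facebook.
# With httr's defaults this should match the URL configured on the app.
callback_url <- oauth_callback()
print(callback_url)
```

If this prints something other than http://localhost:1410/, the app configuration needs to match whatever it reports.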
The API we are going to use is called the Facebook Graph API.
Before you can request any data, you need to authenticate with Facebook to receive an access token. This provides temporary, secure access to the Facebook APIs. There are three main types of token:
| Access token type | Description | Further reading |
|---|---|---|
| User Access Token | This is the most commonly used token. It lets you read, modify or write a specific person’s Facebook data on their behalf. User access tokens are generally obtained via a login dialog and require a person to permit your app to obtain one. | User Access Tokens |
| App Access Token | This token is needed to read and modify your app settings. It can also be used to read publicly available Facebook content such as Pages. | App Access Tokens |
| Page Access Token | This token is similar to a user access token, except that it lets you read, write or modify the data belonging to a Facebook Page. | Page Access Tokens |
To get a user access token, run the following, replacing <your_app_id> and <your_app_secret> with the values obtained in the first step:
# Define keys
app_id <- '<your_app_id>'
app_secret <- '<your_app_secret>'
# Define the app
fb_app <- oauth_app(appname = "facebook",
                    key = app_id,
                    secret = app_secret)
# Get OAuth user access token
fb_token <- oauth2.0_token(oauth_endpoints("facebook"),
                           fb_app,
                           scope = 'public_profile',
                           type = "application/x-www-form-urlencoded",
                           cache = TRUE)
Since you created the Facebook app with your own user account, you will be prompted to confirm the authentication. Click Continue.
If authentication is successful you will see the following message in your browser:
Authentication complete. Please close this page and return to R.
In R you will see the following output:
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
Check your authentication worked:
fb_token
#> <Token>
#> <oauth_endpoint>
#> authorize: https://www.facebook.com/dialog/oauth
#> access: https://graph.facebook.com/oauth/access_token
#> <oauth_app> facebook
#> key: 916070815201948
#> secret: <hidden>
#> <credentials> {"access_token":"EAANBKVuHepwBAEnNg79kLibagCMiMs9BgYXmFY4ZBPcsZCziYsLa4E9ZAnf9WCHckgCfZCgfM79ZCbojXHpZBKTc9BhypiRAvK28f58c7OlTYuendk7wOUJc5ZApS80uGaGbbnEtJZCZBZCzsq9NwrEwrJRyaw5z3Do6UpIp29UJtCNwZDZD","token_type":"bearer","expires_in":5142013}
#> ---
You can test the token works with a basic API call:
# GET request for your user information
response <- GET("https://graph.facebook.com",
                path = "/me",
                config = config(token = fb_token))
# Show content returned
content(response)
Note the scope argument in the oauth2.0_token function above. This is where we define what permissions to grant our application. With scope = 'public_profile' we’ve only asked for the lowest level of permissions; however, there are many more (see the full list). If you need to request more, provide them as a character vector. For example, for public profile and user friends permissions you’d have: scope = c('public_profile', 'user_friends')
To get an app access token, run the following R block, replacing <your_app_id> and <your_app_secret> with the values obtained in the first step above:
# Define keys
app_id <- '<your_app_id>'
app_secret <- '<your_app_secret>'
# Define the API node and query arguments
node <- '/oauth/access_token'
query_args <- list(client_id = app_id,
                   client_secret = app_secret,
                   grant_type = 'client_credentials',
                   redirect_uri = 'http://localhost:1410/')
# GET request to generate the token
response <- GET('https://graph.facebook.com',
                path = node,
                query = query_args)
# Save the token to an object for use
app_access_token <- content(response)$access_token
Check your authentication worked:
app_access_token
#> [1] "916070815201948|rXvai414g3tzEsXHZTW8TLJwPzA"
Then test:
# GET request for UTS Facebook page info
response <- GET("https://graph.facebook.com",
                path = "/UTSEngage",
                query = list(access_token = app_access_token))
# Check response content
content(response)
#> $name
#> [1] "UTS: University of Technology Sydney"
#>
#> $id
#> [1] "254319736002"
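The token request above reads access_token straight out of content(response), which silently gives NULL if the call failed. httr’s stop_for_status() will stop on an HTTP error status; alternatively, the sketch below inspects the parsed body. It assumes that failed Graph API calls return a JSON object with an error field holding message, type and code. extract_access_token() is a hypothetical helper, and the responses in the demo are made up:

```r
# Hypothetical helper: return the access token from a parsed response, or
# stop with Facebook's error message. Assumes failures look like
# {"error": {"message": ..., "type": ..., "code": ...}}.
extract_access_token <- function(parsed) {
  if (!is.null(parsed$error)) {
    stop("Graph API error: ", parsed$error$message, call. = FALSE)
  }
  if (is.null(parsed$access_token)) {
    stop("No access_token in response", call. = FALSE)
  }
  parsed$access_token
}

# With the tutorial's objects this would be:
# app_access_token <- extract_access_token(content(response))

# Offline demo with made-up parsed responses
ok <- list(access_token = "123|abc", token_type = "bearer")
bad <- list(error = list(message = "Error validating client secret.",
                         type = "OAuthException", code = 1L))
token <- extract_access_token(ok)
err <- tryCatch(extract_access_token(bad), error = conditionMessage)
print(err)
```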
All nodes and edges in the API can be read with an HTTP GET request to the relevant endpoint. The structure of this request is:
GET graph.facebook.com
/{node-id}?
fields=<first-level>{<second-level>}
And if you want to add edges:
GET graph.facebook.com
/{node-id}/{edge-type}?
fields=<first-level>{<second-level>}
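To see the exact URL a request of this shape turns into, httr’s modify_url() can assemble it without sending anything. The graph_url() helper below is hypothetical, not part of httr or the Graph API; it joins the node and optional edge into the path and collapses a character vector of fields into the comma-separated fields parameter:

```r
library(httr)

# Hypothetical helper: build https://graph.facebook.com/{node-id}/{edge-type}?fields=...
# Extra named arguments (e.g. access_token) become further query parameters.
graph_url <- function(node, edge = NULL, fields = NULL, ...) {
  path <- paste(c(sub("^/", "", node), edge), collapse = "/")
  query <- list(...)
  if (length(fields) > 0) {
    query$fields <- paste(fields, collapse = ",")
  }
  modify_url("https://graph.facebook.com", path = path, query = query)
}

feed_url <- graph_url("UTSEngage", "feed",
                      fields = c("id", "created_time", "message"))
feed_url
```

httr percent-encodes the commas in fields as %2C. Passing such a URL to GET(), for example GET(graph_url("UTSEngage", fields = c("id", "name"), access_token = app_access_token)), should be equivalent to the path and query form used in this tutorial.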
The full list of node types (and their corresponding edge types) can be found in the Graph API Reference page.
The response you receive will take the general JSON form:
{
"fieldname": {field-value},
....
}
Let’s implement this in R. We’ll start by requesting a Page node using the UTS Facebook page. We can use the UTSEngage username as the node-id value in the GET request. We’ll define the fields to return as username, id, name, category, fan_count and link. Finally, since this is a public page, we don’t need the full functionality of a User Access Token, so we will use the App Access Token to sign the request. To achieve this we run:
# Define the node and fields
path <- '/UTSEngage'
query_args <- list(fields = 'username,id,name,category,fan_count,link',
                   access_token = app_access_token)
# GET request
response <- GET('https://graph.facebook.com',
                path = path,
                query = query_args)
We see the response is JSON, as expected:
http_type(response)
#> [1] "application/json"
To inspect the content of the response we can use the content() function, which automatically parses the JSON. We’ll wrap this with str() to inspect it:
str(content(response))
#> List of 6
#> $ username : chr "UTSEngage"
#> $ id : chr "254319736002"
#> $ name : chr "UTS: University of Technology Sydney"
#> $ category : chr "College & University"
#> $ fan_count: int 91919
#> $ link : chr "https://www.facebook.com/UTSEngage/"
Alternatively, to access the raw JSON text, use:
content(response, as = 'text')
#> [1] "{\"username\":\"UTSEngage\",\"id\":\"254319736002\",\"name\":\"UTS: University of Technology Sydney\",\"category\":\"College & University\",\"fan_count\":91919,\"link\":\"https:\\/\\/www.facebook.com\\/UTSEngage\\/\"}"
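That string is ordinary JSON, so it can also be parsed directly. As a small offline illustration, the text printed above can be pasted into jsonlite’s fromJSON() (the same function used for the feed below), which returns a named list with the same fields:

```r
library(jsonlite)

# The raw JSON text printed above, pasted in as an R string
raw_json <- '{"username":"UTSEngage","id":"254319736002","name":"UTS: University of Technology Sydney","category":"College & University","fan_count":91919,"link":"https:\\/\\/www.facebook.com\\/UTSEngage\\/"}'

# Parse into a named list; the escaped \/ sequences become plain slashes
page_info <- fromJSON(raw_json)
str(page_info)
```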
In most cases you will want to parse the JSON the API returns into a data.frame. Let’s do this for the posts on the page:
# == Construct the GET request
# Define the node, edge and fields
path <- '/UTSEngage/feed'
query_args <- list(fields = 'id,created_time,from,message,type,place,permalink_url,shares,likes.summary(true),comments.summary(true)',
                   access_token = app_access_token)
# GET request
response <- GET('https://graph.facebook.com',
                path = path,
                query = query_args)
Use the jsonlite package to parse the JSON into a list:
# Convert json to a list
response_parsed <- fromJSON(content(response, "text"))
Now response_parsed$data contains the API data as a data.frame:
glimpse(response_parsed$data)
#> Observations: 25
#> Variables: 9
#> $ id <chr> "254319736002_10154349053406003", "254319736002_...
#> $ created_time <chr> "2017-04-04T10:00:00+0000", "2017-04-03T10:06:00...
#> $ from <data.frame> c("UTS: University of Technology Sydney",...
#> $ message <chr> "Monday 10 April is the last day you can withdra...
#> $ type <chr> "video", "video", "link", "link", "video", "link...
#> $ permalink_url <chr> "https://www.facebook.com/UTSEngage/videos/10154...
#> $ likes <data.frame> c("1354372497959722, 1046647338813500, 27...
#> $ comments <data.frame> c("NULL", "2017-04-03T10:36:18+0000, Pete...
#> $ shares <data.frame> c("NA", "1", "3", "65", "2", "2", "NA", "...
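Columns such as from, likes, comments and shares appear as data.frame columns because fromJSON() turns nested JSON objects into nested data frames, which is why the cleaning step below reaches in with chains like likes$summary$total_count. A tiny made-up response with the same nesting shows the effect, along with jsonlite::flatten(), which replaces nested columns with ordinary columns that have dotted names:

```r
library(jsonlite)

# Made-up two-post response mimicking the likes.summary(true) nesting
feed_json <- '{
  "data": [
    {"id": "1_1", "type": "video", "likes": {"summary": {"total_count": 8}}},
    {"id": "1_2", "type": "link",  "likes": {"summary": {"total_count": 29}}}
  ]
}'
feed <- fromJSON(feed_json)

# `likes` is itself a data.frame, with a nested `summary` data.frame inside it
is.data.frame(feed$data$likes)
feed$data$likes$summary$total_count

# flatten() turns the nesting into dotted column names
names(jsonlite::flatten(feed$data))
```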
You will notice that only 25 results were returned. This is by design: to balance load, Facebook returns the results of a request in paginated chunks. The response_parsed$paging list tells us how to traverse the rest of the paginated results:
str(response_parsed$paging)
#> List of 2
#> $ previous: chr "https://graph.facebook.com/v2.8/254319736002/feed?fields=id,created_time,from,message,type,place,permalink_url,shares,likes.sum"| __truncated__
#> $ next : chr "https://graph.facebook.com/v2.8/254319736002/feed?fields=id,created_time,from,message,type,place,permalink_url,shares,likes.sum"| __truncated__
To get the next 25 results we’d run:
response_next <- GET(response_parsed$paging$`next`)
In practice we would use a while loop to continue this until paging$`next` is NULL.
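A minimal sketch of that loop, under a few assumptions: fetch_all_pages() and fetch_page are hypothetical names; fetch_page stands for whatever function requests a URL and parses the JSON (with the tutorial’s objects, something like function(url) fromJSON(content(GET(url), as = "text"))); response_parsed would be passed as the first page; and a max_pages cap is added as a safeguard. The demo runs offline on made-up pages:

```r
# Hypothetical helper: follow paging$`next` links until none is returned,
# collecting each page's $data. `fetch_page` takes a URL and returns the
# parsed JSON as a list with $data and $paging.
fetch_all_pages <- function(first_page, fetch_page, max_pages = 100) {
  pages <- list(first_page$data)
  next_url <- first_page$paging$`next`
  # Stop when there is no `next` link, or at the safety cap
  while (!is.null(next_url) && length(pages) < max_pages) {
    page <- fetch_page(next_url)
    pages[[length(pages) + 1]] <- page$data
    next_url <- page$paging$`next`
  }
  pages
}

# Offline demo: made-up pages keyed by made-up "URLs"
mock_pages <- list(
  page2 = list(data = data.frame(id = c("c", "d"), stringsAsFactors = FALSE),
               paging = list(`next` = "page3")),
  page3 = list(data = data.frame(id = "e", stringsAsFactors = FALSE),
               paging = list())
)
first_page <- list(data = data.frame(id = c("a", "b"), stringsAsFactors = FALSE),
                   paging = list(`next` = "page2"))

all_pages <- fetch_all_pages(first_page, function(url) mock_pages[[url]])
all_posts <- do.call(rbind, all_pages)
all_posts$id
```

Real pages may not all have the same columns (for example if no post on a page has a place), in which case rbind() fails. Flattening each page with jsonlite::flatten() and combining them with dplyr::bind_rows(), which fills missing columns with NA, is one option.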
Our final step is to clean the results:
posts <- tibble(id = response_parsed$data$id,
                created_time = with_tz(ymd_hms(response_parsed$data$created_time,
                                               tz = 'UTC'),
                                       tz = 'Australia/Sydney'),
                from_id = response_parsed$data$from$id,
                from_name = response_parsed$data$from$name,
                message = response_parsed$data$message,
                type = response_parsed$data$type,
                permalink_url = response_parsed$data$permalink_url,
                shares_count = response_parsed$data$shares$count,
                likes_count = response_parsed$data$likes$summary$total_count,
                comments_count = response_parsed$data$comments$summary$total_count)
# Inspect
glimpse(posts)
#> Observations: 25
#> Variables: 10
#> $ id <chr> "254319736002_10154349053406003", "254319736002...
#> $ created_time <dttm> 2017-04-04 20:00:00, 2017-04-03 20:06:00, 2017...
#> $ from_id <chr> "254319736002", "254319736002", "254319736002",...
#> $ from_name <chr> "UTS: University of Technology Sydney", "UTS: U...
#> $ message <chr> "Monday 10 April is the last day you can withdr...
#> $ type <chr> "video", "video", "link", "link", "video", "lin...
#> $ permalink_url <chr> "https://www.facebook.com/UTSEngage/videos/1015...
#> $ shares_count <int> NA, 1, 3, 65, 2, 2, NA, NA, NA, NA, 5, 8, 4, 7,...
#> $ likes_count <int> 8, 29, 64, 453, 24, 42, 22, 0, 37, 0, 63, 143, ...
#> $ comments_count <int> 0, 1, 10, 553, 0, 0, 1, 0, 1, 0, 23, 4, 2, 8, 2...
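The created_time conversion in the cleaning step works in two stages: ymd_hms() parses Facebook’s timestamp (which carries a +0000 offset) as a UTC instant, and with_tz() changes only the time zone it is displayed in, not the instant itself (lubridate’s force_tz() would instead keep the clock time and change the instant). A single value from the feed illustrates this:

```r
library(lubridate)

# One created_time value as returned by the API
created_utc <- ymd_hms("2017-04-04T10:00:00+0000", tz = "UTC")

# The same instant, displayed on Sydney time
created_syd <- with_tz(created_utc, tz = "Australia/Sydney")
created_syd
```

On 4 April 2017 daylight saving had already ended in Sydney, so the offset is 10 hours, matching the 20:00:00 shown in the glimpse() output.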
You are now ready to analyse this data in R. For example, which post received the most likes?
posts %>%
  filter(likes_count == max(likes_count)) %>%
  select(from_name, message, likes_count, comments_count, permalink_url) %>%
  glimpse()
#> Observations: 1
#> Variables: 5
#> $ from_name <chr> "UTS: University of Technology Sydney"
#> $ message <chr> "Great news for all our single students - UTS i...
#> $ likes_count <int> 453
#> $ comments_count <int> 553
#> $ permalink_url <chr> "https://www.facebook.com/UTSEngage/posts/10154...
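The same dplyr verbs extend to grouped questions, for example whether some post types attract more likes than others. The sketch below runs on a small made-up stand-in for posts (posts_demo, with only the columns it needs); with the real data you would start the pipeline from posts instead:

```r
library(dplyr)

# Made-up stand-in for `posts`, keeping only the columns used below
posts_demo <- tibble(
  type = c("video", "video", "link", "link", "photo"),
  likes_count = c(8L, 29L, 64L, 453L, 24L),
  comments_count = c(0L, 1L, 10L, 553L, 0L)
)

# Average likes and total comments per post type, most-liked type first
likes_by_type <- posts_demo %>%
  group_by(type) %>%
  summarise(n_posts = n(),
            mean_likes = mean(likes_count),
            total_comments = sum(comments_count)) %>%
  arrange(desc(mean_likes))
likes_by_type
```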
There is also an Rfacebook package that wraps many other parts of the Facebook APIs, with dedicated functions for tasks such as getPage and getPost. I deliberately didn’t demonstrate it, because those functions hide the customisation of API requests that I have shown you here using httr.