I prepared this document to show you how to parse JSON data with the awesome jsonlite package.

Let’s say that you are a huge fan of the great and powerful Hadley Wickham. (He’s the Chief Scientist at RStudio and a big reason why all this R magic is possible.) You want to stay up to date with his latest work. You could skim through his GitHub repos every day, or you could use the GitHub API to pull his repository list as JSON. Let’s do the latter.

Let’s load a few packages.

library(jsonlite)
library(tidyverse)
library(knitr)
library(kableExtra)

Let’s pull the JSON data with the GitHub API, then look at the structure of the first seven columns:

data1 <- fromJSON("https://api.github.com/users/hadley/repos")
str(data1[, 1:7])
## 'data.frame':    30 obs. of  7 variables:
##  $ id       : int  40423928 40544418 14984909 12241750 5154874 9324319 20228011 82348 888200 3116998 ...
##  $ node_id  : chr  "MDEwOlJlcG9zaXRvcnk0MDQyMzkyOA==" "MDEwOlJlcG9zaXRvcnk0MDU0NDQxOA==" "MDEwOlJlcG9zaXRvcnkxNDk4NDkwOQ==" "MDEwOlJlcG9zaXRvcnkxMjI0MTc1MA==" ...
##  $ name     : chr  "15-state-of-the-union" "15-student-papers" "500lines" "adv-r" ...
##  $ full_name: chr  "hadley/15-state-of-the-union" "hadley/15-student-papers" "hadley/500lines" "hadley/adv-r" ...
##  $ private  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ owner    :'data.frame':   30 obs. of  18 variables:
##   ..$ login              : chr  "hadley" "hadley" "hadley" "hadley" ...
##   ..$ id                 : int  4196 4196 4196 4196 4196 4196 4196 4196 4196 4196 ...
##   ..$ node_id            : chr  "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" ...
##   ..$ avatar_url         : chr  "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" ...
##   ..$ gravatar_id        : chr  "" "" "" "" ...
##   ..$ url                : chr  "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" ...
##   ..$ html_url           : chr  "https://github.com/hadley" "https://github.com/hadley" "https://github.com/hadley" "https://github.com/hadley" ...
##   ..$ followers_url      : chr  "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" ...
##   ..$ following_url      : chr  "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" ...
##   ..$ gists_url          : chr  "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" ...
##   ..$ starred_url        : chr  "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" ...
##   ..$ subscriptions_url  : chr  "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" ...
##   ..$ organizations_url  : chr  "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" ...
##   ..$ repos_url          : chr  "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" ...
##   ..$ events_url         : chr  "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" ...
##   ..$ received_events_url: chr  "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" ...
##   ..$ type               : chr  "User" "User" "User" "User" ...
##   ..$ site_admin         : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ html_url : chr  "https://github.com/hadley/15-state-of-the-union" "https://github.com/hadley/15-student-papers" "https://github.com/hadley/500lines" "https://github.com/hadley/adv-r" ...

Looking at the first seven columns, we see that the JSON is nested: the “owner” column is itself a data frame with 18 variables.
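To see where a nested column like this comes from, here is a minimal offline sketch. The two-record JSON string is a made-up stand-in for the real API response (only the field names `name`, `owner`, `login` and `id` are borrowed from the output above), so it runs without a network call.

```r
library(jsonlite)

# Hypothetical payload shaped like a tiny slice of the repos response
json_txt <- '[
  {"name": "adv-r",     "owner": {"login": "hadley", "id": 4196}},
  {"name": "babynames", "owner": {"login": "hadley", "id": 4196}}
]'

# An array of objects is simplified to a data frame;
# each nested object becomes a data frame stored in a single column
repos <- fromJSON(json_txt)

names(repos)        # top-level columns only
class(repos$owner)  # the nested column is a data frame
repos$owner$login   # reach into it with a second `$`
```

Reaching into the nested column with a second `$` works, but it gets awkward once you want to select or sort on those inner fields alongside the top-level ones.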

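We want a flat data frame. Luckily for us, jsonlite comes with a function to flatten nested data frames. We call it as jsonlite::flatten() because purrr, which is loaded with the tidyverse, has its own flatten() function.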

data1 <- jsonlite::flatten(data1)
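As a rough illustration of what flattening does, the sketch below applies it to a small made-up payload (the JSON string is hypothetical, not real API output). The nested columns are pulled up to the top level and prefixed with the name of the column they came from.

```r
library(jsonlite)

# Hypothetical payload with one nested object per record
json_txt <- '[
  {"name": "adv-r",     "owner": {"login": "hadley", "id": 4196}},
  {"name": "babynames", "owner": {"login": "hadley", "id": 4196}}
]'

nested <- fromJSON(json_txt)
flat   <- jsonlite::flatten(nested)

names(nested)  # the nested data frame counts as one column
names(flat)    # inner fields now appear as owner.login, owner.id

# No column of the flattened result is itself a data frame
any(vapply(flat, is.data.frame, logical(1)))
```

After this step every column is a plain vector, which is what dplyr verbs such as select() and arrange() expect.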

We can now select the repository name, the date of the last update and the git URL, then sort by date to see what he’s been working on lately.

data1 %>%
  select(name, updated_at, git_url) %>%  # keep only the columns we care about
  arrange(desc(updated_at)) %>%          # most recently updated first
  head(5) %>%
  kable() %>%
  kable_styling()
name             updated_at            git_url
assertthat       2019-12-12T09:55:44Z  git://github.com/hadley/assertthat.git
adv-r            2019-12-11T10:22:52Z  git://github.com/hadley/adv-r.git
data-baby-names  2019-12-02T23:06:51Z  git://github.com/hadley/data-baby-names.git
babynames        2019-11-21T12:15:46Z  git://github.com/hadley/babynames.git
beautiful-data   2019-11-06T20:02:39Z  git://github.com/hadley/beautiful-data.git
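Sorting on updated_at works even though it is a character column, because timestamps in this year-first format sort the same way alphabetically as they do chronologically. If you want real date arithmetic, one option is to parse the strings first. The sketch below uses two timestamps copied from the table above and assumes they are all in UTC, which the trailing "Z" suggests.

```r
# Two timestamps copied from the table above
stamps <- c("2019-12-12T09:55:44Z", "2019-11-06T20:02:39Z")

# Parse the strings into date-times, treating the trailing "Z" as UTC
parsed <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

as.Date(parsed)                                  # calendar dates only
difftime(parsed[1], parsed[2], units = "days")   # time between the two updates
```

The same conversion could be applied to the updated_at column with mutate() before sorting, although for a simple "latest first" listing the string sort is enough.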

There we go: APIs and JSON making our lives easier!