I prepared this document to show you how to parse JSON data in R with the awesome jsonlite package.
Let’s say that you are a huge fan of the great and powerful Hadley Wickham. (He’s the Chief Scientist at RStudio and a big reason why all this R magic is possible.) You want to stay up to date with his latest work. You could skim through his GitHub repositories every day, or you could use the GitHub API to pull a list of his repositories as JSON. Let’s do the latter.
Let’s load a few packages.
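library(jsonlite)   # fromJSON() and flatten()
library(tidyverse)  # dplyr verbs such as select() and arrange(), plus the %>% pipe
library(knitr)      # kable() for rendering tables
library(kableExtra) # kable_styling() for table styling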
Let’s pull the JSON data from the GitHub API and then look at its structure. fromJSON() accepts a URL directly, downloads the response and simplifies it into a data frame:
data1 <- fromJSON("https://api.github.com/users/hadley/repos")
str(data1[, 1:7])  # structure of the first seven columns
## 'data.frame': 30 obs. of 7 variables:
## $ id : int 40423928 40544418 14984909 12241750 5154874 9324319 20228011 82348 888200 3116998 ...
## $ node_id : chr "MDEwOlJlcG9zaXRvcnk0MDQyMzkyOA==" "MDEwOlJlcG9zaXRvcnk0MDU0NDQxOA==" "MDEwOlJlcG9zaXRvcnkxNDk4NDkwOQ==" "MDEwOlJlcG9zaXRvcnkxMjI0MTc1MA==" ...
## $ name : chr "15-state-of-the-union" "15-student-papers" "500lines" "adv-r" ...
## $ full_name: chr "hadley/15-state-of-the-union" "hadley/15-student-papers" "hadley/500lines" "hadley/adv-r" ...
## $ private : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ owner :'data.frame': 30 obs. of 18 variables:
## ..$ login : chr "hadley" "hadley" "hadley" "hadley" ...
## ..$ id : int 4196 4196 4196 4196 4196 4196 4196 4196 4196 4196 ...
## ..$ node_id : chr "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" "MDQ6VXNlcjQxOTY=" ...
## ..$ avatar_url : chr "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" "https://avatars3.githubusercontent.com/u/4196?v=4" ...
## ..$ gravatar_id : chr "" "" "" "" ...
## ..$ url : chr "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" "https://api.github.com/users/hadley" ...
## ..$ html_url : chr "https://github.com/hadley" "https://github.com/hadley" "https://github.com/hadley" "https://github.com/hadley" ...
## ..$ followers_url : chr "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" "https://api.github.com/users/hadley/followers" ...
## ..$ following_url : chr "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" "https://api.github.com/users/hadley/following{/other_user}" ...
## ..$ gists_url : chr "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" "https://api.github.com/users/hadley/gists{/gist_id}" ...
## ..$ starred_url : chr "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" "https://api.github.com/users/hadley/starred{/owner}{/repo}" ...
## ..$ subscriptions_url : chr "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" "https://api.github.com/users/hadley/subscriptions" ...
## ..$ organizations_url : chr "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" "https://api.github.com/users/hadley/orgs" ...
## ..$ repos_url : chr "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" "https://api.github.com/users/hadley/repos" ...
## ..$ events_url : chr "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" "https://api.github.com/users/hadley/events{/privacy}" ...
## ..$ received_events_url: chr "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" "https://api.github.com/users/hadley/received_events" ...
## ..$ type : chr "User" "User" "User" "User" ...
## ..$ site_admin : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ html_url : chr "https://github.com/hadley/15-state-of-the-union" "https://github.com/hadley/15-student-papers" "https://github.com/hadley/500lines" "https://github.com/hadley/adv-r" ...
The first few columns show that the data is nested: the “owner” column is itself a data frame with 18 columns of its own. (There are 30 rows because the GitHub API returns 30 repositories per page by default.)
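To see where that nested column comes from without calling the API, here is a minimal offline sketch. The JSON string is a made-up, two-repository stand-in for the API response (the repository names are hypothetical and only a few fields are kept). By default fromJSON() simplifies an array of objects into a data frame, and a nested object such as owner becomes a data-frame column.

```r
library(jsonlite)

# Hypothetical two-repo stand-in for the GitHub response; only a few fields kept
json_txt <- '[
  {"name": "repo-a", "updated_at": "2019-12-12T09:55:44Z",
   "owner": {"login": "hadley", "id": 4196}},
  {"name": "repo-b", "updated_at": "2019-11-21T12:15:46Z",
   "owner": {"login": "hadley", "id": 4196}}
]'

# fromJSON() accepts a JSON string as well as a URL
repos <- fromJSON(json_txt)

class(repos)        # "data.frame"
class(repos$owner)  # "data.frame": the nested object became a data-frame column
names(repos$owner)  # "login" "id"

# With simplifyVector = FALSE the same text comes back as plain nested lists
as_lists <- fromJSON(json_txt, simplifyVector = FALSE)
as_lists[[1]]$owner$login  # "hadley"
```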
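We want a flat data frame. Luckily for us, jsonlite comes with a flatten() function that replaces each nested data-frame column with ordinary columns named parent.child, such as owner.login. We call it as jsonlite::flatten() because the tidyverse attaches purrr, whose own flatten() would otherwise mask it.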
data1 <- jsonlite::flatten(data1)  # owner becomes owner.login, owner.id, ...
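Here is the same step on a tiny hypothetical record, so you can see what flatten() does to the column names without hitting the API. The JSON is made up for illustration. The sketch also shows the flatten = TRUE argument of fromJSON(), which flattens while parsing.

```r
library(jsonlite)

# Hypothetical one-repo record with a nested owner object
json_txt <- '[{"name": "repo-a", "owner": {"login": "hadley", "id": 4196}}]'

nested <- fromJSON(json_txt)
names(nested)  # "name" "owner"

# flatten() replaces the owner data-frame column with owner.login and owner.id;
# the jsonlite:: prefix guards against purrr::flatten() if the tidyverse is attached
flat <- jsonlite::flatten(nested)
names(flat)

# Alternatively, flatten while parsing
flat2 <- fromJSON(json_txt, flatten = TRUE)
identical(names(flat), names(flat2))  # TRUE
```

Flattening while parsing saves a step; calling flatten() afterwards is handy when you want to inspect the nested structure first, as we did with str().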
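Now we can select the repository name, the date of the last update and the git URL, sort by update date and keep the five most recent. This gives a rough idea of what he’s been working on lately. (Keep in mind that these are only the 30 repositories on the first page, which GitHub sorts alphabetically by default; adding ?sort=updated to the URL would return the most recently updated ones first.)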
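data1 %>%
  select(name, updated_at, git_url) %>%  # repository name, last update, git URL
  arrange(desc(updated_at)) %>%          # most recently updated first
  head(5) %>%                            # keep the top five
  kable() %>%                            # render as a table
  kable_styling()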
| name | updated_at | git_url |
|---|---|---|
| assertthat | 2019-12-12T09:55:44Z | git://github.com/hadley/assertthat.git |
| adv-r | 2019-12-11T10:22:52Z | git://github.com/hadley/adv-r.git |
| data-baby-names | 2019-12-02T23:06:51Z | git://github.com/hadley/data-baby-names.git |
| babynames | 2019-11-21T12:15:46Z | git://github.com/hadley/babynames.git |
| beautiful-data | 2019-11-06T20:02:39Z | git://github.com/hadley/beautiful-data.git |
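The updated_at column comes back as a character string. Sorting it with arrange(desc(updated_at)) still gives chronological order because GitHub uses the fixed-width ISO 8601 form YYYY-MM-DDTHH:MM:SSZ, so alphabetical and chronological order agree. If you want real date-times (for example, to keep only repositories updated in the last month), here is a minimal base R sketch; the timestamps are copied from the table above.

```r
# Timestamps copied from the table above, in ISO 8601 UTC form
updated_at <- c("2019-12-12T09:55:44Z", "2019-12-11T10:22:52Z",
                "2019-12-02T23:06:51Z", "2019-11-21T12:15:46Z",
                "2019-11-06T20:02:39Z")

# Parse into POSIXct; "T" and the trailing "Z" are matched literally, and Z means UTC
parsed <- as.POSIXct(updated_at, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Lexicographic and chronological orderings agree for this fixed-width format
identical(order(updated_at), order(parsed))  # TRUE

# Days between the most recent and the oldest update in this sample
as.numeric(difftime(max(parsed), min(parsed), units = "days"))
```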
There we go: APIs and JSON making our lives easier!