Introduction to Purrr
This vignette introduces purrr, the tidyverse package for functional programming. Its mapping functions are well suited to exploring and manipulating the nested lists that result from parsing JSON data.
We will be using a League of Legends dataset, provided by Santiago Torres in project 2.
Loading the Data
library(rjson)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(stringr)
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
## The following objects are masked from 'package:rjson':
##
## fromJSON, toJSON
library(purrr)
Load Data
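The data is hosted on ddragon.leagueoflegends.com. We can use rjson::fromJSON() to read and parse it directly from the URL. Because jsonlite was loaded after rjson and masks fromJSON(), the call is written with the explicit rjson:: prefix.
fromJSON() returns a nested list containing all of the available JSON data.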
champion_json <- rjson::fromJSON(file="https://ddragon.leagueoflegends.com/cdn/11.19.1/data/en_US/champion.json")
The champ_list object created next is a list of 157 JSON objects. Each object describes one available character in the online MOBA League of Legends and holds fields such as the character's name, descriptive text, and stats.
champ_list <- champion_json$data
Demonstrate value of Purrr package
We will be using the tidyverse “purrr” package to demonstrate how working with JSONs doesn’t always need to be a chore.
Extracting nested data - single values
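One of the most useful functions in the purrr package is map(). In its simplest form, map() takes two arguments:
- a list (here, the list of parsed JSON objects)
- a function to apply to each element; if a string is given instead, map() extracts the element with that name from each object
Below, we extract all of the champion names into a new list called “champ_names”. Note that map() always returns a list; map_chr() would return a character vector instead.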
champ_names <- purrr::map(champ_list, "name")
champ_names[1:5]
## $Aatrox
## [1] "Aatrox"
##
## $Ahri
## [1] "Ahri"
##
## $Akali
## [1] "Akali"
##
## $Akshan
## [1] "Akshan"
##
## $Alistar
## [1] "Alistar"
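As the output above shows, map() returns a list with one element per champion. When a plain character vector is wanted, the typed variant map_chr() can be used instead. The following is a minimal sketch; toy_champs is a small hand-made, hypothetical stand-in for champ_list so that the code runs without downloading anything.

```r
library(purrr)

# Hypothetical stand-in for champ_list: a named list of champion records
toy_champs <- list(
  Aatrox = list(name = "Aatrox", title = "the Darkin Blade"),
  Ahri   = list(name = "Ahri",   title = "the Nine-Tailed Fox")
)

# map() returns a list with one element per champion
name_list <- map(toy_champs, "name")

# map_chr() returns a named character vector instead
name_vec <- map_chr(toy_champs, "name")
```

The names of the input list are carried over to the result in both cases.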
Extracting nested data - multiple values
What if we wanted to extract more than just the name of a champion? With purrr, we can do that as well.
Say for instance we wanted to extract all champions’ name, title, and blurb from the champion json. If we wanted to do this for a single json object, we could accomplish it like so:
champ_list[[1]][c("name","title","blurb")]
## $name
## [1] "Aatrox"
##
## $title
## [1] "the Darkin Blade"
##
## $blurb
## [1] "Once honored defenders of Shurima against the Void, Aatrox and his brethren would eventually become an even greater threat to Runeterra, and were defeated only by cunning mortal sorcery. But after centuries of imprisonment, Aatrox was the first to find..."
In the above example, the square brackets around the character vector are essentially a function call that indexes the champ_list element with the provided character vector. We can use this directly in purrr::map(.x, .f, ...): the function in this case is actually `[`, and the character vector is passed along as its additional argument.
champ_list %>%
  purrr::map(`[`, c("name","title","blurb")) %>%
  .[2:3]
## $Ahri
## $Ahri$name
## [1] "Ahri"
##
## $Ahri$title
## [1] "the Nine-Tailed Fox"
##
## $Ahri$blurb
## [1] "Innately connected to the latent power of Runeterra, Ahri is a vastaya who can reshape magic into orbs of raw energy. She revels in toying with her prey by manipulating their emotions before devouring their life essence. Despite her predatory nature..."
##
##
## $Akali
## $Akali$name
## [1] "Akali"
##
## $Akali$title
## [1] "the Rogue Assassin"
##
## $Akali$blurb
## [1] "Abandoning the Kinkou Order and her title of the Fist of Shadow, Akali now strikes alone, ready to be the deadly weapon her people need. Though she holds onto all she learned from her master Shen, she has pledged to defend Ionia from its enemies, one..."
Make sure to wrap the [ in backticks, NOT quotes: map() would treat a quoted string as the name of an element to extract rather than as a function.
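To see why this works, recall that in R the subsetting operator is itself a function that can be called by name when wrapped in backticks. The sketch below uses a hypothetical record named champ (not taken from the real dataset) to show equivalent spellings of the same operation.

```r
library(purrr)

# Hypothetical single champion record
champ <- list(name = "Aatrox", title = "the Darkin Blade", tags = "Fighter")
keys <- c("name", "title")

a <- champ[keys]        # ordinary subsetting
b <- `[`(champ, keys)   # the same operation, written as a function call

# map() passes extra arguments on to the function, so these are equivalent
champs <- list(Aatrox = champ)
c1 <- map(champs, `[`, keys)
c2 <- map(champs, ~ .x[keys])
```

The formula version is longer but can be easier to read for anyone unfamiliar with calling operators as functions.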
Extracting data into a dataframe
map() is great for extracting data, but ultimately we will likely want that data in a more readable format than another list. That is where map_dfr() (map to data frame, by rows) comes into play. map_dfr() applies the function to each element and row-binds the results into a single data frame.
champ_list %>%
  purrr::map_dfr(`[`, c("name","title","blurb")) %>%
  head()
Extracting nested data, more than one level deep
JSON objects may contain nested JSON objects, and this nesting can in principle continue for many levels. In our example dataset, champions do have a second level of data, including sub-lists such as “info”, “image”, and “stats”.
In the example below, we use map_chr() to extract the “hp” stat for each champion, which resides under the “stats” list of each champion object. Because map_chr() returns a character vector, the numeric values come back as strings; map_dbl() would keep them numeric.
champ_list %>%
  map_chr(c("stats", "hp")) %>%
  .[1:5]
## Aatrox Ahri Akali Akshan Alistar
## "580.000000" "526.000000" "500.000000" "560.000000" "600.000000"
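Because the HP values are numeric, map_dbl() may be the more natural choice when the numbers will be used in calculations, and pluck() applies the same path-style lookup to a single element. The following is a sketch on a hypothetical toy list, not the real champion data.

```r
library(purrr)

# Hypothetical stand-in for champ_list with one level of nesting
toy_champs <- list(
  Aatrox = list(name = "Aatrox", stats = list(hp = 580, mp = 0)),
  Ahri   = list(name = "Ahri",   stats = list(hp = 526, mp = 418))
)

# A character vector describes a path: first "stats", then "hp"
hp <- map_dbl(toy_champs, c("stats", "hp"))

# pluck() follows the same kind of path for one element
ahri_hp <- pluck(toy_champs, "Ahri", "stats", "hp")

# The result is numeric, so arithmetic works directly
avg_hp <- mean(hp)
```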
Using keep(), the select_if() for lists
Say you only want to analyze the list elements that meet some condition. purrr::keep() allows you to do this: you provide a list and a predicate function (or formula) that returns TRUE or FALSE for each element.
In the example below, we focus on only the champions that have an “hp” value above 500.
champ_list %>%
  map(c("stats","hp")) %>%
  keep(~ .x > 500) %>%
  .[1:5]
## $Aatrox
## [1] 580
##
## $Ahri
## [1] 526
##
## $Akshan
## [1] 560
##
## $Alistar
## [1] 600
##
## $Amumu
## [1] 615
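The pipeline above keeps only the HP values themselves. If the full champion records are needed after filtering, one option is to apply the predicate to the records directly; discard() is the complement of keep(). The following is a sketch on hypothetical toy data.

```r
library(purrr)

# Hypothetical stand-in for champ_list
toy_champs <- list(
  Aatrox = list(name = "Aatrox", stats = list(hp = 580)),
  Akali  = list(name = "Akali",  stats = list(hp = 500)),
  Amumu  = list(name = "Amumu",  stats = list(hp = 615))
)

# Keep whole records whose nested hp value exceeds 500
high_hp <- keep(toy_champs, ~ .x$stats$hp > 500)

# discard() drops the records that match the predicate instead
low_hp <- discard(toy_champs, ~ .x$stats$hp > 500)

names(high_hp)
```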
Reversing functions with negate()
This is a bit different from what we’ve explored so far, but the purrr package also provides some useful tools for modifying functions.
Take for example the is.null() function. To create its opposite, we can use negate():
lst <- list("a", 3, 22, NULL, "q", NULL)
is_not_null <- negate(is.null)
map_lgl(lst, is_not_null)
## [1] TRUE TRUE TRUE FALSE TRUE FALSE
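A negated predicate can be passed anywhere purrr expects a predicate function. As a sketch, the following uses it with keep() to drop the NULL entries; for this particular case, purrr's compact() should give the same result.

```r
library(purrr)

lst <- list("a", 3, 22, NULL, "q", NULL)
is_not_null <- negate(is.null)

# Keep only the elements for which is_not_null() returns TRUE
kept <- keep(lst, is_not_null)

# compact() removes NULL (and empty) elements directly
also_kept <- compact(lst)

length(kept)
```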