Vignette for Purrr
# Load the purrr package along with the other tidyverse packages
if (!require("tidyverse")) {install.packages("tidyverse"); library("tidyverse")}
Introduction
The purrr package provides functions that eliminate the need for many common for loops. These functions are more consistent, and thus easier to learn, than many of the alternatives in base R. They let you take a solution that works for a single element and generalize it to every element of a list. They also let you build many small pieces and compose them with the pipe.
Map Functions
The purrr package provides functions that allow you to loop over a vector, do something to each element, and save the results. The suffix of each function determines the type of output:
map() makes a list.
map_lgl() makes a logical vector.
map_int() makes an integer vector.
map_dbl() makes a double vector.
map_chr() makes a character vector.
Code written with the map functions is easier to write and to read than the equivalent for loops.
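As a small illustration of how the suffix controls the return type, the sketch below applies each variant to a made-up list. The object `x` and its contents are hypothetical and are not part of the NFL data used later.

```r
library(purrr)

# A small, made-up list of integer vectors (hypothetical data)
x <- list(a = 1:3, b = 4:6, c = 7:9)

lengths_list <- map(x, length)                              # list
has_big      <- map_lgl(x, \(v) any(v > 5))                 # logical vector
n_elements   <- map_int(x, length)                          # integer vector
means        <- map_dbl(x, mean)                            # double vector
labels       <- map_chr(x, \(v) paste(v, collapse = "-"))   # character vector
```

The typed variants signal an error when a result cannot be returned as the requested type, which makes them a cheap check that each element produced what you expected.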
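To make the comparison with for loops concrete, here is a minimal sketch that computes column means both ways. The data frame `df` is a toy example invented for illustration.

```r
library(purrr)

# Hypothetical toy data frame standing in for a real dataset
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# for loop: allocate the output, loop over the columns, fill in each slot
col_means_loop <- vector("double", ncol(df))
names(col_means_loop) <- names(df)
for (i in seq_along(df)) {
  col_means_loop[[i]] <- mean(df[[i]])
}

# map_dbl(): the same computation in a single call
col_means_map <- map_dbl(df, mean)
```

Both objects hold the same named double vector. The map version leaves out the bookkeeping (allocation, indexing, naming), so the function being applied is the first thing a reader sees.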
We will import a dataset from Kaggle that contains information about NFL passing statistics from players since 2001.
# Let's import the passing_cleaned csv
passing_stats <- read.csv("passing_cleaned.csv")
passing_stats <- as_tibble(passing_stats)
# Let's extract the variables Age, G, Yds, TD, Cmp, Int, Rate
# and sort the rows by passing yards, highest first
passing_stats <- passing_stats |>
  select(Age, G, Yds, TD, Cmp, Int, Rate) |>
  arrange(desc(Yds))
passing_stats
# A tibble: 2,350 × 7
Age G Yds TD Cmp Int Rate
<int> <int> <int> <int> <int> <int> <dbl>
1 37 16 5477 55 450 10 115.
2 32 16 5476 46 468 14 111.
3 44 17 5316 43 485 12 102.
4 27 17 5250 41 435 12 105.
5 34 16 5235 39 401 12 106.
6 37 16 5208 37 471 15 102.
7 33 16 5177 43 422 19 96.3
8 34 16 5162 39 446 12 105.
9 36 16 5129 34 452 16 96.5
10 25 16 5109 33 380 30 84.3
# ℹ 2,340 more rows
Suppose we want to compute the mean, median, and standard deviation of every column. Because a data frame is a list of columns, the map functions iterate over it one column at a time. Each of these summaries returns a single double per column, so we use map_dbl().
# Compute mean for every column
passing_stats_mean <- passing_stats |>
  map_dbl(mean, na.rm = TRUE)
# Compute median for every column
passing_stats_median <- passing_stats |>
  map_dbl(median, na.rm = TRUE)
# Compute sd for every column
passing_stats_sd <- passing_stats |>
  map_dbl(sd, na.rm = TRUE)
passing_stats_mean
        Age           G         Yds          TD         Cmp         Int
  28.131489   10.342979 1204.897021    7.322979  106.064681    4.600426
       Rate
  77.791830
passing_stats_median
   Age      G    Yds     TD    Cmp    Int   Rate
 27.00  12.00 295.00   1.00  27.00   2.00  80.05
passing_stats_sd
        Age           G         Yds          TD         Cmp         Int
   4.356133    5.632477 1532.527920   10.252285  133.591048    5.672478
       Rate
  31.949639
As you can see, map_dbl() let us apply a function to every column of the data frame in a single command!
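The three summaries can also be computed in one pass by mapping over a list of functions. The sketch below is one possible way to do that, not the only one. It uses the built-in mtcars dataset as a stand-in, because passing_cleaned.csv is not bundled with R, and the names `stats`, `summaries`, and `summary_table` are made up for illustration.

```r
library(purrr)

# mtcars ships with R and stands in for passing_stats here (an assumption)
stats <- list(mean = mean, median = median, sd = sd)

# For each summary function, map it over every column of the data frame
summaries <- map(stats, \(f) map_dbl(mtcars, f, na.rm = TRUE))

# Bind the named vectors into a matrix:
# one row per data-frame column, one column per statistic
summary_table <- do.call(cbind, summaries)
```

Nesting one map call inside another like this keeps the list of statistics in one place, so adding another summary only means adding an entry to `stats`.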