Vignette for purrr

Author

Rashad Long

Introduction

The purrr package provides functions that eliminate the need for many common for loops. These functions are more consistent, and thus easier to learn, than many of the base R alternatives. They let you generalize a solution for a single element to every element of a list, and they let you break a problem into small pieces that you compose with the pipe.

# Load the purrr package along with the other tidyverse packages (installing tidyverse first if needed)
if(!require("tidyverse")) {install.packages("tidyverse"); library("tidyverse")}

Map Functions

The purrr package provides functions that let you loop over a vector, do something to each element, and save the results. Each function is named for the type of output it returns:

  • map() makes a list.

  • map_lgl() makes a logical vector.

  • map_int() makes an integer vector.

  • map_dbl() makes a double vector.

  • map_chr() makes a character vector.

Code written with the map functions is easier to write and to read than the equivalent for loops.
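As a minimal sketch of the variants listed above, the following applies each one to a small made-up list (the object `x` and its values are illustrative assumptions, not part of the NFL dataset) and compares one of them with a hand-written for loop.

```r
library(purrr)

# Hypothetical example data: a named list of integer vectors
x <- list(a = 1:3, b = 4:6, c = 7:10)

# map() always returns a list, one element per input element
lens_list <- map(x, length)

# The typed variants return atomic vectors of the stated type
lens_int <- map_int(x, length)                                    # integer vector
means    <- map_dbl(x, mean)                                      # double vector
is_long  <- map_lgl(x, function(v) length(v) > 3)                 # logical vector
labels   <- map_chr(x, function(v) paste(v, collapse = "-"))      # character vector

# The same computation as map_dbl(x, mean), written as a for loop
means_loop <- vector("double", length(x))
for (i in seq_along(x)) {
  means_loop[[i]] <- mean(x[[i]])
}
names(means_loop) <- names(x)
```

The for loop needs a pre-allocated output vector, an index, and a separate step to restore the names; map_dbl() handles all three and also raises an error if any result is not a single double.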

We will import a dataset from Kaggle that contains information about NFL passing statistics from players since 2001.

# Let's import the passing_cleaned CSV file
passing_stats <- read.csv("passing_cleaned.csv")
passing_stats <- as_tibble(passing_stats)

# Let's keep the variables Age, G, Yds, TD, Cmp, Int, Rate and sort by passing yards, highest first
passing_stats <- passing_stats |> 
  select(Age, G, Yds, TD, Cmp, Int, Rate) |> 
  arrange(desc(Yds))

passing_stats
# A tibble: 2,350 × 7
     Age     G   Yds    TD   Cmp   Int  Rate
   <int> <int> <int> <int> <int> <int> <dbl>
 1    37    16  5477    55   450    10 115. 
 2    32    16  5476    46   468    14 111. 
 3    44    17  5316    43   485    12 102. 
 4    27    17  5250    41   435    12 105. 
 5    34    16  5235    39   401    12 106. 
 6    37    16  5208    37   471    15 102. 
 7    33    16  5177    43   422    19  96.3
 8    34    16  5162    39   446    12 105. 
 9    36    16  5129    34   452    16  96.5
10    25    16  5109    33   380    30  84.3
# ℹ 2,340 more rows

Suppose we want to compute the mean, median, and standard deviation of every column. Because a tibble is a list of columns, a map function applies the summary to each column in turn. Each summary returns a single double per column, so we use map_dbl().

# Compute mean for every column
passing_stats_mean <- passing_stats |> 
  map_dbl(mean, na.rm = TRUE)

# Compute median for every column
passing_stats_median <- passing_stats |> 
  map_dbl(median, na.rm = TRUE)

# Compute sd for every column
passing_stats_sd <- passing_stats |>
  map_dbl(sd, na.rm = TRUE)

passing_stats_mean
        Age           G         Yds          TD         Cmp         Int 
  28.131489   10.342979 1204.897021    7.322979  106.064681    4.600426 
       Rate 
  77.791830 
passing_stats_median
   Age      G    Yds     TD    Cmp    Int   Rate 
 27.00  12.00 295.00   1.00  27.00   2.00  80.05 
passing_stats_sd
        Age           G         Yds          TD         Cmp         Int 
   4.356133    5.632477 1532.527920   10.252285  133.591048    5.672478 
       Rate 
  31.949639 

As you can see, map_dbl() let us apply each summary to every column of the data frame in a single command!
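The same pattern extends to custom summaries. The sketch below uses a small made-up data frame (`toy`, with invented values standing in for passing_stats) to show an anonymous function passed to map_dbl(), and one possible way to compute several summaries in a single call by mapping over a named list of functions. The helper names `col_spread` and `summaries` are illustrative assumptions.

```r
library(purrr)

# Hypothetical toy data standing in for passing_stats (values are made up)
toy <- data.frame(
  Yds = c(100, 250, NA, 400),
  TD  = c(1, 3, 2, NA)
)

# A data frame is a list of columns, so map_dbl() visits each column in turn
col_means <- map_dbl(toy, mean, na.rm = TRUE)

# Anonymous function: spread between the largest and smallest value per column
col_spread <- map_dbl(
  toy,
  function(col) max(col, na.rm = TRUE) - min(col, na.rm = TRUE)
)

# Several summaries at once: map over a named list of summary functions,
# applying each one to every column
summaries <- map(
  list(mean = mean, median = median, sd = sd),
  function(f) map_dbl(toy, f, na.rm = TRUE)
)
```

Here `summaries` is a list with one named double vector per summary function, so `summaries$median` plays the same role as passing_stats_median above.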