9.3 Purrr style

Let’s fit a model to each subset of a dataset and extract one coefficient from each fit: the slope!

Let’s begin by splitting mtcars by number of cylinders and storing the pieces in a list!

by_cyl <- split(mtcars, mtcars$cyl)
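To see what split() produced, a quick look at the resulting list can help. This is a small sketch in base R; the name `sizes` is introduced here only for illustration.

```r
by_cyl <- split(mtcars, mtcars$cyl)

# One data frame per distinct value of cyl, named by that value
names(by_cyl)
#> [1] "4" "6" "8"

# Number of cars (rows) in each subset
sizes <- vapply(by_cyl, nrow, integer(1))
sizes
#>  4  6  8 
#> 11  7 14 
```

Each element is an ordinary data frame, so anything that works on mtcars works on by_cyl[["4"]] as well.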

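With the map family, fitting the models and extracting the slopes takes three short steps:

library(purrr) # provides map(), map_dbl(), and %>%

by_cyl %>% 
  map(~ lm(formula = mpg ~ wt, data = .x)) %>% # returns a list of lm fit objects
  map(coef) %>% # returns a list of named coefficient vectors, one per fit
  map_dbl(2) # from each coefficient vector, take the second element: the slope!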
        4         6         8 
-5.647025 -2.780106 -2.192438 
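Indexing by position works because lm() puts the intercept first and the wt coefficient second. A slightly more self-documenting sketch extracts the coefficient by name instead. It assumes purrr is installed and relies on map_dbl() accepting a character string as an extractor; `slopes_by_name` is a name introduced here for illustration.

```r
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

slopes_by_name <- by_cyl %>%
  map(~ lm(mpg ~ wt, data = .x)) %>% # one lm fit per subset
  map(coef) %>%                      # one named coefficient vector per fit
  map_dbl("wt")                      # pick the element named "wt" from each vector

slopes_by_name
```

This should return the same three slopes as map_dbl(2), but it no longer depends on the order of the coefficients.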

Imagine we did not have map ):

We would write:

by_cyl %>% 
  lapply(FUN = function(data) lm(formula = mpg ~ wt, data = data)) %>% 
  vapply(FUN = function(fit) {
    coefficients <- coef(fit)
    coefficients[[2]] # the second coefficient is the slope
  }, FUN.VALUE = numeric(1)) # tells vapply that each result is a single number (the slope)
        4         6         8 
-5.647025 -2.780106 -2.192438 

Imagine we also did not have the pipe D:

We would have to write:

fit_list <- lapply(X = by_cyl, FUN = function(data) lm(formula = mpg ~ wt, data = data))
vapply(fit_list, FUN = function(fit){coef(fit)[[2]]}, FUN.VALUE = numeric(1))
        4         6         8 
-5.647025 -2.780106 -2.192438 

Finally, imagine we did not know the apply family and had to do this with a for loop!

slopes <- numeric(length(by_cyl)) # preallocate one slot per subset

for (i in seq_along(by_cyl)) {
  model <- lm(formula = mpg ~ wt, data = by_cyl[[i]])
  slopes[i] <- coef(model)[[2]] # the second coefficient is the slope
}
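One difference worth noting: unlike the map_dbl() and vapply() results, the vector filled by the loop carries no names, so the link between each slope and its cylinder group has to be restored by hand. A minimal base R sketch of that extra step:

```r
by_cyl <- split(mtcars, mtcars$cyl)

slopes <- numeric(length(by_cyl))
for (i in seq_along(by_cyl)) {
  model <- lm(mpg ~ wt, data = by_cyl[[i]])
  slopes[i] <- coef(model)[[2]]
}

names(slopes)                  # NULL: the loop did not carry over the list names
names(slopes) <- names(by_cyl) # restore them manually
slopes
#>         4         6         8 
#> -5.647025 -2.780106 -2.192438 
```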

Takeaway:

As we move from map family >>>>> apply family >>>>> loop,

we essentially move from 3 iterations >>>>> 2 iterations >>>>> 1 iteration.

We prefer more iterations with a simple step in each over a single complex, compacted iteration, since the former is easier to read and to modify later.
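To illustrate the "easier to modify" point, here is a sketch in base R where the fitting step is kept as is and only the final extraction step is swapped, here to pull R-squared instead of the slope. The names `fits` and `r_squared` are introduced for illustration.

```r
by_cyl <- split(mtcars, mtcars$cyl)

# Step 1: fit one model per subset (unchanged from the slope example)
fits <- lapply(by_cyl, function(data) lm(mpg ~ wt, data = data))

# Step 2: only this step changes when we want a different summary per fit
r_squared <- vapply(fits, function(fit) summary(fit)$r.squared, numeric(1))
r_squared
```

With the single for loop, the same change means editing the loop body, where fitting and extraction are mixed together.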