Exploring the purrr package

Introduction

I will explore the purrr package, which is part of the tidyverse. As an example data set, I will use the World Happiness Report 2021, which can be downloaded from Kaggle.

The purrr package “enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. If you’ve never heard of FP before, the best place to start is the family of map() functions which allow you to replace many for loops with code that is both more succinct and easier to read”¹.

library(tidyverse)  # loads purrr and dplyr, and provides the %>% pipe used below
worldHappinessData <- read.csv(file = "https://raw.githubusercontent.com/bharbans/SPRING2021TIDYVERSE/main/brad_harbans_purr/world-happiness-report.csv")

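The purrr package is the tidyverse counterpart to base R's apply family of functions (lapply(), sapply(), and so on). Its map() functions iterate over the elements of a list or vector and apply a function to each one. This style is generally preferred over explicit loops in R, as it tends to be more concise and easier to read.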
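To make the comparison with the apply family concrete, here is a minimal sketch on a small made-up list. The list scores and its values are hypothetical and are not taken from the happiness data.

```r
library(purrr)

# Hypothetical toy list; not part of the happiness data
scores <- list(a = c(1, 2, 3), b = c(4, 5, NA), c = c(10, 20))

# Base R: lapply() applies the function to each element and returns a list
base_means <- lapply(scores, function(x) mean(x, na.rm = TRUE))

# purrr: map() does the same and also returns a list with the same names
purrr_means <- map(scores, function(x) mean(x, na.rm = TRUE))

# purrr: map_dbl() returns a named double vector and raises an error
# if any result is not a single number
purrr_means_dbl <- map_dbl(scores, function(x) mean(x, na.rm = TRUE))

print(purrr_means_dbl)
```

The typed variants such as map_dbl() are the main practical difference from sapply(), whose return type depends on the results it happens to get.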

Map Functions vs Loops

For this example I will filter the data to only show the information for the year 2019.

worldHappinessData2019 <- worldHappinessData %>% 
  filter(year == 2019)  # year is read in as a number, so compare against a number

I will now write a loop to compute the average healthy life expectancy at birth. Please note that there are much better ways to do this; I am using a loop only as an example. In fact, the mean function is vectorized, and the entire code below can be replaced by mean(worldHappinessData2019$Healthy.life.expectancy.at.birth, na.rm = TRUE). dplyr's summarise function can also compute summary statistics easily.

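# Running total and count of the non-missing values
totalLifeExpectancy <- 0
nonMissingCount <- 0
for (i in worldHappinessData2019$Healthy.life.expectancy.at.birth)
{
  if (!is.na(i))
  {
    totalLifeExpectancy <- totalLifeExpectancy + i
    nonMissingCount <- nonMissingCount + 1
  }
}

# Divide by the number of non-missing values so the result matches mean(..., na.rm = TRUE)
avgLifeExpectancy <- totalLifeExpectancy / nonMissingCount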
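For comparison, the vectorized alternatives mentioned before the loop can be sketched as follows. To keep the example self-contained, I use a small made-up data frame named toy in place of worldHappinessData2019; its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for worldHappinessData2019
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Healthy.life.expectancy.at.birth = c(60, 70, NA, 80)
)

# Vectorized base R: NA values are dropped from both the sum and the count
base_mean <- mean(toy$Healthy.life.expectancy.at.birth, na.rm = TRUE)

# dplyr: the same statistic, returned as a one-row data frame
dplyr_mean <- toy %>%
  summarise(avgLifeExpectancy = mean(Healthy.life.expectancy.at.birth, na.rm = TRUE))

# Dividing the NA-free sum by the full row count would include the NA row
# in the denominator and understate the mean
naive_mean <- sum(toy$Healthy.life.expectancy.at.birth, na.rm = TRUE) / nrow(toy)

print(c(base = base_mean, dplyr = dplyr_mean$avgLifeExpectancy, naive = naive_mean))
```

The third value shows why a hand-written loop needs care with missing values: the sum and the count must skip the same rows.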

I will now perform the same task using the map function. Note the pipeline, which is common in tidyverse operations. Here the ~ introduces an anonymous function, and . stands for its argument, which is each selected column in turn. Because map works column by column, the same call extends to multiple columns.

worldHappinessData2019 %>% 
  select( "Healthy.life.expectancy.at.birth", "Freedom.to.make.life.choices" ) %>% 
  map(~mean(.,na.rm = TRUE))

## $Healthy.life.expectancy.at.birth
## [1] 65.00391
## 
## $Freedom.to.make.life.choices
## [1] 0.7945524
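The ~ shorthand is only a compact way of writing a function. As a small sketch on a made-up list (the name toy and its values are hypothetical), the following three calls produce the same result:

```r
library(purrr)

# Hypothetical toy columns; not part of the happiness data
toy <- list(x = c(1, 2, NA), y = c(10, 20, 30))

# purrr formula shorthand: . refers to the current element
with_formula <- map(toy, ~ mean(., na.rm = TRUE))

# .x is an alias for . inside the formula shorthand
with_dot_x <- map(toy, ~ mean(.x, na.rm = TRUE))

# The same thing written as an ordinary anonymous function
with_function <- map(toy, function(col) mean(col, na.rm = TRUE))

str(with_formula)
```

Which form to use is mostly a matter of taste; the explicit function(col) form can be easier to read when the body grows beyond one short expression.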

Below I will use the map function a bit more. I will split the original data frame by year and fit a linear model to each year's data. I then apply the summary function to each model, and use map again to extract the r.squared value for each year.

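worldHappinessData %>% 
  split(.$year) %>%                                                              # one data frame per year
  map(~ lm(Healthy.life.expectancy.at.birth ~ Log.GDP.per.capita, data = .)) %>% # one model per year
  map(summary) %>% 
  map_df("r.squared") %>% 
  reactable::reactable()  # namespace-qualified, so the reactable package only needs to be installed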
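The same split, model, and extract pattern can be tried without downloading anything. The sketch below is an assumption-laden stand-in: it uses R's built-in mtcars data set instead of the happiness data, groups by cyl instead of year, and uses map_dbl so the result is a plain named numeric vector.

```r
library(purrr)

r_squared_by_group <- mtcars %>%
  split(.$cyl) %>%                       # one data frame per cylinder count
  map(~ lm(mpg ~ wt, data = .x)) %>%     # .x is the current group's data frame
  map(summary) %>%                       # one summary.lm object per model
  map_dbl("r.squared")                   # pull the r.squared element out of each summary

# Reshape into a long table with one row per group
r_squared_table <- data.frame(
  cyl = names(r_squared_by_group),
  r.squared = unname(r_squared_by_group)
)

print(r_squared_table)
```

Passing a string such as "r.squared" to a map function extracts the element with that name from each list item, which is why no explicit function is needed in the last step.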

¹ “Functional Programming Tools” (n.d.)

Brad Harbans

4/11/2021
