Overview

In this project we demonstrate the main functions of the tidyverse ‘purrr’ package. We use the Sleep Data set by Mathurin Ache, found on Kaggle, to illustrate the package’s core functionality.

Loading Data

library(purrr)

data <- read.csv("C:/Users/Lixander/Data/dataset_2191_sleep.csv")

What is ‘purrr’?

The ‘purrr’ package was created to enhance R’s functional programming toolkit by providing a complete set of tools for working with functions and vectors. https://purrr.tidyverse.org/

map()

map() is the most commonly used function in the ‘purrr’ package. It applies a function to each element of a list or atomic vector and returns a list of the same length as the input. In the example below we extract the brain_weight column, which records the brain weight of each mammal in grams, into a list. Because body_weight is reported in kilograms, we convert brain_weight to kilograms as well.

Simply put, we create a custom function that divides its input by 1000, which converts grams to kilograms. Note that map() always returns a list, whatever the type of its input. When a different output type is needed, map() has variants that return it directly:

1. map_dbl() - double vector
2. map_int() - integer vector
3. map_chr() - character vector
4. map_lgl() - logical vector
5. map_dfc() - data frame created by column binding
6. map_dfr() - data frame created by row binding

#creating custom function
g_kg_func <- function(x) {
  x/1000
}

#wrapping the brain_weight column in a list; the list has a single element (the whole column)
brain_weight_lst <- list(data$brain_weight)

#using custom function with map(); it is called once, on that single element
brain_weight_kg <- map(brain_weight_lst, g_kg_func)
brain_weight_kg

## [[1]]
##  [1] 5.71200 0.00660 0.04450 0.00570 4.60300 0.17950 0.00030 0.16900 0.02560
## [10] 0.44000 0.00640 0.42300 0.00240 0.41900 0.00120 0.02500 0.00350 0.00500
## [19] 0.01750 0.08100 0.68000 0.11500 0.00100 0.40600 0.32500 0.11950 0.00400
## [28] 0.00550 0.65500 0.15700 0.05600 0.00014 0.00025 1.32000 0.00300 0.00810
## [37] 0.00040 0.00033 0.00630 0.01080 0.49000 0.01550 0.11500 0.01140 0.18000
## [46] 0.01210 0.03920 0.00190 0.05040 0.17900 0.01230 0.02100 0.09820 0.17500
## [55] 0.01250 0.00100 0.00260 0.01230 0.00250 0.05800 0.00390 0.01700
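
To make the difference between map() and its typed variants concrete, here is a minimal sketch that maps over the individual values of a short vector rather than over a one-element list. The three brain weights and the variable names are illustrative assumptions, not the full dataset.

```r
library(purrr)

# Illustrative brain weights in grams (a hypothetical three-value sample)
brain_weight_g <- c(5712, 6.6, 44.5)

# map() always returns a list, with one element per input element
kg_list <- map(brain_weight_g, function(x) x / 1000)

# map_dbl() applies the same function but returns a double vector
kg_dbl <- map_dbl(brain_weight_g, function(x) x / 1000)

# map_chr() expects the function to return a single string per element
labels <- map_chr(brain_weight_g, function(x) paste(x, "g"))
```

Here kg_list is a list of three numbers, kg_dbl is a plain numeric vector, and labels is a character vector, so the choice of variant decides the shape of the result.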

map() and the variants above also come in forms that iterate over more than one input:

1. Two lists, e.g. map2_dbl()
2. Many lists, e.g. pmap_dbl()
3. A list and its indexes (names or positions), e.g. imap_dbl()
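
The multi-input forms can be sketched as follows. The body and brain weights are a small illustrative sample, and the variable names are assumptions made for this example.

```r
library(purrr)

# Illustrative body weights (kg) and brain weights (g) for three animals
body_kg <- c(6654, 1, 3.385)
brain_g <- c(5712, 6.6, 44.5)

# map2_dbl(): walk over two vectors in parallel
ratio <- map2_dbl(brain_g, body_kg, function(brain, body) (brain / 1000) / body)

# pmap_dbl(): walk over any number of vectors, supplied as a (named) list;
# the list names are matched to the function's argument names
ratio_p <- pmap_dbl(
  list(brain = brain_g, body = body_kg),
  function(brain, body) (brain / 1000) / body
)

# imap_chr(): the function receives each value and its index
# (the position, because brain_g has no names)
indexed <- imap_chr(brain_g, function(value, index) paste0(index, ": ", value))
```

ratio and ratio_p hold the same brain-to-body weight ratios; pmap_dbl() is simply the general case of map2_dbl().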

Filtering datasets

The filter functions subset a list or vector based on a logical test. Each takes a predicate function, which is applied to every element and returns a single logical value, TRUE or FALSE. keep() retains the elements for which the predicate returns TRUE, while discard() drops them. See the two examples below for more details.

#creating custom function
greater_than <- function(x) {
  x > 5
}

#Keep values where brain weight is greater than 5 grams
purrr::keep(data$brain_weight, greater_than)

##  [1] 5712.0    6.6   44.5    5.7 4603.0  179.5  169.0   25.6  440.0    6.4
## [11]  423.0  419.0   25.0   17.5   81.0  680.0  115.0  406.0  325.0  119.5
## [21]    5.5  655.0  157.0   56.0 1320.0    8.1    6.3   10.8  490.0   15.5
## [31]  115.0   11.4  180.0   12.1   39.2   50.4  179.0   12.3   21.0   98.2
## [41]  175.0   12.5   12.3   58.0   17.0

#creating custom function
four_or_more <- function(x) {
  x >= 4
}

#total_sleep is read in as character because missing values are coded as "?".
#Convert it to numeric ("?" becomes NA, with a coercion warning) and drop the NAs.
total_sleep <- as.numeric(data$total_sleep)
total_sleep <- total_sleep[!is.na(total_sleep)]

#Discard values where total sleep is 4 or more, leaving those less than 4
purrr::discard(total_sleep, four_or_more)

## [1] 3.3 3.9 3.9 3.1 3.8 2.9 2.6 3.8
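
As a compact illustration of how keep() and discard() relate, the sketch below applies both with the same predicate to a small hypothetical numeric vector (the values are illustrative, not read from the dataset). It uses purrr's formula shorthand, where ~ .x < 4 stands for function(x) x < 4.

```r
library(purrr)

# Hypothetical sleep durations, already numeric
sleep_hours <- c(3.3, 12.5, 16.5, 3.9)

# keep() retains the elements for which the predicate is TRUE
short_sleepers <- keep(sleep_hours, ~ .x < 4)

# discard() drops those same elements and returns the rest
long_sleepers <- discard(sleep_hours, ~ .x < 4)
```

Together the two results partition the input: every element lands in exactly one of them.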

flatten()

Flattening removes one level of hierarchy from a list. Because a data frame is a list of columns, flattening it produces a single list in which every value of the data frame is a separate element.

#flatten df
flatten <- purrr::flatten(data)
head(flatten, 5)

## [[1]]
## [1] 6654
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] 3.385
## 
## [[4]]
## [1] 0.92
## 
## [[5]]
## [1] 2547

length(flatten)

## [1] 496
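
The same behaviour is easier to see on a small nested list. The following is a minimal sketch with a hypothetical list; flatten_dbl() is the typed variant that returns a double vector instead of a list.

```r
library(purrr)

# Hypothetical nested list: two inner lists of two numbers each
nested <- list(list(1, 2), list(3, 4))

# flatten() removes exactly one level of nesting, giving a list of 4 numbers
one_level <- flatten(nested)

# flatten_dbl() removes a level and returns a double vector
as_vector <- flatten_dbl(one_level)
```

Only one level is removed per call, so deeper structures need repeated flattening.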

append()

We often need to combine multiple lists without a join, for example to add more elements to an existing list. This can be done with append(), a base R function that adds the given elements to the end of a list or vector. In the example below, the second list is added after the first.

#appending two lists
appending <- append(brain_weight_kg, brain_weight_lst)
appending

## [[1]]
##  [1] 5.71200 0.00660 0.04450 0.00570 4.60300 0.17950 0.00030 0.16900 0.02560
## [10] 0.44000 0.00640 0.42300 0.00240 0.41900 0.00120 0.02500 0.00350 0.00500
## [19] 0.01750 0.08100 0.68000 0.11500 0.00100 0.40600 0.32500 0.11950 0.00400
## [28] 0.00550 0.65500 0.15700 0.05600 0.00014 0.00025 1.32000 0.00300 0.00810
## [37] 0.00040 0.00033 0.00630 0.01080 0.49000 0.01550 0.11500 0.01140 0.18000
## [46] 0.01210 0.03920 0.00190 0.05040 0.17900 0.01230 0.02100 0.09820 0.17500
## [55] 0.01250 0.00100 0.00260 0.01230 0.00250 0.05800 0.00390 0.01700
## 
## [[2]]
##  [1] 5712.00    6.60   44.50    5.70 4603.00  179.50    0.30  169.00   25.60
## [10]  440.00    6.40  423.00    2.40  419.00    1.20   25.00    3.50    5.00
## [19]   17.50   81.00  680.00  115.00    1.00  406.00  325.00  119.50    4.00
## [28]    5.50  655.00  157.00   56.00    0.14    0.25 1320.00    3.00    8.10
## [37]    0.40    0.33    6.30   10.80  490.00   15.50  115.00   11.40  180.00
## [46]   12.10   39.20    1.90   50.40  179.00   12.30   21.00   98.20  175.00
## [55]   12.50    1.00    2.60   12.30    2.50   58.00    3.90   17.00
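
append() can also insert elements at a chosen position through its after argument. A minimal sketch with two small hypothetical lists:

```r
# Two small hypothetical lists
first <- list(1, 2)
second <- list("a", "b")

# Default: the new elements go at the end
combined <- append(first, second)

# after = 1 inserts the new elements right after the first element
inserted <- append(first, second, after = 1)
```

combined has four elements in the order 1, 2, "a", "b", while inserted has them in the order 1, "a", "b", 2.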

Resources

Sleep Data - https://www.kaggle.com/datasets/mathurinache/sleep-dataset
Official ‘purrr’ package cheat sheet - https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf

TidyVerse CREATE assignment - Package Vignette

2022-11-24