Assignment Web APIs

Working with Purrr

In this sample vignette, I will walk through some of the functionality of the purrr Tidyverse package using an agriculture dataset from Kaggle.

According to the Tidyverse site, purrr “enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.” It replaces a conventional “for loop” with a collection of functions centered around the map function, giving a compact, easy-to-read syntax that is powerful and performant.
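To make the comparison with a “for loop” concrete, here is a minimal sketch that computes the same result both ways. The list `values` and the result names are invented for this illustration and are not part of the farm dataset.

```r
library(purrr)

# Toy input, invented for this illustration
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Conventional for loop: pre-allocate the output, then fill it element by element
loop_result <- vector("list", length(values))
names(loop_result) <- names(values)
for (i in seq_along(values)) {
  loop_result[[i]] <- sum(values[[i]])
}

# The purrr version: one call, and the names of the input carry over
map_result <- map(values, sum)

# Compare the two results
identical(loop_result, map_result)
```

The loop needs bookkeeping for allocation, naming, and indexing, while the map call only states which function to apply to which input.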

In this tutorial, I will go over the following functions:
- map
- the map_* family
- map2
- pmap

Before we get started, let’s read in our csv file and prep the farm_data dataframe.

library(tidyverse)  # provides %>%, rename_all, the str_* helpers, glimpse, and ggplot

#setwd("")
farm_data <- read.csv('https://raw.githubusercontent.com/mcastro64/d607-assignments/refs/heads/master/assignment-07/agriculture_dataset.csv') %>%
  rename_all(~str_to_lower(.)) %>%    # Convert all column names to lowercase
  rename_all(~str_replace_all(., "\\.", "_")) %>% # Replace dots (which read.csv substitutes for invalid characters such as spaces) with underscores
  rename_all(~str_replace_all(., "_$", "")) # Remove the trailing underscore

glimpse(farm_data)

## Rows: 50
## Columns: 10
## $ farm_id                  <chr> "F001", "F002", "F003", "F004", "F005", "F006…
## $ crop_type                <chr> "Cotton", "Carrot", "Sugarcane", "Tomato", "T…
## $ farm_area_acres          <dbl> 329.40, 18.67, 306.03, 380.21, 135.56, 12.50,…
## $ irrigation_type          <chr> "Sprinkler", "Manual", "Flood", "Rain-fed", "…
## $ fertilizer_used_tons     <dbl> 8.14, 4.77, 2.91, 3.32, 8.33, 6.42, 1.83, 5.1…
## $ pesticide_used_kg        <dbl> 2.21, 4.36, 0.56, 4.35, 4.48, 2.25, 2.37, 0.9…
## $ yield_tons               <dbl> 14.44, 42.91, 33.44, 34.08, 43.28, 38.18, 44.…
## $ soil_type                <chr> "Loamy", "Peaty", "Silty", "Silty", "Clay", "…
## $ season                   <chr> "Kharif", "Kharif", "Kharif", "Zaid", "Zaid",…
## $ water_usage_cubic_meters <dbl> 76648.20, 68725.54, 75538.56, 45401.23, 93718…

Using map

The map function takes a list or an atomic vector and a function, in the format map(.x, .f), where .x is our list/vector and .f is our function. It applies .f to each element of .x and returns the results as a list.

In Example 1, we want to use purrr to calculate the mean of each numerical column of our sample dataframe. We first pass the dataframe to the dplyr function “select”, then use “where” with is.numeric to keep only the columns with numeric values. Because a dataframe is a list of columns, we can pass the result into the map function using the .x parameter and pass the function “mean” using the .f parameter. Example 2 shows an abbreviated version of the same calculation.

Example 1

ex1_list <- farm_data |>
  select(where(is.numeric)) 

map(.x = ex1_list, .f = mean)

## $farm_area_acres
## [1] 254.9638
## 
## $fertilizer_used_tons
## [1] 4.9054
## 
## $pesticide_used_kg
## [1] 2.398
## 
## $yield_tons
## [1] 27.0592
## 
## $water_usage_cubic_meters
## [1] 56724.3

Example 2

ex2_list <- farm_data |>
  select(where(is.numeric)) |>
  map(mean)

ex2_list

## $farm_area_acres
## [1] 254.9638
## 
## $fertilizer_used_tons
## [1] 4.9054
## 
## $pesticide_used_kg
## [1] 2.398
## 
## $yield_tons
## [1] 27.0592
## 
## $water_usage_cubic_meters
## [1] 56724.3
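When the function needs more than the element itself, .f can be written inline. The sketch below shows two common ways to do that, assuming a toy data frame `toy` with a missing value (invented for this illustration; the output above suggests the farm data has no missing values in these columns).

```r
library(purrr)

# Toy data frame with a missing value, invented for this illustration
toy <- data.frame(x = c(1, 2, NA), y = c(10, 20, 30))

# Anonymous function using the R >= 4.1 shorthand
means_lambda <- map(toy, \(col) mean(col, na.rm = TRUE))

# The same call using purrr's formula shorthand, where .x is the current element
means_formula <- map(toy, ~ mean(.x, na.rm = TRUE))

means_lambda
```

Both calls return a list with one mean per column. The formula shorthand is the style used later in this vignette inside mutate.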

Using other functions in the map_* family

In addition to the map function, which returns a list, purrr has dedicated functions that return specific vector types. Each function is named with the “map_” prefix followed by an abbreviation of the vector type. Example 3 uses map_dbl.

map_lgl() returns a logical vector
map_int() returns an integer vector
map_dbl() returns a double vector
map_chr() returns a character vector

Example 3

farm_area_miles <- map_dbl(farm_data$farm_area_acres, function(.x) {
  return(.x/640)
})

farm_area_miles

##  [1] 0.51468750 0.02917188 0.47817187 0.59407813 0.21181250 0.01953125
##  [7] 0.56259375 0.72593750 0.60839062 0.28807813 0.43742187 0.22706250
## [13] 0.51421875 0.38440625 0.47679687 0.09409375 0.44376562 0.20035937
## [19] 0.72020313 0.09195312 0.58914062 0.14479687 0.02448437 0.75606250
## [25] 0.11818750 0.25356250 0.58609375 0.40029687 0.45081250 0.44768750
## [31] 0.21275000 0.54753125 0.69806250 0.41268750 0.41567187 0.69712500
## [37] 0.24390625 0.67378125 0.34450000 0.26065625 0.57935938 0.65467188
## [43] 0.12310938 0.13143750 0.51045312 0.17625000 0.54321875 0.12092187
## [49] 0.72245313 0.45664063
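The sketch below exercises each of the four typed variants on a small list. The list `farms` is invented for this illustration: the names and acreages echo the first two rows of the dataset, while the `plots` field is hypothetical.

```r
library(purrr)

# Toy records; the `plots` field is hypothetical and not in the dataset
farms <- list(
  list(name = "Cotton", acres = 329.40, plots = 3L),
  list(name = "Carrot", acres = 18.67,  plots = 1L)
)

crop_names  <- map_chr(farms, \(f) f$name)         # character vector
acreages    <- map_dbl(farms, \(f) f$acres)        # double vector
plot_counts <- map_int(farms, \(f) f$plots)        # integer vector
is_large    <- map_lgl(farms, \(f) f$acres > 100)  # logical vector

# The typed variants are strict: asking for integers from non-integer
# doubles raises an error instead of silently truncating
bad <- try(map_int(farms, \(f) f$acres), silent = TRUE)
inherits(bad, "try-error")
```

This strictness is the main reason to prefer a typed variant over map followed by unlist: a result of the wrong type fails early instead of propagating.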

Returning a Dataframe

purrr can also return a dataframe using the map_df function. Example 4 modifies the map_dbl example to return a dataframe of farm area in acres and square miles (one square mile is 640 acres). This example also uses an inline function whose argument .x represents each element of the vector (“farm_data$farm_area_acres”) that was passed into the map_df function. The function returns a one-row data frame for each element, and map_df binds those rows into a single data frame.

Example 4

map_area_df <- map_df(farm_data$farm_area_acres, function(.x) {
  return(data.frame(acres = .x, miles = .x/640))
})

map_area_df

##     acres      miles
## 1  329.40 0.51468750
## 2   18.67 0.02917188
## 3  306.03 0.47817187
## 4  380.21 0.59407813
## 5  135.56 0.21181250
## 6   12.50 0.01953125
## 7  360.06 0.56259375
## 8  464.60 0.72593750
## 9  389.37 0.60839062
## 10 184.37 0.28807813
## 11 279.95 0.43742187
## 12 145.32 0.22706250
## 13 329.10 0.51421875
## 14 246.02 0.38440625
## 15 305.15 0.47679687
## 16  60.22 0.09409375
## 17 284.01 0.44376562
## 18 128.23 0.20035937
## 19 460.93 0.72020313
## 20  58.85 0.09195312
## 21 377.05 0.58914062
## 22  92.67 0.14479687
## 23  15.67 0.02448437
## 24 483.88 0.75606250
## 25  75.64 0.11818750
## 26 162.28 0.25356250
## 27 375.10 0.58609375
## 28 256.19 0.40029687
## 29 288.52 0.45081250
## 30 286.52 0.44768750
## 31 136.16 0.21275000
## 32 350.42 0.54753125
## 33 446.76 0.69806250
## 34 264.12 0.41268750
## 35 266.03 0.41567187
## 36 446.16 0.69712500
## 37 156.10 0.24390625
## 38 431.22 0.67378125
## 39 220.48 0.34450000
## 40 166.82 0.26065625
## 41 370.79 0.57935938
## 42 418.99 0.65467188
## 43  78.79 0.12310938
## 44  84.12 0.13143750
## 45 326.69 0.51045312
## 46 112.80 0.17625000
## 47 347.66 0.54321875
## 48  77.39 0.12092187
## 49 462.37 0.72245313
## 50 292.25 0.45664063
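In purrr 1.0.0 and later, map_df is marked as superseded in favor of calling map and then binding the pieces with list_rbind. The sketch below shows that pattern on the first three acreage values from the output above; the names `acres` and `area_df` are chosen for this illustration, and it assumes purrr 1.0.0 or newer is installed.

```r
library(purrr)

# First three values of farm_area_acres, copied from the output above
acres <- c(329.40, 18.67, 306.03)

# Build one single-row data frame per element, then bind the rows
# (list_rbind requires purrr >= 1.0.0)
area_df <- acres |>
  map(\(a) data.frame(acres = a, sq_miles = a / 640)) |>
  list_rbind()

area_df
```

For a simple arithmetic conversion like this one, a vectorized call such as data.frame(acres = acres, sq_miles = acres / 640) gives the same table without any iteration. The row-by-row pattern pays off when each element produces a data frame through a more involved step.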

Handling multiple inputs

purrr also has two functions for handling more than one input at a time:
- map2 applies a function to pairs of elements taken from two inputs
- pmap applies a function to elements taken from a list of inputs

Working with map2

The example below shows how to work with map2. We pass two vectors of equal length as .x and .y. map2 walks through them in parallel, calling the function once per pair of elements, with the current elements available as .x and .y in its body. The result is a list.

Example 4

# map2 example
map2_ex <- map2(.x = farm_data$yield_tons, .y = farm_data$farm_area_acres, function(.x, .y) {
  return(.x /.y)
})

# print type and first 3 rows
class(map2_ex)

## [1] "list"

print(map2_ex[1:3])

## [[1]]
## [1] 0.04383728
## 
## [[2]]
## [1] 2.29834
## 
## [[3]]
## [1] 0.1092703
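Like map, map2 has typed variants. Since each yield-per-acre ratio is a single number, map2_dbl returns a plain numeric vector instead of a list. This is a minimal sketch using the first three rows of the dataset, copied from the glimpse output; the vector names are chosen for this illustration.

```r
library(purrr)

# First three rows of the two columns, copied from the glimpse output
yield_tons      <- c(14.44, 42.91, 33.44)
farm_area_acres <- c(329.40, 18.67, 306.03)

# One ratio per pair of elements, returned as a double vector
yield_per_acre <- map2_dbl(yield_tons, farm_area_acres, \(y, a) y / a)

yield_per_acre
```

The values agree with the first three list elements printed above, but the result can be used directly as a column or passed to functions such as mean.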

Working with pmap

pmap allows you to loop through more than two inputs by passing them in as a list. Rather than use .x or .y, you name the arguments of your function after the elements of that list. Example 5 generates randomized sample data with rnorm using pmap. The first half of the code chunk builds a data frame holding the mean (means), standard deviation (sds), and sample size (samplesize) of yield for each row’s crop type in the farm_data dataframe. Because a data frame is a list of columns, we can pass it to pmap, which calls our function once per row with those three values.

Example 5

# construct list for this example
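calcs_list <- farm_data |>
  group_by(crop_type) |>
  mutate(
    means = mean(yield_tons),   # mean yield of the row's crop type
    sds = sd(yield_tons),       # standard deviation of yield for that crop type
    samplesize = 50
  ) |>
  ungroup() |>
  select(means, sds, samplesize) # keep only the columns that pmap will use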

# pmap_example
set.seed(1234)
pmap_ex <- pmap(calcs_list, function(means, sds, samplesize) {
  data.frame(sample = rnorm(n=samplesize, mean=means, sd=sds))
})

ggplot(pmap_ex[[1]], aes(x=sample))+
  geom_histogram(bins = 50)

ggplot(pmap_ex[[2]], aes(x=sample))+
  geom_histogram(bins = 50)
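The name matching that pmap relies on is easier to see on a small input. In this sketch the list `params` is invented for the illustration; its element names (n, mean, sd) are matched to the arguments of the function, so the order of the elements in the list does not matter.

```r
library(purrr)

# Toy parameter list, invented for this illustration.
# Position i of every element feeds call i.
params <- list(
  sd   = c(1, 10),
  n    = c(2, 3),
  mean = c(0, 100)
)

set.seed(1)
# Elements are matched to arguments by name, not by position
draws <- pmap(params, \(n, mean, sd) rnorm(n = n, mean = mean, sd = sd))

# Number of values drawn in each call
lengths(draws)
```

The first call draws two values around 0 and the second draws three values around 100. A mismatch between the list names and the function's argument names would raise an error, which is the usual failure mode when adapting this pattern.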

Nesting and combining map functions

purrr functions can also be used in combination with other Tidyverse packages and functions. Example 6 uses the map function within mutate to fit a linear model with the lm function for every “crop_type” in our farm data. Before we can pass the data to map, the example uses nest to collapse the grouped dataframe into one row per group, where the group value is in the first column and the remaining columns from our original dataframe are “nested” as a list-column named data in the second column. The returned object adds the fitted models as a third column, lms, which can be further cleaned up and unnested for other applications.

Example 6

farm_data_lms <- farm_data |> 
  group_by(crop_type) |> 
  nest() |>
  mutate(lms = map(data, ~lm(yield_tons ~ fertilizer_used_tons + pesticide_used_kg, data = .)))

farm_data_lms[[3]][1]

## [[1]]
## 
## Call:
## lm(formula = yield_tons ~ fertilizer_used_tons + pesticide_used_kg, 
##     data = .)
## 
## Coefficients:
##          (Intercept)  fertilizer_used_tons     pesticide_used_kg  
##               51.305                -2.207                -7.480
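One way to clean up a column of fitted models is to pull a single number out of each one with a typed map. The sketch below shows the idea on R's built-in mtcars data so that it runs without the farm CSV; the grouping by cyl, the formula mpg ~ wt, and the column names fit and slope are all choices made for this illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

car_models <- mtcars |>
  group_by(cyl) |>
  nest() |>
  mutate(
    # One linear model per group, stored in a list-column
    fit = map(data, \(d) lm(mpg ~ wt, data = d)),
    # Extract the wt coefficient from each model as a plain number
    slope = map_dbl(fit, \(m) coef(m)[["wt"]])
  ) |>
  ungroup() |>
  select(cyl, slope)

car_models
```

The result is an ordinary data frame with one row per group, which is easier to print, plot, or join than the nested object that holds the full models.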

2024-10-29