TidyVerse 02

The purrr package provides a consistent set of functions for working with lists (and vectors) by applying functions to their elements. This assignment demonstrates how to use many of the functions in the purrr package.

Load the purrr and dplyr packages (both are attached by library(tidyverse)), along with knitr for printing tables. We will use the college_all_ages data set from the fivethirtyeight package.

library(knitr)

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(fivethirtyeight)

First, let's look at the first few rows of the college_all_ages data set:

kable(head(college_all_ages))

major_code	major	major_category	total	employed	employed_fulltime_yearround	unemployed	unemployment_rate	p25th	median	p75th
1100	General Agriculture	Agriculture & Natural Resources	128148	90245	74078	2423	0.0261471	34000	50000	80000
1101	Agriculture Production And Management	Agriculture & Natural Resources	95326	76865	64240	2266	0.0286361	36000	54000	80000
1102	Agricultural Economics	Agriculture & Natural Resources	33955	26321	22810	821	0.0302483	40000	63000	98000
1103	Animal Sciences	Agriculture & Natural Resources	103549	81177	64937	3619	0.0426789	30000	46000	72000
1104	Food Science	Agriculture & Natural Resources	24280	17281	12722	894	0.0491884	38500	62000	90000
1105	Plant Science And Agronomy	Agriculture & Natural Resources	79409	63043	51077	2070	0.0317909	35000	50000	75000

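Extract the unemployment_rate column. The $ operator returns a numeric vector rather than a list, but the purrr functions below accept atomic vectors as well as lists.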

major_unemp <- college_all_ages$unemployment_rate

The set_names() function assigns a name to each element of a vector or list.
In this example, each unemployment rate is named after its corresponding major:

major_unemp <- set_names(major_unemp,college_all_ages$major)
head(major_unemp)

##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606 
##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890 
##                          Food Science            Plant Science And Agronomy 
##                            0.04918845                            0.03179089

Section 1 - selecting particular elements from a list:
1. pluck(): select an element by name or index.
For example, we can select the first element by name or by position:

pluck(major_unemp, "General Agriculture")

## [1] 0.02614711

# or
pluck(major_unemp, 1)

## [1] 0.02614711
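
pluck() can also reach into nested lists, taking one index per level, and it returns NULL (or a supplied .default) instead of an error when an index does not exist. A small sketch on a hypothetical toy list, not taken from college_all_ages:

```r
library(purrr)

# Hypothetical toy nested list, for illustration only
majors <- list(
  agriculture = list(name = "General Agriculture", rate = 0.026),
  food = list(name = "Food Science", rate = 0.049)
)

# One index per level: the element "food", then its element "rate"
pluck(majors, "food", "rate")                      # 0.049

# Names and positions can be mixed: first element, then its "name"
pluck(majors, 1, "name")                           # "General Agriculture"

# A missing index gives NULL, or .default when one is supplied
pluck(majors, "history", "rate")                   # NULL
pluck(majors, "history", "rate", .default = NA)    # NA
```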

2. keep(): keep all elements that pass a logical test.
For example, we can keep all elements whose value is greater than 0.05:

res <- keep(major_unemp, function(x) x > 0.05)
head(res)

##                         Soil Science                Environmental Science 
##                           0.05086705                           0.05128983 
##         Natural Resources Management                         Architecture 
##                           0.05434128                           0.08599113 
## Area Ethnic And Civilization Studies                       Communications 
##                           0.06793896                           0.06436031

3. discard(): drop all elements that pass a logical test, keeping only those that fail it.
For example, we can keep all elements whose value is NOT greater than 0.05:

res <- discard(major_unemp, function(x) x > 0.05)
head(res)

##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606 
##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890 
##                          Food Science            Plant Science And Agronomy 
##                            0.04918845                            0.03179089
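
Because keep() and discard() apply the same test, together they split a vector into two non-overlapping parts. The sketch below checks this on a hypothetical toy vector and uses purrr's formula shorthand, where ~ .x > 0.05 stands for function(x) x > 0.05:

```r
library(purrr)

# Hypothetical toy unemployment rates, for illustration only
rates <- c(a = 0.03, b = 0.06, c = 0.04, d = 0.09)

high <- keep(rates, ~ .x > 0.05)     # b and d
low  <- discard(rates, ~ .x > 0.05)  # a and c

# Every element lands in exactly one of the two results
length(high) + length(low) == length(rates)  # TRUE
```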

4. head_while(): take elements from the beginning of a list up to, but not including, the first element that fails a logical test.
For example, we can take elements from the beginning of the list for as long as their value is greater than 0.02:

res <- head_while(major_unemp, function(x) x > 0.02)
head(res)

##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606 
##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890 
##                          Food Science            Plant Science And Agronomy 
##                            0.04918845                            0.03179089

5. tail_while(): take elements from the end of a list, working backward, up to but not including the last element that fails a logical test.
For example, we can take elements from the end of the list for as long as their value is greater than 0.05:

res <- tail_while(major_unemp, function(x) x > 0.05)
head(res)

## Miscellaneous Business & Medical Administration 
##                                      0.05267856 
##                                         History 
##                                      0.06585101 
##                           United States History 
##                                      0.07349961
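
Unlike keep(), head_while() and tail_while() stop at the first failing element, so passing elements beyond it are not included. A small comparison on a hypothetical toy vector:

```r
library(purrr)

# Hypothetical toy vector: the third value breaks the run
x <- c(0.03, 0.04, 0.01, 0.06)

keep(x, ~ .x > 0.02)        # 0.03 0.04 0.06 (every passing value)
head_while(x, ~ .x > 0.02)  # 0.03 0.04 (stops before 0.01)
tail_while(x, ~ .x > 0.02)  # 0.06 (from the end, stops before 0.01)
```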

Section 2 - logical operations:
1. every(): check whether all elements of a list pass a logical test.
For example, we can check whether all elements are greater than 0:

every(major_unemp,function(x) x > 0)

## [1] FALSE

2. some(): check whether at least one element of a list passes a logical test.
For example, we can check whether any element is equal to 0:

some(major_unemp,function(x) x == 0)

## [1] TRUE

3. has_element(): check whether a value is an element of a list.
For example, we can check whether 0 is in the list:

has_element(major_unemp,0)

## [1] TRUE
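
For a plain numeric vector, these three functions give the same answers as the base R idioms all(), any(), and %in%. One difference worth knowing: has_element() compares values with identical(), so it is stricter than %in% about types. A sketch on a hypothetical toy vector:

```r
library(purrr)

# Hypothetical toy vector containing a zero
x <- c(0.03, 0, 0.06)

every(x, ~ .x > 0)   # FALSE, like all(x > 0)
some(x, ~ .x == 0)   # TRUE, like any(x == 0)
has_element(x, 0)    # TRUE, like 0 %in% x

# identical() distinguishes the integer 0L from the double 0
has_element(x, 0L)   # FALSE, although 0L %in% x is TRUE
```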

4. detect(): find the first element that passes a logical test.
For example, we can find the first element that is greater than 0.08:

detect(major_unemp,function(x) x > 0.08)

## [1] 0.08599113

5. detect_index(): find the index of the first element that passes a logical test.
For example, we can find the index of the first element that is greater than 0.08:

detect_index(major_unemp,function(x) x > 0.08)

## [1] 12
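
When no element passes the test, detect() returns NULL and detect_index() returns 0, which makes the "not found" case easy to check. A sketch on a hypothetical toy vector:

```r
library(purrr)

# Hypothetical toy vector, for illustration only
x <- c(0.03, 0.09, 0.04, 0.12)

detect(x, ~ .x > 0.08)        # 0.09, the first passing value
detect_index(x, ~ .x > 0.08)  # 2, its position
detect(x, ~ .x > 0.5)         # NULL: nothing passes
detect_index(x, ~ .x > 0.5)   # 0: nothing passes

# Base R equivalent of detect_index() when there is a match
which(x > 0.08)[1]            # 2
```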

Section 3 - applying functions to elements, optionally conditionally:
1. modify(): apply a function to each element of a list and return an object of the same type as the input (here, a named numeric vector).
For example, multiply all values by 100:

res <- modify(major_unemp, function(x) x * 100)
head(res)

##                   General Agriculture Agriculture Production And Management 
##                              2.614711                              2.863606 
##                Agricultural Economics                       Animal Sciences 
##                              3.024832                              4.267890 
##                          Food Science            Plant Science And Agronomy 
##                              4.918845                              3.179089

2. modify_at(): apply a function only to the elements selected by name or index.
For example, multiply only the first element by 100:

res <- modify_at(major_unemp,1, function(x) x * 100)
head(res)

##                   General Agriculture Agriculture Production And Management 
##                            2.61471060                            0.02863606 
##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890 
##                          Food Science            Plant Science And Agronomy 
##                            0.04918845                            0.03179089

3. modify_if(): apply a function only to the elements that pass a logical test.
For example, multiply the elements that are greater than 0.05 by 100:

res <- modify_if(major_unemp, function(x) x > 0.05, function(x) x * 100)
tail(res)

##                          International Business 
##                                      7.13537080 
##                          Hospitality Management 
##                                      5.14469830 
##   Management Information Systems And Statistics 
##                                      0.04397714 
## Miscellaneous Business & Medical Administration 
##                                      5.26785610 
##                                         History 
##                                      6.58510060 
##                           United States History 
##                                      7.34996100
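
The practical difference between modify() and map() is the output type: map() always returns a list, while modify() returns the same type as its input. modify_at() also accepts names instead of positions. A sketch on a hypothetical named toy vector:

```r
library(purrr)

# Hypothetical toy named vector, for illustration only
x <- c(a = 0.02, b = 0.06, c = 0.04)

class(map(x, ~ .x * 100))     # "list"
class(modify(x, ~ .x * 100))  # "numeric", with the names kept

# Select the element by name instead of position: only b becomes 6
modify_at(x, "b", ~ .x * 100)

# Apply only where the test is TRUE: again only b changes
modify_if(x, ~ .x > 0.05, ~ .x * 100)
```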

Section 4 - combining lists:
1. append() (a base R function, not part of purrr): add another list to the end of a list.
For example, append the 3rd and 4th elements to the 1st and 2nd elements:

append(major_unemp[1:2], major_unemp[3:4])

##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606 
##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890

2. prepend(): add another list to the beginning of a list.
For example, add the 3rd and 4th elements before the 1st and 2nd elements:

prepend(major_unemp[1:2], major_unemp[3:4])

##                Agricultural Economics                       Animal Sciences 
##                            0.03024832                            0.04267890 
##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606
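
base::append() takes an after argument that sets the insertion point, so adding values at the beginning, as prepend() does above, can also be written as append(..., after = 0). A sketch on hypothetical toy vectors:

```r
# Base R only; hypothetical toy vectors, for illustration only
x <- c(a = 1, b = 2)
y <- c(c = 3, d = 4)

append(x, y)             # a b c d: y added at the end
append(x, y, after = 0)  # c d a b: y added at the beginning
append(x, y, after = 1)  # a c d b: y inserted after the first element
```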

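3. splice(): combine lists and other objects into a single list. The elements of list arguments are spliced in one by one, while each non-list argument (such as the named numeric vectors here) is kept as a single element.
For example, combine the 1st and 2nd elements, the 3rd and 4th elements, and a new element named "new value" holding the character string "0.123" into one list: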

splice(major_unemp[1:2], major_unemp[3:4], "new value" = "0.123")

## [[1]]
##                   General Agriculture Agriculture Production And Management 
##                            0.02614711                            0.02863606 
## 
## [[2]]
## Agricultural Economics        Animal Sciences 
##             0.03024832             0.04267890 
## 
## $`new value`
## [1] "0.123"

Other functions:

cross2(): using elements from 2 lists, create a list of all possible pairs, each combining 1 element from the first list with 1 element from the second list.
For example, create all combinations of the first 2 values of the major column and the first 2 values of the unemployment_rate column:

cross2(college_all_ages$major[1:2],college_all_ages$unemployment_rate[1:2])

## [[1]]
## [[1]][[1]]
## [1] "General Agriculture"
## 
## [[1]][[2]]
## [1] 0.02614711
## 
## 
## [[2]]
## [[2]][[1]]
## [1] "Agriculture Production And Management"
## 
## [[2]][[2]]
## [1] 0.02614711
## 
## 
## [[3]]
## [[3]][[1]]
## [1] "General Agriculture"
## 
## [[3]][[2]]
## [1] 0.02863606
## 
## 
## [[4]]
## [[4]][[1]]
## [1] "Agriculture Production And Management"
## 
## [[4]][[2]]
## [1] 0.02863606
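
cross2() is deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid(). Base R's expand.grid() builds the same combinations as a data frame and, like cross2(), varies the first input fastest, so its rows come in the same order as the list above. A sketch with hypothetical toy values:

```r
# Base R only; hypothetical toy values, for illustration only
majors <- c("Major A", "Major B")
rates  <- c(0.026, 0.029)

combos <- expand.grid(major = majors, rate = rates,
                      stringsAsFactors = FALSE)
combos
# 4 rows: (A, 0.026), (B, 0.026), (A, 0.029), (B, 0.029)
```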

array_tree(): turn an array into a list. For a one-dimensional vector such as major_unemp, the result is a list with one element per value:

class(major_unemp)

## [1] "numeric"

res <- array_tree(major_unemp)
class(res)

## [1] "list"
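
For a matrix or higher-dimensional array, array_tree() creates one level of nesting per dimension, so element [i, j] of a matrix ends up at [[i]][[j]] of the list. A sketch on a hypothetical 2 x 3 matrix:

```r
library(purrr)

# Hypothetical toy matrix; its columns are 1:2, 3:4 and 5:6
m <- matrix(1:6, nrow = 2)

tree <- array_tree(m)
length(tree)       # 2: one branch per row
length(tree[[1]])  # 3: one leaf per column
tree[[2]][[3]]     # 6, the same as m[2, 3]
```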

map2(): apply a function to each pair of corresponding elements from 2 lists and return the results as a list.
For example, we combine each element of the major_category column with the corresponding element of the unemployment_rate column into a 2-element list. The result is a list of 2-element lists, which we then name by major.

major_unemp_cate <- map2(college_all_ages$major_category, college_all_ages$unemployment_rate, function(x,y) list(x,y))
major_unemp_cate <- set_names(major_unemp_cate,college_all_ages$major)
head(major_unemp_cate)

## $`General Agriculture`
## $`General Agriculture`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`General Agriculture`[[2]]
## [1] 0.02614711
## 
## 
## $`Agriculture Production And Management`
## $`Agriculture Production And Management`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`Agriculture Production And Management`[[2]]
## [1] 0.02863606
## 
## 
## $`Agricultural Economics`
## $`Agricultural Economics`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`Agricultural Economics`[[2]]
## [1] 0.03024832
## 
## 
## $`Animal Sciences`
## $`Animal Sciences`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`Animal Sciences`[[2]]
## [1] 0.0426789
## 
## 
## $`Food Science`
## $`Food Science`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`Food Science`[[2]]
## [1] 0.04918845
## 
## 
## $`Plant Science And Agronomy`
## $`Plant Science And Agronomy`[[1]]
## [1] "Agriculture & Natural Resources"
## 
## $`Plant Science And Agronomy`[[2]]
## [1] 0.03179089
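
When the function returns a single number for each pair, a typed variant such as map2_dbl() returns a numeric vector instead of a list. A sketch with hypothetical toy inputs, not taken from college_all_ages:

```r
library(purrr)

# Hypothetical toy inputs: group sizes and a share for each group
sizes <- c(100, 200)
shares <- c(0.5, 0.25)

map2(sizes, shares, ~ .x * .y)      # a list holding 50 and 50
map2_dbl(sizes, shares, ~ .x * .y)  # a numeric vector: 50 50
```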

invoke_map(): given a list of functions and a list of arguments of the same length, apply each function to its corresponding argument. For example, we can multiply the first element by 100 and multiply the second element by -1:

res <- invoke_map(list(function(x) x * 100, function(y) y * -1), major_unemp[1:2])
head(res)

## [[1]]
## [1] 2.614711
## 
## [[2]]
## [1] -0.02863606
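
invoke_map() is deprecated as of purrr 1.0.0. The same pairing of functions with arguments can be written with map2(), passing each function and its argument to a small wrapper that calls one on the other. A sketch with hypothetical toy inputs:

```r
library(purrr)

# The two functions from the example above, with hypothetical toy arguments
fns <- list(function(x) x * 100, function(y) y * -1)
args <- c(0.5, 0.25)

# Call each function on the argument in the same position
res <- map2(fns, args, function(f, x) f(x))
res  # list(50, -0.25)
```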

flatten(): remove one level of nesting from a list.
For example, we can flatten the list of 2-element lists created above with map2() into a single list, which doubles its length from 173 to 346:

length(major_unemp_cate)

## [1] 173

res <- flatten(major_unemp_cate)
length(res)

## [1] 346
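
On an unnamed nested list, flatten() gives the same result as base R's unlist() with recursive = FALSE, which also removes exactly one level of nesting. In purrr 1.0.0 and later, flatten() is superseded by list_flatten(). A sketch on a hypothetical toy list:

```r
library(purrr)

# Hypothetical toy list of 2-element lists
nested <- list(list(1, "a"), list(2, "b"))

flat <- flatten(nested)                              # list(1, "a", 2, "b")
identical(flat, unlist(nested, recursive = FALSE))  # TRUE
```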

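transpose(): swap the first two levels of indexing in a multi-level list, so that x[[i]][[j]] becomes x[[j]][[i]].
For example, we can transpose the list of 2-element lists created above with map2(). The result is a list of 2 lists: one holding the 173 major categories and one holding the 173 unemployment rates.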

length(major_unemp_cate)

## [1] 173

res <- transpose(major_unemp_cate)
length(res)

## [1] 2
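
A toy example makes the index swap visible: an outer list of "rows" becomes an outer list of "columns", and the original outer names move to the inner level. A sketch on a hypothetical two-element list:

```r
library(purrr)

# Hypothetical toy list of 2-element lists, named like major_unemp_cate
rows <- list(
  a = list("Agriculture", 0.03),
  b = list("Arts", 0.05)
)

cols <- transpose(rows)
length(cols)      # 2: one list per inner position
cols[[1]]         # list(a = "Agriculture", b = "Arts")
cols[[2]][["b"]]  # 0.05, the same as rows[["b"]][[2]]
```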

reduce(): combine the elements of a list into a single value by applying a 2-argument function iteratively from left to right, f(f(f(x1, x2), x3), x4) and so on.
For example, we can add up all elements of the list:

reduce(major_unemp, sum)

## [1] 9.922493
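
reduce() folds from the left: the function receives the running result as its first argument and the next element as its second. Pasting strings makes the order of the calls visible. A sketch with hypothetical toy values:

```r
library(purrr)

reduce(1:4, `+`)  # 10, computed as ((1 + 2) + 3) + 4

# Show how the calls nest: acc is the running result, x the next element
reduce(c("a", "b", "c"), function(acc, x) paste0("(", acc, " + ", x, ")"))
# "((a + b) + c)"
```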

accumulate(): the same as reduce(), but it also returns all intermediate results (here, a named vector of running totals):

res <- accumulate(major_unemp, sum)
tail(res)

##                          International Business 
##                                        9.635040 
##                          Hospitality Management 
##                                        9.686487 
##   Management Information Systems And Statistics 
##                                        9.730464 
## Miscellaneous Business & Medical Administration 
##                                        9.783143 
##                                         History 
##                                        9.848994 
##                           United States History 
##                                        9.922493
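
With `+` (or sum) as the function, accumulate() produces running totals, the same as base R's cumsum(), and its last value equals the reduce() result. A sketch on hypothetical toy values:

```r
library(purrr)

# Hypothetical toy values
x <- c(1, 2, 3, 4)

running <- accumulate(x, `+`)
running                             # 1 3 6 10
all.equal(running, cumsum(x))       # TRUE
tail(running, 1) == reduce(x, `+`)  # TRUE
```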

===========================================================================================

Extended purrr - Euclid Zhang.RMD By Shovan Biswas

Added dplyr function select().

select capability tutorial

Description: select() keeps only the columns (variables) you name.
Usage: select(data, …)
Example: To keep only the major_category and total columns, pipe college_all_ages into select():

major_category1 <- college_all_ages %>% select(major_category, total)
major_category1

## # A tibble: 173 x 2
##    major_category                   total
##    <chr>                            <int>
##  1 Agriculture & Natural Resources 128148
##  2 Agriculture & Natural Resources  95326
##  3 Agriculture & Natural Resources  33955
##  4 Agriculture & Natural Resources 103549
##  5 Agriculture & Natural Resources  24280
##  6 Agriculture & Natural Resources  79409
##  7 Agriculture & Natural Resources   6586
##  8 Agriculture & Natural Resources   8549
##  9 Biology & Life Science          106106
## 10 Agriculture & Natural Resources  69447
## # ... with 163 more rows
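
select() also accepts helper functions such as starts_with() and negative selections, so related columns do not have to be listed one by one. A sketch on a hypothetical toy tibble whose column names mimic college_all_ages:

```r
library(dplyr)

# Hypothetical toy tibble, for illustration only
toy <- tibble(
  major = c("Major A", "Major B"),
  total = c(100, 200),
  employed = c(80, 150),
  employed_fulltime_yearround = c(60, 120)
)

# major plus every column whose name starts with "employed"
toy %>% select(major, starts_with("employed"))

# every column except total
toy %>% select(-total)
```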

Added dplyr function summarise().

summarise capability tutorial

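Description: summarise() reduces each group of rows to a single row of summary values; combined with group_by(), it computes a statistic for each level of another variable.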
Usage: summarise(data, …)
Example: Sum total within each major_category:

total1 <- major_category1 %>% group_by(major_category) %>% summarise(Total = sum(total))

## `summarise()` ungrouping output (override with `.groups` argument)

kable(total1)

major_category	Total
Agriculture & Natural Resources	632437
Arts	1805865
Biology & Life Science	1338186
Business	9858741
Communications & Journalism	1803822
Computers & Mathematics	1781378
Education	4700118
Engineering	3576013
Health	2950859
Humanities & Liberal Arts	3738335
Industrial Arts & Consumer Services	1033798
Interdisciplinary	45199
Law & Public Policy	902926
Physical Sciences	1025318
Psychology & Social Work	1987278
Social Science	2654125
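
summarise() can compute several statistics at once, and passing .groups = "drop" (dplyr 1.0.0 or later) returns an ungrouped result and avoids the ungrouping message shown above. A sketch on a hypothetical toy tibble:

```r
library(dplyr)

# Hypothetical toy tibble, for illustration only
toy <- tibble(
  major_category = c("Arts", "Arts", "Health"),
  total = c(100, 300, 50)
)

by_category <- toy %>%
  group_by(major_category) %>%
  summarise(Total = sum(total), Majors = n(), Mean = mean(total),
            .groups = "drop")
by_category
# Arts: Total 400, Majors 2, Mean 200; Health: Total 50, Majors 1, Mean 50
```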

Added dplyr function top_n().

top_n capability tutorial

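Description: top_n() selects the top n rows, ranked by a weighting variable (by default, the last column). It does not sort the rows, so the output below stays in alphabetical order.
Usage: top_n(x, n, wt)
Example: top_n(10) keeps the 10 majors with the most employed graduates.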

topten <- college_all_ages %>% group_by(major) %>% summarise(Employed = sum(employed)) %>% top_n(10)

## `summarise()` ungrouping output (override with `.groups` argument)

## Selecting by Employed

topten

## # A tibble: 10 x 2
##    major                                  Employed
##    <chr>                                     <int>
##  1 Accounting                              1335825
##  2 Business Management And Administration  2354398
##  3 Communications                           790696
##  4 Elementary Education                     819393
##  5 English Language And Literature          708882
##  6 General Business                        1580978
##  7 General Education                        843693
##  8 Marketing And Marketing Research         890125
##  9 Nursing                                 1325711
## 10 Psychology                              1055854
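
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which names the ranking variable explicitly and returns the rows sorted from largest to smallest. A sketch on a hypothetical toy tibble:

```r
library(dplyr)

# Hypothetical toy tibble, for illustration only
toy <- tibble(
  major = c("Major A", "Major B", "Major C", "Major D"),
  Employed = c(500, 900, 100, 700)
)

top_two <- toy %>% slice_max(Employed, n = 2)
top_two  # Major B (900), then Major D (700)
```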

Added ggplot2 function ggplot().

ggplot capability tutorial

# geom_col() draws bars whose heights are the y values, like geom_bar(stat = "identity")
ggplot(topten, aes(x = reorder(major, Employed), y = Employed)) +
    geom_col(fill = "steelblue") +
    # Label each bar with its count, right-aligned at the end of the bar
    geom_text(aes(label = Employed), hjust = 1, color = "black") +
    ggtitle("Number of employed graduates, top 10 majors") +
    xlab("major") + ylab("employed") +
    # Flip the axes so the major names read as horizontal bars
    coord_flip()

Marker: 607-13B