TidyVerse 02
The purrr package provides functional-programming tools for working with lists and vectors. This assignment demonstrates how to use its functions.
Load the purrr package, with dplyr as a helper. We will use the college_all_ages data set from the fivethirtyeight package.
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
First, let's look at the first few rows of the college_all_ages data set:
| major_code | major | major_category | total | employed | employed_fulltime_yearround | unemployed | unemployment_rate | p25th | median | p75th |
|---|---|---|---|---|---|---|---|---|---|---|
| 1100 | General Agriculture | Agriculture & Natural Resources | 128148 | 90245 | 74078 | 2423 | 0.0261471 | 34000 | 50000 | 80000 |
| 1101 | Agriculture Production And Management | Agriculture & Natural Resources | 95326 | 76865 | 64240 | 2266 | 0.0286361 | 36000 | 54000 | 80000 |
| 1102 | Agricultural Economics | Agriculture & Natural Resources | 33955 | 26321 | 22810 | 821 | 0.0302483 | 40000 | 63000 | 98000 |
| 1103 | Animal Sciences | Agriculture & Natural Resources | 103549 | 81177 | 64937 | 3619 | 0.0426789 | 30000 | 46000 | 72000 |
| 1104 | Food Science | Agriculture & Natural Resources | 24280 | 17281 | 12722 | 894 | 0.0491884 | 38500 | 62000 | 90000 |
| 1105 | Plant Science And Agronomy | Agriculture & Natural Resources | 79409 | 63043 | 51077 | 2070 | 0.0317909 | 35000 | 50000 | 75000 |
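Extract the unemployment_rate column. It is a numeric vector rather than a true list, but the purrr functions below accept atomic vectors as well as lists.
The set_names() function assigns a name to each element of a vector or list.
In this example, I name each element after its corresponding major: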
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
## Food Science Plant Science And Agronomy
## 0.04918845 0.03179089
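The code chunk that produced this output is not shown in this rendering. A minimal sketch of the step, assuming the vector was built with set_names(): the names `majors`, `rates`, and `unemp_rate` are hypothetical, and the six values are typed in from the table above so the block runs without the fivethirtyeight package.

```r
library(purrr)

# First six majors and unemployment rates, copied from the table above.
majors <- c(
  "General Agriculture", "Agriculture Production And Management",
  "Agricultural Economics", "Animal Sciences",
  "Food Science", "Plant Science And Agronomy"
)
rates <- c(0.0261471, 0.0286361, 0.0302483, 0.0426789, 0.0491884, 0.0317909)

# set_names() returns the vector with each element labelled by its major.
unemp_rate <- set_names(rates, majors)
unemp_rate
```

With the full data set, the same call would presumably take college_all_ages$unemployment_rate and college_all_ages$major as its two arguments.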
Section 1 - selecting particular elements from a list:
1. pluck(): select an element by name or index.
For example: we can select the first element by its index or by its name
## [1] 0.02614711
## [1] 0.02614711
2. keep(): select all elements that pass a logical test.
For example: we can select all elements whose value is greater than 0.05
## Soil Science Environmental Science
## 0.05086705 0.05128983
## Natural Resources Management Architecture
## 0.05434128 0.08599113
## Area Ethnic And Civilization Studies Communications
## 0.06793896 0.06436031
3. discard(): select all elements that DO NOT pass a logical test.
For example: we can select all elements whose value is NOT greater than 0.05
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
## Food Science Plant Science And Agronomy
## 0.04918845 0.03179089
4. head_while(): select elements from the beginning of a list, stopping before the first element that fails a logical test.
For example: we can select the leading elements whose value is greater than 0.02
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
## Food Science Plant Science And Agronomy
## 0.04918845 0.03179089
5. tail_while(): select elements from the end of a list, working backwards and stopping before the first element that fails a logical test.
For example: we can select the trailing elements whose value is greater than 0.05
## Miscellaneous Business & Medical Administration
## 0.05267856
## History
## 0.06585101
## United States History
## 0.07349961
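The calls behind Section 1 are not shown in this rendering. The sketch below is one plausible form of them, not the author's code. It uses only the six rates printed earlier (`unemp_rate` is a hypothetical name), so the thresholds are lowered from 0.05 to values that select something in this small subset.

```r
library(purrr)

# Six rates from the output above; `unemp_rate` is a hypothetical name.
unemp_rate <- c(
  "General Agriculture" = 0.02614711,
  "Agriculture Production And Management" = 0.02863606,
  "Agricultural Economics" = 0.03024832,
  "Animal Sciences" = 0.04267890,
  "Food Science" = 0.04918845,
  "Plant Science And Agronomy" = 0.03179089
)

first_by_index <- pluck(unemp_rate, 1)                      # by position
first_by_name  <- pluck(unemp_rate, "General Agriculture")  # by name

above  <- keep(unemp_rate, ~ .x > 0.04)     # elements that pass the test
others <- discard(unemp_rate, ~ .x > 0.04)  # elements that fail it

# Leading run: stops before the first element that is not below 0.04.
leading  <- head_while(unemp_rate, ~ .x < 0.04)
# Trailing run: walks back from the end until an element is not above 0.03.
trailing <- tail_while(unemp_rate, ~ .x > 0.03)
```

Note that keep() and discard() scan the whole vector, while head_while() and tail_while() stop at the first failure, so later elements that would pass are not returned.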
Section 2 - logical operations:
1. every(): check whether all elements in a list pass a logical test
For example: we can check if all elements are greater than 0
## [1] FALSE
2. some(): check whether at least one element in a list passes a logical test
For example: we can check if some elements are 0
## [1] TRUE
3. has_element(): check whether a value is an element of a list
For example: we can check if 0 is in the list or not
## [1] TRUE
4. detect(): find the first element that passes a logical test
For example: we can find the first element that is greater than 0.08
## [1] 0.08599113
5. detect_index(): find the index of the first element that passes a logical test
For example: we can find the index of the first element that is greater than 0.08
## [1] 12
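The Section 2 calls are likewise missing from this rendering. A hedged sketch on a four-element subset (`unemp_rate` is a hypothetical name): because this subset contains no zero rates, every() and has_element() return the opposite of the results shown above for the full data.

```r
library(purrr)

# Four rates from the earlier output; `unemp_rate` is a hypothetical name.
unemp_rate <- c(
  "General Agriculture" = 0.02614711,
  "Agricultural Economics" = 0.03024832,
  "Animal Sciences" = 0.04267890,
  "Food Science" = 0.04918845
)

all_positive  <- every(unemp_rate, ~ .x > 0)            # TRUE for this subset
any_high      <- some(unemp_rate, ~ .x > 0.04)          # TRUE
has_zero      <- has_element(unemp_rate, 0)             # FALSE for this subset
first_high    <- detect(unemp_rate, ~ .x > 0.04)        # value of the first match
first_high_at <- detect_index(unemp_rate, ~ .x > 0.04)  # position of that match
```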
Section 3 - applying functions conditionally:
1. modify(): apply a function to each of the elements in a list
For example: multiply all values in the list by 100
## General Agriculture Agriculture Production And Management
## 2.614711 2.863606
## Agricultural Economics Animal Sciences
## 3.024832 4.267890
## Food Science Plant Science And Agronomy
## 4.918845 3.179089
2. modify_at(): apply a function to the elements selected by name or index
For example: multiply the first element by 100
## General Agriculture Agriculture Production And Management
## 2.61471060 0.02863606
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
## Food Science Plant Science And Agronomy
## 0.04918845 0.03179089
3. modify_if(): apply a function to the elements that pass a logical test
For example: multiply the elements that are greater than 0.05 by 100
## International Business
## 7.13537080
## Hospitality Management
## 5.14469830
## Management Information Systems And Statistics
## 0.04397714
## Miscellaneous Business & Medical Administration
## 5.26785610
## History
## 6.58510060
## United States History
## 7.34996100
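A sketch of the three modify variants on a small subset, since the original chunk is not shown. The vector name `unemp_rate` and the 0.04 threshold are assumptions chosen so that the condition matches something in four values.

```r
library(purrr)

# Four rates from the earlier output; `unemp_rate` is a hypothetical name.
unemp_rate <- c(
  "General Agriculture" = 0.02614711,
  "Agricultural Economics" = 0.03024832,
  "Animal Sciences" = 0.04267890,
  "Food Science" = 0.04918845
)

# Every element becomes a percentage.
scaled_all <- modify(unemp_rate, ~ .x * 100)

# Only the element at position 1 is changed.
scaled_first <- modify_at(unemp_rate, 1, ~ .x * 100)

# Only elements above 0.04 are changed; the rest pass through untouched.
scaled_high <- modify_if(unemp_rate, ~ .x > 0.04, ~ .x * 100)
```

Unlike map(), which always returns a list, the modify family returns an object of the same type as its input, which is why the results above print as numeric vectors.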
Section 4 - combining lists:
1. append(): append another list to the end of a list
For example: append the 3rd and 4th elements to the 1st and 2nd elements
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
2. prepend(): add another list to the beginning of a list
For example: add the 3rd and 4th elements to the beginning of the 1st and 2nd elements
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
3. splice(): combine lists and other elements into one list
For example: combine the 1st and 2nd elements, the 3rd and 4th elements, and a new element “0.123” into one list
## [[1]]
## General Agriculture Agriculture Production And Management
## 0.02614711 0.02863606
##
## [[2]]
## Agricultural Economics Animal Sciences
## 0.03024832 0.04267890
##
## $`new value`
## [1] "0.123"
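prepend() and splice() are deprecated as of purrr 1.0.0, so the sketch below reproduces the three results with base R only. This is a swapped-in technique, not the calls the author ran, and `first_two`, `next_two`, and the result names are hypothetical.

```r
# Named numeric vectors standing in for the 1st-2nd and 3rd-4th elements.
first_two <- c("General Agriculture" = 0.02614711,
               "Agriculture Production And Management" = 0.02863606)
next_two  <- c("Agricultural Economics" = 0.03024832,
               "Animal Sciences" = 0.04267890)

appended  <- append(first_two, next_two)             # next_two goes last
prepended <- append(first_two, next_two, after = 0)  # next_two goes first

# Keep both vectors whole and add one named element, as in the splice() output.
combined <- list(first_two, next_two, "new value" = "0.123")
```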
Other functions:
cross2(): create a list of all possible combinations that take one element from the first list and one element from the second list
For example: create all combinations by using the first 2 values from the major column and the first 2 values from the unemployment rate column
## [[1]]
## [[1]][[1]]
## [1] "General Agriculture"
##
## [[1]][[2]]
## [1] 0.02614711
##
##
## [[2]]
## [[2]][[1]]
## [1] "Agriculture Production And Management"
##
## [[2]][[2]]
## [1] 0.02614711
##
##
## [[3]]
## [[3]][[1]]
## [1] "General Agriculture"
##
## [[3]][[2]]
## [1] 0.02863606
##
##
## [[4]]
## [[4]][[1]]
## [1] "Agriculture Production And Management"
##
## [[4]][[2]]
## [1] 0.02863606
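cross2() is deprecated as of purrr 1.0.0. As a rough base-R equivalent (a swapped-in technique, not the author's call), expand.grid() enumerates the same four combinations in the same order, returning them as rows of a data frame instead of a nested list. The names `majors`, `rates`, and `combos` are hypothetical.

```r
majors <- c("General Agriculture", "Agriculture Production And Management")
rates  <- c(0.02614711, 0.02863606)

# One row per (major, rate) combination; the first column varies fastest.
combos <- expand.grid(major = majors, rate = rates, stringsAsFactors = FALSE)
combos
```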
array_tree(): turn an array into a list
For example: the unemployment rates are a numeric vector before the call and a list after it
## [1] "numeric"
## [1] "list"
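A small sketch of the conversion, assuming the two class() results above came from checking the vector before and after array_tree(). The names `rates` and `as_tree` are hypothetical.

```r
library(purrr)

rates <- c(0.02614711, 0.02863606, 0.03024832)
class(rates)    # "numeric"

# Each element of the vector becomes its own list element.
as_tree <- array_tree(rates)
class(as_tree)  # "list"
```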
map2(): apply a function to each pair of elements from 2 lists and return the results as a list
For example: we combine 1 element from the major_category list and the corresponding element from the unemployment_rate list into a list. The result will be a list of 2-element lists.
# Pair each major's category with its unemployment rate
major_unemp_cate <- map2(college_all_ages$major_category, college_all_ages$unemployment_rate, function(x, y) list(x, y))
# Label each pair with the major it describes
major_unemp_cate <- set_names(major_unemp_cate, college_all_ages$major)
head(major_unemp_cate)
## $`General Agriculture`
## $`General Agriculture`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`General Agriculture`[[2]]
## [1] 0.02614711
##
##
## $`Agriculture Production And Management`
## $`Agriculture Production And Management`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`Agriculture Production And Management`[[2]]
## [1] 0.02863606
##
##
## $`Agricultural Economics`
## $`Agricultural Economics`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`Agricultural Economics`[[2]]
## [1] 0.03024832
##
##
## $`Animal Sciences`
## $`Animal Sciences`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`Animal Sciences`[[2]]
## [1] 0.0426789
##
##
## $`Food Science`
## $`Food Science`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`Food Science`[[2]]
## [1] 0.04918845
##
##
## $`Plant Science And Agronomy`
## $`Plant Science And Agronomy`[[1]]
## [1] "Agriculture & Natural Resources"
##
## $`Plant Science And Agronomy`[[2]]
## [1] 0.03179089
invoke_map(): given a list of functions and a list of elements of the same length, apply each function to its corresponding element.
For example: we can multiply the first element by 100 and multiply the second element by -1
## [[1]]
## [1] 2.614711
##
## [[2]]
## [1] -0.02863606
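invoke_map() is deprecated as of purrr 1.0.0. The same pairing of functions with inputs can be sketched with map2(), which walks both lists in parallel. This is an alternative under that assumption, not the author's call, and `fns`, `inputs`, and `results` are hypothetical names.

```r
library(purrr)

fns    <- list(function(x) x * 100, function(x) x * -1)
inputs <- list(0.02614711, 0.02863606)

# Call the i-th function on the i-th input.
results <- map2(fns, inputs, function(f, x) f(x))
results
```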
flatten(): remove one level of hierarchy from a list
For example: we can flatten the list of 2-element lists just created into a single list; its length goes from 173 to 346
## [1] 173
## [1] 346
transpose(): swap the inner and outer levels of a multi-level list
For example: we can transpose the list of 2-element lists just created, turning 173 lists of 2 elements into 2 lists of 173 elements
## [1] 173
## [1] 2
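The length changes reported for flatten() and transpose() can be reproduced on a three-pair list. This is a sketch with hypothetical names (`pairs`, `flat`, `swapped`); both functions still work but were superseded in purrr 1.0.0 by list_flatten() and list_transpose().

```r
library(purrr)

# Three (category, rate) pairs built the same way as major_unemp_cate.
pairs <- map2(
  rep("Agriculture & Natural Resources", 3),
  c(0.02614711, 0.02863606, 0.03024832),
  function(x, y) list(x, y)
)

flat    <- flatten(pairs)    # one level removed: 3 pairs become 6 elements
swapped <- transpose(pairs)  # 3 lists of 2 become 2 lists of 3

length(pairs)    # 3
length(flat)     # 6
length(swapped)  # 2
```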
reduce(): combine the elements of a list into a single value by repeatedly applying a two-argument function
For example: we can add up all elements in a list
## [1] 9.922493
accumulate(): the same as reduce(), but also returns all intermediate results
For example: the running sum of all elements, whose last value equals the reduce() result above
## International Business
## 9.635040
## Hospitality Management
## 9.686487
## Management Information Systems And Statistics
## 9.730464
## Miscellaneous Business & Medical Administration
## 9.783143
## History
## 9.848994
## United States History
## 9.922493
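A minimal sketch of the two calls on four of the rates, assuming the binary function used was addition. The names `rates`, `total`, and `running` are hypothetical.

```r
library(purrr)

rates <- c(0.02614711, 0.02863606, 0.03024832, 0.04267890)

# ((r1 + r2) + r3) + r4: a single value.
total <- reduce(rates, `+`)

# r1, r1 + r2, r1 + r2 + r3, ...: one value per step.
running <- accumulate(rates, `+`)
```

For addition these match base R's sum() and cumsum(); reduce() and accumulate() are the general form for any two-argument function.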
===========================================================================================
Extended purrr - Euclid Zhang.RMD By Shovan Biswas
Added dplyr function select().
select capability tutorial
Description: select() keeps only the named columns of a data frame.
Usage: select(data, …)
Example: To keep only major_category and total, pass college_all_ages and the two column names to select().
## # A tibble: 173 x 2
## major_category total
## <chr> <int>
## 1 Agriculture & Natural Resources 128148
## 2 Agriculture & Natural Resources 95326
## 3 Agriculture & Natural Resources 33955
## 4 Agriculture & Natural Resources 103549
## 5 Agriculture & Natural Resources 24280
## 6 Agriculture & Natural Resources 79409
## 7 Agriculture & Natural Resources 6586
## 8 Agriculture & Natural Resources 8549
## 9 Biology & Life Science 106106
## 10 Agriculture & Natural Resources 69447
## # ... with 163 more rows
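The select() call itself is not shown in this rendering. A sketch on a hand-typed three-row subset of the table (`college` and `kept` are hypothetical names):

```r
library(dplyr)

# Three rows copied from the table at the top of the document.
college <- data.frame(
  major = c("General Agriculture",
            "Agriculture Production And Management",
            "Agricultural Economics"),
  major_category = rep("Agriculture & Natural Resources", 3),
  total = c(128148, 95326, 33955),
  employed = c(90245, 76865, 26321)
)

# Keep two columns; all rows are retained.
kept <- select(college, major_category, total)
kept
```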
Added dplyr function summarise().
summarise capability tutorial
Description: summarise() collapses a data frame to one row per group, computing the summary statistics you specify.
Usage: summarise(data, …)
Example: Summing total within each major_category.
## `summarise()` ungrouping output (override with `.groups` argument)
| major_category | Total |
|---|---|
| Agriculture & Natural Resources | 632437 |
| Arts | 1805865 |
| Biology & Life Science | 1338186 |
| Business | 9858741 |
| Communications & Journalism | 1803822 |
| Computers & Mathematics | 1781378 |
| Education | 4700118 |
| Engineering | 3576013 |
| Health | 2950859 |
| Humanities & Liberal Arts | 3738335 |
| Industrial Arts & Consumer Services | 1033798 |
| Interdisciplinary | 45199 |
| Law & Public Policy | 902926 |
| Physical Sciences | 1025318 |
| Psychology & Social Work | 1987278 |
| Social Science | 2654125 |
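A sketch of the grouped sum, assuming the table above came from group_by() followed by summarise(). It uses four hand-typed rows, so the totals are smaller than the full-data figures; `college` and `totals` are hypothetical names.

```r
library(dplyr)

# Four (major_category, total) rows copied from the select() output above.
college <- data.frame(
  major_category = c(rep("Agriculture & Natural Resources", 3),
                     "Biology & Life Science"),
  total = c(128148, 95326, 33955, 106106)
)

# One output row per category, holding the sum of `total` within it.
totals <- college %>%
  group_by(major_category) %>%
  summarise(Total = sum(total))
totals
```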
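Added dplyr function top_n().
top_n capability tutorial
Description: top_n() keeps the n rows with the largest values of a ranking column (by default, the last column of the data).
Usage: top_n(x, n, wt)
Example: top_n(10) keeps the ten majors with the most employed graduates.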
topten <- college_all_ages %>%
  group_by(major) %>%
  summarise(Employed = sum(employed)) %>%
  top_n(10)
## `summarise()` ungrouping output (override with `.groups` argument)
## Selecting by Employed
## # A tibble: 10 x 2
## major Employed
## <chr> <int>
## 1 Accounting 1335825
## 2 Business Management And Administration 2354398
## 3 Communications 790696
## 4 Elementary Education 819393
## 5 English Language And Literature 708882
## 6 General Business 1580978
## 7 General Education 843693
## 8 Marketing And Marketing Research 890125
## 9 Nursing 1325711
## 10 Psychology 1055854
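To illustrate the wt argument, here is a sketch on six hand-typed rows (`employed_by_major` and `top_two` are hypothetical names). Naming the ranking column explicitly avoids the "Selecting by Employed" message seen above. Note that top_n() was superseded by slice_max() in dplyr 1.0.0, though it still works.

```r
library(dplyr)

# Employed counts for the six majors in the table at the top of the document.
employed_by_major <- data.frame(
  major = c("General Agriculture", "Agriculture Production And Management",
            "Agricultural Economics", "Animal Sciences",
            "Food Science", "Plant Science And Agronomy"),
  Employed = c(90245, 76865, 26321, 81177, 17281, 63043)
)

# Keep the two rows with the largest Employed values.
top_two <- top_n(employed_by_major, 2, Employed)
top_two
```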
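Added ggplot2 function ggplot().
ggplot capability tutorial
Description: ggplot() draws the topten data as a horizontal bar chart, one bar per major, labelled with the number employed.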
# Horizontal bar chart of the ten majors, ordered by number employed
ggplot(topten, aes(x = reorder(major, Employed), y = Employed)) +
  geom_bar(stat = "identity", position = position_dodge(), fill = "steelblue") +
  geom_text(aes(label = Employed), vjust = .5, hjust = 1, position = position_dodge(width = 0.9), color = "black") +
  ggtitle("Number of employed graduates in the top 10 majors") +
  xlab("major") + ylab("employed") +
  coord_flip()