Let’s reprise some of the purrr toolkit. It’s a very general way to solve the problems we meet in applied statistics.
First, our friend map:
library(tidyverse)
library(magrittr)
library(palmerpenguins)
map(
.x = c(2, 4, 8, 16, 32),
.f = function(i){
i ^ 2
}
)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 16
##
## [[3]]
## [1] 64
##
## [[4]]
## [1] 256
##
## [[5]]
## [1] 1024
map(
.x = 1:5,
.f = function(i){
letters %>%
extract(1:i)
}
)
## [[1]]
## [1] "a"
##
## [[2]]
## [1] "a" "b"
##
## [[3]]
## [1] "a" "b" "c"
##
## [[4]]
## [1] "a" "b" "c" "d"
##
## [[5]]
## [1] "a" "b" "c" "d" "e"
What does each argument do?
.x is the list or vector you’re operating on. This is a very general thing to work with: factor levels, formulas, lists of tibbles, etc. The obstacle here is really only your imagination.
.f is the function, or operation, you’re going to apply to every element of .x. (You’ve seen me using \() as a shortcut for function().)
i is just a variable or index. Each element of .x that you apply .f to is referred to as i.
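To make the shorthand concrete, here is a small sketch (my own toy example, assuming R 4.1 or later, where the \() syntax was introduced). It shows that the long and short forms give the same result, and that the name of the index is entirely up to you.

```r
library(purrr)

# The long form, with function(), and the short form, with \(),
# define the same anonymous function.
squares_long  <- map(c(2, 4, 8), function(i) i^2)
squares_short <- map(c(2, 4, 8), \(i) i^2)

# The index name is arbitrary: `value` works just as well as `i`.
squares_named <- map(c(2, 4, 8), \(value) value^2)

identical(squares_long, squares_short)
identical(squares_short, squares_named)
```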
Just to make sure you understand what’s going on, and how general a toolkit map is:
1:4 %>%
map(
\(i)
str_c(
i, " * ", i, " = ", i*i
)
)
list(
c("Alabama", "Alaska", "Arizona"),
c("California", "Colorado", "Connecticut"),
c("Maine", "Maryland", "Massachusetts","Michigan",
"Minnesota", "Mississippi", "Missouri", "Montana")
) %>%
map(
\(g)
us_rent_income %>%
filter(
NAME %>%
is_in(g)
)
)
c("bill", "flipper", "body") %>%
map(
\(k)
penguins %>%
select(
species, island, starts_with(k)
)
)
As I’ve said elsewhere, you’re really only limited by your ingenuity.
map also has some cousin functions which require your output to have a specific form (programmers care a great deal about type stability). These cousins include map_chr, map_int, map_lgl, map_dbl, etc.
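Here is a minimal sketch of the cousins at work, reusing the squares example from the top of the page (the object names are mine, purely for illustration). Each variant returns an atomic vector of the promised type rather than a list, and will throw an error if .f returns something else.

```r
library(purrr)

x <- c(2, 4, 8, 16, 32)

# map() always returns a list
sq_list <- map(x, \(i) i^2)

# map_dbl() returns a numeric (double) vector
sq_dbl <- map_dbl(x, \(i) i^2)

# map_chr() returns a character vector
sq_chr <- map_chr(x, \(i) paste(i, "squared"))

# map_lgl() returns a logical vector
big_lgl <- map_lgl(x, \(i) i^2 > 100)
```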
map2 is a very useful special case, where you have two lists of equal length, and you want to operate over both lists/vectors simultaneously, matching elements by position.
map2_chr(
.x = state.name[1:10],
.y = state.abb[1:10],
.f = \(i, j)
str_c(
i," - ", j
)
)
map2
Let’s see how we could use map2 in an illustrative example, using the most famous framing effect in the history of American public opinion.
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/gss_welfare.rds" %>%
url %>%
readRDS
t1
## # A tibble: 49,652 × 7
## year id polviews race version ans ans_num
## <fct> <dbl> <fct> <fct> <chr> <fct> <dbl>
## 1 1984 1 conservative white welfare too … 3
## 2 1984 2 slightly liberal white welfare too … 1
## 3 1984 3 liberal black welfare too … 1
## 4 1984 4 moderate, middle of the road white assistance to t… too … 1
## 5 1984 7 slightly liberal white welfare abou… 2
## 6 1984 8 don't know white assistance to t… too … 1
## 7 1984 9 don't know other assistance to t… too … 1
## 8 1984 11 slightly conservative black welfare too … 1
## 9 1984 12 moderate, middle of the road white assistance to t… too … 3
## 10 1984 14 don't know black assistance to t… too … 1
## # ℹ 49,642 more rows
Imagine we wanted to model the framing effect overall, and then interacted with ideology and with race, all separately by year (so that we can figure out whether these framing differences are growing or shrinking).
t2 <- t1 %>%
group_by(year) %>%
nest %>%
cross_join(
tibble(
frm = c(
"ans_num ~ version",
"ans_num ~ version * polviews",
"ans_num ~ version * race"
)
)
)
Can you guess the two equal-length lists we’re going to map2 over? What’s t2$frm?
t2$frm %>% length
## [1] 72
t2$data %>% length
## [1] 72
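If the nest-then-cross_join step feels opaque, here is a minimal sketch on made-up data (the toy tibble and its two formulas are my own invention, not part of the GSS example; cross_join assumes dplyr 1.1.0 or later). Every nested year gets paired with every formula, so two years and two formulas give four rows, and the data and frm columns end up the same length.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two years, two rows each
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  x    = c(0, 1, 0, 1),
  y    = c(1, 2, 3, 5)
)

grid <- toy %>%
  group_by(year) %>%
  nest() %>%      # one row per year, with a list-column called `data`
  ungroup() %>%
  cross_join(     # pair every year with every formula
    tibble(frm = c("y ~ 1", "y ~ x"))
  )

nrow(grid)
length(grid$data) == length(grid$frm)
```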
So we can just fit all 72 models in one call:
t2$mods <- map2(
.x = t2$frm,
.y = t2$data,
\(i, j)
lm(
formula = i, data = j
)
)
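The same pattern on a built-in dataset, as a sketch: mtcars and the two formulas here are stand-ins I’ve picked for illustration. I wrap each string in as.formula() only to make the conversion from text to formula explicit.

```r
library(purrr)

# Two formulas (as strings) and two datasets, matched by position
frms <- c("mpg ~ wt", "mpg ~ hp")
dats <- list(mtcars, mtcars)

mods <- map2(
  .x = frms,
  .y = dats,
  \(f, d) lm(formula = as.formula(f), data = d)
)

# Pull the slope out of each fitted model as a plain numeric vector
slopes <- map_dbl(mods, \(m) coef(m)[[2]])
```

Because mods is an ordinary list, anything you can do to one lm object you can map over all of them.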
Now you might have heard this said: we estimate linear models, but what do we really want to plot and table, estimates or contrasts? That’s right, the emmean!
library(emmeans)
library(broom)
t2$emm_form <- t2$frm %>%
str_replace(
"ans_num ~ ",
"trt.vs.ctrl ~ "
)
t2$emm <- map2(
.x = t2$mods,
.y = t2$emm_form,
\(i, j)
emmeans(
object = i,
specs = as.formula(j),
options = list(
adjust = "none"
)
)
)
Now we can measure effects by year
t3 <- t2 %>%
filter(
frm == "ans_num ~ version"
) %>%
select(
year, emm
) %>%
mutate(
cont = emm %>%
map(
\(i)
i[[2]] %>%
tidy
)
)
map2(
t3$cont,
t3$year,
\(i, j)
i %>%
mutate(year = j)
) %>%
list_rbind %>%
select(year, everything())
## # A tibble: 24 × 9
## year term contrast null.value estimate std.error df statistic p.value
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1984 versi… welfare… 0 0.689 0.0484 938 14.2 8.77e- 42
## 2 1985 versi… welfare… 0 0.817 0.0373 1474 21.9 4.32e- 92
## 3 1986 versi… welfare… 0 0.724 0.0385 1413 18.8 1.85e- 70
## 4 1987 versi… welfare… 0 0.764 0.0360 1708 21.2 1.05e- 88
## 5 1988 versi… welfare… 0 0.824 0.0376 1409 21.9 8.16e- 92
## 6 1989 versi… welfare… 0 0.791 0.0382 1454 20.7 7.24e- 84
## 7 1990 versi… welfare… 0 0.771 0.0391 1295 19.7 5.20e- 76
## 8 1991 versi… welfare… 0 0.735 0.0382 1422 19.2 2.08e- 73
## 9 1993 versi… welfare… 0 0.912 0.0378 1516 24.2 2.67e-109
## 10 1994 versi… welfare… 0 0.934 0.0273 2840 34.2 5.02e-215
## # ℹ 14 more rows
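One possible shortcut, offered as a sketch rather than a correction: if the list is named, list_rbind() can write those names into a column via names_to, which saves the map2 step for attaching the year. The pieces list below is invented toy data, and note that the names come back as a character column rather than a factor.

```r
library(purrr)
library(tibble)

# Hypothetical per-year results, stored as a named list of tibbles
pieces <- list(
  `1984` = tibble(estimate = 0.5),
  `1985` = tibble(estimate = 0.7)
)

# names_to stores each list name alongside the rows it came from
out <- list_rbind(pieces, names_to = "year")
```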
A student asked me to help format some untidy data, and it turned out to be an elegant advertisement for the purrr approach.
Read this file here:
library(openxlsx)
library(magrittr)
t4 <- "https://github.com/thomasjwood/code_lab/raw/main/data/Parliament-results%202014.xlsx" %>%
read.xlsx(
colNames = FALSE,
sheet = 1) %>%
tibble %>%
set_colnames(
str_c(
"x", 1:6
)
)
For each constituency,
Just to satisfy your curiosity, would it work if you wanted to simply write as.numeric(t20$V202468x)?