During the lab, I was unhelpfully succinct in describing the approach to using list-columns. The withering looks and rueful head shaking will haunt me for years.
Anyway, the important thing is–this is hugely beneficial and general means of organizing data and models. So I hope this example motivates the approach.
To start with, let’s load a table which obviates the need to do recoding.
library(tidyverse)
library(magrittr)
library(broom)
t1 <- "https://github.com/thomasjwood/ps7160/raw/master/anes_cdf_mod.RDS" %>%
url %>%
gzcon %>%
readRDS
Let’s just remind ourselves what’s in t1
t1
## # A tibble: 59,944 x 8
## demo_party demo_ideol demo_race demo_income demo_education gt_people_on_we~
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 NA NA 1. White n~ 3 NA NA
## 2 NA NA 1. White n~ 5 NA NA
## 3 NA NA 1. White n~ 4 NA NA
## 4 NA NA 1. White n~ 5 NA NA
## 5 NA NA 1. White n~ 4 NA NA
## 6 NA NA 1. White n~ 5 NA NA
## 7 NA NA 1. White n~ 4 NA NA
## 8 NA NA 1. White n~ 1 NA NA
## 9 NA NA 1. White n~ 4 NA NA
## 10 NA NA 1. White n~ 4 NA NA
## # ... with 59,934 more rows, and 2 more variables: gt_poor_people <dbl>,
## # gt_women <dbl>
Ok, lovely. A nice table, where were have some demographic variables (helpfully prefixed with demo_
) and then from group feeling thermometers (prefixed with gt_
).
To start with, we want to communicate to R that the gt
variables are indexing separate group evaluations. We’ll do this with our old friend pivot_longer
t1 %>%
pivot_longer(
cols = starts_with("gt_"),
names_to = "grps",
values_to = "ft",
values_drop_na = T
)
## # A tibble: 69,568 x 7
## demo_party demo_ideol demo_race demo_income demo_education grps ft
## <dbl> <dbl> <fct> <dbl> <dbl> <chr> <dbl>
## 1 1 NA 1. White non~ 1 2 gt_poor~ 70
## 2 4 NA 1. White non~ 3 2 gt_poor~ 80
## 3 6 5 1. White non~ 3 3 gt_poor~ 40
## 4 2 4 1. White non~ NA 2 gt_poor~ 50
## 5 6 5 1. White non~ 4 4 gt_poor~ 70
## 6 7 6 1. White non~ 4 2 gt_poor~ 60
## 7 2 NA 1. White non~ 4 5 gt_poor~ 100
## 8 2 4 1. White non~ 4 2 gt_poor~ 50
## 9 6 5 1. White non~ 1 1 gt_poor~ 60
## 10 2 5 1. White non~ 3 6 gt_poor~ 100
## # ... with 69,558 more rows
What have we done? the grps
variable indexes which group we’re evaluating–poor people, women, or welfare recipients. Gosh, this grps
variable will come in hand, methinks!
We’ll formally tell R that we’re going to want to do something to each separate collection of rows by this variable
t2 <- t1 %>%
pivot_longer(
cols = starts_with("gt_"),
names_to = "grps",
values_to = "ft",
values_drop_na = T
) %>%
nest_by(grps)
t2
## # A tibble: 3 x 2
## # Rowwise: grps
## grps data
## <chr> <list<tibble>>
## 1 gt_people_on_welfare [27,259 x 6]
## 2 gt_poor_people [35,805 x 6]
## 3 gt_women [6,504 x 6]
OK! this <list<tibble>>
is of consequence to us. We’ve taken a 69,000 row table and replaced it with a 3 row tibble, but no data have been harmed in the making of this movie!
While it looks like each element of t2$data
appears to be a single element, each element contains multitudes! We can inspect each element, to remind ourselves of their contents
t2$data %>%
map(head)
## [[1]]
## # A tibble: 6 x 6
## demo_party demo_ideol demo_race demo_income demo_education ft
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 3 4 1. White non-Hispanic 5 6 40
## 2 7 6 1. White non-Hispanic 2 3 50
## 3 4 3 1. White non-Hispanic 3 6 60
## 4 6 5 1. White non-Hispanic 3 3 60
## 5 6 NA 1. White non-Hispanic 4 3 40
## 6 4 NA 1. White non-Hispanic 3 5 85
##
## [[2]]
## # A tibble: 6 x 6
## demo_party demo_ideol demo_race demo_income demo_education ft
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 1 NA 1. White non-Hispanic 1 2 70
## 2 4 NA 1. White non-Hispanic 3 2 80
## 3 6 5 1. White non-Hispanic 3 3 40
## 4 2 4 1. White non-Hispanic NA 2 50
## 5 6 5 1. White non-Hispanic 4 4 70
## 6 7 6 1. White non-Hispanic 4 2 60
##
## [[3]]
## # A tibble: 6 x 6
## demo_party demo_ideol demo_race demo_income demo_education ft
## <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 3 4 1. White non-Hispanic 5 6 85
## 2 7 6 1. White non-Hispanic 2 3 100
## 3 4 3 1. White non-Hispanic 3 6 60
## 4 6 5 1. White non-Hispanic 3 3 70
## 5 6 5 1. White non-Hispanic 2 2 85
## 6 6 NA 1. White non-Hispanic 4 3 70
The key thing to not to ensure you’re grokking the breakup of these data into groups–for each element of t2$data
, we’re seeing different values in the ft
variable (since these are measuring feelings toward welfare recipients, poor people, and women, respectively.
Now, we want to use expand.grid
to have R set up the housekeeping for us–to group
the three dvs (poor people, welfare recipients, and women)
the two ivs (ideology and partisanship)
the two estimation strategies (whether we control for the other demographic variables, or if we just have them as a single predictor.
3 x 2 x 2 – ahhh, we’ve gotta write 12 different models! We’ve gotta name each lm
object, and make sure that we keep is straight, which DV and which IV correspond with each model. So instead, let’s just do this:
t3 <- expand.grid(
grps = t2$grps,
iv = c("demo_ideol",
"demo_party"),
conts = c("bivar", "mvar")
) %>%
left_join(
t2
) %>%
tibble
t3
## # A tibble: 12 x 4
## grps iv conts data
## <chr> <fct> <fct> <list<tibble>>
## 1 gt_people_on_welfare demo_ideol bivar [27,259 x 6]
## 2 gt_poor_people demo_ideol bivar [35,805 x 6]
## 3 gt_women demo_ideol bivar [6,504 x 6]
## 4 gt_people_on_welfare demo_party bivar [27,259 x 6]
## 5 gt_poor_people demo_party bivar [35,805 x 6]
## 6 gt_women demo_party bivar [6,504 x 6]
## 7 gt_people_on_welfare demo_ideol mvar [27,259 x 6]
## 8 gt_poor_people demo_ideol mvar [35,805 x 6]
## 9 gt_women demo_ideol mvar [6,504 x 6]
## 10 gt_people_on_welfare demo_party mvar [27,259 x 6]
## 11 gt_poor_people demo_party mvar [35,805 x 6]
## 12 gt_women demo_party mvar [6,504 x 6]
As you can see, we have
12 rows (reflecting the 12 combinations of models, dvs, and ivs, as described above)
the t3$data
object reflects the specific evaluations corresponding with the group named in t3$grps
(to make sure this is right, compare the number of rows in each t3$data
element and see that they’re constant within each level of t3$grps
.
Now, we generate a vector of models, which we’ll append right alongside the rows in t3
(this ensures that we’re not going to get mixed up, in which thing corresponds to which models
t3$mods <- t3$conts %>%
equals("bivar") %>%
ifelse(
str_c(
"ft ~ ",
t3$iv
),
str_c(
"ft ~ ",
t3$iv,
" + demo_race + demo_income + demo_education"
)
) %>%
map2(
t3$data,
~lm(formula = .x, data = .y)
)
t3
## # A tibble: 12 x 5
## grps iv conts data mods
## <chr> <fct> <fct> <list<tibble>> <list>
## 1 gt_people_on_welfare demo_ideol bivar [27,259 x 6] <lm>
## 2 gt_poor_people demo_ideol bivar [35,805 x 6] <lm>
## 3 gt_women demo_ideol bivar [6,504 x 6] <lm>
## 4 gt_people_on_welfare demo_party bivar [27,259 x 6] <lm>
## 5 gt_poor_people demo_party bivar [35,805 x 6] <lm>
## 6 gt_women demo_party bivar [6,504 x 6] <lm>
## 7 gt_people_on_welfare demo_ideol mvar [27,259 x 6] <lm>
## 8 gt_poor_people demo_ideol mvar [35,805 x 6] <lm>
## 9 gt_women demo_ideol mvar [6,504 x 6] <lm>
## 10 gt_people_on_welfare demo_party mvar [27,259 x 6] <lm>
## 11 gt_poor_people demo_party mvar [35,805 x 6] <lm>
## 12 gt_women demo_party mvar [6,504 x 6] <lm>
Ok–now this t3
tibble
has a vector of lm
objects. Let’s print the first three objects and figure out what’s going down:
t3 %>%
slice(1:3) %>%
use_series(mods)
## [[1]]
##
## Call:
## lm(formula = .x, data = .y)
##
## Coefficients:
## (Intercept) demo_ideol
## 61.563 -2.604
##
##
## [[2]]
##
## Call:
## lm(formula = .x, data = .y)
##
## Coefficients:
## (Intercept) demo_ideol
## 75.0236 -0.9513
##
##
## [[3]]
##
## Call:
## lm(formula = .x, data = .y)
##
## Coefficients:
## (Intercept) demo_ideol
## 82.7264 -0.8298
And armed with this nice frame, we’re actually ready to go, make a plot, and answer our question!
t4 <- t3 %>%
select(
grps, conts, mods
) %>%
pmap_dfr(
function(grps, conts, mods)
mods %>%
tidy %>%
mutate(
conts = conts,
grps = grps
) %>%
slice(2)
)
ggplot(
data = t4,
aes(
ymin = estimate %>% subtract(std.error %>% multiply_by(1.96)),
ymax = estimate %>% add(std.error %>% multiply_by(1.96)),
y = estimate,
x = term,
color = conts
)
) +
geom_pointrange() +
facet_grid(
. ~ grps
) +
scale_y_continuous(
breaks = seq(0, -2.5, -.5)
)