Last lab

During the lab, I was unhelpfully succinct in describing the approach to using list-columns. The withering looks and rueful head shaking will haunt me for years.

Anyway, the important thing is–this is hugely beneficial and general means of organizing data and models. So I hope this example motivates the approach.

To start with, let’s load a table which obviates the need to do recoding.

library(tidyverse)
library(magrittr)
library(broom)

t1 <- "https://github.com/thomasjwood/ps7160/raw/master/anes_cdf_mod.RDS" %>% 
   url %>% 
   gzcon %>% 
   readRDS

Let’s just remind ourselves what’s in t1

t1

## # A tibble: 59,944 x 8
##    demo_party demo_ideol demo_race   demo_income demo_education gt_people_on_we~
##         <dbl>      <dbl> <fct>             <dbl>          <dbl>            <dbl>
##  1         NA         NA 1. White n~           3             NA               NA
##  2         NA         NA 1. White n~           5             NA               NA
##  3         NA         NA 1. White n~           4             NA               NA
##  4         NA         NA 1. White n~           5             NA               NA
##  5         NA         NA 1. White n~           4             NA               NA
##  6         NA         NA 1. White n~           5             NA               NA
##  7         NA         NA 1. White n~           4             NA               NA
##  8         NA         NA 1. White n~           1             NA               NA
##  9         NA         NA 1. White n~           4             NA               NA
## 10         NA         NA 1. White n~           4             NA               NA
## # ... with 59,934 more rows, and 2 more variables: gt_poor_people <dbl>,
## #   gt_women <dbl>

Ok, lovely. A nice table, where were have some demographic variables (helpfully prefixed with demo_) and then from group feeling thermometers (prefixed with gt_ ).

To start with, we want to communicate to R that the gt variables are indexing separate group evaluations. We’ll do this with our old friend pivot_longer

t1 %>% 
  pivot_longer(
    cols = starts_with("gt_"),
    names_to = "grps",
    values_to  = "ft", 
    values_drop_na = T
  )

## # A tibble: 69,568 x 7
##    demo_party demo_ideol demo_race     demo_income demo_education grps        ft
##         <dbl>      <dbl> <fct>               <dbl>          <dbl> <chr>    <dbl>
##  1          1         NA 1. White non~           1              2 gt_poor~    70
##  2          4         NA 1. White non~           3              2 gt_poor~    80
##  3          6          5 1. White non~           3              3 gt_poor~    40
##  4          2          4 1. White non~          NA              2 gt_poor~    50
##  5          6          5 1. White non~           4              4 gt_poor~    70
##  6          7          6 1. White non~           4              2 gt_poor~    60
##  7          2         NA 1. White non~           4              5 gt_poor~   100
##  8          2          4 1. White non~           4              2 gt_poor~    50
##  9          6          5 1. White non~           1              1 gt_poor~    60
## 10          2          5 1. White non~           3              6 gt_poor~   100
## # ... with 69,558 more rows

What have we done? the grps variable indexes which group we’re evaluating–poor people, women, or welfare recipients. Gosh, this grps variable will come in hand, methinks!

We’ll formally tell R that we’re going to want to do something to each separate collection of rows by this variable

t2 <- t1 %>% 
  pivot_longer(
    cols = starts_with("gt_"),
    names_to = "grps",
    values_to  = "ft", 
    values_drop_na = T
  ) %>% 
  nest_by(grps)
  
t2

## # A tibble: 3 x 2
## # Rowwise:  grps
##   grps                           data
##   <chr>                <list<tibble>>
## 1 gt_people_on_welfare   [27,259 x 6]
## 2 gt_poor_people         [35,805 x 6]
## 3 gt_women                [6,504 x 6]

OK! this <list<tibble>> is of consequence to us. We’ve taken a 69,000 row table and replaced it with a 3 row tibble, but no data have been harmed in the making of this movie!

While it looks like each element of t2$data appears to be a single element, each element contains multitudes! We can inspect each element, to remind ourselves of their contents

t2$data %>% 
  map(head)

## [[1]]
## # A tibble: 6 x 6
##   demo_party demo_ideol demo_race             demo_income demo_education    ft
##        <dbl>      <dbl> <fct>                       <dbl>          <dbl> <dbl>
## 1          3          4 1. White non-Hispanic           5              6    40
## 2          7          6 1. White non-Hispanic           2              3    50
## 3          4          3 1. White non-Hispanic           3              6    60
## 4          6          5 1. White non-Hispanic           3              3    60
## 5          6         NA 1. White non-Hispanic           4              3    40
## 6          4         NA 1. White non-Hispanic           3              5    85
## 
## [[2]]
## # A tibble: 6 x 6
##   demo_party demo_ideol demo_race             demo_income demo_education    ft
##        <dbl>      <dbl> <fct>                       <dbl>          <dbl> <dbl>
## 1          1         NA 1. White non-Hispanic           1              2    70
## 2          4         NA 1. White non-Hispanic           3              2    80
## 3          6          5 1. White non-Hispanic           3              3    40
## 4          2          4 1. White non-Hispanic          NA              2    50
## 5          6          5 1. White non-Hispanic           4              4    70
## 6          7          6 1. White non-Hispanic           4              2    60
## 
## [[3]]
## # A tibble: 6 x 6
##   demo_party demo_ideol demo_race             demo_income demo_education    ft
##        <dbl>      <dbl> <fct>                       <dbl>          <dbl> <dbl>
## 1          3          4 1. White non-Hispanic           5              6    85
## 2          7          6 1. White non-Hispanic           2              3   100
## 3          4          3 1. White non-Hispanic           3              6    60
## 4          6          5 1. White non-Hispanic           3              3    70
## 5          6          5 1. White non-Hispanic           2              2    85
## 6          6         NA 1. White non-Hispanic           4              3    70

The key thing to not to ensure you’re grokking the breakup of these data into groups–for each element of t2$data, we’re seeing different values in the ft variable (since these are measuring feelings toward welfare recipients, poor people, and women, respectively.

Now, we want to use expand.grid to have R set up the housekeeping for us–to group

the three dvs (poor people, welfare recipients, and women)
the two ivs (ideology and partisanship)
the two estimation strategies (whether we control for the other demographic variables, or if we just have them as a single predictor.

3 x 2 x 2 – ahhh, we’ve gotta write 12 different models! We’ve gotta name each lm object, and make sure that we keep is straight, which DV and which IV correspond with each model. So instead, let’s just do this:

t3 <- expand.grid(
  grps = t2$grps,
  iv = c("demo_ideol",
         "demo_party"),
  conts = c("bivar", "mvar")
  ) %>%
  left_join(
    t2
  ) %>% 
  tibble

t3

## # A tibble: 12 x 4
##    grps                 iv         conts           data
##    <chr>                <fct>      <fct> <list<tibble>>
##  1 gt_people_on_welfare demo_ideol bivar   [27,259 x 6]
##  2 gt_poor_people       demo_ideol bivar   [35,805 x 6]
##  3 gt_women             demo_ideol bivar    [6,504 x 6]
##  4 gt_people_on_welfare demo_party bivar   [27,259 x 6]
##  5 gt_poor_people       demo_party bivar   [35,805 x 6]
##  6 gt_women             demo_party bivar    [6,504 x 6]
##  7 gt_people_on_welfare demo_ideol mvar    [27,259 x 6]
##  8 gt_poor_people       demo_ideol mvar    [35,805 x 6]
##  9 gt_women             demo_ideol mvar     [6,504 x 6]
## 10 gt_people_on_welfare demo_party mvar    [27,259 x 6]
## 11 gt_poor_people       demo_party mvar    [35,805 x 6]
## 12 gt_women             demo_party mvar     [6,504 x 6]

As you can see, we have

12 rows (reflecting the 12 combinations of models, dvs, and ivs, as described above)
the t3$data object reflects the specific evaluations corresponding with the group named in t3$grps (to make sure this is right, compare the number of rows in each t3$data element and see that they’re constant within each level of t3$grps.

Now, we generate a vector of models, which we’ll append right alongside the rows in t3 (this ensures that we’re not going to get mixed up, in which thing corresponds to which models

t3$mods <- t3$conts %>% 
  equals("bivar") %>% 
  ifelse(
    str_c(
      "ft ~ ",
      t3$iv
    ),
    str_c(
      "ft ~ ",
      t3$iv,
      " + demo_race + demo_income + demo_education"
    )
  ) %>% 
  map2(
  t3$data,
  ~lm(formula = .x, data = .y)
  )

t3

## # A tibble: 12 x 5
##    grps                 iv         conts           data mods  
##    <chr>                <fct>      <fct> <list<tibble>> <list>
##  1 gt_people_on_welfare demo_ideol bivar   [27,259 x 6] <lm>  
##  2 gt_poor_people       demo_ideol bivar   [35,805 x 6] <lm>  
##  3 gt_women             demo_ideol bivar    [6,504 x 6] <lm>  
##  4 gt_people_on_welfare demo_party bivar   [27,259 x 6] <lm>  
##  5 gt_poor_people       demo_party bivar   [35,805 x 6] <lm>  
##  6 gt_women             demo_party bivar    [6,504 x 6] <lm>  
##  7 gt_people_on_welfare demo_ideol mvar    [27,259 x 6] <lm>  
##  8 gt_poor_people       demo_ideol mvar    [35,805 x 6] <lm>  
##  9 gt_women             demo_ideol mvar     [6,504 x 6] <lm>  
## 10 gt_people_on_welfare demo_party mvar    [27,259 x 6] <lm>  
## 11 gt_poor_people       demo_party mvar    [35,805 x 6] <lm>  
## 12 gt_women             demo_party mvar     [6,504 x 6] <lm>

Ok–now this t3 tibble has a vector of lm objects. Let’s print the first three objects and figure out what’s going down:

t3 %>% 
  slice(1:3) %>% 
  use_series(mods)

## [[1]]
## 
## Call:
## lm(formula = .x, data = .y)
## 
## Coefficients:
## (Intercept)   demo_ideol  
##      61.563       -2.604  
## 
## 
## [[2]]
## 
## Call:
## lm(formula = .x, data = .y)
## 
## Coefficients:
## (Intercept)   demo_ideol  
##     75.0236      -0.9513  
## 
## 
## [[3]]
## 
## Call:
## lm(formula = .x, data = .y)
## 
## Coefficients:
## (Intercept)   demo_ideol  
##     82.7264      -0.8298

And armed with this nice frame, we’re actually ready to go, make a plot, and answer our question!

t4 <- t3 %>% 
  select(
    grps, conts, mods
  ) %>% 
  pmap_dfr(
    function(grps, conts, mods)
      mods %>% 
      tidy %>% 
      mutate(
        conts = conts,
        grps = grps
      ) %>% 
      slice(2)
  )


ggplot(
    data = t4, 
    aes(
      ymin = estimate %>% subtract(std.error %>% multiply_by(1.96)),
      ymax = estimate %>% add(std.error %>% multiply_by(1.96)),
      y = estimate,
      x = term,
      color = conts
    )
  ) +
  geom_pointrange() +
  facet_grid(
   . ~ grps
   ) +
  scale_y_continuous(
    breaks = seq(0, -2.5, -.5)
  )

PS7160 Modelling Tibble Example

Thomas Wood (wood.1080@osu.edu)

March 19, 2021

Last lab