Short n’ sweet group exercise

Scholars of American politics who want to track partisan survey participation over time can use Pew’s American Trends Panel (ATP), which fielded over 130 waves from 2014 to 2023. Our very own Thomas J. Wood has collected and stored these waves in his GitHub repository: https://github.com/thomasjwood/ps7160/tree/master/atp. Helpfully, the wave files follow a consistent naming convention, so we can use a regular expression to identify them systematically:

library(gh) #queries GitHub API
library(tidyverse)
library(magrittr)
library(haven)
library(purrr)
library(showtext)

font_add_google("Roboto", "roboto", db_cache = FALSE)
showtext_auto()

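t0 <- tibble(
  url = "GET /repos/thomasjwood/ps7160/contents/atp" %>%
    gh %>%
    map_chr("download_url") %>% 
    # keep SPSS files only; the dot is escaped so it matches a literal "."
    str_subset(
      "\\.sav$"
    ) %>% 
    # keep files whose name carries a wave tag such as W1 or W114
    str_subset(
      "W\\d{1,3}"
    ),
  data = url %>% 
    map(
      \(i)
      
      i %>% 
        read_sav %>% 
        as_factor, 
      .progress = TRUE
    )
)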
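
To see what the two str_subset() filters are doing without hitting the GitHub API, here is a minimal sketch on a handful of made-up file names (they are invented for illustration and are not the repository's actual contents). It also shows why an escaped, anchored pattern is a little safer than a bare ".sav", where the dot matches any character.

```r
library(stringr)

# Hypothetical file names, invented for illustration only
files <- c(
  "ATP W1.sav",
  "ATP W114.sav",
  "W12_savings.csv",
  "README.md"
)

# Unescaped dot: ".sav" also matches the "_sav" in "W12_savings.csv"
loose <- files %>% 
  str_subset(".sav") %>% 
  str_subset("W\\d{1,3}")

# Escaped dot plus end-of-string anchor: only true .sav files survive
strict <- files %>% 
  str_subset("\\.sav$") %>% 
  str_subset("W\\d{1,3}")

loose
strict
```

With real ATP file names the two patterns may well return the same set; the stricter one just removes the guesswork.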

Now let’s pull the wave number out of each URL into its own column, then add a new column holding the codebook for each data frame.

t0$wave_num <- t0$url %>% 
  str_extract(
    "W\\d{1,3}"
  ) %>% 
  str_remove(
    "W"
  ) %>% 
  as.numeric


t0 %<>% 
  arrange(
    wave_num
  ) %>% 
  mutate(
    codebook = map(
      data, 
      
      \(i) {
        # variable labels, named by column; unlabelled columns drop out
        var_labs <- i %>% 
          map(~attr(., "label")) %>% 
          unlist
        
        tibble(
          item = names(var_labs),
          labs = var_labs
        )
      }
    )
  )
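
If the codebook step feels abstract, the sketch below runs the same idea on a tiny stand-in for one wave. The column names and label text are assumptions made up for the example; the only thing borrowed from the real workflow is that haven stores each variable's description in a "label" attribute.

```r
library(dplyr)
library(purrr)

# Toy stand-in for one wave; names and labels are hypothetical
toy <- tibble(
  QKEY = 1:2, # no label attribute, so it will not appear in the codebook
  F_PARTYSUM_FINAL = structure(
    factor(c("Rep/Lean Rep", "Dem/Lean Dem")),
    label = "Party summary"
  ),
  WEIGHT_W1 = structure(c(0.8, 1.2), label = "Wave 1 weight")
)

# One label per column; unlist() silently drops the NULL for QKEY
var_labs <- toy %>% 
  map(\(col) attr(col, "label")) %>% 
  unlist

codebook <- tibble(
  item = names(var_labs),
  labs = unname(var_labs)
)

codebook
```

Note that unlabelled columns vanish from the codebook, which is fine here because we only search it for labelled party variables.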

How fun for us: the partyid variable does NOT follow a consistent naming convention across all waves (thanks a lot, Pew).

This is where the codebooks come in handy. We can map across them to find the items whose names contain “F_PARTYSUM” or “PARTYSUM_F” (ignoring case), grab whichever variation of partyid each wave uses, and return NA when a wave has no partisan summary variable.

t1 <- t0 %>%
  mutate(
    pid_var = map_chr(
      codebook, 
      \(i) {
        var_name <- i %>%
          filter(str_detect(tolower(item), "f_partysum|partysum_f")) %>% 
          pull(item)
        
        if (length(var_name) > 0) var_name[1] else NA_character_
      }
    )
  )
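
Here is a small, self-contained sketch of that lookup on three pretend codebooks. The item names are hypothetical stand-ins for the naming variations, and find_pid() is a helper name introduced just for this example; the third codebook has no match, so it should come back as NA.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical codebooks; item names are invented for illustration
codebooks <- list(
  tibble(item = c("QKEY", "F_PARTYSUM_FINAL")),
  tibble(item = c("QKEY", "PARTYSUM_F")),
  tibble(item = c("QKEY", "F_SEX"))
)

# Return the first matching item name, or NA when nothing matches
find_pid <- function(cb) {
  hits <- cb %>% 
    filter(str_detect(tolower(item), "f_partysum|partysum_f")) %>% 
    pull(item)
  
  if (length(hits) > 0) hits[1] else NA_character_
}

pid_vars <- map_chr(codebooks, find_pid)
pid_vars
```

Returning NA_character_ rather than NULL matters: map_chr() needs exactly one string per codebook, and a typed NA keeps the output a plain character vector.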

Huzzah! We’ve arrived at the pmap() application. Let’s start with something basic: calculating the proportion of Republican respondents across waves.

t2 <- pmap_dfr(
  list(t1$data, t1$pid_var, t1$wave_num),
  possibly(
    \(data, pid_var, wave_num) {
      # skip the two waves (47 and 59) with no party summary variable
      if (is.na(pid_var)) return(NULL)
      
      data %>%
        filter(str_detect(.data[[pid_var]], "Rep")) %>%
        summarise(percent = n() / nrow(data) * 100) %>%
        mutate(wave = wave_num)
    },
    # explicit fallback; purrr releases before 1.0.0 have no default for it
    otherwise = NULL
  )
)
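
To check the logic without downloading 130 waves, the sketch below runs the same kind of calculation on three toy waves. Everything in it is made up for illustration: the data, the variable names, and the rep_share() helper. It also swaps pmap_dfr() for pmap() followed by list_rbind(), which assumes purrr 1.0.0 or later; that is one possible alternative, not a required change.

```r
library(dplyr)
library(purrr)
library(stringr)

# Three hypothetical waves; the second has no party summary variable
waves <- tibble(
  data = list(
    tibble(F_PARTYSUM_FINAL = factor(
      c("Rep/Lean Rep", "Dem/Lean Dem", "Rep/Lean Rep", NA)
    )),
    tibble(Q1 = 1:3),
    tibble(PARTYSUM_F = factor(
      c("Dem/Lean Dem", "Rep/Lean Rep", "Dem/Lean Dem", "Dem/Lean Dem")
    ))
  ),
  pid_var = c("F_PARTYSUM_FINAL", NA, "PARTYSUM_F"),
  wave_num = c(1, 2, 3)
)

# Unweighted share of respondents whose party summary mentions "Rep";
# missing answers stay in the denominator, as in n() / nrow(data)
rep_share <- function(data, pid_var, wave_num) {
  if (is.na(pid_var)) return(NULL)
  
  is_rep <- str_detect(as.character(data[[pid_var]]), "Rep")
  
  tibble(
    wave = wave_num,
    percent = sum(is_rep, na.rm = TRUE) / nrow(data) * 100
  )
}

result <- pmap(
  list(waves$data, waves$pid_var, waves$wave_num),
  possibly(rep_share, otherwise = NULL)
) %>% 
  list_rbind()

result
```

The wave without a party variable simply drops out of the result, which is the behaviour that leaves gaps in the plotted line.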

ggplot(t2, aes(wave, percent)) +
  geom_line() +
  geom_point() +
  geom_text(aes(label = wave, vjust = ifelse(as.integer(wave) %% 2 == 0, -1.5, 1.5)),
            size = 2.5) +
  labs(
    x = "Wave #", 
    y = "% Republican",
    caption = "Source: Pew American Trends Panel"
  ) +
  scale_y_continuous(limits = c(0, 80)) +
  theme_minimal(base_family = "roboto") +
  theme(axis.text.x = element_blank(),
        axis.title.y = element_text(size = 9),
        axis.title.x = element_text(size = 9),
        axis.text.y = element_text(size = 8),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

It’s certainly not the prettiest plot we’ve ever seen, but we can observe waves with distinct dips in the unweighted percentage of Republican respondents and investigate!

Thanks, pmap()!

Code Lab - purrr group exercise

Charlene Stainfield

02/20/2025
