A single figure

We’ll try something different this time–going methodically through a single figure, from scratch. I’ll try to move at a pace which allows us to discuss the details.

In this lab, we’ll use the 2022 version of the Chicago Council Survey, which has measured foreign policy attitudes since 1974.

First, we’ll download the data, helpfully saved as a static zip file

library(tidyverse)
library(magrittr)
library(haven)

tf <- tempfile()
download.file(
  url = "https://globalaffairs.org/sites/default/files/2023-02/2022%20CCS%20Data.zip", 
  destfile = tf
)

Now the file is saved in our machine’s memory as the object tf.

What’s this object’s contents?

tf %>% 
  unzip(list = T)
##                                                                    Name  Length
## 1                2022 CCS Data/2022 Chicago Council Survey Data Set.sav 3373785
## 2           2022 CCS Data/2022 Chicago Council Survey Field Report.docx  336773
## 3 2022 CCS Data/2022 Chicago Council Survey Questionnaire - Spanish.pdf  290862
## 4           2022 CCS Data/2022 Chicago Council Survey Questionnaire.pdf  302436
## 5                2022 CCS Data/2022 Chicago Council Survey Topline.docx  458277
##                  Date
## 1 2023-01-09 16:29:00
## 2 2023-01-09 16:28:00
## 3 2023-02-08 16:05:00
## 4 2023-02-08 16:05:00
## 5 2023-01-09 16:29:00

That .sav file is what we need to manipulate the survey microdata.1 We’ll copy the filename and use it in a new R script:

t1 <- tf %>%
  unz("2022 CCS Data/2022 Chicago Council Survey Data Set.sav") %>%
  read_sav %>%
  as_factor

Let’s do some housekeeping–building a codebook from the meta data is super useful for understanding the survey design

t1l <- t1 %>% 
  map(
    ~attr(., "label")
  ) %>% 
  unlist

t1l[1:3]
##                                                                                  CaseID 
##                                                            "Case Identification Number" 
##                                                                                  weight 
##                         "Post-Stratification Weights - 18+ Total Qualified Respondents" 
##                                                                                 race_wt 
## "Post-Stratification Weights - 18+ Total Qualified Respondents By Racial/Ethnic Groups"

t1l is a list of length 217. Not coincidentally, t1 has 217 columns! Each element of t1l corresponds to the label attribute in each t1 variable.

We can make a nice codebook tibble

cb1 <- tibble(
  itm = t1l %>% 
    names,
  labs = t1l %>% 
    unlist
)

cb1
## # A tibble: 217 × 2
##    itm       labs                                                               
##    <chr>     <chr>                                                              
##  1 CaseID    Case Identification Number                                         
##  2 weight    Post-Stratification Weights - 18+ Total Qualified Respondents      
##  3 race_wt   Post-Stratification Weights - 18+ Total Qualified Respondents By R…
##  4 xchicago  xchicago                                                           
##  5 tm_start  Interview start time (MST)                                         
##  6 tm_finish Interview finish time (MST)                                        
##  7 duration  Interview duration in minutes                                      
##  8 QFLAG     Data Only Variable: Final Qualification Flag For Respondent Based …
##  9 xacslang  Data Only Variable: Primary Language                               
## 10 xspanish  Data Only Variable: Survey Language                                
## # ℹ 207 more rows

Perchance they ask some nice items about preferred policy options for the US aid to Ukraine?

cb1 %>% 
  filter(
    labs %>% 
      str_detect("Ukraine")
  )
## # A tibble: 13 × 2
##    itm     labs                                                                 
##    <chr>   <chr>                                                                
##  1 Q808_6  [Coordinating an international response to Russia’s invasion of Ukra…
##  2 Q30_12a [To help Ukraine defend itself against the Russian invasion] Would y…
##  3 Q620A_7 [Ukraine] Would you favor or oppose the expansion of NATO to include…
##  4 Q249_1  [Other countries following Russia’s example of launching wars for te…
##  5 Q249_2  [Establishing a precedent that national borders are not fixed and ca…
##  6 Q249_3  [China seeing the invasion of Ukraine as a precedent, encouraging it…
##  7 Q250_6  [Accepting Ukrainian refugees into the United States] In response to…
##  8 Q250_5  [Providing economic assistance to Ukraine] In response to the situat…
##  9 Q250_2  [Increasing economic and diplomatic sanctions on Russia] In response…
## 10 Q250_3  [Sending additional arms and military supplies to the Ukrainian gove…
## 11 Q250_8  [Sending US troops to Ukraine to help the Ukrainian government defen…
## 12 QUKR1_2 [Forcing Russia to withdraw troops from Ukraine] How effective do yo…
## 13 QUKR1_3 [Deterring Russia from taking military action beyond Ukraine to neig…

Mmmm the Q250_ prefix items seem relevant here measuring the breadth of policy responses?

cb1 %>% 
  filter(
    itm %>% 
      str_detect("Q250_")
  ) %>% 
  extract2("labs") %>% 
  unlist
##                                                                                                                                                                                                   Q250_6 
##                                        "[Accepting Ukrainian refugees into the United States] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:" 
##                                                                                                                                                                                                   Q250_5 
##                                                   "[Providing economic assistance to Ukraine] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:" 
##                                                                                                                                                                                                   Q250_2 
##                                     "[Increasing economic and diplomatic sanctions on Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:" 
##                                                                                                                                                                                                   Q250_3 
##                  "[Sending additional arms and military supplies to the Ukrainian government] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:" 
##                                                                                                                                                                                                   Q250_8 
## "[Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"

Whoa awesome! Each label includes the response label and the text of the survey item! They’re gonna come in handy later…🤗

Knowing a thing or two about American politics, we’re probably interested in partisan differences on these items. Do we have a nice ANES style branching PID indicator?

cb1 %>% 
  filter(
    labs %>% 
      str_detect(
        "Democrat|Independent|Republican"
      )
  )
## # A tibble: 4 × 2
##   itm   labs                                                                    
##   <chr> <chr>                                                                   
## 1 Q1010 Generally speaking, do you usually think of yourself as a Republican, a…
## 2 Q1015 Would you call yourself a strong Republican or a not very strong Republ…
## 3 Q1020 Would you call yourself a strong Democrat or a not very strong Democrat?
## 4 Q1025 Do you think of yourself as closer to the Republican Party or to the De…

Fuuuhck ok we’ll need to build a single party id indicator from these ANES branching variables.

(Apologies to the non Americansists – reflecting the very zenith of socthe three PID items go as follows:

  1. First we ask if they identify with either of the two parties, or as an Independent, or something else (Q1010)
  2. Among those who identify as Independents in the first item, we ask if they lean to either party (Q1025)
  3. Among those who identify as a partisan in the first item, we ask if they identify as strong or not so strong partisans (Q1015 and Q1020.)

So we’ll use a case_when to group leaners with enthusiastic partisans

t1$party3 <- case_when(
  # for respondents who had no partisan affiliation
  t1$Q1010 %>% 
    is_in(
      c(
        "Other",
        "Refused"
      )
    ) ~ NA,
  
  # Republican leaners
  t1$Q1010 %>% 
    equals("Independent") &
    t1$Q1025 %>% 
    equals("Republican") ~ "Republican",
  
  # Democratic leaners
  t1$Q1010 %>% 
    equals("Independent") &
    t1$Q1025 %>% 
    equals("Democratic") ~ "Democrat",
  
  # Implacable independents
  t1$Q1010 %>% 
    equals("Independent") &
    t1$Q1025 %>% 
    equals("Neither") ~ "Independent",
  
  # open partisans
  t1$Q1010 %>% 
    equals("Independent") %>% 
    not ~ t1$Q1010
) %>% 
  factor(
    c("Democrat",
      "Independent",
      "Republican")
  )

Let’s see if we can put everything we need for the eventual figure in a new tibble.

t2 <- t1 %>% 
  select(
    CaseID, weight, party3, starts_with("Q250_")
  )
## # A tibble: 3,106 × 8
##    CaseID weight party3      Q250_6  Q250_5  Q250_2  Q250_3  Q250_8 
##     <dbl>  <dbl> <fct>       <fct>   <fct>   <fct>   <fct>   <fct>  
##  1      1 1.05   Republican  Support Support Support Support Support
##  2      2 0.773  Democrat    Oppose  Oppose  Oppose  Oppose  Oppose 
##  3      3 0.742  Democrat    Support Support Support Support Support
##  4      4 1.03   Republican  Oppose  Oppose  Support Oppose  Support
##  5      5 0.551  Democrat    Support Support Support Support Oppose 
##  6      6 0.804  Republican  Oppose  Oppose  Oppose  Oppose  Oppose 
##  7      7 0.545  Democrat    Support Support Support Support Oppose 
##  8      8 0.383  Democrat    Support Support Oppose  Support Support
##  9      9 0.278  Republican  Oppose  Support Support Support Oppose 
## 10     10 0.0172 Independent Support Support Support Support Support
## # ℹ 3,096 more rows

From microdata to quantities

Let’s do some more housekeeping–are there adminstrative values in the outcome variables?

t2 %>% 
  select(
    starts_with("Q250")
  ) %>% 
  map(table)
## $Q250_6
## 
## Refused Support  Oppose 
##      49    2376     681 
## 
## $Q250_5
## 
## Refused Support  Oppose 
##      59    2286     761 
## 
## $Q250_2
## 
## Refused Support  Oppose 
##      64    2521     521 
## 
## $Q250_3
## 
## Refused Support  Oppose 
##      67    2281     758 
## 
## $Q250_8
## 
## Refused Support  Oppose 
##      75    1136    1895

Ehhhh survey refusals, at 2% of respondents? I think it’s just a nuisance, and not something you can tell a persuasive story about. So let’s just replace them with NAs. This is obviously dangerously bad social science practice, so I humbly request your indulgence.

t2 %<>% 
  mutate(
    across(
      starts_with(
        "Q250_"
      ),
      ~.x %>% 
        equals("Refused") %>% 
        ifelse(
          NA,
          .x %>% 
            as.character
        )
    )
  )

Whoa do we remember our friend across? It’s a stunner!

across() has two primary arguments:

  • The first argument, .cols, selects the columns you want to operate on. All the super useful selector functions in dplyr work here.

  • The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2.

We have multiple DVs, indicating all the ways a respondent might support or oppose different facets of US Ukraine response. Sounds like a job for pivot_longer()

t2 %<>% 
  pivot_longer(
    cols = starts_with("Q250"),
    names_to = "itm", 
    values_to = "vals", 
    values_drop_na = T
  )
## # A tibble: 15,216 × 5
##    CaseID weight party3     itm    vals   
##     <dbl>  <dbl> <fct>      <chr>  <chr>  
##  1      1  1.05  Republican Q250_6 Support
##  2      1  1.05  Republican Q250_5 Support
##  3      1  1.05  Republican Q250_2 Support
##  4      1  1.05  Republican Q250_3 Support
##  5      1  1.05  Republican Q250_8 Support
##  6      2  0.773 Democrat   Q250_6 Oppose 
##  7      2  0.773 Democrat   Q250_5 Oppose 
##  8      2  0.773 Democrat   Q250_2 Oppose 
##  9      2  0.773 Democrat   Q250_3 Oppose 
## 10      2  0.773 Democrat   Q250_8 Oppose 
## # ℹ 15,206 more rows

This is looking good, but more useful than a variable name is a functional description, especially for a plot.

But hey, that’s why we have lovely things like programmatic codebooks!

t2 %<>% 
  left_join(
    cb1
  )
## # A tibble: 15,216 × 6
##    CaseID weight party3     itm    vals    labs                                 
##     <dbl>  <dbl> <fct>      <chr>  <chr>   <chr>                                
##  1      1  1.05  Republican Q250_6 Support [Accepting Ukrainian refugees into t…
##  2      1  1.05  Republican Q250_5 Support [Providing economic assistance to Uk…
##  3      1  1.05  Republican Q250_2 Support [Increasing economic and diplomatic …
##  4      1  1.05  Republican Q250_3 Support [Sending additional arms and militar…
##  5      1  1.05  Republican Q250_8 Support [Sending US troops to Ukraine to hel…
##  6      2  0.773 Democrat   Q250_6 Oppose  [Accepting Ukrainian refugees into t…
##  7      2  0.773 Democrat   Q250_5 Oppose  [Providing economic assistance to Uk…
##  8      2  0.773 Democrat   Q250_2 Oppose  [Increasing economic and diplomatic …
##  9      2  0.773 Democrat   Q250_3 Oppose  [Sending additional arms and militar…
## 10      2  0.773 Democrat   Q250_8 Oppose  [Sending US troops to Ukraine to hel…
## # ℹ 15,206 more rows

Whoa, this is handy! What’s the format for t2$labs?

t2$labs %>% table
## .
##                                        [Accepting Ukrainian refugees into the United States] In response to the situation involving Russia and Ukraine, would you support or oppose the United States: 
##                                                                                                                                                                                                   3057 
##                                     [Increasing economic and diplomatic sanctions on Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States: 
##                                                                                                                                                                                                   3042 
##                                                   [Providing economic assistance to Ukraine] In response to the situation involving Russia and Ukraine, would you support or oppose the United States: 
##                                                                                                                                                                                                   3047 
##                  [Sending additional arms and military supplies to the Ukrainian government] In response to the situation involving Russia and Ukraine, would you support or oppose the United States: 
##                                                                                                                                                                                                   3039 
## [Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States: 
##                                                                                                                                                                                                   3031

Let’s trim that only to the response

t2$labs %<>% 
  str_sub(
    start = 2,
    end = t2$labs %>% 
      str_locate("\\]") %>% 
      extract(, 1) %>% 
      subtract(1)
  ) 
## .
##                                        Accepting Ukrainian refugees into the United States 
##                                                                                       3057 
##                                     Increasing economic and diplomatic sanctions on Russia 
##                                                                                       3042 
##                                                   Providing economic assistance to Ukraine 
##                                                                                       3047 
##                  Sending additional arms and military supplies to the Ukrainian government 
##                                                                                       3039 
## Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia 
##                                                                                       3031

Lovely! Let’s push on!

t3 <- t2  %>%  
  group_by(
    party3, labs, vals, 
  ) %>% 
  tally(weight) %>% 
  mutate(
    perc = n %>% 
      divide_by(
        n %>% sum
      ) %>% 
      multiply_by(100)
  ) %>% 
  na.omit

Making graphs!

Let’s start protyping!

A nice stacked bar….

t3 %>% 
  ggplot() +
  geom_col(
    aes(
      x = party3, y = perc,
      fill =  vals
    ),
    color = "grey5",
    linewidth = .1,
    position = position_stack()
  ) +
  facet_wrap(~labs, nrow = 1)

I don’t hate this! The labels are all fucked up, but that’s fixable, and I sorta like that the stacked bars communicate the percentage nature of the quantities.

But it’s pretty wasteful to depict both sides of each answer when we really are only interested in a single estimate of support.

So maybe….

t3 %>% 
  filter(
    vals == "Support"
  ) %>% 
  ggplot() +
  geom_col(
    aes(
      y = party3, x = perc,
      fill =  vals
    ),
    color = "grey5",
    linewidth = .1,
    position = position_stack()
  ) +
  facet_wrap(~labs, ncol = 1) +
  theme(
    legend.position = "none"
  )

Maybe we’d do better with dots rather than bars?

t3 %>% 
  filter(
    vals == "Support"
  ) %>% 
  ggplot() +
  geom_point(
    aes(
      y = party3,
      x = perc
    )
  ) +
  facet_wrap(~labs, ncol = 1) +
  theme(
    legend.position = "none"
  )

Or we could put labels inside the points?

t3 %>% 
  filter(
    vals == "Support"
  ) %>% 
  ggplot() +
  geom_point(
    aes(
      y = party3,
      x = perc
    ),
    shape = 21,
    fill = "white",
    size = 7
  ) +
  geom_text(
    aes(
      x = perc,
      y = party3,
      label = perc %>% round
    ),
    size = 3
  ) +
  facet_wrap(~labs, ncol = 1) +
  scale_x_continuous(
    breaks = NULL 
  ) +
  theme(
    legend.position = "none"
  )

Hmmm or maybe add a line to accentuate policy differences, as well as partisan differences?

t3 %>% 
  filter(
    vals == "Support"
  ) %>% 
  ggplot() +
  geom_linerange(
    aes(
      xmin = 0,
      xmax = perc,
      y = party3
    )
  ) +
  geom_point(
    aes(
      y = party3,
      x = perc
    ),
    shape = 21,
    fill = "white",
    size = 7
  ) +
  geom_text(
    aes(
      x = perc,
      y = party3,
      label = perc %>% round
    ),
    size = 3
  ) +
  facet_wrap(~labs, ncol = 1) +
  scale_x_continuous(
    breaks = NULL 
  ) +
  theme(
    legend.position = "none"
  )

I like this, except that the partisanship has dissapeared a little, and I don’t like that it’s a little too ornate for how little data there is.

Maybe facets could remove some of the boilerplate graphical elements?

t4 <- t3 %>%
  ungroup %>% 
  filter(
    vals == "Support"
  ) %>% 
  mutate(
    labs = labs %>% 
      fct_reorder(
        perc,
        .desc = F
        )
  )

t4 %>% 
  ggplot() +
  geom_linerange(
    aes(
      xmin = 0,
      xmax = perc,
      y = labs,
      color = party3
    ),
    linewidth = .4,
    position = position_dodge(width = .45)
  ) +
  geom_point(
    aes(
      y = labs,
      color = party3,
      x = perc
    ),
    shape = 21,
    fill = "white",
    size = 7,
    position = position_dodge(width = .45)
  ) +
  geom_text(
    aes(
      x = perc,
      y = labs,
      color = party3,
      label = perc %>% round
    ),
    size = 3,
    position = position_dodge(width = .45),
  ) +
  scale_x_continuous(breaks = NULL) +
  scale_y_discrete(
    breaks = t4$labs %>% 
      levels,
    labels = t4$labs %>% 
      levels %>% 
      str_wrap(width = 20)
    ) +
  scale_color_manual(
    values = c(
      "#2fabe1",
      "#8C6382",
      "#e91b23"
    )
  ) 

I think an improvement? Let’s start theming up, and removing the legend, and vestigial labels

t4 %>% 
  ggplot() +
  geom_linerange(
    aes(
      xmax = perc,
      y = labs,
      color = party3
    ),
    linewidth = .4,
    xmin = 0,
    position = position_dodge(width = .45)
  ) +
  geom_point(
    aes(
      y = labs,
      color = party3,
      x = perc
    ),
    shape = 21,
    fill = "white",
    size = 7,
    position = position_dodge(width = .45)
  ) +
  geom_text(
    aes(
      x = perc,
      y = labs,
      color = party3,
      label = perc %>% round
    ),
    size = 3,
    position = position_dodge(width = .45),
  ) +
  scale_x_continuous(
    breaks = NULL,
    limits = c(0, 90),
    expand = expansion(add = c(1, 1))
  ) +
  scale_y_discrete(
    breaks = t4$labs %>% 
      levels,
    labels = t4$labs %>% 
      levels %>% 
      str_wrap(width = 20)
    ) +
  scale_color_manual(
    values = c(
      "#2fabe1",
      "#8C6382",
      "#e91b23"
    )
  ) +
  labs(
    x = "Support (%)",
    y = "",
    title = "Support for US Ukraine policy responses, by partisanship",
    subtitle = "In response to the situation involving Russia and Ukraine, would you support or oppose the United States...",
    caption = "Data source: Chicago Council Survey, 2022"
    ) +
  theme_minimal()

Still more cleaning! And a nice typeface!

p1 <- t4 %>% 
  ggplot() +
  geom_linerange(
    aes(
      xmax = perc,
      y = labs,
      color = party3
    ),
    linewidth = .4,
    xmin = 0,
    position = position_dodge(width = .45)
  ) +
  geom_point(
    aes(
      y = labs,
      color = party3,
      x = perc
    ),
    shape = 21,
    fill = "white",
    size = 7,
    position = position_dodge(width = .45)
  ) +
  geom_text(
    aes(
      x = perc,
      y = labs,
      color = party3,
      label = perc %>% round
    ),
    size = 3,
    position = position_dodge(width = .45),
  ) +
  scale_x_continuous(
    breaks = NULL,
    limits = c(0, 90),
    expand = expansion(add = c(1, 1))
  ) +
  scale_y_discrete(
    breaks = t4$labs %>% 
      levels,
    labels = t4$labs %>% 
      levels %>% 
      str_wrap(width = 20)
    ) +
  scale_color_manual(
    values = c(
      "#2fabe1",
      "#8C6382",
      "#e91b23"
    )
  ) +
  labs(
    x = "Support (%)",
    y = "",
    title = "Support for US Ukraine policy responses, by partisanship",
    subtitle = "In response to the situation involving Russia and Ukraine, would you support or oppose the United States...",
    caption = "Data source: Chicago Council Survey, 2022"
    ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(
      hjust = .5,
      face = "italic"
      ),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(face = "italic"),
    panel.background = element_rect(
      fill = "white",
      color = "white"
    ),
    panel.grid = element_blank(),
    plot.title.position = "plot",
    plot.caption = element_text(size = 8)
  )

Should return (after futzing with some typefaces…)

Which I think is pretty nice!

  • In Tufteian terms, it has high data-to-ink ratio.
  • A reader can figure out the format of the item and the response just from reading the figure itself, without any other context (this is pretty rare when you see figures that make it into print in political science journals).
  • It demonstrates that the party effects are still pretty modest, and the policy differences are huge (for instance, all partisan groups seem to understand and be averse to the possibility that the US fight Russia directly.)

An exercise

Sets of items in this Chicago Council survey measure attitudes towards a related policy area. For instance:

  • Q253_1:Q253_6 measures attitudes toward Russia

  • QUKR1_2:QUKR1_5 measures perceived sanction efficacy

  • QTW2_1:QTW2_7 measures preference for the US response to an invasion of Taiwan

  • Q819_1:Q819_6 measures preference for US policy toward North Korea

For any item set, make a figure that reports either attitudes overall or subgroup attitudes, across all the items in a set.


  1. While SPSS is powerfully naff software, its data formats (.por and .sav) are super–they contain tons of metadata by default, they are backwards compatible across subsequent generations of SPSS–it’s not like Stata’s horrible .dta files, where following Stata 13, the format was completely changed–couldn’t be opened by earlier versions of the software, and no apparent difference in the file until you try and open it!↩︎