We’ll try something different this time–going methodically through a single figure, from scratch. I’ll try to move at a pace which allows us to discuss the details.
In this lab, we’ll use the 2022 version of the Chicago Council Survey, which has measured foreign policy attitudes since 1974.
First, we’ll download the data, helpfully saved as a static zip file
library(tidyverse)
library(magrittr)
library(haven)
tf <- tempfile()
download.file(
url = "https://globalaffairs.org/sites/default/files/2023-02/2022%20CCS%20Data.zip",
destfile = tf
)
Now the file is saved in our machine’s memory as the object
tf
.
What’s this object’s contents?
tf %>%
unzip(list = T)
## Name Length
## 1 2022 CCS Data/2022 Chicago Council Survey Data Set.sav 3373785
## 2 2022 CCS Data/2022 Chicago Council Survey Field Report.docx 336773
## 3 2022 CCS Data/2022 Chicago Council Survey Questionnaire - Spanish.pdf 290862
## 4 2022 CCS Data/2022 Chicago Council Survey Questionnaire.pdf 302436
## 5 2022 CCS Data/2022 Chicago Council Survey Topline.docx 458277
## Date
## 1 2023-01-09 16:29:00
## 2 2023-01-09 16:28:00
## 3 2023-02-08 16:05:00
## 4 2023-02-08 16:05:00
## 5 2023-01-09 16:29:00
That .sav
file is what we need to manipulate the survey
microdata.1 We’ll copy the filename and use it in a new
R script:
t1 <- tf %>%
unz("2022 CCS Data/2022 Chicago Council Survey Data Set.sav") %>%
read_sav %>%
as_factor
Let’s do some housekeeping–building a codebook from the meta data is super useful for understanding the survey design
t1l <- t1 %>%
map(
~attr(., "label")
) %>%
unlist
t1l[1:3]
## CaseID
## "Case Identification Number"
## weight
## "Post-Stratification Weights - 18+ Total Qualified Respondents"
## race_wt
## "Post-Stratification Weights - 18+ Total Qualified Respondents By Racial/Ethnic Groups"
t1l
is a list of length 217. Not coincidentally,
t1
has 217 columns! Each element of t1l
corresponds to the label
attribute in each t1
variable.
We can make a nice codebook tibble
cb1 <- tibble(
itm = t1l %>%
names,
labs = t1l %>%
unlist
)
cb1
## # A tibble: 217 × 2
## itm labs
## <chr> <chr>
## 1 CaseID Case Identification Number
## 2 weight Post-Stratification Weights - 18+ Total Qualified Respondents
## 3 race_wt Post-Stratification Weights - 18+ Total Qualified Respondents By R…
## 4 xchicago xchicago
## 5 tm_start Interview start time (MST)
## 6 tm_finish Interview finish time (MST)
## 7 duration Interview duration in minutes
## 8 QFLAG Data Only Variable: Final Qualification Flag For Respondent Based …
## 9 xacslang Data Only Variable: Primary Language
## 10 xspanish Data Only Variable: Survey Language
## # ℹ 207 more rows
Perchance they ask some nice items about preferred policy options for the US aid to Ukraine?
cb1 %>%
filter(
labs %>%
str_detect("Ukraine")
)
## # A tibble: 13 × 2
## itm labs
## <chr> <chr>
## 1 Q808_6 [Coordinating an international response to Russia’s invasion of Ukra…
## 2 Q30_12a [To help Ukraine defend itself against the Russian invasion] Would y…
## 3 Q620A_7 [Ukraine] Would you favor or oppose the expansion of NATO to include…
## 4 Q249_1 [Other countries following Russia’s example of launching wars for te…
## 5 Q249_2 [Establishing a precedent that national borders are not fixed and ca…
## 6 Q249_3 [China seeing the invasion of Ukraine as a precedent, encouraging it…
## 7 Q250_6 [Accepting Ukrainian refugees into the United States] In response to…
## 8 Q250_5 [Providing economic assistance to Ukraine] In response to the situat…
## 9 Q250_2 [Increasing economic and diplomatic sanctions on Russia] In response…
## 10 Q250_3 [Sending additional arms and military supplies to the Ukrainian gove…
## 11 Q250_8 [Sending US troops to Ukraine to help the Ukrainian government defen…
## 12 QUKR1_2 [Forcing Russia to withdraw troops from Ukraine] How effective do yo…
## 13 QUKR1_3 [Deterring Russia from taking military action beyond Ukraine to neig…
Mmmm the Q250_
prefix items seem relevant here measuring
the breadth of policy responses?
cb1 %>%
filter(
itm %>%
str_detect("Q250_")
) %>%
extract2("labs") %>%
unlist
## Q250_6
## "[Accepting Ukrainian refugees into the United States] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"
## Q250_5
## "[Providing economic assistance to Ukraine] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"
## Q250_2
## "[Increasing economic and diplomatic sanctions on Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"
## Q250_3
## "[Sending additional arms and military supplies to the Ukrainian government] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"
## Q250_8
## "[Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:"
Whoa awesome! Each label includes the response label and the text of the survey item! They’re gonna come in handy later…🤗
Knowing a thing or two about American politics, we’re probably interested in partisan differences on these items. Do we have a nice ANES style branching PID indicator?
cb1 %>%
filter(
labs %>%
str_detect(
"Democrat|Independent|Republican"
)
)
## # A tibble: 4 × 2
## itm labs
## <chr> <chr>
## 1 Q1010 Generally speaking, do you usually think of yourself as a Republican, a…
## 2 Q1015 Would you call yourself a strong Republican or a not very strong Republ…
## 3 Q1020 Would you call yourself a strong Democrat or a not very strong Democrat?
## 4 Q1025 Do you think of yourself as closer to the Republican Party or to the De…
Fuuuhck ok we’ll need to build a single party id indicator from these ANES branching variables.
(Apologies to the non Americansists – reflecting the very zenith of socthe three PID items go as follows:
Q1010
)Q1025
)Q1015
and Q1020
.)So we’ll use a case_when
to group leaners with
enthusiastic partisans
t1$party3 <- case_when(
# for respondents who had no partisan affiliation
t1$Q1010 %>%
is_in(
c(
"Other",
"Refused"
)
) ~ NA,
# Republican leaners
t1$Q1010 %>%
equals("Independent") &
t1$Q1025 %>%
equals("Republican") ~ "Republican",
# Democratic leaners
t1$Q1010 %>%
equals("Independent") &
t1$Q1025 %>%
equals("Democratic") ~ "Democrat",
# Implacable independents
t1$Q1010 %>%
equals("Independent") &
t1$Q1025 %>%
equals("Neither") ~ "Independent",
# open partisans
t1$Q1010 %>%
equals("Independent") %>%
not ~ t1$Q1010
) %>%
factor(
c("Democrat",
"Independent",
"Republican")
)
Let’s see if we can put everything we need for the eventual figure in a new tibble.
t2 <- t1 %>%
select(
CaseID, weight, party3, starts_with("Q250_")
)
## # A tibble: 3,106 × 8
## CaseID weight party3 Q250_6 Q250_5 Q250_2 Q250_3 Q250_8
## <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 1 1.05 Republican Support Support Support Support Support
## 2 2 0.773 Democrat Oppose Oppose Oppose Oppose Oppose
## 3 3 0.742 Democrat Support Support Support Support Support
## 4 4 1.03 Republican Oppose Oppose Support Oppose Support
## 5 5 0.551 Democrat Support Support Support Support Oppose
## 6 6 0.804 Republican Oppose Oppose Oppose Oppose Oppose
## 7 7 0.545 Democrat Support Support Support Support Oppose
## 8 8 0.383 Democrat Support Support Oppose Support Support
## 9 9 0.278 Republican Oppose Support Support Support Oppose
## 10 10 0.0172 Independent Support Support Support Support Support
## # ℹ 3,096 more rows
Let’s do some more housekeeping–are there adminstrative values in the outcome variables?
t2 %>%
select(
starts_with("Q250")
) %>%
map(table)
## $Q250_6
##
## Refused Support Oppose
## 49 2376 681
##
## $Q250_5
##
## Refused Support Oppose
## 59 2286 761
##
## $Q250_2
##
## Refused Support Oppose
## 64 2521 521
##
## $Q250_3
##
## Refused Support Oppose
## 67 2281 758
##
## $Q250_8
##
## Refused Support Oppose
## 75 1136 1895
Ehhhh survey refusals, at 2% of respondents? I think it’s just a
nuisance, and not something you can tell a persuasive story about. So
let’s just replace them with NA
s. This is obviously
dangerously bad social science practice, so I humbly request your
indulgence.
t2 %<>%
mutate(
across(
starts_with(
"Q250_"
),
~.x %>%
equals("Refused") %>%
ifelse(
NA,
.x %>%
as.character
)
)
)
Whoa do we remember our friend across
? It’s a
stunner!
across()
has two primary arguments:
The first argument, .cols
, selects the columns you
want to operate on. All the super useful selector functions in
dplyr
work here.
The second argument, .fns
, is a function or list of
functions to apply to each column. This can also be a purrr style
formula (or list of formulas) like ~ .x / 2.
We have multiple DVs, indicating all the ways a respondent might
support or oppose different facets of US Ukraine response. Sounds like a
job for pivot_longer()
t2 %<>%
pivot_longer(
cols = starts_with("Q250"),
names_to = "itm",
values_to = "vals",
values_drop_na = T
)
## # A tibble: 15,216 × 5
## CaseID weight party3 itm vals
## <dbl> <dbl> <fct> <chr> <chr>
## 1 1 1.05 Republican Q250_6 Support
## 2 1 1.05 Republican Q250_5 Support
## 3 1 1.05 Republican Q250_2 Support
## 4 1 1.05 Republican Q250_3 Support
## 5 1 1.05 Republican Q250_8 Support
## 6 2 0.773 Democrat Q250_6 Oppose
## 7 2 0.773 Democrat Q250_5 Oppose
## 8 2 0.773 Democrat Q250_2 Oppose
## 9 2 0.773 Democrat Q250_3 Oppose
## 10 2 0.773 Democrat Q250_8 Oppose
## # ℹ 15,206 more rows
This is looking good, but more useful than a variable name is a functional description, especially for a plot.
But hey, that’s why we have lovely things like programmatic codebooks!
t2 %<>%
left_join(
cb1
)
## # A tibble: 15,216 × 6
## CaseID weight party3 itm vals labs
## <dbl> <dbl> <fct> <chr> <chr> <chr>
## 1 1 1.05 Republican Q250_6 Support [Accepting Ukrainian refugees into t…
## 2 1 1.05 Republican Q250_5 Support [Providing economic assistance to Uk…
## 3 1 1.05 Republican Q250_2 Support [Increasing economic and diplomatic …
## 4 1 1.05 Republican Q250_3 Support [Sending additional arms and militar…
## 5 1 1.05 Republican Q250_8 Support [Sending US troops to Ukraine to hel…
## 6 2 0.773 Democrat Q250_6 Oppose [Accepting Ukrainian refugees into t…
## 7 2 0.773 Democrat Q250_5 Oppose [Providing economic assistance to Uk…
## 8 2 0.773 Democrat Q250_2 Oppose [Increasing economic and diplomatic …
## 9 2 0.773 Democrat Q250_3 Oppose [Sending additional arms and militar…
## 10 2 0.773 Democrat Q250_8 Oppose [Sending US troops to Ukraine to hel…
## # ℹ 15,206 more rows
Whoa, this is handy! What’s the format for t2$labs
?
t2$labs %>% table
## .
## [Accepting Ukrainian refugees into the United States] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:
## 3057
## [Increasing economic and diplomatic sanctions on Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:
## 3042
## [Providing economic assistance to Ukraine] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:
## 3047
## [Sending additional arms and military supplies to the Ukrainian government] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:
## 3039
## [Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia] In response to the situation involving Russia and Ukraine, would you support or oppose the United States:
## 3031
Let’s trim that only to the response
t2$labs %<>%
str_sub(
start = 2,
end = t2$labs %>%
str_locate("\\]") %>%
extract(, 1) %>%
subtract(1)
)
## .
## Accepting Ukrainian refugees into the United States
## 3057
## Increasing economic and diplomatic sanctions on Russia
## 3042
## Providing economic assistance to Ukraine
## 3047
## Sending additional arms and military supplies to the Ukrainian government
## 3039
## Sending US troops to Ukraine to help the Ukrainian government defend itself against Russia
## 3031
Lovely! Let’s push on!
t3 <- t2 %>%
group_by(
party3, labs, vals,
) %>%
tally(weight) %>%
mutate(
perc = n %>%
divide_by(
n %>% sum
) %>%
multiply_by(100)
) %>%
na.omit
Let’s start protyping!
A nice stacked bar….
t3 %>%
ggplot() +
geom_col(
aes(
x = party3, y = perc,
fill = vals
),
color = "grey5",
linewidth = .1,
position = position_stack()
) +
facet_wrap(~labs, nrow = 1)
I don’t hate this! The labels are all fucked up, but that’s fixable, and I sorta like that the stacked bars communicate the percentage nature of the quantities.
But it’s pretty wasteful to depict both sides of each answer when we really are only interested in a single estimate of support.
So maybe….
t3 %>%
filter(
vals == "Support"
) %>%
ggplot() +
geom_col(
aes(
y = party3, x = perc,
fill = vals
),
color = "grey5",
linewidth = .1,
position = position_stack()
) +
facet_wrap(~labs, ncol = 1) +
theme(
legend.position = "none"
)
Maybe we’d do better with dots rather than bars?
t3 %>%
filter(
vals == "Support"
) %>%
ggplot() +
geom_point(
aes(
y = party3,
x = perc
)
) +
facet_wrap(~labs, ncol = 1) +
theme(
legend.position = "none"
)
Or we could put labels inside the points?
t3 %>%
filter(
vals == "Support"
) %>%
ggplot() +
geom_point(
aes(
y = party3,
x = perc
),
shape = 21,
fill = "white",
size = 7
) +
geom_text(
aes(
x = perc,
y = party3,
label = perc %>% round
),
size = 3
) +
facet_wrap(~labs, ncol = 1) +
scale_x_continuous(
breaks = NULL
) +
theme(
legend.position = "none"
)
Hmmm or maybe add a line to accentuate policy differences, as well as partisan differences?
t3 %>%
filter(
vals == "Support"
) %>%
ggplot() +
geom_linerange(
aes(
xmin = 0,
xmax = perc,
y = party3
)
) +
geom_point(
aes(
y = party3,
x = perc
),
shape = 21,
fill = "white",
size = 7
) +
geom_text(
aes(
x = perc,
y = party3,
label = perc %>% round
),
size = 3
) +
facet_wrap(~labs, ncol = 1) +
scale_x_continuous(
breaks = NULL
) +
theme(
legend.position = "none"
)
I like this, except that the partisanship has dissapeared a little, and I don’t like that it’s a little too ornate for how little data there is.
Maybe facets could remove some of the boilerplate graphical elements?
t4 <- t3 %>%
ungroup %>%
filter(
vals == "Support"
) %>%
mutate(
labs = labs %>%
fct_reorder(
perc,
.desc = F
)
)
t4 %>%
ggplot() +
geom_linerange(
aes(
xmin = 0,
xmax = perc,
y = labs,
color = party3
),
linewidth = .4,
position = position_dodge(width = .45)
) +
geom_point(
aes(
y = labs,
color = party3,
x = perc
),
shape = 21,
fill = "white",
size = 7,
position = position_dodge(width = .45)
) +
geom_text(
aes(
x = perc,
y = labs,
color = party3,
label = perc %>% round
),
size = 3,
position = position_dodge(width = .45),
) +
scale_x_continuous(breaks = NULL) +
scale_y_discrete(
breaks = t4$labs %>%
levels,
labels = t4$labs %>%
levels %>%
str_wrap(width = 20)
) +
scale_color_manual(
values = c(
"#2fabe1",
"#8C6382",
"#e91b23"
)
)
I think an improvement? Let’s start theming up, and removing the legend, and vestigial labels
t4 %>%
ggplot() +
geom_linerange(
aes(
xmax = perc,
y = labs,
color = party3
),
linewidth = .4,
xmin = 0,
position = position_dodge(width = .45)
) +
geom_point(
aes(
y = labs,
color = party3,
x = perc
),
shape = 21,
fill = "white",
size = 7,
position = position_dodge(width = .45)
) +
geom_text(
aes(
x = perc,
y = labs,
color = party3,
label = perc %>% round
),
size = 3,
position = position_dodge(width = .45),
) +
scale_x_continuous(
breaks = NULL,
limits = c(0, 90),
expand = expansion(add = c(1, 1))
) +
scale_y_discrete(
breaks = t4$labs %>%
levels,
labels = t4$labs %>%
levels %>%
str_wrap(width = 20)
) +
scale_color_manual(
values = c(
"#2fabe1",
"#8C6382",
"#e91b23"
)
) +
labs(
x = "Support (%)",
y = "",
title = "Support for US Ukraine policy responses, by partisanship",
subtitle = "In response to the situation involving Russia and Ukraine, would you support or oppose the United States...",
caption = "Data source: Chicago Council Survey, 2022"
) +
theme_minimal()
Still more cleaning! And a nice typeface!
p1 <- t4 %>%
ggplot() +
geom_linerange(
aes(
xmax = perc,
y = labs,
color = party3
),
linewidth = .4,
xmin = 0,
position = position_dodge(width = .45)
) +
geom_point(
aes(
y = labs,
color = party3,
x = perc
),
shape = 21,
fill = "white",
size = 7,
position = position_dodge(width = .45)
) +
geom_text(
aes(
x = perc,
y = labs,
color = party3,
label = perc %>% round
),
size = 3,
position = position_dodge(width = .45),
) +
scale_x_continuous(
breaks = NULL,
limits = c(0, 90),
expand = expansion(add = c(1, 1))
) +
scale_y_discrete(
breaks = t4$labs %>%
levels,
labels = t4$labs %>%
levels %>%
str_wrap(width = 20)
) +
scale_color_manual(
values = c(
"#2fabe1",
"#8C6382",
"#e91b23"
)
) +
labs(
x = "Support (%)",
y = "",
title = "Support for US Ukraine policy responses, by partisanship",
subtitle = "In response to the situation involving Russia and Ukraine, would you support or oppose the United States...",
caption = "Data source: Chicago Council Survey, 2022"
) +
theme_minimal() +
theme(
legend.position = "none",
axis.text.y = element_text(
hjust = .5,
face = "italic"
),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(face = "italic"),
panel.background = element_rect(
fill = "white",
color = "white"
),
panel.grid = element_blank(),
plot.title.position = "plot",
plot.caption = element_text(size = 8)
)
Should return (after futzing with some typefaces…)
Which I think is pretty nice!
Sets of items in this Chicago Council survey measure attitudes towards a related policy area. For instance:
Q253_1:Q253_6
measures attitudes toward
Russia
QUKR1_2:QUKR1_5
measures perceived sanction
efficacy
QTW2_1:QTW2_7
measures preference for the US
response to an invasion of Taiwan
Q819_1:Q819_6
measures preference for US policy
toward North Korea
For any item set, make a figure that reports either attitudes overall or subgroup attitudes, across all the items in a set.
While SPSS is powerfully naff software, its data formats
(.por
and .sav
) are super–they contain tons of
metadata by default, they are backwards compatible across subsequent
generations of SPSS–it’s not like Stata’s horrible .dta
files, where following Stata 13, the format was completely
changed–couldn’t be opened by earlier versions of the software, and no
apparent difference in the file until you try and open it!↩︎