I wanted to recreate the figures from “The Origins and Consequences of Affective Polarization in the United States” (Iyengar et al., 2019) and “Political Sectarianism in America” (Finkel et al., 2020) shown below. They use American National Election Studies data to show how Affective Polarization has changed over time. This tutorial will not reproduce the figures exactly, but it will at least give you the base figure to work from to modify for your own purposes. This code uses tidy principles.

If you would like to connect, please find my contact information at my personal website. Happy to answer any questions!

Iyengar et al., (2019)

Finkel et al., (2020)

Overview

The steps are:

Grab data from ANES
Import
Clean
Mutate data for plotting
Plot

Grab ANES data

Grab the Time Series Cumulative Data File from: https://electionstudies.org/data-center/anes-time-series-cumulative-data-file. Personally, I like grabbing SPSS files because they include metadata that might be useful. The file is about 82MB.

Import data

library(tidyverse)
library(rio) #for importing

raw=import("anes_timeseries_cdf_spss_20211118.sav")

Clean data

Let’s clean the data, first by selecting just the variables of interest. I found the variables using the codebook on the ANES page.

VCF0004: Year of Study
VCF0218: Thermometer - Democratic Party
VCF0224: Thermometer - Republican Party
VCF0301: Party Identification of Respondent- 7-point Scale

And we might as well drop any participants that haven’t filled out any of these questions,

df=raw %>% 
  select(VCF0004, VCF0218, VCF0224, VCF0301) %>% 
  filter(!is.na(VCF0004) & !is.na(VCF0218) & !is.na(VCF0224) & !is.na(VCF0301))

df %>% head

##   VCF0004 VCF0218 VCF0224 VCF0301
## 1    1978      80      50       1
## 2    1978      50      50       4
## 3    1978      40      60       7
## 4    1978      60      60       3
## 5    1978      85      60       3
## 6    1978      50      50       2

Mutate data

We then want to change party identification from a 7-point scale into just Democrats and Republicans. Let’s look at the party ID variable to see the value coding.

df$VCF0301 %>% glimpse

##  num [1:42940] 1 4 7 3 3 2 3 2 2 1 ...
##  - attr(*, "label")= chr "Party Identification of Respondent- 7-point Scale"
##  - attr(*, "format.spss")= chr "F1.0"
##  - attr(*, "display_width")= int 9
##  - attr(*, "labels")= Named num [1:8] 0 1 2 3 4 5 6 7
##   ..- attr(*, "names")= chr [1:8] "0. DK; NA; other; refused to answer; no Pre IW" "1. Strong Democrat" "2. Weak Democrat" "3. Independent - Democrat" ...

df$VCF0301 %>% attr("labels") #because glimpse cuts off the labels

## 0. DK; NA; other; refused to answer; no Pre IW 
##                                              0 
##                             1. Strong Democrat 
##                                              1 
##                               2. Weak Democrat 
##                                              2 
##                      3. Independent - Democrat 
##                                              3 
##                   4. Independent - Independent 
##                                              4 
##                    5. Independent - Republican 
##                                              5 
##                             6. Weak Republican 
##                                              6 
##                           7. Strong Republican 
##                                              7

We should drop the following: 0, 3, 4, 5. And we also want to collapse Strong and Weak partisans into the same category. We can do both of these at the same time using case_when(). Let’s also drop the NAs.

df=df %>% 
  mutate(PID=case_when(
    VCF0301 == 1 ~ "Democrat",
    VCF0301 == 2 ~ "Democrat",
    VCF0301 == 6 ~ "Republican",
    VCF0301 == 7 ~ "Republican",
    TRUE         ~ NA_character_ #functions like ELSE
  )) %>%  
  filter(!is.na(PID)) #drop the NAs

Now we need to make an inparty feeling thermomoter and and outparty feeling thermometer.

df=df %>% 
  mutate(
    inparty_feeling=case_when(
      PID == "Democrat"   ~ VCF0218,  #if they are a democrat, then use democrat feeling variable
      PID == "Republican" ~ VCF0224   #if they are a repoublican, then use republican feeling variable
    ),
    
    outparty_feeling=case_when(
      PID == "Democrat"   ~ VCF0224,  #if they are a democrat, then use republican feeling variable
      PID == "Republican" ~ VCF0218   #if they are a repoublican, then use democrat feeling variable
    )
  
  )

df %>% head()

##   VCF0004 VCF0218 VCF0224 VCF0301        PID inparty_feeling outparty_feeling
## 1    1978      80      50       1   Democrat              80               50
## 2    1978      40      60       7 Republican              60               40
## 3    1978      50      50       2   Democrat              50               50
## 4    1978      60      60       2   Democrat              60               60
## 5    1978      70      40       2   Democrat              70               40
## 6    1978      85      85       1   Democrat              85               85

Now we need to make a dataframe for summary stats through time.

df_p=
df %>% 
  select(VCF0004, PID, inparty_feeling, outparty_feeling) %>% 
  group_by(VCF0004) %>% 
  summarise(inparty_feeling= mean(inparty_feeling,  na.rm=T),
            outparty_feeling=mean(outparty_feeling, na.rm=T)
            ) 


df_p

## # A tibble: 17 x 3
##    VCF0004 inparty_feeling outparty_feeling
##      <dbl>           <dbl>            <dbl>
##  1    1978            73.9             47.0
##  2    1980            75.0             45.6
##  3    1982            76.2             43.4
##  4    1984            76.8             44.9
##  5    1986            76.4             45.1
##  6    1988            77.6             44.0
##  7    1990            73.6             45.6
##  8    1992            72.5             40.8
##  9    1994            72.6             40.5
## 10    1996            74.3             39.5
## 11    1998            72.9             38.8
## 12    2000            76.5             39.8
## 13    2004            77.0             36.4
## 14    2008            77.0             32.3
## 15    2012            74.4             25.5
## 16    2016            70.2             24.4
## 17    2020            74.6             17.6

Looks good. To plot it, we need to reshape the data to a long format.

df_p=df_p %>% 
  pivot_longer(-VCF0004, names_to="feeling", values_to="Warmth")

df_p %>% head

## # A tibble: 6 x 3
##   VCF0004 feeling          Warmth
##     <dbl> <chr>             <dbl>
## 1    1978 inparty_feeling    73.9
## 2    1978 outparty_feeling   47.0
## 3    1980 inparty_feeling    75.0
## 4    1980 outparty_feeling   45.6
## 5    1982 inparty_feeling    76.2
## 6    1982 outparty_feeling   43.4

Plotting

Now, we can plot it.

df_p %>% 
  ggplot(aes(x=VCF0004, y=Warmth, color=feeling)) +
  geom_point() +
  geom_line()

Not bad. Here’s the final version I ended up exporting as SVG so I can further play around with aesthetics in PowerPoint.

df_p %>% 
  mutate(feeling=recode(feeling,
                        inparty_feeling="Inparty Feeling",
                        outparty_feeling="Outparty Feeling",
                        )) %>% 
  ggplot(aes(x=VCF0004, y=Warmth, color=feeling)) +
  
  geom_ribbon(
    data=. %>% group_by(VCF0004) %>% rstatix::get_summary_stats(Warmth),
    aes(ymin=min,ymax=max, y=mean, color=NULL),
    fill="gray96", show.legend = F)+  
  
  geom_hline(yintercept=50, color="gray", linetype="dashed")+
  
  geom_point(size=2.5) +
  geom_line(size=1, show.legend = F) +

  scale_x_continuous(breaks=seq(1980, 2020, by=4)) +
  scale_y_continuous(limits=c(0,100),
                     labels = paste0(seq(0, 100, by=25), "°")
                     ) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        axis.title.x = element_blank(),
        axis.text.x = element_text(angle=45, hjust=1),
        legend.title = element_blank()) +
  labs(x="Election year", y="Feeling thermometer rating",
       title="Affective Polarization Over Time Using ANES Data")

Affective Polarization Over Time using data from American National Election Studies (ANES)