American National Election Studies Data Exploration

Author

Kimberly Ouimette

Published

May 8, 2023

About

This report explores data from the American National Election Studies (ANES), a national research study of Americans’ voting patterns, public opinion, and political participation that is supported by Stanford University, the University of Michigan, and the National Science Foundation. The data span the study’s inception in 1948 through the most recently published wave in 2020. Part I follows a series of questions outlined in the final assignment of Dr. Collin Paschall’s Programming and Data Management course at Johns Hopkins University. Part II investigates research questions about religious importance and stances on the death penalty and abortion across different demographics.

For the purposes of this analysis, respondents who reported not knowing the answer or who skipped specific items entirely were excluded. Because such deletion may leave some segments of the population over-represented relative to others, the results cannot be assumed to hold for the entire sample. However, they do offer insight into the state of American politics through the eyes of citizens over the past several decades.

Finally, it is worth noting that the programming language used throughout the report alternates between R and Python, with each question labelled accordingly. The following packages were used: tidyverse (R) and NumPy and pandas (Python).

Questions regarding syntax or other contents of this report may be directed to the author, Kimberly Ouimette, at kouimet1@jh.edu.

Data Source:

American National Election Studies. 2022. ANES Time Series Cumulative Data File [dataset and documentation]. September 16, 2022 version. www.electionstudies.org

Code
# Loading Tidyverse
library(tidyverse)

# Reading .csv 
anes <- read_csv('anes_timeseries_cdf_csv_20220916.csv')

Part I: Guided Questions

Question 1

R: Create a tibble that shows how many respondents are in each wave of the survey.

Data Preparation Syntax
Code
# Renaming 'year' variable to be more identifiable
anes <- anes %>% rename('year' = 'VCF0004')
Answer
Code
anes %>% 
  count(year)
year n
1948 662
1952 1899
1954 1139
1956 1762
1958 1450
1960 1181
1962 1297
1964 1571
1966 1291
1968 1557
1970 1507
1972 2705
1974 1575
1976 2248
1978 2304
1980 1614
1982 1418
1984 2257
1986 2176
1988 2040
1990 1980
1992 2485
1994 1795
1996 1714
1998 1281
2000 1807
2002 1511
2004 1212
2008 2322
2012 5914
2016 4270
2020 8280

Question 2

Python: How are survey respondents distributed across the major geographic regions of the U.S. in the 1996 wave of the survey? (i.e., how many respondents per region)

Process

First, I filtered the ANES dataset to only include those respondents in 1996. From there, I assigned the respective names to the numerical codes for major geographic regions (e.g., Northeast, South) in a new variable (i.e., ‘Region’). This then allowed me to run a cross-tabulation of the respondents within each region in 1996 - as demonstrated in the output below.

Data Preparation Syntax
Code
import pandas as pd
import numpy as np
anes = pd.read_csv('anes_timeseries_cdf_csv_20220916.csv')
Code
anes2 = anes.apply(pd.to_numeric,errors="coerce")
anes2 = anes2.rename(columns = {'VCF0004' : 'year', 'VCF0112' : 'region'})
Answer
Code
# Subsetting dataset to include only 1996 wave
anes_96 = anes2[anes2.year == 1996]

# Creating mapping 
region_mapping = {1 : 'Northeast', 2 : 'North Central', 3: 'South', 4: 'West', 0 : 'NA'}
anes_96 = anes_96.assign(Region = anes_96.region.map(region_mapping))

# Creating table 
pd.crosstab(index=anes_96['Region'], columns = 'Number of Respondents')
col_0          Number of Respondents
Region                              
North Central                    458
Northeast                        260
South                            642
West                             354
Interpretation

As the table above shows, the South (642) and North Central (458) regions together accounted for about 64 percent of the 1,714 respondents in the 1996 wave, while the West (354) and Northeast (260) contributed the fewest respondents.
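
To put the regional counts on a common footing, each count can be converted to a share of the wave. The following is a minimal sketch that uses only the four counts reported in the output above; the variable names are illustrative and not part of the original analysis.

```python
# Counts copied from the 1996 cross-tabulation above
region_counts = {
    "North Central": 458,
    "Northeast": 260,
    "South": 642,
    "West": 354,
}

total = sum(region_counts.values())  # all respondents in the 1996 table

# Share of the 1996 sample in each region, as a percent
region_share = {
    region: round(100 * n / total, 1) for region, n in region_counts.items()
}

# Print regions from largest to smallest share
for region, pct in sorted(region_share.items(), key=lambda kv: -kv[1]):
    print(f"{region:<14}{pct:>5.1f}%")
```

The four counts sum to the 1,714 respondents reported for the 1996 wave in Question 1, so no respondents were lost to the region mapping.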

Question 3

R: Considering the 2008 wave and subsequent waves, what percent of these interviews in each wave were partially or entirely translated to Spanish? Don’t forget to account for both pre- and post-election interviews (ANES surveys include pre- or post-election interviews).

Process

To start, I selected only those observations from 2008 through 2020. From there, I removed rows with missing entries and recoded the pre- and post-election interview language variables into binary indicators (i.e., 1 = in Spanish, 0 = not in Spanish), dropping rows coded as invalid. To determine the percent of interviews conducted in Spanish, I used the syntax below. Specifically, I grouped the data by wave, summed the pre- and post-election indicators, divided that sum by twice the number of rows in the filtered data frame (each respondent contributes two interviews), and multiplied the result by 100 to return a percent.

Data Preparation Syntax
Code
# Subsetting by 2008 onward
anes_2008 <- subset(anes, year >= 2008)

# Selecting only relevant variables
anes_spanish <- anes_2008 %>% select(year, VCF0018a, VCF0018b)
anes_spanish <- rename(anes_spanish, pre_lang = VCF0018a, post_lang = VCF0018b)
anes_span2 <- drop_na(anes_spanish)

# Recoding values for pre-interviews, 1 = Spanish, 0 = Not Spanish
anes_span2$pre_lang <- anes_span2$pre_lang %>% case_match(
  c(0, 3, 5, 7) ~ 0, 
  c(1) ~ 1, 
  c(9) ~ -999
)

# Recoding values for post-interviews, 1 = Spanish, 0 = Not Spanish
anes_span2$post_lang <- anes_span2$post_lang %>% case_match(
  c(0, 3, 5, 7) ~ 0,
  c(1, 4) ~ 1, 
  c(9) ~ -999
)

# Subsetting for only valid responses 
spanish <- subset(anes_span2, pre_lang != -999 & post_lang != -999)
Answer
Code
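# Note: nrow(spanish) is the row count pooled across all four waves, so each
# value is a wave's Spanish interviews as a share of all pooled interviews;
# use n() in its place for a within-wave denominator
spanish %>%
  group_by('Year' = year) %>%
  summarize('% of Interviews in Spanish' = sum(pre_lang, post_lang)/(2 * nrow(spanish)) * 100) %>%
  round(digits = 2)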
Year % of Interviews in Spanish
2008 0.47
2012 0.95
2016 0.27
2020 0.32
Interpretation

Between 2008 and 2020, the percent of interviews conducted in Spanish, measured as a share of all interviews pooled across the four waves, never exceeded 1 percent in any wave. This is concerning considering approximately 13 percent of the U.S. population speaks Spanish at home as of 2021 (Forbes). While it is probable that many of those individuals can also speak English, it raises concerns about whether language barriers may prevent respondents from accurately answering the ANES surveys and interviews, especially when the instrument is not presented in one’s native language. Alternatively, the very low percent of interviews conducted in Spanish may reflect the sample of individuals reached by the ANES (i.e., predominantly native English speakers, likely U.S.-born, non-Hispanic/Latino citizens). For this reason, further exploration of demographic variables such as race or ethnic identity, immigrant status, and household income is warranted.
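
One detail of the calculation matters for reading these figures. Inside a grouped `summarize`, `nrow(spanish)` returns the row count of the whole data frame, so each wave's value is a share of all pooled interviews rather than of that wave's interviews. The sketch below contrasts the two denominators in pandas. It uses made-up toy data (not ANES values), the column names mirror the R code, and the helper name `spanish_share` is hypothetical.

```python
import pandas as pd

# Toy data, NOT ANES values: 1 = interview in Spanish, 0 = not in Spanish
toy = pd.DataFrame({
    "year":      [2008, 2008, 2008, 2008, 2012, 2012],
    "pre_lang":  [1, 0, 0, 0, 1, 0],
    "post_lang": [1, 0, 0, 0, 0, 0],
})

def spanish_share(df: pd.DataFrame) -> pd.DataFrame:
    """Percent of interviews in Spanish per wave, under two denominators."""
    grouped = df.groupby("year")
    spanish = grouped["pre_lang"].sum() + grouped["post_lang"].sum()
    per_wave_interviews = 2 * grouped.size()  # pre + post interviews in the wave
    pooled_interviews = 2 * len(df)           # pre + post interviews in all waves
    return pd.DataFrame({
        "pct_within_wave": (100 * spanish / per_wave_interviews).round(2),
        "pct_of_pooled": (100 * spanish / pooled_interviews).round(2),
    })

result = spanish_share(toy)
print(result)
```

In the toy data, both waves have the same within-wave rate, yet the pooled version makes the smaller wave look lower. Waves of very different sizes, such as 2008 and 2020 in the ANES, would be affected the same way.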

Question 4

Python: One of the questions on this survey has the interviewer read a list of words and phrases that people use to describe political figures. Then, the interviewer asks the interviewee to think about a given political figure, and the interviewer asks whether a given phrase describes that political figure extremely well, quite well, not too well or not well at all.

So, for example, the interviewer might say, “Think about Ronald Reagan. In your opinion, does the phrase or word ‘intelligent’ describe Ronald Reagan extremely well, quite well, not too well, or not well at all?”

Based on the survey data between 1980 and 2008, which president did women under the age of 40 think was the most knowledgeable? Which president was the least knowledgeable in the eyes of this group? You can average across all surveys during a president’s term. Some presidents will be included in more waves than others - that’s fine, use the average regardless of the number of terms.

Process

As in previous questions, my first step was to subset the responses to those between 1980 and 2008. Next, I kept only valid responses (i.e., not missing or unsure) to the item asking whether the president is knowledgeable, and filtered the dataset further to include only women under the age of 40. Then, I mapped each year to the associated president’s name (e.g., 1980 = Jimmy Carter, 2004 = George W. Bush) and assigned it to a new variable. Finally, I grouped the data by president and computed the mean knowledge score for each.

Data Preparation Syntax
Code

# Subsetting to pull waves 1980 - 2008
anes1980 = anes2[(anes2.year >= 1980) & (anes2.year <= 2008)]

# Renaming variables to be more meaningful
anes1980 = anes1980.rename(columns = {'VCF0342' : 'knowledge', 'VCF0101' : 'age', 'VCF0104' : 'gender'})

# Subsetting data for only valid knowledge responses
anes1980_nm = anes1980[anes1980.knowledge.isin([1, 2, 3, 4])]

# Creating new, smaller data frame with variables of interest 
df = anes1980_nm[['year', 'age', 'gender', 'knowledge']]

# Filtering data frame by women younger than 40
df_filtered = df[(df.age < 40) & (df.gender == 2)]

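# Mapping years onto presidents
# (caution: Ronald Reagan held office during the 1982-1988 waves; George H. W. Bush took office in 1989)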
presidents = {1980: 'Jimmy Carter', 1982: 'George Bush Sr.', 1984: 'George Bush Sr.', 1986: 'George Bush Sr.', 1988: 'George Bush Sr.', 
1994: 'Bill Clinton', 1996: 'Bill Clinton', 1998: 'Bill Clinton', 2000: 'Bill Clinton', 2002: 'George W. Bush', 2004: 'George W. Bush', 2008: 'George W. Bush'}

# Assigning President to respective years based on codebook
pres_rating = df_filtered.assign(President = df_filtered.year.map(presidents))
Answer
Code
pres_rating.groupby('President')['knowledge'].mean().sort_values().round(2)
President
Bill Clinton       1.99
George Bush Sr.    2.08
Jimmy Carter       2.08
George W. Bush     2.52
Name: knowledge, dtype: float64
Interpretation

The table above shows the mean knowledgeable rating (range 1-4) given to each president by women under the age of 40 between 1980 and 2008. Following the order of the response options in the question (extremely well, quite well, not too well, not well at all), a rating of 1 means the phrase fits extremely well and a rating of 4 means it does not fit at all, so lower means indicate that the group saw the president as more knowledgeable. Read this way, Bill Clinton (1.99) was seen as the most knowledgeable president in this period and George W. Bush (2.52) as the least, while George Bush Sr. and Jimmy Carter share the same mean score (2.08). Clinton’s standing is notable given his widely publicized affair with a White House intern, which might have been expected to lower his ratings among women. George W. Bush’s result is also notable: he was president during the 9/11 terrorist attacks, a time met with one of the highest demonstrations of American patriotism and support, and shortly after 9/11 his presidential approval rating soared to almost 90 percent, a number unseen by any other U.S. President at the time (Gallup). Any boost from that period does not appear to have carried through to his aggregate knowledge rating across the 2002, 2004, and 2008 waves.
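
Because the direction of the 1-4 scale determines who counts as "most knowledgeable", it can help to reverse-code the item so that higher values mean a better fit. The following is a minimal sketch, assuming the response codes follow the order of the question wording (1 = extremely well, 4 = not well at all); the means are copied from the output above and the variable names are illustrative.

```python
# Mean ratings copied from the output above (original 1-4 coding)
mean_rating = {
    "Bill Clinton": 1.99,
    "George Bush Sr.": 2.08,
    "Jimmy Carter": 2.08,
    "George W. Bush": 2.52,
}

# Assumption: 1 = "extremely well" and 4 = "not well at all", so 5 - x flips
# the scale and higher values mean "more knowledgeable". Reversing a mean is
# the same as averaging reversed responses because the transform is linear.
reversed_rating = {name: round(5 - x, 2) for name, x in mean_rating.items()}

most = max(reversed_rating, key=reversed_rating.get)
least = min(reversed_rating, key=reversed_rating.get)
print(most, least)
```

Under that coding assumption, the ranking can be read directly from the reversed scores without having to remember which end of the original scale is favorable.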

Question 5

R: These days, the evidence suggests that higher levels of education are associated with more liberal political attitudes, as measured on a traditional seven-point ideology scale. Track this pattern over time. Use respondents from 1980, 1992, 2000, and 2020. What is the average political ideology of survey respondents with a college degree or greater vs. the political ideology of respondents without a college degree? (Note: some college doesn’t count) In addition, repeat this, but compare how this breaks down along racial lines. Is the pattern the same for whites and non-whites?

Process

To start, I selected data only from the listed years (i.e., 1980, 1992, 2000, & 2020) with valid, non-missing responses to the education level, race, and political ideology scale (range 1-7) items. From there, I categorized education level into two categories: ‘has college degree’ and ‘no college degree’ (i.e., some college, high school or less). Similarly, I categorized respondents into two categories based on their reported race: ‘white’ and ‘not white’ (i.e., those who selected anything but White). From there, I tabulated the data by year and education level with the associated average political ideology score. This process was then repeated with the added layer of race.

Data Preparation Syntax
Code
# Subsetting by select years (%in% tests membership; == would recycle the
# vector of years and silently drop matching rows)
anes1980 <- subset(anes, year %in% c(1980, 1992, 2000, 2020))

# Creating smaller data frame, renaming variables to more meaningful names 
df5 <- anes1980 %>% select(year, 'Education' = VCF0110, 'Ideology' = VCF0803, 'Race' = VCF0106)

# Subsetting to only include observations with valid entries 
df5 <- subset(df5, df5$Education > 0) 
df5 <- subset(df5, df5$Ideology < 9 & df5$Ideology != 0)  

# Creating categorical variable indicating whether a respondent has a college degree 
df5$educ_level <- df5$Education %>% case_match(
  c(1, 2, 3) ~ 'No College Degree', 
  c(4) ~ 'Has College Degree'
)


# Creating categorical variable indicating respondents' race 
df5$race_cat <- df5$Race %>% case_match(
  c(1) ~ 'White', 
  c(2, 3) ~ 'Not White', 
  c(0, 9) ~ 'Missing'
)

df5 <- subset(df5, race_cat != 'Missing')

df5 <- drop_na(df5)

Average Ideology by Education Level

Code
edlevel <- df5 %>% group_by('Education Level' = educ_level, year) %>% 
  summarize_at(vars(Ideology), list('Average Ideology' = mean)) %>% arrange(year)

spread(edlevel, key = year, value = 'Average Ideology')
Education Level 1980 1992 2000 2020
Has College Degree 4.229167 4.075000 4.291139 3.829861
No College Degree 4.373134 4.191083 4.310000 4.255517
Interpretation

The table above shows only small differences in average ideology between respondents with and without a college degree. In every year, degree holders score slightly lower (i.e., more liberal), but the gap is at most 0.14 points through 2000 and reaches 0.43 points only in 2020. For the earlier waves, this offers little support for the claim that being college educated is associated with more liberal ideologies. It may also suggest the presence of another variable, such as race, that influences this relationship.
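
The size of the education gap in each wave can be worked out directly from the table. The sketch below uses only the means printed above and assumes, as the interpretation in this report does, that higher values on the seven-point scale are more conservative; the variable names are illustrative.

```python
# Mean ideology by year, copied from the table above
# (assumed direction: 1 = most liberal, 7 = most conservative)
has_degree = {1980: 4.229167, 1992: 4.075000, 2000: 4.291139, 2020: 3.829861}
no_degree = {1980: 4.373134, 1992: 4.191083, 2000: 4.310000, 2020: 4.255517}

# Positive gap = respondents without a degree are more conservative on average
gap = {year: round(no_degree[year] - has_degree[year], 2) for year in has_degree}
print(gap)
```

The gap is positive in all four waves, and the 2020 value is roughly three times the next largest.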

Average Ideology by Education Level and Race

Code
df5_table <- df5 %>% group_by('Race' = race_cat, 'Education Level' = educ_level, year) %>% 
  summarize_at(vars(Ideology), list('Average Ideology' = mean)) %>% arrange(year)

# Rows stay sorted by Race, then Education Level; the row indexing below relies on this order
(df5_wide <- spread(df5_table, key = year, value = 'Average Ideology'))
Race Education Level 1980 1992 2000 2020
Not White Has College Degree 4.500000 4.833333 4.357143 3.535353
Not White No College Degree 4.125000 3.907895 4.095238 3.803419
White Has College Degree 4.204546 3.990741 4.276923 3.917417
White No College Degree 4.420118 4.281513 4.367089 4.424242

I then visualized this time series using the R ggplot2 package:

Data Preparation Syntax
Code
# Subsetting data into groups
# White - No Degree
w_nodegree <- gather(df5_wide[4, 3:6])
w_nodegree <- rename(w_nodegree, w_nodegree.ideology = value)

# White - Has Degree
w_degree <- gather(df5_wide[3, 3:6])
w_degree <- rename(w_degree, w_degree.ideology = value)

# Not White - No Degree
nw_nodegree <- gather(df5_wide[2, 3:6])
nw_nodegree <- rename(nw_nodegree, nw_nodegree.ideology = value)

# Not White - Has Degree
nw_degree <- gather(df5_wide[1, 3:6])
nw_degree <- rename(nw_degree, nw_degree.ideology = value)

# Merging into one long dataset 
white <- merge(w_nodegree, w_degree, by = 'key')
nwhite <- merge(nw_degree, nw_nodegree, by = 'key')
fig <- merge(white, nwhite, by = 'key')
fig <- rename(fig, 'year' = 'key')

# Reshaping the merged data into long format (one row per year and group)
fig <- fig %>% select(year, w_nodegree.ideology, w_degree.ideology, nw_degree.ideology, nw_nodegree.ideology) %>%
  gather(key = 'variable', value = 'value', -year) %>% arrange(year)

fig$variable <- factor(fig$variable)

# Recoding into grouping variables
fig <- fig %>% mutate(degree = case_match(
  variable, 
  c('w_degree.ideology', 'nw_degree.ideology') ~ 'Has Degree', 
  c('w_nodegree.ideology', 'nw_nodegree.ideology') ~ 'No Degree'
))

fig <- fig %>% mutate(race = case_match(
  variable, 
  c('w_degree.ideology', 'w_nodegree.ideology') ~ 'White',
  c('nw_degree.ideology', 'nw_nodegree.ideology') ~ 'Not White'
))
Visualization Syntax

Interpretation

Based on the time-series plot above, the relationship between college degree attainment and a person’s placement along the political ideology spectrum differs by race. Among White respondents, there is minimal difference in ideology between those with and without a degree, with those without a degree holding slightly more conservative views than those with a college degree. However, it is worth noting that in 2020 this gap was far more pronounced. Among non-White respondents, by contrast, those with a degree historically held more conservative views than their peers without college degrees. In fact, non-White respondents with degrees had the most conservative views of all groups until 2000, when they were about the same as White respondents without college degrees.

Yet, in 2020, non-White respondents with degrees became the group with the most liberal views, with non-White respondents without degrees closely trailing behind. These trends suggest that some critical events (most likely the events leading up to the election and presidency of the highly controversial President Donald J. Trump) happened between the 2000 and 2020 waves to cause such a drastic shift toward more liberal views among everyone except White respondents without degrees, who became slightly more conservative.

Question 6

Python: Several questions on this survey are related to social trust. I’m talking here about questions VCF0619-VCF0621. Let’s just look at the 2004 survey responses.

Construct a scale that adds together the responses to these three questions so that higher values indicate greater social trust. Set the scale so it runs from zero (the minimum amount of trust).

Now, consider how this scale relates to respondents’ partisan identity (strong Democrat to strong Republican). Do you see any evidence that greater social trust is associated with partisan identity?

Process

First, I subset the data to only include responses from 2004. Then, I recoded the responses to the three questions into three separate binary variables (1 = Most trusting, 0 = Least trusting) and removed any responses that were missing or indicated ‘it depends’. From there, I created the trust variable by summing these columns for each row. Next, I filtered the data to include only those responses with a valid political ideology score (i.e., 1-7), which serves here as the measure of partisan identity, and selected only the two relevant columns. The first 10 rows are shown in the table below.

Data Preparation Syntax
Code
anes2004 = anes2[(anes2.year == 2004)]

anes2004 = anes2004.rename(columns = {'VCF0619' : 'careful', 'VCF0620' : 'helpful', 'VCF0621' : 'fair', 'VCF0803' : 'ideology'})

# Recoding values 
scale_map = {1:0, 2:1}
anes2004 = anes2004.assign(careful_rc = anes2004.careful.map(scale_map))
anes2004 = anes2004.assign(helpful_rc = anes2004.helpful.map(scale_map))
anes2004 = anes2004.assign(fair_rc = anes2004.fair.map(scale_map))

# Subsetting for needed variables 
anes2004 = anes2004[['careful_rc', 'helpful_rc', 'fair_rc', 'ideology'
]]

# Removing missing values 
anes2004 = anes2004.dropna()

# Creating social trust scale
anes2004['trust'] = anes2004['careful_rc'] + anes2004['helpful_rc'] + anes2004['fair_rc']

# Filtering for valid ideology 
anes2004 = anes2004.loc[(anes2004['ideology']<9) & (anes2004['ideology'] != 0)]

# Selecting columns
df6 = anes2004[['trust', 'ideology']]

Table Preview

Code
df6.head(10)
       trust  ideology
46227    2.0       4.0
46228    3.0       6.0
46230    1.0       6.0
46231    0.0       6.0
46232    3.0       6.0
46233    3.0       4.0
46235    0.0       6.0
46236    1.0       4.0
46238    0.0       4.0
46239    2.0       4.0
Process

From there, I decided to compute the correlation between a person’s social trust and their ideology.

Correlation

Code
corr = np.corrcoef(df6['trust'], df6['ideology'])

corr[1, 0].round(3)
-0.001
Interpretation

Based on the correlation coefficient, there appears to be essentially no linear relationship (r = -0.001) between a respondent’s degree of social trust and their political ideology in 2004. To the extent that ideology stands in for partisan identity, this suggests that greater social trust is not associated with partisan identity.
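
A coefficient near zero rules out a linear association, not every association, so a useful follow-up is to look at mean trust within each ideology category. The sketch below illustrates the point with made-up toy data (not ANES values) in which trust is high at both ends of the scale; the helper `pearson_r` is written out here for illustration and is not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Toy data, NOT ANES values: trust is high at both ends of the ideology scale
ideology = [1, 1, 4, 4, 7, 7]
trust = [3, 2, 0, 1, 2, 3]

r = pearson_r(ideology, trust)

# Mean trust per ideology category exposes the U-shape that r misses
groups = defaultdict(list)
for i, t in zip(ideology, trust):
    groups[i].append(t)
mean_trust = {i: mean(ts) for i, ts in sorted(groups.items())}

print(round(r, 3), mean_trust)
```

Applied to `df6`, the same check would be a `groupby('ideology')['trust'].mean()`; whether the ANES data show any such pattern is not something this sketch can say.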

Question 7

R: A common type of question on political surveys is the “feeling thermometer” where respondents are asked how warm/cold they feel about certain topics or political groups or figures. It is widely believed that political polarization today is worse than in the past - Republicans have more negative feelings about Democrats today than they did in years past, and vice versa.

Use these survey data to assess this claim.

Process

To answer this question, I first included only data of respondents with valid entries (i.e., not missing, refused to answer, or unsure) for both feelings toward a political party (i.e., reported value < 98) and respondents’ political party affiliation (i.e., value < 8). From there, I recoded the numerical values associated with each political party to their given name (e.g., Democrat, Republican) and then subsetted the dataset into two separate data frames - one with only Republican respondents and the other with Democrat respondents. Using these subsets of the data, I was then able to calculate (using the code below) the average feelings toward the opposite party for Democrats and Republicans in each wave of the study between 1978 and 2020.

Data Preparation Syntax
Code
polar <- anes %>% select(year, 'dem_feels'= VCF0218, 'rep_feels'= VCF0224, 'party' = VCF0302)

# Dropping missing values
polar <- drop_na(polar)

# Excluding missing data & don't knows/refusals
polar_trim <- subset(polar, dem_feels < 98 & rep_feels < 98 & party < 8)

# Recoding values to be more meaningful
polar_trim$party_cat <- polar_trim$party %>% case_match(
                          1 ~ 'Republican', 
                          2 ~ 'Independent',
                          3 ~ 'No preference', 
                          4 ~ 'Other', 
                          5 ~ 'Democrat')
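# Subsetting to partisans (%in% tests membership; == would recycle the
# two-element vector and silently drop matching rows)
polar_trim <- subset(polar_trim, party_cat %in% c('Democrat', 'Republican'))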
polar_dem <- subset(polar_trim, party_cat == 'Democrat')
polar_rep <- subset(polar_trim, party_cat == 'Republican')

# Computing mean feelings toward opposite party by year
avg_demfeelsofreps <- polar_dem %>% group_by(year) %>%
  summarize(mean_repfeels = mean(rep_feels))

avg_repfeelsofdems <- polar_rep %>% group_by(year) %>% 
  summarize(mean_demfeels = mean(dem_feels))
Process

These aggregations were then compiled into the line graph below:

Data Visualization Syntax
Code
ggplot() +
  geom_line(data = avg_demfeelsofreps, aes(x = year, y = mean_repfeels, color = "Democrats' feelings of Republicans"), lwd = 1) +
  geom_line(data = avg_repfeelsofdems, aes(x = year, y = mean_demfeels, color = "Republicans' feelings of Democrats"), lwd = 1) +
  scale_color_manual(values = c("Democrats' feelings of Republicans" = 'blue', "Republicans' feelings of Democrats" = 'red')) +
  theme(plot.caption = element_text(hjust=0)) +
  labs(x = "Year", 
       y = "Average Feeling Thermometer Score", 
       title = "Democrats and Republicans' Feelings Toward Opposite Party", 
       subtitle = "1978 - 2020", 
       caption = "Source: \nAmerican National Election Studies. 2022. \nANES Time Series Cumulative Data File \n[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org")

Interpretation

Based on the line graph above, both Democrats and Republicans have rated the opposing party steadily less warmly since 1978 (with minor fluctuations between 1990 and 2000), which is consistent with an increasingly polarized political environment.
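
The aggregation behind the graph can also be expressed in pandas. The following is a minimal sketch on made-up toy data (not ANES values); the column names mirror the R code above, and `out_party` is a hypothetical helper column holding the rating each partisan gives to the other party.

```python
import pandas as pd

# Toy data, NOT ANES values: feeling thermometer ratings toward each party
toy = pd.DataFrame({
    "year":      [1980, 1980, 1980, 2020, 2020, 2020],
    "party_cat": ["Democrat", "Republican", "Independent",
                  "Democrat", "Republican", "Republican"],
    "dem_feels": [70, 45, 50, 80, 20, 10],
    "rep_feels": [40, 75, 50, 15, 85, 75],
})

# Keep partisans only; isin() is the pandas counterpart of R's %in%
partisans = toy[toy["party_cat"].isin(["Democrat", "Republican"])].copy()

# Rating each respondent gives to the *other* party:
# Democrats keep rep_feels, everyone else (Republicans) gets dem_feels
is_dem = partisans["party_cat"] == "Democrat"
partisans["out_party"] = partisans["rep_feels"].where(is_dem, partisans["dem_feels"])

# Mean out-party rating by wave, one column per party
out_party_mean = (
    partisans.groupby(["year", "party_cat"])["out_party"].mean().unstack()
)
print(out_party_mean)
```

Keeping both parties in one table makes it easy to compare the two trends wave by wave before plotting.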

Part II: Exploration

For the remainder of this project, you will further explore the ANES data and then create and answer some questions that are interesting to you about American politics. Answer four different questions or test four different claims, or conduct some other analysis of your choice, in the style of what you did in Part 1. In each question, introduce at least one new variable into your analysis.

Religious Importance Over Time

To start my exploration, I wanted to see how the importance of religion in Americans’ lives changed over time - specifically, broken down by gender. To do this, I selected the following variables: year of study (VCF0004), gender of respondent (VCF0104), and whether religion was important to the respondent (VCF0846). For the purposes of this analysis, I only included respondents who identified as male or female as the third category (i.e., ‘other’) was only recently introduced in 2016 and therefore there was not enough data available for longitudinal comparison. Similarly, I only included valid responses to the religious importance question (i.e., non-missing values).

The ANES assessed the degree of importance of religion in respondents’ lives by asking “Do you consider religion to be an important part of your life, or not?” from the 1980 wave onward. Respondents were then lumped into two categories indicating whether they find religion to be an important part of their life or of little to no importance.

To compare the importance of religion in the lives of American men and women, I split the data into two data frames, one for men and one for women. From there, I computed the percentage of men and women in each wave between 1980 and 2020 who found religion to be an important or unimportant part of their lives.

Data Preparation Syntax
Code
anes2 = anes2.rename(columns = {'VCF0004': 'year', 'VCF0104' : 'gender', 'VCF0846' : 'relig_import', 'VCF0230' : 'abortion', 'VCF9237' : 'death_pen', 'VCF0101' :  'age', 'VCF0503' : 'ideology', 'VCF0103' : 'cohort'})

df8 = anes2[['year', 'gender', 'relig_import']]
df8_filtered = df8[(df8.relig_import > 0) & (df8.relig_import < 3) & (df8.gender <3) & (df8.gender > 0)]

# Recoding to more meaningful labels 
religion_mapping = {1 : 'Important', 2 : 'Not important'}
df8_filtered = df8_filtered.assign(religion_rc = df8_filtered.relig_import.map(religion_mapping))

gender_mapping = {1 : 'Male', 2 : 'Female'}
df8_filtered = df8_filtered.assign(gender_rc = df8_filtered.gender.map(gender_mapping))

# Subsetting by gender 
df8_fem = df8_filtered[(df8_filtered.gender == 2)]
df8_male = df8_filtered[(df8_filtered.gender == 1)]

Importance of Religion in Women’s Lives

Code
pd.crosstab(df8_fem['year'], df8_fem['religion_rc'], normalize = 'index').applymap(lambda x: "{0:.0f}%".format(100*x))
religion_rc Important Not important
year                               
1980              82%           18%
1984              86%           14%
1986              84%           16%
1988              83%           17%
1990              85%           15%
1992              84%           16%
1994              83%           17%
1996              83%           17%
1998              81%           19%
2000              81%           19%
2002              83%           17%
2004              82%           18%
2008              79%           21%
2012              75%           25%
2016              70%           30%
2020              70%           30%

Importance of Religion in Men’s Lives

Code
pd.crosstab(df8_male['year'], df8_male['religion_rc'], normalize = 'index').applymap(lambda x: "{0:.0f}%".format(100*x))
religion_rc Important Not important
year                               
1980              67%           33%
1984              71%           29%
1986              71%           29%
1988              71%           29%
1990              72%           28%
1992              72%           28%
1994              71%           29%
1996              72%           28%
1998              70%           30%
2000              70%           30%
2002              69%           31%
2004              72%           28%
2008              71%           29%
2012              64%           36%
2016              61%           39%
2020              62%           38%
Interpretation

The share of American men and women who consider religion important remained fairly stable (+/- 5 percentage points) from 1980 through 2008. It wasn’t until 2012 that this share began to show signs of decline for both groups. Although American men and women followed similar patterns of change on this metric, a larger share of women than men found religion to be an important part of their lives in every wave examined, with gaps as large as 15 percentage points (see 1980 and 1984).
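
The gender gap in each wave follows directly from the two tables. The sketch below uses only the percentages printed above; the variable names are illustrative.

```python
# Percent saying religion is important, copied from the two tables above
women = {1980: 82, 1984: 86, 1986: 84, 1988: 83, 1990: 85, 1992: 84,
         1994: 83, 1996: 83, 1998: 81, 2000: 81, 2002: 83, 2004: 82,
         2008: 79, 2012: 75, 2016: 70, 2020: 70}
men = {1980: 67, 1984: 71, 1986: 71, 1988: 71, 1990: 72, 1992: 72,
       1994: 71, 1996: 72, 1998: 70, 2000: 70, 2002: 69, 2004: 72,
       2008: 71, 2012: 64, 2016: 61, 2020: 62}

# Gap in percentage points (women minus men) for each wave
gap = {year: women[year] - men[year] for year in women}

largest = max(gap.values())
widest_years = [year for year, g in gap.items() if g == largest]
print(largest, widest_years)
```

Because the tables report rounded percentages, each gap is accurate only to within about a percentage point.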

Religious Importance Among Supporters of Death Penalty

Seeing the changes in religious importance over time, I wanted to explore the relationship between Americans’ views of religion and their stance on the death penalty for persons convicted of murder. For this analysis, I used the same variable assessing whether religion was important to the respondent (VCF0846) and added respondents’ answers to the question “Do you favor or oppose the death penalty for persons convicted of murder?” (VCF9237). I extracted data from waves in which all data points were available (1988-2020, excluding 2002) and removed any missing or indecisive responses. It is worth noting that any rows with missing data on the following variables were removed: VCF9237, VCF0846, VCF0110, and VCF0101.

From there, I first wanted to get a sense of the breakdown of religious and non-religious Americans’ stance on the death penalty across all time points (see table below).

Data Preparation Syntax
Code
df9 <- anes %>% select(year, 'death_pen' = VCF9237, 'relig_import' = VCF0846, 'Education' = VCF0110, 'Age' = VCF0101)

# Selecting years where all three data points are available
df9 <- subset(df9, year >=1988)

# Dropping rows (and therefore waves) with missing death penalty data 
df9 <- subset(df9, is.na(death_pen) == FALSE)

# Removing missing values
df9_trim <- subset(df9, relig_import > 0 & relig_import < 3 & death_pen >= 1 & Education > 0 & Age != 0)

# Recoding values 
df9_trim <- df9_trim %>% mutate(relig_cat = case_match(relig_import, 
                                           1 ~ 'Important', 
                                           2 ~ 'Not Important'))

df9_trim <- df9_trim %>% mutate(death_cat = case_match(death_pen, 
                                                       c(1, 2) ~ 'Favor', 
                                                       c(4, 5) ~ 'Oppose'))

df9_trim$educ_level <- df9_trim$Education %>% case_match(
  c(1, 2, 3) ~ 'No College Degree', 
  c(4) ~ 'Has College Degree'
)

# Factorizing death penalty variable
df9_trim$death_cat <- as.factor(df9_trim$death_cat)

# Generating frequency table (numbers were then inputted in table below)
freq <- table(df9_trim$relig_cat, df9_trim$death_cat)
prop.table(freq,1)

Breakdown of Death Penalty Stance by Religious and Non-Religious Americans (1988 - 2020, excluding 2002)

View of Religion   In Favor of Death Penalty (%)   Oppose Death Penalty (%)
Important                                  70.68                      29.32
Not Important                              68.20                      31.80

Although I was expecting religious individuals to be more in favor of the death penalty than their non-religious peers, I did not expect such a small difference between the two groups. Recognizing that the aggregation across years does not account for longitudinal variation, I decided to visualize this relationship by year.

Data Visualization Preparation Syntax
Code
# Generating count by year 
df9_long <- group_by(df9_trim, year) %>% count(death_cat, relig_cat)

# Preparing for visualization
df9_long <- df9_long %>% spread(key = c(death_cat), value = n)

# Calculating total n
df9_long['total'] <- df9_long$Favor + df9_long$Oppose

# Calculating proportion
df9_long['favor_perc'] <- df9_long$Favor / df9_long$total * 100
Data Visualization Syntax
Code
ggplot(data = df9_long, aes(x = year, y = favor_perc, color = relig_cat, group = relig_cat)) +
  geom_line(lwd = 1) +
  geom_point() +
  scale_color_manual(values = c('steelblue', 'maroon')) +
  theme(plot.caption = element_text(hjust=0)) +
  labs(x = "Year", 
       y = "Percentage in Favor of Death Penalty", 
       title = "Favor of Death Penalty Between Religious and Non-Religious \nAmericans Over Time", 
       subtitle = "1988 - 2020 (excluding 2002)", 
       color = 'Importance of Religion in Life',
       caption = "Source: \nAmerican National Election Studies. 2022.
ANES Time Series Cumulative Data File
[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org")

Interpretation

Across all time points, there was a gap in the percentage of individuals in favor of the death penalty between religious and non-religious Americans. Prior to 2016, non-religious Americans were consistently slightly more in favor of the death penalty than their religious peers. In 2016, however, the pattern reversed, with religious Americans more in favor than their non-religious counterparts. In fact, between 2012 and 2020, the percentage of non-religious Americans in favor of the death penalty fell by 21.09 percentage points. Over the full period from 1988 to 2020, support among religious Americans declined by only 14.52 percentage points. By contrast, support among non-religious Americans dropped from 84.71 percent in 1988 to only 53.06 percent in 2020, a difference of nearly 32 percentage points. While investigating the “why” behind this shift is beyond the scope of this analysis, I would recommend that future analyses investigate events (e.g., highly publicized death penalty cases in 2016, regional laws passed in 2016) that may have influenced such a drastic shift in stance on the death penalty among this group.
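
The 1988 and 2020 figures quoted above differ by about 32 percentage points, which is not the same thing as a 32 percent relative decline. A small sketch of the distinction, using only the two values reported in the text (the variable names are illustrative):

```python
# Values reported in the interpretation above for non-religious respondents.
favor_1988 = 84.71  # percent in favor, 1988
favor_2020 = 53.06  # percent in favor, 2020

# Percentage-point change: the simple difference between two percentages.
point_change = favor_1988 - favor_2020

# Relative change: the difference expressed as a share of the starting value.
relative_change = point_change / favor_1988 * 100

print(f"Decline: {point_change:.2f} percentage points")
print(f"Relative decline: {relative_change:.2f} percent of the 1988 level")
```

Reporting the difference in percentage points keeps it directly comparable with the other year-to-year differences discussed here.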

Death Penalty Stance, Religious Importance, and Education Level Among 22-30 year-olds

Recognizing that the importance of religion in one’s life is not the only possible determinant of a person’s stance on the death penalty, I decided to take a closer look at how the death penalty stance of a specific age group (i.e., ages 22 to 30) changed over time in relation to education level. The logic for this subset is that young adults have traditionally challenged the status quo in the U.S., as demonstrated through various protests and movements (e.g., anti-war, right-to-abortion) over the years. Based on the last exploration, I assumed that the percentage of young adults in favor of the death penalty would decline from 1988 to 2020 across all groups. I anticipated that religious young adults with college educations would be less in favor of the death penalty than the general population and their peers without college degrees across most time points until 2016. After 2016, however, I predicted that non-religious young adults with college degrees would be the group least in favor of the death penalty.

To assess this trend, I first subsetted the data to include only respondents between the ages of 22 and 30 with valid responses to the variables of interest (i.e., death penalty stance, education level, age, religious importance). From there, I counted the number of respondents in favor of the death penalty by year for each of the 4 categories (e.g., religious & no degree, nonreligious & degree, etc.). This allowed me to then calculate the proportion and then percentage of respondents in favor of the death penalty for each group. I then plotted this in the time-series plot below:
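
The count-then-percentage steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the report's R code: the records are invented, and the names `records`, `cell_counts`, and `favor_pct` are hypothetical.

```python
from collections import Counter

# Hypothetical respondent records (NOT the ANES data):
# (year, religious importance, education level, death penalty stance)
records = [
    (1988, "Important", "No College Degree", "Favor"),
    (1988, "Important", "No College Degree", "Oppose"),
    (1988, "Important", "No College Degree", "Favor"),
    (1988, "Not Important", "Has College Degree", "Favor"),
    (2020, "Not Important", "Has College Degree", "Oppose"),
    (2020, "Not Important", "Has College Degree", "Favor"),
]

# Count respondents per (year, religion, education, stance) cell.
cell_counts = Counter(records)

# Collect the distinct (year, religion, education) groups.
groups = sorted({rec[:3] for rec in records})

favor_pct = {}
for group in groups:
    # Counter returns 0 for a missing cell, so a group with no "Oppose"
    # respondents still gets a valid total (the role replace_na plays in R).
    favor = cell_counts[group + ("Favor",)]
    oppose = cell_counts[group + ("Oppose",)]
    total = favor + oppose
    favor_pct[group] = (round(100 * favor / total, 2), total)

for group, (pct, n) in favor_pct.items():
    print(group, f"{pct}% in favor (n = {n})")
```

Keeping the group size next to each percentage is a design choice of this sketch: it makes small cells visible, which helps when judging how much weight a sharp year-to-year swing deserves.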

Data Preparation Syntax
Code
# Subsetting for specified age range
df9_trim <- subset(df9_trim, Age <= 30 & Age >= 22)

# Generating count by year 
df9_ed <- group_by(df9_trim, year) %>% count(death_cat, relig_cat, educ_level)

# Preparing for visualization
df9_ed <- df9_ed %>% spread(key = c(death_cat), value = n)

# Replacing NAs with 0s
df9_ed$Favor <- replace_na(df9_ed$Favor, 0)
df9_ed$Oppose <- replace_na(df9_ed$Oppose, 0)

# Calculating total n
df9_ed['total'] <- df9_ed$Favor + df9_ed$Oppose

# Calculating proportion
df9_ed['favor_perc'] <- df9_ed$Favor / df9_ed$total * 100
Data Visualization Syntax
Code
ggplot(data = df9_ed, aes(x = year, y = favor_perc, color = relig_cat, linetype = educ_level)) +
  geom_line(lwd = 1) +
  geom_point() +
  scale_color_manual(values = c('maroon', 'steelblue')) +
  theme(plot.caption = element_text(hjust=0)) +
  labs(x = "Year", 
       y = "Percentage in Favor of Death Penalty", 
       title = "Favor of Death Penalty Between Religious and Non-Religious \nAmericans Aged 22-30 by Educational Attainment", 
       subtitle = "1988 - 2020 (excluding 2002)", 
       color = 'Importance of Religion in Life',
       caption = "Source: \nAmerican National Election Studies. 2022.
ANES Time Series Cumulative Data File
[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org") +
  scale_linetype_discrete(name = 'Education Level')

Interpretation

While there was minimal difference over time in the percentage of young adults (ages 22 to 30) in favor of the death penalty between religious Americans with and without college degrees, this was not the case for non-religious young adults. Specifically, while non-religious young adults without degrees followed the same pattern as both groups of their religious counterparts, their peers with degrees experienced more erratic changes over the years. In 1988, non-religious, college-educated young adults were the group most in favor of the death penalty (93.3 percent). Then, between 1988 and 1998, this group aligned more closely with the other groups. However, between 1998 and 2008, the percentage of non-religious, college-educated young adults in favor of the death penalty plummeted from 86.67 percent to only 48 percent. While other groups’ support for the death penalty also declined during this period, the change was not nearly as large. Additionally, in 2012, all groups except religious, college-educated young adults (who stayed about the same) experienced a spike in support for the death penalty. This spike then tapered off between 2012 and 2020, most rapidly within the non-religious, college-educated group. In fact, despite starting as the group most in favor of the death penalty in 1988, non-religious, college-educated young adults became the group least in favor in 2020, at 34.55 percent. Future explorations should investigate the social conditions (e.g., publicized death penalty cases, other social movements) that may have sparked the shift against the death penalty among non-religious, college-educated young adults after 1998, as well as what caused support to rebound in 2012.

2012-2020: Baby Boomers (1943-1958) vs. Millennials & Gen Z (1991-present) Death Penalty Stance by Gender & Religious Importance

In my final exploration, I wanted to see if there were generational differences in support for the death penalty, particularly between the baby boomer generation (born 1943-1958) and millennials and generation Z (born 1991-present), by gender and the importance of religion in respondents’ lives. Based on prior analyses, I expected female, non-religious millennial or gen Z respondents to be the most opposed to the death penalty and male, religious baby boomer respondents to be the most in favor.

To start, I extracted data from the years for which data were available for both cohorts, namely the 2012-2020 survey waves. From there, I ensured that only valid, non-missing data were present for the variables of interest and subsetted further by cohort. Using these subsets, I was then able to create the pivot tables. Of note, the recoded death penalty stance scale ranges from 1 to 4, with lower ratings indicating stronger favor for the death penalty and higher ratings indicating stronger opposition. All values presented are group means within a given year.
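
As a rough illustration of the recode-then-average step, the sketch below applies the same 1, 2, 4, 5 to 1-4 mapping used in the analysis to a handful of invented responses. It uses plain Python rather than pandas so that it runs without dependencies; the response values, including the 8 used as a stand-in for a non-substantive code, are assumptions for illustration only.

```python
from statistics import mean

# Recoding used in the analysis: original codes 1, 2, 4, 5 become 1-4.
deathpen_map = {1: 1, 2: 2, 4: 3, 5: 4}

# Hypothetical raw responses for one group (NOT the ANES data).
# The 8 stands in for a code with no entry in the map.
raw_responses = [1, 2, 4, 5, 8, 1]

# Keep only responses that have a recoded value; unmapped codes are dropped,
# similar to how pandas skips NaN values when averaging.
recoded = [deathpen_map[r] for r in raw_responses if r in deathpen_map]

group_mean = round(mean(recoded), 2)
print(recoded, group_mean)
```

A group mean near 1 therefore indicates strong support, and a mean near 4 indicates strong opposition.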

Data Preparation Syntax
Code
# Selecting the variables of interest
anes90 = anes2[['year', 'gender', 'age', 'death_pen', 'relig_import', 'cohort']]

# Filtering for 2012 onward, valid responses, and the 1943-1958 birth cohort (cohort == 4)
anes_boom = anes90[(anes90.year >= 2012) & (anes90.gender >= 1) & (anes90.gender <= 2) &
                   (anes90.death_pen >= 1) & (anes90.relig_import < 8) & (anes90.relig_import != 0) &
                   (anes90.age >= 17) & (anes90.cohort == 4)]

# Creating mappings for death penalty, gender, and religious importance categories
deathpen_map = {1 : 1, 2 : 2, 4 : 3, 5: 4}
gender_map = {1: 'Male', 2: 'Female'}
religion_mapping = {1 : 'Important', 2 : 'Not important'}

# Assigning mapping
anes_boom = anes_boom.assign(gender_rc = anes_boom.gender.map(gender_map))
anes_boom = anes_boom.assign(deathpen_rc = anes_boom.death_pen.map(deathpen_map))
anes_boom = anes_boom.assign(religion_rc = anes_boom.relig_import.map(religion_mapping))

Born 1943-1958

Code
np.round(anes_boom.pivot_table(index = ['year','gender_rc'], columns = ['religion_rc'], values = 'deathpen_rc', 
aggfunc={'deathpen_rc':'mean'}, margins = True),2)
religion_rc     Important  Not important   All
year gender_rc                                
2012 Female          2.01           1.86  1.98
     Male            1.83           1.76  1.81
2016 Female          1.96           2.17  2.01
     Male            1.81           1.82  1.82
2020 Female          2.09           2.61  2.21
     Male            1.92           2.22  2.00
All                  1.96           2.07  1.99
Data Preparation Syntax
Code
# Filtering for valid responses and the 1991-present birth cohort (cohort == 1)
anes_mil = anes90[(anes90.death_pen >= 1) & (anes90.age >= 17) & (anes90.relig_import < 8) &
                  (anes90.relig_import != 0) & (anes90.cohort == 1) &
                  (anes90.gender >= 1) & (anes90.gender <= 2)]

# Creating mappings for death penalty, gender, and religious importance categories
deathpen_map = {1 : 1, 2 : 2, 4 : 3, 5: 4}
gender_map = {1: 'Male', 2: 'Female'}
religion_mapping = {1 : 'Important', 2 : 'Not important'}

# Assigning mapping
anes_mil = anes_mil.assign(gender_rc = anes_mil.gender.map(gender_map))
anes_mil = anes_mil.assign(deathpen_rc = anes_mil.death_pen.map(deathpen_map))
anes_mil = anes_mil.assign(religion_rc = anes_mil.relig_import.map(religion_mapping))

Born 1991-2003

Code
np.round(anes_mil.pivot_table(index = ['year', 'gender_rc'], columns = 'religion_rc', values = 'deathpen_rc', 
aggfunc={'deathpen_rc':'mean'}, margins = True),2)
religion_rc     Important  Not important   All
year gender_rc                                
2012 Female          2.11           1.98  2.06
     Male            2.28           1.95  2.16
2016 Female          1.94           2.46  2.17
     Male            1.92           2.13  2.02
2020 Female          2.44           2.63  2.53
     Male            2.11           2.45  2.29
All                  2.19           2.41  2.30
Interpretation

As expected, female, non-religious millennial or gen Z respondents were the most opposed to the death penalty in most years; the exception was 2012, when religious male respondents in the same cohort were the most opposed (2.28). Similarly, male, religious baby boomer respondents were the most in favor of the death penalty across all groups, except in 2012, when their non-religious male peers were slightly more in favor (1.76 vs. 1.83). Within both cohorts and both degrees of religious importance, male respondents were generally more in favor of the death penalty than their female peers; the one exception was religious respondents in the younger cohort in 2012 (2.28 for males vs. 2.11 for females). Additionally, consistent with prior analyses, from 2016 onward non-religious respondents across both generations and genders were consistently more opposed to the death penalty than their religious peers. Interestingly, trends over time differed by religious importance. Among the younger generation, non-religious respondents grew steadily more opposed to the death penalty between 2012 and 2020 (females +0.65, males +0.50), whereas religious respondents dipped in 2016 before rising in 2020 (females +0.33 and males -0.17 overall). Among the older generation, religious respondents’ stance on the death penalty hardly changed between 2012 and 2020 for either males (+0.09 points) or females (+0.08). However, among non-religious older adults, opposition to the death penalty increased between 2012 and 2020 among both males (+0.46) and females (+0.75), changes on par with those of non-religious respondents in the younger generation.
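
The 2012-to-2020 changes discussed above can be recomputed directly from the two pivot tables. The sketch below copies the 2012 and 2020 group means from those tables into a plain dictionary (the names `scores` and `changes` are illustrative) and takes the difference for each group.

```python
# Mean death penalty scores (lower = more in favor, higher = more opposed),
# copied from the 2012 and 2020 rows of the two pivot tables above.
# Each value is a (2012 mean, 2020 mean) pair.
scores = {
    ("Born 1943-1958", "Female", "Important"): (2.01, 2.09),
    ("Born 1943-1958", "Female", "Not important"): (1.86, 2.61),
    ("Born 1943-1958", "Male", "Important"): (1.83, 1.92),
    ("Born 1943-1958", "Male", "Not important"): (1.76, 2.22),
    ("Born 1991-2003", "Female", "Important"): (2.11, 2.44),
    ("Born 1991-2003", "Female", "Not important"): (1.98, 2.63),
    ("Born 1991-2003", "Male", "Important"): (2.28, 2.11),
    ("Born 1991-2003", "Male", "Not important"): (1.95, 2.45),
}

# Positive change = more opposed in 2020 than in 2012.
changes = {
    group: round(y2020 - y2012, 2) for group, (y2012, y2020) in scores.items()
}

for group, delta in changes.items():
    print(group, f"{delta:+.2f}")
```

Because the inputs are already rounded to two decimals, each difference may be off by up to about 0.01 relative to one computed from the unrounded means.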