# Loading Tidyverse
library(tidyverse)
# Reading .csv
anes <- read_csv('anes_timeseries_cdf_csv_20220916.csv')
This report explores data from the American National Election Studies (ANES), a national research study of Americans’ voting patterns, public opinion, and political participation supported by Stanford University, the University of Michigan, and the National Science Foundation. The cumulative data cover the study from its inception in 1948 to the most recently published wave in 2020. Part I of this investigation follows a series of questions outlined in the final assignment of Dr. Collin Paschall’s Programming and Data Management course at Johns Hopkins University. Part II explores a set of research questions about the importance of religion and stances on the death penalty and abortion across different demographic groups.
For the purposes of this analysis, respondents who reported not knowing the answer or who skipped specific items were excluded. Because such exclusions may over-represent some segments of the population relative to others, the results of these analyses cannot be assumed to generalize to the full sample. They do, however, offer a unique insight into the state of American politics through the eyes of citizens over the past seven decades.
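The exclusion rule above amounts to listwise deletion: a respondent is dropped if any item in a given analysis carries a non-substantive code. A minimal pandas sketch follows; the column names echo those used later in the report, but the valid-code lists and toy values are hypothetical (the actual valid codes for each VCF variable come from the ANES codebook).

```python
import pandas as pd


def drop_nonsubstantive(df: pd.DataFrame, valid_codes: dict) -> pd.DataFrame:
    """Keep only rows whose value in every listed column is a substantive code.

    valid_codes maps a column name to the codes treated as valid answers;
    any other value (e.g. a "don't know" or "not asked" code) drops the row.
    """
    mask = pd.Series(True, index=df.index)
    for column, codes in valid_codes.items():
        mask &= df[column].isin(codes)
    return df[mask]


# Toy data with hypothetical codes: 9 = "don't know", 0 = not asked/skipped
toy = pd.DataFrame({
    "ideology": [1, 4, 9, 7, 0],
    "relig_import": [1, 2, 1, 0, 2],
})
clean = drop_nonsubstantive(toy, {"ideology": list(range(1, 8)), "relig_import": [1, 2]})
print(clean)  # keeps only the first two rows
```

Because the mask is combined across every listed column, a respondent missing any one item drops out of that analysis entirely, which is why the analytic sample can tilt toward groups that answer every question.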
Finally, it is worth noting that the programming language used throughout the report alternates between R and Python, with each question clearly labelled as such. The following packages were used in each language: R (Tidyverse) and Python (NumPy and Pandas).
Questions regarding syntax or other contents of this report may be directed to the author, Kimberly Ouimette, at kouimet1@jh.edu.
Data Source:
American National Election Studies. 2022. ANES Time Series Cumulative Data File [dataset and documentation]. September 16, 2022 version. www.electionstudies.org
R: Create a tibble that shows how many respondents are in each wave of the survey.
# Renaming 'year' variable to be more identifiable
anes <- anes %>% rename('year' = 'VCF0004')
# Counting respondents in each wave
anes %>%
count(year)
| year | n |
|---|---|
| 1948 | 662 |
| 1952 | 1899 |
| 1954 | 1139 |
| 1956 | 1762 |
| 1958 | 1450 |
| 1960 | 1181 |
| 1962 | 1297 |
| 1964 | 1571 |
| 1966 | 1291 |
| 1968 | 1557 |
| 1970 | 1507 |
| 1972 | 2705 |
| 1974 | 1575 |
| 1976 | 2248 |
| 1978 | 2304 |
| 1980 | 1614 |
| 1982 | 1418 |
| 1984 | 2257 |
| 1986 | 2176 |
| 1988 | 2040 |
| 1990 | 1980 |
| 1992 | 2485 |
| 1994 | 1795 |
| 1996 | 1714 |
| 1998 | 1281 |
| 2000 | 1807 |
| 2002 | 1511 |
| 2004 | 1212 |
| 2008 | 2322 |
| 2012 | 5914 |
| 2016 | 4270 |
| 2020 | 8280 |
Python: How are survey respondents distributed across the major geographic regions of the U.S. in the 1996 wave of the survey? (i.e., how many respondents per region)
import pandas as pd
import numpy as np
anes = pd.read_csv('anes_timeseries_cdf_csv_20220916.csv')
# Converting every column to numeric; non-numeric entries become NaN
anes2 = anes.apply(pd.to_numeric, errors="coerce")
anes2 = anes2.rename(columns = {'VCF0004' : 'year', 'VCF0112' : 'region'})
# Subsetting dataset to include only 1996 wave
anes_96 = anes2[anes2.year == 1996]
# Creating mapping
region_mapping = {1 : 'Northeast', 2 : 'North Central', 3: 'South', 4: 'West', 0 : 'NA'}
anes_96 = anes_96.assign(Region = anes_96.region.map(region_mapping))
# Creating table
pd.crosstab(index=anes_96['Region'], columns = 'Number of Respondents')
| Region | Number of Respondents |
|---|---|
| North Central | 458 |
| Northeast | 260 |
| South | 642 |
| West | 354 |
R: Considering the 2008 wave and subsequent waves, what percentage of interviews in each wave were partially or entirely translated into Spanish? Don’t forget to account for both pre- and post-election interviews (ANES surveys include both pre- and post-election interviews).
# Subsetting by 2008 onward
anes_2008 <- subset(anes, year >= 2008)
# Selecting only relevant variables
anes_spanish <- anes_2008 %>% select(year, VCF0018a, VCF0018b)
anes_spanish <- rename(anes_spanish, pre_lang = VCF0018a, post_lang = VCF0018b)
anes_span2 <- drop_na(anes_spanish)
# Recoding values for pre-interviews, 1 = Spanish, 0 = Not Spanish
anes_span2$pre_lang <- anes_span2$pre_lang %>% case_match(
c(0, 3, 5, 7) ~ 0,
c(1) ~ 1,
c(9) ~ -999
)
# Recoding values for post-interviews, 1 = Spanish, 0 = Not Spanish
anes_span2$post_lang <- anes_span2$post_lang %>% case_match(
c(0, 3, 5, 7) ~ 0,
c(1, 4) ~ 1,
c(9) ~ -999
)
# Subsetting for only valid responses
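spanish <- subset(anes_span2, pre_lang != -999 & post_lang != -999)
# Each respondent contributes a pre- and a post-election interview, so the
# within-wave denominator is 2 * n(), the number of interviews in that wave.
# Note: the table below was produced with nrow(spanish), the pooled 2008-2020
# count, as the denominator, so each value is that wave's Spanish-language
# interviews as a share of all interviews across the four waves.
spanish %>%
group_by('Year' = year) %>%
summarize('% of Interviews in Spanish' = sum(pre_lang, post_lang)/(2 * n()) * 100) %>%
round(digits = 2)
| Year | % of Interviews in Spanish |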
|---|---|
| 2008 | 0.47 |
| 2012 | 0.95 |
| 2016 | 0.27 |
| 2020 | 0.32 |
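The difference between a within-wave percentage and a share of the pooled sample is easiest to see on toy data. This pandas sketch uses made-up 0/1 flags (1 = Spanish, 0 = not) rather than the ANES language codes. The within-wave version divides by each wave's own interview count; the pooled version divides by every interview across all waves, so its values add up to the overall share.

```python
import pandas as pd

# Toy data (made up): 0/1 flags for whether each respondent's pre- and
# post-election interview was conducted at least partly in Spanish
toy = pd.DataFrame({
    "year":      [2008, 2008, 2008, 2008, 2012, 2012],
    "pre_lang":  [1, 0, 0, 0, 0, 0],
    "post_lang": [0, 0, 0, 0, 1, 0],
})

# Each respondent contributes two interviews, so stack the two flags
interviews = toy.melt(id_vars="year", value_vars=["pre_lang", "post_lang"],
                      value_name="spanish")

# Within-wave share: divide by the number of interviews in that wave
within_wave = interviews.groupby("year")["spanish"].mean() * 100

# Pooled share: divide by the number of interviews across all waves
pooled = interviews.groupby("year")["spanish"].sum() / len(interviews) * 100

print(within_wave)  # 2008 -> 12.5, 2012 -> 25.0
print(pooled)       # both waves -> about 8.33; together they sum to the overall share
```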
Python: One of the questions on this survey has the interviewer read a list of words and phrases that people use to describe political figures. Then, the interviewer asks the interviewee to think about a given political figure, and the interviewer asks whether a given phrase describes that political figure extremely well, quite well, not too well or not well at all.
So, for example, the interviewer might say, “Think about Ronald Reagan. In your opinion, does the phrase or word ‘intelligent’ describe Ronald Reagan extremely well, quite well, not too well, or not well at all?”
Based on the survey data between 1980 and 2008, which president did women under the age of 40 think was the most knowledgeable? Which president was the least knowledgeable in the eyes of this group? You can average across all surveys during a president’s term. Some presidents will be included in more waves than others; that’s fine, use the average regardless of the number of terms.
# Subsetting to pull waves 1980 - 2008
anes1980 = anes2[(anes2.year >= 1980) & (anes2.year <= 2008)]
# Renaming variables to be more meaningful
anes1980 = anes1980.rename(columns = {'VCF0342' : 'knowledge', 'VCF0101' : 'age', 'VCF0104' : 'gender'})
# Subsetting data for only valid knowledge responses
anes1980_nm = anes1980[anes1980.knowledge.isin([1, 2, 3, 4])]
# Creating new, smaller data frame with variables of interest
df = anes1980_nm[['year', 'age', 'gender', 'knowledge']]
# Filtering data frame by women younger than 40
df_filtered = df[(df.age < 40) & (df.gender == 2)]
# Mapping survey years onto the president in office
presidents = {1980: 'Jimmy Carter', 1982: 'Ronald Reagan', 1984: 'Ronald Reagan', 1986: 'Ronald Reagan', 1988: 'Ronald Reagan',
1994: 'Bill Clinton', 1996: 'Bill Clinton', 1998: 'Bill Clinton', 2000: 'Bill Clinton', 2002: 'George W. Bush', 2004: 'George W. Bush', 2008: 'George W. Bush'}
# Assigning President to respective years based on codebook
pres_rating = df_filtered.assign(President = df_filtered.year.map(presidents))
pres_rating.groupby('President')['knowledge'].mean().sort_values().round(2)
| President | knowledge |
|---|---|
| Bill Clinton | 1.99 |
| Ronald Reagan | 2.08 |
| Jimmy Carter | 2.08 |
| George W. Bush | 2.52 |
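The term-averaging step can be sketched on toy data. This assumes, as the question’s wording order suggests, that ratings run from 1 (“extremely well”) to 4 (“not well at all”), so the lowest mean marks the president seen as most knowledgeable. The ratings are made up and the year-to-president mapping is deliberately partial.

```python
import pandas as pd

# Hypothetical ratings on a 1-4 scale (assumed: 1 = "extremely well",
# 4 = "not well at all"), so a LOWER mean means "more knowledgeable"
toy = pd.DataFrame({
    "year":      [1980, 1984, 1988, 1994, 1998, 2004, 2004],
    "knowledge": [3, 1, 2, 2, 2, 3, 2],
})

# Map each wave to the president in office when it was fielded
in_office = {1980: "Jimmy Carter", 1984: "Ronald Reagan", 1988: "Ronald Reagan",
             1994: "Bill Clinton", 1998: "Bill Clinton", 2004: "George W. Bush"}
toy["President"] = toy["year"].map(in_office)

# Pool all respondents from a president's waves, then average
means = toy.groupby("President")["knowledge"].mean().sort_values()
most_knowledgeable = means.idxmin()
least_knowledgeable = means.idxmax()
print(means)
print(most_knowledgeable, least_knowledgeable)  # Ronald Reagan Jimmy Carter
```

Pooling respondents weights each wave by its sample size; averaging the per-wave means instead would give every wave equal weight, which can matter when wave sizes differ as much as they do in the ANES.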
R: These days, the evidence suggests that higher levels of education are associated with more liberal political attitudes, as measured on a traditional seven-point ideology scale. Track this pattern over time. Use respondents from 1980, 1992, 2000, and 2020. What is the average political ideology of survey respondents with a college degree or greater vs. the political ideology of respondents without a college degree? (Note: some college doesn’t count) In addition, repeat this, but compare how this breaks down along racial lines. Is the pattern the same for whites and non-whites?
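# Subsetting by select years (%in% matches any listed year; `==` recycles the
# vector and silently keeps only about a quarter of these waves' rows, which is
# how the tables below were generated)
anes1980 <- subset(anes, year %in% c(1980, 1992, 2000, 2020))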
# Creating smaller data frame, renaming variables to more meaningful names
df5 <- anes1980 %>% select(year, 'Education' = VCF0110, 'Ideology' = VCF0803, 'Race' = VCF0106)
# Subsetting to only include observations with valid entries
df5 <- subset(df5, df5$Education > 0)
df5 <- subset(df5, df5$Ideology < 9 & df5$Ideology != 0)
# Creating categorical variable indicating whether a respondent has a college degree
df5$educ_level <- df5$Education %>% case_match(
c(1, 2, 3) ~ 'No College Degree',
c(4) ~ 'Has College Degree'
)
# Creating categorical variable indicating respondents' race
df5$race_cat <- df5$Race %>% case_match(
c(1) ~ 'White',
c(2, 3) ~ 'Not White',
c(0, 9) ~ 'Missing'
)
df5 <- subset(df5, race_cat != 'Missing')
df5 <- drop_na(df5)
edlevel <- df5 %>% group_by('Education Level' = educ_level, year) %>%
summarize_at(vars(Ideology), list('Average Ideology' = mean)) %>% arrange(year)
spread(edlevel, key = year, value = 'Average Ideology')
| Education Level | 1980 | 1992 | 2000 | 2020 |
|---|---|---|---|---|
| Has College Degree | 4.229167 | 4.075000 | 4.291139 | 3.829861 |
| No College Degree | 4.373134 | 4.191083 | 4.310000 | 4.255517 |
df5_table <- df5 %>% group_by('Race' = race_cat, 'Education Level' = educ_level, year) %>%
summarize_at(vars(Ideology), list('Average Ideology' = mean)) %>% arrange(year)
(df5_wide <- spread(df5_table, key = year, value = 'Average Ideology'))
| Race | Education Level | 1980 | 1992 | 2000 | 2020 |
|---|---|---|---|---|---|
| Not White | Has College Degree | 4.500000 | 4.833333 | 4.357143 | 3.535353 |
| Not White | No College Degree | 4.125000 | 3.907895 | 4.095238 | 3.803419 |
| White | Has College Degree | 4.204546 | 3.990741 | 4.276923 | 3.917417 |
| White | No College Degree | 4.420118 | 4.281513 | 4.367089 | 4.424242 |
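The tables above are cell means pivoted wide, one column per wave. An equivalent pandas sketch on made-up respondents follows; it assumes the usual ANES coding of the seven-point scale, 1 (extremely liberal) to 7 (extremely conservative), so lower means are more liberal.

```python
import pandas as pd

# Toy respondents (made up); Ideology runs 1 (liberal) to 7 (conservative)
toy = pd.DataFrame({
    "year":       [1980, 1980, 1980, 2020, 2020, 2020],
    "race_cat":   ["White", "White", "Not White", "White", "Not White", "Not White"],
    "educ_level": ["Has College Degree", "No College Degree", "Has College Degree",
                   "Has College Degree", "No College Degree", "No College Degree"],
    "Ideology":   [4, 5, 3, 3, 4, 6],
})

# Mean ideology for each race x education cell, one column per survey year
wide = toy.pivot_table(index=["race_cat", "educ_level"], columns="year",
                       values="Ideology", aggfunc="mean")
print(wide)
```

`pivot_table` groups and reshapes in one step, so it stands in for the `group_by()`, `summarize_at()`, and `spread()` chain in the R code; comparing the two degree rows within a race and year gives the education gap the question asks about.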
I then visualized this time series using the R ggplot2 package:
# Subsetting data into groups
# White - No Degree
w_nodegree <- gather(df5_wide[4, 3:6])
w_nodegree <- rename(w_nodegree, w_nodegree.ideology = value)
# White - Has Degree
w_degree <- gather(df5_wide[3, 3:6])
w_degree <- rename(w_degree, w_degree.ideology = value)
# Not White - No Degree
nw_nodegree <- gather(df5_wide[2, 3:6])
nw_nodegree <- rename(nw_nodegree, nw_nodegree.ideology = value)
# Not White - Has Degree
nw_degree <- gather(df5_wide[1, 3:6])
nw_degree <- rename(nw_degree, nw_degree.ideology = value)
# Merging the four group series into one wide data frame keyed by year
white <- merge(w_nodegree, w_degree, by = 'key')
nwhite <- merge(nw_degree, nw_nodegree, by = 'key')
fig <- merge(white, nwhite, by = 'key')
fig <- rename(fig, 'year' = 'key')
# Reshaping to long format (one row per year and group)
fig <- fig %>% select(year, w_nodegree.ideology, w_degree.ideology, nw_degree.ideology, nw_nodegree.ideology) %>%
gather(key = 'variable', value = 'value', -year) %>% arrange(year)
fig$variable <- factor(fig$variable)
# Recoding into grouping variables
fig <- fig %>% mutate(degree = case_match(
variable,
c('w_degree.ideology', 'nw_degree.ideology') ~ 'Has Degree',
c('w_nodegree.ideology', 'nw_nodegree.ideology') ~ 'No Degree'
))
fig <- fig %>% mutate(race = case_match(
variable,
c('w_degree.ideology', 'w_nodegree.ideology') ~ 'White',
c('nw_degree.ideology', 'nw_nodegree.ideology') ~ 'Not White'
))
Python: Several questions on this survey are related to social trust. I’m talking here about questions VCF0619-VCF0621. Let’s just look at the 2004 survey responses.
Construct a scale that adds together the responses to these three questions so that higher values indicate greater social trust. Set the scale so it runs from zero (the minimum amount of trust).
Now, consider how this scale relates to respondents’ partisan identity (strong Democrat to strong Republican). Do you see any evidence that greater social trust is associated with partisan identity?
anes2004 = anes2[(anes2.year == 2004)]
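# Note: VCF0803 is the 7-point liberal-conservative ideology scale (the same
# variable labelled 'Ideology' in the education analysis), not party identification
anes2004 = anes2004.rename(columns = {'VCF0619' : 'careful', 'VCF0620' : 'helpful', 'VCF0621' : 'fair', 'VCF0803' : 'ideology'})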
# Recoding values
scale_map = {1:0, 2:1}
anes2004 = anes2004.assign(careful_rc = anes2004.careful.map(scale_map))
anes2004 = anes2004.assign(helpful_rc = anes2004.helpful.map(scale_map))
anes2004 = anes2004.assign(fair_rc = anes2004.fair.map(scale_map))
# Subsetting for needed variables
anes2004 = anes2004[['careful_rc', 'helpful_rc', 'fair_rc', 'ideology']]
# Removing missing values
anes2004 = anes2004.dropna()
# Creating social trust scale
anes2004['trust'] = anes2004['careful_rc'] + anes2004['helpful_rc'] + anes2004['fair_rc']
# Filtering for valid ideology
anes2004 = anes2004.loc[(anes2004['ideology']<9) & (anes2004['ideology'] != 0)]
# Selecting columns
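df6 = anes2004[['trust', 'ideology']]
df6.head(10)
| | trust | ideology |
|---|---|---|
| 46227 | 2.0 | 4.0 |
| 46228 | 3.0 | 6.0 |
| 46230 | 1.0 | 6.0 |
| 46231 | 0.0 | 6.0 |
| 46232 | 3.0 | 6.0 |
| 46233 | 3.0 | 4.0 |
| 46235 | 0.0 | 6.0 |
| 46236 | 1.0 | 4.0 |
| 46238 | 0.0 | 4.0 |
| 46239 | 2.0 | 4.0 |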
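corr = np.corrcoef(df6['trust'], df6['ideology'])
corr[1, 0].round(3)
-0.001
A correlation of -0.001 indicates essentially no linear association between the social trust scale and the VCF0803 scale in the 2004 wave.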
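The scale construction and the correlation check can be sketched on toy data. The 0/1 item values are made up and assume the recoding above maps the trusting answer to 1; `scale7` is a hypothetical stand-in for any 7-point scale. The Spearman line is a swapped-in alternative not used in the report: it correlates ranks, which suits two short ordinal scales.

```python
import numpy as np
import pandas as pd

# Toy recoded items (made up): 1 = trusting answer, 0 = distrusting answer.
# "scale7" is a hypothetical stand-in for a 7-point scale.
toy = pd.DataFrame({
    "careful_rc": [0, 1, 1, 0, 1, 0],
    "helpful_rc": [0, 1, 0, 0, 1, 1],
    "fair_rc":    [0, 1, 1, 1, 0, 0],
    "scale7":     [4, 6, 6, 6, 2, 4],
})

# Summing three 0/1 items gives a scale from 0 (least trust) to 3 (most trust)
toy["trust"] = toy[["careful_rc", "helpful_rc", "fair_rc"]].sum(axis=1)

# Pearson correlation, matching the np.corrcoef call above
pearson = np.corrcoef(toy["trust"], toy["scale7"])[0, 1]

# Spearman correlation = Pearson correlation of the ranks; it treats both
# scales as ordered categories rather than equally spaced numbers
spearman = toy["trust"].rank().corr(toy["scale7"].rank())

print(toy["trust"].tolist())  # [0, 3, 2, 1, 2, 1]
print(round(pearson, 3), round(spearman, 3))
```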
R: A common type of question on political surveys is the “feeling thermometer,” where respondents are asked how warm or cold they feel about certain topics, political groups, or figures. It is widely believed that political polarization today is worse than in the past: Republicans have more negative feelings about Democrats today than they did in years past, and vice versa.
Use these survey data to assess this claim.
polar <- anes %>% select(year, 'dem_feels'= VCF0218, 'rep_feels'= VCF0224, 'party' = VCF0302)
# Dropping missing values
polar <- drop_na(polar)
# Excluding missing data & don't knows/refusals
polar_trim <- subset(polar, dem_feels < 98 & rep_feels < 98 & party < 8)
# Recoding values to be more meaningful
polar_trim$party_cat <- polar_trim$party %>% case_match(
1 ~ 'Republican',
2 ~ 'Independent',
3 ~ 'No preference',
4 ~ 'Other',
5 ~ 'Democrat')
# Keeping only Democrats and Republicans (%in%, not ==, so rows are not silently dropped by recycling)
polar_trim <- subset(polar_trim, party_cat %in% c('Democrat', 'Republican'))
polar_dem <- subset(polar_trim, party_cat == 'Democrat')
polar_rep <- subset(polar_trim, party_cat == 'Republican')
# Computing mean feelings toward opposite party by year
avg_demfeelsofreps <- polar_dem %>% group_by(year) %>%
summarize(mean_repfeels = mean(rep_feels))
avg_repfeelsofdems <- polar_rep %>% group_by(year) %>%
summarize(mean_demfeels = mean(dem_feels))
ggplot() +
geom_line(data = avg_demfeelsofreps, aes(x = year, y = mean_repfeels, color = "Democrats' feelings of Republicans"), lwd = 1) +
geom_line(data = avg_repfeelsofdems, aes(x = year, y = mean_demfeels, color = "Republicans' feelings of Democrats"), lwd = 1) +
scale_color_manual(values = c("Democrats' feelings of Republicans" = 'blue', "Republicans' feelings of Democrats" = 'red')) +
theme(plot.caption = element_text(hjust=0)) +
labs(x = "Year",
y = "Average Feeling Thermometer Score",
title = "Democrats and Republicans' Feelings Toward Opposite Party",
subtitle = "1978 - 2020",
caption = "Source: \nAmerican National Election Studies. 2022.
ANES Time Series Cumulative Data File
[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org")
For the remainder of this project, you will further explore the ANES data and then create and answer some questions that are interesting to you about American politics. Answer four different questions or test four different claims, or conduct some other analysis of your choice, in the style of what you did in Part 1. In each question, introduce at least one new variable into your analysis.
To start my exploration, I wanted to see how the importance of religion in Americans’ lives has changed over time, broken down by gender. To do this, I selected the following variables: year of study (VCF0004), gender of respondent (VCF0104), and whether religion was important to the respondent (VCF0846). I included only respondents who identified as male or female, because the third category (i.e., ‘other’) was introduced only in 2016 and therefore did not provide enough data for longitudinal comparison. Similarly, I included only valid (non-missing) responses to the religious importance question.
From the 1980 wave onward, the ANES assessed the importance of religion in respondents’ lives by asking “Do you consider religion to be an important part of your life, or not?” Respondents were then grouped into two categories indicating whether they considered religion an important part of their life or of little to no importance.
To compare the importance of religion in the lives of American men and women, I split the data into two data frames, one for men and one for women. From there, I computed the percentage of men and women in each wave between 1980 and 2020 who considered religion an important or unimportant part of their lives.
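The within-wave percentages described above can also be computed in a single step by putting both year and gender on the rows of one crosstab. The sketch uses made-up respondents whose labels follow the report’s mappings; `normalize="index"` makes each year-by-gender row sum to 100%.

```python
import pandas as pd

# Toy respondents (made up) after recoding to the report's labels
toy = pd.DataFrame({
    "year":        [1980, 1980, 1980, 1980, 2020, 2020, 2020, 2020],
    "gender_rc":   ["Female", "Female", "Male", "Male", "Female", "Female", "Male", "Male"],
    "religion_rc": ["Important", "Important", "Important", "Not important",
                    "Important", "Not important", "Not important", "Not important"],
})

# Row-normalized crosstab: percentages within each (year, gender) row
shares = pd.crosstab([toy["year"], toy["gender_rc"]], toy["religion_rc"],
                     normalize="index") * 100
print(shares)
```

Splitting into separate male and female frames, as the report does, produces the same row percentages; the combined version simply keeps both genders side by side for comparison.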
anes2 = anes2.rename(columns = {'VCF0004': 'year', 'VCF0104' : 'gender', 'VCF0846' : 'relig_import', 'VCF0230' : 'abortion', 'VCF9237' : 'death_pen', 'VCF0101' : 'age', 'VCF0503' : 'ideology', 'VCF0103' : 'cohort'})
df8 = anes2[['year', 'gender', 'relig_import']]
df8_filtered = df8[(df8.relig_import > 0) & (df8.relig_import < 3) & (df8.gender <3) & (df8.gender > 0)]
# Recoding to more meaningful labels
religion_mapping = {1 : 'Important', 2 : 'Not important'}
df8_filtered = df8_filtered.assign(religion_rc = df8_filtered.relig_import.map(religion_mapping))
gender_mapping = {1 : 'Male', 2 : 'Female'}
df8_filtered = df8_filtered.assign(gender_rc = df8_filtered.gender.map(gender_mapping))
# Subsetting by gender
df8_fem = df8_filtered[(df8_filtered.gender == 2)]
df8_male = df8_filtered[(df8_filtered.gender == 1)]
pd.crosstab(df8_fem['year'], df8_fem['religion_rc'], normalize = 'index').applymap(lambda x: "{0:.0f}%".format(100*x))
| year | Important | Not important |
|---|---|---|
| 1980 | 82% | 18% |
| 1984 | 86% | 14% |
| 1986 | 84% | 16% |
| 1988 | 83% | 17% |
| 1990 | 85% | 15% |
| 1992 | 84% | 16% |
| 1994 | 83% | 17% |
| 1996 | 83% | 17% |
| 1998 | 81% | 19% |
| 2000 | 81% | 19% |
| 2002 | 83% | 17% |
| 2004 | 82% | 18% |
| 2008 | 79% | 21% |
| 2012 | 75% | 25% |
| 2016 | 70% | 30% |
| 2020 | 70% | 30% |
pd.crosstab(df8_male['year'], df8_male['religion_rc'], normalize = 'index').applymap(lambda x: "{0:.0f}%".format(100*x))
| year | Important | Not important |
|---|---|---|
| 1980 | 67% | 33% |
| 1984 | 71% | 29% |
| 1986 | 71% | 29% |
| 1988 | 71% | 29% |
| 1990 | 72% | 28% |
| 1992 | 72% | 28% |
| 1994 | 71% | 29% |
| 1996 | 72% | 28% |
| 1998 | 70% | 30% |
| 2000 | 70% | 30% |
| 2002 | 69% | 31% |
| 2004 | 72% | 28% |
| 2008 | 71% | 29% |
| 2012 | 64% | 36% |
| 2016 | 61% | 39% |
| 2020 | 62% | 38% |
Seeing these changes in religious importance over time, I wanted to explore the relationship between Americans’ views of religion and their stance on the death penalty for persons convicted of murder. I used the same variable assessing whether religion was important to the respondent (VCF0846), along with respondents’ answers to the question “Do you favor or oppose the death penalty for persons convicted of murder?” (VCF9237). I first extracted data from the waves in which all of these variables were available (1988-2020, excluding 2002) and removed any missing or indecisive responses. Any rows with missing data on VCF9237, VCF0846, VCF0110 (education), or VCF0101 (age) were removed.
From there, I first wanted to get a sense of the breakdown of religious and non-religious Americans’ stance on the death penalty across all time points (see table below).
df9 <- anes %>% select(year, 'death_pen' = VCF9237, 'relig_import' = VCF0846, 'Education' = VCF0110, 'Age' = VCF0101)
# Selecting years where all variables of interest are available
df9 <- subset(df9, year >=1988)
# Dropping rows with missing death penalty data (2002 has none, so this also drops that wave)
df9 <- subset(df9, !is.na(death_pen))
# Removing missing values
df9_trim <- subset(df9, relig_import > 0 & relig_import < 3 & death_pen >= 1 & Education > 0 & Age != 0)
# Recoding values
df9_trim <- df9_trim %>% mutate(relig_cat = case_match(relig_import,
1 ~ 'Important',
2 ~ 'Not Important'))
df9_trim <- df9_trim %>% mutate(death_cat = case_match(death_pen,
c(1, 2) ~ 'Favor',
c(4, 5) ~ 'Oppose'))
df9_trim$educ_level <- df9_trim$Education %>% case_match(
c(1, 2, 3) ~ 'No College Degree',
c(4) ~ 'Has College Degree'
)
# Factorizing death penalty variable
df9_trim$death_cat <- as.factor(df9_trim$death_cat)
# Generating frequency table (numbers were then entered in the table below)
freq <- table(df9_trim$relig_cat, df9_trim$death_cat)
prop.table(freq,1)
| View of Religion | In Favor of Death Penalty (%) | Oppose Death Penalty (%) |
|---|---|---|
| Important | 70.68 | 29.32 |
| Not Important | 68.20 | 31.80 |
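The row percentages in the table above are the same as the mean of a 0/1 “favor” indicator within each group. A small pandas sketch on made-up respondents, assuming the recoding used in the R code (codes 1 and 2 = favor, 4 and 5 = oppose):

```python
import pandas as pd

# Toy respondents (made up), using the substantive codes recoded above
toy = pd.DataFrame({
    "relig_cat": ["Important"] * 4 + ["Not Important"] * 4,
    "death_pen": [1, 2, 5, 1, 4, 2, 5, 1],
})

# Collapse the four substantive codes into a 0/1 "favor" indicator
toy["favor"] = toy["death_pen"].map({1: 1, 2: 1, 4: 0, 5: 0})

# The mean of a 0/1 indicator is the share in favor, i.e. a row percentage
pct_favor = toy.groupby("relig_cat")["favor"].mean() * 100
print(pct_favor)  # Important -> 75.0, Not Important -> 50.0
```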
Although I was expecting religious individuals to be more in favor of the death penalty than their non-religious peers, I did not expect such a small difference between the two groups. Recognizing that the aggregation across years does not account for longitudinal variation, I decided to visualize this relationship by year.
# Generating count by year
df9_long <- group_by(df9_trim, year) %>% count(death_cat, relig_cat)
# Preparing for visualization
df9_long <- df9_long %>% spread(key = c(death_cat), value = n)
# Calculating total n
df9_long['total'] <- df9_long$Favor + df9_long$Oppose
# Calculating proportion
df9_long['favor_perc'] <- df9_long$Favor / df9_long$total * 100
ggplot(data = df9_long, aes(x = year, y = favor_perc, color = relig_cat, group = relig_cat)) +
geom_line(lwd = 1) +
geom_point() +
scale_color_manual(values = c('Important' = 'steelblue', 'Not Important' = 'maroon')) +
theme(plot.caption = element_text(hjust=0)) +
labs(x = "Year",
y = "Percentage in Favor of Death Penalty",
title = "Favor of Death Penalty Between Religious and Non-Religious \nAmericans over time",
subtitle = "1988 - 2020 (excluding 2002)",
color = 'Importance of Religion in Life',
caption = "Source: \nAmerican National Election Studies. 2022.
ANES Time Series Cumulative Data File
[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org")
Recognizing that the importance of religion in one’s life is not the only possible determinant of a person’s stance on the death penalty, I decided to take a closer look at how young adults (ages 22 to 30) viewed the death penalty over time in relation to their education level. The logic for this subset is that young adults have traditionally challenged the status quo in the U.S., as demonstrated through various protests and movements (e.g., anti-war, right-to-abortion) over the years. Based on the previous exploration, I expected the percentage of young adults in favor of the death penalty to decline from 1988 to 2020 across all groups. I anticipated that religious young adults with college degrees would be less in favor of the death penalty than the general population and their peers without college degrees at most time points until 2016. After 2016, however, I predicted that non-religious young adults with college degrees would have the lowest percentage in favor of the death penalty.
To assess this trend, I first subset the data to include only respondents between the ages of 22 and 30 with valid responses to the variables of interest (i.e., death penalty stance, education level, age, religious importance). I then counted the number of respondents in favor of and opposed to the death penalty by year for each of the four groups (e.g., religious & no degree, non-religious & degree, etc.), which allowed me to calculate the percentage of respondents in favor of the death penalty in each group. I then plotted these percentages in the time-series plot below:
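The counting steps described above (count, spread, fill empty cells with 0, then divide) can be sketched in pandas on made-up respondents. Here `unstack(fill_value=0)` plays the role of `spread()` followed by `replace_na(..., 0)`: without the zero fill, a cell in which nobody opposed the death penalty would get a missing total and drop out of the plot.

```python
import pandas as pd

# Toy respondents aged 22-30 (made up); in 1988 no religious respondent opposes
toy = pd.DataFrame({
    "year":       [1988, 1988, 1988, 2020, 2020, 2020],
    "relig_cat":  ["Important", "Important", "Not Important",
                   "Important", "Important", "Not Important"],
    "educ_level": ["No College Degree"] * 6,
    "death_cat":  ["Favor", "Favor", "Oppose", "Favor", "Oppose", "Favor"],
})

# Count Favor/Oppose per cell; empty combinations become 0 instead of NaN
counts = (toy.groupby(["year", "relig_cat", "educ_level"])["death_cat"]
             .value_counts()
             .unstack(fill_value=0))

# Percentage in favor within each year x religion x education cell
counts["total"] = counts["Favor"] + counts["Oppose"]
counts["favor_perc"] = counts["Favor"] / counts["total"] * 100
print(counts)
```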
# Subsetting for specified age range
df9_trim <- subset(df9_trim, Age <= 30 & Age >= 22)
# Generating count by year
df9_ed <- group_by(df9_trim, year) %>% count(death_cat, relig_cat, educ_level)
# Preparing for visualization
df9_ed <- df9_ed %>% spread(key = c(death_cat), value = n)
# Replacing NAs with 0s
df9_ed$Favor <- replace_na(df9_ed$Favor, 0)
df9_ed$Oppose <- replace_na(df9_ed$Oppose, 0)
# Calculating total n
df9_ed['total'] <- df9_ed$Favor + df9_ed$Oppose
# Calculating proportion
df9_ed['favor_perc'] <- df9_ed$Favor / df9_ed$total * 100
ggplot(data = df9_ed, aes(x = year, y = favor_perc, color = relig_cat, linetype = educ_level)) +
geom_line(lwd = 1) +
geom_point() +
scale_color_manual(values = c('Important' = 'steelblue', 'Not Important' = 'maroon')) +
theme(plot.caption = element_text(hjust=0)) +
labs(x = "Year",
y = "Percentage in Favor of Death Penalty",
title = "Favor of Death Penalty Between Religious and Non-Religious \nAmericans Under 30 by Educational Attainment",
subtitle = "1988 - 2020 (excluding 2002)",
color = 'Importance of Religion in Life',
caption = "Source: \nAmerican National Election Studies. 2022.
ANES Time Series Cumulative Data File
[dataset and documentation]. September 16, 2022 version. \nwww.electionstudies.org") +
scale_linetype_discrete(name = 'Education Level')
In my final exploration, I wanted to see whether there were generational differences in support for the death penalty, particularly between the baby boomer generation (born 1943-1958) and millennials and Generation Z (born 1991-present), by gender and the importance of religion in respondents’ lives. Based on prior analyses, I expected non-religious millennial or Gen Z women to be the most opposed to the death penalty and religious baby boomer men to be the most in favor of it.
To start, I extracted data from the waves in which both cohorts were represented, namely 2012-2020. From there, I kept only valid, non-missing data for the variables of interest and subset further by cohort. Using these subsets, I then created the pivot tables. Of note, the death penalty stance scale ranges from 1 to 4, with lower ratings indicating stronger support for the death penalty and higher ratings indicating stronger opposition. All values presented are averaged by the specified group within a given year.
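The recoding and averaging described above can be sketched on toy data. The sketch assumes, as the text and the `deathpen_map` below state, that the substantive death penalty codes are 1, 2, 4, and 5 and that lower values mean stronger support; the respondents are made up.

```python
import pandas as pd

# Toy respondents (made up); death_pen uses the substantive codes 1, 2, 4, 5
toy = pd.DataFrame({
    "year":        [2012, 2012, 2012, 2020, 2020, 2020],
    "gender_rc":   ["Female", "Female", "Male", "Female", "Male", "Male"],
    "religion_rc": ["Important", "Not important", "Important",
                    "Not important", "Important", "Not important"],
    "death_pen":   [1, 5, 2, 4, 1, 5],
})

# Map codes 1, 2, 4, 5 onto a contiguous 1-4 scale (lower = more in favor)
toy["deathpen_rc"] = toy["death_pen"].map({1: 1, 2: 2, 4: 3, 5: 4})

# Mean stance per year x gender row and religion column;
# margins=True adds the "All" row and column computed from the raw data
table = toy.pivot_table(index=["year", "gender_rc"], columns="religion_rc",
                        values="deathpen_rc", aggfunc="mean", margins=True).round(2)
print(table)
```

Because the margins are computed from the raw respondents rather than by averaging the cell means, an “All” value weights each cell by its number of respondents.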
# Filtering for specific year and valid responses
anes90 = anes2[['year', 'gender', 'age', 'death_pen', 'relig_import', 'cohort']]
anes_boom = anes90[(anes90.year >= 2012) & (anes90.gender < 3) & (anes90.death_pen >= 1) & (anes90.relig_import <8) & (anes90.relig_import != 0) & (anes90.age >= 17) & (anes90.cohort == 4) & (anes90.gender >=1) & (anes90.gender <= 2)]
# Creating mapping for death penalty and gender categories
deathpen_map = {1 : 1, 2 : 2, 4 : 3, 5: 4}
gender_map = {1: 'Male', 2: 'Female'}
religion_mapping = {1 : 'Important', 2 : 'Not important'}
# Assigning mapping
anes_boom = anes_boom.assign(gender_rc = anes_boom.gender.map(gender_map))
anes_boom = anes_boom.assign(deathpen_rc = anes_boom.death_pen.map(deathpen_map))
anes_boom = anes_boom.assign(religion_rc = anes_boom.relig_import.map(religion_mapping))
np.round(anes_boom.pivot_table(index = ['year','gender_rc'], columns = ['religion_rc'], values = 'deathpen_rc',
aggfunc={'deathpen_rc':'mean'}, margins = True),2)
| year | gender_rc | Important | Not important | All |
|---|---|---|---|---|
| 2012 | Female | 2.01 | 1.86 | 1.98 |
| 2012 | Male | 1.83 | 1.76 | 1.81 |
| 2016 | Female | 1.96 | 2.17 | 2.01 |
| 2016 | Male | 1.81 | 1.82 | 1.82 |
| 2020 | Female | 2.09 | 2.61 | 2.21 |
| 2020 | Male | 1.92 | 2.22 | 2.00 |
| All | | 1.96 | 2.07 | 1.99 |
anes_mil = anes90[(anes90.death_pen >= 1) & (anes90.age >= 17) & (anes90.relig_import <8) & (anes90.relig_import != 0) & (anes90.cohort == 1) & (anes90.gender >=1) & (anes90.gender <= 2)]
# Reusing the death penalty, gender, and religion mappings defined above
anes_mil = anes_mil.assign(gender_rc = anes_mil.gender.map(gender_map))
anes_mil = anes_mil.assign(deathpen_rc = anes_mil.death_pen.map(deathpen_map))
anes_mil = anes_mil.assign(religion_rc = anes_mil.relig_import.map(religion_mapping))
np.round(anes_mil.pivot_table(index = ['year', 'gender_rc'], columns = 'religion_rc', values = 'deathpen_rc',
aggfunc={'deathpen_rc':'mean'}, margins = True),2)
| year | gender_rc | Important | Not important | All |
|---|---|---|---|---|
| 2012 | Female | 2.11 | 1.98 | 2.06 |
| 2012 | Male | 2.28 | 1.95 | 2.16 |
| 2016 | Female | 1.94 | 2.46 | 2.17 |
| 2016 | Male | 1.92 | 2.13 | 2.02 |
| 2020 | Female | 2.44 | 2.63 | 2.53 |
| 2020 | Male | 2.11 | 2.45 | 2.29 |
| All | | 2.19 | 2.41 | 2.30 |