In this class session, we worked with the Cooperative Election Study (CCES) — one of the largest and most widely-used surveys in American political science. The CCES is conducted every two years during election cycles, with sample sizes typically ranging from 50,000 to 60,000 respondents. That large sample size is what makes it so useful for studying smaller subgroups (like specific religious traditions or generations) without losing statistical precision.
Our goal today was straightforward: use the CCES to understand how church attendance has changed over time, and then break that trend down by race and generation.
dplyr) and visualization (ggplot2)frcode(), ct(), and
mean_ci() designed specifically for survey data
analysisThe file path in read_fst() will need to be updated to
wherever you saved the data on your own machine.
The CCES asks respondents how often they attend religious services.
The raw variable (pew_attendance) is coded numerically, but
the scale runs backwards — 1 means “More than once a week” and
6 means “Never.” We wrote a custom function to flip this and apply
human-readable labels:
cces_attend <- function(df, var){
var <- enquo(var)
df %>%
mutate(att = case_when(
!! var == 6 ~ 1,
!! var == 5 ~ 2,
!! var == 4 ~ 3,
!! var == 3 ~ 4,
!! var == 2 ~ 5,
!! var == 1 ~ 6,
TRUE ~ NA_real_)) %>%
mutate(att = frcode(
att == 1 ~ "Never",
att == 2 ~ "Seldom",
att == 3 ~ "Yearly",
att == 4 ~ "Monthly",
att == 5 ~ "Weekly",
att == 6 ~ "Weekly+"))
}A few things to notice here:
case_when() reverses the numeric scale so
that higher numbers now mean more frequent attendancemutate() uses frcode() to
convert those numbers into labeled factor levels. The
frcode() function (from socsci) is similar to
case_when(), but it automatically sets the factor
levels in the order they appear in your code — so our
attendance variable will display in a logical order from “Never” to
“Weekly+”We also defined similar recoding functions for party ID, race, education, and income — all following the same logic of converting raw numeric codes into meaningful factor labels.
Before diving into analysis, it’s always good practice to see what you’re working with:
## year n pct
## 1 2006 36421 0.053
## 2 2008 32800 0.047
## 3 2009 13800 0.020
## 4 2010 55400 0.080
## 5 2011 20150 0.029
## 6 2012 54535 0.079
## 7 2013 16400 0.024
## 8 2014 56200 0.081
## 9 2015 14250 0.021
## 10 2016 64600 0.093
## 11 2017 18200 0.026
## 12 2018 60000 0.087
## 13 2019 18000 0.026
## 14 2020 61000 0.088
## 15 2021 25700 0.037
## 16 2022 60000 0.087
## 17 2023 24500 0.035
## 18 2024 60000 0.087
The ct() function from socsci produces a
quick frequency table with counts and percentages. This tells us which
survey years are present in the cumulative dataset and how many
respondents are in each wave.
Our first visualization compared the full distribution of church attendance across all six categories for just two years — 2008 and 2024 — to see how the picture has changed.
cc1 <- cces %>%
filter(year == 2008 | year == 2012 | year == 2016 | year == 2020 | year == 2024) %>%
cces_attend(pew_attendance) %>%
group_by(year) %>%
ct(att, show_na = FALSE, wt = weight)Let’s break down what each step does:
filter() — restricts the data to only
the five election years we care aboutcces_attend() — applies our recoding
function to create the att variablegroup_by(year) — tells R to perform
calculations separately within each yearct(att, show_na = FALSE, wt = weight)
— counts the frequency of each attendance category, drops missing
values, and applies the survey weight so that our
percentages reflect the actual U.S. population rather than just our raw
sampleThen we visualized this as a stacked horizontal bar chart, with one bar per year:
cc1 %>%
mutate(lab = round(pct, 2)) %>%
ggplot(., aes(x = 1, y = pct, fill = fct_rev(att))) +
geom_col(color = "black") +
coord_flip() +
facet_wrap(~ year, ncol =1, strip.position = "left") +
theme_minimal() +
scale_fill_manual(values = c(
"Never" = "#7B2D8B",
"Seldom" = "#C0659A",
"Yearly" = "#E8A0B4",
"Monthly" = "#A8C8E8",
"Weekly" = "#4A90C4",
"Weekly+" = "#1A4F8A"
)) +
theme(legend.position = "bottom") +
scale_y_continuous(labels = percent) +
theme(strip.text.y.left = element_text(angle=0)) +
guides(fill = guide_legend(reverse=T, nrow = 1)) +
theme(axis.title.y=element_blank(), axis.text.y=element_blank(), axis.ticks.y=element_blank()) +
theme(panel.grid.minor.y=element_blank(), panel.grid.major.y=element_blank()) +
geom_text(aes(label = ifelse(pct >.05, paste0(lab*100, '%'), '')),
position = position_stack(vjust = 0.5), size = 8, color = "black") +
geom_text(aes(label = ifelse(pct >.05 & att == "Never", paste0(lab*100, '%'), '')),
position = position_stack(vjust = 0.5), size = 8, color = "white") +
geom_text(aes(label = ifelse(pct >.05 & att == "Weekly+", paste0(lab*100, '%'), '')),
position = position_stack(vjust = 0.5), size = 8, color = "white") +
theme(strip.text.y.left = element_text(angle = 0, hjust = 1)) +
labs(x = "", y = "", title = "Attendance in 2008 vs 2024") +
theme(legend.title = element_blank())Some design choices worth explaining:
fct_rev(att) reverses the factor order
within the stacked bar so that “Never” appears on the left and “Weekly+”
on the right — a more intuitive layoutfacet_wrap(~ year, strip.position = "left")
creates one bar per year, stacked vertically with the year label on the
left sidecoord_flip() rotates the whole chart
so the bars run horizontallygeom_text() calls: one for all labels in
black, then two overrides in white for the darkest segments (“Never” and
“Weekly+”) so the text remains readable against dark backgroundsRather than looking at all six categories, we next focused on just the “Never” group — arguably the most important indicator of religious disengagement — and tracked it across every year in the dataset.
cc2 <- cces %>%
mutate(never = case_when(pew_attendance == 6 ~ 1,
pew_attendance <= 5 ~ 0)) %>%
group_by(year) %>%
mean_ci(never, wt = weight)Here we create a binary variable
(never) coded 1 for non-attenders and 0 for everyone else.
When you take a weighted mean of a 0/1 variable, you get the
proportion of respondents who are in the “1” category —
in this case, the share who never attend church. The
mean_ci() function also calculates confidence
intervals, which tells us how certain we can be about each
estimate.
cc2 %>%
filter(year > 2006) %>%
ggplot(., aes(x = year, y = mean)) +
geom_line() +
geom_point(shape = 21, fill = "white", color = "goldenrod", size = 3) +
theme_minimal()This simple line chart makes the trend immediately visible — and the trend is striking.
Does the rise in non-attendance look the same across racial groups?
We added a group_by(year, race) to find out:
cces_race <- function(df, var){
var <- enquo(var)
df %>%
mutate(race = frcode(
!! var == 1 ~ "White",
!! var == 2 ~ "Black",
!! var == 3 ~ "Hispanic",
!! var == 4 ~ "Asian",
TRUE ~ "All Others"))
}
cc2 <- cces %>%
cces_race(race) %>%
mutate(never = case_when(pew_attendance == 6 ~ 1,
pew_attendance <= 5 ~ 0)) %>%
group_by(year, race) %>%
mean_ci(never, wt = weight)
cc2 %>%
filter(year > 2006) %>%
filter(race != "All Others") %>%
ggplot(., aes(x = year, y = mean, color = race)) +
geom_line() +
geom_point(shape = 21, fill = "white", size = 3) +
theme_minimal() +
scale_y_continuous(labels = percent, limits = c(0, .42))The cces_race() function works just like
cces_attend() — it recodes the numeric race variable into
labeled categories. We filter out “All Others” to keep the chart focused
on the four largest groups. The key insight from this chart: the
rise in non-attendance is not uniform across racial groups.
Finally, we created a generation variable from the respondent’s birth year and tracked non-attendance separately for each cohort:
gg1 <- cces %>%
mutate(gen = frcode(
birthyr >= 1946 & birthyr <= 1964 ~ "Boomers",
birthyr >= 1965 & birthyr <= 1980 ~ "Gen X",
birthyr >= 1981 & birthyr <= 1996 ~ "Millennials",
birthyr >= 1997 ~ "Gen Z")) %>%
filter(gen != "NA") %>%
mutate(never = case_when(pew_attendance == 6 ~ 1,
pew_attendance <= 5 ~ 0)) %>%
group_by(year, gen) %>%
mean_ci(never, wt = weight)The generational cutoffs we used:
| Generation | Birth Years |
|---|---|
| Baby Boomers | 1946–1964 |
| Gen X | 1965–1980 |
| Millennials | 1981–1996 |
| Gen Z | 1997+ |
We then visualized this two ways — all generations on one chart to compare them directly, and then faceted to see each generation’s trajectory in isolation:
# All on one chart
gg1 %>%
filter(year > 2006) %>%
ggplot(., aes(x = year, y = mean, color = gen)) +
geom_line() +
geom_point(shape = 21, fill = "white", size = 3) +
theme_minimal() +
scale_y_continuous(labels = percent, limits = c(.1, .45))# Faceted by generation
gg1 %>%
filter(year > 2006) %>%
ggplot(., aes(x = year, y = mean, color = gen)) +
geom_line() +
geom_point(shape = 21, fill = "white", size = 3) +
facet_wrap(~ gen) +
theme_minimal() +
scale_y_continuous(labels = percent, limits = c(.1, .45))The faceted version makes it easier to see each generation’s internal trend, while the combined version lets you compare levels across generations directly. Both are useful, and together they tell a richer story.
By the end of class, we had built a complete workflow for working with large survey data in R:
fstfrcode()ct()) and weighted means with confidence intervals
(mean_ci())ggplot2 with
stacked bars, line charts, and facetsThe broader finding from all of this: the rise in religious non-attendance is real, it spans racial groups and generations, but the pace and level differ meaningfully across those groups — which is exactly the kind of nuance that careful data analysis can reveal.
Data: Cooperative Election Study Cumulative Dataset, 2006–2024. All estimates are weighted.