Overview

In this tutorial we’ll use the 2024 General Social Survey (GSS) to explore patterns of religious attendance across race and gender. Along the way you’ll practice core skills: filtering data, recoding variables, computing means, and building visualizations with ggplot2.


1. Load Packages and Data

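We start by loading three packages. socsci provides the custom functions we’ll use throughout (ct(), mean_ci(), frcode(), etc.), tidyverse attaches dplyr and ggplot2, and scales supplies the percent() formatter for axis labels. We also define a small helper, error_bar(), that adds dodged 95% confidence-interval bars to a chart, and then read the data directly from a CSV hosted on Dropbox.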

library(socsci)    # ct(), mean_ci(), frcode()
library(tidyverse) # dplyr, ggplot2
library(scales)    # percent()

# Helper: dodged 95% CI error bars. Expects the `lower` and `upper`
# columns that mean_ci() returns; wd sets the whisker width.
error_bar <- function(wd = 0.2) {
  geom_errorbar(aes(ymin = lower, ymax = upper), width = wd,
                position = position_dodge(0.9))
}

# Load the 2024 GSS extract
gss24 <- read.csv("https://www.dropbox.com/scl/fi/t83kf7w379cwu79f535fq/gss24.csv?rlkey=8t3a6n5csqg4x0zhgsuyfba94&st=zwkoggzs&dl=1")

2. Exploring the Data with ct()

The ct() function gives a quick frequency table for any variable: the count (n) and the share of respondents (pct, stored as a proportion, so 0.443 means 44.3%). Let’s start with sex to see the basic gender breakdown in the sample.

gss24 %>%
  ct(sex)
##   sex    n   pct
## 1   1 1467 0.443
## 2   2 1823 0.551
## 3  NA   19 0.006

Notice that sex is coded numerically (1 = Male, 2 = Female) and that 19 respondents have a missing value (NA). We’ll recode the numbers into labels later when we build our chart.
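To see roughly what ct() computes, the sketch below rebuilds a similar table with plain dplyr on a made-up six-person vector. This is an approximation, not socsci’s actual code: the toy data and the rounding of pct to three decimals are assumptions chosen to mirror the printed output above.

```r
library(dplyr)

# Hypothetical toy data: six respondents, one with a missing sex code
toy <- tibble(sex = c(1, 1, 2, 2, 2, NA))

# Count each value (NA included), then divide by the total to get proportions
toy_tab <- toy %>%
  count(sex) %>%
  mutate(pct = round(n / sum(n), 3))

toy_tab
```

The same arithmetic explains the real table: 1467 / 3309 is about 0.443, where 3309 = 1467 + 1823 + 19 counts every respondent, including the missing ones.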


3. Filtering Data

Sometimes we want to look at a specific subgroup. We can chain filter() calls to narrow the data before tabulating. Here we look at religious attendance among White men only.

gss24 %>%
  filter(sex == 1) %>%   # men only
  filter(race == 1) %>%  # white only
  ct(attend, show_na = FALSE)
##   attend   n   pct
## 1      0 378 0.370
## 2      1 117 0.114
## 3      2 123 0.120
## 4      3  99 0.097
## 5      4  42 0.041
## 6      5  44 0.043
## 7      6  54 0.053
## 8      7 127 0.124
## 9      8  39 0.038

The show_na = FALSE argument drops missing values from the table, so each pct is computed out of valid responses only (here, 1,023 White men).
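Chaining filter() calls is one way to combine conditions; dplyr also accepts several comma-separated conditions in a single filter(), which are combined with AND. The sketch below checks that both forms keep the same rows on a small hypothetical data set, then drops missing attendance by hand, which is similar in spirit to show_na = FALSE.

```r
library(dplyr)

# Hypothetical toy data using the GSS-style numeric codes
toy <- tibble(
  sex    = c(1, 1, 2, 1),
  race   = c(1, 2, 1, 1),
  attend = c(0, 7, 3, NA)
)

# Two chained filter() calls...
chained <- toy %>% filter(sex == 1) %>% filter(race == 1)

# ...keep the same rows as one filter() with comma-separated conditions
combined <- toy %>% filter(sex == 1, race == 1)

# Remove rows with missing attendance before tabulating
valid <- combined %>% filter(!is.na(attend))
```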


4. Creating a Binary Variable with mutate() and case_when()

The attend variable has 9 categories (0–8). For many analyses it’s useful to collapse this into a simpler binary: weekly attenders vs. everyone else.

The GSS codes attendance as follows:

Value   Label
0       Never
1       Less than once a year
2       About once or twice a year
3       Several times a year
4       About once a month
5       2–3 times a month
6       Nearly every week
7       Every week
8       Several times a week

We’ll define “weekly” as values 6, 7, or 8.
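Before applying the rule to the full data, it can help to run it on a toy vector that contains every code plus a missing value. The sketch below uses attend %in% 6:8, a shorter equivalent of the three == tests in the tutorial’s code; the vector itself is hypothetical.

```r
library(dplyr)

# Hypothetical toy vector: every attend code 0-8, plus one missing value
attend <- c(0:8, NA)

# Same rule as the recode below: 6, 7, 8 -> 1; 0 through 5 -> 0
wk <- case_when(
  attend %in% 6:8 ~ 1,
  attend <= 5 ~ 0
)

data.frame(attend, wk)
```

Note the last row: a missing attend matches neither condition, so case_when() returns NA for that respondent rather than counting them as a non-attender.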

gss24 %>%
  mutate(wk = case_when(
    attend == 6 | attend == 7 | attend == 8 ~ 1,
    attend <= 5 ~ 0
  )) %>%
  mean_ci(wk)
## # A tibble: 1 × 8
##    mean    sd     n n_eff      se lower upper    ci
##   <dbl> <dbl> <int> <int>   <dbl> <dbl> <dbl> <dbl>
## 1 0.247 0.431  3276  3276 0.00753 0.232 0.261  0.95

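mean_ci() returns the mean along with the standard deviation (sd), sample size (n), standard error (se), and the bounds of a 95% confidence interval (lower, upper). Because wk is a 0/1 variable, the mean equals the proportion of weekly attenders: about 24.7% of respondents who answered the attendance question.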
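The interval can be reproduced by hand from the printed numbers, which is a useful check on what the columns mean. The sketch below assumes a normal approximation (mean ± 1.96 × se); we have not checked whether socsci uses a t critical value instead, but at n = 3276 the two are practically identical. It also shows that, for a 0/1 variable, the standard deviation is about sqrt(p(1 − p)).

```r
# Summary statistics copied from the mean_ci() output above (rounded as printed)
p_hat <- 0.247   # share of weekly attenders
s     <- 0.431   # standard deviation of the 0/1 variable
n     <- 3276    # respondents with a valid attend value

# Standard error of the mean, then a normal-approximation 95% interval
se    <- s / sqrt(n)
z     <- qnorm(0.975)   # about 1.96
lower <- p_hat - z * se
upper <- p_hat + z * se

# For a 0/1 variable, the sd is approximately sqrt(p * (1 - p))
sd_check <- sqrt(p_hat * (1 - p_hat))

round(c(se = se, lower = lower, upper = upper, sd_check = sd_check), 4)
```

The hand-computed upper bound comes out near 0.262 rather than the printed 0.261 only because the printed mean is itself rounded.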


5. Grouped Analysis by Race and Gender

Now let’s break that same weekly attendance estimate down by both race and sex. We use group_by() before mean_ci() so the calculation happens within each group.

We also use frcode() here — a socsci wrapper around case_when() that automatically turns the result into a factor with levels ordered as they appear in your recoding statements. This is handy for controlling the order of bars in a chart.

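gg1 <- gss24 %>%
  # Recode race and sex into labeled, ordered factors
  mutate(race = frcode(
    race == 1 ~ "White",
    race == 2 ~ "Black",
    race == 3 ~ "Other"
  )) %>%
  mutate(sex = frcode(
    sex == 1 ~ "Men",
    sex == 2 ~ "Women"
  )) %>%
  # Weekly attendance indicator (same rule as Section 4)
  mutate(wk = case_when(
    attend == 6 | attend == 7 | attend == 8 ~ 1,
    attend <= 5 ~ 0
  )) %>%
  group_by(sex, race) %>%
  mean_ci(wk) %>%
  # Drop the summary rows where sex or race is missing
  na.omit()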

gg1
## # A tibble: 6 × 10
##   sex   race   mean    sd     n n_eff     se lower upper    ci
##   <fct> <fct> <dbl> <dbl> <int> <int>  <dbl> <dbl> <dbl> <dbl>
## 1 Men   White 0.215 0.411  1023  1023 0.0129 0.190 0.240  0.95
## 2 Men   Black 0.219 0.415   219   219 0.0280 0.164 0.274  0.95
## 3 Men   Other 0.180 0.385   189   189 0.0280 0.125 0.235  0.95
## 4 Women White 0.259 0.438  1232  1232 0.0125 0.234 0.283  0.95
## 5 Women Black 0.300 0.459   343   343 0.0248 0.252 0.349  0.95
## 6 Women Other 0.294 0.457   194   194 0.0328 0.229 0.358  0.95
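frcode() is described above as a case_when() wrapper that orders factor levels by the order of the recoding statements. A minimal plain-dplyr sketch of that same behavior is shown below, using a hypothetical toy vector. It is not socsci’s implementation, just case_when() followed by factor() with explicit levels.

```r
library(dplyr)

# Hypothetical toy race codes, deliberately out of order, with one missing value
race <- c(2, 1, 3, 1, NA)

# Recode with case_when(), then fix the level order with factor()
race_lab <- factor(
  case_when(
    race == 1 ~ "White",
    race == 2 ~ "Black",
    race == 3 ~ "Other"
  ),
  levels = c("White", "Black", "Other")
)

levels(race_lab)
race_lab
```

A missing code stays NA after recoding, which is why gg1 ends with na.omit(): otherwise the summary would include extra rows for respondents with no recorded sex or race.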

6. Custom Label Function

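Before we plot, we’ll define a helper function called lab_bar() that adds percentage labels to bar charts. It uses tidy evaluation (enquo() and !!) so that it can accept a bare column name as an argument. This is a more advanced R concept, but the key idea is that lab_bar(type = mean) works on whichever column we pass in. The other arguments control placement: with above = TRUE, each label sits pos units above the top of its bar; with above = FALSE, every label sits at the fixed height pos. sz sets the text size.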

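lab_bar <- function(type, pos = 0, sz = 8, above = TRUE) {
  type <- enquo(type)  # capture the bare column name passed as `type`
  geom_text(
    aes(
      # above = TRUE: label sits `pos` above the bar's height
      # above = FALSE: every label sits at the fixed height `pos`
      y = if (above) !!type + pos else pos,
      # proportion rounded to 2 decimals, displayed as a percent
      label = paste0(round(!!type, 2) * 100, '%')
    ),
    position = position_dodge(width = 0.9),  # align labels with dodged bars
    size = sz
  )
}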
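The enquo()/!! pattern is not specific to ggplot2. The sketch below applies it inside a dplyr pipeline with a hypothetical helper, add_pct_label(), that labels whichever column the caller names. The helper and toy data are illustrations only; newer versions of rlang also offer the {{ col }} shorthand for the same capture-and-splice step.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: add a percent label for whichever column the caller names
add_pct_label <- function(df, col) {
  col <- enquo(col)                                      # capture the bare column name
  df %>%
    mutate(label = paste0(round((!!col) * 100), "%"))    # splice it back in with !!
}

toy <- tibble(mean = c(0.25, 0.30), pct = c(0.10, 0.05))

add_pct_label(toy, mean)$label
add_pct_label(toy, pct)$label
```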

7. Grouped Bar Chart: Weekly Attendance by Race and Gender

Now we’re ready to visualize. We use a dodged bar chart so we can compare men and women within each racial group side by side.

Key elements of this chart:

  • geom_col(position = "dodge") — places bars next to each other rather than stacking
  • error_bar() — adds 95% confidence interval lines (the helper defined in Section 1)
  • lab_bar() — places percentage labels inside the bars, near their base (above = FALSE with pos = .03)
  • scale_fill_manual() — sets custom colors for each gender

gg1 %>%
  ggplot(aes(x = race, y = mean, fill = sex)) +
  geom_col(position = "dodge", color = "black") +
  scale_y_continuous(labels = percent) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.title = element_blank(),
    plot.title = element_text(size = 24),
    legend.text = element_text(size = 24)
  ) +
  error_bar() +
  lab_bar(above = FALSE, type = mean, pos = .03, sz = 12) +
  scale_fill_manual(name = NULL, values = c("#9b59b6", "#16a085")) +
  labs(
    x = "", y = "",
    title = "Weekly Attendance by Race and Gender",
    caption = "Data: General Social Survey, 2024"
  )

ggsave("wkattend_race_gender.png", bg = "white", width = 8, height = 6)

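What do you notice? Within every racial group, women are more likely than men to report weekly attendance: the point estimates differ by about 4 points among White respondents (25.9% vs. 21.5%) and by 8 to 11 points among Black and Other respondents. Black women have the highest rate in the chart (30.0%), although women of other races are close behind (29.4%) and their confidence intervals overlap substantially. Among men, White and Black respondents are nearly identical (21.5% vs. 21.9%).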
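The gender gaps quoted here are simple differences between the point estimates in gg1. The sketch below recomputes them from the printed values, typed in by hand. Keep in mind that these are differences in estimates only; whether each gap is distinguishable from zero depends on the confidence intervals, not just the difference.

```r
library(dplyr)

# Point estimates copied from the gg1 output above
est <- tibble(
  sex  = rep(c("Men", "Women"), each = 3),
  race = factor(rep(c("White", "Black", "Other"), times = 2),
                levels = c("White", "Black", "Other")),
  mean = c(0.215, 0.219, 0.180, 0.259, 0.300, 0.294)
)

# Women minus men within each racial group
gaps <- est %>%
  group_by(race) %>%
  summarise(gap = mean[sex == "Women"] - mean[sex == "Men"])

gaps
```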


8. Distribution of Attendance: Full Categories

Rather than collapsing attendance into a binary, we can visualize the full distribution. First, let’s tabulate it.

gss24 %>%
  ct(attend, show_na = FALSE)
##   attend    n   pct
## 1      0 1028 0.314
## 2      1  356 0.109
## 3      2  371 0.113
## 4      3  353 0.108
## 5      4  152 0.046
## 6      5  208 0.063
## 7      6  194 0.059
## 8      7  446 0.136
## 9      8  168 0.051
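This table and the binary estimate from Section 4 should agree, which makes a quick consistency check. The sketch below adds up the counts printed above: the total matches the n reported by mean_ci(), and the share in codes 6 through 8 matches its mean.

```r
# Counts copied from the ct(attend) output above, for codes 0 through 8
counts <- c(1028, 356, 371, 353, 152, 208, 194, 446, 168)
names(counts) <- 0:8

total  <- sum(counts)                      # valid responses
weekly <- sum(counts[c("6", "7", "8")])    # codes defined as weekly in Section 4

c(total = total, weekly = weekly, share = round(weekly / total, 3))
```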

Now we recode the numeric values into descriptive labels using frcode(), which again preserves the order we specify (Never → Several Times per Week).

gg2 <- gss24 %>%
  mutate(attend = frcode(
    attend == 0 ~ "Never",
    attend == 1 ~ "Once or Less",
    attend == 2 ~ "Once or Twice",
    attend == 3 ~ "Several Times",
    attend == 4 ~ "Once a Month",
    attend == 5 ~ "2-3 Times per Month",
    attend == 6 ~ "Nearly Weekly",
    attend == 7 ~ "Weekly",
    attend == 8 ~ "Several Times per Week"
  )) %>%
  ct(attend, show_na = FALSE)

gg2
##                   attend    n   pct
## 1                  Never 1028 0.314
## 2           Once or Less  356 0.109
## 3          Once or Twice  371 0.113
## 4          Several Times  353 0.108
## 5           Once a Month  152 0.046
## 6    2-3 Times per Month  208 0.063
## 7          Nearly Weekly  194 0.059
## 8                 Weekly  446 0.136
## 9 Several Times per Week  168 0.051
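Level order matters here because ggplot2 draws a factor’s categories in level order. The sketch below shows what would happen with a plain character recode: factor() without a levels argument sorts labels alphabetically, which would put "2-3 Times per Month" first. Supplying the levels explicitly, which is the behavior frcode() gives automatically, keeps the substantive order. The vector is a hypothetical illustration using the same labels.

```r
# Hypothetical illustration: the same labels as a plain character vector
labs_chr <- c("Never", "Once or Less", "Once or Twice", "Several Times",
              "Once a Month", "2-3 Times per Month", "Nearly Weekly",
              "Weekly", "Several Times per Week")

# factor() with no levels argument sorts the labels alphabetically
alpha <- factor(labs_chr)

# Supplying levels explicitly keeps the order Never -> Several Times per Week
ordered_lv <- factor(labs_chr, levels = labs_chr)

levels(alpha)[1:3]
levels(ordered_lv)[1:3]
```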

9. Horizontal Bar Chart: Full Attendance Distribution

A horizontal bar chart works well here because the attendance labels are long. We use coord_flip() to rotate the chart, and a gradient fill to encode the percentage visually (darker = higher share).

gg2 %>%
  ggplot(aes(x = factor(attend), y = pct, fill = pct)) +
  geom_col(color = "black") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  lab_bar(above = TRUE, pos = .015, sz = 8, type = pct) +
  scale_y_continuous(labels = percent) +
  labs(
    x = "Attendance", y = "Percent",
    title = "Distribution of Annual Religious Attendance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 20),
    legend.position = "none"
  ) +
  coord_flip()

ggsave("attend_distribution.png", bg = "white", width = 8, height = 6)

Summary

In this tutorial we covered:

  • ct() — frequency tables with counts and percentages
  • filter() — subsetting rows based on conditions
  • mutate() + case_when() — creating new variables (including binary recodes)
  • frcode() — recoding into ordered factors
  • group_by() + mean_ci() — computing grouped means with confidence intervals
  • Visualization — dodged bar charts, horizontal bar charts, gradient fills, and custom label functions

These are the core building blocks for most descriptive analyses of survey data. As you work with your own data, try swapping in different grouping variables or outcomes and see what patterns emerge.