Data Prep + Processing

I’m not going through the whole spiel about why I chose which variables because that was done in Homework 1. For reference, you can access Homework 1 here: https://rpubs.com/Elektra/1399011.

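# Load the packages used below (car::recode is called with
# its namespace prefix, so car only needs to be installed)
library(MASS)       # polr()
library(dplyr)      # %>% and rename()
library(tidyr)      # pivot_longer()
library(stargazer)  # summary and regression tables
library(ggplot2)    # plotting

# Load ANES csv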
anes_df <- read.csv("/Users/linde/Downloads/anes_timeseries_2024_csv_20250808/anes_timeseries_2024_csv_20250808.csv")

# Rename variables for ease, keeping original dataset
# intact
anes_new <- anes_df %>%
    rename(wall = V241393, hope = V241118, hispanic = V241499,
        fedspend = V241261, favdpen = V241306, religion = V241422,
        income = V241566x)

# Create a new dataset with just the variables I'm
# interested in
ANES_df <- anes_new[, c("wall", "hope", "hispanic", "fedspend",
    "favdpen", "religion", "income")]

Variable Recoding

I include the codebook from Homework 1 here for two reasons: to facilitate interpretation later on, and because some of the coding has changed to match the response categories required by the assignment. Additionally, I fleshed out the “religion” variable to consist of a series of dummy variables. In this case, “areligious” is the base category.

ANES_df$wall <- car::recode(as.numeric(ANES_df$wall), "2=0; 3=1; 1=2; else=NA")

ANES_df$hope <- car::recode(as.numeric(ANES_df$hope), "1=0; c(2, 3)=1; c(4, 5)=2; else=NA")

ANES_df$hispanic <- car::recode(as.numeric(ANES_df$hispanic),
    "2=0; 1=1; else=NA")

ANES_df$fedspend <- car::recode(as.numeric(ANES_df$fedspend),
    "2=0; 3=1; 1=2; else=NA")

ANES_df$favdpen <- car::recode(as.numeric(ANES_df$favdpen), "1=0; 2=1; else=NA")

ANES_df$lds <- car::recode(as.numeric(ANES_df$religion), "4=1; else=0")

ANES_df$christian <- car::recode(as.numeric(ANES_df$religion),
    "c(1, 2, 3)=1; else=0")

ANES_df$jewish <- car::recode(as.numeric(ANES_df$religion), "5=1; else=0")

ANES_df$muslim <- car::recode(as.numeric(ANES_df$religion), "6=1; else=0")

ANES_df$buddhist <- car::recode(as.numeric(ANES_df$religion),
    "7=1; else=0")

ANES_df$hindu <- car::recode(as.numeric(ANES_df$religion), "8=1; else=0")

# Shift the 28 income categories from 1-28 to 0-27; any
# other code becomes NA
income_raw <- as.numeric(ANES_df$income)
ANES_df$income <- ifelse(income_raw %in% 1:28, income_raw - 1, NA)
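
To make the recode strings concrete, here is a minimal sketch on a made-up vector (`toy_codes` is hypothetical, not ANES data). It assumes the `car` package is installed, and it treats -9 as a stand-in for a missing-data code, which matches the minimum shown for `religion` in the summary statistics table.

```r
# Toy vector of raw survey codes; -9 stands in for a missing-data code (assumption)
toy_codes <- c(1, 2, 3, 4, -9)

# Same recode string as the wall variable: 2 -> 0, 3 -> 1, 1 -> 2, everything else -> NA
toy_wall <- car::recode(toy_codes, "2=0; 3=1; 1=2; else=NA")

# Same recode string as the lds dummy: 4 -> 1, everything else (including -9) -> 0
toy_lds <- car::recode(toy_codes, "4=1; else=0")

print(toy_wall)
print(toy_lds)
```

Under `else=0`, a respondent whose religion code is missing lands in the reference group instead of becoming NA, which would explain why the dummy variables show zero NAs.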

Since I coded the wall variable differently, I again check for NA counts:

sapply(ANES_df, function(x) sum(is.na(x)))
##      wall      hope  hispanic  fedspend   favdpen  religion    income       lds 
##       272       254        28       283       336         0       556         0 
## christian    jewish    muslim  buddhist     hindu 
##         0         0         0         0         0

Much better now that I’m keeping the “indifferent” response category! The last step before modelling is to transform the dependent variable, “wall”, into a factor.

ANES_df$wall <- as.factor(ANES_df$wall)

Creating + Running the Ordered Logit Model

logit1 <- polr(wall ~ hope + hispanic + fedspend + favdpen +
    christian + lds + jewish + muslim + buddhist + hindu + income,
    data = ANES_df, Hess = TRUE)

Now, I use stargazer to create pretty, publication-esque tables. Note that I used https://github.com/JakeRuss/cheatsheets/blob/master/stargazer.Rmd (line 450) to figure out how to get p-values in the parentheses instead of standard errors, and also to manually set significance thresholds (<0.1 isn’t good enough for me).

# Summary stats table
sumstat <- data.frame(ANES_df$hope, ANES_df$hispanic, ANES_df$fedspend,
    ANES_df$favdpen, ANES_df$religion, ANES_df$income)

stargazer(sumstat, type = "text", title = "Summary Statistics of Independent Variables")
## 
## Summary Statistics of Independent Variables
## ==============================================
## Statistic          N    Mean  St. Dev. Min Max
## ----------------------------------------------
## ANES_df.hope     5,267 0.930   0.640    0   2 
## ANES_df.hispanic 5,493 0.106   0.308    0   1 
## ANES_df.fedspend 5,238 1.570   0.580    0   2 
## ANES_df.favdpen  5,185 0.371   0.483    0   1 
## ANES_df.religion 5,521 5.915   5.047   -9  12 
## ANES_df.income   4,965 16.697  8.467    0  27 
## ----------------------------------------------
# Regression output table
stargazer(logit1, type = "text", report = ("vc*p"), star.cutoffs = c(0.05,
    0.01, 0.001))
## 
## ==========================================
##                   Dependent variable:     
##              -----------------------------
##                          wall             
## ------------------------------------------
## hope                   -0.821***          
##                        p = 0.000          
##                                           
## hispanic               -0.693***          
##                        p = 0.000          
##                                           
## fedspend                -0.075            
##                        p = 0.131          
##                                           
## favdpen                -1.370***          
##                        p = 0.000          
##                                           
## christian              0.709***           
##                        p = 0.000          
##                                           
## lds                    1.226***           
##                       p = 0.00003         
##                                           
## jewish                   0.116            
##                        p = 0.540          
##                                           
## muslim                   0.350            
##                        p = 0.367          
##                                           
## buddhist                -0.498            
##                        p = 0.122          
##                                           
## hindu                   1.036*            
##                        p = 0.016          
##                                           
## income                 -0.014***          
##                       p = 0.00004         
##                                           
## ------------------------------------------
## Observations             4,867            
## ==========================================
## Note:        *p<0.05; **p<0.01; ***p<0.001

Interpreting the Signs and Significance of the Model’s Coefficients
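
One way to read the signs and sizes in the table is to exponentiate the coefficients into odds ratios. The sketch below hard-codes the rounded estimates from the stargazer output above rather than pulling them from `logit1`, so the results are approximate. The names `coefs`, `odds_ratios`, and `pct_change` are just for illustration.

```r
# Rounded coefficients copied from the regression table above
coefs <- c(hope = -0.821, hispanic = -0.693, fedspend = -0.075,
    favdpen = -1.370, christian = 0.709, lds = 1.226, jewish = 0.116,
    muslim = 0.350, buddhist = -0.498, hindu = 1.036, income = -0.014)

# Odds ratios: values below 1 mean lower odds of being in a more
# pro-wall category, values above 1 mean higher odds
odds_ratios <- exp(coefs)

# Percentage change in the odds for a one-unit increase in each predictor
pct_change <- 100 * (odds_ratios - 1)

round(cbind(odds_ratios, pct_change), 3)
```

With the fitted model in memory, `exp(coef(logit1))` should give the same quantities at full precision.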

Finding Predicted Probabilities

Get the probabilities:

# Make new data frame focused on the dynamics of 3 variables 
pred_grid <- expand.grid(
  christian = c(0, 1),      # either yes or no 
  favdpen = c(0, 1),        # either yes or no
  income = c(3, 15, 25))     # 3 different income levels (I know, I know, this creates 12 combinations, but I'm interested)


# Fix other variables, but keep them in the model 
pred_grid$hispanic <- 0
pred_grid$fedspend <- 1
pred_grid$lds <- 0
pred_grid$jewish <- 0
pred_grid$muslim <- 0
pred_grid$buddhist <- 0
pred_grid$hindu <- 0
pred_grid$hope <- 1

nrow(pred_grid)
## [1] 12
pred_probs_grid <- predict(logit1, newdata = pred_grid, type = "probs")
pred_df <- cbind(pred_grid, pred_probs_grid)
pred_df
##    christian favdpen income hispanic fedspend lds jewish muslim buddhist hindu
## 1          0       0      3        0        1   0      0      0        0     0
## 2          1       0      3        0        1   0      0      0        0     0
## 3          0       1      3        0        1   0      0      0        0     0
## 4          1       1      3        0        1   0      0      0        0     0
## 5          0       0     15        0        1   0      0      0        0     0
## 6          1       0     15        0        1   0      0      0        0     0
## 7          0       1     15        0        1   0      0      0        0     0
## 8          1       1     15        0        1   0      0      0        0     0
## 9          0       0     25        0        1   0      0      0        0     0
## 10         1       0     25        0        1   0      0      0        0     0
## 11         0       1     25        0        1   0      0      0        0     0
## 12         1       1     25        0        1   0      0      0        0     0
##    hope         0         1         2
## 1     1 0.2281746 0.2424712 0.5293542
## 2     1 0.1270612 0.1774089 0.6955299
## 3     1 0.5378868 0.2399204 0.2221927
## 4     1 0.3643089 0.2685229 0.3671682
## 5     1 0.2595142 0.2536328 0.4868530
## 6     1 0.1471609 0.1944893 0.6583498
## 7     1 0.5798093 0.2260124 0.1941783
## 8     1 0.4045461 0.2668568 0.3285971
## 9     1 0.2876753 0.2607707 0.4515540
## 10    1 0.1658609 0.2083590 0.6257800
## 11    1 0.6139116 0.2131409 0.1729476
## 12    1 0.4391125 0.2627809 0.2981066

Plotting for interpretability:

# Format the plot
pred_4plot <- pred_df %>%
    pivot_longer(cols = starts_with("0") | starts_with("1") |
        starts_with("2"), names_to = "wall_cat", values_to = "prob")

# Make label data
pred_4plot <- pred_4plot %>%
    mutate(wall_cat = factor(wall_cat, levels = c("0", "1", "2"),
        labels = c("Opposes the Wall", "Indifferent", "Favors the Wall")),
        christian = factor(christian, levels = c(0, 1), labels = c("No",
            "Yes")), favdpen = factor(favdpen, levels = c(0,
            1), labels = c("Supports", "Opposes")))

# Actual plot
ggplot(pred_4plot, aes(x = income, y = prob, color = christian,
    linetype = favdpen, group = interaction(christian, favdpen))) +
    geom_line() + geom_point() + facet_wrap(~wall_cat) + labs(x = "Income Level",
    y = "Predicted Probability", color = "Christian?", linetype = "Death Penalty?",
    title = "Predicted Probability of Supporting the Wall", subtitle = "By Christian Identity, Death Penalty Views, and Income Level") +
    theme_bw()

Interpreting Predicted Probabilities

Discussion Point 1: The Relative Importance of Death Penalty Support and Christian Identity

The plot above shows that the predicted probability of opposing the wall is highest (and, consequently, the predicted probability of favoring the wall is lowest) when the lines are dashed, irrespective of whether they are blue or pink. In other words, ceteris paribus, a person who opposes the death penalty is more likely to oppose the wall, regardless of whether they are Christian or not. This reflects the magnitudes of the coefficients displayed earlier: on the log-odds scale, -1.37 (“favdpen”) is nearly twice the size of 0.709 (“christian”). Exponentiated, these become odds ratios of roughly 0.25 and 2.03: opposing the death penalty cuts the odds of being in a more pro-wall category to about a quarter, while being Christian roughly doubles them.
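
The same comparison can be made on the probability scale. This sketch copies the predicted probabilities of opposing the wall at income level 15 from the table above (rows 5 to 8) and compares the shift from switching death penalty views with the shift from switching Christian identity. The object names are made up for illustration.

```r
# P(opposes the wall) at income = 15, copied from rows 5-8 of pred_df
p_oppose <- matrix(c(0.2595142, 0.5798093,
                     0.1471609, 0.4045461),
    nrow = 2, byrow = TRUE,
    dimnames = list(christian = c("No", "Yes"),
        favdpen = c("Supports", "Opposes")))

# Effect of opposing (vs. supporting) the death penalty, within each Christian group
dpen_shift <- p_oppose[, "Opposes"] - p_oppose[, "Supports"]

# Effect of being Christian (vs. not), within each death penalty group
christian_shift <- p_oppose["Yes", ] - p_oppose["No", ]

round(dpen_shift, 3)
round(christian_shift, 3)
```

Both death penalty shifts (about 0.32 and 0.26) are larger in absolute value than the Christian shifts (about -0.11 and -0.18).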

Discussion Point 2: Death Penalty Support Among Predicted “Indifferent” Probabilities Given Christian Identity

In general, the predicted probabilities of being indifferent to the wall are smaller than the predicted probabilities of opposing or supporting it. The exceptions are a Christian who supports the death penalty opposing the wall, a non-Christian who opposes the death penalty supporting the wall, and, at the lowest income level only, a non-Christian who supports the death penalty opposing the wall (0.228 versus 0.242 for indifference). The plot shows something interesting, though: within each Christian “group” (non-Christian or Christian), the two death penalty positions move in opposite directions as income increases. The probability of being indifferent rises with income for those who support the death penalty and falls for those who oppose it. Among non-Christians, the two positions start out almost identical at the lowest income level (0.242 versus 0.240) and then pull apart, with supporters more likely to be indifferent. Among Christians, opponents of the death penalty are more likely to be indifferent at every income level shown, but the gap narrows as income rises (from 0.091 at income level 3 to 0.054 at level 25). In other words, higher income goes with more indifference to the wall among death penalty supporters and less among opponents.
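
To see how the death penalty gap in indifference moves with income, the sketch below copies the “Indifferent” column (category 1) from the predicted probability table and subtracts the supporters' probability from the opponents' within each Christian group. The data frame `indiff` and its column names are illustrative.

```r
# P(indifferent) copied from column "1" of pred_df, in the table's row order
indiff <- data.frame(
    christian = rep(c(0, 1), times = 6),
    favdpen = rep(rep(c(0, 1), each = 2), times = 3),
    income = rep(c(3, 15, 25), each = 4),
    prob = c(0.2424712, 0.1774089, 0.2399204, 0.2685229,
        0.2536328, 0.1944893, 0.2260124, 0.2668568,
        0.2607707, 0.2083590, 0.2131409, 0.2627809))

# Split into death penalty supporters (0) and opponents (1); both halves
# keep the same christian/income ordering, so their rows line up
supports <- indiff[indiff$favdpen == 0, ]
opposes <- indiff[indiff$favdpen == 1, ]

# Gap = P(indifferent | opposes) - P(indifferent | supports)
gap <- data.frame(christian = supports$christian, income = supports$income,
    gap = round(opposes$prob - supports$prob, 3))
gap
```

A negative gap means death penalty supporters are the more indifferent group. A sign change across income levels within one Christian group would indicate that the two lines cross.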

Discussion Point 3: Income as a Reliable Predictor

One of the clearest takeaways from the plot is that as income increases, so does the probability of opposing the wall. This corroborates the findings in HW 1, in which I argued that such a pattern is perhaps reflective of the popular (and erroneous) logic that “immigrants steal jobs”. If you have more income, it makes sense that you’re less likely to be worried about “immigrants stealing your job”, and therefore you may have less reason to support the wall. Alternatively, those with more income may just have higher-paying jobs as a result of a higher education level, and a higher education level itself could be associated with more opposition towards the wall for a few reasons, including exposure to more liberal environments and coursework that highlights different life perspectives. If we continue with this data in future assignments, it could be interesting to treat the education variable (V241463) as a moderating/mediating variable.

The Proportional Odds Assumption in Context

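The proportional odds assumption for ordinal logit models states that each predictor has the same effect on the log odds at every cut point of the dependent variable. The cut points themselves (the intercepts) are free to differ, so the categories need not be evenly spaced, but the slope coefficients are not. In the context of this data, there are two cut points: opposing the wall versus being indifferent or supporting it, and opposing or being indifferent versus supporting it. The assumption says that a given predictor, such as opposing the death penalty, shifts the log odds by the same amount at both cut points, which is why the model reports a single coefficient per variable.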
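
The assumption can be seen in the predicted probabilities already reported. In `polr`'s parameterization the model is logit P(wall <= j) = zeta_j - x'beta, so changing one predictor should shift the cumulative log odds by the same amount at both cut points. The sketch below checks this using rows 1 and 3 of the predicted probability table, which differ only in `favdpen`. The object names are illustrative.

```r
# Predicted probabilities (opposes, indifferent, favors) copied from pred_df
p_row1 <- c(0.2281746, 0.2424712, 0.5293542)  # favdpen = 0
p_row3 <- c(0.5378868, 0.2399204, 0.2221927)  # favdpen = 1

# Cumulative probabilities at the two cut points: P(wall <= 0) and P(wall <= 1)
cum_row1 <- cumsum(p_row1)[1:2]
cum_row3 <- cumsum(p_row3)[1:2]

# Shift in cumulative log odds caused by moving favdpen from 0 to 1
shift <- qlogis(cum_row3) - qlogis(cum_row1)
round(shift, 3)
```

Both shifts come out at about 1.37, the `favdpen` coefficient with its sign flipped, because the model imposes proportional odds by construction. This illustrates what the assumption means; it does not test whether the assumption holds in the data.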

Questions:

Do I need more detail in the proportional odds assumption part? Do I need more detail in the discussion of probabilities part?