Data Prep + Processing

I’m not going through the whole spiel about why I chose which variables because that was done in Homework 1. For reference, you can access Homework 1 here: https://rpubs.com/Elektra/1399011.

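# Load packages used throughout (MASS first so it does not
# mask dplyr functions)
library(MASS)       # polr()
library(dplyr)      # %>%, rename(), mutate()
library(tidyr)      # pivot_longer()
library(ggplot2)    # ggplot()
library(stargazer)  # stargazer()
# car is only called as car::recode(), so it just needs to be
# installed

# Load ANES csv
anes_df <- read.csv("/Users/linde/Downloads/anes_timeseries_2024_csv_20250808/anes_timeseries_2024_csv_20250808.csv")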

# Rename variables for ease, keeping original dataset
# intact
anes_new <- anes_df %>%
    rename(wall = V241393, hope = V241118, hispanic = V241499,
        fedspend = V241261, favdpen = V241306, religion = V241422,
        income = V241566x)

# Create a new dataset with just the variables I'm
# interested in
ANES_df <- anes_new[, c("wall", "hope", "hispanic", "fedspend",
    "favdpen", "religion", "income")]

Variable Recoding

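I include the codebook from Homework 1 here for two reasons: to facilitate interpretation later on, and because some of the coding has changed to match the response categories required by the assignment. Additionally, I fleshed out the “religion” variable into a series of dummy variables. In this case, “areligious” is the base category; note that because every dummy codes all other responses as 0, the base category also absorbs any respondent who is not in one of the six listed groups, including those with missing-data codes on the religion item.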

wall: 0 means the respondent opposes the wall; 1 means they are indifferent to it; and 2 means they favor it.
hope: 0 means not at all hopeful; 1 means a little/somewhat hopeful; and 2 means very/extremely hopeful.
hispanic: 0 means the respondent is not Hispanic/Latino/Spanish, and 1 means they are.
fedspend: 0 means federal spending should be decreased, 1 means it should stay the same, and 2 means it should be increased.
favdpen: 0 means the respondent supports the death penalty, 1 means they oppose.
religion is subset as follows:
- lds: 1 if the respondent identifies as Mormon, 0 if they do not.
- christian: 1 if the respondent identifies as Christian, 0 if they do not.
- jewish: 1 if the respondent identifies as Jewish, 0 if they do not.
- muslim: 1 if the respondent identifies as Muslim, 0 if they do not.
- buddhist: 1 if the respondent identifies as Buddhist, 0 if they do not.
- hindu: 1 if the respondent identifies as Hindu, 0 if they do not.
income: 0 means the respondent makes less than $5k annually, and 27 means they make more than $250k annually. Each number in between represents a $5k increase in annual income from the previous interval.

ANES_df$wall <- car::recode(as.numeric(ANES_df$wall), "2=0; 3=1; 1=2; else=NA")

ANES_df$hope <- car::recode(as.numeric(ANES_df$hope), "1=0; c(2, 3)=1; c(4, 5)=2; else=NA")

ANES_df$hispanic <- car::recode(as.numeric(ANES_df$hispanic),
    "2=0; 1=1; else=NA")

ANES_df$fedspend <- car::recode(as.numeric(ANES_df$fedspend),
    "2=0; 3=1; 1=2; else=NA")

ANES_df$favdpen <- car::recode(as.numeric(ANES_df$favdpen), "1=0; 2=1; else=NA")

ANES_df$lds <- car::recode(as.numeric(ANES_df$religion), "4=1; else=0")

ANES_df$christian <- car::recode(as.numeric(ANES_df$religion),
    "c(1, 2, 3)=1; else=0")

ANES_df$jewish <- car::recode(as.numeric(ANES_df$religion), "5=1; else=0")

ANES_df$muslim <- car::recode(as.numeric(ANES_df$religion), "6=1; else=0")

ANES_df$buddhist <- car::recode(as.numeric(ANES_df$religion),
    "7=1; else=0")

ANES_df$hindu <- car::recode(as.numeric(ANES_df$religion), "8=1; else=0")

# Shift the 1-28 income codes down to 0-27; anything else
# (e.g. negative missing-data codes) becomes NA
income_raw <- as.numeric(ANES_df$income)
ANES_df$income <- ifelse(income_raw %in% 1:28, income_raw - 1, NA)
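As a quick sanity check on the dummy coding, the sketch below rebuilds the six religion indicators in base R for a made-up vector of codes and confirms that no respondent lands in more than one group. `toy_religion` is hypothetical (it is not ANES data), and the code-to-group mapping is assumed to be the same one used in the `car::recode` calls above.

```r
# Hypothetical religion codes, including a negative missing-data code
toy_religion <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 12, -9)

# Same mapping as the recode calls above: 1 if in the group, 0 otherwise
dummies <- data.frame(
  christian = as.integer(toy_religion %in% c(1, 2, 3)),
  lds       = as.integer(toy_religion == 4),
  jewish    = as.integer(toy_religion == 5),
  muslim    = as.integer(toy_religion == 6),
  buddhist  = as.integer(toy_religion == 7),
  hindu     = as.integer(toy_religion == 8)
)

# Each row should sum to 0 (base category) or 1 (exactly one group)
group_count <- rowSums(dummies)
stopifnot(all(group_count %in% c(0, 1)))

# Codes that end up in the base category (all six dummies equal 0)
base_codes <- toy_religion[group_count == 0]
base_codes
```

In this toy vector the base category picks up every code outside 1 through 8, which is worth keeping in mind when reading the religion coefficients as contrasts against the base group.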

Since I coded the wall variable differently, I again check for NA counts:

sapply(ANES_df, function(x) sum(is.na(x)))

##      wall      hope  hispanic  fedspend   favdpen  religion    income       lds 
##       272       254        28       283       336         0       556         0 
## christian    jewish    muslim  buddhist     hindu 
##         0         0         0         0         0

Much better now that I’m keeping the “indifferent” response category! The last step before modelling is to transform the dependent variable, “wall”, into a factor.

ANES_df$wall <- as.factor(ANES_df$wall)

Creating + Running the Ordered Logit Model

logit1 <- polr(wall ~ hope + hispanic + fedspend + favdpen +
    christian + lds + jewish + muslim + buddhist + hindu + income,
    data = ANES_df, Hess = TRUE)

Now, I use stargazer to create pretty, publication-esque tables. Note that I used https://github.com/JakeRuss/cheatsheets/blob/master/stargazer.Rmd (line 450) to figure out how to report p-values beneath the coefficients instead of standard errors, and also to manually set significance thresholds (<0.1 isn’t good enough for me).

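# Summary stats table (note: religion is still the raw ANES
# code here, negative missing-data codes included, so its mean
# and standard deviation are not interpretable)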
sumstat <- data.frame(ANES_df$hope, ANES_df$hispanic, ANES_df$fedspend,
    ANES_df$favdpen, ANES_df$religion, ANES_df$income)

stargazer(sumstat, type = "text", title = "Summary Statistics of Independent Variables")

## 
## Summary Statistics of Independent Variables
## ==============================================
## Statistic          N    Mean  St. Dev. Min Max
## ----------------------------------------------
## ANES_df.hope     5,267 0.930   0.640    0   2 
## ANES_df.hispanic 5,493 0.106   0.308    0   1 
## ANES_df.fedspend 5,238 1.570   0.580    0   2 
## ANES_df.favdpen  5,185 0.371   0.483    0   1 
## ANES_df.religion 5,521 5.915   5.047   -9  12 
## ANES_df.income   4,965 16.697  8.467    0  27 
## ----------------------------------------------

# Regression output table
stargazer(logit1, type = "text", report = ("vc*p"), star.cutoffs = c(0.05,
    0.01, 0.001))

## 
## ==========================================
##                   Dependent variable:     
##              -----------------------------
##                          wall             
## ------------------------------------------
## hope                   -0.821***          
##                        p = 0.000          
##                                           
## hispanic               -0.693***          
##                        p = 0.000          
##                                           
## fedspend                -0.075            
##                        p = 0.131          
##                                           
## favdpen                -1.370***          
##                        p = 0.000          
##                                           
## christian              0.709***           
##                        p = 0.000          
##                                           
## lds                    1.226***           
##                       p = 0.00003         
##                                           
## jewish                   0.116            
##                        p = 0.540          
##                                           
## muslim                   0.350            
##                        p = 0.367          
##                                           
## buddhist                -0.498            
##                        p = 0.122          
##                                           
## hindu                   1.036*            
##                        p = 0.016          
##                                           
## income                 -0.014***          
##                       p = 0.00004         
##                                           
## ------------------------------------------
## Observations             4,867            
## ==========================================
## Note:        *p<0.05; **p<0.01; ***p<0.001

Interpreting the Signs and Significance of the Model’s Coefficients

“hope”: The sign of the “hope” coefficient is negative, meaning that a person who is more hopeful about the state of the country is less likely to support the wall than someone who is less hopeful, ceteris paribus. Another way to frame this is that a person who is more hopeful about the state of the country is more likely to oppose the wall than someone who is less hopeful, ceteris paribus. Further, this coefficient is highly statistically significant (<0.001), meaning that I can reject the null hypothesis proposed in HW 1 that there is no relationship between hope and support for the wall, in favor of the alternative hypothesis that there is a negative relationship between hope and support for the wall.
“hispanic”: The sign of the “hispanic” coefficient is negative, meaning that a Hispanic person is less likely to support the wall than a non-Hispanic person, ceteris paribus. Further, this coefficient is highly statistically significant (p<0.001), meaning that I can reject the null hypothesis proposed in HW 1 that there is no relationship between being Hispanic and support for the wall, in favor of the alternative hypothesis that there is a negative relationship between being Hispanic and support for the wall.
“fedspend”: The sign of the “fedspend” coefficient is negative, meaning that a person who supports more federal social security spending is less likely to support the wall than someone who does not support more federal social security spending, ceteris paribus. However, this coefficient is not statistically significant (p=0.131), meaning I cannot reject the null hypothesis proposed in HW 1 that there is no relationship between supporting higher federal social security spending and supporting the wall. This conclusion is in alignment with that from HW 1.
“favdpen”: The sign of the “favdpen” coefficient is negative, and MASSIVELY so. I know that the assignment does not require us to comment on the magnitude of the coefficients, but I want to: the odds of a person who opposes the death penalty supporting the wall are 75% (1-exp(-1.37)) lower than the odds of a person who supports the death penalty, ceteris paribus. This coefficient is highly statistically significant (p<0.001), meaning that I can reject the null hypothesis proposed in HW 1 that there is no relationship between opposing the death penalty and support for the wall, in favor of the alternative hypothesis that there is a negative relationship between opposing the death penalty and support for the wall.
“religion”: Here, I devote attention to each of the nested dummy variables:
- “christian”: The sign for “christian” is positive, meaning that a Christian person is more likely to support the wall than an areligious person (base category), ceteris paribus. Further, the “christian” coefficient is highly significant (p<0.001), allowing me to reject the null hypothesis that there is no relationship between identifying as Christian and supporting the wall, in favor of the alternative hypothesis that there is a positive relationship between identifying as Christian and supporting the wall.
- “lds”: The sign for “lds” is positive, meaning that a Mormon person is more likely to support the wall than an areligious person, ceteris paribus. Further, the “lds” coefficient is highly significant (p<0.001), allowing me to reject the null hypothesis that there is no relationship between identifying as Mormon and supporting the wall, in favor of the alternative hypothesis that there is a positive relationship between identifying as Mormon and supporting the wall.
- “jewish”: The sign for “jewish” is positive, meaning that a Jewish person is more likely to support the wall than an areligious person, ceteris paribus. However, the “jewish” coefficient is not significant (p=0.540), meaning I cannot reject the null hypothesis that there is no relationship between identifying as Jewish and supporting the wall.
- “muslim”: The sign for “muslim” is positive, meaning that a Muslim person is more likely to support the wall than an areligious person, ceteris paribus. However, the “muslim” coefficient is not significant (p=0.367), meaning I cannot reject the null hypothesis that there is no relationship between identifying as Muslim and supporting the wall.
- “buddhist”: The sign for “buddhist” is negative, meaning that a Buddhist person is less likely to support the wall than an areligious person, ceteris paribus. However, the “buddhist” coefficient is not significant (p=0.122), meaning I cannot reject the null hypothesis that there is no relationship between identifying as Buddhist and supporting the wall.
- “hindu”: The sign for “hindu” is positive, meaning that a Hindu person is more likely to support the wall than an areligious person, ceteris paribus. Further, the “hindu” coefficient is moderately significant (p=0.016, below the 0.05 threshold), allowing me to reject the null hypothesis that there is no relationship between identifying as Hindu and supporting the wall, in favor of the alternative hypothesis that there is a positive relationship between identifying as Hindu and supporting the wall.
“income”: The sign of the “income” coefficient is negative, meaning that a person who has a higher annual income is less likely to support the wall than someone who has a lower annual income, ceteris paribus. This coefficient is highly statistically significant (p<0.001), meaning that I can reject the null hypothesis proposed in HW 1 that there is no relationship between income and support for the wall, in favor of the alternative hypothesis that there is a negative relationship between income and support for the wall. Worth noting, however, is that the magnitude of this coefficient is very small, and that the odds of a person with $5k more income supporting the wall are only about 1.4% lower (1-exp(-0.014)) than the odds of a person with $5k less income, again supporting the conclusion from HW 1.
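For anyone who wants to check the odds arithmetic, here is a minimal sketch that exponentiates a few of the coefficients from the regression table. The values are typed in by hand from the stargazer output (in a live session, `coef(logit1)` would give the unrounded versions), and `coefs`, `odds_ratio`, and `pct_change` are just illustrative names.

```r
# Coefficients copied from the stargazer table above (log-odds scale)
coefs <- c(hope = -0.821, hispanic = -0.693, fedspend = -0.075,
           favdpen = -1.370, christian = 0.709, lds = 1.226,
           income = -0.014)

# Exponentiate to get odds ratios for a one-unit increase
odds_ratio <- exp(coefs)

# Percentage change in the odds of a more favorable wall response
pct_change <- 100 * (odds_ratio - 1)

round(cbind(odds_ratio, pct_change), 3)
```

Negative percentages are bounded at -100 while positive ones are not, so when comparing the strength of a negative and a positive effect it is safer to compare the coefficients themselves (or the odds ratio of one against the reciprocal of the other).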

Finding Predicted Probabilities

Get the probabilities:

# Make new data frame focused on the dynamics of 3 variables 
pred_grid <- expand.grid(
  christian = c(0, 1),      # either yes or no 
  favdpen = c(0, 1),        # either yes or no
  income = c(3, 15, 25))     # 3 different income levels (I know, I know, this creates 12 combinations, but I'm interested)


# Fix other variables, but keep them in the model 
pred_grid$hispanic <- 0
pred_grid$fedspend <- 1
pred_grid$lds <- 0
pred_grid$jewish <- 0
pred_grid$muslim <- 0
pred_grid$buddhist <- 0
pred_grid$hindu <- 0
pred_grid$hope <- 1

nrow(pred_grid)

## [1] 12

pred_probs_grid <- predict(logit1, newdata = pred_grid, type = "probs")
pred_df <- cbind(pred_grid, pred_probs_grid)
pred_df

##    christian favdpen income hispanic fedspend lds jewish muslim buddhist hindu
## 1          0       0      3        0        1   0      0      0        0     0
## 2          1       0      3        0        1   0      0      0        0     0
## 3          0       1      3        0        1   0      0      0        0     0
## 4          1       1      3        0        1   0      0      0        0     0
## 5          0       0     15        0        1   0      0      0        0     0
## 6          1       0     15        0        1   0      0      0        0     0
## 7          0       1     15        0        1   0      0      0        0     0
## 8          1       1     15        0        1   0      0      0        0     0
## 9          0       0     25        0        1   0      0      0        0     0
## 10         1       0     25        0        1   0      0      0        0     0
## 11         0       1     25        0        1   0      0      0        0     0
## 12         1       1     25        0        1   0      0      0        0     0
##    hope         0         1         2
## 1     1 0.2281746 0.2424712 0.5293542
## 2     1 0.1270612 0.1774089 0.6955299
## 3     1 0.5378868 0.2399204 0.2221927
## 4     1 0.3643089 0.2685229 0.3671682
## 5     1 0.2595142 0.2536328 0.4868530
## 6     1 0.1471609 0.1944893 0.6583498
## 7     1 0.5798093 0.2260124 0.1941783
## 8     1 0.4045461 0.2668568 0.3285971
## 9     1 0.2876753 0.2607707 0.4515540
## 10    1 0.1658609 0.2083590 0.6257800
## 11    1 0.6139116 0.2131409 0.1729476
## 12    1 0.4391125 0.2627809 0.2981066
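To make it clearer what `predict(..., type = "probs")` is doing under the hood, the sketch below fits a small ordered logit on simulated data and reproduces the predicted probabilities by hand. Everything here is hypothetical (the data, `toy`, `fit`, and the single predictor `x`); the one thing carried over from the real analysis is the `polr` parameterization, in which the cumulative probability of being at or below category j is plogis(zeta_j - eta).

```r
library(MASS)  # ships with R; provides polr()

set.seed(1)

# Simulated (hypothetical) data: one binary predictor, three ordered outcomes
n <- 500
x <- rbinom(n, 1, 0.5)
latent <- 0.8 * x + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -0.5, 0.5, Inf),
         labels = c("0", "1", "2"), ordered_result = TRUE)
toy <- data.frame(x = x, y = y)

fit <- polr(y ~ x, data = toy)

# Manual calculation for x = 1
eta <- unname(coef(fit)["x"]) * 1          # linear predictor
cum <- plogis(unname(fit$zeta) - eta)      # P(Y <= 0), P(Y <= 1)
manual <- c(cum[1], cum[2] - cum[1], 1 - cum[2])

# Same probabilities from predict()
auto <- predict(fit, newdata = data.frame(x = 1), type = "probs")

all.equal(unname(manual), unname(auto))
```

The middle category's probability is the difference between two cumulative probabilities, which is why it does not have to move in one direction as a predictor changes, unlike the two end categories.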

Plotting for interpretability:

# Format the plot
pred_4plot <- pred_df %>%
    pivot_longer(cols = starts_with("0") | starts_with("1") |
        starts_with("2"), names_to = "wall_cat", values_to = "prob")

# Make label data
pred_4plot <- pred_4plot %>%
    mutate(wall_cat = factor(wall_cat, levels = c("0", "1", "2"),
        labels = c("Opposes the Wall", "Indifferent", "Favors the Wall")),
        christian = factor(christian, levels = c(0, 1), labels = c("No",
            "Yes")), favdpen = factor(favdpen, levels = c(0,
            1), labels = c("Supports", "Opposes")))

# Actual plot
ggplot(pred_4plot, aes(x = income, y = prob, color = christian,
    linetype = favdpen, group = interaction(christian, favdpen))) +
    geom_line() + geom_point() + facet_wrap(~wall_cat) + labs(x = "Income Level",
    y = "Predicted Probability", color = "Christian?", linetype = "Death Penalty?",
    title = "Predicted Probability of Supporting the Wall", subtitle = "By Christian Identity, Death Penalty Views, and Income Level") +
    theme_bw()

Interpreting Predicted Probabilities

Discussion Point 1: The Relative Importance of Death Penalty Support and Christian Identity

The plot above shows that the predicted probability of opposing the wall is highest (and consequently, the predicted probability of favoring the wall is lowest) when the lines are dashed, irrespective of whether they are blue or pink. In other words, ceteris paribus, a person who opposes the death penalty is more likely to oppose the wall, regardless of whether they are Christian or not. This reflects the magnitudes of the coefficients displayed earlier: on the log-odds scale, -1.37 (“favdpen”) is nearly twice as large in absolute value as 0.709 (“christian”). In odds terms, opposing the death penalty divides the odds of a more favorable response to the wall by about 3.9 (exp(1.37)), whereas identifying as Christian multiplies them by only about 2.0 (exp(0.709)).

Discussion Point 2: Death Penalty Support Among Predicted “Indifferent” Probabilities Given Christian Identity

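In general, the predicted probabilities of being indifferent to the wall are smaller than the predicted probabilities of opposing or supporting it. The exceptions are Christians who support the death penalty, for whom indifference (0.18 to 0.21) is more likely than opposition (0.13 to 0.17) at all three income levels; non-Christians who oppose the death penalty, for whom indifference (0.21 to 0.24) is more likely than supporting the wall (0.17 to 0.22); and low-income non-Christians who support the death penalty, for whom indifference (0.24) narrowly exceeds opposition (0.23). The plot shows something interesting, though: the relationship between death penalty views and indifference runs in opposite directions in the two Christian “groups”. Among Christians, opposing the death penalty is associated with a higher probability of being indifferent to the wall at every income level (about 0.27, versus 0.18 to 0.21 for supporters). Among non-Christians the ordering flips: death penalty supporters are the more likely to be indifferent, and the gap widens with income (0.242 versus 0.240 at income level 3, but 0.261 versus 0.213 at income level 25). Income also pulls in opposite directions: as income rises, the probability of indifference rises for death penalty supporters and falls for opponents. In other words, people with more income are more likely to be indifferent to the wall when they also support the death penalty.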

Discussion Point 3: Income as a Reliable Predictor

One of the clearest takeaways from the plot is that as income increases, so does the probability of opposing the wall. This corroborates the findings in HW 1, in which I argued that such a pattern is perhaps reflective of the popular (and erroneous) logic that “immigrants steal jobs”. If you have more income, it makes sense that you’re less likely to be worried about “immigrants stealing your job” and therefore you may have less reason to support the wall. Alternatively, those with more income may just have higher paying jobs as a result of a higher education level, and a higher education level itself could be associated with more opposition towards the wall for a few reasons, including exposure to more liberal environments and coursework that highlighted different life perspectives. If we continue with this data in future assignments, it could be interesting to treat the education variable (V241463) as a moderating/mediating variable.

The Proportional Odds Assumption in Context

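The proportional odds assumption for ordinal logit models states that each independent variable has the same effect on the log odds no matter where the ordered dependent variable is split into “lower” and “higher” categories. The categories themselves do not have to be evenly spaced (each split gets its own cutpoint), but the coefficients are held constant across splits. In the context of this data, the model assumes that a given variable, say opposing the death penalty, shifts the log odds of being at least indifferent to the wall (versus opposing it) by the same amount as it shifts the log odds of favoring the wall (versus opposing it or being indifferent). That is why the regression table reports a single coefficient per variable rather than one per split.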
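In equation form, and using the parameterization that `polr` fits, the model for the three-category wall variable can be sketched as follows. The symbols are my own shorthand: x is the vector of independent variables, β the coefficient vector, and ζ_0, ζ_1 the two cutpoints.

```latex
\log \frac{P(\text{wall} \le j \mid x)}{P(\text{wall} > j \mid x)}
  = \zeta_j - x^{\top}\beta,
  \qquad j \in \{0, 1\}
```

Only the cutpoint ζ_j carries the subscript j; β does not. The proportional odds assumption is exactly that restriction: moving from the split at j = 0 (opposes versus indifferent or favors) to the split at j = 1 (opposes or indifferent versus favors) changes the intercept but leaves every coefficient the same.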

Questions:

Do I need more detail in the proportional odds assumption part? Do I need more detail in the discussion of probabilities part?

POLSGU4712 HW 2

Linden James

2026-02-28