Variable Selection

Dependent Variable, Y (wall): V241393 (“Do you favor or oppose building a wall on the US border with Mexico?”). Chosen because it was required.

Independent Variables + Justification:

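# Load packages used below (pipes and rename, reshaping,
# plotting)
library(dplyr)
library(tidyr)
library(ggplot2)

# Load ANES csv
anes_df <- read.csv("/Users/linde/Downloads/anes_timeseries_2024_csv_20250808/anes_timeseries_2024_csv_20250808.csv")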

# Rename variables for ease, keeping original dataset
# intact
anes_new <- anes_df %>%
    rename(wall = V241393, hope = V241118, hispanic = V241499,
        fedspend = V241261, favdpen = V241306, religion = V241422,
        income = V241566x)

# Create a new dataset with just the variables I'm
# interested in
ANES_df <- anes_new[, c("wall", "hope", "hispanic", "fedspend",
    "favdpen", "religion", "income")]

Variable Recoding Codebook:

Note: I used the car package for recoding because it lets you consolidate the recoding of any given variable into one line. Credit to POLS GU4710 for teaching me about car.

ANES_df$wall <- car::recode(as.numeric(ANES_df$wall), "1=0; 2=1; else=NA")

ANES_df$hope <- car::recode(as.numeric(ANES_df$hope), "1=0; c(2, 3)=1; c(4, 5)=2; else=NA")

ANES_df$hispanic <- car::recode(as.numeric(ANES_df$hispanic),
    "2=0; 1=1; else=NA")

ANES_df$fedspend <- car::recode(as.numeric(ANES_df$fedspend),
    "2=0; 3=1; 1=2; else=NA")

ANES_df$favdpen <- car::recode(as.numeric(ANES_df$favdpen), "1=0; 2=1; else=NA")

ANES_df$religion <- car::recode(as.numeric(ANES_df$religion),
    "c(9, 10, 11, 12)=0; c(1, 2, 3)=1; 4=2; 5=3; 6=4; 7=5; 8=6; else=NA")

# Shift the 28 income categories from 1-28 down to 0-27; all other codes become NA
income_map <- paste0(paste0(1:28, "=", 0:27, collapse = "; "), "; else=NA")
ANES_df$income <- car::recode(as.numeric(ANES_df$income), income_map)

Note: While recoding, I deliberately tried to keep the number of NAs in each variable low, because I wanted to keep my sample as large as possible. However, the assignment requires “wall” to be a binary variable, so that variable has more NAs than I would like:

sapply(ANES_df, function(x) sum(is.na(x)))
##     wall     hope hispanic fedspend  favdpen religion   income 
##     1411      254       28      283      336      295      556
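
Because glm() drops every row that has a missing value in any model variable, the number of rows a model actually uses depends on which variables are combined, not just on the per-variable counts above. A minimal sketch of that check, using a small made-up data frame (toy_df and its values are hypothetical, not ANES data):

```r
# Toy data standing in for the recoded variables (values are made up)
toy_df <- data.frame(
    wall     = c(0, 1, NA, 1, 0, NA),
    hope     = c(2, NA, 1, 0, 1, 2),
    hispanic = c(0, 0, 1, NA, 1, 0)
)

# NAs per variable, same idea as the sapply() call in the report
na_per_var <- sapply(toy_df, function(x) sum(is.na(x)))

# Rows usable by a model of wall ~ hope + hispanic:
# a row counts only if none of the three variables is NA
model_vars <- c("wall", "hope", "hispanic")
n_complete <- sum(complete.cases(toy_df[, model_vars]))
n_dropped <- nrow(toy_df) - n_complete

na_per_var
n_complete
n_dropped
```

Run against ANES_df with each model's variables, the dropped count should line up with the "observations deleted due to missingness" line that glm() reports.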

Summary Stats for Each Variable

summary_stats <- ANES_df %>%
    summarize_all(list(n = ~sum(!is.na(.)), mean = ~mean(., na.rm = TRUE),
        sd = ~sd(., na.rm = TRUE), min = ~min(., na.rm = TRUE),
        max = ~max(., na.rm = TRUE))) %>%
    pivot_longer(everything(), names_to = c("variable", "statistic"),
        names_sep = "_") %>%
    pivot_wider(names_from = statistic, values_from = value)

summary_stats

Logit Model 1 & Interpretations

Creating and running the model:

logit1 <- glm(wall ~ hope + hispanic, family = binomial(link = "logit"),
    data = ANES_df)
summary(logit1)
## 
## Call:
## glm(formula = wall ~ hope + hispanic, family = binomial(link = "logit"), 
##     data = ANES_df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.34573    0.06420 -20.960  < 2e-16 ***
## hope         1.08489    0.05606  19.354  < 2e-16 ***
## hispanic     0.74501    0.11247   6.624 3.49e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5604.4  on 4091  degrees of freedom
## Residual deviance: 5119.7  on 4089  degrees of freedom
##   (1429 observations deleted due to missingness)
## AIC: 5125.7
## 
## Number of Fisher Scoring iterations: 4
exp(coef(logit1))
## (Intercept)        hope    hispanic 
##   0.2603506   2.9591214   2.1064531
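
Odds ratios can be hard to read on their own, so it may help to convert the coefficients into predicted probabilities. This is a rough sketch rather than part of the original analysis: the coefficient values are copied from the summary above, while the helper name pred_prob and the two respondent profiles are illustrative assumptions.

```r
# Coefficients copied from summary(logit1) above
b <- c(intercept = -1.34573, hope = 1.08489, hispanic = 0.74501)

# Hypothetical helper: predicted probability that wall = 1
# for a given hope and hispanic value
pred_prob <- function(hope, hispanic) {
    plogis(b[["intercept"]] + b[["hope"]] * hope + b[["hispanic"]] * hispanic)
}

# Baseline profile: hope = 0, hispanic = 0
p_base <- pred_prob(hope = 0, hispanic = 0)

# Same profile, but in the highest recoded hope category
p_hope2 <- pred_prob(hope = 2, hispanic = 0)

round(c(baseline = p_base, hope2 = p_hope2), 3)
```

Comparing the two probabilities gives a more direct sense of the size of the "hope" effect than the odds ratio alone.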

Interpretations:

Logit Model 2 & Interpretations

Creating and running the model:

logit2 <- glm(wall ~ income + fedspend, family = binomial(link = "logit"),
    data = ANES_df)
summary(logit2)
## 
## Call:
## glm(formula = wall ~ income + fedspend, family = binomial(link = "logit"), 
##     data = ANES_df)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.863391   0.125394  -6.885 5.76e-12 ***
## income       0.018720   0.003982   4.702 2.58e-06 ***
## fedspend     0.197547   0.057518   3.435 0.000594 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5295.0  on 3856  degrees of freedom
## Residual deviance: 5264.5  on 3854  degrees of freedom
##   (1664 observations deleted due to missingness)
## AIC: 5270.5
## 
## Number of Fisher Scoring iterations: 4
exp(coef(logit2))
## (Intercept)      income    fedspend 
##   0.4217296   1.0188963   1.2184106
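
The odds ratio for "income" looks small because it describes a single step on a 28-category scale (0 to 27). A short sketch, using only the coefficients reported above, of how that per-step effect compounds across the full range; the variable names are illustrative assumptions:

```r
# Coefficients copied from summary(logit2) above
b_income <- 0.018720
b_fedspend <- 0.197547

# Odds ratio for a one-category increase in income
or_one_step <- exp(b_income)

# Odds ratio comparing the highest income category (27) with the lowest (0)
or_full_range <- exp(b_income * 27)

# Odds ratio comparing fedspend = 2 with fedspend = 0
or_fedspend_2_vs_0 <- exp(b_fedspend * 2)

round(c(one_step = or_one_step, full_range = or_full_range,
    fedspend_2_vs_0 = or_fedspend_2_vs_0), 3)
```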

Interpretations:

Graph

Here, I’ll check whether the trend I noted above holds: people with higher annual incomes who support higher federal spending should be more likely to oppose the wall than people with lower annual incomes who do not. In a new prediction dataset, I look at the predicted probability of opposing the wall when “fedspend” is either 0 or 2, across all income levels (“income” is my chosen significant continuous independent variable).

# Creating dataset: every income level (0-27) at fedspend = 0
# and at fedspend = 2
prediction_data <- data.frame(income = rep(0:27, times = 2),
    fedspend = rep(c(0, 2), each = 28))
prediction <- predict(logit2, prediction_data, type = "response",
    se.fit = TRUE)

# CI bounds (approximate 95% interval, computed on the
# probability scale)
predicted_fit <- prediction$fit
lower <- prediction$fit - (1.96 * prediction$se.fit)
upper <- prediction$fit + (1.96 * prediction$se.fit)
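
Adding and subtracting 1.96 standard errors directly on the probability scale is an approximation, and the resulting bounds can in principle fall outside [0, 1]. A common alternative is to build the interval on the link (log-odds) scale and then transform it with plogis(). The sketch below uses simulated data; sim_df, sim_fit, and all values are made up for illustration and are not the ANES models.

```r
set.seed(1)

# Simulated stand-in data: binary outcome driven by one predictor
sim_df <- data.frame(x = rep(0:27, times = 20))
sim_df$y <- rbinom(nrow(sim_df), size = 1, prob = plogis(-1 + 0.05 * sim_df$x))
sim_fit <- glm(y ~ x, family = binomial(link = "logit"), data = sim_df)

# Predict on the link (log-odds) scale
new_data <- data.frame(x = 0:27)
link_pred <- predict(sim_fit, new_data, type = "link", se.fit = TRUE)

# Build the 95% interval on the link scale, then map back to probabilities
ci <- data.frame(
    x     = new_data$x,
    fit   = plogis(link_pred$fit),
    lower = plogis(link_pred$fit - 1.96 * link_pred$se.fit),
    upper = plogis(link_pred$fit + 1.96 * link_pred$se.fit)
)
head(ci)
```

Because plogis() maps any real number into (0, 1), the transformed bounds always stay within the valid probability range.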

And now: plot!

# Map fill to fedspend so each ribbon matches its line, and
# drop the ribbon outline
ggplot(prediction_data, aes(x = income, y = predicted_fit, color = factor(fedspend))) +
    geom_line() + geom_ribbon(aes(ymin = lower, ymax = upper,
    fill = factor(fedspend)), alpha = 0.2, color = NA) +
    labs(title = "Pred. Prob. of Opposing Wall by Fedspend Across Income",
        x = "Income Level", y = "Pred. Prob. of Opposing Wall",
        color = "Fedspend", fill = "Fedspend") + theme_minimal()

The plot shows that, overall, people who support higher levels of federal Social Security spending are more likely to oppose the wall. However, even for people who do not support, or only minimally support, higher levels of Social Security spending, the predicted probability of opposing the wall increases with income. This is consistent with the hypothesis set forth in X3, that people who support an increase in federal Social Security spending would be more likely to oppose the wall because they hold more economically and culturally liberal values. Another possible explanation for the income pattern is that people who are more financially stable are less fearful of “immigrants stealing their jobs”, a consistently refuted yet ever-popular narrative.