POLSGU4712 HW 1

Variable Selection

Dependent Variable, Y (wall): V241393 (“Do you favor or oppose building a wall on the US border with Mexico?”). Chosen because it was required.

Independent Variables + Justification:

X₁ (ordinal, “hope”): V241118 (“How hopeful do you feel about how things are going in the country?”). Chosen based on the competing hypotheses that respondents who are content with how things are going in the U.S. would either a) support the wall because they don’t want immigration to change things, or b) oppose the wall because they are happy that the U.S. is welcoming immigrants.
X₂ (binary, “hispanic”): V241499 (“Are you of Hispanic, Latino, or Spanish origin?”) Chosen based on the hypothesis that Hispanic/Latino/Spanish respondents would be more likely to oppose the wall because they have shared experience/culture with the immigrants that might cross it.
X₃ (ordinal, “fedspend”): V241261 (“Should federal spending on Social Security be increased, decreased, or kept the same?”) Chosen based on the assumption that people who support an increase in federal Social Security spending would be more likely to oppose the wall, because they hold more economically and culturally liberal values.
X₄ (binary, “favdpen”): V241306 (“Do you favor or oppose the death penalty for persons convicted of murder?”) Chosen based on the hypothesis that people who support the death penalty would also support the wall, because both are popularly regarded as punitive measures.
X₅ (nominal, “religion”): V241422 (“What is your present religion, if any?”) Honestly this is just a question I’m interested in. A preliminary hypothesis would be that people who are religious of any kind (so, above the baseline category which denotes no present religion) would be more likely to oppose the wall because of religious values that emphasize supporting and caring for others.
X₆ (continuous, “income”): V241566x (“What is your total annual income amount?”) Chosen because we’re required to have a continuous variable. The popular logic is that “immigrants take jobs” but that has been refuted to a degree, so I’m unsure if it will be mainly working class Americans who oppose the wall, or those who are relatively wealthy. We shall see.

# Load packages (tidyverse supplies %>%, rename(), pivot_*() and ggplot())
library(tidyverse)
# Load ANES csv
anes_df <- read.csv("/Users/linde/Downloads/anes_timeseries_2024_csv_20250808/anes_timeseries_2024_csv_20250808.csv")

# Rename variables for ease, keeping original dataset
# intact
anes_new <- anes_df %>%
    rename(wall = V241393, hope = V241118, hispanic = V241499,
        fedspend = V241261, favdpen = V241306, religion = V241422,
        income = V241566x)

# Create a new dataset with just the variables I'm
# interested in
ANES_df <- anes_new[, c("wall", "hope", "hispanic", "fedspend",
    "favdpen", "religion", "income")]

Variable Recoding Codebook:

Note: I used the car package to do recoding since it allows you to consolidate the recoding for any given variable to 1 line. Credit to POLS GU4710 for teaching me about car.

wall: 0 means the respondent supports the wall, 1 means they oppose it.
hope: 0 means not at all hopeful; 1 means a little/somewhat hopeful; and 2 means very/extremely hopeful.
hispanic: 0 means the respondent is not Hispanic/Latino/Spanish, and 1 means they are.
fedspend: 0 means federal spending should be decreased, 1 means it should stay the same, and 2 means it should be increased.
favdpen: 0 means the respondent supports the death penalty, 1 means they oppose.
religion: 0 means the respondent does not identify as presently religious; 1 means they identify as Christian; 2 means LDS; 3 means Jewish; 4 means Muslim; 5 means Buddhist; and 6 means Hindu.
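income: 0 means the respondent makes less than $5k annually, and 27 means they make more than $250k annually. Each number in between represents one step up the survey's income brackets. The brackets cannot all be $5k wide (27 steps of $5k would top out near $140k, not $250k), so a 1-unit change is a one-bracket change rather than a fixed dollar amount.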

ANES_df$wall <- car::recode(as.numeric(ANES_df$wall), "1=0; 2=1; else=NA")

ANES_df$hope <- car::recode(as.numeric(ANES_df$hope), "1=0; c(2, 3)=1; c(4, 5)=2; else=NA")

ANES_df$hispanic <- car::recode(as.numeric(ANES_df$hispanic),
    "2=0; 1=1; else=NA")

ANES_df$fedspend <- car::recode(as.numeric(ANES_df$fedspend),
    "2=0; 3=1; 1=2; else=NA")

ANES_df$favdpen <- car::recode(as.numeric(ANES_df$favdpen), "1=0; 2=1; else=NA")

ANES_df$religion <- car::recode(as.numeric(ANES_df$religion),
    "c(9, 10, 11, 12)=0; c(1, 2, 3)=1; 4=2; 5=3; 6=4; 7=5; 8=6; else=NA")

# Shift the 28 valid income codes (1-28) down by one so the scale starts at 0; anything else becomes NA
income_raw <- as.numeric(ANES_df$income)
ANES_df$income <- ifelse(income_raw %in% 1:28, income_raw - 1, NA)
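A quick way to sanity-check a recode like the ones above is to cross-tabulate raw values against recoded values. The sketch below is only illustrative: `raw_hope` is a made-up vector (not ANES data), the negative values stand in for missing-data codes, and the lookup is a base-R stand-in for the `car::recode` call on “hope”.

```r
# Hypothetical raw responses: 1-5 are valid answers, negative codes are missing
raw_hope <- c(1, 2, 3, 4, 5, -9, -1, 3, 5)

# Base-R equivalent of the "1=0; c(2, 3)=1; c(4, 5)=2; else=NA" recode
lookup <- c(`1` = 0, `2` = 1, `3` = 1, `4` = 2, `5` = 2)
hope <- unname(lookup[as.character(raw_hope)])

# Cross-tabulate raw against recoded values; the NA column shows what was dropped
check <- table(raw = raw_hope, recoded = hope, useNA = "ifany")
print(check)
```

Each raw code should land in exactly one recoded column, and only the missing-data codes should land in the NA column.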

NOTE: While recoding, I was very deliberate in ensuring that there were not too many NAs in each variable, as I wanted to keep my data pool as large as possible. However, the assignment constrains us to make “wall” a binary variable. So, there are more NAs than I would like:

sapply(ANES_df, function(x) sum(is.na(x)))

##     wall     hope hispanic fedspend  favdpen religion   income 
##     1411      254       28      283      336      295      556

Summary Stats for Each Variable

summary_stats <- ANES_df %>%
    summarize_all(list(n = ~sum(!is.na(.)), mean = ~mean(., na.rm = TRUE),
        sd = ~sd(., na.rm = TRUE), min = ~min(., na.rm = TRUE),
        max = ~max(., na.rm = TRUE))) %>%
    pivot_longer(everything(), names_to = c("variable", "statistic"),
        names_sep = "_") %>%
    pivot_wider(names_from = statistic, values_from = value)

summary_stats

Logit Model 1 & Interpretations

Creating and running the model:

logit1 <- glm(wall ~ hope + hispanic, family = binomial(link = "logit"),
    data = ANES_df)
summary(logit1)

## 
## Call:
## glm(formula = wall ~ hope + hispanic, family = binomial(link = "logit"), 
##     data = ANES_df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.34573    0.06420 -20.960  < 2e-16 ***
## hope         1.08489    0.05606  19.354  < 2e-16 ***
## hispanic     0.74501    0.11247   6.624 3.49e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5604.4  on 4091  degrees of freedom
## Residual deviance: 5119.7  on 4089  degrees of freedom
##   (1429 observations deleted due to missingness)
## AIC: 5125.7
## 
## Number of Fisher Scoring iterations: 4

exp(coef(logit1))

## (Intercept)        hope    hispanic 
##   0.2603506   2.9591214   2.1064531

Interpretations:

Signs: For both “hope” and “hispanic”, the sign of the logit coefficient is positive, meaning that a person who is more hopeful about the state of the country is more likely to oppose the wall than someone who is less hopeful, ceteris paribus, and that a person who is Hispanic/Latino/Spanish is more likely to oppose the wall than someone who is not, ceteris paribus.
Magnitude and Log Odds: The coefficient for “hope” is 1.08, which is large for a variable with only three levels. It means that a 1-unit increase in degree of hopefulness (see “Variable Recoding Codebook”) is associated with a 1.08-unit increase in the log odds of opposing the wall, ceteris paribus. The coefficient for “hispanic” is lower at 0.75, which means that a Hispanic/Latino/Spanish respondent has log odds of opposing the wall that are 0.75 units larger than those of a non-Hispanic/Latino/Spanish respondent, ceteris paribus.
Odds: After exponentiating the coefficients, I found that each 1-unit increase in “hope” multiplies the odds of opposing the wall by 2.96 (196% higher odds), and that the odds of a Hispanic/Latino/Spanish person opposing the wall are 2.11 times the odds of a person who is not Hispanic/Latino/Spanish (111% higher), ceteris paribus.
Predicted Probabilities: $p = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)/(1+\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2))$, where $x_1$ is “hope” and $x_2$ is “hispanic”. So, as an example, the predicted probability for a non-Hispanic/Latino/Spanish person with “hope” = 1 opposing the wall would be: $p = \exp(-1.35 + 1.08)/(1 + \exp(-1.35 + 1.08)) = 0.43$, or 43%. For the sake of the next question about Odds Ratios, I will also calculate the predicted probability for a non-Hispanic/Latino/Spanish person with “hope” = 0 opposing the wall: $p = \exp(-1.35)/(1 + \exp(-1.35)) = 0.21$, or 21%.
Odds Ratios: $OR = (p_1/(1-p_1))/(p_0/(1-p_0))$. So, if we were to compare the odds of non-Hispanic/Latino/Spanish people who differ only in their degree of “hope” (“hope” = 1 vs. “hope” = 0), the odds ratio would be: $OR = (0.43/(1-0.43))/(0.21/(1-0.21)) = 2.84$. So, among non-Hispanic/Latino/Spanish respondents, the odds of opposing the wall are 2.84 times as high when “hope” = 1 as when “hope” = 0. Equivalently, we can calculate the odds ratio from the log odds: $OR = \exp((\beta_0 + \beta_1 \cdot 1) - (\beta_0 + \beta_1 \cdot 0)) = \exp((-1.35 + 1.08) - (-1.35)) = \exp(1.08) = 2.94$, which is slightly higher than the calculation done with probabilities. The gap comes from rounding the probabilities to two decimals; with unrounded coefficients, exp(coef(logit1)) gives 2.96.
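The hand calculations above can be cross-checked without rounding. This is a minimal sketch that copies the three coefficients from the `summary(logit1)` output into hypothetical names (`b0`, `b_hope`, `b_hisp`) and uses base R's `plogis()`, the inverse logit, so it runs without the ANES file.

```r
# Coefficients copied from the summary(logit1) output
b0 <- -1.34573
b_hope <- 1.08489
b_hisp <- 0.74501

# Predicted probabilities of opposing the wall for non-Hispanic respondents
p_hope1 <- plogis(b0 + b_hope * 1 + b_hisp * 0)
p_hope0 <- plogis(b0 + b_hope * 0 + b_hisp * 0)

# Odds ratio computed two ways: from the probabilities and from the coefficient
or_from_probs <- (p_hope1 / (1 - p_hope1)) / (p_hope0 / (1 - p_hope0))
or_from_coef <- exp(b_hope)

round(c(p_hope1 = p_hope1, p_hope0 = p_hope0,
        or_from_probs = or_from_probs, or_from_coef = or_from_coef), 3)
```

Without intermediate rounding, the two odds ratios agree, which is why the discrepancy in the hand calculation is a rounding artifact.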

Logit Model 2 & Interpretations

Creating and running the model:

logit2 <- glm(wall ~ income + fedspend, family = binomial(link = "logit"),
    data = ANES_df)
summary(logit2)

## 
## Call:
## glm(formula = wall ~ income + fedspend, family = binomial(link = "logit"), 
##     data = ANES_df)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.863391   0.125394  -6.885 5.76e-12 ***
## income       0.018720   0.003982   4.702 2.58e-06 ***
## fedspend     0.197547   0.057518   3.435 0.000594 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5295.0  on 3856  degrees of freedom
## Residual deviance: 5264.5  on 3854  degrees of freedom
##   (1664 observations deleted due to missingness)
## AIC: 5270.5
## 
## Number of Fisher Scoring iterations: 4

exp(coef(logit2))

## (Intercept)      income    fedspend 
##   0.4217296   1.0188963   1.2184106

Interpretations:

Signs: For both “income” and “fedspend”, the sign of the logit coefficient is positive, meaning that a person who makes more annual income is more likely to oppose the wall than someone who makes less, ceteris paribus, and that a person who is more supportive of higher federal social security spending is more likely to oppose the wall than someone who is less supportive, ceteris paribus.
Magnitude and Log Odds: The coefficient for “income” is 0.02, which is small per unit, although “income” spans 27 units. It means that a 1-unit increase in income (one step up the income brackets; see “Variable Recoding Codebook”) is associated with a 0.02-unit increase in the log odds of opposing the wall, ceteris paribus. The coefficient for “fedspend” is higher at 0.20, which means that a 1-unit increase in “fedspend” (see “Variable Recoding Codebook”) is associated with a 0.20-unit increase in the log odds of opposing the wall, ceteris paribus.
Odds: After exponentiating the coefficients, I found that each 1-unit increase in “income” is associated with about 2% higher odds of opposing the wall, and that each 1-unit increase in “fedspend” (toward supporting more federal Social Security spending) is associated with about 22% higher odds of opposing the wall, ceteris paribus.
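Predicted Probabilities: Recall the formula $p = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)/(1+\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2))$, where $x_1$ is now “income” and $x_2$ is “fedspend”. As an example, the predicted probability for a person with “income” = 20 and “fedspend” = 0 opposing the wall would be: $p = \exp(-0.863 + 0.0187(20))/(1 + \exp(-0.863 + 0.0187(20))) = \exp(-0.489)/(1 + \exp(-0.489)) = 0.38$, or 38%. I keep an extra decimal on the income coefficient here because rounding it to 0.02 before multiplying by 20 shifts the result.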
Odds Ratios: I’ll use the log odds to compare the odds of a person with “income” = 20 and “fedspend” = 0 opposing the wall with the odds of a person with “income” = 3 and “fedspend” = 2 opposing the wall. The intercepts cancel, so the odds ratio would be: $OR = \exp((\beta_0 + \beta_1(20) + \beta_2(0)) - (\beta_0 + \beta_1(3) + \beta_2(2))) = \exp(0.0187(17) - 0.1975(2)) = \exp(-0.077) = 0.93$. The odds of opposing the wall are about 7% lower for the high-income respondent who wants spending decreased, or equivalently about 1.08 times as high for the lower-income respondent who wants spending increased. Concretely, the two-step difference in “fedspend” slightly outweighs the 17-step difference in “income”, so the lower-income, pro-spending respondent is marginally more likely to oppose the wall here, even though higher income on its own raises the odds of opposing.
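The two-profile comparison can also be computed directly from the reported coefficients. The sketch below assumes nothing beyond the `summary(logit2)` output; the helper `log_odds()` and the vector name `b` are hypothetical names introduced for illustration.

```r
# Coefficients copied from the summary(logit2) output
b <- c(intercept = -0.863391, income = 0.018720, fedspend = 0.197547)

# Linear predictor (log odds of opposing the wall) for a given profile
log_odds <- function(income, fedspend) {
  unname(b["intercept"] + b["income"] * income + b["fedspend"] * fedspend)
}

# Predicted probability for income = 20, fedspend = 0
p_high_income <- plogis(log_odds(20, 0))

# Odds ratio: (income = 20, fedspend = 0) relative to (income = 3, fedspend = 2)
or_profiles <- exp(log_odds(20, 0) - log_odds(3, 2))

round(c(p_high_income = p_high_income, or_profiles = or_profiles), 3)
```

Because both profiles share the intercept, it drops out of the difference, and the odds ratio depends only on the gaps in “income” and “fedspend”.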

Graph

Here, I’ll check how the two predictors in Logit Model 2 combine. In my new dataset, I’m looking at the predicted probabilities of opposing the wall when “fedspend” is either 0 or 2, across all income levels (“income” is my chosen significant continuous independent variable).

# Creating dataset
prediction_data <- with(ANES_df, data.frame(income = 0:27, fedspend = rep(c(0,
    2), each = 28)))
prediction <- predict(logit2, prediction_data, type = "response",
    se.fit = TRUE)

# CI bounds
predicted_fit <- prediction$fit
lower <- prediction$fit - (1.96 * prediction$se.fit)
upper <- prediction$fit + (1.96 * prediction$se.fit)
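An alternative to building the interval on the probability scale is to build it on the logit (link) scale and then transform it, which guarantees the bounds stay between 0 and 1. The sketch below is not the analysis above: it fits the same kind of model to simulated stand-in data (`sim`, with made-up coefficients), and `grid` is a hypothetical name for the prediction grid.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the ANES sample)
n <- 500
sim <- data.frame(income = sample(0:27, n, replace = TRUE),
                  fedspend = sample(0:2, n, replace = TRUE))
sim$wall <- rbinom(n, 1, plogis(-0.9 + 0.02 * sim$income + 0.2 * sim$fedspend))

fit <- glm(wall ~ income + fedspend, family = binomial(link = "logit"),
           data = sim)

# Every income level crossed with fedspend = 0 and fedspend = 2
grid <- expand.grid(income = 0:27, fedspend = c(0, 2))

# Predict on the link scale, then map fit and bounds through the inverse logit
link_pred <- predict(fit, grid, type = "link", se.fit = TRUE)
grid$fit <- plogis(link_pred$fit)
grid$lower <- plogis(link_pred$fit - 1.96 * link_pred$se.fit)
grid$upper <- plogis(link_pred$fit + 1.96 * link_pred$se.fit)

head(grid)
```

Keeping the bounds as columns of the grid also means a plot can map them from the data frame instead of from free-standing vectors.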

And now: plot!

ggplot(prediction_data, aes(x = income, y = predicted_fit, color = factor(fedspend))) +
    # Fill each ribbon by fedspend and drop its outline so the two bands are distinguishable
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = factor(fedspend)),
        alpha = 0.2, color = NA) +
    geom_line() +
    labs(title = "Pred. Prob. of Opposing Wall by Fedspend Across Income",
        x = "Income Level", y = "Pred. Prob. of Opposing Wall",
        color = "Fedspend", fill = "Fedspend") +
    theme_minimal()

From the plot, people who support higher levels of federal Social Security spending are more likely to oppose the wall at every income level. However, even for people who want Social Security spending decreased, the predicted probability of opposing the wall increases with income. The first pattern supports the hypothesis set forth in X₃, that people who support an increase in federal Social Security spending would be more likely to oppose the wall, because they hold more economically and culturally liberal values. A possible explanation for the income pattern is that people who are more financially stable are less fearful of “immigrants stealing their jobs”, a consistently refuted yet ever-popular narrative.