Question 1

Imagine someone has a hypothesis that people who preferred Donald Trump to win in the November 2020 election were more likely to believe that the votes cast in the election would not be counted accurately.

V201351 is a pre-election variable asking respondents how accurately the votes cast in the November 2020 election would be counted. V201029 is a pre-election variable asking respondents which presidential candidate they preferred to win the November 2020 election; I have created a new variable trump_vote that takes this variable and codes it as “1” if the respondent preferred Trump and “0” if they preferred someone else.

Run a bivariate regression to test this hypothesis.
Report the coefficient estimate for the explanatory variable. What does this mean in words?
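The coefficient estimate for trump_vote is -0.62516. On average, respondents who preferred Trump scored about 0.63 points lower on V201351, the vote-count accuracy measure, than respondents who preferred someone else. In words, Trump supporters expected the votes to be counted less accurately.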
Report the \(t\)-statistic for the explanatory variable’s coefficient. Based on this test statistic, what conclusion can we draw regarding this hypothesis?
The t-statistic for trump_vote is -24.83. Its absolute value is far above the conventional critical value of about 1.96, so we reject the null hypothesis of no association between preferring Trump and believing in the accuracy of the vote count (p < 2e-16). This supports the hypothesis that respondents who preferred Trump were more likely to anticipate an inaccurate count.

m1 <- lm(V201351 ~ trump_vote, data = ANES2020)
confint(m1)

##                  2.5 %     97.5 %
## (Intercept)  3.3370801  3.4006997
## trump_vote  -0.6745187 -0.5758013

summary(m1)

## 
## Call:
## lm(formula = V201351 ~ trump_vote, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3689 -0.3689  0.2563  0.6311  2.2563 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.36889    0.01623  207.61   <2e-16 ***
## trump_vote  -0.62516    0.02518  -24.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 7486 degrees of freedom
##   (792 observations deleted due to missingness)
## Multiple R-squared:  0.07608,    Adjusted R-squared:  0.07596 
## F-statistic: 616.4 on 1 and 7486 DF,  p-value: < 2.2e-16
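
As a rough check on the answers above, the t-statistic and the 95% confidence interval can be rebuilt by hand from the estimate and standard error that summary(m1) reports. This is only a sketch: the numbers are copied from the printed output (so they carry rounding error), and the variable names are illustrative rather than part of the lab code.

```r
# Values copied from the summary(m1) output
estimate  <- -0.62516   # coefficient on trump_vote
std_error <-  0.02518   # its standard error
df_resid  <-  7486      # residual degrees of freedom

# t-statistic: how many standard errors the estimate lies from zero
t_stat <- estimate / std_error

# Two-sided 95% confidence interval based on the t distribution
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

# Two-sided p-value for the null hypothesis that the coefficient is zero
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(round(t_stat, 2))
print(round(ci, 4))
print(p_value)
```

The interval computed this way should agree with the confint(m1) row for trump_vote up to rounding.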

Question 2

Is the analysis conducted in Question 1 causal? Why not? (It’s not causal!)
The analysis conducted in Question 1 is correlational, not causal. The regression shows that preferring Trump is associated with lower expected vote-count accuracy, but it does not show that preferring Trump causes lower faith in the count. Candidate preference was not randomly assigned, so other variables may drive both preference and faith in the voting procedure.

What are some possible other variables that need to be taken into account when testing this hypothesis? Consider both variables we can measure and those it would be difficult to measure.
Some other variables to consider are media consumption, the likelihood of believing conspiracy theories, and overall trust in the government.
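
To illustrate why an omitted variable matters here, the sketch below simulates data in which a confounder drives both candidate preference and the outcome. Everything in it is hypothetical: the data are simulated, not ANES data, and the names gov_trust, trump_pref, and accuracy, the sample size, and the effect sizes are all assumptions chosen for illustration.

```r
set.seed(2020)  # arbitrary seed so the sketch is reproducible
n <- 20000

# Hypothetical confounder: general trust in government (standardized)
gov_trust <- rnorm(n)

# Lower trust makes a simulated respondent more likely to prefer Trump
trump_pref <- rbinom(n, size = 1, prob = plogis(-gov_trust))

# Simulated outcome: the true effect of trump_pref is set to -0.2
accuracy <- 3 + 0.5 * gov_trust - 0.2 * trump_pref + rnorm(n)

sim <- data.frame(accuracy, trump_pref, gov_trust)

# Model without and with the confounder
bivariate  <- lm(accuracy ~ trump_pref, data = sim)
controlled <- lm(accuracy ~ trump_pref + gov_trust, data = sim)

# The bivariate estimate absorbs part of the effect of gov_trust
print(coef(bivariate)["trump_pref"])
print(coef(controlled)["trump_pref"])
```

In this simulation the bivariate coefficient overstates the negative effect because it also picks up the influence of the omitted variable. The controlled model recovers something close to the value built into the simulation.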

Question 3

Select 3 other variables from the pre-election wave of the ANES that you think would need to be taken into account when assessing this hypothesis. Clean the variables you have chosen to account for any missing values and do the following:

Run a multivariate regression that adds these three variables to the regression you fitted in Question 1.
Report the coefficient estimate for trump_vote. Explain what this coefficient estimate means in words. How does this compare to the coefficient estimate you obtained in Question 1?
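The new coefficient estimate is -0.230215. Holding V202004, V201377, and V201234 constant, respondents who preferred Trump scored about 0.23 points lower on the accuracy measure than respondents who preferred someone else. This is a little over a third of the magnitude of the Question 1 estimate (-0.62516), so the added variables account for much of the bivariate association.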
Report the \(t\)-statistic for trump_vote. How does this compare to the \(t\)-statistic you obtained in Question 1?
The t-statistic is -7.917. This is smaller in absolute value than the Question 1 value (-24.83), but the coefficient is still statistically significant (p = 2.78e-15), and respondents who preferred Trump still expected a less accurate vote count.
What is the adjusted \(R^{2}\) statistic for this model? How does this compare to the \(R^{2}\) statistic you obtained in Question 1?
The adjusted R-squared statistic is 0.146 (the multiple R-squared is 0.1465). This means the explanatory variables account for about 14.6% of the variance in the dependent variable, whereas the Question 1 model accounted for only about 7.6% (adjusted R-squared of 0.07596).
Is the coefficient estimate for trump_vote now causal? Why or why not?
I think the coefficient is still correlational. Adding three control variables does not rule out other omitted variables, so it does not demonstrate that preferring Trump leads you to have less trust in the voting procedure.

model_3d <- lm(V201351 ~ trump_vote + V202004 + V201377 + V201234, data = ANES2020)
               
summary(model_3d)

## 
## Call:
## lm(formula = V201351 ~ trump_vote + V202004 + V201377 + V201234, 
##     data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9442 -0.5872  0.1895  0.6368  4.1489 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.502626   0.039372  63.564  < 2e-16 ***
## trump_vote  -0.230215   0.029077  -7.917 2.78e-15 ***
## V202004      0.007032   0.005249   1.340    0.180    
## V201377      0.279933   0.011339  24.688  < 2e-16 ***
## V201234      0.013904   0.011870   1.171    0.241    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.032 on 7483 degrees of freedom
##   (792 observations deleted due to missingness)
## Multiple R-squared:  0.1465, Adjusted R-squared:  0.146 
## F-statistic:   321 on 4 and 7483 DF,  p-value: < 2.2e-16
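
The adjusted R-squared values reported for the two models can be recovered from the multiple R-squared, the sample size, and the number of explanatory variables. The sketch below does this with numbers copied from the two summary() outputs. The helper adj_r2 is a name introduced here for illustration, and the results are approximate because the inputs are rounded.

```r
# Adjusted R-squared from multiple R-squared, sample size n,
# and number of explanatory variables k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# n = residual degrees of freedom + number of estimated coefficients
n_q1 <- 7486 + 2   # intercept + trump_vote
n_q3 <- 7483 + 5   # intercept + four explanatory variables

adj_q1 <- adj_r2(0.07608, n = n_q1, k = 1)
adj_q3 <- adj_r2(0.1465,  n = n_q3, k = 4)

print(round(adj_q1, 5))
print(round(adj_q3, 3))
```

Both models turn out to use the same number of observations, and with a sample this large the penalty for the three extra variables is small. That is why the adjusted and multiple R-squared values are nearly identical.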

Question 4

Now we want to formally assess whether the “full” model you estimated in Question 3 better explains the variance in respondents’ belief that the votes in the November 2020 election would be counted accurately. We will do this with an \(F\)-test.

NOTE: To conduct an \(F\)-test, both models need to have the same number of observations. Your models will inevitably have different \(N\)s because not all respondents answered all questions, so some will have been dropped from your model in Question 3. This means you will need to re-estimate your model from Question 1 to include only observations in your Question 3 model. I have some helper code below to get you started; if you have issues ask me for help. You will need to change eval to TRUE when you are ready to run your code.

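# Re-estimate the Question 1 model using only rows with no missing values
# on the three variables added in Question 3
model1a <- lm(V201351 ~ trump_vote, data = ANES2020[which(!is.na(ANES2020$V202004) &
                                                       !is.na(ANES2020$V201377) &
                                                       !is.na(ANES2020$V201234)),])
# This call compares m1 with model1a, which share the same formula.
# The nested F-test against the full model would be anova(model1a, model_3d).
anova(m1, model1a)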

## Analysis of Variance Table
## 
## Model 1: V201351 ~ trump_vote
## Model 2: V201351 ~ trump_vote
##   Res.Df  RSS Df Sum of Sq F Pr(>F)
## 1   7486 8630                      
## 2   7486 8630  0         0
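
Both rows of the table above have the same formula and the comparison has 0 degrees of freedom, so it compares the Question 1 model with itself rather than with the full model. With the data loaded, the nested comparison would be anova(model1a, model_3d). As a rough check that does not need the data, the F-statistic can be approximated from the R-squared values that the two summaries report. This is a sketch under the assumption that both models use the same 7,488 observations, which the matching missingness counts suggest. The result is approximate because the inputs are rounded.

```r
# Values copied from the two summary() outputs
r2_restricted <- 0.07608  # Question 1 model: trump_vote only
r2_full       <- 0.1465   # Question 3 model: trump_vote + three variables
q             <- 3        # number of variables added in the full model
df_full       <- 7483     # residual degrees of freedom of the full model

# F-statistic for the joint null that all added coefficients are zero
f_stat <- ((r2_full - r2_restricted) / q) / ((1 - r2_full) / df_full)

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(round(f_stat, 1))
print(p_value)
```

A large F-statistic with a very small p-value would indicate that the three added variables jointly improve the fit over the Question 1 model.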

Question 5

Now imagine that the person who originally posited the hypothesis wants to revise the hypothesis such that they think that people who preferred Donald Trump to win in the November 2020 election were more likely to believe that the votes cast in the election would not be counted accurately as their level of conservatism increases. This hypothesis is asking for an interactive effect between preference for Donald Trump and conservatism.

Estimate the interactive model needed to test this hypothesis.
What are the coefficient estimates and \(t\)-statistics on the terms involved in this interaction? Explain the following relationships between ideology, preference for Trump, and belief that the vote will be accurately counted.
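The coefficient estimates and t-statistics are: trump_vote, -0.12592 (t = -1.144); ideo, 0.01907 (t = 1.387); and trump_vote:ideo, -0.10595 (t = -4.702). Only the interaction term is statistically significant, which means the relationship between ideology and expected vote-count accuracy is more negative among respondents who preferred Trump than among those who preferred someone else.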
- Among people who preferred Trump, as their ideology becomes more conservative, what is the interactive effect on belief that the vote will be accurately counted? In explaining this, you might want to calculate the fitted values for a few varying levels of ideology. As a Trump supporter becomes more conservative, their trust in the accuracy of the vote count decreases. The fitted values show this: the least conservative Trump supporters (ideo = 1) have a fitted value of 3.123077, and the most conservative (ideo = 7) have a fitted value of 2.601792.
- For individuals with high conservative values for ideology, if they were to switch from preferring Trump to preferring someone else, what is the interactive effect on belief that the vote will be accurately counted? In explaining this, you might want to calculate the fitted values when varying preference for Trump given some fixed ideology value. For the most conservative respondents (ideo = 7), switching from preferring Trump to preferring someone else raises the fitted value from 2.601792 to 3.469350, an increase of about 0.87 points. Among respondents who preferred someone else, ideology makes little difference: the fitted value is 3.354942 at ideo = 1 and 3.469350 at ideo = 7.

m3 <- lm(V201351 ~ trump_vote * ideo, data = ANES2020)
summary(m3)

## 
## Call:
## lm(formula = V201351 ~ trump_vote * ideo, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4693 -0.4503  0.3113  0.6260  2.3982 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.33587    0.04547  73.364  < 2e-16 ***
## trump_vote      -0.12592    0.11009  -1.144    0.253    
## ideo             0.01907    0.01374   1.387    0.165    
## trump_vote:ideo -0.10595    0.02253  -4.702 2.63e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 6556 degrees of freedom
##   (1720 observations deleted due to missingness)
## Multiple R-squared:  0.09079,    Adjusted R-squared:  0.09037 
## F-statistic: 218.2 on 3 and 6556 DF,  p-value: < 2.2e-16

model_preds <- predict(m3,newdata = data.frame("trump_vote" = 0, "ideo" = seq(1,7,1)), se.fit = F)
model_preds2 <- predict(m3,newdata = data.frame("trump_vote" = 1, "ideo" = seq(1,7,1)), se.fit = F)
model_preds

##        1        2        3        4        5        6        7 
## 3.354942 3.374010 3.393078 3.412146 3.431214 3.450282 3.469350

model_preds2

##        1        2        3        4        5        6        7 
## 3.123077 3.036196 2.949315 2.862434 2.775554 2.688673 2.601792
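
The fitted values above follow directly from the four coefficients in summary(m3), which makes the interaction easier to read. The sketch below rebuilds them by hand. The coefficients are copied from the printed output (so the results match predict() only up to rounding), and the helper names fitted_value and gap_at are introduced here for illustration.

```r
# Coefficients copied from the summary(m3) output
b <- c(intercept = 3.33587, trump_vote = -0.12592,
       ideo = 0.01907, interaction = -0.10595)

# Fitted value for a given candidate preference and ideology score
fitted_value <- function(trump_vote, ideo) {
  unname(b["intercept"] + b["trump_vote"] * trump_vote +
           b["ideo"] * ideo + b["interaction"] * trump_vote * ideo)
}

# Slope of ideology within each group
slope_other <- unname(b["ideo"])                     # trump_vote = 0
slope_trump <- unname(b["ideo"] + b["interaction"])  # trump_vote = 1

# Gap between the two groups at a fixed ideology score
gap_at <- function(ideo) unname(b["trump_vote"] + b["interaction"] * ideo)

print(fitted_value(1, c(1, 7)))  # Trump supporters at ideo = 1 and 7
print(fitted_value(0, c(1, 7)))  # everyone else at ideo = 1 and 7
print(c(slope_other, slope_trump))
print(gap_at(c(1, 7)))
```

The slope for respondents who preferred Trump is the sum of the ideo and interaction coefficients, while the gap between the two groups widens by the interaction coefficient for each one-point increase in ideo.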

Multivariate OLS Lab

Omar Ratrut

4/22/25
