Question 1

Imagine someone has a hypothesis that people who preferred Donald Trump to win in the November 2020 election were more likely to believe that the votes cast in the election would not be counted accurately.

V201351 is a pre-election variable asking respondents how accurately the votes cast in the November 2020 election would be counted. V201029 is a pre-election variable asking respondents which presidential candidate they preferred to win the November 2020 election; I have created a new variable trump_vote that takes this variable and codes it as “1” if the respondent preferred Trump and “0” if they preferred someone else.
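The recode described above can be sketched as follows. This is a minimal illustration on a toy data frame, not the code actually used to build trump_vote. The category codes for V201029 (2 = Trump, other positive values = another candidate, negative values = refused, don't know, or inapplicable) are assumptions to check against the ANES codebook.

```r
# Toy stand-in for ANES2020; the real data are not reproduced here.
# ASSUMED codes for V201029: 1 = Biden, 2 = Trump, 3+ = other candidates,
# negative values = non-substantive responses.
toy <- data.frame(V201029 = c(1, 2, 2, 3, -9, -1, 5))

toy$trump_vote <- ifelse(toy$V201029 < 0, NA,              # missing codes -> NA
                         ifelse(toy$V201029 == 2, 1, 0))   # Trump -> 1, others -> 0

# Tabulate the result, showing the NA count as well
table(toy$trump_vote, useNA = "ifany")
```

Setting the negative codes to NA before fitting matters because lm() would otherwise treat them as "preferred someone else".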

m1 <- lm(V201351 ~ trump_vote, data = ANES2020)
confint(m1)
##                  2.5 %     97.5 %
## (Intercept)  3.3370801  3.4006997
## trump_vote  -0.6745187 -0.5758013
summary(m1)
## 
## Call:
## lm(formula = V201351 ~ trump_vote, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3689 -0.3689  0.2563  0.6311  2.2563 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.36889    0.01623  207.61   <2e-16 ***
## trump_vote  -0.62516    0.02518  -24.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 7486 degrees of freedom
##   (792 observations deleted due to missingness)
## Multiple R-squared:  0.07608,    Adjusted R-squared:  0.07596 
## F-statistic: 616.4 on 1 and 7486 DF,  p-value: < 2.2e-16
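Because trump_vote is a 0/1 indicator, the two coefficients translate directly into predicted group means, and the confidence interval can be rebuilt from the estimate and its standard error. The sketch below only re-uses numbers printed above. It assumes that higher values of V201351 mean more confidence in an accurate count, which should be confirmed in the codebook.

```r
# Values copied from the summary(m1) output above.
b0  <- 3.36889    # intercept: predicted V201351 when trump_vote = 0
b1  <- -0.62516   # change in the prediction when trump_vote = 1
se1 <- 0.02518    # standard error of b1
df  <- 7486       # residual degrees of freedom

# Predicted mean response for each group
pred_other <- b0
pred_trump <- b0 + b1
c(other = pred_other, trump = pred_trump)

# 95% confidence interval for b1: estimate +/- t critical value * SE
ci_b1 <- b1 + c(-1, 1) * qt(0.975, df = df) * se1
ci_b1
```

The interval excludes zero, which matches the very small p-value reported for trump_vote.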

Question 2

Is the analysis conducted in Question 1 causal? Why not? (It’s not causal!)

The analysis conducted in Question 1 is correlational, not causal. The relationship between the two variables does not demonstrate that preferring Trump causes a person to have less faith in the integrity of the vote count. It shows only that preferring Trump is correlated with beliefs about how accurately the votes would be counted. Candidate preference was not randomly assigned, so variables omitted from the model may account for both.

What are some possible other variables that need to be taken into account when testing this hypothesis? Consider both variables we can measure and those that would be difficult to measure.

Some other variables to consider are media consumption, likelihood of believing conspiracy theories, and overall trust in the government.

Question 3

Select 3 other variables from the pre-election wave of the ANES that you think would need to be taken into account when assessing this hypothesis. Clean the variables you have chosen to account for any missing values and do the following:

model_3d <- lm(V201351 ~ trump_vote + V202004 + V201377 + V201234, data = ANES2020)
               
summary(model_3d)
## 
## Call:
## lm(formula = V201351 ~ trump_vote + V202004 + V201377 + V201234, 
##     data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9442 -0.5872  0.1895  0.6368  4.1489 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.502626   0.039372  63.564  < 2e-16 ***
## trump_vote  -0.230215   0.029077  -7.917 2.78e-15 ***
## V202004      0.007032   0.005249   1.340    0.180    
## V201377      0.279933   0.011339  24.688  < 2e-16 ***
## V201234      0.013904   0.011870   1.171    0.241    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.032 on 7483 degrees of freedom
##   (792 observations deleted due to missingness)
## Multiple R-squared:  0.1465, Adjusted R-squared:  0.146 
## F-statistic:   321 on 4 and 7483 DF,  p-value: < 2.2e-16
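The cleaning step that the prompt asks for can be sketched as below. This is an illustration on a toy data frame and not the cleaning that was actually run. It assumes that every negative code in these three variables marks a non-substantive response. The output above reports the same 792 deleted observations as Question 1, which is what one would see if the three added variables contained no NA values, so it is worth checking the NA counts before fitting.

```r
# Toy stand-in for ANES2020 with the three covariates used above.
toy <- data.frame(
  V202004 = c(1, 2, -9, 1),
  V201377 = c(3, -8, 4, 5),
  V201234 = c(1, 2, 2, -1)
)

covars <- c("V202004", "V201377", "V201234")

# ASSUMPTION: every negative code is a non-substantive response.
toy[covars] <- lapply(toy[covars], function(x) replace(x, x < 0, NA))

colSums(is.na(toy))        # number of NAs per variable after cleaning
sum(complete.cases(toy))   # rows that lm() would keep
```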

Question 4

Now we want to formally assess whether the “full” model you estimated in Question 3 explains more of the variance in respondents’ belief that the votes in the November 2020 election would be counted accurately than the model from Question 1 does. We will do this with an \(F\)-test.

NOTE: To conduct an \(F\)-test, both models need to have the same number of observations. Your models will inevitably have different \(N\)s because not all respondents answered all questions, so some will have been dropped from your model in Question 3. This means you will need to re-estimate your model from Question 1 to include only the observations in your Question 3 model. I have some helper code below to get you started; if you have issues, ask me for help. You will need to change eval to TRUE when you are ready to run your code.

model1a <- lm(V201351 ~ trump_vote, data = ANES2020[which(!is.na(ANES2020$V202004) &
                                                       !is.na(ANES2020$V201377) &
                                                       !is.na(ANES2020$V201234)),])
anova(m1, model1a)
## Analysis of Variance Table
## 
## Model 1: V201351 ~ trump_vote
## Model 2: V201351 ~ trump_vote
##   Res.Df  RSS Df Sum of Sq F Pr(>F)
## 1   7486 8630                      
## 2   7486 8630  0         0
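The table above compares two fits of the same specification (V201351 ~ trump_vote), which is why it reports 0 degrees of freedom and no F statistic. The comparison the prompt describes pairs the restricted model with the full model from Question 3, both fit to the same rows. A minimal sketch on simulated data follows; sim, x1, x2, x3, and y are hypothetical stand-ins, not ANES variables. The last lines apply the equivalent R-squared form of the statistic to the values printed in Questions 1 and 3. That figure is only approximate because the printed values are rounded, and it relies on both models having been fit to the same 7488 observations.

```r
set.seed(1)
n <- 500
sim <- data.frame(x1 = rbinom(n, 1, 0.4), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 - 0.3 * sim$x1 + 0.5 * sim$x2 + rnorm(n)
sim$x2[sample(n, 40)] <- NA   # induce missingness in one covariate

# Keep only rows that are complete on every variable in the full model.
cc <- sim[complete.cases(sim[, c("y", "x1", "x2", "x3")]), ]

restricted <- lm(y ~ x1, data = cc)
full       <- lm(y ~ x1 + x2 + x3, data = cc)

# F-test of H0: the coefficients on x2 and x3 are both zero.
ftest <- anova(restricted, full)
ftest

# The same statistic written in terms of R-squared.
f_from_r2 <- function(r2_restricted, r2_full, n_added, df_full) {
  ((r2_full - r2_restricted) / n_added) / ((1 - r2_full) / df_full)
}
F_sim <- f_from_r2(summary(restricted)$r.squared, summary(full)$r.squared,
                   n_added = 2, df_full = df.residual(full))

# Applied to the rounded values printed in Questions 1 and 3 (approximate).
F_reported <- f_from_r2(0.07608, 0.1465, n_added = 3, df_full = 7483)
F_reported
```

With the real data, the analogous call would pass the restricted and full lm objects to anova() once both are fit to the same cleaned rows.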

Question 5

Now imagine that the person who originally posited the hypothesis wants to revise it: people who preferred Donald Trump to win the November 2020 election become increasingly likely, relative to other respondents, to believe that the votes cast in the election would not be counted accurately as their level of conservatism increases. This hypothesis posits an interaction between preference for Donald Trump and conservatism.

m3 <- lm(V201351 ~ trump_vote * ideo, data = ANES2020)
summary(m3)
## 
## Call:
## lm(formula = V201351 ~ trump_vote * ideo, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4693 -0.4503  0.3113  0.6260  2.3982 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3.33587    0.04547  73.364  < 2e-16 ***
## trump_vote      -0.12592    0.11009  -1.144    0.253    
## ideo             0.01907    0.01374   1.387    0.165    
## trump_vote:ideo -0.10595    0.02253  -4.702 2.63e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 6556 degrees of freedom
##   (1720 observations deleted due to missingness)
## Multiple R-squared:  0.09079,    Adjusted R-squared:  0.09037 
## F-statistic: 218.2 on 3 and 6556 DF,  p-value: < 2.2e-16
# Predicted V201351 at each point of the 7-point ideo scale, by candidate preference
model_preds  <- predict(m3, newdata = data.frame(trump_vote = 0, ideo = 1:7), se.fit = FALSE)
model_preds2 <- predict(m3, newdata = data.frame(trump_vote = 1, ideo = 1:7), se.fit = FALSE)
model_preds
##        1        2        3        4        5        6        7 
## 3.354942 3.374010 3.393078 3.412146 3.431214 3.450282 3.469350
model_preds2
##        1        2        3        4        5        6        7 
## 3.123077 3.036196 2.949315 2.862434 2.775554 2.688673 2.601792
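In an interaction model the coefficient on trump_vote alone is the gap between the two groups when ideo equals 0, which lies outside the 1 to 7 range used here. The gap at any observed ideology value is the main effect plus the interaction coefficient times ideo. The sketch below rebuilds that gap from the printed coefficients and checks it against the two sets of predictions above. It assumes that higher ideo values mean more conservative, which is not stated in the output.

```r
# Coefficients copied from the summary(m3) output above.
b_trump <- -0.12592   # trump_vote main effect (gap when ideo = 0)
b_inter <- -0.10595   # trump_vote:ideo interaction

ideo <- 1:7
gap_from_coefs <- b_trump + b_inter * ideo   # Trump-minus-other gap at each ideo

# Predictions copied from the output above.
preds_other <- c(3.354942, 3.374010, 3.393078, 3.412146, 3.431214, 3.450282, 3.469350)
preds_trump <- c(3.123077, 3.036196, 2.949315, 2.862434, 2.775554, 2.688673, 2.601792)
gap_from_preds <- preds_trump - preds_other

# The two versions agree up to rounding of the printed coefficients.
round(cbind(ideo, gap_from_coefs, gap_from_preds), 3)
```

The gap widens by the interaction coefficient with each one-point increase in ideo, which is the pattern the revised hypothesis predicts.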