Imagine someone has a hypothesis that people who preferred Donald Trump to win in the November 2020 election were more likely to believe that the votes cast in the election would not be counted accurately.
V201351 is a pre-election variable asking
respondents how accurately the votes cast in the November 2020 election
would be counted. V201029 is a pre-election
variable asking respondents which presidential candidate they preferred
to win the November 2020 election; I have created a new variable
trump_vote that takes this variable and codes it
as “1” if the respondent preferred Trump and “0” if they preferred
someone else.
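A recode like the one described above might be sketched as follows. This is a minimal illustration on a toy data frame, not the code used to build trump_vote: the category codes (2 = Trump, other positive values = another candidate, negative values = nonresponse) are assumptions and should be checked against the ANES codebook.

```r
# Toy stand-in for the ANES column; the codes are ASSUMED for illustration
# (2 = Trump, other positive values = another candidate, negatives = missing).
toy <- data.frame(V201029 = c(1, 2, 2, 3, -9, -1, 1, 2))

# Negative codes become NA so they are not counted as "someone else".
pref <- ifelse(toy$V201029 < 0, NA, toy$V201029)

# 1 if the respondent preferred Trump, 0 if they preferred anyone else.
toy$trump_vote <- ifelse(pref == 2, 1, 0)

# Check the recode, including how many rows ended up missing.
table(toy$trump_vote, useNA = "ifany")
```

Setting nonresponse to NA before recoding keeps those respondents out of the "0" category, where they would otherwise be counted as preferring someone else.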
m1 <- lm(V201351 ~ trump_vote, data = ANES2020)
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 3.3370801 3.4006997
## trump_vote -0.6745187 -0.5758013
summary(m1)
##
## Call:
## lm(formula = V201351 ~ trump_vote, data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3689 -0.3689 0.2563 0.6311 2.2563
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.36889 0.01623 207.61 <2e-16 ***
## trump_vote -0.62516 0.02518 -24.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.074 on 7486 degrees of freedom
## (792 observations deleted due to missingness)
## Multiple R-squared: 0.07608, Adjusted R-squared: 0.07596
## F-statistic: 616.4 on 1 and 7486 DF, p-value: < 2.2e-16
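With a single 0/1 predictor, the intercept is the mean outcome in the "0" group and the slope is the difference between the two group means. Read this way, the output above puts the average response at about 3.37 for respondents who preferred another candidate and about 2.74 (3.36889 - 0.62516) for those who preferred Trump. The sketch below checks that identity on simulated data; the variable names and numbers in it are invented for illustration.

```r
set.seed(1)
n <- 500

# Simulated data: a 0/1 predictor and a noisy outcome (values are made up).
sim <- data.frame(trump_vote = rbinom(n, 1, 0.4))
sim$accuracy <- 3.4 - 0.6 * sim$trump_vote + rnorm(n)

fit <- lm(accuracy ~ trump_vote, data = sim)

# Mean outcome within each group of the binary predictor.
group_means <- tapply(sim$accuracy, sim$trump_vote, mean)

coef(fit)                               # intercept and slope from OLS
group_means                             # mean for group 0 and group 1
group_means["1"] - group_means["0"]     # equals the slope
```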
Is the analysis conducted in Question 1 causal? Why not? (It’s not causal!) The analysis conducted in Question 1 is correlational, not causal. The relationship between the two variables does not demonstrate that preferring Trump causes a respondent to have less faith in the integrity of the vote count. It shows only that preference for Trump is correlated with beliefs about how accurately votes would be counted, and other variables may account for some or all of that association.
What are some possible other variables that need to be taken into account when testing this hypothesis? Consider both variables we can measure and those it would be difficult to measure. Some possible other variables to consider are media consumption, likelihood of believing conspiracy theories, and overall trust in the government.
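Why an omitted variable matters can be illustrated with a small simulation. Everything here is hypothetical: a made-up "trust" variable drives both candidate preference and the outcome, and the true effect of trump_vote is set to zero on purpose.

```r
set.seed(2)
n <- 5000

# Hypothetical confounder, e.g. general trust in government (standardised).
trust <- rnorm(n)

# In this made-up world, lower trust makes preferring Trump more likely.
trump_vote <- rbinom(n, 1, plogis(-0.5 - 1 * trust))

# The outcome depends on trust only; trump_vote has NO effect by construction.
accuracy <- 3 + 0.5 * trust + rnorm(n)

short <- lm(accuracy ~ trump_vote)           # confounder left out
long  <- lm(accuracy ~ trump_vote + trust)   # confounder included

coef(short)["trump_vote"]   # pulled away from zero by the omitted variable
coef(long)["trump_vote"]    # close to the true value of zero
```

The bivariate model reports a sizeable negative coefficient even though no causal effect exists in the simulated data. Controlling for the confounder removes it, but only because the confounder is observed here.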
Select 3 other variables from the pre-election wave of the ANES that you think would need to be taken into account when assessing this hypothesis. Clean the variables you have chosen to account for any missing values and do the following:
Report the coefficient estimate on trump_vote. Explain what this coefficient estimate means in words. How does this compare to the coefficient estimate you obtained in Question 1?
Report the \(t\)-statistic on trump_vote. How does this compare to the \(t\)-statistic you obtained in Question 1?
Is the coefficient estimate on trump_vote now causal? Why or why not? I think the coefficient is still correlational: adding these variables does not rule out other unmeasured confounders, so it does not demonstrate that preferring Trump leads you to have less trust in voting procedure.
model_3d <- lm(V201351 ~ trump_vote + V202004 + V201377 + V201234, data = ANES2020)
summary(model_3d)
##
## Call:
## lm(formula = V201351 ~ trump_vote + V202004 + V201377 + V201234,
## data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9442 -0.5872 0.1895 0.6368 4.1489
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.502626 0.039372 63.564 < 2e-16 ***
## trump_vote -0.230215 0.029077 -7.917 2.78e-15 ***
## V202004 0.007032 0.005249 1.340 0.180
## V201377 0.279933 0.011339 24.688 < 2e-16 ***
## V201234 0.013904 0.011870 1.171 0.241
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.032 on 7483 degrees of freedom
## (792 observations deleted due to missingness)
## Multiple R-squared: 0.1465, Adjusted R-squared: 0.146
## F-statistic: 321 on 4 and 7483 DF, p-value: < 2.2e-16
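Both summaries report the same 792 observations deleted due to missingness, which is what would be expected if the three added variables contained no NA values. ANES files typically store nonresponse as negative codes, so one possible explanation is that those codes were left in place. A minimal cleaning step might look like the sketch below. The data are invented and neg_to_na is a hypothetical helper, not part of the original analysis.

```r
# Toy data standing in for two ANES columns; the values are invented.
toy <- data.frame(
  V201351 = c(4, 2, -9, 5, 3, 1),
  V201377 = c(3, -8, 2, 4, -9, 1)
)

# Hypothetical helper: treat any negative code as missing.
neg_to_na <- function(x) replace(x, x < 0, NA)

# Apply the helper to every variable that enters the model.
vars <- c("V201351", "V201377")
toy[vars] <- lapply(toy[vars], neg_to_na)

colSums(is.na(toy))        # missing values per variable after cleaning
sum(complete.cases(toy))   # rows lm() would actually keep
```

After a step like this, the number of observations dropped by lm() would normally grow when more variables are added to the model.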
Now we want to formally assess whether the “full” model you estimated in Question 3 better explains the variance in respondents’ belief that the votes in the November 2020 election would be counted accurately. We will do this with an \(F\)-test.
NOTE: To conduct an \(F\)-test, both
models need to have the same number of observations. Your models will
inevitably have different \(N\)s
because not all respondents answered all questions, so some will have
been dropped from your model in Question 3. This means you will need to
re-estimate your model from Question 1 to include only observations in
your Question 3 model. I have some helper code below to get you started;
if you have issues ask me for help. You will need to change
eval to TRUE when you are
ready to run your code.
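The same-sample requirement can be sketched on simulated data. The variable names (x, z1, z2, y) are placeholders, not ANES variables. The idea is to build one row filter from every variable in the full model, fit both models on those rows, and pass the restricted model first.

```r
set.seed(3)
n <- 400

# Simulated data with one partly missing control (all values are made up).
sim <- data.frame(x = rbinom(n, 1, 0.5), z1 = rnorm(n), z2 = rnorm(n))
sim$y <- 3 - 0.5 * sim$x + 0.3 * sim$z1 + rnorm(n)
sim$z1[sample(n, 40)] <- NA

# Keep only rows that are complete on every variable in the full model.
keep <- complete.cases(sim[, c("y", "x", "z1", "z2")])

restricted <- lm(y ~ x, data = sim[keep, ])
full       <- lm(y ~ x + z1 + z2, data = sim[keep, ])

# Restricted model first, full model second.
ftest <- anova(restricted, full)
ftest
```

Using complete.cases() on the full model's variables avoids writing one !is.na() condition per variable, and it guarantees that both fits use identical rows.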
model1a <- lm(V201351 ~ trump_vote, data = ANES2020[which(!is.na(ANES2020$V202004) &
!is.na(ANES2020$V201377) &
!is.na(ANES2020$V201234)),])
anova(m1, model1a)
## Analysis of Variance Table
##
## Model 1: V201351 ~ trump_vote
## Model 2: V201351 ~ trump_vote
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 7486 8630
## 2 7486 8630 0 0
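The table above lists the same formula for both models, so the comparison has 0 degrees of freedom and no \(F\)-statistic is reported. The comparison the question asks for is between the Question 1 model and the Question 3 model, for example anova(model1a, model_3d). As a rough check, the \(F\)-statistic can be approximated from the R-squared values printed in the two summaries, assuming both models were fit to the same 7,488 observations, as their residual degrees of freedom suggest.

```r
# Values copied from the two summary() outputs above (rounded as printed).
r2_restricted <- 0.07608   # V201351 ~ trump_vote
r2_full       <- 0.1465    # adds V202004, V201377, V201234
q             <- 3         # number of added coefficients
df_resid_full <- 7483      # residual degrees of freedom of the full model

# F = [(R2_full - R2_restricted) / q] / [(1 - R2_full) / df_resid_full]
f_stat <- ((r2_full - r2_restricted) / q) / ((1 - r2_full) / df_resid_full)

# Upper-tail probability under the F distribution with (q, df_resid_full) df.
p_value <- pf(f_stat, q, df_resid_full, lower.tail = FALSE)

f_stat
p_value
```

With these rounded inputs the statistic comes out a little above 200, which would indicate that the three added variables jointly improve the fit. The exact value from anova() may differ slightly because of rounding.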
Now imagine that the person who originally posited the hypothesis wants to revise it: they now think that people who preferred Donald Trump to win the November 2020 election become more likely to believe that the votes cast in the election would not be counted accurately as their level of conservatism increases. This hypothesis implies an interaction between preference for Donald Trump and conservatism.
m3 <- lm(V201351 ~ trump_vote * ideo, data = ANES2020)
summary(m3)
##
## Call:
## lm(formula = V201351 ~ trump_vote * ideo, data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4693 -0.4503 0.3113 0.6260 2.3982
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.33587 0.04547 73.364 < 2e-16 ***
## trump_vote -0.12592 0.11009 -1.144 0.253
## ideo 0.01907 0.01374 1.387 0.165
## trump_vote:ideo -0.10595 0.02253 -4.702 2.63e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.055 on 6556 degrees of freedom
## (1720 observations deleted due to missingness)
## Multiple R-squared: 0.09079, Adjusted R-squared: 0.09037
## F-statistic: 218.2 on 3 and 6556 DF, p-value: < 2.2e-16
model_preds <- predict(m3, newdata = data.frame(trump_vote = 0, ideo = seq(1, 7, 1)), se.fit = FALSE)
model_preds2 <- predict(m3, newdata = data.frame(trump_vote = 1, ideo = seq(1, 7, 1)), se.fit = FALSE)
model_preds
## 1 2 3 4 5 6 7
## 3.354942 3.374010 3.393078 3.412146 3.431214 3.450282 3.469350
model_preds2
## 1 2 3 4 5 6 7
## 3.123077 3.036196 2.949315 2.862434 2.775554 2.688673 2.601792
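In an interaction model, the difference between the two sets of predictions at a given value of ideo is the trump_vote coefficient plus the interaction coefficient times ideo. The sketch below reproduces that gap from the coefficients reported by summary(m3). It assumes ideo runs from 1 to 7, as in the predict() calls above; the direction of the scale is not stated here and should be checked against the codebook.

```r
# Coefficients copied from summary(m3) above.
b_trump       <- -0.12592
b_interaction <- -0.10595
ideo          <- 1:7

# Predicted gap (trump_vote = 1 minus trump_vote = 0) at each ideo value.
gap <- b_trump + b_interaction * ideo
round(gap, 3)
```

The gap matches model_preds2 - model_preds: at ideo = 1 it is about -0.23 (3.123077 - 3.354942), and it widens to about -0.87 at ideo = 7. The non-significant trump_vote coefficient on its own describes the gap at ideo = 0, a value outside the range used for the predictions.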