Question 1
Imagine someone has a hypothesis that the more strongly people believe
that people in America have a lot of opportunity to “get ahead,” the
less likely they are to believe that the government should try to reduce
income inequality.
V202317 is a post-election variable asking
respondents how much opportunity there is in America for the average
person to get ahead. V202426 is a post-election
variable asking respondents the degree to which they agree or disagree
that the government should take measures to reduce differences in income
levels.
- Run a bivariate regression to test this hypothesis.
- Report the coefficient estimate for the explanatory variable. What
does this mean in words?
The coefficient estimate is -0.56795. This means that a one-unit
increase in the independent variable (V202317) is associated with an
average decrease of about 0.57 units in the dependent variable
(V202426). The relationship between the two variables is therefore
negative. Provided the two response scales run in the same direction,
this is consistent with the hypothesis: the more opportunity someone
believes America offers to get ahead, the less they agree that the
government should take measures to reduce income differences.
- Report the 95% confidence interval for the explanatory variable’s
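coefficient. What does this mean in words?
The lower bound of the 95% confidence interval is -0.5962631 and the
upper bound is -0.5396456. This means that if we repeatedly drew samples
and built intervals this way, about 95% of those intervals would contain
the true coefficient. Because the interval does not include 0, the
coefficient is statistically significant at the 5% level, and the narrow
width of the interval reflects a precise estimate.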
- Report the \(t\)-statistic for the
explanatory variable’s coefficient. Based on this test statistic, what
conclusion can we draw regarding this hypothesis?
The t-statistic is -39.33, which means the coefficient estimate lies
about 39 standard errors below 0. This is far beyond the critical value
of roughly 1.96 in absolute terms (p < 2e-16), so we reject the null
hypothesis that the coefficient is 0 and conclude that the negative
relationship is statistically significant.
- Report the \(R^2\) statistic. What
does this mean in words?
The R-squared value is 0.1741, or about 17%. This means that about 17%
of the variance in the dependent variable is explained by the
independent variable in this model.
- What is the fitted value and 95% confidence interval for the fitted
value if the explanatory variable takes on a value of 2?
If X takes the value of 2, the fitted value is 3.296107, and its 95%
confidence interval has a lower bound of 3.260563 and an upper bound of
3.331651.
m1 <- lm(V202426 ~ V202317, data = ANES2020)
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 4.3491164 4.5149157
## V202317 -0.5962631 -0.5396456
summary(m1)
##
## Call:
## lm(formula = V202426 ~ V202317, data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8641 -0.8641 -0.1602 1.1359 3.4078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.43202 0.04229 104.80 <2e-16 ***
## V202317 -0.56795 0.01444 -39.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.247 on 7340 degrees of freedom
## (938 observations deleted due to missingness)
## Multiple R-squared: 0.1741, Adjusted R-squared: 0.1739
## F-statistic: 1547 on 1 and 7340 DF, p-value: < 2.2e-16
model_preds <- predict(m1,newdata = data.frame("V202317" = 2), se.fit = TRUE)
model_preds$fit
## 1
## 3.296107
model_preds$fit + 1.96 * model_preds$se.fit
## 1
## 3.331651
model_preds$fit - 1.96 * model_preds$se.fit
## 1
## 3.260563
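The t-statistic and confidence interval reported for Question 1 follow directly from the coefficient estimate and its standard error. As a minimal sketch, the snippet below recomputes them from the rounded values printed by summary(m1). The object names (b1, se, df) are introduced here for illustration only, and small differences from the confint() output are due to rounding.

```r
# Values copied from the summary(m1) output for V202317 (rounded as printed)
b1 <- -0.56795   # coefficient estimate
se <- 0.01444    # standard error
df <- 7340       # residual degrees of freedom

# t-statistic: how many standard errors the estimate lies from 0
t_stat <- b1 / se

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df)
ci <- c(lower = b1 - t_crit * se, upper = b1 + t_crit * se)

# Two-sided p-value for the null hypothesis that the coefficient is 0
p_value <- 2 * pt(-abs(t_stat), df)

print(round(t_stat, 2))
print(round(ci, 4))
print(p_value < 0.05)
```

With 7340 degrees of freedom the critical t value is very close to 1.96, which is why the normal approximation used in the prediction code gives nearly the same bounds.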
Question 2
Imagine that someone has a hypothesis that the more often people post
political content on social media, the more important they think “being
American” is to their identity.
V202504 is a post-election variable asking
respondents how important “being American” is to their identity.
V202545 is a post-election variable asking
respondents how often they post political content on Twitter.
- Run a bivariate regression to test this hypothesis.
- Report the coefficient estimate for the explanatory variable. What
does this mean in words?
The coefficient estimate is 0.25874. As fitted, the model regresses
posting frequency (V202545) on the importance of being American
(V202504), so a one-unit increase in V202504 is associated with an
average increase of about 0.26 units in V202545. The relationship
between the two variables is positive. The hypothesis treats posting
frequency as the explanatory variable, so testing it as stated would
place V202504 on the left-hand side of the formula. Whether the positive
sign supports the hypothesis also depends on the direction in which each
item is coded.
- Report the 95% confidence interval for the explanatory variable’s
coefficient. What does this mean in words?
The lower bound of the 95% confidence interval is 0.210400834 and the
upper bound is 0.3070834. This means that if we repeatedly drew samples
and built intervals this way, about 95% of those intervals would contain
the true coefficient. Because the interval does not include 0, the
coefficient is statistically significant at the 5% level.
- Report the \(t\)-statistic for the
explanatory variable’s coefficient. Based on this test statistic, what
conclusion can we draw regarding this hypothesis?
The t-statistic is 10.492, which means the coefficient estimate lies
about 10.5 standard errors above 0. This is well beyond the critical
value of roughly 1.96 (p < 2e-16), so we reject the null hypothesis that
the coefficient is 0 and conclude that the positive relationship is
statistically significant.
- Report the \(R^2\) statistic. What
does this mean in words?
The R-squared value is 0.01481, or about 1.5%. This means that about
1.5% of the variance in the dependent variable is explained by the
independent variable in this model. While the relationship is
statistically significant, it is substantively weak: the independent
variable accounts for very little of the variation in the dependent
variable.
- What is the fitted value and 95% confidence interval for the fitted
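value if the explanatory variable takes on a value of 3?
The code below evaluates the model at X = 2, where the fitted value is
0.650505 with a 95% confidence interval from 0.5893228 to 0.7116872. At
X = 3 the fitted value implied by the coefficients is
0.13302 + 0.25874 * 3, or about 0.909; the matching interval requires
re-running predict() with V202504 = 3.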
m1 <- lm(V202545 ~ V202504, data = ANES2020)
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 0.008376291 0.2576652
## V202504 0.210400834 0.3070834
summary(m1)
##
## Call:
## lm(formula = V202545 ~ V202504, data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.427 -1.651 -1.392 3.349 4.608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.13302 0.06358 2.092 0.0365 *
## V202504 0.25874 0.02466 10.492 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.614 on 7325 degrees of freedom
## (953 observations deleted due to missingness)
## Multiple R-squared: 0.01481, Adjusted R-squared: 0.01467
## F-statistic: 110.1 on 1 and 7325 DF, p-value: < 2.2e-16
model_preds <- predict(m1,newdata = data.frame("V202504" = 2), se.fit = TRUE)
model_preds$fit
## 1
## 0.650505
model_preds$fit + 1.96 * model_preds$se.fit
## 1
## 0.7116872
model_preds$fit - 1.96 * model_preds$se.fit
## 1
## 0.5893228
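predict() can also return the confidence interval for a fitted value directly through its interval argument, which uses the exact t critical value rather than 1.96. The sketch below shows this on simulated data, since the ANES file is not reproduced here. The data frame sim and its columns x and y are hypothetical stand-ins, and the coefficients used to generate them are arbitrary.

```r
set.seed(1)

# Hypothetical stand-in data: a 1-5 predictor and a noisy linear outcome
sim <- data.frame(x = sample(1:5, 500, replace = TRUE))
sim$y <- 0.1 + 0.25 * sim$x + rnorm(500, sd = 2.5)

fit <- lm(y ~ x, data = sim)

# Fitted value and 95% confidence interval at x = 3 in a single call
new_obs <- data.frame(x = 3)
ci_direct <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

# The same interval built by hand from se.fit and the t critical value
p <- predict(fit, newdata = new_obs, se.fit = TRUE)
fit_val <- unname(p$fit)
se_val <- unname(p$se.fit)
t_crit <- qt(0.975, df = p$df)
ci_manual <- c(fit = fit_val,
               lwr = fit_val - t_crit * se_val,
               upr = fit_val + t_crit * se_val)

print(ci_direct)
print(ci_manual)
```

Both approaches give the same bounds. Passing the value asked for in the question to newdata avoids reporting an interval for a different point on the scale.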
Question 3
Imagine someone has a hypothesis that wealthier people are more
interested in politics.
V202406 is a post-election variable asking
respondents how interested they are in politics.
V202468x is a post-election variable asking
respondents about their family incomes.
- Run a bivariate regression to test this hypothesis.
- Report the coefficient estimate for the explanatory variable. What
does this mean in words?
The coefficient estimate is -0.016947. This means that a one-unit
increase in family income (V202468x) is associated with an average
decrease of about 0.017 units in V202406. The relationship between the
independent and dependent variables is negative. If lower values of
V202406 indicate greater interest in politics, this is consistent with
the hypothesis: as family income rises, respondents tend to report more
interest in politics.
- Report the 95% confidence interval for the explanatory variable’s
coefficient. What does this mean in words?
The lower bound of the 95% confidence interval is -0.01985265 and the
upper bound is -0.01404132. This means that if we repeatedly drew
samples and built intervals this way, about 95% of those intervals would
contain the true coefficient. Because the interval does not include 0,
the coefficient is statistically significant at the 5% level.
- Report the \(t\)-statistic for the
explanatory variable’s coefficient. Based on this test statistic, what
conclusion can we draw regarding this hypothesis?
The t-statistic is -11.43, which means the coefficient estimate lies
about 11 standard errors below 0. This is well beyond the critical value
of roughly 1.96 in absolute terms (p < 2e-16), so we reject the null
hypothesis that the coefficient is 0 and conclude that the negative
relationship is statistically significant.
- Report the \(R^2\) statistic. What
does this mean in words?
The R-squared value is 0.01787, or about 1.8%. This means that about
1.8% of the variance in the dependent variable is explained by the
independent variable in this model. While the relationship is
statistically significant, it is substantively weak: family income
accounts for very little of the variation in interest in politics.
- What is the fitted value and 95% confidence interval for the fitted
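value if the explanatory variable takes on a value of 4?
The code below evaluates the model at X = 2, where the fitted value is
2.259364 with a 95% confidence interval from 2.22491 to 2.293818. At
X = 4 the fitted value implied by the coefficients is
2.293258 - 0.016947 * 4, or about 2.225; the matching interval requires
re-running predict() with V202468x = 4.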
- Calculate the predicted values of interest in politics for all
values of income (1-22) and create a marginal effects plot showing how
changes in income are related to interest for the full range of X.
m1 <- lm(V202406 ~ V202468x, data = ANES2020)
confint(m1)
## 2.5 % 97.5 %
## (Intercept) 2.25387318 2.33264329
## V202468x -0.01985265 -0.01404132
summary(m1)
##
## Call:
## lm(formula = V202406 ~ V202468x, data = ANES2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.27631 -0.27631 -0.07295 0.72369 2.07958
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.293258 0.020091 114.14 <2e-16 ***
## V202468x -0.016947 0.001482 -11.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8446 on 7185 degrees of freedom
## (1093 observations deleted due to missingness)
## Multiple R-squared: 0.01787, Adjusted R-squared: 0.01773
## F-statistic: 130.7 on 1 and 7185 DF, p-value: < 2.2e-16
model_preds <- predict(m1,newdata = data.frame("V202468x" = 2), se.fit = TRUE)
model_preds$fit
## 1
## 2.259364
model_preds$fit + 1.96 * model_preds$se.fit
## 1
## 2.293818
model_preds$fit - 1.96 * model_preds$se.fit
## 1
## 2.22491
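# income categories run from 1 to 22, as stated in the question
income_values <- seq(1, 22, 1)
model_preds_full <- predict(m1, newdata = data.frame("V202468x" = income_values),
  se.fit = TRUE)
# bounds of the 95% confidence band around the fitted line
upper <- model_preds_full$fit + model_preds_full$se.fit * 1.96
lower <- model_preds_full$fit - model_preds_full$se.fit * 1.96
plot(x = income_values, y = model_preds_full$fit, type = "l", ylim = range(lower, upper),
  xlab = "Family income (V202468x)", ylab = "Predicted interest in politics (V202406)")
lines(x = income_values, y = upper, lty = 4)
lines(x = income_values, y = lower, lty = 4)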
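In Question 3, a small R-squared alongside a large t-statistic indicates that the effect is precisely estimated but modest in size. One way to see this is to compare the predicted values at the two ends of the income scale. The sketch below does that arithmetic from the rounded coefficients printed by summary(m1); the helper predict_interest is a hypothetical name introduced here.

```r
# Coefficients copied from the summary(m1) output for Question 3
intercept <- 2.293258
slope <- -0.016947

# Hypothetical helper: point prediction from the fitted line
predict_interest <- function(income) intercept + slope * income

low  <- predict_interest(1)    # lowest income category
high <- predict_interest(22)   # highest income category

# Total predicted change across the full 1-22 income range
change <- high - low

print(c(low = low, high = high, change = change))
```

Moving from the bottom to the top of the income scale shifts the prediction by about 0.36 points, which is small relative to the residual standard error of 0.8446.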
