Question 1

Imagine someone has a hypothesis that the stronger people’s belief that people in America have a lot of opportunity to “get ahead,” the less likely they are to believe that the government should try to reduce income inequality.

V202317 is a post-election variable asking respondents how much opportunity there is in America for the average person to get ahead. V202426 is a post-election variable asking respondents the degree to which they agree or disagree that the government should take measures to reduce differences in income levels.

Run a bivariate regression to test this hypothesis.
Report the coefficient estimate for the explanatory variable. What does this mean in words?
The coefficient estimate is -0.56795. This means that a one-unit increase in V202317 is associated with a decrease of about 0.568 in the predicted value of V202426. The negative sign indicates an inverse relationship between the explanatory and dependent variables: the more someone feels America provides opportunity to get ahead, the less that person agrees with the government taking measures to reduce income differences.
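As a quick check of the slope interpretation, the sketch below plugs the intercept and slope printed by summary(m1) into the fitted line. It assumes, for illustration only, that V202317 takes the integer values 1 through 5; the variable names are hypothetical.

```r
# Reported estimates from summary(m1) for V202426 ~ V202317
intercept <- 4.43202
slope <- -0.56795

# Assumption (illustrative): V202317 takes integer values 1-5
x_vals <- 1:5
fitted_vals <- intercept + slope * x_vals
print(round(fitted_vals, 3))

# Every one-unit step in V202317 changes the prediction by the same amount: the slope
print(diff(fitted_vals))
```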
Report the 95% confidence interval for the explanatory variable’s coefficient. What does this mean in words? The lower bound of the 95% confidence interval is -0.5962631 and the upper bound is -0.5396456. This means that if we drew many samples and built an interval this way each time, about 95% of those intervals would contain the true slope. Because the entire interval lies below 0, the negative relationship is statistically significant at the 5% level, and the narrow width shows that the slope is estimated precisely.
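The reported interval can be reproduced from the slope estimate and standard error in summary(m1): for an lm fit, confint() computes estimate ± t critical value × SE using the model's residual degrees of freedom. Because the inputs below are the rounded values printed by summary(), the result should match confint(m1) only to about four decimal places.

```r
# Rounded values printed by summary(m1): slope, its standard error, residual df
est <- -0.56795
se <- 0.01444
df_resid <- 7340

# 95% CI = estimate +/- t critical value * SE (the formula confint() uses for lm fits)
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))

# The whole interval is below 0, so the slope is significant at the 5% level
print(ci[["upper"]] < 0)
```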
Report the \(t\)-statistic for the explanatory variable’s coefficient. Based on this test statistic, what conclusion can we draw regarding this hypothesis? The t-statistic is -39.33, meaning the estimated slope lies 39.33 standard errors below 0 (p < 2e-16). We can therefore reject the null hypothesis of no relationship, and the negative sign is consistent with the hypothesis.
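A sketch of where the t-statistic and its p-value come from, using the rounded estimate and standard error printed by summary(m1). The t-statistic is the estimate divided by its standard error, and the two-sided p-value is the probability of a t value at least that extreme under the null hypothesis that the slope is 0.

```r
# Rounded values printed by summary(m1)
est <- -0.56795
se <- 0.01444
df_resid <- 7340

# t = estimate / SE: how many standard errors the estimate lies from 0
t_stat <- est / se
print(round(t_stat, 2))

# Two-sided p-value under H0: slope = 0 (far below any conventional threshold)
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
print(p_value < 0.05)
```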
Report the \(R^2\) statistic. What does this mean in words? The R-squared value is 0.1741, or about 17%. This means that about 17% of the variance in the dependent variable is explained by its linear relationship with the explanatory variable.
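To make the R² interpretation concrete, this sketch fits a bivariate regression to simulated data (hypothetical numbers loosely resembling the Question 1 model, not ANES responses) and computes R² by hand as one minus the ratio of the residual sum of squares to the total sum of squares. In a bivariate regression it also equals the squared correlation between x and y.

```r
set.seed(1)
# Hypothetical simulated data (not ANES): y depends linearly on x plus noise
n <- 500
x <- sample(1:5, n, replace = TRUE)
y <- 4.4 - 0.57 * x + rnorm(n, sd = 1.25)
fit <- lm(y ~ x)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ssr <- sum(residuals(fit)^2)
sst <- sum((y - mean(y))^2)
r2_manual <- 1 - ssr / sst

# Compare with summary()'s "Multiple R-squared" and the squared correlation
print(c(manual = r2_manual, summary = summary(fit)$r.squared, cor_sq = cor(x, y)^2))
```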
What are the fitted value and 95% confidence interval for the fitted value if the explanatory variable takes on a value of 2? If the explanatory variable takes the value 2, the fitted value is 3.296, and the 95% confidence interval for the fitted value runs from a lower bound of 3.26 to an upper bound of 3.33.
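The fitted-value intervals in this report are computed as fit ± 1.96 × se.fit, a normal approximation. predict() for lm objects can also return t-based bounds directly via interval = "confidence". The sketch below, on simulated data rather than ANES2020, shows that these bounds equal fit ± qt(0.975, df) × se.fit; with thousands of residual degrees of freedom, using 1.96 instead makes a negligible difference.

```r
set.seed(2)
# Hypothetical simulated data standing in for ANES2020 (not the real survey)
sim <- data.frame(x = sample(1:5, 300, replace = TRUE))
sim$y <- 4.4 - 0.57 * sim$x + rnorm(300, sd = 1.25)
fit <- lm(y ~ x, data = sim)
new_x <- data.frame(x = 2)

# Built-in: fitted value with t-based 95% confidence bounds (columns fit, lwr, upr)
built_in <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
print(built_in)

# Manual: fit +/- t critical value * standard error of the fitted value
p <- predict(fit, newdata = new_x, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)
manual <- c(p$fit, p$fit - t_crit * p$se.fit, p$fit + t_crit * p$se.fit)
print(manual)
```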

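# Bivariate OLS: V202426 (support for reducing income differences) on V202317 (perceived opportunity)
m1 <- lm(V202426 ~ V202317, data = ANES2020)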
confint(m1)

##                  2.5 %     97.5 %
## (Intercept)  4.3491164  4.5149157
## V202317     -0.5962631 -0.5396456

summary(m1)

## 
## Call:
## lm(formula = V202426 ~ V202317, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8641 -0.8641 -0.1602  1.1359  3.4078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.43202    0.04229  104.80   <2e-16 ***
## V202317     -0.56795    0.01444  -39.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.247 on 7340 degrees of freedom
##   (938 observations deleted due to missingness)
## Multiple R-squared:  0.1741, Adjusted R-squared:  0.1739 
## F-statistic:  1547 on 1 and 7340 DF,  p-value: < 2.2e-16

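# Fitted value at V202317 = 2; the bounds below are fit +/- 1.96 * standard error (normal approximation)
model_preds <- predict(m1, newdata = data.frame("V202317" = 2), se.fit = TRUE)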
model_preds$fit

##        1 
## 3.296107

model_preds$fit + 1.96 * model_preds$se.fit

##        1 
## 3.331651

model_preds$fit - 1.96 * model_preds$se.fit

##        1 
## 3.260563

Question 2

Imagine that someone has a hypothesis that the more often people post political content on social media, the more important they think “being American” is to their identity.

V202504 is a post-election variable asking respondents how important “being American” is to their identity. V202545 is a post-election variable asking respondents how often they post political content on Twitter.

Run a bivariate regression to test this hypothesis.
Report the coefficient estimate for the explanatory variable. What does this mean in words?
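The coefficient estimate is 0.25874. Note that the model below regresses V202545 (posting frequency) on V202504 (importance of being American), so V202504 serves as the explanatory variable, the reverse of the ordering in the hypothesis. Reversing the two roles would change the slope, though not the t-statistic or R². A one-unit increase in V202504 is associated with an increase of about 0.259 in the predicted value of V202545. This demonstrates a positive relationship between the two variables: the more someone feels being American is important to their identity, the more likely they are to post political content on Twitter.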
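The hypothesis treats posting frequency as the explanatory variable, while the model below regresses V202545 on V202504. The sketch below uses simulated data (hypothetical variables `a` and `b`, not ANES) to show that in a bivariate regression, swapping the outcome and explanatory variable changes the slope but leaves the slope's t-statistic and the R² unchanged, because both depend only on the correlation and the sample size.

```r
set.seed(3)
# Hypothetical simulated data (not ANES): two weakly related variables
n <- 1000
a <- sample(1:5, n, replace = TRUE)
b <- 0.1 + 0.25 * a + rnorm(n, sd = 2.6)

fit_ba <- lm(b ~ a)  # b as outcome, a as explanatory variable
fit_ab <- lm(a ~ b)  # roles reversed

# Slopes differ, but the slope t-statistics and R^2 values are identical
t_ba <- summary(fit_ba)$coefficients["a", "t value"]
t_ab <- summary(fit_ab)$coefficients["b", "t value"]
print(c(slope_ba = coef(fit_ba)[["a"]], slope_ab = coef(fit_ab)[["b"]]))
print(c(t_ba = t_ba, t_ab = t_ab))
print(c(r2_ba = summary(fit_ba)$r.squared, r2_ab = summary(fit_ab)$r.squared))
```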
Report the 95% confidence interval for the explanatory variable’s coefficient. What does this mean in words? The lower bound of the 95% confidence interval is 0.210400834 and the upper bound is 0.3070834. This means that if we drew many samples and built an interval this way each time, about 95% of those intervals would contain the true slope. Because the entire interval lies above 0, the positive relationship is statistically significant at the 5% level.
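The repeated-sampling meaning of a 95% confidence interval can be checked by simulation. The sketch below assumes a hypothetical population with a known slope of 0.26 (chosen only to resemble the Question 2 estimate), draws 1,000 samples of 200 observations, and records how often the interval from confint() contains the true slope.

```r
set.seed(4)
# Hypothetical population: known true slope, normal errors
true_slope <- 0.26
n_sims <- 1000
covered <- logical(n_sims)

for (i in seq_len(n_sims)) {
  x <- sample(1:5, 200, replace = TRUE)
  y <- 0.13 + true_slope * x + rnorm(200, sd = 2.6)
  ci <- confint(lm(y ~ x))["x", ]  # 95% CI for the slope in this sample
  covered[i] <- ci[1] <= true_slope && true_slope <= ci[2]
}

# Share of intervals containing the true slope: should be close to 0.95
coverage <- mean(covered)
print(coverage)
```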
Report the \(t\)-statistic for the explanatory variable’s coefficient. Based on this test statistic, what conclusion can we draw regarding this hypothesis? The t-statistic is 10.492, meaning the estimated slope lies about 10.5 standard errors above 0 (p < 2e-16). We can therefore reject the null hypothesis of no relationship, and the positive association is consistent with the hypothesis.
Report the \(R^2\) statistic. What does this mean in words?
The R-squared value is 0.01481, or about 1.5%. This means that only about 1.5% of the variance in the dependent variable is explained by its linear relationship with the explanatory variable. So although the coefficient is highly statistically significant, the relationship is substantively weak.
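The large t-statistic and small R² are not in tension. In a bivariate regression the F-statistic equals t², and R² = t² / (t² + residual df), so with about 7,300 residual degrees of freedom even a t near 10.5 corresponds to a small R². The sketch reproduces the reported F and R² from the rounded t-statistic in summary(m1).

```r
# Rounded values printed by summary(m1) for V202545 ~ V202504
t_stat <- 10.492
df_resid <- 7325

# Bivariate regression identities: F = t^2 and R^2 = t^2 / (t^2 + residual df)
f_stat <- t_stat^2
r2 <- t_stat^2 / (t_stat^2 + df_resid)
print(round(c(F = f_stat, R2 = r2), 4))
```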
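What are the fitted value and 95% confidence interval for the fitted value if the explanatory variable takes on a value of 3? The code below evaluates the model at V202504 = 2 rather than 3. At 2, the fitted value is 0.651, with a 95% confidence interval from a lower bound of 0.5893228 to an upper bound of 0.7116872. At 3, the fitted value from the reported coefficients is 0.13302 + 0.25874 × 3 ≈ 0.909; its confidence interval requires rerunning predict() with V202504 = 3.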

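# Bivariate OLS: V202545 (posting political content on Twitter) on V202504 (importance of being American)
m1 <- lm(V202545 ~ V202504, data = ANES2020)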
confint(m1)

##                   2.5 %    97.5 %
## (Intercept) 0.008376291 0.2576652
## V202504     0.210400834 0.3070834

summary(m1)

## 
## Call:
## lm(formula = V202545 ~ V202504, data = ANES2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.427  -1.651  -1.392   3.349   4.608 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.13302    0.06358   2.092   0.0365 *  
## V202504      0.25874    0.02466  10.492   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.614 on 7325 degrees of freedom
##   (953 observations deleted due to missingness)
## Multiple R-squared:  0.01481,    Adjusted R-squared:  0.01467 
## F-statistic: 110.1 on 1 and 7325 DF,  p-value: < 2.2e-16

# Fitted value at V202504 = 2 (the question asks about V202504 = 3)
model_preds <- predict(m1, newdata = data.frame("V202504" = 2), se.fit = TRUE)
model_preds$fit

##        1 
## 0.650505

model_preds$fit + 1.96 * model_preds$se.fit

##         1 
## 0.7116872

model_preds$fit - 1.96 * model_preds$se.fit

##         1 
## 0.5893228

Question 3

Imagine someone has a hypothesis that wealthier people are more interested in politics.

V202406 is a post-election variable asking respondents how interested they are in politics. V202468x is a post-election variable asking respondents about their family income.

Run a bivariate regression to test this hypothesis.
Report the coefficient estimate for the explanatory variable. What does this mean in words?
The coefficient estimate is -0.016947. This means that a one-unit increase in family income (V202468x) is associated with a decrease of about 0.017 in the predicted value of V202406, a negative relationship. Because lower values of V202406 indicate greater interest in politics, people with higher family incomes tend to report more interest in politics, which is consistent with the hypothesis.
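To gauge the substantive size of the slope, the sketch multiplies the reported coefficient by the width of the income scale, taking the categories to run from 1 to 22 as in the plotting prompt below. The result is the implied change in predicted V202406 between the lowest and highest income categories.

```r
# Reported slope from summary(m1) for V202406 ~ V202468x
slope <- -0.016947

# Implied change in predicted V202406 from income category 1 to category 22
income_range <- 22 - 1
total_change <- slope * income_range
print(round(total_change, 3))
```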
Report the 95% confidence interval for the explanatory variable’s coefficient. What does this mean in words? The lower bound of the 95% confidence interval is -0.01985265 and the upper bound is -0.01404132. This means that if we drew many samples and built an interval this way each time, about 95% of those intervals would contain the true slope. Because the entire interval lies below 0, the negative relationship is statistically significant at the 5% level.
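For a t-based interval, checking whether the 95% confidence interval excludes 0 is equivalent to checking whether the two-sided p-value is below 0.05. The sketch below demonstrates this on simulated data (hypothetical `income` and `interest` values, not ANES).

```r
set.seed(5)
# Hypothetical simulated data (not ANES): weak negative relationship
income <- sample(1:22, 2000, replace = TRUE)
interest <- 2.3 - 0.017 * income + rnorm(2000, sd = 0.85)
fit <- lm(interest ~ income)

ci <- confint(fit, "income", level = 0.95)
p_value <- summary(fit)$coefficients["income", "Pr(>|t|)"]

# "95% CI excludes 0" and "p < 0.05" give the same verdict for a t-based interval
excludes_zero <- ci[1, 1] > 0 || ci[1, 2] < 0
print(c(excludes_zero = excludes_zero, significant = p_value < 0.05))
```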
Report the \(t\)-statistic for the explanatory variable’s coefficient. Based on this test statistic, what conclusion can we draw regarding this hypothesis? The t-statistic is -11.43, meaning the estimated slope lies about 11.4 standard errors below 0 (p < 2e-16). We can therefore reject the null hypothesis of no relationship between family income and interest in politics.
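The standard error and t-statistic can also be recovered from the confidence interval alone, since the interval's half-width is the t critical value times the SE. The inputs below are the values printed by confint(m1) and summary(m1) for this model.

```r
# Values printed by confint(m1) and summary(m1) for V202406 ~ V202468x
lower <- -0.01985265
upper <- -0.01404132
est <- -0.016947
df_resid <- 7185

# Half-width = t_crit * SE, so SE = (upper - lower) / (2 * t_crit)
t_crit <- qt(0.975, df = df_resid)
se <- (upper - lower) / (2 * t_crit)
t_stat <- est / se
print(round(c(se = se, t = t_stat), 4))
```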
Report the \(R^2\) statistic. What does this mean in words?
The R-squared value is 0.01787, or about 1.8%. This means that only about 1.8% of the variance in the dependent variable is explained by its linear relationship with the explanatory variable. So although the coefficient is highly statistically significant, family income explains very little of the variation in interest in politics.
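summary(m1) also prints an adjusted R², which penalizes R² for the number of explanatory variables. The sketch reproduces it from the reported R² and residual degrees of freedom, using n = residual df + 2 for a model with one explanatory variable and an intercept.

```r
# Reported fit statistics for V202406 ~ V202468x
r2 <- 0.01787
df_resid <- 7185
k <- 1                      # number of explanatory variables
n <- df_resid + k + 1       # observations used in the fit

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 5))
```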
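What are the fitted value and 95% confidence interval for the fitted value if the explanatory variable takes on a value of 4? The code below evaluates the model at V202468x = 2 rather than 4. At 2, the fitted value is 2.259, with a 95% confidence interval from a lower bound of 2.22491 to an upper bound of 2.293818. At 4, the fitted value from the reported coefficients is 2.293258 - 0.016947 × 4 ≈ 2.225; its confidence interval requires rerunning predict() with V202468x = 4.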
Calculate the predicted values of interest in politics for all values of income (1-22) and create a marginal effects plot showing how changes in income relate to interest across the full range of x.

# Bivariate OLS: V202406 (interest in politics) on V202468x (family income)
m1 <- lm(V202406 ~ V202468x, data = ANES2020)
confint(m1)

##                   2.5 %      97.5 %
## (Intercept)  2.25387318  2.33264329
## V202468x    -0.01985265 -0.01404132

summary(m1)

## 
## Call:
## lm(formula = V202406 ~ V202468x, data = ANES2020)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27631 -0.27631 -0.07295  0.72369  2.07958 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.293258   0.020091  114.14   <2e-16 ***
## V202468x    -0.016947   0.001482  -11.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8446 on 7185 degrees of freedom
##   (1093 observations deleted due to missingness)
## Multiple R-squared:  0.01787,    Adjusted R-squared:  0.01773 
## F-statistic: 130.7 on 1 and 7185 DF,  p-value: < 2.2e-16

# Fitted value at V202468x = 2 (the question asks about V202468x = 4)
model_preds <- predict(m1, newdata = data.frame("V202468x" = 2), se.fit = TRUE)
model_preds$fit

##        1 
## 2.259364

model_preds$fit + 1.96 * model_preds$se.fit

##        1 
## 2.293818

model_preds$fit - 1.96 * model_preds$se.fit

##       1 
## 2.22491

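# Predicted political interest at every income category (1-22, as the prompt asks)
income_vals <- 1:22
model_preds_full <- predict(m1, newdata = data.frame("V202468x" = income_vals),
  se.fit = TRUE)
upper <- model_preds_full$fit + 1.96 * model_preds_full$se.fit
lower <- model_preds_full$fit - 1.96 * model_preds_full$se.fit

# Fitted line with dashed 95% confidence bands; ylim keeps the bands inside the plot
plot(x = income_vals, y = model_preds_full$fit, type = "l", ylim = range(lower, upper),
  xlab = "Family income (V202468x)", ylab = "Predicted interest in politics (V202406)")
lines(x = income_vals, y = upper, lty = 4)
lines(x = income_vals, y = lower, lty = 4)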

OLS Hypothesis Testing and Model Fit

Omar Ratrut