Problem Set 03 - Predicting 2016 ANES Survey Results

Due: Mar 09

In this exercise, we will predict the presidential vote in the 2016 election using a liberal-conservative scale. We will also analyze some patterns in the 2016 American National Election Studies (ANES) survey. ANES is among the most comprehensive electoral surveys conducted in the US. It is conducted both offline and online, and both before and after the election.

The original dataset has 1842 variables. I selected a few here for us to study:

| Variable | Meaning |
|---|---|
| int_vote_trump | Intended to vote for Trump in the 2016 election |
| voted_trump | Voted for Trump in the 2016 election |
| int_vote_clinton | Intended to vote for Clinton in the 2016 election |
| voted_clinton | Voted for Clinton in the 2016 election |
| lib_conserv_scale | Liberal-conservative scale |
| white_voter | Respondent identified as White |
| latinx_voter | Respondent identified as Latinx |
| swing_voter | Intended to vote for one candidate but voted for another |
| swing_trump | Did not intend to vote for Trump but did vote for him |
| swing_hillary | Did not intend to vote for Clinton but did vote for her |
| region | Region of the country |
| age | Age in years |
| religion_important_life | Binary indicator for the belief that religion is important in one's life |

As always, we start by looking at the data:

head(anes)
##   int_vote_trump voted_trump int_vote_clinton voted_clinton lib_conserv_scale
## 1              1           1                0             0         0.5215019
## 2              0           0                0             0        -0.1032504
## 3              0           1                0             0         1.1462541
## 4              1           1                0             0         0.5215019
## 5              0           0                0             0        -0.1032504
## 6              0           0                0             0        -0.7280026
##   white_voter latinx_voter swing_voter swing_trump swing_hillary    region age
## 1           1            0           0           0             0     South  26
## 2           1            0           0           0             0   Midwest  38
## 3           0            0           1           1             0 Northeast  60
## 4           1            0           0           0             0 Northeast  56
## 5           1            0           0           0             0     South  45
## 6           1            0           0           0             0     South  30
##   religion_important_life
## 1                       0
## 2                       1
## 3                       1
## 4                       1
## 5                       0
## 6                       1

1. In this dataset, what does each observation represent? (1 point)

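Answer: Each observation (row) represents one individual respondent to the 2016 ANES. For each respondent, the row records binary (1 = yes, 0 = no) indicators for intending to vote for and actually voting for Trump and Clinton, the respondent's score on the liberal-conservative scale, and background characteristics such as race/ethnicity, region, age, and whether religion is important in their life.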


2. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our X variable? In other words, which variable will we use as the predictor? (1 point)

Answer: The X variable (the predictor) should be the liberal-conservative scale, lib_conserv_scale.


3. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our Y variable? In other words, which variable will we use as the outcome variable? (1 point)

Answer: The Y variable (the outcome) should be the vote for Trump, voted_trump.


4. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Create a scatter plot of the relationship between the two variables. (1 point)

(Hint: The default scatter plot will not work for this data because both variables have discrete variation: Vote for Trump is binary, and the lib-con scale has seven categories. To plot this, you should use some jitter. The parameters for jitter are height for y and width for x.)


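library(ggplot2)
# Jitter both axes: width spreads the lib-con scale (x), height spreads the 0/1 vote (y)
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2)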
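To make the `height`/`width` hint concrete, here is a minimal sketch on simulated data. It assumes, as we understand ggplot2's `position_jitter()`, that jittering adds uniform noise drawn from [-width, width] on x and [-height, height] on y, which is what base R's `jitter()` does when given `amount`. The vectors `scale_x` and `vote_y` are hypothetical stand-ins for `lib_conserv_scale` and `voted_trump`, not the ANES values.

```r
# Hypothetical data: a 7-category scale (coded 1-7 for illustration) and a 0/1 vote
set.seed(42)
n <- 200
scale_x <- sample(1:7, n, replace = TRUE)
vote_y  <- rbinom(n, size = 1, prob = scale_x / 8)

# jitter() with `amount` adds uniform noise on [-amount, amount],
# playing the role of width (x) and height (y) in geom_jitter()
x_jit <- jitter(scale_x, amount = 0.2)
y_jit <- jitter(vote_y, amount = 0.4)

# Each jittered point stays within its band around the true value
max(abs(x_jit - scale_x))  # never more than 0.2
max(abs(y_jit - vote_y))   # never more than 0.4
```

With a height of 0.4, the two vote bands ([-0.4, 0.4] and [0.6, 1.4]) cannot overlap, so a jittered point is never confused with the other outcome.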


5. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Use the function lm() to fit a linear model to the data. (1 point)


# Jittered scatter plot with the OLS fitted line overlaid
# (geom_smooth() fits on the original, unjittered values)
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2) +
  geom_smooth(formula = y ~ x, method = "lm",
              se = FALSE, color = "blue", linewidth = 1) +
  labs(y = "Voted for Trump (1 = yes, 0 = no)",
       x = "Liberal-conservative scale") +
  theme_minimal()

lm(voted_trump ~ lib_conserv_scale, data = anes)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale  
##            0.3031             0.2297
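For a single predictor, the coefficients printed by `lm()` can be reproduced by hand: the slope is \(\widehat{\beta}_1 = \mathrm{cov}(X, Y)/\mathrm{var}(X)\) and the intercept is \(\widehat{\beta}_0 = \bar{Y} - \widehat{\beta}_1 \bar{X}\). The sketch below checks this on simulated, hypothetical data (`x`, `y`) rather than the `anes` data; with the survey loaded, the same two formulas applied to `anes$lib_conserv_scale` and `anes$voted_trump` should return 0.3031 and 0.2297.

```r
# Hypothetical simulated data standing in for anes (not the real survey)
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(x))

fit <- lm(y ~ x)

# Closed-form OLS estimates for a single predictor
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

coef(fit)   # intercept and slope from lm()
c(b0, b1)   # the same two numbers computed by hand
```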

6. In the model fitted in Question 05, what is the fitted line? In other words, provide the formula \(\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 X\) where you specify each term. (1 point)

(I.e., substitute \(Y\) for the name of the outcome variable, substitute \(\widehat{\beta}_0\) for the estimated value of the intercept coefficient, substitute \(\widehat{\beta}_1\) for the estimated value of the slope coefficient, and substitute \(X\) for the name of the predictor.)

Answer: \(\widehat{\mathrm{voted\_trump}} = 0.3031 + 0.2297 \times \mathrm{lib\_conserv\_scale}\). Here \(\widehat{Y}\) is the predicted probability of voting for Trump; \(\widehat{\beta}_0 = 0.3031\) is the predicted probability of voting for Trump when the liberal-conservative scale equals 0; \(\widehat{\beta}_1 = 0.2297\) is the change in the predicted probability of voting for Trump for each one-unit increase in the liberal-conservative scale; and \(X\) is lib_conserv_scale, the respondent's score on the liberal-conservative scale.


7. Assume we use the model fitted in Question 05. Now, use the fitted line to make some predictions.

  1. Computing \(\widehat{Y}\) based on \(X\): Suppose that one scores at \(-1\) in the liberal-conservative scale. What is the predicted chance that this person will vote for Trump? Please show your calculations and then answer the question with a full sentence (including units of measurement). (0.5 points)

Calculations: \(\widehat{\mathrm{voted\_trump}} = 0.3031 + 0.2297 \times (-1) = 0.3031 - 0.2297 = 0.0734\)

Answer: A person who scores \(-1\) on the liberal-conservative scale has a predicted probability of about 0.073, or 7.34%, of voting for Trump.
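The same arithmetic can be written as a small R function. This is a sketch using the rounded coefficients copied from the Question 05 output; `predict_trump()` is a hypothetical helper name. With the data loaded, `predict()` on the fitted model (e.g. `predict(fit, newdata = data.frame(lib_conserv_scale = -1))`, where `fit` holds the Question 05 model) would give the unrounded value.

```r
# Coefficients copied from the lm() output in Question 05 (rounded to 4 decimals)
b0 <- 0.3031  # intercept
b1 <- 0.2297  # slope on lib_conserv_scale

# Fitted line: predicted probability of voting for Trump at a given scale value
predict_trump <- function(scale) b0 + b1 * scale

round(predict_trump(-1), 4)  # 0.0734
```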

  2. Computing \(\triangle \widehat{Y}\) based on \(\triangle X\): Suppose that a liberal person who initially scored -1 on the liberal-conservative scale watches only conservative news outlets for one straight year. She then revises some of her ideas and now scores 1 on the liberal-conservative scale. What is our best guess of how much her predicted chance of voting for Trump changes due to this shift on the liberal-conservative scale? Please show your calculations and then answer the question with a complete sentence (including units of measurement). (0.5 points)

Calculations: \(\triangle \widehat{Y} = \widehat{\beta}_1 \times \triangle X = 0.2297 \times (1 - (-1)) = 0.2297 \times 2 = 0.4594\). Equivalently, the predicted probability rises from \(0.3031 + 0.2297 \times (-1) = 0.0734\) to \(0.3031 + 0.2297 \times 1 = 0.5328\), and \(0.5328 - 0.0734 = 0.4594\).

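Answer: Our best guess is that her predicted chance of voting for Trump increases by about 0.459, or 45.94 percentage points (from 7.34% to 53.28%).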
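A short sketch of the change calculation, again using the rounded Question 05 coefficients (the helper `predict_trump()` is hypothetical). It shows that multiplying the slope by the change in X gives the same result as subtracting the two predicted values, because the intercept cancels.

```r
b0 <- 0.3031  # intercept from Question 05
b1 <- 0.2297  # slope on lib_conserv_scale
predict_trump <- function(scale) b0 + b1 * scale

x_before <- -1
x_after  <- 1

# Change in the prediction = slope times change in X
delta_y <- b1 * (x_after - x_before)

# Same answer as the difference between the two predicted values
delta_check <- predict_trump(x_after) - predict_trump(x_before)

round(delta_y, 4)  # 0.4594
```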


8. Assume we are still using the model fitted in Question 05. What is the \(R^2\) of that model? And how would you interpret it? (1 point)

(Hint: the function cor() might be helpful here.)


cor(anes$lib_conserv_scale,anes$voted_trump)^2
## [1] 0.249724

Answer: The fitted line has a positive slope, so the liberal-conservative scale and voting for Trump are positively correlated. The \(R^2\) is 0.2497, or about 25%. In a regression with a single predictor, \(R^2\) equals the squared correlation between \(X\) and \(Y\), and it measures the share of the variation in the outcome that the model explains: the liberal-conservative scale accounts for roughly 25% of the variation in whether respondents voted for Trump, while the remaining 75% is left unexplained by this model.
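The hint works because, with one predictor, the squared correlation is the same number as the model's \(R^2\). The sketch below checks this on simulated, hypothetical data (`x`, `y`), comparing `summary()`'s `r.squared`, `cor()^2`, and the definition \(1 - \text{SSR}/\text{SST}\). With `anes` loaded, `summary(lm(voted_trump ~ lib_conserv_scale, data = anes))$r.squared` should reproduce 0.2497.

```r
# Hypothetical simulated data: one predictor and a 0/1 outcome
set.seed(2)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(x))

fit <- lm(y ~ x)

# Three equivalent ways to get R^2 with a single predictor
r2_summary <- summary(fit)$r.squared
r2_cor     <- cor(x, y)^2
r2_ssr     <- 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

c(r2_summary, r2_cor, r2_ssr)
```

Note that the `cor()` shortcut only holds for a single predictor; once age or other controls are added, `summary()$r.squared` is the quantity to report.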


9. In Question 05, we fit a model that uses the liberal-conservative scale to predict the intentions to vote for Trump. Now, suppose that age can affect both the chance of voting for Trump and the liberal-conservative scale. Fit a new prediction model that controls for age. Is it true that older people tended to vote for Trump? Does your liberal-conservative scale do better or worse in this new model? How do you interpret the liberal-conservative scores after controlling for age? Explain. (1 point)


lm(voted_trump~ lib_conserv_scale + age , data = anes)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age  
##          0.198557           0.223668           0.002079

Answer: Yes, older people tended to vote for Trump. Holding the liberal-conservative scale constant, each additional year of age is associated with a 0.002079 increase in the predicted probability of voting for Trump, about 0.21 percentage points per year (roughly 10.4 percentage points for a 50-year age difference). The liberal-conservative scale performs about as well as before: its coefficient falls only slightly, from 0.2297 in Question 05 to 0.2237, so age accounts for very little of the relationship between ideology and voting for Trump. After controlling for age, a one-unit increase in the liberal-conservative scale is associated with a 22.4 percentage point increase in the predicted probability of voting for Trump among respondents of the same age.
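To see what the age coefficient means in practice, here is a sketch that plugs the Question 09 coefficients into the fitted equation. The helper `predict_trump_age()` and the comparison point (scale = 0, ages 25 vs 75) are illustrative choices, not part of the assignment.

```r
# Coefficients copied from the Question 09 lm() output
b <- c(intercept = 0.198557, scale = 0.223668, age = 0.002079)

# Fitted equation with the age control
predict_trump_age <- function(scale, age) {
  b[["intercept"]] + b[["scale"]] * scale + b[["age"]] * age
}

# Same ideology, a 50-year age gap: difference = 50 * age coefficient
age_gap_effect <- predict_trump_age(scale = 0, age = 75) -
  predict_trump_age(scale = 0, age = 25)
round(age_gap_effect, 3)  # 0.104

# How much the scale's slope moved after adding age (Question 05 vs Question 09)
slope_change <- b[["scale"]] - 0.2297
slope_change
```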


10. In Question 09, we fit a model that uses the liberal-conservative scale and age to predict the intentions to vote for Trump. Many analysts said that, besides the person’s age, Latinx voters tended to support Trump in the 2016 election. This could confound the results: a relatively liberal young Latinx could vote for Trump because she could associate the liberals with radical policies. Is this true? Explain using a new regression model with age, the liberal-conservative scale, and the Latinx binary indicator. Moreover, how do you interpret the liberal-conservative scores after controlling for age and Latinx? (1 point)


lm(voted_trump~ lib_conserv_scale + age + latinx_voter, data=anes)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age + latinx_voter, 
##     data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age       latinx_voter  
##          0.222532           0.222963           0.001836          -0.127574

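Answer: No, the data do not support the claim that Latinx voters tended to support Trump. Holding age and the liberal-conservative scale constant, Latinx respondents had a predicted probability of voting for Trump about 0.128 lower (12.8 percentage points) than non-Latinx respondents. The age coefficient (0.001836, about 0.18 percentage points per year) and the liberal-conservative coefficient (0.2230, down only slightly from 0.2237 in Question 09) change very little, so Latinx identity does not confound the relationship between ideology and voting for Trump. After controlling for age and Latinx identity, a one-unit increase in the liberal-conservative scale is still associated with about a 22.3 percentage point increase in the predicted probability of voting for Trump.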
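As a check on the interpretation, this sketch compares two otherwise identical hypothetical respondents (age 25, an illustrative scale value of 0) who differ only in `latinx_voter`. The helper `predict_trump_latinx()` is hypothetical, and the coefficients are copied from the Question 10 output; because the model is additive, the gap equals the `latinx_voter` coefficient at any age or scale value.

```r
# Coefficients copied from the Question 10 lm() output
b <- c(intercept = 0.222532, scale = 0.222963,
       age = 0.001836, latinx = -0.127574)

# Fitted equation with age and the Latinx indicator
predict_trump_latinx <- function(scale, age, latinx) {
  b[["intercept"]] + b[["scale"]] * scale +
    b[["age"]] * age + b[["latinx"]] * latinx
}

p_latinx     <- predict_trump_latinx(scale = 0, age = 25, latinx = 1)
p_non_latinx <- predict_trump_latinx(scale = 0, age = 25, latinx = 0)

round(p_latinx - p_non_latinx, 4)  # -0.1276
```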