Problem Set 03 - Predicting 2016 ANES Survey Results

Due: Mar 09

In this exercise, we will predict the presidential vote in the 2016 election using a liberal-conservative scale. We will also analyze some patterns in the 2016 American National Election Studies (ANES) survey. The ANES is among the most comprehensive electoral surveys conducted in the US. It is conducted both offline and online, and both before and after the election.

The original dataset has 1842 variables. I selected a few here for us to study:

Variable                  Meaning
int_vote_trump            Intended to vote for Trump in the 2016 election
voted_trump               Voted for Trump in the 2016 election
int_vote_clinton          Intended to vote for Clinton in the 2016 election
voted_clinton             Voted for Clinton in the 2016 election
lib_conserv_scale         Liberal-Conservative scale
white_voter               Respondent declared herself as White
latinx_voter              Respondent declared herself as Latinx
swing_voter               Intended to vote for one candidate but voted for a different one
swing_trump               Did not intend to vote for Trump but did vote for him
swing_hillary             Did not intend to vote for Clinton but did vote for her
region                    Country region
age                       Age in years
religion_important_life   Binary indicator for the belief that religion is important for life

As always, we start by looking at the data:

head(anes)
##   int_vote_trump voted_trump int_vote_clinton voted_clinton lib_conserv_scale
## 1              1           1                0             0         0.5215019
## 2              0           0                0             0        -0.1032504
## 3              0           1                0             0         1.1462541
## 4              1           1                0             0         0.5215019
## 5              0           0                0             0        -0.1032504
## 6              0           0                0             0        -0.7280026
##   white_voter latinx_voter swing_voter swing_trump swing_hillary    region age
## 1           1            0           0           0             0     South  26
## 2           1            0           0           0             0   Midwest  38
## 3           0            0           1           1             0 Northeast  60
## 4           1            0           0           0             0 Northeast  56
## 5           1            0           0           0             0     South  45
## 6           1            0           0           0             0     South  30
##   religion_important_life
## 1                       0
## 2                       1
## 3                       1
## 4                       1
## 5                       0
## 6                       1

1. In this dataset, what does each observation represent? (1 point)

Answer: Each observation represents one ANES survey respondent. For each respondent, the data record demographic characteristics (race/ethnicity, region, age, importance of religion), a position on the liberal-conservative scale, and binary indicators for whether the respondent intended to vote for, and actually voted for, Trump or Clinton. In the binary indicators, 1 means yes and 0 means no.


2. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our X variable? In other words, which variable will we use as the predictor? (1 point)

Answer: The predictor is the liberal-conservative scale, so our X variable is lib_conserv_scale.


3. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our Y variable? In other words, which variable will we use as the outcome variable? (1 point)

Answer: The outcome is the vote for Trump, so our Y variable is voted_trump.


4. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Create a scatter plot of the relationship between the two variables. (1 point)

R code:

library(ggplot2)  # needed for ggplot() and geom_jitter()

# Jitter the points so that overlapping 0/1 outcomes become visible
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2)

5. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Use the function lm() to fit a linear model to the data. (1 point)

R code:

# Scatter plot with the fitted regression line overlaid
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2) +
  geom_point(fill = 'lightblue', alpha = 0.6) +
  labs(title = '', y = 'predict the vote for Trump', x = 'liberal-conservative scale') +
  geom_smooth(formula = y ~ x, method = 'lm', se = FALSE, color = 'blue', lwd = 1) +
  theme_minimal()

lm(data=anes, voted_trump~lib_conserv_scale)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale
##            0.3031             0.2297

6. In the model fitted in Question 05, what is the fitted line? In other words, provide the formula \(\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 X\) where you specify each term. (1 point)

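Answer: The fitted line is \(\widehat{Y} = 0.3031 + 0.2297 X\). Here \(\widehat{Y}\) is the predicted probability of voting for Trump (voted_trump) and \(X\) is the respondent's score on the liberal-conservative scale (lib_conserv_scale). The intercept \(\widehat{\beta}_0 = 0.3031\) is the predicted probability of voting for Trump when the scale equals 0. The slope \(\widehat{\beta}_1 = 0.2297\) is the predicted change in that probability for a one-unit increase in the scale.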


7. Assume we use the model fitted in Question 05. Now, use the fitted line to make some predictions.

  1. Computing \(\widehat{Y}\) based on \(X\): Suppose that one scores at \(-1\) in the liberal-conservative scale. What is the predicted chance that this person will vote for Trump? Please show your calculations and then answer the question with a full sentence (including units of measurement). (0.5 points)

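Calculations: \(\widehat{Y} = 0.3031 + 0.2297 \times (-1) = 0.0734\)

Answer: A person who scores \(-1\) on the liberal-conservative scale has a predicted chance of voting for Trump of about 7.3%.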

  2. Computing \(\triangle \widehat{Y}\) based on \(\triangle X\): Suppose that a liberal person who initially scored \(-1\) on the liberal-conservative scale watches only conservative news outlets for one year straight. She then revises some of her ideas and now scores \(1\) on the liberal-conservative scale. What would be our best guess of how much her predicted chance of voting for Trump changes due to this change in the liberal-conservative scale? Please show your calculations and then answer the question with a complete sentence (including units of measurement). (0.5 points)

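Calculations: \(\triangle \widehat{Y} = \widehat{\beta}_1 \times \triangle X = 0.2297 \times (1 - (-1)) = 0.4594\)

Answer: Moving from \(-1\) to \(1\) on the liberal-conservative scale is associated with an increase of about 45.9 percentage points in her predicted chance of voting for Trump.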
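
The two predictions in Question 7 follow directly from the coefficients reported in Question 5, so they can be checked without the data. The sketch below hard-codes those coefficients; the helper name predict_trump is hypothetical and not part of the original problem set.

```r
# Coefficients copied from the lm() output in Question 5
b0 <- 0.3031  # intercept
b1 <- 0.2297  # slope on lib_conserv_scale

# Hypothetical helper: fitted line Y-hat = b0 + b1 * X
predict_trump <- function(x) b0 + b1 * x

# 7.1: predicted probability of voting for Trump at X = -1
y_hat_minus1 <- predict_trump(-1)

# 7.2: change in the prediction when X moves from -1 to 1
delta_y_hat <- predict_trump(1) - predict_trump(-1)

print(round(100 * y_hat_minus1, 2))  # in percent
print(round(100 * delta_y_hat, 2))   # in percentage points
```

Because the model is linear, the change in 7.2 depends only on the slope and the size of the move (2 units), not on the starting point.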


8. Assume we are still using the model fitted in Question 05. What is the \(R^2\) of that model? And how would you interpret it? (1 point)

R code: cor(anes$lib_conserv_scale, anes$voted_trump)^2

## [1] 0.249724

Answer: The \(R^2\) is about 0.25. The liberal-conservative scale explains roughly 25% of the variation in the Trump vote in this sample; the remaining 75% or so is left unexplained by the model.
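
With a single predictor, the squared correlation between X and Y equals the \(R^2\) reported by summary() on the fitted model, which is why cor()^2 works here. The sketch below illustrates this on simulated data (not the ANES data); the variables x and y are stand-ins for lib_conserv_scale and voted_trump.

```r
set.seed(1)  # simulated data for illustration only, not ANES

n <- 500
x <- rnorm(n)                               # stand-in for lib_conserv_scale
y <- rbinom(n, size = 1, prob = plogis(x))  # stand-in for voted_trump (0/1)

fit <- lm(y ~ x)

r2_from_lm  <- summary(fit)$r.squared  # R^2 reported by the model
r2_from_cor <- cor(x, y)^2             # squared correlation

# The two quantities agree up to floating-point error
print(all.equal(r2_from_lm, r2_from_cor))
```

This equality holds only for a model with one predictor. Once age or other controls are added, summary(fit)$r.squared is the quantity to report.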


9. In Question 05, we fit a model that uses the liberal-conservative scale to predict the intentions to vote for Trump. Now, suppose that age can affect both the chance of voting for Trump and the liberal-conservative scale. Fit a new prediction model that controls for age. Is it true that older people tended to vote for Trump? Does your liberal-conservative scale do better or worse in this new model? How do you interpret the liberal-conservative scores after controlling for age? Explain. (1 point)

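R code: lm(voted_trump ~ lib_conserv_scale + age, data = anes)

## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age
##          0.198557           0.223668           0.002079

Answer: Yes, in this sample older respondents were slightly more likely to vote for Trump. Holding the liberal-conservative score fixed, each additional year of age is associated with an increase of about 0.2 percentage points in the predicted chance of voting for Trump. The coefficient on the liberal-conservative scale falls only slightly, from 0.2297 to 0.2237, so the scale does about as well as before. After controlling for age, a one-unit increase on the scale is associated with an increase of about 22.4 percentage points in the predicted chance of voting for Trump, comparing respondents of the same age.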
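
To get a feel for the size of the age coefficient, it can help to compare two respondents with the same liberal-conservative score but different ages. The sketch below hard-codes the coefficients from the Question 9 output; the helper name predict_trump_age and the ages 30 and 60 are illustrative assumptions.

```r
# Coefficients copied from the lm() output in Question 9
b0      <- 0.198557  # intercept
b_scale <- 0.223668  # slope on lib_conserv_scale
b_age   <- 0.002079  # slope on age (per year)

# Hypothetical helper: fitted plane with two predictors
predict_trump_age <- function(scale, age) b0 + b_scale * scale + b_age * age

# Same liberal-conservative score (0), ages 30 and 60 (illustrative values)
age_gap <- predict_trump_age(0, 60) - predict_trump_age(0, 30)

print(round(100 * age_gap, 2))  # in percentage points
```

A 30-year age gap corresponds to roughly 6 percentage points, which is small next to the roughly 22 points associated with a one-unit move on the liberal-conservative scale.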


10. In Question 09, we fit a model that uses the liberal-conservative scale and age to predict the intentions to vote for Trump. Many analysts said that, besides the person’s age, Latinx voters tended to support Trump in the 2016 election. This could confound the results: a relatively liberal young Latinx could vote for Trump because she could associate the liberals with radical policies. Is this true? Explain using a new regression model with age, the liberal-conservative scale, and the Latinx binary indicator. Moreover, how do you interpret the liberal-conservative scores after controlling for age and Latinx? (1 point)

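R code: lm(voted_trump ~ lib_conserv_scale + age + latinx_voter, data = anes)

## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age + latinx_voter, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age       latinx_voter
##          0.222532           0.222963           0.001836          -0.127574

Answer: The model does not support the claim that Latinx voters tended to support Trump. Holding age and the liberal-conservative score fixed, Latinx respondents have a predicted chance of voting for Trump about 12.8 percentage points lower than non-Latinx respondents. The coefficient on the liberal-conservative scale is essentially unchanged (0.2230, compared with 0.2237 in Question 9). After controlling for age and Latinx identity, a one-unit increase on the scale is still associated with an increase of about 22.3 percentage points in the predicted chance of voting for Trump.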
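
The question describes a relatively liberal young Latinx voter, and the fitted coefficients can be used to compare that profile with an otherwise identical non-Latinx respondent. The sketch below hard-codes the coefficients from the Question 10 output; the helper name predict_trump_full, the score of -0.5, and the age of 25 are illustrative assumptions.

```r
# Coefficients copied from the lm() output in Question 10
b0       <- 0.222532   # intercept
b_scale  <- 0.222963   # slope on lib_conserv_scale
b_age    <- 0.001836   # slope on age (per year)
b_latinx <- -0.127574  # shift for latinx_voter = 1

# Hypothetical helper: fitted model with three predictors
predict_trump_full <- function(scale, age, latinx) {
  b0 + b_scale * scale + b_age * age + b_latinx * latinx
}

# Illustrative profile: score of -0.5 on the scale, age 25
p_latinx     <- predict_trump_full(-0.5, 25, latinx = 1)
p_non_latinx <- predict_trump_full(-0.5, 25, latinx = 0)

# The gap equals the latinx_voter coefficient, whatever the profile
latinx_gap <- p_latinx - p_non_latinx
print(round(100 * latinx_gap, 2))  # in percentage points
```

Because the model has no interaction terms, this gap is the same at every age and every score on the scale.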