Problem Set 03 - Predicting 2016 ANES Survey Results

Due: Mar 09

In this exercise, we will predict the presidential vote in the 2016 election using a liberal-conservative scale. We will also analyze some patterns in the 2016 American National Election Studies (ANES) survey. ANES is among the most comprehensive electoral surveys conducted in the US. It is conducted both offline and online, and both before and after the election.

The original dataset has 1842 variables. I selected a few here for us to study:

Variable	Meaning
`int_vote_trump`	Intend to vote for Trump in the 2016 election
`voted_trump`	Voted for Trump in the 2016 election
`int_vote_clinton`	Intend to vote for Clinton in the 2016 election
`voted_clinton`	Voted for Clinton in the 2016 election
`lib_conserv_scale`	Liberal-Conservative scale
`white_voter`	Respondent declared herself as White
`latinx_voter`	Respondent declared herself as Latinx
`swing_voter`	Intended to vote one way but voted differently
`swing_trump`	Did not intend to vote for Trump but did vote for him.
`swing_hillary`	Did not intend to vote for Clinton but did vote for her.
`region`	Country region
`age`	Age in years
`religion_important_life`	Binary indicator for the belief that religion is important for life.

As always, we start by looking at the data:

head(anes)
##   int_vote_trump voted_trump int_vote_clinton voted_clinton lib_conserv_scale
## 1              1           1                0             0         0.5215019
## 2              0           0                0             0        -0.1032504
## 3              0           1                0             0         1.1462541
## 4              1           1                0             0         0.5215019
## 5              0           0                0             0        -0.1032504
## 6              0           0                0             0        -0.7280026
##   white_voter latinx_voter swing_voter swing_trump swing_hillary    region age
## 1           1            0           0           0             0     South  26
## 2           1            0           0           0             0   Midwest  38
## 3           0            0           1           1             0 Northeast  60
## 4           1            0           0           0             0 Northeast  56
## 5           1            0           0           0             0     South  45
## 6           1            0           0           0             0     South  30
##   religion_important_life
## 1                       0
## 2                       1
## 3                       1
## 4                       1
## 5                       0
## 6                       1

1. In this dataset, what does each observation represent? (1 point)

Answer: Each observation represents one survey respondent, not a vote. For each respondent, the data set records demographics (race or ethnicity, region, age, and whether religion is important in life), a score on the liberal-conservative scale, and binary indicators for whether the respondent intended to vote for, and actually voted for, each presidential candidate. In those indicators, 1 means yes and 0 means no.

2. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our X variable? In other words, which variable will we use as the predictor? (1 point)

Answer: The predictor, our X variable, is the liberal-conservative scale (`lib_conserv_scale`).

3. Suppose we want to predict the vote for Trump using the liberal-conservative scale. What should be our Y variable? In other words, which variable will we use as the outcome variable? (1 point)

Answer: The outcome, our Y variable, is the vote for Trump (`voted_trump`).

4. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Create a scatter plot of the relationship between the two variables. (1 point)

R code:

library(ggplot2)

# Jittered scatter plot: the outcome is binary, so jitter spreads overlapping points
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2)

5. Suppose we want to predict the vote for Trump using the liberal-conservative scale. Use the function `lm()` to fit a linear model to the data. (1 point)

R code:

# Scatter plot with the fitted regression line overlaid
ggplot(data = anes, aes(x = lib_conserv_scale, y = voted_trump)) +
  geom_jitter(alpha = 0.5, height = 0.4, width = 0.2) +
  geom_point(fill = 'lightblue', alpha = 0.6) +
  labs(y = 'Voted for Trump', x = 'Liberal-conservative scale') +
  geom_smooth(formula = y ~ x, method = 'lm', se = FALSE, color = 'blue', lwd = 1) +
  theme_minimal()

# Fit the linear model
lm(data = anes, voted_trump ~ lib_conserv_scale)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale  
##            0.3031             0.2297

6. In the model fitted in Question 05, what is the fitted line? In other words, provide the formula \(\widehat{Y} = \widehat{\beta}_0 + \widehat{\beta}_1 X\) where you specify each term. (1 point)

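Answer: The fitted line is \(\widehat{Y} = 0.3031 + 0.2297 X\), where \(\widehat{Y}\) is the predicted chance of voting for Trump and \(X\) is a respondent’s score on the liberal-conservative scale. The intercept \(\widehat{\beta}_0 = 0.3031\) is the predicted chance of voting for Trump when the score is 0. The slope \(\widehat{\beta}_1 = 0.2297\) is the predicted change in that chance for a one-unit increase in the score.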

7. Assume we use the model fitted in Question 05. Now, use the fitted line to make some predictions.

Computing \(\widehat{Y}\) based on \(X\): Suppose that one scores at \(-1\) in the liberal-conservative scale. What is the predicted chance that this person will vote for Trump? Please show your calculations and then answer the question with a full sentence (including units of measurement). (0.5 points)

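Calculations: \(\widehat{Y} = 0.3031 + 0.2297 \times (-1) = 0.0734\)

Answer: A person who scores \(-1\) on the liberal-conservative scale has a predicted chance of voting for Trump of about 0.073, or roughly 7.3 percent.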

Computing \(\triangle \widehat{Y}\) based on \(\triangle X\): Suppose that a liberal person who initially scored -1 on the liberal-conservative scale watches only conservative news outlets for one year straight. She then revises some of her ideas and now scores 1 on the liberal-conservative scale. What would be our best guess of how much her predicted chance of voting for Trump changes due to this change in the liberal-conservative scale? Please show your calculations and then answer the question with a complete sentence (including units of measurement). (0.5 points)

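Calculations: \(\triangle \widehat{Y} = \widehat{\beta}_1 \times \triangle X = 0.2297 \times (1 - (-1)) = 0.4594\)

Answer: Moving from \(-1\) to \(1\) on the liberal-conservative scale is associated with an increase of about 45.9 percentage points in her predicted chance of voting for Trump.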
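
The two calculations in Question 7 can be checked in R. This is a minimal sketch that assumes the rounded coefficients printed in Question 05 (0.3031 and 0.2297); the names `b0`, `b1`, and `predict_trump` are illustrative and not part of the assignment.

```r
# Coefficients as printed by lm() in Question 05 (rounded by R's print method)
b0 <- 0.3031   # intercept
b1 <- 0.2297   # slope on lib_conserv_scale

# Hypothetical helper: predicted chance of voting for Trump at scale score x
predict_trump <- function(x) b0 + b1 * x

# Predicted chance for a respondent who scores -1 on the scale
y_hat_minus1 <- predict_trump(-1)    # 0.0734

# Change in the predicted chance when the score moves from -1 to 1
delta_y_hat <- b1 * (1 - (-1))       # 0.4594

print(c(y_hat_minus1 = y_hat_minus1, delta_y_hat = delta_y_hat))
```

With the fitted model object stored, `predict()` with a `newdata` data frame would give the same numbers without rounding.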

8. Assume we are still using the model fitted in Question 05. What is the \(R^2\) of that model? And how would you interpret it? (1 point)

R code:

# With a single predictor, R^2 equals the squared correlation between X and Y
cor(anes$lib_conserv_scale, anes$voted_trump)^2
## [1] 0.249724

Answer: The \(R^2\) is about 0.25. The liberal-conservative scale explains roughly 25% of the variation in the Trump vote in this sample, and the remaining 75% or so is left unexplained by the model.
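
Squaring the correlation works here because the model has a single predictor, in which case \(R^2\) equals the squared correlation between \(X\) and \(Y\). The sketch below illustrates this on simulated data only; `sim`, `x`, and `y` are hypothetical stand-ins, not ANES variables.

```r
# Simulated data (NOT the ANES data): a binary outcome and one numeric predictor
set.seed(30)
n <- 500
x <- rnorm(n)                            # stand-in for a standardized scale
p <- pmin(pmax(0.3 + 0.2 * x, 0), 1)     # success probability, clipped to [0, 1]
y <- rbinom(n, size = 1, prob = p)       # binary outcome drawn with probability p
sim <- data.frame(x = x, y = y)

# R^2 reported by the fitted simple regression
fit <- lm(y ~ x, data = sim)
r2_from_model <- summary(fit)$r.squared

# Squared correlation between predictor and outcome
r2_from_cor <- cor(sim$x, sim$y)^2

# The two agree up to floating-point error
print(isTRUE(all.equal(r2_from_model, r2_from_cor)))
```

Once a second predictor such as age is added, the shortcut no longer applies and `summary(fit)$r.squared` is the value to report.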

9. In Question 05, we fit a model that uses the liberal-conservative scale to predict the intentions to vote for Trump. Now, suppose that age can affect both the chance of voting for Trump and the liberal-conservative scale. Fit a new prediction model that controls for age. Is it true that older people tended to vote for Trump? Does your liberal-conservative scale do better or worse in this new model? How do you interpret the liberal-conservative scores after controlling for age? Explain. (1 point)

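R code:

# Add age as a control variable
lm(voted_trump ~ lib_conserv_scale + age, data = anes)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age  
##          0.198557           0.223668           0.002079

Answer: Yes, older respondents were slightly more likely to vote for Trump. The age coefficient is 0.002079, so holding the liberal-conservative score fixed, each additional year of age is associated with an increase of about 0.2 percentage points in the predicted chance of voting for Trump. The liberal-conservative coefficient barely moves, from 0.2297 in Question 05 to 0.2237 here, so it is only marginally smaller once age is controlled for. Comparing respondents of the same age, a one-unit increase on the scale is associated with an increase of about 22.4 percentage points in the predicted chance of voting for Trump.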
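
To put the Question 09 coefficients on a more readable scale, the sketch below compares the scale coefficient with and without the age control, and translates the age coefficient into a 20-year age gap. The numbers are copied from the printed outputs; the variable names and the 20-year gap are illustrative choices.

```r
# Coefficients copied from the printed model outputs
b_scale_simple <- 0.2297     # Question 05: scale only (as printed, rounded)
b_scale_age    <- 0.223668   # Question 09: scale, controlling for age
b_age          <- 0.002079   # Question 09: age

# Change in the scale coefficient after adding age to the model
scale_change <- b_scale_age - b_scale_simple

# Predicted difference between two respondents with the same scale score
# whose ages differ by 20 years (an arbitrary gap chosen for illustration)
age_gap_20 <- b_age * 20

print(round(c(scale_change = scale_change, age_gap_20 = age_gap_20), 4))
```

A 20-year age difference corresponds to roughly 4 percentage points in the predicted chance of voting for Trump, which is small next to the effect of a one-unit move on the liberal-conservative scale.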

10. In Question 09, we fit a model that uses the liberal-conservative scale and age to predict the intentions to vote for Trump. Many analysts said that, besides the person’s age, Latinx voters tended to support Trump in the 2016 election. This could confound the results: a relatively liberal young Latinx could vote for Trump because she could associate the liberals with radical policies. Is this true? Explain using a new regression model with age, the liberal-conservative scale, and the Latinx binary indicator. Moreover, how do you interpret the liberal-conservative scores after controlling for age and Latinx? (1 point)

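R code:

# Add the Latinx indicator as a second control variable
lm(voted_trump ~ lib_conserv_scale + age + latinx_voter, data = anes)
## 
## Call:
## lm(formula = voted_trump ~ lib_conserv_scale + age + latinx_voter, data = anes)
## 
## Coefficients:
##       (Intercept)  lib_conserv_scale                age       latinx_voter  
##          0.222532           0.222963           0.001836          -0.127574

Answer: The model does not support this claim. The coefficient on `latinx_voter` is -0.127574, so holding age and the liberal-conservative score fixed, Latinx respondents have a predicted chance of voting for Trump about 12.8 percentage points lower than non-Latinx respondents. The liberal-conservative coefficient is essentially unchanged (0.2230, versus 0.2237 in Question 09). Comparing respondents of the same age and Latinx status, a one-unit increase on the scale is associated with an increase of about 22.3 percentage points in the predicted chance of voting for Trump.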
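
The Latinx coefficient can be read as the gap between two otherwise identical respondents. The sketch below plugs the printed Question 10 coefficients into a hypothetical helper, `predict_q10`, for a relatively liberal 30-year-old; the score of -0.5 and the age of 30 are arbitrary values chosen for illustration.

```r
# Hypothetical helper built from the coefficients printed in Question 10
predict_q10 <- function(scale, age, latinx) {
  0.222532 + 0.222963 * scale + 0.001836 * age - 0.127574 * latinx
}

# Two illustrative respondents: same scale score and age, differing only in latinx
pred_non_latinx <- predict_q10(scale = -0.5, age = 30, latinx = 0)
pred_latinx     <- predict_q10(scale = -0.5, age = 30, latinx = 1)

# The gap equals the latinx_voter coefficient, whatever scale and age are chosen
gap <- pred_latinx - pred_non_latinx

print(round(c(non_latinx = pred_non_latinx, latinx = pred_latinx, gap = gap), 4))
```

Because the model has no interaction terms, this gap is the same at every age and scale score. Testing whether young, liberal Latinx respondents in particular behave differently would need interaction terms, which this model does not include.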

POLI 30 D

Axel Chavez
