This is the Hibbs assignment answer key, but it’s not the only way to do this assignment. There may be variations, as many of you have already done.

Load data

We will begin by loading the libraries (if they haven’t been done already). Then we will load the data using the link given in the assignment.

library("rstanarm")
## Loading required package: Rcpp
## This is rstanarm version 2.32.1
## - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
## - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
## - For execution on a local, multicore CPU with excess RAM we recommend calling
##   options(mc.cores = parallel::detectCores())
library("arm")
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme4
## 
## arm (Version 1.14-4, built: 2024-4-1)
## Working directory is C:/Users/rafsa/OneDrive/Documents/R practice/Latest assignment practice
## 
## Attaching package: 'arm'
## The following objects are masked from 'package:rstanarm':
## 
##     invlogit, logit
library("ggplot2")
library("bayesplot")
## This is bayesplot version 1.11.1
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
##    * Does _not_ affect other ggplot2 plots
##    * See ?bayesplot_theme_set for details on theme setting
theme_set(bayesplot::theme_default(base_family = "sans"))
hibbs <- read.table(url("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat"),
                    header = TRUE)
head(hibbs)
##   year growth  vote inc_party_candidate other_candidate
## 1 1952   2.40 44.60           Stevenson      Eisenhower
## 2 1956   2.89 57.76          Eisenhower       Stevenson
## 3 1960   0.85 49.91               Nixon         Kennedy
## 4 1964   4.21 61.34             Johnson       Goldwater
## 5 1968   3.02 49.60            Humphrey           Nixon
## 6 1972   3.62 61.79               Nixon        McGovern

Now that we have loaded our data, let us add two rows for 2016 and 2020 in the dataset.

newrow <- data.frame(year=2016, growth=1.07, vote=48.2,
inc_party_candidate = "Clinton",
other_candidate = "Trump")
updated.data <- rbind(hibbs, newrow)
newrow2 <- data.frame(year=2020, growth=1.56, vote=46.86,
inc_party_candidate = "Trump",
other_candidate = "Biden")
updated.data <- rbind(hibbs, newrow, newrow2)
tail(updated.data, 18)
##    year growth  vote inc_party_candidate other_candidate
## 1  1952   2.40 44.60           Stevenson      Eisenhower
## 2  1956   2.89 57.76          Eisenhower       Stevenson
## 3  1960   0.85 49.91               Nixon         Kennedy
## 4  1964   4.21 61.34             Johnson       Goldwater
## 5  1968   3.02 49.60            Humphrey           Nixon
## 6  1972   3.62 61.79               Nixon        McGovern
## 7  1976   1.08 48.95                Ford          Carter
## 8  1980  -0.39 44.70              Carter          Reagan
## 9  1984   3.86 59.17              Reagan         Mondale
## 10 1988   2.27 53.94           Bush, Sr.         Dukakis
## 11 1992   0.38 46.55           Bush, Sr.         Clinton
## 12 1996   1.04 54.74             Clinton            Dole
## 13 2000   2.36 50.27                Gore       Bush, Jr.
## 14 2004   1.72 51.24           Bush, Jr.           Kerry
## 15 2008   0.10 46.32              McCain           Obama
## 16 2012   0.95 52.00               Obama          Romney
## 17 2016   1.07 48.20             Clinton           Trump
## 18 2020   1.56 46.86               Trump           Biden

The new rows have been added using the codes above. This should slightly change our linear regression model. Below is the new model summary.

M2 <- stan_glm(vote ~ growth, data = updated.data, refresh = 0)
print(M2)
## stan_glm
##  family:       gaussian [identity]
##  formula:      vote ~ growth
##  observations: 18
##  predictors:   2
## ------
##             Median MAD_SD
## (Intercept) 45.9    1.5  
## growth       3.1    0.7  
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 3.8    0.7   
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
summary(M2)
## 
## Model Info:
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      vote ~ growth
##  algorithm:    sampling
##  sample:       4000 (posterior sample size)
##  priors:       see help('prior_summary')
##  observations: 18
##  predictors:   2
## 
## Estimates:
##               mean   sd   10%   50%   90%
## (Intercept) 45.9    1.6 43.9  45.9  47.9 
## growth       3.1    0.7  2.1   3.1   4.0 
## sigma        3.9    0.8  3.1   3.8   4.9 
## 
## Fit Diagnostics:
##            mean   sd   10%   50%   90%
## mean_PPD 51.5    1.3 49.9  51.5  53.2 
## 
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
## 
## MCMC diagnostics
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  2400 
## growth        0.0  1.0  2749 
## sigma         0.0  1.0  1994 
## mean_PPD      0.0  1.0  3228 
## log-posterior 0.0  1.0  1159 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

We can graph the new data into a scatter-plot model.

plot(x = updated.data$growth, y=updated.data$vote,
xlab = "Personal Income Growth", ylab = "Incumbent Vote Share")
abline(h=50, lty=2) # Horizontal line at 50
abline(v=0, lty=2) # Vertical line at 0

We can plot the regression line as well.

par(mar=c(3,3,2,.1), mgp=c(1.7,.5,0), tck=-.01)
plot(c(-.7, 4.5), c(43,63), type="n",
xlab="Average recent growth in personal income",
ylab="Incumbent party's vote share",
xaxt="n", yaxt="n", mgp=c(2,.5,0),
main="Data and linear fit", bty="l")
# Add custom axes
axis(1, 0:4, paste(0:4,"%",sep=""), mgp=c(2,.5,0))
axis(2, seq(45,60,5), paste(seq(45,60,5),"%",sep=""), mgp=c(2,.5,0))
# Input the graph elements
with(updated.data, points(growth, vote, pch=20))
abline(50, 0, lwd=.5, col="gray")
abline(coef(M2), col="gray15")
# Add the regression equation
text(2.7, 53.5, paste("y =", fround(coef(M2)[1],1), "+", fround(coef(M2)
[2],1), "x"), adj=0, col="gray15")

Here we see in the mean column that the intercept estimate is 45.9, and the slope is 3.1. This means that for every one unit increase in growth, we predict a 3.1 point increase in the vote share for the incumbent (using data from 1952–2020). The change in y-intercept does not matter much for this regression. However, 0.1 increase in the slope compared to the previous model means that 1% personal income increase would rise incumbent vote share 0.1% more than our previously given model. This is the effect of the two new observations. (Answer to question 2)

Point predictions for 2024

Using this new model, we predict the expected vote percentage for the incumbent in the 2024 Presidential election, by plotting a growth rate. Many of you have opted for different methods of retrieving this growth rate. You can check for the most recent stats, or make an average for the four years. We are not particularly looking for who got the best method, but it’s more of an exercise of brainstorming what makes sense. I am just randomly going with a number for now. This is the answer for Q 1, including the new rows for 2016 and 2020 that we gave in the beginning.

new <- data.frame(growth=2.0)
print(y_point_pred <- predict(M2, newdata=new))
##        1 
## 52.06285

This new model says that Harris is supposed to get 52% of the popular vote if we plot 1.5 growth rate. We can graph it in many ways including below (answer to Q 3).

par(mar=c(3,3,3,1), mgp=c(1.7,.5,0), tck=-.01)
mu <- 52.024
sigma <- 3.9
curve (dnorm(x,mu,sigma), ylim=c(0,.103), from=35, to=70, bty="n",
xaxt="n", yaxt="n", yaxs="i",
xlab="Harris share of the two-party vote", ylab="",
main="Probability forecast of Kamala Harris vote share in 2024,\
nbased on 1.5% rate of economic growth", cex.main=.9)
x <- seq (50,65,.1)
polygon(c(min(x),x,max(x)), c(0,dnorm(x,mu,sigma),0),
col="darkgray", border="black")
axis(1, seq(40,65,5), paste(seq(40,65,5),"%",sep=""))
text(50, .025, "Predicted\n52% chance\nof Harris victory", adj=0)

The final question is about the strengths and weaknesses of the analysis. Given that the correlation of these two data has always been positive for almost 70 years with very few outliers, it gives this analysis some credibility. It also allows politicians to understand the public mindset, to whom personal income growth seems to be a large priority and a deciding factor. It also corroborates the famous quote by Jim Carville: “It’s the economy, stupid.” Many of you gave some fantastic answers, such as the simplicity of the model, how easily it can be shown, explained, and communicated, reliability of the data, and the time-length of the data.

The weakness of the model is taht the analysis does not consider any other dependent variable that may have been a confounding factor. Many factors decide public perception and popular vote, so relying only on one dependent variable gives an incomplete picture. Most of you have pointed this out. You guys also mentioned that the model shifts quickly when we change the personal income growth rate for 2024, where the growth rate is an estimation itself. It also does not account for anomalies or uniqueness like COVID-19 or Presidential candidate changing so late into the campaign.