Gil_Raitses

HW Week 9. Exercises 6 and 7 on page 234.

6. Continue the analysis of the Chile data set described in this chapter. The data set is in the “car” package, so you will have to install.packages() and library() that package first, and then use the data(Chile) command to get access to the data set. Pay close attention to the transformations needed to isolate cases with the Yes and No votes as shown in this chapter. Add a new predictor, statusquo, into the model and remove the income variable. Your new model specification should be vote ~ age + statusquo.

The statusquo variable is a rating that each respondent gave indicating whether they preferred change or maintaining the status quo. Conduct general linear model and Bayesian analysis on this model and report and interpret all relevant results. Compare the AIC from this model to the AIC from the model that was developed in the chapter (using income and age as predictors).

# Install and load the car package
if (!requireNamespace("car", quietly = TRUE)) {
  install.packages("car")
}
library(car)

## Loading required package: carData

data(Chile)

# Isolate cases with Yes and No votes
Chile <- Chile[Chile$vote %in% c("Y", "N"), ]

# Add the statusquo variable and remove the income variable
Chile$statusquo <- as.numeric(Chile$statusquo)
Chile <- Chile[, c("vote", "age", "statusquo")]

# Convert vote to a binary factor
Chile$vote <- as.factor(Chile$vote)

# Perform a general linear model analysis:
model_glm <- glm(vote ~ age + statusquo, data = Chile, family = binomial)

# Summarize the model
summary(model_glm)

## 
## Call:
## glm(formula = vote ~ age + statusquo, family = binomial, data = Chile)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.199874   0.265760  -0.752   0.4520    
## age          0.011305   0.006723   1.682   0.0927 .  
## statusquo    3.195761   0.143291  22.303   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2431.28  on 1753  degrees of freedom
## Residual deviance:  749.76  on 1751  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 755.76
## 
## Number of Fisher Scoring iterations: 6

# Each additional year of age increases the odds of voting “Yes” by approximately 1.1%, while each one-unit increase in the statusquo rating multiplies the odds of voting “Yes” by about 24.43 times. This indicates a strong positive effect of preferring the status quo on the likelihood of voting “Yes.”

# Conduct Bayesian analysis using the BayesFactor package
if (!requireNamespace("BayesFactor", quietly = TRUE)) {
  install.packages("BayesFactor")
}
library(BayesFactor)

## Loading required package: coda

## Loading required package: Matrix

## ************
## Welcome to BayesFactor 0.9.12-4.7. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************

# Handle missing values by removing rows with any missing values
Chile <- na.omit(Chile)

# Convert vote to numeric for Bayesian analysis
Chile$vote_numeric <- ifelse(Chile$vote == "Y", 1, 0)

# Bayesian analysis using the BayesFactor package
bf_model <- regressionBF(vote_numeric ~ age + statusquo, data = Chile)

# Display the Bayes factor results
bf_model

## Bayes factor analysis
## --------------
## [1] age             : 28282797      ±0.01%
## [2] statusquo       : 1.871219e+496 ±0.01%
## [3] age + statusquo : 8.495415e+494 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

# The Bayesian analysis shows very strong evidence for the influence of age on voting behavior, with a Bayes factor of approximately 2.83 x 10^7 compared to the null model (BF = 28282797 ±0.01%).

# The statusquo predictor has an overwhelmingly larger impact, with a Bayes factor of approximately 1.87 x 10^496 (BF = 1.871219e+496 ±0.01%).

# The combined model of age and statusquo is highly supported over the null model (BF = 8.495415e+494 ±0%).

# Comparison of AIC values between the new model (age + statusquo) and the original model (age + income)
# Reload the original dataset and add the income variable back
data(Chile)

# Isolate cases with Yes and No votes
Chile <- Chile[Chile$vote %in% c("Y", "N"), ]

# Handle missing values by removing rows with any missing values
Chile <- na.omit(Chile)

# Convert vote to a binary factor
Chile$vote <- as.factor(Chile$vote)

# Original model with income
model_original <- glm(vote ~ age + income, data = Chile, family = binomial)

# New model with statusquo
model_new <- glm(vote ~ age + statusquo, data = Chile, family = binomial)

# Compare AIC values
AIC(model_original, model_new)

##                df       AIC
## model_original  3 2332.0294
## model_new       3  740.5207

# The AIC for the original model (age + income) is 2332.0294.
# The AIC for the new model (age + statusquo) is 740.5207.
# The significantly lower AIC value for the new model indicates a much better fit to the data.
# This suggests that the statusquo variable is a much stronger predictor of voting behavior than income.
# The new model, which includes age and statusquo as predictors, is therefore preferred.
# Overall, the new model explains the variability in voting behavior more effectively than the original model.

7. Bonus R code question: Develop your own custom function that will take the posterior distribution of a coefficient from the output object from an MCMClogit() analysis and automatically create a histogram of the posterior distributions of the coefficient in terms of regular odds (instead of log‐odds). Make sure to mark vertical lines on the histogram indicating the boundaries of the 95% HDI.

# Load the necessary package
if (!requireNamespace("coda", quietly = TRUE)) {
  install.packages("coda")
}
library(coda)

# Custom function to plot posterior distributions of coefficients in terms of regular odds
plot_posterior_odds <- function(mcmc_output, coef_name) {
  # Extract the posterior samples for the specified coefficient
  posterior_samples <- as.mcmc(mcmc_output[[1]])[, coef_name]
  
  # Convert log-odds to regular odds
  odds_samples <- exp(posterior_samples)
  
  # Calculate the 95% HDI
  hdi <- HPDinterval(as.mcmc(odds_samples), prob = 0.95)
  
  # Plot the histogram of the posterior odds
  hist(odds_samples, breaks = 50, main = paste("Posterior distribution of", coef_name, "in terms of odds"), 
       xlab = "Odds", ylab = "Frequency", col = "lightblue", border = "black")
  
  # Add vertical lines for the 95% HDI
  abline(v = hdi, col = "red", lwd = 2)
  abline(v = median(odds_samples), col = "blue", lwd = 2)
  
  # Add legend
  legend("topright", legend = c("95% HDI", "Median"), col = c("red", "blue"), lwd = 2)
  
  return(hdi)
}
# hdi_result <- plot_posterior_odds(mcmc_output, "age")

# This function accepts the following arguments:
# - mcmc_output: The output object from an MCMClogit() analysis containing the MCMC samples.
# - coef_name: The name of the coefficient for which the posterior distribution will be plotted.

# The function will create a histogram of the posterior distributions of the specified coefficient in terms of regular odds, with vertical lines marking the 95% Highest Density Interval (HDI) and the median of the distribution.

Gil_Raitses_HW9

Gil Raitses

2024-06-27