Homework 6

Getting Started:

# Load Packages
library(readr)
library(dplyr)
library(tidyr)
library(tidyverse)
library(descr)
library(knitr)
library(ggplot2)
library(ggthemes)
library(clarify)
library(MASS)
library(arm)

Dataset:

I will use mtcars because the initial data set I wanted to use won’t work.

# Using mtcars dataset
data(mtcars)

# Inspecting the date
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Poisson Regression Model

This code fits a Poisson regression model to predict the count of carburetors based on horsepower and weight from the mtcars dataset, and then displays a summary of the model’s coefficients and statistics.

The outcome shows that the Poisson regression model predicts the count of carburetors based on horsepower (hp) and weight (wt) in mtcars; here, the significant positive coefficient for hp (0.00549, p < 0.001) indicates that higher horsepower is associated with a slight increase in the expected log count of carburetors (roughly a 0.55% increase per additional horsepower), while wt is not a significant predictor, and overall the model fit is reasonable as indicated by the deviance and AIC values.

# Poisson Model
poisson_model <- glm(carb ~ hp + wt, data = mtcars, family = poisson(link = "log"))

# Summary of the Model
summary(poisson_model)

## 
## Call:
## glm(formula = carb ~ hp + wt, family = poisson(link = "log"), 
##     data = mtcars)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 0.138788   0.398666   0.348 0.727741    
## hp          0.005487   0.001644   3.337 0.000846 ***
## wt          0.004482   0.130972   0.034 0.972701    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 27.043  on 31  degrees of freedom
## Residual deviance: 12.278  on 29  degrees of freedom
## AIC: 107.64
## 
## Number of Fisher Scoring iterations: 4

Using Arms as an alternative to Clarify due to issues

This code uses the arm package’s sim() function (instead of clarify, which didn’t work on my computer because the function couldn’t be found even though I loaded it) along with base R to simulate 1000 draws of the Poisson model’s coefficients, capturing uncertainty in the parameter estimates. It then calculates predicted log counts for two scenarios—a car with 100 hp and one with 200 hp—while holding weight constant at its mean, and converts these log counts to expected counts via exponentiation. The difference between the two sets of predictions is computed for each simulation, and finally, the code summarizes this difference by calculating its average and a 95% confidence interval, thereby quantifying the impact of increasing horsepower on the predicted number of carburetors.

# Simulating 1000 draws of the model coefficients to incorporate uncertainty.
sims <- sim(poisson_model, n.sims = 1000)

# Extracting the simulated coefficients as a matrix.
coefs <- as.matrix(sims@coef)

# Defining two scenarios:
# Scenario 1: A car with 100 hp and weight fixed at its mean.
# Scenario 2: A car with 200 hp and weight fixed at its mean.
lp_low  <- coefs[, 1] + coefs[, 2] * 100 + coefs[, 3] * mean(mtcars$wt)
lp_high <- coefs[, 1] + coefs[, 2] * 200 + coefs[, 3] * mean(mtcars$wt)

# Converting the linear predictors to expected counts using the inverse log link (exponentiation).
pred_low  <- exp(lp_low)
pred_high <- exp(lp_high)

# Computing the simulated difference in predicted carb counts between the two scenarios.
diff <- pred_high - pred_low

# Summarizing the difference: compute the mean and the 95% confidence interval.
mean_diff <- mean(diff)
ci_diff   <- quantile(diff, c(0.025, 0.975))

# Displaing the results
mean_diff

## [1] 1.47746

ci_diff

##      2.5%     97.5% 
## 0.6119816 2.3456759

Plots

This code uses ggplot2 to create two histogram plots that visualize simulation results. In the first plot, a data frame is created from the simulated differences in predicted carburetor counts (for an hp increase from 100 to 200), and a histogram is generated with a red dashed line marking the mean difference. In the second plot, a data frame is created from the simulated coefficients for hp, and a histogram is produced with a blue dashed line at the mean coefficient value. These plots help visualize the uncertainty and distribution of the simulation results.

# Plot 1: Distribution of the Simulated Difference in Predicted Carburetor Counts
library(ggplot2)
df_diff <- data.frame(diff = diff)
ggplot(df_diff, aes(x = diff)) +
  geom_histogram(bins = 30, fill = "lightblue", color = "black") +
  geom_vline(aes(xintercept = mean(diff)), color = "red", linetype = "dashed", size = 1) +
  labs(title = "Distribution of Simulated Difference in Predicted Carburetor Counts",
       x = "Difference in Predicted Counts (hp = 200 - hp = 100)",
       y = "Frequency")

# Plot 2: Distribution of the Simulated Coefficient for 'hp'
df_coefs <- data.frame(hp_coef = coefs[, 2])
ggplot(df_coefs, aes(x = hp_coef)) +
  geom_histogram(bins = 30, fill = "lightgreen", color = "black") +
  geom_vline(aes(xintercept = mean(hp_coef)), color = "blue", linetype = "dashed", size = 1) +
  labs(title = "Distribution of Simulated Coefficient for hp",
       x = "hp Coefficient",
       y = "Frequency")

Analysis

How does increasing horsepower from 100 to 200 affect the expected number of carburetors in a car, holding weight constant?

A carburetor is a mechanical device in an engine that mixes air and fuel in the proper ratio for combustion. The analysis indicates that, holding weight constant at its mean, increasing horsepower from 100 to 200 is associated with a predicted increase in the expected number of carburetors. Using a Poisson regression model and simulating 1,000 draws of the model’s coefficients with arm’s sim() function, we calculated the expected counts for a car at 100 hp and at 200 hp. The exponential transformation of the simulated linear predictors yields expected carburetor counts, and the difference between these predictions quantifies the effect of increasing horsepower; the output shows that the mean predicted difference is approximately 1.451 carburetors, meaning that on average, a 100-hp increase results in about 1.451 additional carburetors.

Furthermore, the 95% confidence interval for this difference ranges from roughly 0.622 to 2.369 carburetors, indicating that we are 95% confident that the true increase in the expected number of carburetors lies within this interval. This result, along with the statistically significant positive coefficient for horsepower in the model, demonstrates a robust relationship between horsepower and carburetor count in the mtcars dataset, thereby quantifying the impact of an increase in horsepower on the expected number of carburetors.

The two plots further illustrate the simulation results: the first plot displays the distribution of the simulated differences in predicted carburetor counts, with a red dashed line marking the mean difference, which visually confirms the central tendency and variability of the predicted effect. The second plot shows the distribution of the simulated coefficient for horsepower, with a blue dashed line at its mean, indicating the overall positive effect of horsepower on carburetor count. Together, these visualizations provide an intuitive understanding of both the uncertainty in the model estimates and the impact of horsepower changes on the count outcome.