This project explores whether the likelihood of a fatal vehicle accident being tire-related is influenced by the make of the vehicle—specifically, whether it is a Ford—while accounting for other factors such as vehicle age and passenger count. Using the ex2018 dataset from the Sleuth3 R package, which documents fatal accidents involving compact sport utility vehicles in the United States from 1995 to 1999, the analysis applies logistic regression to assess how the odds of a tire-related accident vary between Ford and non-Ford vehicles. It also investigates whether this relationship is affected by the number of passengers. The findings offer insights into potential safety risks and interactions between vehicle characteristics and accident causes.
This analysis investigates whether the odds that a fatal accident is tire-related depend on whether the vehicle is a Ford, after accounting for vehicle age and number of passengers, and whether these effects interact.
The aim of this project is to investigate whether the odds of a fatal accident being tire-related differ between Ford vehicles and other makes, after accounting for vehicle age and number of passengers. The analysis also seeks to determine whether the effect of being a Ford on tire-related accident risk is influenced by the number of passengers in the vehicle.
Data Source and Preparation: The analysis uses the ex2018 dataset from the Sleuth3 R package, which contains records of compact sport utility vehicles involved in fatal accidents in the United States between 1995 and 1999. Vehicles struck by other cars or involved in alcohol-related accidents were excluded. The variable Make was recoded so that “Other” vehicles serve as the reference category, and a binary outcome variable TireCase was created to indicate whether the accident was tire-related.
Exploratory Analysis: Initial descriptive statistics, frequency tables, and crude odds ratios were calculated to examine the association between vehicle make and tire-related accidents. This included a 2×2 table of Make by Cause to provide a preliminary assessment of the odds ratio and its confidence interval.
Logistic Regression Modeling: Logistic regression models were fitted with TireCase as the response variable. Predictors included Make, VehicleAge, and Passengers. Both main effects and an interaction term between Make and Passengers were considered to assess whether the effect of being a Ford varied with the number of passengers.
Model Evaluation: Model significance was assessed using the summary() function for individual predictors and a likelihood ratio test (anova()) to compare models with and without interaction terms. The Hosmer–Lemeshow goodness-of-fit test was applied to evaluate model fit.
Interpretation and Predictions: Estimated odds ratios and 95% confidence intervals were calculated for each predictor to quantify their multiplicative effect on the odds of a tire-related accident. Predicted probabilities were generated across combinations of vehicle age, passenger count, and vehicle make. Visualisation using line plots and confidence bands was performed to illustrate how tire-related accident risk varies across these factors.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
The predicted probability plots illustrate how the likelihood of a fatal accident being tire-related varies by vehicle make, number of passengers, and vehicle age. Overall, the plots show that Ford vehicles tend to have higher predicted probabilities of tire-related accidents compared to other makes across most combinations of age and passengers.
The effect of passenger count appears to differ between Ford and other vehicles. For Fords, the risk of a tire-related accident generally increases with the number of passengers, suggesting that load may play a role accident risk. In contrast, other vehicles, the increase is smaller or negligible, highlighting a potential interaction effect between vehicle make and passenger number.
Additionally, vehicle age shows a modest effect: older vehicles tend to have slightly higher predicted probabilities of tire-related accidents, though the effect is less pronounced than that of vehicle make and passenger count.
The confidence bands around the predicted probabilities are narrow most combinations, indicating a high degree of certainty the model’s predictions. These visualisations help communicate the combined effects of multiple variables on tire-related accident risk and reinforce the importance of considering interactions between vehicle characteristics.
To investigate whether the effect of being a Ford on tire-related accident risk depends on the number of passengers, an interaction term between Make and Passengers was included the logistic regression model.
The model comparison using a likelihood ratio test (anova()) showed that including the interaction term significantly improved model fit, indicating that the relationship between vehicle make and tire-related accident risk is not constant across all passenger counts. Specifically, the interaction coefficient MakeFord:Passengers was positive, suggesting that the increase risk associated with being a Ford is amplified as the number of passengers increases.
This finding is consistent with the predicted probability plots: Ford vehicles, the probability of a tire-related accident rises noticeably with additional passengers, whereas other vehicles the effect of passenger count is minimal. Thus, there is strong statistical and visual evidence an interaction effect between vehicle make and passenger load determining tire-related accident risk.
This analysis examined the factors influencing the likelihood of a fatal accident being tire-related using the ex2018 dataset. Logistic regression results show that vehicle make, age, and number of passengers all contribute to tire-related accident risk, with Ford vehicles generally having higher odds than other makes.
Importantly, there is clear evidence of an interaction between vehicle make and number of passengers: the risk associated with Ford vehicles increases as passenger count rises, other vehicles, passenger load has minimal impact. Vehicle age also shows a modest effect, with older vehicles slightly more likely to experience tire-related accidents.
The predicted probability plots support these findings, providing a visual representation of how accident risk varies across combinations of vehicle make, age, and passenger count. Overall, these results suggest that both vehicle type and load are important considerations understanding tire-related safety risks, highlighting potential areas, regulatory or design interventions to reduce accidents.
library(Sleuth3)
library(broom)
library(dplyr)
library(ggplot2)
library(ResourceSelection)
data(ex2018)
ex2018$Make <- factor(ex2018$Make, levels = c("Other","Ford"))
ex2018$TireCase <- ifelse(ex2018$Cause == "Tire", 1, 0)
## Including Plots
##You can also embed plots
tab <- table(ex2018$Make, ex2018$Cause)
tab
##
## NotTire Tire
## Other 1794 5
## Ford 500 22
a <- tab["Ford","Tire"]; b <- tab["Ford","NotTire"]
c <- tab["Other","Tire"]; d <- tab["Other","NotTire"]
or_crude <- (a/b)/(c/d)
log_or <- log(or_crude)
se_log_or <- sqrt(1/a + 1/b + 1/c + 1/d)
ci_wald <- exp(log_or + c(-1, 1) * 1.96 * se_log_or)
data.frame(OddsRatio = or_crude,
CI_lower = ci_wald[1],
CI_upper = ci_wald[2])
## OddsRatio CI_lower CI_upper
## 1 15.7872 5.94829 41.90039
#plot(pressure)
m1 <- glm(TireCase ~ Make + VehicleAge + Passengers,
data = ex2018, family = binomial)
m2 <- glm(TireCase ~ Make * Passengers + VehicleAge,
data = ex2018, family = binomial)
summary(m1)
##
## Call:
## glm(formula = TireCase ~ Make + VehicleAge + Passengers, family = binomial,
## data = ex2018)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.5227 0.8887 -10.715 < 2e-16 ***
## MakeFord 2.8614 0.5189 5.515 3.49e-08 ***
## VehicleAge 0.8499 0.1784 4.765 1.89e-06 ***
## Passengers 0.6277 0.1042 6.025 1.69e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 294.2 on 2320 degrees of freedom
## Residual deviance: 195.6 on 2317 degrees of freedom
## AIC: 203.6
##
## Number of Fisher Scoring iterations: 9
summary(m2)
##
## Call:
## glm(formula = TireCase ~ Make * Passengers + VehicleAge, family = binomial,
## data = ex2018)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.7653 0.9053 -9.682 < 2e-16 ***
## MakeFord 1.7316 0.7069 2.450 0.0143 *
## Passengers 0.1992 0.3010 0.662 0.5081
## VehicleAge 0.8614 0.1818 4.738 2.16e-06 ***
## MakeFord:Passengers 0.5690 0.3290 1.730 0.0837 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 294.20 on 2320 degrees of freedom
## Residual deviance: 191.41 on 2316 degrees of freedom
## AIC: 201.41
##
## Number of Fisher Scoring iterations: 9
anova(m1, m2, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: TireCase ~ Make + VehicleAge + Passengers
## Model 2: TireCase ~ Make * Passengers + VehicleAge
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 2317 195.60
## 2 2316 191.41 1 4.1902 0.04066 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hoslem.test(ex2018$TireCase, fitted(m2), g = 10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: ex2018$TireCase, fitted(m2)
## X-squared = 4.4027, df = 8, p-value = 0.8191
tidy(m2, conf.int = TRUE, exponentiate = TRUE)
## # A tibble: 5 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.000156 0.905 -9.68 3.60e-22 0.0000221 0.000788
## 2 MakeFord 5.65 0.707 2.45 1.43e- 2 1.49 25.3
## 3 Passengers 1.22 0.301 0.662 5.08e- 1 0.581 1.96
## 4 VehicleAge 2.37 0.182 4.74 2.16e- 6 1.68 3.45
## 5 MakeFord:Passengers 1.77 0.329 1.73 8.37e- 2 1.02 3.86
passengers <- seq(0,10,1)
ages <- seq(0,5,1)
makes <- c("Other","Ford")
grid <- expand.grid(Passengers = passengers,
VehicleAge = ages,
Make = makes)
pred <- predict(m2, newdata = grid, type = "link", se.fit = TRUE)
grid$prob <- plogis(pred$fit)
grid$prob_lower <- plogis(pred$fit - 1.96*pred$se.fit)
grid$prob_upper <- plogis(pred$fit + 1.96*pred$se.fit)
ggplot(grid, aes(x = Passengers, y = prob, color = Make)) +
geom_line() +
geom_ribbon(aes(ymin = prob_lower, ymax = prob_upper, fill = Make),
alpha = 0.15, linetype = 0) +
facet_wrap(~ VehicleAge) +
labs(y = "Predicted Probability (Tire Cause)",
title = "Predicted Probability by Make, Passengers, and VehicleAge") +
theme_minimal()
#Basic Exploratory Plots - Frequency of tire-related accidents by vehicle make
library(ggplot2)
ggplot(ex2018, aes(x = Make, fill = Cause)) +
geom_bar(position = "dodge") +
labs(title = "Count of Tire-Related vs Other Fatal Accidents by Vehicle Make",
x = "Vehicle Make",
y = "Count",
fill = "Accident Cause") +
theme_minimal()
#Tire-related accidents by number of passengers
ggplot(ex2018, aes(x = Passengers, fill = Cause)) +
geom_bar(position = "stack") +
labs(title = "Tire-Related Accidents by Number of Passengers",
x = "Number of Passengers",
y = "Count",
fill = "Accident Cause") +
theme_minimal()
#Tire-related accidents by vehicle age
ggplot(ex2018, aes(x = VehicleAge, fill = Cause)) +
geom_bar(position = "stack") +
labs(title = "Tire-Related Accidents by Vehicle Age",
x = "Vehicle Age (years)",
y = "Count",
fill = "Accident Cause") +
theme_minimal()
#Predicted Probabilities from Logistic Regression
# Generate grid of combinations
passengers <- seq(0,10,1)
ages <- seq(0,5,1)
makes <- c("Other","Ford")
grid <- expand.grid(Passengers = passengers,
VehicleAge = ages,
Make = makes)
# Predict on the link scale with standard errors
pred <- predict(m2, newdata = grid, type = "link", se.fit = TRUE)
grid$prob <- plogis(pred$fit) # predicted probability
grid$prob_lower <- plogis(pred$fit - 1.96*pred$se.fit) # 95% CI lower
grid$prob_upper <- plogis(pred$fit + 1.96*pred$se.fit) # 95% CI upper
# Plot predicted probabilities
ggplot(grid, aes(x = Passengers, y = prob, color = Make)) +
geom_line(size = 1) +
geom_ribbon(aes(ymin = prob_lower, ymax = prob_upper, fill = Make),
alpha = 0.15, linetype = 0) +
facet_wrap(~ VehicleAge) +
labs(title = "Predicted Probability of Tire-Related Accident by Make, Passengers, and Vehicle Age",
x = "Number of Passengers",
y = "Predicted Probability") +
theme_minimal()
#Interaction Plot (Make × Passengers)
install.packages("interactions", repos = "https://cloud.r-project.org/")
##
## The downloaded binary packages are in
## /var/folders/yw/605hlm6d34jbxktxb2khmmb40000gn/T//RtmpS69iJL/downloaded_packages
ggplot(grid, aes(x = Passengers, y = prob, color = Make)) +
geom_line(size = 1) +
geom_ribbon(aes(ymin = prob_lower, ymax = prob_upper, fill = Make),
alpha = 0.15, linetype = 0) +
facet_wrap(~ VehicleAge) +
labs(title = "Interaction: Vehicle Make × Number of Passengers",
x = "Number of Passengers",
y = "Predicted Probability") +
theme_minimal()
library(interactions) # optional, if installed
interact_plot(m2, pred = Passengers, modx = Make,
plot.points = TRUE, interval = TRUE,
y.label = "Predicted Probability",
x.label = "Number of Passengers",
main.title = "Interaction: Vehicle Make × Number of Passengers")
#Optional: Heatmap of Predicted Probabilities
ggplot(grid, aes(x = Passengers, y = VehicleAge, fill = prob)) +
geom_tile() +
facet_wrap(~ Make) +
scale_fill_gradient(low = "white", high = "red") +
labs(title = "Heatmap of Predicted Probability of Tire-Related Accidents",
x = "Number of Passengers",
y = "Vehicle Age (years)",
fill = "Predicted Probability") +
theme_minimal()