Are newer vehicles better at ameliorating collision damage?

Introduction

The Montgomery County, Maryland crash reporting dataset (Crash Reporting - Drivers Data) is provided by the Maryland State Police. It contains information on motor vehicles involved in traffic collisions on roadways within Montgomery County. Source: https://catalog.data.gov/dataset/crash-reporting-drivers-data

Key Variables:
- Driver at fault
- Vehicle damage extent
- Vehicle year

Data Analysis

I will filter the dataset to at-fault drivers, then group those records by vehicle year and damage extent to compare the damage extent in newer versus older vehicles. The grouped counts will be used to create a proportional stacked area graph.

Load libraries and dataset

library(tidyverse)
library(forcats)
library(RColorBrewer)
crashes <- read_csv("crash_reporting.csv")

Check structure and head

#str(crashes)   #commented for knitting 
head(crashes)
## # A tibble: 6 × 39
##   `Report Number` `Local Case Number` `Agency Name` `ACRS Report Type`   
##   <chr>                         <dbl> <chr>         <chr>                
## 1 MCP3296002G               240018653 MONTGOMERY    Property Damage Crash
## 2 MCP276700BF               240012321 MONTGOMERY    Property Damage Crash
## 3 MCP32790038               240022955 MONTGOMERY    Property Damage Crash
## 4 MCP34000014               240019831 MONTGOMERY    Property Damage Crash
## 5 MCP3341003F               240011829 MONTGOMERY    Injury Crash         
## 6 MCP3118004Q               240025609 MONTGOMERY    Injury Crash         
## # ℹ 35 more variables: `Crash Date/Time` <chr>, `Route Type` <chr>,
## #   `Road Name` <chr>, `Cross-Street Name` <chr>, `Off-Road Description` <chr>,
## #   Municipality <chr>, `Related Non-Motorist` <chr>, `Collision Type` <chr>,
## #   Weather <chr>, `Surface Condition` <chr>, Light <chr>,
## #   `Traffic Control` <chr>, `Driver Substance Abuse` <chr>,
## #   `Non-Motorist Substance Abuse` <chr>, `Person ID` <chr>,
## #   `Driver At Fault` <chr>, `Injury Severity` <chr>, Circumstance <chr>, …

Clean variable names and data

names(crashes) <- gsub(" ", "_", names(crashes))    #sub spaces w/ underscores
names(crashes) <- tolower(names(crashes))      #variable names lowercase
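The two calls above only replace spaces, so names such as `crash_date/time` and `cross-street_name` keep their punctuation and still need backticks. A minimal base-R sketch of a stricter cleanup follows; `clean_names_basic` is a hypothetical helper name, not part of the original analysis, and the example names are taken from the column listing above.

```r
# Hypothetical helper (not in the original analysis): lowercase the names and
# replace each run of non-alphanumeric characters with a single underscore.
clean_names_basic <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)   # spaces, "/" and "-" all become "_"
  gsub("^_+|_+$", "", x)            # trim leading or trailing underscores
}

example_names <- c("Crash Date/Time", "Cross-Street Name", "Driver At Fault")
cleaned <- clean_names_basic(example_names)
print(cleaned)
```

Applied to the real data this would be `names(crashes) <- clean_names_basic(names(crashes))`; the rest of this report does not depend on it, since the three key variables contain only letters and spaces.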

Replace missing values with NAs

crashes$vehicle_damage_extent[crashes$vehicle_damage_extent %in% c("Vehicle Not at Scene", "UNKNOWN", "N/A", "OTHER")] <- NA 

crashes$vehicle_year[crashes$vehicle_year == 0] <- NA

Filter out implausible vehicle years (and limit the range to 1975-2025, because older model years have too few records to compare)

crashes <- crashes |>
  filter(between(vehicle_year, 1975, 2025))   # also drops rows where vehicle_year is NA
head(crashes)
## # A tibble: 6 × 39
##   report_number local_case_number agency_name acrs_report_type `crash_date/time`
##   <chr>                     <dbl> <chr>       <chr>            <chr>            
## 1 MCP3296002G           240018653 MONTGOMERY  Property Damage… 04/21/2024 06:53…
## 2 MCP32790038           240022955 MONTGOMERY  Property Damage… 05/15/2024 07:30…
## 3 MCP34000014           240019831 MONTGOMERY  Property Damage… 04/28/2024 05:30…
## 4 MCP3341003F           240011829 MONTGOMERY  Injury Crash     03/12/2024 07:30…
## 5 MCP33250032           240020414 MONTGOMERY  Property Damage… 05/01/2024 07:47…
## 6 EJ7904000T            240020913 GAITHERSBU… Property Damage… 05/04/2024 05:44…
## # ℹ 34 more variables: route_type <chr>, road_name <chr>,
## #   `cross-street_name` <chr>, `off-road_description` <chr>,
## #   municipality <chr>, `related_non-motorist` <chr>, collision_type <chr>,
## #   weather <chr>, surface_condition <chr>, light <chr>, traffic_control <chr>,
## #   driver_substance_abuse <chr>, `non-motorist_substance_abuse` <chr>,
## #   person_id <chr>, driver_at_fault <chr>, injury_severity <chr>,
## #   circumstance <chr>, driver_distracted_by <chr>, …
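Note that this step removes more than out-of-range years: `filter()` discards rows where the condition evaluates to `NA`, so vehicles whose year was set to `NA` earlier are dropped as well. A small sketch on made-up years (not values from the dataset), using dplyr's `between()` as an equivalent way to express the 1975-2025 range check:

```r
library(dplyr)

# Made-up years for illustration only
toy <- tibble(vehicle_year = c(1960, 1975, 2010, 2025, 2030, NA))

# between() is inclusive on both ends; NA years yield NA and are dropped by filter()
kept <- toy |>
  filter(between(vehicle_year, 1975, 2025))

dropped_na <- sum(is.na(toy$vehicle_year))   # rows lost only because the year is missing

print(kept$vehicle_year)
print(dropped_na)
```

Counting the `NA` rows before filtering, as in `dropped_na`, is one way to report how much data is lost to missing years rather than to implausible ones.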

Convert damage extent to a factor

crashes <- crashes |>
  mutate(damage_factor = factor(vehicle_damage_extent))
levels(crashes$damage_factor)
## [1] "DESTROYED"   "Disabling"   "DISABLING"   "Functional"  "FUNCTIONAL" 
## [6] "No Damage"   "NO DAMAGE"   "Superficial" "SUPERFICIAL"

Combine duplicate factor levels and reorder

crashes <- crashes |>
  mutate(damage_recoded = fct_recode(damage_factor,
                                     "Destroyed" = "DESTROYED",
                                     "Disabling" = "DISABLING",
                                     "Functional" = "FUNCTIONAL",
                                     "Superficial" = "SUPERFICIAL",
                                     "No Damage" = "NO DAMAGE"))
crashes$damage_recoded <- factor(crashes$damage_recoded, levels = c("Destroyed", "Disabling", "Functional", "Superficial", "No Damage"))
table(crashes$damage_recoded)
## 
##   Destroyed   Disabling  Functional Superficial   No Damage 
##        7602       75650       51499       51768        6410
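Since the duplicate levels differ only in capitalization, an alternative is to normalize the case before building the factor, which avoids listing every recode by hand. This is a sketch on a made-up vector of damage labels, assuming `stringr::str_to_title()` produces the desired spelling for every category; the names `damage_raw` and `damage_clean` are illustrative.

```r
library(stringr)

# Made-up sample of raw labels, mixing the two capitalizations seen above
damage_raw <- c("DESTROYED", "Disabling", "DISABLING",
                "No Damage", "NO DAMAGE", "Superficial", NA)

# Desired order, from most to least severe
damage_levels <- c("Destroyed", "Disabling", "Functional", "Superficial", "No Damage")

# str_to_title() maps "NO DAMAGE" and "No Damage" to the same string; NA stays NA
damage_clean <- factor(str_to_title(damage_raw), levels = damage_levels)

print(table(damage_clean, useNA = "ifany"))
```

One caveat of this approach: any label not listed in `damage_levels` silently becomes `NA`, so it is worth checking the table for unexpected missing values.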

Filter only drivers at fault and prepare data for plotting

crashes2 <- crashes |>
  filter(driver_at_fault == "Yes" & !is.na(vehicle_damage_extent)) |>
  group_by(vehicle_year, damage_recoded) |>
  summarise(count = n())
head(crashes2)
## # A tibble: 6 × 3
## # Groups:   vehicle_year [4]
##   vehicle_year damage_recoded count
##          <dbl> <fct>          <int>
## 1         1975 Disabling          1
## 2         1975 Superficial        1
## 3         1976 Disabling          1
## 4         1977 Functional         1
## 5         1978 Disabling          3
## 6         1978 Functional         1
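The output shows that early years are missing some damage levels entirely (1976 has only a Disabling row, for example). Depending on the ggplot2 version, a stacked area drawn from such data can interpolate oddly across the gaps, so it may be safer to fill in the missing year and damage combinations with zero counts first. A minimal sketch with made-up counts, assuming tidyr's `complete()`; `toy_counts` and `toy_props` are illustrative names:

```r
library(dplyr)
library(tidyr)

# Made-up counts for illustration only; not taken from the crash dataset
toy_counts <- tibble(
  vehicle_year   = c(1975, 1975, 1976),
  damage_recoded = factor(c("Disabling", "Superficial", "Disabling"),
                          levels = c("Disabling", "Superficial")),
  count          = c(1L, 1L, 1L)
)

toy_props <- toy_counts |>
  ungroup() |>
  # add a zero-count row for every year/damage pair that never occurred
  complete(vehicle_year, damage_recoded, fill = list(count = 0L)) |>
  group_by(vehicle_year) |>
  mutate(prop = count / sum(count)) |>   # share of each damage level within a year
  ungroup()

print(toy_props)
```

Having an explicit `prop` column also makes it possible to inspect the plotted proportions as a table instead of relying on `position = "fill"` alone.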

Plot a proportional stacked area graph

#Graphing help from https://r-resources.massey.ac.nz/rgcookbook/RECIPE-LINE-GRAPH-PROPORTIONAL-STACKED-AREA.html

ggplot(crashes2, aes(x = vehicle_year, y = count, fill = damage_recoded)) +
  geom_area(position = "fill", color = "darkgrey", linewidth = .3) +   # linewidth replaces size for lines in ggplot2 >= 3.4.0
  scale_y_continuous(labels = scales::percent) +
  scale_fill_brewer(name = "Damage Extent", palette = "Spectral") +
  labs(
    title = "Proportion of Collision Damage by Vehicle Year (1975-2025)",
    caption = "Source: Maryland State Police",
    x = "Vehicle Year",
    y = "Proportion of Crashes"
  ) +
  theme_minimal()

Conclusion

The graph suggests a modest association between vehicle year and damage extent. Among at-fault drivers, newer vehicles, particularly those from after about 1995, appear to be destroyed less often and are more often left with only superficial damage. This would support the hypothesis that newer vehicles are better at ameliorating collision damage. I suspected this would be the case due to the standardization of antilock brakes and other safety equipment, but the relationship looks weaker than I had expected. The oldest model years also have very few crashes (single-digit counts in the 1970s), so the proportions at the left edge of the graph are noisy.

The main limitation of this analysis is the presence of several potential confounding variables. What if people driving newer cars are more cautious on the road than those driving 30-year-old rust buckets on historic plates?

In further research I would either pose a narrower question (for example, how does stopping distance differ in cars with and without ABS?) or take more variables into account, such as vehicle type or area of impact, to reach a more accurate conclusion.
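The association between vehicle year and damage could also be quantified instead of being read off the graph. One simple option, sketched here under the assumption that the share of destroyed vehicles per model year is the quantity of interest, is a Spearman rank correlation between year and that share. The counts below are made up so the example runs on its own; they are not results from the crash dataset.

```r
library(dplyr)

# Made-up counts for illustration only; not results from the crash dataset
toy <- tibble(
  vehicle_year = rep(c(1990, 2000, 2010, 2020), each = 2),
  damage       = rep(c("Destroyed", "Other"), times = 4),
  count        = c(8, 92, 6, 94, 5, 95, 3, 97)
)

# Share of crashes in each model year where the vehicle was destroyed
destroyed_share <- toy |>
  group_by(vehicle_year) |>
  summarise(share = sum(count[damage == "Destroyed"]) / sum(count))

# Spearman rank correlation between model year and share destroyed
rho <- cor(destroyed_share$vehicle_year, destroyed_share$share, method = "spearman")

print(destroyed_share)
print(rho)
```

A negative `rho` would mean the destroyed share falls as model year rises. It would still say nothing about the confounders raised above, and years with very few crashes would need to be pooled or weighted before trusting the number.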

References

Dataset: https://catalog.data.gov/dataset/crash-reporting-drivers-data

Graphing resource: https://r-resources.massey.ac.nz/rgcookbook/RECIPE-LINE-GRAPH-PROPORTIONAL-STACKED-AREA.html