Are newer vehicles better at ameliorating collision damage?
The Montgomery County, Maryland Crash Reporting (Drivers Data) dataset is provided by the Maryland State Police. It contains information on the motor vehicles involved in traffic collisions on roadways within Montgomery County, Maryland. Source: https://catalog.data.gov/dataset/crash-reporting-drivers-data
Key variables:
- Driver at fault
- Vehicle damage extent
- Vehicle year
As part of an exploratory data analysis, I will filter the dataset to at-fault drivers, then group those crashes by vehicle year and vehicle damage extent to compare damage in newer versus older vehicles. The grouped counts will be used to build a proportional stacked area graph.
library(tidyverse)
library(forcats)
library(RColorBrewer)
crashes <- read_csv("crash_reporting.csv")
#str(crashes) #commented for knitting
head(crashes)
## # A tibble: 6 × 39
## `Report Number` `Local Case Number` `Agency Name` `ACRS Report Type`
## <chr> <dbl> <chr> <chr>
## 1 MCP3296002G 240018653 MONTGOMERY Property Damage Crash
## 2 MCP276700BF 240012321 MONTGOMERY Property Damage Crash
## 3 MCP32790038 240022955 MONTGOMERY Property Damage Crash
## 4 MCP34000014 240019831 MONTGOMERY Property Damage Crash
## 5 MCP3341003F 240011829 MONTGOMERY Injury Crash
## 6 MCP3118004Q 240025609 MONTGOMERY Injury Crash
## # ℹ 35 more variables: `Crash Date/Time` <chr>, `Route Type` <chr>,
## # `Road Name` <chr>, `Cross-Street Name` <chr>, `Off-Road Description` <chr>,
## # Municipality <chr>, `Related Non-Motorist` <chr>, `Collision Type` <chr>,
## # Weather <chr>, `Surface Condition` <chr>, Light <chr>,
## # `Traffic Control` <chr>, `Driver Substance Abuse` <chr>,
## # `Non-Motorist Substance Abuse` <chr>, `Person ID` <chr>,
## # `Driver At Fault` <chr>, `Injury Severity` <chr>, Circumstance <chr>, …
names(crashes) <- gsub(" ", "_", names(crashes)) #sub spaces w/ underscores
names(crashes) <- tolower(names(crashes)) #variable names lowercase
crashes$vehicle_damage_extent[crashes$vehicle_damage_extent %in% c("Vehicle Not at Scene", "UNKNOWN", "N/A", "OTHER")] <- NA
crashes$vehicle_year[crashes$vehicle_year == 0] <- NA
crashes <- crashes |>
  filter(between(vehicle_year, 1975, 2025)) # keep model years 1975-2025 (inclusive); NA years are dropped
head(crashes)
## # A tibble: 6 × 39
## report_number local_case_number agency_name acrs_report_type `crash_date/time`
## <chr> <dbl> <chr> <chr> <chr>
## 1 MCP3296002G 240018653 MONTGOMERY Property Damage… 04/21/2024 06:53…
## 2 MCP32790038 240022955 MONTGOMERY Property Damage… 05/15/2024 07:30…
## 3 MCP34000014 240019831 MONTGOMERY Property Damage… 04/28/2024 05:30…
## 4 MCP3341003F 240011829 MONTGOMERY Injury Crash 03/12/2024 07:30…
## 5 MCP33250032 240020414 MONTGOMERY Property Damage… 05/01/2024 07:47…
## 6 EJ7904000T 240020913 GAITHERSBU… Property Damage… 05/04/2024 05:44…
## # ℹ 34 more variables: route_type <chr>, road_name <chr>,
## # `cross-street_name` <chr>, `off-road_description` <chr>,
## # municipality <chr>, `related_non-motorist` <chr>, collision_type <chr>,
## # weather <chr>, surface_condition <chr>, light <chr>, traffic_control <chr>,
## # driver_substance_abuse <chr>, `non-motorist_substance_abuse` <chr>,
## # person_id <chr>, driver_at_fault <chr>, injury_severity <chr>,
## # circumstance <chr>, driver_distracted_by <chr>, …
crashes <- crashes |>
  mutate(damage_factor = factor(vehicle_damage_extent))
levels(crashes$damage_factor)
## [1] "DESTROYED" "Disabling" "DISABLING" "Functional" "FUNCTIONAL"
## [6] "No Damage" "NO DAMAGE" "Superficial" "SUPERFICIAL"
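The duplicated levels above differ only in capitalization. As an alternative to listing each pair in fct_recode(), the case can be normalized before the factor is built. This is a minimal sketch on toy values, assuming every duplicate is a pure capitalization variant; it relies on stringr::str_to_title(), which is part of the tidyverse.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy values mimicking the mixed-case levels printed above
damage <- tibble(vehicle_damage_extent = c("DESTROYED", "Disabling", "DISABLING",
                                            "Functional", "FUNCTIONAL", "No Damage",
                                            "NO DAMAGE", "Superficial", "SUPERFICIAL", NA))

damage <- damage |>
  mutate(
    # Title case maps "DISABLING" and "Disabling" to the same string; NA stays NA
    damage_recoded = str_to_title(vehicle_damage_extent),
    # Fix the level order from most to least severe
    damage_recoded = factor(damage_recoded,
                            levels = c("Destroyed", "Disabling", "Functional",
                                       "Superficial", "No Damage"))
  )

table(damage$damage_recoded)
```

If the real column ever contained a variant that is not just a case change, that value would become NA in the factor() step, so checking for new NAs after recoding is a cheap safeguard.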
crashes <- crashes |>
  mutate(damage_recoded = fct_recode(damage_factor,
    "Destroyed" = "DESTROYED",
    "Disabling" = "DISABLING",
    "Functional" = "FUNCTIONAL",
    "Superficial" = "SUPERFICIAL",
    "No Damage" = "NO DAMAGE"))
crashes$damage_recoded <- factor(crashes$damage_recoded, levels = c("Destroyed", "Disabling", "Functional", "Superficial", "No Damage"))
table(crashes$damage_recoded)
##
## Destroyed Disabling Functional Superficial No Damage
## 7602 75650 51499 51768 6410
crashes2 <- crashes |>
  filter(driver_at_fault == "Yes" & !is.na(vehicle_damage_extent)) |> # at-fault vehicles with known damage
  group_by(vehicle_year, damage_recoded) |>
  summarise(count = n())
head(crashes2)
## # A tibble: 6 × 3
## # Groups: vehicle_year [4]
## vehicle_year damage_recoded count
## <dbl> <fct> <int>
## 1 1975 Disabling 1
## 2 1975 Superficial 1
## 3 1976 Disabling 1
## 4 1977 Functional 1
## 5 1978 Disabling 3
## 6 1978 Functional 1
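Several model years are missing one or more damage categories (1975, for example, has no Destroyed or Functional rows above). geom_area() stacks most reliably when every category has a value at every x; depending on the ggplot2 version, a missing combination may be interpolated from neighbouring years or mis-stacked rather than drawn as zero. A minimal sketch of filling those gaps with explicit zeros using tidyr::complete(), on toy counts shaped like crashes2:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy counts shaped like crashes2: 1975 lacks "Functional", 1977 lacks "Disabling"
counts <- tribble(
  ~vehicle_year, ~damage_recoded, ~count,
  1975, "Disabling",   1L,
  1975, "Superficial", 1L,
  1976, "Disabling",   1L,
  1976, "Functional",  2L,
  1977, "Functional",  1L
) |>
  mutate(damage_recoded = factor(damage_recoded,
                                 levels = c("Destroyed", "Disabling", "Functional",
                                            "Superficial", "No Damage")))

# Add every year x damage-level combination (all factor levels are used),
# filling the absent ones with a count of 0
counts_full <- counts |>
  complete(vehicle_year, damage_recoded, fill = list(count = 0L))

nrow(counts_full)
```

On the real crashes2, which summarise() leaves grouped by vehicle_year, calling ungroup() before complete() keeps the expansion across all years rather than within each group.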
#Graphing help from https://r-resources.massey.ac.nz/rgcookbook/RECIPE-LINE-GRAPH-PROPORTIONAL-STACKED-AREA.html
ggplot(crashes2, aes(x = vehicle_year, y = count, fill = damage_recoded)) +
  # linewidth replaces the deprecated size aesthetic for lines in ggplot2 >= 3.4.0
  geom_area(position = "fill", color = "darkgrey", linewidth = 0.3) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_brewer(name = "Damage Extent", palette = "Spectral") +
  labs(
    title = "Proportion of Collision Damage by Vehicle Year (1975-2025)",
    caption = "Source: Maryland State Police",
    x = "Vehicle Year",
    y = "Proportion of Crashes"
  ) +
  theme_minimal()
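With position = "fill", the stacked counts at each vehicle year are rescaled to sum to 1, so the y axis shows each damage level's share of that year's at-fault vehicles. Computing those shares explicitly makes the numbers behind the graph inspectable. A minimal sketch on invented toy counts:

```r
library(dplyr)
library(tibble)

# Toy counts (invented for illustration), one row per year x damage level
counts <- tribble(
  ~vehicle_year, ~damage_recoded, ~count,
  2000, "Destroyed",   2,
  2000, "Disabling",   6,
  2000, "Superficial", 2,
  2020, "Destroyed",   1,
  2020, "Superficial", 3
)

# Each count divided by its year's total: the same normalization position = "fill" applies
shares <- counts |>
  group_by(vehicle_year) |>
  mutate(share = count / sum(count)) |>
  ungroup()

shares
```

Here the 2000 shares are 0.2, 0.6 and 0.2, and the 2020 shares are 0.25 and 0.75. Because each year is normalized separately, a year with two vehicles carries the same visual weight as a year with thousands.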
This analysis suggests a modest association between vehicle year and damage extent. In the graph, newer vehicles involved in collisions, particularly those built after about 1995, are destroyed less often and are more often left with only superficial damage. This is consistent with the hypothesis that newer vehicles are better at ameliorating collision damage. I expected this because of the standardization of antilock brakes and other safety equipment, but the relationship appears weaker than I had anticipated. The earliest model years also contribute only a handful of at-fault crashes (two for 1975, for example), so their proportions are noisy. The main limitation of this analysis is the presence of several potential confounding variables: what if people driving newer cars are more cautious on the road than those driving 30-year-old rust buckets on historic plates?
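One way to put a number on the association between vehicle year and damage is Spearman's rank correlation between model year and an ordinal damage score (1 = Destroyed through 5 = No Damage). This is a sketch on invented toy data, not a result from the crash dataset; on the real data it would be run on the at-fault rows of crashes with non-missing damage_recoded. With a dataset this large, even a small rho is likely to be statistically significant, so the size of the estimate matters more than its p-value.

```r
# Toy per-vehicle data, one row per at-fault vehicle (values invented for illustration)
vehicles <- data.frame(
  vehicle_year   = c(1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020, 2022),
  damage_recoded = factor(c("Destroyed", "Disabling", "Destroyed", "Disabling",
                            "Functional", "Disabling", "Superficial", "Functional",
                            "Superficial", "No Damage"),
                          levels = c("Destroyed", "Disabling", "Functional",
                                     "Superficial", "No Damage"))
)

# Ordinal score from the factor's level order: 1 = Destroyed ... 5 = No Damage
vehicles$damage_score <- as.integer(vehicles$damage_recoded)

# Spearman's rho works on ranks, so it assumes the damage levels are ordered
# but not evenly spaced; exact = FALSE because the scores contain ties
result <- cor.test(vehicles$vehicle_year, vehicles$damage_score,
                   method = "spearman", exact = FALSE)
result$estimate
```

A positive rho here means newer vehicles tend to have higher scores, that is, less damage.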
In future research, I would either pose a narrower question (for example, how does stopping distance differ between cars with and without ABS?) or take more variables into account, such as vehicle type or area of impact, to reach a more reliable conclusion.
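Taking more variables into account could start with a logistic regression of severe damage (for example, damage_recoded of Destroyed or Disabling) on model year plus a vehicle-type control. The sketch below uses simulated data and a hypothetical body_type column; the real dataset's column names and categories would need to be checked with names(crashes) before adapting it. A regression like this only adjusts for the variables it includes, so it narrows the confounding concern about driver behavior but does not remove it.

```r
set.seed(1)
n <- 500

# Simulated stand-in for at-fault vehicles; body_type is a hypothetical column
sim <- data.frame(
  vehicle_year = sample(1990:2024, n, replace = TRUE),
  body_type    = factor(sample(c("Passenger car", "SUV", "Pickup"), n, replace = TRUE))
)
# Centering the year keeps the intercept interpretable (log-odds for a 1990 vehicle)
sim$years_since_1990 <- sim$vehicle_year - 1990

# Assumption built into the simulation: severe damage gets less likely with newer model years
p_severe <- plogis(1.5 - 0.05 * sim$years_since_1990)
sim$severe <- rbinom(n, size = 1, prob = p_severe)

# Logistic regression: model-year effect adjusted for body type
model <- glm(severe ~ years_since_1990 + body_type, data = sim, family = binomial)
summary(model)$coefficients
```

The years_since_1990 coefficient is the change in log-odds of severe damage per additional model year, holding body type fixed; exp() of it gives the corresponding odds ratio.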
Dataset: https://catalog.data.gov/dataset/crash-reporting-drivers-data
Graphing resource: https://r-resources.massey.ac.nz/rgcookbook/RECIPE-LINE-GRAPH-PROPORTIONAL-STACKED-AREA.html