Vehicle Crash Analysis: Montgomery County, MD (2015–2025)
Author
Maisha Ann Subin
Crashes in Montgomery County, MD Over the Years
(iStock, 2017)
Data Set Variables
Weather
Weather Frequency by Damage
Drivers License plate
Speed Limit
Vehicle Make
Latitude
Longitude
Year (of crash)
Damage
Damage Frequency
(Total 39 variables; I’ve included the ones I use in this analysis)
Overview
For this analysis, I selected a data set from Data.gov that reports crash data in Montgomery County, Maryland. The data set, collected through the Automated Crash Reporting System (ACRS) managed by the Maryland State Police, includes detailed records of vehicle crashes categorized into three types: fatal, property damage, and injury-related crashes.
The data is compiled from reports filled out by law enforcement officers at the scene of a crash. These reports include a variety of factors such as driver and vehicle information, weather and road conditions, and contributing causes.
In this research, I aim to explore questions such as: “Does weather play a role in the frequency or type of crashes?” and “Is there a relationship between the make of a vehicle—like Toyota, Mercedes, or Tesla—and how often it’s involved in a crash?” These questions are not only data-driven but also reflect real-world concerns about safety and accountability on the road.
Separated the combined date/time field into distinct columns; extracted year as numeric
Renamed all variables to snake_case for consistency
Removed rows with missing or placeholder “N/A” values across all key variables
Engineered three additional numeric features: weather frequency by damage type, damage frequency, and traffic control frequency
# Removing columns not used in this analysiscrash_col <- crash[, -c(1, 2, 3, 7, 8, 9, 10, 11, 18, 19, 25, 26, 27, 28, 30, 32, 33, 34, 36 )]
# Create separate columns for date and timecrash_time <- crash_col %>%separate('Crash Date/Time', into =c("year", "time"), sep =" ")# Extract year as numeric from date columncrash_time$year <-as.numeric(str_sub(crash_time$year, start =-4L))# Sort the years in ascending ordersorted_years <-sort(crash_time$year)unique(sorted_years)
Additional numeric variable: weather frequency by damage
# Removing additional columnscrash_nona <- crash_nona %>%select(-starts_with("weather_freq_by_damage")) # Counting how often each weather type appears within each damage typeweather_freq_by_damage <- crash_nona %>%count(damage, weather, name ="weather_freq_by_damage") # Combining 2 variable's# Joining those frequencies back into my original datacrash_nona <- crash_nona %>%left_join(weather_freq_by_damage, by =c("damage", "weather"))
Additional numeric variable: damage frequency
# Creating a frequency table for the 'damage' columndamage_freq <-table(crash_nona$damage)# Mapping frequencies back to the original data framecrash_nona$damage_freq <- damage_freq[crash_nona$damage]
Additional numeric variable: traffic control frequency
# Creating frequency table for 'traffic_control'traffic_control_freq <-table(crash_nona$traffic_control)# Mapping frequency values back to the data framecrash_nona$traffic_control_freq <- traffic_control_freq[crash_nona$traffic_control]
Analyzing data through correlation plot
# To analyze correlation of selected variablescrash_corr <- crash_nona [, c("speed_limit", "year", "latitude", "longitude", "weather_freq_by_damage", "damage_freq", "traffic_control_freq")]plot_correlation(crash_corr)
Warning in dummify(data, maxcat = maxcat): Ignored all discrete features since
`maxcat` set to 20 categories!
There seem to be relatively significant negative correlation between year and traffic control frequency of -0.48.
There also seem to strong correlation between damage frequency and damage freq property and injury. However that’s possible because of the common points in the variables.
Testing multiple linear regression model in relation to damage frequency
# Running linear regression modelslm_model <-lm(damage_freq ~ speed_limit + year + latitude + longitude + weather_freq_by_damage+ traffic_control_freq, data = crash_nona)# Summary of modelsummary(lm_model)
Call:
lm(formula = damage_freq ~ speed_limit + year + latitude + longitude +
weather_freq_by_damage + traffic_control_freq, data = crash_nona)
Residuals:
Min 1Q Median 3Q Max
-12412 -2593 1080 2029 3784
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.327e+05 1.640e+04 -8.091 6.18e-16 ***
speed_limit -1.631e+01 1.943e+00 -8.395 < 2e-16 ***
year 8.295e+01 5.400e+00 15.360 < 2e-16 ***
latitude -4.620e+02 2.565e+02 -1.801 0.0717 .
longitude 5.605e+01 1.941e+02 0.289 0.7728
weather_freq_by_damage 2.770e-01 6.524e-03 42.460 < 2e-16 ***
traffic_control_freq -3.702e-03 5.221e-03 -0.709 0.4783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2381 on 24855 degrees of freedom
Multiple R-squared: 0.07452, Adjusted R-squared: 0.0743
F-statistic: 333.6 on 6 and 24855 DF, p-value: < 2.2e-16
speed_limit, year and weather_freq_by_damage type are statistically significant predictors of crash damage frequency with p values of < 2e-16, < 2e-16 and < 2e-16 respectively.
Variables like latitude, longitude, weather_freq_by_damage, and traffic_control_freq are not statistically significant based on their high p-values (around or greater than 0.05).
Adjusted R^2 values:
Adjusted R^2 = 0.0743 suggests this model explains about 7.4% of the variability in the response variable damage_freq, even after adjusting for the number of predictors. While the model is statistically significant (F-statistic p < 2.2e-16), it’s not strong, the predictive power is low.
Facet wrap for better understanding yearly damage count based on type:
# Grouping data by year and damagecrash_yearly <- crash_nona %>%group_by(year, damage) %>%summarise(damage_count =n(), .groups ="drop")# Converting year to factor for discrete x-axiscrash_yearly$year <-as.factor(crash_yearly$year)# Facet plotggplot(crash_yearly, aes(x = year, y = damage_count, color = damage, group = damage)) +geom_point() +geom_line() +facet_wrap(~damage) +labs(title ="Crash Count by Year and Damage Type",x ="Year", y ="Number of Crashes",caption ="Source: Data.gov") +scale_color_viridis_d("Damage Type") +theme_bw()
2024 seems to be an interesting year based on this data, with a sudden upward trend.
Fatalities in general are low and property damages are more.
Final Visualizations…
Visualization 1:
library(leaflet.extras)
Warning: package 'leaflet.extras' was built under R version 4.5.2
# Filtering for 2024 and scale severity as 2024 showed a unusal upward trend in damage crash_heat_2024 <- crash_nona %>%filter(year ==2024) %>%mutate(intensity =case_when( damage =="Fatal Crash"~15, damage =="Injury Crash"~3, damage =="Property Damage Crash"~2, # For intensity of colors on heatmapTRUE~0 )) %>%mutate(intensity = intensity *20) # Increasing intensity scale# GIS plotleaflet(crash_heat_2024) %>%addProviderTiles("CartoDB.DarkMatter") %>%# Selecting theme for map# Heatmap layeraddHeatmap(lng =~longitude, lat =~latitude,intensity =~intensity, # Controls how strong each point affects the heatmapblur =15, # Blur effect of heatmapmax =10, # Adjust max intensityradius =20, # Radius of heatmapgradient =c( "0"="#440154", "0.2"="#3b528b","0.6"="#5dc962", "0.8"="#fde725", "1"="#ff5a00") ) %>%# Adding interactive markers with popupsaddCircleMarkers(lng =~longitude, lat =~latitude,radius =5, # Smaller radius to avoid overcrowdingcolor =~ifelse(damage =="Fatal Crash", "purple", ifelse(damage =="Injury Crash", "orangered", "yellow")),fillOpacity =0.5,stroke =FALSE,popup =~paste("Year: ", year, "<br>","Damage Type: ", damage, "<br>","Vehicle Make: ", vehicle_make, "<br>","Weather: ", weather ),group ="markers"# Creating a separate group for the markers so heatmap and points can be viewed separately ) %>%# Adding layer control to toggle between heatmap and markersaddLayersControl( # Adds a checkbox menu to turn the circle marker layer on/off.overlayGroups =c("markers"),options =layersControlOptions(collapsed =FALSE) )%>%# For legend in the bottom rightaddLegend(position ="bottomright",colors =c("purple", "orangered", "yellow"),labels =c("Fatal Crash", "Injury Crash", "Property Damage Crash"),title ="Damage Type",opacity =1)
The above is a GIS visualization to show damage type such as property crash or fatality in 2024 in Montgomery County, Maryland. A heatmap can also be seen if the marker is unchecked which shows the density of points i.e. crashes.
Understanding data for 2nd visualization:
# Group and count crashes per speed limitspeed_summary <- crash_nona %>%group_by(speed_limit, damage) %>%summarise(crash_count =n()) %>%arrange(speed_limit, damage)
`summarise()` has grouped output by 'speed_limit'. You can override using the
`.groups` argument.
ggplot(speed_summary, aes(x =factor(speed_limit), y = crash_count, fill = damage)) +geom_bar(stat ="identity") +labs(title ="Crash Frequency by Speed Limit and Damage Type",x ="Speed Limit (mph)",y ="Number of Crashes",caption ="Source: Data.gov" ) +theme_minimal() +scale_fill_manual(values =c("red", "maroon", "pink")) +# Adjust colors as neededtheme(legend.title =element_blank())
Visualization 2:
# Another GIS plot for speed limit# Using color palettes to represent speedpal_speed <-colorNumeric(palette ="viridis", domain = crash_nona$speed_limit) # Defining a numeric color scale (pal_speed) using the "viridis" palette.leaflet(crash_nona) %>%addProviderTiles("CartoDB.Positron") %>%addCircleMarkers(lng =~longitude, lat =~latitude,color =~pal_speed(speed_limit),fillOpacity =0.6,radius =3,popup =~paste("Speed Limit: ", speed_limit, "<br>Damage: ", damage) ) %>%addLegend("bottomright", pal = pal_speed, values =~speed_limit,title ="Speed Limit (mph)" ) # Adds a legend in the bottom right showing the color gradient for speed limits.
As seen through the analysis bar chart before the GIS, there’s a trend in crashed between speed limit of 25 - 45, with highest being property damage followed by injury and rarely any fatalities. This GIS chart shows each locations in Montgomery County, where the crash took place with respect to speed limit. It helps us know if there are any patterns.Includes popup as well for interactivity.
Visualization 3:
# Time series line graph# Defining selected vehicle makesselected_makes <-c("TOYOTA", "FORD", "HYUNDAI", "NISSAN", "HONDA", "MERCEDES", "PORSCHE", "VOLKSWAGEN", "TESLA") # Filtering and summarizing by year and makesummary_data <- crash_nona %>%filter(vehicle_make %in% selected_makes) %>%group_by(year, vehicle_make) %>%summarise(avg_damage =mean(damage_freq, na.rm =TRUE), .groups ="drop") # Averaging# Final line plotp <-ggplot(summary_data, aes(x =factor(year), y = avg_damage, color = vehicle_make, group = vehicle_make)) +geom_line(linewidth =1) +scale_color_viridis_d(option ="magma") +labs(title ="Average Damage per Crash by Vehicle Make (2015–2025)",x ="Year",y ="Average Damage Frequency",caption ="Data.gov"# Caption not showing up because of interactivity ) +theme(panel.background =element_blank(), # Removes background colorplot.background =element_blank(), # Removes plot area backgroundpanel.grid =element_blank() # Removes gridlines )# Converts ggplot to interactive plotly plotggplotly(p)
The above time series line graph shows the average damage_freq based on vehicle_make between 2015 and 2025. Plotly adds interactivity to it so specific data can be obtained.
Final Thoughts:
Understanding the spatial distribution of motor vehicle crashes and the factors associated with them is essential for improving road safety and guiding traffic policies. The data set I chose in this research contains crash records from Montgomery County, Maryland, with variables such as speed limit, crash year, weather condition, damage type, and geographic coordinates. Using R and the leaflet package, I created interactive GIS visualizations to explore how speed and damage type are spatially distributed across the county.
One visualization used a heatmap layered over a base map to show crash concentrations. Locations with a high density of crashes appear as intense areas, mostly clustered around major intersections and highways. Another visualization used circle markers colored by speed limit, allowing for quick assessment of where high-speed zones coincide with crashes. Interactive popups on these markers provided crash-specific information like year, weather, and damage severity.
According to the National Highway Traffic Safety Administration (NHTSA), speeding contributed to 29% of all traffic fatalities in 2021, and crash likelihood increases significantly with speed, especially in adverse weather or uncontrolled intersections (NHTSA, 2022). This supports the patterns seen in the GIS plot: higher-speed zones not only had more crashes but also a higher share of injury-related incidents.
Data set used for this research: Data.gov. (2025, May 10). Montgomery County of Maryland - Crash reporting - Drivers data. https://catalog.data.gov/dataset/crash-reporting-drivers-data