Introduction

Crash data provides us with useful information about accident trends in Montgomery County Maryland. This dataset includes ten years of crash data. can We can use this data to study the most likely times that accidents will occur as well as the contributing factors of these accidents.

Dataset

The dataset used is from Data.gov provided by Montgomery County Government. Montgomery County Crash dataset provides information on motor vehicle drivers involved in traffic collisions occurring on county and local roadways.

The dataset is comprised of accident data for roadways in Montgomery County Maryland dated from 2015 through 2024 and is updated weekly.

Information collected and reported includes, weather conditions, severity of accident, number of accidents per day, week, month, and year.

setwd("U:/Data Sets")



fileURL <- "https://data.montgomerycountymd.gov/api/views/mmzv-x632/rows.csv?accessType=DOWNLOAD"

download.file(fileURL, "R_datafiles//Motor_Vehicle_Crashes_MD.csv", method = "curl")

df <- read.csv("R_datafiles//Motor_Vehicle_Crashes_MD.csv")

library(lubridate)
library(dplyr)
library(scales)
library(ggthemes)
library(ggplot2)
library(ggrepel)
library(RColorBrewer)

Findings

ColsToDrop <- c("Municipality", "Cross-Street Name", "Route.Type", "Road.Name", "Non-Motorist.Substance.Abuse", "Location", "Circumstance", "Report Number", "Cross.Street.Name",  "Local.Case.Number", "Vehicle.Going.Dir", "Vehicle.Body.Type" )
df <- df[, !names(df) %in% ColsToDrop]

df[df == 0] <- NA

df <- na.omit(df)

Accidents by Year

The data shows that the yearly count of accidents has decreased since a dramatic drop in 2020. The low accidents in 2020 is likely due to the government shut downs and widespread remote work as a result of the Covid-19 Pandemic. Work from Home became necessary and continued for many people but we are seeing a steady uptick in number of accidents and are on track to reach prepandemic levels.

df$Year <-year(mdy_hms(df$Crash.Date.Time))

library(scales)
p1 <- ggplot(df, aes(x=Year)) +
  geom_histogram(bins = 10, color="blue", fill="yellow") +
  labs(title = "Histogram of Accidents by Year", x = "Year", y = "Count of Accidents") +
  scale_y_continuous(labels=comma) +
  stat_bin(binwidth=1, geom = 'text', color='blue', aes(label=scales::comma(..count..)), vjust=-0.4) +
  theme(plot.title = element_text(hjust = 0.5))

x_axis_labels <- min(df$Year):max(df$Year)

p1 <- p1 + scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels)

p1

Accidents by Hour

Here we can see the amount of accidents by hour from 2015 through 2024. The least accidents occur at 4:00AM, likely due to the decreased amount of people driving. The highest number of accidents occur at 17:00 or 5:00PM which is the middle of “rush hour” when the majority of people are commuting from work.

We can see that accidents spike in the 7:00AM hour and the hours of 3:00PM, 4:00PM, and 5:00PM.

hours_df <- df %>%
  select(Crash.Date.Time) %>%
  mutate(hour24 = hour(mdy_hms(Crash.Date.Time))) %>%
  group_by(hour24) %>%
  summarise(n = length(Crash.Date.Time), .groups = 'keep') %>%
  data.frame()
x_axis_labels = min(hours_df$hour24):max(hours_df$hour24)


hi_lo <- hours_df%>%
  filter(n == min(n) | n == max(n)) %>%
  data.frame()

ggplot(hours_df, aes(x = hour24, y = n)) +
  geom_line(color='black', size=1) +
  geom_point(shape=21, size=4, color='blue', fill='green') +
  labs(x="Hour", y = "Accident Count", title="Accidents by Hour", caption="Source: Montgomery County Government")+
  scale_y_continuous(labels=comma) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(labels = x_axis_labels, breaks = x_axis_labels, minor_breaks = NULL) +
  geom_point(data = hi_lo, aes(x = hour24, y = n), shape=21, size=4, fill='yellow', color='red') +
  geom_label_repel(aes(label = ifelse(n == max(n) | n == min(n), scales::comma(n) , "")),
                   box.padding = 1,
                   point.padding = 1,
                   size=2,
                   color='black',
                   segment.color = 'red')

Accidents by Month

Accidents by month follow similar patterns each year excluding 2020 were there is a significant dip in accidents in April of 2020, the start of the government mandated shut downs for Covid-19.

Accidents are still lower than prepandemic levels but we can see an uptick in accidents during the summer and winter. Likely due to increased number of people traveling and taking vacations.

months_df <- df %>%
  select(Crash.Date.Time) %>%
  mutate(months = months(mdy_hms(Crash.Date.Time), abbreviate = TRUE),
         year = year(mdy_hms(Crash.Date.Time))) %>%
  group_by(year, months) %>%
  summarise(n = length(Crash.Date.Time), .groups='keep') %>%
  data.frame()


months_df$year <- factor(months_df$year)

mymonths <- c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' )

month_order <- factor(months_df$months, levels=mymonths)

x = min(as.numeric(levels(months_df$year)))

y = max(as.numeric(levels(months_df$year)))


months_df$year <- factor(months_df$year, levels = seq(y, x, by = -1))


ggplot(months_df, aes(x = month_order, y = n, fill=year)) +
  geom_bar(stat = "identity", position="dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Multiple Bar Charts - Total Accidents by Month by Year",
       x = "Months of the Year",
       y = "Accident Count",
       fill = "Year") +
  facet_wrap(~year, ncol=5, nrow=10)

Accident by Day of the Week

This heatmap shows the number of accidents pr day of the week. Immediatly we see a dramatic drop in accidents between 2015 to 2019 and 2020 to 2024.

Pre-Pandemic we see that the majority of accidents occured between Monday and Friday. From 2020 through 2024 accident numbers are similar throughout the week with Friday having a slightly higher number of accidents and Sunday having the lowest amount of accidents.

days_df <- df %>%
  select(Crash.Date.Time) %>%
  mutate(year = year(mdy_hms(Crash.Date.Time)),
         dayoftheweek = weekdays(mdy_hms(Crash.Date.Time), abbreviate = TRUE)) %>%
  group_by(year, dayoftheweek) %>%
  summarise(n = length(Crash.Date.Time), .groups='keep') %>%
  data.frame()

days_df$year <- as.factor(days_df$year)

mylevels <- c('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
days_df$dayoftheweek <- factor(days_df$dayoftheweek, levels = mylevels)

breaks <- c(seq(0, max(days_df$n), by=500))

ggplot(days_df, aes(x= year, y=dayoftheweek, fill=n)) +
  geom_tile(color="black") +
  geom_text(aes(label = comma(n))) +
  coord_equal(ratio=1) +
  labs(title= "Heatmap: Accidents by Day of the Week",
       x = "Year",
       y = "Day of the Week",
       fill = "Accident Count") +
  theme_minimal() +
  theme(plot.title = element_text(hjust=0.5)) +
  scale_y_discrete(limits = rev(levels(days_df$dayoftheweek))) +
  scale_fill_continuous(low="white", high="red", breaks = breaks) +
  guides(fill = guide_legend(reverse=TRUE, override.aes = list(colour="black")))

Weather Conditions

The following pie charts show the top 10 Weather Conditions cited in accident reports. Poor weather conditions are often cited as a contributing factor for motor vehicle collisions.

The majority of accidents occur during clear weather, followed by Cloudy and Raining. The information collected shows that weather as a contributing factor for accidents largely remained the same over the recorded 10 years of data. The vast majority of accidents occur during clear weather. We can infer from this information that weather, while a contributing factor, is not major predictor for likelihood of getting into an accident.

top_weather_condition <- count(df, Weather)


weather_df <- df %>%
  select(Weather, Crash.Date.Time) %>%
  mutate(Year = year(mdy_hms(Crash.Date.Time)),
         TopWeather = ifelse(Weather=="RAINING","RAINING", ifelse(Weather=="CLEAR","CLEAR", ifelse(Weather=="SNOW","SNOW",ifelse(Weather=="CLOUDY","CLOUDY", ifelse(Weather=="FOGGY","FOGGY", "Other")))))) %>%
  group_by(Year, TopWeather) %>%
    summarise(n=length(TopWeather), .groups='keep') %>%
  group_by(Year) %>%
  mutate(percent_of_total = round(100*n/sum(n),1)) %>%
  ungroup() %>%
  data.frame()

ggplot(weather_df, aes(x="", y = n, fill = TopWeather)) +
  geom_bar(stat="identity", position = "fill") +
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Weather", x = NULL, y = NULL,
       title = "Top Accident Weather Condition",
       caption = "Slices under 5% are not labeled") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank()) +
  facet_wrap(~Year, ncol=4, nrow = 4) +
  scale_fill_brewer(palette = "Set1") +
  geom_text(aes(x=1.7, label =ifelse(percent_of_total>5,paste0(percent_of_total,"%"), "")),
            size=1,
            position = position_fill(vjust = 0.5))

Vehicle Damage

This chart shows the top 10 vehicle damage extent over the last 10 years. In 2024 the reporting changed from all caps, however we can see that over the years the amount of damage extent in each catagory remains steady each year. The majority of accidents result in superficial damage and/or minor damage that leaves the vehicle functional. The next most likely outcome is damage that disables the vehicle.

df_Vehicle.Damage.Extent <- count(df, Vehicle.Damage.Extent)
df_Vehicle.Damage.Extent <- df_Vehicle.Damage.Extent[order(df_Vehicle.Damage.Extent$n, decreasing = TRUE),]

Top_Damage <- df_Vehicle.Damage.Extent$Vehicle.Damage.Extent[1:10]


new_df <- df %>%
  filter(Vehicle.Damage.Extent %in% Top_Damage) %>%
  select(Crash.Date.Time, Vehicle.Damage.Extent) %>%
  mutate(year = year(mdy_hms(Crash.Date.Time))) %>%
  group_by(Vehicle.Damage.Extent, year) %>%
  summarise(n = length(Vehicle.Damage.Extent), .groups = 'keep') %>%
  data.frame()

other_df <- df %>%
  filter(!Vehicle.Damage.Extent %in% Top_Damage) %>%
  select(Crash.Date.Time) %>%
  mutate(year = year(mdy_hms(Crash.Date.Time)), Vehicle.Damage.Extent = "Other") %>%
  group_by(Vehicle.Damage.Extent, year) %>%
  summarise(n = length(Vehicle.Damage.Extent), .groups = 'keep') %>%
  data.frame()

new_df <- rbind(new_df, other_df)

agg_tot <- new_df %>%
  select(Vehicle.Damage.Extent, n) %>%
  group_by(Vehicle.Damage.Extent) %>%
  summarise(tot = sum(n), .groups = 'keep') %>%
  data.frame()

injury_df <- df %>%
  filter(Vehicle.Damage.Extent %in% Top_Damage) %>%
  select(Vehicle.Damage.Extent, Injury.Severity) %>%
  group_by(Vehicle.Damage.Extent) %>%
  data.frame()

new_df$year <- as.factor(new_df$year)

ggplot(new_df, aes(x = reorder(Vehicle.Damage.Extent, n, sum), y = n, fill = year)) +
  geom_bar(stat = "identity", position = position_stack(reverse = TRUE)) +
  coord_flip() +
  labs(title = "Vehicle Damage Extent", x = "", y = "Accident Count", fill = "Year") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_brewer(palette = "Paired", guide = guide_legend(reverse = TRUE)) +
  geom_text(data = agg_tot, aes(x = Vehicle.Damage.Extent, y = tot, label = scales::comma(tot), fill = NULL), hjust = -0.1, size = 3) +
  scale_y_continuous(labels = comma)

Wrap up

In conclusion, although the amount of accidents significantly lowered in 2020, the rate of accidents in Montgomery County Maryland is steadily rising to pre-pandemic levels. This shows us that external factors like weather are contributing factors but the largest predictor of accidents is a high volume of drivers during peak hours.