Car Crashes in Washington, D.C.

Car crashes are a major problem in Washington, D.C., with driving fatalities having increased to an all-time high in 2022. Washington, D.C. is a walkable city with many pedestrians and cyclists, and the increased number of car crashes has also resulted in increased risk for pedestrians and cyclists. Speed is a factor in crash-related injuries. From my own analysis of the data, 35.5% of crashes that were speeding-involved in D.C. in 2022 resulted in injuries, compared to 20.8% of crashes that were not speeding-involved crashes. Due to this, I focused on speeding-involved crashes.

Speeding-involved car crashes are concentrated on highways within D.C., which is intuitive as speed limits and average speeds are higher in these areas. At the same time, there are hot spots that stick out, as shown in the heat map layer of the map below.

Car crash data was obtained from D.C. Open Data, available here.

R packages used for data manipulation include lubridate for date modification, tidyverse packages for data wrangling, and janitor to modify column names.

R packages for mapping include leaflet, leaflet.extras, and leaflet.extras2. The htmltools package is used for additional modification of the HTML and CSS used on the map.

library(tidyverse)
library(janitor)
library(lubridate)
library(htmltools)
library(leaflet)
library(leaflet.extras)
library(leaflet.extras2)
library(leafsync)

Data Preparation

In order to prepare the data for mapping, it is necessary to remove observations that are missing coordinate data or that contain coordinates outside of Washington, D.C.

# Read in DC Car Crash data from DC Open Data
crashes <- read_csv("data/Crashes_in_DC.csv")
crashes <- crashes %>% clean_names()

# Remove data that is missing or outside of D.C.
crashes <- crashes %>% filter(!is.na(latitude) & !is.na(longitude))

# Filter dataset to remove observations outside of D.C.
crashes <- crashes %>% 
  filter(between(longitude,-77.119759,-76.909395),
         between(latitude,38.791645,38.99511))

Add New Variables

Next, I add variables derived from existing variables in the dataset. These include speeding and date columns such as year. Using the derived year column, I am able to limit my analysis to focus on crashes within 2022.

# Add variables
# Add speeding variable derived from speeding_involved (var that sums speeding vehicles)
crashes <- crashes %>% 
  mutate(speeding = if_else(speeding_involved > 0, "Yes", "No"))


# Parse year, month, day, hour, DOW, week
crashes <- crashes  %>%
  mutate(year = as.integer(year(fromdate)),
         month = as.integer(month(fromdate)),
         day = day(fromdate),
         dow = wday(fromdate),
         week = week(fromdate))


# Filter for 2022 only
crashes <- crashes %>% filter(year == 2022)

# Add month name and month abbr
crashes <- crashes %>% 
  mutate(month_abbr = month(fromdate, label = TRUE),
         month_name = month(fromdate, label = TRUE, abbr = FALSE))

crashes <- crashes %>% mutate(date_abbr = paste(month, day, year, sep = "/"))
crashes <- crashes %>% mutate(date = paste0(month_name," " ,day,", " ,year))

Next, I created a variable that sums injuries across all injury columns and create a subset of the data that is limited to crashes identified as speeding-related. I created additional variables for injuries relating to pedestrians, cyclists, users of active transportation, and passengers and drivers.

I also created a subset that contains only speed-related crashes.

# Add injuries
crashes <- crashes %>% 
  mutate(total_injuries = across(contains("injuries")) %>% rowSums)

# Add injury categories
crashes <- crashes %>% 
  mutate(injury_cat = case_when(
    total_injuries == 0 ~ "None",
    total_injuries == 1 ~ "One",
    total_injuries == 2 ~ "Two",
    total_injuries >= 3 ~ "Three or More"
  ))

# Add pedestrian injuries
crashes <- crashes %>% 
  mutate(pedestrian_injuries = across(contains("injuries_pedestrian")) %>% rowSums)

# Add cyclist injuries
crashes <- crashes %>% 
  mutate(cyclist_injuries = across(contains("injuries_bicyclist")) %>% rowSums)

# Add driver injuries
crashes <- crashes %>% 
  mutate(driver_injuries = across(contains("injuries_driver")) %>% rowSums)

# Add passenger injuries
crashes <- crashes %>% 
  mutate(passenger_injuries = across(contains("injuries_passenger")) %>% rowSums)


# Add active transit injuries
crashes <- crashes %>% 
  mutate(active_transit_injuries = pedestrian_injuries + cyclist_injuries) 

# Driver or passenger injuries
crashes <- crashes %>% 
  mutate(driver_or_pass_injuries = driver_injuries + passenger_injuries) 

# Create derived variables for each form of transit and injuries
crashes <- crashes %>% 
  mutate(injured_pedestrian = if_else(pedestrian_injuries > 0, 1, 0),
         injured_cyclist = if_else(cyclist_injuries > 0, 1, 0),
         injured_pass_or_driver = if_else(driver_injuries > 0 | passenger_injuries >0, 1, 0),
         injured_active_transit = if_else(pedestrian_injuries > 0 | cyclist_injuries >0, 1, 0))

active_transit_inj_involved_crashes <- crashes %>% filter(active_transit_injuries > 0)
driver_or_pass_inj_involved_crashes <- crashes %>% filter(driver_or_pass_injuries > 0)


# Create subset of speed-related crashes
crashes_speed <- crashes %>% filter(speeding == "Yes")
crashes_speed <- crashes_speed %>% 
  mutate(density = 0) %>%
  as.data.frame()

Customize Map Appearance

In order to customize the look of the Leaflet map, I add CSS to change the font and color.

# Add CSS for map
map_css <- 
  ".leaflet-popup-content-wrapper {
    color:#2c3e50;
    background:#fff;
    font-size:14px;
    font-family: Lato, sans-serif;
    }
  "

# Convert CSS to HTML to be used by HTMLtools package
map_html <- htmltools::tags$style(type = "text/css", map_css)

Map of Speeding-Involved Car Crashes in Washington, D.C.

Once the data is prepared, I created a map of speed-related crashes.

First, I created a palette for the circle markers; I decided to create a palette related to the number of injuries in each speed-related crash.

Next, I created a map that contains both a layer containing the coordinates of speed-related crashes and a heat map layer that highlights where speed-related crashes are most common. The map can be switched between the Carto Positron basemap and the Carto Dark Matter basemap in order to see the heat map more easily. Using the layer control panel, it is possible to toggle between the points and heat map layer.

# Palette for injuries
pal_inj <- colorFactor(c("grey", "#EBC138", "#FF5509", "#761E0F"), 
                       domain = c("None", "One", "Two", "Three or More"),
                       ordered = T)

# Palette for injuries
pal_inj_outline <- colorFactor(c("#B2BEB5", "#ffeda0", "#feb24c", "#f03b20"), 
                       domain = c("None", "One", "Two", "Three or More"),
                       ordered = T)


# Popup
popup_speed_inj <- paste0("</br><strong>Date: </strong>", as.character(crashes_speed$date),
                          "</br><strong>Address: </strong>", as.character(crashes_speed$address),
                          "</br><strong>Total Injuries: </strong>", as.character(crashes_speed$total_injuries))

leaflet(crashes_speed) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addWebGLHeatmap(
    lng = ~longitude,
    lat = ~latitude,
    size = 1500,
    units = "m",
    intensity = 0.1,
    gradientTexture = "skyline",
    alphaRange = 1,
    opacity = 0.7,
    layerId = "Speed-Related Crashes: Heat Map",
    group = "Speed-Related Crashes: Heat Map"
  ) %>%
  addCircleMarkers(
    ~longitude, 
    ~latitude,
    weight = 1,
    opacity = .8,
    fillOpacity = ifelse(crashes_speed$total_injuries > 0, .9, .3),
    radius = 4,
    color = ~pal_inj_outline(crashes_speed$injury_cat),
    popup = popup_speed_inj,
    fillColor = ~pal_inj(crashes_speed$injury_cat),
    group = "Speed-Related Crashes: Points"
  ) %>%
  addLegend(data = crashes_speed, 
            pal = pal_inj, 
            title = "Total Injuries",
            values = ~injury_cat, 
            opacity = 1) %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c("Carto", "Carto Dark Matter"),
    overlayGroups = c("Speed-Related Crashes: Points",
                      "Speed-Related Crashes: Heat Map"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup("Speed-Related Crashes: Heat Map") %>%
  htmlwidgets::prependContent(map_html)

Figure 1. A map depicting speed-related crashes in Washington, D.C. in 2022. Each crash is displayed as a point with a color corresponding to the number of injuries. An additional layer displays a heat map of all speed-related crashes within Washington, D.C. in 2022. The dark matter basemap option can better display the hotspots, but it is possible to choose between two basemaps.

Active Transit and Driver/Passenger Injuries

The pattern of injury-related crashes throughout D.C. varies by who is being injured.

Some areas are shared between the two maps below, which depict crashes involving injuries of active transit users (pedestrians or cyclists) and crashes involving injuries of drivers or passengers. The U Street Corridor, a well-frequented nightlife area, appears on both maps. On the other hand, the active transit-involved map features other hotspots, such as Foggy Bottom, home to George Washington University. Driver or passenger-involved crashes are more likely to take places at complicated intersections and on arterial roads with high speeds.

These heat maps are synced using the sync() function from the leafsync package. This allows you to compare different points throughout D.C.

active_transit_involved_crash_map <- leaflet(active_transit_inj_involved_crashes) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addWebGLHeatmap(
    lng = ~longitude,
    lat = ~latitude,
    size = 1000,
    units = "m",
    intensity = 0.1,
    gradientTexture = "skyline",
    alphaRange = 1,
    opacity = 0.7,
    group = "Active Transit-Involved Crashes"
  ) %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c(
      "Carto Dark Matter", "Carto"
    ),
    overlayGroups = c("Active Transit-Involved Crashes"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = FALSE)
  ) 

driver_or_pass_involved_crash_map <- leaflet(driver_or_pass_inj_involved_crashes) %>%
  addProviderTiles(providers$CartoDB.DarkMatter, group = "Carto Dark Matter") %>% 
  addProviderTiles(providers$CartoDB.Positron, group = "Carto") %>% 
  addWebGLHeatmap(
    lng = ~longitude,
    lat = ~latitude,
    size = 1000,
    units = "m",
    intensity = 0.05,
    gradientTexture = "skyline",
    alphaRange = 1,
    opacity = 0.7,
    group = "Driver or Passenger-Involved Crashes"
  ) %>%
  setMaxBounds(-77.119759, 38.99511, -76.909395, 38.791645) %>%
  addLayersControl(
    baseGroups = c(
      "Carto Dark Matter", "Carto"
    ),
    overlayGroups = c("Driver or Passenger-Involved Crashes"),
    position = "bottomleft",
    options = layersControlOptions(collapsed = FALSE)
  ) 

sync(active_transit_involved_crash_map, driver_or_pass_involved_crash_map)

Figure 2. Two heat maps displaying crashes with active transit user injuries (left) and crashes with driver or passenger injuries (right). Active transit is defined here as including pedestrians and cyclists. Some areas are shared between the two maps, such as the U Street Corridor in Northwest D.C. The dark matter basemap option can better display the hotspots, but it is possible to choose between two basemaps.