Traffic incidents in NYC and review of Vision Zero effectiveness

Sreejaya, Vuthy and Suman
May, 2018

CUNY SPS - DATA 698.4 Capstone

Introduction - Accidents in NYC

Today in NYC

  • Approximately 4,000 people are seriously injured and over 250 are killed each year in motor vehicle accidents.
  • Being struck by a vehicle is the leading cause of injury-related death for children under 14, and the second leading cause for seniors.
  • On average, vehicles seriously injure or kill a New Yorker every two hours.
  • 56% of all NYC traffic fatalities are pedestrians, versus 11% nationwide.

Introduction - Vision Zero

  • An action plan launched in 2014 to reduce traffic fatalities and injuries in NYC.
  • Treats every accident as a preventable incident that can be systematically addressed.
  • A multifaceted approach incorporating education, engineering, enforcement and legislation.
  • 14 Vision Zero initiatives are in place today, including: Slow Zones, Speed Humps, Signal Timing, Bike Priority, Enhanced Crossings, Leading Pedestrian Interval, Left Turn Traffic Calming, Safe Streets for Seniors, and Priority Corridors, Zones and Intersections.

Objectives

  • Exploratory investigation of factors associated with traffic incidents in NYC:

    • Who is involved in the majority of traffic violations?
    • What are the key contributing factors to traffic accidents?
    • Where are the hotspots?
    • Which boroughs had more accidents involving injuries/fatalities?
  • Impact of Vision Zero

    • Before-and-after Vision Zero analysis
  • Time series analysis with traffic accident data

EDA - Accidents By Year

[Plot: accidents by year]
Note: Data for 2012 and 2018 are incomplete.

EDA - Accidents By Borough

[Plot: accidents by borough]

EDA - Traffic Violations

[Plot: traffic violations, 1 of 2]

EDA - Traffic Violations

[Plot: traffic violations, 2 of 2]

EDA - Contributing Factors

[Plot: contributing factors]

EDA - Contributing Vehicles

[Plot: contributing vehicles]

EDA - Pedestrian Incidents

[Plot: pedestrian incidents]
Note: The plots above are kernel density graphs.

EDA - Pedestrian Incidents in Areas with the Slow Zone Initiative

[Plot: pedestrian incidents in areas with the Slow Zone initiative]

Data Prep - Augment with Vision Zero Initiatives

[Plot: incident data augmented with Vision Zero initiatives]

Clusters - Why?

  • In the past, not having enough spatial data was a hurdle.
  • Today, too much data is the problem!
  • Too many scattered points on a map can be overwhelming.
  • Rendering all of them on a device is processor intensive.
  • Solution?
    • Compress spatially redundant data points into a set of representative features.

Clustering - Review K-Means

  • We need to provide K, the number of clusters.
  • Place centroids \( C_1 ...C_K \) at random locations.
  • Repeat until convergence (stop when no cluster assignment changes):
    • for each point \( X_i \):
      • find the nearest centroid \( C_j \)
      • assign point \( X_i \) to cluster j
    • for each cluster j = 1…K:
      • new centroid \( C_j \) = mean of all points \( X_i \) assigned to cluster j in the prior step
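The loop above can be sketched in Python as follows. This is a minimal illustration rather than the project's code: the function name `kmeans`, placing the initial centroids at K randomly chosen data points (one common way to pick "random locations"), and plain Euclidean distance on (x, y) pairs are all assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on a list of coordinate tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no cluster assignment changed
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments
```

Because the result depends on the random initial centroids, K-means is usually run with several seeds, keeping the solution with the smallest within-cluster distances.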

Clustering - DBSCAN

  • An intuitive, density-based clustering method well suited to spatial data points (non-flat geometry, uneven cluster sizes).
  • Views clusters as areas of high density separated by areas of low density.
  • Takes two parameters:
    • epsilon, the neighborhood radius, which essentially decides the size of the neighborhood;
    • minPoints, the minimum number of points required to form a dense region (roughly, the minimum cluster size).
  • The number of clusters does not need to be specified; it is determined automatically from these parameters.
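A compact DBSCAN sketch in Python, assuming planar (x, y) points and Euclidean distance; for raw latitude/longitude a geographic distance such as haversine would be more appropriate. The names `dbscan`, `eps`, `min_points` and the `NOISE` label are illustrative, and the brute-force O(n²) neighbor search stands in for the spatial index a production library would use. Points not reachable from any dense region are labeled as noise.

```python
import math

NOISE = -1  # label for points that belong to no cluster

def dbscan(points, eps, min_points):
    """Return one label per point: a cluster id 0, 1, ... or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def region_query(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels
```

For example, two tight groups of three points plus one far-away point, with `eps=1.5` and `min_points=3`, yield two clusters and one noise label; no K had to be chosen in advance.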

Clustering - Identify HotSpots

[Plot: identified hotspots]

App - HotSpotViewer

Time Series Analysis - Data Prep

[Plot: time series data preparation]

  • Split the dataset of persons injured in NYC road accidents into before and after 2017.
  • Subset the data to Brooklyn and to accidents caused by 'Driver Distraction'.

Time Series Analysis - EDA

  • Summarized accidents by month.
  • Training set: 55 months; test set: 15 months.
  • Explored the time series visually.
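A minimal sketch of the monthly summary and a time-ordered split, assuming each incident is represented by its `datetime.date`; `monthly_counts` and `chronological_split` are hypothetical helper names. The split keeps time order (the last `n_test` months become the test set) instead of sampling at random, so the model is evaluated only on months that come after its training data.

```python
from collections import Counter
from datetime import date

def monthly_counts(incident_dates):
    """Count incidents per (year, month), inserting 0 for months with none."""
    counts = Counter((d.year, d.month) for d in incident_dates)
    (year, month), last = min(counts), max(counts)
    series = []
    while (year, month) <= last:
        series.append(((year, month), counts.get((year, month), 0)))
        # Advance one calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series

def chronological_split(series, n_test):
    """Hold out the last n_test observations as the test set (no shuffling)."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]
```

With 70 monthly values, `chronological_split(series, 15)` gives the 55/15 split listed above. Filling empty months with 0 matters because ARIMA assumes evenly spaced observations.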

[Plot: monthly accident time series]

Time Series Analysis - ARIMA

  • ARIMA (AutoRegressive Integrated Moving Average) is a model that can be fitted to time series data to better understand the series and predict its future points.
  • ARIMA(p,d,q) model, where:
    • p is the number of autoregressive terms,
    • d is the number of non-seasonal differences needed for stationarity, and
    • q is the number of lagged forecast errors in the prediction equation.
  • Let \( y \) denote the \( d \)th difference of \( Y \):
    • If d=0: \( y_t = Y_t \)
    • If d=1: \( y_t = Y_t - Y_{t-1} \)
    • If d=2: \( y_t = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2} \) (discrete analog of a second derivative)
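The differencing definitions above translate directly into code. This minimal Python sketch (the helper name `difference` is hypothetical) applies d rounds of first differencing; its `lag` parameter generalizes the same idea to seasonal differencing, \( y_t = Y_t - Y_{t-s} \).

```python
def difference(series, d=1, lag=1):
    """Apply d rounds of differencing: each round maps Y_t to Y_t - Y_{t-lag}."""
    for _ in range(d):
        series = [series[t] - series[t - lag] for t in range(lag, len(series))]
    return series  # d=0 returns the input unchanged
```

For \( Y = 1, 4, 9, 16, 25 \), `difference(Y)` gives 3, 5, 7, 9 and `difference(Y, d=2)` gives 2, 2, 2, which matches \( Y_t - 2Y_{t-1} + Y_{t-2} \) (for example, \( 9 - 2 \cdot 4 + 1 = 2 \)). Each round shortens the series by `lag` observations.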

Time Series Analysis - Dickey-Fuller test

  • We used the Dickey-Fuller test to check whether the time series is stationary. The test output comprises a test statistic and critical values at several significance levels. If the test statistic is less than the critical value, the unit-root null hypothesis is rejected and the series is treated as stationary.

  • ACF (autocorrelation function) and PACF (partial autocorrelation function) plots are used to choose the AR (p) and MA (q) orders.
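To make the decision rule concrete, here is a sketch of the basic (non-augmented) Dickey-Fuller regression \( \Delta Y_t = \alpha + \gamma Y_{t-1} + e_t \), whose test statistic is the t-ratio of \( \gamma \). This is a simplification under stated assumptions: the slides do not say which variant was used, and standard tools such as R's `tseries::adf.test` or statsmodels' `adfuller` run the augmented version with lagged differences. The -2.86 threshold is the commonly quoted asymptotic 5% critical value for the constant-only, no-trend case; the function names are illustrative.

```python
import math

def dickey_fuller_t(series):
    """t-statistic of gamma in dY_t = alpha + gamma * Y_{t-1} + e_t (no augmentation)."""
    x = series[:-1]                                  # Y_{t-1}
    y = [b - a for a, b in zip(series, series[1:])]  # dY_t = Y_t - Y_{t-1}
    n = len(y)
    if n < 3:
        raise ValueError("need at least 4 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    gamma = sxy / sxx  # OLS slope (a constant series divides by zero here)
    alpha = my - gamma * mx
    ssr = sum((yi - alpha - gamma * xi) ** 2 for xi, yi in zip(x, y))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    return gamma / se_gamma

CRITICAL_5PCT = -2.86  # asymptotic 5% value, constant but no trend

def is_stationary(series, critical=CRITICAL_5PCT):
    # Reject the unit-root null when the statistic is below the critical value.
    return dickey_fuller_t(series) < critical
```

A strongly mean-reverting series yields a large negative statistic, while a random walk typically yields a value above the critical value, so the unit-root null is not rejected.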

[Plot: Dickey-Fuller test and ACF/PACF results]
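The values behind ACF and PACF plots can be computed as in this sketch, which uses the standard sample autocorrelation and the Durbin-Levinson recursion for the partial autocorrelations. The function names are illustrative, and ±1.96/√n is the usual large-sample approximation for judging which lags are significant. As a common rule of thumb, a PACF that cuts off after lag p suggests an AR(p) term, and an ACF that cuts off after lag q suggests an MA(q) term.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi = []  # phi[j - 1] holds phi_{k-1, j} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j - 1] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # Update the order-k coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi = [phi[j - 1] - phi_kk * phi[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result

def confidence_band(n):
    """Approximate 95% band (+/-) for autocorrelations of a length-n series."""
    return 1.96 / n ** 0.5
```

By construction the lag-1 PACF equals the lag-1 ACF; the two diverge from lag 2 onward, once the effect of the intermediate lags is removed.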

Time Series Analysis - Log Transformation

  • Applied a log transformation to the series.
  • To remove seasonality, applied differencing, which takes the difference between each observation and the observation a fixed time lag earlier.
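Both transformations are shown in this sketch, which assumes monthly counts with a yearly season (lag 12); the slides do not state the lag used, and `log_seasonal_difference` is a hypothetical name. Differencing the logs gives the log of the ratio \( Y_t / Y_{t-12} \), so a pattern that repeats every year maps to values near zero.

```python
import math

def log_seasonal_difference(counts, lag=12):
    """Log-transform a positive series, then difference it at a seasonal lag."""
    if any(c <= 0 for c in counts):
        # Months with zero incidents would need an offset such as log(1 + Y).
        raise ValueError("log transform needs strictly positive values")
    logged = [math.log(c) for c in counts]
    # z_t = log(Y_t) - log(Y_{t-lag}) = log(Y_t / Y_{t-lag})
    return [logged[t] - logged[t - lag] for t in range(lag, len(logged))]
```

A series whose pattern repeats exactly every `lag` steps becomes all zeros, and steady multiplicative growth becomes a constant (for example, doubling each step with `lag=1` gives \( \log 2 \) throughout).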

[Plot: log-transformed and differenced series]

Time Series Analysis - Model

  • Our model shows little variation in the number of accidents due to distracted driving from 2017 onwards.

[Plot: ARIMA model forecast]

Conclusion

  • Mixed results from the Vision Zero analysis.
  • Human factors (driver distraction, improper driving) are major causes of accidents.
  • Though Manhattan had more traffic violations, Brooklyn and Queens had more accidents that resulted in injuries or deaths.
  • Over 50% of violations are attributed to younger drivers (35 and under).
  • Passenger vehicles were the most likely to be involved in accidents.
  • The HotSpot viewer showed fewer pedestrian hotspots after the VZ initiatives; however, there was no such indication for multi-vehicle incidents.
  • Time series analysis forecasted a gradual decrease in incidents.

Future Work

  • Include more incident datasets in the hotspot analysis.
  • Expand the time series analysis to other factors.
  • Combine incident data with weather data to investigate whether environmental factors such as weather explain the increase in incidents in 2017.
  • Enhance the incident datasets by imputing missing location data.