IS 607 Final Project

In this assignment, I will be looking at a dataset of motor vehicle collisions in New York City over the last three years. The dataset is hosted on the NYC Open Data website and includes all vehicle collisions recorded by the NYPD since July 2012. This is a live dataset that is updated every weekday with recent accident data. Over 50,000 accidents are recorded in the set, along with details relevant to the accidents: time, data, vehicle type, possible contributing factors, injuries, fatalities, and the precise coordinates of each incident. I believe Vision Zero is an important policy initiative to reduce traffic accidents in New York City, but I am uncertain of the efficacy of reducing speed limits. My hypothesis is that the most accidents in NYC occur at difficult intersections and conjection points, and not necessarily as a result of speeding.

To start our analysis, we will gather our data and load the necessary packages we need for our analysis. For optimal performance, we will pull the json data directly from the website and store it in a dataframe for analysis.

The Motor Vehicle Collision Data Can be accessed here:

https://nycopendata.socrata.com/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95

In JSON Format:

https://data.cityofnewyork.us/resource/h9gi-nx95.json

Data Sources for Road Miles and Blocks For Each Borough:

Manhattan: http://www.fcny.org/cmgp/streets/pages/2001PDF/Report/DFMN.pdf Queens: http://www.fcny.org/cmgp/streets/pages/2001PDF/Report/DFQU.pdf Bronx: http://www.fcny.org/cmgp/streets/pages/2001PDF/Report/DFBX.pdf Staten Island: http://www.fcny.org/cmgp/streets/pages/2001PDF/Report/DFSI.pdf Brooklyn: http://www.fcny.org/cmgp/streets/pages/2001PDF/Report/DFbk.pdf

suppressWarnings(suppressMessages(library(ggmap)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(plyr)))
suppressWarnings(suppressMessages(library(RCurl)))
suppressWarnings(suppressMessages(library(jsonlite)))
suppressWarnings(suppressMessages(library(knitr)))

This call will access the up-to-date Json data and convert it into a dataframe:

jsonURL <- getURL("https://data.cityofnewyork.us/resource/h9gi-nx95.json?$limit=50000")
accidents <- fromJSON(jsonURL,simplifyDataFrame=TRUE)

Some of the accident data will need to be converted to class numeric or factor for our analysis:

accidents$borough <- as.factor(accidents$borough)
accidents$latitude <- as.numeric(accidents$latitude)
accidents$longitude <- as.numeric(accidents$longitude)

I gathered information on the number of blocks and road miles per borough from the above reports published by the Fund for the City of New York. Since we do not have data on the number of intersections in NYC, we can infer that it will be roughly proportional to the number of blocks. Unfortunately, the information is stored in PDFs, so I had to gather the information manually.

collisions <- summary(accidents$borough)[2:6]
miles <- c(771,1742,508,2136,788)
blocks <- c(9921,21410,6718,27780,9937)
densityData <- rbind(collisions,miles,blocks)
densityData <- as.data.frame(densityData)

Now we will draw up some descriptive graphics. We can use a pie chart to visualize the total number of accidents by borough, as it is a small number of categories. Here we see that approximately 15% of the 500,000+ accidents are lacking geographical information. My hypothesis would be that the number of accidents is proportional to the density of the borough. The pie chart shows that there are more accidents in Brooklyn than anywhere else and that there are approximately as many accidents in Queens as in Manhattan.

byBorough <- summary(accidents$borough) 
pie(byBorough)

Quick Facts: Total Number of Miles and Blocks in NYC

sum(miles)

## [1] 5945

sum(blocks)

## [1] 75766

Now we will examine contributing factors in accidents. The dataset has five columns of contributing factors to accidents, one for each possible vehicle involved, up to 5 different cars. We will see that there is a wide variety of contributing factors to accidents in NYC, but unsafe speed is only mentioned in 43 cases in the entire dataset.

allFactors <- rbind(accidents$contributing_factor_vehicle_1,accidents$contributing_factor_vehicle_2,accidents$contributing_factor_vehicle_3,accidents$contributing_factor_vehicle_4,accidents$contributing_factor_vehicle_5)
allFactors <- as.factor(allFactors)
allFactors <- sort(summary(allFactors),decreasing=T)
facTable <- as.table(allFactors) 
# Remove the first two values, since they are NA and "Unspecified"
facTable <- facTable[-1]
facTable <- facTable[-1]
facTable <- as.data.frame(facTable)
kable(facTable, style = "markdown")

	facTable
Driver Inattention/Distraction	6645
Failure to Yield Right-of-Way	2517
Fatigued/Drowsy	2510
Other Vehicular	2024
Backing Unsafely	1707
Lost Consciousness	1450
Turning Improperly	1260
Prescription Medication	1232
Pavement Slippery	864
Driver Inexperience	742
Physical Disability	721
Traffic Control Disregarded	662
Outside Car Distraction	565
Alcohol Involvement	410
Oversized Vehicle	337
Passenger Distraction	278
View Obstructed/Limited	256
Other Electronic Device	189
Aggressive Driving/Road Rage	177
Illness	169
Glare	131
Reaction to Other Uninvolved Vehicle	125
Failure to Keep Right	106
Brakes Defective	100
Obstruction/Debris	100
Fell Asleep	73
Pavement Defective	73
Unsafe Speed	43
Drugs (Illegal)	40
Steering Failure	40
Lane Marking Improper/Inadequate	36
Traffic Control Device Improper/Non-Working	35
Following Too Closely	27
Accelerator Defective	23
Animals Action	23
Tire Failure/Inadequate	23
Passing or Lane Usage Improper	22
Unsafe Lane Changing	21
Cell Phone (hands-free)	14
Pedestrian/Bicyclist/Other Pedestrian Error/Confusion	9
Windshield Inadequate	7
Cell Phone (hand-held)	6
Headlights Defective	5
Other Lighting Defects	5
Shoulders Defective/Improper	4
Tow Hitch Defective	3

Now we will perform linear regression on both blocks and miles as the independent variables.The coefficients from the linear regression show us tha that the number of road miles in a borough has a fairly significant effect on the number of collisions that occur, but the number of blocks is fairly insignificant. In the following two plots, we will plot each variable separately against the number of collisions along with the least fit line. We will see that there is a negative correlation between the amount of road miles in a borough and the number of collisions. The plot showing blocks and collisions hardly varies from the first, as it has little effect on the number of accidents. There is also however a high degree of variability in these plots, but this is to be expected with a small number of data points.

fit <- lm(collisions ~ miles + blocks, data=densityData)
summary(fit)$coefficients

##                 Estimate  Std. Error   t value   Pr(>|t|)
## (Intercept) 13858.665944 3137.182561  4.417552 0.04761322
## miles          58.569570   41.621555  1.407193 0.29465486
## blocks         -4.922372    3.270036 -1.505296 0.27118757

plot(blocks,collisions,xlab="Blocks",ylab="Collisions",main="Density 1 - Collision vs. Blocks")
abline(lm(collisions ~ blocks,densityData))

plot(miles,collisions,xlab="Road Miles",ylab="Collisions",main="Density 2 - Collision vs. Road Miles")
abline(lm(collisions ~ miles,densityData))

Now we will create heatmaps of accidents for each borough. We will do so by creating a new dataframe of accidents for each borough and plotting the density of accidents with the \(\textit{ggmap}\) package.

First we will subset the dataframes for each borough:

justManhattan <- subset(accidents,borough == "MANHATTAN")
justBrooklyn <- subset(accidents,borough == "BROOKLYN")
justQueens <- subset(accidents,borough == "QUEENS")
justBronx <- subset(accidents,borough == "BRONX")
justSI <- subset(accidents,borough == "STATEN ISLAND")

Now we will initialize the map objects for each:

mapManhattan <- suppressWarnings(suppressMessages(get_map(location="Manhattan", source="google", maptype="roadmap", crop=FALSE,zoom=12)))
mapBrooklyn <- suppressWarnings(suppressMessages(get_map(location="Brooklyn", source="google", maptype="roadmap", crop=FALSE,zoom=12)))
mapQueens <- suppressWarnings(suppressMessages(get_map(location="Queens", source="google", maptype="roadmap", crop=FALSE,zoom=11)))
mapBronx <- suppressWarnings(suppressMessages(get_map(location="Bronx", source="google", maptype="roadmap", crop=FALSE,zoom=12)))
mapSI <- suppressWarnings(suppressMessages(get_map(location="Staten Island", source="google", maptype="roadmap", crop=FALSE,zoom=12)))

Manhattan Heatmap

ggmap(mapManhattan) + stat_density2d(aes(x = longitude, y = latitude,alpha=0.01), size = 0.05, bins = 10, data = justManhattan, geom = "polygon", colour = "red") + ggtitle("Manhattan - Heatmap of Accidents")

## Warning: Removed 80 rows containing non-finite values (stat_density2d).

If we look closely, we that the accidents are concentrated around all of the access points to bridges and tunnels. It should not be suprising that vehicle collisions occur more frequently at these congestion points, but it brings into question whether speed is a factor in most collisions.

ggmap(mapBrooklyn) + stat_density2d(aes(x = longitude, y = latitude,alpha=0.01), size = 0.05, bins = 10, data = justBrooklyn, geom = "polygon", colour = "red") + ggtitle("Brooklyn - Heatmap of Accidents")

## Warning: Removed 712 rows containing non-finite values (stat_density2d).

The map of Brooklyn shows a number of areas where vehicle collisions are common. There appear to be a few clusters on both the BQE (I-278) and Atlantic Avenue.

Queens Heatmap

ggmap(mapQueens) + stat_density2d(aes(x = longitude, y = latitude,alpha=0.01), size = 0.05, bins = 14, data = justQueens, geom = "polygon", colour = "red") + ggtitle("Queens - Heatmap of Accidents")

Bronx Heatmap

ggmap(mapBronx) + stat_density2d(aes(x = longitude, y = latitude,alpha=0.01), size = 0.05, bins = 10, data = justBronx, geom = "polygon", colour = "red") + ggtitle("Bronx - Heatmap of Accidents")

Staten Island Heatmap

ggmap(mapSI) + stat_density2d(aes(x = longitude, y = latitude,alpha=0.01), size = 0.05, bins = 10, data = justSI, geom = "polygon", colour = "red") + ggtitle("Staten Island - Heatmap of Accidents")

We can compare these heatmaps to the Vision Zero View infographic here: http://www.vzv.nyc/

The Vision Zero Plan emphasizes “Arterial Slow Zones” - lowering speed limits on specific roadways - to reduce avoidable accidents. Comparing our heatmaps with Vision Zero View, we see that some of the new speed limits will problematic roadways in Brooklyn and Queens, but not as much in the other boroughs. We see a high volume of accidents in the Arterial Slow Zones on Queens Boulevard (Queens) and Atlantic Ave and Ocean Parkway south of Prospect Park (Brooklyn). The new zones, however, do not cover problem areas in the Bronx (roadways near the Harlem River), Staten Island (Hyland Boulevard and the entrance to the Verrazona-Narrows Bridge) or Manhattan (all access points). In fact, the slow zones on Broadway and Amsterdam on the Upper West Side appear on our heatmap to be in a low-accident area.

IS 607 Final Project

David Stern

May 24, 2015

Manhattan Heatmap

Queens Heatmap

Bronx Heatmap

Staten Island Heatmap