CUNY DATA 608

Question

NYC is one of the largest and busiest cities in the world, and accidents there are a common occurrence: according to the NYPD, there are about 678 car accidents a day. This study investigates accidents in Staten Island (SI), New York City. It focuses on the leading causes of accidents, the types of vehicles involved, and the “hot spots”, the areas most prone to accidents.

Libraries

library(tidyverse)
library(plotly)
library(readr)
library(knitr)
library(leaflet)
library(tigris)
library(httr)
library(leaflet.extras)

Data

The data is taken from the NYPD Motor Vehicle Collisions dataset on NYC Open Data.

df <- read_csv("https://raw.githubusercontent.com/mandiemannz/Data-608/master/NYPD_Motor_Vehicle_Collisions%202017.csv")
kable(head(df))
| DATE | TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | UNIQUE KEY | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 03/20/2017 | 07:00:00 | STATEN ISLAND | 10306 | 40.57046 | -74.10977 | (40.570465, -74.10977) | HYLAN BOULEVARD | NEW DORP LANE | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | NA | NA | NA | 3635600 | SPORT UTILITY / STATION WAGON | PASSENGER VEHICLE | NA | NA | NA |
| 03/20/2017 | 08:00:00 | STATEN ISLAND | 10309 | 40.54909 | -74.22084 | (40.54909, -74.22084) | VETERANS ROAD WEST | BLOOMINGDALE ROAD | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Turning Improperly | Unspecified | NA | NA | NA | 3635743 | PASSENGER VEHICLE | NA | NA | NA | NA |
| 03/20/2017 | 12:28:00 | STATEN ISLAND | 10309 | 40.53793 | -74.21623 | (40.537933, -74.21623) | NA | NA | 53 MARISA CIRCLE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | NA | NA | NA | 3635748 | SPORT UTILITY / STATION WAGON | NA | NA | NA | NA |
| 03/20/2017 | 12:55:00 | STATEN ISLAND | 10312 | 40.53536 | -74.15594 | (40.535355, -74.15594) | KING STREET | RICHMOND AVENUE | NA | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NA | NA | NA | 3635794 | SPORT UTILITY / STATION WAGON | NA | NA | NA | NA |
| 03/20/2017 | 09:50:00 | STATEN ISLAND | 10312 | 40.56041 | -74.16975 | (40.56041, -74.16975) | NA | NA | 3229 RICHMOND AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | NA | NA | NA | 3635796 | PASSENGER VEHICLE | NA | NA | NA | NA |
| 03/20/2017 | 15:10:00 | STATEN ISLAND | 10301 | NA | NA | NA | NA | NA | 2 ST. PAULS AVENUE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Other Vehicular | Unspecified | NA | NA | NA | 3635845 | PICK-UP TRUCK | SPORT UTILITY / STATION WAGON | SPORT UTILITY / STATION WAGON | NA | NA |

Contributing Factors

To take a closer look at the factors that contributed to an accident, the data is first filtered by number of observations. The dataset has many contributing factors with very few occurrences, so only factors with more than 50 observations are kept.

# Keep only contributing factors that occur more than 50 times
df1 <- df %>%
  group_by(`CONTRIBUTING FACTOR VEHICLE 1`) %>%
  filter(n() > 50) %>%
  ungroup()
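
Before settling on a cutoff, it can help to look at how many observations each factor actually has. The following is a minimal sketch on a small made-up data frame (the rows, the `n_obs` column name, and the `min_obs` threshold of 2 are illustrative assumptions, not values from the collision file); the same pattern should carry over to `df` with a threshold of 50.

```r
library(dplyr)

# Toy stand-in for the collision data (hypothetical rows, not from the real file)
toy <- tibble::tibble(
  `CONTRIBUTING FACTOR VEHICLE 1` = c(
    rep("Driver Inattention/Distraction", 5),
    rep("Following Too Closely", 3),
    "Glare"
  )
)

# Tabulate observations per factor, largest first
factor_counts <- toy %>%
  count(`CONTRIBUTING FACTOR VEHICLE 1`, sort = TRUE, name = "n_obs")

# Keep only factors above the threshold (2 here; the analysis uses 50)
min_obs <- 2
kept <- factor_counts %>% filter(n_obs > min_obs)

print(factor_counts)
print(kept)
```

Working from a count table like this makes it easy to see which factors a given threshold would drop before committing to it.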

The filtered data is then plotted with ggplot2 and converted into an interactive plotly graph.

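# Bar chart of accident counts per contributing factor, colored by vehicle type
p <- ggplot(df1,
            aes(x = `CONTRIBUTING FACTOR VEHICLE 1`)) +
  geom_bar(aes(fill = `VEHICLE TYPE CODE 1`)) +
  coord_flip() +
  theme(legend.position = "none") +
  xlab("Contributing Factor")

# Convert the static ggplot into an interactive plotly widget
p <- ggplotly(p)
p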

Looking at the graph, Driver Inattention/Distraction has the highest number of occurrences in this dataset, followed by failure to yield right-of-way and following too closely. Unspecified also has a high number of occurrences, but it does not indicate what the actual contributing factor was. One might assume that distracted drivers are busy on their cell phones, although the dataset does not record the source of the distraction.
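
Since Unspecified carries no information about the cause, one option is to set it aside and rank only the known factors. The sketch below does this on a made-up data frame (the rows and the `n_obs` and `share` column names are illustrative assumptions); it is one possible way to summarize the factors, not the method used for the graph above.

```r
library(dplyr)

# Toy stand-in for the collision data (hypothetical rows)
toy <- tibble::tibble(
  `CONTRIBUTING FACTOR VEHICLE 1` = c(
    rep("Driver Inattention/Distraction", 4),
    rep("Unspecified", 3),
    rep("Failure to Yield Right-of-Way", 2),
    "Following Too Closely"
  )
)

# Drop "Unspecified" (filter() also drops NA here), then rank the
# remaining factors and compute each one's share of the known causes
known_factors <- toy %>%
  filter(`CONTRIBUTING FACTOR VEHICLE 1` != "Unspecified") %>%
  count(`CONTRIBUTING FACTOR VEHICLE 1`, sort = TRUE, name = "n_obs") %>%
  mutate(share = n_obs / sum(n_obs))

print(known_factors)
```

The shares are relative to accidents with a known factor only, so they should not be read as shares of all accidents.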

The next investigation is of the types of vehicles themselves. This variable also has many vehicle types that occur only once, so the data is filtered to vehicle types with more than 10 observations.

# Keep only vehicle types that occur more than 10 times
vehicle <- df %>%
  group_by(`VEHICLE TYPE CODE 1`) %>%
  filter(n() > 10) %>%
  ungroup()

# Bar chart of accident counts per vehicle type
vehicles <- ggplot(vehicle, aes(x = `VEHICLE TYPE CODE 1`)) +
  geom_bar(aes(fill = `VEHICLE TYPE CODE 1`)) +
  theme(legend.position = "none") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Vehicle Type") +
  ggtitle("Count of Accidents by Vehicle Type")

vehicles <- ggplotly(vehicles)
vehicles

Looking at the graph above, the vehicle type involved in the most accidents is the passenger vehicle, followed by the SUV. This makes sense, since most vehicles on the road presumably fall into those categories.

Location of Accidents

The next investigation looks at where the most accidents occurred; perhaps there are locations that the local authorities can investigate.

# Keep only streets with more than 50 recorded accidents
streetname <- df %>%
  group_by(`ON STREET NAME`) %>%
  filter(n() > 50) %>%
  ungroup()

# Bar chart of accident counts per street, colored by vehicle type
streetnames <- ggplot(streetname, aes(x = `ON STREET NAME`)) +
  geom_bar(aes(fill = `VEHICLE TYPE CODE 1`)) +
  theme(legend.position = "none") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
  xlab("Street Name") +
  ggtitle("Count of Vehicle Types vs. Street")

streetnames <- ggplotly(streetnames)
streetnames

According to the data, much of the street-name data is missing (null). Aside from the missing values, Hylan Boulevard and Richmond Road have the highest number of recorded accidents.
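
To put a number on how much street-name data is missing, the missing values can be counted directly. In the sample rows shown earlier, records without an ON STREET NAME sometimes carry an address in OFF STREET NAME instead, so the sketch below also counts those. It runs on a made-up data frame; the rows and the summary column names are illustrative assumptions.

```r
library(dplyr)

# Toy stand-in for the two street-name columns (hypothetical rows)
toy <- tibble::tibble(
  `ON STREET NAME`  = c("HYLAN BOULEVARD", NA, "RICHMOND ROAD", NA, "HYLAN BOULEVARD"),
  `OFF STREET NAME` = c(NA, "53 MARISA CIRCLE", NA, NA, NA)
)

# Count missing ON STREET NAME values and how many of those
# rows still have an off-street address recorded
missing_summary <- toy %>%
  summarise(
    n_rows              = n(),
    n_missing_on_street = sum(is.na(`ON STREET NAME`)),
    share_missing       = mean(is.na(`ON STREET NAME`)),
    n_off_street_only   = sum(is.na(`ON STREET NAME`) & !is.na(`OFF STREET NAME`))
  )

print(missing_summary)
```

If the missing share is large, dropping the NA rows with `filter(!is.na(...))` before plotting would keep the null bar from dominating the street chart.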

Leaflet Map

The next step in the analysis is to plot the data points on a map, which makes it easier to see where accidents cluster. The data is filtered to include only SI; however, some latitude/longitude values appear to point to other areas of NYC.

# Keep coordinates and the primary contributing factor, dropping incomplete rows
df2 <- subset(df, select = c("LONGITUDE", "LATITUDE", "CONTRIBUTING FACTOR VEHICLE 1"))
df2 <- na.omit(df2)

# Heatmap of accident locations, centered on Staten Island
leaflet() %>%
  addTiles() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-74.15, 40.57, zoom = 11) %>%
  addHeatmap(
    lng = df2$LONGITUDE, lat = df2$LATITUDE,
    blur = 20, max = 0.05, radius = 15
  )

The heatmap is consistent with the street-level counts: the most accidents seem to happen on the southern part of Hylan Boulevard.
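
Because some coordinates point outside Staten Island, one way to clean the map input is to keep only points inside a rough bounding box around the borough. This is a minimal sketch on made-up points; the `si_bounds` values are approximate assumptions chosen for illustration, not official borough limits, and a rectangle can still include small parts of neighboring areas.

```r
library(dplyr)

# Rough bounding box around Staten Island (approximate, assumed values)
si_bounds <- list(lat_min = 40.49, lat_max = 40.66,
                  lng_min = -74.27, lng_max = -74.04)

# Toy coordinates: two points in SI, one in Manhattan, one missing
toy <- tibble::tibble(
  LATITUDE  = c(40.57046, 40.54909, 40.7128, NA),
  LONGITUDE = c(-74.10977, -74.22084, -74.0060, NA)
)

# Keep only complete coordinates that fall inside the box
in_si <- toy %>%
  filter(
    !is.na(LATITUDE), !is.na(LONGITUDE),
    between(LATITUDE, si_bounds$lat_min, si_bounds$lat_max),
    between(LONGITUDE, si_bounds$lng_min, si_bounds$lng_max)
  )

print(in_si)
```

A borough boundary polygon would give a more exact filter than a rectangle, at the cost of a spatial join.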

Conclusion

The data shows that the most accidents are attributed to driver inattention. One could assume that inattention is related to cell-phone usage, though the data does not confirm this. The data also shows that most accidents involve a passenger vehicle or SUV and occur on busy major streets.

Amanda Arce

12/15/2019