Introduction

Speed humps are traffic calming devices intended to slow traffic speeds on low volume, low speed roads. Speed humps are generally installed on local residential streets (non-truck route, non-bus route locations), while speed cushions are generally installed on designated truck route locations and bus route locations.

Intuitively, I suspect that speed reducers have a strong relationship with accidents (why else slow down traffic in a city notorious for its traffic?). Luckily, NYC provides these data sets openly.

In my analysis I will look at these data sets individually and then see if the installed speed reducers have a relationship to accidents in NYC.

Loading Data

Reading The Speed Reducer Dataset

speed_reducer_url <- 'https://github.com/dcorrea614/MSDS/blob/master/Speed_Reducer_Tracking_System__SRTS_.csv?raw=true'

speed_reducer <- read.csv(speed_reducer_url)

str(speed_reducer)
## 'data.frame':    42029 obs. of  44 variables:
##  $ ProjectCode             : chr  "SR-20191217-17157" "SR-20191217-17156" "SR-20191217-17155" "SR-20191217-17154" ...
##  $ GroupOwner              : chr  "Queens BC" "Bronx BC" "SI BC" "SI BC" ...
##  $ LocationDescription     : chr  "43 STREET from BARNETT AVENUE to SKILLMAN AVENUE" "YATES AVENUE from PIERCE AVENUE to VAN NEST AVENUE" "WEST FINGERBOARD ROAD from GRASMERE COURT to MARIE STREET" "WINANT AVENUE from WOODROW ROAD to CORRELL AVENUE" ...
##  $ Borough                 : chr  "Queens" "Bronx" "Staten Island" "Staten Island" ...
##  $ Description             : chr  "Citizen speed hump request due to on-going speeding concerns.  " "Citizen speed hump request due to speeding concern.  Crash involving child referenced." "Citizen speed hump request due to speeding concern.  " "Anonymous speed hump request due to speeding concern." ...
##  $ ProjectStatus           : chr  "Study request passed to planning" "Study request passed to planning" "Study request passed to planning" "Study request passed to planning" ...
##  $ NextStep                : chr  "Awaiting  planning decision" "Awaiting  planning decision" "Awaiting  planning decision" "Awaiting  planning decision" ...
##  $ DateAdded               : chr  "12/17/2019 06:45:28 PM" "12/17/2019 06:39:58 PM" "12/17/2019 06:36:55 PM" "12/17/2019 06:07:19 PM" ...
##  $ BCTSNum                 : chr  "DOT-414077-F3M5   " "DOT-413799-Y2N5   " "DOT-414468-R5F1   " "DOT-414367-J8V9   " ...
##  $ CCUNum                  : chr  "" "" "" "" ...
##  $ RequestorLetterReplyDate: chr  "" "" "" "" ...
##  $ CBLetterRequestDate     : chr  "" "" "" "" ...
##  $ CBLetterRecievedDate    : chr  "" "" "" "" ...
##  $ OldSign                 : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ NewSign                 : logi  NA NA NA NA NA NA ...
##  $ MarkingsDate            : chr  "" "" "" "" ...
##  $ InstallationDate        : chr  "" "" "" "" ...
##  $ SecondStudyCode         : chr  "" "" "" "" ...
##  $ ClosedDate              : chr  "" "" "" "" ...
##  $ speedCushion            : chr  "No" "No" "No" "No" ...
##  $ RequestDate             : chr  "05/07/2019 12:00:00 AM" "05/04/2019 12:00:00 AM" "05/09/2019 12:00:00 AM" "05/08/2019 12:00:00 AM" ...
##  $ RequestorType           : chr  "Citizen" "Citizen" "Citizen" "Anonymous" ...
##  $ SegmentID               : int  67846 86861 196254 1563 1561 1578 1576 1572 1574 100405 ...
##  $ OnStreet                : chr  "43 STREET" "YATES AVENUE" "WEST FINGERBOARD ROAD" "WINANT AVENUE" ...
##  $ FromStreet              : chr  "BARNETT AVENUE" "PIERCE AVENUE" "GRASMERE COURT" "WOODROW ROAD" ...
##  $ ToStreet                : chr  "SKILLMAN AVENUE" "VAN NEST AVENUE" "MARIE STREET" "KRAMER AVENUE" ...
##  $ GeoBoroughName          : chr  "Queens" "Bronx" "Staten Island" "Staten Island" ...
##  $ LIONKey                 : num  4.55e+09 2.84e+09 5.27e+09 5.76e+09 5.76e+09 ...
##  $ FromLatitude            : num  40.7 40.8 40.6 40.5 40.5 ...
##  $ FromLongitude           : num  -73.9 -73.8 -74.1 -74.2 -74.2 ...
##  $ ToLatitude              : num  40.7 40.8 40.6 40.5 40.5 ...
##  $ ToLongitude             : num  -73.9 -73.8 -74.1 -74.2 -74.2 ...
##  $ OFT                     : num  4.10e+17 2.79e+17 5.28e+17 5.56e+17 5.56e+17 ...
##  $ CB                      : int  402 211 502 503 503 503 503 503 503 413 ...
##  $ SegmentStatus           : chr  "No" "No" "No" "No" ...
##  $ SegmentStatusDescription: chr  "Not Feasible" "Not Feasible" "Not Feasible" "Not Feasible" ...
##  $ DenialReason            : chr  "" "" "" "" ...
##  $ NumSRProposed           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ TrafficDirection        : chr  "T   " "W   " "T   " "T   " ...
##  $ TrafficDirectionDesc    : chr  "Two-way" "One-way" "Two-way" "Two-way" ...
##  $ OldSign1                : chr  "" "" "" "" ...
##  $ NewSign1                : chr  "" "" "" "" ...
##  $ OldSign2                : chr  "" "" "" "" ...
##  $ NewSign2                : chr  "" "" "" "" ...

API Connection to Get the Accident Data

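library(RSocrata)

# Credentials are read from environment variables instead of being hard-coded.
# The variable names below are placeholders; set them in .Renviron or the shell.
df <- read.socrata(
  'https://data.cityofnewyork.us/resource/h9gi-nx95.json',
  app_token = Sys.getenv('SOCRATA_APP_TOKEN'),
  email     = Sys.getenv('SOCRATA_EMAIL'),
  password  = Sys.getenv('SOCRATA_PASSWORD')
)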

accidents <- df

str(accidents)
## 'data.frame':    1736215 obs. of  30 variables:
##  $ crash_date                   : POSIXct, format: "2020-12-01" "2020-12-01" ...
##  $ crash_time                   : chr  "0:00" "0:00" "0:00" "0:01" ...
##  $ borough                      : chr  "QUEENS" "QUEENS" "BROOKLYN" "MANHATTAN" ...
##  $ zip_code                     : chr  "11367" "11411" "11201" "10002" ...
##  $ latitude                     : chr  "40.7192460" "40.6962170" "40.6960330" "40.7168120" ...
##  $ longitude                    : chr  "-73.8114550" "-73.7436000" "-73.9845300" "-73.9781600" ...
##  $ location.latitude            : chr  "40.719246" "40.696217" "40.696033" "40.716812" ...
##  $ location.longitude           : chr  "-73.811455" "-73.7436" "-73.98453" "-73.97816" ...
##  $ on_street_name               : chr  "UNION TURNPIKE                  " "LINDEN BOULEVARD                " "TILLARY STREET                  " NA ...
##  $ off_street_name              : chr  "152 STREET" "SPRINGFIELD BOULEVARD" "FLATBUSH AVENUE EXTENSION" NA ...
##  $ number_of_persons_injured    : chr  "0" "0" "2" "0" ...
##  $ number_of_persons_killed     : chr  "0" "0" "0" "0" ...
##  $ number_of_pedestrians_injured: chr  "0" "0" "0" "0" ...
##  $ number_of_pedestrians_killed : chr  "0" "0" "0" "0" ...
##  $ number_of_cyclist_injured    : chr  "0" "0" "0" "0" ...
##  $ number_of_cyclist_killed     : chr  "0" "0" "0" "0" ...
##  $ number_of_motorist_injured   : chr  "0" "0" "2" "0" ...
##  $ number_of_motorist_killed    : chr  "0" "0" "0" "0" ...
##  $ contributing_factor_vehicle_1: chr  "Driver Inattention/Distraction" "Unspecified" "Failure to Yield Right-of-Way" "Passing Too Closely" ...
##  $ contributing_factor_vehicle_2: chr  "Unspecified" "Unspecified" "Failure to Yield Right-of-Way" "Unspecified" ...
##  $ collision_id                 : chr  "4372639" "4372363" "4372447" "4372199" ...
##  $ vehicle_type_code1           : chr  "Van" "Sedan" "Sedan" "Sedan" ...
##  $ vehicle_type_code2           : chr  "Box Truck" "Station Wagon/Sport Utility Vehicle" "Station Wagon/Sport Utility Vehicle" NA ...
##  $ cross_street_name            : chr  NA NA NA "70        BARUCH DRIVE                  " ...
##  $ contributing_factor_vehicle_3: chr  NA NA NA NA ...
##  $ vehicle_type_code_3          : chr  NA NA NA NA ...
##  $ contributing_factor_vehicle_4: chr  NA NA NA NA ...
##  $ vehicle_type_code_4          : chr  NA NA NA NA ...
##  $ contributing_factor_vehicle_5: chr  NA NA NA NA ...
##  $ vehicle_type_code_5          : chr  NA NA NA NA ...

Transforming

Speed Reducer Data Set

The transformation needed in this data set includes converting the date columns from character strings to a standard date type and creating columns that flag whether each requested speed reducer was installed or not.

library(dplyr)
library(stringr)
library(lubridate)

sr_df <- speed_reducer %>%
  select(ProjectCode, Borough, Description, ProjectStatus, speedCushion, InstallationDate, 
         RequestDate, FromLongitude, FromLatitude, TrafficDirectionDesc) %>%
  # keep only the month/day/year part of each timestamp, then parse it as a Date
  mutate(InstallationDate = mdy(str_extract(InstallationDate, '[0-9]+/[0-9]+/[0-9]+')),
         RequestDate = mdy(str_extract(RequestDate, '[0-9]+/[0-9]+/[0-9]+'))) %>%
  arrange(RequestDate) %>%
  # a missing installation date means the requested speed reducer was not installed
  mutate(Requested = ifelse(is.na(InstallationDate), 'Not Installed', 'Installed'),
         Installed = ifelse(is.na(InstallationDate), 0, 1),
         Not_Installed = ifelse(is.na(InstallationDate), 1, 0))

# this data set only includes the speed reducer requests that were installed
installed <- sr_df %>%
  filter(Requested == 'Installed')
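
To see why the regular expression step comes before `mdy()`, here is a minimal sketch on two sample strings in the same format as the date columns (the inputs are illustrative, and the variable names are mine). `str_extract()` keeps only the date portion and returns `NA` when there is nothing to match, so an empty installation date ends up as a missing `Date` rather than a parse failure.

```r
library(stringr)
library(lubridate)

# Illustrative inputs: one timestamp string and one empty string (no installation)
raw_dates <- c("12/17/2019 06:45:28 PM", "")

# Keep only the month/day/year part; the empty string has no match and becomes NA
date_part <- str_extract(raw_dates, "[0-9]+/[0-9]+/[0-9]+")

# Parse month/day/year text into Date values; NA stays NA
parsed <- mdy(date_part)

# Same indicator logic as above: a missing date means "Not Installed"
status <- ifelse(is.na(parsed), "Not Installed", "Installed")
```

This is also what makes the `Installed` and `Not_Installed` flags possible, since both are derived from whether `InstallationDate` is `NA`.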

Accident Data Set

The transformation needed in this data set includes dropping the rows with NA values in the longitude and latitude columns and converting several columns from character to numeric.

colnames(accidents)
##  [1] "crash_date"                    "crash_time"                   
##  [3] "borough"                       "zip_code"                     
##  [5] "latitude"                      "longitude"                    
##  [7] "location.latitude"             "location.longitude"           
##  [9] "on_street_name"                "off_street_name"              
## [11] "number_of_persons_injured"     "number_of_persons_killed"     
## [13] "number_of_pedestrians_injured" "number_of_pedestrians_killed" 
## [15] "number_of_cyclist_injured"     "number_of_cyclist_killed"     
## [17] "number_of_motorist_injured"    "number_of_motorist_killed"    
## [19] "contributing_factor_vehicle_1" "contributing_factor_vehicle_2"
## [21] "collision_id"                  "vehicle_type_code1"           
## [23] "vehicle_type_code2"            "cross_street_name"            
## [25] "contributing_factor_vehicle_3" "vehicle_type_code_3"          
## [27] "contributing_factor_vehicle_4" "vehicle_type_code_4"          
## [29] "contributing_factor_vehicle_5" "vehicle_type_code_5"
accidents <- accidents %>%
  drop_na(longitude, latitude) %>%
  mutate(latitude = as.numeric(latitude),
         longitude = as.numeric(longitude),
         number_of_persons_injured = as.numeric(number_of_persons_injured),
         number_of_persons_killed = as.numeric(number_of_persons_killed))
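
As a small illustration of this clean-up step, the sketch below applies the same idea to a made-up three-row table (the `toy_` names and values are hypothetical). It uses `across()` as a shorthand for converting every column at once, which is a substitution for the column-by-column `as.numeric()` calls above, not the original approach.

```r
library(dplyr)
library(tidyr)

# Toy records with character coordinates, similar to what the JSON endpoint returns
toy_accidents <- tibble(
  latitude  = c("40.7192460", NA, "40.6960330"),
  longitude = c("-73.8114550", "-73.7436000", "-73.9845300"),
  number_of_persons_injured = c("0", "1", "2")
)

toy_clean <- toy_accidents %>%
  drop_na(longitude, latitude) %>%           # drops the row with a missing latitude
  mutate(across(everything(), as.numeric))   # character -> numeric for every column
```

Rows without coordinates cannot be placed on a map or matched to a neighborhood later, which is why they are removed first.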

Exploratory

Speed Reducer

description <- tibble(text = sr_df$Description)

description %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
## Joining, by = "word"

# Count by Borough
sr_df %>% 
  group_by(Requested, Borough) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = Borough, y = n, fill = Requested)) + 
  geom_bar(stat = 'identity', position = position_dodge()) +
  geom_text(aes(label = n), vjust= -0.5, color='Black',
            position = position_dodge(0.9), size=3.5) +
  scale_fill_brewer(palette = 'Paired') + 
  labs(title = 'Requested Speed Reducer by Borough', x = 'Borough', y = 'Count')
## `summarise()` regrouping output by 'Requested' (override with `.groups` argument)

# Cumulative Count of Speed Reducers
sr_df %>%
  mutate(cumsum_Installed = cumsum(Installed),
         cumsum_Not_Installed = cumsum(Not_Installed)) %>%
  ggplot(aes(x = RequestDate)) + 
  # map color to a status label so the legend text comes directly from the mapping
  geom_line(mapping = aes(y = cumsum_Installed, color = 'Installed'), size = 1.5) +
  geom_line(mapping = aes(y = cumsum_Not_Installed, color = 'Not Installed'), size = 1.5) +
  scale_color_discrete(name = 'Requested') + 
  labs(title = 'Cumulative Sum of Speed Reducers Over the Years', x = 'Requested Date',
       y = 'Cumulative Sum')
## Warning: Removed 53 row(s) containing missing values (geom_path).

## Warning: Removed 53 row(s) containing missing values (geom_path).
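
The two lines in this chart are running totals of the 0/1 indicator columns, which works because the rows were sorted by `RequestDate` earlier. A minimal sketch with hypothetical indicator values shows what `cumsum()` produces:

```r
# Toy 0/1 indicators, assumed to be already sorted by request date
installed_flag     <- c(1, 0, 1, 1, 0)
not_installed_flag <- 1 - installed_flag

# Running totals: each element is the count of 1s seen so far
running_installed     <- cumsum(installed_flag)       # 1 1 2 3 3
running_not_installed <- cumsum(not_installed_flag)   # 0 1 1 1 2
```

At every position the two running totals add up to the number of requests seen so far.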

# Map of installed speed reducers
long_lat <- sr_df %>%
  filter(Requested == 'Installed') %>%
  select(FromLongitude, FromLatitude)


leaflet(long_lat) %>%
  addTiles() %>% 
  addCircleMarkers(~FromLongitude, ~FromLatitude, data = installed, 
                   clusterOptions=markerClusterOptions()) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-73.98, 40.75, zoom = 10)

Accidents

accidents %>%
  drop_na(borough) %>%
  count(borough) %>%
  rename(accident_count = n) %>%
  ggplot(aes(x = borough, y = accident_count)) +
  geom_bar(stat = 'identity', fill = 'steelblue') +
  geom_text(aes(label = accident_count), vjust= 1.5, color='white', size=3.5) +
  labs(title = 'Accident Count by Borough')

accidents %>%
  count(crash_date) %>%
  rename(accident_count = n) %>%
  ggplot(aes(x = crash_date, y = accident_count)) + 
  geom_smooth(size = 1.5) +
  labs(title = 'Accident Count Through the Years', x = 'Accident Date', 
       y = 'Accident Count')
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Second Transformation

After getting to know the data, I decided to go beyond the borough level and analyze the data at the neighborhood level. Additionally, the accidents data set starts in 2012 while the speed reducer data set begins in 1990, so the two need to be restricted to a common period.

Getting the Neighborhoods

Another data set is needed to match the coordinates to neighborhoods. From there, I can aggregate at a lower level and see if there’s a relationship between speed reducers and accidents.

get <- GET('http://data.beta.nyc//dataset/0ff93d2d-90ba-457c-9f7e-39e47bf2ac5f/resource/35dd04fb-81b3-479b-a074-a27a37888ce7/download/d085e2f8d0b54d4590b1e7d1f35594c1pediacitiesnycneighborhoods.geojson')

neighborhoods <- readOGR(content(get,'text'), 'OGRGeoJSON', verbose = F)
## No encoding supplied: defaulting to UTF-8.
points_spdf_installed <- installed

coordinates(points_spdf_installed) <- ~FromLongitude + FromLatitude
(proj4string(points_spdf_installed) <- proj4string(neighborhoods))
## Warning in proj4string(neighborhoods): CRS object has comment, which is lost in
## output
## [1] "+proj=longlat +datum=WGS84 +no_defs"
matches <- over(points_spdf_installed, neighborhoods)
installed <- cbind(installed, matches)

accidents <- accidents %>%
  filter(longitude > -80) 

points_spdf_acc <- accidents
coordinates(points_spdf_acc) <- ~longitude + latitude
proj4string(points_spdf_acc) <- proj4string(neighborhoods)
## Warning in proj4string(neighborhoods): CRS object has comment, which is lost in
## output
matches <- over(points_spdf_acc, neighborhoods)
matches <- matches %>%
  rename(borough1 = 'borough')
accidents <- cbind(accidents, matches)
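
The matching above is a point-in-polygon lookup: `over()` returns, for each point, the attributes of the polygon that contains it. The sketch below reproduces the idea on a hypothetical unit-square "neighborhood" with two toy points, so it runs without downloading the GeoJSON file. All `toy_` objects and values are made up for illustration.

```r
library(sp)

# A hypothetical unit-square neighborhood (ring listed clockwise and closed)
square <- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)))
toy_polys <- SpatialPolygons(list(Polygons(list(square), ID = "a")))
toy_neighborhoods <- SpatialPolygonsDataFrame(
  toy_polys,
  data.frame(neighborhood = "Toy Square", row.names = "a")
)

# Two points: the first lies inside the square, the second outside
toy_points <- data.frame(x = c(0.5, 2), y = c(0.5, 2))
coordinates(toy_points) <- ~x + y

# One row of polygon attributes per point; NA where no polygon contains the point
toy_matches <- over(toy_points, toy_neighborhoods)
```

Because the result has one row per input point, in the same order, it can be bound column-wise to the original data frame, as done with `cbind()` above. Points that fall outside every polygon get `NA` for the neighborhood.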

str(installed)
## 'data.frame':    6451 obs. of  17 variables:
##  $ ProjectCode         : chr  "SR-19900101-364" "SR-19900101-652" "SR-19980831-257" "SR-19980910-192" ...
##  $ Borough             : chr  "Bronx" "Bronx" "Queens" "Queens" ...
##  $ Description         : chr  "Legacy Data Record; (SchoolInfo: N/S); (PlanningID: 544)" "Legacy Data Record; (SchoolInfo: PS 60); (PlanningID: 892)" "Legacy Data Record; (SchoolInfo: ); (PlanningID: 424)" "Legacy Data Record; (SchoolInfo: PS 150); (PlanningID: 345)" ...
##  $ ProjectStatus       : chr  "MOSAICS Entry - actual - closed" "MOSAICS Entry - actual - closed" "MOSAICS Entry - actual - closed" "MOSAICS Entry - actual - closed" ...
##  $ speedCushion        : chr  "No" "No" "No" "No" ...
##  $ InstallationDate    : Date, format: "2006-09-09" "2019-06-18" ...
##  $ RequestDate         : Date, format: "1990-01-01" "1990-01-01" ...
##  $ FromLongitude       : num  -73.9 -73.9 -73.8 -73.9 -73.8 ...
##  $ FromLatitude        : num  40.9 40.8 40.7 40.7 40.8 ...
##  $ TrafficDirectionDesc: chr  "One-way" "One-way" "Two-way" "One-way" ...
##  $ Requested           : chr  "Installed" "Installed" "Installed" "Installed" ...
##  $ Installed           : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Not_Installed       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ neighborhood        : chr  "Allerton" "Longwood" "Kew Gardens Hills" "Sunnyside" ...
##  $ boroughCode         : chr  "2" "2" "4" "4" ...
##  $ borough             : chr  "Bronx" "Bronx" "Queens" "Queens" ...
##  $ X.id                : chr  "http://nyc.pediacities.com/Resource/Neighborhood/Allerton" "http://nyc.pediacities.com/Resource/Neighborhood/Longwood" "http://nyc.pediacities.com/Resource/Neighborhood/Kew_Gardens_Hills" "http://nyc.pediacities.com/Resource/Neighborhood/Sunnyside" ...
str(accidents)
## 'data.frame':    1529320 obs. of  34 variables:
##  $ crash_date                   : POSIXct, format: "2020-12-01" "2020-12-01" ...
##  $ crash_time                   : chr  "0:00" "0:00" "0:00" "0:01" ...
##  $ borough                      : chr  "QUEENS" "QUEENS" "BROOKLYN" "MANHATTAN" ...
##  $ zip_code                     : chr  "11367" "11411" "11201" "10002" ...
##  $ latitude                     : num  40.7 40.7 40.7 40.7 40.7 ...
##  $ longitude                    : num  -73.8 -73.7 -74 -74 -73.9 ...
##  $ location.latitude            : chr  "40.719246" "40.696217" "40.696033" "40.716812" ...
##  $ location.longitude           : chr  "-73.811455" "-73.7436" "-73.98453" "-73.97816" ...
##  $ on_street_name               : chr  "UNION TURNPIKE                  " "LINDEN BOULEVARD                " "TILLARY STREET                  " NA ...
##  $ off_street_name              : chr  "152 STREET" "SPRINGFIELD BOULEVARD" "FLATBUSH AVENUE EXTENSION" NA ...
##  $ number_of_persons_injured    : num  0 0 2 0 0 1 1 0 0 1 ...
##  $ number_of_persons_killed     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ number_of_pedestrians_injured: chr  "0" "0" "0" "0" ...
##  $ number_of_pedestrians_killed : chr  "0" "0" "0" "0" ...
##  $ number_of_cyclist_injured    : chr  "0" "0" "0" "0" ...
##  $ number_of_cyclist_killed     : chr  "0" "0" "0" "0" ...
##  $ number_of_motorist_injured   : chr  "0" "0" "2" "0" ...
##  $ number_of_motorist_killed    : chr  "0" "0" "0" "0" ...
##  $ contributing_factor_vehicle_1: chr  "Driver Inattention/Distraction" "Unspecified" "Failure to Yield Right-of-Way" "Passing Too Closely" ...
##  $ contributing_factor_vehicle_2: chr  "Unspecified" "Unspecified" "Failure to Yield Right-of-Way" "Unspecified" ...
##  $ collision_id                 : chr  "4372639" "4372363" "4372447" "4372199" ...
##  $ vehicle_type_code1           : chr  "Van" "Sedan" "Sedan" "Sedan" ...
##  $ vehicle_type_code2           : chr  "Box Truck" "Station Wagon/Sport Utility Vehicle" "Station Wagon/Sport Utility Vehicle" NA ...
##  $ cross_street_name            : chr  NA NA NA "70        BARUCH DRIVE                  " ...
##  $ contributing_factor_vehicle_3: chr  NA NA NA NA ...
##  $ vehicle_type_code_3          : chr  NA NA NA NA ...
##  $ contributing_factor_vehicle_4: chr  NA NA NA NA ...
##  $ vehicle_type_code_4          : chr  NA NA NA NA ...
##  $ contributing_factor_vehicle_5: chr  NA NA NA NA ...
##  $ vehicle_type_code_5          : chr  NA NA NA NA ...
##  $ neighborhood                 : chr  "Kew Gardens Hills" "Cambria Heights" "Downtown Brooklyn" "Lower East Side" ...
##  $ boroughCode                  : chr  "4" "4" "3" "1" ...
##  $ borough1                     : chr  "Queens" "Queens" "Brooklyn" "Manhattan" ...
##  $ X.id                         : chr  "http://nyc.pediacities.com/Resource/Neighborhood/Kew_Gardens_Hills" "http://nyc.pediacities.com/Resource/Neighborhood/Cambria_Heights" "http://nyc.pediacities.com/Resource/Neighborhood/Downtown_Brooklyn" "http://nyc.pediacities.com/Resource/Neighborhood/Lower_East_Side" ...

Joining the Data Sets

aggregated_installed <- installed %>%
  filter(year(InstallationDate) >= 2012) %>%
  group_by(borough, neighborhood) %>%
  summarise(speed_reducer_count = n())
## `summarise()` regrouping output by 'borough' (override with `.groups` argument)
aggregated_accidents <- accidents %>%
  select(borough1, neighborhood, number_of_persons_injured, number_of_persons_killed,
         number_of_pedestrians_injured, number_of_pedestrians_killed, 
         number_of_cyclist_injured, number_of_cyclist_killed, number_of_motorist_injured,
         number_of_motorist_killed) %>%
  group_by(borough1, neighborhood) %>%
  mutate(number_of_persons_injured = as.numeric(number_of_persons_injured),
         number_of_persons_killed = as.numeric(number_of_persons_killed),
         number_of_pedestrians_injured = as.numeric(number_of_pedestrians_injured),
         number_of_cyclist_injured = as.numeric(number_of_cyclist_injured),
         number_of_motorist_injured = as.numeric(number_of_motorist_injured),
         number_of_motorist_killed = as.numeric(number_of_motorist_killed)
         ) %>%
  summarise(accident_count = n(),
            sum_injured = sum(number_of_persons_injured, na.rm = TRUE),
            sum_mortality = sum(number_of_persons_killed, na.rm = TRUE),
            sum_pedestrians_injured = sum(number_of_pedestrians_injured, na.rm = TRUE),
            sum_cyclist_injured = sum(number_of_cyclist_injured, na.rm = TRUE),
            sum_motorist_injured = sum(number_of_motorist_injured, na.rm = TRUE),
            sum_motorist_killed = sum(number_of_motorist_killed, na.rm = TRUE)
            )
## `summarise()` regrouping output by 'borough1' (override with `.groups` argument)
installed_and_accidents <- data.frame(left_join(aggregated_installed, aggregated_accidents, 
                                     by = c('borough' = 'borough1',
                                            'neighborhood' = 'neighborhood')))
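
To make the shape of the joined table concrete, here is a minimal sketch of the same aggregate-then-join pattern on made-up rows (all `toy_` tables and values are hypothetical). Because it is a left join from the speed reducer side, a neighborhood with speed reducers but no matching accidents keeps its row with `NA` accident columns, while a neighborhood with accidents but no speed reducers is dropped.

```r
library(dplyr)

# Toy installed speed reducers: two in Sunnyside, one in Allerton
toy_installed <- tibble(
  borough      = c("Queens", "Queens", "Bronx"),
  neighborhood = c("Sunnyside", "Sunnyside", "Allerton")
)

# Toy accidents: three in Sunnyside, one in Downtown Brooklyn
toy_acc <- tibble(
  borough1     = c("Queens", "Queens", "Queens", "Brooklyn"),
  neighborhood = c("Sunnyside", "Sunnyside", "Sunnyside", "Downtown Brooklyn"),
  number_of_persons_injured = c(0, 2, 1, 0)
)

# One row per neighborhood on each side
toy_sr_counts <- toy_installed %>%
  count(borough, neighborhood, name = "speed_reducer_count")

toy_acc_counts <- toy_acc %>%
  group_by(borough1, neighborhood) %>%
  summarise(accident_count = n(),
            sum_injured = sum(number_of_persons_injured, na.rm = TRUE),
            .groups = "drop")

# Keep every neighborhood that has speed reducers; attach accident totals where available
toy_joined <- left_join(toy_sr_counts, toy_acc_counts,
                        by = c("borough" = "borough1", "neighborhood" = "neighborhood"))
```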

Analysis

The moment of truth: is there a relationship between speed reducer counts and accidents? To answer this, I look at the correlation and attempt to construct a linear regression model.

Correlation

# Source for this function: Professor Jason Bryer, DATA 606

panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...){
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    rreal = cor(x, y)
    txtreal <- format(c(rreal, 0.123456789), digits=digits)[1]
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txtreal, cex = cex.cor * r)
}

pairs(installed_and_accidents[,3:10], lower.panel = panel.cor, pch = 19)

Linear Regression

lm_speed <- lm(speed_reducer_count ~ accident_count,
                data = installed_and_accidents)
summary(lm_speed)
## 
## Call:
## lm(formula = speed_reducer_count ~ accident_count, data = installed_and_accidents)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -73.381 -14.374  -5.039   7.001 117.421 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    14.408651   2.421004   5.952 1.13e-08 ***
## accident_count  0.001596   0.000233   6.848 8.44e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.76 on 206 degrees of freedom
## Multiple R-squared:  0.1855, Adjusted R-squared:  0.1815 
## F-statistic:  46.9 on 1 and 206 DF,  p-value: 8.436e-11
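
As a quick check on how the reported figures relate to each other, the sketch below recomputes the adjusted R-squared from the multiple R-squared and the degrees of freedom printed in the summary. The inputs are the rounded values from the output above, so the result is only accurate to about four decimal places; the variable names are mine.

```r
# Values copied from the summary(lm_speed) output
r_squared <- 0.1855    # Multiple R-squared
n_obs     <- 206 + 2   # residual degrees of freedom + 2 estimated coefficients
n_pred    <- 1         # one predictor: accident_count

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)
round(adj_r_squared, 4)   # 0.1815, matching the reported Adjusted R-squared

# With a single predictor, the absolute correlation is the square root of R-squared
abs_correlation <- sqrt(r_squared)   # roughly 0.43
```

With one predictor and roughly 200 neighborhoods, the adjustment barely moves the value, so either figure tells the same story: accident counts explain a bit less than a fifth of the variation in speed reducer counts.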

Plotting Model and Residuals

ggplot(data = installed_and_accidents, aes(x = accident_count, y = speed_reducer_count)) +
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = lm_speed, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")

ggplot(data = lm_speed, aes(x = .resid)) +
  geom_histogram() +
  xlab("Residuals")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = lm_speed, aes(sample = .resid)) +
  stat_qq()

Conclusion

Contrary to my intuition, the evidence does not indicate a strong relationship between accidents and speed reducers in NYC neighborhoods. Although the coefficient for accident count is statistically significant, the adjusted \(R^2\) indicates that accidents account for only 18.15% of the variability in installed speed reducers, and the residuals of the linear regression have high variability.