Race Against Time

Fire Department Response Times

Response time is the key factor in evaluating the performance of fire departments, and it is made up of several components. Fire departments can’t control how much time elapses between the start of a fire and the call to 911, which makes it critical for them to minimize the time they can control. A number of factors affect the response time for an emergency.

Total response time is made up of three distinct components:

1. Dispatch time: Time elapsed from when a call is received at the 9-1-1 center until units are notified.

2. Turnout time: Time elapsed from when units are notified until they are responding.

3. Travel time: Time elapsed from when units respond until they arrive on the incident scene.
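As a small illustration of how the three components add up, the sketch below sums them for a single incident. The component values are hypothetical and chosen only to show the arithmetic; they are not taken from the FDNY data.

```r
# Hypothetical component times in seconds for one incident (illustrative only)
components <- c(dispatch = 45, turnout = 60, travel = 180)

# Total response time is the sum of the three components
total_response <- sum(components)

# Share of the total contributed by each component, in percent
share <- round(100 * components / total_response, 1)

total_response
share
```

Breaking the total down this way shows which component dominates for a given incident, and so where a reduction would matter most.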

Most fire departments tend to focus solely on improving their travel time, because it’s traditionally accepted that little can be done to improve the other two components. Many firefighters mistakenly believe that driving faster is an easy way to improve response time. This rarely has a positive impact; in fact, it can lead to disastrous outcomes.

This project uses FDNY’s response time data to find trends in response times across boroughs, identify (if applicable) new emergency service areas for current fire stations based on travel times, identify areas of concern within service areas, and provide an in-depth comparative analysis of response times.

Motivation

Response times indicate how well emergency responders are performing their duties. This project was motivated by news coverage of Fire Department response times. New York Times articles related to Fire Department response times are retrieved with a web API key and converted into a data frame. These articles shed light on the importance of response times.

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:utils':
## 
##     View

api_key <- "&api-key=e2ecd4b8ba0d275c1ae6a42a808a991e:5:74859716"

url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=Response-Fire"

Reading the JSON data into R and converting it into a data frame

dat <- (paste0(url, api_key))

dat <-fromJSON(dat)


#converting dat into a dataframe
df <- as.data.frame(dat$response)


# creating data frame for the required terms

dataframe <- data.frame(df$docs.section_name,df$docs.lead_paragraph, df$docs.abstract, df$docs.web_url)

	df.docs.section_name	df.docs.lead_paragraph	df.docs.abstract	df.docs.web_url
3	N.Y. / Region	When I see fire trucks responding to incidents, there always seem to be more vehicles and more commotion than I remember from childhood. Why is that?	NA	http://www.nytimes.com/2008/08/31/nyregion/thecity/31fyi.html
6	N.Y. / Region	The Fire Departments average response time decreased in 2007 for the second consecutive year.	NA	http://www.nytimes.com/2008/01/07/nyregion/07mbrfs-FIRECALLS.html

7	New York and Region	Relatives and union members reacted angrily yesterday to Fire Department findings that a floor collapse in which two firefighters died last summer in Brooklyn was partly caused by the failure of city housing officials to maintain the building’s structural integrity. ‘’It boggles my mind that they may have done substandard work,’’ said one union leader, Capt. Richard Brower of the Uniformed Fire Officers Association, two of whose members died fighting a fire in the building on June 5, 1998.	Relatives and union members react angrily to Fire Dept findings that floor collapse in which Lieut James Blackmore and Capt Scott LaPiedra died last summer in Brooklyn was partly caused by failure of city housing officials to maintain building’s structural integrity; housing officials take issue with fairness of Fire Dept report; photo (M)	http://www.nytimes.com/1999/07/15/nyregion/angry-response-to-report-on-fatal-fire.html

5	New York and Region; Opinion	The recent article on manpower problems in volunteer fire departments addressed valid concerns; however, reference made to a typical firefighter ‘’racing to the firehouse in his car at life-endangering speed’’ is incorrect and fosters a negative stereotype. While prompt response is necessary in any emergency, the law does not allow, nor does the volunteer fire service condone, excessive speed when responding to alarms. The flashing blue light displayed on private autos is not intended to warn of high speed, but rather to alert motorists and encourage them to move aside so firefighters may proceed safely through traffic with the least possible delay. THOMAS BELLINGHAM Captain, Sea Cliff Fire Department	NA	http://www.nytimes.com/1981/10/18/nyregion/l-volunteers-response-to-fire-alarms-055283.html

Research Questions

The availability of the required datasets is a concern for this project. I will try to turn the openly available datasets on the NYC open data portal into useful insights and answer the following research questions:

Can the boroughs be divided into high, moderate and low risk fire zones based on incident counts, or are fire incidents evenly distributed across the boroughs?
Is the average response time equal across all the boroughs?
Are fire house locations distributed evenly across the boroughs?
Does the average response time vary with the number of incidents (that is, are incident counts and response times independent)?

Loading required libraries

suppressWarnings(library(knitr))
suppressWarnings(library(plyr))
suppressWarnings(library(rgdal))

## Loading required package: sp

## rgdal: version: 1.1-8, (SVN revision 616)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.0.1, released 2015/09/15
##  Path to GDAL shared files: C:/Users/Gurpreet/Documents/R/win-library/3.2/rgdal/gdal
##  GDAL does not use iconv for recoding strings.
##  Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
##  Path to PROJ.4 shared files: C:/Users/Gurpreet/Documents/R/win-library/3.2/rgdal/proj
##  Linking to sp version: 1.2-3

suppressWarnings(library(ggplot2))
suppressWarnings(library(sp))
suppressWarnings(library(rgdal))
suppressWarnings(library(rgeos))

## rgeos version: 0.3-19, (SVN revision 524)
##  GEOS runtime version: 3.5.0-CAPI-1.9.0 r4084 
##  Linking to sp version: 1.2-3 
##  Polygon checking: TRUE

suppressWarnings(library(rvest))

## Loading required package: xml2

suppressWarnings(library(stringr))
suppressWarnings(library(tidyr))
suppressWarnings(library(maps))

## 
##  # maps v3.1: updated 'world': all lakes moved to separate new #
##  # 'lakes' database. Type '?world' or 'news(package="maps")'.  #

## 
## Attaching package: 'maps'

## The following object is masked from 'package:plyr':
## 
##     ozone

suppressWarnings(library(choroplethr))

## Loading required package: acs

## Loading required package: XML

## 
## Attaching package: 'XML'

## The following object is masked from 'package:rvest':
## 
##     xml

## 
## Attaching package: 'acs'

## The following object is masked from 'package:base':
## 
##     apply

suppressWarnings(library(ggthemes))
suppressWarnings(library(jsonlite))


Data Collection

The data is collected using New York City’s open data community portal and Socrata. The data files response and locations are used to analyze fire incidents and the locations of fire houses in the different boroughs. The dataset df_pop_county is included in the package choroplethr. The square mileage of each borough is taken from a New York State webpage.

res <- read.csv("https://raw.githubusercontent.com/gpsingh12/IS-607-MSDA/master/response.csv")

loc <- read.csv("https://raw.githubusercontent.com/gpsingh12/IS-607-MSDA/master/locations.csv", skip=1)

data(df_pop_county)

Data Wrangling

Response time is recorded in mm:ss format. We need to convert it to seconds before analyzing it and applying statistical tests. In addition, rows with insufficient information are removed from the file.

We convert the time to seconds and store the result in an additional column of the response dataset.

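# Drop rows 505-547, which lack sufficient information
res <- res[-(505:547), ]

# Split each "mm:ss" string and convert it to total seconds
time <- as.character(res$AVERAGERESPONSETIME)
SEC <- sapply(strsplit(time, ":"),
       function(x) {
         x <- as.numeric(x)
         x[1] * 60 + x[2]
       }
)

# Store the result as a new column
res["RESPONSESECONDS"] <- SEC
kable(head(res))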

YEARMONTH	INCIDENTCLASSIFICATION	INCIDENTBOROUGH	INCIDENTCOUNT	AVERAGERESPONSETIME	RESPONSESECONDS
200907	Structural Fires	Citywide	1947	3:54	234
200907	Structural Fires	Manhattan	435	4:00	240
200907	Structural Fires	Bronx	432	3:59	239
200907	Structural Fires	Staten Island	90	4:34	274
200907	Structural Fires	Brooklyn	652	3:29	209
200907	Structural Fires	Queens	338	4:17	257
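The conversion above can also be wrapped in a small reusable helper. The sketch below is one possible version, not part of the original analysis: the name `mmss_to_seconds` is hypothetical, and returning `NA` for entries that do not split into exactly two parts is an assumption about how malformed rows should be handled.

```r
# Hypothetical helper: convert "mm:ss" strings to seconds.
# Entries that do not split into exactly two parts become NA (an assumption).
mmss_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)
    p <- suppressWarnings(as.numeric(p))
    p[1] * 60 + p[2]
  }, numeric(1))
}

# Values taken from the first rows of the table above
mmss_to_seconds(c("3:54", "4:00", "4:34"))
```

For these three inputs the helper gives 234, 240 and 274 seconds, matching the RESPONSESECONDS column shown above.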

For the locations dataset, we count the number of fire stations in each of the five boroughs.

loc<- loc[,3]
loc<- as.data.frame(table(loc))
loc

##             loc Freq
## 1         Bronx   34
## 2      Brooklyn   66
## 3     Manhattan   48
## 4        Queens   50
## 5 Staten Island   20

Analysis

We first extract the incident counts and average response times for each borough. The incidents are divided into several categories; we need the total count across all of them, which the dataset reports under the classification "All Fire/Emergency Incidents".

Man <- subset(res, INCIDENTBOROUGH == "Manhattan"& INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents")
Brk<- subset(res, INCIDENTBOROUGH == "Brooklyn"& INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents")
Bx<- subset(res, INCIDENTBOROUGH == "Bronx"& INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents")
Qns <- subset(res, INCIDENTBOROUGH == "Queens"& INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents")
SI <- subset(res, INCIDENTBOROUGH == "Staten Island"& INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents")

Dividing boroughs as high, medium and low risk zones

The data is divided into three zones based on the monthly incident count:

High risk: more than 9,000 incidents

Medium risk: between 5,000 and 9,000 incidents

Low risk: fewer than 5,000 incidents

High_risk <- subset(res, INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents" & INCIDENTCOUNT > 9000)
Medium_risk <- subset(res, INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents" &  5000 < INCIDENTCOUNT & INCIDENTCOUNT < 9000)

Low_risk <- subset(res, INCIDENTCLASSIFICATION=="All Fire/Emergency Incidents" & INCIDENTCOUNT < 5000)

Comparing the mean response time and incident count of each borough with the zones.

M_mean <- mean(Man$RESPONSESECONDS)
Brk_mean <- mean(Brk$RESPONSESECONDS)
Bx_mean <- mean(Bx$RESPONSESECONDS)
Qns_mean <-mean(Qns$RESPONSESECONDS)
SI_mean <- mean(SI$RESPONSESECONDS)


summary(Man$INCIDENTCOUNT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9189   10020   10560   10470   10800   11630

summary(SI$INCIDENTCOUNT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1993    2104    2238    2324    2350    3555

summary(Brk$INCIDENTCOUNT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10400   10860   11530   11470   11870   12670

summary(Qns$INCIDENTCOUNT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7453    7670    8337    8316    8888    9272

summary(Bx$INCIDENTCOUNT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7038    7708    8332    8194    8748    8949

Classifying the boroughs in high, moderate and low risk zones.

high<- High_risk[,3]
as.data.frame(table(high))

##              high Freq
## 1           Bronx    0
## 2        Brooklyn   12
## 3        Citywide   12
## 4 INCIDENTBOROUGH    0
## 5       Manhattan   12
## 6          Queens    2
## 7   Staten Island    0

Med<- Medium_risk[,3]
as.data.frame(table(Med))

##               Med Freq
## 1           Bronx   12
## 2        Brooklyn    0
## 3        Citywide    0
## 4 INCIDENTBOROUGH    0
## 5       Manhattan    0
## 6          Queens   10
## 7   Staten Island    0

Low<- Low_risk[,3]
as.data.frame(table(Low))

##               Low Freq
## 1           Bronx    0
## 2        Brooklyn    0
## 3        Citywide    0
## 4 INCIDENTBOROUGH    0
## 5       Manhattan    0
## 6          Queens    0
## 7   Staten Island   12

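Based on the frequency tables of incident counts, Brooklyn and Manhattan are high risk fire zones, Queens and the Bronx are moderate risk, and Staten Island is low risk. Queens exceeded 9,000 incidents in only 2 of the 12 months, so those months are treated as outliers. Note that the high risk subset also contains the 12 Citywide rows, which are totals rather than a borough.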
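The same zoning can be sketched with `cut()`, applied here to the median monthly incident counts reported by `summary()` above. This is an illustrative alternative rather than the method used in this report. The source thresholds leave counts of exactly 5,000 and 9,000 unassigned, so placing them in the upper band (`right = FALSE`) is an assumption.

```r
# Median monthly incident counts, copied from the summary() output above
median_count <- c(Manhattan = 10560, "Staten Island" = 2238, Brooklyn = 11530,
                  Queens = 8337, Bronx = 8332)

# right = FALSE gives the intervals [0, 5000), [5000, 9000), [9000, Inf);
# assigning exactly 5,000 and 9,000 to the upper band is an assumption
zone <- cut(median_count,
            breaks = c(0, 5000, 9000, Inf),
            labels = c("Low", "Medium", "High"),
            right = FALSE)

data.frame(borough = names(median_count), median_count, zone, row.names = NULL)
```

Using the median rather than individual months puts Queens in the medium band, which agrees with treating its two months above 9,000 as outliers.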

Calculate Average response time in seconds for three zones.

summary(High_risk$RESPONSESECONDS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   238.0   249.8   271.0   266.1   277.0   289.0

summary(Medium_risk$RESPONSESECONDS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   276.0   282.2   284.5   286.1   289.0   301.0

summary(Low_risk$RESPONSESECONDS)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   273.0   280.0   285.0   291.7   288.2   338.0

We now have the summary for all three zones. Mean response time is highest in the low risk zone, followed by the medium and high risk zones. The high response time in the low risk zone suggests that response time does not simply increase with incident count, but we will check this statistically within each zone.

Chi-square test of Independence

Null Hypothesis: Incident Count and Response times are independent

Alternate Hypothesis: Incident Count and Response times are dependent

chisq.test(High_risk$INCIDENTCOUNT, High_risk$RESPONSESECONDS)

## Warning in chisq.test(High_risk$INCIDENTCOUNT, High_risk$RESPONSESECONDS):
## Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  High_risk$INCIDENTCOUNT and High_risk$RESPONSESECONDS
## X-squared = 874, df = 851, p-value = 0.2848

chisq.test(Medium_risk$INCIDENTCOUNT, Medium_risk$RESPONSESECONDS)

## Warning in chisq.test(Medium_risk$INCIDENTCOUNT, Medium_risk
## $RESPONSESECONDS): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  Medium_risk$INCIDENTCOUNT and Medium_risk$RESPONSESECONDS
## X-squared = 330, df = 315, p-value = 0.2693

chisq.test(Low_risk$INCIDENTCOUNT, Low_risk$RESPONSESECONDS)

## Warning in chisq.test(Low_risk$INCIDENTCOUNT, Low_risk$RESPONSESECONDS):
## Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  Low_risk$INCIDENTCOUNT and Low_risk$RESPONSESECONDS
## X-squared = 108, df = 99, p-value = 0.252

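The p-value in all three cases is greater than 0.05, so we do not have enough evidence to reject the null hypothesis that incident count and response time are independent. Note the warning in each run: because both variables are numeric, chisq.test treats every distinct value as its own category, so most cells of the contingency table hold very few observations and the chi-squared approximation may be unreliable.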
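One way to see where the warning comes from, and to try a check better suited to two numeric variables, is sketched below. It uses only the five borough rows for structural fires in July 2009 from the table printed earlier, so it illustrates the mechanics and is not a result of this analysis. Using Spearman's rank correlation here is an assumption about a reasonable alternative, not the method of this report.

```r
# Structural-fire rows for July 2009, copied from the table printed earlier
# (Manhattan, Bronx, Staten Island, Brooklyn, Queens)
counts  <- c(435, 432, 90, 652, 338)
seconds <- c(240, 239, 274, 209, 257)

# chisq.test() coerces numeric vectors to factors, so each distinct value
# becomes its own category and the table consists of zeros and ones
tab <- table(counts, seconds)
dim(tab)

# A rank correlation treats both variables as ordered quantities instead
rank_test <- cor.test(counts, seconds, method = "spearman")
rank_test$estimate
rank_test$p.value
```

With five points the test has very little power, so the same call would need to be run on the full monthly data, for example `High_risk$INCIDENTCOUNT` against `High_risk$RESPONSESECONDS`, before drawing any conclusion.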

Locations distributions by Zones

Borough <-c("Bronx", "Brooklyn","Manhattan","Queens", "Staten Island")
StnCount <- c(34,66,48,50,20)
IncidentCount <-c(8949,12670,11630,9272,3555)
Area <- c(57,96.9, 33.7, 178, 59)

location <-data.frame(Borough,StnCount,IncidentCount, Area)

location%>%
   mutate(sqMilesCovered = Area/StnCount)

##         Borough StnCount IncidentCount  Area sqMilesCovered
## 1         Bronx       34          8949  57.0      1.6764706
## 2      Brooklyn       66         12670  96.9      1.4681818
## 3     Manhattan       48         11630  33.7      0.7020833
## 4        Queens       50          9272 178.0      3.5600000
## 5 Staten Island       20          3555  59.0      2.9500000

Mean_time <- c(Bx_mean, Brk_mean, M_mean, Qns_mean, SI_mean)

loc_mean <- data.frame(Borough, StnCount, Mean_time)



SqMilesCvd <-  c(1.67,1.46,.70,3.56,2.95)

cor(loc_mean$Mean_time, SqMilesCvd)

## [1] 0.5271344

# Moderate positive correlation (0.53), not a strong one

cor(loc_mean$Mean_time,loc_mean$StnCount)

## [1] -0.7970305

# Station count and response time are negatively correlated (-0.80): boroughs with more stations tend to have lower mean response times.

cor(location$StnCount,location$Area)

## [1] 0.384409

# Weak positive correlation (0.38)
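Each coefficient above rests on only five boroughs, so it helps to look at the uncertainty around it. The sketch below applies `cor.test()` to the station counts and areas listed in the table above; it is a supplementary check under the usual Pearson assumptions, not part of the original analysis.

```r
# Station counts and areas (square miles) as listed in the table above
StnCount <- c(34, 66, 48, 50, 20)
Area     <- c(57, 96.9, 33.7, 178, 59)

ct <- cor.test(StnCount, Area)   # Pearson correlation with a t-test
ct$estimate                      # same coefficient as cor(StnCount, Area)
ct$conf.int                      # 95% confidence interval
ct$p.value
```

With five observations the confidence interval is wide and spans zero, so these correlations are better read as descriptive hints than as firm evidence.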

Visualization

# Distribution of response time (in seconds) for the three risk zones

boxplot(High_risk$RESPONSESECONDS, Medium_risk$RESPONSESECONDS, Low_risk$RESPONSESECONDS,
        names = c("High", "Medium", "Low"), ylab = "Response time (seconds)")

# Distribution of fire houses across the five boroughs

x <- barplot(location$StnCount, main = "Distribution of Fire Houses", xlab = "Borough", ylab = "Number of fire houses", col = c("darkblue", "red"), names.arg = location$Borough)

# Mean response time in seconds for the five boroughs

barplot(loc_mean$Mean_time, main = "Average Response Time in sec. by Boroughs", xlab = "Borough", ylab = "Time (in seconds)", col = c("darkblue", "red"), names.arg = loc_mean$Borough)

# Incident counts by borough; darker blue indicates a higher count.
# FIPS codes for the 5 counties (boroughs) of New York City
nyc_county_fips <- c(36005, 36047, 36061, 36081, 36085)

# Highest monthly incident count for each borough, in the same order as the
# FIPS codes: Bronx, Brooklyn, Manhattan, Queens, Staten Island
df <- data.frame(region = nyc_county_fips,
                 value  = c(8949, 12670, 11630, 9272, 3555))

county_choropleth(df,
 title       = "NY City County Fire Zones",
 legend      = "Incident count",
 num_colors  = 5,
 county_zoom = nyc_county_fips)

# Population covered by fire houses in the five boroughs; darker blue indicates a larger population.

nyc_county_fips = c(36005, 36047, 36061, 36081, 36085)
county_choropleth(df_pop_county, 
 title       = "NY City County Population Estimates",
 legend      = "Population",
 num_colors  = 5,
 county_zoom = nyc_county_fips)

Conclusion

The analysis addressed each of the research questions. The boroughs were divided into high, moderate and low risk zones, and the average response time differed from borough to borough. Fire houses were unevenly distributed: each station in Manhattan covers the smallest area, while stations in Queens and Staten Island (SI) cover the largest. The chi-square tests found no evidence that incident count and response time are dependent.

Brooklyn had the most fire incidents and Staten Island the fewest. Even so, Brooklyn had the lowest response time despite its high incident count and dense population, while SI had the highest response time. One possible factor is that each fire company in Brooklyn covers far fewer square miles than one in Staten Island. Another is that Brooklyn simply has more fire houses than SI. Brooklyn also has a larger population to cover than SI, so population cannot be considered an important factor. Density and high-rise buildings do not appear to affect response time in Brooklyn, while SI, with the smallest population and the fewest fire incidents, received comparatively less effective service, in line with previous analysis (see References).

An in-depth analysis of responses in SI could reveal the causes of delay and suggest ways to improve response times. Traffic might be another source of delay. None of this detracts from the commitment of FDNY firefighters to their service; FDNY is ranked at the top among fire departments in the US (see References). Other factors may also be at work, and this study can serve as a foundation for examining them.

References:

http://www.firefighternation.com/article/technology/using-technology-reduce-response-times

http://www.r-bloggers.com/choroplethr-v3-0-0-is-now-on-cran/

http://onlinefiresciencedegree.org/noteworthy-fire-departments/

https://nycplatform.socrata.com/Public-Safety/FDNY-Firehouse-Listing/hc8x-tcnd

http://blogs.wsj.com/numbers/the-fire-countdown-clock-1134/