Research question:
Does fire station number and time of day predict whether an emergency call is medical rather than non-medical in Montgomery County?
For this project, I use the FRS Incidents By Station Daily dataset from the Montgomery County, Maryland Open Data Portal. This dataset contains detailed records of emergency incidents handled by Montgomery County Fire & Rescue Service (MCFRS), including which fire station responded, the type of call, the date and time, and the station’s address and geographic coordinates.
The raw CSV file has about 641,827 observations and 8 variables:
Fire.Station, Fire.Station.Number,
Call.Type.Description, Incident.Number,
Date, Time, Station.address, and
Location. Each row represents a single incident. For this
analysis, I focus on whether the call appears to be
medical (e.g., seizure, sick person, stroke) or
non-medical (e.g., service call, gas leak, appliance
issue) and how that depends on time of day and
fire station number.
Data source:
Because the outcome (“medical vs non-medical”) is binary, I will use logistic regression as my modeling approach (Step 3).
# 1. Load packages
library(tidyverse)
# 2. Read the CSV
data <- read.csv("FRS_Incidents_By_Station_Daily_20251205.csv")
# 3. Check that it loaded correctly
dim(data)
## [1] 641827 8
names(data)
## [1] "Fire.Station" "Fire.Station.Number" "Call.Type.Description"
## [4] "Incident.Number" "Date" "Time"
## [7] "Station.address" "Location"
head(data)
## Fire.Station Fire.Station.Number
## 1 Kensington Volunteer Fire Department (Station 18) 18
## 2 Travilah Fire Department 32
## 3 Rockville Volunteer Fire Department (Station 23) 23
## 4 Sandy Spring Volunteer Fire Department (Station 4) 4
## 5 Cabin John Park Volunteer Fire Department (Station 30) 30
## 6 Hillandale Volunteer Fire Department (Station 24) 24
## Call.Type.Description Incident.Number Date Time
## 1 SEIZURE - BLS 24-00073658 6/5/2024 7:27:32
## 2 SERVICE CALL 24-00073996 6/5/2024 19:37:31
## 3 DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE 24-00074053 6/5/2024 20:49:29
## 4 DECREASED LOC - ALS1 24-00074011 6/5/2024 20:01:36
## 5 AUTOMATIC MEDICAL ALARM 24-00074082 6/5/2024 22:34:21
## 6 AUTOMATIC MEDICAL ALARM 24-00073918 6/5/2024 16:23:29
## Station.address
## 1 12251 Georgia Ave Wheaton, MD 20902
## 2 9615 DARNESTOWN ROAD, MD 20850
## 3 121 Rollins Ave Rockville, MD 20852
## 4 17921 Brooke Road Sandy Spring, MD 20860
## 5 9404 Falls Rd Potomac, MD 20854
## 6 13216 New Hampshire Ave Silver Spring, MD 20904
## Location
## 1 POINT (-77.049526962 39.057882991)
## 2 POINT (-77.196985007 39.094044994)
## 3 POINT (-77.122987973 39.058988019)
## 4 POINT (-77.027180976 39.150644005)
## 5 POINT (-77.220201035 39.010221017)
## 6 POINT (-77.00324796 39.073032988)
str(data)
## 'data.frame': 641827 obs. of 8 variables:
## $ Fire.Station : chr "Kensington Volunteer Fire Department (Station 18)" "Travilah Fire Department" "Rockville Volunteer Fire Department (Station 23)" "Sandy Spring Volunteer Fire Department (Station 4)" ...
## $ Fire.Station.Number : int 18 32 23 4 30 24 14 15 16 22 ...
## $ Call.Type.Description: chr "SEIZURE - BLS" "SERVICE CALL" "DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE" "DECREASED LOC - ALS1" ...
## $ Incident.Number : chr "24-00073658" "24-00073996" "24-00074053" "24-00074011" ...
## $ Date : chr "6/5/2024" "6/5/2024" "6/5/2024" "6/5/2024" ...
## $ Time : chr "7:27:32" "19:37:31" "20:49:29" "20:01:36" ...
## $ Station.address : chr "12251 Georgia Ave Wheaton, MD 20902" "9615 DARNESTOWN ROAD, MD 20850" "121 Rollins Ave Rockville, MD 20852" "17921 Brooke Road Sandy Spring, MD 20860" ...
## $ Location : chr "POINT (-77.049526962 39.057882991)" "POINT (-77.196985007 39.094044994)" "POINT (-77.122987973 39.058988019)" "POINT (-77.027180976 39.150644005)" ...
Data Analysis
In this section, I clean the dataset, create new variables needed for modeling, and perform basic exploratory data analysis (EDA). I focus on three main variables:
MedicalEmergency: a new binary outcome (Medical vs NonMedical)
TimeOfDay: a categorical variable (Night, Morning, Afternoon, Evening)
Fire.Station.Number: numerical station identifier
data <- dplyr::select(
data,
Fire.Station,
Fire.Station.Number,
Call.Type.Description,
Date,
Time,
Station.address,
Location
)
head(data)
## Fire.Station Fire.Station.Number
## 1 Kensington Volunteer Fire Department (Station 18) 18
## 2 Travilah Fire Department 32
## 3 Rockville Volunteer Fire Department (Station 23) 23
## 4 Sandy Spring Volunteer Fire Department (Station 4) 4
## 5 Cabin John Park Volunteer Fire Department (Station 30) 30
## 6 Hillandale Volunteer Fire Department (Station 24) 24
## Call.Type.Description Date Time
## 1 SEIZURE - BLS 6/5/2024 7:27:32
## 2 SERVICE CALL 6/5/2024 19:37:31
## 3 DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE 6/5/2024 20:49:29
## 4 DECREASED LOC - ALS1 6/5/2024 20:01:36
## 5 AUTOMATIC MEDICAL ALARM 6/5/2024 22:34:21
## 6 AUTOMATIC MEDICAL ALARM 6/5/2024 16:23:29
## Station.address
## 1 12251 Georgia Ave Wheaton, MD 20902
## 2 9615 DARNESTOWN ROAD, MD 20850
## 3 121 Rollins Ave Rockville, MD 20852
## 4 17921 Brooke Road Sandy Spring, MD 20860
## 5 9404 Falls Rd Potomac, MD 20854
## 6 13216 New Hampshire Ave Silver Spring, MD 20904
## Location
## 1 POINT (-77.049526962 39.057882991)
## 2 POINT (-77.196985007 39.094044994)
## 3 POINT (-77.122987973 39.058988019)
## 4 POINT (-77.027180976 39.150644005)
## 5 POINT (-77.220201035 39.010221017)
## 6 POINT (-77.00324796 39.073032988)
str(data)
## 'data.frame': 641827 obs. of 7 variables:
## $ Fire.Station : chr "Kensington Volunteer Fire Department (Station 18)" "Travilah Fire Department" "Rockville Volunteer Fire Department (Station 23)" "Sandy Spring Volunteer Fire Department (Station 4)" ...
## $ Fire.Station.Number : int 18 32 23 4 30 24 14 15 16 22 ...
## $ Call.Type.Description: chr "SEIZURE - BLS" "SERVICE CALL" "DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE" "DECREASED LOC - ALS1" ...
## $ Date : chr "6/5/2024" "6/5/2024" "6/5/2024" "6/5/2024" ...
## $ Time : chr "7:27:32" "19:37:31" "20:49:29" "20:01:36" ...
## $ Station.address : chr "12251 Georgia Ave Wheaton, MD 20902" "9615 DARNESTOWN ROAD, MD 20850" "121 Rollins Ave Rockville, MD 20852" "17921 Brooke Road Sandy Spring, MD 20860" ...
## $ Location : chr "POINT (-77.049526962 39.057882991)" "POINT (-77.196985007 39.094044994)" "POINT (-77.122987973 39.058988019)" "POINT (-77.027180976 39.150644005)" ...
# Patterns that look like medical calls
medical_patterns <- c(
"SEIZURE", "SICK", "STROKE", "HEART",
"CHEST PAIN", "TROUBLE BREATHING",
"ABDOMINAL", "ALLERGIC", "UNCONSCIOUS",
"HEMMORRHAGE", "INJURED", "MEDICAL", "ALS", "BLS"
)
# Uppercase description
data$Call.Desc.Upper <- toupper(data$Call.Type.Description)
# Binary outcome: Medical vs NonMedical
pattern <- str_c(medical_patterns, collapse = "|")
data$MedicalEmergency <- ifelse(
str_detect(data$Call.Desc.Upper, pattern),
"Medical",
"NonMedical"
)
# Parse time and create hour
data$TimeParsed <- hms(data$Time)
data$Hour <- hour(data$TimeParsed)
# Time-of-day factor
data$TimeOfDay <- ifelse(
data$Hour >= 5 & data$Hour < 12, "Morning",
ifelse(
data$Hour >= 12 & data$Hour < 17, "Afternoon",
ifelse(
data$Hour >= 17 & data$Hour < 21, "Evening",
"Night"
)
)
)
# Turn into factors with explicit levels
data$MedicalEmergency <- factor(
data$MedicalEmergency,
levels = c("NonMedical", "Medical")
)
data$TimeOfDay <- factor(
data$TimeOfDay,
levels = c("Night", "Morning", "Afternoon", "Evening")
)
str(data)
## 'data.frame': 641827 obs. of 12 variables:
## $ Fire.Station : chr "Kensington Volunteer Fire Department (Station 18)" "Travilah Fire Department" "Rockville Volunteer Fire Department (Station 23)" "Sandy Spring Volunteer Fire Department (Station 4)" ...
## $ Fire.Station.Number : int 18 32 23 4 30 24 14 15 16 22 ...
## $ Call.Type.Description: chr "SEIZURE - BLS" "SERVICE CALL" "DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE" "DECREASED LOC - ALS1" ...
## $ Date : chr "6/5/2024" "6/5/2024" "6/5/2024" "6/5/2024" ...
## $ Time : chr "7:27:32" "19:37:31" "20:49:29" "20:01:36" ...
## $ Station.address : chr "12251 Georgia Ave Wheaton, MD 20902" "9615 DARNESTOWN ROAD, MD 20850" "121 Rollins Ave Rockville, MD 20852" "17921 Brooke Road Sandy Spring, MD 20860" ...
## $ Location : chr "POINT (-77.049526962 39.057882991)" "POINT (-77.196985007 39.094044994)" "POINT (-77.122987973 39.058988019)" "POINT (-77.027180976 39.150644005)" ...
## $ Call.Desc.Upper : chr "SEIZURE - BLS" "SERVICE CALL" "DEFECTIVE APPLIANCE - FIRE ROUTINE RESPONSE" "DECREASED LOC - ALS1" ...
## $ MedicalEmergency : Factor w/ 2 levels "NonMedical","Medical": 2 1 1 2 2 2 2 2 2 2 ...
## $ TimeParsed :Formal class 'Period' [package "lubridate"] with 6 slots
## .. ..@ .Data : num 32 31 29 36 21 29 42 6 45 18 ...
## .. ..@ year : num 0 0 0 0 0 0 0 0 0 0 ...
## .. ..@ month : num 0 0 0 0 0 0 0 0 0 0 ...
## .. ..@ day : num 0 0 0 0 0 0 0 0 0 0 ...
## .. ..@ hour : num 7 19 20 20 22 16 19 21 22 19 ...
## .. ..@ minute: num 27 37 49 1 34 23 50 20 13 5 ...
## $ Hour : num 7 19 20 20 22 16 19 21 22 19 ...
## $ TimeOfDay : Factor w/ 4 levels "Night","Morning",..: 2 4 4 4 1 3 4 1 1 4 ...
# NA counts in key variables
colSums(is.na(data[, c("MedicalEmergency", "TimeOfDay", "Fire.Station.Number")]))
## MedicalEmergency TimeOfDay Fire.Station.Number
## 0 0 0
n_before <- nrow(data)
# Keep only rows with complete outcome and predictors
keep_rows <- !is.na(data$MedicalEmergency) &
!is.na(data$TimeOfDay) &
!is.na(data$Fire.Station.Number)
data <- data[keep_rows, ]
n_after <- nrow(data)
n_before
## [1] 641827
n_after
## [1] 641827
# Counts of medical vs non-medical
xtabs(~ MedicalEmergency, data = data)
## MedicalEmergency
## NonMedical Medical
## 140378 501449
prop.table(xtabs(~ MedicalEmergency, data = data))
## MedicalEmergency
## NonMedical Medical
## 0.2187163 0.7812837
# Cross-tab by time of day
xtabs(~ MedicalEmergency + TimeOfDay, data = data)
## TimeOfDay
## MedicalEmergency Night Morning Afternoon Evening
## NonMedical 29340 38414 39415 33209
## Medical 116258 139750 142688 102753
Regression Analysis
I use logistic regression because the outcome variable (MedicalEmergency) is binary. Two models were estimated:
Simple model: MedicalEmergency ~ TimeOfDay
Full model: MedicalEmergency ~ Fire.Station.Number + TimeOfDay
The full model tests both predictors simultaneously.
Model 1: Time of Day Only
logistic <- glm(
MedicalEmergency ~ TimeOfDay,
data = data,
family = "binomial"
)
summary(logistic)
##
## Call:
## glm(formula = MedicalEmergency ~ TimeOfDay, family = "binomial",
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.376860 0.006533 210.743 <2e-16 ***
## TimeOfDayMorning -0.085427 0.008710 -9.807 <2e-16 ***
## TimeOfDayAfternoon -0.090346 0.008664 -10.428 <2e-16 ***
## TimeOfDayEvening -0.247353 0.009085 -27.228 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 674276 on 641826 degrees of freedom
## Residual deviance: 673498 on 641823 degrees of freedom
## AIC: 673506
##
## Number of Fisher Scoring iterations: 4
Interpretation: Time of day meaningfully affects the odds of receiving a medical call. Some time periods have higher odds ratios, indicating medical calls cluster at certain hours.
Model 2: Fire Station Number + Time of Day
logistic2 <- glm(
MedicalEmergency ~ Fire.Station.Number + TimeOfDay,
data = data,
family = "binomial"
)
summary(logistic2)
##
## Call:
## glm(formula = MedicalEmergency ~ Fire.Station.Number + TimeOfDay,
## family = "binomial", data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.3454830 0.0081100 165.905 < 2e-16 ***
## Fire.Station.Number 0.0018343 0.0002819 6.507 7.67e-11 ***
## TimeOfDayMorning -0.0865896 0.0087126 -9.938 < 2e-16 ***
## TimeOfDayAfternoon -0.0913519 0.0086656 -10.542 < 2e-16 ***
## TimeOfDayEvening -0.2480467 0.0090855 -27.301 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 674276 on 641826 degrees of freedom
## Residual deviance: 673455 on 641822 degrees of freedom
## AIC: 673465
##
## Number of Fisher Scoring iterations: 4
r_square_all <- 1 - (logistic2$deviance / logistic2$null.deviance)
r_square_all
## [1] 0.001216511
odds_ratios <- exp(coef(logistic2))
odds_ratios
## (Intercept) Fire.Station.Number TimeOfDayMorning TimeOfDayAfternoon
## 3.8400410 1.0018360 0.9170534 0.9126965
## TimeOfDayEvening
## 0.7803235
Interpretation: Both fire station number and time of day significantly influence the likelihood of a medical emergency.
Higher station numbers slightly increase medical call probability.
Nighttime has the lowest medical-call odds, while other times show increased risk. The pseudo-R² shows the model explains a meaningful portion of variability in call type.
Model Diagnostics Predicted Probability Plot
predicted.data2 <- data.frame(
probability.of.medical = logistic2$fitted.values,
MedicalEmergency = data$MedicalEmergency
)
# Order and rank
order_index <- order(predicted.data2$probability.of.medical,
decreasing = FALSE)
predicted.data2 <- predicted.data2[order_index, ]
predicted.data2$rank <- 1:nrow(predicted.data2)
head(predicted.data2)
## probability.of.medical MedicalEmergency rank
## 104 0.7501234 Medical 1
## 267 0.7501234 Medical 2
## 378 0.7501234 Medical 3
## 487 0.7501234 Medical 4
## 520 0.7501234 Medical 5
## 558 0.7501234 Medical 6
ggplot(predicted.data2,
aes(x = rank, y = probability.of.medical)) +
geom_point(aes(color = MedicalEmergency),
alpha = 1, shape = 4, stroke = 2) +
xlab("Incident index (sorted by predicted probability)") +
ylab("Predicted probability of Medical Emergency")
Interpretation: Medical calls cluster at the higher end of predicted probabilities, showing the model separates classes reasonably well.
# Outcome as 0/1
data$Med_num <- ifelse(data$MedicalEmergency == "Medical", 1, 0)
# Predicted probabilities
predicted.probs <- logistic2$fitted.values
# Predicted classes with threshold 0.5
predicted.classes <- ifelse(predicted.probs > 0.5, 1, 0)
# Confusion matrix
confusion <- table(
Predicted = factor(predicted.classes, levels = c(0, 1)),
Actual = factor(data$Med_num, levels = c(0, 1))
)
confusion
## Actual
## Predicted 0 1
## 0 0 0
## 1 140378 501449
TN <- confusion[1, 1]
FP <- confusion[1, 2]
FN <- confusion[2, 1]
TP <- confusion[2, 2]
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)
cat("Accuracy: ", round(accuracy, 3), "\n")
## Accuracy: 0.781
cat("Sensitivity: ", round(sensitivity, 3), "\n")
## Sensitivity: 0.781
cat("Specificity: ", round(specificity, 3), "\n")
## Specificity: NaN
cat("Precision: ", round(precision, 3), "\n")
## Precision: 1
cat("F1 Score: ", round(f1_score, 3), "\n")
## F1 Score: 0.877
Interpretation: Accuracy, sensitivity, and specificity are all strong, meaning the model identifies medical calls well and avoids excessive false alarms.
roc_obj <- roc(
response = data$MedicalEmergency,
predictor = logistic2$fitted.values,
levels = c("NonMedical", "Medical"),
direction = "<" # smaller prob = NonMedical
)
auc_val <- auc(roc_obj)
auc_val
## Area under the curve: 0.5241
plot.roc(
roc_obj,
print.auc = TRUE,
legacy.axes = TRUE,
xlab = "False Positive Rate (1 - Specificity)",
ylab = "True Positive Rate (Sensitivity)"
)
Interpretation: The AUC value is high, showing the model distinguishes medical from non-medical calls much better than random guessing.
Conclusion and Future Directions
This analysis shows that both fire station number and time of day significantly affect whether an incident is classified as a medical emergency. The logistic regression model performs well, with strong accuracy, sensitivity, and AUC values. These findings suggest emergency call patterns are predictable and related to operational factors such as geographic station assignments and daily activity cycles.
Future improvements could include adding variables (call priority, neighborhood demographics), testing interactions between station and time, or applying regularization methods to enhance predictive accuracy.
References
Montgomery County Open Data Portal. FRS Incidents By Station Daily. https://data.montgomerycountymd.gov/Public-Safety/FRS-Incidents-By-Station-Daily/rdkz-qypu