Going Behind the Call: Uncovering Patterns in 911 Emergencies Using Python and Pandas

Independent Data Analysis Project

Published: November 1, 2024

Modified: November 1, 2024

Executive Summary

In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with most calls placed between 7 AM and 7 PM, when people are most active. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities; note, however, that the records begin on 10 December 2015, so December is only partly covered. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies.

Keywords

Data analysis, Python, Pandas, Seaborn, NumPy, Descriptive Analysis, Data Science, Machine Learning

Background

For my capstone project, I’m diving into the world of emergency response by analyzing 911 call data from Montgomery County, Pennsylvania, available on Kaggle. This dataset captures essential details like call locations, times, and types of emergencies, from accidents to back pains. My goal? To uncover patterns and insights that could help improve response strategies and resource allocation for first responders, ultimately making emergency services faster and more effective.¹

Data

The data contain the following nine (9) fields and 99,492 observations:

Variable Name   Variable Type   Variable Description
lat             Float           Latitude
lng             Float           Longitude
desc            String          Description of the emergency call
zip             Float           ZIP code (read as float because some values are missing)
title           String          Reason and type of emergency, e.g. EMS: BACK PAINS/INJURY
timeStamp       String          Date and time of the call (YYYY-MM-DD HH:MM:SS)
twp             String          Township
addr            String          Address
e               Integer         Dummy variable (always 1)
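We begin by importing the libraries used throughout the analysis and loading the CSV file downloaded from Kaggle.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Montgomery County 911 call records
calls = pd.read_csv("911.csv")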

Data Analysis

Descriptive Analysis

We start by examining the data.

calls.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   lat        99492 non-null  float64
 1   lng        99492 non-null  float64
 2   desc       99492 non-null  object 
 3   zip        86637 non-null  float64
 4   title      99492 non-null  object 
 5   timeStamp  99492 non-null  object 
 6   twp        99449 non-null  object 
 7   addr       98973 non-null  object 
 8   e          99492 non-null  int64  
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB

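We see that the data consist of nine (9) variables: four (4) are numeric (lat, lng, and zip stored as float64, and e stored as int64), while the remaining five (5) are strings (labeled object). There are 99,492 observations. Some columns have missing values, most notably zip, which has only 86,637 non-null entries. Next, we look at the first few rows of the data.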
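Because zip has missing values, pandas stores it as float64, which is why ZIP codes show a trailing .0 (for example 19525.0). If you prefer to keep ZIP codes as text, one option is to pass a dtype mapping to read_csv, and parse_dates can convert timeStamp while loading. The sketch below runs on two made-up rows that mimic the layout of 911.csv (not the real file), so treat it as an illustration under that assumption rather than part of the original workflow.

```python
import io

import pandas as pd

# Two made-up rows in the same layout as 911.csv; the second row has no ZIP code
sample_csv = io.StringIO(
    "lat,lng,desc,zip,title,timeStamp,twp,addr,e\n"
    "40.297876,-75.581294,desc one,19525,EMS: BACK PAINS/INJURY,2015-12-10 17:40:00,NEW HANOVER,REINDEER CT & DEAD END,1\n"
    "40.251492,-75.603350,desc two,,EMS: DIZZINESS,2015-12-10 17:40:01,LOWER POTTSGROVE,CHERRYWOOD CT & DEAD END,1\n"
)

# Default parsing: the empty ZIP becomes NaN, which forces the whole column to float64
default_read = pd.read_csv(sample_csv)
print(default_read["zip"].tolist())  # [19525.0, nan]

# Rewind the buffer and parse again, keeping ZIP codes as text and timeStamp as datetime
sample_csv.seek(0)
typed_read = pd.read_csv(
    sample_csv,
    dtype={"zip": "string"},
    parse_dates=["timeStamp"],
)
print(typed_read.dtypes)
```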

calls.head()
lat lng desc zip title timeStamp twp addr e
0 40.297876 -75.581294 REINDEER CT & DEAD END; NEW HANOVER; Station ... 19525.0 EMS: BACK PAINS/INJURY 2015-12-10 17:40:00 NEW HANOVER REINDEER CT & DEAD END 1
1 40.258061 -75.264680 BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... 19446.0 EMS: DIABETIC EMERGENCY 2015-12-10 17:40:00 HATFIELD TOWNSHIP BRIAR PATH & WHITEMARSH LN 1
2 40.121182 -75.351975 HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... 19401.0 Fire: GAS-ODOR/LEAK 2015-12-10 17:40:00 NORRISTOWN HAWS AVE 1
3 40.116153 -75.343513 AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... 19401.0 EMS: CARDIAC EMERGENCY 2015-12-10 17:40:01 NORRISTOWN AIRY ST & SWEDE ST 1
4 40.251492 -75.603350 CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... NaN EMS: DIZZINESS 2015-12-10 17:40:01 LOWER POTTSGROVE CHERRYWOOD CT & DEAD END 1

Finally, I compute summary statistics for the numeric and string variables and then visualize the relationships among the numeric variables with a pair plot.

calls.describe()
lat lng zip e
count 99492.000000 99492.000000 86637.000000 99492.0
mean 40.159526 -75.317464 19237.658298 1.0
std 0.094446 0.174826 345.344914 0.0
min 30.333596 -95.595595 17752.000000 1.0
25% 40.100423 -75.392104 19038.000000 1.0
50% 40.145223 -75.304667 19401.000000 1.0
75% 40.229008 -75.212513 19446.000000 1.0
max 41.167156 -74.995041 77316.000000 1.0
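The numeric summary also hints at a few suspicious records: the minimum latitude (30.33), minimum longitude (-95.60), and maximum ZIP code (77316) sit far from the quartiles, and those coordinates lie well outside Pennsylvania. A minimal sketch for flagging such rows with Series.between follows. The bounding box is a rough, hand-picked assumption (not an official boundary), and the sample points are illustrative values rather than rows from the data.

```python
import pandas as pd

# Illustrative points: three near Norristown, PA, and one echoing the extreme minimums above
points = pd.DataFrame({
    "lat": [40.121182, 40.116153, 40.297876, 30.333596],
    "lng": [-75.351975, -75.343513, -75.581294, -95.595595],
})

# Hypothetical bounding box loosely enclosing Pennsylvania (an assumption, not an official boundary)
LAT_RANGE = (39.5, 42.5)
LNG_RANGE = (-81.0, -74.5)

# between() is inclusive on both ends by default
inside = points["lat"].between(*LAT_RANGE) & points["lng"].between(*LNG_RANGE)
outliers = points[~inside]
print(outliers)
```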
calls.describe(include = "object")
desc title timeStamp twp addr
count 99492 99492 99492 99449 98973
unique 99455 110 72577 68 21914
top GREEN ST & E BASIN ST; NORRISTOWN; Station 30... Traffic: VEHICLE ACCIDENT - 2015-12-10 17:40:01 LOWER MERION SHANNONDELL DR & SHANNONDELL BLVD
freq 4 23066 8 8443 938
# e is always 1, so only the informative numeric columns are plotted
sns.pairplot(calls, vars = ["lat", "lng", "zip"])

Basic Questions

What are the top 5 ZIP codes for 911 calls?

For emergency planning, managers may want to know the ZIP codes where most emergency calls originate. Here are the five ZIP codes with the most calls in the county.

calls['zip'].value_counts().head(5)
zip
19401.0    6979
19464.0    6643
19403.0    4854
19446.0    4748
19406.0    3174
Name: count, dtype: int64
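value_counts() drops missing values by default, so the 12,855 calls without a ZIP code (99,492 total minus 86,637 non-null) are not part of this ranking. A small sketch on a made-up series shows how dropna=False keeps them as their own row and how normalize=True turns counts into shares.

```python
import numpy as np
import pandas as pd

# Toy ZIP column with one missing value (values are made up)
zips = pd.Series([19401.0, 19401.0, 19464.0, np.nan], name="zip")

print(zips.value_counts())                # NaN is dropped by default
print(zips.value_counts(dropna=False))    # NaN is counted as its own row
print(zips.value_counts(normalize=True))  # shares among non-missing ZIP codes
```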

What are the top 5 townships for emergency calls?

Similarly, we can list the townships with the most emergency calls.

calls['twp'].value_counts().head(5)
twp
LOWER MERION    8443
ABINGTON        5977
NORRISTOWN      5890
UPPER MERION    5227
CHELTENHAM      4575
Name: count, dtype: int64

How many unique title codes are there?

There are 110 unique title codes in the title column.

calls['title'].str.lower().nunique()
110

Creating New Features

Leading Causes of 911 Calls

In the title column, a “reason” (department) appears before the title code: EMS, Fire, or Traffic. I use .apply() with a custom lambda expression to create a new column called reason that contains this string value.

For example, if the title column value is EMS: BACK PAINS/INJURY, the reason column value is EMS.

calls['reason'] = calls["title"].apply(lambda x: x.split(":")[0])
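The same column can also be built with pandas' vectorized string methods instead of a Python-level lambda. The sketch below compares the two on three sample titles taken from the outputs in this article; splitting with n=1 (only at the first colon) is our choice and gives the same result here, because the reason is always the text before the first colon.

```python
import pandas as pd

titles = pd.Series([
    "EMS: BACK PAINS/INJURY",
    "Fire: GAS-ODOR/LEAK",
    "Traffic: VEHICLE ACCIDENT -",
])

# Row-by-row version, as used in the text
reason_apply = titles.apply(lambda x: x.split(":")[0])

# Vectorized version: split at the first colon only and keep the left-hand part
reason_str = titles.str.split(":", n=1).str[0]

print(reason_str.tolist())  # ['EMS', 'Fire', 'Traffic']
```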

Based on this reason column, we can see that EMS is the most common reason for a 911 call.

calls['reason'].value_counts()
reason
EMS        48877
Traffic    35695
Fire       14920
Name: count, dtype: int64

We can visualize this using seaborn.

sns.countplot(x = "reason", data = calls, hue = "reason", palette = "mako")
plt.title("Leading Causes of 911 Calls")
plt.show()

Times and Dates in Pandas

In this section, we focus on the dates and times in the calls data. Specifically, the timeStamp column contains the time of each call. However, it is not in a datetime format; instead, it is stored as a string.

calls["timeStamp"].head()
0    2015-12-10 17:40:00
1    2015-12-10 17:40:00
2    2015-12-10 17:40:00
3    2015-12-10 17:40:01
4    2015-12-10 17:40:01
Name: timeStamp, dtype: object

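We convert this variable into a datetime type, passing the format from the data dictionary so that pandas does not have to infer it.

calls["timeStamp"] = pd.to_datetime(calls["timeStamp"], format = "%Y-%m-%d %H:%M:%S")

Now the column is stored as a datetime type (<M8[ns] is NumPy's notation for datetime64[ns]).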

calls["timeStamp"].dtype
dtype('<M8[ns]')

We can now access attributes such as the hour directly from a single timestamp. For example:

time = calls['timeStamp'].iloc[0]
time.hour
17

NB: In Jupyter, you can press Tab after a dot to explore the attributes available. Now that the timeStamp column holds datetime values, we use the .dt accessor to create three new columns, Hour, Month, and DayOfWeek, from the timeStamp column.

calls["Hour"] = calls["timeStamp"].dt.hour
calls["Month"] = calls["timeStamp"].dt.month_name()
calls["DayOfWeek"] = calls["timeStamp"].dt.day_name()
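To see what each .dt attribute returns, here is a minimal sketch on two hand-written timestamps (the first matches the first row of the data; the second is made up). It also shows that a row-wise .apply() gives the same hours as the vectorized .dt.hour, which is typically faster on large columns.

```python
import pandas as pd

stamps = pd.to_datetime(pd.Series(["2015-12-10 17:40:00", "2016-01-01 07:15:30"]))

# Vectorized extraction with the .dt accessor
parts = pd.DataFrame({
    "Hour": stamps.dt.hour,
    "Month": stamps.dt.month_name(),
    "DayOfWeek": stamps.dt.day_name(),
})

# Equivalent row-by-row extraction of the hour
hours_apply = stamps.apply(lambda t: t.hour)

print(parts)
```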
calls.head()
lat lng desc zip title timeStamp twp addr e reason Hour Month DayOfWeek
0 40.297876 -75.581294 REINDEER CT & DEAD END; NEW HANOVER; Station ... 19525.0 EMS: BACK PAINS/INJURY 2015-12-10 17:40:00 NEW HANOVER REINDEER CT & DEAD END 1 EMS 17 December Thursday
1 40.258061 -75.264680 BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... 19446.0 EMS: DIABETIC EMERGENCY 2015-12-10 17:40:00 HATFIELD TOWNSHIP BRIAR PATH & WHITEMARSH LN 1 EMS 17 December Thursday
2 40.121182 -75.351975 HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... 19401.0 Fire: GAS-ODOR/LEAK 2015-12-10 17:40:00 NORRISTOWN HAWS AVE 1 Fire 17 December Thursday
3 40.116153 -75.343513 AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... 19401.0 EMS: CARDIAC EMERGENCY 2015-12-10 17:40:01 NORRISTOWN AIRY ST & SWEDE ST 1 EMS 17 December Thursday
4 40.251492 -75.603350 CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... NaN EMS: DIZZINESS 2015-12-10 17:40:01 LOWER POTTSGROVE CHERRYWOOD CT & DEAD END 1 EMS 17 December Thursday

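We now use seaborn to create a countplot of the DayOfWeek column, with the hue based on the reason column. The order argument lists the days from Monday to Sunday instead of the default ordering.

sns.countplot(x = "DayOfWeek", data = calls, hue = "reason",
              order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])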

We see that EMS and Traffic calls are the most frequent throughout the week, with notable dips over the weekend, while Fire incidents are relatively constant from day to day.
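The counts behind this plot can also be inspected as a table with pd.crosstab, reindexed so the days run from Monday to Sunday and days without calls show 0. The sketch below uses a tiny made-up frame in place of calls.

```python
import pandas as pd

# Tiny made-up stand-in for calls[["DayOfWeek", "reason"]]
toy = pd.DataFrame({
    "DayOfWeek": ["Monday", "Monday", "Sunday", "Sunday", "Sunday"],
    "reason": ["EMS", "Traffic", "EMS", "EMS", "Fire"],
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Rows are days, columns are reasons; days with no calls are filled with 0
table = pd.crosstab(toy["DayOfWeek"], toy["reason"]).reindex(day_order, fill_value=0)
print(table)
```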

Now, we do the same for months.

sns.countplot(x = "Month", data = calls, hue = "reason")

Do you notice something strange about the plot?

Some months are missing. Let’s see whether we can present this information another way, with a simple line plot that connects the months on either side of the gap. To do this, we need to do some work with pandas.
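One way to check exactly which months are absent is to compare the Month values with the full list of month names from Python's calendar module. A minimal sketch, using a made-up Month column in place of calls["Month"]:

```python
import calendar

import pandas as pd

# Made-up stand-in for calls["Month"]
months = pd.Series(["December", "January", "February", "August"])

# calendar.month_name has an empty string at index 0, so skip it
all_months = list(calendar.month_name)[1:]
present = set(months)
missing = [m for m in all_months if m not in present]
print(missing)
```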

We group the DataFrame by the Month column, count the records in each group, store the result in a DataFrame called bymonth, and inspect its first few rows. Note that groupby sorts the month names alphabetically, not in calendar order.

bymonth = calls.groupby("Month").count()
bymonth.head()
lat lng desc zip title timeStamp twp addr e reason Hour DayOfWeek
Month
April 11326 11326 11326 9895 11326 11326 11323 11283 11326 11326 11326 11326
August 9078 9078 9078 7832 9078 9078 9073 9025 9078 9078 9078 9078
December 7969 7969 7969 6907 7969 7969 7963 7916 7969 7969 7969 7969
February 11467 11467 11467 9930 11467 11467 11465 11396 11467 11467 11467 11467
January 13205 13205 13205 11527 13205 13205 13203 13096 13205 13205 13205 13205

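We then plot the number of calls per month. Because the month names are sorted alphabetically, we first sort them by calendar month.

# Sort the alphabetically ordered month names by calendar month number before plotting
bymonth["lat"].sort_index(key = lambda idx: pd.to_datetime(idx, format = "%B").month).plot()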
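Month names drop the year, so a December from one year would be pooled with a December from the next, and months with no records simply vanish. An alternative, sketched below on made-up timestamps, is to resample on the timeStamp column itself: resample("MS") creates one bin per calendar month in time order and reports 0 for months with no calls. This is a swapped-in technique, not the approach used above.

```python
import pandas as pd

# Made-up calls spanning a year boundary, with no calls in February 2016
toy = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2015-12-10 17:40:00", "2015-12-11 08:00:00",
        "2016-01-05 09:30:00", "2016-03-02 19:10:00",
    ]),
    "reason": ["EMS", "Fire", "Traffic", "EMS"],
})

# One bin per month in chronological order; "MS" labels each bin by the first day of the month
monthly = toy.set_index("timeStamp").resample("MS").size()
print(monthly)
```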

Next, we create a new column called date that holds the calendar date from the timeStamp column, using the .dt.date accessor.

calls["date"] = calls["timeStamp"].dt.date
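.dt.date returns Python date objects, so the new column has object dtype. If you would rather keep a datetime64 column (which works with resampling and datetime-aware plotting), .dt.normalize() sets every time to midnight instead. A small comparison on made-up timestamps:

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2015-12-10 17:40:00", "2015-12-10 23:59:59", "2015-12-11 00:00:01",
]))

as_python_dates = stamps.dt.date     # object dtype holding datetime.date values
as_midnight = stamps.dt.normalize()  # stays datetime64, with the time set to 00:00:00

print(as_python_dates.dtype, as_midnight.dtype)
# Daily counts work the same way with either column
print(as_midnight.value_counts().sort_index())
```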

We group by the date column, count the calls on each date, and plot the daily totals.

calls.groupby("date").count()["lat"].plot()
plt.title("Calls by Date")
plt.show()

Next, we break this plot down by reason, drawing a separate line for each reason for the 911 call.

calls_reasons = calls.groupby(["date", "reason"]).count().reset_index()[["date", "reason", "lat"]]
sns.lineplot(x = "date", y = "lat", data = calls_reasons, hue = "reason")
plt.ylabel("Number of calls")
plt.title("Calls by Reason")
plt.show()

Creating Heatmaps

Now let’s move on to creating heatmaps with seaborn. We first need to restructure the DataFrame so that the columns are the hours and the index is the day of the week. There are several ways to do this; here, we count the calls for each day-hour pair with groupby and then pivot the result into a table.

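# Count calls for each (day of week, hour) pair, pivot hours into columns, and list the days Monday to Sunday
calls_hour = calls.groupby(["DayOfWeek", "Hour"]).count()["date"].reset_index().pivot_table(index = "DayOfWeek", columns = "Hour", values = "date").reindex(["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"])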
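An equivalent and somewhat shorter way to build this table is groupby(...).size() followed by unstack(), which moves the inner Hour level of the index into the columns. The sketch below runs on a tiny made-up frame; fill_value=0 is our choice so that day-hour pairs with no calls show 0 rather than NaN (the pivot_table route leaves such cells as NaN).

```python
import pandas as pd

# Tiny made-up stand-in for calls[["DayOfWeek", "Hour"]]
toy = pd.DataFrame({
    "DayOfWeek": ["Monday", "Monday", "Monday", "Sunday"],
    "Hour": [8, 8, 17, 8],
})

# Count rows per (day, hour) pair, then move Hour from the index into the columns
grid = toy.groupby(["DayOfWeek", "Hour"]).size().unstack(fill_value=0)
print(grid)
```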

sns.heatmap(calls_hour, cmap = "coolwarm")
plt.title("Heatmap: Time of Day and Emergency Call Ups")

We note that most calls come in during the day, between 7 AM and 7 PM, when people are most active. Far fewer calls are placed during the night.

We also create a clustermap from the same table. It tells a similar story, but it uses hierarchical clustering to reorder the days and hours so that periods with similar numbers of emergency calls are grouped together.
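Under the hood, seaborn's clustermap relies on SciPy's hierarchical clustering, with documented defaults of method="average" and metric="euclidean". The sketch below applies scipy.cluster.hierarchy.linkage directly to a made-up count matrix to show the idea: rows with similar hourly profiles end up next to each other in the leaf order. It illustrates the technique and does not reproduce seaborn's exact row ordering.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

# Made-up call counts: each row is a day, each column one hour of the day
counts = np.array([
    [10.0, 50.0, 40.0],  # row 0: busy afternoon
    [90.0, 20.0, 15.0],  # row 1: busy morning
    [11.0, 52.0, 38.0],  # row 2: similar to row 0
    [88.0, 22.0, 14.0],  # row 3: similar to row 1
])

# Average-linkage clustering on Euclidean distances between rows
Z = linkage(counts, method="average", metric="euclidean")

# Leaf order of the resulting tree: similar rows are placed side by side
order = leaves_list(Z)
print(order)
```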

# clustermap builds its own figure, so the title is set on that figure to sit above the whole grid
g = sns.clustermap(calls_hour, cmap = "coolwarm")
g.fig.suptitle("Clustermap: Time of Day and Emergency Call Ups", y = 1.02)

Now we create a heatmap and a clustermap showing the number of incidents by month and hour of day.

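# Count calls for each (month, hour) pair, pivot hours into columns, and sort the months by calendar order
calls_month = calls.groupby(["Month", "Hour"]).count()["date"].reset_index().pivot_table(index = "Month", columns = "Hour", values = "date").sort_index(key = lambda idx: pd.to_datetime(idx, format = "%B").month)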

sns.heatmap(calls_month, cmap = "coolwarm")
plt.title("Heatmap: Time of Day and Emergency Call Ups by Month")

# clustermap builds its own figure, so the title is set on that figure to sit above the whole grid
g = sns.clustermap(calls_month, cmap = "coolwarm")
g.fig.suptitle("Clustermap: Time of Day and Emergency Call Ups by Month", y = 1.02)

Conclusion

In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with most calls placed between 7 AM and 7 PM, when people are most active. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities; note, however, that the records begin on 10 December 2015, so December is only partly covered. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies (Muddana and Vinayakam 2024).

References

Muddana, A Lakshmi, and Sandhya Vinayakam. 2024. Python for Data Science. Springer.

Footnotes

  1. You can access the data on Kaggle at https://www.kaggle.com/mchirico/montcoalert. Note that you will need to create a Kaggle account if you do not already have one.