import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Going Behind the Call: Uncovering Patterns in 911 Emergencies Using Python and Pandas
Independent Data Analysis Project
In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with peak call times around 7 AM and 7 PM, likely linked to periods of high activity. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies.
Data analysis, Python, Pandas, Seaborn, NumPy, Descriptive Analysis, Data Science, Machine Learning
Background
For my capstone project, I’m diving into the world of emergency response by analyzing 911 call data from Montgomery County, Pennsylvania, available on Kaggle. This dataset captures essential details like call locations, times, and types of emergencies—from accidents to back pains. My goal? To uncover patterns and insights that could help improve response strategies and resource allocation for first responders, ultimately making emergency services faster and more effective 1.
Data
The data contains the following nine (9) fields and 99,492 observations:
Variable Name | Variable Type | Variable Description |
---|---|---|
lat | Float | Latitude |
lng | Float | Longitude |
desc | String | Description of the emergency call |
zip | Float | Zip code |
title | String | Title of the call (reason and emergency code) |
timeStamp | String | Timestamp (YYYY-MM-DD HH:MM:SS) |
twp | String | Township |
addr | String | Address |
e | Integer | Dummy variable (always 1) |
calls = pd.read_csv("911.csv")
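As an aside, the timestamp conversion performed later in this analysis can be folded into the load step via pandas' parse_dates option (an optional variant, not the path followed below):

# Optional: parse timeStamp while reading instead of converting afterwards
calls = pd.read_csv("911.csv", parse_dates = ["timeStamp"])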
Data Analysis
Descriptive Analysis
We start by examining the data.
calls.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 lat 99492 non-null float64
1 lng 99492 non-null float64
2 desc 99492 non-null object
3 zip 86637 non-null float64
4 title 99492 non-null object
5 timeStamp 99492 non-null object
6 twp 99449 non-null object
7 addr 98973 non-null object
8 e 99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
We see that the data consists of nine (9) variables, four (4) of which are numeric (three floats and one integer), while the rest are string variables (labeled object). The data has 99,492 observations. Next, we look at the first few rows of the data.
calls.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | |
---|---|---|---|---|---|---|---|---|---|
0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:40:00 | NEW HANOVER | REINDEER CT & DEAD END | 1 |
1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:40:00 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 |
2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 17:40:00 | NORRISTOWN | HAWS AVE | 1 |
3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 17:40:01 | NORRISTOWN | AIRY ST & SWEDE ST | 1 |
4 | 40.251492 | -75.603350 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | NaN | EMS: DIZZINESS | 2015-12-10 17:40:01 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 |
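Row 4 above already shows a missing zip value. Tallying the missing values across all columns is a quick sanity check (a small addition to the original walkthrough):

# Count missing values per column; zip, twp, and addr contain NaNs
calls.isnull().sum()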
Finally, I run some summary statistics for the numeric and string variables and then visualize their relationships.
calls.describe()
lat | lng | zip | e | |
---|---|---|---|---|
count | 99492.000000 | 99492.000000 | 86637.000000 | 99492.0 |
mean | 40.159526 | -75.317464 | 19237.658298 | 1.0 |
std | 0.094446 | 0.174826 | 345.344914 | 0.0 |
min | 30.333596 | -95.595595 | 17752.000000 | 1.0 |
25% | 40.100423 | -75.392104 | 19038.000000 | 1.0 |
50% | 40.145223 | -75.304667 | 19401.000000 | 1.0 |
75% | 40.229008 | -75.212513 | 19446.000000 | 1.0 |
max | 41.167156 | -74.995041 | 77316.000000 | 1.0 |
= "object") calls.describe(include
desc | title | timeStamp | twp | addr | |
---|---|---|---|---|---|
count | 99492 | 99492 | 99492 | 99449 | 98973 |
unique | 99455 | 110 | 72577 | 68 | 21914 |
top | GREEN ST & E BASIN ST; NORRISTOWN; Station 30... | Traffic: VEHICLE ACCIDENT - | 2015-12-10 17:40:01 | LOWER MERION | SHANNONDELL DR & SHANNONDELL BLVD |
freq | 4 | 23066 | 8 | 8443 | 938 |
sns.pairplot(calls)
Basic Questions
What are the top 5 Zip Codes for 911 Calls?
For emergency planning, managers may be interested in the zip codes or areas where most emergency calls originate. Here is the list of the five hot spots for the county.
calls['zip'].value_counts().head(5)
zip
19401.0 6979
19464.0 6643
19403.0 4854
19446.0 4748
19406.0 3174
Name: count, dtype: int64
What are the top 5 townships for emergency calls?
Similarly, we can drill down further into the townships with the most emergency calls, as shown below.
calls['twp'].value_counts().head(5)
twp
LOWER MERION 8443
ABINGTON 5977
NORRISTOWN 5890
UPPER MERION 5227
CHELTENHAM 4575
Name: count, dtype: int64
How many unique title codes are there?
There are 110 unique title codes in the title column.
calls['title'].str.lower().nunique()
110
Creating New Features
Leading Causes of 911 Calls
In the title column, a "Reason/Department" is specified before the title code: EMS, Fire, or Traffic. I use .apply() with a custom lambda expression to create a new column called reason that contains this string value.
For example, if the title value is EMS: BACK PAINS/INJURY, the reason value would be EMS.
calls["reason"] = calls["title"].apply(lambda x: x.split(":")[0])
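An equivalent vectorized alternative uses pandas' string methods instead of apply; n=1 splits only on the first colon, and strip guards against stray whitespace:

# Same result without a Python-level lambda
calls["reason"] = calls["title"].str.split(":", n = 1).str[0].str.strip()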
Based on this reason column, we can see that EMS is the most common reason for a 911 call.
calls["reason"].value_counts()
reason
EMS 48877
Traffic 35695
Fire 14920
Name: count, dtype: int64
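The same breakdown can be expressed as proportions, using value_counts with normalize=True:

# Share of calls by reason rather than raw counts
calls["reason"].value_counts(normalize = True).round(3)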
We can visualize this using seaborn.
= "reason", data = calls, hue = "reason", palette = "mako")
sns.countplot(x "Laeding Causes of 911 Calls")
plt.title( plt.show()
Times and Dates in Pandas
In this section, we focus on the dates and times in the calls data. Specifically, there is one column, titled timeStamp, that contains time information. However, it is not in a date-time format. Instead, it is coded as a string.
"timeStamp"].head() calls[
0 2015-12-10 17:40:00
1 2015-12-10 17:40:00
2 2015-12-10 17:40:00
3 2015-12-10 17:40:01
4 2015-12-10 17:40:01
Name: timeStamp, dtype: object
We convert this variable into a date-time object.
"timeStamp"] = pd.to_datetime(calls["timeStamp"]) calls[
Now the column is properly coded as a date-time object.
"timeStamp"].dtype calls[
dtype('<M8[ns]')
We can now grab specific attributes from a Datetime object by calling them. For example:
time = calls['timeStamp'].iloc[0]
time.hour
17
NB: You can use Jupyter's tab completion to explore the various attributes you can call. Now that the timeStamp column holds actual DateTime objects, we use the .dt accessor to create three new columns, Hour, Month, and DayOfWeek, based off of the timeStamp column.
"Hour"] = calls["timeStamp"].dt.hour
calls["Month"] = calls["timeStamp"].dt.month_name()
calls["DayOfWeek"] = calls["timeStamp"].dt.day_name() calls[
calls.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | reason | Hour | Month | DayOfWeek | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:40:00 | NEW HANOVER | REINDEER CT & DEAD END | 1 | EMS | 17 | December | Thursday |
1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:40:00 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 | EMS | 17 | December | Thursday |
2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 17:40:00 | NORRISTOWN | HAWS AVE | 1 | Fire | 17 | December | Thursday |
3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 17:40:01 | NORRISTOWN | AIRY ST & SWEDE ST | 1 | EMS | 17 | December | Thursday |
4 | 40.251492 | -75.603350 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | NaN | EMS: DIZZINESS | 2015-12-10 17:40:01 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 | EMS | 17 | December | Thursday |
We now use seaborn to create a countplot of the DayOfWeek column with the hue based on the reason column.
= "DayOfWeek", data = calls, hue = "reason") sns.countplot(x
We see that EMS and Traffic emergency calls are the highest on weekdays, with notable dips over the weekend. Fire incidents are relatively constant throughout the week.
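Note that countplot draws categories in order of first appearance; to force a fixed Monday-through-Sunday ordering, pass an explicit order argument (a small variation on the plot above):

# Fix the weekday ordering on the x-axis
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sns.countplot(x = "DayOfWeek", data = calls, hue = "reason", order = day_order)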
Now, we do the same for months.
= "Month", data = calls, hue = "reason") sns.countplot(x
Do you notice something strange about the plot?
You should have noticed that it is missing some months. Let's see if we can fill in this information by plotting it another way, perhaps a simple line plot that covers the months present in the data. To do this, we'll need to do some work with pandas.
We create a grouped DataFrame called bymonth by grouping on the Month column and using the count() method for aggregation, then inspect the result with head().
= calls.groupby("Month").count()
bymonth bymonth.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | reason | Hour | DayOfWeek | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Month | ||||||||||||
April | 11326 | 11326 | 11326 | 9895 | 11326 | 11326 | 11323 | 11283 | 11326 | 11326 | 11326 | 11326 |
August | 9078 | 9078 | 9078 | 7832 | 9078 | 9078 | 9073 | 9025 | 9078 | 9078 | 9078 | 9078 |
December | 7969 | 7969 | 7969 | 6907 | 7969 | 7969 | 7963 | 7916 | 7969 | 7969 | 7969 | 7969 |
February | 11467 | 11467 | 11467 | 9930 | 11467 | 11467 | 11465 | 11396 | 11467 | 11467 | 11467 | 11467 |
January | 13205 | 13205 | 13205 | 11527 | 13205 | 13205 | 13203 | 13096 | 13205 | 13205 | 13205 | 13205 |
We then create a simple plot from this DataFrame showing the count of calls per month, using the lat column (which has no missing values) as the call count.
"lat"].plot() bymonth[
Next, we create a new column called date that contains the date portion of the timeStamp column, using the .dt.date accessor.
"date"] = calls["timeStamp"].dt.date calls[
We group by this date column with the count() aggregate and plot the daily counts of 911 calls.
"date").count()["lat"].plot()
calls.groupby("Calls by Dates, 2015-") plt.title(
Text(0.5, 1.0, 'Calls by Dates, 2015-')
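The daily series is fairly noisy; an optional smoothing step (my own addition, not part of the original analysis) is a centered seven-day rolling mean over the same grouped counts:

# Smooth the daily call counts with a centered 7-day rolling mean
daily = calls.groupby("date").count()["lat"]
daily.rolling(7, center = True).mean().plot()
plt.title("911 Calls per Day (7-Day Rolling Mean)")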
Next, we recreate this plot, this time with a separate line for each reason for the 911 call (a fully separated, three-panel variant follows below).
= calls.groupby(["date", "reason"]).count().reset_index()[["date", "reason", "lat"]] calls_reasons
= "date", y = "lat", data = calls_reasons, hue = "reason")
sns.lineplot(x "Calls by Reason") plt.title(
Text(0.5, 1.0, 'Calls by Reason')
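For three fully separate panels, one per reason, seaborn's relplot can facet by reason; a sketch using the same calls_reasons frame (the panel sizing parameters are arbitrary):

# One line chart per reason, in side-by-side panels
g = sns.relplot(x = "date", y = "lat", data = calls_reasons,
                col = "reason", kind = "line", height = 3, aspect = 1.4)
g.set_axis_labels("Date", "Number of calls")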
Creating Heatmaps
Now let's move on to creating heatmaps with seaborn and our data. We first need to restructure the DataFrame so that the columns become the hours and the index becomes the day of the week. There are many ways to do this; here I combine groupby with pivot_table.
= calls.groupby(["DayOfWeek", "Hour"]).count()["date"].reset_index().pivot_table(index = "DayOfWeek", columns = "Hour", values = "date")
calls_hour
= "coolwarm")
sns.heatmap(calls_hour, cmap "Heatmap: Time of Day and Emergency Call Ups") plt.title(
Text(0.5, 1.0, 'Heatmap: Time of Day and Emergency Call Ups')
We note that most incidents occur during working hours, between 7 AM and 7 PM, just when people are most active. Far fewer incidents happen during the other periods.
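To pin down the single busiest day-and-hour combination, one option (my own addition) is to stack the pivoted frame and take the index of its maximum:

# Returns the (DayOfWeek, Hour) pair with the highest call count
calls_hour.stack().idxmax()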
We also create a clustermap using this DataFrame, which tells a similar story but groups periods with roughly equal numbers of emergency calls together using hierarchical clustering.
= "coolwarm")
sns.clustermap(calls_hour, cmap "Clustermap: Time of Day and Emergency Call Ups") plt.title(
Text(0.5, 1.0, 'Clustermap: Time of Day and Emergency Call Ups')
Now we create a heatmap and clustermap showing the number of incidents by month and hour of day.
= calls.groupby(["Month", "Hour"]).count()["date"].reset_index().pivot_table(index = "Month", columns = "Hour", values = "date")
calls_month
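One caveat carried over from the monthly line plot: month_name() leaves the index in alphabetical order, so the heatmap rows will not follow the calendar. The same calendar-module trick restores the natural order:

import calendar

# Reorder rows January through December, skipping months absent from the data
calls_month = calls_month.reindex([m for m in calendar.month_name[1:] if m in calls_month.index])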
= "coolwarm")
sns.heatmap(calls_month, cmap "Heatmap: Time of Day and Emergency Call Ups by Month") plt.title(
Text(0.5, 1.0, 'Heatmap: Time of Day and Emergency Call Ups by Month')
= "coolwarm")
sns.clustermap(calls_month, cmap "Clustermap: Time of Day and Emergency Call Ups by Month") plt.title(
Text(0.5, 1.0, 'Clustermap: Time of Day and Emergency Call Ups by Month')
Conclusion
In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with peak call times around 7 AM and 7 PM, likely linked to periods of high activity. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies (Muddana and Vinayakam 2024).
References
Footnotes
You can access the data from Kaggle via this link: https://www.kaggle.com/mchirico/montcoalert. Note that you will need to create a Kaggle account if you do not already have one.↩︎