import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Going Behind the Call: Uncovering Patterns in 911 Emergencies Using Python and Pandas
Independent Data Analysis Project
In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with peak call times around 7 AM and 7 PM, likely linked to periods of high activity. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies.
Data analysis, Python, Pandas, Seaborn, NumPy, Descriptive Analysis, Data Science, Machine Learning
Background
For my capstone project, I’m diving into the world of emergency response by analyzing 911 call data from Montgomery County, Pennsylvania, available on Kaggle. This dataset captures essential details like call locations, times, and types of emergencies—from accidents to back pains. My goal? To uncover patterns and insights that could help improve response strategies and resource allocation for first responders, ultimately making emergency services faster and more effective 1.
Data
The data contains the following nine (9) fields and 99,492 observations:
Variable Name | Variable Type | Variable Description |
---|---|---|
lat | Float | Latitude |
lng | Float | Longitude |
desc | String | Description of the emergency call |
zip | Float | Zip code |
title | String | Title of the call (reason and emergency code) |
timeStamp | String | Timestamp (YYYY-MM-DD HH:MM:SS) |
twp | String | Township |
addr | String | Address |
e | Integer | Dummy variable (always 1) |
calls = pd.read_csv("911.csv")
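As an aside, the timestamp conversion performed later in this analysis can be folded into the load step via pandas' parse_dates option (an optional variant, not the path followed below):

# Optional: parse timeStamp while reading instead of converting afterwards
calls = pd.read_csv("911.csv", parse_dates = ["timeStamp"])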
Data Analysis
Descriptive Analysis
We start by examining the data.
calls.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 lat 99492 non-null float64
1 lng 99492 non-null float64
2 desc 99492 non-null object
3 zip 86637 non-null float64
4 title 99492 non-null object
5 timeStamp 99492 non-null object
6 twp 99449 non-null object
7 addr 98973 non-null object
8 e 99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
We see that the data consists of nine (9) variables, four (4) of which are numeric (three floats and one integer), while the rest are string variables (labeled object). The data has 99,492 observations. Next, we look at the first few rows of the data.
calls.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | |
---|---|---|---|---|---|---|---|---|---|
0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:40:00 | NEW HANOVER | REINDEER CT & DEAD END | 1 |
1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:40:00 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 |
2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 17:40:00 | NORRISTOWN | HAWS AVE | 1 |
3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 17:40:01 | NORRISTOWN | AIRY ST & SWEDE ST | 1 |
4 | 40.251492 | -75.603350 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | NaN | EMS: DIZZINESS | 2015-12-10 17:40:01 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 |
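Row 4 above already shows a missing zip value. Tallying the missing values across all columns is a quick sanity check (a small addition to the original walkthrough):

# Count missing values per column; zip, twp, and addr contain NaNs
calls.isnull().sum()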
Finally, I run some summary statistics for the numeric and string variables and then visualize their relationships.
calls.describe()
lat | lng | zip | e | |
---|---|---|---|---|
count | 99492.000000 | 99492.000000 | 86637.000000 | 99492.0 |
mean | 40.159526 | -75.317464 | 19237.658298 | 1.0 |
std | 0.094446 | 0.174826 | 345.344914 | 0.0 |
min | 30.333596 | -95.595595 | 17752.000000 | 1.0 |
25% | 40.100423 | -75.392104 | 19038.000000 | 1.0 |
50% | 40.145223 | -75.304667 | 19401.000000 | 1.0 |
75% | 40.229008 | -75.212513 | 19446.000000 | 1.0 |
max | 41.167156 | -74.995041 | 77316.000000 | 1.0 |
= "object") calls.describe(include
desc | title | timeStamp | twp | addr | |
---|---|---|---|---|---|
count | 99492 | 99492 | 99492 | 99449 | 98973 |
unique | 99455 | 110 | 72577 | 68 | 21914 |
top | GREEN ST & E BASIN ST; NORRISTOWN; Station 30... | Traffic: VEHICLE ACCIDENT - | 2015-12-10 17:40:01 | LOWER MERION | SHANNONDELL DR & SHANNONDELL BLVD |
freq | 4 | 23066 | 8 | 8443 | 938 |
sns.pairplot(calls)
Basic Questions
What are the top 5 Zip Codes for 911 Calls?
For emergency planning, managers may be interested in the zip codes or areas where most emergency calls originate. Here is the list of the five hot spots for the county.
calls['zip'].value_counts().head(5)
zip
19401.0 6979
19464.0 6643
19403.0 4854
19446.0 4748
19406.0 3174
Name: count, dtype: int64
What are the top 5 townships for emergency calls?
Similarly, we can drill down further into the townships with the most emergency calls, as shown below.
calls['twp'].value_counts().head(5)
twp
LOWER MERION 8443
ABINGTON 5977
NORRISTOWN 5890
UPPER MERION 5227
CHELTENHAM 4575
Name: count, dtype: int64
How many unique title codes are there?
There are 110 unique title codes in the title column.
calls['title'].str.lower().nunique()
110
Creating New Features
Leading Causes of 911 Calls
In the title column, a "Reason/Department" is specified before the title code: EMS, Fire, or Traffic. I use .apply() with a custom lambda expression to create a new column called reason that contains this string value.
For example, if the title value is EMS: BACK PAINS/INJURY, the reason value would be EMS.
calls["reason"] = calls["title"].apply(lambda x: x.split(":")[0])
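An equivalent vectorized alternative uses pandas' string methods instead of apply; n=1 splits only on the first colon, and strip guards against stray whitespace:

# Same result without a Python-level lambda
calls["reason"] = calls["title"].str.split(":", n = 1).str[0].str.strip()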
Based on this reason column, we can see that EMS is the most common reason for a 911 call.
calls["reason"].value_counts()
reason
EMS 48877
Traffic 35695
Fire 14920
Name: count, dtype: int64
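The same breakdown can be expressed as proportions, using value_counts with normalize=True:

# Share of calls by reason rather than raw counts
calls["reason"].value_counts(normalize = True).round(3)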
We can visualize this using seaborn.
= "reason", data = calls, hue = "reason", palette = "mako")
sns.countplot(x "Laeding Causes of 911 Calls")
plt.title( plt.show()
Times and Dates in Pandas
In this section, we focus on the dates and times in the calls data. Specifically, there is one column, titled timeStamp, that contains time information. However, it is not in a date-time format. Instead, it is coded as a string.
"timeStamp"].head() calls[
0 2015-12-10 17:40:00
1 2015-12-10 17:40:00
2 2015-12-10 17:40:00
3 2015-12-10 17:40:01
4 2015-12-10 17:40:01
Name: timeStamp, dtype: object
We convert this variable into a date-time object.
"timeStamp"] = pd.to_datetime(calls["timeStamp"]) calls[
Now the column is properly coded as a date-time object.
"timeStamp"].dtype calls[
dtype('<M8[ns]')
We can now grab specific attributes from a Datetime object by calling them. For example:
time = calls['timeStamp'].iloc[0]
time.hour
17
NB: You can use Jupyter's tab completion to explore the various attributes you can call. Now that the timeStamp column holds actual DateTime objects, we use the .dt accessor to create three new columns, Hour, Month, and DayOfWeek, based off of the timeStamp column.
"Hour"] = calls["timeStamp"].dt.hour
calls["Month"] = calls["timeStamp"].dt.month_name()
calls["DayOfWeek"] = calls["timeStamp"].dt.day_name() calls[
calls.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | reason | Hour | Month | DayOfWeek | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 40.297876 | -75.581294 | REINDEER CT & DEAD END; NEW HANOVER; Station ... | 19525.0 | EMS: BACK PAINS/INJURY | 2015-12-10 17:40:00 | NEW HANOVER | REINDEER CT & DEAD END | 1 | EMS | 17 | December | Thursday |
1 | 40.258061 | -75.264680 | BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP... | 19446.0 | EMS: DIABETIC EMERGENCY | 2015-12-10 17:40:00 | HATFIELD TOWNSHIP | BRIAR PATH & WHITEMARSH LN | 1 | EMS | 17 | December | Thursday |
2 | 40.121182 | -75.351975 | HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St... | 19401.0 | Fire: GAS-ODOR/LEAK | 2015-12-10 17:40:00 | NORRISTOWN | HAWS AVE | 1 | Fire | 17 | December | Thursday |
3 | 40.116153 | -75.343513 | AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;... | 19401.0 | EMS: CARDIAC EMERGENCY | 2015-12-10 17:40:01 | NORRISTOWN | AIRY ST & SWEDE ST | 1 | EMS | 17 | December | Thursday |
4 | 40.251492 | -75.603350 | CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... | NaN | EMS: DIZZINESS | 2015-12-10 17:40:01 | LOWER POTTSGROVE | CHERRYWOOD CT & DEAD END | 1 | EMS | 17 | December | Thursday |
We now use seaborn to create a countplot of the DayOfWeek column with the hue based on the reason column.
= "DayOfWeek", data = calls, hue = "reason") sns.countplot(x
We see that EMS and Traffic emergency calls are the highest on weekdays, with notable dips over the weekend. Fire incidents are relatively constant throughout the week.
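Note that countplot draws categories in order of first appearance; to force a fixed Monday-through-Sunday ordering, pass an explicit order argument (a small variation on the plot above):

# Fix the weekday ordering on the x-axis
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
sns.countplot(x = "DayOfWeek", data = calls, hue = "reason", order = day_order)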
Now, we do the same for months.
= "Month", data = calls, hue = "reason") sns.countplot(x
Do you notice something strange about the plot?
You should have noticed that it is missing some months. Let's see if we can fill in this information by plotting it another way, perhaps a simple line plot that covers the months present in the data. To do this, we'll need to do some work with pandas.
We create a grouped DataFrame called bymonth by grouping on the Month column and using the count() method for aggregation, then inspect the result with head().
= calls.groupby("Month").count()
bymonth bymonth.head()
lat | lng | desc | zip | title | timeStamp | twp | addr | e | reason | Hour | DayOfWeek | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Month | ||||||||||||
April | 11326 | 11326 | 11326 | 9895 | 11326 | 11326 | 11323 | 11283 | 11326 | 11326 | 11326 | 11326 |
August | 9078 | 9078 | 9078 | 7832 | 9078 | 9078 | 9073 | 9025 | 9078 | 9078 | 9078 | 9078 |
December | 7969 | 7969 | 7969 | 6907 | 7969 | 7969 | 7963 | 7916 | 7969 | 7969 | 7969 | 7969 |
February | 11467 | 11467 | 11467 | 9930 | 11467 | 11467 | 11465 | 11396 | 11467 | 11467 | 11467 | 11467 |
January | 13205 | 13205 | 13205 | 11527 | 13205 | 13205 | 13203 | 13096 | 13205 | 13205 | 13205 | 13205 |
We then create a simple plot from this DataFrame showing the count of calls per month, using the lat column (which has no missing values) as the call count.
"lat"].plot() bymonth[
Next, we create a new column called date that contains the date portion of the timeStamp column, using the .dt.date accessor.
"date"] = calls["timeStamp"].dt.date calls[
We group by this date column with the count() aggregate and plot the daily counts of 911 calls.
"date").count()["lat"].plot()
calls.groupby("Calls by Dates, 2015-") plt.title(
Text(0.5, 1.0, 'Calls by Dates, 2015-')
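The daily series is fairly noisy; an optional smoothing step (my own addition, not part of the original analysis) is a centered seven-day rolling mean over the same grouped counts:

# Smooth the daily call counts with a centered 7-day rolling mean
daily = calls.groupby("date").count()["lat"]
daily.rolling(7, center = True).mean().plot()
plt.title("911 Calls per Day (7-Day Rolling Mean)")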
Next, we recreate this plot, this time with a separate line for each reason for the 911 call (a fully separated, three-panel variant follows below).
= calls.groupby(["date", "reason"]).count().reset_index()[["date", "reason", "lat"]] calls_reasons
= "date", y = "lat", data = calls_reasons, hue = "reason")
sns.lineplot(x "Calls by Reason") plt.title(
Text(0.5, 1.0, 'Calls by Reason')
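For three fully separate panels, one per reason, seaborn's relplot can facet by reason; a sketch using the same calls_reasons frame (the panel sizing parameters are arbitrary):

# One line chart per reason, in side-by-side panels
g = sns.relplot(x = "date", y = "lat", data = calls_reasons,
                col = "reason", kind = "line", height = 3, aspect = 1.4)
g.set_axis_labels("Date", "Number of calls")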
Creating Heatmaps
Now let's move on to creating heatmaps with seaborn and our data. We first need to restructure the DataFrame so that the columns become the hours and the index becomes the day of the week. There are many ways to do this; here I combine groupby with pivot_table.
= calls.groupby(["DayOfWeek", "Hour"]).count()["date"].reset_index().pivot_table(index = "DayOfWeek", columns = "Hour", values = "date")
calls_hour
= "coolwarm")
sns.heatmap(calls_hour, cmap "Heatmap: Time of Day and Emergency Call Ups") plt.title(
Text(0.5, 1.0, 'Heatmap: Time of Day and Emergency Call Ups')
We note that most incidents occur during working hours, between 7 AM and 7 PM, just when people are most active. Far fewer incidents happen during the other periods.
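To pin down the single busiest day-and-hour combination, one option (my own addition) is to stack the pivoted frame and take the index of its maximum:

# Returns the (DayOfWeek, Hour) pair with the highest call count
calls_hour.stack().idxmax()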
We also create a clustermap using this DataFrame, which tells a similar story but groups periods with roughly equal numbers of emergency calls together using hierarchical clustering.
= "coolwarm")
sns.clustermap(calls_hour, cmap "Clustermap: Time of Day and Emergency Call Ups") plt.title(
Text(0.5, 1.0, 'Clustermap: Time of Day and Emergency Call Ups')
Now we create a heatmap and clustermap showing the number of incidents by month and hour of day.
= calls.groupby(["Month", "Hour"]).count()["date"].reset_index().pivot_table(index = "Month", columns = "Hour", values = "date")
calls_month
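One caveat carried over from the monthly line plot: month_name() leaves the index in alphabetical order, so the heatmap rows will not follow the calendar. The same calendar-module trick restores the natural order:

import calendar

# Reorder rows January through December, skipping months absent from the data
calls_month = calls_month.reindex([m for m in calendar.month_name[1:] if m in calls_month.index])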
= "coolwarm")
sns.heatmap(calls_month, cmap "Heatmap: Time of Day and Emergency Call Ups by Month") plt.title(
Text(0.5, 1.0, 'Heatmap: Time of Day and Emergency Call Ups by Month')
= "coolwarm")
sns.clustermap(calls_month, cmap "Clustermap: Time of Day and Emergency Call Ups by Month") plt.title(
Text(0.5, 1.0, 'Clustermap: Time of Day and Emergency Call Ups by Month')
Conclusion
In this analysis, we leveraged Python and Pandas to uncover key insights into 911 emergency call patterns. Our findings revealed that medical emergencies (EMS) are the leading cause of calls, with peak call times around 7 AM and 7 PM, likely linked to periods of high activity. Interestingly, January saw the highest volume of emergency calls, while December experienced a drop, perhaps due to holiday festivities. These insights underscore the value of data analysis in understanding and anticipating community needs, helping to better allocate resources and improve emergency response strategies (Muddana and Vinayakam 2024).
References
Footnotes
You can access the data from Kaggle via this link: https://www.kaggle.com/mchirico/montcoalert. Note that you will need to create a Kaggle account if you do not already have one.↩︎