Dataset Overview

The following data visualizations derive from the dataset “Incidents Responded to by Fire Companies” (https://data.cityofnewyork.us/Public-Safety/Incidents-Responded-to-by-Fire-Companies/tm6d-hbzd). It records incidents across the five boroughs of New York City (Bronx, Brooklyn, Manhattan, Staten Island and Queens) from 2013 through June 2018. The incidents were reported by Fire Department of New York (FDNY) units and cover fire, medical and non-fire emergencies. With over 2.5 million rows and 24 columns, the dataset provides detailed information on FDNY emergencies. The columns used to build the data frames and graphs include incident type (description), incident date and time, and incident duration, along with derived variables created to make this large dataset easier to understand.

## READING IN DATA
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

df = pd.read_csv("U:/NYC_Fire_Reports.csv")

Data Visualizations

The visualizations shed light on a variety of variables associated with the fire, medical and non-fire emergencies the FDNY attended from 2013 through June 2018. Some graphs display the top 100, top 20 and top 10 incident types in terms of total duration, and compare total and average duration for the top 10 incident types; others show the total duration of incidents by hour and day of the week as a line chart and a stacked bar plot, and by year and month as a heatmap. Readers should come away with a better understanding of the incidents reported to and attended by the FDNY and of how long those emergencies last.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

path = "U:/"

filename = path + "NYC_Fire_Reports.csv"

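# Peek at the first five rows to list the available column names
df = pd.read_csv(filename, nrows = 5)
print(df.columns)
# Load only the three columns used below, then parse the timestamp (12-hour clock with AM/PM)
df = pd.read_csv(filename, usecols = ['TOTAL_INCIDENT_DURATION', 'INCIDENT_TYPE_DESC', 'INCIDENT_DATE_TIME'])
df['INCIDENT_DATE_TIME'] = pd.to_datetime(df['INCIDENT_DATE_TIME'], format = '%m/%d/%Y %I:%M:%S %p')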

df['Hour'] = df.INCIDENT_DATE_TIME.dt.hour
df['Day'] = df.INCIDENT_DATE_TIME.dt.day
df['Month'] = df.INCIDENT_DATE_TIME.dt.month
df['Year'] = df.INCIDENT_DATE_TIME.dt.year
df['WeekDay'] = df.INCIDENT_DATE_TIME.dt.strftime('%a')
df['MonthName'] = df.INCIDENT_DATE_TIME.dt.strftime('%b')
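
The timestamp parsing and the derived calendar columns can be checked on a couple of values before running them over the full file. The sketch below is only an illustration: the two sample strings are made up, and they are assumed to follow the same layout as INCIDENT_DATE_TIME.

```python
import pandas as pd

# Hypothetical sample values, assumed to match the layout of INCIDENT_DATE_TIME.
sample = pd.Series(["01/01/2013 12:00:37 AM", "06/30/2018 07:15:00 PM"])

# %I is the 12-hour clock and %p the AM/PM marker, so 12:00 AM parses as hour 0.
parsed = pd.to_datetime(sample, format="%m/%d/%Y %I:%M:%S %p")

features = pd.DataFrame({
    "Hour": parsed.dt.hour,
    "Month": parsed.dt.month,
    "Year": parsed.dt.year,
    "WeekDay": parsed.dt.strftime("%a"),  # abbreviated day name, locale dependent
})
print(features)
```

Checking a midnight value and an evening value is a quick way to confirm that the AM/PM marker is being honored, since a 24-hour format string would misread both.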

Frequency Analysis

Visualization 1: Frequency Analysis

This analysis covers the top 100 incident types the FDNY responds to and the duration recorded for each type, in seconds, from 2013 through June 2018. The top incident type shown accounts for about 160,000 total seconds over that period, while the lowest has well below 2,000 seconds. The overall mean of the total duration for the top 100 incident types is 16,083.87 seconds (about 4.5 hours). Orange bars mark incident types above the calculated mean, red bars mark incident types below it, and light pink marks types within 1% of the mean.

The top 10 chart shows the incident types on which the FDNY spent the longest time. The incident type with the longest duration accounts for 161,886 seconds, followed by 129,466 and 121,509 seconds (1.87, 1.5 and 1.4 days). The specific incident types with the longest durations can be seen in the following plot. Long durations over the period can indicate either how frequent these incidents are or how severe they are.
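
The color rule described above (orange above the mean, red below it, and a third color for values within 1% of the mean) can be sketched in vectorized form. This is an illustrative alternative to the loop used in the code that follows, not the report's own implementation; the values are hypothetical and `numpy.select` is a swapped-in technique.

```python
import numpy as np

# Hypothetical bar heights; their mean is exactly 100.
values = np.array([190, 100, 100, 90, 20])
avg = values.mean()

# Conditions are checked in order; anything matching neither falls to the default.
colors = np.select(
    [values > avg * 1.01, values < avg * 0.99],
    ["orange", "red"],
    default="lightpink",
)
print(colors.tolist())
```

Because the thresholds are relative to the mean of whatever subset is passed in, the same incident type can change color between the top 100 and top 10 panels.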

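# Per incident type: number of incidents, plus the summed and mean TOTAL_INCIDENT_DURATION
x = df.groupby(['INCIDENT_TYPE_DESC']).agg({'INCIDENT_TYPE_DESC':['count'], 'TOTAL_INCIDENT_DURATION':['sum', 'mean']}).reset_index()

# Flatten the two-level column names produced by agg
x.columns = ['IncidentType', 'Count', 'IncidentDuration', 'AverageDuration']

# Rank incident types by Count, largest first, and renumber the rows from 0
x = x.sort_values('Count', ascending=False)
x.reset_index(inplace=True, drop=True)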

def pick_colors_according_to_mean_count(this_data):
    """Return one bar color per row: orange above the mean Count, red below it, lightpink within 1% of it."""
    colors = []
    avg = this_data.Count.mean()
    for each in this_data.Count:
        if each > avg*1.01:
            colors.append('orange')
        elif each < avg*0.99:
            colors.append('red')
        else:
            colors.append('lightpink')
    return colors

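import matplotlib.patches as mpatches

# .loc slices by index label and includes both ends, so rows 1 through 100 are selected (row 0 is left out)
bottom1 = 1
top1 = 100
d1 = x.loc[bottom1:top1]
my_colors1 = pick_colors_according_to_mean_count(d1)

# Same selection for rows 1 through 10
bottom2 = 1
top2 = 10
d2 = x.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)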

Above = mpatches.Patch(color = 'orange', label= 'Above Average')
At = mpatches.Patch(color = 'lightpink', label= 'Within 1% of the Average')
Below = mpatches.Patch(color = 'red', label= 'Below Average')

my_colors1
fig = plt.figure(figsize=(18,10))
fig.suptitle('Frequency of Incident Type Analysis & Incident Duration:\n Top '+ str(top1) + ' and Top ' +str(top2),
             fontsize=18, fontweight='bold')

ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.IncidentType, d1.Count, label= 'Count', color=my_colors1)
#ax1.legend(fontsize=14)
ax1.legend(handles= [Above, At, Below],fontsize=14)
ax1.axhline(d1.Count.mean(), color='black', linestyle='dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' Incident Types', size = 20)
ax1.text(top1-10, d1.Count.mean()+5, 'Mean = ' + str(d1.Count.mean()), rotation= 0, fontsize=14)

ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.IncidentType, d2.Count, label= 'Count', color=my_colors2)
ax2.legend(handles= [Above, At, Below],fontsize=14)
ax2.axhline(d2.Count.mean(), color='black', linestyle='dashed')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.axes.xaxis.set_visible(False)
ax2.set_title('Top ' + str(top2) + ' Incident Types', size = 20)
ax2.text(top2-1, d2.Count.mean()+5, 'Mean = ' + str(d2.Count.mean()), rotation= 0, fontsize=14)

fig.subplots_adjust(hspace = 0.3)

plt.show()
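
To make the three summary columns concrete, the per-type aggregation can be tried on a toy frame. This is a minimal sketch using pandas named aggregation rather than the dictionary form above; the rows are hypothetical and only the two column names come from the dataset.

```python
import pandas as pd

# Hypothetical rows using the same two column names as the dataset.
toy = pd.DataFrame({
    "INCIDENT_TYPE_DESC": ["A", "A", "A", "B", "B", "C"],
    "TOTAL_INCIDENT_DURATION": [600, 900, 1500, 4000, 2000, 300],
})

summary = (
    toy.groupby("INCIDENT_TYPE_DESC")
       .agg(Count=("TOTAL_INCIDENT_DURATION", "size"),            # rows per type
            IncidentDuration=("TOTAL_INCIDENT_DURATION", "sum"),  # summed seconds
            AverageDuration=("TOTAL_INCIDENT_DURATION", "mean"))  # mean seconds
       .reset_index()
       .rename(columns={"INCIDENT_TYPE_DESC": "IncidentType"})
       .sort_values("Count", ascending=False, ignore_index=True)
)
print(summary)
```

In this toy data type A has the most rows while type B has the largest summed duration, so ranking by `Count` and ranking by `IncidentDuration` need not give the same order.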

Bar Chart Analysis

Visualization 2: Bar Chart Analysis

As indicated in the previous analysis, the longest FDNY incident duration is about 161,000 seconds from 2013 through June 2018, or 1.86 days. The incident type with this longest duration is a 651 in FDNY coding, “smoke scare, odor of smoke”. It likely ranks first because it is a common occurrence for the FDNY. A 353, removal of victim(s) from stalled elevator, has the second-longest duration. While a 651 is likely a common occurrence, a 353 may be a more severe incident that takes the FDNY longer to handle. The average across the top 20 incident types in terms of total duration is 65,978.45 seconds, roughly 18.33 hours.

bottom3 = 1
top3    = 20
d3 = x.loc[bottom3:top3]
d3 = d3.sort_values('Count', ascending=True)
d3.reset_index(inplace=True, drop=True)
my_colors3 = pick_colors_according_to_mean_count(d3)

Above = mpatches.Patch(color = 'orange', label= 'Above Average')
At = mpatches.Patch(color = 'lightpink', label= 'Within 1% of the Average')
Below = mpatches.Patch(color = 'red', label= 'Below Average')

fig = plt.figure(figsize=(36,18))
ax1 = fig.add_subplot(1, 1, 1)
ax1.barh(d3.IncidentType, d3.Count, color=my_colors3)
# Print each bar's value just past the end of the bar
for row_counter, value_at_row_counter in enumerate(d3.Count):
    ax1.text(value_at_row_counter+2, row_counter, str(value_at_row_counter), color='black', size=22, fontweight='bold',
            ha='left', va='center')
plt.xlim(0, d3.Count.max()*1.1)
ax1.legend(loc='lower right', handles=[Above, At, Below], fontsize=22)
plt.axvline(d3.Count.mean(), color='black', linestyle='dashed')
ax1.text(d3.Count.mean()+4, 0, 'Mean = ' + str(d3.Count.mean()), rotation=0, fontsize=22)

ax1.set_title('Top ' + str(top3) + ' Incident Types by Total Duration', size= 30, fontweight='bold')
ax1.set_xlabel('Incident Duration Count', fontsize=26)
ax1.set_ylabel('Incident Type', fontsize=26)
plt.xticks(fontsize=22)
plt.yticks(fontsize=15)
plt.show()
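
The slice `x.loc[bottom3:top3]` is label based and includes both end points, so with a default integer index and `bottom3 = 1` it starts at the second row of the ranked table. The sketch below contrasts that with positional slicing and with `nlargest` on a hypothetical six-row table; it is meant only to show how the three selections differ.

```python
import pandas as pd

# Hypothetical ranked table with a default 0..n-1 index, like `x` after reset_index.
ranked = pd.DataFrame({"IncidentType": list("ABCDEF"),
                       "Count": [60, 50, 40, 30, 20, 10]})

by_label = ranked.loc[1:3]             # labels 1, 2 and 3 (both ends included)
by_position = ranked.iloc[0:3]         # positions 0, 1 and 2 (end excluded)
largest = ranked.nlargest(3, "Count")  # three largest Count values

print(by_label.IncidentType.tolist())
print(by_position.IncidentType.tolist())
print(largest.IncidentType.tolist())
```

Which form is appropriate depends on whether the first-ranked row is meant to be part of the chart.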

Comparative Analysis

Visualization 3: Comparative Analysis

The following analysis compares incident duration with average duration for the top 10 incident types. These top 10 incident types account for the longest periods, in seconds, that the FDNY spent attending incidents. The Incident Duration Count sums the total time the FDNY spent on an emergency type from 2013 through June 2018. The average duration is the mean time spent on a single incident of that type. As indicated in the previous graph, a 651 (smoke scare, odor of smoke) accounts for the longest time in seconds. The average duration for this incident type is 1,123.86 seconds, or 18.73 minutes per incident. A 322 (motor vehicle accident with injuries) has the tenth-longest duration at 49,389 seconds, or 13.72 hours. While this is the shortest total among the top 10 incident types, it has the highest average duration: 2,056.67 seconds, or 34.28 minutes per incident. This suggests that 322s do not occur as often as other incident types but can be the most severe, since the FDNY spends the most time per incident on them.
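
The minute figures quoted for the two average durations follow from dividing seconds by 60. A small check, with a hypothetical helper name:

```python
SECONDS_PER_MINUTE = 60

def seconds_to_minutes(seconds):
    """Convert a duration in seconds to minutes, rounded to two decimals."""
    return round(seconds / SECONDS_PER_MINUTE, 2)

# Average durations quoted above for incident types 651 and 322.
print(seconds_to_minutes(1123.86))
print(seconds_to_minutes(2056.67))
```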

def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals),
                    fontsize=11, color='black', ha='center', va='bottom')

fig = plt.figure(figsize=(20,24))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(10)
count_bars = ax1.bar(x_pos-(0.5*bar_width), d2.Count, bar_width, color='lightgray',
                     edgecolor='black', label='Incident Duration Count')
aver_duration_bars = ax2.bar(x_pos+(0.5*bar_width), d2.AverageDuration, bar_width, color='orange',
                             edgecolor='black', label='Average Duration')

ax1.set_xlabel('Incident Type', fontsize=20)
ax1.set_ylabel('Incident Duration Count', fontsize=20, labelpad=20)
ax2.set_ylabel('Average Duration', fontsize=22, rotation=270, labelpad=22)
ax1.tick_params(axis='y', labelsize=16)
ax2.tick_params(axis='y', labelsize=16)

plt.title('Total Incident Duration and Average Duration Analysis\n Top 10 Incident Types', fontsize=26, fontweight='bold')
ax1.set_xticks(x_pos)
ax1.set_xticklabels(d2.IncidentType, fontsize=14, rotation=270)
count_color, count_label = ax1.get_legend_handles_labels()
duration_color, duration_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + duration_color, count_label + duration_label, loc='upper left', frameon=True,
                   ncol=1, shadow=True, borderpad=1, fontsize=20)
ax1.set_ylim(0, d2.Count.max()*1.50)
autolabel(count_bars, ax1, '.0f', '')
autolabel(aver_duration_bars, ax2, '.2f', '')

plt.show()

Line Chart

Visualization 4: Total Duration of Incidents by Hour & Day

This line chart shows the total duration (in seconds) of incidents by hour of the day and day of the week. At 19:00 (7:00 PM) on Sundays there is a clear spike in total time spent on incidents. There is another spike at around 10:00 AM on Mondays. On every day of the week, the least FDNY activity occurs between 12:00 AM and 5:00 AM. After that, the occurrence and duration of emergencies increase steadily and begin to decrease around 20:00 (8:00 PM).

duration_df = df.groupby(['Hour', 'WeekDay'])['TOTAL_INCIDENT_DURATION'].sum().reset_index(name='TotalDuration')
duration_df
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'Mon':'red',
            'Tue': 'gold',
            'Wed': 'gray',
            'Thu': 'blue',
            'Fri': 'purple',
            'Sat': 'orange',
            'Sun': 'green'}

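# Group by the column name (not a one-item list) so each key is a plain string such as 'Mon';
# recent pandas versions return a one-element tuple for a list grouper, which would not match my_colors
for key, grp in duration_df.groupby('WeekDay'):
    grp.plot(ax=ax, kind='line', x='Hour', y = 'TotalDuration', color=my_colors[key], label=key, marker='8')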

plt.title('Total Duration of Incidents by Hour (from 2013 - June 2018)', fontsize=20, fontweight='bold')
ax.set_xlabel('Hour (24 Hour Interval)', fontsize=18)
ax.set_ylabel('Total Duration (by 100 Million Seconds)', fontsize= 18, labelpad=20)
ax.tick_params(axis='x', labelsize=14, rotation=0)
ax.tick_params(axis='y', labelsize=14, rotation=0)

ax.set_xticks(np.arange(24))
handles, labels = ax.get_legend_handles_labels()
handles = [ handles[1], handles[5], handles[6], handles[4], handles[0], handles[2], handles[3]  ]
labels = [ labels[1], labels[5], labels[6], labels[4], labels[0], labels[2], labels[3] ]
plt.legend(handles, labels, loc='best', fontsize=16, ncol=1)

ax.yaxis.set_major_formatter( FuncFormatter( lambda x, pos:('%1.1fM')%(x*1e-8)))

plt.show()
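
The legend handles above are reordered by hand because weekday abbreviations sort alphabetically (Fri, Mon, Sat, ...). One possible alternative, sketched here on hypothetical rows and not part of the original analysis, is to store the weekday as an ordered categorical so that grouping already follows calendar order.

```python
import pandas as pd

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical rows; only the column names mirror the report's data frame.
toy = pd.DataFrame({
    "WeekDay": ["Sun", "Mon", "Fri", "Mon", "Sun", "Wed"],
    "Hour": [19, 10, 8, 10, 19, 3],
    "TOTAL_INCIDENT_DURATION": [500, 300, 200, 100, 700, 50],
})

# An ordered categorical makes groupby sort by calendar position, not alphabet.
toy["WeekDay"] = pd.Categorical(toy["WeekDay"], categories=day_order, ordered=True)

totals = (
    toy.groupby(["WeekDay", "Hour"], observed=True)["TOTAL_INCIDENT_DURATION"]
       .sum()
       .reset_index(name="TotalDuration")
)
print(totals)
```

With this approach the plotted lines and legend entries would come out in Monday-to-Sunday order without index juggling, at the cost of one extra conversion step.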

Stacked Bar Chart

Visualization 5: Total Duration of Incidents by Hour & Day

This bar chart reinforces the evidence from the previous line graph, in which 19:00 (7:00 PM) shows an obvious spike in total time spent on emergencies. It also shows once again that Sundays have the longest total duration of FDNY emergencies. As both this bar chart and the line chart suggest, less time is spent on emergencies in the early morning between 12:00 AM and 5:00 AM, after which the total begins to increase steadily around 6:00 AM throughout the week. Perhaps this is because emergencies are less likely to occur in the early morning.

stacked_df = df.groupby(['Hour', 'WeekDay'])['TOTAL_INCIDENT_DURATION'].sum().reset_index(name='TotalDuration')

stacked_df= stacked_df.pivot(index='Hour', columns='WeekDay', values='TotalDuration')

stacked_df
day_order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Put the weekday columns in reverse calendar order (Sun ... Mon)
stacked_df = stacked_df.reindex(columns=day_order[::-1])
stacked_df
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(18,10))
ax= fig.add_subplot(1, 1, 1)

stacked_df.plot(kind='bar', stacked=True, ax=ax)

plt.ylabel('Total Duration of Fire Incidents (by 100 Million Seconds)', fontsize=18, labelpad=10)
plt.title('Total Duration of Fire Incidents by Hour and by Day \n Stacked Bar Plot', fontsize=18, fontweight='bold')
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize=14)
plt.yticks(fontsize=14)
ax.set_xlabel('Hour (24 Hour Interval)', fontsize=18)

handles, labels = ax.get_legend_handles_labels()
handles = [ handles[6], handles[5], handles[4], handles[3], handles[2], handles[1], handles[0] ]
labels = [ labels[6], labels[5], labels[4], labels[3], labels[2], labels[1], labels[0] ]
plt.legend(handles, labels, loc= 'best', fontsize=14)

ax.yaxis.set_major_formatter(FuncFormatter( lambda x, pos: ('%1.1fM')%(x*1e-8)))

plt.show()
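
The reshaping behind the stacked bars (long format to one column per weekday, then a fixed column order) can be seen on a small hypothetical table. The numbers are invented; the column names follow the ones used above.

```python
import pandas as pd

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical long-format totals for two hours and three weekdays.
long_df = pd.DataFrame({
    "Hour": [0, 0, 0, 1, 1, 1],
    "WeekDay": ["Sun", "Mon", "Fri", "Sun", "Mon", "Fri"],
    "TotalDuration": [10, 20, 30, 40, 50, 60],
})

# pivot gives one row per hour and one column per weekday (sorted alphabetically).
wide = long_df.pivot(index="Hour", columns="WeekDay", values="TotalDuration")
alphabetical = wide.columns.tolist()

# reindex fixes the column order; weekdays absent from the data become NaN columns.
wide = wide.reindex(columns=day_order)
print(wide)
```

A stacked bar plot draws the columns in the order they appear, which is why the column order is set explicitly before plotting.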

Heatmap

Visualization 6: Heatmap of Total Duration of Fire Incidents by Year & Month (from 2013 - June 2018)

The final chart in this set shows the total duration of fire incidents by year and month from 2013 through June 2018. Months are labeled 1 to 12 and represent January through December respectively. The more pigmented a square is, the longer the time spent on emergencies in that month. For instance, January (1) of 2018 shows the most time spent on emergencies, with 49,097 seconds (13.64 hours) for that month. This suggests either that there were more emergencies for the FDNY to assist with, or that certain emergencies took longer and were more severe than in other years and months. The month with the least total duration of fire incidents is February (2) of 2013, at 30,194 seconds (8.39 hours). Perhaps fewer incidents occurred, or less severe emergencies were reported, in February of 2013. Overall, this heatmap summarizes the total seconds the FDNY spent across all incident types by month and year in an organized, clear format.

Note: Dataset stops after June 2018

# Count the incidents (rows with a recorded duration) in each year and month
heatmap = df.groupby(['Year', 'Month']).agg('count')['TOTAL_INCIDENT_DURATION'].to_frame(name = 'count').reset_index()
heatmap = heatmap[ heatmap['Year'].notna() & heatmap['Month'].notna()]

heatmap
heatmap_df= pd.pivot_table(heatmap, index='Year', columns='Month', values='count')
heatmap_df
import seaborn as sns
from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1, 1, 1)

comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

ax = sns.heatmap(heatmap_df, linewidth = 0.2, annot = True, cmap = 'OrRd', fmt=',.0f',
                 square = True, annot_kws={'size': 11},
                 cbar_kws = {'format': comma_fmt, 'orientation': 'vertical'})

plt.title('Heatmap of Total Duration of Fire Incidents by Year & Month (from 2013 - June 2018)', fontsize=18, fontweight='bold', pad=15)
plt.xlabel('Month', fontsize=18, labelpad=10)
plt.ylabel('Year', fontsize=18, labelpad=10)
plt.yticks(rotation=0, size=14)
plt.xticks(size=14)
ax.invert_yaxis()

cbar = ax.collections[0].colorbar

plt.show()
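
Because the dataset stops after June 2018, the year-by-month grid has empty cells for the later months of that year. The sketch below shows how such a grid can be built with every month present as a column, and how month numbers could be swapped for abbreviated names. The counts are hypothetical, and the use of `calendar.month_abbr` is an illustrative choice rather than part of the original code.

```python
import calendar
import pandas as pd

# Hypothetical values; 2018 has no rows after June, mirroring the dataset's cutoff.
counts = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018],
    "Month": [1, 12, 1, 6],
    "count": [40, 45, 49, 42],
})

grid = counts.pivot_table(index="Year", columns="Month", values="count")

# Make every month a column so missing months appear as NaN (blank heatmap cells).
grid = grid.reindex(columns=range(1, 13))

# Optional: replace month numbers with abbreviated names (locale dependent).
grid.columns = [calendar.month_abbr[m] for m in grid.columns]
print(grid)
```

Heatmap functions such as `seaborn.heatmap` leave NaN cells unannotated, so the missing months stay visibly blank rather than reading as zero.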

Summary of Data Visualizations

In conclusion, this report analyzed FDNY emergencies across the five boroughs of New York City from 2013 through June 2018. The incident type with the longest time spent is a 651 (smoke scare, odor of smoke) at about 161,000 seconds, or 1.86 days, while the lowest had well below 2,000 seconds (33 minutes). The overall mean of the total duration for the top 100 incident types is 16,083.87 seconds (about 4.5 hours). The average across the top 20 incident types in terms of total duration is 65,978.45 seconds, roughly 18.33 hours. A 322 (motor vehicle accident with injuries) has the tenth-longest duration at 49,389 seconds, or 13.72 hours. It also has the highest average duration, at 2,056.67 seconds or 34.28 minutes per incident, which suggests that these incidents are more severe than other types. At 19:00 (7:00 PM) on Sundays there is a clear spike in total time spent on incidents. Less time is spent on emergencies in the early morning between 12:00 AM and 5:00 AM, after which the total begins to increase steadily around 6:00 AM throughout the week. Finally, January 2018 has the longest total time spent on emergencies while February 2013 has the shortest, which can indicate the frequency and severity of incident types.