import matplotlib.pyplot as plt
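# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6; use whichever is available.
plt.style.use('seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid')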
import numpy as np
import pandas as pd
import warnings
import seaborn as sns
from matplotlib.ticker import FuncFormatter
import plotly.graph_objs as go 
from plotly.offline import init_notebook_mode,iplot,plot
init_notebook_mode(connected=True)



warnings.filterwarnings('ignore')

path = "/Users/sydneywhitaker/Documents/module2/human_stampedes.csv"
df = pd.read_csv(path)

Human Stampede Data

Description of Dataset

The data set, human_stampedes.csv, was obtained from Kaggle. It is a record of every human stampede worldwide since 1807. Each row includes the country, a description of the event, the number of deaths, and the location.

Objective

What are the trends among human stampedes? We can ask questions like: Do stampedes occur more often in warm-climate or cool-climate countries? Which continent has the most stampedes? Which country has had the most stampede deaths?
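The last of these questions reduces to a group-and-sum. A minimal sketch, assuming only the `Country` and `Number of Deaths` column names used in this notebook; the rows in `toy` are made-up placeholders, not values from human_stampedes.csv:

```python
import pandas as pd

# Placeholder rows for illustration only; not taken from human_stampedes.csv.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'C'],
    'Number of Deaths': [10, 25, 30, 5],
})

# Sum deaths per country, then sort so the largest total comes first.
deaths_by_country = (
    toy.groupby('Country')['Number of Deaths']
       .sum()
       .sort_values(ascending=False)
)

top_country = deaths_by_country.index[0]
top_deaths = int(deaths_by_country.iloc[0])
print(top_country, top_deaths)
```

Swapping `toy` for the real `df` should give the answer for the actual data.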

Findings

Deaths by Year

Below is a scatter plot showing the number of deaths in each human stampede against the year in which it occurred.

df['Date'] = pd.to_datetime(df['Date'], format = '%Y-%m-%d' )

df['Year'] = df['Date'].dt.year
df['Quarter'] = df['Date'].dt.quarter
df['Month'] = df['Date'].dt.month
df['DayOfTheWeek'] = df['Date'].dt.dayofweek

death_counts = df['Number of Deaths'].value_counts()
year_counts = df ['Year'].value_counts()

plt.figure(figsize = (14, 10))
x = (df['Year'])
y= (df['Number of Deaths'])
plt.scatter(x, y, marker = 'o', c = "olive", 
             edgecolors = 'black')
plt.title('Human Stampedes, Number of Deaths by Year', fontsize = 18)
plt.xlabel("Year")
plt.ylabel("Number of Deaths")

# Format y-axis tick labels with thousands separators.
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda v, _: '{:,.0f}'.format(v)))

plt.show()

Deaths by Country

Below is a bar graph showing total human stampede deaths worldwide since 1807 by country.

# Bar graph: total deaths per country

df_1 = df.groupby(['Country'])['Number of Deaths'].sum().reset_index()

plt.figure(figsize = (20, 10))
df_sorted =(df_1.sort_values('Number of Deaths', ascending=False))
plt.bar('Country', 'Number of Deaths', data=df_sorted, color='olive')

plt.title('Total Number of Deaths by Human Stampede by Country', 
        fontsize = 20)
plt.ylabel('Total Number of Deaths')
plt.xticks(rotation=90)

plt.xlabel('Country')

# Format y-axis tick labels with thousands separators.
plt.gca().yaxis.set_major_formatter(FuncFormatter(lambda v, _: '{:,.0f}'.format(v)))

plt.show()

Stampedes by Month

The donut chart below displays the number of stampedes worldwide since 1807 by month. It shows how often stampedes occurred in each month, along with the total count of stampedes in the center of the graph.

from matplotlib import colormaps
list(colormaps)

df['Date'] = pd.to_datetime(df['Date'], format = '%Y-%m-%d' )
df['Month'] = df['Date'].dt.month

pie_df = df.groupby(['Month'])['Event ID'].count().reset_index(name = 'TotalStampedes')


fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1,1,1)


all_stamps = pie_df.TotalStampedes.sum()
colors = ['#fce8f5','#92ab95','#e7eaae','#fbb05a','#c8e6ee','#bfd2b9','#c5b9d2','#eeb6ba', '#efe38a','#eb9494','#d9eb94','#ffcc99']
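month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# Label only the months present in the data so each wedge keeps its own name.
labels = [month_names[int(m) - 1] for m in pie_df['Month']]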

pie_df.groupby(['Month'])['TotalStampedes'].sum().plot(
        kind = 'pie', radius = 1, colors = colors, pctdistance = 0.85, labeldistance = 1.05, 
        wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize': 12 }, labels=labels,  autopct=lambda p : '{:.2f}%  ({:,.0f})'.format(p,p * (all_stamps)/100),
        startangle = 90)

hole = plt.Circle((0,0), 0.3, fc = 'white')
fig3 = plt.gcf()
fig3.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Number of Stampedes by Month', fontsize = 12)


ax.text(0, 0, 'Total Number of Stampedes:\n'  + str(round(all_stamps)), size = 12, ha = 'center', va = 'center')
ax.axis('equal')

plt.tight_layout()
plt.show()
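The wedge labels above are produced by the `autopct` callback, which receives each wedge's percentage and converts it back to a count using the overall total. A minimal sketch of that conversion as a standalone function; `make_autopct` is a hypothetical helper name and the total of 200 is a placeholder, not a figure from the dataset:

```python
def make_autopct(total):
    """Return a callback that turns a wedge percentage into 'pct% (count)'."""
    def autopct(pct):
        # Matplotlib passes the wedge's share as a percentage (0-100),
        # so the underlying count is pct * total / 100.
        count = round(pct * total / 100)
        return '{:.2f}%  ({:,d})'.format(pct, count)
    return autopct

# Placeholder total for illustration only.
label = make_autopct(200)(25.0)
print(label)
```

A callback built this way can be passed as `autopct=make_autopct(all_stamps)` in place of the inline lambda.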

Deaths by Country Map

The visual below shows the number of deaths by human stampede by country. This graph is interactive: you can zoom in on countries and hover over them to see their total death count.




# Aggregate once per country so that locations, z and text all have the same length and order.
map_df = df.groupby(['Country'])['Number of Deaths'].sum().reset_index(name = 'NumDeaths')
map_df = map_df.sort_values('NumDeaths', ascending=False)
data = dict(
        type = 'choropleth',
        colorscale = 'agsunset',
        locations = map_df['Country'],
        locationmode = "country names",
        z = map_df['NumDeaths'],
        text = map_df['Country'],
        colorbar = {'title' : 'Total Number of Stampede Deaths'},
      )

layout = dict(title = 'Stampede Deaths Worldwide Since 1807 ',
              geo = dict(projection = {'type':'mercator'})
             )
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap,validate=False)

choromap.write_html('/Users/sydneywhitaker/Documents/map.html', auto_open=False)

Number of Deaths: US, China, India

The graph below shows stampede deaths by year in the United States, India, and China.

# Keep only India's events, in chronological order so the line does not double back.
line_df1 = df.loc[df['Country'] == 'India', :].sort_values('Date')

fig = plt.figure(figsize = (20, 10))

x1 = line_df1['Year']
y1 = line_df1['Number of Deaths']


plt.plot(x1, y1, label = "India", linewidth = 2, markersize = 12)


# Keep only United States events, in chronological order.
line_df2 = df.loc[df['Country'] == 'United States', :].sort_values('Date')
x2 = line_df2['Year']
y2 = line_df2['Number of Deaths']


plt.plot(x2, y2, label = "United States", linewidth = 2, markersize = 12)


# Keep only China's events, in chronological order.
line_df3 = df.loc[df['Country'] == 'China', :].sort_values('Date')
x3 = line_df3['Year']
y3 = line_df3['Number of Deaths']


plt.plot(x3, y3, label = "China", linewidth = 2, markersize = 12)


plt.xlabel('Year')  # add X-axis label 
plt.ylabel("Number of Deaths")  # add Y-axis label 
plt.title("Deaths by Stampede in the United States, China and India", fontsize = 18 )  # add title 
plt.legend( prop={"size":12})

plt.ylim(bottom=0)

plt.show()
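Because a country can have several stampedes in one year, the per-event lines above can carry more than one point for the same year. A minimal sketch of summing deaths per country and year before plotting, assuming the `Country`, `Year` and `Number of Deaths` columns used in this notebook; the rows in `toy` are made-up placeholders, not values from the dataset:

```python
import pandas as pd

# Placeholder rows for illustration only; not taken from human_stampedes.csv.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'A', 'B'],
    'Year': [2001, 2001, 2001, 2003, 2002],
    'Number of Deaths': [5, 7, 3, 2, 9],
})

# One row per year, one column per country; years with no event stay NaN.
yearly = (
    toy.groupby(['Country', 'Year'])['Number of Deaths']
       .sum()
       .unstack('Country')
       .sort_index()
)
print(yearly)
```

Calling `yearly.plot()` on a frame shaped like this would draw one line per country with a single point per year.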

Wrap up

Studying stampede data is important for several reasons, particularly in the context of large crowds or gatherings. Here are some key points:

Public Safety: Understanding the dynamics of stampedes is crucial for ensuring the safety of people attending events or crowded places. By analyzing stampede data, authorities can implement better crowd management strategies, emergency response plans, and infrastructure improvements to prevent and mitigate stampede incidents.

Event Planning: Organizers of large events, festivals, or gatherings can benefit from studying stampede data to optimize crowd control measures, entry and exit procedures, and overall event logistics. This helps in creating a safer and more enjoyable experience for attendees.

Emergency Preparedness: In the unfortunate event of a stampede, emergency responders can use data-driven insights to enhance their preparedness and response strategies. This may involve deploying resources strategically, coordinating evacuation procedures, and minimizing panic among crowds.

In summary, studying stampede data is essential for safeguarding public welfare, improving infrastructure, and enhancing emergency response capabilities in crowded environments. It contributes to creating more resilient and secure spaces for large gatherings.

Module 2

Sydney Whitaker

2023-11-14