In-Depth Analysis of Shark Attacks

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import matplotlib.patches as mpatches
warnings.filterwarnings("ignore")

filepath = "C:/Users/txrus/Sem 6/Data Visual/Python/Deliverable/global_shark_attacks.csv"
df=pd.read_csv(filepath)

Introduction

Shark attacks, while statistically rare, have long captivated public attention and stirred both fear and fascination among beachgoers. Beyond the headlines, these incidents offer a unique lens through which to explore human interaction with marine environments. By analyzing historical shark attack records spanning several centuries, this report aims to uncover patterns in when, where, and how these attacks happen.

The analysis reveals compelling seasonal and hourly trends, suggesting that environmental and behavioral factors play a significant role in the frequency of attacks. Certain coastal regions, especially in the United States, have emerged as consistent hot spots, while, unsurprisingly, surfing and swimming account for the largest share of attacks. The data also highlights stark differences in fatality rates across countries.

Through this analysis, the report places shark attacks in a broader context in order to better understand the factors that contribute to them.

Note: This analysis is based on available data, which may not be complete.

Descriptive Statistics of the Data

The dataset consists primarily of categorical variables, meaning traditional summary statistics (such as mean or standard deviation) offer limited insight. Even numerical variables like age contain inconsistent or messy data that would require cleaning to be useful for analysis. However, since age is not used in any visualizations or graphs in this report, it will not be cleaned and will be treated as a categorical variable.

Below is a description of each variable in the dataset:

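for col in df.columns:
    # describe() returns a summary instead of printing it, so print it explicitly inside the loop
    print(f"\n--- Description for column: {col.title()}---\n")
    print(df[col].describe())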

## 
## --- Description for column: Date---
## 
## count           6587
## unique          5558
## top       1957-01-01
## freq              11
## Name: date, dtype: object
## 
## --- Description for column: Year---
## 
## count    6758.000000
## mean     1970.935928
## std        56.227881
## min         1.000000
## 25%      1950.000000
## 50%      1986.000000
## 75%      2009.000000
## max      2023.000000
## Name: year, dtype: float64
## 
## --- Description for column: Type---
## 
## count           6871
## unique            11
## top       Unprovoked
## freq            5065
## Name: type, dtype: object
## 
## --- Description for column: Country---
## 
## count     6839
## unique     215
## top        USA
## freq      2522
## Name: country, dtype: object
## 
## --- Description for column: Area---
## 
## count        6409
## unique        862
## top       Florida
## freq         1174
## Name: area, dtype: object
## 
## --- Description for column: Location---
## 
## count                                 6325
## unique                                4427
## top       New Smyrna Beach, Volusia County
## freq                                   192
## Name: location, dtype: object
## 
## --- Description for column: Activity---
## 
## count        6304
## unique       1553
## top       Surfing
## freq         1112
## Name: activity, dtype: object
## 
## --- Description for column: Name---
## 
## count     6670
## unique    5638
## top       male
## freq       669
## Name: name, dtype: object
## 
## --- Description for column: Sex---
## 
## count     6318
## unique       6
## top          M
## freq      5545
## Name: sex, dtype: object
## 
## --- Description for column: Age---
## 
## count     3903
## unique     232
## top       19.0
## freq        89
## Name: age, dtype: object
## 
## --- Description for column: Fatal_Y_N---
## 
## count     6890
## unique       9
## top          N
## freq      4804
## Name: fatal_y_n, dtype: object
## 
## --- Description for column: Time---
## 
## count          3372
## unique          397
## top       Afternoon
## freq            215
## Name: time, dtype: object
## 
## --- Description for column: Species---
## 
## count            3772
## unique           1560
## top       White shark
## freq              192
## Name: species, dtype: object

Date

There are 6,587 recorded dates for shark attacks in the dataset, 5,558 of them unique. The most frequently occurring date is 1957-01-01, which appears 11 times, a notably high count for a single day. This repetition may indicate placeholder values (for example, records whose exact day was unknown) or data entry inconsistencies, raising questions about the reliability and completeness of some records.
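A quick way to probe such repeats is to parse the column and list the most common values. The sketch below uses a small made-up sample in place of the report's `date` column and assumes dates are stored as YYYY-MM-DD strings. Counting January 1 entries is only a heuristic for spotting records padded from a year alone, not a confirmed property of this dataset.

```python
import pandas as pd

# Hypothetical sample standing in for df["date"]
dates = pd.Series(["1957-01-01", "1957-01-01", "1957-07-14", "2009-08-02", None, "1957-01-01"])

# Unparseable or missing values become NaT instead of raising an error
parsed = pd.to_datetime(dates, errors="coerce")

# Most frequent dates first (NaT is excluded by default)
repeat_counts = parsed.value_counts()

# Heuristic (assumption): January 1 is a common filler when only the year is known
jan_first = parsed[(parsed.dt.month == 1) & (parsed.dt.day == 1)]

print(repeat_counts.head(3))
print(f"{len(jan_first)} of {parsed.notna().sum()} dated records fall on January 1")
```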

Year

This is the only fully numerical variable in the dataset. It includes 6,758 entries for the year of each shark attack. The values range from year 1 to 2023, with a median of 1986. The most recent quarter of the data (above the 75th percentile) starts in 2009, while the earliest quarter (below the 25th percentile) ends in 1950. Extremely low values such as 1 point to entry errors or incorrectly encoded missing data, which could distort time-trend analyses unless they are filtered or corrected.
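One simple way to filter such values is to flag years outside a plausibility window before running any time-trend analysis. This is a minimal sketch on made-up values. The lower bound of 1500 is an illustrative assumption rather than a documented rule, and the upper bound is the dataset's maximum of 2023.

```python
import pandas as pd

# Hypothetical sample standing in for df["year"] (floats, as in the describe() output)
years = pd.Series([1.0, 77.0, 1543.0, 1950.0, 1986.0, 2009.0, 2023.0, None])

# Assumed plausibility window; the lower bound is an illustrative choice
MIN_PLAUSIBLE_YEAR = 1500
MAX_PLAUSIBLE_YEAR = 2023

in_range = years.between(MIN_PLAUSIBLE_YEAR, MAX_PLAUSIBLE_YEAR)  # NaN evaluates as False
suspect = years[years.notna() & ~in_range]
clean_years = years[in_range]

print("Suspect years:", suspect.tolist())
print("Median of plausible years:", clean_years.median())
```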

Type

The type variable describes the nature of each shark encounter, with 6,871 recorded entries and 11 unique categories. The most common type is Unprovoked, accounting for 5,065 incidents (about 74%). This large majority suggests that most recorded attacks occur without deliberate human provocation. The 11 distinct categories may also reflect entry inconsistencies in the dataset.
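Before grouping, it helps to list every raw category with its count. The sketch below then folds everything except Provoked and Unprovoked into "Other", which mirrors the grouping the report applies later for the donut chart. The sample values are made up, and the whitespace stripping is an added assumption about the kind of inconsistency that might be present.

```python
import pandas as pd

# Hypothetical sample standing in for df["type"]
types = pd.Series(["Unprovoked", "Provoked", "Unprovoked", "Watercraft", "Invalid", " Provoked", None])

# Inspect every raw category and its frequency, including missing values
print(types.value_counts(dropna=False))

# Trim stray whitespace, then fold everything except the two main classes into "Other"
normalized = types.dropna().str.strip()
grouped = normalized.where(normalized.isin(["Provoked", "Unprovoked"]), "Other")
print(grouped.value_counts())
```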

Country

The country column contains 6,839 entries and 215 unique values, reflecting the global distribution of shark attacks. The USA appears most frequently, with 2,522 incidents (about 37% of entries). This could reflect a higher incidence of shark-human interactions in the region, more thorough reporting practices, or both. The large number of unique countries indicates wide geographic coverage.

Area

The area column provides more specific regional detail within countries, with 6,409 entries and 862 unique values. The most commonly reported area is Florida, which appears 1,174 times and further emphasizes the United States as a hotspot for reported shark activity.

Location

The location column contains 6,325 entries with 4,427 unique values, providing highly specific information about where each shark attack occurred. The most frequently reported location is New Smyrna Beach, Volusia County, which appears 192 times.

Activity

The activity column describes what the individual was doing at the time of the shark attack, with 6,304 entries and 1,553 unique activities recorded. The most frequent activity is Surfing, which accounts for 1,112 cases. The large number of unique values reflects the wide variety of behaviors captured. It also stems partly from overlapping entries written in different formats (for example, “swimming” vs. “Swimming near shore”).
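A rough way to collapse such variants is keyword matching on lower-cased text. The keyword table below is a hypothetical example, not the report's method, and the sample values are made up. Checking "spearfishing" before "fishing" matters because the generic keyword also occurs inside the specific one.

```python
import pandas as pd

# Hypothetical sample standing in for df["activity"]
activities = pd.Series([
    "Surfing", "swimming", "Swimming near shore", "Body surfing",
    "Spearfishing", "Fishing for mackerel", "Standing", None,
])

# Hypothetical keyword table, checked in order (specific keywords before generic ones)
ACTIVITY_KEYWORDS = [
    ("spearfishing", "Spearfishing"),
    ("surf", "Surfing"),
    ("swim", "Swimming"),
    ("fishing", "Fishing"),
]

def categorize(activity):
    """Map a free-text activity to a coarse category."""
    if not isinstance(activity, str):
        return "Unknown"
    text = activity.lower()
    for keyword, category in ACTIVITY_KEYWORDS:
        if keyword in text:
            return category
    return "Other"

categories = activities.apply(categorize)
print(categories.value_counts())
```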

Name

The name column contains 6,670 entries and 5,638 unique values, presumably identifying the individuals involved in each shark attack. Interestingly, the most common entry is simply “male”, appearing 669 times, which suggests that in many cases, the individual’s name was unknown and replaced with a descriptor.

Sex

The sex column records the sex of individuals involved in shark attacks, with 6,318 entries and 6 unique values. The most frequent entry is “M” (male), accounting for 5,545 cases (about 88%). Although most values are consistent, six unique entries for what is essentially a binary field indicate likely inconsistencies.
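A minimal normalization sketch would trim whitespace, upper-case the values, and treat anything other than M or F as missing. The sample values, including the stray "." and "N" codes, are made up for illustration. Which odd codes actually occur in the column is an assumption that should be checked with `value_counts()`.

```python
import pandas as pd

# Hypothetical sample standing in for df["sex"]
sex = pd.Series(["M", "F", "M ", "m", ".", "N", None])

# Trim and upper-case, then keep only M/F; anything else becomes NaN
cleaned = sex.str.strip().str.upper()
cleaned = cleaned.where(cleaned.isin(["M", "F"]))

print(cleaned.value_counts())
print("Unusable entries:", cleaned.isna().sum())
```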

Age

The age column has 3,903 non-null entries and 232 unique values. The most common age recorded is 19.0, appearing 89 times. Although this is a numerical variable, it is stored as an object type and includes inconsistencies such as ranges (e.g., “13 or 14”), approximate values, and text-based entries.
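Although age is left uncleaned in this report, one quick way to gauge how much of the column is usable as-is is to coerce it to numbers and measure how many values survive. The sample values are made up. Ranges such as "13 or 14" and words such as "Teen" simply become NaN under this approach.

```python
import pandas as pd

# Hypothetical sample standing in for df["age"], which is stored as text
ages = pd.Series(["19", "19.0", "13 or 14", "Teen", "30s", None, "45"])

# Values that are not plain numbers (ranges, words, "30s") become NaN
numeric_ages = pd.to_numeric(ages, errors="coerce")

parsed_share = numeric_ages.notna().sum() / ages.notna().sum()
print(f"{parsed_share:.0%} of non-null ages parse as numbers")
```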

Fatal_Y_N

The fatal_y_n column indicates whether a shark attack was fatal, with 6,890 entries and 9 unique values. The most common value is “N” (non-fatal), accounting for 4,804 records. While this variable is meant to represent a binary outcome, the presence of nine unique values suggests inconsistencies or unknown data.
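The fatality chart later in the report counts a record as fatal only when `fatal_y_n` equals exactly "Y". The sketch below trims whitespace and upper-cases the codes before mapping Y and N to booleans, and leaves any other code as missing. The sample codes are made up, and whether variants such as " N" or "y" really occur is an assumption to check against the data.

```python
import pandas as pd

# Hypothetical sample standing in for df["fatal_y_n"]
raw = pd.Series(["N", "Y", " N", "N ", "y", "UNKNOWN", "M", None])

# Normalize spacing and case, then map only the two valid codes
normalized = raw.str.strip().str.upper()
is_fatal = normalized.map({"Y": True, "N": False})  # any other code -> NaN

print(normalized.value_counts(dropna=False))
print("Fatal:", int(is_fatal.eq(True).sum()), "| Unresolved:", int(is_fatal.isna().sum()))
```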

Time

The time column contains 3,372 non-null entries with 397 unique values, capturing the reported time of each shark attack. The most common entry is “Afternoon”, occurring 215 times. However, the wide variety of unique entries suggests inconsistent formatting, including vague descriptors (“Just before noon”) as well as specific timestamps (“14h00”). These inconsistencies would require significant parsing and standardization to be used reliably for time-based analysis. In this report, the time data was cleaned and transformed into an hour column for visualization.

Species

The species column identifies the type of shark involved in each attack, with 3,772 entries and 1,560 unique values. The most frequently reported species is the White shark, appearing 192 times.

Visualizations

The following section presents visualizations designed to explore key patterns in the shark attack dataset. These graphs highlight trends across time, geography, activity type, and other relevant factors. By transforming the data into visual formats, we can more effectively identify insights, anomalies, and areas that warrant further investigation.

Shark Attacks by Decade and Month

date_df = df[df.date.notna() & df.year.notna()].copy()  # copy so later edits don't touch df
date_df.drop('year', axis=1, inplace=True)
date_df.reset_index(inplace=True, drop=True)

for row in range(len(date_df)):
    date = date_df.iloc[row].date.split('-')

    # Year
    if len(date[0]) == 4:   # Error check
        date_df.loc[row, 'Year'] = date[0]
    else:
        # Malformed years (e.g. 202 and 144, possibly 2022 and 1444) are printed
        # and left blank, so the groupby below drops them
        print(date[0])

    # Month (skip dates without a two-digit month part)
    if len(date) > 1 and len(date[1]) == 2:   # Error check
        date_df.loc[row, 'Month'] = date[1]

# Attacks per (Year, Month); rows missing either value are dropped by groupby
x = date_df.groupby(['Year', 'Month'])['Year'].count().reset_index(name='count')

x['Decade'] = (x['Year'].astype('int') // 10) * 10

# Sum the monthly counts within each decade
x2 = x.groupby(['Decade', 'Month']).agg({'count': 'sum'}).reset_index()
x2.sort_values(['Month'], ascending=True, inplace=True)
x2.reset_index(inplace=True, drop=True)
x2['Month'] = x2['Month'].astype(int)

x2 = x2[x2['Decade'] >= 1400]  # Remove implausibly early decades

fig = plt.figure(figsize=(30, 20))

# Point size and colour both encode the number of attacks
plt.scatter(x2.Decade, x2.Month, marker='8', cmap='viridis', c=x2['count'].astype('int'),
            s=x2['count'].astype('int') * 4, edgecolors='black')

# Title and labels
plt.title('Shark Attacks by Decade and Month\n Since 1400', fontsize=28)
plt.xlabel('Decade', fontsize=26)
plt.ylabel('Month', fontsize=26)

# y ticks
y_ticks = list(range(1, 13))
y_labels = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
plt.yticks(y_ticks, y_labels, fontsize=22, color='black')

# x ticks
x_ticks = np.arange(x2.Decade.min(), x2.Decade.max() + 20, 25)
plt.xticks(x_ticks, fontsize=22, color='black')

# colorbar: a tick every 10 attacks, labelled with its actual value
cbar = plt.colorbar()
cbar.set_label('Number of Shark Attacks', rotation=270, fontsize=24, color='black', labelpad=30)
cbar.set_ticks(range(0, int(x2['count'].max()) + 1, 10))

plt.show()

This scatter plot displays the distribution of shark attacks over time, organized by decade and month, dating back to 1550. Each point represents the total number of attacks in a given month across a decade. Both the size and the color of the point reflect the number of incidents, with larger, brighter points indicating higher counts.

The graph shows a clear increase in reported shark attacks over time. Records are sparse before the 1800s, but attack frequency rises notably through the 1900s and into the 2000s. This trend likely reflects a combination of improved record-keeping and growing human activity in the ocean. Attacks are also most common between June and September, a strong seasonal pattern that coincides with peak beach attendance and recreational water use. Note that June to September is summer only in the Northern Hemisphere. The USA alone accounts for about 37% of entries, while Southern Hemisphere locations such as Australia and South Africa have their summer between December and February.
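The row-by-row loop above can also be written with vectorized string extraction. The sketch below is an alternative, not the report's code. It assumes dates follow YYYY-MM-DD and drops anything else rather than trying to repair malformed years, and the sample dates are made up. The last step totals attacks per calendar month across all decades, which checks the seasonal peak directly.

```python
import pandas as pd

# Hypothetical sample standing in for df["date"]
dates = pd.Series(["1957-01-01", "1957-07-14", "1962-07-30", "2009-08-02", "2011-08-19", "bad", None])

# Pull year and month from a leading YYYY-MM; non-matching rows become NaN and are dropped
parts = dates.str.extract(r"^(\d{4})-(\d{2})")
parts.columns = ["year", "month"]
parts = parts.dropna().astype(int)

parts["decade"] = parts["year"] // 10 * 10

# Attacks per (decade, month), the quantity encoded by point size and colour in the scatter plot
decade_month = parts.groupby(["decade", "month"]).size().reset_index(name="count")

# Attacks per calendar month across all decades
monthly_totals = parts["month"].value_counts().sort_index()
print(monthly_totals)
```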

Attacks by Hour


# Keep rows with both a date and a time; copy so later edits don't touch df
date_df = df[df['date'].notna() & df['time'].notna()].copy()
date_df['date'] = date_df['date'].astype(str)

# Month is the second part of a YYYY-MM-DD date
date_df['month'] = pd.to_numeric(date_df['date'].str.split('-').str[1], errors='coerce')

def month_to_quarter(month):
    return (month - 1) // 3 + 1

date_df['quarter'] = date_df['month'].apply(month_to_quarter)

# Lower-case the full time text. extract_hour handles both stamps like "14h00" and keywords,
# so the text is not split on 'h' (which would cut "night" down to "nig")
date_df['hour'] = date_df['time'].astype(str).str.lower().str.strip()


# Source: Stack Overflow and AI to create the keywords dictionary
# I put the list of unique values for hour into an LLM and had it create the keywords dictionary below

import re

def extract_hour(val):

    # For numbers
    if re.match(r'^\d{1,2}$', val):
        return int(val)

    # Match HHMM or HHjMM or weird formats
    match = re.search(r'(\d{1,2})[hj:.\s]?', val)
    if match:
        h = int(match.group(1))
        if 0 <= h <= 23:
            return h
    # Keyword based
    keywords = {
        'morning': 9,
        'mid-morning': 10,
        'late morning': 11,
        'afternoon': 14,
        'early afternoon': 13,
        'late afternoon': 16,
        'evening': 18,
        'early evening': 17,
        'late evening': 20,
        'night': 21,
        'late night': 23,
        'dusk': 19,
        'dawn': 6,
        'midday': 12,
        'noon': 12,
        'sunset': 19,
        'daybreak': 6,
        'a.m.': 9,
        'p.m.': 15,
        'midnight': 0,
        'dark': 21,
        'after dark': 22,
        'just before dawn': 5,
        'just before noon': 11,
        'just after 12': 13
    }

    # Check longer keywords first so e.g. "late afternoon" is not caught by "afternoon"
    for key in sorted(keywords, key=len, reverse=True):
        if key in val:
            return keywords[key]
    return np.nan

date_df['hour'] = date_df['hour'].apply(extract_hour)

# Drop rows where no hour or no quarter could be recovered
date_df = date_df.dropna(subset=['hour', 'quarter'])
quarter_hour_df = date_df.groupby(['quarter', 'hour']).size().reset_index(name='num_attacks')
quarter_hour_df['hour'] = quarter_hour_df['hour'].astype(int)
quarter_hour_df['quarter'] = quarter_hour_df['quarter'].astype(int)
quarter_hour_df = quarter_hour_df.sort_values(['quarter', 'hour']).reset_index(drop=True)

fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

colors = {1:'red',2:'blue',3:'green',4:'gray'}

for quarter, grp in quarter_hour_df.groupby('quarter'):
    grp.plot(ax=ax, kind='line',x='hour',y='num_attacks',label=quarter, marker='8',color = colors[quarter])

plt.title("Number of Attacks by Hour\n and Quarter",fontsize=20,pad=12)
ax.set_xlabel('Hours (24 Hour Interval)',fontsize=18,labelpad=12)
ax.set_ylabel('Total Number of Attacks',fontsize=18)

ax.set_xticks(np.arange(24))

ax.tick_params(axis='x',labelsize=14)
ax.tick_params(axis='y',labelsize=14)

handles, labels = ax.get_legend_handles_labels()
labels = ['Quarter 1','Quarter 2','Quarter 3','Quarter 4']
plt.legend(handles, labels,fontsize=16)

plt.show()

This line graph shows the total number of shark attacks by hour of the day, with one line per quarter of the year (Quarter 1 is January to March, Quarter 3 is July to September). This allows seasonal differences in attack timing to be compared. The x-axis spans a 24-hour day, and the y-axis shows the total number of attacks recorded in each hour across all available years. Building on the seasonal trend in the previous graph, where most attacks occur in the summer months, this chart shows that attacks also follow a daily rhythm that peaks in the early afternoon. Across all quarters there is a noticeable spike around 2 PM, with Quarter 3 showing the highest activity throughout the afternoon. This corresponds to typical beach hours, when sunlight, water temperature, and human presence in the ocean are all at their peak. Part of the 2 PM spike may be an artifact of cleaning: vague entries such as “Afternoon”, the most common value in the time column (215 records), are mapped to 14:00.

The combined evidence from this graph and the month-by-decade analysis suggests that shark attacks are not randomly distributed. They are strongly influenced by human ocean-use patterns, especially during warmer months and active daylight hours. Understanding these temporal patterns can inform beach safety measures and public awareness of when shark encounters are statistically more likely.
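Matching time keywords by substring is order-sensitive, because a generic key such as "afternoon" also matches inside "late afternoon". The sketch below uses a hypothetical subset of keywords with illustrative hours. It compares a first-match lookup in dictionary order against a longest-key-first lookup.

```python
# Hypothetical subset of time keywords; the hours are illustrative values
KEYWORDS = {
    "afternoon": 14,
    "late afternoon": 16,
    "noon": 12,
    "just before noon": 11,
    "night": 21,
    "late night": 23,
}

def hour_first_match(text):
    """Return the hour of the first key (in dict order) found in text."""
    for key, hour in KEYWORDS.items():
        if key in text:
            return hour
    return None

def hour_longest_match(text):
    """Return the hour of the longest key found in text."""
    for key in sorted(KEYWORDS, key=len, reverse=True):
        if key in text:
            return KEYWORDS[key]
    return None

for phrase in ["late afternoon", "just before noon", "late night", "afternoon"]:
    print(phrase, hour_first_match(phrase), hour_longest_match(phrase))
```

With first-match lookup, the specific keys are never reached when a shorter key listed earlier is a substring of them.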

Activity and Cause

pie_df = df.groupby(['activity','type'])['activity'].count().reset_index(name='numatt')
pie_df.sort_values(['numatt'],inplace=True,ascending=False)
pie_df.reset_index(inplace=True)
top_activities_df = (
    pie_df.groupby(['activity'])['numatt']
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)
pie_df['type_grouped'] = pie_df['type'].apply(
    lambda x: x if x in ['Provoked', 'Unprovoked'] else 'Other')
type_df = (pie_df.groupby('type_grouped')['numatt'].sum().reset_index())




from matplotlib.patches import Patch

fig = plt.figure(figsize=(9,9))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
# Even colormap indices for the outer ring, odd indices for the inner ring
outer_colors = colormap(np.arange(len(top_activities_df)) * 2)
inner_colors = colormap(np.arange(len(type_df)) * 2 + 1)


all_labels = top_activities_df['activity'].tolist()
pie_labels = ['Surfing', 'Swimming', 'Fishing', 'Spearfishing'] + [''] * 6
legend_labels = all_labels[4:]

top_activities_df['numatt'].plot( kind='pie',
        labels= pie_labels,
        radius = 1, colors = outer_colors,
        pctdistance = 0.9, labeldistance=1.05,
        wedgeprops = dict(edgecolor='w'),
        autopct='%1.1f%%',
        startangle=90)


type_df.numatt.plot(kind='pie',
                    radius = 0.6, colors=inner_colors,pctdistance = 0.55, labels = type_df.type_grouped,
                   labeldistance = 0.7, startangle = 250,autopct = '%1.2f%%', wedgeprops = dict(edgecolor='w'))

legend_patches = [Patch(facecolor=outer_colors[i + 4], edgecolor='w', label=legend_labels[i]) for i in range(6)]  # stack overflow
plt.legend(handles=legend_patches,
           title='Other Activities (< 5%)',
           loc = 1, bbox_to_anchor=(1.15, 0.9))
# A white circle over the centre turns the pie into a donut
hole = plt.Circle((0, 0), 0.25, fc='white')
ax.add_artist(hole)


ax.axis('equal')

ax.yaxis.set_visible(False)
plt.title('Top 10 Activities by Number of Attacks\n and Type')

ax.text(0,0,'Number of Attacks:\n'+ str(top_activities_df.numatt.sum()), ha='center',va='center',size=11)

plt.tight_layout()
plt.show()

This donut chart breaks down shark attacks by type (inner ring) and by the associated human activity (outer ring). The outer ring and the center label cover the 3,803 attacks in the ten most common activities. The inner ring shows the type breakdown across all attacks with both an activity and a type recorded. Most of those attacks (about 75%) are classified as unprovoked, meaning they occurred without the person interfering with or provoking the shark. The rest are provoked (9.43%) or other/unknown (15.74%). Within the top ten activities, surfing (29.2%) and swimming (26.5%) together account for over half of the attacks, followed by fishing (13.1%) and spearfishing (10.1%). Activities such as wading, snorkeling, diving, and standing in shallow water make up smaller shares.

This activity-based analysis reinforces the idea from the previous graphs that human behavior is a key driver of shark attack patterns. The peak months and hours for attacks, June through September and early to mid-afternoon, coincide with the times people are most likely to be surfing and swimming. Shark attacks are not evenly distributed across ocean users. Instead, they are concentrated among those most exposed in the surf zone or open water.
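One way to give both rings of the donut the same base is to restrict the type breakdown to the top activities before computing shares. This is a sketch on made-up data with `TOP_N = 2` (the report keeps ten). The grouping into Provoked, Unprovoked, and Other mirrors the report's lambda.

```python
import pandas as pd

# Hypothetical attack records (activity and type both recorded)
attacks = pd.DataFrame({
    "activity": ["Surfing", "Surfing", "Surfing", "Swimming", "Swimming", "Fishing", "Kayaking"],
    "type": ["Unprovoked", "Unprovoked", "Invalid", "Unprovoked", "Provoked", "Provoked", "Watercraft"],
})

TOP_N = 2  # illustrative; the report keeps the ten most common activities

top_activities = attacks["activity"].value_counts().head(TOP_N).index
subset = attacks[attacks["activity"].isin(top_activities)]

# Same grouping as the donut's inner ring, but computed only over the top activities
type_grouped = subset["type"].where(subset["type"].isin(["Provoked", "Unprovoked"]), "Other")
type_share = type_grouped.value_counts(normalize=True)

print(f"Base: {len(subset)} attacks")
print(type_share.round(2))
```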

Location of Attack

# Count attacks per location (groupby drops missing locations) and keep each location's country
location_counts = (
    df.groupby('location')
    .agg(country=('country', 'first'), count=('country', 'size'))
    .reset_index()
)
location_counts.sort_values('count', ascending=False, inplace=True)
location_counts.reset_index(inplace=True, drop=True)
loc_25 = location_counts.head(25)

# One colour per country; any country not listed falls back to light grey,
# so the colour list always matches the number of bars
country_colors = {
    'USA': 'royalblue',
    'BRAZIL': 'forestgreen',
    'SOUTH AFRICA': 'pink',
    'IRAN': 'burlywood',
    'MOZAMBIQUE': 'yellow',
    'AUSTRALIA': 'teal',
}
colors = [country_colors.get(each, 'lightgray') for each in loc_25.country]

usa = mpatches.Patch(color='royalblue', label = 'USA')
bra = mpatches.Patch(color='forestgreen', label = 'BRAZIL')
safr = mpatches.Patch(color='pink', label = 'SOUTH AFRICA')
iran = mpatches.Patch(color='burlywood', label = 'IRAN')
moz = mpatches.Patch(color='yellow', label = 'MOZAMBIQUE')
aus = mpatches.Patch(color='teal', label = 'AUSTRALIA')

fig = plt.figure(figsize = (26,20))

ax1 = fig.add_subplot(1,1,1)
ax1.barh(loc_25.location,loc_25['count'], color = colors, edgecolor = 'black')

ax1.set_title('Top 25 Locations for Shark Attacks',size = 30)
ax1.set_xlabel('Number of Shark Attacks',fontsize = 26)
ax1.set_ylabel('Location', fontsize=26)
plt.xticks(fontsize=24)

plt.yticks(fontsize=14, rotation = 30)

ax1.legend(loc='upper right', fontsize=24, handles = [ usa, bra, safr, iran, moz, aus])

for row_counter, value_at_row_counter in enumerate(loc_25['count']):
    ax1.text(
        value_at_row_counter + 3, row_counter, str(value_at_row_counter), 
        color='black', size=22, fontweight='bold', ha='left', va='center')


plt.show()

This horizontal bar chart ranks the 25 locations with the most recorded shark attacks. Each bar represents a specific beach or coastal area and is colored by country. The United States, mostly Florida, dominates the chart and accounts for most of the top-ranked locations. New Smyrna Beach in Volusia County, Florida, alone accounts for 192 attacks, making it by far the most common location. This concentration supports the earlier observations about when and how attacks occur. Florida’s warm climate, extensive coastline, and heavy tourism-driven beach traffic create ideal conditions for frequent water activity. The concentration also fits the seasonal and hourly patterns seen earlier, since high-risk activities peak during the summer months and daylight hours. Countries such as Australia, South Africa, and Brazil also appear in the top 25. This shows that while shark attacks are a global phenomenon, specific coastal ecosystems and usage patterns make certain areas far more prone to encounters. These locations often combine high human presence with active shark populations, reinforcing the idea that geography, behavior, and environment interact to shape shark attack risk.

Fatal Attack Ratio by Country

total_attacks = df.groupby('country').size().reset_index(name='total_attacks')
total_attacks = total_attacks[total_attacks.total_attacks >= 25]
total_attacks.reset_index(inplace=True, drop=True)

top_countries = total_attacks.sort_values(by='total_attacks', ascending=False)

fatal_attacks = df[df['fatal_y_n'] == 'Y']
fatal_counts = (
    fatal_attacks[fatal_attacks['country'].isin(top_countries['country'])]
    .groupby('country')
    .size()
    .reset_index(name='fatal_attacks')
)

fatal_ratio_df = pd.merge(top_countries, fatal_counts, on='country', how='left')
fatal_ratio_df['fatal_attacks'] = fatal_ratio_df['fatal_attacks'].fillna(0)
fatal_ratio_df['fatal_ratio'] = fatal_ratio_df['fatal_attacks'] / fatal_ratio_df['total_attacks']

# Keep the 15 highest ratios, re-indexed 0..14 so the lookups by i below line up
fatal_ratio_df = fatal_ratio_df.sort_values(by='fatal_ratio', ascending=False)
fatal_ratio_df = fatal_ratio_df.head(15).reset_index(drop=True)


fig = plt.figure(figsize=(18,10))
ax1 = fig.add_subplot(1,1,1)
bar_width = 0.4

def autolabel(these_bars, this_axis, symbol):
    # Write each bar's height, as a whole number, just above the bar
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_axis.text(each_bar.get_x() + each_bar.get_width()/2, height * 1.01,
                       f'{symbol}{height:.0f}', fontsize=11, color='black', ha='center', va='bottom')

x_pos = np.arange(len(fatal_ratio_df))

# Both bar series share one y-axis so their heights are directly comparable
total = ax1.bar(
    x_pos - (0.5 * bar_width), fatal_ratio_df.total_attacks,
    bar_width, color='darkblue', edgecolor='white', label='Total Attacks'
)

kills = ax1.bar(
    x_pos + (0.5 * bar_width), fatal_ratio_df.fatal_attacks,
    bar_width, color='red', edgecolor='white', label='Fatal Attacks'
)

ax1.set_xlabel('Country', fontsize=18)
ax1.set_ylabel('Number of Attacks', fontsize=18, labelpad=15)
ax1.tick_params(axis='y', labelsize=14)

ax1.set_xticks(x_pos)
ax1.set_xticklabels(fatal_ratio_df.country, rotation=40, ha='right', fontsize=10)

ax1.legend(fontsize=18, loc='upper left', frameon=False)

autolabel(total, ax1, '')
autolabel(kills, ax1, '')

ax1.set_ylim([0, 160])

# Green ratio label and a guide line above each country's pair of bars
for i, ratio in enumerate(fatal_ratio_df['fatal_ratio']):
    max_height = max(fatal_ratio_df.total_attacks[i], fatal_ratio_df.fatal_attacks[i])
    ax1.text(x_pos[i], max_height + 15, f'{ratio:.2f}', ha='center', va='bottom',
             fontsize=11, fontweight='bold', color='green')
    ax1.vlines(x=x_pos[i], ymin=0, ymax=max_height + 15, color='black', linestyle='-', linewidth=1)

plt.title("Top 15 Countries by Fatal Attack Ratio", fontsize=22, pad=25)

plt.suptitle('Text in green is the ratio of fatal shark attacks', y=0.9)

plt.show()

This grouped bar chart ranks the 15 countries with the highest fatal attack ratio among those with at least 25 recorded attacks. It shows not just where attacks happen but how deadly they are. Each country has two bars: total attacks (blue) and fatal attacks (red). The green label above each pair is the fatality ratio, the proportion of that country’s recorded attacks that were fatal. Countries such as the Philippines (0.57), Panama (0.56), and Jamaica (0.53) have some of the highest fatality ratios despite not having the highest total numbers of attacks. In contrast, nations with a high volume of attacks, such as Mexico and Papua New Guinea, have slightly lower but still substantial ratios of around 0.45 to 0.48. This suggests that while places like Florida (seen in the previous chart) have many shark encounters, a fatal outcome there is often less likely. Possible reasons include faster rescue response, proximity to medical care, and awareness protocols. Reporting practices may also play a role: where minor, non-fatal bites go unrecorded, the ratio will be inflated.
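The ratio in this chart is fatal attacks divided by total attacks, computed only for countries with at least 25 recorded attacks. The sketch below performs the same calculation with a single groupby. The data are made up, and the threshold is lowered to 3 to suit them. The whitespace and case normalization is an added assumption, since the report's code matches "Y" exactly.

```python
import pandas as pd

# Hypothetical records: country and raw fatality code
attacks = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B", "B", "C"],
    "fatal_y_n": ["Y", "N", "N", "Y", "Y", "Y", "N", "Y"],
})

MIN_ATTACKS = 3  # the report uses 25; lowered here for the toy data

is_fatal = attacks["fatal_y_n"].str.strip().str.upper().eq("Y")

summary = (
    attacks.assign(is_fatal=is_fatal)
    .groupby("country")
    .agg(total_attacks=("is_fatal", "size"), fatal_attacks=("is_fatal", "sum"))
)
summary = summary[summary["total_attacks"] >= MIN_ATTACKS].copy()
summary["fatal_ratio"] = summary["fatal_attacks"] / summary["total_attacks"]
summary = summary.sort_values("fatal_ratio", ascending=False)
print(summary)
```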

Conclusion

This analysis of shark attack data reveals clear patterns in how, when, and where these incidents are most likely to occur. The time-based analysis shows a sharp rise in recorded attacks over the last century, concentrated in the summer months and afternoon hours, when human presence in the water is at its peak. The activity-based data shows that most attacks are unprovoked and that surfers and swimmers are the most frequently affected, reinforcing the link between recreational ocean use and shark encounters. Geographic patterns further demonstrate that certain locations, especially along the Florida coastline in the U.S., account for a disproportionate share of global shark attacks. However, while these areas see the highest frequency, they do not necessarily have the highest fatality rates. The fatal attack ratio varies considerably by country, suggesting that medical infrastructure, emergency response, and public awareness play key roles in determining outcomes when attacks occur. Together, these findings suggest that shark attacks are not random: they are closely tied to human behavior, environmental conditions, and regional preparedness. Recognizing these patterns can inform safety strategies, guide public education efforts, and contribute to a more rational, data-informed understanding of our relationship with sharks in shared marine environments.