Birdstrikes Project

US Airways Flight 1549 lands in the Hudson River after striking a flock of Canada geese.

Introduction

The controlled landing of US Airways Flight 1549 into the Hudson River is one of the most famous plane crashes in America. It drew the aviation industry’s attention to an unlikely danger lurking in the sky: birds. Although small, birds have the potential to bring down a passenger jet when struck. Which birds cause the most strikes? How much damage do they do to aircraft? Those are a few of the questions this project seeks to answer. The dataset used in this project comes from the Federal Aviation Administration’s Wildlife Strike Database and covers strikes from 1990-2023.

Dataset

The data used in this project was found on Kaggle, linked here: https://www.kaggle.com/datasets/dianaddx/aircraft-wildlife-strikes-1990-2023?resource=download. The dataset records 288,810 birdstrikes across 110 columns, including information about species and aircraft type, along with various contributing factors.

For this project I used:

INCIDENT_DATE
INCIDENT_MONTH
INCIDENT_YEAR
AIRCRAFT
AIRPORT_ID
AIRPORT
OPERATOR
PHASE_OF_FLIGHT
SPECIES
SKY
PRECIPITATION
SIZE
AC_CLASS
HEIGHT
SPEED
DAMAGE_LEVEL

All visualizations have “NAs” and “Unknowns” removed to offer a more detailed picture of the effect of birdstrikes.
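The cleaning rule above can be sketched on a toy frame. This is only an illustration under stated assumptions: `drop_unknowns` is a hypothetical helper that does not appear in the notebook, and the rows are made up rather than taken from the FAA data.

```python
import pandas as pd

# Hypothetical helper (not from the original notebook): drop rows where a
# column is missing or holds one of the "unknown" placeholder labels.
def drop_unknowns(frame, column, unknown_labels):
    keep = frame[column].notna() & ~frame[column].isin(unknown_labels)
    return frame[keep].reset_index(drop=True)

# Toy rows standing in for the real strike records.
toy = pd.DataFrame({
    "AIRCRAFT": ["B-737-700", "UNKNOWN", None, "A-320"],
    "HEIGHT": [100.0, 50.0, 0.0, 1200.0],
})

clean = drop_unknowns(toy, "AIRCRAFT", ["UNKNOWN"])
print(clean)
```

Filtering with a boolean mask keeps the missing-value check and the placeholder check in one place, which makes it easier to apply the same rule to every chart.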

Findings

Visualizations include a bar chart dashboard showing the top 150 and top 15 aircraft by number of birdstrikes; a scatterplot of the top 15 most commonly struck bird species and the top 15 airports where they are struck; a dual-axis chart showing the phase of flight when strikes were reported, along with the average speed and height for each phase; a stacked bar chart of the top 10 operators, stacked by the type of aircraft struck; two pie charts showing the damage caused by each strike and size of the bird struck; a bump chart ranking the number of strikes by month and year; and a heatmap of the outside conditions reported by pilots when a strike occurred.

Top Aircraft by Number of Birdstrikes

This mini dashboard looks at the aircraft most commonly struck by birds. Bars are color-coded by how each aircraft’s strike count compares with the mean for that chart: above average (more than 1% above the mean), within 1% of the average, or below average (more than 1% below the mean).

The top bar chart shows the top 150 aircraft and the mean number of strikes (935). The bottom bar chart breaks the data down further into the top 15 aircraft and includes labels, along with the mean number of strikes for the top 15 (4,953).

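#Imports used by the charts in this write-up
#(df is assumed to already hold the Kaggle CSV, loaded as a pandas DataFrame)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Creating Bar Chart (chart 1)
aircraft_df = df[['AIRCRAFT', 'INCIDENT_DATE', 'HEIGHT']]

#Removing NAs from Height column; .copy() lets new columns be added without a SettingWithCopyWarning
aircraft_df = aircraft_df[aircraft_df['HEIGHT'].notna()].copy()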

#Reformatting Incident Date column
aircraft_df['INCIDENT_DATE'] = pd.to_datetime(aircraft_df['INCIDENT_DATE'], format = '%m/%d/%Y')

aircraft_df['MONTH'] = aircraft_df.INCIDENT_DATE.dt.month
aircraft_df['YEAR'] = aircraft_df.INCIDENT_DATE.dt.year
aircraft_df['DAY'] = aircraft_df.INCIDENT_DATE.dt.day
aircraft_df['WEEK_DAY'] = aircraft_df.INCIDENT_DATE.dt.strftime('%a')
aircraft_df['MONTH_NAME'] = aircraft_df.INCIDENT_DATE.dt.strftime('%b')

#Removing 'Unknown' from Aircraft column, no NAs or other missing data
aircraft_df = aircraft_df.drop(aircraft_df[aircraft_df.AIRCRAFT=="UNKNOWN"].index)

#Creating new df with Aircraft count and average height where birdstrikes occurred
aircraft_df2 = aircraft_df.groupby(['AIRCRAFT']).agg({'AIRCRAFT':['count'], 'HEIGHT':['mean']}).reset_index()

#Renaming columns
aircraft_df2.columns = ['AIRCRAFT', 'COUNT', 'AVERAGE_HEIGHT']

#Sorting Count in descending order, which aircraft model had the most birdstrikes. Resetting index.
aircraft_df2 = aircraft_df2.sort_values('COUNT', ascending=False)

aircraft_df2.reset_index(inplace=True, drop=True)

#Function to make colors for dashboard
def mean_count_colors(chart_2data):
    colors=[]
    avg = chart_2data.COUNT.mean()
    for each in chart_2data.COUNT:
        if each > avg*1.01:
            colors.append('springgreen')
        elif each < avg*0.99:
            colors.append('gray')
        else:
            colors.append('black')
    return colors
  
#Setting up dashboard
import matplotlib.patches as mpatches

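#1st chart
#Note: .loc slices by label and includes the end label, so 0:150 returns 151 rows.
#Use .iloc[bottom1:top1] for exactly 150; the means quoted in the text come from the code as written.
bottom1 = 0
top1 = 150
d1 = aircraft_df2.loc[bottom1:top1]

my_colors1 = mean_count_colors(d1)

#2nd chart (same inclusive slicing: 0:15 returns 16 rows)
bottom2 = 0
top2 = 15
d2 = aircraft_df2.loc[bottom2:top2]

my_colors2 = mean_count_colors(d2)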

#Legend patches shared by both charts
Above = mpatches.Patch(color='springgreen', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of Average')
Below = mpatches.Patch(color='gray', label='Below Average')

fig = plt.figure(figsize=(18, 16))
fig.suptitle('Aircraft Types Most Commonly Struck by Wildlife', fontsize=18, fontweight='bold')

ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.AIRCRAFT, d1.COUNT, label = 'Count', color=my_colors1)
ax1.legend(handles=[Above, At, Below], fontsize=12)
ax1.axhline(d1.COUNT.mean(), color='black', linestyle='solid')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)

ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top ' + str(top1) + ' Aircraft', fontsize=18)
ax1.text(top1-10, d1.COUNT.mean()+150, 'Mean = ' + str(round(d1.COUNT.mean(), 2)), rotation=0, fontsize=12)

ax1.set_yticklabels(['{:,}'.format(int(tick)) for tick in ax1.get_yticks().tolist()])

ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.AIRCRAFT, d2.COUNT, label = 'Count', color=my_colors2)
ax2.legend(handles=[Above, At, Below], fontsize=12)
ax2.axhline(d2.COUNT.mean(), color='black', linestyle='dashed')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)

ax2.axes.xaxis.set_visible(True)
ax2.set_title('Top ' + str(top2) + ' Aircraft', fontsize=18)
#Format the mean directly; wrapping it in int() first would discard the decimals
ax2.text(top2-1, d2.COUNT.mean()+150, 'Mean = ' + '{:,.2f}'.format(d2.COUNT.mean()), rotation=0, fontsize=12)

ax2.set_yticklabels(['{:,}'.format(int(tick)) for tick in ax2.get_yticks().tolist()])

fig.subplots_adjust(hspace = 0.35)

plt.show()
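One detail of the slicing used for this dashboard is worth illustrating: on a default integer index, `.loc[0:15]` is label-based and includes the end label, while `.iloc[0:15]` is position-based and excludes it. The sketch below uses a made-up ranking table (the `TYPE-n` names and counts are placeholders, not real aircraft data) to show the difference.

```python
import pandas as pd

# Toy ranking table with a default 0..19 integer index (placeholder values).
ranked = pd.DataFrame({
    "AIRCRAFT": [f"TYPE-{i}" for i in range(20)],
    "COUNT": range(20, 0, -1),
})

by_label = ranked.loc[0:15]      # label slice: rows 0 through 15 inclusive
by_position = ranked.iloc[0:15]  # position slice: rows 0 through 14

print(len(by_label), len(by_position))
```

If exactly N rows are wanted for a "top N" chart, `.iloc[:N]` or `.head(N)` is the safer choice.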

The Boeing 737-700 is by far the most commonly struck aircraft. Next is the Airbus A-320. Both aircraft are incredibly popular for short-haul flights, with both competing for the record of most commercial aircraft delivered. Other Boeing (B-) and Airbus (A-) aircraft also appear in the top 15, unsurprising considering both companies are the titans of the aviation industry.

Smaller manufacturers also appear: Bombardier (CRJ100/200 and 700) and Embraer (EMB-170 and 145), both known for regional jets that are popular as commuter aircraft. The Cessna C-172 is included as well. It is the most-produced aircraft in history and the only aircraft in the top 15 that is not a passenger jet; its place here likely reflects its widespread use as a training aircraft at flight schools.

The McDonnell Douglas MD-82 is a revamped version of the DC-9 and a competitor to the 737 and A-320, though it is no longer widely used: only 116 of these aircraft were in service by 2022. The dataset begins in 1990, closer to the period when the MD-80 variants were still popular.

There were no military aircraft included in the top 15.

This mini dashboard suggests that the most popular aircraft suffer the most birdstrikes, most likely because they are produced in greater numbers and spend more total time in the air than less common aircraft.

Top 15 Species Hit at Top 15 Airports

This scatter plot looks at the 15 species most often involved in birdstrikes and the 15 airports where the most strikes occur. A table counting strikes by species showed that, across all airports, mourning doves were struck the most (13,892 times). Across all species, the most strikes occurred at Denver International Airport (6,910 strikes).

#Creating scatterplot, Species and Airport top 15 (chart 2)
df_2 = df[['AIRPORT_ID', 'SPECIES']]

#Removing unknowns from Airport_ID (6 NAs - will remove when making airport ID dataframe)
df_2 = df_2.drop(df_2[df_2.AIRPORT_ID=='ZZZZ'].index)

#Removing unknowns from Species (1 NAs - will remove when making species dataframe)
unknown_species = ['Unknown bird - small', 'Unknown bird - medium', 'Unknown bird - large', 'Unknown bird']
df_2 = df_2[~df_2['SPECIES'].isin(unknown_species)]
                
#Create df of top 15 airport IDs with the most birdstrikes
airports15 = df_2[df_2['AIRPORT_ID'].notna()]
airports15 = airports15.groupby(['AIRPORT_ID'])['AIRPORT_ID'].count().reset_index(name='COUNT')
airports15 = airports15.sort_values(by = 'COUNT', ascending=False).reset_index(drop=True)
airports15 = airports15[0:15]

#Make a list of the top 15 airports
airport15L = airports15['AIRPORT_ID'].tolist()

#Make df of top 15 most frequently hit birds
species15 = df_2[df_2['SPECIES'].notna()].reset_index(drop=True)
species15 = species15.groupby(['SPECIES'])['SPECIES'].count().reset_index(name='COUNT')
species15 = species15.sort_values(by='COUNT', ascending=False).reset_index(drop=True)
species15 = species15[0:15]

#Make a list of the top 15 species
specieslist = species15['SPECIES'].tolist()

#Create df containing only airport and species names included in airport15L and specieslist, from original df.
AISP_df = df_2[df_2['AIRPORT_ID'].notna() & df_2['SPECIES'].notna()]
AISP_df = AISP_df[(AISP_df['AIRPORT_ID'].isin(airport15L)) & (AISP_df['SPECIES'].isin(specieslist))].reset_index(drop=True)

#Create dataframe to use in scatter plot, counts up airport ID/species combos from AISP_df
AISP_scatter = AISP_df.groupby(['AIRPORT_ID', 'SPECIES'])["AIRPORT_ID"].count().reset_index(name='COUNT')

#Creating scatter plot

plt.figure(figsize=(15, 15))

plt.scatter(AISP_scatter['AIRPORT_ID'], AISP_scatter['SPECIES'], marker='o', cmap='Spectral',
            c=AISP_scatter['COUNT'], s=AISP_scatter['COUNT'], edgecolors='black')

plt.title('Birdstrikes by Species and Airport, Top 15', fontsize=18, fontweight='bold')
plt.xlabel('Airport ID', fontsize=14)
plt.ylabel('Species', fontsize=14)

cbar = plt.colorbar()
cbar.set_label('Number of Strikes', rotation = 270, fontsize = 14, color = 'black', labelpad = 30)

#Colorbar ticks every 100 strikes, labelled with thousands separators
my_colorbar_ticks3 = list(range(100, int(AISP_scatter['COUNT'].max()), 100))
cbar.set_ticks(my_colorbar_ticks3)
cbar.set_ticklabels(['{:,}'.format(each) for each in my_colorbar_ticks3])


plt.show()
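The top-N selection behind this scatter plot can also be expressed more compactly with `value_counts`. The sketch below is an alternative under stated assumptions, not the notebook's method: the six rows and the airport IDs are invented for illustration, and `top_n` is set to 2 so the result is easy to check by hand.

```python
import pandas as pd

# Invented rows standing in for the cleaned AIRPORT_ID / SPECIES columns.
toy = pd.DataFrame({
    "AIRPORT_ID": ["KDEN", "KDEN", "KDEN", "KDFW", "KDFW", "KJFK"],
    "SPECIES": ["Horned lark", "Horned lark", "Mourning dove",
                "Mourning dove", "Mourning dove", "Herring gull"],
})

top_n = 2
# value_counts() sorts by frequency, so head(top_n) gives the most common values.
top_airports = toy["AIRPORT_ID"].value_counts().head(top_n).index.tolist()
top_species = toy["SPECIES"].value_counts().head(top_n).index.tolist()

# Keep only rows where both the airport and the species are in the top lists,
# then count each airport/species pair for the scatter plot.
subset = toy[toy["AIRPORT_ID"].isin(top_airports) & toy["SPECIES"].isin(top_species)]
pair_counts = (subset.groupby(["AIRPORT_ID", "SPECIES"])
                     .size()
                     .reset_index(name="COUNT"))
print(pair_counts)
```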

The airports in this chart from left to right:

Hartsfield-Jackson Atlanta International Airport
Charlotte Douglas International Airport
Cincinnati/Northern Kentucky International Airport
Denver International Airport
Dallas/Fort Worth International Airport
Detroit Metropolitan Airport
John F. Kennedy International Airport
Los Angeles International Airport
LaGuardia Airport
Kansas City International Airport
Orlando International Airport
O’Hare International Airport
Portland International Airport
Salt Lake City International Airport
Teterboro Airport

The number of horned lark strikes at Denver stands out in this chart at almost 3,000 strikes from 1990-2023. The next closest is mourning doves struck at Dallas/Fort Worth with about 1,400 strikes; the remaining species-airport combinations range from 0 to approximately 900.

While many airports have programs in place to keep birds away, both for the birds’ safety and for that of passengers, it is impossible to achieve a 100% success rate. According to the Audubon Field Guide, the horned lark prefers open ground, which includes airports. It is most common in the north and also favors tundra and high mountains; Denver International Airport is located near the Rockies.

Mourning doves are the most common dove in the Dallas/Fort Worth area and will nest in trees and building ledges.

Airport location can affect how many and what species of birds are involved in strikes.

Phase of Flight Strikes Reported

This dual-axis plot shows the average speed and height of an aircraft when it reported being struck by birds. Looking at the phase of flight and speed/height can lead the aviation industry to create new practices to make flying safer.

#Dual Axis chart - Speed and Height for Each Phase of Flight (chart 3)
df_3 = df[['SPEED', 'HEIGHT', 'PHASE_OF_FLIGHT']]

#Creating new dataframe with NAs removed from all columns
df_3 = df_3[df_3['PHASE_OF_FLIGHT'].notna() & df_3['SPEED'].notna() & df_3['HEIGHT'].notna()].reset_index(drop=True)

#Creating new dataframe using df with phase of flight aggregated and counted, and average for height and speed
#Grouped by phase of flight
PoF = df_3.groupby(['PHASE_OF_FLIGHT']).agg({'PHASE_OF_FLIGHT':['count'], 'HEIGHT':['mean'], 'SPEED':['mean']}).reset_index()

#Rename columns for easier reading
PoF.columns = ['PHASE_OF_FLIGHT', 'COUNT', 'AVG_HEIGHT', 'AVG_SPEED']

#Sort by count of values in descending order
PoF = PoF.sort_values('COUNT', ascending = False).reset_index(drop=True)

#Make function for labeling bars with avg speed and avg height
def autolabel(these_bars, this_ax, place_of_decimals):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, format(height, place_of_decimals), 
                    fontsize=12, color ='black', ha='center', va='bottom')
                    
#Making dual axis chart, average height and speed for each phase of flight

#Set up figure size and dual axis information
fig = plt.figure(figsize = (20, 10))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()

bar_width = 0.4

#Setting up average speed bar
x_pos = np.arange(len(PoF))  #one bar position per phase of flight, rather than a hard-coded 9
speed_bars = ax1.bar(x_pos-(0.5*bar_width), PoF.AVG_SPEED, bar_width, color = 'springgreen', edgecolor = 'black', 
                     label = 'Average Speed')

#Setting up average height bar
height_bars = ax2.bar(x_pos+(0.5*bar_width), PoF.AVG_HEIGHT, bar_width, color = 'gray', edgecolor = 'black', 
                     label = 'Average Height')

ax1.set_xlabel('Phase of Flight', fontsize=18)
ax1.set_ylabel('Average Speed\n (Knots)', fontsize=18, labelpad=20)

ax2.set_ylabel('Average Height\n (Feet Above Ground Level)', fontsize=18, rotation=270, labelpad=40)
ax2.set_yticklabels(['{:,}'.format(int(tick)) for tick in ax2.get_yticks().tolist()])

ax1.tick_params(axis='y', labelsize=14)
ax2.tick_params(axis='y', labelsize=14)

plt.title('Average Speed and Height by Phase of Flight', fontsize=18, fontweight='bold')
ax1.set_xticks(x_pos)
ax1.set_xticklabels(PoF.PHASE_OF_FLIGHT, fontsize=14)

#Setting up legend
speed_color, speed_label = ax1.get_legend_handles_labels()
height_color, height_label = ax2.get_legend_handles_labels()
legend = ax1.legend(speed_color + height_color, speed_label + height_label, loc='upper left', frameon=True, ncol=1,
                   borderpad=1, fontsize=12)

ax1.set_ylim(0, PoF.AVG_SPEED.max()*1.50)

autolabel(speed_bars, ax1, ',.1f')
autolabel(height_bars, ax2, ',.1f')

plt.show()
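The per-phase summary that feeds this chart can be sketched with pandas named aggregation, which produces flat column names without a separate renaming step. This is an alternative formulation on invented numbers, not the notebook's code; the phase names and values below are placeholders.

```python
import pandas as pd

# Invented rows standing in for the SPEED / HEIGHT / PHASE_OF_FLIGHT columns.
toy = pd.DataFrame({
    "PHASE_OF_FLIGHT": ["Approach", "Approach", "Climb", "Take-off Run"],
    "HEIGHT": [800.0, 1200.0, 2000.0, 0.0],
    "SPEED": [140.0, 160.0, 200.0, 100.0],
})

# One row per phase: number of strikes plus mean height and mean speed.
summary = (toy.groupby("PHASE_OF_FLIGHT")
              .agg(COUNT=("HEIGHT", "size"),
                   AVG_HEIGHT=("HEIGHT", "mean"),
                   AVG_SPEED=("SPEED", "mean"))
              .sort_values("COUNT", ascending=False)
              .reset_index())

# Bar positions derived from the data, so the chart adapts to however
# many phases are present.
x_pos = range(len(summary))
print(summary)
```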

Birds can fly at many different heights, from only a few inches above the ground to more than 30,000 feet. The highest bird flight ever recorded was a Rüppell’s vulture that struck an airplane at 37,000 feet in 1973. Because of this, birdstrikes are a risk at all phases of flight, although the average strike heights are generally much lower.

Top 10 Operators by Number of Birdstrikes

This stacked bar chart shows the number of birdstrikes reported by the top 10 aircraft operators, with each bar stacked by aircraft type. Only the 10 most frequently struck aircraft types are included. The label at the end of each bar shows the bar’s total.

#Stacked Bar - Total Birdstrikes per Top 10 Operator and Aircraft Type (chart 4)
op_df = df[['OPERATOR', 'AIRCRAFT', 'INCIDENT_YEAR']]

#Removing Unknown/Other and NA from data
op_df = op_df.drop(op_df[(op_df.AIRCRAFT=='UNKNOWN')].index)

op_df = op_df[op_df['OPERATOR'].notna() & op_df['AIRCRAFT'].notna() & op_df['INCIDENT_YEAR'].notna()].reset_index(drop=True)

#Create df of top 10 operators with the most birdstrikes
operators10 = op_df[op_df['OPERATOR'].notna()]
operators10 = operators10.groupby(['OPERATOR'])['OPERATOR'].count().reset_index(name='COUNT')
operators10 = operators10.sort_values(by = 'COUNT', ascending=False).reset_index(drop=True)
operators10 = operators10[0:10]

#Make a list of the top 10 operators
operatorslist = operators10['OPERATOR'].tolist()

#Make df of top 10 most frequently hit aircraft
AC10 = op_df[op_df['AIRCRAFT'].notna()].reset_index(drop=True)
AC10 = AC10.groupby(['AIRCRAFT'])['AIRCRAFT'].count().reset_index(name='COUNT')
AC10 = AC10.sort_values(by='COUNT', ascending=False).reset_index(drop=True)
AC10 = AC10[0:10]

#Make a list of the top 10 aircraft
AClist = AC10['AIRCRAFT'].tolist()
    
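#Create df containing only the operators and aircraft included in the lists made earlier
#Note: the last term, &(stacked_df['INCIDENT_YEAR']), is not a comparison. On an integer column it is
#combined bitwise with the mask and may silently drop rows. It can most likely be removed, but the
#totals quoted in the text come from the code as written.
stacked_df = op_df[op_df['OPERATOR'].notna() & op_df['AIRCRAFT'].notna() & op_df['INCIDENT_YEAR'].notna()]
stacked_df = stacked_df[(stacked_df['OPERATOR'].isin(operatorslist)) & (stacked_df['AIRCRAFT'].isin(AClist)) 
                       &(stacked_df['INCIDENT_YEAR'])].reset_index(drop=True)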

stacked_df = stacked_df.groupby(['OPERATOR', 'AIRCRAFT'])['INCIDENT_YEAR'].count().reset_index(name='COUNT')

stacked_df = stacked_df.pivot(index='OPERATOR', columns = 'AIRCRAFT', values='COUNT')

#Making stacked bar chart
fig = plt.figure(figsize=(22,10))

ax = fig.add_subplot(1,1,1)

stacked_df.plot(kind='barh', stacked=True, ax=ax)

plt.ylabel('Operator', fontsize=16)
plt.title('Total Birdstrikes by Top 10 Operators and Aircraft Types \n From 1990-2023', fontsize=20, fontweight='bold')
plt.xticks(horizontalalignment = 'center', fontsize=16)

plt.yticks(fontsize=14)

ax.set_xlabel('Total Birdstrikes', fontsize=20)
ax.set_xticklabels(['{:,}'.format(int(tick)) for tick in ax.get_xticks().tolist()])

#Add sum labels to the end of each horizontal bar
totals = stacked_df.sum(axis=1)

x_offset = 10

for bar_position, total in enumerate(totals):
    ax.text(total + x_offset, bar_position, '{:,}'.format(round(total)), color = 'black', fontsize=14, va='center')
    

plt.show()
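The operator-by-aircraft table behind a stacked bar chart like this one can be sketched with `pd.crosstab`, which counts each combination and fills absent combinations with 0 instead of leaving them missing. This is an alternative illustration on invented rows, not the notebook's approach; the operator and aircraft labels are placeholders.

```python
import pandas as pd

# Invented rows standing in for the filtered OPERATOR / AIRCRAFT columns.
toy = pd.DataFrame({
    "OPERATOR": ["SOUTHWEST AIRLINES", "SOUTHWEST AIRLINES", "UPS AIRLINES",
                 "DELTA AIR LINES", "DELTA AIR LINES", "DELTA AIR LINES"],
    "AIRCRAFT": ["B-737-700", "B-737-700", "B-757-200",
                 "A-320", "B-757-200", "A-320"],
})

# Rows are operators, columns are aircraft types, cells are strike counts.
stacked = pd.crosstab(toy["OPERATOR"], toy["AIRCRAFT"])

# Row sums give the label printed at the end of each stacked bar.
totals = stacked.sum(axis=1)
print(stacked)
print(totals)
```

Calling `stacked.plot(kind='barh', stacked=True)` on a table of this shape draws one bar per operator with one segment per aircraft type.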

Southwest Airlines has reported the most birdstrikes overall, with 10,531 strikes total. This lines up with the top 15 aircraft from the mini dashboard, where the variants of the Boeing 737 have an above average number of birdstrikes reported. Southwest operates only Boeing 737s, and the 737-700 is the aircraft with the most reported birdstrikes, so it makes up a large part of Southwest’s total. Other major operators like United, JetBlue, Delta, and American all use a more diversified pool of aircraft, including Airbus models, which all have a below average number of birdstrikes reported.

The smaller operators like Skywest, Business travel, and 1US Airways use older variants of the 737 and regional jets like the CRJ 100/200 and Embraer 145 and 170. These planes have fewer strikes reported so the smaller operators have fewer strikes recorded in total.

In this chart, UPS and FedEx each appear with only the Boeing 757-200, so their reported strikes are fairly low. This reflects the chart’s restriction to the 10 most-struck aircraft types rather than either operator’s full fleet. Surprisingly, UPS has more strikes reported for its single charted aircraft than 1US Airways and Business travel have across several aircraft types. This could be because UPS operates more flights than those smaller operators. It also has more strikes reported than FedEx, another cargo operator.

Damage Level and Size of Bird Struck

The two pie charts shown here count up the totals for Damage Level and Size of bird reported for 18,523 birdstrikes. “Damage Level” is the amount of damage that the aircraft sustained after the strike. Each damage level is defined as follows:

Minor: damage that can be fixed by simple repairs and inspections
Undetermined: the aircraft sustained damage but the details are lacking
Substantial: damage or structural failure that adversely affects the structural strength and will take major effort to fix
Destroyed: the damage sustained is serious enough that it is inadvisable to repair the aircraft

The Size pie chart is split up into Small, Medium, and Large birds as reported by the pilots on a relative scale.
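The percentages shown on the damage pie chart can be reproduced without plotting. The sketch below assumes the same single-letter codes the notebook maps (N, M, M?, S, D) and uses a handful of invented records; it is an illustration of the calculation, not the project's code.

```python
import pandas as pd

# Code-to-label mapping, matching the dictionary used in the notebook.
damage_labels = {"N": "None", "M": "Minor", "M?": "Undetermined",
                 "S": "Substantial", "D": "Destroyed"}

# Invented records; None stands for a strike with no damage level reported.
toy = pd.DataFrame({"DAMAGE_LEVEL": ["N", "M", "M", "M?", "S", "N", "M", None]})

# Translate codes, drop missing values, then exclude strikes with no damage.
damage = toy["DAMAGE_LEVEL"].map(damage_labels).dropna()
damage = damage[damage != "None"]

# Share of each damage level, as a percentage of the remaining strikes.
shares = damage.value_counts(normalize=True).mul(100).round(1)
print(shares)
```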

#2 donuts - Damage Level to Aircraft (chart 5)

#Building df
pie_df = df[['SIZE', 'DAMAGE_LEVEL', 'INCIDENT_YEAR']]

#Replacing N, M, M?, S, and D with actual meaning
d = {'N':'None', 'M':'Minor', 'M?':'Undetermined', 'S':'Substantial', 'D':'Destroyed'}

pie_df = pie_df.replace(d)

pie_df = pie_df[pie_df['SIZE'].notna() & pie_df['DAMAGE_LEVEL'].notna()]

#Removing 'None'  from damage level
pie_df = pie_df.drop(pie_df[(pie_df.DAMAGE_LEVEL=='None')].index)

size_df = pie_df.groupby(['DAMAGE_LEVEL', 'SIZE'])['INCIDENT_YEAR'].count().reset_index(name='COUNT')

#Set up inside and outside reference numbers for colors
number_outside_colors = len(size_df.DAMAGE_LEVEL.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

number_inside_colors = len(size_df.SIZE.unique())
all_color_ref_number = np.arange(number_inside_colors)

#Building pie chart

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)


#Add up all strikes in data frame, for donut hole
all_strikes = size_df.COUNT.sum()

#Making pie chart
size_df.groupby(['DAMAGE_LEVEL'])['COUNT'].sum().plot(
    kind='pie', radius=1, colors = outer_colors, pctdistance = .85, labeldistance = 1.1,
    wedgeprops = dict(edgecolor='white'), textprops = dict(fontsize=18), 
    autopct = lambda p: '{:.2f}%\n{:,.0f}'.format(p, (p/100)*all_strikes),
    startangle = 90) 

hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Birdstrikes by Damage to Aircraft', fontsize=18, fontweight='bold')

ax.text(0,0, 'Total Birdstrikes\n' + '{:,}'.format(int(all_strikes)), size=18, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()

plt.show()

The majority of aircraft suffered only minor damage after being struck by birds and were able to return to service after proper repairs. Destroyed aircraft make up less than 1% of the damage reported, which could mean that birdstrikes lead to fewer fatalities than other emergencies. However, Substantial damage makes up a fair percentage of the damage reported, and that could translate to injuries or fatalities among passengers. It is surprising that so much birdstrike damage is undetermined, because understanding the damage that strikes cause is crucial to making aircraft safer in the future.

#2 donuts - Size of Bird (chart 6)

#Building df
pie_df = df[['SIZE', 'DAMAGE_LEVEL', 'INCIDENT_YEAR']]

#Replacing N, M, M?, S, and D with actual meaning
d = {'N':'None', 'M':'Minor', 'M?':'Undetermined', 'S':'Substantial', 'D':'Destroyed'}

pie_df = pie_df.replace(d)

pie_df = pie_df[pie_df['SIZE'].notna() & pie_df['DAMAGE_LEVEL'].notna()]

#Removing 'None'  from damage level
pie_df = pie_df.drop(pie_df[(pie_df.DAMAGE_LEVEL=='None')].index)

size_df = pie_df.groupby(['DAMAGE_LEVEL', 'SIZE'])['INCIDENT_YEAR'].count().reset_index(name='COUNT')

#Set up inside and outside reference numbers for colors
number_outside_colors = len(size_df.DAMAGE_LEVEL.unique())
outside_color_ref_number = np.arange(number_outside_colors)*4

number_inside_colors = len(size_df.SIZE.unique())
all_color_ref_number = np.arange(number_inside_colors)

#Building pie chart - size
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)


#Add up all strikes in data frame, for donut hole
all_strikes = size_df.COUNT.sum()

#Making pie chart
size_df.groupby(['SIZE'])['COUNT'].sum().plot(
    kind='pie', radius=1, colors = outer_colors, pctdistance = .85, labeldistance = 1.05,
    wedgeprops = dict(edgecolor='white'), textprops = dict(fontsize=18), 
    autopct = lambda p: '{:.2f}%\n{:,.0f}'.format(p, (p/100)*all_strikes),
    startangle = 90) 

hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Birdstrikes by Size of Bird', fontsize=18, fontweight='bold')

ax.text(0,0, 'Total Birdstrikes\n' + '{:,}'.format(int(all_strikes)), size=18, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()

plt.show()

The relative scale that pilots use to grade size leaves room for error, but according to the data almost half of the birdstrikes reported here involved medium-sized birds. This is somewhat in line with the data shown in the scatterplot; most of the top 15 most-struck species are classified as “medium-sized” when searched online. The killdeer, European starling, eastern/western meadowlarks, rock pigeon, mourning dove, horned lark, cliff/barn swallows, American kestrel, red-tailed hawk, and certain gulls and sparrows are all medium-sized birds. Canada geese can be considered large. None of the top 15 species are small birds.

Strikes Ranked by Month and Year

This dataset spans 1990 to 2023, so I determined the five years with the most strikes and ranked them in a bump chart. All five fall between 2017 and 2022. Within each month, the years are ranked from 1 (most strikes) to 5 (fewest).
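The per-month ranking can be sketched with `DataFrame.rank`. The example below is a minimal illustration with invented monthly counts for three years and two months; it is not the project's data, and the real chart ranks five years across twelve months.

```python
import pandas as pd

# Invented strike counts: one row per year, one column per month.
monthly = pd.DataFrame(
    {"Jan": [300, 450, 400], "Feb": [280, 260, 310]},
    index=pd.Index([2018, 2019, 2021], name="INCIDENT_YEAR"),
)

# Rank down each column (axis=0) so the years compete within a month;
# ascending=False gives rank 1 to the year with the most strikes.
ranked = monthly.rank(axis=0, ascending=False, method="min").astype(int)
print(ranked)
```

Transposing `ranked` puts months on the rows, which is the shape a line plot needs to draw one line per year.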

#Building bump chart - Total strikes ranked by month and year (chart 7)

#Clean up month and year data
#Data goes from 1990 to 2023, use only top 5 years instead
bump_df = df[['INCIDENT_DATE']].copy()

#Pulling year and month name data from INCIDENT_DATE column
bump_df['INCIDENT_DATE'] = pd.to_datetime(bump_df['INCIDENT_DATE'], format = '%m/%d/%Y')

bump_df['INCIDENT_YEAR'] = bump_df['INCIDENT_DATE'].dt.year
bump_df['INCIDENT_MONTH'] = bump_df['INCIDENT_DATE'].dt.month
bump_df['MONTH_NAME'] = bump_df['INCIDENT_DATE'].dt.strftime('%b')

#Create df of top 5 years with the most birdstrikes to choose which years to use
years7 = df[df['INCIDENT_YEAR'].notna()]
years7 = years7.groupby(['INCIDENT_YEAR'])['INCIDENT_YEAR'].count().reset_index(name='COUNT')
years7 = years7.sort_values(by = 'COUNT', ascending=False).reset_index(drop=True)
years7 = years7[0:5]

#Make list of top 5 years
Yearlist = years7['INCIDENT_YEAR'].tolist()
    
#Creating df for bump chart
bump_df = bump_df.groupby(['INCIDENT_YEAR', 'MONTH_NAME'])['INCIDENT_YEAR'].count().reset_index(name='COUNT')

#Remove all years that aren't in the top 5 years list
bump_df = bump_df[(bump_df['INCIDENT_YEAR'].isin(Yearlist))].reset_index(drop=True)

#Pivot df so month names go along the top and years go down the side
bump_df = bump_df.pivot(index='INCIDENT_YEAR', columns='MONTH_NAME', values='COUNT')

#Reorder months
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

bump_df = bump_df.reindex(columns=month_order)

#Drop NAs, only 2023
bump_df = bump_df.dropna()

#Ranking data for bump chart - 1 is highest, 5 is lowest
bump_df_ranked = bump_df.rank(axis=0, ascending=False, method='min')

bump_df_ranked = bump_df_ranked.T

#Plot bump chart
fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)

bump_df_ranked.plot(kind='line', ax=ax, marker='o', markeredgewidth=1, linewidth=5, 
                   markersize=45, 
                   markerfacecolor='white')

#Putting rank 1 at the top and rank 5 at the bottom
ax.invert_yaxis()

#Setting up variables for number of rows/columns
num_rows = bump_df_ranked.shape[0]
num_cols = bump_df_ranked.shape[1]

#Labels and titles
plt.ylabel('Monthly Ranking', fontsize=18, labelpad=10)
plt.title('Ranking of Birdstrike Occurrences by Month and Year\n Top 5', fontsize=20, fontweight='bold')
plt.xticks(np.arange(num_rows), month_order, fontsize=14)

plt.yticks(range(1, num_cols+1, 1), fontsize=14)

ax.set_xlabel('Month', fontsize=18)

#Reverse legend order
handles, labels = ax.get_legend_handles_labels()
handles = handles[::-1]
labels = labels[::-1]

#Fixing circles in legend; make them smaller
ax.legend(handles, labels, bbox_to_anchor=(1.01, 1.01), fontsize=14, 
         labelspacing = 1,
         markerscale = .5,
         borderpad = 1,
         handletextpad = 0.8)

#Adding strike numbers into circles
#bump_df_ranked has months as rows and years as columns; bump_df is the reverse
for month_pos, month in enumerate(bump_df_ranked.index):
    for year in bump_df_ranked.columns:
        this_rank = bump_df_ranked.loc[month, year]
        ax.text(month_pos, this_rank, '{:,}'.format(int(bump_df.loc[year, month])), ha='center', va='center', fontsize=12)
        
plt.show()

2019 generally has the most birdstrikes per month, with 2022 a fairly close second. September 2019 has the most strikes of all (2,569), and the number of strikes goes up in August, September, October, and November. This may be related to migratory patterns, which start in the fall and continue through the winter, and to the number of holidays in the last few months of the year.

Strikes are down in December, January, and February because most birds are gone and there is less movement; they start increasing in March and April as birds migrate north for the spring. In the summer, the birds that flew south are back and strikes remain fairly high.

Migratory patterns and seasonal changes seem to have an effect on the number of birdstrikes reported.

Sky and Precipitation when Birdstrikes Occur

This heatmap looks at the number of birdstrikes reported during certain weather conditions. Precipitation conditions include Fog, Rain, Snow, and various combinations of the three. Sky conditions include No Cloud, Overcast, and Some Cloud. The blank spaces signify that no strikes that meet those conditions occurred.
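The blank cells described above come from pivoting: any sky/precipitation pair that never occurs has no count, so it becomes a missing value in the grid. The sketch below shows this on five invented reports; the condition names follow the text, but the counts are placeholders.

```python
import pandas as pd

# Invented reports standing in for the SKY / PRECIPITATION columns.
toy = pd.DataFrame({
    "SKY": ["Overcast", "Overcast", "No Cloud", "Some Cloud", "Overcast"],
    "PRECIPITATION": ["Rain", "Rain", "Fog", "Rain", "Snow"],
})

# Count each observed combination, then spread sky conditions across columns.
counts = toy.groupby(["SKY", "PRECIPITATION"]).size().reset_index(name="COUNT")
grid = counts.pivot(index="PRECIPITATION", columns="SKY", values="COUNT")

# Combinations that never occurred are NaN, which a heatmap leaves blank.
print(grid)
```

Replacing the missing values with `grid.fillna(0)` would draw those cells as zeros instead of leaving them empty.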

#Heatmap of sky and precipitation conditions (chart 8)
import seaborn as sns
from matplotlib.ticker import FuncFormatter

condf = df[['SKY', 'PRECIPITATION']]

#Removing NAs from condf
condf = condf[condf['SKY'].notna() & condf['PRECIPITATION'].notna()]

#Group and count instances of sky and precipitation conditions
condf = condf.groupby(['SKY', 'PRECIPITATION'])['SKY'].count().reset_index(name='COUNT')

#Make new df to use for heatmap
hm_df = pd.pivot_table(condf, index='PRECIPITATION', columns='SKY', values='COUNT')

#Creating heatmap
fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(1,1,1)

formatter = FuncFormatter(lambda x, p: format(int(x), ','))

ax = sns.heatmap(hm_df, linewidth=0.2, annot=True, cmap='Spectral', fmt=',.0f', 
                 square = False, annot_kws={'size':11},
                 cbar_kws={'format':formatter, 'orientation':'vertical'})

plt.title('Sky and Precipitation Conditions Reported During Birdstrike Incidences', fontsize=20, fontweight='bold', pad=15)
plt.xlabel('Sky Condition', fontsize=18, labelpad=10)
plt.ylabel('Precipitation Condition', fontsize=18, labelpad=10)
plt.yticks(size=14)

plt.xticks(size=14)

cbar = ax.collections[0].colorbar

cbar.set_label('Number of Occurrences', rotation = 270., fontsize=14, labelpad=20)


plt.show()

The overwhelming majority of birdstrikes happen in overcast conditions. Rainy, overcast conditions were reported for 6,246 strikes, while foggy, overcast conditions were reported for 1,782 strikes and 1,378 strikes were reported during rain with some cloud. Overcast conditions make it harder for pilots to see the environment around them, forcing them to rely more on their instruments than visual cues, and birds are small moving objects. They would likely be hard to see on a dark day, especially if it was raining too.

Poor conditions could also make it harder for birds to see and avoid planes, and birds often choose to fly lower in stormy conditions. The average height for each phase of flight in the dual-axis chart shows that many strikes occur close to the ground, so weather conditions and height both appear to be related to the number of birdstrikes that occur.

Conclusion

After studying each visualization, the conditions that generate the most birdstrikes are clear.

Typically, pilots should watch out for medium-sized birds on overcast and rainy or foggy days, especially during the fall and spring. Most strikes cause minor damage to the aircraft, and large, popular passenger jets are struck more than smaller regional jets or private aircraft. Strikes happen during all phases of flight, but the average strike height is much higher during descent than in any other phase. Passengers flying from large international airports have a higher chance of experiencing birdstrikes, and the environment around the airport can affect which species are struck most.

Birdstrikes will continue to threaten aircraft for as long as the two exist in the same environment. Hopefully, through study of this phenomenon, the aviation industry can create and implement practices to protect both passengers and the birds we share the sky with.