library(leaflet)
library(tidyverse)
library(knitr)
library(dplyr)
library(highcharter)

Project2
Aircraft Wildlife Strikes Project
Introduction
For my project, I chose a dataset about aircraft strikes on wildlife. I chose this topic and data because I have been seeing a lot of roadkill lately and was curious to see if there was anything similar with planes, since I will be going on a trip next month. The data was collected by the Federal Aviation Administration (FAA) and covers 1990 to 2023. The dataset is detail-heavy, with 100 columns. They include strike information such as date, time of day, location, airport name, airline name/operator, aircraft details, phase of flight, speed, height, weather, damage level and cost, details of the struck species, and many more variables. Almost any other information you might think of is most likely included.
I cleaned my data by first selecting the columns I was interested in using for my project: date and time, coordinates, phase of flight, aircraft and operator, weather, damage level, repair cost, and species struck. I did this with “select()”. I also removed rows with NA in the coordinate columns and removed “unknown” values from the species column. For the species column, I had to research how to filter out values that contained “unknown”, because there were many values such as “unknown bird - small”, “unknown bird - medium”, etc. I found the code “filter(!grepl("unknown", SPECIES, ignore.case = TRUE))” for that. I decided to filter OUT any incidents that resulted in NO damage because I want to highlight the ones that did have damage; I did this with “filter(DAMAGE_LEVEL != "N")”. Lastly, I wanted to focus on incidents near MD, so I filtered for NY, NJ, DE, DC, MD, and VA.
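As a small illustration of the two row filters described above, here is a sketch on a made-up table. The rows in `toy` are invented for the example and are not taken from the FAA file; `toy` and `toy_clean` are hypothetical names.

```r
library(dplyr)

# Made-up rows for illustration only; not taken from the FAA file
toy <- tibble(
  SPECIES = c("Canada goose", "Unknown bird - small", "Gulls", "Unknown bird - medium"),
  STATE   = c("NY", "MD", "CA", "VA")
)

toy_clean <- toy |>
  filter(!grepl("unknown", SPECIES, ignore.case = TRUE)) |> # drops any species value containing "unknown", in any letter case
  filter(STATE %in% c("NY", "NJ", "DE", "DC", "MD", "VA"))  # keeps only rows from the listed states

toy_clean
```

Because `ignore.case = TRUE` is set, the pattern matches “Unknown bird - small” even though it starts with a capital letter.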
setwd("~/Desktop/Desktop - Jackie’s MacBook Pro/DATA 110")
aircraft <- read_csv("aircraft_wildlife_strikes_faa.csv")

aircraft_clean <- aircraft |>
  select(INCIDENT_DATE, INCIDENT_YEAR, INCIDENT_MONTH,
         LATITUDE, LONGITUDE, STATE, AIRPORT, AIRPORT_ID,
         AIRCRAFT, OPERATOR, PHASE_OF_FLIGHT,
         SKY, PRECIPITATION, TIME_OF_DAY,
         DAMAGE_LEVEL, INDICATED_DAMAGE,
         COST_REPAIRS_INFL_ADJ, COST_OTHER_INFL_ADJ,
         SPECIES, SIZE, NUM_SEEN, NUM_STRUCK, SPEED, HEIGHT) |> # keeping only the columns that I may use
  filter(!is.na(LATITUDE), !is.na(LONGITUDE)) |> # some rows had no coordinates, so I removed them
  filter(DAMAGE_LEVEL != "N") |> # filtered out occurrences with NO damage to the plane
  filter(!grepl("unknown", SPECIES, ignore.case = TRUE)) |> # filters out species values that contain "unknown" ** CODE FOUND FROM: https://www.geeksforgeeks.org/r-language/filtering-row-which-contains-a-certain-string-using-dplyr-in-r/ **
  filter(STATE %in% c("NY", "NJ", "DE", "DC", "MD", "VA")) # filtering for MD and its surrounding states

# more cleaning:
aircraft_final <- aircraft_clean |>
  mutate(DAMAGE_LEVEL = recode(DAMAGE_LEVEL,
                               "M" = "Minor",
                               "M?" = "Undetermined",
                               "S" = "Substantial",
                               "D" = "Destroyed")) |> # changed damage level to words instead of letters so the audience has a better understanding
  mutate(
    STATE = ifelse(
      STATE == "DC",
      "District of Columbia",
      state.name[match(STATE, state.abb)]
    )
  ) |> # using full state names instead of abbreviations. ** CODE FOUND: https://stackoverflow.com/questions/5411979/state-name-to-abbreviation ** and used AI to include DC since it is not a state
  filter(PHASE_OF_FLIGHT %in% c("Approach", "Climb", "Take-off Run", "Landing Roll")) |> # keeping the top 4 phases of flight
  mutate(NUM_STRUCK = recode(NUM_STRUCK, "10-Feb" = "2-10")) # some NUM_STRUCK values appear as "10-Feb"; when I looked into it, I concluded it was an input error that was meant to read 2 to 10

sully <- aircraft_final |>
  filter(INCIDENT_DATE == "1/15/2009") # searching for the "Miracle on the Hudson" event, where Sully safely landed a plane after striking Canada geese. Found it!

Exploring Simple Plots
strike_plot <-
ggplot(aircraft_final, aes(x = SPEED,
y = HEIGHT,
color =PHASE_OF_FLIGHT)) +
geom_point(size = 2) +
labs(
title = "Aircraft Speed and Height During Wildlife Strikes in 1990-2023",
subtitle = "DMV and Surrounding States",
caption = "Aircraft Wildlife Strikes Dataset by FAA",
x = "Speed (in knots)",
y = "Height (feet above ground)",
color = "Phase of Flight") +
theme_bw() +
scale_color_brewer(palette = "Dark2")
strike_plot
Warning: Removed 564 rows containing missing values or values outside the scale range
(`geom_point()`).

strike_plot <-
ggplot(data = aircraft_final, aes(x = STATE, fill = DAMAGE_LEVEL)) +
geom_bar(position = "dodge") +
labs(
title = "Aircraft Wildlife Strike Damage Levels in DMV and Surrounding States 1990-2023",
x = "State",
y = "Number of Wildlife Strikes",
fill = "Damage Level",
caption = "Aircraft Wildlife Strikes Dataset by FAA"
) +
theme_minimal() +
scale_fill_manual(values = c("#3b2747", "#ffa345", "#624185",
"#ff6f4b"))
strike_plot

Highcharter Plot
# code before inputting into highcharter
top_5 <- aircraft_final |>
  filter(SPECIES %in% c("Gulls", "White-tailed deer", "Canada goose", "Herring gull", "Ring-billed gull")) |> # filtering for the top 5 most hit species
  mutate(
    STRUCK_SIZE = case_when(
      NUM_STRUCK %in% "1" ~ 20,
      NUM_STRUCK %in% "2-10" ~ 22,
      NUM_STRUCK %in% "11-100" ~ 25,
      NUM_STRUCK %in% "More than 100" ~ 27 # sizing my scatter points for the plot based on number of wildlife struck
    )
  ) # had to go back and look for this code because I remember we used it. Found in week 8 material

highchart() |>
hc_add_series(data = top_5,
type = "scatter",
hcaes(x = SPEED,
y = HEIGHT,
group = SPECIES,
size = STRUCK_SIZE # sized based on # struck
)
) |>
hc_title(text = "Aircraft Speed and Height During Wildlife Strikes in 1990-2023") |>
hc_xAxis(title = list(text = "Speed (knots)")) |>
hc_yAxis(title = list(text = "Height (feet above ground)")) |>
hc_caption(text = "Aircraft Wildlife Strikes Dataset by FAA") |>
hc_subtitle(text = "Occurrences in DMV and a few surrounding states") |>
hc_colors(c("#8CE4FF", "#F08787", "#5D866C", "#896C6C", "#FFC7C7")) |>
hc_tooltip(
shared = TRUE,
pointFormat = paste(
"Date: {point.INCIDENT_DATE} <br> Speed: {point.SPEED} <br> Height: {point.HEIGHT} <br> Times Struck: {point.NUM_STRUCK} <br> State: {point.STATE} <br> Airport: {point.AIRPORT} <br>"
)
) |>
hc_add_theme(hc_theme_flatdark())

Map Visualization
# tooltip info
tooltip_strikes <- paste0(
"<b>Date: </b>", top_5$INCIDENT_DATE, "<br>",
"<br>",
"<b>Species: </b>", top_5$SPECIES, "<br>",
"<b>Times Struck: </b>", top_5$NUM_STRUCK, "<br>",
"<b>Speed: </b>", top_5$SPEED, "<br>",
"<b>Height: </b>", top_5$HEIGHT, "<br>",
"<b>Phase of Flight: </b>", top_5$PHASE_OF_FLIGHT, "<br>",
"<b>Airline: </b>", top_5$OPERATOR, "<br>"
)

# setting color palette
pal <- colorFactor(palette = c("#8CE4FF", "#F08787", "#008000", "#7F5933", "#FFC7C7"),
levels = c("Canada goose", "Gulls", "Herring gull", "Ring-billed gull", "White-tailed deer"), domain = top_5$SPECIES)
#code for map
leaflet() |>
setView(lng = -77.0478, lat = 40.9176, zoom =5.5) |>
addProviderTiles("Stadia.AlidadeSmoothDark") |> # picked this dark theme so that my colored points pop out
addCircles(
data = top_5,
stroke = FALSE,
radius = sqrt(2.3^top_5$STRUCK_SIZE),
fillColor = ~pal(top_5$SPECIES),
fillOpacity = .6,
popup = tooltip_strikes
)
Assuming "LONGITUDE" and "LATITUDE" are longitude and latitude, respectively

Conclusion
For my Highcharter visualization, I showed how the speed and height of the plane are related when a wildlife strike occurs, for the top 5 species. I encoded the points by color and size: color shows the species, and size shows the number of wildlife struck. I noticed an abundance of points at a height of 0, which means that many of the strikes occurred during the take-off run or landing roll. There was a slight correlation between height and speed: the faster the plane was going, the higher the strike occurred. I also noticed that most of the occurrences hit 2-10 wildlife at a time. What surprised me while cleaning the data to find the top 5 species struck was that deer were in the top 5, meaning a lot of them were on the airstrips.
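One way to put a number on the speed and height relationship described above is a Pearson correlation. The sketch below uses made-up values so that it runs on its own; in the project, the real columns would be `aircraft_final$SPEED` and `aircraft_final$HEIGHT`. The names `toy` and `r` are hypothetical.

```r
# Made-up speed/height pairs for illustration only; not taken from the FAA file
toy <- data.frame(
  SPEED  = c(120, 140, NA, 160, 180, 200),
  HEIGHT = c(0, 500, 300, NA, 1500, 2500)
)

# use = "complete.obs" skips rows where either value is missing,
# which matters here because many strikes have no recorded speed or height
r <- cor(toy$SPEED, toy$HEIGHT, use = "complete.obs")
r
```

A value near 0 would suggest little linear relationship, while a value closer to 1 would suggest that higher speeds go with higher strike heights.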
For my map visualization, I highlighted aircraft wildlife strikes in the DMV and a few surrounding states: New York, New Jersey, and Delaware. I similarly colored and sized the points to match the type and number of wildlife struck, so the more wildlife struck, the bigger the point. I found that New York State had more of the bigger bubbles, meaning it had more occurrences where 11 to 100+ wildlife were struck. Also, the most commonly struck species were Canada geese and gulls, as I saw more of their point colors on the map. I really wanted to highlight the famous “Miracle on the Hudson” event, where a plane struck a flock of Canada geese that disabled the engines and required an emergency landing on the Hudson River in NY. I attempted to highlight it by filtering for the event and making the point yellow, but that did not work.
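One possible way to make a single event stand out on a leaflet map is to draw it as a second layer on top of the regular points. This is only a sketch of that idea, not the approach I tried. The coordinates below are rough placeholder values made up for illustration, and `strikes`, `highlight`, and `m` are hypothetical names; in the project, the two layers would use `top_5` and `sully` instead.

```r
library(leaflet)

# Placeholder rows for illustration only; in the project these would be top_5 and sully
strikes <- data.frame(
  SPECIES   = c("Canada goose", "Gulls"),
  LATITUDE  = c(40.78, 39.18),
  LONGITUDE = c(-73.87, -76.67)
)
highlight <- strikes[1, ] # the single row to emphasize

m <- leaflet() |>
  addTiles() |>
  # first layer: all strikes in one muted color
  addCircleMarkers(data = strikes, lng = ~LONGITUDE, lat = ~LATITUDE,
                   stroke = FALSE, fillColor = "#8CE4FF", fillOpacity = 0.6, radius = 6) |>
  # second layer: the highlighted event, drawn last so it sits on top
  addCircleMarkers(data = highlight, lng = ~LONGITUDE, lat = ~LATITUDE,
                   color = "yellow", fillColor = "yellow", fillOpacity = 1, radius = 10,
                   popup = "Miracle on the Hudson")

m
```

Using `addCircleMarkers()` keeps the highlighted point the same size on screen at every zoom level, whereas `addCircles()` sizes points in meters, so a single small circle can be hard to see when zoomed out.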
Citations:
Changing the State abbreviation to the full State name:
{mutate( STATE = ifelse( STATE == "DC", "District of Columbia", state.name[match(STATE, state.abb)] ) )}
Source: https://stackoverflow.com/questions/5411979/state-name-to-abbreviation
AND used Google’s AI Search to find how to include DC since it is not a state.
Source: Text generated by Gemini, Google, November 16, 2025, https://gemini.google.com/
___________________
Filtering OUT species values that contained “unknown”
{filter(!grepl("unknown", SPECIES, ignore.case = TRUE))}