Introduction
Although not usual, bird strikes represent a significant safety and operational challenge in aviation. It poses risks to both airlines and wildlife. The occurrence of bird strikes may be influenced by several factors including the environment and flying technical patterns. In consequence, this document examines that situation through the following question:” How do environmental conditions and flying technical aspects influence the likeliness of bird strikes?”
A dataset provided by open intro is used to conduct the analysis. It contains 19,302 observations distributed across 17 variables. It is accessible through the following link: https://www.openintro.org/data/index.php?data=birds
Key variables include: - opid: operator ID
- phase_of_flight: telling if the flight is taking off, landing, or en route
- remarks: Verbal remarks regarding the collision
- ac_mass: Mass of the aircraft
- time_of_day: light conditions: Day, night, dawn, dusk
- height: distance above grouf level
- sky: type of cloud cover
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(corrplot)
## corrplot 0.95 loaded
df<-read_csv("birds.csv")
## Rows: 19302 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): opid, operator, atype, remarks, phase_of_flt, date, time_of_day, s...
## dbl (4): ac_mass, num_engs, height, speed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df)
## # A tibble: 6 × 17
## opid operator atype remarks phase_of_flt ac_mass num_engs date time_of_day
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 AAL AMERICAN … MD-80 NO DAM… Descent 4 2 9/30… Night
## 2 USA US AIRWAYS FK-2… 2 BIRD… Climb 4 2 11/2… Day
## 3 AAL AMERICAN … B-72… <NA> Approach 4 3 8/13… Day
## 4 AAL AMERICAN … MD-82 <NA> Climb 4 2 10/7… Day
## 5 AAL AMERICAN … MD-82 NO DAM… Climb 4 2 9/25… Day
## 6 GFT GULFSTREA… BE-99 FLT 71… Landing Roll 2 2 9/20… Day
## # ℹ 8 more variables: state <chr>, height <dbl>, speed <dbl>, effect <chr>,
## # sky <chr>, species <chr>, birds_seen <chr>, birds_struck <chr>
str(df)
## spc_tbl_ [19,302 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ opid : chr [1:19302] "AAL" "USA" "AAL" "AAL" ...
## $ operator : chr [1:19302] "AMERICAN AIRLINES" "US AIRWAYS" "AMERICAN AIRLINES" "AMERICAN AIRLINES" ...
## $ atype : chr [1:19302] "MD-80" "FK-28-4000" "B-727-200" "MD-82" ...
## $ remarks : chr [1:19302] "NO DAMAGE" "2 BIRDS, NO DAMAGE." NA NA ...
## $ phase_of_flt: chr [1:19302] "Descent" "Climb" "Approach" "Climb" ...
## $ ac_mass : num [1:19302] 4 4 4 4 4 2 4 2 4 2 ...
## $ num_engs : num [1:19302] 2 2 3 2 2 2 2 1 3 2 ...
## $ date : chr [1:19302] "9/30/1990 0:00:00" "11/29/1993 0:00:00" "8/13/1993 0:00:00" "10/7/1993 0:00:00" ...
## $ time_of_day : chr [1:19302] "Night" "Day" "Day" "Day" ...
## $ state : chr [1:19302] "IL" "MD" "TN" "VA" ...
## $ height : num [1:19302] 7000 10 400 100 50 0 4000 100 200 3000 ...
## $ speed : num [1:19302] 250 140 140 200 170 40 230 50 130 NA ...
## $ effect : chr [1:19302] NA "None" "None" "None" ...
## $ sky : chr [1:19302] "No Cloud" "No Cloud" "Some Cloud" "Overcast" ...
## $ species : chr [1:19302] "UNKNOWN BIRD - MEDIUM" "UNKNOWN BIRD - MEDIUM" "UNKNOWN BIRD - SMALL" "UNKNOWN BIRD - SMALL" ...
## $ birds_seen : chr [1:19302] NA "2-10" "2-10" NA ...
## $ birds_struck: chr [1:19302] "1" "2-10" "1" "1" ...
## - attr(*, "spec")=
## .. cols(
## .. opid = col_character(),
## .. operator = col_character(),
## .. atype = col_character(),
## .. remarks = col_character(),
## .. phase_of_flt = col_character(),
## .. ac_mass = col_double(),
## .. num_engs = col_double(),
## .. date = col_character(),
## .. time_of_day = col_character(),
## .. state = col_character(),
## .. height = col_double(),
## .. speed = col_double(),
## .. effect = col_character(),
## .. sky = col_character(),
## .. species = col_character(),
## .. birds_seen = col_character(),
## .. birds_struck = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Since the dataset contains a lot of variables, and not all of them will be used, a subset is created below to allow the process to use only necessary information. In addition, to avoid dropping to many rows, which would have a considerable impact on the data, the missing values are replaced by imputing the median and the mode, respectively for the numerical and categorical variables.
#subseting the data, taking only needed info
info<- df|>
select(opid, phase_of_flt, remarks, height, time_of_day, sky, ac_mass, speed)
#checking NAs
colSums(is.na(info))
## opid phase_of_flt remarks height time_of_day sky
## 0 1783 2786 3193 2077 3579
## ac_mass speed
## 1284 7008
#calculating the summary statistics
mod_phase<-mode(info$phase_of_flt)
mod_remarks<- mode(info$remarks)
md_height<- median(info$height, na.rm=TRUE)
mod_time<- mode(info$time_of_day)
mod_sky<- mode(info$sky)
md_mass<- median(info$ac_mass, na.rm=TRUE)
#substituting the missing values by imputing the modes and the median
info<- info|>
mutate(height=ifelse(is.na(height), md_height, height),
remarks=ifelse(is.na(remarks), mod_remarks, remarks),
phase_of_flt=ifelse(is.na(phase_of_flt), mod_phase, phase_of_flt),
time_of_day=ifelse(is.na(time_of_day), mod_time, time_of_day),
ac_mass=ifelse(is.na(ac_mass), md_mass, ac_mass),
sky=ifelse(is.na(sky),mod_sky, sky)
)
info$sky<-gsub("character", "other", info$sky)
To understand the environmental influence on bird strikes, the chunk below will generate a bar graph showing the distribution of collision based on cloud cover. It will highlight whether strikes are more frequent under cloudy, overcast,.. conditions. It will help assess the role of this environmental condition in collision risk.
table(info$sky)
##
## No Cloud other Overcast Some Cloud
## 7113 3579 3356 5254
ggplot(info, aes(x=sky))+
geom_bar(color="black", fill="red")+
labs(x="sky conditions", y="Counts", title="Distribution of collision in regard to sky conditions")+
theme_classic()
Based on the graph returned, more collision were registered under the
“No cloud” category with 7,113 counts. It is followed by “some cloud”
with 5,254 counts. “Overcast” and “Other” count approximately 3,000
registration each.
table(info$phase_of_flt)
##
## Approach character Climb Descent En Route Landing Roll
## 6470 1783 3222 599 585 3047
## Parked Take-off run Taxi
## 11 3506 79
ggplot(info, aes(x=phase_of_flt))+
geom_bar(color="black", fill="blue")+
labs(x="Flying phase", y="Counts", title="Distribution of collision in regard to flying phase")+
theme_minimal()
Above is another bar graph that shows the number collisions counted over each category. As observed, more collisions ccurred when planes where in approaching phase. Also, planes strike birds quite often when making their take-off run.
ggplot(info, aes(x=height, y=speed, color=phase_of_flt))+
geom_point(alpha=0.5)+
labs(
title = "Scatterplot of height vs Speed",
x = "Height",
y = "Speed",
color="Flying phase"
) +
theme_classic()
Above is scatterplot that correlates plane height and speed while taking
their flying phase into account. A strong positive correlation is
observed.This indicates that as planes gain in altitude, their speed
tend to increase.
Hypothesis testing
\(H_0\): \(\mu_1\) = \(\mu_2\) = \(\mu_3\) = \(\mu_4\)
\(H_a\): not all \(\mu_i\) are equal
# Perform ANOVA
anova_t <- aov(height~ sky, data = info)
anova_t
## Call:
## aov(formula = height ~ sky, data = info)
##
## Terms:
## sky Residuals
## Sum of Squares 1234478228 52073576204
## Deg. of Freedom 3 19298
##
## Residual standard error: 1642.678
## Estimated effects may be unbalanced
summary(anova_t)
## Df Sum Sq Mean Sq F value Pr(>F)
## sky 3 1.234e+09 411492743 152.5 <2e-16 ***
## Residuals 19298 5.207e+10 2698392
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An anova test was performed to test whether the mean height of collisions differs across sky conditions. The test returned a <2e-16 p-value, which is statistically significant. Therefore, since there is enough evidence to support that the mean height of collisions differ for the sky conditions, we reject the null.
TukeyHSD(anova_t)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = height ~ sky, data = info)
##
## $sky
## diff lwr upr p adj
## other-No Cloud -631.00567 -717.49889 -544.5124 0.0000000
## Overcast-No Cloud -546.06731 -634.45160 -457.6830 0.0000000
## Some Cloud-No Cloud -295.09508 -371.87022 -218.3199 0.0000000
## Overcast-other 84.93836 -16.47409 186.3508 0.1369842
## Some Cloud-other 335.91059 244.43860 427.3826 0.0000000
## Some Cloud-Overcast 250.97223 157.71007 344.2344 0.0000000
To further understand the difference in means, and which exact groups differ, a TukeyHSD test was also conducted. All categories show a significant difference, except overcast and other.
conclusion
The analysis was centered on understanding how environmental n flying technical patterns influence bird striking. The analysis revealed that sky conditions do not significantly influence the occurrence of birds strikes. Instead, flying phases closer to the ground show higher frequency compared to other phases. Therefore, when planes are at lower altitude, they are more likely to encounter bird activity.
An anova test was conducted to examine whether the mean altitude of bird strikes differ across sky conditions. The test returned a significant p-value, which demonstrates that at least one group mean differs.
While environmental conditions may be the primary influencer, the operational context of flight phase plays a critical role in bird strike risk.
Future analysis can go deeper by examining the implication of airlines, including also the operational practices.