DATA 110 Project 2

Author

Charlie Roth

Strike Report Dataset

The dataset I am exploring analyzes reports submitted by pilots of aircrafts hitting mainly birds but a few other animals. This dataset shows 288810 different reports with 100 variables. Many variables are included such as the year of the accident (INCIDENT_YEAR), the species of the animal, mainly birds (SPECIES), whether the incident happened during dawn, day, dusk, or night (TIME_OF_DAY), the speed the aircraft was going in knots (SPEED), the weight of the aircraft on a 1-5 scale – 1 being 2,250 kg or less, 2 being between 2251 and 5700 kg, 3 being between 5,701 and 27,000 kg, 4 being between 27,001 and 272,000 kg, and 5 is above 272,000 kg – (AC_MASS), The distance the aircraft was from the airport (DISTANCE), the cloud coverage at the time of the incident (SKY), whether the bird was small, medium, or large (SIZE), and the height the aircraft was in the time of the incident in feet (HEIGHT). I chose this dataset because I’m very interested in birds and how the rise of air travel has affected them. My source for this dataset is from the Federal Aviation Administration (FAA).

Load Packages and Data

I loaded the packages ‘tidyverse’, ‘magick’, ‘raster’, and ‘highcharter’. I then set my working directory with setwd() and loaded my dataset using read_csv and named it “birds”. I then used the head() function to show the first 6 observations in my dataset.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(magick)
Linking to ImageMagick 6.9.12.93
Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
Disabled features: fftw, ghostscript, x11
library(raster)
Loading required package: sp

Attaching package: 'raster'

The following object is masked from 'package:dplyr':

    select
setwd("/Users/allisonroth/Downloads/DATA 110")
birds <- read_csv("STRIKE_REPORTS.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 288810 Columns: 100
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (41): INCIDENT_DATE, TIME, TIME_OF_DAY, AIRPORT_ID, AIRPORT, RUNWAY, STA...
dbl (16): INDEX_NR, INCIDENT_MONTH, INCIDENT_YEAR, LATITUDE, LONGITUDE, AC_M...
num  (4): COST_REPAIRS, COST_OTHER, COST_REPAIRS_INFL_ADJ, COST_OTHER_INFL_ADJ
lgl (39): INGESTED_OTHER, INDICATED_DAMAGE, STR_RAD, DAM_RAD, STR_WINDSHLD, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(birds)
# A tibble: 6 × 100
  INDEX_NR INCIDENT_DATE INCIDENT_MONTH INCIDENT_YEAR TIME  TIME_OF_DAY
     <dbl> <chr>                  <dbl>         <dbl> <chr> <chr>      
1   608242 6/22/1996                  6          1996 <NA>  <NA>       
2   608243 6/26/1996                  6          1996 <NA>  <NA>       
3   608244 7/1/1996                   7          1996 <NA>  <NA>       
4   608245 7/1/1996                   7          1996 <NA>  <NA>       
5   608246 7/1/1996                   7          1996 <NA>  <NA>       
6   608247 5/6/1991                   5          1991 <NA>  Day        
# ℹ 94 more variables: AIRPORT_ID <chr>, AIRPORT <chr>, LATITUDE <dbl>,
#   LONGITUDE <dbl>, RUNWAY <chr>, STATE <chr>, FAAREGION <chr>,
#   LOCATION <chr>, ENROUTE_STATE <chr>, OPID <chr>, OPERATOR <chr>, REG <chr>,
#   FLT <chr>, AIRCRAFT <chr>, AMA <chr>, AMO <chr>, EMA <chr>, EMO <chr>,
#   AC_CLASS <chr>, AC_MASS <dbl>, TYPE_ENG <chr>, NUM_ENGS <dbl>,
#   ENG_1_POS <dbl>, ENG_2_POS <dbl>, ENG_3_POS <dbl>, ENG_4_POS <dbl>,
#   PHASE_OF_FLIGHT <chr>, HEIGHT <dbl>, SPEED <dbl>, DISTANCE <dbl>, …

Checking for NAs

I used the function anyNA() to check if there are any NAs in the variables I’m going to use. If there are NAs, when I run the chunk TRUE will show up.

anyNA(birds$INCIDENT_YEAR)
[1] FALSE
anyNA(birds$TIME_OF_DAY)
[1] TRUE
anyNA(birds$SPECIES)
[1] TRUE
anyNA(birds$HEIGHT)
[1] TRUE
anyNA(birds$SPEED)
[1] TRUE
anyNA(birds$AC_MASS)
[1] TRUE
anyNA(birds$DISTANCE)
[1] TRUE
anyNA(birds$SKY)
[1] TRUE
anyNA(birds$SIZE)
[1] TRUE

Removing NAs and Filtering

I used filter(!is.na()) to remove the NAs from every variable I checked above that showed they had NAs. I made a new dataset called “birds2” with the adjusted variables. I filtered for reports only from 2023 and reports with heights of 100 ft or more.

birds2 <- birds |>
  filter(INCIDENT_YEAR == "2023", HEIGHT >= 100, !is.na(TIME_OF_DAY), !is.na(SPECIES), !is.na(HEIGHT), !is.na(SPEED), !is.na(AC_MASS), !is.na(DISTANCE), !is.na(SKY), !is.na(SIZE))

Species

I used unique() to list all of the species in my dataset “birds2”.

unique(birds2$SPECIES)
  [1] "Northern pintail"                                                   
  [2] "Unknown bird - small"                                               
  [3] "Band-tailed pigeon"                                                 
  [4] "European starling"                                                  
  [5] "Magnificent frigatebird"                                            
  [6] "Western gull"                                                       
  [7] "Pine warbler"                                                       
  [8] "Unknown bird - medium"                                              
  [9] "Turkey vulture"                                                     
 [10] "Unknown bird - large"                                               
 [11] "American wigeon"                                                    
 [12] "American robin"                                                     
 [13] "Barn swallow"                                                       
 [14] "Osprey"                                                             
 [15] "Cackling goose"                                                     
 [16] "Geese"                                                              
 [17] "Perching birds (y)"                                                 
 [18] "Gulls"                                                              
 [19] "Wilson's snipe"                                                     
 [20] "Horned lark"                                                        
 [21] "Bald eagle"                                                         
 [22] "Mallard"                                                            
 [23] "Sandhill crane"                                                     
 [24] "Yellow-bellied sapsucker"                                           
 [25] "Franklin's gull"                                                    
 [26] "Northern parula"                                                    
 [27] "Northern flicker"                                                   
 [28] "Red-winged blackbird"                                               
 [29] "Chipping sparrow"                                                   
 [30] "American pipit"                                                     
 [31] "Hawks"                                                              
 [32] "Swainson's thrush"                                                  
 [33] "White-throated sparrow"                                             
 [34] "Ovenbird"                                                           
 [35] "Brazilian free-tailed bat"                                          
 [36] "American tree sparrow"                                              
 [37] "Swamp sparrow"                                                      
 [38] "Short-eared owl"                                                    
 [39] "Chimney swift"                                                      
 [40] "Red-bellied woodpecker"                                             
 [41] "Blue grosbeak"                                                      
 [42] "Gray-cheeked thrush"                                                
 [43] "Snow goose"                                                         
 [44] "Grackles"                                                           
 [45] "Nashville warbler"                                                  
 [46] "Semipalmated sandpiper"                                             
 [47] "Gray catbird"                                                       
 [48] "White-headed gull complex"                                          
 [49] "Black vulture"                                                      
 [50] "Double-crested cormorant"                                           
 [51] "Killdeer"                                                           
 [52] "Swallows"                                                           
 [53] "Cassin's vireo"                                                     
 [54] "Red-breasted merganser"                                             
 [55] "Bufflehead"                                                         
 [56] "Canada goose"                                                       
 [57] "Lesser scaup"                                                       
 [58] "Snow bunting"                                                       
 [59] "Raptors: Hawks, eagles, vultures, kites, osprey, falcons, caracaras"
 [60] "Purple martin"                                                      
 [61] "Ring-billed gull"                                                   
 [62] "California gull"                                                    
 [63] "Pectoral sandpiper"                                                 
 [64] "Wilson's warbler"                                                   
 [65] "Song sparrow"                                                       
 [66] "Red-eyed vireo"                                                     
 [67] "Red-masked parakeet"                                                
 [68] "Broad-winged hawk"                                                  
 [69] "Ducks"                                                              
 [70] "Cliff swallow"                                                      
 [71] "White-/golden-crown sparrow complex"                                
 [72] "Cedar waxwing"                                                      
 [73] "Doves"                                                              
 [74] "American woodcock"                                                  
 [75] "Blackpoll warbler"                                                  
 [76] "Sora"                                                               
 [77] "Hoary bat"                                                          
 [78] "Snow goose/Ross's goose complex"                                    
 [79] "Red-tailed hawk"                                                    
 [80] "Cattle egret"                                                       
 [81] "Mourning dove"                                                      
 [82] "American crow"                                                      
 [83] "House finch"                                                        
 [84] "Rufous hummingbird"                                                 
 [85] "Kites, eagles, hawks"                                               
 [86] "Whimbrel"                                                           
 [87] "Mitred Parakeet"                                                    
 [88] "Stilt sandpiper"                                                    
 [89] "American kestrel"                                                   
 [90] "Eastern meadowlark"                                                 
 [91] "New World vultures"                                                 
 [92] "Anhinga"                                                            
 [93] "Silver-haired bat"                                                  
 [94] "Rock pigeon"                                                        
 [95] "House sparrow"                                                      
 [96] "Blackbirds"                                                         
 [97] "Warbling vireo"                                                     
 [98] "Unknown Bird"                                                       
 [99] "Evening bat"                                                        
[100] "Yellow-crowned night-heron"                                         
[101] "Swainson's hawk"                                                    
[102] "Eastern red bat"                                                    
[103] "Seminole bat"                                                       
[104] "Brewer's sparrow"                                                   

Mutating & More Filtering

I went through the list of species and filtered out species that were not birds and unknown bird species. I also mutated speed from knots to MPH and distance from nautical miles to miles.

birds3 <- birds2 |>
  filter(SPECIES != "Seminole bat",
         SPECIES != "Evening bat",
         SPECIES != "Silver-haired bat", 
         SPECIES != "Eastern red bat", 
         SPECIES != "Hoary bat", 
         SPECIES != "Brazilian free-tailed bat", 
         SPECIES != "Unknown Bird",
         SPECIES != "Unknown bird - small",
         SPECIES != "Unknown bird - medium", 
         SPECIES != "Unknown bird - large") |>
  mutate(
    SPEED = (SPEED * 1.151),
    DISTANCE = (DISTANCE * 1.15)
  )

Mutating Variable

I used the function mutate() the change the variable ‘AC_MASS’ from quantitative to categorical. 1 = “2,250 kg or less”, 2 = “2,251 kg to 5,700 kg”, 3 = “5,701 kg to 27,000 kg”, 4 = “27,001 kg to 272,000 kg”, and 5 = “above 272,000 kg”.

birds3 <- birds3 |>
  mutate(
    AC_MASS = factor(AC_MASS, levels = 1:5, labels = c("2,250 kg or less", "2,251 kg to 5,700 kg", "5,701 kg to 27,000 kg", "27,001 kg to 272,000 kg", "above 272,000 kg")))
head(birds3)
# A tibble: 6 × 100
  INDEX_NR INCIDENT_DATE INCIDENT_MONTH INCIDENT_YEAR TIME  TIME_OF_DAY
     <dbl> <chr>                  <dbl>         <dbl> <chr> <chr>      
1  1405944 1/5/2023                   1          2023 17:30 Night      
2  1406361 1/5/2023                   1          2023 22:31 Night      
3  1406684 1/6/2023                   1          2023 09:30 Day        
4  1407081 1/17/2023                  1          2023 15:35 Day        
5  1407394 1/27/2023                  1          2023 14:45 Day        
6  1407830 2/7/2023                   2          2023 19:00 Night      
# ℹ 94 more variables: AIRPORT_ID <chr>, AIRPORT <chr>, LATITUDE <dbl>,
#   LONGITUDE <dbl>, RUNWAY <chr>, STATE <chr>, FAAREGION <chr>,
#   LOCATION <chr>, ENROUTE_STATE <chr>, OPID <chr>, OPERATOR <chr>, REG <chr>,
#   FLT <chr>, AIRCRAFT <chr>, AMA <chr>, AMO <chr>, EMA <chr>, EMO <chr>,
#   AC_CLASS <chr>, AC_MASS <fct>, TYPE_ENG <chr>, NUM_ENGS <dbl>,
#   ENG_1_POS <dbl>, ENG_2_POS <dbl>, ENG_3_POS <dbl>, ENG_4_POS <dbl>,
#   PHASE_OF_FLIGHT <chr>, HEIGHT <dbl>, SPEED <dbl>, DISTANCE <dbl>, …

Exploring

I made a bar plot to see how many reports are in each weight category.

plot1 <- birds3 |>
  group_by(AC_MASS)
ggplot(plot1, aes(AC_MASS)) +
  geom_bar(color = "black", fill = c("#9a41e5", "#fdbafa", "#b93eab", "#45046b")) +
  
  labs(title = "Number of Reports in Each Weight Group\n(2023)",
       x = "Weight in kg",
       y = "Number of Reports")

The main number of reports are from aircrafts that weigh between 27,001 kg and 272,000 kg. I decided to do this plot to see if one weight category has more reports and it does.

Exploring 2

I made a boxplot of with the different types of cloud coverage and the distance of the aircrafts from the airport when the incident happened.

plot2 <- birds3 |>
  ggplot(aes(DISTANCE, SKY)) +
  geom_boxplot(color = "black", fill = c("lightskyblue2", "cornflowerblue", "royalblue4")) +
  theme_light() +
  labs(title = "The Cloud Coverage and Distance from the Airport\nEach Aircraft was During the Strike",
       y = "Cloud Coverage",
       x = "Distance in Miles From the Airport")
plot2

I made this plot because I noticed when going through the dataset that a lot of the distances were 0 so I wanted to see the mean and I though this was an easy way to do it.

Linear Regression

I used the function cor() to find the correlation coefficient of the linear regression. Then I used lm(y ~ x) to find the slope, intercept, and p-value.

cor(birds2$DISTANCE, birds2$SPEED)
[1] 0.587492
birds_lm <- lm(DISTANCE ~ SPEED, data = birds2)
summary(birds_lm)

Call:
lm(formula = DISTANCE ~ SPEED, data = birds2)

Residuals:
    Min      1Q  Median      3Q     Max 
-11.339  -3.265  -1.173   2.096  53.031 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -9.345097   0.837751  -11.15   <2e-16 ***
SPEED        0.090737   0.004884   18.58   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.846 on 655 degrees of freedom
Multiple R-squared:  0.3451,    Adjusted R-squared:  0.3441 
F-statistic: 345.2 on 1 and 655 DF,  p-value: < 2.2e-16

The correlation coefficient is 0.59 which shows a moderate to weak correlation. The equation for this linear regression is: DISTANCE = 2e-16(SPEED) - 9.35. The p-value has three asterisks which suggests that it is meaningful but we have to take into account the adjusted R-squared which shows that only 35% of observations may be explained by the model.

Model

I made a linear regression model.

plot_lm <- birds2 |>
  ggplot(aes(DISTANCE, SPEED)) +
  geom_point(color = "seagreen", fill = "seagreen2", shape = 21, size = 3, alpha = .6) +
  geom_smooth(method = 'lm', formula = y~x, linewidth = 1, color = "navyblue") +
  theme_classic() +
  labs(x = "Distance in Nautical Miles",
       y = "Speed in Knots")
plot_lm

The regression line is pretty concentrated and although the adjusted R-square is 35% there are many variables in this data set so I would say that there is a relationship between distance and speed that would make sense to explore.

Exploring 3

I made 4 separate bar graphs that show the 3 sizes of the birds struck listed in the report and I separated them into the 4 times of the day.

plot3 <- birds3 |>
  group_by(TIME_OF_DAY)
ggplot(plot3, aes(SIZE, fill = AC_MASS)) +
  geom_bar() +
  theme_classic() +
  scale_fill_manual(values = c("#e76f51", "#f4a261", "#2a9d8f", "#264653" )) +
  facet_wrap(~TIME_OF_DAY) +
  labs(title = "Number and Size of Birds that Hit or Got Hit by an Aircaft in 2023\nfor the 4 Times of the Day",
       x = "Size of Bird",
       y = "Number of Birds",
       fill = "Weight of Aircraft")

I decided not to use the size of the birds as one of my main variables because there are barely any for dawn and dusk. Like I noticed in my previous graph, the most reports are from the weight group “27,001 kg to 272,000 kg” this is shown in the graph above.

Importing Image

I used ‘image_read()’ and ‘image_fill()’ from the ‘magick’ package. ‘image_read()’ reads the link of the image I chose and ‘image_fill()’ makes the image able to use as the background. The ‘as.raster()’ function coerces the image so it can be implemented in the graph.

bird_img <- image_read("https://media.istockphoto.com/id/1218631737/photo/seagull-flying-in-the-blue-sky.webp?b=1&s=170667a&w=0&k=20&c=0D8xdiWIavUFEkaIsj4Hayii5NWOJekXgkjsJHL8Qcg=")

img3 <- image_fill(bird_img, 'none')
img4 <- as.raster(img3)

#Got the code from here: https://www.rdocumentation.org/packages/magick/versions/2.8.3/topics/image_ggplot

ggplot plot

I made a ggplot version of my final plot that shows speed in MPH on the x axis and height in feet on the y axis. The fill is the time of day and the size is the weight of each aircraft.

main_plot <- birds3 |>
  ggplot(aes(SPEED, HEIGHT, fill = TIME_OF_DAY)) +
  annotation_raster(img4, -Inf, Inf, -Inf, Inf, interpolate = F) +
  geom_point(shape = 21, color = "white", alpha = .8, aes(size = AC_MASS)) +
  guides(size = guide_legend(title.theme = element_blank(), label = F)) +
  scale_fill_manual(values = c("#e88504", "#ffcc25", "#a7408b", "#352389")) +
  theme_classic() +
  labs(title = "The Speed vs Height from Ground Level\nEach Aircraft Was when the Strikes Occurred in 2023",
       x = "Speed in MPH",
       y = "Height Above Ground Level in Feet",
       caption = "Size of Each Dot is Correlated to the Weight of Each Aircraft",
       fill = "Time of Day")
main_plot
Warning: Using size for a discrete variable is not advised.

highcharter plot

This is my final plot in highcharter that shows speed in MPH on the x axis and height in feet on the y axis. The fill is the time of day and it has a tooltip which shows species, weight, height, and speed.

library(highcharter)
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
main_plot2 <- highchart() |>
  hc_add_series(data = birds3,
                   type = "scatter",
                   hcaes(x = SPEED,
                   y = HEIGHT,
                   group = TIME_OF_DAY)) |>
  hc_colors(colors = c("#e88504", "#ffcc25", "#a7408b", "#352389")) |>
  hc_plotOptions(scatter = list(marker = list(radius = 6, lineColor = "white", symbol = "circle"))) |>
  hc_chart(backgroundColor = "transparent", 
           divBackgroundImage =   "https://media.istockphoto.com/id/1218631737/photo/seagull-flying-in-the-blue-sky.webp?b=1&s=170667a&w=0&k=20&c=0D8xdiWIavUFEkaIsj4Hayii5NWOJekXgkjsJHL8Qcg=") |>
  hc_title(text = "The Speed vs Height from Ground Level\nEach Aircraft Was When the Strikes Occurred (Reports from 2023)") |>
  hc_xAxis(title = list(text = "Speed in Knots")) |>
  hc_yAxis(title = list(text = "Height Above Ground Level in Feet")) |>
  hc_caption(text = "FAA") |> 
  hc_legend(align = "center", verticalAlign = "bottom", title = list(text = "Time of Day")) |>
  hc_tooltip(shared = F, pointFormat = "Species: {point.SPECIES}<br> Weight: {point.AC_MASS}<br> Height: {point.y} ft<br> Speed: {point.x} MPH")
main_plot2

Conclusion

Based on the information I researched, bird strikes are very common, and they happen more during the summer and fall months because of migration. Bird strikes also are a big threat because they happen when the plane is taking off or landing which is when the plane is going the slowest. This information is from the Pilot Institute : https://pilotinstitute.com/bird-strike/

From my final graph you can see that the most amount of reports happened in the day and the least amount of reports occurred during dawn. You an also see from my graph that reports in dawn and dusk were not very high and were not very fast which could be because the planes were taking off or landing. Reports of strikes at night and day happen at all heights and speeds. I wish I could have included size in the highcharter plot like I did in the ggplot one but I couldn’t figure it out.