# Name the new column "date"
flights2 <- flightsnoNA |>
  unite(date, year, month, day, sep = "-") |>
  # Reformat the dates
  mutate(newdate = as.Date(date, format = "%Y-%m-%d"))
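As a quick check of the date-building step, here is a minimal sketch on a made-up two-row table. The name `toy` and its rows are hypothetical stand-ins for `flightsnoNA`. Only the `year`, `month`, and `day` column names are taken from the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for flightsnoNA (made-up rows, same column names)
toy <- tibble(
  year  = c(2023L, 2023L),
  month = c(1L, 11L),
  day   = c(5L, 22L)
)

toy_dates <- toy |>
  # unite() pastes the three columns into one character column
  unite(date, year, month, day, sep = "-") |>
  # as.Date() turns that text into a real Date that can be compared
  mutate(newdate = as.Date(date, format = "%Y-%m-%d"))

toy_dates
```

The `date` column stays as text such as "2023-1-5", while `newdate` has class Date, which is what makes the later range filter possible.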
Filter for dates from the day before Thanksgiving through New Year’s Eve
# Include Nov 22 in the date range
flights_holidayseason <- flights2 |>
  filter(newdate >= as.Date("2023-11-22") & newdate <= as.Date("2023-12-31"))
Join the dataset with “airports”
flights_hs <- left_join(flights_holidayseason, airports, by = c("origin" = "faa")) |>
  # Create a new column that spells out the airport codes
  mutate(airport = name)

# Reformat the data
flights_hs$airport <- gsub("John F Kennedy", "John F. Kennedy", flights_hs$airport)

# Remove duplicate columns
flights_hs2 <- flights_hs |>
  select(-origin, -date)

# Create a new column called "period"
flights_hs3 <- flights_hs2 |>
  mutate(period = "holiday season")
# A tibble: 3 × 3
  airport                               period         avg_dep_delay
  <chr>                                 <chr>                  <dbl>
1 John F. Kennedy International Airport holiday season         10.4
2 La Guardia Airport                    holiday season          6.21
3 Newark Liberty International Airport  holiday season          6.80
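The chunk that produced the summary table above does not appear in this write-up. A minimal sketch of one way to get a table of that shape follows. It assumes the flights data has a `dep_delay` column holding departure delay in minutes. The names `toy_hs` and `by_airport_toy` and all of the rows are hypothetical.

```r
library(dplyr)

# Hypothetical rows standing in for flights_hs3; dep_delay is an assumed column
toy_hs <- tibble(
  airport   = c("La Guardia Airport", "La Guardia Airport",
                "Newark Liberty International Airport"),
  period    = "holiday season",
  dep_delay = c(4, 8, 10)
)

# One row per airport and period, with the mean departure delay
by_airport_toy <- toy_hs |>
  group_by(airport, period) |>
  summarise(avg_dep_delay = mean(dep_delay), .groups = "drop")

by_airport_toy
```

Grouping by both `airport` and `period` keeps the `period` label in the result, which matters later when the two summaries are stacked with `bind_rows()`.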
Create a dataset for comparison, repeating the steps above
This dataset excludes all dates between November 22 and December 31, 2023.
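The step that builds `flights_year` is not shown here. A minimal sketch of one way to express the exclusion is to negate the holiday-season filter. The names `toy`, `season_start`, `season_end`, and `toy_year` are hypothetical, and the four dates are made up for illustration.

```r
library(dplyr)

# Hypothetical dates on both sides of the holiday-season boundary
toy <- tibble(
  newdate = as.Date(c("2023-01-01", "2023-11-21", "2023-11-22", "2023-12-31"))
)

season_start <- as.Date("2023-11-22")
season_end   <- as.Date("2023-12-31")

# Keep only rows that fall outside the holiday-season range
toy_year <- toy |>
  filter(!(newdate >= season_start & newdate <= season_end))

toy_year
```

Negating the same condition used for the holiday season means the two datasets cannot overlap, and every date lands in exactly one of them.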
flights_y2 <- left_join(flights_year, airports, by = c("origin" = "faa")) |>
  mutate(airport = name)

# Clean up the data
flights_y2$airport <- gsub("John F Kennedy", "John F. Kennedy", flights_y2$airport)

# Remove duplicates
flights_y3 <- flights_y2 |>
  select(-origin, -date)

# Create column for period
flights_y4 <- flights_y3 |>
  mutate(period = "rest of year")
# A tibble: 3 × 3
  airport                               period       avg_dep_delay
  <chr>                                 <chr>                <dbl>
1 John F. Kennedy International Airport rest of year          16.4
2 La Guardia Airport                    rest of year          11.2
3 Newark Liberty International Airport  rest of year          16.2
Data Visualization
# Combine the two datasets
flights_final <- bind_rows(by_airport, by_airport2)

# Remove redundant words
flights_final$airport <- gsub("International", " ", flights_final$airport)
# Repeat
flights_final$airport <- gsub("Airport", " ", flights_final$airport)
ggplot(flights_final, aes(x = airport, y = avg_dep_delay, fill = period)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_manual(
    name = "Time Period",
    labels = c("Holiday Season", "Rest of Year"),
    values = c("red3", "orange")
  ) +
  scale_y_continuous(limits = c(0, 20)) +
  labs(
    x = "NYC Airports",
    y = "Avg Departure Delay (minutes)",
    title = "Average Departure Delays of NYC Flights during \n 2023 Holiday Season* Compared to Rest of Year",
    subtitle = "*From Nov 22 to Dec 31",
    caption = "Source: FAA Aircraft Registry"
  ) +
  theme_gray() +
  theme(
    plot.title = element_text(hjust = .6, face = "bold"),
    plot.subtitle = element_text(hjust = .5, face = "italic"),
    plot.caption = element_text(hjust = .5, vjust = -2, face = "italic"),
    axis.text.x = element_text(vjust = -.5),
    axis.title.x = element_text(vjust = -2),
    axis.title.y = element_text(vjust = 2)
  )
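For comparison, here is a minimal sketch of the same grouped-bar idea using `geom_col()`, which plots precomputed values directly and so does not need `stat = "identity"`. The data frame `toy_final` is a hypothetical stand-in typed in from the two summary tables above, with the airport names already shortened. This is an alternative spelling, not the code used for the chart.

```r
library(ggplot2)

# Hypothetical stand-in for flights_final, using the averages reported above
toy_final <- data.frame(
  airport = rep(c("John F. Kennedy", "La Guardia", "Newark Liberty"), times = 2),
  period = rep(c("holiday season", "rest of year"), each = 3),
  avg_dep_delay = c(10.4, 6.21, 6.80, 16.4, 11.2, 16.2)
)

p <- ggplot(toy_final, aes(x = airport, y = avg_dep_delay, fill = period)) +
  # position = "dodge" places the two periods side by side for each airport
  geom_col(position = "dodge") +
  # Naming the values ties each color to a period regardless of level order
  scale_fill_manual(values = c("holiday season" = "red3", "rest of year" = "orange"))

p
```

Naming the entries in `values` is a small safeguard: the color assigned to each period no longer depends on the alphabetical order of the `period` labels.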
Description of Data Visualization
I wanted to see if average departure delays of NYC flights would be longer during the holiday season than during the rest of the year, so I created a bar chart comparing the two time periods. In this chart, the “holiday season” is the period from November 22 to December 31, New Year’s Eve. I included November 22 because it was the Wednesday before Thanksgiving in 2023, and that day is usually one of the busiest travel days of the year. The “rest of year” time period is from January 1 to November 21.
Since the holiday season is a busy time for travel, I had assumed that the increased number of flights would lead to longer delays. Surprisingly, the chart shows that average departure delays were actually shorter during the holiday season at all three airports.
I think this chart is most useful for making comparisons among and within the three NYC airports. For example, Newark Liberty had the greatest difference between average departure delays during the holiday season and the rest of the year, while La Guardia had the smallest. Of the three, John F. Kennedy had the longest average departure delay during the rest of the year. These comparisons raise questions about practices each airport could implement to reduce departure delay times during the holiday season or the entire year.
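The airport-by-airport comparison can be checked with a little arithmetic. The sketch below retypes the six averages from the two summary tables into a hypothetical table called `delays` and subtracts the holiday-season average from the rest-of-year average for each airport. Because the inputs are the rounded values printed above, the differences are approximate.

```r
library(dplyr)
library(tidyr)

# Averages copied from the two summary tables (rounded as printed)
delays <- tibble(
  airport = rep(c("John F. Kennedy", "La Guardia", "Newark Liberty"), times = 2),
  period = rep(c("holiday season", "rest of year"), each = 3),
  avg_dep_delay = c(10.4, 6.21, 6.80, 16.4, 11.2, 16.2)
)

gaps <- delays |>
  # One row per airport, one column per period
  pivot_wider(names_from = period, values_from = avg_dep_delay) |>
  mutate(difference = `rest of year` - `holiday season`) |>
  # Largest gap first
  arrange(desc(difference))

gaps
```

With these inputs the gaps come out to about 9.4 minutes for Newark Liberty, 6.0 for John F. Kennedy, and 5.0 for La Guardia, which matches the ordering described above.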
Use of AI Disclosure
I was stuck on how to make each bar on the chart represent a different time period (holiday season vs. rest of year), so I asked Google Gemini (2.5 Flash Version) for suggestions.
Prompt used: “Working in R studio, I have created two datasets, data1 and data2, one showing flights in x time period and another showing flights in y time period. I want to plot them in a grouped bar chart, with one bar showing x period and another y period (they should be different colors). What is the line of code I should start with?”
Answer from Google Gemini: “The essential first step is to combine your datasets and add a column to identify the time period for each observation. This prepares the data for ggplot2’s grouping aesthetic. The line of code you should start with is:
I used Gemini’s suggestion to write the “mutate(period = "holiday season")” and “mutate(period = "rest of year")” code and to combine the two datasets using “flights_final <- bind_rows(by_airport, by_airport2)”.