```r
library(dplyr)

by_month <- flights_nona |>
  group_by(month) |>                  # group flights by month of departure
  summarise(
    count = n(),                      # number of flights in each month
    avg_dist = mean(distance),        # mean distance traveled
    avg_arr_delay = mean(arr_delay),  # mean arrival delay
    avg_dep_delay = mean(dep_delay),  # mean departure delay
    .groups = "drop"                  # remove the grouping structure after summarizing
  ) |>
  arrange(avg_arr_delay) |>
  filter(avg_dist < 3000)

head(by_month)
```
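```r
library(dplyr)
library(ggplot2)

# summarise() keeps only the numeric `month` column, so build an
# ordered month label (Jan ... Dec) for the x-axis
by_month <- by_month |>
  mutate(month_label = factor(month.abb[month], levels = month.abb))

ggplot(by_month, aes(month_label, avg_dep_delay)) +
  geom_point(aes(size = count, color = avg_arr_delay), alpha = .5) +
  theme_minimal() +
  labs(
    x = "Months",
    y = "Average Departure Delay",
    size = "Number of flights",
    color = "Average Arrival Delay",
    caption = "Source: FAA Aircraft registry",
    title = "Average Departure and Arrival Delays \nby Months from NY Flights"
  )
```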
I created a scatter plot that shows the average departure delay for each month, with the average arrival delay shown as a color gradient. The size of each point reflects the number of flights in that month, and the gradient makes it easy to see which months have longer arrival delays without interfering with the point sizes. June and July have the highest average departure delays, followed by a sharp drop in August; later, November and December have the lowest number of flights and the lowest average departure and arrival delays. In other words, departure and arrival delays are highest during the summer and lowest toward the winter. It is really interesting to notice this trend because it was not something I expected. I had assumed there were more flights in the summertime because so many people are on vacation.
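To check the month-by-month reading of the plot against the numbers themselves, the monthly summary can be sorted directly and the months rolled up into seasons. This is a minimal sketch that assumes `flights_nona` is `nycflights13::flights` with missing delay rows removed. The names `monthly` and `by_season`, and the use of meteorological seasons (for example, June to August as summer), are illustrative choices and not part of the original analysis.

```r
library(dplyr)
library(nycflights13)  # assumption: flights_nona was built from nycflights13::flights

# Hypothetical stand-in for flights_nona: keep only flights with both delays recorded
monthly <- flights |>
  filter(!is.na(dep_delay), !is.na(arr_delay)) |>
  group_by(month) |>
  summarise(
    count = n(),                      # flights in each month
    avg_dep_delay = mean(dep_delay),  # mean departure delay (minutes)
    avg_arr_delay = mean(arr_delay),  # mean arrival delay (minutes)
    .groups = "drop"
  ) |>
  mutate(month_label = factor(month.abb[month], levels = month.abb))

# Months ranked by average departure delay, largest first
monthly |> arrange(desc(avg_dep_delay)) |> select(month_label, avg_dep_delay, count)

# Months ranked by flight count, fewest first, to test the flight-volume claim
monthly |> arrange(count) |> select(month_label, count)

# Roll months into meteorological seasons (an illustrative grouping)
by_season <- monthly |>
  mutate(season = case_when(
    month %in% c(12, 1, 2) ~ "Winter",
    month %in% 3:5 ~ "Spring",
    month %in% 6:8 ~ "Summer",
    TRUE ~ "Fall"
  )) |>
  group_by(season) |>
  summarise(
    flights = sum(count),
    # weight each monthly mean by its flight count to get the seasonal mean
    avg_dep_delay = weighted.mean(avg_dep_delay, count),
    avg_arr_delay = weighted.mean(avg_arr_delay, count),
    .groups = "drop"
  )

by_season
```

Weighting by `count` makes each seasonal average equal to the plain mean over every flight in that season, rather than an unweighted average of three monthly averages, so busier months count for more.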