library(ggplot2)

# One point per tail number: average air time vs. average departure delay,
# sized by number of flights and coloured by average distance
ggplot(delay, aes(mean_air_time, mean_delay)) +
  geom_point(aes(size = total_flights, colour = mean_distance), alpha = 0.3) +
  geom_smooth() +
  scale_size_area() +
  theme_bw() +
  labs(
    x = "Average Flight Time (minutes)",
    y = "Average Departure Delay (minutes)",
    caption = "FAA Aircraft registry",
    title = "Average Flight Time and Average Departure Delays | Flights from NY"
  )
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
This dataset is big enough to be useful, but I wish it had more detail. For example, grouping by tail number works, but grouping by anonymous pilot IDs would have been much more useful. In the future, it might be interesting to compare arrival and departure delays for the same tail number and see whether pilots try to speed up their flights when they're running late.

Looking at the visualization, average flight times cluster around 230 minutes and 320 minutes. Apart from outliers, average delays tend to be under 15 minutes. What I found odd was how few long-distance flights there were: 320 minutes is only five and a third hours, which wouldn't even get you from New York to London. Either New York had very few international flights, or, more interestingly, long-distance flights tended to be on time more often. That would be worth investigating.
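A first pass at that question could compare average delays for long and short flights. The sketch below is a hypothetical helper, not part of the original analysis. It assumes a data frame shaped like `delay` (one row per tail number with `mean_air_time`, `mean_delay`, and `total_flights`), and it treats 300 minutes of air time as an arbitrary "long-haul" cutoff. It reports a flight-weighted mean delay for each group plus a Spearman rank correlation, which is less sensitive to the outliers visible in the plot than a Pearson correlation.

```r
# Hypothetical helper (not from the original analysis): compares average
# departure delay for "long-haul" vs. other tail numbers. Assumes `df` has one
# row per tail number with mean_air_time, mean_delay and total_flights columns.
# The 300-minute cutoff is an arbitrary assumption; adjust as needed.
compare_long_haul <- function(df, cutoff_minutes = 300) {
  # Drop tail numbers whose averages are missing (e.g. only cancelled flights)
  df <- df[!is.na(df$mean_air_time) & !is.na(df$mean_delay), ]
  long_haul <- df$mean_air_time >= cutoff_minutes

  # Weight each tail number's average by its flight count, so a plane with
  # 2 flights counts less than one with 200
  weighted_delay <- function(rows) {
    weighted.mean(df$mean_delay[rows], w = df$total_flights[rows])
  }

  c(
    long_haul = weighted_delay(long_haul),
    other     = weighted_delay(!long_haul),
    # Rank correlation between flight time and delay; negative values would
    # mean longer flights tend to have shorter delays
    spearman  = cor(df$mean_air_time, df$mean_delay, method = "spearman")
  )
}
```

Calling `compare_long_haul(delay)` on the summarised data would return the three numbers as a named vector. A lower `long_haul` value and a negative `spearman` value would be consistent with long-distance flights being on time more often, though neither controls for airline, destination, or time of year.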