flights2 <- left_join(flights, airlines, by = "carrier")
flights2$name <- gsub("Inc\\.|Co\\.", "", flights2$name)
newark <- flights2 |>
  filter(origin == "EWR")
#united_hubs <- newark |>
  #filter(carrier == "UA")
  #filter(dest == "ORD" | dest == "DEN" | dest == "IAH" | dest == "LAX" | dest == "SFO" | dest == "IAD")
#united_hubs <- united_hubs |>
  #filter(dest == "ORD" | dest == "DEN" | dest == "IAH" | dest == "LAX" | dest == "SFO" | dest == "IAD")
#united_hubs <- united_hubs |>
  #group_by(dest)
ggplot(newark, aes(x = name, y = distance, fill = name)) +
  geom_boxplot() +
  labs(
    x = "Airlines",
    y = "Distance (Miles)",
    fill = "Airline",
    title = "Route Distance Distribution by Airline at Newark",
    caption = "Source: FAA"
  ) +
  scale_fill_brewer(palette = "Paired") +
  coord_flip() +
  theme_minimal()
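The numbers behind each box can also be tabulated directly. The following is a minimal sketch, not part of the original analysis: it assumes dplyr is installed and uses a small invented data frame (`toy`, with made-up airline names and distances) in place of `newark`, so the values are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for `newark`; every row here is invented for illustration.
toy <- tibble(
  name = c(rep("Airline A", 5), rep("Airline B", 5)),
  distance = c(200, 300, 400, 500, 600, 1000, 2000, 2400, 2500, 2600)
)

# One row per airline with the values a boxplot draws:
# minimum, first quartile, median, third quartile, maximum.
distance_summary <- toy |>
  group_by(name) |>
  summarise(
    flights = n(),
    min = min(distance),
    q1 = quantile(distance, 0.25, names = FALSE),
    median = median(distance),
    q3 = quantile(distance, 0.75, names = FALSE),
    max = max(distance),
    .groups = "drop"
  ) |>
  arrange(median)

print(distance_summary)
```

Replacing `toy` with `newark` should give the same table for the real data, sorted from the shortest to the longest median distance.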
Essay
I decided to create a fairly basic visualization because I believed that even though it was simple, it would reveal a lot. My graph is a series of boxplots showing the distribution of flight distances by airline, for flights from Newark only. I decided to do Newark because I like it more than the other airports, and because I needed to use dplyr commands. I find this graph interesting because you can use it to learn what kind of operation each of these airlines has at Newark.
SkyWest, Republic, Envoy and Endeavor are all regional airlines that contract with some of the bigger airlines to operate their shorter, lower-demand routes. You can see this because their range of distances is narrower and their median distance is shorter than the other airlines'.
The next thing you can see is the group of other major airlines: Allegiant, American, Delta and JetBlue. They don’t have hubs at Newark, but since it serves the New York City metro area, the country’s biggest market, they still have to offer flights there to stay competitive and give their customers what they want. Because of this, they typically only have a few flights to their own hubs across the country, which are farther away, so their distributions have few shorter flights. In Allegiant’s case the minimum, first quartile and median are extremely close together.
Then you have Alaska Airlines, which has a slightly different business model from the rest of the major US airlines, one built primarily on long-haul transcontinental flights. That is why its median is so high and its range so small: it only serves a few destinations that are all similarly far away (Seattle, San Francisco, etc.).
Finally, we get to the two airlines that use Newark as a major hub, Spirit and United. They have much larger ranges than the other airlines because they serve a much wider variety of destinations.
Also, in United’s case, you can see that it is the only airline at Newark to serve Alaska and Hawaii, which are represented by the three outliers that are farther away than any other airline’s destinations.
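The outlier points mentioned above come from the default rule in `geom_boxplot()`, which draws any value more than 1.5 times the interquartile range beyond the box as a separate point. The following is a rough sketch of how those points could be listed by destination. It assumes dplyr is installed, and the `toy` data frame, the airline labels and the destination codes ("AAA", "ZZZ", and so on) are all invented for illustration, not taken from the flight data.

```r
library(dplyr)

# Hypothetical stand-in for `newark`; airline labels, destination codes
# and distances are all made up.
toy <- tibble(
  name = c(rep("Hub carrier", 8), rep("Regional", 5)),
  dest = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG", "ZZZ",
           "HHH", "III", "JJJ", "KKK", "LLL"),
  distance = c(200, 500, 700, 900, 1000, 1200, 1400, 4900,
               150, 200, 250, 300, 350)
)

# Flag flights above the upper fence (Q3 + 1.5 * IQR), computed per airline.
outliers <- toy |>
  group_by(name) |>
  mutate(
    q1 = quantile(distance, 0.25, names = FALSE),
    q3 = quantile(distance, 0.75, names = FALSE),
    upper_fence = q3 + 1.5 * (q3 - q1)
  ) |>
  ungroup() |>
  filter(distance > upper_fence) |>
  distinct(name, dest, distance)

print(outliers)
```

Run against `newark` instead of `toy`, the `dest` column of the result would show which airports sit behind each outlier point, which is one way to check the reading that United's outliers are its Alaska and Hawaii routes.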