noAcronym <- left_join(flights_filter, airlines, by = "carrier")
noAcronym$name <- gsub("Inc\\.|Co\\.", "", noAcronym$name)
noAirline <- noAcronym %>%
  filter(!str_detect(name, "American Airlines")) # Their late value was super high and made the early section look too small, so I filtered American Airlines out
library(ggplot2)
noAirline$deviation <- noAirline$arr_delay # I got tired of rewriting all the underscores when debugging, so I made a new variable
ggplot(noAirline, aes(x = name, y = deviation)) +
  geom_col(aes(fill = deviation > 0), position = position_dodge()) +
  scale_fill_manual(values = c("TRUE" = "lightblue", "FALSE" = "lightpink"), labels = c("FALSE" = "Early", "TRUE" = "Late")) + # Not sure if this command is used in class, but I saw it in "The Book of R" textbook.
  theme_minimal() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "darkgrey") +
  scale_y_continuous(limits = c(-300, 1450)) +
  coord_flip() +
  labs(fill = "Tardiness", x = "Airline Carrier", y = "Typical Arrival Time \n Early Late", caption = "Data gotten from NYC Flights") # I couldn't find out how to line the labels up to both sides of the chart so I had to hard code it... sorry
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_col()`).
I made a bar graph that shows the spread of how early or late each airline's flights tend to arrive. Negative values (early arrivals) and positive values (late arrivals) are filled with different colors, and early flights are drawn in a separate bar from late ones. I think it is interesting to see just how late some flights are; for example, SkyWest Airlines runs significantly later than Envoy Air. Creating separate bars for the positive and negative values was quite a challenge, and I hit a lot of errors before using dplyr commands to build an entirely new dataset derived from the original one.
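As a rough sketch of the "new dataset" idea described above, one way to get one early bar and one late bar per airline is to summarise the delays by airline and direction before plotting. This is not the code I actually ran: the `toy` data frame, the `late` column, and the `by_direction` name are made up for illustration, and using the mean as the summary is an assumption.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the joined flights/airlines table.
toy <- data.frame(
  name = c("Envoy Air", "Envoy Air", "SkyWest Airlines", "SkyWest Airlines"),
  arr_delay = c(-10, 30, -5, 120)
)

# One row per airline and direction, holding the mean arrival delay.
# A delay of exactly 0 is counted with the early group here (an assumption).
by_direction <- toy %>%
  filter(!is.na(arr_delay)) %>%
  mutate(late = arr_delay > 0) %>%
  group_by(name, late) %>%
  summarise(deviation = mean(arr_delay), .groups = "drop")

# Negative and positive values are drawn on opposite sides of zero,
# so each airline gets an early bar and a late bar.
p <- ggplot(by_direction, aes(x = name, y = deviation, fill = late)) +
  geom_col() +
  scale_fill_manual(
    values = c("TRUE" = "lightblue", "FALSE" = "lightpink"),
    labels = c("FALSE" = "Early", "TRUE" = "Late")
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "darkgrey") +
  coord_flip() +
  labs(fill = "Tardiness", x = "Airline Carrier", y = "Mean arrival delay")

print(p)
```

Summarising first means each bar has a single, well-defined height, which should also make it easier to pick y-axis limits that do not drop rows.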