── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycflights23)
data(flights)
data(airlines)

# create the summarized dataset of average arrival delay by airline
avg_delay <- flights %>%
  group_by(carrier) %>%
  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  left_join(airlines, by = "carrier") %>% # Add airline names
  arrange(desc(mean_arr_delay))

# create bar graph
ggplot(avg_delay, aes(x = reorder(name, mean_arr_delay), y = mean_arr_delay, fill = mean_arr_delay > 0)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("steelblue", "firebrick"),
                    labels = c("On Time/Early", "Delayed")) +
  labs(title = "Average Arrival Delay for each Airline",
       x = "Airline",
       y = "Arrival Delay in Minutes",
       fill = "Delay Status",
       caption = "Source: nycflights23 dataset") +
  theme_minimal() +
  coord_flip()
The visualization I created shows the average arrival delay by airline for flights departing from New York City in 2023. Using dplyr, I grouped the data by carrier and calculated the mean arrival delay for each one, ignoring missing values. The bar graph orders the airlines by that average, which makes it easier to compare how each airline ranks relative to the others. I used two colors: blue for airlines whose flights arrived on time or early on average (a mean delay of zero or less), and red for airlines whose flights arrived late on average (a mean delay above zero). Blue is associated with better performance, while red is associated with worse performance.
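To make the grouping and coloring rule concrete, here is a minimal sketch on made-up data. The `toy_flights` table, its carrier codes, and the `status` column are hypothetical and only stand in for the real `flights` data; the point is to show how `na.rm = TRUE` and the `> 0` test behave.

```r
library(dplyr)

# Hypothetical toy data, not taken from nycflights23
toy_flights <- data.frame(
  carrier   = c("AA", "AA", "AA", "BB", "BB"),
  arr_delay = c(10, -4, NA, -6, -2)
)

toy_avg <- toy_flights %>%
  group_by(carrier) %>%
  # na.rm = TRUE drops the missing delay instead of returning NA for "AA"
  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  # same rule as the fill aesthetic: above zero counts as delayed
  mutate(status = if_else(mean_arr_delay > 0, "Delayed", "On Time/Early"))

print(toy_avg)
```

Without `na.rm = TRUE`, the single missing value would make the whole average for that carrier `NA`, and the carrier would have no usable bar in the plot.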
One important takeaway from this plot is how much performance can vary from airline to airline. The visualization shows viewers which airlines perform better on average and which perform worse, and it makes it easy to pick out the specific airlines with the highest and lowest average arrival delay.
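The same extremes can also be read off the summary table directly rather than from the chart. The sketch below assumes a table shaped like `avg_delay` (a `name` column and a `mean_arr_delay` column); the `toy_avg_delay` values and airline names are invented for illustration and are not results from the dataset.

```r
library(dplyr)

# Hypothetical summary table standing in for avg_delay
toy_avg_delay <- data.frame(
  name           = c("Airline A", "Airline B", "Airline C"),
  mean_arr_delay = c(12.5, -3.0, 4.2)
)

# Airline with the highest average delay
worst <- toy_avg_delay %>% slice_max(mean_arr_delay, n = 1)

# Airline with the lowest average delay
best <- toy_avg_delay %>% slice_min(mean_arr_delay, n = 1)

# Gap in minutes between the two extremes
spread <- worst$mean_arr_delay - best$mean_arr_delay

print(worst)
print(best)
print(spread)
```

Note that `slice_max()` and `slice_min()` keep ties by default, so either result can contain more than one row if two airlines share the same average.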