── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)library(ggplot2)
Reading in the data
data_url <-"https://raw.githubusercontent.com/Jeovany97/Data-607/refs/heads/main/Assignment%205a/airline_delays.csv"raw_data <-read_csv(data_url, na =c("", "NA"))
Rows: 5 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Airline, Status
dbl (5): Los Angeles, Phoenix, San Diego, San Francisco, Seattle
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Based on the data given we can see the AM West currently has a much larger operation nearly double the total flights. Despite this AM WEST manages to maintain a slightly lower percentage of delays. Rounding both delay rates to the hundredth that mean a ALASKA customer has 22% increase chance to experience a delay.
Comparing the the delay and arrive time for each City
city_delay_rates <- tidy_flights %>%group_by(Airline, City, Status) %>%summarise(total =sum(Flights), .groups ="drop") %>%pivot_wider(names_from = Status, values_from = total) %>%mutate(total_flights =`on time`+ delayed,delay_rate = delayed / total_flights )kable(city_delay_rates, caption ="Comparsion of delays between each city")
Comparsion of delays between each city
Airline
City
delayed
on time
total_flights
delay_rate
ALASKA
Los Angeles
62
497
559
0.1109123
ALASKA
Phoenix
12
221
233
0.0515021
ALASKA
San Diego
20
212
232
0.0862069
ALASKA
San Francisco
102
503
605
0.1685950
ALASKA
Seattle
305
1841
2146
0.1421249
AM WEST
Los Angeles
117
694
811
0.1442663
AM WEST
Phoenix
415
4840
5255
0.0789724
AM WEST
San Diego
65
383
448
0.1450893
AM WEST
San Francisco
129
320
449
0.2873051
AM WEST
Seattle
61
201
262
0.2328244
While previous data suggested AM WEST to be the better airline this city specific data reveals that ALASKA has the lower data. ALASKA beats AM WEST in every city. The reason AM WEST appeared better is because the concentration in phoenix. AM WEST concentrated 73% of all their flights in the city that has lowest average. This volume bias poisoned the previous data.
Creating graph to show difference between city flights
# Creating graphggplot(city_delay_rates, aes(x = City, y = total_flights, fill = Airline)) +geom_col(position ="dodge") +# Titles and labelslabs(title ="Flight Volume by City and Airline",subtitle ="Comparing the number of flights for each city",x ="Destination City",y ="Total Number of Flights",fill ="Airline" ) +# Clean up the theme and rotate city names for better readabilitytheme_minimal() +theme(axis.text.x =element_text(angle =45, hjust =1))