After loading in the necessary packages, I proceed to read the csv file within Rstudios into the delays_table vector.
delays_table <- read_csv("C:/Users/aldai/Documents/DATA 607/DATA607 Week 4/607Week4.csv")
delays_table
## # A tibble: 5 × 7
## ...1 ...2 `Los Angeles` Phoenix `San Diego` `San Francisco` Seattle
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ALASKA on time 497 221 212 503 1841
## 2 <NA> delayed 62 12 20 102 305
## 3 <NA> <NA> NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 <NA> delayed 117 415 65 129 61
Using pipes and an alternate way of for-looping, the various missing data values, along with the obscurity of the column/row names or lack of is what makes this data table messy. The first subsection/vector assigns some needed columns to the data such as “Destination” for the places of the flights as well as “Flights” as the total count of flights that have either made it “on time” or “delayed”.
The original excel data assumes the names of the airlines and status columns as ‘…1’ and ‘…2’ so that gets renamed to more appropriately distinguish. The “fill” vector will simply assign the following airlines to their respective NA values as there are delayed rows without the airline name.
Lastly, I filter out any remaining NA values assuming its only missing both the flights count and the status, meaning that data was left behind after the cleanup.
flights_clean <- delays_table %>%
pivot_longer(
cols = c("Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"),
names_to = "Destination",
values_to = "Flights"
) %>%
rename(Airline = `...1`, Status = `...2`) %>%
fill(Airline, .direction = "down") %>%
filter(!is.na(Status), !is.na(Flights))
flights_clean
## # A tibble: 20 × 4
## Airline Status Destination Flights
## <chr> <chr> <chr> <dbl>
## 1 ALASKA on time Los Angeles 497
## 2 ALASKA on time Phoenix 221
## 3 ALASKA on time San Diego 212
## 4 ALASKA on time San Francisco 503
## 5 ALASKA on time Seattle 1841
## 6 ALASKA delayed Los Angeles 62
## 7 ALASKA delayed Phoenix 12
## 8 ALASKA delayed San Diego 20
## 9 ALASKA delayed San Francisco 102
## 10 ALASKA delayed Seattle 305
## 11 AM WEST on time Los Angeles 694
## 12 AM WEST on time Phoenix 4840
## 13 AM WEST on time San Diego 383
## 14 AM WEST on time San Francisco 320
## 15 AM WEST on time Seattle 201
## 16 AM WEST delayed Los Angeles 117
## 17 AM WEST delayed Phoenix 415
## 18 AM WEST delayed San Diego 65
## 19 AM WEST delayed San Francisco 129
## 20 AM WEST delayed Seattle 61
The following code is used to summerize the table in terms of the delayed flights followed by the total flights of each airline. Additionally, I provide visualization using a bar graph to show the differences in the ratio of delays.
Based on the summary, Alaska had airlines had a total of 501 delayed flights while America West Airlines had 787 delayed flights, which is more. However, if we look into the ratios of the total delayed flights, Alaska actually has 13.27% delayed ratio while America West Airline has 10.89%. The visualization of this data in the form of a bar graph also proves this point, complimenting the differences between the two airlines.
flights_clean %>%
group_by(Airline) %>%
summarise(
delayed_flights = sum(Flights[Status == "delayed"]),
total_flights = sum(Flights),
ratio_delayed = delayed_flights / total_flights
)
## # A tibble: 2 × 4
## Airline delayed_flights total_flights ratio_delayed
## <chr> <dbl> <dbl> <dbl>
## 1 ALASKA 501 3775 0.133
## 2 AM WEST 787 7225 0.109
delay_summary <- flights_clean %>%
group_by(Airline) %>%
summarise(
delayed_flights = sum(Flights[Status == "delayed"]),
total_flights = sum(Flights),
ratio_delayed = delayed_flights / total_flights
)
ggplot(delay_summary, aes(x = Airline, y = ratio_delayed, fill = Airline)) +
geom_col()
After changing the format to better display the data and to paint a clearer picture for my self, I use the following data to compare the total flights, delays, and on times. This exchange lead me to calcuate the ratios of the flights to determine which airline is more likely to be delayed. Based on my calculations and summary, America West Airlines is less likely to delay flights with flights being 10.89% delayed, in comparison to Alaska Airlines’ flights being 13.27% delayed.