Answer: The .CSV file created will be attached to the assignment file
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
## Load Data into R
Flight_ArrDelay <- read.csv(file="https://raw.githubusercontent.com/Badigun/Data-607-Assignments/refs/heads/main/Flights%20Arrival%3Adelays%20DATA%20607.csv", header = TRUE, sep = ",")
## View the loaded data to make sure it's loaded correctly
head (Flight_ArrDelay)
## X X.1 X.2 X.3
## 1 Destination Alaska (On Time) Alaska (Delayed) AM West (On Time)
## 2 Los Angeles 497 62 694
## 3 Phoenix 221 12 4840
## 4 San Diego 212 20 383
## 5 San Francisco 503 102 320
## 6 Seattle 1841 305 201
## X.4 X.5 X.6
## 1 AM West (Delayed) NA NA
## 2 117 NA NA
## 3 415 NA NA
## 4 65 NA NA
## 5 129 NA NA
## 6 61 NA NA
# Create the dataset Manually
FlightArrDelay <- tibble(
Destination = c("Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"),
`Alaska (On Time)` = c(497, 221, 212, 503, 1841),
`Alaska (Delayed)` = c(62, 12, 20, 102, 305),
`AM West (On Time)` = c(694, 4840, 383, 320, 201),
`AM West (Delayed)` = c(117, 415, 65, 129, 61)
)
# View the data
head(FlightArrDelay)
## # A tibble: 5 × 5
## Destination `Alaska (On Time)` `Alaska (Delayed)` `AM West (On Time)`
## <chr> <dbl> <dbl> <dbl>
## 1 Los Angeles 497 62 694
## 2 Phoenix 221 12 4840
## 3 San Diego 212 20 383
## 4 San Francisco 503 102 320
## 5 Seattle 1841 305 201
## # ℹ 1 more variable: `AM West (Delayed)` <dbl>
# (tidyr) Pivot the data to a long format for easier analysis
longer_FlightArrDelay <- FlightArrDelay %>%
pivot_longer(
cols = c(`Alaska (On Time)`, `Alaska (Delayed)`, `AM West (On Time)`, `AM West (Delayed)`),
names_to = c("Airline", "Status"),
names_pattern = "(.*) \\((.*)\\)",
values_to = "Flights"
)
# View the tidied data
head(longer_FlightArrDelay)
## # A tibble: 6 × 4
## Destination Airline Status Flights
## <chr> <chr> <chr> <dbl>
## 1 Los Angeles Alaska On Time 497
## 2 Los Angeles Alaska Delayed 62
## 3 Los Angeles AM West On Time 694
## 4 Los Angeles AM West Delayed 117
## 5 Phoenix Alaska On Time 221
## 6 Phoenix Alaska Delayed 12
# Filter the data for "On Time" flights
ontime_flights <- longer_FlightArrDelay %>%
filter(Status == "On Time")
# View the filtered data
head(ontime_flights)
## # A tibble: 6 × 4
## Destination Airline Status Flights
## <chr> <chr> <chr> <dbl>
## 1 Los Angeles Alaska On Time 497
## 2 Los Angeles AM West On Time 694
## 3 Phoenix Alaska On Time 221
## 4 Phoenix AM West On Time 4840
## 5 San Diego Alaska On Time 212
## 6 San Diego AM West On Time 383
# Filter the data for "Delayed" flights
Delayed_flights <- longer_FlightArrDelay %>%
filter(Status == "Delayed")
# View the filtered data
head(Delayed_flights)
## # A tibble: 6 × 4
## Destination Airline Status Flights
## <chr> <chr> <chr> <dbl>
## 1 Los Angeles Alaska Delayed 62
## 2 Los Angeles AM West Delayed 117
## 3 Phoenix Alaska Delayed 12
## 4 Phoenix AM West Delayed 415
## 5 San Diego Alaska Delayed 20
## 6 San Diego AM West Delayed 65
# View the total flights for each airline and status
totalflights <- longer_FlightArrDelay %>%
group_by(Airline, Status) %>%
summarise(Total_Flights = sum(Flights)) %>%
ungroup()
## `summarise()` has grouped output by 'Airline'. You can override using the
## `.groups` argument.
print(totalflights)
## # A tibble: 4 × 3
## Airline Status Total_Flights
## <chr> <chr> <dbl>
## 1 AM West Delayed 787
## 2 AM West On Time 6438
## 3 Alaska Delayed 501
## 4 Alaska On Time 3274
# Calculate the proportion of delayed flights for each airline
proportion_delayed <- longer_FlightArrDelay %>%
group_by(Airline) %>%
summarise(
Total_On_Time = sum(Flights[Status == "On Time"]),
Total_Delayed = sum(Flights[Status == "Delayed"]),
Total_Flights = sum(Flights),
Proportion_Delayed = Total_Delayed / Total_Flights
) %>%
ungroup()
print(proportion_delayed)
## # A tibble: 2 × 5
## Airline Total_On_Time Total_Delayed Total_Flights Proportion_Delayed
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AM West 6438 787 7225 0.109
## 2 Alaska 3274 501 3775 0.133
# Calculate the average number of delayed flights for each airline
avg_delayed_flights <- longer_FlightArrDelay %>%
filter(Status == "Delayed") %>%
group_by(Airline) %>%
summarise(Average_Delayed_Flights = mean(Flights)) %>%
ungroup()
print(avg_delayed_flights)
## # A tibble: 2 × 2
## Airline Average_Delayed_Flights
## <chr> <dbl>
## 1 AM West 157.
## 2 Alaska 100.
# Load ggplot2 if not already loaded
library(ggplot2)
# Create a bar chart for the proportion of delayed flights
ggplot(proportion_delayed, aes(x = Airline, y = Proportion_Delayed, fill = Airline)) +
geom_bar(stat = "identity") +
labs(
title = "Proportion of Delayed Flights for Each Airline",
x = "Airline",
y = "Proportion of Delayed Flights"
) +
theme_minimal() +
scale_y_continuous(labels = scales::percent) +
theme(legend.position = "none")
Yes, I did build a tidy data frame using the pivot_longer() function from the tidyr package. For this dataset, the variables include the destination, airline, status (On-time vs. Delayed), and flights. Tidying Process; Initially, the data was in a “wide” format, where the airlines’ delay counts were spread across multiple columns. This wasn’t ideal for analysis because it made it harder to compare airlines, destinations, and delay statuses simultaneously but by using pivot_longer(), I transformed the data so that each observation (a combination of airline, status, and destination) appeared as a single row. This allowed for better comparison and manipulation with dplyr for further analysis.
Total Delays Comparison I first summarized the total delays for both airlines, broken down by On-Time and Delayed. The insight from that analysis is this, Alaska Airlines: Total on-time flights is higher than delayed flights.Total delayed flights is the significant number of delays, but fewer compared to on-time flights. AM West: Total on-time flights is much lower than delayed flights. AM West has more delayed flights compared to on-time flights. Total delayed flights is very high.
Proportion of Delayed Flights Based on the data, I calculated the proportion of delayed flights for each airline. The proportion of delayed flights is the ratio of delayed flights to the total number of flights (both on-time and delayed).
Visualizing the Comparison/Delays: The bar chart would show Alaska Airlines with a taller bar (for the 13.3% delay proportion), indicating a higher percentage of delayed flights, while AM West would have a shorter bar (for the 10.9% delay proportion), suggesting fewer delays relative to the total number of flights.
A paradox seems to appear: AM West, despite being a major airline, seems to have a very high number of delays compared to Alaska. This could be due to: Operational issues, AM West may have more frequent cancellations or delays at certain airports. Seasonal issues or specific destinations where AM West faces more delays (e.g., weather-related, air traffic issues, or other operational difficulties).
The paradox here is that AM West has more delayed flights, yet we might assume a larger airline would generally have better operational efficiency. This might suggest operational challenges specific to AM West.
I would prefer flying with Alaska Airlines because they have more on-time flights compared to AM West. The delays at most destinations are lower for Alaska Airlines. Where?, I’d likely fly to Los Angeles or San Francisco, where both airlines have a relatively balanced number of delays, but overall, the data suggests Alaska Airlines has a better performance in terms of arriving on time.
The analysis of arrival delays for Alaska Airlines and AM West Airlines revealed the following:
Proportion of Delayed Flights: Alaska Airlines has a higher proportion of delayed flights (13.3%) compared to AM West Airlines (10.9%).
Average Number of Delayed Flights: AM West Airlines experiences, on average, more delayed flights (157.4 per destination) than Alaska Airlines (100.2 per destination).
In addition, Alaska Airlines operates more flights overall, which may provide more flexibility despite the higher proportion of delays. The paradox in the data suggests that while AM West has more total delays, Alaska’s delays are more concentrated, leading to a higher proportion. Ultimately, if you value fewer delays relative to flights, AM West is the better option, but Alaska Airlines might be more suitable for those seeking more flight options. ____