# Read the csv file
data <- read.csv("C:\\Users\\HP\\Documents\\arrival delays for two airlines.csv")
print(data)
##   Airline  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  Alaska on time         497     221       212           503    1841
## 2  Alaska delayed          62      12        20           102     305
## 3 Am West on time         694    4840       383           320     201
## 4 Am West delayed         117     415        65           129      61
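
Printing the table is one way to inspect it; a few base R checks can make the inspection more systematic. The following is a minimal sketch, not the author's method. It rebuilds the printed table inline under the hypothetical name `flights` so it runs without the original CSV file, and the specific checks (missing values, non-negative counts, unique airline/status pairs) are assumptions about what "inconsistencies" might mean here.

```r
# Rebuild the printed table inline (values copied from the output above).
# `flights` is a hypothetical name used only for this sketch.
flights <- data.frame(
  Airline       = c("Alaska", "Alaska", "Am West", "Am West"),
  Status        = c("on time", "delayed", "on time", "delayed"),
  Los.Angeles   = c(497L, 62L, 694L, 117L),
  Phoenix       = c(221L, 12L, 4840L, 415L),
  San.Diego     = c(212L, 20L, 383L, 65L),
  San.Francisco = c(503L, 102L, 320L, 129L),
  Seattle       = c(1841L, 305L, 201L, 61L)
)

# Column types and dimensions at a glance.
str(flights)

# Number of missing values in each column.
missing_per_column <- colSums(is.na(flights))

# Every city column should hold non-negative numbers.
city_columns <- setdiff(names(flights), c("Airline", "Status"))
all_counts_valid <- all(sapply(
  flights[city_columns],
  function(x) is.numeric(x) && all(x >= 0)
))

# Each airline/status pair should appear exactly once.
combos_unique <- !any(duplicated(flights[c("Airline", "Status")]))
```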
The section above reads the data from a CSV file and prints it so that it can be inspected for inconsistencies.

### Tidy the data using tidyr and dplyr
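# Load the packages that provide pivot_longer(), arrange(), and the %>% pipe
library(tidyr)
library(dplyr)

# Reshape the five city columns into Destination/Count pairs, then sort
tidy_airline_data <- data %>%
  pivot_longer(cols = c(`Los.Angeles`, Phoenix, `San.Diego`, `San.Francisco`, Seattle),
               names_to = "Destination",
               values_to = "Count") %>%
  arrange(Airline, Status, Destination)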
print(tidy_airline_data)
## # A tibble: 20 × 4
##    Airline Status  Destination   Count
##    <chr>   <chr>   <chr>         <int>
##  1 Alaska  delayed Los.Angeles      62
##  2 Alaska  delayed Phoenix          12
##  3 Alaska  delayed San.Diego        20
##  4 Alaska  delayed San.Francisco   102
##  5 Alaska  delayed Seattle         305
##  6 Alaska  on time Los.Angeles     497
##  7 Alaska  on time Phoenix         221
##  8 Alaska  on time San.Diego       212
##  9 Alaska  on time San.Francisco   503
## 10 Alaska  on time Seattle        1841
## 11 Am West delayed Los.Angeles     117
## 12 Am West delayed Phoenix         415
## 13 Am West delayed San.Diego        65
## 14 Am West delayed San.Francisco   129
## 15 Am West delayed Seattle          61
## 16 Am West on time Los.Angeles     694
## 17 Am West on time Phoenix        4840
## 18 Am West on time San.Diego       383
## 19 Am West on time San.Francisco   320
## 20 Am West on time Seattle         201
In this section, the data was tidied using the tidyr and dplyr packages in R. The pivot_longer function reshaped the data from wide to long format, with one row per airline, status, and destination, which makes it easier to analyze and visualize.
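
The wide-to-long reshape can be illustrated on a smaller table. This is a minimal sketch under stated assumptions: it uses only the Los.Angeles and Phoenix columns from the printed data, the names `wide`, `long`, and `wide_again` are hypothetical, and the negative selection `-c(Airline, Status)` is an alternative to listing every city column by name. The `pivot_wider()` call is included only to show that the reshape is reversible.

```r
library(tidyr)
library(dplyr)

# Two-city subset of the printed table (hypothetical name `wide`).
wide <- tibble(
  Airline     = c("Alaska", "Alaska", "Am West", "Am West"),
  Status      = c("on time", "delayed", "on time", "delayed"),
  Los.Angeles = c(497, 62, 694, 117),
  Phoenix     = c(221, 12, 4840, 415)
)

# Wide to long: every column except Airline and Status becomes a
# Destination/Count pair, giving one row per airline, status, and city.
long <- wide %>%
  pivot_longer(cols = -c(Airline, Status),
               names_to = "Destination",
               values_to = "Count")

# Long back to wide, to show that no information is lost in the reshape.
wide_again <- long %>%
  pivot_wider(names_from = Destination, values_from = Count)

print(long)
```

Selecting columns by exclusion keeps the call unchanged if more destination columns are added later, at the cost of being less explicit about which columns are reshaped.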
# Summarize the on-time and delayed counts by airline
summary_delays <- tidy_airline_data %>%
  group_by(Airline, Status) %>%
  summarize(Total_Count = sum(Count))
## `summarise()` has grouped output by 'Airline'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## # Groups:   Airline [2]
##   Airline Status  Total_Count
##   <chr>   <chr>         <int>
## 1 Alaska  delayed         501
## 2 Alaska  on time        3274
## 3 Am West delayed         787
## 4 Am West on time        6438
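
The message from `summarise()` in the output above refers to its `.groups` argument. As a minimal sketch, assuming an ungrouped result is what is wanted, passing `.groups = "drop"` sets the grouping explicitly so the message is not printed. The names `tidy_counts` and `totals` are hypothetical; the counts are copied from the tidy table printed earlier so the block runs on its own.

```r
library(dplyr)

# Long-format counts copied from the tidy table (hypothetical name).
tidy_counts <- tibble(
  Airline     = rep(c("Alaska", "Am West"), each = 10),
  Status      = rep(rep(c("delayed", "on time"), each = 5), times = 2),
  Destination = rep(c("Los.Angeles", "Phoenix", "San.Diego",
                      "San.Francisco", "Seattle"), times = 4),
  Count       = c(62, 12, 20, 102, 305,
                  497, 221, 212, 503, 1841,
                  117, 415, 65, 129, 61,
                  694, 4840, 383, 320, 201)
)

# Same summary as above, but .groups = "drop" returns an ungrouped tibble.
totals <- tidy_counts %>%
  group_by(Airline, Status) %>%
  summarize(Total_Count = sum(Count), .groups = "drop")

print(totals)
```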
# Plot the delays
library(ggplot2)

# Side-by-side bars of on-time and delayed counts, one panel per airline
ggplot(tidy_airline_data, aes(x = Destination, y = Count, fill = Status)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Airline) +
  theme_minimal() +
  labs(title = "Comparison of Arrival Delays by Airline",
       x = "Destination",
       y = "Count",
       fill = "Status")

The section above computes the summary statistics: the total on-time and delayed counts for each airline. It also draws a bar plot that compares on-time and delayed arrival counts for the two airlines across the different destinations.
Based on the analysis, Am West has far more on-time arrivals than Alaska (6438 versus 3274), largely because of Phoenix, where it recorded 4840 on-time arrivals. Am West also has more delayed arrivals than Alaska (787 versus 501). Because Am West operated roughly twice as many flights in total (7225 versus 3775), these raw counts reflect flight volume as well as punctuality. The plot makes it easy to compare the two airlines across destinations for both statuses.
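
Because the two airlines flew different numbers of flights, delay rates can complement the raw counts. The following is a minimal sketch of one way to compute them, not part of the original analysis. The names `tidy_counts`, `overall_rates`, and `rates_by_destination` are hypothetical, and the counts are copied from the tidy table printed earlier.

```r
library(dplyr)

# Long-format counts copied from the tidy table (hypothetical name).
tidy_counts <- tibble(
  Airline     = rep(c("Alaska", "Am West"), each = 10),
  Status      = rep(rep(c("delayed", "on time"), each = 5), times = 2),
  Destination = rep(c("Los.Angeles", "Phoenix", "San.Diego",
                      "San.Francisco", "Seattle"), times = 4),
  Count       = c(62, 12, 20, 102, 305,
                  497, 221, 212, 503, 1841,
                  117, 415, 65, 129, 61,
                  694, 4840, 383, 320, 201)
)

# Share of delayed flights per airline across all destinations.
overall_rates <- tidy_counts %>%
  group_by(Airline) %>%
  summarize(delayed    = sum(Count[Status == "delayed"]),
            total      = sum(Count),
            delay_rate = delayed / total,
            .groups    = "drop")

# Share of delayed flights per airline at each destination.
rates_by_destination <- tidy_counts %>%
  group_by(Airline, Destination) %>%
  summarize(delay_rate = sum(Count[Status == "delayed"]) / sum(Count),
            .groups    = "drop")

print(overall_rates)
print(rates_by_destination)
```

On the counts printed earlier, this gives an overall delay rate of about 13.3% for Alaska (501 of 3775) and about 10.9% for Am West (787 of 7225). The per-destination rates are worth checking alongside the overall figures, since the airlines' flights are distributed very differently across the five cities.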
…