We will analyze the following table to compare arrival delays for two airlines, Alaska and AM West.
library(tidyr)
library(dplyr)
library(magrittr)
library(knitr)
library(ggplot2)
# Read the csv file
airline <- read.csv(file = "airline_arrivals.csv", stringsAsFactors = FALSE)
kable(airline)
| Airline | ArrivalStatus | Los.Angeles | Phoenix | San.Diego | San.Francisco | Seattle |
|---|---|---|---|---|---|---|
| ALASKA | on time | 497 | 221 | 212 | 503 | 1841 |
| ALASKA | delayed | 62 | 12 | 20 | 102 | 305 |
| AM WEST | on time | 694 | 4840 | 383 | 320 | 201 |
| AM WEST | delayed | 117 | 415 | 65 | 129 | 61 |
# Reshape the table from wide to long and keep only the delayed rows
airline_delayed <- airline %>%
  gather("ArrivalCity", "Flights", 3:7) %>%
  arrange(Airline) %>%
  filter(ArrivalStatus == "delayed") %>%
  select(Airline, ArrivalCity, Flights)
# Keep the on-time rows, needed as the other half of the delay-rate denominator
airline_ontime <- airline %>%
  gather("ArrivalCity", "Flights", 3:7) %>%
  arrange(Airline) %>%
  filter(ArrivalStatus == "on time") %>%
  select(Airline, ArrivalCity, Flights)
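# Add DelayRate column: delayed / (delayed + on time)
# Note: this relies on airline_delayed and airline_ontime having the same row order
airline_delayed["DelayRate"] <- round(airline_delayed$Flights / (airline_delayed$Flights + airline_ontime$Flights), 2)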
kable(airline_delayed)
| Airline | ArrivalCity | Flights | DelayRate |
|---|---|---|---|
| ALASKA | Los.Angeles | 62 | 0.11 |
| ALASKA | Phoenix | 12 | 0.05 |
| ALASKA | San.Diego | 20 | 0.09 |
| ALASKA | San.Francisco | 102 | 0.17 |
| ALASKA | Seattle | 305 | 0.14 |
| AM WEST | Los.Angeles | 117 | 0.14 |
| AM WEST | Phoenix | 415 | 0.08 |
| AM WEST | San.Diego | 65 | 0.15 |
| AM WEST | San.Francisco | 129 | 0.29 |
| AM WEST | Seattle | 61 | 0.23 |
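The DelayRate step above divides two separate data frames row by row, so it only works while both stay in the same order. As a hedged alternative, the sketch below keeps the delayed and on-time counts side by side in one data frame before dividing. It assumes tidyr 1.0.0 or later (for `pivot_longer()` and `pivot_wider()`), rebuilds the input inline so it runs without the CSV file, and uses `delay_rates` as a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same counts as the table above, rebuilt inline so the sketch is self-contained
airline <- data.frame(
  Airline = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  ArrivalStatus = c("on time", "delayed", "on time", "delayed"),
  Los.Angeles = c(497, 62, 694, 117),
  Phoenix = c(221, 12, 4840, 415),
  San.Diego = c(212, 20, 383, 65),
  San.Francisco = c(503, 102, 320, 129),
  Seattle = c(1841, 305, 201, 61),
  stringsAsFactors = FALSE
)

delay_rates <- airline %>%
  # Wide to long: one row per airline, status, and city
  pivot_longer(cols = 3:7, names_to = "ArrivalCity", values_to = "Flights") %>%
  # Put "on time" and "delayed" counts in separate columns of the same row
  pivot_wider(names_from = ArrivalStatus, values_from = Flights) %>%
  # The division no longer depends on row order across data frames
  mutate(DelayRate = round(delayed / (delayed + `on time`), 2)) %>%
  arrange(Airline, ArrivalCity)

print(delay_rates)
```

Because each row carries both counts, filtering or reordering the data cannot silently misalign the numerator and denominator.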
# Plot airlines by number of delays
p <- ggplot(airline_delayed, aes(fill=Airline, x=ArrivalCity, y=Flights))
p + geom_bar(position="dodge", stat="identity") + labs(title = "Delay in Numbers")
Judging by the raw delay counts in the chart above, AM West has more delayed flights than Alaska in four of the five cities (80%); Seattle is the only exception. The gap is largest in Phoenix, where AM West lands far more flights than Alaska (5,255 versus 233 in total), so a higher delay count is expected there.
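The four-out-of-five figure can be checked directly from the delayed counts. This is a minimal sketch using the numbers from the table above; the `delayed_counts` data frame and its column names are introduced here for illustration only.

```r
# Delayed-flight counts per city, copied from the table above
delayed_counts <- data.frame(
  ArrivalCity = c("Los.Angeles", "Phoenix", "San.Diego", "San.Francisco", "Seattle"),
  ALASKA = c(62, 12, 20, 102, 305),
  AM_WEST = c(117, 415, 65, 129, 61)
)

# TRUE where AM West has more delayed flights than Alaska
am_west_more <- delayed_counts$AM_WEST > delayed_counts$ALASKA

sum(am_west_more)   # 4 cities
mean(am_west_more)  # 0.8, i.e. 80% of the cities

# The one city where Alaska has more delayed flights
delayed_counts$ArrivalCity[!am_west_more]  # "Seattle"
```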
# Plot airlines by percentage of delays
p2 <- ggplot(airline_delayed, aes(fill=Airline, x=ArrivalCity, y=DelayRate))
p2 + geom_bar(position="dodge", stat="identity") + scale_fill_brewer(palette="Set2") + labs(title = "Delay in Percentages")
Looking at delay rates rather than counts, AM West again comes out worse, and this time at all five destinations, including Seattle (0.23 versus 0.14 for Alaska)!
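The rate chart is titled in percentages while DelayRate is stored as a proportion (0.05 to 0.29). One possible way to make the axis match the title is to format the labels with `scales::percent`, as in the sketch below. The `rates` data frame is rebuilt here from the DelayRate table so the example runs on its own, and `p_pct` is a name introduced for illustration.

```r
library(ggplot2)

# Delay rates copied from the DelayRate table above
rates <- data.frame(
  Airline = rep(c("ALASKA", "AM WEST"), each = 5),
  ArrivalCity = rep(c("Los.Angeles", "Phoenix", "San.Diego", "San.Francisco", "Seattle"), times = 2),
  DelayRate = c(0.11, 0.05, 0.09, 0.17, 0.14, 0.14, 0.08, 0.15, 0.29, 0.23)
)

# Same dodged bar chart as before, with the y axis labeled in percent
p_pct <- ggplot(rates, aes(fill = Airline, x = ArrivalCity, y = DelayRate)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Delay in Percentages", y = "Share of arrivals delayed")

print(p_pct)
```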
```