This homework uses dplyr, gridExtra, ggplot2 and tidyr not included in code.
flights <- read.csv("C:/Users/Exped/Desktop/Textbooks/607 Homeworks/Completed607/flights.csv")
flights
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seatlle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AMWEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
flights = flights[!is.na(flights$Los.Angeles),]
names(flights)[1:2] = c("Airline","Arrival")
flights
## Airline Arrival Los.Angeles Phoenix San.Diego San.Francisco Seatlle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 4 AMWEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
If there were more than 4 rows, I Would use the seq() method to fill in the table. (Provided there was an innate order to the flights data.)
flights$Airline[2] = 'ALASKA'
flights$Airline[4] = 'AMWEST'
The next two lines of code, we use Tidyr to gather all the individual city columns between columns 2 and 8 Once we have that, we continue to tidy the dataframe by spreading the data so that The Arrival column, becomes less redundant by turning its complementary values into their own columns, while eliminating the Arrival column.
flights = gather(flights,"City","Freq",3:7)
flights = spread(flights,Arrival,Freq)
flights = mutate(flights,
total = delayed + `on time`)
flights = mutate(flights,
delay = round(100*delayed/(`on time` + delayed),2))
flights
## Airline City delayed on time total delay
## 1 ALASKA Los.Angeles 62 497 559 11.09
## 2 ALASKA Phoenix 12 221 233 5.15
## 3 ALASKA San.Diego 20 212 232 8.62
## 4 ALASKA San.Francisco 102 503 605 16.86
## 5 ALASKA Seatlle 305 1841 2146 14.21
## 6 AMWEST Los.Angeles 117 694 811 14.43
## 7 AMWEST Phoenix 415 4840 5255 7.90
## 8 AMWEST San.Diego 65 383 448 14.51
## 9 AMWEST San.Francisco 129 320 449 28.73
## 10 AMWEST Seatlle 61 201 262 23.28
avgDelayALASKA = mean(select(flights,delay)[[1]][1:5])
avgDelayAMWEST = mean(select(flights,delay)[[1]][6:10])
avgDelayALASKA # Average of probability of delayed flights per city (ALASKA airlines)
## [1] 11.186
avgDelayAMWEST # Average of probability of delayed flights per city (AMWEST airlines)
## [1] 17.77
The numbers above imply that if ALASKA airlines were to host flights to other cities, we could expect less delayed flights than if AMWEST were to host the same flights. If we look at the data from some ggplots, we’ll see that AMWEST has more delays in every city, and conducts a majority of their flights from Phoenix. Interesting to note in regards to AMWEST, that as the number of flights per city decrease, the ratio of delayed flights increase.
figure1 =ggplot(flights, aes(x = Airline, y=delay/100, fill = City)) +
geom_bar(stat="identity",position="dodge") +
xlab("Airlines") + ylab("Average probability of delay")
figure2 = ggplot(flights, aes(x = Airline, y=total, fill = City)) +
geom_bar(stat="identity",position="dodge") +
xlab("Airlines") + ylab("Amount of flights")
grid.arrange(figure1,figure2,ncol=2)
Next we demonstrate some piping (not TOO good at yet, getting then hang of it), and we get the average probability of delay per airline. This is a better approach than the average of individual probabilities because it accounts for the variance in number of city flights.
summarisedData = flights %>% group_by(Airline) %>% summarise(totalDelayed = sum(delayed), totalOntime = sum(`total`), avgDelay = totalDelayed/totalOntime*100)
summarisedData
## # A tibble: 2 × 4
## Airline totalDelayed totalOntime avgDelay
## <fctr> <int> <int> <dbl>
## 1 ALASKA 501 3775 13.27152
## 2 AMWEST 787 7225 10.89273
From here, we see that the above statement is not wrong, but does not give us an accurate account to the majority of all flights per airline; because AMWEST hosts a majority of their flights from Phoenix, and there are fewer delays in Phoenix for AMWEST, the actual probability of a delay for a randomly selected flight from AMWEST is actually lower than ALASKA airlines which did show us less probability for delays specific for a newly acquired destination (city.)