## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: ggplot2
## Loading required package: tidyr
airlines <- read.csv("https://raw.githubusercontent.com/wco1216/Data-607/master/hw5.csv", header = TRUE, sep = ",")
airlines
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 Alaska on time 497 221 212 503 1840
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM West on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
airlines$X[2] <- "Alaska"
airlines$X[5] <- "AM West"
airlines <- airlines[-3,]
colnames(airlines) <- c("Airline", "Status", "LA", "Phoenix", "San Diego", "San Fran", "Seattle")
airlines
## Airline Status LA Phoenix San Diego San Fran Seattle
## 1 Alaska on time 497 221 212 503 1840
## 2 Alaska delayed 62 12 20 102 305
## 4 AM West on time 694 4840 383 320 201
## 5 AM West delayed 117 415 65 129 61
airlines <- gather(airlines, "city", "n", 3:7)
airlines <- spread(airlines, Status, n)
airlines <- mutate(airlines, ratio = round(delayed / (delayed + `on time`), 3))
airlines
## Airline city delayed on time ratio
## 1 Alaska LA 62 497 0.111
## 2 Alaska Phoenix 12 221 0.052
## 3 Alaska San Diego 20 212 0.086
## 4 Alaska San Fran 102 503 0.169
## 5 Alaska Seattle 305 1840 0.142
## 6 AM West LA 117 694 0.144
## 7 AM West Phoenix 415 4840 0.079
## 8 AM West San Diego 65 383 0.145
## 9 AM West San Fran 129 320 0.287
## 10 AM West Seattle 61 201 0.233
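
The reshaping above relies on `gather()` and `spread()`, which tidyr has since superseded with `pivot_longer()` and `pivot_wider()`. The sketch below shows what the same steps might look like with the newer functions, assuming tidyr 1.0.0 or later is installed. The names `airlines_wide` and `airlines_tidy` are illustrative, and the data frame is re-typed from the cleaned table printed earlier rather than read from the CSV.

```r
library(dplyr)
library(tidyr)

# Cleaned wide table, re-typed from the output shown above (illustrative name)
airlines_wide <- data.frame(
  Airline     = c("Alaska", "Alaska", "AM West", "AM West"),
  Status      = c("on time", "delayed", "on time", "delayed"),
  LA          = c(497, 62, 694, 117),
  Phoenix     = c(221, 12, 4840, 415),
  `San Diego` = c(212, 20, 383, 65),
  `San Fran`  = c(503, 102, 320, 129),
  Seattle     = c(1840, 305, 201, 61),
  check.names = FALSE  # keep the spaces in "San Diego" and "San Fran"
)

airlines_tidy <- airlines_wide %>%
  # one row per airline, status, and city
  pivot_longer(cols = -c(Airline, Status), names_to = "city", values_to = "n") %>%
  # one column per status: `on time` and delayed
  pivot_wider(names_from = Status, values_from = n) %>%
  # share of flights that were delayed
  mutate(ratio = round(delayed / (delayed + `on time`), 3))

airlines_tidy
```

The result has the same ten airline and city rows as the table above, although the `on time` and `delayed` columns may appear in a different order.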
ggplot(airlines, aes(city, ratio)) +
  geom_boxplot() +
  labs(x = "City",
       y = "Probability of Being Late")

Shorter boxes, such as those for Phoenix and LA, mean the two airlines have similar ratios of late flights in that city. If both airlines are late at approximately the same rate, can we assume the cause lies with the airport rather than the airline? Perhaps local conditions such as weather produce a consistent share of delayed flights in Phoenix and LA, whichever airline you fly.
Notice in the box plot above that Phoenix, LA, and San Diego all show a narrower spread between the ‘late’ ratios. Each ‘late’ ratio is the number of delayed flights divided by the total number of flights for one airline in one city, so each box summarizes only two values. A narrow spread means both airlines are delayed about the same percentage of the time. From this we conclude that the delays may not be the airline’s fault: if both airlines are late at similar rates, the cause may be something about the airport itself, such as the local weather at Phoenix or LA. San Francisco and Seattle have wider spreads, meaning one airline has a noticeably higher likelihood of being late there. When flying to San Francisco or Seattle, it is therefore smart to do your research on the airline.
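
To put numbers on the spread described above, one option is to compute the gap between the two airlines' delay ratios in each city. This is a minimal base R sketch, not part of the original analysis. The names `rates` and `gap` are illustrative, and the counts are re-typed from the reshaped table shown earlier.

```r
# Delayed and on-time counts per airline and city, re-typed from the table above
rates <- data.frame(
  Airline = rep(c("Alaska", "AM West"), each = 5),
  city    = rep(c("LA", "Phoenix", "San Diego", "San Fran", "Seattle"), times = 2),
  delayed = c(62, 12, 20, 102, 305, 117, 415, 65, 129, 61),
  on_time = c(497, 221, 212, 503, 1840, 694, 4840, 383, 320, 201)
)

# Share of flights that were delayed
rates$ratio <- rates$delayed / (rates$delayed + rates$on_time)

# Gap between the higher and lower delay ratio in each city
gap <- vapply(
  split(rates$ratio, rates$city),
  function(r) diff(range(r)),
  numeric(1)
)

# Smallest gap first
sort(round(gap, 3))
```

Sorted this way, Phoenix and LA have the smallest gaps and San Francisco the largest, which matches the box heights in the plot.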