Github for homework: https://github.com/rweberc/Data607_Assignment5
schedDf <- read.csv("https://raw.githubusercontent.com/rweberc/Data607_Assignment5/master/PlaneTimes.csv",
stringsAsFactors = FALSE)
schedDf
## X X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
colnames(schedDf)[1:2] <- c("company", "timeType")
schedDf
## company timeType Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 NA NA NA NA NA
## 4 AM WEST on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
schedDf <- schedDf %>% filter(timeType != "")
schedDf
## company timeType Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 delayed 117 415 65 129 61
schedDf$company[schedDf$company == ""] <- NA
schedDf <- schedDf %>% fill(company)
schedDf
## company timeType Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 ALASKA on time 497 221 212 503 1841
## 2 ALASKA delayed 62 12 20 102 305
## 3 AM WEST on time 694 4840 383 320 201
## 4 AM WEST delayed 117 415 65 129 61
schedDf <- gather(schedDf, "destination", "occurrences", 3:length(schedDf))
schedDf$occurrences <- as.numeric(schedDf$occurrences)
schedDf$timeType[schedDf$timeType == "on time"] <- "on.time"
schedDf$destination <- str_replace(schedDf$destination, "\\.", " ")
Create two dataframes: one with overall values (compDf), another that preserves the destination information (compDestDf)
compDf <- schedDf %>%
group_by(company, timeType) %>%
summarize(occurrences = sum(occurrences))
compDestDf <- schedDf %>% spread(timeType, occurrences)
compDf <- compDf %>% spread(timeType, occurrences)
compDestDf
## company destination delayed on.time
## 1 ALASKA Los Angeles 62 497
## 2 ALASKA Phoenix 12 221
## 3 ALASKA San Diego 20 212
## 4 ALASKA San Francisco 102 503
## 5 ALASKA Seattle 305 1841
## 6 AM WEST Los Angeles 117 694
## 7 AM WEST Phoenix 415 4840
## 8 AM WEST San Diego 65 383
## 9 AM WEST San Francisco 129 320
## 10 AM WEST Seattle 61 201
compDf
## # A tibble: 2 x 3
## # Groups: company [2]
## company delayed on.time
## <chr> <dbl> <dbl>
## 1 ALASKA 501 3274
## 2 AM WEST 787 6438
compDf$percentDelayed <- round(compDf$delayed/(compDf$delayed + compDf$on.time), 2)
compDestDf$percentDelayed <- round(compDestDf$delayed/(compDestDf$delayed + compDestDf$on.time), 2)
compDestDf
## company destination delayed on.time percentDelayed
## 1 ALASKA Los Angeles 62 497 0.11
## 2 ALASKA Phoenix 12 221 0.05
## 3 ALASKA San Diego 20 212 0.09
## 4 ALASKA San Francisco 102 503 0.17
## 5 ALASKA Seattle 305 1841 0.14
## 6 AM WEST Los Angeles 117 694 0.14
## 7 AM WEST Phoenix 415 4840 0.08
## 8 AM WEST San Diego 65 383 0.15
## 9 AM WEST San Francisco 129 320 0.29
## 10 AM WEST Seattle 61 201 0.23
compDf
## # A tibble: 2 x 4
## # Groups: company [2]
## company delayed on.time percentDelayed
## <chr> <dbl> <dbl> <dbl>
## 1 ALASKA 501 3274 0.130
## 2 AM WEST 787 6438 0.110
ggplot(compDf, aes(x=company, y=percentDelayed)) +
geom_bar(stat = "identity") +
labs(title = "Overall Percent of Delayed Flights", x = "Company", y = "Percent Delayed Flights") +
theme(plot.title = element_text(hjust = 0.5))
From the above, we can see that even though AM WEST has a slightly lower percentage of delayed flights than ALASKA; however…
ggplot(compDestDf, aes(x=company, y=percentDelayed)) +
geom_bar(aes(fill = destination), position = "dodge", stat = "identity") +
labs(title = "Delayed Flights By Destination", x = "Company", y = "Percent Delayed Flights") +
theme(plot.title = element_text(hjust = 0.5))
Looking at individual destinations, we can see that trips to Phoenix have the smallest percentage of delayed flights for both companies, and AM WEST happens to have the majority of their arrivals to this location. AM WEST actually has a much greater percentage of delayed flights to all locations.