Your task is to:
Reading in the table and removing the na’s from the empty row.
library(tidyverse)
arrivals <- read.csv("Arrivals.csv", sep = ",", stringsAsFactors = FALSE)
arrivals_wide <- arrivals %>% drop_na()
arrivals_wide## Airlines Arrivals Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1 Alaska on time 497 221 212 503 1841
## 2 delayed 62 12 20 102 305
## 4 AM West on time 694 4840 383 320 201
## 5 delayed 117 415 65 129 61
str(arrivals_wide)## 'data.frame': 4 obs. of 7 variables:
## $ Airlines : chr "Alaska" "" "AM West" ""
## $ Arrivals : chr "on time" "delayed" "on time " "delayed"
## $ Los.Angeles : int 497 62 694 117
## $ Phoenix : int 221 12 4840 415
## $ San.Diego : int 212 20 383 65
## $ San.Francisco: int 503 102 320 129
## $ Seattle : int 1841 305 201 61
Tidying the wide data by first cleaning up the Arrivals, replacing empty spaces with NA and trimming the empty space around Arrivals, followed by filling in the Airlines for each arrival. The final step is gathering the on time and delayed arrivals in each city and arranging by Airline:
arrivals_long <- arrivals_wide %>%
mutate(Airline = ifelse(Airlines == "", NA, Airlines)) %>%
mutate(Arrival = str_trim(Arrivals, side="both")) %>%
select(-Airlines, -Arrivals) %>%
fill(Airline, .direction = "down") %>%
gather("City", "NumFlights", Los.Angeles:Seattle) %>%
arrange(Airline)
arrivals_long## Airline Arrival City NumFlights
## 1 Alaska on time Los.Angeles 497
## 2 Alaska delayed Los.Angeles 62
## 3 Alaska on time Phoenix 221
## 4 Alaska delayed Phoenix 12
## 5 Alaska on time San.Diego 212
## 6 Alaska delayed San.Diego 20
## 7 Alaska on time San.Francisco 503
## 8 Alaska delayed San.Francisco 102
## 9 Alaska on time Seattle 1841
## 10 Alaska delayed Seattle 305
## 11 AM West on time Los.Angeles 694
## 12 AM West delayed Los.Angeles 117
## 13 AM West on time Phoenix 4840
## 14 AM West delayed Phoenix 415
## 15 AM West on time San.Diego 383
## 16 AM West delayed San.Diego 65
## 17 AM West on time San.Francisco 320
## 18 AM West delayed San.Francisco 129
## 19 AM West on time Seattle 201
## 20 AM West delayed Seattle 61
Comparing the on time arrival percentages for each airline.
ggplot(arrivals_long, aes(x=Airline, y=NumFlights)) +
geom_bar(stat="identity", aes(fill=Arrival), position="fill") +
ylab("Ratio of Flights") +
coord_flip() +
theme(legend.position = "top") AM West’s on time percentages are higher than Alaska’s when comparing the overall set of arrivals.
And further comparing the arrival on time arrival percentages by city.
ggplot(arrivals_long, aes(x=City, y=NumFlights)) +
geom_bar(stat="identity", aes(fill=Arrival), position="fill") +
ylab("Ratio of Flights") +
coord_flip() +
theme(legend.position = "top") Phoenix has the best on time percentage while San Francsico has the worst on time arrival percentage.