For this assignment, I created a wide table, with the arrival data for AM West and Alaska, in Postgresql for further transformation and analysis.
library(tidyverse)
## -- Attaching packages ---------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(DBI)
Results are hidden as it contains db access info.
dbListTables(con)
## [1] "arrivals"
Data <- dbGetQuery(con,"Select * from Arrivals")
Df <- data.frame(Data)
df <- Df %>%
gather(Destination,flights,
Los.Angeles,
Phoenix,
San.Diego,
San.Francisco,
Seattle)
df
## Airline Delays Destination flights
## 1 Alaska On-Time Los.Angeles 497
## 2 Alaska Delayed Los.Angeles 62
## 3 AM West On time Los.Angeles 694
## 4 AM West Delayed Los.Angeles 117
## 5 Alaska On-Time Phoenix 221
## 6 Alaska Delayed Phoenix 12
## 7 AM West On time Phoenix 4840
## 8 AM West Delayed Phoenix 415
## 9 Alaska On-Time San.Diego 212
## 10 Alaska Delayed San.Diego 20
## 11 AM West On time San.Diego 383
## 12 AM West Delayed San.Diego 65
## 13 Alaska On-Time San.Francisco 503
## 14 Alaska Delayed San.Francisco 102
## 15 AM West On time San.Francisco 320
## 16 AM West Delayed San.Francisco 129
## 17 Alaska On-Time Seattle 1841
## 18 Alaska Delayed Seattle 305
## 19 AM West On time Seattle 201
## 20 AM West Delayed Seattle 61
OT <- df %>%
group_by(Airline) %>%
filter(Delays == 'On-Time'|Delays == "On time" ) %>%
summarize(average_ot=mean(flights))
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(OT)+
geom_col(mapping=aes(x=Airline,y=average_ot,fill=Airline))+
ggtitle("On Time Flights by Airline")
AM West seems to have more on-time arrivals than Alaska.
OTCity <- df %>%
group_by(Airline,Destination) %>%
filter(Delays == 'On-Time'|Delays == "On time" ) %>%
summarize(flights)
## `summarise()` regrouping output by 'Airline' (override with `.groups` argument)
ggplot(OTCity)+
geom_col(mapping=aes(x=Destination,y=flights,fill=Airline))+
ggtitle("On Time Flights by Airline Per City")
Looks like AM West has had more on-time arrivals than Alaska in each city as well,except for in SF and Seattle.
df %>%
group_by(Airline) %>%
filter(Delays == 'Delayed' ) %>%
summarize(average_ot=mean(flights))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## Airline average_ot
## <chr> <dbl>
## 1 Alaska 100.
## 2 AM West 157.
It looks like AM West has more delays as well overall. What about per city:
DelayCity <- df %>%
group_by(Airline,Destination) %>%
filter(Delays == 'Delayed' ) %>%
summarize(flights)
## `summarise()` regrouping output by 'Airline' (override with `.groups` argument)
arrange(DelayCity,flights)
## # A tibble: 10 x 3
## # Groups: Airline [2]
## Airline Destination flights
## <chr> <chr> <int>
## 1 Alaska Phoenix 12
## 2 Alaska San.Diego 20
## 3 AM West Seattle 61
## 4 Alaska Los.Angeles 62
## 5 AM West San.Diego 65
## 6 Alaska San.Francisco 102
## 7 AM West Los.Angeles 117
## 8 AM West San.Francisco 129
## 9 Alaska Seattle 305
## 10 AM West Phoenix 415
Seems like AM West has more delays per city as well.
AM West offers more flights than Alaska, which may explain why it has more of both on-time arrivals and delayed arrivals.
df %>%
group_by(Airline) %>%
summarize(ot_flights=sum(flights))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## Airline ot_flights
## <chr> <int>
## 1 Alaska 3775
## 2 AM West 7225