Libraries
library(tidyverse)
library(kableExtra)
## -- Attaching packages -------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Load data
airlines <- read.csv("https://raw.githubusercontent.com/jnataky/DATA-607/master/A5_Data_transformation/airlines_dest.csv")
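The code below works on `airlines_df`, which has one row per destination and carrier and the columns `ontime_percent`, `delayed_percent` and `n_total`; the step that builds it from `airlines` is not shown in this section. As a minimal sketch only, assuming a wide layout (one row per carrier and status, one column per destination), such a table could be derived as follows. The tibble `airlines_wide`, its counts, and the names `airlines_df_sketch`, `n_ontime` and `n_delayed` are hypothetical and are not taken from the CSV.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format input; the counts are made up for illustration
# and are NOT the contents of airlines_dest.csv.
airlines_wide <- tibble::tribble(
  ~carrier, ~status,   ~city_a, ~city_b,
  "ALPHA",  "on time",  90,      700,
  "ALPHA",  "delayed",  10,      300,
  "BETA",   "on time",  850,     60,
  "BETA",   "delayed",  150,     40
)

airlines_df_sketch <- airlines_wide %>%
  # Stack the destination columns into (dest, n) pairs
  pivot_longer(cols = c(city_a, city_b), names_to = "dest", values_to = "n") %>%
  # Spread the status values into one count column per outcome
  pivot_wider(names_from = status, values_from = n) %>%
  rename(n_ontime = `on time`, n_delayed = delayed) %>%
  # Derive the total and the two rates (stored as proportions, not percentages)
  mutate(
    n_total = n_ontime + n_delayed,
    ontime_percent = round(n_ontime / n_total, 3),
    delayed_percent = round(n_delayed / n_total, 3)
  )
```

If the real file has a different layout, the `pivot_longer()` and `pivot_wider()` arguments would need to change accordingly.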
Data Analysis
Plotting the airlines' performance
On-time comparison of airlines per city
# On-time rate for each destination/carrier pair
on_time1 <- airlines_df %>%
  group_by(dest, carrier) %>%
  summarise(ontime_percent)
## `summarise()` regrouping output by 'dest' (override with `.groups` argument)
# Render the result as a styled table
on_time1 %>%
  kbl(caption = "On time performance per city", align = 'c') %>%
  kable_material(c("striped", "hover")) %>%
  row_spec(0, color = "indigo")
On time performance per city
| dest | carrier | ontime_percent |
| --- | --- | --- |
| los_angeles | ALASKA | 0.889 |
| los_angeles | AM WEST | 0.856 |
| phoenix | ALASKA | 0.948 |
| phoenix | AM WEST | 0.921 |
| san_diego | ALASKA | 0.914 |
| san_diego | AM WEST | 0.855 |
| san_francisco | ALASKA | 0.831 |
| san_francisco | AM WEST | 0.713 |
| seattle | ALASKA | 0.858 |
| seattle | AM WEST | 0.767 |
# Plotting on time performance
ggplot(data = on_time1, aes(x = dest, y = ontime_percent, fill = carrier)) +
  geom_col(position = "dodge") +
  xlab("City") + ylab("On time %") + ggtitle("Carriers on time performance per city")

A note on the discrepancy
In per-city on-time performance, Alaska Airlines does better than AM West in every city, yet it is worse in overall on-time performance. The fact that AM West does not come out ahead in any single city might be due to the number of flights it operates in each one.
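How a carrier can lead in every city and still trail overall is easier to see with numbers. The overall rate is total on-time flights divided by total flights, so each city's rate is weighted by how many flights the carrier operates there. The sketch below uses made-up counts for two hypothetical carriers (`ALPHA`, `BETA`) and two hypothetical cities; none of these values come from the assignment data.

```r
library(dplyr)

# Hypothetical counts, chosen only to illustrate the effect
toy <- tibble::tribble(
  ~carrier, ~dest,    ~n_ontime, ~n_total,
  "ALPHA",  "city_a",  90,        100,
  "ALPHA",  "city_b",  700,       1000,
  "BETA",   "city_a",  850,       1000,
  "BETA",   "city_b",  60,        100
)

# Per-city rates: ALPHA is ahead in both cities (0.90 vs 0.85, 0.70 vs 0.60)
per_city <- toy %>%
  mutate(ontime_rate = n_ontime / n_total)

# Overall rates: pool the counts first, then divide.
# ALPHA: 790 / 1100, BETA: 910 / 1100, so BETA is ahead overall.
overall <- toy %>%
  group_by(carrier) %>%
  summarise(ontime_rate = sum(n_ontime) / sum(n_total), .groups = "drop")
```

In this toy case `ALPHA` flies mostly to the city where both carriers do poorly, which pulls its pooled rate down even though it wins each city-level comparison.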
Takeaway
The analysis shows a discrepancy between per-city performance and overall performance.
Before drawing conclusions, let's look at the delays and at the total number of flights per city, and see what they show.
Graphs and insights
# Delay rate for each destination/carrier pair
on_time3 <- airlines_df %>%
  group_by(dest, carrier) %>%
  summarise(delayed_percent)
## `summarise()` regrouping output by 'dest' (override with `.groups` argument)
# Plotting delays
ggplot(data = on_time3, aes(x = dest, y = delayed_percent, fill = carrier)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("City") + ylab("Delayed %") + ggtitle("Carriers delay per city")

# Total number of flights for each destination/carrier pair
on_time4 <- airlines_df %>%
  group_by(dest, carrier) %>%
  summarise(n_total)
## `summarise()` regrouping output by 'dest' (override with `.groups` argument)
# Plotting number of flights
ggplot(data = on_time4, aes(x = dest, y = n_total, fill = carrier)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("City") + ylab("Number of flights") + ggtitle("Number of flights per city")

Conclusion
Looking at the graphs above, we can see that Alaska operates more flights than AM West in Seattle and San Francisco, although in San Francisco the difference in the number of flights is small. These differences in flight volume may help explain why Alaska Airlines beats AM West in per-city performance. In the cities where AM West operates more flights, it operates significantly more than Alaska, which could explain why it has more delays there. Taking into consideration the number of flights AM West operates in these 5 cities, I would recommend that AM West review its reservation system; with that, it could perform much better than Alaska. On the other hand, Alaska Airlines needs to review the internal problems that cause its flights to be delayed.
