The purpose of this assignment is to load a given CSV file and analyze it to compare two airlines. The comparisons are based on the arrival delays for both airlines. When complete, the R Markdown file will be posted to GitHub and RPubs.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Flights = read.csv('https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/Assignment%205/Assignment5_Flights.csv');
names(Flights)[1:2] <- c('Airline','Delay')
Flights = Flights[-3, ]
Flights[2,1] = 'Alaska'
Flights[4,1] = 'AM West'
Flights = gather(Flights,'Destination','Count',3:7)
Flights = spread(Flights,key = Delay,value = Count)
names(Flights)[4] = 'on_time'
head(Flights)
## Airline Destination delayed on_time
## 1 Alaska Los.Angeles 62 497
## 2 Alaska Phoenix 12 221
## 3 Alaska San.Diego 20 212
## 4 Alaska San.Francisco 102 503
## 5 Alaska Seattle 305 1841
## 6 AM West Los.Angeles 117 694
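The same wide-to-tidy reshaping can also be written with `pivot_longer()` and `pivot_wider()`, which tidyr now recommends over the superseded `gather()` and `spread()`. The sketch below is only an illustration: the `wide` data frame is a hypothetical stand-in for the raw file, built from the Alaska counts shown in the output above, and its layout is assumed to match the CSV after cleanup.

```r
library(tidyr)

# Hypothetical toy table: Alaska counts copied from the output above,
# arranged in the assumed wide layout (one column per destination).
wide <- data.frame(
  Airline = c("Alaska", "Alaska"),
  Delay = c("on time", "delayed"),
  Los.Angeles = c(497, 62),
  Phoenix = c(221, 12),
  San.Diego = c(212, 20),
  San.Francisco = c(503, 102),
  Seattle = c(1841, 305)
)

# Stack the five destination columns into Destination/Count pairs
long <- pivot_longer(wide, cols = 3:7,
                     names_to = "Destination", values_to = "Count")

# Spread the two Delay statuses back out into their own columns
tidy <- pivot_wider(long, names_from = Delay, values_from = Count)

# Rename by name rather than by position so column order does not matter
names(tidy)[names(tidy) == "on time"] <- "on_time"

print(tidy)
```

Renaming by column name avoids depending on the order in which the status columns happen to appear after the reshape.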
Flights = Flights %>%
  mutate(Delay_Rate = delayed/(on_time + delayed),
         On_Time_Rate = 1 - Delay_Rate)
head(Flights)
## Airline Destination delayed on_time Delay_Rate On_Time_Rate
## 1 Alaska Los.Angeles 62 497 0.11091234 0.8890877
## 2 Alaska Phoenix 12 221 0.05150215 0.9484979
## 3 Alaska San.Diego 20 212 0.08620690 0.9137931
## 4 Alaska San.Francisco 102 503 0.16859504 0.8314050
## 5 Alaska Seattle 305 1841 0.14212488 0.8578751
## 6 AM West Los.Angeles 117 694 0.14426634 0.8557337
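One way to view these rates side by side is a grouped bar chart with ggplot2, which is already loaded. This is a minimal sketch, not part of the original analysis: `plot_data` is a hypothetical data frame containing only the six rows printed above, so AM West appears for Los Angeles alone. In the full analysis, `Flights` could be passed in its place.

```r
library(ggplot2)

# Hypothetical subset: the six rows shown in the head() output above
plot_data <- data.frame(
  Airline = c(rep("Alaska", 5), "AM West"),
  Destination = c("Los.Angeles", "Phoenix", "San.Diego",
                  "San.Francisco", "Seattle", "Los.Angeles"),
  delayed = c(62, 12, 20, 102, 305, 117),
  on_time = c(497, 221, 212, 503, 1841, 694)
)

# Same delay-rate definition as in the analysis above
plot_data$Delay_Rate <- plot_data$delayed / (plot_data$on_time + plot_data$delayed)

# Side-by-side bars: one bar per airline within each destination
delay_plot <- ggplot(plot_data,
                     aes(x = Destination, y = Delay_Rate, fill = Airline)) +
  geom_col(position = "dodge") +
  labs(x = "Destination", y = "Delay rate",
       title = "Arrival delay rate by destination")

# print(delay_plot)  # uncomment to render the chart
```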
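Next, we calculate the mean and median delay and on-time rates for each airline. Each statistic is taken across the per-destination rates, so every destination counts equally regardless of how many flights it has.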
Alaska = Flights[Flights$Airline == 'Alaska',]
AM_West = Flights[Flights$Airline == 'AM West',]
Airlines = c('AM West','Alaska')
Mean_DRate = c(mean(AM_West$Delay_Rate),mean(Alaska$Delay_Rate))
Mean_ORate = c(mean(AM_West$On_Time_Rate),mean(Alaska$On_Time_Rate))
Median_DRate = c(median(AM_West$Delay_Rate),median(Alaska$Delay_Rate))
Median_ORate = c(median(AM_West$On_Time_Rate),median(Alaska$On_Time_Rate))
Results = data.frame(Airlines,Mean_DRate,Mean_ORate,Median_DRate,Median_ORate)
head(Results)
## Airlines Mean_DRate Mean_ORate Median_DRate Median_ORate
## 1 AM West 0.1776915 0.8223085 0.1450893 0.8549107
## 2 Alaska 0.1118683 0.8881317 0.1109123 0.8890877
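The same per-airline summary can be produced in one pipeline with dplyr's `group_by()` and `summarise()` instead of subsetting each airline by hand. The sketch below is illustrative only: `flights_head` is a hypothetical data frame holding just the six rows printed earlier, so the Alaska figures match the table above while the AM West figures reflect a single destination.

```r
library(dplyr)

# Hypothetical subset: the six rows shown in the earlier head() output
flights_head <- data.frame(
  Airline = c(rep("Alaska", 5), "AM West"),
  Destination = c("Los.Angeles", "Phoenix", "San.Diego",
                  "San.Francisco", "Seattle", "Los.Angeles"),
  delayed = c(62, 12, 20, 102, 305, 117),
  on_time = c(497, 221, 212, 503, 1841, 694)
)

summary_tbl <- flights_head %>%
  # Per-destination rates, defined as in the analysis above
  mutate(Delay_Rate = delayed / (on_time + delayed),
         On_Time_Rate = 1 - Delay_Rate) %>%
  # One summary row per airline; each destination is weighted equally
  group_by(Airline) %>%
  summarise(
    Mean_DRate = mean(Delay_Rate),
    Mean_ORate = mean(On_Time_Rate),
    Median_DRate = median(Delay_Rate),
    Median_ORate = median(On_Time_Rate)
  )

print(summary_tbl)
```

Grouping scales to any number of airlines without adding a new subset and a new vector entry for each one.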
These results show that AM West has a higher delay rate, and therefore a lower on-time rate, than Alaska. The gap appears in both the mean (17.8% vs. 11.2% delayed) and the median (14.5% vs. 11.1% delayed) of the rates across all destinations. In conclusion, Alaska arrives on time more often and is the better airline by that measure.