Introduction

The purpose of this assignment is to load a given CSV file and compare two airlines based on their arrival delays. When complete, the R Markdown file will be posted to GitHub and RPubs.

Loading Libraries

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Loading in data from the CSV

Flights = read.csv('https://raw.githubusercontent.com/bwolin99/TestRepo/refs/heads/main/Assignment%205/Assignment5_Flights.csv')

Transforming data to prepare it for analysis

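# Name the two unlabeled leading columns.
names(Flights)[1:2] <- c('Airline','Delay')
# Drop the third row (assumed to be the blank separator between the airlines).
Flights = Flights[-3, ]
# Fill in the airline name on the second row of each airline's pair.
Flights[2,1] = 'Alaska'
Flights[4,1] = 'AM West'
# Reshape to one row per airline and destination, with one column per status.
Flights = gather(Flights,'Destination','Count',3:7)
Flights = spread(Flights,key = Delay,value = Count)
names(Flights)[4] = 'on_time'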
head(Flights)
##   Airline   Destination delayed on_time
## 1  Alaska   Los.Angeles      62     497
## 2  Alaska       Phoenix      12     221
## 3  Alaska     San.Diego      20     212
## 4  Alaska San.Francisco     102     503
## 5  Alaska       Seattle     305    1841
## 6 AM West   Los.Angeles     117     694
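
The reshape above uses gather() and spread(), which tidyr has superseded in favor of pivot_longer() and pivot_wider(). The following is a minimal sketch of the same long-then-wide reshape with the newer functions. The data frame `raw` is a hypothetical stand-in for the raw layout, built only from Alaska counts shown in the output above, so it is not the actual CSV.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for the raw layout: one row per airline and status,
# one column per destination. Counts are copied from the Alaska rows above.
raw <- data.frame(
  Airline = c("Alaska", "Alaska"),
  Delay = c("on time", "delayed"),
  Los.Angeles = c(497, 62),
  Phoenix = c(221, 12),
  San.Diego = c(212, 20)
)

tidy <- raw %>%
  # Stack the destination columns into Destination/Count pairs.
  pivot_longer(cols = Los.Angeles:San.Diego,
               names_to = "Destination",
               values_to = "Count") %>%
  # Spread the status values back out into one column each.
  pivot_wider(names_from = Delay, values_from = Count) %>%
  # Give the 'on time' column a name without a space.
  rename(on_time = `on time`)

print(tidy)
```

Renaming by column name rather than by position, as done here, keeps the code correct even if the column order changes.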

Analyzing the Data

Flights = Flights %>%
  mutate(Delay_Rate = delayed/(on_time + delayed))

Flights = Flights %>%
  mutate(On_Time_Rate = 1 - Delay_Rate)
head(Flights)
##   Airline   Destination delayed on_time Delay_Rate On_Time_Rate
## 1  Alaska   Los.Angeles      62     497 0.11091234    0.8890877
## 2  Alaska       Phoenix      12     221 0.05150215    0.9484979
## 3  Alaska     San.Diego      20     212 0.08620690    0.9137931
## 4  Alaska San.Francisco     102     503 0.16859504    0.8314050
## 5  Alaska       Seattle     305    1841 0.14212488    0.8578751
## 6 AM West   Los.Angeles     117     694 0.14426634    0.8557337
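
As a quick check of the rate formula, the first row of the table can be worked by hand. This sketch uses only the Alaska counts for Los Angeles printed above; the variable names are illustrative.

```r
# Counts for Alaska flights into Los Angeles, copied from the table above.
delayed <- 62
on_time <- 497

# Delay rate is delayed flights over all flights: 62 / (497 + 62) = 62 / 559.
delay_rate <- delayed / (on_time + delayed)
on_time_rate <- 1 - delay_rate

print(c(delay_rate = delay_rate, on_time_rate = on_time_rate))
```

The two values agree with the Delay_Rate and On_Time_Rate columns in the first row of the output.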

Now we will calculate the mean and median of the per-destination delay and on-time rates for each airline.

Alaska = Flights[Flights$Airline == 'Alaska',]
AM_West = Flights[Flights$Airline == 'AM West',]
Airlines = c('AM West','Alaska')
Mean_DRate = c(mean(AM_West$Delay_Rate),mean(Alaska$Delay_Rate))
Mean_ORate = c(mean(AM_West$On_Time_Rate),mean(Alaska$On_Time_Rate))
Median_DRate = c(median(AM_West$Delay_Rate),median(Alaska$Delay_Rate))
Median_ORate = c(median(AM_West$On_Time_Rate),median(Alaska$On_Time_Rate))
Results = data.frame(Airlines,Mean_DRate,Mean_ORate,Median_DRate,Median_ORate)
head(Results)
##   Airlines Mean_DRate Mean_ORate Median_DRate Median_ORate
## 1  AM West  0.1776915  0.8223085    0.1450893    0.8549107
## 2   Alaska  0.1118683  0.8881317    0.1109123    0.8890877
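
The same summary can be produced without subsetting each airline by hand. The sketch below uses dplyr's group_by() and summarise(). To stay self-contained it runs on `flights_head`, a hypothetical stand-in holding only the six rows printed by head(Flights), so AM West has a single row and its figures will not match the Results table. On the full Flights data frame the same pipeline should apply unchanged.

```r
library(dplyr)

# Stand-in for Flights: only the six rows printed by head(Flights) above.
flights_head <- data.frame(
  Airline = c(rep("Alaska", 5), "AM West"),
  Destination = c("Los.Angeles", "Phoenix", "San.Diego",
                  "San.Francisco", "Seattle", "Los.Angeles"),
  delayed = c(62, 12, 20, 102, 305, 117),
  on_time = c(497, 221, 212, 503, 1841, 694)
)

summary_by_airline <- flights_head %>%
  # Per-destination rates, as in the analysis above.
  mutate(Delay_Rate = delayed / (on_time + delayed),
         On_Time_Rate = 1 - Delay_Rate) %>%
  # One summary row per airline.
  group_by(Airline) %>%
  summarise(
    Mean_DRate = mean(Delay_Rate),
    Mean_ORate = mean(On_Time_Rate),
    Median_DRate = median(Delay_Rate),
    Median_ORate = median(On_Time_Rate)
  )

print(summary_by_airline)
```

Because all five Alaska rows are present in the stand-in, the Alaska row reproduces the Alaska figures in the Results table.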

These results show that AM West has a lower on-time rate and a higher delay rate than Alaska, in both the mean and the median of the per-destination rates. In conclusion, averaged across destinations, Alaska is on time more often and is therefore the better airline in that regard.
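
The mean and median above give every destination equal weight, so Alaska's 233 Phoenix flights count as much as its 2,146 Seattle flights. A flight-weighted (pooled) delay rate is a useful cross-check, since it can differ from the unweighted mean when flight volumes vary widely between destinations. The sketch below is one way to compute it. It again runs on `flights_head`, a hypothetical stand-in containing only the six rows printed by head(Flights), so the AM West figure covers Los Angeles alone. Running the same pipeline on the full Flights data frame would give the pooled rates for both airlines.

```r
library(dplyr)

# Stand-in for Flights: only the six rows printed by head(Flights) above.
flights_head <- data.frame(
  Airline = c(rep("Alaska", 5), "AM West"),
  Destination = c("Los.Angeles", "Phoenix", "San.Diego",
                  "San.Francisco", "Seattle", "Los.Angeles"),
  delayed = c(62, 12, 20, 102, 305, 117),
  on_time = c(497, 221, 212, 503, 1841, 694)
)

pooled <- flights_head %>%
  group_by(Airline) %>%
  summarise(
    # Sum counts across destinations before dividing, so busy routes weigh more.
    Total_Delayed = sum(delayed),
    Total_Flights = sum(delayed + on_time),
    Pooled_DRate = Total_Delayed / Total_Flights
  )

print(pooled)
```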