Assignment:-
The chart above describes arrival delays for two airlines across five destinations. Your task is to:
(1) Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below.
(2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.
(3) Perform analysis to compare the arrival delays for the two airlines

  1. Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below
data <- rbind(c(NA, NA, "Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"),
             c("ALASKA", "On Time", 497, 221, 212, 503, 1841),
             c(NA, "Delayed", 62, 12, 20, 102, 305),
             c("AM WEST", "On Time", 694, 4840, 383, 320, 201),
             c(NA, "Delayed", 117, 415, 65, 129, 61))

write.table(data, file = "/Users/ashishsm1986/git/Cuny-Assignments/week5-assignment/Assignment5.csv", sep = ",", col.names=F, row.names=F)   #Populate the data in a .csv file
  1. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.
wide_data <- read.csv(paste0("/Users/ashishsm1986/git/Cuny-Assignments/week5-assignment/Assignment5.csv"), stringsAsFactors = F); wide_data
##       NA.   NA..1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA On Time         497     221       212           503    1841
## 2    <NA> Delayed          62      12        20           102     305
## 3 AM WEST On Time         694    4840       383           320     201
## 4    <NA> Delayed         117     415        65           129      61
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
long_data <- wide_data %>% #Data is converted into a “long” structure using tidyr and dplyr.
mutate(AL_Airline = NA., AM_Airline=lag(NA.)) %>% #Mutate function is used to duplicate the first column twice
mutate(Airline = coalesce(AL_Airline,AM_Airline), Status = NA..1)  %>% #Combine the new columns into one column without null values.
gather("Destination", "Flight", 3:7) %>% #Transform the data from its crosstab appearance to a more normalized format.
select(Airline:Flight) %>% #Query the columns we want in the final structure
arrange(Airline, desc(Status), Destination); #Split the data into the two groups that will be compared.
long_data
##    Airline  Status   Destination Flight
## 1   ALASKA On Time   Los.Angeles    497
## 2   ALASKA On Time       Phoenix    221
## 3   ALASKA On Time     San.Diego    212
## 4   ALASKA On Time San.Francisco    503
## 5   ALASKA On Time       Seattle   1841
## 6   ALASKA Delayed   Los.Angeles     62
## 7   ALASKA Delayed       Phoenix     12
## 8   ALASKA Delayed     San.Diego     20
## 9   ALASKA Delayed San.Francisco    102
## 10  ALASKA Delayed       Seattle    305
## 11 AM WEST On Time   Los.Angeles    694
## 12 AM WEST On Time       Phoenix   4840
## 13 AM WEST On Time     San.Diego    383
## 14 AM WEST On Time San.Francisco    320
## 15 AM WEST On Time       Seattle    201
## 16 AM WEST Delayed   Los.Angeles    117
## 17 AM WEST Delayed       Phoenix    415
## 18 AM WEST Delayed     San.Diego     65
## 19 AM WEST Delayed San.Francisco    129
## 20 AM WEST Delayed       Seattle     61
  1. Perform analysis to compare the arrival delays for the two airlines.
  Performance <- long_data %>%
  group_by(Airline) %>%
  mutate(Arrival_Tot = sum(Flight)) %>% 
  group_by(Airline, Status) %>%
  mutate(Sum_Tot = sum(Flight), Perf_percntage = Sum_Tot / Arrival_Tot) %>% group_by(Airline, Destination) %>%
  mutate(D_Tot = sum(Flight), D_Perf = Flight / D_Tot)
  
# Get Airlines Overall Performance
data.frame(Performance[c(1,10,11,20), c(1,2,7)])
##   Airline  Status Perf_percntage
## 1  ALASKA On Time      0.8672848
## 2  ALASKA Delayed      0.1327152
## 3 AM WEST On Time      0.8910727
## 4 AM WEST Delayed      0.1089273

Analysys Results:-

Alaska Airlines were on-time 86.72% of the time and delayed 13.28% of the times.
AM West Airlines were on-time 89.11% of the time and delayed 10.89% of the times.