The chart above describes arrival delays for two airlines across five destinations. Your task is to: (1) Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below. (2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. (3) Perform analysis to compare the arrival delays for the two airlines. (4) Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions. Please include in your homework submission: ??? The URL to the .Rmd file in your GitHub repository. and ??? The URL for your rpubs.com web page

Load Library

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

1)Create CSV of Data

2)Loading data into R

write.table(csv, file = "Week5FY.csv", sep = ",", col.names=F, row.names=F)
Rawwide_data <- read.csv(paste0("https://raw.githubusercontent.com/Fyoun123/Data607/master/Week%205/Week5FY.csv"),stringsAsFactors = F); Rawwide_data
##         X     X.1 Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA On Time         497     221       212           503    1841
## 2  ALASKA Delayed          62      12        20           102     305
## 3 AM WEST On Time         694    4840       383           320     201
## 4 AM WEST Delayed         117     415        65           129      61

Using Gather to create Key-Value Pairs:

LongConversion <- gather(Rawwide_data,city,number,Los.Angeles:Seattle, factor_key = TRUE)
LongConversion
##          X     X.1          city number
## 1   ALASKA On Time   Los.Angeles    497
## 2   ALASKA Delayed   Los.Angeles     62
## 3  AM WEST On Time   Los.Angeles    694
## 4  AM WEST Delayed   Los.Angeles    117
## 5   ALASKA On Time       Phoenix    221
## 6   ALASKA Delayed       Phoenix     12
## 7  AM WEST On Time       Phoenix   4840
## 8  AM WEST Delayed       Phoenix    415
## 9   ALASKA On Time     San.Diego    212
## 10  ALASKA Delayed     San.Diego     20
## 11 AM WEST On Time     San.Diego    383
## 12 AM WEST Delayed     San.Diego     65
## 13  ALASKA On Time San.Francisco    503
## 14  ALASKA Delayed San.Francisco    102
## 15 AM WEST On Time San.Francisco    320
## 16 AM WEST Delayed San.Francisco    129
## 17  ALASKA On Time       Seattle   1841
## 18  ALASKA Delayed       Seattle    305
## 19 AM WEST On Time       Seattle    201
## 20 AM WEST Delayed       Seattle     61

Renaming the Columns to be more approriate:

colnames(LongConversion)[2] <- "Ontime/Delay"
colnames(LongConversion)[1] <- "Vendor"

Spread allows us to take two columns and create to columns, In this case breaking up out Ontime/Delay column in liue of the flight info.

LongConversion$Vendor <- factor(LongConversion$Vendor)
LongConversion2 <- spread(LongConversion,"Ontime/Delay","number")
LongConversion2
##     Vendor          city Delayed On Time
## 1   ALASKA   Los.Angeles      62     497
## 2   ALASKA       Phoenix      12     221
## 3   ALASKA     San.Diego      20     212
## 4   ALASKA San.Francisco     102     503
## 5   ALASKA       Seattle     305    1841
## 6  AM WEST   Los.Angeles     117     694
## 7  AM WEST       Phoenix     415    4840
## 8  AM WEST     San.Diego      65     383
## 9  AM WEST San.Francisco     129     320
## 10 AM WEST       Seattle      61     201

We use the Diplyr Fucniton Mutate to add new variables taht are functions of existing varaibles, in this case calcullating intitally the total number of flights. Group_by allows us to utilize our vendor as the driver when coupled with summarise. We can then simply divide the on time flights by the total flights.

Performace <- LongConversion2 %>% mutate(total = Delayed + `On Time`) %>% group_by(Vendor) %>% summarise(DT = sum(Delayed), OT = sum(`On Time`), Total = sum(total), OntimeP = round(OT/Total*100,digits = 2))
Performace
## # A tibble: 2 x 5
##   Vendor     DT    OT Total OntimeP
##   <fct>   <int> <int> <int>   <dbl>
## 1 ALASKA    501  3274  3775    86.7
## 2 AM WEST   787  6438  7225    89.1

Same synthax as above, this time utizlizing the city as our groupping rather than our airline:

Performace2 <- LongConversion2 %>% mutate(total = Delayed + `On Time`) %>% group_by(city) %>% summarise(DT = sum(Delayed), OT = sum(`On Time`), Total = sum(total), OntimeP = round(OT/Total*100,digits = 2))
Performace2
## # A tibble: 5 x 5
##   city             DT    OT Total OntimeP
##   <fct>         <int> <int> <int>   <dbl>
## 1 Los.Angeles     179  1191  1370    86.9
## 2 Phoenix         427  5061  5488    92.2
## 3 San.Diego        85   595   680    87.5
## 4 San.Francisco   231   823  1054    78.1
## 5 Seattle         366  2042  2408    84.8