Details and Goal of Assignment:

  1. Create a .CSV file (or optionally, a MySQL database!) that includes all of the information above. You’re encouraged to use a “wide” structure similar to how the information appears above, so that you can practice tidying and transformations as described below.
  2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.
  3. Perform analysis to compare the arrival delays for the two airlines.
#Let's begin by first checking if the package is installed, and installing packages if not
if(!require('tidyr')) {
  install.packages('tidyr')
  library(tidyr)
}
## Loading required package: tidyr
if(!require('dplyr')) {
  install.packages('dplyr')
  library(tidyr)
}
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
delay_csv <- rbind(c("Airline", "Status", "Los Angeles", "Phoenix", "San Diego", "San Francisco", "Seattle"),
             c("ALASKA", "On Time", 497, 221, 212, 503, 1841),
             c("ALASKA", "Delayed", 62, 12, 20, 102, 305),
             c("AM WEST", "On Time", 694, 4840, 383, 320, 201),
             c("AM WEST", "Delayed", 117, 415, 65, 129, 61))
delay_csv
##      [,1]      [,2]      [,3]          [,4]      [,5]       
## [1,] "Airline" "Status"  "Los Angeles" "Phoenix" "San Diego"
## [2,] "ALASKA"  "On Time" "497"         "221"     "212"      
## [3,] "ALASKA"  "Delayed" "62"          "12"      "20"       
## [4,] "AM WEST" "On Time" "694"         "4840"    "383"      
## [5,] "AM WEST" "Delayed" "117"         "415"     "65"       
##      [,6]            [,7]     
## [1,] "San Francisco" "Seattle"
## [2,] "503"           "1841"   
## [3,] "102"           "305"    
## [4,] "320"           "201"    
## [5,] "129"           "61"
write.table(delay_csv, file = "Airline.csv", sep = ",", col.names=F, row.names=F)
#Next, let's read data from the file
delay_csv <- read.csv(paste0("Airline.csv"), stringsAsFactors = F);
delay_csv[,2] <- sapply(delay_csv[,2], str_replace, " ", "_")
delay_csv
##   Airline  Status Los.Angeles Phoenix San.Diego San.Francisco Seattle
## 1  ALASKA On_Time         497     221       212           503    1841
## 2  ALASKA Delayed          62      12        20           102     305
## 3 AM WEST On_Time         694    4840       383           320     201
## 4 AM WEST Delayed         117     415        65           129      61
#Gather function can be used to move data from the wide format to the tall format
delay_csv2 <- gather(delay_csv, "City", "Count", 3:7)
delay_csv2
##    Airline  Status          City Count
## 1   ALASKA On_Time   Los.Angeles   497
## 2   ALASKA Delayed   Los.Angeles    62
## 3  AM WEST On_Time   Los.Angeles   694
## 4  AM WEST Delayed   Los.Angeles   117
## 5   ALASKA On_Time       Phoenix   221
## 6   ALASKA Delayed       Phoenix    12
## 7  AM WEST On_Time       Phoenix  4840
## 8  AM WEST Delayed       Phoenix   415
## 9   ALASKA On_Time     San.Diego   212
## 10  ALASKA Delayed     San.Diego    20
## 11 AM WEST On_Time     San.Diego   383
## 12 AM WEST Delayed     San.Diego    65
## 13  ALASKA On_Time San.Francisco   503
## 14  ALASKA Delayed San.Francisco   102
## 15 AM WEST On_Time San.Francisco   320
## 16 AM WEST Delayed San.Francisco   129
## 17  ALASKA On_Time       Seattle  1841
## 18  ALASKA Delayed       Seattle   305
## 19 AM WEST On_Time       Seattle   201
## 20 AM WEST Delayed       Seattle    61
#In order to move the status into its own variable, use the spread function
delay_csv2 <- spread(delay_csv2, Status, Count)
delay_csv2
##    Airline          City Delayed On_Time
## 1   ALASKA   Los.Angeles      62     497
## 2   ALASKA       Phoenix      12     221
## 3   ALASKA     San.Diego      20     212
## 4   ALASKA San.Francisco     102     503
## 5   ALASKA       Seattle     305    1841
## 6  AM WEST   Los.Angeles     117     694
## 7  AM WEST       Phoenix     415    4840
## 8  AM WEST     San.Diego      65     383
## 9  AM WEST San.Francisco     129     320
## 10 AM WEST       Seattle      61     201
#Analyzing delays for both airlines, by airlines: 

delay_csv2 %>% group_by(Airline) %>%
summarise(
TotDelay=sum(Delayed),TotOn_Time=sum(On_Time), 
PctDelayed=(TotDelay/(TotDelay+TotOn_Time)*100),
PctOnTime=(TotOn_Time/(TotDelay+TotOn_Time)*100)
)
## # A tibble: 2 x 5
##   Airline TotDelay TotOn_Time PctDelayed PctOnTime
##   <chr>      <int>      <int>      <dbl>     <dbl>
## 1 ALASKA       501       3274       13.3      86.7
## 2 AM WEST      787       6438       10.9      89.1
#Analyzing delays for both airlines, by city:
delay_csv2 %>% group_by(City) %>%
summarise(
TotDelay=sum(Delayed),TotOn_Time=sum(On_Time), 
PctDelayed=(TotDelay/(TotDelay+TotOn_Time)*100),
PctOnTime=(TotOn_Time/(TotDelay+TotOn_Time)*100)
)
## # A tibble: 5 x 5
##   City          TotDelay TotOn_Time PctDelayed PctOnTime
##   <chr>            <int>      <int>      <dbl>     <dbl>
## 1 Los.Angeles        179       1191      13.1       86.9
## 2 Phoenix            427       5061       7.78      92.2
## 3 San.Diego           85        595      12.5       87.5
## 4 San.Francisco      231        823      21.9       78.1
## 5 Seattle            366       2042      15.2       84.8

Overall, AM West airline has better on time performance compared to ALASKA airline

Of the cities compared, Phoenix seems to be the best on time while San Francisco shows a higher amount of delays