(1) Create a .CSV file that includes all of the information above.

The CSV file is saved in the Github.

(2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

library(tidyr)

## Warning: package 'tidyr' was built under R version 3.2.5

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.2.5

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

require(knitr)

## Loading required package: knitr

## Warning: package 'knitr' was built under R version 3.2.5

data <- read.csv("https://raw.githubusercontent.com/xkong100/IS607/master/Assignment-5/Airline.csv", stringsAsFactors = FALSE, check.names = FALSE, na.strings = c("", "NA"))
kable(head(data))

		Los Angeles	Phonexi	San Diego	San Francisco	Seattle
ALASKA	on time	497	221	21	503	1841
NA	delayed	62	12	20	102	305
NA	NA	NA	NA	NA	NA	NA
AM WEST	on time	694	4840	383	320	201
NA	delayed	117	415	65	129	61

remove the NA in the 3rd row

data <- data.frame(data[-3, ])
kable(head(data))

	Var.1	Var.2	Los.Angeles	Phonexi	San.Diego	San.Francisco	Seattle
1	ALASKA	on time	497	221	21	503	1841
2	NA	delayed	62	12	20	102	305
4	AM WEST	on time	694	4840	383	320	201
5	NA	delayed	117	415	65	129	61

Give name for rows and columns and replace the rows which still have NA

colnames(data)[1]="Airline"
colnames(data)[2]="Status"
data[2,1]="ALASKA"
data[4,1]="AM WEST"
kable(head(data))

	Airline	Status	Los.Angeles	Phonexi	San.Diego	San.Francisco	Seattle
1	ALASKA	on time	497	221	21	503	1841
2	ALASKA	delayed	62	12	20	102	305
4	AM WEST	on time	694	4840	383	320	201
5	AM WEST	delayed	117	415	65	129	61

Now, we can use “gather” of tidyr to gather the rows and columns

tdata <- gather(data, "Destination", "Number_of_time", 3:7)
kable(head(tdata))

Airline	Status	Destination	Number_of_time
ALASKA	on time	Los.Angeles	497
ALASKA	delayed	Los.Angeles	62
AM WEST	on time	Los.Angeles	694
AM WEST	delayed	Los.Angeles	117
ALASKA	on time	Phonexi	221
ALASKA	delayed	Phonexi	12

spread the “on time” and “delay”

tdata1 <- spread(tdata, key= Status, value = Number_of_time)
colnames(tdata1)[4]="ontime"
kable(head(tdata1))

Airline	Destination	delayed	ontime
ALASKA	Los.Angeles	62	497
ALASKA	Phonexi	12	221
ALASKA	San.Diego	20	21
ALASKA	San.Francisco	102	503
ALASKA	Seattle	305	1841
AM WEST	Los.Angeles	117	694

(3) Perform analysis to compare the arrival delays for the two airlines.

Convert delayed and ontime to numerical value so that it is easy to calculate.

tdata1 <- within(tdata1, {delayed <- as.numeric(as.character(delayed))
ontime <-as.numeric(as.character(ontime))})

# Use "mutate" to preserve the old list but add new columns of "total" and "PercentDelayed"

tdata1<-mutate(tdata1, Total= delayed + ontime, PercentDelayed = delayed / Total*100)
tdata1 <- arrange(tdata1, PercentDelayed)
kable(tdata1)

Airline	Destination	delayed	ontime	Total	PercentDelayed
ALASKA	Phonexi	12	221	233	5.150215
AM WEST	Phonexi	415	4840	5255	7.897241
ALASKA	Los.Angeles	62	497	559	11.091234
ALASKA	Seattle	305	1841	2146	14.212488
AM WEST	Los.Angeles	117	694	811	14.426634
AM WEST	San.Diego	65	383	448	14.508929
ALASKA	San.Francisco	102	503	605	16.859504
AM WEST	Seattle	61	201	262	23.282443
AM WEST	San.Francisco	129	320	449	28.730512
ALASKA	San.Diego	20	21	41	48.780488

group and find the average delayed percent by Airlines

delays <- tdata1 %>% group_by(Airline) %>% summarise(MeanPercent= round(mean(PercentDelayed, 0)))
delays

## # A tibble: 2 × 2
##   Airline MeanPercent
##     <chr>       <dbl>
## 1  ALASKA          19
## 2 AM WEST          18

Assignment 5

Vivian Kong

September 30, 2016

(1) Create a .CSV file that includes all of the information above.

(2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

remove the NA in the 3rd row

Give name for rows and columns and replace the rows which still have NA

Now, we can use “gather” of tidyr to gather the rows and columns

spread the “on time” and “delay”

(3) Perform analysis to compare the arrival delays for the two airlines.

Convert delayed and ontime to numerical value so that it is easy to calculate.

group and find the average delayed percent by Airlines

Alaska is delayed 19% and AM WEST is 18% of the time.