Introduction

Using the csv data of arrival times in the Alaska and AM West airlines, I will transform from it into a tidy table and perform analysis.

Download Library

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Download Data

Clean Data

To facilitate the change from a wide to a tidy table, I added missing data and renamed the first column.

colnames(plane_data)[1] = "Airline"
plane_data$Airline[2] = "ALASKA"
plane_data$Airline[4] = "AM WEST"

Change Table Format

I first moved the City Column names into a column of it’s own and then in plane data 2 move the delayed and on time objects into columns.

plane_data1 = gather(plane_data,"City","n",3:7)

plane_data2 = spread(plane_data1,X,n)

Further Transformation

I realized that if I wanted to perform statistical analysis, I would have to make sure that columns labeled on time and delayed were numerical. Below I checked the class of the columns, removed the comma in one of the objects and transformed the columns from character to numeric.

as.data.frame(sapply(plane_data2,class))
##         sapply(plane_data2, class)
## Airline                  character
## City                     character
## delayed                  character
## on time                  character
plane_data2$'on time' = as.character(gsub("\\,","",plane_data2$'on time'))
plane_data2$'on time' = as.numeric(plane_data2$'on time')

plane_data2$'delayed' = as.numeric(plane_data2$'delayed')

Summary

plane_data2 %>% group_by(Airline) %>%
summarise(mean = mean(delayed), sum = sum(delayed))
## # A tibble: 2 x 3
##   Airline  mean   sum
##   <chr>   <dbl> <dbl>
## 1 ALASKA   100.   501
## 2 AM WEST  157.   787
plane_data2 %>% group_by(City) %>%
summarise(mean = mean(delayed), sum = sum(delayed)) 
## # A tibble: 5 x 3
##   City           mean   sum
##   <chr>         <dbl> <dbl>
## 1 Los.Angeles    89.5   179
## 2 Phoenix       214.    427
## 3 San.Diego      42.5    85
## 4 San.Francisco 116.    231
## 5 Seattle       183     366

Conclusion

AM West had significantly higher amount of delays in the flights. I also noticed that in all of the cities, Phoenix had the most delays. In a future analysis, I would be interested in seeing the city delay times broken down by airlines. I would also want to know the dates and times so that we could further examine the source of delays.