Using the csv data of arrival times in the Alaska and AM West airlines, I will transform from it into a tidy table and perform analysis.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
To facilitate the change from a wide to a tidy table, I added missing data and renamed the first column.
colnames(plane_data)[1] = "Airline"
plane_data$Airline[2] = "ALASKA"
plane_data$Airline[4] = "AM WEST"
I first moved the City Column names into a column of itβs own and then in plane data 2 move the delayed and on time objects into columns.
plane_data1 = gather(plane_data,"City","n",3:7)
plane_data2 = spread(plane_data1,X,n)
I realized that if I wanted to perform statistical analysis, I would have to make sure that columns labeled on time and delayed were numerical. Below I checked the class of the columns, removed the comma in one of the objects and transformed the columns from character to numeric.
as.data.frame(sapply(plane_data2,class))
## sapply(plane_data2, class)
## Airline character
## City character
## delayed character
## on time character
plane_data2$'on time' = as.character(gsub("\\,","",plane_data2$'on time'))
plane_data2$'on time' = as.numeric(plane_data2$'on time')
plane_data2$'delayed' = as.numeric(plane_data2$'delayed')
plane_data2 %>% group_by(Airline) %>%
summarise(mean = mean(delayed), sum = sum(delayed))
## # A tibble: 2 x 3
## Airline mean sum
## <chr> <dbl> <dbl>
## 1 ALASKA 100. 501
## 2 AM WEST 157. 787
plane_data2 %>% group_by(City) %>%
summarise(mean = mean(delayed), sum = sum(delayed))
## # A tibble: 5 x 3
## City mean sum
## <chr> <dbl> <dbl>
## 1 Los.Angeles 89.5 179
## 2 Phoenix 214. 427
## 3 San.Diego 42.5 85
## 4 San.Francisco 116. 231
## 5 Seattle 183 366
AM West had significantly higher amount of delays in the flights. I also noticed that in all of the cities, Phoenix had the most delays. In a future analysis, I would be interested in seeing the city delay times broken down by airlines. I would also want to know the dates and times so that we could further examine the source of delays.