Tidy Practice

Read in the data

setwd("C:\\Users\\soh1\\Box Sync\\CUNY\\Fall_2017\\Data 607\\Week5")
AirlineRaw <- read.csv("AirlineTime.csv")

Gather the city columns into a single City column, then spread the delayed and on-time counts into separate columns.

library(tidyr)
library(dplyr)
library(tidyselect)
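# Stack the five city columns (columns 3 to 7) into City/Frequency pairs,
# then give each value of Time (such as "delayed") its own column
AirlineTrans <- AirlineRaw %>%
  gather(City, Frequency, 3:7) %>%
  spread(Time, Frequency)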
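
In recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below shows the same reshape with the newer functions. It is only an illustration: the data frame `toy` and its city names and counts are made up, and the column names Airplane and Time are assumed to match those in AirlineTime.csv.

```r
library(tidyr)
library(dplyr)

# Hypothetical two-city sample (not the real AirlineTime.csv data)
toy <- data.frame(
  Airplane = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  Time     = c("on time", "delayed", "on time", "delayed"),
  CityA    = c(10, 2, 20, 3),
  CityB    = c(15, 1, 30, 6)
)

toy_tidy <- toy %>%
  # Stack the city columns into City/Frequency pairs
  pivot_longer(cols = 3:4, names_to = "City", values_to = "Frequency") %>%
  # Give each value of Time its own column
  pivot_wider(names_from = Time, values_from = Frequency)

print(toy_tidy)
```

The result has one row per airline and city, with the delayed and on-time counts side by side.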

Let’s also summarize the total number of delays by airline.

AirlineSum <- AirlineTrans %>%
  group_by(Airplane) %>%
  summarise(TotalDelay = sum(delayed))

Now let’s use ggplot to compare the airlines’ performance.

library(ggplot2)
ggplot(AirlineSum, aes(Airplane, TotalDelay)) + geom_col()

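AM West has many more delayed flights than Alaska overall. Note that this compares raw delay counts, not delay rates, so it does not account for how many flights each airline operated.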
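
The chart above compares raw delay counts. A delay rate (delayed flights divided by all flights) may give a fairer comparison. The sketch below shows one way to compute it. It rests on assumptions: the data frame `toy_trans` and its numbers are made up, and the column name `on time` is inferred from the spread step rather than confirmed by the data.

```r
library(dplyr)

# Hypothetical counts in the shape of AirlineTrans (not the real data)
toy_trans <- data.frame(
  Airplane  = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  City      = c("CityA", "CityB", "CityA", "CityB"),
  delayed   = c(2, 1, 3, 6),
  `on time` = c(10, 15, 20, 30),
  check.names = FALSE  # keep the space in the "on time" column name
)

toy_rates <- toy_trans %>%
  group_by(Airplane) %>%
  summarise(
    TotalDelay   = sum(delayed),
    TotalFlights = sum(delayed + `on time`),
    DelayRate    = TotalDelay / TotalFlights
  )

print(toy_rates)
```

Applied to AirlineTrans, the same summarise() call would show whether the airline with more delays also has the higher share of delayed flights.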

Let’s look at how the airlines perform in each city.

ggplot(AirlineTrans, aes(City, delayed, fill = Airplane)) + geom_col(position = position_dodge())

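AM West records by far its highest number of delays in Phoenix, and Alaska records its highest in Seattle. These are raw counts, so they partly reflect how many flights each airline runs in each city.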
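
The per-city chart could also be drawn with delay rates instead of counts. The sketch below is a minimal illustration under stated assumptions: `toy_city` and its values are hypothetical, and the `on time` column name is inferred from the spread step, not confirmed by the data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical per-city counts (not the real data)
toy_city <- data.frame(
  Airplane  = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  City      = c("CityA", "CityB", "CityA", "CityB"),
  delayed   = c(2, 1, 3, 6),
  `on time` = c(10, 15, 20, 30),
  check.names = FALSE  # keep the space in the "on time" column name
) %>%
  # Share of each airline's flights in each city that were delayed
  mutate(DelayRate = delayed / (delayed + `on time`))

# Side-by-side bars: one bar per airline within each city
p <- ggplot(toy_city, aes(City, DelayRate, fill = Airplane)) +
  geom_col(position = position_dodge())
print(p)
```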