Read in the data
setwd("C:\\Users\\soh1\\Box Sync\\CUNY\\Fall_2017\\Data 607\\Week5")
AirlineRaw <- read.csv("AirlineTime.csv")
Gather the city columns into a single City column, then spread the Time values (delayed vs. on time) into separate columns.
library(tidyr)
library(dplyr)
library(tidyselect)
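AirlineTrans <- AirlineRaw %>%
  # Stack the five city columns (positions 3:7) into City/Frequency pairs
  gather(City, Frequency, 3:7) %>%
  # Give each Time value (delayed, on time) its own column
  spread(Time, Frequency)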
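Since tidyr 1.0, `gather()` and `spread()` are marked as superseded by `pivot_longer()` and `pivot_wider()`, although they still work here. A minimal sketch of the same reshape with the newer functions follows. It runs on a small hypothetical table (`AirlineRawToy`, with made-up counts and only two city columns) and assumes AirlineTime.csv has an Airplane column, a Time column, and one column per city. Selecting the cities as "everything except Airplane and Time", rather than by position 3:7, keeps the code working if columns are added or reordered.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for AirlineRaw: one row per airline x status,
# one column per city. The counts are illustrative only, not the real data.
AirlineRawToy <- data.frame(
  Airplane = c("Alaska", "Alaska", "AM West", "AM West"),
  Time     = c("on time", "delayed", "on time", "delayed"),
  Phoenix  = c(200, 20, 4000, 400),
  Seattle  = c(1800, 300, 200, 60),
  stringsAsFactors = FALSE
)

AirlineTransToy <- AirlineRawToy %>%
  # Stack every city column (all columns except Airplane and Time)
  pivot_longer(-c(Airplane, Time), names_to = "City", values_to = "Frequency") %>%
  # Give each Time value its own column, as spread(Time, Frequency) does
  pivot_wider(names_from = Time, values_from = Frequency)

AirlineTransToy
```

The result has one row per airline and city, with `on time` and `delayed` as count columns, matching the shape that `gather()` plus `spread()` produce.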
Let’s also summarize the total number of delayed flights by airline.
AirlineSum <- AirlineTrans %>%
  group_by(Airplane) %>%
  summarise(TotalDelay = sum(delayed))
Now let’s use ggplot2 to compare the airlines’ performance.
library(ggplot2)
ggplot(AirlineSum, aes(x = Airplane, y = TotalDelay)) +
  geom_col()
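In raw counts, AM West has substantially more delayed flights than Alaska overall. Raw counts do not account for how many flights each airline operates, however, so on their own they do not show that AM West is less punctual.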
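Total delays partly reflect how many flights each airline operates, so a delay rate (delayed flights divided by all flights) is a fairer comparison. A minimal sketch follows, on a hypothetical long-format table shaped like AirlineTrans (`AirlineTransToy`, made-up counts). It assumes the spread step produced columns named `delayed` and `on time`; the `on time` name is an assumption about the Time labels in the CSV and may need adjusting. The toy numbers are chosen only to show that the two measures can disagree.

```r
library(dplyr)

# Hypothetical stand-in for AirlineTrans (illustrative counts only).
# The "on time" column name is an assumption about the CSV's Time labels.
AirlineTransToy <- data.frame(
  Airplane  = c("Alaska", "Alaska", "AM West", "AM West"),
  City      = c("Phoenix", "Seattle", "Phoenix", "Seattle"),
  delayed   = c(20, 300, 400, 60),
  `on time` = c(200, 1800, 4000, 200),
  check.names = FALSE  # keep the space in "on time"
)

AirlineRate <- AirlineTransToy %>%
  group_by(Airplane) %>%
  summarise(
    TotalDelay   = sum(delayed),                    # same measure as AirlineSum
    TotalFlights = sum(delayed) + sum(`on time`),   # all flights per airline
    DelayRate    = TotalDelay / TotalFlights        # share of flights delayed
  )

AirlineRate
```

With these made-up counts, AM West has more delayed flights in total but a lower delay rate than Alaska, which is why the rate is worth checking alongside the totals.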
Let’s look at how each airline performs by city.
ggplot(AirlineTrans, aes(x = City, y = delayed, fill = Airplane)) +
  geom_col(position = position_dodge())
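AM West’s count of delayed flights is highest in Phoenix, and Alaska’s is highest in Seattle. These are raw counts, so they also reflect how many flights each airline runs in each city.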
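Per-city delay counts are driven by flight volume as well as punctuality. A minimal sketch that plots the delay rate per airline and city instead, again on a hypothetical table shaped like AirlineTrans (`AirlineTransToy`, made-up counts) and assuming columns named `delayed` and `on time`; `CityRate` and `p` are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for AirlineTrans (illustrative counts only);
# the "on time" column name is an assumption about the CSV's Time labels.
AirlineTransToy <- data.frame(
  Airplane  = c("Alaska", "Alaska", "AM West", "AM West"),
  City      = c("Phoenix", "Seattle", "Phoenix", "Seattle"),
  delayed   = c(20, 300, 400, 60),
  `on time` = c(200, 1800, 4000, 200),
  check.names = FALSE
)

# Share of each airline's flights in each city that were delayed
CityRate <- AirlineTransToy %>%
  mutate(DelayRate = delayed / (delayed + `on time`))

# Same dodged bar layout as the count chart, but with rates on the y axis
p <- ggplot(CityRate, aes(x = City, y = DelayRate, fill = Airplane)) +
  geom_col(position = position_dodge()) +
  labs(y = "Share of flights delayed")
p
```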