This is the R Bridge final project. The data are a subset of hflights. The analysis visualizes airline arrival and departure delays by carrier and explores how delays relate to other factors.
library(hflights)
library(ggplot2)
Subsetting data and displaying descriptive summary statistics
sub.delay <- hflights[hflights$Month == 12,
c("ArrDelay", "DepDelay", "UniqueCarrier", "DayOfWeek",
"AirTime", "Year", "Month", "DepTime", "Distance")]
str(sub.delay)
## 'data.frame': 19117 obs. of 9 variables:
## $ ArrDelay : int 47 -2 -17 53 -23 -6 -12 -12 -12 -7 ...
## $ DepDelay : int 63 -6 -3 58 -2 15 11 5 3 -3 ...
## $ UniqueCarrier: chr "AA" "AA" "AA" "AA" ...
## $ DayOfWeek : int 4 5 7 1 2 3 4 5 1 2 ...
## $ AirTime : int 44 39 46 54 41 43 39 36 43 53 ...
## $ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ Month : int 12 12 12 12 12 12 12 12 12 12 ...
## $ DepTime : int 2113 2004 2007 2108 2008 2025 2021 2015 2013 2007 ...
## $ Distance : int 224 224 224 224 224 224 224 224 224 224 ...
head(sub.delay)
## ArrDelay DepDelay UniqueCarrier DayOfWeek AirTime Year Month
## 5596743 47 63 AA 4 44 2011 12
## 5596744 -2 -6 AA 5 39 2011 12
## 5596745 -17 -3 AA 7 46 2011 12
## 5596746 53 58 AA 1 54 2011 12
## 5596747 -23 -2 AA 2 41 2011 12
## 5596748 -6 15 AA 3 43 2011 12
## DepTime Distance
## 5596743 2113 224
## 5596744 2004 224
## 5596745 2007 224
## 5596746 2108 224
## 5596747 2008 224
## 5596748 2025 224
summary(sub.delay)
## ArrDelay DepDelay UniqueCarrier DayOfWeek
## Min. :-57.000 Min. :-33.00 Length:19117 Min. :1.000
## 1st Qu.:-10.000 1st Qu.: -2.00 Class :character 1st Qu.:2.000
## Median : -2.000 Median : 1.00 Mode :character Median :4.000
## Mean : 5.013 Mean : 10.99 Mean :4.018
## 3rd Qu.: 9.000 3rd Qu.: 11.00 3rd Qu.:6.000
## Max. :978.000 Max. :970.00 Max. :7.000
## NA's :240 NA's :164
## AirTime Year Month DepTime Distance
## Min. : 24.0 Min. :2011 Min. :12 Min. : 1 Min. : 127
## 1st Qu.: 61.0 1st Qu.:2011 1st Qu.:12 1st Qu.:1018 1st Qu.: 427
## Median :107.0 Median :2011 Median :12 Median :1408 Median : 844
## Mean :108.9 Mean :2011 Mean :12 Mean :1390 Mean : 809
## 3rd Qu.:141.0 3rd Qu.:2011 3rd Qu.:12 3rd Qu.:1755 3rd Qu.:1075
## Max. :541.0 Max. :2011 Max. :12 Max. :2359 Max. :3904
## NA's :240 NA's :164
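The summary reports missing values in four columns and maxima far above the third quartile. A minimal sketch of how both could be checked directly is given here; the name `dec` is introduced only for this illustration, and the guess that the missing values come from cancelled or diverted flights is an assumption that is not verified.

```r
library(hflights)

# Rebuild the December subset so this chunk runs on its own
# (`dec` is a hypothetical name used only in this sketch).
dec <- hflights[hflights$Month == 12,
                c("ArrDelay", "DepDelay", "AirTime", "DepTime")]

# Number of missing values in each column
na.counts <- colSums(is.na(dec))
na.counts

# Upper quantiles of arrival delay (minutes) show how long the right tail is
quantile(dec$ArrDelay, probs = c(0.50, 0.75, 0.95, 0.99), na.rm = TRUE)
```

The counts should match the `NA's` rows printed by `summary()`, and the gap between the 99% quantile and the maximum gives a rough sense of how extreme the largest delays are.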
Facet the delays by Day of Week
# Plot the numeric delay so that xlim is read in minutes rather than factor positions
ggplot(sub.delay, aes(x = ArrDelay)) + geom_bar(na.rm = TRUE) +
  coord_cartesian(xlim = c(0, 200)) +
  facet_wrap(~DayOfWeek, nrow = 1) +
  xlab("Arrival Delay (minutes)") +
  ylab("frequency") +
  labs(title = "Arrival Delay by Day of the Week in Dec. of 2011")
# Plot the numeric delay so that xlim is read in minutes rather than factor positions
ggplot(sub.delay, aes(x = DepDelay)) + geom_bar(na.rm = TRUE) +
  coord_cartesian(xlim = c(0, 200)) +
  facet_wrap(~DayOfWeek, nrow = 1) +
  xlab("Departure Delay (minutes)") +
  ylab("frequency") +
  labs(title = "Departure Delay by Day of the Week in Dec. of 2011")
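Seven narrow panels are hard to compare by eye, so a numeric summary per day can help. The sketch below computes the mean arrival and departure delay for each day of the week. The names `dec` and `by.day` are introduced only for this illustration, and it assumes the usual coding in which `DayOfWeek` 1 is Monday, so 5 is Friday.

```r
library(hflights)

# December subset, rebuilt so this chunk runs on its own
dec <- hflights[hflights$Month == 12, c("ArrDelay", "DepDelay", "DayOfWeek")]

# Mean delay (minutes) per day of week; the formula interface drops
# rows in which either delay is NA
by.day <- aggregate(cbind(ArrDelay, DepDelay) ~ DayOfWeek, data = dec, FUN = mean)

# Days sorted from the largest to the smallest mean arrival delay
by.day[order(-by.day$ArrDelay), ]
```

Because the delays are strongly right-skewed, replacing `mean` with `median` in the `FUN` argument gives a view that is less sensitive to a few very long delays.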
Rank carriers by delay frequency and explore relationships between delays and distance.
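# Keep only late arrivals (ArrDelay > 0) so each bar counts delayed flights, not all flights;
# which() also drops rows where ArrDelay is NA
ggplot(sub.delay[which(sub.delay$ArrDelay > 0), c("ArrDelay", "UniqueCarrier")],
       aes(x = reorder(UniqueCarrier, UniqueCarrier,
                       function(x) -length(x)))) + geom_bar() +
  xlab("Carrier") +
  ylab("Number of delayed arrivals") +
  labs(title = "Arrival Delay by Carriers in Dec. of 2011")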
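# Keep only late departures (DepDelay > 0) so each bar counts delayed flights, not all flights;
# which() also drops rows where DepDelay is NA
ggplot(sub.delay[which(sub.delay$DepDelay > 0), c("DepDelay", "UniqueCarrier")],
       aes(x = reorder(UniqueCarrier, UniqueCarrier,
                       function(x) -length(x)))) + geom_bar() +
  xlab("Carrier") +
  ylab("Number of delayed departures") +
  labs(title = "Departure Delay by Carriers in Dec. of 2011")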
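A carrier that operates many flights will tend to have many delayed flights simply because of its volume. The sketch below tabulates, for each carrier, the number of late arrivals alongside the share of its flights that arrived late. Here "late" is taken to mean `ArrDelay > 0`, and the names `dec`, `late.count`, `late.share` and `carrier.summary` are introduced only for this illustration.

```r
library(hflights)

# December subset, rebuilt so this chunk runs on its own
dec <- hflights[hflights$Month == 12, c("ArrDelay", "UniqueCarrier")]
dec <- dec[!is.na(dec$ArrDelay), ]  # drop flights with no recorded arrival delay

# Number of late arrivals per carrier
late.count <- tapply(dec$ArrDelay > 0, dec$UniqueCarrier, sum)
# Share of each carrier's flights that arrived late
late.share <- tapply(dec$ArrDelay > 0, dec$UniqueCarrier, mean)

carrier.summary <- data.frame(carrier      = names(late.count),
                              late.flights = as.vector(late.count),
                              late.share   = round(as.vector(late.share), 3))

# Carriers sorted by number of late arrivals
carrier.summary[order(-carrier.summary$late.flights), ]
```

Sorting by `late.share` instead of `late.flights` ranks carriers by how often they are late relative to how much they fly, which may give a different ordering.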
# Logical indexing turns rows with NA ArrDelay into all-NA rows,
# which geom_point() then drops with a warning
ggplot(sub.delay[sub.delay$ArrDelay > 0, c("ArrDelay", "DepDelay")],
       aes(x = DepDelay, y = ArrDelay)) + geom_point() +
  xlab("Departure Delay") +
  ylab("Arrival Delay") +
  labs(title = "Departure Delay vs. Arrival Delay in Dec. of 2011")
## Warning: Removed 240 rows containing missing values (geom_point).
ggplot(sub.delay[sub.delay$ArrDelay > 0, c("ArrDelay", "Distance")],
aes(x = Distance, y = ArrDelay)) + geom_point()+
xlab("Distance") +
ylab("Arrival Delay") +
labs(title = "Distance vs. Arrival Delay in Dec. of 2011")
## Warning: Removed 240 rows containing missing values (geom_point).
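The two scatter plots can be complemented with correlation coefficients. The sketch below computes the Pearson correlation of arrival delay with departure delay and with distance, restricted to late arrivals as in the plots. The names `dec`, `late`, `cor.dep` and `cor.dist` are introduced only for this illustration, and Pearson correlation is a choice made here, not something the original analysis used.

```r
library(hflights)

# December subset, rebuilt so this chunk runs on its own
dec <- hflights[hflights$Month == 12, c("ArrDelay", "DepDelay", "Distance")]

# Late arrivals only; which() drops rows where ArrDelay is NA
late <- dec[which(dec$ArrDelay > 0), ]

# Pearson correlations with arrival delay
cor.dep  <- cor(late$DepDelay, late$ArrDelay, use = "complete.obs")
cor.dist <- cor(late$Distance, late$ArrDelay, use = "complete.obs")

c(DepDelay = cor.dep, Distance = cor.dist)
```

A coefficient near 1 would indicate a strong linear relationship and one near 0 little or none. Since a few extreme delays can dominate Pearson correlation, passing `method = "spearman"` to `cor()` is a reasonable robustness check.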
The observations show:

- The delay data contain outliers.
- Departure delays lead to arrival delays.
- Delays are highest on Friday.
- The carriers with the most frequent delays are XE, CO, WN, and OO.
- Distance shows no direct impact on delays.