July 26, 2015

Select - keep the UniqueCarrier and DepDelay columns, then filter for departure delays over 60 minutes

## Source: local data frame [10,242 x 2]
## 
##    UniqueCarrier DepDelay
## 1             AA       90
## 2             AA       67
## 3             AA       74
## 4             AA      125
## 5             AA       82
## 6             AA       99
## 7             AA       70
## 8             AA       61
## 9             AA       74
## 10            AS       73
## ..           ...      ...
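The code that produced this output is not shown, so the following is only a minimal sketch of how such a result could be obtained with dplyr. The `flights` data frame is a made-up stand-in that reuses the column names from the output above.

```r
library(dplyr)

# Toy stand-in for the flights table; the values are made up for illustration.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AS", "B6"),
  DepDelay      = c(90, 12, 73, NA),
  ArrDelay      = c(80, 5, 70, NA),
  stringsAsFactors = FALSE
)

# Keep two columns, then keep rows with a departure delay above 60 minutes.
# filter() also drops rows where the condition evaluates to NA.
long_delays <- flights %>%
  select(UniqueCarrier, DepDelay) %>%
  filter(DepDelay > 60)

print(long_delays)
```

Selecting before filtering works here because the filter only needs a column that is still present after `select()`.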

Transmute - create new variables that are functions of existing variables with transmute(), which keeps only the newly created columns

## Source: local data frame [227,496 x 3]
## 
##      Dep_date Time_makeup Air_speed
## 1  2011-01-01         -10  336.0000
## 2  2011-01-02         -10  298.6667
## 3  2011-01-03           0  280.0000
## 4  2011-01-04           0  344.6154
## 5  2011-01-05          -8  305.4545
## 6  2011-01-06          -6  298.6667
## 7  2011-01-07           0  312.5581
## 8  2011-01-08         -11  336.0000
## 9  2011-01-09           1  327.8049
## 10 2011-01-10           0  298.6667
## ..        ...         ...       ...
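A result with only the three derived columns could come from dplyr's `transmute()`, which works like `mutate()` but drops the existing columns. The sketch below is an assumption about how the columns might be defined: the input column names (`Year`, `Month`, `DayofMonth`, `ArrDelay`, `DepDelay`, `Distance`, `AirTime`) and both formulas are illustrative guesses, and the toy values are made up.

```r
library(dplyr)

# Toy stand-in for the flights table; the values are made up for illustration.
flights <- data.frame(
  Year = 2011, Month = 1, DayofMonth = 1:3,
  DepDelay = c(0, 1, -8), ArrDelay = c(-10, -9, -8),
  Distance = c(224, 224, 224), AirTime = c(40, 45, 48)
)

derived <- flights %>%
  transmute(
    # Build a Date from the three date parts.
    Dep_date    = as.Date(ISOdate(Year, Month, DayofMonth)),
    # Assumed definition: arrival delay minus departure delay.
    Time_makeup = ArrDelay - DepDelay,
    # Assumed definition: distance per hour, with AirTime in minutes.
    Air_speed   = Distance / AirTime * 60
  )

print(derived)
```

Replacing `transmute()` with `mutate()` would keep the original columns alongside the new ones.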

Group - for each destination, show the number of cancelled and not cancelled flights

##      Cancelled
## Dest     0  1
##   ABQ 2787 25
##   AEX  712 12
##   AGS    1  0
##   AMA 1265 32
##   ANC  125  0
##   ASE  120  5
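A cross-tabulation like the one above can be sketched in two ways: base R's `table()`, which gives the same wide layout, or dplyr's `count()`, which returns one row per destination and cancellation flag. The toy data frame below is made up, and which approach the original analysis used is not shown.

```r
library(dplyr)

# Toy stand-in for the flights table; the values are made up for illustration.
flights <- data.frame(
  Dest      = c("ABQ", "ABQ", "ABQ", "AEX", "AEX"),
  Cancelled = c(0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Wide layout: destinations as rows, cancellation flag as columns.
cancel_table <- table(flights$Dest, flights$Cancelled,
                      dnn = c("Dest", "Cancelled"))
print(cancel_table)

# Long layout: one row per (Dest, Cancelled) pair, with the count in column n.
cancel_counts <- flights %>% count(Dest, Cancelled)
print(cancel_counts)
```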

Summarise - for each day of the week, count the total number of flights departing from each of the two Houston airports

## Source: local data frame [14 x 3]
## Groups: Origin
## 
##    Origin DayOfWeek flight_count
## 1     HOU         2         8004
## 2     HOU         3         7967
## 3     HOU         5         7952
## 4     HOU         4         7928
## 5     HOU         1         7913
## 6     HOU         7         6843
## 7     HOU         6         5692
## 8     IAH         5        27020
## 9     IAH         4        26974
## 10    IAH         1        26447
## 11    IAH         7        25215
## 12    IAH         3        23959
## 13    IAH         2        23645
## 14    IAH         6        21937
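Counts per origin and weekday, sorted from busiest to quietest within each airport, could be produced along these lines. This is a sketch on made-up data, not the original code. The explicit `ungroup()` and the `Origin` key in `arrange()` are a design choice so that the ordering does not depend on how a particular dplyr version treats grouped data.

```r
library(dplyr)

# Toy stand-in for the flights table; the values are made up for illustration.
flights <- data.frame(
  Origin    = c("HOU", "HOU", "HOU", "IAH", "IAH", "IAH"),
  DayOfWeek = c(1, 1, 2, 5, 5, 1),
  stringsAsFactors = FALSE
)

per_day <- flights %>%
  group_by(Origin, DayOfWeek) %>%
  summarise(flight_count = n()) %>%   # n() counts the rows in each group
  ungroup() %>%
  arrange(Origin, desc(flight_count))

print(per_day)
```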

Advanced grouping and filtering - for each carrier, find the two days of the year on which it had its longest departure delays

## Source: local data frame [30 x 4]
## Groups: UniqueCarrier
## 
##    UniqueCarrier Month DayofMonth DepDelay
## 1             AA    12         12      970
## 2             AA    11         19      677
## 3             AS     2         28      172
## 4             AS     7          6      138
## 5             B6    10         29      310
## 6             B6     8         19      283
## 7             CO     8          1      981
## 8             CO     1         20      780
## 9             DL    10         25      730
## 10            DL     4          5      497
## ..           ...   ...        ...      ...
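One way to pick the two largest delays per carrier is to rank within each group and filter on the rank. The sketch below uses `min_rank()` with `desc()` on made-up data. It is an illustration of the technique, not necessarily the code behind the output above.

```r
library(dplyr)

# Toy stand-in for the flights table; the values are made up for illustration.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "AS", "AS", "AS"),
  Month         = c(12, 11, 3, 2, 7, 9),
  DayofMonth    = c(12, 19, 4, 28, 6, 1),
  DepDelay      = c(970, 677, 15, 172, 138, NA),
  stringsAsFactors = FALSE
)

worst_two <- flights %>%
  select(UniqueCarrier, Month, DayofMonth, DepDelay) %>%
  group_by(UniqueCarrier) %>%
  # Rank 1 is the longest delay in the carrier's group; NA delays get rank NA
  # and are dropped by filter().
  filter(min_rank(desc(DepDelay)) <= 2) %>%
  ungroup() %>%
  arrange(UniqueCarrier, desc(DepDelay))

print(worst_two)
```

Because `min_rank()` gives tied values the same rank, a carrier with ties at the cutoff can return more than two rows.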

Using the ggplot2 package, here is a bar chart showing the carriers and the number of flights each of them operated

## Loading required package: ggplot2
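A bar chart of flights per carrier can be sketched with `geom_bar()`, which counts the rows for each value of the x variable by default. The data frame and axis labels below are made up for illustration, and the styling of the original chart is not known.

```r
library(ggplot2)

# Toy stand-in for the flights table: one row per flight.
flights <- data.frame(
  UniqueCarrier = c("AA", "AA", "AS", "B6", "B6", "B6"),
  stringsAsFactors = FALSE
)

# geom_bar() uses stat = "count" by default, so no y aesthetic is needed.
carrier_plot <- ggplot(flights, aes(x = UniqueCarrier)) +
  geom_bar() +
  labs(x = "Carrier", y = "Number of flights")

print(carrier_plot)
```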

More complex Chart

Histogram