Part 1: Data Setup

Reading the Delay Data into R
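
The import step itself is not shown, so the following is only a minimal sketch of one way it could be done, assuming the data come from a CSV file. The file contents and the name airDelay.toy are hypothetical stand-ins; the source does not name the real file.

```r
# Hypothetical stand-in file: a three-line CSV written to a temporary path,
# because the source does not say where the real data live.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("DepDelayMinutes,Airline",
             "0,Delta",
             "15,Southwest"), csv_path)

# stringsAsFactors = TRUE turns text columns into factors, so that
# summary() later reports counts per level instead of "character".
airDelay.toy <- read.csv(csv_path, stringsAsFactors = TRUE)
str(airDelay.toy)
```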

Get the dimensions (number of rows and columns) of the imported data frame airDelay.df

## [1] 1048575      10
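
A minimal sketch of the call that produces this kind of output, run on a hypothetical four-row data frame (toy.df is not part of the original data). As a side note, 1,048,575 is exactly one less than Excel's 1,048,576-row worksheet limit, so the file may have been truncated on export with one row used by the header; that is an inference, not something the source states.

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  DepDelayMinutes = c(0, 0, 15, 42),
  Airline = c("Delta", "Others", "Southwest", "American")
)

dim(toy.df)   # rows first, then columns
nrow(toy.df)  # rows only
ncol(toy.df)  # columns only
```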

Attach the data frame airDelay.df so that its columns can be referenced by name
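
Attaching puts the columns of a data frame on the search path. The sketch below uses a hypothetical toy.df and also shows with(), a common alternative that avoids leaving the data frame attached.

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(DepDelayMinutes = c(0, 0, 15, 42))

attach(toy.df)
m_attached <- mean(DepDelayMinutes)  # column is found via the search path
detach(toy.df)                       # undo the attach when finished

# Same result without modifying the search path
m_with <- with(toy.df, mean(DepDelayMinutes))
```

Note that a variable of the same name in the global environment would mask the attached column, which is one reason with() or the toy.df$DepDelayMinutes form is often preferred.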

Part 2: Descriptive Statistics

Compute the mean and standard deviation of DepDelayMinutes in the data frame airDelay.df

## [1] 11.33995
## [1] 41.05392
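
The two values above are the kind of output mean() and sd() return. A minimal sketch on hypothetical toy values (not the real delays):

```r
# Hypothetical delay values in minutes
delays <- c(0, 0, 15, 42)

delay_mean <- mean(delays)  # arithmetic mean
delay_sd   <- sd(delays)    # sample standard deviation (divides by n - 1)

# If the column could contain missing values, na.rm = TRUE drops them first
delay_mean_narm <- mean(c(delays, NA), na.rm = TRUE)
```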

Find the minimum, maximum, and range of DepDelayMinutes in the data frame airDelay.df

## [1] 0
## [1] 2109
## [1]    0 2109
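
Note that range() in R returns both endpoints rather than their difference, which is why the third output line shows two numbers. A short sketch on hypothetical toy values:

```r
# Hypothetical delay values in minutes
delays <- c(0, 0, 15, 42)

lo   <- min(delays)
hi   <- max(delays)
ends <- range(delays)        # vector of c(minimum, maximum)
span <- diff(range(delays))  # maximum minus minimum, as a single number
```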

Display a summary of the variable DepDelayMinutes in the dataframe airDelay.df

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   11.34    5.00 2109.00
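
In the output above the median is 0 and the third quartile is 5 minutes, while the mean is 11.34, so the distribution is strongly right-skewed: most flights leave on time and a small number of very long delays pull the mean up. A minimal sketch of the call on hypothetical toy values:

```r
# Hypothetical delay values in minutes
delays <- c(0, 0, 15, 42)

s <- summary(delays)  # min, quartiles, median, mean, max
s

# Individual entries can be pulled out by name
s[["Median"]]
s[["Mean"]]
```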

Display a summary of the whole dataframe airDelay.df

##        X            FlightNumber     DepDelayMinutes   DepDelay  
##  Min.   :      1   511    :    575   Min.   :   0.00   0:877147  
##  1st Qu.: 264329   34     :    564   1st Qu.:   0.00   1:171428  
##  Median : 528996   352    :    551   Median :   0.00             
##  Mean   : 529263   423    :    525   Mean   :  11.34             
##  3rd Qu.: 793788   566    :    509   3rd Qu.:   5.00             
##  Max.   :1059843   1905   :    509   Max.   :2109.00             
##                    (Other):1045342                               
##     Duration        Distance      DepTime         DepDay      
##  Min.   : 21.0   Min.   :  31.0   AM:438993   Weekday:788004  
##  1st Qu.: 88.0   1st Qu.: 358.0   PM:609582   Weekend:260571  
##  Median :120.0   Median : 628.0                               
##  Mean   :140.1   Mean   : 796.8                               
##  3rd Qu.:170.0   3rd Qu.:1024.0                               
##  Max.   :675.0   Max.   :4983.0                               
##                                                               
##       Airline         OriginStateName  
##  American :152027   California:110633  
##  Delta    :157957   Texas     :109364  
##  Others   :569196   Florida   : 80479  
##  Southwest:169395   Georgia   : 64433  
##                     Illinois  : 63093  
##                     New York  : 60224  
##                     (Other)   :560349
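
summary() reports quartiles for numeric columns and level counts for factors. FlightNumber and DepDelay appear as counts above, which suggests they were stored as factors rather than numbers. A sketch on a hypothetical toy.df, with arbitrary DepDelay codes chosen only for illustration:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  DepDelayMinutes = c(0, 0, 15, 42),
  DepDelay        = c(0, 0, 1, 1),   # arbitrary 0/1 codes for illustration
  Airline         = c("Delta", "Others", "Southwest", "American")
)

# Convert coded and text columns to factors so summary() counts the levels
toy.df$DepDelay <- factor(toy.df$DepDelay)
toy.df$Airline  <- factor(toy.df$Airline)

summary(toy.df)
```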

Display descriptive statistics for the whole data frame airDelay.df

##                 vars       n      mean        sd min     max   range     se
## X                  1 1048575 529263.23 305867.50   1 1059843 1059842 298.70
## FlightNumber       2 1048575       NaN        NA Inf    -Inf    -Inf     NA
## DepDelayMinutes    3 1048575     11.34     41.05   0    2109    2109   0.04
## DepDelay           4 1048575       NaN        NA Inf    -Inf    -Inf     NA
## Duration           5 1048575    140.06     73.11  21     675     654   0.07
## Distance           6 1048575    796.78    599.01  31    4983    4952   0.58
## DepTime            7 1048575       NaN        NA Inf    -Inf    -Inf     NA
## DepDay             8 1048575       NaN        NA Inf    -Inf    -Inf     NA
## Airline            9 1048575       NaN        NA Inf    -Inf    -Inf     NA
## OriginStateName   10 1048575       NaN        NA Inf    -Inf    -Inf     NA
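
The NaN, NA, and Inf rows belong to the factor columns, for which a mean or standard deviation is not defined. The layout resembles the output of describe() from the psych package, but the source does not show the call, so that is an assumption. The base R sketch below computes the same columns for the numeric variables only, on a hypothetical toy.df:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  DepDelayMinutes = c(0, 0, 15, 42),
  Duration        = c(60, 90, 120, 150),
  Airline         = factor(c("Delta", "Others", "Southwest", "American"))
)

# Keep only numeric columns; factors have no meaningful mean or sd
num_cols <- toy.df[sapply(toy.df, is.numeric)]

# One row per variable: n, mean, sd, min, max, range, standard error
desc <- t(sapply(num_cols, function(x) c(
  n     = length(x),
  mean  = mean(x),
  sd    = sd(x),
  min   = min(x),
  max   = max(x),
  range = max(x) - min(x),
  se    = sd(x) / sqrt(length(x))
)))
round(desc, 2)
```

The se column is the standard deviation divided by the square root of n, which agrees with the table above: 41.05 / sqrt(1048575) is about 0.04.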

Display the means of selected variables (DepDelayMinutes, Duration, Distance) broken down by Airline in the data frame airDelay.df using aggregate()

##     Group.1 DepDelayMinutes Duration Distance
## 1  American       12.102929 166.0034 992.1403
## 2     Delta        8.057687 147.7035 857.5109
## 3    Others       12.388894 134.8432 743.1459
## 4 Southwest       10.191210 127.1974 745.0165
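
The Group.1 column name is what aggregate() produces when the grouping list is unnamed. A minimal sketch on a hypothetical toy.df:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline         = c("Delta", "Delta", "Southwest", "Southwest"),
  DepDelayMinutes = c(0, 10, 20, 40),
  Duration        = c(100, 140, 90, 110),
  Distance        = c(500, 900, 300, 700)
)

# Mean of each selected column within each airline
by_airline <- aggregate(
  toy.df[, c("DepDelayMinutes", "Duration", "Distance")],
  by  = list(toy.df$Airline),  # unnamed list, so the column is "Group.1"
  FUN = mean
)
by_airline
```

Passing by = list(Airline = toy.df$Airline) instead would label the grouping column Airline.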

Part 3: Frequency & Contingency Tables

Create a frequency table for the variable Airline in the data frame airDelay.df

## Airline
##  American     Delta    Others Southwest 
##    152027    157957    569196    169395
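
Counts like these come from table(). A minimal sketch on a hypothetical toy.df:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline = c("Delta", "Others", "Others", "Southwest", "Others")
)

# with() lets table() pick up the variable name for the header line
freq <- with(toy.df, table(Airline))
freq
```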

Create a proportion table (expressed as percentages) for the variable Airline in the data frame airDelay.df

## Airline
##  American     Delta    Others Southwest 
##     14.50     15.06     54.28     16.15
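
The values above are percentages rather than raw proportions, so the proportions were presumably multiplied by 100 and rounded to two decimals. A sketch of that pattern on a hypothetical toy.df:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline = c("Delta", "Others", "Others", "Southwest", "Others")
)

freq <- with(toy.df, table(Airline))

# prop.table() gives shares that sum to 1; scale to percent and round
pct <- round(prop.table(freq) * 100, 2)
pct
```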

Create a contingency table (cell percentages with marginal sums) for the variables Airline and DepDay in the data frame airDelay.df

##            DepDay
## Airline     Weekday Weekend   Sum
##   American    10.85    3.65 14.50
##   Delta       11.47    3.59 15.06
##   Others      40.56   13.72 54.28
##   Southwest   12.26    3.89 16.15
##   Sum         75.14   24.85 99.99
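
The grand total of 99.99 rather than 100 is consistent with the cells having been rounded before the margins were added, so the sums accumulate rounding error; the exact call is not shown, so this is an assumption. The sketch below, on a hypothetical toy.df, shows both orders of operation:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline = c("Delta", "Delta", "Southwest", "Southwest"),
  DepDay  = c("Weekday", "Weekend", "Weekday", "Weekday")
)

tab <- with(toy.df, table(Airline, DepDay))

# Round first, then add margins: sums are computed from rounded cells
pct_round_first <- addmargins(round(prop.table(tab) * 100, 2))

# Add margins first, then round: the grand total is always exactly 100
pct_round_last <- round(addmargins(prop.table(tab)) * 100, 2)
pct_round_last
```

With these toy values both versions agree; on real data the second form avoids totals such as 99.99.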

Create a contingency table (cell percentages with marginal sums) for the variables Airline and DepDelay in the data frame airDelay.df

##            DepDelay
## Airline          0      1    Sum
##   American   12.07   2.43  14.50
##   Delta      13.18   1.89  15.07
##   Others     45.33   8.95  54.28
##   Southwest  13.07   3.09  16.16
##   Sum        83.65  16.36 100.01
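
Cell percentages of the grand total make airlines hard to compare because the airlines differ in size. Dividing each DepDelay = 1 cell by its row sum gives the share within an airline: from the table above, roughly 2.43 / 14.50 = 16.8% for American and 1.89 / 15.07 = 12.5% for Delta. The sketch below computes such row percentages directly with margin = 1, on a hypothetical toy.df; it is an alternative view, not the call used for the table above.

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline  = c("Delta", "Delta", "Delta", "Delta", "Southwest", "Southwest"),
  DepDelay = c(0, 0, 0, 1, 0, 1)
)

tab <- with(toy.df, table(Airline, DepDelay))

# margin = 1 makes each row sum to 100 percent
row_pct <- round(prop.table(tab, margin = 1) * 100, 2)
row_pct
```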

Create a three-way table (cell percentages with marginal sums) for the variables Airline, DepDelay, and DepDay in the data frame airDelay.df

## , , DepDay = Weekday
## 
##            DepDelay
## Airline         0     1   Sum
##   American   9.01  1.84 10.85
##   Delta      9.92  1.55 11.47
##   Others    33.62  6.94 40.56
##   Southwest  9.89  2.37 12.26
##   Sum       62.44 12.70 75.14
## 
## , , DepDay = Weekend
## 
##            DepDelay
## Airline         0     1   Sum
##   American   3.06  0.59  3.65
##   Delta      3.26  0.33  3.59
##   Others    11.71  2.01 13.72
##   Southwest  3.18  0.71  3.89
##   Sum       21.21  3.64 24.85
## 
## , , DepDay = Sum
## 
##            DepDelay
## Airline         0     1   Sum
##   American  12.07  2.43 14.50
##   Delta     13.18  1.88 15.06
##   Others    45.33  8.95 54.28
##   Southwest 13.07  3.08 16.15
##   Sum       83.65 16.34 99.99
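
A three-way table prints as one two-way slice per level of the third variable, and addmargins() appends a Sum slice. The small differences between this Sum slice and the earlier two-way table (for example 1.88 versus 1.89 for Delta) are consistent with rounding before summing. A minimal sketch using counts on a hypothetical toy.df:

```r
# toy.df is a hypothetical stand-in for airDelay.df
toy.df <- data.frame(
  Airline  = c("Delta", "Delta", "Southwest", "Southwest"),
  DepDelay = c(0, 1, 0, 0),
  DepDay   = c("Weekday", "Weekend", "Weekday", "Weekend")
)

# Third variable defines the slices
tab3 <- with(toy.df, table(Airline, DepDelay, DepDay))

# Adds a "Sum" level to every dimension, including a "Sum" slice
tab3_margins <- addmargins(tab3)
tab3_margins

# ftable() flattens the slices into a single, more compact display
ftable(tab3)
```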

Part 4: Visualizing Data