Airline Ticket Pricing Analysis

Sameer Mathur

PART 1

READING AND PREPARING DATA

1a. Reading data

Number of Rows and Columns

# reading data into R
airline.df <- read.csv(paste("AirlinePricingData.csv"))
# rows and columns of the data frame
dim(airline.df)
[1] 305  24

Column Names

# data column names
colnames(airline.df)
 [1] "FlightNumber"        "Airline"             "DepartureCityCode"  
 [4] "ArrivalCityCode"     "DepartureTime"       "ArrivalTime"        
 [7] "Departure"           "FlyingMinutes"       "Aircraft"           
[10] "PlaneModel"          "Capacity"            "SeatPitch"          
[13] "SeatWidth"           "DataCollectionDate"  "DateDeparture"      
[16] "Price"               "AdvancedBookingDays" "IsDiwali"           
[19] "DayBeforeDiwali"     "DayAfterDiwali"      "MetroDeparture"     
[22] "MetroArrival"        "MarketShare"         "LoadFactor"         

1b. Data Types

# data types of the data columns
str(airline.df)
'data.frame':   305 obs. of  24 variables:
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 25 25 25 25 32 32 32 32 32 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 1 1 1 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 2 2 2 2 2 ...
 $ DepartureTime      : int  225 225 225 225 225 300 300 300 300 300 ...
 $ ArrivalTime        : int  435 435 435 435 435 505 505 505 505 505 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
 $ FlyingMinutes      : int  130 130 130 130 130 125 125 125 125 125 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Capacity           : int  156 156 156 156 156 156 156 156 156 156 ...
 $ SeatPitch          : int  30 30 30 30 30 30 30 30 30 30 ...
 $ SeatWidth          : num  17 17 17 17 17 17 17 17 17 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 2 2 2 2 4 4 4 4 4 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 12 16 2 1 4 14 18 6 1 2 ...
 $ Price              : int  3453 4251 4681 4051 3553 4628 4628 3604 11587 5678 ...
 $ AdvancedBookingDays: int  2 7 56 54 30 2 7 30 52 54 ...
 $ IsDiwali           : int  0 0 1 1 0 0 0 0 1 1 ...
 $ DayBeforeDiwali    : int  0 0 0 1 0 0 0 0 1 0 ...
 $ DayAfterDiwali     : int  0 0 1 0 0 0 0 0 0 1 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 ...
 $ LoadFactor         : num  83.3 83.3 83.3 83.3 83.3 ...

1c. Conversion of some data types of data columns into factor

# convert 'IsDiwali' into  factor
airline.df$IsDiwali <- as.factor(airline.df$IsDiwali)
# convert 'DayBeforeDiwali' into  factor
airline.df$DayBeforeDiwali <- as.factor(airline.df$DayBeforeDiwali)
# convert 'DayAfterDiwali' into  factor
airline.df$DayAfterDiwali <- as.factor(airline.df$DayAfterDiwali)
# convert 'MetroDeparture' into  factor
airline.df$MetroDeparture <- as.factor(airline.df$MetroDeparture)
# convert 'MetroArrival' into  factor
airline.df$MetroArrival <- as.factor(airline.df$MetroArrival)

# verify
str(airline.df)
'data.frame':   305 obs. of  24 variables:
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 25 25 25 25 32 32 32 32 32 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 1 1 1 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 2 2 2 2 2 ...
 $ DepartureTime      : int  225 225 225 225 225 300 300 300 300 300 ...
 $ ArrivalTime        : int  435 435 435 435 435 505 505 505 505 505 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
 $ FlyingMinutes      : int  130 130 130 130 130 125 125 125 125 125 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Capacity           : int  156 156 156 156 156 156 156 156 156 156 ...
 $ SeatPitch          : int  30 30 30 30 30 30 30 30 30 30 ...
 $ SeatWidth          : num  17 17 17 17 17 17 17 17 17 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 2 2 2 2 4 4 4 4 4 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 12 16 2 1 4 14 18 6 1 2 ...
 $ Price              : int  3453 4251 4681 4051 3553 4628 4628 3604 11587 5678 ...
 $ AdvancedBookingDays: int  2 7 56 54 30 2 7 30 52 54 ...
 $ IsDiwali           : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 1 2 2 ...
 $ DayBeforeDiwali    : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
 $ DayAfterDiwali     : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 2 ...
 $ MetroDeparture     : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 ...
 $ LoadFactor         : num  83.3 83.3 83.3 83.3 83.3 ...

PART 2

DATA SUMMARY

2a. Descriptive statistics of the data frame

# attaching data columns
attach(airline.df)
# descriptive statistics
library(psych)
describe(airline.df)[c(7, 10:12, 15:17, 22, 23), c(2:5, 8:9)]
                      n    mean      sd median    min     max
Departure*          305    1.45    0.50    1.0    1.0     2.0
PlaneModel*         305    3.82    2.71    3.0    1.0     9.0
Capacity            305  176.36   32.39  180.0  138.0   303.0
SeatPitch           305   30.26    0.93   30.0   29.0    33.0
DateDeparture*      305    8.14    6.69    7.0    1.0    20.0
Price               305 5394.54 2388.29 4681.0 2607.0 18015.0
AdvancedBookingDays 305   28.90   22.30   30.0    2.0    61.0
MetroArrival*       305    1.00    0.00    1.0    1.0     1.0
MarketShare         305   21.18   11.04   15.4   13.2    39.6

2b. Number of flights by airlines in Diwali

# flights by airline in Diwali
table(Airline, IsDiwali)
           IsDiwali
Airline      0  1
  Air India 25 16
  IndiGo    48 32
  Jet       87 57
  Spice Jet 24 16

2c. Average ticket price and advanced booking days by airline

# average ticket price by airline
agg0 <- aggregate(list(Price, AdvancedBookingDays), by = list(Airline), mean)
colnames(agg0) <- c("Airline", "AverageTicketPrice", "AdvancedBookingDays")
agg0
    Airline AverageTicketPrice AdvancedBookingDays
1 Air India           6335.000            26.78049
2    IndiGo           4879.525            31.40000
3       Jet           5496.146            28.54167
4 Spice Jet           5094.850            27.40000

2d. Mean and SD of airline ticket price by airline

# average ticket price by airline
agg1 <- aggregate(Price, by = list(Airline), mean)
colnames(agg1) <- c("Airline", "AverageTicketPrice")
#agg1

# sd ticket price by airline
agg2 <- aggregate(Price, by = list(Airline), sd)
colnames(agg2) <- c("Airline", "SD")
#agg2

# average ticket price and standard deviation
agg <- merge(agg1, agg2, by = c("Airline"))
agg
    Airline AverageTicketPrice       SD
1 Air India           6335.000 2693.637
2    IndiGo           4879.525 2956.455
3       Jet           5496.146 1989.659
4 Spice Jet           5094.850 1787.422

PART 3

REGRESSION ANALYSIS

3a. Linear OLS Model

# linear OLS model
fitLinOLSModel <- lm(Price ~ 
                     IsDiwali 
                   + Airline 
                   + AdvancedBookingDays 
                   + Departure, 
                   data = airline.df)
# summary of linear OLS model
summary(fitLinOLSModel)

Call:
lm(formula = Price ~ IsDiwali + Airline + AdvancedBookingDays + 
    Departure, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3094.3 -1241.2  -582.9   632.9 11926.7 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          7335.27     403.98  18.157  < 2e-16 ***
IsDiwali1            4510.43     581.50   7.757 1.40e-13 ***
AirlineIndiGo       -1067.19     415.66  -2.567  0.01073 *  
AirlineJet           -743.56     380.52  -1.954  0.05163 .  
AirlineSpice Jet    -1581.70     497.42  -3.180  0.00163 ** 
AdvancedBookingDays   -89.89      12.82  -7.012 1.57e-11 ***
DeparturePM          -658.29     264.66  -2.487  0.01342 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2145 on 298 degrees of freedom
Multiple R-squared:  0.2091,    Adjusted R-squared:  0.1932 
F-statistic: 13.13 on 6 and 298 DF,  p-value: 3.426e-13

3b. Log-linear OLS Model

# log-linear OLS model
fitLogLinOLSModel <- lm(log(Price) ~ 
                     IsDiwali 
                   + Airline 
                   + AdvancedBookingDays 
                   + Departure, 
                   data = airline.df)
# summary of log-linear OLS model
summary(fitLogLinOLSModel)

Call:
lm(formula = log(Price) ~ IsDiwali + Airline + AdvancedBookingDays + 
    Departure, data = airline.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.61740 -0.17271 -0.06197  0.15253  1.23127 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          8.835518   0.057652 153.257  < 2e-16 ***
IsDiwali1            0.770497   0.082984   9.285  < 2e-16 ***
AirlineIndiGo       -0.237650   0.059319  -4.006 7.80e-05 ***
AirlineJet          -0.117291   0.054303  -2.160 0.031576 *  
AirlineSpice Jet    -0.249983   0.070987  -3.522 0.000496 ***
AdvancedBookingDays -0.015091   0.001829  -8.250 5.17e-15 ***
DeparturePM         -0.074840   0.037770  -1.981 0.048457 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3061 on 298 degrees of freedom
Multiple R-squared:  0.2858,    Adjusted R-squared:  0.2714 
F-statistic: 19.87 on 6 and 298 DF,  p-value: < 2.2e-16

PART 4

INTERACTION MODEL

4a. Linear OLS Interaction Model

# linear OLS intearction model
fitLinOLSIntModel <- lm(Price ~ 
                     IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                   data = airline.df)
# summary of linear OLS interaction model
summary(fitLinOLSIntModel)

Call:
lm(formula = Price ~ IsDiwali + Airline * AdvancedBookingDays + 
    Departure, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3589.5 -1162.2  -422.9   828.5 11145.5 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           5943.73     573.72  10.360  < 2e-16
IsDiwali1                             4263.05     570.13   7.477 8.71e-13
AirlineIndiGo                         1153.02     662.55   1.740   0.0829
AirlineJet                             392.06     613.35   0.639   0.5232
AirlineSpice Jet                      -305.80     794.36  -0.385   0.7005
AdvancedBookingDays                    -34.86      20.27  -1.720   0.0865
DeparturePM                           -631.40     257.85  -2.449   0.0149
AirlineIndiGo:AdvancedBookingDays      -78.75      18.87  -4.172 3.97e-05
AirlineJet:AdvancedBookingDays         -43.08      18.03  -2.389   0.0175
AirlineSpice Jet:AdvancedBookingDays   -47.20      23.12  -2.041   0.0421

(Intercept)                          ***
IsDiwali1                            ***
AirlineIndiGo                        .  
AirlineJet                              
AirlineSpice Jet                        
AdvancedBookingDays                  .  
DeparturePM                          *  
AirlineIndiGo:AdvancedBookingDays    ***
AirlineJet:AdvancedBookingDays       *  
AirlineSpice Jet:AdvancedBookingDays *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2089 on 295 degrees of freedom
Multiple R-squared:  0.2574,    Adjusted R-squared:  0.2348 
F-statistic: 11.36 on 9 and 295 DF,  p-value: 2.812e-15

4b. Log-linear OLS Intearction Model

# log-linear OLS interaction model
fitLogLinOLSIntModel <- lm(log(Price) ~ 
                     IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                   data = airline.df)
# summary of log-linear OLS interaction model
summary(fitLogLinOLSIntModel)

Call:
lm(formula = log(Price) ~ IsDiwali + Airline * AdvancedBookingDays + 
    Departure, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.6551 -0.1874 -0.0555  0.1891  1.1508 

Coefficients:
                                      Estimate Std. Error t value Pr(>|t|)
(Intercept)                           8.677048   0.083001 104.542  < 2e-16
IsDiwali1                             0.744040   0.082481   9.021  < 2e-16
AirlineIndiGo                         0.006169   0.095852   0.064  0.94872
AirlineJet                            0.018354   0.088734   0.207  0.83628
AirlineSpice Jet                     -0.104455   0.114920  -0.909  0.36413
AdvancedBookingDays                  -0.008847   0.002932  -3.017  0.00277
DeparturePM                          -0.071880   0.037303  -1.927  0.05495
AirlineIndiGo:AdvancedBookingDays    -0.008678   0.002731  -3.178  0.00164
AirlineJet:AdvancedBookingDays       -0.005127   0.002609  -1.965  0.05035
AirlineSpice Jet:AdvancedBookingDays -0.005385   0.003345  -1.610  0.10853

(Intercept)                          ***
IsDiwali1                            ***
AirlineIndiGo                           
AirlineJet                              
AirlineSpice Jet                        
AdvancedBookingDays                  ** 
DeparturePM                          .  
AirlineIndiGo:AdvancedBookingDays    ** 
AirlineJet:AdvancedBookingDays       .  
AirlineSpice Jet:AdvancedBookingDays    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3022 on 295 degrees of freedom
Multiple R-squared:  0.3108,    Adjusted R-squared:  0.2898 
F-statistic: 14.78 on 9 and 295 DF,  p-value: < 2.2e-16

PART 5

REGRESSION DIAGNOSTICS

5a. Hetroscedasticity Test

# Goldfeld-Quandt Test
library(lmtest)
gqtest(fitLinOLSIntModel, order.by = ~ Price, data = airline.df)

    Goldfeld-Quandt test

data:  fitLinOLSIntModel
GQ = 19.14, df1 = 143, df2 = 142, p-value < 2.2e-16
alternative hypothesis: variance increases from segment 1 to 2

5b. Breusch-Pagan Test

library(lmtest)
# BP-test for log OLS model
bptest(fitLinOLSIntModel)

    studentized Breusch-Pagan test

data:  fitLinOLSIntModel
BP = 55.248, df = 9, p-value = 1.092e-08

5c. Normality plots of residuals

# distribution of studentized residuals
library(MASS)
sresid <- studres(fitLinOLSIntModel) 
hist(sresid, freq=FALSE, 
   main="Distribution of Studentized Residuals")
xfit<-seq(min(sresid),max(sresid),length=40) 
yfit<-dnorm(xfit) 
lines(xfit, yfit)

plot of chunk unnamed-chunk-21

5d. Residuals Plots for Log-Linear FGLS Model

# residual plots of log OLS model
par(mfrow=c(2,2))
plot(fitLinOLSIntModel)

plot of chunk unnamed-chunk-22

PART 6

FGLS REGRESSION MODEL

6a. Linear FGLS Model

# Step 1:Residuals of linear OLS Model
LogOLSModelRes <- resid(fitLinOLSIntModel)

# Step 2: Taking square of the residuals of linear OLS Model
LogOLSModelResSq <- LogOLSModelRes^2

# Step 3: Taking natural log of the squared residuals of linear OLS Model
lnOLSResSq <- log(LogOLSModelResSq)

# Step 4: Running auxiliary OLS Model
auxOLSModel <- lm(lnOLSResSq ~ 
                    IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                data = airline.df)

# Step 5: Get fitted value of auxiliary OLS Model i.e. 'auxOLSModel'
fittedValue <- fitted(auxOLSModel)

# Step 6: Compute exponential values of fiited value for auxialiary OLS Model
expValue <- exp(fittedValue)

# Step 7: Fit Log-linear FGLS Model
fitLinFGLSIntModel <- lm(Price ~ 
                     IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                   weights = 1/expValue, data = airline.df)
# summary of linear FGLS model
summary(fitLinFGLSIntModel)

Call:
lm(formula = Price ~ IsDiwali + Airline * AdvancedBookingDays + 
    Departure, data = airline.df, weights = 1/expValue)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-5.8414 -0.9924 -0.3923  0.9229  8.2168 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           5511.96     243.15  22.669   <2e-16
IsDiwali1                             3180.36     322.25   9.869   <2e-16
AirlineIndiGo                         -625.09     411.69  -1.518   0.1300
AirlineJet                             383.80     334.18   1.148   0.2517
AirlineSpice Jet                      -136.29     595.45  -0.229   0.8191
AdvancedBookingDays                    -21.97      11.37  -1.933   0.0542
DeparturePM                            -86.80     117.91  -0.736   0.4622
AirlineIndiGo:AdvancedBookingDays      -38.13      15.06  -2.532   0.0119
AirlineJet:AdvancedBookingDays         -35.54      14.17  -2.509   0.0127
AirlineSpice Jet:AdvancedBookingDays   -21.34      21.58  -0.989   0.3235

(Intercept)                          ***
IsDiwali1                            ***
AirlineIndiGo                           
AirlineJet                              
AirlineSpice Jet                        
AdvancedBookingDays                  .  
DeparturePM                             
AirlineIndiGo:AdvancedBookingDays    *  
AirlineJet:AdvancedBookingDays       *  
AirlineSpice Jet:AdvancedBookingDays    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.683 on 295 degrees of freedom
Multiple R-squared:  0.4453,    Adjusted R-squared:  0.4284 
F-statistic: 26.31 on 9 and 295 DF,  p-value: < 2.2e-16

6b. Log-linear FGLS Model

# Step 1:Residuals of linear OLS Model
LogOLSModelRes <- resid(fitLogLinOLSIntModel)

# Step 2: Taking square of the residuals of linear OLS Model
LogOLSModelResSq <- LogOLSModelRes^2

# Step 3: Taking natural log of the squared residuals of linear OLS Model
lnOLSResSq <- log(LogOLSModelResSq)

# Step 4: Running auxiliary OLS Model
auxOLSModel <- lm(lnOLSResSq ~ 
                    IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                data = airline.df)

# Step 5: Get fitted value of auxiliary OLS Model i.e. 'auxOLSModel'
fittedValue <- fitted(auxOLSModel)

# Step 6: Compute exponential values of fiited value for auxialiary OLS Model
expValue <- exp(fittedValue)

# Step 7: Fit Log-linear FGLS Model
fitLogLinFGLSIntModel <- lm(log(Price) ~ 
                     IsDiwali 
                   + Airline * AdvancedBookingDays 
                   + Departure, 
                   weights = 1/expValue, data = airline.df)
# summary of linear FGLS model
summary(fitLogLinFGLSIntModel)

Call:
lm(formula = Price ~ IsDiwali + Airline * AdvancedBookingDays + 
    Departure, data = airline.df, weights = 1/expValue)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-5.8414 -0.9924 -0.3923  0.9229  8.2168 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                           5511.96     243.15  22.669   <2e-16
IsDiwali1                             3180.36     322.25   9.869   <2e-16
AirlineIndiGo                         -625.09     411.69  -1.518   0.1300
AirlineJet                             383.80     334.18   1.148   0.2517
AirlineSpice Jet                      -136.29     595.45  -0.229   0.8191
AdvancedBookingDays                    -21.97      11.37  -1.933   0.0542
DeparturePM                            -86.80     117.91  -0.736   0.4622
AirlineIndiGo:AdvancedBookingDays      -38.13      15.06  -2.532   0.0119
AirlineJet:AdvancedBookingDays         -35.54      14.17  -2.509   0.0127
AirlineSpice Jet:AdvancedBookingDays   -21.34      21.58  -0.989   0.3235

(Intercept)                          ***
IsDiwali1                            ***
AirlineIndiGo                           
AirlineJet                              
AirlineSpice Jet                        
AdvancedBookingDays                  .  
DeparturePM                             
AirlineIndiGo:AdvancedBookingDays    *  
AirlineJet:AdvancedBookingDays       *  
AirlineSpice Jet:AdvancedBookingDays    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.683 on 295 degrees of freedom
Multiple R-squared:  0.4453,    Adjusted R-squared:  0.4284 
F-statistic: 26.31 on 9 and 295 DF,  p-value: < 2.2e-16

PART 7

MULTICOLLINEARITY TEST

7a. VIF

library(car)
vif(fitLinOLSIntModel)
                                 GVIF Df GVIF^(1/(2*Df))
IsDiwali                     5.436007  1        2.331525
Airline                     22.001184  3        1.673944
AdvancedBookingDays         14.223236  1        3.771371
Departure                    1.147835  1        1.071371
Airline:AdvancedBookingDays 85.026222  3        2.096970

7b. Correlation Matrix

expVar <- airline.df[c("Price", "AdvancedBookingDays")]
#cor(expVar)
library(Hmisc)
rcorr(as.matrix(expVar))
                    Price AdvancedBookingDays
Price                1.00               -0.01
AdvancedBookingDays -0.01                1.00

n= 305 


P
                    Price  AdvancedBookingDays
Price                      0.8732             
AdvancedBookingDays 0.8732                    

7c. Farrar - Glauber Test

The mctest package in R provides the Farrar-Glauber test and other relevant tests for multicollinearity. There are two functions viz. omcdiag() and imcdiag() under mctest package in R which will provide the overall and individual diagnostic checking for multicollinearity respectively.

library(mctest)
omcdiag(as.matrix(expVar), Price)

Call:
omcdiag(x = as.matrix(expVar), y = Price)


Overall Multicollinearity Diagnostics

                       MC Results detection
Determinant |X'X|:         0.9999         0
Farrar Chi-Square:         0.0252         0
Red Indicator:             0.0092         0
Sum of Lambda Inverse:     2.0002         0
Theil's Method:           -0.9998         0
Condition Number:          5.8290         0

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test
imcdiag(expVar, Price)

Call:
imcdiag(x = expVar, y = Price)


All Individual Multicollinearity Diagnostics Result

                       VIF    TOL     Wi  Fi Leamer CVIF Klein
Price               1.0001 0.9999 0.0255 Inf      1    0     0
AdvancedBookingDays 1.0001 0.9999 0.0255 Inf      1    0     0

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test

AdvancedBookingDays , coefficient(s) are non-significant may be due to multicollinearity

R-square of y on all x: 1 

* use method argument to check which regressors may be the reason of collinearity