Multicollinearity Demonstration

Sameer Mathur

Multicollinearity Demonstration in BOM-DEL-BOM data

Regression Diagnostics

---

Reading and Describing Data

Reading data

Reading data into a dataframe

# reading data into R
airline.df <- read.csv(paste("BOMDELBOM.csv"))
# attaching data columns of the dataframe
attach(airline.df)

Number of rows and columns

# dimension of the dataframe
dim(airline.df)
[1] 305  23

Descriptive Statistics

# descriptive statistics
library(psych)
describe(airline.df)[, c(1:5, 8:9)]
                    vars   n    mean      sd  median     min      max
FlightNumber*          1 305   31.86   18.35   32.00    1.00    63.00
Airline*               2 305    2.60    0.88    3.00    1.00     4.00
DepartureCityCode*     3 305    1.57    0.50    2.00    1.00     2.00
ArrivalCityCode*       4 305    1.43    0.50    1.00    1.00     2.00
DepartureTime          5 305 1249.54  579.86 1035.00  225.00  2320.00
ArrivalTime            6 305 1329.31  613.52 1215.00   20.00  2345.00
Departure*             7 305    1.45    0.50    1.00    1.00     2.00
FlyingMinutes          8 305  136.03    4.71  135.00  125.00   145.00
Aircraft*              9 305    1.54    0.50    2.00    1.00     2.00
PlaneModel*           10 305    3.82    2.71    3.00    1.00     9.00
Capacity              11 305  176.36   32.39  180.00  138.00   303.00
SeatPitch             12 305   30.26    0.93   30.00   29.00    33.00
SeatWidth             13 305   17.41    0.49   17.00   17.00    18.00
DataCollectionDate*   14 305    4.36    1.98    5.00    1.00     7.00
DateDeparture*        15 305    8.14    6.69    7.00    1.00    20.00
IsWeekend*            16 305    1.13    0.34    1.00    1.00     2.00
Price                 17 305 5394.54 2388.29 4681.00 2607.00 18015.00
AdvancedBookingDays   18 305   28.90   22.30   30.00    2.00    61.00
IsDiwali*             19 305    1.40    0.49    1.00    1.00     2.00
DayBeforeDiwali*      20 305    1.19    0.40    1.00    1.00     2.00
DayAfterDiwali*       21 305    1.20    0.40    1.00    1.00     2.00
MarketShare           22 305   21.18   11.04   15.40   13.20    39.60
LoadFactor            23 305   85.13    4.32   83.32   78.73    94.06

A subset of the Data

# first few rows of the dataframe
head(airline.df)[, c(1,2,22)]
  FlightNumber   Airline MarketShare
1       9W 313       Jet        15.4
2       9W 339       Jet        15.4
3       SG 161 Spice Jet        13.2
4       6E 171    IndiGo        39.6
5       SG 160 Spice Jet        13.2
6       9W 762       Jet        15.4

Frequency Table of Market Share by Airline

# 'MarketShare' divided by 'Airline'
table(Airline, MarketShare)
           MarketShare
Airline     13.2 13.3 15.4 39.6
  Air India    0   41    0    0
  IndiGo       0    0    0   80
  Jet          0    0  144    0
  Spice Jet   40    0    0    0

Regression Models

Regression Model 1

Regression of Price with Airline

Note: Airline is a factor variable

# Model 1
fitOLSModel1 <- lm(Price ~ 
                    AdvancedBookingDays 
                  + Airline 
                  + Departure 
                  + IsWeekend 
                  + IsDiwali 
                  + DepartureCityCode 
                  + FlyingMinutes 
                  + SeatPitch 
                  + SeatWidth, 
                  data = airline.df)
summary(fitOLSModel1)

Call:
lm(formula = Price ~ AdvancedBookingDays + Airline + Departure + 
    IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + 
    SeatPitch + SeatWidth, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2671.2 -1266.2  -456.4   517.4 11953.9 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -4292.94    8897.87  -0.482   0.6298    
AdvancedBookingDays    -87.70      12.47  -7.033 1.43e-11 ***
AirlineIndiGo         -577.17     778.64  -0.741   0.4591    
AirlineJet            -120.75     436.69  -0.277   0.7823    
AirlineSpice Jet     -1118.38     697.85  -1.603   0.1101    
DeparturePM           -589.79     275.23  -2.143   0.0329 *  
IsWeekendYes          -345.92     408.06  -0.848   0.3973    
IsDiwaliYes           4346.80     568.14   7.651 2.90e-13 ***
DepartureCityCodeDEL -1413.46     351.54  -4.021 7.38e-05 ***
FlyingMinutes           38.97      29.27   1.331   0.1841    
SeatPitch             -279.19     226.64  -1.232   0.2190    
SeatWidth              868.58     507.54   1.711   0.0881 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2079 on 293 degrees of freedom
Multiple R-squared:  0.2695,    Adjusted R-squared:  0.2421 
F-statistic: 9.828 on 11 and 293 DF,  p-value: 3.604e-15

Regression Model 2

Regression of Price with MarketShare

Note: MarketShare is a continuous variable

Note: Airline is excluded from this regression model.

# Model 2
fitOLSModel2 <- lm(Price ~ 
                    AdvancedBookingDays 
               #  + Airline        
                  + Departure 
                  + IsWeekend 
                  + IsDiwali 
                  + DepartureCityCode 
                  + FlyingMinutes 
                  + SeatPitch 
                  + SeatWidth
                  + MarketShare,   
                  data = airline.df)
summary(fitOLSModel2)

Call:
lm(formula = Price ~ AdvancedBookingDays + Departure + IsWeekend + 
    IsDiwali + DepartureCityCode + FlyingMinutes + SeatPitch + 
    SeatWidth + MarketShare, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2992.3 -1285.8  -413.1   533.2 12094.8 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -9608.675   8195.592  -1.172   0.2420    
AdvancedBookingDays    -86.321     12.481  -6.916 2.88e-11 ***
DeparturePM           -427.476    262.803  -1.627   0.1049    
IsWeekendYes          -153.076    392.867  -0.390   0.6971    
IsDiwaliYes           4338.015    569.799   7.613 3.64e-13 ***
DepartureCityCodeDEL -1584.553    327.227  -4.842 2.07e-06 ***
FlyingMinutes           54.078     28.288   1.912   0.0569 .  
SeatPitch              -16.405    174.302  -0.094   0.9251    
SeatWidth              569.942    484.705   1.176   0.2406    
MarketShare              5.523     20.799   0.266   0.7908    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2085 on 295 degrees of freedom
Multiple R-squared:  0.2602,    Adjusted R-squared:  0.2376 
F-statistic: 11.53 on 9 and 295 DF,  p-value: 1.685e-15

Regression Model 12

Regression of Price with Airline and its MarketShare

Note: Airline is a factor variable

Note: MarketShare is a continuous variable

# Model 12
fitOLSModel12 <- lm(Price ~ 
                    AdvancedBookingDays 
                  + Airline        
                  + Departure 
                  + IsWeekend 
                  + IsDiwali 
                  + DepartureCityCode 
                  + FlyingMinutes 
                  + SeatPitch 
                  + SeatWidth
                  + MarketShare,   
                  data = airline.df)
summary(fitOLSModel12)

Call:
lm(formula = Price ~ AdvancedBookingDays + Airline + Departure + 
    IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + 
    SeatPitch + SeatWidth + MarketShare, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2671.2 -1266.2  -456.4   517.4 11953.9 

Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -4292.94    8897.87  -0.482   0.6298    
AdvancedBookingDays    -87.70      12.47  -7.033 1.43e-11 ***
AirlineIndiGo         -577.17     778.64  -0.741   0.4591    
AirlineJet            -120.75     436.69  -0.277   0.7823    
AirlineSpice Jet     -1118.38     697.85  -1.603   0.1101    
DeparturePM           -589.79     275.23  -2.143   0.0329 *  
IsWeekendYes          -345.92     408.06  -0.848   0.3973    
IsDiwaliYes           4346.80     568.14   7.651 2.90e-13 ***
DepartureCityCodeDEL -1413.46     351.54  -4.021 7.38e-05 ***
FlyingMinutes           38.97      29.27   1.331   0.1841    
SeatPitch             -279.19     226.64  -1.232   0.2190    
SeatWidth              868.58     507.54   1.711   0.0881 .  
MarketShare                NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2079 on 293 degrees of freedom
Multiple R-squared:  0.2695,    Adjusted R-squared:  0.2421 
F-statistic: 9.828 on 11 and 293 DF,  p-value: 3.604e-15

Multicollinearity

Regression Model 12 shows evidence of Multicollinearity

Since MarketShare and Airline are 100% correlated, we should not include both variables in our Regression.

Regression Model 12B

Regression of Price with Airline and its MarketShare

Note: Airline is a factor variable

Note: MarketShare is also a factor variable

# Model 12B
# converting 'MarketShare' into a factor variable
airline.df$MarketShare <- as.factor(airline.df$MarketShare)
fitOLSModel12B <- lm(Price ~ 
                    AdvancedBookingDays 
                  + Airline        
                  + Departure 
                  + IsWeekend 
                  + IsDiwali 
                  + DepartureCityCode 
                  + FlyingMinutes 
                  + SeatPitch 
                  + SeatWidth
                  + MarketShare,   
                  data = airline.df)
summary(fitOLSModel12B)

Call:
lm(formula = Price ~ AdvancedBookingDays + Airline + Departure + 
    IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + 
    SeatPitch + SeatWidth + MarketShare, data = airline.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2671.2 -1266.2  -456.4   517.4 11953.9 

Coefficients: (3 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -4292.94    8897.87  -0.482   0.6298    
AdvancedBookingDays    -87.70      12.47  -7.033 1.43e-11 ***
AirlineIndiGo         -577.17     778.64  -0.741   0.4591    
AirlineJet            -120.75     436.69  -0.277   0.7823    
AirlineSpice Jet     -1118.38     697.85  -1.603   0.1101    
DeparturePM           -589.79     275.23  -2.143   0.0329 *  
IsWeekendYes          -345.92     408.06  -0.848   0.3973    
IsDiwaliYes           4346.80     568.14   7.651 2.90e-13 ***
DepartureCityCodeDEL -1413.46     351.54  -4.021 7.38e-05 ***
FlyingMinutes           38.97      29.27   1.331   0.1841    
SeatPitch             -279.19     226.64  -1.232   0.2190    
SeatWidth              868.58     507.54   1.711   0.0881 .  
MarketShare13.3            NA         NA      NA       NA    
MarketShare15.4            NA         NA      NA       NA    
MarketShare39.6            NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2079 on 293 degrees of freedom
Multiple R-squared:  0.2695,    Adjusted R-squared:  0.2421 
F-statistic: 9.828 on 11 and 293 DF,  p-value: 3.604e-15

Multicollinearity

It does not matter whether MarketShare is modeled as Continuous or Factor.

Regression Model 12B also shows evidence of Multicollinearity

Since MarketShare and Airline are 100% correlated, we should not include both variables in our regression.

Perfect Multicollinearity