Airline Pricing Assgn.rpres

Team 3
24.10.2018

Looking at the Data

airline=read.csv("AirlinePricingData.csv")
str(airline)
'data.frame':   305 obs. of  25 variables:
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
 $ DepartureTime      : int  225 300 350 455 555 605 635 640 645 700 ...
 $ ArrivalTime        : int  435 505 605 710 805 815 850 855 855 915 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
 $ FlyingMinutes      : int  130 125 135 135 130 130 135 135 130 135 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
 $ Capacity           : int  156 156 189 180 189 156 189 180 189 138 ...
 $ SeatPitch          : int  30 30 29 30 29 30 29 30 29 30 ...
 $ SeatWidth          : num  17 17 17 18 17 17 17 18 17 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Price              : int  4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
 $ AdvancedBookingDays: int  54 52 48 59 48 52 48 59 48 52 ...
 $ IsDiwali           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayBeforeDiwali    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
 $ LoadFactor         : num  83.3 83.3 94.1 87.2 94.1 ...

Looking at the Data

Here, the dependent variable is Price, and the independent variables we will examine are - Airline, DepartureCityCode, DepartureTime, ArrivalTime, Departure, FlyingMinutes, Aircraft, PlaneModel, Capacity, SeatPitch, SeatWidth, IsWeekend, AdvancedBookingDays, IsDiwali, DayBeforeDiwali, DayAfterDiwali, LoadFactor

Linear Regression_v1

m=lm(Price~Airline+DepartureCityCode+DepartureTime+ArrivalTime+Departure+FlyingMinutes+Aircraft+PlaneModel+Capacity+SeatPitch+SeatWidth+IsWeekend+AdvancedBookingDays+IsDiwali+DayBeforeDiwali+DayAfterDiwali+LoadFactor, data = airline)
summary(m)

Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime + 
    ArrivalTime + Departure + FlyingMinutes + Aircraft + PlaneModel + 
    Capacity + SeatPitch + SeatWidth + IsWeekend + AdvancedBookingDays + 
    IsDiwali + DayBeforeDiwali + DayAfterDiwali + LoadFactor, 
    data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-2989.1 -1260.6  -383.2   668.5 11985.6 

Coefficients: (7 not defined because of singularities)
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           2244.0076  4949.9312   0.453 0.650648    
AirlineIndiGo          367.8422  1644.3583   0.224 0.823151    
AirlineJet            -435.9205  1171.8260  -0.372 0.710168    
AirlineSpice Jet     -1190.8027  1269.8674  -0.938 0.349173    
DepartureCityCodeDEL -1462.4608   372.4379  -3.927 0.000108 ***
DepartureTime           -0.7703     0.5701  -1.351 0.177739    
ArrivalTime              0.1960     0.2712   0.723 0.470412    
DeparturePM            186.0938   685.4131   0.272 0.786198    
FlyingMinutes           40.2232    34.3457   1.171 0.242526    
AircraftBoeing         403.3378   993.1710   0.406 0.684966    
PlaneModel739          -68.5214   422.7360  -0.162 0.871350    
PlaneModel77W         -997.1281  1488.3451  -0.670 0.503427    
PlaneModel788        -1577.8316  1444.6641  -1.092 0.275677    
PlaneModelA319        1178.3138  1785.6922   0.660 0.509874    
PlaneModelA320               NA         NA      NA       NA    
PlaneModelA321         200.9935  1387.0639   0.145 0.884888    
PlaneModelA332         913.2756  1043.4455   0.875 0.382175    
PlaneModelA333               NA         NA      NA       NA    
Capacity                     NA         NA      NA       NA    
SeatPitch                    NA         NA      NA       NA    
SeatWidth                    NA         NA      NA       NA    
IsWeekendYes          -397.7911   407.0898  -0.977 0.329320    
AdvancedBookingDays    -86.5245    12.4170  -6.968 2.23e-11 ***
IsDiwali              3883.1493   604.3007   6.426 5.49e-10 ***
DayBeforeDiwali        823.7030   376.7495   2.186 0.029604 *  
DayAfterDiwali               NA         NA      NA       NA    
LoadFactor                   NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2065 on 285 degrees of freedom
Multiple R-squared:  0.2995,    Adjusted R-squared:  0.2528 
F-statistic: 6.412 on 19 and 285 DF,  p-value: 8.665e-14

Linear Regression_v2


Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime + 
    Departure + FlyingMinutes + IsWeekend + AdvancedBookingDays + 
    DayBeforeDiwali + DayAfterDiwali + LoadFactor, data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-3044.1 -1286.2  -389.2   637.6 11970.2 

Coefficients: (1 not defined because of singularities)
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           1222.9741  3939.3125   0.310   0.7564    
AirlineIndiGo          439.7026   517.9370   0.849   0.3966    
AirlineJet              31.2515   401.9289   0.078   0.9381    
AirlineSpice Jet      -755.5249   529.2152  -1.428   0.1545    
DepartureCityCodeDEL -1488.6234   348.1394  -4.276 2.58e-05 ***
DepartureTime           -1.0093     0.5268  -1.916   0.0563 .  
DeparturePM            583.8948   604.5056   0.966   0.3349    
FlyingMinutes           50.2177    29.7721   1.687   0.0927 .  
IsWeekendYes          -356.9015   404.5941  -0.882   0.3784    
AdvancedBookingDays    -85.9436    12.3747  -6.945 2.45e-11 ***
DayBeforeDiwali       4700.2426   585.1014   8.033 2.34e-14 ***
DayAfterDiwali        3872.4324   602.4410   6.428 5.24e-10 ***
LoadFactor                   NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2059 on 293 degrees of freedom
Multiple R-squared:  0.2834,    Adjusted R-squared:  0.2565 
F-statistic: 10.54 on 11 and 293 DF,  p-value: 2.692e-16

Linear Regression_v3


Call:
lm(formula = Price ~ city + timing + FlyingMinutes + weekend + 
    AdvancedBookingDays + IsDiwali, data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-3158.4 -1243.6  -473.8   356.8 12467.8 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          2341.71    3584.37   0.653    0.514    
city1               -1294.71     264.57  -4.894 1.62e-06 ***
timing1              -278.37     245.42  -1.134    0.258    
FlyingMinutes          34.58      26.56   1.302    0.194    
weekend1             -251.41     384.34  -0.654    0.514    
AdvancedBookingDays   -83.82      12.47  -6.724 9.03e-11 ***
IsDiwali             4216.76     568.35   7.419 1.24e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2093 on 298 degrees of freedom
Multiple R-squared:  0.2475,    Adjusted R-squared:  0.2323 
F-statistic: 16.33 on 6 and 298 DF,  p-value: 2.898e-16

Linear Regression_v4


Call:
lm(formula = Price ~ city + AdvancedBookingDays + IsDiwali + 
    MarketShare + timing, data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-3307.1 -1208.0  -415.9   583.5 12171.6 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          6702.78     312.34  21.460  < 2e-16 ***
city1               -1469.42     286.94  -5.121 5.45e-07 ***
AdvancedBookingDays   -85.59      12.48  -6.857 4.04e-11 ***
IsDiwali             4343.61     566.14   7.672 2.40e-13 ***
MarketShare            19.73      12.90   1.529    0.127    
timing1              -296.40     244.71  -1.211    0.227    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2088 on 299 degrees of freedom
Multiple R-squared:  0.2482,    Adjusted R-squared:  0.2356 
F-statistic: 19.74 on 5 and 299 DF,  p-value: < 2.2e-16

Linear Regression_v5


Call:
lm(formula = Price ~ city + AdvancedBookingDays + IsDiwali, data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-3214.5 -1246.7  -503.8   487.0 12616.8 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          6854.58     249.87  27.433  < 2e-16 ***
city1               -1289.79     244.43  -5.277 2.52e-07 ***
AdvancedBookingDays   -83.29      12.45  -6.692 1.07e-10 ***
IsDiwali             4253.85     565.19   7.526 6.09e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2093 on 301 degrees of freedom
Multiple R-squared:  0.2393,    Adjusted R-squared:  0.2318 
F-statistic: 31.57 on 3 and 301 DF,  p-value: < 2.2e-16

Linear Regression_v5


Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime + 
    Departure + FlyingMinutes + Aircraft + IsWeekend + AdvancedBookingDays + 
    DayBeforeDiwali + DayAfterDiwali + MarketShare + LoadFactor, 
    data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-2927.5 -1288.1  -412.2   656.4 11963.8 

Coefficients: (2 not defined because of singularities)
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           1310.9943  3928.9775   0.334   0.7389    
AirlineIndiGo          236.6486   531.6554   0.445   0.6566    
AirlineJet             247.1373   422.6021   0.585   0.5591    
AirlineSpice Jet      -397.9990   572.4546  -0.695   0.4875    
DepartureCityCodeDEL -1510.7315   347.4631  -4.348 1.90e-05 ***
DepartureTime           -0.8209     0.5382  -1.525   0.1283    
DeparturePM            351.7589   619.8104   0.568   0.5708    
FlyingMinutes           50.2658    29.6911   1.693   0.0915 .  
AircraftBoeing        -568.4491   352.5141  -1.613   0.1079    
IsWeekendYes          -346.8845   403.5414  -0.860   0.3907    
AdvancedBookingDays    -86.1787    12.3419  -6.983 1.96e-11 ***
DayBeforeDiwali       4706.4125   583.5226   8.066 1.90e-14 ***
DayAfterDiwali        3884.2991   600.8475   6.465 4.25e-10 ***
MarketShare                  NA         NA      NA       NA    
LoadFactor                   NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2054 on 292 degrees of freedom
Multiple R-squared:  0.2897,    Adjusted R-squared:  0.2606 
F-statistic: 9.927 on 12 and 292 DF,  p-value: 2.756e-16

Results of Linear Regressions

After trying different variations, we find that only three variables are significant, namely, Departure City, Advanced Booking Days, and DayAfterDiwali, However, the adjusted R-squared is coming higher after including some additional factors. Let's try to plot the graph of actual(in red) and predicted values(in green).

m6=lm(Price~Airline+DepartureCityCode+DepartureTime+Departure+FlyingMinutes+Aircraft+IsWeekend+AdvancedBookingDays+DayBeforeDiwali+DayAfterDiwali+MarketShare+LoadFactor, data = airline)
summary(m6)

Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime + 
    Departure + FlyingMinutes + Aircraft + IsWeekend + AdvancedBookingDays + 
    DayBeforeDiwali + DayAfterDiwali + MarketShare + LoadFactor, 
    data = airline)

Residuals:
    Min      1Q  Median      3Q     Max 
-2927.5 -1288.1  -412.2   656.4 11963.8 

Coefficients: (2 not defined because of singularities)
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           1310.9943  3928.9775   0.334   0.7389    
AirlineIndiGo          236.6486   531.6554   0.445   0.6566    
AirlineJet             247.1373   422.6021   0.585   0.5591    
AirlineSpice Jet      -397.9990   572.4546  -0.695   0.4875    
DepartureCityCodeDEL -1510.7315   347.4631  -4.348 1.90e-05 ***
DepartureTime           -0.8209     0.5382  -1.525   0.1283    
DeparturePM            351.7589   619.8104   0.568   0.5708    
FlyingMinutes           50.2658    29.6911   1.693   0.0915 .  
AircraftBoeing        -568.4491   352.5141  -1.613   0.1079    
IsWeekendYes          -346.8845   403.5414  -0.860   0.3907    
AdvancedBookingDays    -86.1787    12.3419  -6.983 1.96e-11 ***
DayBeforeDiwali       4706.4125   583.5226   8.066 1.90e-14 ***
DayAfterDiwali        3884.2991   600.8475   6.465 4.25e-10 ***
MarketShare                  NA         NA      NA       NA    
LoadFactor                   NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2054 on 292 degrees of freedom
Multiple R-squared:  0.2897,    Adjusted R-squared:  0.2606 
F-statistic: 9.927 on 12 and 292 DF,  p-value: 2.756e-16

Plot of Linear Regressions

plot(airline$Price, col="red", type="l")
points(fitted(m6), col="green", type="l")

plot of chunk unnamed-chunk-9