Team 3
24.10.2018
airline=read.csv("AirlinePricingData.csv")
str(airline)
'data.frame': 305 obs. of 25 variables:
$ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
$ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
$ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
$ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
$ DepartureTime : int 225 300 350 455 555 605 635 640 645 700 ...
$ ArrivalTime : int 435 505 605 710 805 815 850 855 855 915 ...
$ Departure : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
$ FlyingMinutes : int 130 125 135 135 130 130 135 135 130 135 ...
$ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
$ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
$ Capacity : int 156 156 189 180 189 156 189 180 189 138 ...
$ SeatPitch : int 30 30 29 30 29 30 29 30 29 30 ...
$ SeatWidth : num 17 17 17 18 17 17 17 18 17 17 ...
$ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
$ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
$ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Price : int 4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
$ AdvancedBookingDays: int 54 52 48 59 48 52 48 59 48 52 ...
$ IsDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayBeforeDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
$ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
$ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
$ MarketShare : num 15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
$ LoadFactor : num 83.3 83.3 94.1 87.2 94.1 ...
Here, the dependent variable is Price, and the independent variables we will examine are - Airline, DepartureCityCode, DepartureTime, ArrivalTime, Departure, FlyingMinutes, Aircraft, PlaneModel, Capacity, SeatPitch, SeatWidth, IsWeekend, AdvancedBookingDays, IsDiwali, DayBeforeDiwali, DayAfterDiwali, LoadFactor
m=lm(Price~Airline+DepartureCityCode+DepartureTime+ArrivalTime+Departure+FlyingMinutes+Aircraft+PlaneModel+Capacity+SeatPitch+SeatWidth+IsWeekend+AdvancedBookingDays+IsDiwali+DayBeforeDiwali+DayAfterDiwali+LoadFactor, data = airline)
summary(m)
Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime +
ArrivalTime + Departure + FlyingMinutes + Aircraft + PlaneModel +
Capacity + SeatPitch + SeatWidth + IsWeekend + AdvancedBookingDays +
IsDiwali + DayBeforeDiwali + DayAfterDiwali + LoadFactor,
data = airline)
Residuals:
Min 1Q Median 3Q Max
-2989.1 -1260.6 -383.2 668.5 11985.6
Coefficients: (7 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2244.0076 4949.9312 0.453 0.650648
AirlineIndiGo 367.8422 1644.3583 0.224 0.823151
AirlineJet -435.9205 1171.8260 -0.372 0.710168
AirlineSpice Jet -1190.8027 1269.8674 -0.938 0.349173
DepartureCityCodeDEL -1462.4608 372.4379 -3.927 0.000108 ***
DepartureTime -0.7703 0.5701 -1.351 0.177739
ArrivalTime 0.1960 0.2712 0.723 0.470412
DeparturePM 186.0938 685.4131 0.272 0.786198
FlyingMinutes 40.2232 34.3457 1.171 0.242526
AircraftBoeing 403.3378 993.1710 0.406 0.684966
PlaneModel739 -68.5214 422.7360 -0.162 0.871350
PlaneModel77W -997.1281 1488.3451 -0.670 0.503427
PlaneModel788 -1577.8316 1444.6641 -1.092 0.275677
PlaneModelA319 1178.3138 1785.6922 0.660 0.509874
PlaneModelA320 NA NA NA NA
PlaneModelA321 200.9935 1387.0639 0.145 0.884888
PlaneModelA332 913.2756 1043.4455 0.875 0.382175
PlaneModelA333 NA NA NA NA
Capacity NA NA NA NA
SeatPitch NA NA NA NA
SeatWidth NA NA NA NA
IsWeekendYes -397.7911 407.0898 -0.977 0.329320
AdvancedBookingDays -86.5245 12.4170 -6.968 2.23e-11 ***
IsDiwali 3883.1493 604.3007 6.426 5.49e-10 ***
DayBeforeDiwali 823.7030 376.7495 2.186 0.029604 *
DayAfterDiwali NA NA NA NA
LoadFactor NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2065 on 285 degrees of freedom
Multiple R-squared: 0.2995, Adjusted R-squared: 0.2528
F-statistic: 6.412 on 19 and 285 DF, p-value: 8.665e-14
Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime +
Departure + FlyingMinutes + IsWeekend + AdvancedBookingDays +
DayBeforeDiwali + DayAfterDiwali + LoadFactor, data = airline)
Residuals:
Min 1Q Median 3Q Max
-3044.1 -1286.2 -389.2 637.6 11970.2
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1222.9741 3939.3125 0.310 0.7564
AirlineIndiGo 439.7026 517.9370 0.849 0.3966
AirlineJet 31.2515 401.9289 0.078 0.9381
AirlineSpice Jet -755.5249 529.2152 -1.428 0.1545
DepartureCityCodeDEL -1488.6234 348.1394 -4.276 2.58e-05 ***
DepartureTime -1.0093 0.5268 -1.916 0.0563 .
DeparturePM 583.8948 604.5056 0.966 0.3349
FlyingMinutes 50.2177 29.7721 1.687 0.0927 .
IsWeekendYes -356.9015 404.5941 -0.882 0.3784
AdvancedBookingDays -85.9436 12.3747 -6.945 2.45e-11 ***
DayBeforeDiwali 4700.2426 585.1014 8.033 2.34e-14 ***
DayAfterDiwali 3872.4324 602.4410 6.428 5.24e-10 ***
LoadFactor NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2059 on 293 degrees of freedom
Multiple R-squared: 0.2834, Adjusted R-squared: 0.2565
F-statistic: 10.54 on 11 and 293 DF, p-value: 2.692e-16
Call:
lm(formula = Price ~ city + timing + FlyingMinutes + weekend +
AdvancedBookingDays + IsDiwali, data = airline)
Residuals:
Min 1Q Median 3Q Max
-3158.4 -1243.6 -473.8 356.8 12467.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2341.71 3584.37 0.653 0.514
city1 -1294.71 264.57 -4.894 1.62e-06 ***
timing1 -278.37 245.42 -1.134 0.258
FlyingMinutes 34.58 26.56 1.302 0.194
weekend1 -251.41 384.34 -0.654 0.514
AdvancedBookingDays -83.82 12.47 -6.724 9.03e-11 ***
IsDiwali 4216.76 568.35 7.419 1.24e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2093 on 298 degrees of freedom
Multiple R-squared: 0.2475, Adjusted R-squared: 0.2323
F-statistic: 16.33 on 6 and 298 DF, p-value: 2.898e-16
Call:
lm(formula = Price ~ city + AdvancedBookingDays + IsDiwali +
MarketShare + timing, data = airline)
Residuals:
Min 1Q Median 3Q Max
-3307.1 -1208.0 -415.9 583.5 12171.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6702.78 312.34 21.460 < 2e-16 ***
city1 -1469.42 286.94 -5.121 5.45e-07 ***
AdvancedBookingDays -85.59 12.48 -6.857 4.04e-11 ***
IsDiwali 4343.61 566.14 7.672 2.40e-13 ***
MarketShare 19.73 12.90 1.529 0.127
timing1 -296.40 244.71 -1.211 0.227
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2088 on 299 degrees of freedom
Multiple R-squared: 0.2482, Adjusted R-squared: 0.2356
F-statistic: 19.74 on 5 and 299 DF, p-value: < 2.2e-16
Call:
lm(formula = Price ~ city + AdvancedBookingDays + IsDiwali, data = airline)
Residuals:
Min 1Q Median 3Q Max
-3214.5 -1246.7 -503.8 487.0 12616.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6854.58 249.87 27.433 < 2e-16 ***
city1 -1289.79 244.43 -5.277 2.52e-07 ***
AdvancedBookingDays -83.29 12.45 -6.692 1.07e-10 ***
IsDiwali 4253.85 565.19 7.526 6.09e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2093 on 301 degrees of freedom
Multiple R-squared: 0.2393, Adjusted R-squared: 0.2318
F-statistic: 31.57 on 3 and 301 DF, p-value: < 2.2e-16
Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime +
Departure + FlyingMinutes + Aircraft + IsWeekend + AdvancedBookingDays +
DayBeforeDiwali + DayAfterDiwali + MarketShare + LoadFactor,
data = airline)
Residuals:
Min 1Q Median 3Q Max
-2927.5 -1288.1 -412.2 656.4 11963.8
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1310.9943 3928.9775 0.334 0.7389
AirlineIndiGo 236.6486 531.6554 0.445 0.6566
AirlineJet 247.1373 422.6021 0.585 0.5591
AirlineSpice Jet -397.9990 572.4546 -0.695 0.4875
DepartureCityCodeDEL -1510.7315 347.4631 -4.348 1.90e-05 ***
DepartureTime -0.8209 0.5382 -1.525 0.1283
DeparturePM 351.7589 619.8104 0.568 0.5708
FlyingMinutes 50.2658 29.6911 1.693 0.0915 .
AircraftBoeing -568.4491 352.5141 -1.613 0.1079
IsWeekendYes -346.8845 403.5414 -0.860 0.3907
AdvancedBookingDays -86.1787 12.3419 -6.983 1.96e-11 ***
DayBeforeDiwali 4706.4125 583.5226 8.066 1.90e-14 ***
DayAfterDiwali 3884.2991 600.8475 6.465 4.25e-10 ***
MarketShare NA NA NA NA
LoadFactor NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2054 on 292 degrees of freedom
Multiple R-squared: 0.2897, Adjusted R-squared: 0.2606
F-statistic: 9.927 on 12 and 292 DF, p-value: 2.756e-16
After trying different variations, we find that only three variables are significant, namely, Departure City, Advanced Booking Days, and DayAfterDiwali, However, the adjusted R-squared is coming higher after including some additional factors. Let's try to plot the graph of actual(in red) and predicted values(in green).
m6=lm(Price~Airline+DepartureCityCode+DepartureTime+Departure+FlyingMinutes+Aircraft+IsWeekend+AdvancedBookingDays+DayBeforeDiwali+DayAfterDiwali+MarketShare+LoadFactor, data = airline)
summary(m6)
Call:
lm(formula = Price ~ Airline + DepartureCityCode + DepartureTime +
Departure + FlyingMinutes + Aircraft + IsWeekend + AdvancedBookingDays +
DayBeforeDiwali + DayAfterDiwali + MarketShare + LoadFactor,
data = airline)
Residuals:
Min 1Q Median 3Q Max
-2927.5 -1288.1 -412.2 656.4 11963.8
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1310.9943 3928.9775 0.334 0.7389
AirlineIndiGo 236.6486 531.6554 0.445 0.6566
AirlineJet 247.1373 422.6021 0.585 0.5591
AirlineSpice Jet -397.9990 572.4546 -0.695 0.4875
DepartureCityCodeDEL -1510.7315 347.4631 -4.348 1.90e-05 ***
DepartureTime -0.8209 0.5382 -1.525 0.1283
DeparturePM 351.7589 619.8104 0.568 0.5708
FlyingMinutes 50.2658 29.6911 1.693 0.0915 .
AircraftBoeing -568.4491 352.5141 -1.613 0.1079
IsWeekendYes -346.8845 403.5414 -0.860 0.3907
AdvancedBookingDays -86.1787 12.3419 -6.983 1.96e-11 ***
DayBeforeDiwali 4706.4125 583.5226 8.066 1.90e-14 ***
DayAfterDiwali 3884.2991 600.8475 6.465 4.25e-10 ***
MarketShare NA NA NA NA
LoadFactor NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2054 on 292 degrees of freedom
Multiple R-squared: 0.2897, Adjusted R-squared: 0.2606
F-statistic: 9.927 on 12 and 292 DF, p-value: 2.756e-16
plot(airline$Price, col="red", type="l")
points(fitted(m6), col="green", type="l")