x=read.csv("AirlinePricingData.csv")
str(x)
'data.frame': 305 obs. of 25 variables:
$ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
$ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
$ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
$ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
$ DepartureTime : int 225 300 350 455 555 605 635 640 645 700 ...
$ ArrivalTime : int 435 505 605 710 805 815 850 855 855 915 ...
$ Departure : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
$ FlyingMinutes : int 130 125 135 135 130 130 135 135 130 135 ...
$ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
$ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
$ Capacity : int 156 156 189 180 189 156 189 180 189 138 ...
$ SeatPitch : int 30 30 29 30 29 30 29 30 29 30 ...
$ SeatWidth : num 17 17 17 18 17 17 17 18 17 17 ...
$ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
$ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
$ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Price : int 4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
$ AdvancedBookingDays: int 54 52 48 59 48 52 48 59 48 52 ...
$ IsDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayBeforeDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
$ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
$ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
$ MarketShare : num 15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
$ LoadFactor : num 83.3 83.3 94.1 87.2 94.1 ...
summary(x)
FlightNumber Airline DepartureCityCode ArrivalCityCode
6E 129 : 5 Air India: 41 BOM:130 BOM:175
6E 155 : 5 IndiGo : 80 DEL:175 DEL:130
6E 167 : 5 Jet :144
6E 171 : 5 Spice Jet: 40
6E 179 : 5
6E 181 : 5
(Other):275
DepartureTime ArrivalTime Departure FlyingMinutes Aircraft
Min. : 225 Min. : 20 AM:169 Min. :125 Airbus:140
1st Qu.: 755 1st Qu.: 935 PM:136 1st Qu.:135 Boeing:165
Median :1035 Median :1215 Median :135
Mean :1250 Mean :1329 Mean :136
3rd Qu.:1800 3rd Qu.:1925 3rd Qu.:140
Max. :2320 Max. :2345 Max. :145
PlaneModel Capacity SeatPitch SeatWidth
738 :113 Min. :138.0 Min. :29.00 Min. :17.00
A320 : 80 1st Qu.:156.0 1st Qu.:30.00 1st Qu.:17.00
739 : 36 Median :180.0 Median :30.00 Median :17.00
A321 : 25 Mean :176.4 Mean :30.26 Mean :17.41
A332 : 25 3rd Qu.:189.0 3rd Qu.:30.00 3rd Qu.:18.00
77W : 10 Max. :303.0 Max. :33.00 Max. :18.00
(Other): 16
DataCollectionDate DateDeparture IsWeekend Price
Sep 10 2018:40 Nov 8 2018 : 62 No :264 Min. : 2607
Sep 13 2018:30 Nov 6 2018 : 59 Yes: 41 1st Qu.: 4051
Sep 14 2018:30 Sep 21 2018: 23 Median : 4681
Sep 15 2018:45 Sep 17 2018: 17 Mean : 5395
Sep 17 2018:39 Oct 19 2018: 16 3rd Qu.: 5725
Sep 19 2018:81 Sep 26 2018: 16 Max. :18015
Sep 8 2018 :40 (Other) :112
AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
Min. : 2.0 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.: 7.0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :30.0 Median :0.0000 Median :0.0000 Median :0.0000
Mean :28.9 Mean :0.3967 Mean :0.1934 Mean :0.2033
3rd Qu.:52.0 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :61.0 Max. :1.0000 Max. :1.0000 Max. :1.0000
MetroDeparture MetroArrival MarketShare LoadFactor
Min. :1 Min. :1 Min. :13.20 Min. :78.73
1st Qu.:1 1st Qu.:1 1st Qu.:13.30 1st Qu.:83.32
Median :1 Median :1 Median :15.40 Median :83.32
Mean :1 Mean :1 Mean :21.18 Mean :85.13
3rd Qu.:1 3rd Qu.:1 3rd Qu.:39.60 3rd Qu.:87.20
Max. :1 Max. :1 Max. :39.60 Max. :94.06
model=lm(Price~Airline,data=x)
summary(model)
Call:
lm(formula = Price ~ Airline, data = x)
Residuals:
Min 1Q Median 3Q Max
-2877.1 -1445.1 -683.1 273.5 13135.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6335.0 368.0 17.214 < 2e-16 ***
AirlineIndiGo -1455.5 452.6 -3.216 0.00144 **
AirlineJet -838.9 417.1 -2.011 0.04522 *
AirlineSpice Jet -1240.2 523.7 -2.368 0.01851 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2356 on 301 degrees of freedom
Multiple R-squared: 0.03608, Adjusted R-squared: 0.02647
F-statistic: 3.755 on 3 and 301 DF, p-value: 0.01131
lookup = data.frame(PlaneModel=levels(x$PlaneModel), PlaneMod=c(1,2,3,4,5,6,7,8,9))
lookup2 = data.frame(Airline=levels(x$Airline), AirlineCode=c(1,2,3,4))
lookup3 = data.frame(DepartureCityCode=levels(x$DepartureCityCode), DCC1=c(1,2))
lookup4 = data.frame(ArrivalCityCode=levels(x$ArrivalCityCode), ACC1=c(1,2))
lookup5 = data.frame(Aircraft=levels(x$Aircraft), A1=c(1,2))
lookup6 = data.frame(IsWeekend=levels(x$IsWeekend), A1=c(1,2))
x$dept_mins <- with(x, x$DepartureTime %% 100)
x$dept_hr <- with(x, x$DepartureTime %/% 100)
x1 = merge(x, lookup, by="PlaneModel")
x2= merge(x1, lookup2, by="Airline")
x3= merge(x2, lookup3, by="DepartureCityCode")
x4= merge(x3, lookup4, by="ArrivalCityCode")
x5= merge(x4, lookup5, by="Aircraft")
x6= merge(x5, lookup6, by="IsWeekend")
head(x6)
IsWeekend Aircraft ArrivalCityCode DepartureCityCode Airline PlaneModel
1 No Airbus BOM DEL IndiGo A320
2 No Airbus BOM DEL IndiGo A320
3 No Airbus BOM DEL IndiGo A320
4 No Airbus BOM DEL IndiGo A320
5 No Airbus BOM DEL IndiGo A320
6 No Airbus BOM DEL IndiGo A320
FlightNumber DepartureTime ArrivalTime Departure FlyingMinutes Capacity
1 6E 197 1930 2145 PM 135 180
2 6E 179 830 1045 AM 135 180
3 6E 755 1600 1815 PM 135 180
4 6E 129 1730 1945 PM 135 180
5 6E 843 1835 2055 PM 140 180
6 6E 755 1600 1815 PM 135 180
SeatPitch SeatWidth DataCollectionDate DateDeparture Price
1 30 18 Sep 10 2018 Sep 17 2018 3305
2 30 18 Sep 8 2018 Oct 8 2018 3031
3 30 18 Sep 10 2018 Sep 12 2018 4234
4 30 18 Sep 10 2018 Nov 6 2018 4234
5 30 18 Sep 10 2018 Sep 12 2018 5153
6 30 18 Sep 10 2018 Nov 6 2018 4234
AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
1 7 0 0 0
2 30 0 0 0
3 2 0 0 0
4 57 1 1 0
5 2 0 0 0
6 57 1 1 0
MetroDeparture MetroArrival MarketShare LoadFactor dept_mins dept_hr
1 1 1 39.6 87.2 30 19
2 1 1 39.6 87.2 30 8
3 1 1 39.6 87.2 0 16
4 1 1 39.6 87.2 30 17
5 1 1 39.6 87.2 35 18
6 1 1 39.6 87.2 0 16
PlaneMod AirlineCode DCC1 ACC1 A1.x A1.y
1 6 2 2 1 1 1
2 6 2 2 1 1 1
3 6 2 2 1 1 1
4 6 2 2 1 1 1
5 6 2 2 1 1 1
6 6 2 2 1 1 1
str(x6)
'data.frame': 305 obs. of 33 variables:
$ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 1 1 1 1 1 1 1 1 1 1 ...
$ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 1 1 1 1 2 ...
$ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 2 2 2 2 1 ...
$ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 2 2 2 2 2 2 2 2 2 1 ...
$ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 6 6 6 6 6 6 6 6 6 7 ...
$ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 10 5 13 1 14 13 16 8 2 51 ...
$ DepartureTime : int 1930 830 1600 1730 1835 1600 640 1500 2225 1300 ...
$ ArrivalTime : int 2145 1045 1815 1945 2055 1815 855 1710 40 1515 ...
$ Departure : Factor w/ 2 levels "AM","PM": 2 1 2 2 2 2 1 2 2 2 ...
$ FlyingMinutes : int 135 135 135 135 140 135 135 130 135 135 ...
$ Capacity : int 180 180 180 180 180 180 180 180 180 170 ...
$ SeatPitch : int 30 30 30 30 30 30 30 30 30 31 ...
$ SeatWidth : num 18 18 18 18 18 18 18 18 18 17 ...
$ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 1 7 1 1 1 1 7 7 1 6 ...
$ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 14 9 11 1 11 1 1 10 3 17 ...
$ Price : int 3305 3031 4234 4234 5153 4234 4234 4597 2712 5357 ...
$ AdvancedBookingDays: int 7 30 2 57 2 57 59 2 30 2 ...
$ IsDiwali : int 0 0 0 1 0 1 1 0 0 0 ...
$ DayBeforeDiwali : int 0 0 0 1 0 1 1 0 0 0 ...
$ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
$ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
$ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
$ MarketShare : num 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 13.3 ...
$ LoadFactor : num 87.2 87.2 87.2 87.2 87.2 ...
$ dept_mins : num 30 30 0 30 35 0 40 0 25 0 ...
$ dept_hr : num 19 8 16 17 18 16 6 15 22 13 ...
$ PlaneMod : num 6 6 6 6 6 6 6 6 6 7 ...
$ AirlineCode : num 2 2 2 2 2 2 2 2 2 1 ...
$ DCC1 : num 2 2 2 2 2 2 2 2 2 1 ...
$ ACC1 : num 1 1 1 1 1 1 1 1 1 2 ...
$ A1.x : num 1 1 1 1 1 1 1 1 1 1 ...
$ A1.y : num 1 1 1 1 1 1 1 1 1 1 ...
nums=unlist(sapply(x6,is.numeric))
air1=x6[,nums]
model=lm(Price~.,data=air1)
summary(model)
Call:
lm(formula = Price ~ ., data = air1)
Residuals:
Min 1Q Median 3Q Max
-2965.3 -1257.6 -413.6 677.6 11948.7
Coefficients: (5 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3826.1580 27928.8825 -0.137 0.8911
DepartureTime -0.6683 0.2791 -2.395 0.0173 *
ArrivalTime 0.2175 0.2486 0.875 0.3823
FlyingMinutes 39.1162 30.1560 1.297 0.1956
Capacity -8.9722 10.5692 -0.849 0.3966
SeatPitch 40.3964 578.3512 0.070 0.9444
SeatWidth 914.3721 720.7139 1.269 0.2056
AdvancedBookingDays -86.4992 12.3763 -6.989 1.94e-11 ***
IsDiwali 3881.0407 602.3680 6.443 4.93e-10 ***
DayBeforeDiwali 824.6751 375.5277 2.196 0.0289 *
DayAfterDiwali NA NA NA NA
MetroDeparture NA NA NA NA
MetroArrival NA NA NA NA
MarketShare -14.1518 32.9831 -0.429 0.6682
LoadFactor -52.3100 128.0880 -0.408 0.6833
dept_mins 2.6666 8.4963 0.314 0.7539
dept_hr NA NA NA NA
PlaneMod -216.4960 398.5895 -0.543 0.5874
AirlineCode 30.1896 471.4382 0.064 0.9490
DCC1 -1477.2846 356.1260 -4.148 4.42e-05 ***
ACC1 NA NA NA NA
A1.x -1350.5017 1957.2817 -0.690 0.4908
A1.y -403.0218 405.7635 -0.993 0.3214
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2058 on 287 degrees of freedom
Multiple R-squared: 0.2992, Adjusted R-squared: 0.2577
F-statistic: 7.207 on 17 and 287 DF, p-value: 1.241e-14
##Inferences
R squared => 29.92%
The model is significantly more valuable than a naive model, going by the p-value of the F-stat
The significant variables are DepartureTime, AdvancedBookingDays, isDiwali, DayBeforeDiwali and DepartureCityCode
The Departure Time was negatively affecting price, indicating morning flights were costlier
More the advanced booking days, lesser was the price. For every advanced booking day, the ticket was cheaper by 87 Rs
On Diwali day, the ticket was costlier by Rs 3,881
On the day before Diwali, the ticket was costlier by Rs 824
For Departure City Code, a flight from Bombay was costlier by Rs 1477