For more details on authoring R presentations please visit https://support.rstudio.com/hc/en-us/articles/200486468.
x=read.csv("AirlinePricingData.csv")
str(x)
'data.frame': 305 obs. of 25 variables:
$ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
$ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
$ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
$ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
$ DepartureTime : int 225 300 350 455 555 605 635 640 645 700 ...
$ ArrivalTime : int 435 505 605 710 805 815 850 855 855 915 ...
$ Departure : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
$ FlyingMinutes : int 130 125 135 135 130 130 135 135 130 135 ...
$ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
$ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
$ Capacity : int 156 156 189 180 189 156 189 180 189 138 ...
$ SeatPitch : int 30 30 29 30 29 30 29 30 29 30 ...
$ SeatWidth : num 17 17 17 18 17 17 17 18 17 17 ...
$ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
$ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
$ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Price : int 4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
$ AdvancedBookingDays: int 54 52 48 59 48 52 48 59 48 52 ...
$ IsDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayBeforeDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
$ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
$ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
$ MarketShare : num 15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
$ LoadFactor : num 83.3 83.3 94.1 87.2 94.1 ...
summary(x)
FlightNumber Airline DepartureCityCode ArrivalCityCode
6E 129 : 5 Air India: 41 BOM:130 BOM:175
6E 155 : 5 IndiGo : 80 DEL:175 DEL:130
6E 167 : 5 Jet :144
6E 171 : 5 Spice Jet: 40
6E 179 : 5
6E 181 : 5
(Other):275
DepartureTime ArrivalTime Departure FlyingMinutes Aircraft
Min. : 225 Min. : 20 AM:169 Min. :125 Airbus:140
1st Qu.: 755 1st Qu.: 935 PM:136 1st Qu.:135 Boeing:165
Median :1035 Median :1215 Median :135
Mean :1250 Mean :1329 Mean :136
3rd Qu.:1800 3rd Qu.:1925 3rd Qu.:140
Max. :2320 Max. :2345 Max. :145
PlaneModel Capacity SeatPitch SeatWidth
738 :113 Min. :138.0 Min. :29.00 Min. :17.00
A320 : 80 1st Qu.:156.0 1st Qu.:30.00 1st Qu.:17.00
739 : 36 Median :180.0 Median :30.00 Median :17.00
A321 : 25 Mean :176.4 Mean :30.26 Mean :17.41
A332 : 25 3rd Qu.:189.0 3rd Qu.:30.00 3rd Qu.:18.00
77W : 10 Max. :303.0 Max. :33.00 Max. :18.00
(Other): 16
DataCollectionDate DateDeparture IsWeekend Price
Sep 10 2018:40 Nov 8 2018 : 62 No :264 Min. : 2607
Sep 13 2018:30 Nov 6 2018 : 59 Yes: 41 1st Qu.: 4051
Sep 14 2018:30 Sep 21 2018: 23 Median : 4681
Sep 15 2018:45 Sep 17 2018: 17 Mean : 5395
Sep 17 2018:39 Oct 19 2018: 16 3rd Qu.: 5725
Sep 19 2018:81 Sep 26 2018: 16 Max. :18015
Sep 8 2018 :40 (Other) :112
AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
Min. : 2.0 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.: 7.0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :30.0 Median :0.0000 Median :0.0000 Median :0.0000
Mean :28.9 Mean :0.3967 Mean :0.1934 Mean :0.2033
3rd Qu.:52.0 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :61.0 Max. :1.0000 Max. :1.0000 Max. :1.0000
MetroDeparture MetroArrival MarketShare LoadFactor
Min. :1 Min. :1 Min. :13.20 Min. :78.73
1st Qu.:1 1st Qu.:1 1st Qu.:13.30 1st Qu.:83.32
Median :1 Median :1 Median :15.40 Median :83.32
Mean :1 Mean :1 Mean :21.18 Mean :85.13
3rd Qu.:1 3rd Qu.:1 3rd Qu.:39.60 3rd Qu.:87.20
Max. :1 Max. :1 Max. :39.60 Max. :94.06
model=lm(Price~Airline,data=x)
summary(model)
Call:
lm(formula = Price ~ Airline, data = x)
Residuals:
Min 1Q Median 3Q Max
-2877.1 -1445.1 -683.1 273.5 13135.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6335.0 368.0 17.214 < 2e-16 ***
AirlineIndiGo -1455.5 452.6 -3.216 0.00144 **
AirlineJet -838.9 417.1 -2.011 0.04522 *
AirlineSpice Jet -1240.2 523.7 -2.368 0.01851 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2356 on 301 degrees of freedom
Multiple R-squared: 0.03608, Adjusted R-squared: 0.02647
F-statistic: 3.755 on 3 and 301 DF, p-value: 0.01131
lookup = data.frame(PlaneModel=levels(x$PlaneModel), PlaneMod=c(1,2,3,4,5,6,7,8,9))
lookup2 = data.frame(Airline=levels(x$Airline), AirlineCode=c(1,2,3,4))
lookup3 = data.frame(DepartureCityCode=levels(x$DepartureCityCode), DCC1=c(1,2))
lookup4 = data.frame(ArrivalCityCode=levels(x$ArrivalCityCode), ACC1=c(1,2))
lookup5 = data.frame(Aircraft=levels(x$Aircraft), A1=c(1,2))
lookup6 = data.frame(IsWeekend=levels(x$IsWeekend), A1=c(1,2))
x$dept_mins <- with(x, x$DepartureTime %% 100)
x$dept_hr <- with(x, x$DepartureTime %/% 100)
x1 = merge(x, lookup, by="PlaneModel")
x2= merge(x1, lookup2, by="Airline")
x3= merge(x2, lookup3, by="DepartureCityCode")
x4= merge(x3, lookup4, by="ArrivalCityCode")
x5= merge(x4, lookup5, by="Aircraft")
x6= merge(x5, lookup6, by="IsWeekend")
head(x6)
IsWeekend Aircraft ArrivalCityCode DepartureCityCode Airline PlaneModel
1 No Airbus BOM DEL IndiGo A320
2 No Airbus BOM DEL IndiGo A320
3 No Airbus BOM DEL IndiGo A320
4 No Airbus BOM DEL IndiGo A320
5 No Airbus BOM DEL IndiGo A320
6 No Airbus BOM DEL IndiGo A320
FlightNumber DepartureTime ArrivalTime Departure FlyingMinutes Capacity
1 6E 197 1930 2145 PM 135 180
2 6E 179 830 1045 AM 135 180
3 6E 755 1600 1815 PM 135 180
4 6E 129 1730 1945 PM 135 180
5 6E 843 1835 2055 PM 140 180
6 6E 755 1600 1815 PM 135 180
SeatPitch SeatWidth DataCollectionDate DateDeparture Price
1 30 18 Sep 10 2018 Sep 17 2018 3305
2 30 18 Sep 8 2018 Oct 8 2018 3031
3 30 18 Sep 10 2018 Sep 12 2018 4234
4 30 18 Sep 10 2018 Nov 6 2018 4234
5 30 18 Sep 10 2018 Sep 12 2018 5153
6 30 18 Sep 10 2018 Nov 6 2018 4234
AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
1 7 0 0 0
2 30 0 0 0
3 2 0 0 0
4 57 1 1 0
5 2 0 0 0
6 57 1 1 0
MetroDeparture MetroArrival MarketShare LoadFactor dept_mins dept_hr
1 1 1 39.6 87.2 30 19
2 1 1 39.6 87.2 30 8
3 1 1 39.6 87.2 0 16
4 1 1 39.6 87.2 30 17
5 1 1 39.6 87.2 35 18
6 1 1 39.6 87.2 0 16
PlaneMod AirlineCode DCC1 ACC1 A1.x A1.y
1 6 2 2 1 1 1
2 6 2 2 1 1 1
3 6 2 2 1 1 1
4 6 2 2 1 1 1
5 6 2 2 1 1 1
6 6 2 2 1 1 1
str(x6)
'data.frame': 305 obs. of 33 variables:
$ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 1 1 1 1 1 1 1 1 1 1 ...
$ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 1 1 1 1 2 ...
$ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 2 2 2 2 1 ...
$ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 2 2 2 2 2 2 2 2 2 1 ...
$ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 6 6 6 6 6 6 6 6 6 7 ...
$ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 10 5 13 1 14 13 16 8 2 51 ...
$ DepartureTime : int 1930 830 1600 1730 1835 1600 640 1500 2225 1300 ...
$ ArrivalTime : int 2145 1045 1815 1945 2055 1815 855 1710 40 1515 ...
$ Departure : Factor w/ 2 levels "AM","PM": 2 1 2 2 2 2 1 2 2 2 ...
$ FlyingMinutes : int 135 135 135 135 140 135 135 130 135 135 ...
$ Capacity : int 180 180 180 180 180 180 180 180 180 170 ...
$ SeatPitch : int 30 30 30 30 30 30 30 30 30 31 ...
$ SeatWidth : num 18 18 18 18 18 18 18 18 18 17 ...
$ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 1 7 1 1 1 1 7 7 1 6 ...
$ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 14 9 11 1 11 1 1 10 3 17 ...
$ Price : int 3305 3031 4234 4234 5153 4234 4234 4597 2712 5357 ...
$ AdvancedBookingDays: int 7 30 2 57 2 57 59 2 30 2 ...
$ IsDiwali : int 0 0 0 1 0 1 1 0 0 0 ...
$ DayBeforeDiwali : int 0 0 0 1 0 1 1 0 0 0 ...
$ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
$ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
$ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
$ MarketShare : num 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 13.3 ...
$ LoadFactor : num 87.2 87.2 87.2 87.2 87.2 ...
$ dept_mins : num 30 30 0 30 35 0 40 0 25 0 ...
$ dept_hr : num 19 8 16 17 18 16 6 15 22 13 ...
$ PlaneMod : num 6 6 6 6 6 6 6 6 6 7 ...
$ AirlineCode : num 2 2 2 2 2 2 2 2 2 1 ...
$ DCC1 : num 2 2 2 2 2 2 2 2 2 1 ...
$ ACC1 : num 1 1 1 1 1 1 1 1 1 2 ...
$ A1.x : num 1 1 1 1 1 1 1 1 1 1 ...
$ A1.y : num 1 1 1 1 1 1 1 1 1 1 ...
nums=unlist(sapply(x6,is.numeric))
air1=x6[,nums]
model=lm(Price~.,data=air1)
summary(model)
Call:
lm(formula = Price ~ ., data = air1)
Residuals:
Min 1Q Median 3Q Max
-2965.3 -1257.6 -413.6 677.6 11948.7
Coefficients: (5 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3826.1580 27928.8825 -0.137 0.8911
DepartureTime -0.6683 0.2791 -2.395 0.0173 *
ArrivalTime 0.2175 0.2486 0.875 0.3823
FlyingMinutes 39.1162 30.1560 1.297 0.1956
Capacity -8.9722 10.5692 -0.849 0.3966
SeatPitch 40.3964 578.3512 0.070 0.9444
SeatWidth 914.3721 720.7139 1.269 0.2056
AdvancedBookingDays -86.4992 12.3763 -6.989 1.94e-11 ***
IsDiwali 3881.0407 602.3680 6.443 4.93e-10 ***
DayBeforeDiwali 824.6751 375.5277 2.196 0.0289 *
DayAfterDiwali NA NA NA NA
MetroDeparture NA NA NA NA
MetroArrival NA NA NA NA
MarketShare -14.1518 32.9831 -0.429 0.6682
LoadFactor -52.3100 128.0880 -0.408 0.6833
dept_mins 2.6666 8.4963 0.314 0.7539
dept_hr NA NA NA NA
PlaneMod -216.4960 398.5895 -0.543 0.5874
AirlineCode 30.1896 471.4382 0.064 0.9490
DCC1 -1477.2846 356.1260 -4.148 4.42e-05 ***
ACC1 NA NA NA NA
A1.x -1350.5017 1957.2817 -0.690 0.4908
A1.y -403.0218 405.7635 -0.993 0.3214
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2058 on 287 degrees of freedom
Multiple R-squared: 0.2992, Adjusted R-squared: 0.2577
F-statistic: 7.207 on 17 and 287 DF, p-value: 1.241e-14
##Inferences 1. The model is significantly more valuable than a naive model, going by the p-value of the F-stat 2. The significant variables are DepartureTime, AdvancedBookingDays, isDiwali, DayBeforeDiwali and DepartureCityCode
Error in plot_summs(model, scale = TRUE, plot.distributions = TRUE, inner_ci_level = 0.9) :
could not find function "plot_summs"