Air_Ticket

Slide With Code

x=read.csv("AirlinePricingData.csv")
str(x)
'data.frame':   305 obs. of  25 variables:
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
 $ DepartureTime      : int  225 300 350 455 555 605 635 640 645 700 ...
 $ ArrivalTime        : int  435 505 605 710 805 815 850 855 855 915 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
 $ FlyingMinutes      : int  130 125 135 135 130 130 135 135 130 135 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
 $ Capacity           : int  156 156 189 180 189 156 189 180 189 138 ...
 $ SeatPitch          : int  30 30 29 30 29 30 29 30 29 30 ...
 $ SeatWidth          : num  17 17 17 18 17 17 17 18 17 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Price              : int  4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
 $ AdvancedBookingDays: int  54 52 48 59 48 52 48 59 48 52 ...
 $ IsDiwali           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayBeforeDiwali    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
 $ LoadFactor         : num  83.3 83.3 94.1 87.2 94.1 ...
summary(x)
  FlightNumber      Airline    DepartureCityCode ArrivalCityCode
 6E 129 :  5   Air India: 41   BOM:130           BOM:175        
 6E 155 :  5   IndiGo   : 80   DEL:175           DEL:130        
 6E 167 :  5   Jet      :144                                    
 6E 171 :  5   Spice Jet: 40                                    
 6E 179 :  5                                                    
 6E 181 :  5                                                    
 (Other):275                                                    
 DepartureTime   ArrivalTime   Departure FlyingMinutes   Aircraft  
 Min.   : 225   Min.   :  20   AM:169    Min.   :125   Airbus:140  
 1st Qu.: 755   1st Qu.: 935   PM:136    1st Qu.:135   Boeing:165  
 Median :1035   Median :1215             Median :135               
 Mean   :1250   Mean   :1329             Mean   :136               
 3rd Qu.:1800   3rd Qu.:1925             3rd Qu.:140               
 Max.   :2320   Max.   :2345             Max.   :145               

   PlaneModel     Capacity       SeatPitch       SeatWidth    
 738    :113   Min.   :138.0   Min.   :29.00   Min.   :17.00  
 A320   : 80   1st Qu.:156.0   1st Qu.:30.00   1st Qu.:17.00  
 739    : 36   Median :180.0   Median :30.00   Median :17.00  
 A321   : 25   Mean   :176.4   Mean   :30.26   Mean   :17.41  
 A332   : 25   3rd Qu.:189.0   3rd Qu.:30.00   3rd Qu.:18.00  
 77W    : 10   Max.   :303.0   Max.   :33.00   Max.   :18.00  
 (Other): 16                                                  
   DataCollectionDate     DateDeparture IsWeekend     Price      
 Sep 10 2018:40       Nov 8 2018 : 62   No :264   Min.   : 2607  
 Sep 13 2018:30       Nov 6 2018 : 59   Yes: 41   1st Qu.: 4051  
 Sep 14 2018:30       Sep 21 2018: 23             Median : 4681  
 Sep 15 2018:45       Sep 17 2018: 17             Mean   : 5395  
 Sep 17 2018:39       Oct 19 2018: 16             3rd Qu.: 5725  
 Sep 19 2018:81       Sep 26 2018: 16             Max.   :18015  
 Sep 8 2018 :40       (Other)    :112                            
 AdvancedBookingDays    IsDiwali      DayBeforeDiwali  DayAfterDiwali  
 Min.   : 2.0        Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.: 7.0        1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :30.0        Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :28.9        Mean   :0.3967   Mean   :0.1934   Mean   :0.2033  
 3rd Qu.:52.0        3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :61.0        Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  

 MetroDeparture  MetroArrival  MarketShare      LoadFactor   
 Min.   :1      Min.   :1     Min.   :13.20   Min.   :78.73  
 1st Qu.:1      1st Qu.:1     1st Qu.:13.30   1st Qu.:83.32  
 Median :1      Median :1     Median :15.40   Median :83.32  
 Mean   :1      Mean   :1     Mean   :21.18   Mean   :85.13  
 3rd Qu.:1      3rd Qu.:1     3rd Qu.:39.60   3rd Qu.:87.20  
 Max.   :1      Max.   :1     Max.   :39.60   Max.   :94.06  

model=lm(Price~Airline,data=x)
summary(model)

Call:
lm(formula = Price ~ Airline, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2877.1 -1445.1  -683.1   273.5 13135.5 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        6335.0      368.0  17.214  < 2e-16 ***
AirlineIndiGo     -1455.5      452.6  -3.216  0.00144 ** 
AirlineJet         -838.9      417.1  -2.011  0.04522 *  
AirlineSpice Jet  -1240.2      523.7  -2.368  0.01851 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2356 on 301 degrees of freedom
Multiple R-squared:  0.03608,   Adjusted R-squared:  0.02647 
F-statistic: 3.755 on 3 and 301 DF,  p-value: 0.01131
lookup = data.frame(PlaneModel=levels(x$PlaneModel), PlaneMod=c(1,2,3,4,5,6,7,8,9))
lookup2 = data.frame(Airline=levels(x$Airline), AirlineCode=c(1,2,3,4))
lookup3 = data.frame(DepartureCityCode=levels(x$DepartureCityCode), DCC1=c(1,2))
lookup4 = data.frame(ArrivalCityCode=levels(x$ArrivalCityCode), ACC1=c(1,2))
lookup5 = data.frame(Aircraft=levels(x$Aircraft), A1=c(1,2))
lookup6 = data.frame(IsWeekend=levels(x$IsWeekend), A1=c(1,2))
x$dept_mins <- with(x, x$DepartureTime %% 100)
 x$dept_hr <- with(x, x$DepartureTime %/% 100)
 x1 = merge(x, lookup, by="PlaneModel")
 x2= merge(x1, lookup2, by="Airline")
 x3= merge(x2, lookup3, by="DepartureCityCode")
 x4= merge(x3, lookup4, by="ArrivalCityCode")
 x5= merge(x4, lookup5, by="Aircraft")
 x6= merge(x5, lookup6, by="IsWeekend")
 head(x6)
  IsWeekend Aircraft ArrivalCityCode DepartureCityCode Airline PlaneModel
1        No   Airbus             BOM               DEL  IndiGo       A320
2        No   Airbus             BOM               DEL  IndiGo       A320
3        No   Airbus             BOM               DEL  IndiGo       A320
4        No   Airbus             BOM               DEL  IndiGo       A320
5        No   Airbus             BOM               DEL  IndiGo       A320
6        No   Airbus             BOM               DEL  IndiGo       A320
  FlightNumber DepartureTime ArrivalTime Departure FlyingMinutes Capacity
1       6E 197          1930        2145        PM           135      180
2       6E 179           830        1045        AM           135      180
3       6E 755          1600        1815        PM           135      180
4       6E 129          1730        1945        PM           135      180
5       6E 843          1835        2055        PM           140      180
6       6E 755          1600        1815        PM           135      180
  SeatPitch SeatWidth DataCollectionDate DateDeparture Price
1        30        18        Sep 10 2018   Sep 17 2018  3305
2        30        18         Sep 8 2018    Oct 8 2018  3031
3        30        18        Sep 10 2018   Sep 12 2018  4234
4        30        18        Sep 10 2018    Nov 6 2018  4234
5        30        18        Sep 10 2018   Sep 12 2018  5153
6        30        18        Sep 10 2018    Nov 6 2018  4234
  AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
1                   7        0               0              0
2                  30        0               0              0
3                   2        0               0              0
4                  57        1               1              0
5                   2        0               0              0
6                  57        1               1              0
  MetroDeparture MetroArrival MarketShare LoadFactor dept_mins dept_hr
1              1            1        39.6       87.2        30      19
2              1            1        39.6       87.2        30       8
3              1            1        39.6       87.2         0      16
4              1            1        39.6       87.2        30      17
5              1            1        39.6       87.2        35      18
6              1            1        39.6       87.2         0      16
  PlaneMod AirlineCode DCC1 ACC1 A1.x A1.y
1        6           2    2    1    1    1
2        6           2    2    1    1    1
3        6           2    2    1    1    1
4        6           2    2    1    1    1
5        6           2    2    1    1    1
6        6           2    2    1    1    1
 str(x6)
'data.frame':   305 obs. of  33 variables:
 $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 1 1 1 1 1 1 1 1 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 1 1 1 1 2 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 2 2 2 2 1 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 2 2 2 2 2 2 2 2 2 1 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 6 6 6 6 6 6 6 6 6 7 ...
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 10 5 13 1 14 13 16 8 2 51 ...
 $ DepartureTime      : int  1930 830 1600 1730 1835 1600 640 1500 2225 1300 ...
 $ ArrivalTime        : int  2145 1045 1815 1945 2055 1815 855 1710 40 1515 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 2 1 2 2 2 2 1 2 2 2 ...
 $ FlyingMinutes      : int  135 135 135 135 140 135 135 130 135 135 ...
 $ Capacity           : int  180 180 180 180 180 180 180 180 180 170 ...
 $ SeatPitch          : int  30 30 30 30 30 30 30 30 30 31 ...
 $ SeatWidth          : num  18 18 18 18 18 18 18 18 18 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 1 7 1 1 1 1 7 7 1 6 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 14 9 11 1 11 1 1 10 3 17 ...
 $ Price              : int  3305 3031 4234 4234 5153 4234 4234 4597 2712 5357 ...
 $ AdvancedBookingDays: int  7 30 2 57 2 57 59 2 30 2 ...
 $ IsDiwali           : int  0 0 0 1 0 1 1 0 0 0 ...
 $ DayBeforeDiwali    : int  0 0 0 1 0 1 1 0 0 0 ...
 $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 13.3 ...
 $ LoadFactor         : num  87.2 87.2 87.2 87.2 87.2 ...
 $ dept_mins          : num  30 30 0 30 35 0 40 0 25 0 ...
 $ dept_hr            : num  19 8 16 17 18 16 6 15 22 13 ...
 $ PlaneMod           : num  6 6 6 6 6 6 6 6 6 7 ...
 $ AirlineCode        : num  2 2 2 2 2 2 2 2 2 1 ...
 $ DCC1               : num  2 2 2 2 2 2 2 2 2 1 ...
 $ ACC1               : num  1 1 1 1 1 1 1 1 1 2 ...
 $ A1.x               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ A1.y               : num  1 1 1 1 1 1 1 1 1 1 ...
nums=unlist(sapply(x6,is.numeric))
air1=x6[,nums]
model=lm(Price~.,data=air1)
summary(model)

Call:
lm(formula = Price ~ ., data = air1)

Residuals:
    Min      1Q  Median      3Q     Max 
-2965.3 -1257.6  -413.6   677.6 11948.7 

Coefficients: (5 not defined because of singularities)
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -3826.1580 27928.8825  -0.137   0.8911    
DepartureTime          -0.6683     0.2791  -2.395   0.0173 *  
ArrivalTime             0.2175     0.2486   0.875   0.3823    
FlyingMinutes          39.1162    30.1560   1.297   0.1956    
Capacity               -8.9722    10.5692  -0.849   0.3966    
SeatPitch              40.3964   578.3512   0.070   0.9444    
SeatWidth             914.3721   720.7139   1.269   0.2056    
AdvancedBookingDays   -86.4992    12.3763  -6.989 1.94e-11 ***
IsDiwali             3881.0407   602.3680   6.443 4.93e-10 ***
DayBeforeDiwali       824.6751   375.5277   2.196   0.0289 *  
DayAfterDiwali              NA         NA      NA       NA    
MetroDeparture              NA         NA      NA       NA    
MetroArrival                NA         NA      NA       NA    
MarketShare           -14.1518    32.9831  -0.429   0.6682    
LoadFactor            -52.3100   128.0880  -0.408   0.6833    
dept_mins               2.6666     8.4963   0.314   0.7539    
dept_hr                     NA         NA      NA       NA    
PlaneMod             -216.4960   398.5895  -0.543   0.5874    
AirlineCode            30.1896   471.4382   0.064   0.9490    
DCC1                -1477.2846   356.1260  -4.148 4.42e-05 ***
ACC1                        NA         NA      NA       NA    
A1.x                -1350.5017  1957.2817  -0.690   0.4908    
A1.y                 -403.0218   405.7635  -0.993   0.3214    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2058 on 287 degrees of freedom
Multiple R-squared:  0.2992,    Adjusted R-squared:  0.2577 
F-statistic: 7.207 on 17 and 287 DF,  p-value: 1.241e-14

##Inferences

R squared => 29.92%

  1. The model is significantly more valuable than a naive model, going by the p-value of the F-stat

  2. The significant variables are DepartureTime, AdvancedBookingDays, isDiwali, DayBeforeDiwali and DepartureCityCode

  3. The Departure Time was negatively affecting price, indicating morning flights were costlier

  4. More the advanced booking days, lesser was the price. For every advanced booking day, the ticket was cheaper by 87 Rs

  5. On Diwali day, the ticket was costlier by Rs 3,881

  6. On the day before Diwali, the ticket was costlier by Rs 824

  7. For Departure City Code, a flight from Bombay was costlier by Rs 1477

Slide With Plot

plot of chunk unnamed-chunk-2