Air_Ticket

First Slide

For more details on authoring R presentations please visit https://support.rstudio.com/hc/en-us/articles/200486468.

  • Bullet 1
  • Bullet 2
  • Bullet 3

Slide With Code

x=read.csv("AirlinePricingData.csv")
str(x)
'data.frame':   305 obs. of  25 variables:
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
 $ DepartureTime      : int  225 300 350 455 555 605 635 640 645 700 ...
 $ ArrivalTime        : int  435 505 605 710 805 815 850 855 855 915 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
 $ FlyingMinutes      : int  130 125 135 135 130 130 135 135 130 135 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
 $ Capacity           : int  156 156 189 180 189 156 189 180 189 138 ...
 $ SeatPitch          : int  30 30 29 30 29 30 29 30 29 30 ...
 $ SeatWidth          : num  17 17 17 18 17 17 17 18 17 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Price              : int  4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
 $ AdvancedBookingDays: int  54 52 48 59 48 52 48 59 48 52 ...
 $ IsDiwali           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayBeforeDiwali    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
 $ LoadFactor         : num  83.3 83.3 94.1 87.2 94.1 ...
summary(x)
  FlightNumber      Airline    DepartureCityCode ArrivalCityCode
 6E 129 :  5   Air India: 41   BOM:130           BOM:175        
 6E 155 :  5   IndiGo   : 80   DEL:175           DEL:130        
 6E 167 :  5   Jet      :144                                    
 6E 171 :  5   Spice Jet: 40                                    
 6E 179 :  5                                                    
 6E 181 :  5                                                    
 (Other):275                                                    
 DepartureTime   ArrivalTime   Departure FlyingMinutes   Aircraft  
 Min.   : 225   Min.   :  20   AM:169    Min.   :125   Airbus:140  
 1st Qu.: 755   1st Qu.: 935   PM:136    1st Qu.:135   Boeing:165  
 Median :1035   Median :1215             Median :135               
 Mean   :1250   Mean   :1329             Mean   :136               
 3rd Qu.:1800   3rd Qu.:1925             3rd Qu.:140               
 Max.   :2320   Max.   :2345             Max.   :145               

   PlaneModel     Capacity       SeatPitch       SeatWidth    
 738    :113   Min.   :138.0   Min.   :29.00   Min.   :17.00  
 A320   : 80   1st Qu.:156.0   1st Qu.:30.00   1st Qu.:17.00  
 739    : 36   Median :180.0   Median :30.00   Median :17.00  
 A321   : 25   Mean   :176.4   Mean   :30.26   Mean   :17.41  
 A332   : 25   3rd Qu.:189.0   3rd Qu.:30.00   3rd Qu.:18.00  
 77W    : 10   Max.   :303.0   Max.   :33.00   Max.   :18.00  
 (Other): 16                                                  
   DataCollectionDate     DateDeparture IsWeekend     Price      
 Sep 10 2018:40       Nov 8 2018 : 62   No :264   Min.   : 2607  
 Sep 13 2018:30       Nov 6 2018 : 59   Yes: 41   1st Qu.: 4051  
 Sep 14 2018:30       Sep 21 2018: 23             Median : 4681  
 Sep 15 2018:45       Sep 17 2018: 17             Mean   : 5395  
 Sep 17 2018:39       Oct 19 2018: 16             3rd Qu.: 5725  
 Sep 19 2018:81       Sep 26 2018: 16             Max.   :18015  
 Sep 8 2018 :40       (Other)    :112                            
 AdvancedBookingDays    IsDiwali      DayBeforeDiwali  DayAfterDiwali  
 Min.   : 2.0        Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.: 7.0        1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :30.0        Median :0.0000   Median :0.0000   Median :0.0000  
 Mean   :28.9        Mean   :0.3967   Mean   :0.1934   Mean   :0.2033  
 3rd Qu.:52.0        3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
 Max.   :61.0        Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  

 MetroDeparture  MetroArrival  MarketShare      LoadFactor   
 Min.   :1      Min.   :1     Min.   :13.20   Min.   :78.73  
 1st Qu.:1      1st Qu.:1     1st Qu.:13.30   1st Qu.:83.32  
 Median :1      Median :1     Median :15.40   Median :83.32  
 Mean   :1      Mean   :1     Mean   :21.18   Mean   :85.13  
 3rd Qu.:1      3rd Qu.:1     3rd Qu.:39.60   3rd Qu.:87.20  
 Max.   :1      Max.   :1     Max.   :39.60   Max.   :94.06  

model=lm(Price~Airline,data=x)
summary(model)

Call:
lm(formula = Price ~ Airline, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2877.1 -1445.1  -683.1   273.5 13135.5 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        6335.0      368.0  17.214  < 2e-16 ***
AirlineIndiGo     -1455.5      452.6  -3.216  0.00144 ** 
AirlineJet         -838.9      417.1  -2.011  0.04522 *  
AirlineSpice Jet  -1240.2      523.7  -2.368  0.01851 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2356 on 301 degrees of freedom
Multiple R-squared:  0.03608,   Adjusted R-squared:  0.02647 
F-statistic: 3.755 on 3 and 301 DF,  p-value: 0.01131
lookup = data.frame(PlaneModel=levels(x$PlaneModel), PlaneMod=c(1,2,3,4,5,6,7,8,9))
lookup2 = data.frame(Airline=levels(x$Airline), AirlineCode=c(1,2,3,4))
lookup3 = data.frame(DepartureCityCode=levels(x$DepartureCityCode), DCC1=c(1,2))
lookup4 = data.frame(ArrivalCityCode=levels(x$ArrivalCityCode), ACC1=c(1,2))
lookup5 = data.frame(Aircraft=levels(x$Aircraft), A1=c(1,2))
lookup6 = data.frame(IsWeekend=levels(x$IsWeekend), A1=c(1,2))
x$dept_mins <- with(x, x$DepartureTime %% 100)
 x$dept_hr <- with(x, x$DepartureTime %/% 100)
 x1 = merge(x, lookup, by="PlaneModel")
 x2= merge(x1, lookup2, by="Airline")
 x3= merge(x2, lookup3, by="DepartureCityCode")
 x4= merge(x3, lookup4, by="ArrivalCityCode")
 x5= merge(x4, lookup5, by="Aircraft")
 x6= merge(x5, lookup6, by="IsWeekend")
 head(x6)
  IsWeekend Aircraft ArrivalCityCode DepartureCityCode Airline PlaneModel
1        No   Airbus             BOM               DEL  IndiGo       A320
2        No   Airbus             BOM               DEL  IndiGo       A320
3        No   Airbus             BOM               DEL  IndiGo       A320
4        No   Airbus             BOM               DEL  IndiGo       A320
5        No   Airbus             BOM               DEL  IndiGo       A320
6        No   Airbus             BOM               DEL  IndiGo       A320
  FlightNumber DepartureTime ArrivalTime Departure FlyingMinutes Capacity
1       6E 197          1930        2145        PM           135      180
2       6E 179           830        1045        AM           135      180
3       6E 755          1600        1815        PM           135      180
4       6E 129          1730        1945        PM           135      180
5       6E 843          1835        2055        PM           140      180
6       6E 755          1600        1815        PM           135      180
  SeatPitch SeatWidth DataCollectionDate DateDeparture Price
1        30        18        Sep 10 2018   Sep 17 2018  3305
2        30        18         Sep 8 2018    Oct 8 2018  3031
3        30        18        Sep 10 2018   Sep 12 2018  4234
4        30        18        Sep 10 2018    Nov 6 2018  4234
5        30        18        Sep 10 2018   Sep 12 2018  5153
6        30        18        Sep 10 2018    Nov 6 2018  4234
  AdvancedBookingDays IsDiwali DayBeforeDiwali DayAfterDiwali
1                   7        0               0              0
2                  30        0               0              0
3                   2        0               0              0
4                  57        1               1              0
5                   2        0               0              0
6                  57        1               1              0
  MetroDeparture MetroArrival MarketShare LoadFactor dept_mins dept_hr
1              1            1        39.6       87.2        30      19
2              1            1        39.6       87.2        30       8
3              1            1        39.6       87.2         0      16
4              1            1        39.6       87.2        30      17
5              1            1        39.6       87.2        35      18
6              1            1        39.6       87.2         0      16
  PlaneMod AirlineCode DCC1 ACC1 A1.x A1.y
1        6           2    2    1    1    1
2        6           2    2    1    1    1
3        6           2    2    1    1    1
4        6           2    2    1    1    1
5        6           2    2    1    1    1
6        6           2    2    1    1    1
 str(x6)
'data.frame':   305 obs. of  33 variables:
 $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 1 1 1 1 1 1 1 1 1 1 ...
 $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 1 1 1 1 1 1 1 1 2 ...
 $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 2 2 2 2 2 2 2 2 1 ...
 $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 2 2 2 2 2 2 2 2 2 1 ...
 $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 6 6 6 6 6 6 6 6 6 7 ...
 $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 10 5 13 1 14 13 16 8 2 51 ...
 $ DepartureTime      : int  1930 830 1600 1730 1835 1600 640 1500 2225 1300 ...
 $ ArrivalTime        : int  2145 1045 1815 1945 2055 1815 855 1710 40 1515 ...
 $ Departure          : Factor w/ 2 levels "AM","PM": 2 1 2 2 2 2 1 2 2 2 ...
 $ FlyingMinutes      : int  135 135 135 135 140 135 135 130 135 135 ...
 $ Capacity           : int  180 180 180 180 180 180 180 180 180 170 ...
 $ SeatPitch          : int  30 30 30 30 30 30 30 30 30 31 ...
 $ SeatWidth          : num  18 18 18 18 18 18 18 18 18 17 ...
 $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 1 7 1 1 1 1 7 7 1 6 ...
 $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 14 9 11 1 11 1 1 10 3 17 ...
 $ Price              : int  3305 3031 4234 4234 5153 4234 4234 4597 2712 5357 ...
 $ AdvancedBookingDays: int  7 30 2 57 2 57 59 2 30 2 ...
 $ IsDiwali           : int  0 0 0 1 0 1 1 0 0 0 ...
 $ DayBeforeDiwali    : int  0 0 0 1 0 1 1 0 0 0 ...
 $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ MarketShare        : num  39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 39.6 13.3 ...
 $ LoadFactor         : num  87.2 87.2 87.2 87.2 87.2 ...
 $ dept_mins          : num  30 30 0 30 35 0 40 0 25 0 ...
 $ dept_hr            : num  19 8 16 17 18 16 6 15 22 13 ...
 $ PlaneMod           : num  6 6 6 6 6 6 6 6 6 7 ...
 $ AirlineCode        : num  2 2 2 2 2 2 2 2 2 1 ...
 $ DCC1               : num  2 2 2 2 2 2 2 2 2 1 ...
 $ ACC1               : num  1 1 1 1 1 1 1 1 1 2 ...
 $ A1.x               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ A1.y               : num  1 1 1 1 1 1 1 1 1 1 ...
nums=unlist(sapply(x6,is.numeric))
air1=x6[,nums]
model=lm(Price~.,data=air1)
summary(model)

Call:
lm(formula = Price ~ ., data = air1)

Residuals:
    Min      1Q  Median      3Q     Max 
-2965.3 -1257.6  -413.6   677.6 11948.7 

Coefficients: (5 not defined because of singularities)
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -3826.1580 27928.8825  -0.137   0.8911    
DepartureTime          -0.6683     0.2791  -2.395   0.0173 *  
ArrivalTime             0.2175     0.2486   0.875   0.3823    
FlyingMinutes          39.1162    30.1560   1.297   0.1956    
Capacity               -8.9722    10.5692  -0.849   0.3966    
SeatPitch              40.3964   578.3512   0.070   0.9444    
SeatWidth             914.3721   720.7139   1.269   0.2056    
AdvancedBookingDays   -86.4992    12.3763  -6.989 1.94e-11 ***
IsDiwali             3881.0407   602.3680   6.443 4.93e-10 ***
DayBeforeDiwali       824.6751   375.5277   2.196   0.0289 *  
DayAfterDiwali              NA         NA      NA       NA    
MetroDeparture              NA         NA      NA       NA    
MetroArrival                NA         NA      NA       NA    
MarketShare           -14.1518    32.9831  -0.429   0.6682    
LoadFactor            -52.3100   128.0880  -0.408   0.6833    
dept_mins               2.6666     8.4963   0.314   0.7539    
dept_hr                     NA         NA      NA       NA    
PlaneMod             -216.4960   398.5895  -0.543   0.5874    
AirlineCode            30.1896   471.4382   0.064   0.9490    
DCC1                -1477.2846   356.1260  -4.148 4.42e-05 ***
ACC1                        NA         NA      NA       NA    
A1.x                -1350.5017  1957.2817  -0.690   0.4908    
A1.y                 -403.0218   405.7635  -0.993   0.3214    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2058 on 287 degrees of freedom
Multiple R-squared:  0.2992,    Adjusted R-squared:  0.2577 
F-statistic: 7.207 on 17 and 287 DF,  p-value: 1.241e-14

##Inferences 1. The model is significantly more valuable than a naive model, going by the p-value of the F-stat 2. The significant variables are DepartureTime, AdvancedBookingDays, isDiwali, DayBeforeDiwali and DepartureCityCode

Slide With Plot

Error in plot_summs(model, scale = TRUE, plot.distributions = TRUE, inner_ci_level = 0.9) : 
  could not find function "plot_summs"