This R Markdown document contains an analysis of air-ticket pricing for six airlines.

What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?

To explain the price difference between economy and premium-economy tickets, we consider two kinds of variable. The first compares the two classes: pitch difference and width difference. The second describes the flight that both classes share: total seats, international/domestic status and flight duration.

PART 2

Reading Data into R

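# read.csv() takes the file name directly. stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0), which matches the str() output below
AirlinesDATA <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
# Inspect the data-frame structure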
str(AirlinesDATA)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...
View(AirlinesDATA)  # opens the RStudio data viewer; only useful in an interactive session
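The derived columns in the structure above can be rebuilt from the raw ones. The sketch below types in the values from rows 1, 5, 8 and 10 of the str() output rather than reading the CSV. It assumes that PriceRelative is the premium mark-up as a fraction of the economy price. The source does not define that column, but the assumption reproduces the printed values after rounding.

```r
# Values typed in from rows 1, 5, 8 and 10 of the str() output (illustration only;
# the analysis itself reads them from SixAirlinesDataV2.csv).
rows <- data.frame(
  SeatsEconomy = 122, SeatsPremium = 40,
  PitchEconomy = 31,  PitchPremium = 38,
  WidthEconomy = 18,  WidthPremium = 19,
  PriceEconomy = c(2707, 1793, 1476, 1705),
  PricePremium = c(3725, 2999, 2997, 2989)
)

# Differences between the premium and economy cabins
rows$PitchDifference <- rows$PitchPremium - rows$PitchEconomy
rows$WidthDifference <- rows$WidthPremium - rows$WidthEconomy

# Cabin size and premium share (in percent)
rows$SeatsTotal          <- rows$SeatsEconomy + rows$SeatsPremium
rows$PercentPremiumSeats <- 100 * rows$SeatsPremium / rows$SeatsTotal

# ASSUMPTION: PriceRelative = (premium - economy) / economy, rounded to 2 decimals
rows$PriceRelative <- round((rows$PricePremium - rows$PriceEconomy) / rows$PriceEconomy, 2)

print(rows[, c("PitchDifference", "WidthDifference", "SeatsTotal",
               "PercentPremiumSeats", "PriceRelative")])
```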

Regression Model:

From Part 1, we already have a basic picture of which variables contribute to ticket pricing.

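# Candidate specification 1: premium price as the response
model <- PricePremium ~ PitchDifference + WidthDifference + SeatsTotal + FlightDuration + PriceEconomy + PercentPremiumSeats + IsInternational

# Candidate specification 2: the reverse direction, with economy price as the response
model2 <- PriceEconomy ~ PitchDifference + WidthDifference + SeatsTotal + FlightDuration + PricePremium + PercentPremiumSeats + IsInternational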
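The two candidate formulas share six predictors. As a sketch, they can be built with base R's stats::reformulate(), which writes the shared predictor list only once and avoids typos when the formulas are edited. The vector name `shared` is illustrative.

```r
# Predictors common to both candidate specifications (hypothetical helper vector)
shared <- c("PitchDifference", "WidthDifference", "SeatsTotal",
            "FlightDuration", "PercentPremiumSeats", "IsInternational")

# Each model adds the other class's price as a predictor
model  <- reformulate(c(shared, "PriceEconomy"), response = "PricePremium")
model2 <- reformulate(c(shared, "PricePremium"), response = "PriceEconomy")

print(model)
print(model2)
print(all.vars(model))   # response first, then the predictors
```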

Fit a Linear Regression Model using lm()

# Model 1: premium price as the response
Model <- PricePremium ~ PriceEconomy + PitchDifference + WidthDifference + SeatsTotal + FlightDuration + PercentPremiumSeats + IsInternational
fit <- lm(Model, data = AirlinesDATA)
summary(fit)
## 
## Call:
## lm(formula = Model, data = AirlinesDATA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1010.0  -258.4   -49.9   133.6  3416.7 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.213e+03  1.695e+02  -7.156 3.40e-12 ***
## PriceEconomy                  1.063e+00  3.077e-02  34.537  < 2e-16 ***
## PitchDifference               8.421e+01  3.656e+01   2.303 0.021722 *  
## WidthDifference               1.224e+02  3.373e+01   3.629 0.000318 ***
## SeatsTotal                    1.920e+00  3.241e-01   5.922 6.31e-09 ***
## FlightDuration                8.459e+01  8.507e+00   9.943  < 2e-16 ***
## PercentPremiumSeats           3.190e+01  5.220e+00   6.112 2.14e-09 ***
## IsInternationalInternational -7.412e+02  2.001e+02  -3.704 0.000238 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 479 on 450 degrees of freedom
## Multiple R-squared:  0.8638, Adjusted R-squared:  0.8617 
## F-statistic: 407.9 on 7 and 450 DF,  p-value: < 2.2e-16
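To read the Model 1 output, it helps to know two things. First, the IsInternational factor enters as a single 0/1 column with Domestic as the baseline, which is why the coefficient is named `IsInternationalInternational`. Second, a fitted value is the intercept plus each estimate times its predictor. The sketch below reproduces the coefficient naming with model.matrix(). It then computes an approximate fitted premium price for row 1 of the data (PriceEconomy 2707, pitch difference 7, width difference 1, 162 seats, 12.25 h, international). Because the printed estimates are rounded to four significant figures, the result is approximate. The names `b`, `x` and `pred` are illustrative.

```r
# 1. Treatment coding: the first factor level (Domestic) is the baseline
IsInternational <- factor(c("Domestic", "International"))
mm <- model.matrix(~ IsInternational)
print(colnames(mm))   # "(Intercept)" "IsInternationalInternational"

# 2. Fitted PricePremium for row 1, using the rounded estimates printed above
b <- c(Intercept = -1213, PriceEconomy = 1.063, PitchDifference = 84.21,
       WidthDifference = 122.4, SeatsTotal = 1.920, FlightDuration = 84.59,
       PercentPremiumSeats = 31.90, IsInternationalInternational = -741.2)
x <- c(Intercept = 1, PriceEconomy = 2707, PitchDifference = 7,
       WidthDifference = 1, SeatsTotal = 162, FlightDuration = 12.25,
       PercentPremiumSeats = 100 * 40 / 162, IsInternationalInternational = 1)
pred <- sum(b * x[names(b)])
cat(sprintf("Approximate fitted PricePremium: %.1f (observed: 3725)\n", pred))

# With the CSV loaded, predict(fit, newdata = AirlinesDATA[1, ]) gives the unrounded value.
```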
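# Model 2: economy price as the response. This fitted version uses only five predictors
# (PercentPremiumSeats and IsInternational are left out)
Model2 <- PriceEconomy ~ PitchDifference + WidthDifference + SeatsTotal + FlightDuration + PricePremium
fit2 <- lm(Model2, data = AirlinesDATA)
summary(fit2)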
## 
## Call:
## lm(formula = Model2, data = AirlinesDATA)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2164.31  -187.76    -2.55   102.65  1030.42 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     441.87030  104.31163   4.236 2.76e-05 ***
## PitchDifference -26.24484   17.54055  -1.496   0.1353    
## WidthDifference -39.11664   26.33624  -1.485   0.1382    
## SeatsTotal       -0.49649    0.24004  -2.068   0.0392 *  
## FlightDuration  -10.27514    7.41826  -1.385   0.1667    
## PricePremium      0.71514    0.02026  35.290  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 418.9 on 452 degrees of freedom
## Multiple R-squared:  0.8223, Adjusted R-squared:  0.8203 
## F-statistic: 418.3 on 5 and 452 DF,  p-value: < 2.2e-16

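We set Model 2 aside. Three of its five predictors (PitchDifference, WidthDifference and FlightDuration) are not significant at the 5% level (p > 0.05). Its adjusted R-squared (0.8203) is also lower than Model 1's (0.8617), although R-squared values are not strictly comparable between models with different response variables. We therefore keep Model 1.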
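The adjusted R-squared and F-statistic in both summaries follow from the plain R-squared, the sample size (n = 458) and the number of predictor columns p (7 for Model 1 and 5 for Model 2; the residual degrees of freedom are n - p - 1 = 450 and 452). The sketch below recomputes them from the printed R-squared values. It shows that the adjustment penalises extra predictors only slightly at this sample size. Small differences from the printed figures come from rounding in the R-squared inputs.

```r
n <- 458  # observations in SixAirlinesDataV2.csv

# Adjusted R-squared: penalise R-squared for the number of predictors p
adj_r2 <- function(r2, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variance per predictor over residual variance per df
f_stat <- function(r2, p) (r2 / p) / ((1 - r2) / (n - p - 1))

models <- data.frame(model = c("Model", "Model2"),
                     r2    = c(0.8638, 0.8223),
                     p     = c(7, 5))
models$adj_r2 <- adj_r2(models$r2, models$p)
models$F      <- f_stat(models$r2, models$p)
print(models, digits = 4)
```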

Finding the Best Predictors

For Model 1

library(leaps)
leap1 <- regsubsets(Model, data = AirlinesDATA, nbest=1)
plot(leap1, scale="adjr2")

For Model 2

library(leaps)
leap2 <- regsubsets(Model2, data = AirlinesDATA, nbest=1)
plot(leap2, scale="adjr2")
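With nbest = 1, regsubsets() keeps the best model of each size, and plot(..., scale = "adjr2") ranks those models by adjusted R-squared. The subset in the top row has the highest adjusted R-squared. The base-R sketch below performs the same selection by brute force: it fits every subset with lm() and keeps the one with the highest adjusted R-squared. It uses simulated data, not the airline data. Here y depends only on x1 and x2, while x3 and x4 are pure noise. regsubsets() searches the same space much more efficiently, so this loop only illustrates the idea.

```r
# Simulated data (illustrative only): y depends on x1 and x2; x3 and x4 are noise
set.seed(1)
n_sim <- 200
sim <- data.frame(x1 = rnorm(n_sim), x2 = rnorm(n_sim),
                  x3 = rnorm(n_sim), x4 = rnorm(n_sim))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n_sim)

predictors <- c("x1", "x2", "x3", "x4")
best <- NULL

# Fit every non-empty subset of predictors and keep the highest adjusted R-squared
for (k in seq_along(predictors)) {
  for (vars in combn(predictors, k, simplify = FALSE)) {
    fit_k <- lm(reformulate(vars, response = "y"), data = sim)
    adj   <- summary(fit_k)$adj.r.squared
    if (is.null(best) || adj > best$adj) best <- list(vars = vars, adj = adj)
  }
}

print(best)
```

Adjusted R-squared penalises extra terms only mildly, so a noise variable can occasionally slip into the selected subset. The two real predictors, however, are always kept.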

CONCLUSION:

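From the OLS regression (Model 1), the premium-economy price moves closely with the economy price: an estimated increase of about 1.06 in premium price per unit increase in economy price, holding the other variables fixed. The other independent variables (pitch difference, width difference, total seats, flight duration, percentage of premium seats and international/domestic status) are also significant at the 5% level.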