SUMMARY STATISTICS
# stringsAsFactors = TRUE reproduces the factor counts below under R >= 4.0
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
summary(airline.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75
##  Delta    : 46                Median : 7.790   Oct:127
##  Jet      : 61                Mean   : 7.578   Sep:129
##  Singapore: 40                3rd Qu.:10.620
##  Virgin   : 62                Max.   :14.660
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00
##                       Median :185.0   Median :36.00   Median :31.00
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413
##  Median :38.00   Median :18.00   Median :19.00   Median :1242
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71
##  1st Qu.:1.000   1st Qu.:12.28
##  Median :1.000   Median :13.21
##  Mean   :1.633   Mean   :14.65
##  3rd Qu.:3.000   3rd Qu.:15.36
##  Max.   :4.000   Max.   :24.69
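The derived columns (SeatsTotal, PercentPremiumSeats, PitchDifference, WidthDifference, PriceRelative) are not defined anywhere in this report. A minimal sketch, assuming they were computed as below: the column means in the summary support the three simple ones (for example 37.91 - 31.22 = 6.69 for PitchDifference, 19.47 - 17.84 = 1.63 for WidthDifference, and 202.3 + 33.65 = 236 for SeatsTotal), while the formulas for PercentPremiumSeats and PriceRelative are assumptions. The helper name add_derived_columns and the toy row are hypothetical.

```r
# Hypothetical helper: rebuilds the derived columns from the raw ones,
# ASSUMING the formulas below (the CSV does not document them)
add_derived_columns <- function(df) {
  df$SeatsTotal          <- df$SeatsEconomy + df$SeatsPremium
  df$PercentPremiumSeats <- 100 * df$SeatsPremium / df$SeatsTotal
  df$PitchDifference     <- df$PitchPremium - df$PitchEconomy
  df$WidthDifference     <- df$WidthPremium - df$WidthEconomy
  # Assumed premium markup over economy: 0.5 means premium costs 50% more
  df$PriceRelative       <- (df$PricePremium - df$PriceEconomy) / df$PriceEconomy
  df
}

# One made-up flight, for illustration only
toy <- data.frame(SeatsEconomy = 200, SeatsPremium = 40,
                  PitchEconomy = 31,  PitchPremium = 38,
                  WidthEconomy = 18,  WidthPremium = 19,
                  PriceEconomy = 1000, PricePremium = 1500)
out <- add_derived_columns(toy)
out[, c("SeatsTotal", "PercentPremiumSeats", "PitchDifference",
        "WidthDifference", "PriceRelative")]
```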
Mean premium prices varied widely across airlines, from 483 for Jet to 3065 for AirFrance.
aggregate(airline.df$PricePremium~airline.df$Airline, FUN=mean)
## airline.df$Airline airline.df$PricePremium
## 1 AirFrance 3065.2162
## 2 British 1937.0286
## 3 Delta 684.6739
## 4 Jet 483.3607
## 5 Singapore 1239.9250
## 6 Virgin 2721.6935
Mean economy prices followed the same pattern: Jet had the lowest (276) and AirFrance the highest (2770).
aggregate(airline.df$PriceEconomy~airline.df$Airline, FUN=mean)
## airline.df$Airline airline.df$PriceEconomy
## 1 AirFrance 2769.7838
## 2 British 1293.4800
## 3 Delta 560.9348
## 4 Jet 276.1639
## 5 Singapore 860.2500
## 6 Virgin 1603.5323
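Both mean-price tables above can be produced by a single call. The sketch below, on a made-up three-row data frame, uses the data= form of aggregate() with cbind() on the left-hand side of the formula. Applied to this report it would be aggregate(cbind(PriceEconomy, PricePremium) ~ Airline, data = airline.df, FUN = mean), which should give the same means with shorter column names.

```r
# Made-up fares for three flights, for illustration only
toy <- data.frame(Airline      = c("Jet", "Jet", "Virgin"),
                  PriceEconomy = c(200, 300, 1600),
                  PricePremium = c(400, 500, 2700))

# One call averages both price columns per airline; the data= form keeps
# the output column names short (Airline, PriceEconomy, PricePremium)
means <- aggregate(cbind(PriceEconomy, PricePremium) ~ Airline,
                   data = toy, FUN = mean)
means
```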
To check whether high premium prices simply reflect a larger premium-over-economy markup, the mean relative price was computed for each airline. Surprisingly, Jet, which had the lowest economy and premium prices, had the highest relative price (0.94), while Delta had the lowest (0.125).
aggregate(airline.df$PriceRelative~airline.df$Airline, FUN=mean)
## airline.df$Airline airline.df$PriceRelative
## 1 AirFrance 0.2047297
## 2 British 0.4375429
## 3 Delta 0.1250000
## 4 Jet 0.9396721
## 5 Singapore 0.5297500
## 6 Virgin 0.7606452
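The per-airline mean of PriceRelative is an average of per-flight ratios, which is not the same as the markup implied by the average fares. The sketch below compares the two using the means printed in this section, assuming PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy for each flight (the data set does not state this). Jet comes out highest on both measures, but on average fares AirFrance rather than Delta has the lowest markup (about 0.11 versus 0.22), so the bottom of the ranking depends on which measure is used.

```r
# Mean fares and mean relative price per airline, copied from the tables above
airline       <- c("AirFrance", "British", "Delta", "Jet", "Singapore", "Virgin")
mean_economy  <- c(2769.7838, 1293.4800, 560.9348, 276.1639, 860.2500, 1603.5323)
mean_premium  <- c(3065.2162, 1937.0286, 684.6739, 483.3607, 1239.9250, 2721.6935)
mean_relative <- c(0.2047297, 0.4375429, 0.1250000, 0.9396721, 0.5297500, 0.7606452)

# Markup implied by the average fares (ratio of means), ASSUMING
# PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy per flight
markup_of_means <- mean_premium / mean_economy - 1

comparison <- data.frame(airline,
                         mean_relative   = round(mean_relative, 3),
                         markup_of_means = round(markup_of_means, 3))
comparison
```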
To check the effect of the percentage of premium seats on relative prices, the mean relative price was computed for each value of PercentPremiumSeats. There is no uniform trend, but relative prices seem generally lower at higher percentages. One explanation is that premium seats feel less exclusive when there are more of them; the causation could also run the other way, with low relative prices drawing more flyers to premium and so making a larger share of premium seats viable.
aggregate(airline.df$PriceRelative~airline.df$PercentPremiumSeats, FUN=mean)
## airline.df$PercentPremiumSeats airline.df$PriceRelative
## 1 4.71 0.95263158
## 2 8.90 0.20928571
## 3 9.76 0.80000000
## 4 10.00 1.32000000
## 5 10.57 0.60416667
## 6 11.43 1.15370370
## 7 12.12 0.03000000
## 8 12.28 0.08590909
## 9 12.50 0.23125000
## 10 12.82 0.07200000
## 11 12.90 0.50414634
## 12 13.04 0.06200000
## 13 13.13 0.10750000
## 14 13.21 0.34958333
## 15 14.02 0.61111111
## 16 14.50 0.09500000
## 17 14.97 0.74000000
## 18 14.99 0.31000000
## 19 15.02 1.03523810
## 20 15.36 0.32211538
## 21 16.46 0.09000000
## 22 16.87 0.39625000
## 23 18.73 0.73500000
## 24 20.41 0.06125000
## 25 20.60 0.44666667
## 26 23.49 0.41600000
## 27 24.69 0.42254902
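A single rank correlation summarises the table above more compactly than 27 group means. The rough sketch below applies Spearman's correlation to the group means themselves; it ignores how many flights fall in each group, so it is only indicative. It gives a weak negative value (about -0.13), in line with the "no uniform trend" reading. On the full data, cor.test(airline.df$PercentPremiumSeats, airline.df$PriceRelative, method = "spearman") would be the more appropriate check; its result is not reported here.

```r
# The 27 (PercentPremiumSeats, mean PriceRelative) pairs from the table above
pct <- c(4.71, 8.90, 9.76, 10.00, 10.57, 11.43, 12.12, 12.28, 12.50, 12.82,
         12.90, 13.04, 13.13, 13.21, 14.02, 14.50, 14.97, 14.99, 15.02, 15.36,
         16.46, 16.87, 18.73, 20.41, 20.60, 23.49, 24.69)
rel <- c(0.95263158, 0.20928571, 0.80000000, 1.32000000, 0.60416667,
         1.15370370, 0.03000000, 0.08590909, 0.23125000, 0.07200000,
         0.50414634, 0.06200000, 0.10750000, 0.34958333, 0.61111111,
         0.09500000, 0.74000000, 0.31000000, 1.03523810, 0.32211538,
         0.09000000, 0.39625000, 0.73500000, 0.06125000, 0.44666667,
         0.41600000, 0.42254902)

# Rank correlation of the group means: unweighted, so a group of one flight
# counts as much as a group of fifty
rho <- cor(pct, rel, method = "spearman")
round(rho, 3)
```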
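The means broadly support the idea that more generous perks command a higher relative price, although the trend is not strictly monotonic: a width difference of 2 averaged 0.23, below the 0.42 for a difference of 1. At a width difference of 4 the mean relative price was 0.97, meaning premium fares were almost double economy fares.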
aggregate(airline.df$PriceRelative~airline.df$WidthDifference, FUN=mean)
## airline.df$WidthDifference airline.df$PriceRelative
## 1 0 0.0847500
## 2 1 0.4184091
## 3 2 0.2296875
## 4 3 0.7282353
## 5 4 0.9707407
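The "almost double" reading can be checked directly from the table, under the assumption that PriceRelative is (PricePremium - PriceEconomy) / PriceEconomy, so that a premium fare is 1 + PriceRelative times the economy fare. The sketch also checks whether the means rise steadily with width difference.

```r
# Mean PriceRelative by WidthDifference, copied from the table above
width    <- 0:4
mean_rel <- c(0.0847500, 0.4184091, 0.2296875, 0.7282353, 0.9707407)

# ASSUMING PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy,
# a premium fare is (1 + PriceRelative) times the economy fare
multiplier <- 1 + mean_rel
round(multiplier, 2)   # 1.08 1.42 1.23 1.73 1.97

# Do the means increase at every step of width difference?
all(diff(mean_rel) > 0)   # FALSE: the width-2 group averages below width 1
```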
Relative prices were higher for international flights, and Jet had the highest relative prices. AirFrance had one of the lowest mean relative prices (0.20, above only Delta at 0.125), yet it had a substantial number of very high outliers, so its relative prices varied over a wide range.
library(lattice)
bwplot(PriceRelative~Airline|IsInternational, data=airline.df)
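To put numbers on the outliers visible in the box plots, the points beyond the whiskers can be counted per group with boxplot.stats(), which flags values more than 1.5 times the interquartile range beyond the hinges. As far as we know, lattice's panel.bwplot uses boxplot.stats with coef = 1.5 by default, so the counts should match the plotted points. The sketch uses made-up values for two hypothetical airlines A and B; on the real data the same tapply() call could be run on airline.df$PriceRelative grouped by interaction(airline.df$Airline, airline.df$IsInternational).

```r
# Made-up relative prices for two hypothetical airlines, for illustration only
toy <- data.frame(Airline = rep(c("A", "B"), each = 6),
                  PriceRelative = c(0.1, 0.2, 0.3, 0.4, 0.5, 1.9,
                                    0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

# Count, per group, the points a box plot would draw as outliers
# (beyond 1.5 x IQR from the hinges)
n_outliers <- tapply(toy$PriceRelative, toy$Airline,
                     function(v) length(boxplot.stats(v)$out))
n_outliers   # A: 1, B: 0
```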
Relative prices were higher for Boeing than for Airbus aircraft and higher for international than for domestic flights. The spread was also noticeably wider for international flights.
bwplot(PriceRelative~Aircraft|IsInternational, data=airline.df)
By contrast, absolute premium prices were higher for Airbus aircraft.
bwplot(PricePremium~Aircraft|IsInternational, data=airline.df)
Relative prices differed little across the four travel months.
bwplot(PriceRelative~TravelMonth, data=airline.df)
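A Welch two-sample t-test was also run on relative price and width difference. Note what this test actually compares: t.test(x, y) tests whether the mean of PriceRelative (0.49) equals the mean of WidthDifference (1.63). These are different variables on different scales, so the very small p-value (< 2.2e-16) only confirms that the two means differ; it does not show that relative pricing varies with width difference. That question calls for a test of association, such as a correlation test or a one-way ANOVA of PriceRelative across WidthDifference levels.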
t.test(airline.df$PriceRelative,airline.df$WidthDifference)
##
## Welch Two Sample t-test
##
## data: airline.df$PriceRelative and airline.df$WidthDifference
## t = -19.284, df = 585.55, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.262697 -1.029268
## sample estimates:
## mean of x mean of y
## 0.4872052 1.6331878
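A two-sample t-test on two different variables cannot detect whether one depends on the other. The toy example below builds a variable y that, by construction, has the same mean (0.5) at every level of x, so there is no association at all. The two-sample t-test still returns a tiny p-value because it compares mean(y) with mean(x), whereas a correlation test and a one-way ANOVA, which do test association, return p-values close to 1. The data are made up for illustration; on the airline data the corresponding calls would be cor.test(airline.df$WidthDifference, airline.df$PriceRelative) and summary(aov(PriceRelative ~ factor(WidthDifference), data = airline.df)), whose results are not reported here.

```r
# Toy data built so that y has NO association with x: within every level of
# x (a stand-in for WidthDifference) the mean of y is exactly 0.5
x <- rep(0:4, each = 10)            # 0,0,...,4,4 (10 of each)
y <- rep(c(0.2, 0.8), times = 25)   # alternates within each block of 10

# Two-sample t-test: compares mean(y) = 0.5 with mean(x) = 2, so it "rejects"
t_p <- t.test(y, x)$p.value

# Tests of association: correlation and one-way ANOVA both find nothing
cor_p   <- cor.test(x, y)$p.value
anova_p <- summary(aov(y ~ factor(x)))[[1]][["Pr(>F)"]][1]

c(t_test = t_p, cor_test = cor_p, anova = anova_p)
```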
The scatter plot suggests that relative prices were generally lower at small pitch differences. This is consistent with the hypothesis that premium seats with more perks are priced higher relative to economy tickets.
pairs(formula=~PriceRelative+PitchDifference, data=airline.df)
library(corrgram)
corrgram(airline.df,lower.panel=panel.shade, upper.panel=NULL)
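The corrgram shows correlations only as shading. A small sketch for reading the exact coefficients, on a made-up data frame: cor() needs numeric input, so factor columns such as Airline are dropped first. On this report's data the equivalent would be round(cor(airline.df[, sapply(airline.df, is.numeric)]), 2).

```r
# Made-up data frame mixing a factor and numeric columns, for illustration only
toy <- data.frame(Airline         = factor(c("Jet", "Virgin", "Delta", "Jet")),
                  PitchDifference = c(6, 7, 10, 2),
                  WidthDifference = c(1, 3, 4, 0),
                  PriceRelative   = c(0.3, 0.7, 0.9, 0.1))

# cor() only accepts numeric input, so keep the numeric columns first
num_cols <- toy[, sapply(toy, is.numeric)]
m <- cor(num_cols)
round(m, 2)
```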
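A Welch t-test between relative price and the percentage of premium seats also returned a very low p-value. As with any two-sample t-test on two different variables, however, it only compares the mean of PriceRelative (0.49) with the mean of PercentPremiumSeats (14.65), which are on different scales; it does not test whether relative prices depend on the share of premium seats.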
t.test(airline.df$PriceRelative,airline.df$PercentPremiumSeats)
##
## Welch Two Sample t-test
##
## data: airline.df$PriceRelative and airline.df$PercentPremiumSeats
## t = -62.302, df = 464.91, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -14.60477 -13.71164
## sample estimates:
## mean of x mean of y
## 0.4872052 14.6454148
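A linear regression of relative price on the candidate factors proved to be a weak model: it explains only about 35% of the variation (multiple R-squared 0.35, adjusted 0.34). Aircraft type, the airline indicators other than Delta, and the percentage of premium seats are significant at the 5% level, while width difference, pitch difference and international status are not.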
fit1<-lm(PriceRelative~WidthDifference+PitchDifference+IsInternational+Aircraft+Airline+PercentPremiumSeats, data=airline.df)
summary(fit1)
##
## Call:
## lm(formula = PriceRelative ~ WidthDifference + PitchDifference +
## IsInternational + Aircraft + Airline + PercentPremiumSeats,
## data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.92920 -0.21430 -0.06758 0.11411 1.41175
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.189710 0.285841 -0.664 0.507230
## WidthDifference -0.031802 0.082394 -0.386 0.699699
## PitchDifference 0.052158 0.063768 0.818 0.413829
## IsInternationalInternational 0.248939 0.243519 1.022 0.307211
## AircraftBoeing 0.109465 0.044146 2.480 0.013520 *
## AirlineBritish 0.238393 0.111512 2.138 0.033073 *
## AirlineDelta 0.279896 0.185227 1.511 0.131469
## AirlineJet 0.558526 0.141688 3.942 9.38e-05 ***
## AirlineSingapore 0.305527 0.081038 3.770 0.000185 ***
## AirlineVirgin 0.622672 0.110779 5.621 3.35e-08 ***
## PercentPremiumSeats -0.015370 0.004589 -3.349 0.000879 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3672 on 447 degrees of freedom
## Multiple R-squared: 0.3505, Adjusted R-squared: 0.336
## F-statistic: 24.12 on 10 and 447 DF, p-value: < 2.2e-16
summary(airline.df$PriceRelative)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0200 0.1000 0.3650 0.4872 0.7400 1.8900
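A second regression model predicts the premium price from the economy price and the same flight factors. It explains about 88% of the variation (R-squared 0.877, adjusted 0.874), making it a good fit, although much of that comes from the economy price itself (t = 34.6).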
fit1<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference+IsInternational+Aircraft+Airline+PercentPremiumSeats, data=airline.df)
summary(fit1)
##
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference +
## PitchDifference + IsInternational + Aircraft + Airline +
## PercentPremiumSeats, data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -764.25 -273.97 -48.45 113.47 2992.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -907.41964 356.28049 -2.547 0.011203 *
## PriceEconomy 1.28330 0.03705 34.635 < 2e-16 ***
## WidthDifference -111.26957 102.88310 -1.082 0.280053
## PitchDifference 32.44698 79.48605 0.408 0.683316
## IsInternationalInternational 545.24275 320.37657 1.702 0.089475 .
## AircraftBoeing 106.07548 57.76813 1.836 0.066989 .
## AirlineBritish 778.75194 151.98215 5.124 4.46e-07 ***
## AirlineDelta 939.92534 231.05147 4.068 5.61e-05 ***
## AirlineJet 684.79734 192.31458 3.561 0.000409 ***
## AirlineSingapore 572.45834 125.92648 4.546 7.05e-06 ***
## AirlineVirgin 1377.93939 141.56578 9.734 < 2e-16 ***
## PercentPremiumSeats -18.71280 5.72975 -3.266 0.001175 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 457.5 on 446 degrees of freedom
## Multiple R-squared: 0.8769, Adjusted R-squared: 0.8739
## F-statistic: 288.8 on 11 and 446 DF, p-value: < 2.2e-16
summary(airline.df$PricePremium)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 86.0 528.8 1737.0 1845.3 2989.0 7414.0
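With fewer predictors (economy price, width difference and pitch difference only), the model explains about 82% of the variation in premium prices (R-squared 0.818), compared with 88% for the fuller model.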
fit2<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference, data=airline.df)
summary(fit2)
##
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference +
## PitchDifference, data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -809.9 -325.8 -97.1 176.3 3470.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.37619 125.69676 -0.266 0.7907
## PriceEconomy 1.18456 0.02623 45.152 <2e-16 ***
## WidthDifference 26.25535 33.43032 0.785 0.4326
## PitchDifference 39.43892 22.59939 1.745 0.0816 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 551.5 on 454 degrees of freedom
## Multiple R-squared: 0.8179, Adjusted R-squared: 0.8167
## F-statistic: 679.9 on 3 and 454 DF, p-value: < 2.2e-16
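Comparing R-squared values alone automatically favours the larger model, since adding predictors never lowers R-squared for nested least-squares fits. A minimal sketch of a more formal comparison, using R's built-in mtcars data as a stand-in: anova() on two nested lm fits performs a partial F-test. In this report, fit1 was reassigned to the full premium-price model, so anova(fit2, fit1) would test whether IsInternational, Aircraft, Airline and PercentPremiumSeats add explanatory power beyond economy price, width difference and pitch difference; that result is not reported here.

```r
# Built-in mtcars data stands in for the airline data here
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# R-squared as quoted in the text; adjusted R-squared penalises extra terms
c(small = summary(small)$r.squared,     large = summary(large)$r.squared)
c(small = summary(small)$adj.r.squared, large = summary(large)$adj.r.squared)

# Partial F-test: do the extra terms reduce the residual sum of squares by
# more than chance would? The smaller model must be nested in the larger one.
model_cmp <- anova(small, large)
model_cmp
p_extra <- model_cmp[["Pr(>F)"]][2]
p_extra
```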