R Markdown

SUMMARY STATISTICS

airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
summary(airline.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

The mean premium price varied widely across airlines, from about 483 for Jet to about 3,065 for AirFrance.

aggregate(airline.df$PricePremium~airline.df$Airline, FUN=mean)
##   airline.df$Airline airline.df$PricePremium
## 1          AirFrance               3065.2162
## 2            British               1937.0286
## 3              Delta                684.6739
## 4                Jet                483.3607
## 5          Singapore               1239.9250
## 6             Virgin               2721.6935

Jet also had the lowest mean economy price (about 276), while AirFrance had the highest (about 2,770).

aggregate(airline.df$PriceEconomy~airline.df$Airline, FUN=mean)
##   airline.df$Airline airline.df$PriceEconomy
## 1          AirFrance               2769.7838
## 2            British               1293.4800
## 3              Delta                560.9348
## 4                Jet                276.1639
## 5          Singapore                860.2500
## 6             Virgin               1603.5323

To check whether high premium prices reflected a larger markup over economy, the mean relative price was computed for each airline. Surprisingly, Jet, which had the lowest economy and premium prices, had the highest relative price (0.94). Delta had the lowest (0.125).

aggregate(airline.df$PriceRelative~airline.df$Airline, FUN=mean)
##   airline.df$Airline airline.df$PriceRelative
## 1          AirFrance                0.2047297
## 2            British                0.4375429
## 3              Delta                0.1250000
## 4                Jet                0.9396721
## 5          Singapore                0.5297500
## 6             Virgin                0.7606452
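
The three per-airline tables could also be produced in one call. The following is a minimal sketch on a made-up toy data frame (the `toy` values and airline labels are hypothetical, not taken from the dataset); it assumes the relative price is the premium markup over economy.

```r
# Hypothetical toy data standing in for airline.df (values are made up).
toy <- data.frame(
  Airline      = c("A", "A", "B", "B"),
  PriceEconomy = c(100, 200, 400, 600),
  PricePremium = c(150, 250, 500, 900)
)

# Assumed definition of the relative price: premium markup over economy.
toy$PriceRelative <- (toy$PricePremium - toy$PriceEconomy) / toy$PriceEconomy

# One call returns the per-airline mean of all three price columns.
price.means <- aggregate(cbind(PriceEconomy, PricePremium, PriceRelative) ~ Airline,
                         data = toy, FUN = mean)
print(price.means)
```

Using the `data =` argument also gives shorter column names in the output than the `airline.df$...` form.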

To check the effect of the percentage of premium seats on relative price, the mean relative price was calculated for each percentage value. There was no uniform trend, but relative prices seemed generally lower at higher percentages. One possible explanation is that premium passengers enjoy less exclusivity when the premium share is high. Another runs the other way: a low relative price may encourage more flyers to choose premium, making a higher percentage of premium seats viable.

aggregate(airline.df$PriceRelative~airline.df$PercentPremiumSeats, FUN=mean)
##    airline.df$PercentPremiumSeats airline.df$PriceRelative
## 1                            4.71               0.95263158
## 2                            8.90               0.20928571
## 3                            9.76               0.80000000
## 4                           10.00               1.32000000
## 5                           10.57               0.60416667
## 6                           11.43               1.15370370
## 7                           12.12               0.03000000
## 8                           12.28               0.08590909
## 9                           12.50               0.23125000
## 10                          12.82               0.07200000
## 11                          12.90               0.50414634
## 12                          13.04               0.06200000
## 13                          13.13               0.10750000
## 14                          13.21               0.34958333
## 15                          14.02               0.61111111
## 16                          14.50               0.09500000
## 17                          14.97               0.74000000
## 18                          14.99               0.31000000
## 19                          15.02               1.03523810
## 20                          15.36               0.32211538
## 21                          16.46               0.09000000
## 22                          16.87               0.39625000
## 23                          18.73               0.73500000
## 24                          20.41               0.06125000
## 25                          20.60               0.44666667
## 26                          23.49               0.41600000
## 27                          24.69               0.42254902
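
With 27 distinct percentages, a trend is hard to read from the table. One option is to group the percentages into bands before averaging. This is only a sketch: the `toy` data and the band edges (12 and 16) are arbitrary illustrative assumptions, not values from the analysis.

```r
# Hypothetical toy data; the real analysis would use airline.df.
toy <- data.frame(
  PercentPremiumSeats = c(5, 9, 11, 13, 14, 16, 20, 24),
  PriceRelative       = c(0.9, 0.8, 1.1, 0.3, 0.4, 0.2, 0.3, 0.4)
)

# Band edges are an arbitrary illustrative choice.
# cut() uses right-closed intervals by default: (0,12], (12,16], (16,25].
toy$PremiumShareBand <- cut(toy$PercentPremiumSeats,
                            breaks = c(0, 12, 16, 25),
                            labels = c("low", "mid", "high"))

# Mean relative price within each band.
band.means <- aggregate(PriceRelative ~ PremiumShareBand, data = toy, FUN = mean)
print(band.means)
```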

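The idea that more perks go with a higher relative price was broadly supported, although the trend was not strictly monotonic: a width difference of 2 had a lower mean relative price than a width difference of 1. At a width difference of 4, the premium price was almost double the economy price on average.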

aggregate(airline.df$PriceRelative~airline.df$WidthDifference, FUN=mean)
##   airline.df$WidthDifference airline.df$PriceRelative
## 1                          0                0.0847500
## 2                          1                0.4184091
## 3                          2                0.2296875
## 4                          3                0.7282353
## 5                          4                0.9707407
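
The "almost double" reading can be checked with a little arithmetic. The sketch below assumes that PriceRelative is defined as (PricePremium - PriceEconomy) / PriceEconomy, so that the premium price is the economy price times 1 + PriceRelative; the means are copied from the table above.

```r
# Mean relative price by width difference, copied from the table above.
rel.by.width <- c("0" = 0.0847500, "1" = 0.4184091, "2" = 0.2296875,
                  "3" = 0.7282353, "4" = 0.9707407)

# Assumption: PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy,
# so premium price = economy price * (1 + PriceRelative).
premium.multiplier <- 1 + rel.by.width
print(round(premium.multiplier, 2))
```

Under that assumption, a mean relative price of 0.97 corresponds to a premium fare about 1.97 times the economy fare.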

Relative prices were higher for international flights, and highest for Jet. Although AirFrance had one of the lowest mean relative prices (only Delta's was lower), it had a significant number of very high outliers, showing that its relative prices varied over a wide range.

library(lattice)
## Warning: package 'lattice' was built under R version 3.4.3
bwplot(PriceRelative~Airline|IsInternational, data=airline.df)

Relative prices were higher for Boeing than for Airbus, and higher for international than for domestic flights. The range was also considerably wider for international flights.

bwplot(PriceRelative~Aircraft|IsInternational, data=airline.df)

In contrast, premium prices were higher for Airbus.

bwplot(PricePremium~Aircraft|IsInternational, data=airline.df)

Relative price differed little across the four months.

bwplot(PriceRelative~TravelMonth, data=airline.df)

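A Welch two-sample t-test was run on relative price and width difference and gave a very low p-value (< 2.2e-16). As called here, however, the test compares the mean of PriceRelative (0.49) with the mean of WidthDifference (1.63), two variables measured on different scales. The low p-value shows only that these two means differ. It does not by itself test the null hypothesis that relative pricing is the same across width differences.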

t.test(airline.df$PriceRelative,airline.df$WidthDifference)
## 
##  Welch Two Sample t-test
## 
## data:  airline.df$PriceRelative and airline.df$WidthDifference
## t = -19.284, df = 585.55, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.262697 -1.029268
## sample estimates:
## mean of x mean of y 
## 0.4872052 1.6331878
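
One way to test whether relative price differs between width-difference groups is to treat WidthDifference as a grouping factor instead of comparing the two columns' means. The following is a minimal sketch using `oneway.test()` (Welch's one-way test from base R's stats package) on hypothetical toy data; it is not a result from the airline dataset.

```r
# Hypothetical toy data: three width-difference groups, four flights each.
toy <- data.frame(
  WidthDifference = rep(c(0, 1, 3), each = 4),
  PriceRelative   = c(0.05, 0.10, 0.08, 0.12,
                      0.30, 0.45, 0.40, 0.50,
                      0.70, 0.85, 0.65, 0.80)
)

# Null hypothesis: mean PriceRelative is equal in every WidthDifference group.
# oneway.test() does not assume equal variances by default.
width.test <- oneway.test(PriceRelative ~ factor(WidthDifference), data = toy)
print(width.test)
```

On the real data the call would be the same with `data = airline.df`.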

At low pitch differences, relative prices were generally lower. This is consistent with the hypothesis that premium seats with more perks are priced higher relative to economy tickets.

pairs(formula=~PriceRelative+PitchDifference, data=airline.df)

library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(airline.df,lower.panel=panel.shade, upper.panel=NULL)

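A Welch two-sample t-test on relative price and the percentage of premium seats also gave a very low p-value. Here too the test compares the mean of PriceRelative (0.49) with the mean of PercentPremiumSeats (14.65), so the rejected null hypothesis is that these two means are equal, not that relative price is unrelated to the percentage of premium seats.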

t.test(airline.df$PriceRelative,airline.df$PercentPremiumSeats)
## 
##  Welch Two Sample t-test
## 
## data:  airline.df$PriceRelative and airline.df$PercentPremiumSeats
## t = -62.302, df = 464.91, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.60477 -13.71164
## sample estimates:
##  mean of x  mean of y 
##  0.4872052 14.6454148
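
To test for an association between two numeric variables, a correlation test is one commonly used alternative. This sketch applies `cor.test()` (Pearson correlation by default) to hypothetical toy data; the values are made up for illustration.

```r
# Hypothetical toy data; the real analysis would use airline.df.
toy <- data.frame(
  PercentPremiumSeats = c(5, 9, 11, 13, 14, 16, 20, 24),
  PriceRelative       = c(0.9, 0.8, 1.1, 0.3, 0.4, 0.2, 0.3, 0.4)
)

# Null hypothesis: the true correlation between the two variables is zero.
seat.cor <- cor.test(toy$PriceRelative, toy$PercentPremiumSeats)
print(seat.cor$estimate)  # sample correlation coefficient
print(seat.cor$p.value)   # p-value for the zero-correlation null
```

A negative estimate would match the pattern described earlier, where relative prices tended to be lower at higher premium-seat percentages.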

Using the probable factors, a regression model for relative price was fitted. It proved weak, explaining only 35% of the variance (multiple R-squared = 0.3505).

fit1<-lm(PriceRelative~WidthDifference+PitchDifference+IsInternational+Aircraft+Airline+PercentPremiumSeats, data=airline.df)
summary(fit1)
## 
## Call:
## lm(formula = PriceRelative ~ WidthDifference + PitchDifference + 
##     IsInternational + Aircraft + Airline + PercentPremiumSeats, 
##     data = airline.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92920 -0.21430 -0.06758  0.11411  1.41175 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -0.189710   0.285841  -0.664 0.507230    
## WidthDifference              -0.031802   0.082394  -0.386 0.699699    
## PitchDifference               0.052158   0.063768   0.818 0.413829    
## IsInternationalInternational  0.248939   0.243519   1.022 0.307211    
## AircraftBoeing                0.109465   0.044146   2.480 0.013520 *  
## AirlineBritish                0.238393   0.111512   2.138 0.033073 *  
## AirlineDelta                  0.279896   0.185227   1.511 0.131469    
## AirlineJet                    0.558526   0.141688   3.942 9.38e-05 ***
## AirlineSingapore              0.305527   0.081038   3.770 0.000185 ***
## AirlineVirgin                 0.622672   0.110779   5.621 3.35e-08 ***
## PercentPremiumSeats          -0.015370   0.004589  -3.349 0.000879 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3672 on 447 degrees of freedom
## Multiple R-squared:  0.3505, Adjusted R-squared:  0.336 
## F-statistic: 24.12 on 10 and 447 DF,  p-value: < 2.2e-16
summary(airline.df$PriceRelative)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0200  0.1000  0.3650  0.4872  0.7400  1.8900

A regression model was then fitted for the premium price, using the economy price and the other flight factors. It explained 88% of the variance (multiple R-squared = 0.8769), which makes it a good fit.

fit1<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference+IsInternational+Aircraft+Airline+PercentPremiumSeats, data=airline.df)
summary(fit1)
## 
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference + 
##     PitchDifference + IsInternational + Aircraft + Airline + 
##     PercentPremiumSeats, data = airline.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -764.25 -273.97  -48.45  113.47 2992.49 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -907.41964  356.28049  -2.547 0.011203 *  
## PriceEconomy                    1.28330    0.03705  34.635  < 2e-16 ***
## WidthDifference              -111.26957  102.88310  -1.082 0.280053    
## PitchDifference                32.44698   79.48605   0.408 0.683316    
## IsInternationalInternational  545.24275  320.37657   1.702 0.089475 .  
## AircraftBoeing                106.07548   57.76813   1.836 0.066989 .  
## AirlineBritish                778.75194  151.98215   5.124 4.46e-07 ***
## AirlineDelta                  939.92534  231.05147   4.068 5.61e-05 ***
## AirlineJet                    684.79734  192.31458   3.561 0.000409 ***
## AirlineSingapore              572.45834  125.92648   4.546 7.05e-06 ***
## AirlineVirgin                1377.93939  141.56578   9.734  < 2e-16 ***
## PercentPremiumSeats           -18.71280    5.72975  -3.266 0.001175 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 457.5 on 446 degrees of freedom
## Multiple R-squared:  0.8769, Adjusted R-squared:  0.8739 
## F-statistic: 288.8 on 11 and 446 DF,  p-value: < 2.2e-16
summary(airline.df$PricePremium)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    86.0   528.8  1737.0  1845.3  2989.0  7414.0

With fewer predictors (economy price, width difference and pitch difference only), the regression model explained 82% of the variance in premium prices (multiple R-squared = 0.8179).

fit2<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference, data=airline.df)
summary(fit2)
## 
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference + 
##     PitchDifference, data = airline.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -809.9 -325.8  -97.1  176.3 3470.6 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -33.37619  125.69676  -0.266   0.7907    
## PriceEconomy      1.18456    0.02623  45.152   <2e-16 ***
## WidthDifference  26.25535   33.43032   0.785   0.4326    
## PitchDifference  39.43892   22.59939   1.745   0.0816 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 551.5 on 454 degrees of freedom
## Multiple R-squared:  0.8179, Adjusted R-squared:  0.8167 
## F-statistic: 679.9 on 3 and 454 DF,  p-value: < 2.2e-16
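
Because the smaller model is nested in the larger one, the two can be compared formally with a partial F-test via `anova()`, alongside their adjusted R-squared values. The sketch below uses simulated data with a made-up airline effect; the names `toy`, `small` and `full` are hypothetical and the numbers are not from the airline dataset.

```r
set.seed(1)
n <- 60

# Simulated stand-in for airline.df (all values are made up).
toy <- data.frame(
  PriceEconomy    = runif(n, 100, 3000),
  WidthDifference = sample(0:4, n, replace = TRUE),
  Airline         = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# Simulated premium price with an invented airline-specific shift.
airline.shift <- unname(c(A = 0, B = 500, C = 1000)[as.character(toy$Airline)])
toy$PricePremium <- 1.2 * toy$PriceEconomy + airline.shift + rnorm(n, sd = 200)

small <- lm(PricePremium ~ PriceEconomy + WidthDifference, data = toy)
full  <- lm(PricePremium ~ PriceEconomy + WidthDifference + Airline, data = toy)

# Partial F-test: do the airline terms improve on the smaller model?
comparison <- anova(small, full)
print(comparison)

# Adjusted R-squared penalises the extra parameters of the larger model.
print(c(small = summary(small)$adj.r.squared,
        full  = summary(full)$adj.r.squared))
```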
