Airline Pricing Analysis: Introduction

We often come across attractive offers associated with the different classes of tickets offered by various airlines. One of the most desirable in terms of luxury as well as affordability is the “PREMIUM ECONOMY CLASS”. This travel class is positioned between economy class and business class in terms of price, comfort, and amenities.

Various facilites provided include a free upgrade to premium members of frequent-flyer program and passengers flying full-fare economy; a separate section of the economy/coach cabin with more legroom (36-38" (91-96 cm) seat pitch), along with some form of leg rest; improved in-flight entertainment features; dedicated cabin crew; smaller cabin size; better seats (often fewer seats per row, to increase shoulder/elbow room); at-seat laptop power; at-seat telephone; lounge access (for some airlines); priority boarding; increased frequent flyer points; exclusive amenity kits; welcome drinks (juice or champagne); upgraded meals and drinks; increased luggage allowance; seats in front of the motors.

The data set provided to analyze includes information about the flight like

  1. Airline Name

  2. Aircraft Manufacturer

  3. Flight Duration in hours

  4. Month of Travelling

  5. International or Domestic Flight

  6. Number of Economy Seats

  7. Number of Premium Economy Seats

  8. Distance between two consecutive Economy Seats

  9. Distance between two consecutive Premium Economy Seats

  10. Width between armrests of an Economy Seat

  11. Width between armrests of a Premium Economy Seat

  12. Price of Economy Seat

  13. Price of Premium Economy Seat

Some mathematical operations done on the above data is stored in the data set as well.

PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy

SeatsTotal = SeatsEconomy + SeatsPremium

PercentPremiumSeats = (SeatsPremium / SeatsTotal) * 100

PitchDifference = PitchPremium - PitchEconomy

WidthDifference = WidthPremium - WidthEconomy

Analysis

  1. Reading Data from a CSV format file into R
setwd("C:/Users/Dell/Desktop/Project/Week 3/Day 5")
airlines=read.csv("SixAirlinesDataV2.csv")
View(airlines)
  1. Checking the data types of all the variables in the data set
str(airlines)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...
  1. Statistical summary of all the variables in the data set
summary(airlines)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
library(psych)
describe(airlines)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23
  1. Visualizing the data given about “Flight Information”
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(gtable)
library(grid)
f1=ggplot(airlines,aes(Airline))+stat_count(color="gray")
f2=ggplot(airlines,aes(Aircraft))+stat_count(color="gray")
f3=ggplot(airlines,aes(TravelMonth))+stat_count(color="gray")
f4=ggplot(airlines,aes(IsInternational))+stat_count(color="gray")
ff1=ggplot_gtable(ggplot_build(f1))
ff2=ggplot_gtable(ggplot_build(f2))
ff3=ggplot_gtable(ggplot_build(f3))
ff4=ggplot_gtable(ggplot_build(f4))
ff=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Flight Information")
ff=gtable_add_grob(ff,ff1,1,1)
ff=gtable_add_grob(ff,ff2,1,2)
ff=gtable_add_grob(ff,ff3,2,1)
ff=gtable_add_grob(ff,ff4,2,2)
grid.draw(ff)

  1. Visualizing the data given about “Economy Class”
a1=ggplot(airlines,aes(WidthEconomy))+stat_bin(bins=20,color="gray")
a2=ggplot(airlines,aes(PitchEconomy))+stat_bin(bins=20,color="gray")
a3=ggplot(airlines,aes(SeatsEconomy))+stat_bin(bins=20,color="gray")
a4=ggplot(airlines,aes(PriceEconomy))+stat_bin(bins=20,color="gray")
aa1=ggplot_gtable(ggplot_build(a1))
aa2=ggplot_gtable(ggplot_build(a2))
aa3=ggplot_gtable(ggplot_build(a3))
aa4=ggplot_gtable(ggplot_build(a4))
aa=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Economy Class")
aa=gtable_add_grob(aa,aa1,1,1)
aa=gtable_add_grob(aa,aa2,1,2)
aa=gtable_add_grob(aa,aa3,2,1)
aa=gtable_add_grob(aa,aa4,2,2)
grid.draw(aa)

  1. Visualizing the data given about “Premium Economy Class”
b1=ggplot(airlines,aes(WidthPremium))+stat_bin(bins=20,color="gray")
b2=ggplot(airlines,aes(PitchPremium))+stat_bin(bins=20,color="gray")
b3=ggplot(airlines,aes(SeatsPremium))+stat_bin(bins=20,color="gray")
b4=ggplot(airlines,aes(PricePremium))+stat_bin(bins=20,color="gray")
bb1=ggplot_gtable(ggplot_build(b1))
bb2=ggplot_gtable(ggplot_build(b2))
bb3=ggplot_gtable(ggplot_build(b3))
bb4=ggplot_gtable(ggplot_build(b4))
bb=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Premium Economy Class")
bb=gtable_add_grob(bb,bb1,1,1)
bb=gtable_add_grob(bb,bb2,1,2)
bb=gtable_add_grob(bb,bb3,2,1)
bb=gtable_add_grob(bb,bb4,2,2)
grid.draw(bb)

  1. Constructing a Corrgram to see the intercorrelations pictorially
library(corrgram)
corrgram(airlines,order=FALSE,lower.panel=panel.shade,upper.panel=panel.pie,diag.panel=panel.minmax,text.panel=panel.txt,main="Corrgram of Airline intercorrelations")

On studying the corrgram closely, we can make out that there seem to exist some correlations between Relative Price of a Premium Economy Class seat against an Economy Class seat with variables like Percentage of Premium Seats in the plane, Pitch Difference, Width Difference and Flight Duration. It can be seen clearly in the following scatter plots

7A. Scatterplot of PriceRelative vs PercentPremiumSeats

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(PriceRelative~PercentPremiumSeats,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs PercentPremiumSeats",xlab="PercentPremiumSeats",ylab="PriceRelative")

7B. Scatterplot of PriceRelative vs PitchDifference

scatterplot(PriceRelative~PitchDifference,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs PitchDifference",xlab="PitchDifference",ylab="PriceRelative")

7C. Scatterplot of PriceRelative vs WidthDifference

scatterplot(PriceRelative~WidthDifference,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs WidthDifference",xlab="WidthDifference",ylab="PriceRelative")

7D. Scatterplot of PriceRelative vs FlightDuration

scatterplot(PriceRelative~FlightDuration,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs FlightDuration",xlab="FlightDuration",ylab="PriceRelative")

A few points to be noted are:

  1. The Percentage of Premium Seats as a variable is slightly negatively correlated with the Relative Price of a Premium Economy Class ticket

  2. On the other hand, Pitch and Width Difference are strongly positively correlated with the Relative Price.

  3. At the end, the Duration of the Flight is slightly positively correlated with the Relative Price.

After visualizing data, we need to formulate a hypothesis which needs to be tested with the help a regression model:

“What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?”

In order to be able to answer this question, we need to develop a null hypothesis which states that “There is no significant difference between the prices of the seats of the two classes”

  1. Performing T-Test to test the hypothesis
library(MASS)
t.test(PriceEconomy~IsInternational,data=airlines)
## 
##  Welch Two Sample t-test
## 
## data:  PriceEconomy by IsInternational
## t = -20.368, df = 433.89, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1165.9247  -960.7104
## sample estimates:
##      mean in group Domestic mean in group International 
##                     356.625                    1419.943

Since the value of p is less than 0.05, there is a significant difference in the prices of the seats of the two classes with respect to whether the plane flies domestic or international. Thus, the null hypothesis is rejected. In order to validate more, we need to develop a regression model as folows:

9A. Regression for Economy Class

M1=lm(PriceEconomy~SeatsEconomy+PitchEconomy+WidthEconomy+FlightDuration,data=airlines)
summary(M1)
## 
## Call:
## lm(formula = PriceEconomy ~ SeatsEconomy + PitchEconomy + WidthEconomy + 
##     FlightDuration, data = airlines)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1612.3  -523.3    -9.7   626.7  1502.3 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -2570.394   1974.001  -1.302   0.1935    
## SeatsEconomy       1.174      0.491   2.390   0.0173 *  
## PitchEconomy     407.735     56.535   7.212 2.33e-12 ***
## WidthEconomy    -581.787     75.176  -7.739 6.58e-14 ***
## FlightDuration   172.793     11.229  15.388  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 742.9 on 453 degrees of freedom
## Multiple R-squared:  0.4399, Adjusted R-squared:  0.4349 
## F-statistic: 88.94 on 4 and 453 DF,  p-value: < 2.2e-16

All the factors involved are statistically significant and are contributing greatly towards deciding the price of an Economy Class Seat.

9B. Regression for Premium Economy Class

M2=lm(PricePremium~SeatsPremium+PitchPremium+WidthPremium+FlightDuration,data=airlines)
summary(M2)
## 
## Call:
## lm(formula = PricePremium ~ SeatsPremium + PitchPremium + WidthPremium + 
##     FlightDuration, data = airlines)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2359.6  -617.2    48.8   731.3  4214.2 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1531.734   1344.056  -1.140   0.2550    
## SeatsPremium      11.244      3.474   3.236   0.0013 ** 
## PitchPremium      62.803     52.355   1.200   0.2309    
## WidthPremium     -57.276     62.752  -0.913   0.3619    
## FlightDuration   228.732     13.082  17.484   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 971.8 on 453 degrees of freedom
## Multiple R-squared:  0.4358, Adjusted R-squared:  0.4308 
## F-statistic: 87.47 on 4 and 453 DF,  p-value: < 2.2e-16

The very exact value of Pitch and Width of a Premium Economy Class seat will not affect its pricing greatly. However, the number of Premium Economy Class seats and the Flight Duration will have a direct impact on to the pricing.

9C. Regression for difference between the two classes

M3=lm(PriceRelative~PercentPremiumSeats+PitchDifference+WidthDifference+FlightDuration,data=airlines)
summary(M3)
## 
## Call:
## lm(formula = PriceRelative ~ PercentPremiumSeats + PitchDifference + 
##     WidthDifference + FlightDuration, data = airlines)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79439 -0.29424 -0.03427  0.16197  1.13688 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.179033   0.101492  -1.764  0.07840 .  
## PercentPremiumSeats -0.005999   0.003898  -1.539  0.12454    
## PitchDifference      0.059311   0.015921   3.725  0.00022 ***
## WidthDifference      0.118140   0.024555   4.811 2.05e-06 ***
## FlightDuration       0.021707   0.005085   4.269 2.39e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.381 on 453 degrees of freedom
## Multiple R-squared:  0.2913, Adjusted R-squared:  0.285 
## F-statistic: 46.54 on 4 and 453 DF,  p-value: < 2.2e-16
(i) Beta coefficients and their visualizations
M3$coefficients
##         (Intercept) PercentPremiumSeats     PitchDifference 
##        -0.179033482        -0.005999062         0.059310975 
##     WidthDifference      FlightDuration 
##         0.118140211         0.021707245
library(coefplot)
coefplot(M3, predictors=c("PercentPremiumSeats","PitchDifference","WidthDifference","FlightDuration"))

(ii) Confidence Intervals

confint(M3)
##                           2.5 %      97.5 %
## (Intercept)         -0.37848675 0.020419787
## PercentPremiumSeats -0.01366032 0.001662193
## PitchDifference      0.02802222 0.090599731
## WidthDifference      0.06988357 0.166396852
## FlightDuration       0.01171405 0.031700439

INFERENCES:

  1. We accept the last regression (M3) since the value of R-squared is the least.

  2. Moreover, it gives a clearer picture of how, in comparison to an Economy Class seat, is a Premium Economy Class seat’s price strongly dependent on the Pitch Difference, Width Difference and Duration of flight.

  3. Though the Percent of Premium Seats in the flight appears to be a statistically insignificant variable, yet in the other two regressions we saw that the number of seats did matter for each class (Not as much as Pitch Difference, Width Difference, Flight Duration)

  4. Therefore, Pitch Difference, Width Difference, Flight Duration are the key factors affecting the difference in pricing of the seats in the two different classes.