We often come across attractive offers associated with the different classes of tickets offered by various airlines. One of the most desirable in terms of luxury as well as affordability is the “PREMIUM ECONOMY CLASS”. This travel class is positioned between economy class and business class in terms of price, comfort, and amenities.
Various facilites provided include a free upgrade to premium members of frequent-flyer program and passengers flying full-fare economy; a separate section of the economy/coach cabin with more legroom (36-38" (91-96 cm) seat pitch), along with some form of leg rest; improved in-flight entertainment features; dedicated cabin crew; smaller cabin size; better seats (often fewer seats per row, to increase shoulder/elbow room); at-seat laptop power; at-seat telephone; lounge access (for some airlines); priority boarding; increased frequent flyer points; exclusive amenity kits; welcome drinks (juice or champagne); upgraded meals and drinks; increased luggage allowance; seats in front of the motors.
The data set provided to analyze includes information about the flight like
Airline Name
Aircraft Manufacturer
Flight Duration in hours
Month of Travelling
International or Domestic Flight
Number of Economy Seats
Number of Premium Economy Seats
Distance between two consecutive Economy Seats
Distance between two consecutive Premium Economy Seats
Width between armrests of an Economy Seat
Width between armrests of a Premium Economy Seat
Price of Economy Seat
Price of Premium Economy Seat
Some mathematical operations done on the above data is stored in the data set as well.
PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy
SeatsTotal = SeatsEconomy + SeatsPremium
PercentPremiumSeats = (SeatsPremium / SeatsTotal) * 100
PitchDifference = PitchPremium - PitchEconomy
WidthDifference = WidthPremium - WidthEconomy
setwd("C:/Users/Dell/Desktop/Project/Week 3/Day 5")
airlines=read.csv("SixAirlinesDataV2.csv")
View(airlines)
str(airlines)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
summary(airlines)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
library(psych)
describe(airlines)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(gtable)
library(grid)
f1=ggplot(airlines,aes(Airline))+stat_count(color="gray")
f2=ggplot(airlines,aes(Aircraft))+stat_count(color="gray")
f3=ggplot(airlines,aes(TravelMonth))+stat_count(color="gray")
f4=ggplot(airlines,aes(IsInternational))+stat_count(color="gray")
ff1=ggplot_gtable(ggplot_build(f1))
ff2=ggplot_gtable(ggplot_build(f2))
ff3=ggplot_gtable(ggplot_build(f3))
ff4=ggplot_gtable(ggplot_build(f4))
ff=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Flight Information")
ff=gtable_add_grob(ff,ff1,1,1)
ff=gtable_add_grob(ff,ff2,1,2)
ff=gtable_add_grob(ff,ff3,2,1)
ff=gtable_add_grob(ff,ff4,2,2)
grid.draw(ff)
a1=ggplot(airlines,aes(WidthEconomy))+stat_bin(bins=20,color="gray")
a2=ggplot(airlines,aes(PitchEconomy))+stat_bin(bins=20,color="gray")
a3=ggplot(airlines,aes(SeatsEconomy))+stat_bin(bins=20,color="gray")
a4=ggplot(airlines,aes(PriceEconomy))+stat_bin(bins=20,color="gray")
aa1=ggplot_gtable(ggplot_build(a1))
aa2=ggplot_gtable(ggplot_build(a2))
aa3=ggplot_gtable(ggplot_build(a3))
aa4=ggplot_gtable(ggplot_build(a4))
aa=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Economy Class")
aa=gtable_add_grob(aa,aa1,1,1)
aa=gtable_add_grob(aa,aa2,1,2)
aa=gtable_add_grob(aa,aa3,2,1)
aa=gtable_add_grob(aa,aa4,2,2)
grid.draw(aa)
b1=ggplot(airlines,aes(WidthPremium))+stat_bin(bins=20,color="gray")
b2=ggplot(airlines,aes(PitchPremium))+stat_bin(bins=20,color="gray")
b3=ggplot(airlines,aes(SeatsPremium))+stat_bin(bins=20,color="gray")
b4=ggplot(airlines,aes(PricePremium))+stat_bin(bins=20,color="gray")
bb1=ggplot_gtable(ggplot_build(b1))
bb2=ggplot_gtable(ggplot_build(b2))
bb3=ggplot_gtable(ggplot_build(b3))
bb4=ggplot_gtable(ggplot_build(b4))
bb=gtable(widths=unit(c(1,2),"null"),heights=unit(c(2,1),"null"),name="Premium Economy Class")
bb=gtable_add_grob(bb,bb1,1,1)
bb=gtable_add_grob(bb,bb2,1,2)
bb=gtable_add_grob(bb,bb3,2,1)
bb=gtable_add_grob(bb,bb4,2,2)
grid.draw(bb)
library(corrgram)
corrgram(airlines,order=FALSE,lower.panel=panel.shade,upper.panel=panel.pie,diag.panel=panel.minmax,text.panel=panel.txt,main="Corrgram of Airline intercorrelations")
On studying the corrgram closely, we can make out that there seem to exist some correlations between Relative Price of a Premium Economy Class seat against an Economy Class seat with variables like Percentage of Premium Seats in the plane, Pitch Difference, Width Difference and Flight Duration. It can be seen clearly in the following scatter plots
7A. Scatterplot of PriceRelative vs PercentPremiumSeats
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(PriceRelative~PercentPremiumSeats,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs PercentPremiumSeats",xlab="PercentPremiumSeats",ylab="PriceRelative")
7B. Scatterplot of PriceRelative vs PitchDifference
scatterplot(PriceRelative~PitchDifference,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs PitchDifference",xlab="PitchDifference",ylab="PriceRelative")
7C. Scatterplot of PriceRelative vs WidthDifference
scatterplot(PriceRelative~WidthDifference,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs WidthDifference",xlab="WidthDifference",ylab="PriceRelative")
7D. Scatterplot of PriceRelative vs FlightDuration
scatterplot(PriceRelative~FlightDuration,data=airlines,spread=FALSE,smoother.args=list(lty=2),pch=19,main="Scatterplot of PriceRelative vs FlightDuration",xlab="FlightDuration",ylab="PriceRelative")
A few points to be noted are:
The Percentage of Premium Seats as a variable is slightly negatively correlated with the Relative Price of a Premium Economy Class ticket
On the other hand, Pitch and Width Difference are strongly positively correlated with the Relative Price.
At the end, the Duration of the Flight is slightly positively correlated with the Relative Price.
After visualizing data, we need to formulate a hypothesis which needs to be tested with the help a regression model:
“What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?”
In order to be able to answer this question, we need to develop a null hypothesis which states that “There is no significant difference between the prices of the seats of the two classes”
library(MASS)
t.test(PriceEconomy~IsInternational,data=airlines)
##
## Welch Two Sample t-test
##
## data: PriceEconomy by IsInternational
## t = -20.368, df = 433.89, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1165.9247 -960.7104
## sample estimates:
## mean in group Domestic mean in group International
## 356.625 1419.943
Since the value of p is less than 0.05, there is a significant difference in the prices of the seats of the two classes with respect to whether the plane flies domestic or international. Thus, the null hypothesis is rejected. In order to validate more, we need to develop a regression model as folows:
9A. Regression for Economy Class
M1=lm(PriceEconomy~SeatsEconomy+PitchEconomy+WidthEconomy+FlightDuration,data=airlines)
summary(M1)
##
## Call:
## lm(formula = PriceEconomy ~ SeatsEconomy + PitchEconomy + WidthEconomy +
## FlightDuration, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1612.3 -523.3 -9.7 626.7 1502.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2570.394 1974.001 -1.302 0.1935
## SeatsEconomy 1.174 0.491 2.390 0.0173 *
## PitchEconomy 407.735 56.535 7.212 2.33e-12 ***
## WidthEconomy -581.787 75.176 -7.739 6.58e-14 ***
## FlightDuration 172.793 11.229 15.388 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 742.9 on 453 degrees of freedom
## Multiple R-squared: 0.4399, Adjusted R-squared: 0.4349
## F-statistic: 88.94 on 4 and 453 DF, p-value: < 2.2e-16
All the factors involved are statistically significant and are contributing greatly towards deciding the price of an Economy Class Seat.
9B. Regression for Premium Economy Class
M2=lm(PricePremium~SeatsPremium+PitchPremium+WidthPremium+FlightDuration,data=airlines)
summary(M2)
##
## Call:
## lm(formula = PricePremium ~ SeatsPremium + PitchPremium + WidthPremium +
## FlightDuration, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2359.6 -617.2 48.8 731.3 4214.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1531.734 1344.056 -1.140 0.2550
## SeatsPremium 11.244 3.474 3.236 0.0013 **
## PitchPremium 62.803 52.355 1.200 0.2309
## WidthPremium -57.276 62.752 -0.913 0.3619
## FlightDuration 228.732 13.082 17.484 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 971.8 on 453 degrees of freedom
## Multiple R-squared: 0.4358, Adjusted R-squared: 0.4308
## F-statistic: 87.47 on 4 and 453 DF, p-value: < 2.2e-16
The very exact value of Pitch and Width of a Premium Economy Class seat will not affect its pricing greatly. However, the number of Premium Economy Class seats and the Flight Duration will have a direct impact on to the pricing.
9C. Regression for difference between the two classes
M3=lm(PriceRelative~PercentPremiumSeats+PitchDifference+WidthDifference+FlightDuration,data=airlines)
summary(M3)
##
## Call:
## lm(formula = PriceRelative ~ PercentPremiumSeats + PitchDifference +
## WidthDifference + FlightDuration, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79439 -0.29424 -0.03427 0.16197 1.13688
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.179033 0.101492 -1.764 0.07840 .
## PercentPremiumSeats -0.005999 0.003898 -1.539 0.12454
## PitchDifference 0.059311 0.015921 3.725 0.00022 ***
## WidthDifference 0.118140 0.024555 4.811 2.05e-06 ***
## FlightDuration 0.021707 0.005085 4.269 2.39e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.381 on 453 degrees of freedom
## Multiple R-squared: 0.2913, Adjusted R-squared: 0.285
## F-statistic: 46.54 on 4 and 453 DF, p-value: < 2.2e-16
(i) Beta coefficients and their visualizations
M3$coefficients
## (Intercept) PercentPremiumSeats PitchDifference
## -0.179033482 -0.005999062 0.059310975
## WidthDifference FlightDuration
## 0.118140211 0.021707245
library(coefplot)
coefplot(M3, predictors=c("PercentPremiumSeats","PitchDifference","WidthDifference","FlightDuration"))
(ii) Confidence Intervals
confint(M3)
## 2.5 % 97.5 %
## (Intercept) -0.37848675 0.020419787
## PercentPremiumSeats -0.01366032 0.001662193
## PitchDifference 0.02802222 0.090599731
## WidthDifference 0.06988357 0.166396852
## FlightDuration 0.01171405 0.031700439
We accept the last regression (M3) since the value of R-squared is the least.
Moreover, it gives a clearer picture of how, in comparison to an Economy Class seat, is a Premium Economy Class seat’s price strongly dependent on the Pitch Difference, Width Difference and Duration of flight.
Though the Percent of Premium Seats in the flight appears to be a statistically insignificant variable, yet in the other two regressions we saw that the number of seats did matter for each class (Not as much as Pitch Difference, Width Difference, Flight Duration)
Therefore, Pitch Difference, Width Difference, Flight Duration are the key factors affecting the difference in pricing of the seats in the two different classes.