1. Introduction:
Premium Economy is a travel class offered by some airlines. It is positioned between Economy Class and Business Class in terms of price, comfort, and amenities. Several airlines offer it to passengers willing to pay more for slightly better seats and, in some cases, better service. Premium Economy is found mostly on international flights and, compared to standard Economy, offers about 5-7 inches of extra legroom as well as additional amenities, which can include:
- 1-2 extra inches of seat width.
- 2-3 extra inches of seat recline.
- Adjustable headrests, legrests, or lumbar support.
The definition of Premium Economy is not standardized and varies widely from airline to airline, from a slightly larger seat pitch with no other amenities to a truly separate class with larger and more comfortable seats, better dining options, and better service. The most noticeable differences are seat width and pitch (legroom). Pitch can vary from 28" to 34", and width from 17" to 33". These few inches might not seem like much, but they matter when you are in that seat for 6+ hours. Other factors that may vary are power outlets, Wi-Fi, amenity bags, food, in-flight entertainment, type of TV screen, and level of service. Some airlines now build their in-flight services into touch screens: passengers enter their card details, add the food or items they want to a basket, and cabin crew deliver the purchases directly to the seat.
Pricing, however, is a crucial issue. An airline's price reflects an assessment of the value of the class that passengers prefer to travel in, their willingness to pay for the ticket, and various other factors. This analysis focuses on the pricing of airline classes, based on the quality of flying each class provides.
2. Objectives:
1. (a) Testing the null hypothesis that there is no significant difference between the prices of Premium Economy and Economy classes.
1. (b) Testing the null hypothesis that there is no significant difference between PitchPremium and PitchEconomy.
1. (c) Testing the null hypothesis that there is no significant difference between WidthPremium and WidthEconomy.
2. Regression analysis to predict the price of Premium Economy from premium pitch, premium width, percent premium seats, Economy price, and flight duration.
3. Data Collection:
This dataset describes Premium Economy and regular Economy air tickets. The data for this analysis was taken from an open-source R dataset. To compile it, the websites of six airlines were visited: British Airways, Delta Airlines, Air France, Singapore Airlines, Virgin Airlines, and Jet Airways. The website Seat Guru was also visited to get additional information about the aircraft used by each airline on each route. We recorded the price of purchasing an Economy ticket and a Premium Economy ticket from the airline website. In the dataset:
- There are 458 observations.
- There are 18 variables.
Airline: Factor variable denoting the name of the airline. There are 6 airlines in the data: British Airways, Delta Airlines, Air France, Singapore Airlines, Virgin Airlines, Jet Airways.
Aircraft: Factor variable denoting the manufacturer of the aircraft, e.g. Boeing, Airbus.
TravelMonth: Factor variable denoting the month of travel: Jul, Aug, Sep, Oct.
FlightDuration: Numeric variable (hours) denoting the duration of the flight.
IsInternational: Factor variable indicating whether the flight is international or domestic with respect to the airline's home country.
SeatsEconomy: Numeric variable giving the number of Economy seats in the aircraft.
SeatsPremium: Numeric variable giving the number of Premium Economy seats in the aircraft.
PitchEconomy: Numeric variable (inches) giving the distance between two consecutive Economy seats.
PitchPremium: Numeric variable (inches) giving the distance between two consecutive Premium Economy seats.
WidthEconomy: Numeric variable (inches) giving the width between the armrests of an Economy seat.
WidthPremium: Numeric variable (inches) giving the width between the armrests of a Premium Economy seat.
PriceEconomy: Numeric variable (USD) giving the price of an Economy seat.
PricePremium: Numeric variable (USD) giving the price of a Premium Economy seat.
PriceRelative: Numeric variable (a unitless ratio) defined as “(PricePremium - PriceEconomy) / PriceEconomy”.
SeatsTotal: Numeric variable defined as “SeatsEconomy + SeatsPremium”.
PercentPremiumSeats: Numeric variable (percent) defined as “(SeatsPremium / SeatsTotal) * 100”.
PitchDifference: Numeric variable (inches) defined as “PitchPremium - PitchEconomy”.
WidthDifference: Numeric variable (inches) defined as “WidthPremium - WidthEconomy”.
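The derived variables in the list above follow directly from the base columns. The sketch below recomputes them on a small hypothetical data frame; the name `flights` and all values are illustrative assumptions and are not taken from the dataset.

```r
# Hypothetical toy rows; values are illustrative, not taken from the dataset.
flights <- data.frame(
  SeatsEconomy = c(122, 180),
  SeatsPremium = c(40, 20),
  PitchEconomy = c(31, 32),
  PitchPremium = c(38, 34),
  WidthEconomy = c(18, 17),
  WidthPremium = c(19, 17),
  PriceEconomy = c(2000, 400),
  PricePremium = c(2500, 440)
)

# Derived columns, following the definitions listed above.
flights <- transform(
  flights,
  PriceRelative   = (PricePremium - PriceEconomy) / PriceEconomy,
  SeatsTotal      = SeatsEconomy + SeatsPremium,
  PitchDifference = PitchPremium - PitchEconomy,
  WidthDifference = WidthPremium - WidthEconomy
)

# transform() evaluates against the original columns only, so the column
# that depends on SeatsTotal is added in a second step.
flights$PercentPremiumSeats <- flights$SeatsPremium / flights$SeatsTotal * 100

print(flights[, c("PriceRelative", "SeatsTotal", "PercentPremiumSeats",
                  "PitchDifference", "WidthDifference")])
```

Recomputing the derived columns in this way is also a simple consistency check on the stored values.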
4. Data Representation:
1. This is the data that will be used to analyse the given objectives.
statsairlines.df <- read.csv("C:/interships/SixAirlinesData.csv")
View(statsairlines.df)
3. Visualisation: Premium Economy seats are more expensive than Economy seats. Plot of Premium Economy Ticket Prices versus Economy Ticket Prices.
plot(~statsairlines.df$PriceEconomy + statsairlines.df$PricePremium, main="Premium Economy Price vs. Economy Price")
abline(0,1)

Here we see that the points are above the 45 degree line. As expected, Premium Economy Airfares are higher than the corresponding Economy Airfares on the same flight.
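The visual impression from the 45-degree line can also be checked numerically. The following is a minimal sketch on hypothetical fares (the data frame `fares` and its values are assumptions for illustration): a point lies above the line exactly when the Premium Economy fare exceeds the Economy fare.

```r
# Hypothetical fares in USD; illustrative only.
fares <- data.frame(
  PriceEconomy = c(350, 1200, 2500, 800),
  PricePremium = c(380, 1900, 2900, 800)
)

# A point lies above the 45-degree line exactly when PricePremium > PriceEconomy.
above_line <- fares$PricePremium > fares$PriceEconomy

share_above <- mean(above_line)   # proportion of flights above the line
n_not_above <- sum(!above_line)   # flights on or below the line

cat("Share above the line:", share_above, "\n")
cat("Flights on or below the line:", n_not_above, "\n")
```

Applied to statsairlines.df, the same two expressions would report how many flights, if any, fall on or below the line.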
4. ROLE OF DIFFERENCE IN PITCH BETWEEN PREMIUM ECONOMY AND ECONOMY SEATS ON THE PRICING.
4a. Distribution of the difference in the pitch of Premium Economy seats and the pitch of Economy seats
pitchDifferenceTable <- table(statsairlines.df$PitchDifference)
pitchDifferenceTable
##
##   2   3   6   7  10
##  24  16 121 243  54
library(lattice)
histogram(~PitchDifference, data = statsairlines.df,
main = "Distribution of Pitch Difference", xlab="Difference in Pitch", col='gray' )

Result: The differences in pitch between Premium Economy and Economy seats take the values {2, 3, 6, 7, 10} inches. The most frequently observed difference is 7 inches.
4b. Effect of Pitch Difference on the relative price of Economy and Premium Economy airfares.
pd = aggregate(cbind(PriceEconomy,PricePremium, PriceRelative) ~ PitchDifference,
data = statsairlines.df, mean)
pd
##   PitchDifference PriceEconomy PricePremium PriceRelative
## 1               2     348.0000     377.3333    0.08708333
## 2               3     369.5625     398.7500    0.08125000
## 3               6    2008.6942    2333.7438    0.34082645
## 4               7    1388.1317    2155.4897    0.51888889
## 5              10     243.8519     435.6481    0.97074074
4c. Boxplot of the relative price of Economy and Premium Economy airfares by Pitch Difference.
boxplot(PriceRelative~PitchDifference,data=statsairlines.df, main="Relative Price Difference vs. Pitch", ylab="Pitch Difference", xlab="Relative Price b/w Economy and Premium Economy", horizontal=TRUE)

Result: The relative difference in airfare (PriceRelative) between Premium Economy and Economy seats tends to increase as the difference in their pitch (PitchDifference) increases.
5. ROLE OF DIFFERENCE IN SEAT WIDTH OF PREMIUM ECONOMY AND ECONOMY SEATS
5a. Distribution of the difference in the width of Premium Economy seats and the width of Economy seats.
widthDifferenceTable <- table(statsairlines.df$WidthDifference)
widthDifferenceTable
##
##   0   1   2   3   4
##  40 264  32  68  54
library(lattice)
histogram(~WidthDifference, data = statsairlines.df,
main = "Distribution of Difference in Seat Width", xlab="Difference in Seat Width", col='gray' )

Result: The differences in seat width between Premium Economy and Economy seats take the values {0, 1, 2, 3, 4} inches. The most frequently observed difference is 1 inch.
5b. Effect of Seat WidthDifference on the price of Economy and Premium Economy airfares.
aggregate(cbind(PriceEconomy,PricePremium, PriceRelative) ~ WidthDifference,
data = statsairlines.df, mean)
##   WidthDifference PriceEconomy PricePremium PriceRelative
## 1               0     356.6250     385.9000     0.0847500
## 2               1    1428.4053    1966.0795     0.4184091
## 3               2    2884.7500    3197.4375     0.2296875
## 4               3    1631.7206    2717.7059     0.7282353
## 5               4     243.8519     435.6481     0.9707407
5c. Boxplot of the relative price of Economy and Premium Economy airfares by Seat WidthDifference.
boxplot(PriceRelative~WidthDifference,data=statsairlines.df, main="Relative Price Difference vs. Seat Width", ylab="Seat Width Difference", xlab="Relative Price b/w Economy and Premium Economy", horizontal=TRUE)

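Result: The relative difference in airfare (PriceRelative) between Premium Economy and Economy seats generally increases as the difference in their seat width (WidthDifference) increases. The trend is not strictly monotonic: the mean PriceRelative at a width difference of 2 inches (0.23) is lower than at 1 inch (0.42).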
6. COMBINED ROLE OF DIFFERENCES IN PITCH AND SEAT WIDTH BETWEEN PREMIUM ECONOMY AND ECONOMY SEATS ON PRICING
pitchWidthTable <- xtabs(~WidthDifference + PitchDifference, data=statsairlines.df)
ftable(pitchWidthTable)
##                 PitchDifference   2   3   6   7  10
## WidthDifference
## 0                                24  16   0   0   0
## 1                                 0   0  89 175   0
## 2                                 0   0  32   0   0
## 3                                 0   0   0  68   0
## 4                                 0   0   0   0  54
7. CORRGRAM.
library(corrgram)
corrgram(statsairlines.df, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of Variables")
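
A corrgram is a picture of pairwise correlations, so it can help to look at the numbers as well. The sketch below computes a Pearson correlation matrix for the numeric columns of a hypothetical data frame; `toy` and its values are synthetic assumptions, chosen only to mirror a mix of factor and numeric columns.

```r
set.seed(1)
# Hypothetical toy data with one non-numeric column; values are synthetic.
toy <- data.frame(
  Airline      = factor(rep(c("A", "B"), each = 10)),
  PriceEconomy = runif(20, 200, 3000)
)
toy$PricePremium    <- 1.3 * toy$PriceEconomy + rnorm(20, sd = 100)
toy$PitchDifference <- sample(c(2, 3, 6, 7, 10), 20, replace = TRUE)

# Keep only the numeric columns, then compute the Pearson correlation matrix.
numeric_cols <- toy[sapply(toy, is.numeric)]
cor_matrix   <- round(cor(numeric_cols), 2)
print(cor_matrix)
```

The same two steps could be applied to statsairlines.df to read off, for example, the correlation between PricePremium and PriceEconomy.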

8. Boxplot of prices of economy.
boxplot(statsairlines.df$PriceEconomy,xlab="Price",ylab="Economy Class",horizontal=TRUE)

9. Boxplot of prices of premium-economy.
boxplot(statsairlines.df$PricePremium,xlab="Price",ylab="Premium-Economy Class",horizontal=TRUE)

10. Scatter plot of pitch-difference vs. price-relative
library(car)
scatterplot(statsairlines.df$PitchDifference,statsairlines.df$PriceRelative,xlab="Pitch Difference",ylab="Price Relative")

5. Data Analysis 1:
1. (a) Testing the null hypothesis that the mean prices of Premium Economy and Economy class are equal.
1. (b) Testing the null hypothesis that there is no significant difference between PitchPremium and PitchEconomy.
1. (c) Testing the null hypothesis that there is no significant difference between WidthPremium and WidthEconomy.
To test these hypotheses, we use the t-test, which compares the means of two groups. It tells us whether the difference between the two sample means is statistically significant. The result of the t-test is a t value, which is then used to determine the p-value. The tests are two-sided: the alternative hypothesis in each case is that the true difference in means is not equal to 0. The analysis is performed in RStudio.
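To make the step from sample means to a t value and a p-value concrete, the sketch below computes the Welch two-sample statistic by hand and compares it with t.test(), whose default is the Welch test. The samples `x` and `y` are synthetic assumptions, not values from the dataset.

```r
set.seed(42)
# Hypothetical samples; illustrative only.
x <- rnorm(30, mean = 32, sd = 1.5)
y <- rnorm(30, mean = 31, sd = 0.7)

# Welch t statistic: difference in means over its estimated standard error.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation for the degrees of freedom.
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test() defaults to the Welch test (var.equal = FALSE).
res <- t.test(x, y)
print(c(t = t_manual, df = df_manual, p = p_manual))
print(res)
```

The non-integer degrees of freedom that this approximation produces are the reason the outputs in this section report values such as df = 856.56.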
Assumptions:
The p-value is the value used to determine whether the difference between the means of the two samples is significant. For our purposes, a p-value < 0.05 indicates a significant difference between the means, and we reject the null hypothesis. A p-value >= 0.05 means there is not enough evidence of a difference between the means, and we do not reject the null hypothesis.
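The decision rule can be written as a few lines of R. In the sketch below, `decide` is a hypothetical helper that is not part of the original analysis, and the vectors are synthetic; it only illustrates how the p.value component of a t.test() result is compared with the 0.05 threshold.

```r
# Hypothetical helper (not part of the original analysis) that applies the
# 0.05 decision rule to the object returned by t.test().
decide <- function(test, alpha = 0.05) {
  if (test$p.value < alpha) "reject H0" else "fail to reject H0"
}

set.seed(7)
a <- rnorm(50, mean = 10)
b_shifted <- a + 5 + rnorm(50, sd = 0.1)  # mean is clearly shifted by about 5

print(decide(t.test(a, b_shifted)))  # samples with clearly different means
print(decide(t.test(a, a)))          # identical samples: t = 0, p-value = 1
```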
- t-test of PricePremium, PriceEconomy.
t.test(statsairlines.df$PricePremium, statsairlines.df$PriceEconomy)
##
## Welch Two Sample t-test
##
## data: statsairlines.df$PricePremium and statsairlines.df$PriceEconomy
## t = 6.8304, df = 856.56, p-value = 1.605e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 369.2793 667.0831
## sample estimates:
## mean of x mean of y
## 1845.258 1327.076
When we test the first hypothesis using the t-test, the p-value (1.605e-11) is less than 0.05, so we reject the null hypothesis that the mean prices are equal. The ticket prices of Economy and Premium Economy therefore differ significantly: on average, Premium Economy costs about USD 518 more (95 percent confidence interval 369 to 667). This agrees with the earlier plot of Premium Economy price versus Economy price, in which Premium Economy ticket prices are higher than Economy ticket prices.
- t-test of PitchPremium, PitchEconomy.
t.test(statsairlines.df$PitchPremium, statsairlines.df$PitchEconomy)
##
## Welch Two Sample t-test
##
## data: statsairlines.df$PitchPremium and statsairlines.df$PitchEconomy
## t = 97.482, df = 671.02, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.553067 6.822479
## sample estimates:
## mean of x mean of y
## 37.90611 31.21834
When we test the second hypothesis using the t-test, the p-value is less than 0.05, so we reject the null hypothesis that there is no significant difference between the pitch of the Economy and Premium Economy classes. We conclude that the pitches differ significantly: Premium Economy seats have about 6.7 inches more pitch on average (95 percent confidence interval 6.55 to 6.82).
- t-test on WidthPremium, WidthEconomy.
t.test(statsairlines.df$WidthPremium, statsairlines.df$WidthEconomy)
##
## Welch Two Sample t-test
##
## data: statsairlines.df$WidthPremium and statsairlines.df$WidthEconomy
## t = 28.4, df = 678.24, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.520276 1.746100
## sample estimates:
## mean of x mean of y
## 19.47162 17.83843
When we test the third hypothesis using the t-test, the p-value is less than 0.05, so we reject the null hypothesis that there is no significant difference between the width of the Economy and Premium Economy classes. The sample means show the direction of the difference: Premium Economy seats are about 1.6 inches wider on average (95 percent confidence interval 1.52 to 1.75).
The validity of the analysis rests on the significance measure that the t-test provides. In all three tests the p-value is far below 0.05, so each null hypothesis of equal means is rejected. The tests therefore give evidence that price, pitch, and width each differ significantly between the two classes.
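Each row of the data records both classes for the same flight, so the two columns being compared are not independent samples. A paired t-test is one possible alternative to the Welch test used here; the report does not run it, and the sketch below only illustrates the difference on synthetic fares (all values are assumptions).

```r
set.seed(123)
# Hypothetical paired fares: each index is one flight (illustrative only).
economy <- runif(40, 300, 3000)
premium <- economy * 1.3 + rnorm(40, sd = 80)

welch  <- t.test(premium, economy)                 # treats the columns as independent
paired <- t.test(premium, economy, paired = TRUE)  # tests the per-flight differences

# The paired test is equivalent to a one-sample t-test on the differences.
one_sample <- t.test(premium - economy)

print(c(welch = welch$p.value, paired = paired$p.value))
```

The paired version removes the large flight-to-flight variation in fares from the comparison, which is why it can behave quite differently from the two-sample test on the same data.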
6. Data Analysis 2:
2. Regression analysis to predict the price of Premium Economy from premium pitch, premium width, percent premium seats, Economy price, and flight duration.
Regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. More specifically, it helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed. Regression analysis is widely used for prediction and forecasting. Here it is used to identify the variables that most strongly affect the price of Premium Economy tickets.
airlineregg <- lm(statsairlines.df$PricePremium ~ statsairlines.df$PitchPremium+statsairlines.df$WidthPremium+statsairlines.df$PercentPremiumSeats+statsairlines.df$PriceEconomy+statsairlines.df$FlightDuration)
summary(airlineregg)
##
## Call:
## lm(formula = statsairlines.df$PricePremium ~ statsairlines.df$PitchPremium +
## statsairlines.df$WidthPremium + statsairlines.df$PercentPremiumSeats +
## statsairlines.df$PriceEconomy + statsairlines.df$FlightDuration)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -866.6 -253.3  -41.4  126.1 3467.2
##
## Coefficients:
##                                        Estimate Std. Error t value
## (Intercept)                          -1.069e+03  7.005e+02  -1.527
## statsairlines.df$PitchPremium        -8.009e+01  2.681e+01  -2.987
## statsairlines.df$WidthPremium         1.949e+02  3.249e+01   5.999
## statsairlines.df$PercentPremiumSeats  1.832e+01  4.850e+00   3.778
## statsairlines.df$PriceEconomy         1.057e+00  2.883e-02  36.660
## statsairlines.df$FlightDuration       6.393e+01  8.007e+00   7.984
##                                      Pr(>|t|)
## (Intercept)                          0.127584
## statsairlines.df$PitchPremium        0.002971 **
## statsairlines.df$WidthPremium        4.08e-09 ***
## statsairlines.df$PercentPremiumSeats 0.000179 ***
## statsairlines.df$PriceEconomy         < 2e-16 ***
## statsairlines.df$FlightDuration      1.18e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 490.9 on 452 degrees of freedom
## Multiple R-squared: 0.8564, Adjusted R-squared: 0.8548
## F-statistic: 538.9 on 5 and 452 DF, p-value: < 2.2e-16
The regression takes the price of Premium Economy as the dependent variable and predicts it from premium pitch, premium width, percent premium seats, flight duration, and Economy price. Every predictor has a p-value below 0.05, so each is statistically significant in explaining the Premium Economy fare. PriceEconomy has by far the largest t value (36.66), with a coefficient of about 1.06: each additional dollar of Economy fare is associated with roughly USD 1.06 of additional Premium Economy fare when the other variables are held fixed. The coefficient of PitchPremium is negative (-80.09) once the other variables are controlled for.
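A fitted model of this form can also be used for prediction. The sketch below fits the same model structure on synthetic data using the data argument of lm(), which keeps the coefficient names short and lets predict() accept new rows. The data frame `toy`, the coefficients used to simulate it, and the flight in `new_flight` are all assumptions for illustration.

```r
set.seed(2024)
n <- 60
# Hypothetical flights; column names mirror the dataset, values are synthetic.
toy <- data.frame(
  PitchPremium        = sample(c(34, 38, 40), n, replace = TRUE),
  WidthPremium        = sample(c(17, 19, 21), n, replace = TRUE),
  PercentPremiumSeats = runif(n, 5, 25),
  PriceEconomy        = runif(n, 200, 3000),
  FlightDuration      = runif(n, 1, 14)
)
toy$PricePremium <- 1.1 * toy$PriceEconomy + 60 * toy$FlightDuration +
  rnorm(n, sd = 150)

# Same model structure as above, written with the data argument.
fit <- lm(PricePremium ~ PitchPremium + WidthPremium + PercentPremiumSeats +
            PriceEconomy + FlightDuration, data = toy)

# Coefficient table: estimate, standard error, t value, p-value.
print(coef(summary(fit)))

# Predict the Premium Economy fare for one hypothetical flight.
new_flight <- data.frame(PitchPremium = 38, WidthPremium = 19,
                         PercentPremiumSeats = 15, PriceEconomy = 1000,
                         FlightDuration = 8)
predicted <- predict(fit, newdata = new_flight)
print(predicted)
```

Writing the formula with bare column names and data = is what makes predict() work on new rows; a formula built from statsairlines.df$ columns cannot be matched against a newdata frame.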
The analysis is valid in that it shows how the dependent variable depends on the chosen independent variables. It holds only for these variables and these six airlines. If other airlines with different inputs were considered, the results would change, though perhaps not by much.
7. Final Summary of the Analysis:
–> From the sample means it can be seen that the mean price of Economy class is about USD 1327, whereas the mean price of Premium Economy class is about USD 1845, which is about 39% higher. Across the pitch and width groups, the mean relative price difference ranges from about 8% to about 97%.
–> The t-test between the two prices rejects the null hypothesis, so it can be concluded that the two prices differ and that Premium Economy is priced higher than Economy on average.
–> The second t-test, on the pitches, also rejects the null hypothesis, so it can be concluded that there is a significant difference between the pitch of the Premium Economy and Economy classes.
–> In the third t-test the p-value is again less than 0.05, so the null hypothesis is rejected. The seat widths differ significantly, with Premium Economy seats wider than Economy seats on average.
–> The regression model accounts for approximately 86% of the variance in Premium Economy prices (adjusted R-squared 0.8548). The p-value of the overall F-test is below 2.2e-16, which indicates that the model as a whole is statistically significant. The corrgram also shows that the price of Premium Economy is correlated with the price of Economy, so it varies as the Economy price varies. Overall, the price difference between the two classes is highly varied and also depends on the flight duration.
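The share of variance explained is the R-squared value of the model. As a minimal sketch on synthetic data (the vectors `x` and `y` are assumptions for illustration), it can be computed from the residual and total sums of squares and checked against summary():

```r
set.seed(99)
# Hypothetical data; illustrative only.
x <- runif(50, 200, 3000)
y <- 1.2 * x + rnorm(50, sd = 300)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)    # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot

n <- length(y)
p <- 1                            # number of predictors
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

print(c(r2 = r2_manual, adj_r2 = adj_r2_manual))
print(c(r2 = summary(fit)$r.squared, adj_r2 = summary(fit)$adj.r.squared))
```

Applying the adjusted formula to the values in the regression output (R-squared 0.8564, 458 observations, 5 predictors) gives approximately 0.8548, which agrees with the reported adjusted R-squared.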
8. Limitation of the study.
- Accuracy of secondary data is not known.
- The dependability of the source has not been verified.
9. Scope for Further study.
This model focuses on only a few airlines and their respective qualities. Further studies could consider more airlines and carry out additional analyses to gain a better understanding of the model.