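# stringsAsFactors = TRUE keeps the text columns (Airline, Aircraft, TravelMonth, IsInternational) as factors in any R version
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)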
library(psych)
# keep only n, mean, sd, median, min and max
describe(airline.df)[, c("n", "mean", "sd", "median", "min", "max")]
## n mean sd median min max
## Airline* 458 3.01 1.65 2.00 1.00 6.00
## Aircraft* 458 1.67 0.47 2.00 1.00 2.00
## FlightDuration 458 7.58 3.54 7.79 1.25 14.66
## TravelMonth* 458 2.56 1.17 3.00 1.00 4.00
## IsInternational* 458 1.91 0.28 2.00 1.00 2.00
## SeatsEconomy 458 202.31 76.37 185.00 78.00 389.00
## SeatsPremium 458 33.65 13.26 36.00 8.00 66.00
## PitchEconomy 458 31.22 0.66 31.00 30.00 33.00
## PitchPremium 458 37.91 1.31 38.00 34.00 40.00
## WidthEconomy 458 17.84 0.56 18.00 17.00 19.00
## WidthPremium 458 19.47 1.10 19.00 17.00 21.00
## PriceEconomy 458 1327.08 988.27 1242.00 65.00 3593.00
## PricePremium 458 1845.26 1288.14 1737.00 86.00 7414.00
## PriceRelative 458 0.49 0.45 0.36 0.02 1.89
## SeatsTotal 458 235.96 85.29 227.00 98.00 441.00
## PitchDifference 458 6.69 1.76 7.00 2.00 10.00
## WidthDifference 458 1.63 1.19 1.00 0.00 4.00
## PercentPremiumSeats 458 14.65 4.84 13.21 4.71 24.69
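On average, Premium Economy seats are more expensive than Economy seats, as expected: the mean PricePremium is 1845.26 against a mean PriceEconomy of 1327.08 (medians 1737 and 1242).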
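The derived columns in the summary appear to be simple functions of the raw ones: for instance, 202.31 + 33.65 = 235.96 matches the mean SeatsTotal, and 37.91 - 31.22 = 6.69 matches the mean PitchDifference. The sketch below spells out that assumed construction, plus a per-flight price markup, on a small hypothetical data frame (`toy`). The formulas are our inference from the summary table, not documented definitions of the dataset.

```r
# Hypothetical toy data; the real values come from SixAirlinesDataV2.csv
toy <- data.frame(
  SeatsEconomy = c(122, 200, 300),
  SeatsPremium = c(28, 40, 60),
  PitchEconomy = c(31, 31, 32),
  PitchPremium = c(38, 37, 40),
  WidthEconomy = c(18, 17, 18),
  WidthPremium = c(19, 19, 21),
  PriceEconomy = c(1000, 1500, 800),
  PricePremium = c(1400, 1900, 1500)
)

# Assumed construction of the derived columns (inferred from the summary means)
toy <- within(toy, {
  SeatsTotal          <- SeatsEconomy + SeatsPremium
  PitchDifference     <- PitchPremium - PitchEconomy
  WidthDifference     <- WidthPremium - WidthEconomy
  PercentPremiumSeats <- 100 * SeatsPremium / SeatsTotal
})

# Per-flight premium over the Economy fare
toy$Markup <- toy$PricePremium - toy$PriceEconomy

# Share of flights where Premium Economy costs more (1 for this toy data)
mean(toy$PricePremium > toy$PriceEconomy)
```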
# Scatterplot Matrices from the car Package
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(~PricePremium+PriceEconomy+PitchDifference+WidthDifference, data=airline.df,
main="Premium Economy vs. Economy Airfares")
scatterplotMatrix(~PricePremium+PriceEconomy+SeatsTotal+PercentPremiumSeats, data=airline.df,
main="Premium Economy vs. Economy Airfares")
library(Hmisc)
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
colairlines <- c("PricePremium","PriceEconomy","PitchDifference","WidthDifference")
corMatrix <- rcorr(as.matrix(airline.df[,colairlines]))
corMatrix
## PricePremium PriceEconomy PitchDifference WidthDifference
## PricePremium 1.00 0.90 -0.02 -0.01
## PriceEconomy 0.90 1.00 -0.10 -0.08
## PitchDifference -0.02 -0.10 1.00 0.76
## WidthDifference -0.01 -0.08 0.76 1.00
##
## n= 458
##
##
## P
##                 PricePremium PriceEconomy PitchDifference WidthDifference
## PricePremium                 0.0000       0.6998          0.8059
## PriceEconomy    0.0000                    0.0332          0.0708
## PitchDifference 0.6998       0.0332                       0.0000
## WidthDifference 0.8059       0.0708       0.0000
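rcorr() pairs each Pearson correlation with a p-value for the null hypothesis of zero correlation. A minimal sketch of the standard t-test for a correlation coefficient is below, assuming rcorr uses this usual formula; `cor_p_value` is a hypothetical helper, checked against base R's cor.test() on simulated data.

```r
# Two-sided p-value for H0: rho = 0, given a Pearson correlation r and sample size n
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Check against cor.test() on simulated data
set.seed(1)
x <- rnorm(50)
y <- 0.3 * x + rnorm(50)
r <- cor(x, y)
c(by_hand = cor_p_value(r, 50), cor_test = cor.test(x, y)$p.value)

# Rounded r = -0.10 with n = 458 (PriceEconomy vs. PitchDifference):
# about 0.032, close to the 0.0332 above, which is based on the unrounded r
cor_p_value(-0.10, 458)
```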
colairlines2 <- c("PricePremium","PriceEconomy","SeatsTotal","PercentPremiumSeats")
corMatrix2 <- rcorr(as.matrix(airline.df[,colairlines2]))
corMatrix2
## PricePremium PriceEconomy SeatsTotal
## PricePremium 1.00 0.90 0.19
## PriceEconomy 0.90 1.00 0.13
## SeatsTotal 0.19 0.13 1.00
## PercentPremiumSeats 0.12 0.07 -0.22
## PercentPremiumSeats
## PricePremium 0.12
## PriceEconomy 0.07
## SeatsTotal -0.22
## PercentPremiumSeats 1.00
##
## n= 458
##
##
## P
##                     PricePremium PriceEconomy SeatsTotal
## PricePremium                     0.0000       0.0000
## PriceEconomy        0.0000                    0.0045
## SeatsTotal          0.0000       0.0045
## PercentPremiumSeats 0.0127       0.1628       0.0000
##                     PercentPremiumSeats
## PricePremium        0.0127
## PriceEconomy        0.1628
## SeatsTotal          0.0000
## PercentPremiumSeats
library(corrgram)
colairlines <- c("PricePremium","PriceEconomy","PitchDifference","WidthDifference","SeatsTotal","PercentPremiumSeats")
corrgram(airline.df[,colairlines], order=TRUE,
main="Premium Economy vs. Economy Airfares",
lower.panel=panel.pts, upper.panel=panel.pie,
diag.panel=panel.minmax, text.panel=panel.txt)
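In this first model we regress PricePremium on PriceEconomy and the remaining candidate predictors. Not every column is used: Airline, PriceRelative and the raw seat, pitch and width columns are left out, since the seat, pitch and width information enters through SeatsTotal, PercentPremiumSeats, PitchDifference and WidthDifference.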
Model1 <- PricePremium ~ PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats + SeatsTotal + IsInternational + TravelMonth + FlightDuration + Aircraft
fit1 <- lm(Model1, data = airline.df)
summary(fit1)
##
## Call:
## lm(formula = Model1, data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -977.2 -246.3 -47.9 135.2 3419.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.211e+03 1.755e+02 -6.898 1.82e-11 ***
## PriceEconomy 1.064e+00 3.114e-02 34.175 < 2e-16 ***
## PitchDifference 8.510e+01 3.913e+01 2.175 0.030163 *
## WidthDifference 1.240e+02 3.438e+01 3.607 0.000345 ***
## PercentPremiumSeats 3.177e+01 5.250e+00 6.052 3.04e-09 ***
## SeatsTotal 1.925e+00 3.360e-01 5.729 1.87e-08 ***
## IsInternationalInternational -7.537e+02 2.135e+02 -3.530 0.000458 ***
## TravelMonthJul -3.441e+01 7.074e+01 -0.486 0.626904
## TravelMonthOct 2.692e+01 6.036e+01 0.446 0.655795
## TravelMonthSep -2.097e+00 6.015e+01 -0.035 0.972203
## FlightDuration 8.455e+01 8.809e+00 9.598 < 2e-16 ***
## AircraftBoeing -2.082e+00 5.651e+01 -0.037 0.970625
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 480.7 on 446 degrees of freedom
## Multiple R-squared: 0.8641, Adjusted R-squared: 0.8607
## F-statistic: 257.7 on 11 and 446 DF, p-value: < 2.2e-16
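The F-statistic has 11 numerator degrees of freedom although Model1 names nine predictors, because lm() expands each factor into 0/1 dummy columns with the first level as the baseline: TravelMonth contributes TravelMonthJul, TravelMonthOct and TravelMonthSep (the remaining month is the baseline), IsInternational contributes one dummy and Aircraft contributes AircraftBoeing. The sketch below reproduces that treatment coding with model.matrix() on a small hypothetical data frame `d`.

```r
# Hypothetical mini data frame mirroring two of the factor columns in airline.df
d <- data.frame(
  FlightDuration = c(2.5, 7.0, 11.2, 5.1),
  TravelMonth    = factor(c("Aug", "Jul", "Oct", "Sep")),
  Aircraft       = factor(c("AirBus", "Boeing", "Boeing", "AirBus"))
)

# Treatment coding: the first (alphabetical) level of each factor is the baseline,
# and every other level gets its own 0/1 column
X <- model.matrix(~ FlightDuration + TravelMonth + Aircraft, data = d)
colnames(X)
X
```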
Next, we search for the best-fitting model using regsubsets() from the leaps package.
library(leaps)
## Warning: package 'leaps' was built under R version 3.4.3
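# note: nvmax defaults to 8, so subsets using more than 8 of Model1's 11 model-matrix columns are not searched
leap1 <- regsubsets(Model1, data = airline.df, nbest=1)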
# summary(leap1)
plot(leap1, scale="adjr2")
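regsubsets() finds the best model of each size over the model-matrix columns, and plot(leap1, scale = "adjr2") ranks those models by adjusted R². The sketch below is a deliberately naive, base-R version of the same idea: `best_subset_adjr2` is a hypothetical helper that fits every subset with lm() and keeps the one with the highest adjusted R². It treats each named predictor as a whole column and uses the built-in mtcars data as a stand-in, so it only illustrates the search and is practical only for a handful of predictors.

```r
# Exhaustive best-subset search by adjusted R^2 (hypothetical helper, base R only)
best_subset_adjr2 <- function(response, predictors, data) {
  best <- list(vars = character(0), adjr2 = -Inf)
  for (k in seq_along(predictors)) {
    # every subset of size k
    for (vars in combn(predictors, k, simplify = FALSE)) {
      f <- reformulate(vars, response = response)
      a <- summary(lm(f, data = data))$adj.r.squared
      if (a > best$adjr2) best <- list(vars = vars, adjr2 = a)
    }
  }
  best
}

# Example on the built-in mtcars data
preds <- c("wt", "hp", "qsec", "drat", "am")
best <- best_subset_adjr2("mpg", preds, mtcars)
best$vars
best$adjr2
```

Within one subset size, ranking by residual sum of squares and by adjusted R² agree, so the overall maximum here corresponds to the top row of an adjr2 plot.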
The best-fitting model (highest adjusted R²) excludes TravelMonth and Aircraft type (Boeing or Airbus). Therefore, in our next model we rerun the regression without these variables.
Model2 <- PricePremium ~ PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats + SeatsTotal + FlightDuration + IsInternational
fit2 <- lm(Model2, data = airline.df)
summary(fit2)
##
## Call:
## lm(formula = Model2, data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1010.0 -258.4 -49.9 133.6 3416.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.213e+03 1.695e+02 -7.156 3.40e-12 ***
## PriceEconomy 1.063e+00 3.077e-02 34.537 < 2e-16 ***
## PitchDifference 8.421e+01 3.656e+01 2.303 0.021722 *
## WidthDifference 1.224e+02 3.373e+01 3.629 0.000318 ***
## PercentPremiumSeats 3.190e+01 5.220e+00 6.112 2.14e-09 ***
## SeatsTotal 1.920e+00 3.241e-01 5.922 6.31e-09 ***
## FlightDuration 8.459e+01 8.507e+00 9.943 < 2e-16 ***
## IsInternationalInternational -7.412e+02 2.001e+02 -3.704 0.000238 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 479 on 450 degrees of freedom
## Multiple R-squared: 0.8638, Adjusted R-squared: 0.8617
## F-statistic: 407.9 on 7 and 450 DF, p-value: < 2.2e-16
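A complementary check that TravelMonth and Aircraft can be dropped is a partial F-test of the nested models, anova(fit2, fit1), which here has 4 and 446 degrees of freedom (three TravelMonth dummies plus AircraftBoeing). The sketch below computes that statistic by hand on the built-in mtcars data, our stand-in for the airline models, and compares it with anova().

```r
# Restricted and full models on mtcars (stand-ins for fit2 and fit1)
small <- lm(mpg ~ wt + hp, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

rss_small <- sum(residuals(small)^2)
rss_large <- sum(residuals(large)^2)
q      <- df.residual(small) - df.residual(large)   # number of extra coefficients
F_stat <- ((rss_small - rss_large) / q) / (rss_large / df.residual(large))
p_val  <- pf(F_stat, q, df.residual(large), lower.tail = FALSE)

c(F = F_stat, p = p_val)
anova(small, large)   # the second row reports the same F and Pr(>F)
```

A large p-value means the extra terms do not reduce the residual sum of squares by more than chance would suggest.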
We measure the difference in quality between Premium Economy and Economy in two ways: The difference in pitch (PitchDifference) measures the additional legroom in Premium Economy seats, compared to Economy seats. The difference in width (WidthDifference) measures the additional width of the seat in Premium Economy, compared to Economy.
Effect of Quality Difference:
Holding the Economy fare and the other variables constant, the Premium Economy fare:
1) increases as the difference in pitch between Premium Economy and Economy seats increases (about 84 per unit of PitchDifference);
2) increases as the difference in seat width between Premium Economy and Economy seats increases (about 122 per unit of WidthDifference).
Effect of the Supply of Premium Economy seats:
Holding the Economy fare and the other variables constant, the Premium Economy fare:
1) increases as the total number of seats in the airplane increases (about 1.9 per seat);
2) increases as the percentage of Premium Economy seats in the plane increases (about 32 per percentage point).
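Each of these statements is a ceteris paribus reading of a coefficient in fit2: with everything else held fixed, a one-unit change in a predictor shifts the predicted PricePremium by that predictor's coefficient. The sketch below demonstrates this with predict() on the built-in mtcars data as a stand-in; the two hypothetical cars differ only in hp.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars that are identical except that hp differs by 1
base    <- data.frame(wt = 3, hp = 150)
plusone <- data.frame(wt = 3, hp = 151)

# In a linear model without interactions, this difference equals the hp coefficient
delta <- predict(fit, plusone) - predict(fit, base)
c(delta = unname(delta), coef_hp = unname(coef(fit)["hp"]))
```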
library(coefplot)
coefplot(fit2, intercept= FALSE, outerCI=1.96,coefficients=c("PriceEconomy","PitchDifference", "WidthDifference", "PercentPremiumSeats", "SeatsTotal", "FlightDuration"))
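With outerCI = 1.96, the outer bars in coefplot() span each estimate ± 1.96 standard errors, an approximate 95% interval based on the normal distribution. The sketch below computes those bounds by hand for an lm fit on the built-in mtcars data (a stand-in) and compares them with confint(), which uses the t distribution with the residual degrees of freedom.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

# Bounds as drawn with outerCI = 1.96: estimate +/- 1.96 * SE
z_ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# confint() uses the t quantile with df.residual(fit) degrees of freedom
t_ci <- confint(fit, level = 0.95)

z_ci
t_ci

# About 1.965, so with fit2's 450 residual df the two intervals nearly coincide
qt(0.975, df = 450)
```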
# the adjusted R-squared for Model 2 is higher than for Model 1
summary(fit1)$adj.r.squared
## [1] 0.8607235
summary(fit2)$adj.r.squared
## [1] 0.861724
# the AIC for Model 2 is lower (better) than for Model 1
AIC(fit1)
## [1] 6970.166
AIC(fit2)
## [1] 6962.954
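Both criteria trade fit against the number of parameters: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) with p predictors, and AIC = -2 log L + 2k, where k counts the regression coefficients plus the error variance. As a quick check, 1 - (1 - 0.8641)(457/446) ≈ 0.8607, matching summary(fit1). The sketch below recomputes both quantities by hand for an lm fit on the built-in mtcars data (a stand-in) and compares them with R's own values.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n   <- nobs(fit)
p   <- length(coef(fit)) - 1          # predictors, excluding the intercept
rss <- sum(residuals(fit)^2)
r2  <- summary(fit)$r.squared

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Gaussian log-likelihood at the maximum-likelihood variance estimate rss / n
loglik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
k      <- length(coef(fit)) + 1       # coefficients plus the error variance
aic    <- -2 * loglik + 2 * k

c(adj_r2 = adj_r2, summary = summary(fit)$adj.r.squared)
c(aic = aic, AIC = AIC(fit))
```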
Thus, with a higher adjusted R² and a lower AIC, Model 2 is our ‘best’ ordinary least squares model. It predicts the price of the Premium Economy seat, PricePremium, as a function of the explanatory variables PriceEconomy, PitchDifference, WidthDifference, PercentPremiumSeats, SeatsTotal, FlightDuration and IsInternational.
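Written out with the estimates from summary(fit2), rounded to four significant figures, the fitted Model 2 is shown below; International is shorthand for the indicator behind the IsInternationalInternational coefficient (1 when IsInternational is International, 0 otherwise).

$$
\begin{aligned}
\widehat{\mathit{PricePremium}} = {} & -1213 + 1.063\,\mathit{PriceEconomy} + 84.21\,\mathit{PitchDifference} + 122.4\,\mathit{WidthDifference} \\
& + 31.90\,\mathit{PercentPremiumSeats} + 1.920\,\mathit{SeatsTotal} + 84.59\,\mathit{FlightDuration} - 741.2\,\mathit{International}
\end{aligned}
$$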