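Summary statistics (psych::describe(); variables marked with * are factors summarized by their integer level codes, so statistics such as their means are not meaningful):
library(psych)
describe(airlines)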
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
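The last five variables appear to be derived from the others. The means above are consistent with SeatsTotal = SeatsEconomy + SeatsPremium (202.31 + 33.65 = 235.96), PitchDifference = PitchPremium - PitchEconomy and WidthDifference = WidthPremium - WidthEconomy. The formulas for PercentPremiumSeats and PriceRelative in the sketch below are assumptions, not taken from the source. The helper `add_derived_columns()` and the toy data are hypothetical and only illustrate how such columns could be built:

```r
# Hypothetical helper: rebuild the derived columns from the raw ones.
# SeatsTotal, PitchDifference and WidthDifference match the reported means;
# the PercentPremiumSeats and PriceRelative formulas are ASSUMPTIONS.
add_derived_columns <- function(df) {
  df$SeatsTotal      <- df$SeatsEconomy + df$SeatsPremium
  df$PitchDifference <- df$PitchPremium - df$PitchEconomy
  df$WidthDifference <- df$WidthPremium - df$WidthEconomy
  # Assumed: share of premium-economy seats, in percent
  df$PercentPremiumSeats <- 100 * df$SeatsPremium / df$SeatsTotal
  # Assumed: premium-economy markup relative to the economy price
  df$PriceRelative <- (df$PricePremium - df$PriceEconomy) / df$PriceEconomy
  df
}

# Two made-up flights (not from the airlines data)
toy <- data.frame(
  SeatsEconomy = c(180, 300),  SeatsPremium = c(20, 60),
  PitchEconomy = c(31, 32),    PitchPremium = c(38, 38),
  WidthEconomy = c(18, 17),    WidthPremium = c(19, 19),
  PriceEconomy = c(1000, 500), PricePremium = c(1500, 600)
)
add_derived_columns(toy)
```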
Box plots of the economy and premium-economy prices, to visualize each distribution separately:
par(mfrow=c(1,2))
boxplot(airlines$PriceEconomy)
boxplot(airlines$PricePremium)
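To list the outlying prices numerically instead of reading them off the plot, base R's `boxplot.stats()` applies the same rule as `boxplot()`: values more than 1.5 times the box length beyond the box are returned in `$out`. A minimal sketch on a made-up vector; the same call works on airlines$PriceEconomy or airlines$PricePremium:

```r
# boxplot.stats() uses the same rule as boxplot(): values lying more than
# coef * (box length) beyond the box (default coef = 1.5) are returned in $out.
prices <- c(1:10, 100)          # made-up values, not from the airlines data
stats <- boxplot.stats(prices)
stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out     # points drawn individually as outliers
```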
Scatter plots to see how pairs of variables are related:
par(mfrow=c(1,2))
plot(airlines$FlightDuration,airlines$PriceRelative)
plot(airlines$WidthDifference,airlines$PercentPremiumSeats)
Corrgram (requires the corrgram package):
library(corrgram)
corrgram(airlines, order = TRUE, lower.panel = panel.shade,
         upper.panel = panel.pie, text.panel = panel.txt,
         diag.panel = panel.minmax,
         main = "Corrgram of Premium vs Economy")
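The corrgram shows the correlations visually. A small helper that lists the strongest pairwise Pearson correlations among the numeric columns can complement it; `top_correlations()` is hypothetical (not part of the corrgram package) and is shown here on simulated data:

```r
# Hypothetical helper: list the k strongest pairwise correlations
# among the numeric columns of a data frame.
top_correlations <- function(df, k = 5) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  r <- cor(num)
  idx <- which(upper.tri(r), arr.ind = TRUE)   # each pair once, no diagonal
  out <- data.frame(
    var1 = colnames(r)[idx[, 1]],
    var2 = colnames(r)[idx[, 2]],
    r    = r[idx]
  )
  out <- out[order(-abs(out$r)), ]             # strongest first, either sign
  head(out, k)
}

set.seed(1)
x <- rnorm(50)
demo <- data.frame(a = x, b = 2 * x + 1, c = rnorm(50), label = "z")
top_correlations(demo, k = 2)
```

Non-numeric columns (such as Airline or Aircraft in the real data) are dropped before `cor()` is called.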
Variance-covariance matrix of FlightDuration (column 3) and the numeric columns 6 to 18:
data <- airlines[, c(3, 6:18)]
var(data)
## FlightDuration SeatsEconomy SeatsPremium
## FlightDuration 12.5462183 52.9194291 7.57372426
## SeatsEconomy 52.9194291 5832.9154300 633.07060954
## SeatsPremium 7.5737243 633.0706095 175.86521648
## PitchEconomy 0.6817421 7.2117665 -0.29725856
## PitchPremium 0.4477835 11.9637325 0.08508595
## WidthEconomy 0.9014224 15.9105138 3.36977440
## WidthPremium 0.4019845 8.5832800 -0.03954019
## PriceEconomy 1983.5401655 9673.7944684 1489.38359627
## PricePremium 2959.9783043 17413.2541733 3717.36428960
## PriceRelative 0.1932368 0.1361699 -0.58078765
## SeatsTotal 60.4931534 6465.9860396 808.93582602
## PitchDifference -0.2339587 4.7519660 0.38234451
## WidthDifference -0.4994380 -7.3272338 -3.40931459
## PercentPremiumSeats 1.0379912 -122.3914537 31.14753127
## PitchEconomy PitchPremium WidthEconomy WidthPremium
## FlightDuration 0.6817421 0.44778348 0.90142242 0.40198446
## SeatsEconomy 7.2117665 11.96373253 15.91051379 8.58327998
## SeatsPremium -0.2972586 0.08508595 3.36977440 -0.03954019
## PitchEconomy 0.4292471 -0.47398546 0.10756500 -0.38766208
## PitchPremium -0.4739855 1.72639580 -0.01739081 1.08157435
## WidthEconomy 0.1075650 -0.01739081 0.31081765 0.05010845
## WidthPremium -0.3876621 1.08157435 0.05010845 1.20378776
## PriceEconomy 238.7031905 65.42513354 37.46095191 -61.85450011
## PricePremium 190.8517195 149.85356368 108.11611707 90.47997668
## PriceRelative -0.1248808 0.24719874 -0.01104335 0.24928593
## SeatsTotal 6.9145079 12.04881848 19.28028819 8.54373979
## PitchDifference -0.9032326 2.20038126 -0.12495581 1.46923643
## WidthDifference -0.4952271 1.09896515 -0.26070920 1.15367930
## PercentPremiumSeats -0.3261739 -1.11655834 0.61321816 -0.97393787
## PriceEconomy PricePremium PriceRelative
## FlightDuration 1983.54017 2959.97830 0.19323683
## SeatsEconomy 9673.79447 17413.25417 0.13616991
## SeatsPremium 1489.38360 3717.36429 -0.58078765
## PitchEconomy 238.70319 190.85172 -0.12488080
## PitchPremium 65.42513 149.85356 0.24719874
## WidthEconomy 37.46095 108.11612 -0.01104335
## WidthPremium -61.85450 90.47998 0.24928593
## PriceEconomy 976684.06198 1147494.76801 -128.49991725
## PricePremium 1147494.76801 1659293.11947 18.48428836
## PriceRelative -128.49992 18.48429 0.20302893
## SeatsTotal 11163.17806 21130.61846 -0.44461774
## PitchDifference -173.27806 -40.99816 0.37207954
## WidthDifference -99.31545 -17.63614 0.26032928
## PercentPremiumSeats 312.61077 726.01582 -0.35252750
## SeatsTotal PitchDifference WidthDifference
## FlightDuration 60.4931534 -0.2339587 -0.4994380
## SeatsEconomy 6465.9860396 4.7519660 -7.3272338
## SeatsPremium 808.9358260 0.3823445 -3.4093146
## PitchEconomy 6.9145079 -0.9032326 -0.4952271
## PitchPremium 12.0488185 2.2003813 1.0989652
## WidthEconomy 19.2802882 -0.1249558 -0.2607092
## WidthPremium 8.5437398 1.4692364 1.1536793
## PriceEconomy 11163.1780647 -173.2780570 -99.3154520
## PricePremium 21130.6184629 -40.9981558 -17.6361404
## PriceRelative -0.4446177 0.3720795 0.2603293
## SeatsTotal 7274.9218656 5.1343105 -10.7365484
## PitchDifference 5.1343105 3.1036138 1.5941922
## WidthDifference -10.7365484 1.5941922 1.4143885
## PercentPremiumSeats -91.2439224 -0.7903844 -1.5871560
## PercentPremiumSeats
## FlightDuration 1.0379912
## SeatsEconomy -122.3914537
## SeatsPremium 31.1475313
## PitchEconomy -0.3261739
## PitchPremium -1.1165583
## WidthEconomy 0.6132182
## WidthPremium -0.9739379
## PriceEconomy 312.6107669
## PricePremium 726.0158229
## PriceRelative -0.3525275
## SeatsTotal -91.2439224
## PitchDifference -0.7903844
## WidthDifference -1.5871560
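## PercentPremiumSeats 23.4493343
Covariance matrix: for a data frame, cov() returns exactly the same matrix as var(), so the result is identical to the variance-covariance matrix above.
cov(data)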
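Because the variables are measured in very different units, the covariance matrix is hard to compare across pairs. `stats::cov2cor()` rescales it into a correlation matrix. A minimal check on simulated data (not the airlines data) that `var()` and `cov()` agree and that `cov2cor()` matches `cor()`:

```r
# var() and cov() return the same variance-covariance matrix for a data frame;
# cov2cor() divides each entry by the product of the two standard deviations.
set.seed(42)
sim <- data.frame(hours = runif(100, 1, 15))          # simulated durations
sim$price <- 60 + 235 * sim$hours + rnorm(100, sd = 900)

v <- var(sim)
stopifnot(isTRUE(all.equal(v, cov(sim))))            # identical matrices
corr <- cov2cor(v)
print(round(corr, 3))
stopifnot(isTRUE(all.equal(corr, cor(sim))))         # same as cor()
```

On the real data, cov2cor(var(data)) gives the numeric correlations behind the corrgram.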
Consider two hypotheses:
1) The premium-economy ticket price (PricePremium) is independent of flight duration.
2) The relative price of tickets (PriceRelative) is independent of flight duration.
Welch two-sample t-tests, Hypothesis 1:
t.test(airlines$PricePremium,airlines$FlightDuration)
##
## Welch Two Sample t-test
##
## data: airlines$PricePremium and airlines$FlightDuration
## t = 30.531, df = 457.01, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1719.395 1955.965
## sample estimates:
## mean of x mean of y
## 1845.257642 7.577838
Hypothesis 2:
t.test(airlines$PriceRelative,airlines$FlightDuration)
##
## Welch Two Sample t-test
##
## data: airlines$PriceRelative and airlines$FlightDuration
## t = -42.499, df = 471.79, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.418482 -6.762785
## sample estimates:
## mean of x mean of y
## 0.4872052 7.5778384
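Both tests give p-values < 0.05, so the null hypothesis of equal means is rejected. However, a two-sample t-test only compares the means of two variables, and here they are measured in different units (prices or a price ratio versus hours), so this result says nothing about independence. The regression slope tests below are the relevant tests of association.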
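A test that does address independence between two numeric variables is a correlation test. With a single predictor, the t statistic of a Pearson `cor.test()` equals the t value of the slope reported by `lm()`, so both lead to the same conclusion. A minimal sketch on simulated data; with the real data the call would be cor.test(airlines$PricePremium, airlines$FlightDuration):

```r
# Pearson correlation test vs. the slope t-test of a simple regression:
# both test H0 "no linear association" and give the same t statistic.
set.seed(7)
hours <- runif(200, 1, 15)                         # simulated flight durations
price <- 57 + 236 * hours + rnorm(200, sd = 980)   # simulated prices

ct  <- cor.test(price, hours)                      # Pearson by default
fit <- lm(price ~ hours)
slope_t <- summary(fit)$coefficients["hours", "t value"]

c(cor_test_t = unname(ct$statistic), lm_slope_t = slope_t)
```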
Regression models, Hypothesis 1:
summary(lm(PricePremium ~ FlightDuration, data = airlines))
##
## Call:
## lm(formula = PricePremium ~ FlightDuration, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2292.5 -664.7 -103.8 803.0 4093.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.45 108.39 0.53 0.596
## FlightDuration 235.93 12.96 18.20 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 981.4 on 456 degrees of freedom
## Multiple R-squared: 0.4209, Adjusted R-squared: 0.4196
## F-statistic: 331.4 on 1 and 456 DF, p-value: < 2.2e-16
Inferences: 1) The regression coefficient of 235.93 is significantly greater than 0 (p < 2e-16): each additional hour of flight duration is associated with an expected increase of 235.93 in the premium-economy price.
2) Multiple R-squared indicates that the model accounts for 42.09% of the variance in premium-economy prices.
3) The residual standard error (981.4) is the typical size of the error when predicting premium-economy prices from flight duration.
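Both quantities in these inferences come from the residuals: for a simple regression, RSE = sqrt(RSS / (n - 2)) and R-squared = 1 - RSS / TSS. A minimal check on simulated data (not the airlines data) that these formulas reproduce what `summary()` reports:

```r
# Recompute the residual standard error and R-squared of a simple regression
# by hand and compare them with summary.lm() (simulated data).
set.seed(3)
hours <- runif(100, 1, 15)
price <- 57 + 236 * hours + rnorm(100, sd = 980)
fit <- lm(price ~ hours)

rss <- sum(residuals(fit)^2)              # residual sum of squares
tss <- sum((price - mean(price))^2)       # total sum of squares
n   <- length(price)

rse_manual <- sqrt(rss / (n - 2))         # n - 2: intercept and slope estimated
r2_manual  <- 1 - rss / tss

c(rse = rse_manual, r2 = r2_manual)
```

The 456 degrees of freedom in the output above are n - 2 with n = 458.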
Hypothesis 2:
summary(lm(PriceRelative~FlightDuration,data=airlines))
##
## Call:
## lm(formula = PriceRelative ~ FlightDuration, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5507 -0.3373 -0.1167 0.2363 1.4694
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.370491 0.049454 7.492 3.56e-13 ***
## FlightDuration 0.015402 0.005913 2.605 0.0095 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4478 on 456 degrees of freedom
## Multiple R-squared: 0.01466, Adjusted R-squared: 0.0125
## F-statistic: 6.784 on 1 and 456 DF, p-value: 0.009498
Inferences: 1) The regression coefficient of 0.015402 is significantly greater than 0 (p = 0.0095): each additional hour of flight duration is associated with an expected increase of 0.015402 in the relative price.
2) Multiple R-squared indicates that the model accounts for only 1.47% of the variance in the relative price.
3) The residual standard error (0.4478) is the typical size of the error when predicting the relative price from flight duration.
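The two R-squared values can be cross-checked against the covariance matrix: with a single predictor, R-squared equals the squared Pearson correlation r = Cov(x, y) / sqrt(Var(x) Var(y)). A short check using entries copied from the var(data) output above:

```r
# Covariances and variances copied from the var(data) output above
var_fd <- 12.5462183                            # Var(FlightDuration)
pairs <- data.frame(
  response    = c("PricePremium", "PriceRelative"),
  cov_xy      = c(2959.9783043, 0.19323683),    # Cov(response, FlightDuration)
  var_y       = c(1659293.11947, 0.20302893),   # Var(response)
  reported_r2 = c(0.4209, 0.01466)              # Multiple R-squared from summary(lm())
)
pairs$r  <- pairs$cov_xy / sqrt(var_fd * pairs$var_y)  # Pearson correlation
pairs$r2 <- pairs$r^2                                  # equals R-squared with one predictor
print(pairs[, c("response", "r", "r2", "reported_r2")], digits = 4)
```

The correlation between PriceRelative and FlightDuration is only about 0.12, which is why the slope is significant yet explains so little variance.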
Fitted regression models:
1) PricePremium = 57.45 + 235.93 * FlightDuration
2) PriceRelative = 0.370491 + 0.015402 * FlightDuration
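The low p-values show that both slopes differ significantly from zero; they do not by themselves make the models good predictors, since model 2 explains less than 2% of the variance.
Thus both null hypotheses of independence are rejected by the regression slope tests: premium-economy price and relative price are both associated with flight duration.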
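The two fitted equations can be evaluated directly for a given flight duration. A minimal sketch with a hypothetical helper `predict_from_duration()`, using the rounded coefficients from the summaries; for a 10-hour flight, model 1 gives 57.45 + 235.93 * 10 = 2416.75 and model 2 gives 0.370491 + 0.015402 * 10 = 0.524511:

```r
# Hypothetical helper: evaluate the two fitted equations from the summaries above.
# Only meaningful within the observed range of FlightDuration (1.25 to 14.66 hours).
predict_from_duration <- function(hours) {
  data.frame(
    FlightDuration = hours,
    PricePremium   = 57.45 + 235.93 * hours,       # model 1
    PriceRelative  = 0.370491 + 0.015402 * hours   # model 2
  )
}

predict_from_duration(c(2, 10))
```

With the fitted objects themselves, predict(fit, newdata = data.frame(FlightDuration = c(2, 10))) gives the same values using the unrounded coefficients.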
Finding which factors affect the relative price difference between economy and premium-economy tickets:
summary(lm(PriceRelative~SeatsEconomy+FlightDuration+SeatsPremium+PitchDifference+WidthDifference+PercentPremiumSeats,data=airlines))
##
## Call:
## lm(formula = PriceRelative ~ SeatsEconomy + FlightDuration +
## SeatsPremium + PitchDifference + WidthDifference + PercentPremiumSeats,
## data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81222 -0.28724 -0.03929 0.15184 1.13902
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1070373 0.2013753 -0.532 0.595312
## SeatsEconomy -0.0003348 0.0008855 -0.378 0.705549
## FlightDuration 0.0228284 0.0053223 4.289 2.20e-05 ***
## SeatsPremium 0.0004936 0.0053482 0.092 0.926512
## PitchDifference 0.0629304 0.0163747 3.843 0.000139 ***
## WidthDifference 0.1107340 0.0259805 4.262 2.47e-05 ***
## PercentPremiumSeats -0.0088310 0.0126109 -0.700 0.484119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3815 on 451 degrees of freedom
## Multiple R-squared: 0.2927, Adjusted R-squared: 0.2833
## F-statistic: 31.11 on 6 and 451 DF, p-value: < 2.2e-16
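The analysis shows that the relative price of premium-economy versus economy tickets is mainly associated with FlightDuration, PitchDifference and WidthDifference (all p < 0.001, all with positive coefficients), while SeatsEconomy, SeatsPremium and PercentPremiumSeats are not significant. Together the six predictors explain about 29% of the variance in relative price (R-squared = 0.2927).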
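Whether the three non-significant predictors can be dropped together can be checked with a nested-model F-test via `anova()`. A minimal sketch on simulated data in which only x1 and x2 affect the response; with the real data the reduced model would be PriceRelative ~ FlightDuration + PitchDifference + WidthDifference, compared against the full model above:

```r
# Nested-model F-test: does dropping a group of predictors significantly worsen the fit?
set.seed(11)
n  <- 300
d  <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 0.5 * d$x1 + 0.3 * d$x2 + rnorm(n)   # x3 and x4 have no effect by construction

full    <- lm(y ~ x1 + x2 + x3 + x4, data = d)
reduced <- lm(y ~ x1 + x2, data = d)

cmp <- anova(reduced, full)   # row 2 tests the two extra coefficients jointly
print(cmp)
# A large p-value in row 2 means the simpler model fits about as well.
c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared)
```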