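# Read the data; stringsAsFactors = TRUE keeps text columns as factors (not the default in R >= 4.0)
air <- read.csv("SixAirlinesDatav2.csv", stringsAsFactors = TRUE)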
str(air)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
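The last five columns look as if they are derived from the raw seat and price columns. The sketch below checks that reading against the first row printed by `str()` above. The formulas are an assumption inferred from the printed values, not something the data set documents, and the variable names are made up for this illustration.

```r
# Values of the first row, copied from the str() output above
seats_economy <- 122;  seats_premium <- 40
pitch_economy <- 31;   pitch_premium <- 38
width_economy <- 18;   width_premium <- 19
price_economy <- 2707; price_premium <- 3725

# Assumed definitions of the derived columns (inferred, not documented)
seats_total    <- seats_economy + seats_premium
pitch_diff     <- pitch_premium - pitch_economy
width_diff     <- width_premium - width_economy
pct_premium    <- 100 * seats_premium / seats_total
price_relative <- (price_premium - price_economy) / price_economy

# Compare with the printed row: 162, 7, 1, 24.7 and 0.38
print(c(SeatsTotal = seats_total,
        PitchDifference = pitch_diff,
        WidthDifference = width_diff,
        PercentPremiumSeats = round(pct_premium, 2),
        PriceRelative = round(price_relative, 2)))
```

If these definitions hold, PriceRelative is the premium mark-up expressed as a fraction of the economy price, so a value of 1 means the premium ticket costs twice the economy ticket.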
Data set Description
library(psych)
describe(air)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
Data set Summary
summary(air)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
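In the `describe()` output, the rows marked with `*` are factors that psych converts to integer level codes before computing statistics, so their means are not meaningful quantities. A minimal sketch, using only the level counts printed by `summary()` above, shows where the values 3.01 and 1.67 come from. The helper `mean_code` is a hypothetical name introduced for this example.

```r
# Level counts copied from the summary() output above
airline_counts  <- c(AirFrance = 74, British = 175, Delta = 46,
                     Jet = 61, Singapore = 40, Virgin = 62)
aircraft_counts <- c(AirBus = 151, Boeing = 307)

# Mean of the integer level codes 1..k, weighted by how often each level occurs
mean_code <- function(counts) sum(seq_along(counts) * counts) / sum(counts)

airline_mean  <- mean_code(airline_counts)
aircraft_mean <- mean_code(aircraft_counts)

# These should match the Airline* and Aircraft* means reported by describe()
print(round(c(Airline = airline_mean, Aircraft = aircraft_mean), 2))
```

Because the codes follow the alphabetical order of the levels, statistics such as mean, skew and kurtosis for the starred rows are best ignored; the counts in `summary()` are the useful description of these columns.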
plot(air$IsInternational,main = "domestic vs international flights",col="grey")
plot(air$TravelMonth,main = "Monthly Travel",col="grey", xlab="month", ylab="count")
plot(air$Airline, air$SeatsEconomy, main="Airline and The Total Number of seats in Economy Class",col = c("light blue","light green","orange","cyan","grey","light yellow"))
plot(air$Airline, air$SeatsPremium, main="Airline and The Total Number of seats in Premium Class",col = c("light blue","light green","orange","cyan","grey","light yellow"))
boxplot(FlightDuration~Aircraft,data=air,xlab="Aircraft type", ylab="Flight duration",col = c("light blue","light green"))
par(mfrow=c(1,2))
hist(air$WidthEconomy, xlab="Economy Seats Width",col = "violet",main="Economy class")
hist(air$WidthPremium, xlab="Premium Seats Width",col = "grey",main="Premium class")
par(mfrow=c(1,2))
hist(air$PitchEconomy, xlab="Economy Seats Pitch",col = "light blue",main="Economy class ")
hist(air$PitchPremium, xlab="Premium Seats Pitch",col = "green",main="Premium class ")
par(mfrow=c(1,2))
hist(air$PriceEconomy, xlab="Economy Seats Price",col = "orange",main="Economy class")
hist(air$PricePremium, xlab="Premium Seats Price",col = "light green",main="Premium class")
aggregate(air$PricePremium~air$Airline, FUN=mean)
## air$Airline air$PricePremium
## 1 AirFrance 3065.2162
## 2 British 1937.0286
## 3 Delta 684.6739
## 4 Jet 483.3607
## 5 Singapore 1239.9250
## 6 Virgin 2721.6935
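As a consistency check, the per-airline means can be combined back into the overall mean of PricePremium by weighting each airline by its number of rows. The sketch below uses only numbers printed in this report (the means above and the airline counts from `summary()`); the variable names are illustrative.

```r
# Mean premium price per airline, copied from the aggregate() output above
premium_means <- c(AirFrance = 3065.2162, British = 1937.0286, Delta = 684.6739,
                   Jet = 483.3607, Singapore = 1239.9250, Virgin = 2721.6935)

# Number of rows per airline, copied from the summary() output
airline_counts <- c(AirFrance = 74, British = 175, Delta = 46,
                    Jet = 61, Singapore = 40, Virgin = 62)

# Weighted mean of the group means; should equal the overall mean (1845.26)
overall_premium <- weighted.mean(premium_means, airline_counts)
print(round(overall_premium, 2))

# The unweighted mean of the six group means is a different quantity
print(round(mean(premium_means), 2))
```

The same table can be produced with cleaner column names by passing the data frame explicitly, for example `aggregate(PricePremium ~ Airline, data = air, FUN = mean)`.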
At higher percentages of premium seats, relative pricing seems to be lower in general.
aggregate(air$PriceRelative~air$PercentPremiumSeats, FUN = mean)
## air$PercentPremiumSeats air$PriceRelative
## 1 4.71 0.95263158
## 2 8.90 0.20928571
## 3 9.76 0.80000000
## 4 10.00 1.32000000
## 5 10.57 0.60416667
## 6 11.43 1.15370370
## 7 12.12 0.03000000
## 8 12.28 0.08590909
## 9 12.50 0.23125000
## 10 12.82 0.07200000
## 11 12.90 0.50414634
## 12 13.04 0.06200000
## 13 13.13 0.10750000
## 14 13.21 0.34958333
## 15 14.02 0.61111111
## 16 14.50 0.09500000
## 17 14.97 0.74000000
## 18 14.99 0.31000000
## 19 15.02 1.03523810
## 20 15.36 0.32211538
## 21 16.46 0.09000000
## 22 16.87 0.39625000
## 23 18.73 0.73500000
## 24 20.41 0.06125000
## 25 20.60 0.44666667
## 26 23.49 0.41600000
## 27 24.69 0.42254902
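With 27 distinct percentages the table is hard to read by eye. One way to summarize it, sketched here under stated assumptions, is to group the percentages into a few bins and average the group means within each bin. The cut points (12% and 16%) are arbitrary choices made for this illustration, and the averages are not weighted by the number of flights behind each row, so this is only a rough reading of the table.

```r
# Both columns copied from the aggregate() output above
pct <- c(4.71, 8.90, 9.76, 10.00, 10.57, 11.43, 12.12, 12.28, 12.50,
         12.82, 12.90, 13.04, 13.13, 13.21, 14.02, 14.50, 14.97, 14.99,
         15.02, 15.36, 16.46, 16.87, 18.73, 20.41, 20.60, 23.49, 24.69)
rel <- c(0.95263158, 0.20928571, 0.80000000, 1.32000000, 0.60416667,
         1.15370370, 0.03000000, 0.08590909, 0.23125000, 0.07200000,
         0.50414634, 0.06200000, 0.10750000, 0.34958333, 0.61111111,
         0.09500000, 0.74000000, 0.31000000, 1.03523810, 0.32211538,
         0.09000000, 0.39625000, 0.73500000, 0.06125000, 0.44666667,
         0.41600000, 0.42254902)

# Arbitrary bins for the share of premium seats (an assumption of this sketch)
bins <- cut(pct, breaks = c(0, 12, 16, 25),
            labels = c("up to 12%", "12 to 16%", "above 16%"))

bin_sizes <- table(bins)               # rows of the table falling in each bin
bin_means <- tapply(rel, bins, mean)   # unweighted mean relative price per bin
print(bin_sizes)
print(round(bin_means, 2))

# Rank correlation across the 27 rows as a single-number summary
rho <- cor(pct, rel, method = "spearman")
print(round(rho, 2))
```

With these cut points the lowest bin has the highest average relative price, which is in line with the observation above, although the pattern across the other two bins is not monotone.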
Greater perks appear to go with higher relative pricing. A width difference of 4 came with premium prices that were almost double the economy prices.
library(lattice)
bwplot(PriceRelative~Airline|IsInternational, data=air)
bwplot(PriceRelative~Aircraft|IsInternational, data=air)
bwplot(PriceRelative~TravelMonth, data=air)
Relative prices were higher for Boeing than for Airbus.
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
par(mfrow=c(1,2))
with(air,plot(Airline,PriceEconomy, xlab = "Airline", ylab="Price Economy"))
with(air,plot(Airline,PricePremium,xlab = "Airline", ylab="Price Premium"))
par(mfrow=c(1,2))
with(air,plot(FlightDuration,PriceEconomy))
with(air,plot(FlightDuration,PricePremium))
We notice that flights of up to about 4 hours have low prices, while prices for flights of 5 to 14 hours are widely spread.
par(mfrow=c(1,2))
with(air,plot(TravelMonth,PriceEconomy, ylab = "Economy Price"))
with(air,plot(TravelMonth,PricePremium, ylab="Premium Price"))
Pricing for the month of July has a lower mean than for the other months.
pairs(formula=~PriceRelative+PitchDifference, data=air)
library(corrgram)
corrgram(air, order=NULL, panel=panel.cor, text.panel=panel.txt, main="Corrgram")
corrgram(air, order=TRUE, upper.panel=panel.pie,lower.panel=panel.shade, text.panel=panel.txt,main="Corrgram")
This supports the hypothesis that premium seats with more perks are priced higher relative to economy tickets.
Using the probable factors, a regression model is proposed
T-test hypotheses (null):
H1: There is no relation between relative price and width difference.
H2: There is no relation between relative price and pitch difference.
fit<-lm(PriceRelative~WidthDifference+PitchDifference, data=air)
summary(fit)
##
## Call:
## lm(formula = PriceRelative ~ WidthDifference + PitchDifference,
## data = air)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.84163 -0.28484 -0.07241 0.17698 1.18778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10514 0.08304 -1.266 0.206077
## WidthDifference 0.11621 0.02356 4.933 1.14e-06 ***
## PitchDifference 0.06019 0.01590 3.785 0.000174 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3886 on 455 degrees of freedom
## Multiple R-squared: 0.2593, Adjusted R-squared: 0.2561
## F-statistic: 79.65 on 2 and 455 DF, p-value: < 2.2e-16
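To make the coefficient table easier to read, the sketch below rebuilds the t values and p-values from the printed estimates and standard errors, and then uses the fitted equation for one example seat configuration. All inputs are copied from the output above; the example configuration (width difference 1, pitch difference 7) is simply the first row of the data, and the variable names are illustrative.

```r
# Estimates and standard errors copied from the summary(fit) output above
est <- c(WidthDifference = 0.11621, PitchDifference = 0.06019)
se  <- c(WidthDifference = 0.02356, PitchDifference = 0.01590)
intercept <- -0.10514
df_resid  <- 455   # residual degrees of freedom reported by summary(fit)

# t value = estimate / standard error; two-sided p-value from the t distribution
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)
print(round(t_val, 3))
print(signif(p_val, 3))

# Predicted relative price for a width difference of 1 and a pitch difference of 7
pred <- intercept + est[["WidthDifference"]] * 1 + est[["PitchDifference"]] * 7
print(round(pred, 4))
```

Read this way, each extra unit of width difference is associated with an increase of about 0.12 in relative price and each extra unit of pitch difference with about 0.06, holding the other variable fixed.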
fit<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference, data=air)
summary(fit)
##
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference +
## PitchDifference, data = air)
##
## Residuals:
## Min 1Q Median 3Q Max
## -809.9 -325.8 -97.1 176.3 3470.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.37619 125.69676 -0.266 0.7907
## PriceEconomy 1.18456 0.02623 45.152 <2e-16 ***
## WidthDifference 26.25535 33.43032 0.785 0.4326
## PitchDifference 39.43892 22.59939 1.745 0.0816 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 551.5 on 454 degrees of freedom
## Multiple R-squared: 0.8179, Adjusted R-squared: 0.8167
## F-statistic: 679.9 on 3 and 454 DF, p-value: < 2.2e-16
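For the relative-price model, both null hypotheses are rejected, since the p-values for WidthDifference and PitchDifference are far below 0.05. Relative price is therefore significantly related to the width and pitch differences, although the two together explain only about 26% of its variance (R-squared 0.2593). In the premium-price model, once PriceEconomy is included, neither difference is significant at the 0.05 level.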
fit1<-lm(air$PriceRelative~air$TravelMonth)
summary(fit1)
##
## Call:
## lm(formula = air$PriceRelative ~ air$TravelMonth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4908 -0.3779 -0.1179 0.2523 1.4321
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.47661 0.04005 11.899 <2e-16 ***
## air$TravelMonthJul 0.02205 0.06574 0.335 0.737
## air$TravelMonthOct 0.04417 0.05665 0.780 0.436
## air$TravelMonthSep -0.01871 0.05643 -0.332 0.740
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4514 on 454 degrees of freedom
## Multiple R-squared: 0.002997, Adjusted R-squared: -0.003591
## F-statistic: 0.4549 on 3 and 454 DF, p-value: 0.714
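Because TravelMonth is a factor, `lm()` reports the intercept as the mean for the reference level (Aug, the first level alphabetically) and the other coefficients as differences from it. The sketch below turns the printed coefficients back into per-month mean relative prices and checks them against the overall mean, using the month counts from `summary()`. The variable names are illustrative.

```r
# Coefficients copied from the summary(fit1) output above
coefs <- c(Intercept = 0.47661, Jul = 0.02205, Oct = 0.04417, Sep = -0.01871)

# Rows per month, copied from the summary(air) output
month_counts <- c(Aug = 127, Jul = 75, Oct = 127, Sep = 129)

# Mean relative price per month: reference level plus the month's offset
month_means <- c(Aug = coefs[["Intercept"]],
                 Jul = coefs[["Intercept"]] + coefs[["Jul"]],
                 Oct = coefs[["Intercept"]] + coefs[["Oct"]],
                 Sep = coefs[["Intercept"]] + coefs[["Sep"]])
print(round(month_means, 3))

# Weighting by the month counts should recover the overall mean (0.4872)
overall_relative <- weighted.mean(month_means, month_counts)
print(round(overall_relative, 4))
```

The four monthly means differ by less than 0.07, which is consistent with the large p-values and the near-zero R-squared of this model.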
Model1 <- PricePremium ~ PitchPremium + WidthPremium + SeatsPremium
fit2 <- lm(Model1, data = air)
summary(fit2)
##
## Call:
## lm(formula = Model1, data = air)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2219.2 -936.9 -120.4 1078.6 5762.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2127.171 1736.937 -1.225 0.221
## PitchPremium 87.481 67.656 1.293 0.197
## WidthPremium -2.744 81.021 -0.034 0.973
## SeatsPremium 21.095 4.432 4.760 2.61e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1256 on 454 degrees of freedom
## Multiple R-squared: 0.05501, Adjusted R-squared: 0.04877
## F-statistic: 8.809 on 3 and 454 DF, p-value: 1.094e-05
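P-value < 0.05: the relationship is statistically significant for the model as a whole (F-test p-value 1.094e-05), although SeatsPremium is the only predictor with a significant coefficient.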
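The overall F-statistic and its p-value can be rebuilt from the R-squared value, which helps separate "the model is significant" from "the model explains much". This is a sketch using only numbers printed in the `summary(fit2)` output; the variable names are illustrative.

```r
# Values copied from the summary(fit2) output above
r2       <- 0.05501   # Multiple R-squared
k        <- 3         # number of predictors
df_resid <- 454       # residual degrees of freedom

# F = (explained variance per predictor) / (unexplained variance per residual df)
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail probability of the F distribution gives the overall p-value
p_val <- pf(f_stat, k, df_resid, lower.tail = FALSE)

print(round(f_stat, 3))
print(signif(p_val, 4))
```

With 458 observations even a model that explains about 5.5% of the variance passes the 0.05 threshold, so the significance here says little about predictive strength.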
k<-lm(FlightDuration~PricePremium+PriceEconomy, data=air)
summary(k)
##
## Call:
## lm(formula = FlightDuration ~ PricePremium + PriceEconomy, data = air)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4012 -2.0051 -0.6418 1.2002 7.6382
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3037849 0.2208679 19.486 <2e-16 ***
## PricePremium 0.0020235 0.0002262 8.945 <2e-16 ***
## PriceEconomy -0.0003465 0.0002949 -1.175 0.241
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.697 on 455 degrees of freedom
## Multiple R-squared: 0.4226, Adjusted R-squared: 0.4201
## F-statistic: 166.5 on 2 and 455 DF, p-value: < 2.2e-16
t.test(I(air$PriceEconomy+air$PricePremium)~air$Aircraft)
##
## Welch Two Sample t-test
##
## data: I(air$PriceEconomy + air$PricePremium) by air$Aircraft
## t = 0.4542, df = 300.02, p-value = 0.65
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -333.7254 534.0008
## sample estimates:
## mean in group AirBus mean in group Boeing
## 3239.457 3139.319
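The pieces of the Welch test output fit together as follows: the confidence interval is centred on the difference of the two group means, and its half-width is the t quantile times the standard error. The sketch below rebuilds the interval from the printed numbers only. The standard error is recovered from the rounded t statistic, so small discrepancies are expected.

```r
# Values copied from the t.test() output above
mean_airbus <- 3239.457
mean_boeing <- 3139.319
t_stat      <- 0.4542
df_welch    <- 300.02
ci_reported <- c(-333.7254, 534.0008)

# Difference in means and the standard error implied by t = difference / SE
diff_means <- mean_airbus - mean_boeing
se_diff    <- diff_means / t_stat

# 95% interval: difference plus or minus the t quantile times the standard error
half_width <- qt(0.975, df_welch) * se_diff
ci_rebuilt <- diff_means + c(-1, 1) * half_width

print(round(diff_means, 3))
print(round(ci_rebuilt, 1))
```

Since the interval contains 0, the test gives no evidence that the combined economy and premium price differs between AirBus and Boeing flights.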
Final Summary: The difference between the maximum and minimum ticket cost (for both classes) depends on the airline.
The Airline factor is statistically related to the economy class ticket price, the premium economy class ticket price, and the relative price of the two classes, according to the corresponding correlation tests.
With a p-value < 0.05, flight duration is a significant factor in determining the difference between the economy class and premium economy class ticket prices.
The Aircraft factor has a negative correlation coefficient with the difference between the economy and premium economy class ticket prices. With a p-value > 0.05, however, the Aircraft factor is not a significant contributor to that difference.