setwd("C:/Users/harsh/Desktop/r")
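# stringsAsFactors = TRUE keeps Airline, Aircraft, TravelMonth and IsInternational as factors
# (the default changed to FALSE in R 4.0.0; the output below assumes factors)
airlines <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)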
attach(airlines)
library(psych)
describe(airlines)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
The average price of an economy seat is USD 1327, while the average price of a premium-economy seat is USD 1845. The mean of PriceRelative is 0.49, i.e. on a per-flight basis premium-economy seats are pricier by almost 49% on average.
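Note that the ratio of the two average prices is not the same quantity as the average of the per-flight ratios. A minimal sketch of the difference, using the two means from the describe() output above and a two-row toy data frame whose values are made up for illustration only (they are not from the dataset):

```r
# Means copied from the describe() output above
mean_economy <- 1327.08
mean_premium <- 1845.26

# Ratio of the means: how much pricier the average premium fare is
ratio_of_means <- mean_premium / mean_economy - 1
round(ratio_of_means, 2)

# Toy data (hypothetical values) showing why the mean of ratios can differ
toy <- data.frame(economy = c(100, 1000), premium = c(200, 1100))
mean_of_ratios     <- mean((toy$premium - toy$economy) / toy$economy)  # (1.0 + 0.1) / 2
toy_ratio_of_means <- mean(toy$premium) / mean(toy$economy) - 1        # 650 / 550 - 1
```

Cheap flights with a large relative mark-up pull the mean of ratios upward, which is one plausible reason the two figures differ here.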
par(mfrow=c(3,3))
plot(Airline, xlab="Airline")
hist(FlightDuration , xlab="Flight duration")
hist(SeatsEconomy, xlab="Number of Economy Seats")
hist(SeatsPremium, xlab="Number of Premium Seats")
hist(WidthEconomy, xlab="Width of Economy Seats")
hist(WidthPremium, xlab="Width of Premium Economy Seats")
hist(PriceEconomy, xlab="Price of Economy Seats")
hist(PricePremium, xlab="Price of Premium Economy Seats")
par(mfrow=c(1,1))
The difference in price can be explained by the enhanced features of premium-economy seats, i.e. greater legroom (pitch) and wider seats. On average, premium-economy seats have about 6.7 inches more legroom than economy seats and are wider by around 1.6 inches. Other factors may also affect the ticket price, such as flight duration, whether the flight is international or domestic, and the type of aircraft (Boeing or Airbus).
airlines$PriceDifference <- airlines$PricePremium - airlines$PriceEconomy
boxplot(airlines$PriceDifference~airlines$Airline, ylab="Price Difference", xlab="Airline", main="Boxplot of Price Difference vs. Airline", col=c("red","orangered","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$Aircraft, ylab="Price Difference", xlab="Aircraft", main="Boxplot of Price Difference vs. Aircraft", col=c("red","orangered","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$TravelMonth, ylab="Price Difference", xlab="Travel Month", main="Boxplot of Price Difference vs. Month of Travel", col=c("red","orangered","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$IsInternational, ylab="Price Difference", main="Boxplot of Price Difference vs. Type of flight", col=c("orangered","red","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$SeatsTotal, ylab="Price Difference", xlab="Number of Total seats", main="Boxplot of Price Difference vs. Number of Total seats", col=c("red","orangered","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$PercentPremiumSeats, ylab="Price Difference", xlab="PercentPremiumSeats", main="Boxplot of Price Difference vs. Percentage of Premium seats", col=c("red","orangered","yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$PitchDifference, ylab="Price Difference", xlab="Pitch Difference", main="Boxplot of Price Difference vs. Pitch Difference", col=c("yellow2","green3","skyblue","blue2"))
boxplot(airlines$PriceDifference~airlines$WidthDifference, ylab="Price Difference", xlab="Width Difference", main="Boxplot of Price Difference vs. Width Difference", col=c("yellow2","green3","skyblue","blue2"))
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix( ~ WidthDifference + PitchDifference + PriceDifference , data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ SeatsTotal + PercentPremiumSeats + PriceDifference, data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ FlightDuration + Aircraft + PriceDifference, data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ FlightDuration+ Airline + PriceDifference, data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ TravelMonth + Airline + PriceDifference, data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ TravelMonth + Aircraft + PriceDifference, data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ WidthDifference + Airline + PriceDifference , data = airlines , diagonal = "histogram")
scatterplotMatrix( ~ PitchDifference + Airline + PriceDifference , data = airlines , diagonal = "histogram")
library(corrgram)
corrgram(airlines,upper.panel= panel.pie, main="Corrgram of Airlines Data")
#Correlation Matrix
correlationmatrix <- cor(airlines[,6:19])
round(correlationmatrix,digits = 2)
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy 1.00 0.63 0.14 0.12
## SeatsPremium 0.63 1.00 -0.03 0.00
## PitchEconomy 0.14 -0.03 1.00 -0.55
## PitchPremium 0.12 0.00 -0.55 1.00
## WidthEconomy 0.37 0.46 0.29 -0.02
## WidthPremium 0.10 0.00 -0.54 0.75
## PriceEconomy 0.13 0.11 0.37 0.05
## PricePremium 0.18 0.22 0.23 0.09
## PriceRelative 0.00 -0.10 -0.42 0.42
## SeatsTotal 0.99 0.72 0.12 0.11
## PitchDifference 0.04 0.02 -0.78 0.95
## WidthDifference -0.08 -0.22 -0.64 0.70
## PercentPremiumSeats -0.33 0.49 -0.10 -0.18
## PriceDifference 0.17 0.29 -0.13 0.11
## WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy 0.37 0.10 0.13 0.18
## SeatsPremium 0.46 0.00 0.11 0.22
## PitchEconomy 0.29 -0.54 0.37 0.23
## PitchPremium -0.02 0.75 0.05 0.09
## WidthEconomy 1.00 0.08 0.07 0.15
## WidthPremium 0.08 1.00 -0.06 0.06
## PriceEconomy 0.07 -0.06 1.00 0.90
## PricePremium 0.15 0.06 0.90 1.00
## PriceRelative -0.04 0.50 -0.29 0.03
## SeatsTotal 0.41 0.09 0.13 0.19
## PitchDifference -0.13 0.76 -0.10 -0.02
## WidthDifference -0.39 0.88 -0.08 -0.01
## PercentPremiumSeats 0.23 -0.18 0.07 0.12
## PriceDifference 0.22 0.24 0.30 0.68
## PriceRelative SeatsTotal PitchDifference
## SeatsEconomy 0.00 0.99 0.04
## SeatsPremium -0.10 0.72 0.02
## PitchEconomy -0.42 0.12 -0.78
## PitchPremium 0.42 0.11 0.95
## WidthEconomy -0.04 0.41 -0.13
## WidthPremium 0.50 0.09 0.76
## PriceEconomy -0.29 0.13 -0.10
## PricePremium 0.03 0.19 -0.02
## PriceRelative 1.00 -0.01 0.47
## SeatsTotal -0.01 1.00 0.03
## PitchDifference 0.47 0.03 1.00
## WidthDifference 0.49 -0.11 0.76
## PercentPremiumSeats -0.16 -0.22 -0.09
## PriceDifference 0.56 0.20 0.13
## WidthDifference PercentPremiumSeats PriceDifference
## SeatsEconomy -0.08 -0.33 0.17
## SeatsPremium -0.22 0.49 0.29
## PitchEconomy -0.64 -0.10 -0.13
## PitchPremium 0.70 -0.18 0.11
## WidthEconomy -0.39 0.23 0.22
## WidthPremium 0.88 -0.18 0.24
## PriceEconomy -0.08 0.07 0.30
## PricePremium -0.01 0.12 0.68
## PriceRelative 0.49 -0.16 0.56
## SeatsTotal -0.11 -0.22 0.20
## PitchDifference 0.76 -0.09 0.13
## WidthDifference 1.00 -0.28 0.12
## PercentPremiumSeats -0.28 1.00 0.15
## PriceDifference 0.12 0.15 1.00
VarianceCovariancematrix <- var(airlines[,6:19])
round(VarianceCovariancematrix, 2)
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy 5832.92 633.07 7.21 11.96
## SeatsPremium 633.07 175.87 -0.30 0.09
## PitchEconomy 7.21 -0.30 0.43 -0.47
## PitchPremium 11.96 0.09 -0.47 1.73
## WidthEconomy 15.91 3.37 0.11 -0.02
## WidthPremium 8.58 -0.04 -0.39 1.08
## PriceEconomy 9673.79 1489.38 238.70 65.43
## PricePremium 17413.25 3717.36 190.85 149.85
## PriceRelative 0.14 -0.58 -0.12 0.25
## SeatsTotal 6465.99 808.94 6.91 12.05
## PitchDifference 4.75 0.38 -0.90 2.20
## WidthDifference -7.33 -3.41 -0.50 1.10
## PercentPremiumSeats -122.39 31.15 -0.33 -1.12
## PriceDifference 7739.46 2227.98 -47.85 84.43
## WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy 15.91 8.58 9673.79 17413.25
## SeatsPremium 3.37 -0.04 1489.38 3717.36
## PitchEconomy 0.11 -0.39 238.70 190.85
## PitchPremium -0.02 1.08 65.43 149.85
## WidthEconomy 0.31 0.05 37.46 108.12
## WidthPremium 0.05 1.20 -61.85 90.48
## PriceEconomy 37.46 -61.85 976684.06 1147494.77
## PricePremium 108.12 90.48 1147494.77 1659293.12
## PriceRelative -0.01 0.25 -128.50 18.48
## SeatsTotal 19.28 8.54 11163.18 21130.62
## PitchDifference -0.12 1.47 -173.28 -41.00
## WidthDifference -0.26 1.15 -99.32 -17.64
## PercentPremiumSeats 0.61 -0.97 312.61 726.02
## PriceDifference 70.66 152.33 170810.71 511798.35
## PriceRelative SeatsTotal PitchDifference
## SeatsEconomy 0.14 6465.99 4.75
## SeatsPremium -0.58 808.94 0.38
## PitchEconomy -0.12 6.91 -0.90
## PitchPremium 0.25 12.05 2.20
## WidthEconomy -0.01 19.28 -0.12
## WidthPremium 0.25 8.54 1.47
## PriceEconomy -128.50 11163.18 -173.28
## PricePremium 18.48 21130.62 -41.00
## PriceRelative 0.20 -0.44 0.37
## SeatsTotal -0.44 7274.92 5.13
## PitchDifference 0.37 5.13 3.10
## WidthDifference 0.26 -10.74 1.59
## PercentPremiumSeats -0.35 -91.24 -0.79
## PriceDifference 146.98 9967.44 132.28
## WidthDifference PercentPremiumSeats PriceDifference
## SeatsEconomy -7.33 -122.39 7739.46
## SeatsPremium -3.41 31.15 2227.98
## PitchEconomy -0.50 -0.33 -47.85
## PitchPremium 1.10 -1.12 84.43
## WidthEconomy -0.26 0.61 70.66
## WidthPremium 1.15 -0.97 152.33
## PriceEconomy -99.32 312.61 170810.71
## PricePremium -17.64 726.02 511798.35
## PriceRelative 0.26 -0.35 146.98
## SeatsTotal -10.74 -91.24 9967.44
## PitchDifference 1.59 -0.79 132.28
## WidthDifference 1.41 -1.59 81.68
## PercentPremiumSeats -1.59 23.45 413.41
## PriceDifference 81.68 413.41 340987.65
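The two matrices above carry the same information on different scales: a correlation is a covariance divided by the product of the two standard deviations. A minimal sketch that checks this for PriceEconomy and PricePremium, using only the four entries printed above (the object names are illustrative):

```r
# Entries copied from the variance-covariance output above
v <- matrix(c(976684.06, 1147494.77,
              1147494.77, 1659293.12),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("PriceEconomy", "PricePremium"),
                            c("PriceEconomy", "PricePremium")))

# Manual rescaling: covariance divided by the product of standard deviations
r_manual <- v[1, 2] / sqrt(v[1, 1] * v[2, 2])

# cov2cor() from the stats package performs the same rescaling on the whole matrix
r_matrix <- cov2cor(v)
round(r_matrix, 2)
```

The off-diagonal entry rounds to the 0.90 shown in the correlation matrix.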
H0: There is no correlation between the price difference between premium-economy and economy tickets (PriceDifference) and the variable being tested.
H1: There is a non-zero correlation between PriceDifference and the variable being tested.
Each variable in the data is tested in turn below.
newairlines <- airlines
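# Convert categorical columns to integer level codes.
# factor() makes this work whether the columns were read as factors or as character strings.
newairlines$Airline <- as.numeric(factor(airlines$Airline))
newairlines$Aircraft <- as.numeric(factor(airlines$Aircraft))
newairlines$IsInternational <- as.numeric(factor(airlines$IsInternational))
newairlines$TravelMonth <- as.numeric(factor(airlines$TravelMonth))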
newairlines <- newairlines[order(newairlines$Airline),]
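The numeric conversion above turns category labels into integer level codes, so the resulting numbers reflect only the alphabetical order of the labels. A small sketch of how as.numeric() behaves on character versus factor input; the vector below is hypothetical and only mimics the Aircraft column:

```r
aircraft_chr <- c("Boeing", "AirBus", "Boeing")   # hypothetical values

# On character input, as.numeric() cannot parse the labels and returns NA
codes_from_chr <- suppressWarnings(as.numeric(aircraft_chr))

# Going through factor() yields integer level codes (levels sorted alphabetically)
aircraft_fac   <- factor(aircraft_chr)
codes_from_fac <- as.numeric(aircraft_fac)
levels(aircraft_fac)
```

For a two-level variable such as Aircraft or IsInternational the codes behave like a 0/1 indicator shifted by one; for Airline or TravelMonth the codes are arbitrary, so a Pearson correlation on them should be read with caution.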
cor.test(newairlines$PriceDifference, newairlines$Airline)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$Airline
## t = 4.581, df = 456, p-value = 5.98e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1204415 0.2956973
## sample estimates:
## cor
## 0.2097535
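The t statistic and p-value in this output follow directly from the sample correlation and the degrees of freedom. A short check using the values printed above (variable names are illustrative):

```r
r  <- 0.2097535   # sample correlation from the output above
df <- 456         # n - 2, with n = 458 flights

# Test statistic for H0: true correlation is 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)
signif(p_value, 3)
```

The same formula applies to every cor.test() call that follows; only r changes.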
cor.test(newairlines$PriceDifference, newairlines$Aircraft)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$Aircraft
## t = 0.47848, df = 456, p-value = 0.6325
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.06936787 0.11379457
## sample estimates:
## cor
## 0.02240132
cor.test(newairlines$PriceDifference, newairlines$FlightDuration)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$FlightDuration
## t = 11.435, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3976578 0.5403379
## sample estimates:
## cor
## 0.4720837
cor.test(newairlines$PriceDifference, newairlines$TravelMonth)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$TravelMonth
## t = 0.15559, df = 456, p-value = 0.8764
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08439705 0.09884693
## sample estimates:
## cor
## 0.007286108
cor.test(newairlines$PriceDifference, newairlines$IsInternational)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$IsInternational
## t = 5.7328, df = 456, p-value = 1.799e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1717354 0.3427659
## sample estimates:
## cor
## 0.2592822
cor.test(newairlines$PriceDifference, newairlines$SeatsEconomy)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$SeatsEconomy
## t = 3.7629, df = 456, p-value = 0.0001899
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08323627 0.26101599
## sample estimates:
## cor
## 0.1735396
cor.test(newairlines$PriceDifference, newairlines$SeatsPremium)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$SeatsPremium
## t = 6.415, df = 456, p-value = 3.53e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2013903 0.3695918
## sample estimates:
## cor
## 0.2877081
cor.test(newairlines$PriceDifference, newairlines$PitchEconomy)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PitchEconomy
## t = -2.692, df = 456, p-value = 0.007363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.21424699 -0.03383647
## sample estimates:
## cor
## -0.1250755
cor.test(newairlines$PriceDifference, newairlines$PitchPremium)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PitchPremium
## t = 2.3642, df = 456, p-value = 0.01849
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.01860038 0.19965344
## sample estimates:
## cor
## 0.1100397
cor.test(newairlines$PriceDifference, newairlines$PricePremium)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PricePremium
## t = 19.826, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.627926 0.726726
## sample estimates:
## cor
## 0.6804058
cor.test(newairlines$PriceDifference, newairlines$PriceEconomy)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PriceEconomy
## t = 6.617, df = 456, p-value = 1.031e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2100542 0.3773766
## sample estimates:
## cor
## 0.2959843
cor.test(newairlines$PriceDifference, newairlines$WidthPremium)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$WidthPremium
## t = 5.2272, df = 456, p-value = 2.625e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1493962 0.3223719
## sample estimates:
## cor
## 0.2377683
cor.test(newairlines$PriceDifference, newairlines$SeatsTotal)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$SeatsTotal
## t = 4.3617, df = 456, p-value = 1.597e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1105243 0.2864978
## sample estimates:
## cor
## 0.2001245
cor.test(newairlines$PriceDifference, newairlines$PitchDifference)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PitchDifference
## t = 2.7688, df = 456, p-value = 0.005855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.03739893 0.21764764
## sample estimates:
## cor
## 0.1285851
cor.test(newairlines$PriceDifference, newairlines$WidthEconomy)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$WidthEconomy
## t = 4.7477, df = 456, p-value = 2.759e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1279485 0.3026396
## sample estimates:
## cor
## 0.217031
cor.test(newairlines$PriceDifference, newairlines$WidthDifference)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$WidthDifference
## t = 2.5291, df = 456, p-value = 0.01177
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.02627012 0.20700978
## sample estimates:
## cor
## 0.1176138
cor.test(newairlines$PriceDifference, newairlines$PercentPremiumSeats)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PercentPremiumSeats
## t = 3.1558, df = 456, p-value = 0.001706
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05531205 0.23468103
## sample estimates:
## cor
## 0.1461979
cor.test(newairlines$PriceDifference, newairlines$PriceRelative)
##
## Pearson's product-moment correlation
##
## data: newairlines$PriceDifference and newairlines$PriceRelative
## t = 14.382, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4921938 0.6185916
## sample estimates:
## cor
## 0.5586276
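The repeated cor.test() calls above could also be written as a loop that collects the estimates in one table. The sketch below is one possible way to do that, not the method used in this analysis; it uses the built-in mtcars data as a stand-in so that it runs without the CSV file, with mpg playing the role of PriceDifference. The Holm adjustment is an optional extra, included because many tests are run against the same variable.

```r
# mtcars ships with R; it stands in here for the airlines data frame
dat        <- mtcars
target     <- "mpg"                      # plays the role of PriceDifference
predictors <- setdiff(names(dat), target)

# One cor.test() per predictor, collected into a single data frame
results <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(dat[[target]], dat[[v]])
  data.frame(variable = v,
             cor      = unname(ct$estimate),
             p_value  = ct$p.value)
}))

# Holm adjustment guards against false positives from running many tests
results$p_holm <- p.adjust(results$p_value, method = "holm")
results[order(results$p_value), ]
```

With the airlines data, dat would be newairlines and target would be "PriceDifference".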
model <- lm(PriceRelative~FlightDuration+IsInternational+PercentPremiumSeats+WidthDifference+PitchDifference, data=airlines)
summary(model)
##
## Call:
## lm(formula = PriceRelative ~ FlightDuration + IsInternational +
## PercentPremiumSeats + WidthDifference + PitchDifference,
## data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.87450 -0.23846 -0.05599 0.15293 1.28664
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.339419 0.103974 -3.264 0.00118 **
## FlightDuration 0.040698 0.006243 6.519 1.90e-10 ***
## IsInternationalInternational -0.628346 0.125642 -5.001 8.17e-07 ***
## PercentPremiumSeats -0.006055 0.003799 -1.594 0.11167
## WidthDifference 0.077552 0.025268 3.069 0.00228 **
## PitchDifference 0.157558 0.025033 6.294 7.33e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3713 on 452 degrees of freedom
## Multiple R-squared: 0.3284, Adjusted R-squared: 0.321
## F-statistic: 44.21 on 5 and 452 DF, p-value: < 2.2e-16
Therefore, the significant predictors in this regression model are FlightDuration, IsInternational, WidthDifference and PitchDifference. PercentPremiumSeats is included in the model but is not significant at the 5% level (p = 0.11).
Regression Model : PriceRelative = B0 + B1(FlightDuration) + B2(IsInternational == International) + B3(PercentPremiumSeats) + B4(WidthDifference) + B5(PitchDifference)
Coefficients :
coefficients(model)
## (Intercept) FlightDuration
## -0.339419222 0.040698044
## IsInternationalInternational PercentPremiumSeats
## -0.628345904 -0.006055119
## WidthDifference PitchDifference
## 0.077551688 0.157558138
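To see how the coefficients combine, the fitted equation can be evaluated by hand for a single flight. The flight below is hypothetical, with values chosen only for illustration, and the vector names are shortened for readability:

```r
# Coefficients copied from the coefficients(model) output above
b <- c(intercept = -0.339419222, FlightDuration = 0.040698044,
       International = -0.628345904, PercentPremiumSeats = -0.006055119,
       WidthDifference = 0.077551688, PitchDifference = 0.157558138)

# Hypothetical international flight: 8 hours, 15% premium seats,
# 1 inch extra width, 7 inches extra pitch (intercept term is always 1)
x <- c(intercept = 1, FlightDuration = 8, International = 1,
       PercentPremiumSeats = 15, WidthDifference = 1, PitchDifference = 7)

# Linear predictor: sum of coefficient times value
predicted_price_relative <- sum(b * x[names(b)])
round(predicted_price_relative, 3)
```

With the fitted model object available, predict(model, newdata = ...) gives the same kind of result without copying coefficients by hand.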
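So, the relative pricing depends on these factors: the p-values for FlightDuration, IsInternational, WidthDifference and PitchDifference are all below 0.05, while PercentPremiumSeats (p = 0.11) is not significant. Holding the other variables fixed, a larger width difference, a larger pitch difference and a longer flight duration each increase the relative price, and the coefficient for international flights is negative (-0.63). Airline is not a predictor in this model, so any airline-specific pricing suggested by the boxplots is not tested here.
The overall F-test (F = 44.21 on 5 and 452 DF, p-value < 2.2e-16) shows that the model is statistically significant, i.e. it fits better than an intercept-only model. The adjusted R-squared of 0.32 means it explains about a third of the variance in PriceRelative.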
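The F-statistic and adjusted R-squared in the summary output can be reproduced from the multiple R-squared, the number of predictors and the sample size. A short check using the printed values (variable names are illustrative):

```r
r2 <- 0.3284   # Multiple R-squared from summary(model)
k  <- 5        # number of predictors
n  <- 458      # number of flights

# Overall F-statistic and its p-value
f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

# Adjusted R-squared penalises the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

round(c(F = f_stat, adj_r2 = adj_r2), 3)
```

A very small p-value here says the predictors jointly explain more than chance would; how much they explain is read from R-squared, not from the p-value.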