Mini project: explaining the factors behind the price difference between economy and premium-economy airline tickets

getwd()
## [1] "C:/Users/parvp/Desktop/data analytics internship"
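# stringsAsFactors = TRUE keeps Airline, Aircraft, etc. as factors on R >= 4.0 (the default in R 3.x)
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)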
head(airline.df)
##   Airline Aircraft FlightDuration TravelMonth IsInternational SeatsEconomy
## 1 British   Boeing          12.25         Jul   International          122
## 2 British   Boeing          12.25         Aug   International          122
## 3 British   Boeing          12.25         Sep   International          122
## 4 British   Boeing          12.25         Oct   International          122
## 5 British   Boeing           8.16         Aug   International          122
## 6 British   Boeing           8.16         Sep   International          122
##   SeatsPremium PitchEconomy PitchPremium WidthEconomy WidthPremium
## 1           40           31           38           18           19
## 2           40           31           38           18           19
## 3           40           31           38           18           19
## 4           40           31           38           18           19
## 5           40           31           38           18           19
## 6           40           31           38           18           19
##   PriceEconomy PricePremium PriceRelative SeatsTotal PitchDifference
## 1         2707         3725          0.38        162               7
## 2         2707         3725          0.38        162               7
## 3         2707         3725          0.38        162               7
## 4         2707         3725          0.38        162               7
## 5         1793         2999          0.67        162               7
## 6         1793         2999          0.67        162               7
##   WidthDifference PercentPremiumSeats
## 1               1               24.69
## 2               1               24.69
## 3               1               24.69
## 4               1               24.69
## 5               1               24.69
## 6               1               24.69
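
The last five columns appear to be derived from the others. The sketch below recomputes them for the first row of the output above. The formulas are inferred from the printed values and are an assumption, not taken from the dataset's documentation; `row1` and `derived` are illustrative names.

```r
# First row of head(airline.df), copied from the output above
row1 <- data.frame(SeatsEconomy = 122, SeatsPremium = 40,
                   PitchEconomy = 31,  PitchPremium = 38,
                   WidthEconomy = 18,  WidthPremium = 19,
                   PriceEconomy = 2707, PricePremium = 3725)

# Assumed definitions of the derived columns (inferred, not documented)
derived <- with(row1, data.frame(
  PriceRelative       = round((PricePremium - PriceEconomy) / PriceEconomy, 2),
  SeatsTotal          = SeatsEconomy + SeatsPremium,
  PitchDifference     = PitchPremium - PitchEconomy,
  WidthDifference     = WidthPremium - WidthEconomy,
  PercentPremiumSeats = round(100 * SeatsPremium / (SeatsEconomy + SeatsPremium), 2)
))
print(derived)
```

Under these assumptions, PriceRelative is the premium markup expressed as a fraction of the economy price, so 0.38 means the premium ticket costs about 38% more.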
library(psych)
## Warning: package 'psych' was built under R version 3.4.3
describe(airline.df)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Mean economy and premium prices by airline

aggregate(x = airline.df[c("PriceEconomy","PricePremium")],by=list(Airline=airline.df$Airline),FUN = mean)
##     Airline PriceEconomy PricePremium
## 1 AirFrance    2769.7838    3065.2162
## 2   British    1293.4800    1937.0286
## 3     Delta     560.9348     684.6739
## 4       Jet     276.1639     483.3607
## 5 Singapore     860.2500    1239.9250
## 6    Virgin    1603.5323    2721.6935
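
To compare airlines on a common scale, the means above can be turned into a markup of premium over economy. This is a rough sketch using the printed means only; `means` and `Markup` are illustrative names, and a ratio of means is not the same quantity as the mean of PriceRelative.

```r
# Mean prices per airline, copied from the aggregate() output above
means <- data.frame(
  Airline      = c("AirFrance", "British", "Delta", "Jet", "Singapore", "Virgin"),
  PriceEconomy = c(2769.7838, 1293.4800, 560.9348, 276.1639, 860.2500, 1603.5323),
  PricePremium = c(3065.2162, 1937.0286, 684.6739, 483.3607, 1239.9250, 2721.6935)
)

# Markup of mean premium price over mean economy price, as a fraction
means$Markup <- with(means, (PricePremium - PriceEconomy) / PriceEconomy)

# Airlines ordered from largest to smallest markup
print(means[order(-means$Markup), ])
```

On these means, Jet shows the largest markup and AirFrance the smallest.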
boxplot(airline.df$PriceEconomy~airline.df$Airline, main="Airline Vs PriceEconomy",col = c("lightgreen","lightblue","peachpuff","gray","yellow","cyan"))

boxplot(airline.df$PricePremium~airline.df$Airline, main="Airline Vs PricePremium",col = c("lightgreen","lightblue","peachpuff","gray","yellow","cyan"))

boxplot(airline.df$PriceEconomy,airline.df$PricePremium,ylab="Prices", main="Economy vs Premium Pricing", col=c("yellow" , "lightgreen"), names=c("Economy","Premium"))

par(mfrow=c(1,2))
with(airline.df,plot(Aircraft,PriceEconomy,col=c("peachpuff","khaki"),main="Aircraft vs Economy Pricing"))
with(airline.df,plot(Aircraft,PricePremium,col=c("peachpuff","khaki"), main="Aircraft vs Premium Pricing"))

plot(airline.df$Airline, airline.df$SeatsEconomy, main="Airline vs Seats in Economy Class",col="sienna1")

plot(airline.df$Airline, airline.df$SeatsPremium, main="Airline vs Seats in Premium Class",col="sienna1")

plot(airline.df$IsInternational,main = "Number of Domestic vs International Flights",col="khaki")

Price variation of economy with flight duration

plot(airline.df$FlightDuration,airline.df$PriceEconomy,
     main="Flight duration vs Economy Price",
     xlab="Flight duration",
     ylab = "Economy Price")
abline(lm(airline.df$PriceEconomy~airline.df$FlightDuration),
       col="red")

Price variation of premium with flight duration

plot(airline.df$FlightDuration,airline.df$PricePremium,
     main="Flight duration vs Premium Price",
     xlab="Flight duration",
     ylab="Premium Price")
abline(lm(airline.df$PricePremium~airline.df$FlightDuration),
       col="blue")

Prices rise gradually with flight duration in both classes. From the fitted lines, the rate of increase appears higher for economy than for premium, though the plots alone do not quantify this.
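
The comparison of rates can be checked numerically by extracting the slope of each fitted line. The helper `compare_slopes` below is a hypothetical name introduced for illustration. It is demonstrated on the six rows printed by head() only, so the numbers it produces here say nothing about the full dataset; calling `compare_slopes(airline.df)` would give the slopes of the two lines drawn above.

```r
# Slope of price against flight duration for each class (price units per hour)
compare_slopes <- function(df) {
  c(Economy = unname(coef(lm(PriceEconomy ~ FlightDuration, data = df))[2]),
    Premium = unname(coef(lm(PricePremium ~ FlightDuration, data = df))[2]))
}

# Illustration only: the six rows shown by head(airline.df)
head_rows <- data.frame(
  FlightDuration = c(12.25, 12.25, 12.25, 12.25, 8.16, 8.16),
  PriceEconomy   = c(2707, 2707, 2707, 2707, 1793, 1793),
  PricePremium   = c(3725, 3725, 3725, 3725, 2999, 2999)
)

slopes <- compare_slopes(head_rows)
print(slopes)
```

A larger slope means the price climbs faster per additional hour of flight.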

attach(airline.df)
plot(WidthDifference,PriceRelative,main = "Seat width difference vs relative price difference")
abline(lm(PriceRelative~WidthDifference),col="red")

plot(PitchDifference,PriceRelative,main = "Seat pitch difference vs relative price difference")
abline(lm(PriceRelative~PitchDifference),col="red")

library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(airline.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of airline dataset")

Null hypothesis: there is no difference between the mean ticket price in economy class and the mean ticket price in premium economy class.

t.test(PriceEconomy,PricePremium,var.equal=TRUE, paired=FALSE)
## 
##  Two Sample t-test
## 
## data:  PriceEconomy and PricePremium
## t = -6.8304, df = 914, p-value = 1.544e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -667.0699 -369.2926
## sample estimates:
## mean of x mean of y 
##  1327.076  1845.258

The p-value (1.544e-11) is far below 0.05, so we reject the null hypothesis: the mean premium economy price is significantly higher than the mean economy price, by between about 369 and 667 at the 95% confidence level.
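
As a rough check, the t statistic can be reproduced from the means and standard deviations reported by describe(). This is a sketch of the pooled-variance (var.equal=TRUE) formula using the rounded summary values, so it matches the output only to about two decimal places; the variable names are illustrative.

```r
# Summary values copied from the describe() output
n      <- 458
m_econ <- 1327.08; sd_econ <- 988.27
m_prem <- 1845.26; sd_prem <- 1288.14

# Pooled standard deviation for two groups of equal size
pooled_sd <- sqrt(((n - 1) * sd_econ^2 + (n - 1) * sd_prem^2) / (2 * n - 2))

# Standard error of the difference in means, then t and its two-sided p-value
se      <- pooled_sd * sqrt(1 / n + 1 / n)
t_stat  <- (m_econ - m_prem) / se
df      <- 2 * n - 2
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, df = df, p = p_value))
```

Because each row holds both prices for the same flight, a paired test (paired=TRUE) may be a closer fit to the data; that is a suggestion, not what was run above.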

Regression model for the relative price difference as a function of pitch difference, width difference, flight duration and percentage of premium seats

fit <- lm(PriceRelative ~ PitchDifference + WidthDifference + FlightDuration + PercentPremiumSeats, data = airline.df)
summary(fit)
## 
## Call:
## lm(formula = PriceRelative ~ PitchDifference + WidthDifference + 
##     FlightDuration + PercentPremiumSeats, data = airline.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79439 -0.29424 -0.03427  0.16197  1.13688 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.179033   0.101492  -1.764  0.07840 .  
## PitchDifference      0.059311   0.015921   3.725  0.00022 ***
## WidthDifference      0.118140   0.024555   4.811 2.05e-06 ***
## FlightDuration       0.021707   0.005085   4.269 2.39e-05 ***
## PercentPremiumSeats -0.005999   0.003898  -1.539  0.12454    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.381 on 453 degrees of freedom
## Multiple R-squared:  0.2913, Adjusted R-squared:  0.285 
## F-statistic: 46.54 on 4 and 453 DF,  p-value: < 2.2e-16

PriceRelative = -0.179033 + 0.059311 × PitchDifference + 0.118140 × WidthDifference + 0.021707 × FlightDuration - 0.005999 × PercentPremiumSeats

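Width difference, pitch difference and flight duration each have a significant positive association with the relative price difference (p < 0.05). PercentPremiumSeats has a negative coefficient, but it is not statistically significant (p = 0.125). The model explains about 29% of the variance in PriceRelative (adjusted R-squared 0.285), so most of the variation is not accounted for by these factors.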
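
To make the fitted model concrete, the sketch below applies the coefficients from summary(fit) to the first row shown by head(). The coefficient and predictor values are copied from the output above; `coefs`, `row1`, `predicted` and `residual` are illustrative names.

```r
# Coefficients copied from the summary(fit) output
coefs <- c("(Intercept)"       = -0.179033,
           PitchDifference     =  0.059311,
           WidthDifference     =  0.118140,
           FlightDuration      =  0.021707,
           PercentPremiumSeats = -0.005999)

# Predictor values for the first row of head(airline.df); intercept term is 1
row1 <- c("(Intercept)"       = 1,
          PitchDifference     = 7,
          WidthDifference     = 1,
          FlightDuration      = 12.25,
          PercentPremiumSeats = 24.69)

# Predicted relative price, and the residual against the observed 0.38
predicted <- sum(coefs * row1[names(coefs)])
observed  <- 0.38
residual  <- observed - predicted

print(round(c(predicted = predicted, observed = observed, residual = residual), 3))
```

The prediction of about 0.47 against an observed 0.38 is well within the residual standard error of 0.381, which is consistent with the modest R-squared.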