Reading the dataset into R

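# stringsAsFactors = TRUE keeps text columns as factors (no longer the default since R 4.0),
# which the factor counts in the summary below rely on
airlines.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)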
attach(airlines.df)

Summarizing the data set

summary(airlines.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
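
The column means above suggest that some columns are derived from others: 202.3 + 33.65 is about 236 (SeatsTotal), 37.91 - 31.22 is about 6.69 (PitchDifference), and 19.47 - 17.84 is about 1.63 (WidthDifference). A minimal sketch to check this row by row follows. The helper name check_derived_columns is hypothetical, and the formulas are assumptions inferred from the means, not documented definitions.

```r
# Hypothetical helper: reports whether each derived column matches the
# formula it appears to follow. The formulas are assumptions inferred
# from the column means in the summary, not documented definitions.
check_derived_columns <- function(df) {
  c(
    SeatsTotal      = isTRUE(all.equal(df$SeatsTotal, df$SeatsEconomy + df$SeatsPremium)),
    PitchDifference = isTRUE(all.equal(df$PitchDifference, df$PitchPremium - df$PitchEconomy)),
    WidthDifference = isTRUE(all.equal(df$WidthDifference, df$WidthPremium - df$WidthEconomy))
  )
}

# Run only when the data set has been loaded as above
if (exists("airlines.df")) print(check_derived_columns(airlines.df))
```

A FALSE entry would mean the assumed formula does not hold for every row.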

Comparing the prices of Economy and Premium Economy seats

plot(airlines.df$PriceEconomy,airlines.df$PricePremium,xlab="Price of economy ticket",ylab = "Price of Premium Ticket",cex=0.6)
abline(0,1)

All the points lie above the 45° line, which means that the price of a Premium Economy ticket is higher than the price of the corresponding Economy ticket.
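
The reading of the scatterplot can be backed by a direct count. The sketch below computes the fraction of flights whose premium price exceeds the economy price. The helper name share_premium_higher is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: fraction of rows where the premium price exceeds
# the economy price (points above the 45-degree line in the scatterplot).
share_premium_higher <- function(economy, premium) {
  mean(premium > economy)
}

# Run only when the data set has been loaded as above
if (exists("airlines.df")) {
  print(share_premium_higher(airlines.df$PriceEconomy, airlines.df$PricePremium))
}
```

A value of 1 would confirm that every point lies above the line.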

THE QUESTION IS: What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?


Visualizing Factor 1:- Pitch Difference

boxplot(airlines.df$PriceRelative~airlines.df$PitchDifference,horizontal=TRUE,xlab="Relative Price between Economy and Premium Economy",ylab="Pitch Difference")

The relative price of a Premium Economy seat compared with an Economy seat is highest when the pitch difference is at its maximum (10 inches).
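
A boxplot shows the spread within each group, but a per-group summary makes the comparison explicit. The sketch below tabulates the median relative price for each pitch difference. The helper name median_by_group is hypothetical.

```r
# Hypothetical helper: median of `values` within each level of `groups`
median_by_group <- function(values, groups) {
  tapply(values, groups, median)
}

# Run only when the data set has been loaded as above
if (exists("airlines.df")) {
  print(median_by_group(airlines.df$PriceRelative, airlines.df$PitchDifference))
}
```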

Visualizing Factor 2:- Width Difference

library(lattice)
# lattice histograms take a one-sided formula; this plots the distribution of WidthDifference
histogram(~ WidthDifference, data = airlines.df, col = "grey", xlab = "Width Difference")

The difference in seat width between Premium Economy and Economy seats takes the values 0, 1, 2, 3, and 4 inches, and the most frequent width difference is 1 inch.
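
The most frequent value can also be read from a frequency table instead of the plot. A minimal sketch follows. The helper name most_frequent is hypothetical.

```r
# Hypothetical helper: returns the most frequent value of x as a string
# (ties are resolved in favour of the first value in sorted order)
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Run only when the data set has been loaded as above
if (exists("airlines.df")) {
  print(table(airlines.df$WidthDifference))
  print(most_frequent(airlines.df$WidthDifference))
}
```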

Visualizing Factor 3:- Percentage of Premium Seats

plot(airlines.df$PriceRelative~airlines.df$PercentPremiumSeats,col="red",ylab="Relative Price",xlab="Percentage of Premium Seats")

The relative price tends to go down as the percentage of premium seats in a plane increases.

Correlation Matrix :-

library(corrgram)
airlines2 <- c("PricePremium","PriceEconomy","PitchDifference","WidthDifference","SeatsTotal","PercentPremiumSeats")
colz <- colorRampPalette(c("darkkhaki","darkgreen","burlywood1"))
corrgram(airlines.df[,airlines2],order=TRUE,lower.panel=panel.shade,upper.panel=panel.pie,text.panel=panel.txt,main="Corrgram of Selected factors",col.regions = colz)
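
The corrgram encodes the correlations as shading and pie slices. The underlying numbers can be printed with cor() from base R. The sketch below does that for the same selection of columns. The helper name correlation_table is hypothetical.

```r
# Hypothetical helper: Pearson correlation matrix of the chosen columns,
# rounded to two decimals; rows with missing values are dropped
correlation_table <- function(df, vars) {
  round(cor(df[, vars], use = "complete.obs"), 2)
}

# Run only when the data set and the column selection exist as above
if (exists("airlines.df") && exists("airlines2")) {
  print(correlation_table(airlines.df, airlines2))
}
```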

T-test of the null hypothesis: the mean prices of Economy and Premium Economy tickets are EQUAL

t.test(PriceEconomy,PricePremium)
## 
##  Welch Two Sample t-test
## 
## data:  PriceEconomy and PricePremium
## t = -6.8304, df = 856.56, p-value = 1.605e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -667.0831 -369.2793
## sample estimates:
## mean of x mean of y 
##  1327.076  1845.258

As the p-value (1.605e-11) is less than 0.05, we can reject the null hypothesis that the mean prices are equal.
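
Each row of the data set carries both an economy and a premium price for the same flight, so the two samples are arguably paired rather than independent. The sketch below shows the paired alternative. It is offered as a possible complement under that assumption, not as the test used above, and the helper name paired_price_test is hypothetical.

```r
# Hypothetical helper: paired t-test on the per-flight price difference
# (premium minus economy). Assumes both vectors are aligned row by row.
paired_price_test <- function(economy, premium) {
  t.test(premium, economy, paired = TRUE)
}

# Run only when the data set has been loaded as above
if (exists("airlines.df")) {
  print(paired_price_test(airlines.df$PriceEconomy, airlines.df$PricePremium))
}
```

The paired version tests whether the mean per-flight difference is zero, which removes the variation between flights from the comparison.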

Fitting Regression Model

model1 <- lm(PricePremium ~ PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats + SeatsTotal + IsInternational + TravelMonth + FlightDuration + Aircraft,data=airlines.df)
summary(model1)
## 
## Call:
## lm(formula = PricePremium ~ PriceEconomy + PitchDifference + 
##     WidthDifference + PercentPremiumSeats + SeatsTotal + IsInternational + 
##     TravelMonth + FlightDuration + Aircraft, data = airlines.df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -977.2 -246.3  -47.9  135.2 3419.7 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.211e+03  1.755e+02  -6.898 1.82e-11 ***
## PriceEconomy                  1.064e+00  3.114e-02  34.175  < 2e-16 ***
## PitchDifference               8.510e+01  3.913e+01   2.175 0.030163 *  
## WidthDifference               1.240e+02  3.438e+01   3.607 0.000345 ***
## PercentPremiumSeats           3.177e+01  5.250e+00   6.052 3.04e-09 ***
## SeatsTotal                    1.925e+00  3.360e-01   5.729 1.87e-08 ***
## IsInternationalInternational -7.537e+02  2.135e+02  -3.530 0.000458 ***
## TravelMonthJul               -3.441e+01  7.074e+01  -0.486 0.626904    
## TravelMonthOct                2.692e+01  6.036e+01   0.446 0.655795    
## TravelMonthSep               -2.097e+00  6.015e+01  -0.035 0.972203    
## FlightDuration                8.455e+01  8.809e+00   9.598  < 2e-16 ***
## AircraftBoeing               -2.082e+00  5.651e+01  -0.037 0.970625    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 480.7 on 446 degrees of freedom
## Multiple R-squared:  0.8641, Adjusted R-squared:  0.8607 
## F-statistic: 257.7 on 11 and 446 DF,  p-value: < 2.2e-16

The coefficient p-values indicate that not all the factors considered above are relevant: TravelMonth and Aircraft are not statistically significant (all p > 0.6). We therefore fit a reduced model without them.

model2 <- lm(PricePremium ~ PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats + SeatsTotal + FlightDuration + IsInternational,data = airlines.df)
summary(model2)
## 
## Call:
## lm(formula = PricePremium ~ PriceEconomy + PitchDifference + 
##     WidthDifference + PercentPremiumSeats + SeatsTotal + FlightDuration + 
##     IsInternational, data = airlines.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1010.0  -258.4   -49.9   133.6  3416.7 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.213e+03  1.695e+02  -7.156 3.40e-12 ***
## PriceEconomy                  1.063e+00  3.077e-02  34.537  < 2e-16 ***
## PitchDifference               8.421e+01  3.656e+01   2.303 0.021722 *  
## WidthDifference               1.224e+02  3.373e+01   3.629 0.000318 ***
## PercentPremiumSeats           3.190e+01  5.220e+00   6.112 2.14e-09 ***
## SeatsTotal                    1.920e+00  3.241e-01   5.922 6.31e-09 ***
## FlightDuration                8.459e+01  8.507e+00   9.943  < 2e-16 ***
## IsInternationalInternational -7.412e+02  2.001e+02  -3.704 0.000238 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 479 on 450 degrees of freedom
## Multiple R-squared:  0.8638, Adjusted R-squared:  0.8617 
## F-statistic: 407.9 on 7 and 450 DF,  p-value: < 2.2e-16

AS THE ADJUSTED R-SQUARED VALUE OF MODEL 2 (0.8617) IS HIGHER THAN THAT OF MODEL 1 (0.8607), MODEL 2 IS THE BETTER FIT
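
Because model 2 uses a subset of the predictors of model 1, the two models are nested, and the adjusted R-squared comparison can be complemented with a partial F-test and AIC. A minimal sketch follows. The helper name compare_nested is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: compares a reduced lm fit against the full lm fit
# it is nested in, using a partial F-test and AIC
compare_nested <- function(reduced, full) {
  list(
    f_test = anova(reduced, full),  # tests whether the dropped terms matter
    aic    = AIC(reduced, full)     # lower AIC indicates the preferred model
  )
}

# Run only when both models have been fitted as above
if (exists("model1") && exists("model2")) print(compare_nested(model2, model1))
```

A large p-value in the F-test would support dropping the extra terms.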

Model 2 predicts the price of the premium economy seat as a function of the following explanatory variables:

“PriceEconomy” “PitchDifference” “WidthDifference” “PercentPremiumSeats” “SeatsTotal” “FlightDuration” “IsInternational”

INFERENCES:-

The airfare for Premium Economy, with the other variables in model 2 held fixed:-

  1. Increases as the difference in pitch between Premium Economy and Economy increases

  2. Increases as the difference in seat width between Premium Economy and Economy increases

  3. Increases as the total number of seats increases

  4. Increases as the percentage of Premium Economy seats increases
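
The direction of each effect comes from the sign of the corresponding coefficient. Confidence intervals show how precisely each effect is estimated. The sketch below places the estimates next to their intervals. The helper name coef_with_ci is hypothetical.

```r
# Hypothetical helper: coefficient estimates alongside their confidence
# intervals for a fitted lm model
coef_with_ci <- function(model, level = 0.95) {
  cbind(estimate = coef(model), confint(model, level = level))
}

# Run only when model 2 has been fitted as above
if (exists("model2")) print(coef_with_ci(model2))
```

An interval that lies entirely above zero is consistent with an increasing effect at the chosen level.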