1. Reading the Data

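# Point R at the folder that holds the dataset (machine-specific path).
setwd("C:/Users/CJ With HP/Desktop/IIM Lucknow/Datasets")
# stringsAsFactors = TRUE keeps text columns as factors, which the summary
# below relies on (the default became FALSE in R 4.0).
airlines.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
# attach() lets the later steps refer to columns by bare name.
attach(airlines.df)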

2. Displaying summary statistics

summary(airlines.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

The dependent variable in this analysis is PriceRelative, i.e. the relative difference in price between economy tickets and premium-economy tickets.
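As a quick illustration of what a relative price difference means, the sketch below computes one from two fare columns. The formula (premium fare minus economy fare, divided by economy fare) and the helper name relative_price are assumptions: the dataset does not document how PriceRelative was derived, and the fares are toy values, not rows from SixAirlinesDataV2.csv.

```r
# Hypothetical helper: relative premium of the premium-economy fare over the
# economy fare. The formula is an assumption, not taken from the dataset.
relative_price <- function(price_premium, price_economy) {
  (price_premium - price_economy) / price_economy
}

# Toy fares for illustration only.
toy <- data.frame(PriceEconomy = c(100, 400, 1000),
                  PricePremium = c(110, 600, 1500))
toy$PriceRelative <- relative_price(toy$PricePremium, toy$PriceEconomy)
print(toy)
```

If the assumption holds for the real data, round(relative_price(PricePremium, PriceEconomy), 2) should closely match the PriceRelative column.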

3. Creating a Correlation Matrix

round(cor(airlines.df[,c(3,6:18)]),2)
##                     FlightDuration SeatsEconomy SeatsPremium PitchEconomy
## FlightDuration                1.00         0.20         0.16         0.29
## SeatsEconomy                  0.20         1.00         0.63         0.14
## SeatsPremium                  0.16         0.63         1.00        -0.03
## PitchEconomy                  0.29         0.14        -0.03         1.00
## PitchPremium                  0.10         0.12         0.00        -0.55
## WidthEconomy                  0.46         0.37         0.46         0.29
## WidthPremium                  0.10         0.10         0.00        -0.54
## PriceEconomy                  0.57         0.13         0.11         0.37
## PricePremium                  0.65         0.18         0.22         0.23
## PriceRelative                 0.12         0.00        -0.10        -0.42
## SeatsTotal                    0.20         0.99         0.72         0.12
## PitchDifference              -0.04         0.04         0.02        -0.78
## WidthDifference              -0.12        -0.08        -0.22        -0.64
## PercentPremiumSeats           0.06        -0.33         0.49        -0.10
##                     PitchPremium WidthEconomy WidthPremium PriceEconomy
## FlightDuration              0.10         0.46         0.10         0.57
## SeatsEconomy                0.12         0.37         0.10         0.13
## SeatsPremium                0.00         0.46         0.00         0.11
## PitchEconomy               -0.55         0.29        -0.54         0.37
## PitchPremium                1.00        -0.02         0.75         0.05
## WidthEconomy               -0.02         1.00         0.08         0.07
## WidthPremium                0.75         0.08         1.00        -0.06
## PriceEconomy                0.05         0.07        -0.06         1.00
## PricePremium                0.09         0.15         0.06         0.90
## PriceRelative               0.42        -0.04         0.50        -0.29
## SeatsTotal                  0.11         0.41         0.09         0.13
## PitchDifference             0.95        -0.13         0.76        -0.10
## WidthDifference             0.70        -0.39         0.88        -0.08
## PercentPremiumSeats        -0.18         0.23        -0.18         0.07
##                     PricePremium PriceRelative SeatsTotal PitchDifference
## FlightDuration              0.65          0.12       0.20           -0.04
## SeatsEconomy                0.18          0.00       0.99            0.04
## SeatsPremium                0.22         -0.10       0.72            0.02
## PitchEconomy                0.23         -0.42       0.12           -0.78
## PitchPremium                0.09          0.42       0.11            0.95
## WidthEconomy                0.15         -0.04       0.41           -0.13
## WidthPremium                0.06          0.50       0.09            0.76
## PriceEconomy                0.90         -0.29       0.13           -0.10
## PricePremium                1.00          0.03       0.19           -0.02
## PriceRelative               0.03          1.00      -0.01            0.47
## SeatsTotal                  0.19         -0.01       1.00            0.03
## PitchDifference            -0.02          0.47       0.03            1.00
## WidthDifference            -0.01          0.49      -0.11            0.76
## PercentPremiumSeats         0.12         -0.16      -0.22           -0.09
##                     WidthDifference PercentPremiumSeats
## FlightDuration                -0.12                0.06
## SeatsEconomy                  -0.08               -0.33
## SeatsPremium                  -0.22                0.49
## PitchEconomy                  -0.64               -0.10
## PitchPremium                   0.70               -0.18
## WidthEconomy                  -0.39                0.23
## WidthPremium                   0.88               -0.18
## PriceEconomy                  -0.08                0.07
## PricePremium                  -0.01                0.12
## PriceRelative                  0.49               -0.16
## SeatsTotal                    -0.11               -0.22
## PitchDifference                0.76               -0.09
## WidthDifference                1.00               -0.28
## PercentPremiumSeats           -0.28                1.00
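The full matrix is hard to scan for a single variable. In the PriceRelative column above, the largest absolute values are WidthPremium (0.50), WidthDifference (0.49), PitchDifference (0.47), PitchPremium (0.42) and PitchEconomy (-0.42). A minimal sketch of extracting and ranking one such column follows; the helper cor_with_target and the toy data are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: correlations of every numeric column with one target
# column, sorted by absolute value (largest first).
cor_with_target <- function(df, target) {
  numeric_cols <- df[, sapply(df, is.numeric), drop = FALSE]  # drop factors
  r <- cor(numeric_cols)[, target]                            # one column
  r <- r[names(r) != target]                                  # drop self
  r[order(abs(r), decreasing = TRUE)]
}

# Toy data for illustration only.
toy <- data.frame(
  target = 1:6,
  a      = c(2, 4, 6, 8, 10, 12),
  b      = c(6, 5, 4, 3, 2, 2),
  c      = c(3, 1, 4, 1, 5, 9),
  grp    = factor(c("u", "v", "u", "v", "u", "v"))
)
ranked <- cor_with_target(toy, "target")
print(round(ranked, 2))
```

On the airline data the equivalent call would be cor_with_target(airlines.df, "PriceRelative").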

We use boxplots and aggregate() to examine the relationship between PriceRelative and the categorical variables.

4. Examining the relationship between PriceRelative and IsInternational

boxplot(PriceRelative~IsInternational)

aggregate(PriceRelative,by=list(IsInternational),mean)
##         Group.1         x
## 1      Domestic 0.0847500
## 2 International 0.5257177

Conclusion: The mean relative price is about 53% on International flights (0.526), compared with only about 8% on Domestic flights (0.085). Note that only 40 of the 458 flights are Domestic.
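Group means are easier to judge next to group sizes, since the two groups here are very unequal. The sketch below reports both per group and does not rely on attach(). The helper group_summary and the toy values are hypothetical.

```r
# Hypothetical helper: mean of a numeric column and row count per group.
group_summary <- function(df, value, group) {
  g <- factor(df[[group]])
  data.frame(
    group = levels(g),
    mean  = as.vector(tapply(df[[value]], g, mean)),
    n     = as.vector(table(g))
  )
}

# Toy data for illustration only.
toy <- data.frame(
  PriceRelative   = c(0.05, 0.10, 0.40, 0.60, 0.50),
  IsInternational = c("Domestic", "Domestic",
                      "International", "International", "International")
)
res <- group_summary(toy, "PriceRelative", "IsInternational")
print(res)
```

On the airline data the call would be group_summary(airlines.df, "PriceRelative", "IsInternational").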

5. Examining the relationship between PriceRelative and Airline

boxplot(PriceRelative~Airline)

aggregate(PriceRelative,by=list(Airline),mean)
##     Group.1         x
## 1 AirFrance 0.2047297
## 2   British 0.4375429
## 3     Delta 0.1250000
## 4       Jet 0.9396721
## 5 Singapore 0.5297500
## 6    Virgin 0.7606452

Conclusion: Jet Airways has the highest mean relative price, nearly 94%, followed by Virgin (about 76%) and Singapore Airlines (about 53%).

6. Examining the relationship between PriceRelative and Aircraft

boxplot(PriceRelative~Aircraft)

aggregate(PriceRelative,by=list(Aircraft),mean)
##   Group.1         x
## 1  AirBus 0.4147682
## 2  Boeing 0.5228339

Conclusion: Flights on Boeing aircraft have a higher mean relative price (about 52%) than flights on Airbus aircraft (about 41%).

7. Examining the relationship between PriceRelative and TravelMonth

boxplot(PriceRelative~TravelMonth)

aggregate(PriceRelative,by=list(TravelMonth,Airline),mean)
##    Group.1   Group.2          x
## 1      Aug AirFrance 0.16600000
## 2      Jul AirFrance 0.08583333
## 3      Oct AirFrance 0.27700000
## 4      Sep AirFrance 0.23909091
## 5      Aug   British 0.42884615
## 6      Jul   British 0.40875000
## 7      Oct   British 0.47584906
## 8      Sep   British 0.41685185
## 9      Aug     Delta 0.14333333
## 10     Jul     Delta 0.06400000
## 11     Oct     Delta 0.14076923
## 12     Sep     Delta 0.14181818
## 13     Aug       Jet 0.94500000
## 14     Jul       Jet 0.91000000
## 15     Oct       Jet 1.08666667
## 16     Sep       Jet 0.81666667
## 17     Aug Singapore 0.53090909
## 18     Jul Singapore 0.58875000
## 19     Oct Singapore 0.50800000
## 20     Sep Singapore 0.50545455
## 21     Aug    Virgin 0.76437500
## 22     Jul    Virgin 0.77357143
## 23     Oct    Virgin 0.76062500
## 24     Sep    Virgin 0.74562500

Conclusion: Within each airline, the mean relative price is nearly the same from month to month. October is a little higher for AirFrance, British and Jet, but not for Delta, Singapore or Virgin.
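The long, 24-row layout makes month-to-month comparison awkward. A minimal sketch of the same kind of group means arranged as an airline-by-month table is shown below, using tapply() with two grouping variables. The toy data are illustrative only.

```r
# Toy data for illustration only (two airlines, two months).
toy <- data.frame(
  Airline       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  TravelMonth   = c("Jul", "Jul", "Oct", "Oct", "Jul", "Jul", "Oct", "Oct"),
  PriceRelative = c(0.10, 0.20, 0.30, 0.40, 0.70, 0.80, 0.75, 0.85)
)

# Rows are airlines, columns are months, cells are mean PriceRelative.
wide <- with(toy, tapply(PriceRelative, list(Airline, TravelMonth), mean))
print(round(wide, 2))
```

On the airline data the call would be with(airlines.df, tapply(PriceRelative, list(Airline, TravelMonth), mean)), which yields one row per airline and one column per month.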

Next, we compare PriceRelative with the continuous variables.

8. Examining the relationship between PriceRelative and WidthDifference

boxplot(PriceRelative~WidthDifference)

cor.test(PriceRelative,WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  PriceRelative and WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4125388 0.5528218
## sample estimates:
##       cor 
## 0.4858024

9. Examining the relationship between PriceRelative and PitchDifference

boxplot(PriceRelative~PitchDifference)

cor.test(PriceRelative,PitchDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  PriceRelative and PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302

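Conclusion: WidthDifference and PitchDifference are each moderately and positively correlated with PriceRelative (r = 0.49 and r = 0.47 respectively), and both correlations are statistically significant (p < 2.2e-16).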
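The t statistics printed by cor.test() can be reproduced from the correlation r and the degrees of freedom using t = r * sqrt(df) / sqrt(1 - r^2), and r^2 gives the share of variance in PriceRelative that each variable accounts for on its own. The sketch below uses the values copied from the two outputs above; the helper name t_from_r is hypothetical.

```r
# t statistic of a Pearson correlation with df = n - 2 degrees of freedom.
t_from_r <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

# Values copied from the cor.test() outputs above.
r  <- c(WidthDifference = 0.4858024, PitchDifference = 0.4687302)
df <- 456

t_stat    <- t_from_r(r, df)
r_squared <- r^2

print(round(t_stat, 3))     # compare with t = 11.869 and t = 11.331 above
print(round(r_squared, 3))  # share of variance explained by each variable
```

Taken singly, each variable accounts for roughly 22% to 24% of the variance in PriceRelative.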

10. Running a regression to test whether PriceRelative is explained by the independent variables

fit <- lm(PriceRelative~SeatsPremium + PercentPremiumSeats + WidthDifference + PitchDifference + FlightDuration)
summary(fit)
## 
## Call:
## lm(formula = PriceRelative ~ SeatsPremium + PercentPremiumSeats + 
##     WidthDifference + PitchDifference + FlightDuration)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.81065 -0.30092 -0.04026  0.15425  1.13516 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.172716   0.101757  -1.697 0.090322 .  
## SeatsPremium        -0.001436   0.001600  -0.897 0.370012    
## PercentPremiumSeats -0.004351   0.004310  -1.009 0.313316    
## WidthDifference      0.113092   0.025197   4.488 9.12e-06 ***
## PitchDifference      0.062545   0.016327   3.831 0.000146 ***
## FlightDuration       0.022297   0.005128   4.348 1.70e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3811 on 452 degrees of freedom
## Multiple R-squared:  0.2925, Adjusted R-squared:  0.2847 
## F-statistic: 37.38 on 5 and 452 DF,  p-value: < 2.2e-16

Conclusion: Three continuous variables, namely WidthDifference, PitchDifference and FlightDuration, are statistically significant predictors of PriceRelative (p < 0.001 for each), and all three have positive coefficients. However, neither the number of premium seats (SeatsPremium) nor the percentage of premium seats (PercentPremiumSeats) is statistically significant. Overall the model explains about 29% of the variance in PriceRelative (adjusted R-squared 0.28).
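To make the coefficients concrete, the sketch below evaluates the fitted equation by hand for a hypothetical flight whose predictors sit at the sample medians reported in step 2 (SeatsPremium 36, PercentPremiumSeats 13.21, WidthDifference 1, PitchDifference 7, FlightDuration 7.79). The coefficients are copied from the regression output above; the flight itself is illustrative and not a row of the dataset.

```r
# Coefficients copied from summary(fit) above.
coefs <- c("(Intercept)"       = -0.172716,
           SeatsPremium        = -0.001436,
           PercentPremiumSeats = -0.004351,
           WidthDifference     =  0.113092,
           PitchDifference     =  0.062545,
           FlightDuration      =  0.022297)

# Hypothetical flight at the sample medians; the intercept term is 1.
flight <- c("(Intercept)"       = 1,
            SeatsPremium        = 36,
            PercentPremiumSeats = 13.21,
            WidthDifference     = 1,
            PitchDifference     = 7,
            FlightDuration      = 7.79)

# Predicted PriceRelative = sum of coefficient * predictor value.
predicted <- sum(coefs * flight[names(coefs)])
print(round(predicted, 3))

# Holding the rest fixed, one extra unit of WidthDifference adds its coefficient.
predicted_wider <- predicted + coefs[["WidthDifference"]]
print(round(predicted_wider, 3))
```

With the fitted model object available, predict(fit, newdata = ...) should give the same kind of estimate without the manual arithmetic.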

Therefore, the answer to the question "What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?" is:

Among the categorical factors, the airline, the aircraft type, and whether the flight is domestic or international are associated with the difference in price between an economy ticket and a premium-economy ticket; the month of travel has only a small effect. Similarly, among the continuous variables, WidthDifference, PitchDifference and FlightDuration are statistically significant predictors of the relative price.