Reading the data

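# stringsAsFactors = TRUE reproduces the factor columns shown by str() below
# (since R 4.0 the default is FALSE, which would give character columns)
air <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
View(air)  # opens the data viewer; only useful in an interactive session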

Summarizing the dataset

library(psych)
describe(air)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23
str(air)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...

Visualizing the data individually

par(mfrow = c(1,3))
barplot(air$SeatsEconomy)
boxplot(air$SeatsEconomy)
hist(air$SeatsEconomy)

barplot(air$SeatsPremium)
boxplot(air$SeatsPremium)
hist(air$SeatsPremium)

barplot(air$PitchEconomy)
boxplot(air$PitchEconomy)
hist(air$PitchEconomy)

barplot(air$PitchPremium)
boxplot(air$PitchPremium)
hist(air$PitchPremium)

barplot(air$WidthEconomy)
boxplot(air$WidthEconomy)
hist(air$WidthEconomy)

barplot(air$WidthPremium)
boxplot(air$WidthPremium)
hist(air$WidthPremium)

barplot(air$PriceEconomy)
boxplot(air$PriceEconomy)
hist(air$PriceEconomy)

barplot(air$PricePremium)
boxplot(air$PricePremium)
hist(air$PricePremium)

barplot(air$PriceRelative)
boxplot(air$PriceRelative)
hist(air$PriceRelative)

par(mfrow = c(1,1))
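The nine repeated plot triplets above could also be produced with a small loop. The following is only a sketch: `plot_trio` is a hypothetical helper name, and `toy` is a made-up stand-in for `air` so that the code runs on its own.

```r
# Toy stand-in for `air` (made-up values) so the sketch is self-contained.
toy <- data.frame(
  SeatsEconomy = c(122, 122, 180, 243, 389),
  SeatsPremium = c(40, 40, 36, 24, 66)
)

# Draw a bar plot, box plot and histogram side by side for each named column.
plot_trio <- function(df, vars) {
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))  # restore the previous plot layout when done
  for (v in vars) {
    barplot(df[[v]], main = paste("Bar plot:", v))
    boxplot(df[[v]], main = paste("Box plot:", v))
    hist(df[[v]], main = paste("Histogram:", v), xlab = v)
  }
  invisible(vars)
}

plotted <- plot_trio(toy, c("SeatsEconomy", "SeatsPremium"))
```

With the real data, the call would be something like plot_trio(air, names(air)[6:14]), assuming the column order shown by str(air).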

Understanding the regularly spaced gaps in the PriceRelative bar plot

par(mfrow = c(1,3))
boxplot(air$PriceRelative ~ air$TravelMonth)
boxplot(air$PriceRelative ~ air$Airline)
boxplot(air$PriceRelative ~ air$Aircraft)

par(mfrow = c(1,1))
table(air$Airline)
## 
## AirFrance   British     Delta       Jet Singapore    Virgin 
##        74       175        46        61        40        62

The box plots show that the relative difference between premium and economy prices is very low for two airlines, AirFrance and Delta. This explains the gaps in the PriceRelative bar plot: they appear wherever the rows of these two airlines occur in the data.
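A numerical check of the per-airline pattern is to compare the median PriceRelative of each airline. The sketch below uses a few made-up rows in place of `air` (the values are illustrative only and are not taken from the dataset).

```r
# Made-up rows standing in for `air`; the numbers are purely illustrative.
toy <- data.frame(
  Airline = c("AirFrance", "AirFrance", "British", "British", "Delta", "Delta"),
  PriceRelative = c(0.05, 0.07, 0.40, 0.60, 0.08, 0.12)
)

# Median relative price difference per airline.
med_by_airline <- tapply(toy$PriceRelative, toy$Airline, median)
print(med_by_airline)
```

On the real data, the equivalent call would be tapply(air$PriceRelative, air$Airline, median).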

Visualizing scatterplot matrices between different components

# Passing data = air avoids attach()/detach(), which can mask other objects
pairs(~ PricePremium + PercentPremiumSeats + Aircraft + Airline, data = air)

pairs(~ PricePremium + PitchPremium + WidthPremium, data = air)

pairs(~ PriceRelative + WidthDifference + PitchDifference, data = air)

Building a correlation matrix between components

round(cor(air[,6:18]),2)
##                     SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy                1.00         0.63         0.14         0.12
## SeatsPremium                0.63         1.00        -0.03         0.00
## PitchEconomy                0.14        -0.03         1.00        -0.55
## PitchPremium                0.12         0.00        -0.55         1.00
## WidthEconomy                0.37         0.46         0.29        -0.02
## WidthPremium                0.10         0.00        -0.54         0.75
## PriceEconomy                0.13         0.11         0.37         0.05
## PricePremium                0.18         0.22         0.23         0.09
## PriceRelative               0.00        -0.10        -0.42         0.42
## SeatsTotal                  0.99         0.72         0.12         0.11
## PitchDifference             0.04         0.02        -0.78         0.95
## WidthDifference            -0.08        -0.22        -0.64         0.70
## PercentPremiumSeats        -0.33         0.49        -0.10        -0.18
##                     WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy                0.37         0.10         0.13         0.18
## SeatsPremium                0.46         0.00         0.11         0.22
## PitchEconomy                0.29        -0.54         0.37         0.23
## PitchPremium               -0.02         0.75         0.05         0.09
## WidthEconomy                1.00         0.08         0.07         0.15
## WidthPremium                0.08         1.00        -0.06         0.06
## PriceEconomy                0.07        -0.06         1.00         0.90
## PricePremium                0.15         0.06         0.90         1.00
## PriceRelative              -0.04         0.50        -0.29         0.03
## SeatsTotal                  0.41         0.09         0.13         0.19
## PitchDifference            -0.13         0.76        -0.10        -0.02
## WidthDifference            -0.39         0.88        -0.08        -0.01
## PercentPremiumSeats         0.23        -0.18         0.07         0.12
##                     PriceRelative SeatsTotal PitchDifference
## SeatsEconomy                 0.00       0.99            0.04
## SeatsPremium                -0.10       0.72            0.02
## PitchEconomy                -0.42       0.12           -0.78
## PitchPremium                 0.42       0.11            0.95
## WidthEconomy                -0.04       0.41           -0.13
## WidthPremium                 0.50       0.09            0.76
## PriceEconomy                -0.29       0.13           -0.10
## PricePremium                 0.03       0.19           -0.02
## PriceRelative                1.00      -0.01            0.47
## SeatsTotal                  -0.01       1.00            0.03
## PitchDifference              0.47       0.03            1.00
## WidthDifference              0.49      -0.11            0.76
## PercentPremiumSeats         -0.16      -0.22           -0.09
##                     WidthDifference PercentPremiumSeats
## SeatsEconomy                  -0.08               -0.33
## SeatsPremium                  -0.22                0.49
## PitchEconomy                  -0.64               -0.10
## PitchPremium                   0.70               -0.18
## WidthEconomy                  -0.39                0.23
## WidthPremium                   0.88               -0.18
## PriceEconomy                  -0.08                0.07
## PricePremium                  -0.01                0.12
## PriceRelative                  0.49               -0.16
## SeatsTotal                    -0.11               -0.22
## PitchDifference                0.76               -0.09
## WidthDifference                1.00               -0.28
## PercentPremiumSeats           -0.28                1.00

Visualizing the correlation between the components

library(corrgram)
corrgram(air, upper.panel = panel.pie)

Insights

Some of the correlations are expected:

  1. The total number of seats is very highly correlated with the number of economy seats (r = 0.99), since economy seats make up about 85% of all seats on average.

  2. Economy and premium prices are both highly positively correlated with flight duration, as expected.

  3. Premium seat width and premium seat pitch are highly correlated (r = 0.75), since both reflect the extra comfort offered in premium class.

  4. PitchEconomy and PitchDifference are highly negatively correlated (r = -0.78): as the pitch in economy class increases, the pitch difference between the two classes decreases, and vice versa.

  5. PercentPremiumSeats is negatively correlated with SeatsEconomy (r = -0.33): the more economy seats, the lower the percentage of premium seats.
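Several of these correlations arise because some columns appear to be computed from others. The sketch below checks this on the first row printed by str(air) above, typed in by hand. The formulas are inferred from the displayed values, not taken from any documentation of the dataset, so they are assumptions.

```r
# First row of the str(air) output above, entered manually.
row1 <- data.frame(
  SeatsEconomy = 122, SeatsPremium = 40, SeatsTotal = 162,
  PitchEconomy = 31, PitchPremium = 38, PitchDifference = 7,
  WidthEconomy = 18, WidthPremium = 19, WidthDifference = 1,
  PriceEconomy = 2707, PricePremium = 3725, PriceRelative = 0.38,
  PercentPremiumSeats = 24.7
)

# Assumed definitions of the derived columns, checked up to display rounding.
checks <- with(row1, c(
  seats_total    = SeatsEconomy + SeatsPremium == SeatsTotal,
  pitch_diff     = PitchPremium - PitchEconomy == PitchDifference,
  width_diff     = WidthPremium - WidthEconomy == WidthDifference,
  pct_premium    = abs(100 * SeatsPremium / SeatsTotal - PercentPremiumSeats) < 0.05,
  price_relative = abs((PricePremium - PriceEconomy) / PriceEconomy - PriceRelative) < 0.005
))
print(checks)
```

If these definitions hold for every row, the derived columns add no new information beyond the columns they are computed from, which matters later when they are used together as predictors.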

Main Research question

Factors affecting the price difference between premium and economy class

Let's check the correlation of the relative price with the other factors

cor(air$PriceRelative, air[,c(6:18)])
##      SeatsEconomy SeatsPremium PitchEconomy PitchPremium WidthEconomy
## [1,]  0.003956939  -0.09719601    -0.423022    0.4175391  -0.04396116
##      WidthPremium PriceEconomy PricePremium PriceRelative  SeatsTotal
## [1,]    0.5042476   -0.2885671   0.03184654             1 -0.01156894
##      PitchDifference WidthDifference PercentPremiumSeats
## [1,]       0.4687302       0.4858024          -0.1615656

This shows how strongly the relative price between the two classes is correlated with each of the other factors.

Hypothesis testing

On the basis of the correlations obtained, we can test whether these correlations are statistically significant.

We first test PriceRelative against PitchDifference, where the null hypothesis is that there is no correlation between them (i.e. the true correlation is zero).

cor.test(air$PriceRelative, air$PitchDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PriceRelative and air$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302

The test reports the same correlation as before (r = 0.469), and the p-value is far below 0.05. We can therefore reject the null hypothesis and accept the alternative hypothesis: there is a significant correlation between relative price and pitch difference.
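The numbers in the cor.test() output can be reproduced by hand from r and n alone. The sketch below uses the standard formulas for a Pearson correlation: t = r * sqrt(n - 2) / sqrt(1 - r^2) for the test statistic and the Fisher z-transformation for the confidence interval. The variable names are illustrative.

```r
r <- 0.4687302  # sample correlation reported above
n <- 458        # number of observations

# t statistic with n - 2 degrees of freedom
df <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

print(round(t_stat, 3))
print(round(ci, 4))
```

The results should agree with the t value and the confidence interval printed by cor.test() up to rounding.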

Testing all the factors simultaneously

library(psych)
corr.test(air[,c(6:18)])
## Call:corr.test(x = air[, c(6:18)])
## Correlation matrix 
##                     SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy                1.00         0.63         0.14         0.12
## SeatsPremium                0.63         1.00        -0.03         0.00
## PitchEconomy                0.14        -0.03         1.00        -0.55
## PitchPremium                0.12         0.00        -0.55         1.00
## WidthEconomy                0.37         0.46         0.29        -0.02
## WidthPremium                0.10         0.00        -0.54         0.75
## PriceEconomy                0.13         0.11         0.37         0.05
## PricePremium                0.18         0.22         0.23         0.09
## PriceRelative               0.00        -0.10        -0.42         0.42
## SeatsTotal                  0.99         0.72         0.12         0.11
## PitchDifference             0.04         0.02        -0.78         0.95
## WidthDifference            -0.08        -0.22        -0.64         0.70
## PercentPremiumSeats        -0.33         0.49        -0.10        -0.18
##                     WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy                0.37         0.10         0.13         0.18
## SeatsPremium                0.46         0.00         0.11         0.22
## PitchEconomy                0.29        -0.54         0.37         0.23
## PitchPremium               -0.02         0.75         0.05         0.09
## WidthEconomy                1.00         0.08         0.07         0.15
## WidthPremium                0.08         1.00        -0.06         0.06
## PriceEconomy                0.07        -0.06         1.00         0.90
## PricePremium                0.15         0.06         0.90         1.00
## PriceRelative              -0.04         0.50        -0.29         0.03
## SeatsTotal                  0.41         0.09         0.13         0.19
## PitchDifference            -0.13         0.76        -0.10        -0.02
## WidthDifference            -0.39         0.88        -0.08        -0.01
## PercentPremiumSeats         0.23        -0.18         0.07         0.12
##                     PriceRelative SeatsTotal PitchDifference
## SeatsEconomy                 0.00       0.99            0.04
## SeatsPremium                -0.10       0.72            0.02
## PitchEconomy                -0.42       0.12           -0.78
## PitchPremium                 0.42       0.11            0.95
## WidthEconomy                -0.04       0.41           -0.13
## WidthPremium                 0.50       0.09            0.76
## PriceEconomy                -0.29       0.13           -0.10
## PricePremium                 0.03       0.19           -0.02
## PriceRelative                1.00      -0.01            0.47
## SeatsTotal                  -0.01       1.00            0.03
## PitchDifference              0.47       0.03            1.00
## WidthDifference              0.49      -0.11            0.76
## PercentPremiumSeats         -0.16      -0.22           -0.09
##                     WidthDifference PercentPremiumSeats
## SeatsEconomy                  -0.08               -0.33
## SeatsPremium                  -0.22                0.49
## PitchEconomy                  -0.64               -0.10
## PitchPremium                   0.70               -0.18
## WidthEconomy                  -0.39                0.23
## WidthPremium                   0.88               -0.18
## PriceEconomy                  -0.08                0.07
## PricePremium                  -0.01                0.12
## PriceRelative                  0.49               -0.16
## SeatsTotal                    -0.11               -0.22
## PitchDifference                0.76               -0.09
## WidthDifference                1.00               -0.28
## PercentPremiumSeats           -0.28                1.00
## Sample Size 
## [1] 458
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##                     SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy                0.00         0.00         0.08         0.35
## SeatsPremium                0.00         0.00         1.00         1.00
## PitchEconomy                0.00         0.47         0.00         0.00
## PitchPremium                0.01         0.92         0.00         0.00
## WidthEconomy                0.00         0.00         0.00         0.61
## WidthPremium                0.03         0.95         0.00         0.00
## PriceEconomy                0.01         0.01         0.00         0.28
## PricePremium                0.00         0.00         0.00         0.06
## PriceRelative               0.93         0.04         0.00         0.00
## SeatsTotal                  0.00         0.00         0.01         0.02
## PitchDifference             0.45         0.73         0.00         0.00
## WidthDifference             0.08         0.00         0.00         0.00
## PercentPremiumSeats         0.00         0.00         0.03         0.00
##                     WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy                0.00         0.78         0.22         0.01
## SeatsPremium                0.00         1.00         0.46         0.00
## PitchEconomy                0.00         0.00         0.00         0.00
## PitchPremium                1.00         0.00         1.00         1.00
## WidthEconomy                0.00         1.00         1.00         0.05
## WidthPremium                0.08         0.00         1.00         1.00
## PriceEconomy                0.15         0.22         0.00         0.00
## PricePremium                0.00         0.17         0.00         0.00
## PriceRelative               0.35         0.00         0.00         0.50
## SeatsTotal                  0.00         0.05         0.00         0.00
## PitchDifference             0.01         0.00         0.03         0.70
## WidthDifference             0.00         0.00         0.07         0.81
## PercentPremiumSeats         0.00         0.00         0.16         0.01
##                     PriceRelative SeatsTotal PitchDifference
## SeatsEconomy                 1.00       0.00            1.00
## SeatsPremium                 0.94       0.00            1.00
## PitchEconomy                 0.00       0.27            0.00
## PitchPremium                 0.00       0.64            0.00
## WidthEconomy                 1.00       0.00            0.22
## WidthPremium                 0.00       1.00            0.00
## PriceEconomy                 0.00       0.17            0.86
## PricePremium                 1.00       0.00            1.00
## PriceRelative                0.00       1.00            0.00
## SeatsTotal                   0.80       0.00            1.00
## PitchDifference              0.00       0.47            0.00
## WidthDifference              0.00       0.02            0.00
## PercentPremiumSeats          0.00       0.00            0.05
##                     WidthDifference PercentPremiumSeats
## SeatsEconomy                   1.00                0.00
## SeatsPremium                   0.00                0.00
## PitchEconomy                   0.00                0.78
## PitchPremium                   0.00                0.01
## WidthEconomy                   0.00                0.00
## WidthPremium                   0.00                0.00
## PriceEconomy                   1.00                1.00
## PricePremium                   1.00                0.41
## PriceRelative                  0.00                0.02
## SeatsTotal                     0.68                0.00
## PitchDifference                0.00                1.00
## WidthDifference                0.00                0.00
## PercentPremiumSeats            0.00                0.00
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option

If we focus on the probability values (p-values) for PriceRelative, the null hypothesis of zero correlation is rejected wherever p < 0.05, meaning there is a significant correlation between the two variables. Using the p-values adjusted for multiple tests, PriceRelative is significantly correlated with PitchEconomy, PitchPremium, WidthPremium, PriceEconomy, PitchDifference, WidthDifference and PercentPremiumSeats. SeatsPremium is significant only before adjustment (p = 0.04 unadjusted, 0.94 adjusted).
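The adjustment for multiple tests matters because many correlations are tested at once. By default, corr.test() uses the Holm method, which is also available in base R as p.adjust(). The sketch below applies it to a few made-up p-values (not taken from the output above) to show how a raw p-value below 0.05 can stop being significant after adjustment.

```r
# Hypothetical raw p-values from four tests (illustrative numbers only).
p_raw <- c(test_a = 0.001, test_b = 0.020, test_c = 0.040, test_d = 0.300)

# Holm adjustment: the smallest p-value is multiplied by the number of tests,
# the next by one less, and so on, keeping the adjusted values non-decreasing.
p_holm <- p.adjust(p_raw, method = "holm")

significant_raw  <- names(p_raw)[p_raw < 0.05]
significant_holm <- names(p_holm)[p_holm < 0.05]

print(p_holm)
print(significant_raw)
print(significant_holm)
```

Here three tests pass the 0.05 threshold before adjustment but only one passes afterwards.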

Creating a linear model

Let's analyse how these variables affect our dependent variable, PriceRelative. This analysis estimates by how many units PriceRelative changes when one of the explanatory variables changes by one unit while the others are held constant.

In the linear model, our dependent variable (y) is PriceRelative, and our independent variables (x1, x2, x3, ...) are PitchEconomy, PitchPremium, WidthPremium, PriceEconomy, PitchDifference, WidthDifference and PercentPremiumSeats.

Our mathematical model is

y = B0 + B1x1 + B2x2 + B3x3 + B4x4 + ... + e

where B0 is the intercept, B1, B2, ... are the slope coefficients and e is the error term.

m1 <- lm(PriceRelative ~ PitchEconomy + PitchPremium + WidthPremium + PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats, data = air)
summary(m1)
## 
## Call:
## lm(formula = PriceRelative ~ PitchEconomy + PitchPremium + WidthPremium + 
##     PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats, 
##     data = air)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90093 -0.22133 -0.02915  0.15791  1.16165 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.102e+00  1.752e+00  -0.629 0.529437    
## PitchEconomy        -6.810e-02  4.511e-02  -1.510 0.131826    
## PitchPremium         3.359e-02  2.192e-02   1.533 0.126056    
## WidthPremium         1.371e-01  3.827e-02   3.583 0.000377 ***
## PriceEconomy        -1.056e-04  2.085e-05  -5.064 5.99e-07 ***
## PitchDifference             NA         NA      NA       NA    
## WidthDifference      7.238e-03  3.769e-02   0.192 0.847790    
## PercentPremiumSeats -6.789e-03  4.267e-03  -1.591 0.112312    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3688 on 451 degrees of freedom
## Multiple R-squared:  0.339,  Adjusted R-squared:  0.3302 
## F-statistic: 38.56 on 6 and 451 DF,  p-value: < 2.2e-16
coefficients(m1)
##         (Intercept)        PitchEconomy        PitchPremium 
##       -1.1023980556       -0.0680970092        0.0335887397 
##        WidthPremium        PriceEconomy     PitchDifference 
##        0.1371215598       -0.0001055815                  NA 
##     WidthDifference PercentPremiumSeats 
##        0.0072383702       -0.0067888054
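The NA reported for PitchDifference is what lm() produces when a predictor is an exact linear combination of other predictors in the model. The sketch below reproduces the behaviour on simulated data. All names and values are made up, and the assumption is that the difference column equals the premium value minus the economy value, as the rows shown by str(air) suggest.

```r
set.seed(42)
n <- 50

# Simulated predictors: the third is an exact difference of the first two.
pitch_econ <- sample(30:33, n, replace = TRUE)
pitch_prem <- sample(34:40, n, replace = TRUE)
pitch_diff <- pitch_prem - pitch_econ

# Simulated response that depends on the difference plus noise.
y <- 0.05 * pitch_diff + rnorm(n, sd = 0.1)

# All three predictors together: the last one is aliased and gets NA.
fit_all <- lm(y ~ pitch_econ + pitch_prem + pitch_diff)
print(coef(fit_all))

# The difference on its own is estimable.
fit_diff <- lm(y ~ pitch_diff)
print(coef(fit_diff))
```

Dropping either the difference or its two components removes the singularity, so the choice between them is a modelling decision rather than a technical one.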

Only two variables are statistically significant in m1: WidthPremium and PriceEconomy. The coefficient of PitchDifference is NA because of a singularity: the rows shown by str(air) are consistent with PitchDifference = PitchPremium - PitchEconomy, so it carries no information beyond the two pitch variables already in the model, and lm() cannot estimate it. Since PitchDifference has the strongest pitch-related correlation with PriceRelative (0.47), we create a second model, m2, that uses it in place of PitchEconomy and PitchPremium. The dependent variable (y) is PriceRelative and the independent variables (x1, x2, x3) are PitchDifference, WidthPremium and PriceEconomy.

The model therefore becomes

y = B0 + B1x1 + B2x2 + B3x3 + e

m2 <- lm(air$PriceRelative ~ air$PitchDifference + air$WidthPremium + air$PriceEconomy)
summary(m2)
## 
## Call:
## lm(formula = air$PriceRelative ~ air$PitchDifference + air$WidthPremium + 
##     air$PriceEconomy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83079 -0.23877 -0.05586  0.14382  1.17478 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.541e+00  4.009e-01  -6.337 5.66e-10 ***
## air$PitchDifference  4.321e-02  1.513e-02   2.856  0.00449 ** 
## air$WidthPremium     1.485e-01  2.422e-02   6.130 1.91e-09 ***
## air$PriceEconomy    -1.145e-04  1.756e-05  -6.521 1.86e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3689 on 454 degrees of freedom
## Multiple R-squared:  0.3339, Adjusted R-squared:  0.3295 
## F-statistic: 75.88 on 3 and 454 DF,  p-value: < 2.2e-16

The model is statistically significant, since the p-value of its F-test is below 2.2e-16. The intercept and beta coefficients are:

coefficients(m2)
##         (Intercept) air$PitchDifference    air$WidthPremium 
##       -2.5405744540        0.0432138143        0.1484583578 
##    air$PriceEconomy 
##       -0.0001144987
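The fit statistics printed by summary(m2) are related by standard formulas, so the adjusted R-squared and the F-statistic can be recomputed from the multiple R-squared, the sample size and the number of predictors. The sketch below does this with the rounded values from the output above, so small rounding differences are expected.

```r
r2 <- 0.3339  # multiple R-squared from summary(m2)
n  <- 458     # number of observations
k  <- 3       # number of predictors (excluding the intercept)

# Adjusted R-squared penalises the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F-statistic for the null hypothesis that all slopes are zero.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(adj_r2, 4))
print(round(f_stat, 2))
```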

Result

So our model becomes

PriceRelative = -2.54 + 0.0432 * PitchDifference + 0.1485 * WidthPremium - 0.000114 * PriceEconomy
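As a worked example, the fitted coefficients from coefficients(m2) can be applied by hand to the first row printed by str(air) (PitchDifference = 7, WidthPremium = 19, PriceEconomy = 2707). This is only an illustration of how the equation is used; the variable names are made up.

```r
# Coefficients copied from the coefficients(m2) output above.
b <- c(
  intercept       = -2.5405744540,
  PitchDifference =  0.0432138143,
  WidthPremium    =  0.1484583578,
  PriceEconomy    = -0.0001144987
)

# Predictor values of the first row shown by str(air).
x <- c(PitchDifference = 7, WidthPremium = 19, PriceEconomy = 2707)

# Predicted PriceRelative = intercept + sum of coefficient * value.
predicted <- b[["intercept"]] + sum(b[names(x)] * x)
print(round(predicted, 3))
```

The observed PriceRelative for that row is 0.38, so the prediction of about 0.27 leaves a residual of roughly 0.1, which is well within the residual standard error of 0.37 reported by summary(m2).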

Conclusion

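In this model, PriceRelative increases with PitchDifference and WidthPremium and decreases slightly with PriceEconomy. Together these three variables explain about a third of the variation in PriceRelative (R-squared = 0.33).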

In simple language

  1. The spacing (pitch) between seats is larger in premium class than in economy class, and a larger pitch difference is associated with a higher premium price relative to the economy price.

  2. Seats in premium class are wider than in economy class, and wider premium seats are associated with a higher premium fare relative to the economy fare.