Reading the data into a data frame named airline.df

# stringsAsFactors = TRUE keeps the text columns as factors (the default before R 4.0.0)
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
View(airline.df)
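Since R 4.0.0, read.csv() no longer turns text columns into factors by default (stringsAsFactors is FALSE). The level counts that summary() prints below, and the starred factor rows from describe(), only appear when the factors are requested explicitly. Here is a minimal sketch. It writes a few made-up rows to a temporary file in place of SixAirlinesDataV2.csv, so none of the values are airline data.

```r
# Toy stand-in for SixAirlinesDataV2.csv (made-up rows, not the real data)
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Airline,Aircraft,PriceEconomy,PricePremium",
  "British,Boeing,400,550",
  "Delta,AirBus,300,420",
  "British,AirBus,900,1100"
), toy_csv)

# Request factors explicitly so the result does not depend on the R version
toy.df <- read.csv(toy_csv, stringsAsFactors = TRUE)
str(toy.df)
summary(toy.df$Airline)  # level counts, as in summary(airline.df)
```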

Summarising the data

summary(airline.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
library(psych)
describe(airline.df)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Visualization using box plots

Prices of premium economy seats across airlines.

boxplot(airline.df$PricePremium~airline.df$Airline)

Prices of economy seats across airlines.

boxplot(airline.df$PriceEconomy~airline.df$Airline)

Prices of premium economy seats by travel month.

boxplot(airline.df$PricePremium~airline.df$TravelMonth)

Prices of economy seats by travel month.

boxplot(airline.df$PriceEconomy~airline.df$TravelMonth)

How the relative price of premium economy over economy varies by travel month.

boxplot(airline.df$PriceRelative~airline.df$TravelMonth)

Relative price by flight type (domestic vs international). The plot suggests that the premium economy markup over economy is larger on international flights than on domestic flights.

boxplot(airline.df$PriceRelative~airline.df$IsInternational) 

Relative price by aircraft type. The plot suggests that the premium economy markup over economy is larger on Boeing aircraft than on Airbus aircraft.

boxplot(airline.df$PriceRelative~airline.df$Aircraft) 
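Numbers can back up what the box plots above show. Here is a minimal sketch that prints the quartiles and medians of PriceRelative within each group, using aggregate() and tapply(). The small data frame is made up for illustration. With the real data, you would pass airline.df and any of the grouping columns (Airline, Aircraft, TravelMonth, IsInternational).

```r
# Made-up example rows (not the airline data) with the same column names
toy.df <- data.frame(
  PriceRelative   = c(0.05, 0.10, 0.40, 0.60, 0.80, 0.90),
  IsInternational = factor(c("Domestic", "Domestic", "International",
                             "International", "International", "International"))
)

# First quartile, median and third quartile of the relative price per group
group_summary <- aggregate(PriceRelative ~ IsInternational, data = toy.df,
                           FUN = function(x) quantile(x, c(0.25, 0.5, 0.75)))
print(group_summary)

# Just the medians, one per group
medians <- tapply(toy.df$PriceRelative, toy.df$IsInternational, median)
medians
```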

Scatter plots for the numeric (non-categorical) variables, starting with relative price against flight duration.

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(airline.df$PriceRelative~airline.df$FlightDuration) 

Scatter plot of relative price and the number of economy seats.

scatterplot(airline.df$PriceRelative~airline.df$SeatsEconomy)

Corrgram for the airline data set.

library(corrgram)
corrgram(airline.df, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt)

A correlation matrix for the numeric variables in columns 6 to 18 (SeatsEconomy to PercentPremiumSeats).

round(cor(airline.df[,6:18]),2)
##                     SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy                1.00         0.63         0.14         0.12
## SeatsPremium                0.63         1.00        -0.03         0.00
## PitchEconomy                0.14        -0.03         1.00        -0.55
## PitchPremium                0.12         0.00        -0.55         1.00
## WidthEconomy                0.37         0.46         0.29        -0.02
## WidthPremium                0.10         0.00        -0.54         0.75
## PriceEconomy                0.13         0.11         0.37         0.05
## PricePremium                0.18         0.22         0.23         0.09
## PriceRelative               0.00        -0.10        -0.42         0.42
## SeatsTotal                  0.99         0.72         0.12         0.11
## PitchDifference             0.04         0.02        -0.78         0.95
## WidthDifference            -0.08        -0.22        -0.64         0.70
## PercentPremiumSeats        -0.33         0.49        -0.10        -0.18
##                     WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy                0.37         0.10         0.13         0.18
## SeatsPremium                0.46         0.00         0.11         0.22
## PitchEconomy                0.29        -0.54         0.37         0.23
## PitchPremium               -0.02         0.75         0.05         0.09
## WidthEconomy                1.00         0.08         0.07         0.15
## WidthPremium                0.08         1.00        -0.06         0.06
## PriceEconomy                0.07        -0.06         1.00         0.90
## PricePremium                0.15         0.06         0.90         1.00
## PriceRelative              -0.04         0.50        -0.29         0.03
## SeatsTotal                  0.41         0.09         0.13         0.19
## PitchDifference            -0.13         0.76        -0.10        -0.02
## WidthDifference            -0.39         0.88        -0.08        -0.01
## PercentPremiumSeats         0.23        -0.18         0.07         0.12
##                     PriceRelative SeatsTotal PitchDifference
## SeatsEconomy                 0.00       0.99            0.04
## SeatsPremium                -0.10       0.72            0.02
## PitchEconomy                -0.42       0.12           -0.78
## PitchPremium                 0.42       0.11            0.95
## WidthEconomy                -0.04       0.41           -0.13
## WidthPremium                 0.50       0.09            0.76
## PriceEconomy                -0.29       0.13           -0.10
## PricePremium                 0.03       0.19           -0.02
## PriceRelative                1.00      -0.01            0.47
## SeatsTotal                  -0.01       1.00            0.03
## PitchDifference              0.47       0.03            1.00
## WidthDifference              0.49      -0.11            0.76
## PercentPremiumSeats         -0.16      -0.22           -0.09
##                     WidthDifference PercentPremiumSeats
## SeatsEconomy                  -0.08               -0.33
## SeatsPremium                  -0.22                0.49
## PitchEconomy                  -0.64               -0.10
## PitchPremium                   0.70               -0.18
## WidthEconomy                  -0.39                0.23
## WidthPremium                   0.88               -0.18
## PriceEconomy                  -0.08                0.07
## PricePremium                  -0.01                0.12
## PriceRelative                  0.49               -0.16
## SeatsTotal                    -0.11               -0.22
## PitchDifference                0.76               -0.09
## WidthDifference                1.00               -0.28
## PercentPremiumSeats           -0.28                1.00
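It is easy to misread a 13 × 13 matrix by eye. The sketch below uses a small helper that lists each pair of variables once, ordered by absolute correlation. The helper name cor_pairs is hypothetical and does not come from any package. It runs on R's built-in mtcars data so that it works without the airline file. Calling cor_pairs(airline.df[, 6:18]) would apply it to the matrix above.

```r
# Hypothetical helper: list each variable pair once, strongest correlation first
cor_pairs <- function(df, digits = 2) {
  m <- cor(df)
  keep <- upper.tri(m)                      # drop the diagonal and duplicate pairs
  idx <- which(keep, arr.ind = TRUE)        # row/column positions, column-major order
  out <- data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    r    = round(m[keep], digits),          # same column-major order as idx
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), , drop = FALSE]   # sort by strength, keep the sign
}

mt_pairs <- cor_pairs(mtcars)
head(mt_pairs, 5)
```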

To find the factors associated with the relative price of premium economy over economy, we compute the correlation between PriceRelative and each of the other numeric variables.

cor(airline.df$PriceRelative, airline.df[,c(6:18)])
##      SeatsEconomy SeatsPremium PitchEconomy PitchPremium WidthEconomy
## [1,]  0.003956939  -0.09719601    -0.423022    0.4175391  -0.04396116
##      WidthPremium PriceEconomy PricePremium PriceRelative  SeatsTotal
## [1,]    0.5042476   -0.2885671   0.03184654             1 -0.01156894
##      PitchDifference WidthDifference PercentPremiumSeats
## [1,]       0.4687302       0.4858024          -0.1615656
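Selecting columns by position (6:18) works for this file, but it silently leaves out FlightDuration, which is column 3 and also numeric. The sketch below selects every numeric column by type instead and correlates the response with all of them. The toy data frame is made up and only borrows column names from the airline data.

```r
# Made-up rows (not the airline data) mixing factor and numeric columns
toy.df <- data.frame(
  Airline        = factor(c("British", "Delta", "Virgin", "Jet", "British")),
  FlightDuration = c(1.5, 4.0, 7.5, 10.0, 13.0),
  PriceEconomy   = c(150, 500, 900, 1400, 2000),
  PriceRelative  = c(0.80, 0.60, 0.45, 0.30, 0.20)
)

# Keep only numeric columns, whatever their position
num_cols   <- names(toy.df)[vapply(toy.df, is.numeric, logical(1))]
predictors <- setdiff(num_cols, "PriceRelative")

# Correlation of the response with each numeric predictor (a 1-row matrix)
r <- cor(toy.df$PriceRelative, toy.df[, predictors])
round(r, 3)
```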

To check whether these correlations are statistically significant, we run a Pearson correlation test for each variable with cor.test(). The null hypothesis is that the true correlation is zero, and we reject it when the p-value is below 0.05.

cor.test(airline.df$PriceRelative, airline.df$SeatsEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$SeatsEconomy
## t = 0.084498, df = 456, p-value = 0.9327
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08770167  0.09554911
## sample estimates:
##         cor 
## 0.003956939
cor.test(airline.df$PriceRelative, airline.df$SeatsPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$SeatsPremium
## t = -2.0854, df = 456, p-value = 0.03759
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.18715605 -0.00561924
## sample estimates:
##         cor 
## -0.09719601
cor.test(airline.df$PriceRelative, airline.df$SeatsTotal)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$SeatsTotal
## t = -0.24706, df = 456, p-value = 0.805
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.10308648  0.08014282
## sample estimates:
##         cor 
## -0.01156894
cor.test(airline.df$PriceRelative, airline.df$PitchEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PitchEconomy
## t = -9.9692, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4954453 -0.3447581
## sample estimates:
##       cor 
## -0.423022
cor.test(airline.df$PriceRelative, airline.df$PitchPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PitchPremium
## t = 9.8125, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3388769 0.4904041
## sample estimates:
##       cor 
## 0.4175391
cor.test(airline.df$PriceRelative, airline.df$PitchDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302
cor.test(airline.df$PriceRelative, airline.df$WidthEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$WidthEconomy
## t = -0.93966, df = 456, p-value = 0.3479
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.13504401  0.04785843
## sample estimates:
##         cor 
## -0.04396116
cor.test(airline.df$PriceRelative, airline.df$WidthPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$WidthPremium
## t = 12.469, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4326084 0.5695593
## sample estimates:
##       cor 
## 0.5042476
cor.test(airline.df$PriceRelative, airline.df$WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4125388 0.5528218
## sample estimates:
##       cor 
## 0.4858024
cor.test(airline.df$PriceRelative, airline.df$PriceEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PriceEconomy
## t = -6.4359, df = 456, p-value = 3.112e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3704004 -0.2022889
## sample estimates:
##        cor 
## -0.2885671
cor.test(airline.df$PriceRelative, airline.df$PricePremium)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PricePremium
## t = 0.6804, df = 456, p-value = 0.4966
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05995522  0.12311410
## sample estimates:
##        cor 
## 0.03184654
cor.test(airline.df$PriceRelative, airline.df$PercentPremiumSeats)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PercentPremiumSeats
## t = -3.496, df = 456, p-value = 0.0005185
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.24949885 -0.07098966
## sample estimates:
##        cor 
## -0.1615656
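All twelve cor.test() calls above follow one pattern. The sketch below generates them in a loop and collects r, the 95% confidence interval and the p-value into a single table. The function name cor_table is hypothetical. It is shown on the built-in mtcars data, with mpg as the response. For the airline data, the response would be "PriceRelative" and the candidates the other numeric column names.

```r
# Hypothetical helper: Pearson test of one response against several candidates
cor_table <- function(df, response, candidates, alpha = 0.05) {
  rows <- lapply(candidates, function(v) {
    ct <- cor.test(df[[response]], df[[v]])   # Pearson is the default method
    data.frame(variable = v,
               r        = unname(ct$estimate),
               lower    = ct$conf.int[1],
               upper    = ct$conf.int[2],
               p.value  = ct$p.value)
  })
  out <- do.call(rbind, rows)
  out$significant <- out$p.value < alpha     # reject H0: true correlation = 0
  out
}

tab <- cor_table(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
tab
tab$variable[tab$significant]
```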

From the Pearson correlation tests above, the variables significantly correlated with relative price (p < 0.05) are SeatsPremium, PitchEconomy, PitchPremium, PitchDifference, WidthPremium, WidthDifference, PriceEconomy and PercentPremiumSeats. SeatsEconomy, SeatsTotal, WidthEconomy and PricePremium are not.

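To estimate how much each of these variables contributes to the relative price, we fit a multiple linear regression model. It includes seven of the eight variables. SeatsPremium, the weakest of them (p = 0.038), is left out.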
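Typing the predictors by hand makes it easy for the variable list and the model to drift apart. The formula can instead be built from a character vector of names with reformulate() from the stats package. Here is a sketch using the seven predictor names of the model below. The lm() call is commented out because it needs the airline data.

```r
# Predictor names as used in the regression model below
predictors <- c("PercentPremiumSeats", "PitchEconomy", "PitchPremium",
                "PitchDifference", "WidthPremium", "WidthDifference",
                "PriceEconomy")

f <- reformulate(predictors, response = "PriceRelative")
print(f)
# fit <- lm(f, data = airline.df)  # same model as the one fitted below
```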

fit <- lm(PriceRelative ~ PercentPremiumSeats + PitchEconomy + PitchPremium +
            PitchDifference + WidthPremium + WidthDifference + PriceEconomy,
          data = airline.df)
fit
## 
## Call:
## lm(formula = PriceRelative ~ PercentPremiumSeats + PitchEconomy + 
##     PitchPremium + PitchDifference + WidthPremium + WidthDifference + 
##     PriceEconomy, data = airline.df)
## 
## Coefficients:
##         (Intercept)  PercentPremiumSeats         PitchEconomy  
##          -1.1023981           -0.0067888           -0.0680970  
##        PitchPremium      PitchDifference         WidthPremium  
##           0.0335887                   NA            0.1371216  
##     WidthDifference         PriceEconomy  
##           0.0072384           -0.0001056
summary(fit)
## 
## Call:
## lm(formula = PriceRelative ~ PercentPremiumSeats + PitchEconomy + 
##     PitchPremium + PitchDifference + WidthPremium + WidthDifference + 
##     PriceEconomy, data = airline.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.90093 -0.22133 -0.02915  0.15791  1.16165 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -1.102e+00  1.752e+00  -0.629 0.529437    
## PercentPremiumSeats -6.789e-03  4.267e-03  -1.591 0.112312    
## PitchEconomy        -6.810e-02  4.511e-02  -1.510 0.131826    
## PitchPremium         3.359e-02  2.192e-02   1.533 0.126056    
## PitchDifference             NA         NA      NA       NA    
## WidthPremium         1.371e-01  3.827e-02   3.583 0.000377 ***
## WidthDifference      7.238e-03  3.769e-02   0.192 0.847790    
## PriceEconomy        -1.056e-04  2.085e-05  -5.064 5.99e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3688 on 451 degrees of freedom
## Multiple R-squared:  0.339,  Adjusted R-squared:  0.3302 
## F-statistic: 38.56 on 6 and 451 DF,  p-value: < 2.2e-16
coefficients(fit)
##         (Intercept) PercentPremiumSeats        PitchEconomy 
##       -1.1023980556       -0.0067888054       -0.0680970092 
##        PitchPremium     PitchDifference        WidthPremium 
##        0.0335887397                  NA        0.1371215598 
##     WidthDifference        PriceEconomy 
##        0.0072383702       -0.0001055815

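In this model only WidthPremium and PriceEconomy are significant at the 5% level. PitchDifference has no estimate (NA, "1 not defined because of singularities"). It is exactly PitchPremium minus PitchEconomy, so it carries no information beyond those two predictors and lm() cannot estimate it separately. We therefore fit a second model with three variables: PitchDifference in place of the separate pitch variables, plus WidthPremium and PriceEconomy.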
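The NA for PitchDifference can be reproduced on simulated data. When one predictor is exactly the difference of two others, lm() cannot estimate all three. It reports NA for the term that comes last in the formula, and alias() shows the linear dependence. Here is a sketch with made-up variables a, b and d = a - b, which only play the roles of the pitch columns.

```r
set.seed(1)
n <- 50
a <- rnorm(n, 38, 1)          # plays the role of PitchPremium (simulated)
b <- rnorm(n, 31, 1)          # plays the role of PitchEconomy (simulated)
d <- a - b                    # exact difference, like PitchDifference
y <- 0.1 * d + rnorm(n, sd = 0.2)

fit_alias <- lm(y ~ a + b + d)
coef(fit_alias)               # d is NA: it adds nothing beyond a and b
alias(fit_alias)$Complete     # expresses d in terms of a and b

# Keeping only the difference removes the singularity
fit_diff <- lm(y ~ d)
coef(fit_diff)
```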

fit2 <- lm(airline.df$PriceRelative ~ airline.df$PitchDifference + airline.df$WidthPremium + airline.df$PriceEconomy)
summary(fit2)
## 
## Call:
## lm(formula = airline.df$PriceRelative ~ airline.df$PitchDifference + 
##     airline.df$WidthPremium + airline.df$PriceEconomy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83079 -0.23877 -0.05586  0.14382  1.17478 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -2.541e+00  4.009e-01  -6.337 5.66e-10 ***
## airline.df$PitchDifference  4.321e-02  1.513e-02   2.856  0.00449 ** 
## airline.df$WidthPremium     1.485e-01  2.422e-02   6.130 1.91e-09 ***
## airline.df$PriceEconomy    -1.145e-04  1.756e-05  -6.521 1.86e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3689 on 454 degrees of freedom
## Multiple R-squared:  0.3339, Adjusted R-squared:  0.3295 
## F-statistic: 75.88 on 3 and 454 DF,  p-value: < 2.2e-16
coefficients(fit2)
##                (Intercept) airline.df$PitchDifference 
##              -2.5405744540               0.0432138143 
##    airline.df$WidthPremium    airline.df$PriceEconomy 
##               0.1484583578              -0.0001144987

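The reduced model fits about as well as the first one (adjusted R-squared 0.3295 versus 0.3302) with three predictors instead of six, and all three coefficients are significant at the 1% level. The fitted equation for relative price is

PriceRelative = -2.541 + 0.0432 * PitchDifference + 0.1485 * WidthPremium - 0.0001145 * PriceEconomy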
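As a worked check, the coefficients printed by coefficients(fit2) can be plugged in by hand. The inputs are the medians from summary(airline.df): PitchDifference 7, WidthPremium 19 and PriceEconomy 1242. Together they describe a hypothetical flight, not a row of the data.

```r
# Coefficients copied from coefficients(fit2) above
b <- c(intercept       = -2.5405744540,
       PitchDifference =  0.0432138143,
       WidthPremium    =  0.1484583578,
       PriceEconomy    = -0.0001144987)

# Hypothetical flight at the median of each predictor
x <- c(PitchDifference = 7, WidthPremium = 19, PriceEconomy = 1242)

pred <- b[["intercept"]] + sum(b[names(x)] * x)
round(pred, 3)   # 0.44
```

fit2 was fitted with airline.df$ inside the formula, so predict(fit2, newdata = ...) would not use the new values. R instead looks up the original columns and warns that the row counts do not match. Refitting with lm(PriceRelative ~ PitchDifference + WidthPremium + PriceEconomy, data = airline.df) lets predict() do this calculation.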

Conclusions:

1. The multiple R-squared of the second regression model is 0.3339. This means that about 33.4% of the variation in the dependent variable, PriceRelative, is explained by PitchDifference, WidthPremium and PriceEconomy, and the remaining two thirds is left unexplained.

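2. The p-value of the F-statistic is below 2.2e-16 (the smallest value R prints), far below 0.05. We therefore reject the null hypothesis that all three slope coefficients are zero: taken together, the predictors explain a statistically significant part of the variation in PriceRelative. Significance alone does not make the model precise enough to base decisions on, since most of the variation remains unexplained.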

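3. In simple terms, premium economy carries a larger markup over economy when it offers more extra leg room and wider seats. Extra leg room is measured by PitchDifference, the extra seat pitch (spacing between rows) over economy, and seat width by WidthPremium. The negative PriceEconomy coefficient indicates that the relative markup tends to be smaller when the economy fare itself is high.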

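4. From the correlation analysis we can infer that:

* Prices of economy and premium economy both appear to have a high positive correlation with flight duration, as one would expect. FlightDuration (column 3) is not part of the cor() matrix above, which covers columns 6 to 18, so this reading rests on the corrgram.

* PercentPremiumSeats is negatively correlated with SeatsEconomy (r = -0.33). This follows from its definition as the premium share of all seats: adding economy seats lowers that share, even though the number of premium seats itself tends to rise with the number of economy seats (r = 0.63).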
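Conclusions 1 and 2 quote figures that are linked by simple formulas. The sketch below recomputes the adjusted R-squared and the F-statistic of the second model from three inputs: its multiple R-squared (0.3339), n = 458 observations and p = 3 predictors. Small differences from the printed values come from R-squared being rounded to four digits.

```r
r2 <- 0.3339   # multiple R-squared of fit2 (rounded, from summary(fit2))
n  <- 458      # observations
p  <- 3        # predictors, excluding the intercept

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat), 4)
p_val < .Machine$double.eps   # TRUE: why R prints "p-value: < 2.2e-16"
```

R reports any p-value below machine epsilon (about 2.2e-16) as "< 2.2e-16", so that figure is a display floor rather than the actual p-value.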