# stringsAsFactors = TRUE keeps text columns as factors, as in the summaries below
airline <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
View(airline)

Summary

Summary Statistics of the data set.

summary(airline)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

Describe

Description of variables in the data set.

library(psych)
describe(airline)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Some Plots to better understand the data

Plot between Prices of premium and economy seats.

attach(airline)
library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(PriceEconomy,PricePremium)

Comparison of Premium Price with Others

Plot of Premium price vs Pitch

boxplot(PricePremium~PitchPremium,horizontal=TRUE,main="Plot between Premium seats Price vs Pitch",xlab="Premium Seats Price",ylab="Premium Seats Pitch",col="Light blue")

Plot of Premium price vs Width

boxplot(PricePremium~WidthPremium,horizontal=TRUE,main="Plot between Premium seats Price vs Width",xlab="Premium Seats Price",ylab="Premium Seats Width",col="light blue")

Comparison of Economy Price with Others

Plot between Economy Seat price vs pitch

boxplot(PriceEconomy~PitchEconomy,horizontal=TRUE,main="Plot between Economy seats Price vs Pitch",xlab="Economy Seats Price",ylab="Economy Seats Pitch",col="light blue")

Plot of Economy Seat price Vs Width

boxplot(PriceEconomy~WidthEconomy,horizontal=TRUE,main="Plot between Economy seats Price vs Width",xlab="Economy Seats Price",ylab="Economy Seats Width",col="light blue")

Histogram of Economy seat dimensions

hist(WidthEconomy,col="red")

hist(PitchEconomy,col="red")

Histogram of Premium seat dimensions

hist(PitchPremium,col="light green")

hist(WidthPremium,col="light green")

Histogram for Relative Price, Pitch Difference and Width Difference

hist(PriceRelative,col="light pink")

hist(PitchDifference,col="light pink")

hist(WidthDifference,col="light pink")

Comparing all the three Relative parameters together

scatterplotMatrix(formula = ~ PriceRelative + PitchDifference + WidthDifference , cex=0.6, diagonal="histogram")

Correlation between Seat prices and other variables

y<-airline[,12:13]
x<-airline[,6:18]
cor(x,y)
##                     PriceEconomy PricePremium
## SeatsEconomy          0.12816722   0.17700093
## SeatsPremium          0.11364218   0.21761238
## PitchEconomy          0.36866123   0.22614179
## PitchPremium          0.05038455   0.08853915
## WidthEconomy          0.06799061   0.15054837
## WidthPremium         -0.05704522   0.06402004
## PriceEconomy          1.00000000   0.90138870
## PricePremium          0.90138870   1.00000000
## PriceRelative        -0.28856711   0.03184654
## SeatsTotal            0.13243313   0.19232533
## PitchDifference      -0.09952511  -0.01806629
## WidthDifference      -0.08449975  -0.01151218
## PercentPremiumSeats   0.06532232   0.11639097
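
The column positions used above (12:13 and 6:18) break silently if the file layout changes, and they also place the two price columns among the predictors, which produces the trivial 1.0 entries in the table. A minimal sketch of selecting columns by name and type instead is shown below. The data frame `demo` is a small synthetic stand-in with made-up values, not the airline data, and the names `responses`, `predictors` and `cor_tab` are illustrative.

```r
# Synthetic stand-in for the airline data (all values are made up)
set.seed(1)
demo <- data.frame(
  Airline      = factor(sample(c("A", "B"), 50, replace = TRUE)),
  PitchEconomy = sample(30:33, 50, replace = TRUE),
  PriceEconomy = round(runif(50, 65, 3600))
)
demo$PricePremium <- round(demo$PriceEconomy * runif(50, 1.02, 2.0))

# Keep numeric columns only, then set the two response columns aside by name
responses  <- c("PriceEconomy", "PricePremium")
is_num     <- vapply(demo, is.numeric, logical(1))
predictors <- setdiff(names(demo)[is_num], responses)

# Rows are predictors, columns are the two price variables
cor_tab <- cor(demo[, predictors, drop = FALSE], demo[, responses])
print(round(cor_tab, 3))
```

With the real data, replacing `demo` by `airline` should give the same table as above minus the two price rows, assuming the column names match the summary output.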

Corrgram to test correlations between different variables.

library(corrgram)
corrgram(airline, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of Airline Variable intercorrelations")

T test

Null Hypothesis: Pitch and width have no significant effect on the relative price of premium and economy seats.

price<-table(PriceRelative)
pitch<-table(PitchDifference)
width<-table(WidthDifference)

Check For Pitch first.

t.test(price,pitch)
## 
##  Welch Two Sample t-test
## 
## data:  price and pitch
## t = -2.0635, df = 4.0011, p-value = 0.108
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -203.87122   30.01816
## sample estimates:
## mean of x mean of y 
##  4.673469 91.600000

Check For Width

t.test(price,width)
## 
##  Welch Two Sample t-test
## 
## data:  price and width
## t = -1.9965, df = 4.001, p-value = 0.1166
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -207.79661   33.94355
## sample estimates:
## mean of x mean of y 
##  4.673469 91.600000

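In both tests the p-value is greater than 0.05, so we fail to reject the null hypothesis. Note that table() returns frequency counts, so these tests compare the mean number of observations per distinct value (4.67 for PriceRelative against 91.6 for each difference variable), not the variables themselves.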
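
One way to test the stated hypothesis on the raw observations is a correlation test between the relative price and each difference variable. The sketch below is illustrative only: `demo` is synthetic, the set of pitch values and the relationship built into `PriceRelative` are made up, and Pearson's `cor.test()` is one reasonable choice rather than the method used in this report.

```r
# Synthetic stand-in for the airline data (all values are made up)
set.seed(42)
n <- 200
demo <- data.frame(
  PitchDifference = sample(c(2, 3, 6, 7, 10), n, replace = TRUE),
  WidthDifference = sample(0:4, n, replace = TRUE)
)
# Invented relationship, present only so the example has something to detect
demo$PriceRelative <- 0.1 + 0.05 * demo$PitchDifference + rnorm(n, sd = 0.3)

# Test each association on the observations themselves, not on table() counts
pitch_test <- cor.test(demo$PriceRelative, demo$PitchDifference)
width_test <- cor.test(demo$PriceRelative, demo$WidthDifference)

print(pitch_test$estimate)  # sample correlation with PitchDifference
print(pitch_test$p.value)
print(width_test$p.value)
```

On the real data the same calls would take `airline$PriceRelative`, `airline$PitchDifference` and `airline$WidthDifference`. Because the difference variables take only a few distinct values, a rank-based test (`method = "spearman"`) may be worth comparing.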

Regression Analysis

Here our main focus is on price. The main question is: what factors explain the difference in price between an economy ticket and a premium-economy airline ticket? We have three price variables, so the analysis covers three responses:

1.Premium seat price
2.Economy seat price
3.Relative price

Checking the analysis for Premium price.

test1<-lm(PricePremium~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PriceEconomy+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats , data = airline)
summary(test1)
## 
## Call:
## lm(formula = PricePremium ~ SeatsEconomy + FlightDuration + SeatsPremium + 
##     PitchEconomy + PitchPremium + WidthEconomy + WidthPremium + 
##     PriceEconomy + PriceRelative + SeatsTotal + PitchDifference + 
##     WidthDifference + PercentPremiumSeats, data = airline)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -855.46 -127.12   -8.66   89.60 2164.59 
## 
## Coefficients: (3 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.114e+04  1.467e+03   7.597 1.80e-13 ***
## SeatsEconomy        -2.479e+00  7.231e-01  -3.429 0.000662 ***
## FlightDuration       9.836e+00  6.312e+00   1.558 0.119830    
## SeatsPremium         2.308e+01  4.275e+00   5.399 1.09e-07 ***
## PitchEconomy        -2.601e+02  3.748e+01  -6.939 1.40e-11 ***
## PitchPremium        -1.861e+02  1.794e+01 -10.373  < 2e-16 ***
## WidthEconomy         2.172e+02  4.035e+01   5.384 1.18e-07 ***
## WidthPremium        -8.098e+00  2.236e+01  -0.362 0.717363    
## PriceEconomy         1.359e+00  2.292e-02  59.307  < 2e-16 ***
## PriceRelative        1.039e+03  4.255e+01  24.410  < 2e-16 ***
## SeatsTotal                  NA         NA      NA       NA    
## PitchDifference             NA         NA      NA       NA    
## WidthDifference             NA         NA      NA       NA    
## PercentPremiumSeats -3.407e+01  1.025e+01  -3.323 0.000965 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 300.6 on 447 degrees of freedom
## Multiple R-squared:  0.9467, Adjusted R-squared:  0.9455 
## F-statistic: 794.5 on 10 and 447 DF,  p-value: < 2.2e-16
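
The three NA rows ("3 not defined because of singularities") indicate that SeatsTotal, PitchDifference and WidthDifference are exact linear combinations of other predictors. The means reported by describe() are consistent with this: 202.31 + 33.65 = 235.96 for the seat counts, 37.91 - 31.22 = 6.69 for pitch, and 19.47 - 17.84 = 1.63 for width. The sketch below reproduces the effect on synthetic data and uses `alias()` to show the dependency; the response values and the name `demo` are made up for illustration.

```r
# Synthetic stand-in for the airline data (all values are made up)
set.seed(7)
n <- 60
demo <- data.frame(
  SeatsEconomy = sample(78:389, n, replace = TRUE),
  SeatsPremium = sample(8:66, n, replace = TRUE)
)
# A derived column that is an exact linear combination of the other two
demo$SeatsTotal <- demo$SeatsEconomy + demo$SeatsPremium
# Invented response, used only so that lm() has something to fit
demo$PricePremium <- 500 + 2 * demo$SeatsPremium + rnorm(n, sd = 50)

fit <- lm(PricePremium ~ SeatsEconomy + SeatsPremium + SeatsTotal, data = demo)

print(coef(fit))            # SeatsTotal has no estimate (NA)
print(alias(fit)$Complete)  # SeatsTotal expressed in terms of the other terms
```

Dropping the derived columns from the formula leaves the fitted values unchanged and removes the NA rows from the summary.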

Checking the analysis for Economy price.

test2<-lm(PriceEconomy~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PricePremium+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats, data = airline)
summary(test2)
## 
## Call:
## lm(formula = PriceEconomy ~ SeatsEconomy + FlightDuration + SeatsPremium + 
##     PitchEconomy + PitchPremium + WidthEconomy + WidthPremium + 
##     PricePremium + PriceRelative + SeatsTotal + PitchDifference + 
##     WidthDifference + PercentPremiumSeats, data = airline)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1197.24   -91.56    10.99   110.71   601.51 
## 
## Coefficients: (3 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -8.885e+03  9.951e+02  -8.929  < 2e-16 ***
## SeatsEconomy         2.128e+00  4.976e-01   4.277 2.32e-05 ***
## FlightDuration       1.292e+01  4.344e+00   2.975  0.00309 ** 
## SeatsPremium        -1.681e+01  2.953e+00  -5.691 2.28e-08 ***
## PitchEconomy         2.352e+02  2.498e+01   9.416  < 2e-16 ***
## PitchPremium         1.435e+02  1.208e+01  11.887  < 2e-16 ***
## WidthEconomy        -2.371e+02  2.659e+01  -8.918  < 2e-16 ***
## WidthPremium         2.069e+01  1.547e+01   1.338  0.18165    
## PricePremium         6.529e-01  1.101e-02  59.307  < 2e-16 ***
## PriceRelative       -7.677e+02  2.667e+01 -28.788  < 2e-16 ***
## SeatsTotal                  NA         NA      NA       NA    
## PitchDifference             NA         NA      NA       NA    
## WidthDifference             NA         NA      NA       NA    
## PercentPremiumSeats  3.160e+01  7.037e+00   4.491 9.04e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 208.3 on 447 degrees of freedom
## Multiple R-squared:  0.9565, Adjusted R-squared:  0.9556 
## F-statistic: 983.6 on 10 and 447 DF,  p-value: < 2.2e-16

Checking the analysis for Relative price.

test3<-lm(PriceRelative ~ SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PricePremium+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats, data = airline)
summary(test3)
## 
## Call:
## lm(formula = PriceRelative ~ SeatsEconomy + FlightDuration + 
##     SeatsPremium + PitchEconomy + PitchPremium + WidthEconomy + 
##     WidthPremium + PricePremium + SeatsTotal + PitchDifference + 
##     WidthDifference + PercentPremiumSeats, data = airline)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.87453 -0.25322 -0.08333  0.13083  1.35318 
## 
## Coefficients: (3 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          6.928e+00  1.732e+00   3.999 7.43e-05 ***
## SeatsEconomy        -9.463e-05  8.816e-04  -0.107    0.915    
## FlightDuration       3.052e-02  7.559e-03   4.038 6.35e-05 ***
## SeatsPremium        -2.239e-03  5.230e-03  -0.428    0.669    
## PitchEconomy        -2.585e-01  4.253e-02  -6.077 2.63e-09 ***
## PitchPremium        -1.828e-02  2.138e-02  -0.855    0.393    
## WidthEconomy         2.561e-03  4.710e-02   0.054    0.957    
## WidthPremium         1.204e-01  2.680e-02   4.492 9.01e-06 ***
## PricePremium        -6.865e-06  1.950e-05  -0.352    0.725    
## SeatsTotal                  NA         NA      NA       NA    
## PitchDifference             NA         NA      NA       NA    
## WidthDifference             NA         NA      NA       NA    
## PercentPremiumSeats -1.322e-02  1.245e-02  -1.062    0.289    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3691 on 448 degrees of freedom
## Multiple R-squared:  0.3421, Adjusted R-squared:  0.3289 
## F-statistic: 25.89 on 9 and 448 DF,  p-value: < 2.2e-16

Conclusions:

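1.In the PricePremium model, SeatsEconomy, SeatsPremium, PitchEconomy, PitchPremium, WidthEconomy, PriceEconomy, PriceRelative and PercentPremiumSeats are highly significant independent variables (p < 0.001). The same variables, with PricePremium in place of PriceEconomy, are highly significant in the PriceEconomy model.

2.The prices of premium and economy seats therefore depend strongly on the variables mentioned above.

3.WidthPremium has p-value > 0.05 in both price models, and FlightDuration has p-value > 0.05 in the PricePremium model (it is significant in the PriceEconomy model, p = 0.003). SeatsTotal, PitchDifference and WidthDifference have no estimates (NA) because of singularities, so their significance cannot be assessed from these models.

4.The models explain about 95% of the variance in the prices of premium and economy seats (R-squared 0.9467 and 0.9565). The PriceRelative model explains much less (R-squared 0.3421); there only FlightDuration, PitchEconomy and WidthPremium are significant.

5.The F-statistic checks how well all the variables taken together predict the dependent variable. Here its p-value is < 2.2e-16 in all three models, which is highly significant and implies that the predictors jointly explain a significant share of the variation.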
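
The R-squared and F-statistic figures quoted above can be read from a fitted model programmatically rather than from the printed summary. The sketch below shows one way to do this on a synthetic model; `demo`, `x1`, `x2` and `y` are made-up names and values, not part of the airline data.

```r
# Synthetic data with an invented relationship, for illustration only
set.seed(3)
n <- 100
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- 1 + 2 * demo$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = demo)
s   <- summary(fit)

r2     <- s$r.squared      # share of variance explained
adj_r2 <- s$adj.r.squared  # penalised for the number of predictors
f      <- s$fstatistic     # named vector: value, numdf, dendf

# p-value of the overall F-test (upper tail of the F distribution)
f_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

print(c(r2 = r2, adj_r2 = adj_r2, f_p = f_p))
```

Applied to test1, test2 and test3, the same lines would return the values shown in the summaries above.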