Premium vs Economy Seats Comparison

In this report, we compare Premium Economy seats with Economy seats on aircraft from six airlines. The dataset covers parameters such as seat pitch (legroom), seat width and ticket price.

Focus

The main focus of this report is to study the parameters associated with Premium seats and to test whether they are worth their higher price.

Reading The Dataset

We downloaded a dataset containing these parameters and begin by reading it into a data frame.

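# stringsAsFactors = TRUE keeps Airline, Aircraft, TravelMonth and IsInternational as factors
# (the default became FALSE in R 4.0.0); paste() around the path was unnecessary
air.df <- read.csv("G:/R Intern/SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
View(air.df)  # opens the data viewer (interactive sessions only)
dim(air.df)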
## [1] 458  18

The air.df data frame is created, with 458 rows and 18 columns.
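The factor counts in the summary (for example `AirFrance: 74`) only appear because Airline, Aircraft, TravelMonth and IsInternational were read as factors. Since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`, so on a current R installation these columns come back as character vectors unless factors are requested. The sketch below shows the difference on a small inline CSV; the rows are hypothetical and only the column names come from the dataset.

```r
# Hypothetical three-row CSV; only the column names come from the real dataset
csv_text <- "Airline,Aircraft,SeatsEconomy,SeatsPremium
British,Boeing,200,40
Delta,AirBus,140,22
British,AirBus,210,45"

# Default since R 4.0.0: text columns stay character vectors
as_chr <- read.csv(text = csv_text)

# Explicitly request factors, as the summary() output in this report assumes
as_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$Airline)    # "character" on R >= 4.0.0
class(as_fac$Airline)    # "factor"
summary(as_fac$Airline)  # per-level counts: British 2, Delta 1
```

Factor columns are also what make `plot(Airline, ...)` later in the report draw one box per airline.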

Describing the Dataset

summary(air.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

From the summary above, we make the following observations:

  1. The data covers six airlines, with British Airways contributing the most entries (175 of 458).

  2. Boeing aircraft appear about twice as often as Airbus aircraft (307 vs 151 entries).

  3. Flight duration has a median of 7.79 hours (mean 7.58), ranging from 1.25 hours (1 hr 15 mins) to 14.66 hours (about 14 hrs 40 mins).

  4. Most entries (418 of 458) are international flights, so the dataset mainly describes Premium seating on international routes.

  5. The Economy section has 202 seats on average, ranging from 78 to 389.

  6. Each aircraft has 33.65 Premium seats on average, ranging from 8 to 66.

  7. Total seats range from 98 to 441, with a mean of 236.

  8. The median price of a Premium seat is $1737 (Rs. 111021.83); the mean is $1845.3.

  9. Economy seats are cheaper, with a mean price of $1327 (Rs. 84816.83) and a median of $1242.

  10. PriceRelative ranges from 0.02 to 1.89. These values are ratios, not dollar amounts: if PriceRelative is the Premium markup relative to the Economy fare, Premium costs between 2% and 189% more than Economy.

Since British Airways contributes the most entries, averages taken over the whole dataset are weighted towards it.
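The minimum (0.02) and maximum (1.89) of PriceRelative in the summary are ratios rather than dollar amounts. A minimal sketch of one interpretation follows, assuming PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy; that definition is an assumption, and `relative_markup()` is a hypothetical helper. The toy fares are chosen only to reproduce the observed minimum and maximum.

```r
# Hypothetical helper. ASSUMPTION: PriceRelative is the Premium markup over
# the Economy fare, (Premium - Economy) / Economy.
relative_markup <- function(price_premium, price_economy) {
  (price_premium - price_economy) / price_economy
}

# Toy fares in USD (illustrative only, not rows from the dataset)
economy <- c(1000, 400, 1200)
premium <- c(1500, 408, 3468)

markup <- relative_markup(premium, economy)
round(markup, 2)  # 0.50 0.02 1.89, i.e. 50%, 2% and 189% more than Economy

# To check the assumption against the real data (the result should be close
# to 0 if the definition matches, since PriceRelative is rounded):
# max(abs(relative_markup(air.df$PricePremium, air.df$PriceEconomy) -
#         air.df$PriceRelative))
```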

Premium Seats

Now we come to our main objective: Premium Economy seats.

aggregate(list(PercentPremium=air.df$PercentPremiumSeats),list(name=air.df$Airline),FUN=mean)
##        name PercentPremium
## 1 AirFrance       11.58757
## 2   British       17.79074
## 3     Delta       14.48217
## 4       Jet       10.17311
## 5 Singapore       11.83000
## 6    Virgin       15.75484

So, on average, Premium seats make up between about 10% (Jet) and 18% (British) of the total seats, depending on the airline.
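The per-airline averages above come from the PercentPremiumSeats column. A minimal sketch of how such a share can be derived from the seat counts, assuming SeatsTotal = SeatsEconomy + SeatsPremium and PercentPremiumSeats = 100 * SeatsPremium / SeatsTotal (both definitions are assumptions; the four rows are hypothetical):

```r
# Toy rows (hypothetical values): two British and two Jet aircraft
toy <- data.frame(
  Airline      = factor(c("British", "British", "Jet", "Jet")),
  SeatsEconomy = c(170, 180, 135, 144),
  SeatsPremium = c(30, 45, 15, 16)
)

# ASSUMPTION: SeatsTotal is Economy + Premium, and PercentPremiumSeats is the
# Premium share of SeatsTotal in percent
toy$SeatsTotal <- toy$SeatsEconomy + toy$SeatsPremium
toy$PercentPremiumSeats <- 100 * toy$SeatsPremium / toy$SeatsTotal

# Same call pattern as the report: mean Premium share per airline
share <- aggregate(list(PercentPremium = toy$PercentPremiumSeats),
                   list(name = toy$Airline), FUN = mean)
share  # British 17.5, Jet 10
```

Note that this averages the per-aircraft percentages; dividing total Premium seats by total seats per airline would weight larger aircraft more heavily.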

Average Economy and Premium Seats by Airline

Economy<-aggregate(air.df$SeatsEconomy,list(air.df$Airline),mean)
Premium<-aggregate(air.df$SeatsPremium,list(air.df$Airline),mean)
Economy
##     Group.1        x
## 1 AirFrance 214.4595
## 2   British 216.5886
## 3     Delta 137.2174
## 4       Jet 140.3115
## 5 Singapore 243.6000
## 6    Virgin 230.1774
Premium
##     Group.1        x
## 1 AirFrance 26.70270
## 2   British 43.18286
## 3     Delta 22.56522
## 4       Jet 15.65574
## 5 Singapore 31.20000
## 6    Virgin 42.53226
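The two aggregate() calls above return generic Group.1 and x columns. One way to get both averages in a single, readable table is the formula interface of aggregate() with cbind(). The sketch uses hypothetical toy rows, and the derived column (Premium seats per 100 Economy seats) is an illustration rather than part of the original analysis.

```r
# Toy rows (hypothetical values)
toy <- data.frame(
  Airline      = factor(c("British", "British", "Delta", "Delta")),
  SeatsEconomy = c(200, 230, 130, 144),
  SeatsPremium = c(40, 46, 20, 25)
)

# Averages several columns in one call and keeps the column names
seat_means <- aggregate(cbind(SeatsEconomy, SeatsPremium) ~ Airline,
                        data = toy, FUN = mean)

# Premium seats per 100 Economy seats, per airline
seat_means$PremiumPer100Economy <-
  100 * seat_means$SeatsPremium / seat_means$SeatsEconomy
seat_means
```

On the real data the same call would be `aggregate(cbind(SeatsEconomy, SeatsPremium) ~ Airline, data = air.df, FUN = mean)`.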

Box Plots of Seats by Airline

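attach(air.df)  # lets the columns be referenced by name below
# plot(factor, numeric) draws one box plot per airline; the centre line is the median
plot(Airline, SeatsPremium, col = "green", main = "Airline vs Premium Seats", ylab = "Premium Seats")

plot(Airline, SeatsEconomy, col = "grey", main = "Airline vs Economy Seats", ylab = "Economy Seats")

plot(Airline, SeatsTotal, col = "orange", main = "Airline vs Total Seats", ylab = "Total Seats")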

The box plots show how the number of Premium, Economy and total seats varies across airlines; each box marks the median and the interquartile range.
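Because Airline is a factor, `plot(Airline, y)` draws a box plot, so the centre line of each box is the median seat count, not the mean. The sketch below uses hypothetical toy data to show that `boxplot(..., plot = FALSE)` returns those statistics without drawing anything, and how the median can differ from the mean when one aircraft is unusually large.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  Airline      = factor(c("Delta", "Delta", "Delta", "Jet", "Jet", "Jet")),
  SeatsPremium = c(10, 20, 60, 12, 14, 16)
)

# Compute the box plot statistics without drawing
b <- boxplot(SeatsPremium ~ Airline, data = toy, plot = FALSE)

medians <- b$stats[3, ]  # row 3 of stats holds the medians (centre lines)
means   <- tapply(toy$SeatsPremium, toy$Airline, mean)
medians  # 20 and 14
means    # Delta 30, Jet 14
```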

Price, Pitch and Width

plot(Airline, PriceEconomy, col = "blue", main = "Airline vs Economy Price", ylab = "Economy Price (USD)")

plot(Airline, PricePremium, col = "darkred", main = "Airline vs Premium Price", ylab = "Premium Price (USD)")  # "dark red" is not a valid R colour name

library(car)
scatterplot(PricePremium~PriceEconomy,main="Economy Price vs Premium Price")

hist(WidthEconomy)

hist(PitchEconomy)

hist(PriceRelative)

Finally, we compare the three relative measures: PriceRelative, PitchDifference and WidthDifference.

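# car >= 3.0 expects the diagonal panel type as a list; older versions accepted diagonal = "histogram"
scatterplotMatrix(~ PriceRelative + PitchDifference + WidthDifference, data = air.df, cex = 0.6, diagonal = list(method = "histogram"))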

Correlation between Variables

First, we compute the correlation matrix of the numeric variables.

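cor.data.table <- cor(air.df[, 6:18])  # numeric columns SeatsEconomy ... PercentPremiumSeats
cor.data.table[, 7:8]                   # keep only the PriceEconomy and PricePremium columns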
##                     PriceEconomy PricePremium
## SeatsEconomy          0.12816722   0.17700093
## SeatsPremium          0.11364218   0.21761238
## PitchEconomy          0.36866123   0.22614179
## PitchPremium          0.05038455   0.08853915
## WidthEconomy          0.06799061   0.15054837
## WidthPremium         -0.05704522   0.06402004
## PriceEconomy          1.00000000   0.90138870
## PricePremium          0.90138870   1.00000000
## PriceRelative        -0.28856711   0.03184654
## SeatsTotal            0.13243313   0.19232533
## PitchDifference      -0.09952511  -0.01806629
## WidthDifference      -0.08449975  -0.01151218
## PercentPremiumSeats   0.06532232   0.11639097

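The table shows how each numeric variable correlates with PriceEconomy and PricePremium. The two prices are strongly correlated (0.90). Among the remaining variables, PitchEconomy has the strongest correlation with both prices (0.37 and 0.23), while PriceRelative is moderately negatively correlated with PriceEconomy (-0.29).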
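To rank variables by how strongly they move with a price, a small helper can select the numeric columns by type (instead of by position, as in `air.df[,6:18]`) and sort their Pearson correlations by absolute value. `cor_with()` is a hypothetical helper, and the toy values are constructed so that the correlations are exactly 1, 0.8 and 0.

```r
# Hypothetical helper: correlation of each other numeric column with `target`,
# sorted from strongest to weakest in absolute value
cor_with <- function(df, target) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  others <- setdiff(names(num), target)
  r <- cor(num[, others, drop = FALSE], num[[target]])[, 1]
  r[order(abs(r), decreasing = TRUE)]
}

# Toy data (hypothetical); Airline is dropped because it is not numeric
toy <- data.frame(
  Airline      = c("A", "B", "C", "D"),
  PricePremium = c(100, 200, 300, 400),
  PriceEconomy = c(50, 100, 150, 200),  # exactly proportional: r = 1
  PitchEconomy = c(30, 32, 31, 33),     # r = 0.8
  WidthEconomy = c(18, 17, 17, 18)      # r = 0
)

cor_with(toy, "PricePremium")
```

On the real data, `cor_with(air.df, "PricePremium")` would also include FlightDuration, which `air.df[,6:18]` leaves out.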

Draw a Corrgram; Create a Variance-Covariance Matrix

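# Select FlightDuration (column 3) together with columns 6-14. The matrix printed
# below was produced by the original expression air.df[,3]+air.df[,6:14], which adds
# FlightDuration to every column instead of selecting it (likely inflating the
# correlations among the low-variance pitch, width and PriceRelative columns);
# rerun this line to obtain the intended correlations.
x <- air.df[, c(3, 6:14)]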
corrr<-round(cor(x),2)
corrr
##               SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy          1.00         0.64         0.25         0.26
## SeatsPremium          0.64         1.00         0.38         0.37
## PitchEconomy          0.25         0.38         1.00         0.90
## PitchPremium          0.26         0.37         0.90         1.00
## WidthEconomy          0.28         0.45         0.98         0.93
## WidthPremium          0.25         0.38         0.92         0.97
## PriceEconomy          0.15         0.25         0.60         0.53
## PricePremium          0.21         0.36         0.65         0.62
## PriceRelative         0.24         0.38         0.97         0.95
##               WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy          0.28         0.25         0.15         0.21
## SeatsPremium          0.45         0.38         0.25         0.36
## PitchEconomy          0.98         0.92         0.60         0.65
## PitchPremium          0.93         0.97         0.53         0.62
## WidthEconomy          1.00         0.95         0.54         0.62
## WidthPremium          0.95         1.00         0.51         0.62
## PriceEconomy          0.54         0.51         1.00         0.90
## PricePremium          0.62         0.62         0.90         1.00
## PriceRelative         0.98         0.97         0.52         0.64
##               PriceRelative
## SeatsEconomy           0.24
## SeatsPremium           0.38
## PitchEconomy           0.97
## PitchPremium           0.95
## WidthEconomy           0.98
## WidthPremium           0.97
## PriceEconomy           0.52
## PricePremium           0.64
## PriceRelative          1.00

This gives the correlation matrix, rounded to two decimal places.

library(corrplot)
## corrplot 0.84 loaded
corrplot(corrr)  # draw the corrgram of the correlation matrix
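The section heading also asks for a variance-covariance matrix. `cov()` returns it directly, and `cov2cor()` rescales it into the correlation matrix. The sketch also shows why indexing with `c()` and adding with `+` are not interchangeable: `df[, 1] + df[, 2:3]` adds the first column to each of the others instead of selecting it. The toy values are hypothetical.

```r
# Toy data (hypothetical values, real column names)
toy <- data.frame(
  FlightDuration = c(2, 5, 8, 11),
  PitchEconomy   = c(31, 30, 32, 31),
  PriceEconomy   = c(100, 300, 200, 400)
)

selected <- toy[, c(1, 2:3)]       # keeps all three columns
shifted  <- toy[, 1] + toy[, 2:3]  # two columns, each with FlightDuration added

ncol(selected)  # 3
ncol(shifted)   # 2

S <- cov(selected)  # variance-covariance matrix; the diagonal holds the variances
R <- cov2cor(S)     # divide each entry by the product of the standard deviations
all.equal(R, cor(selected))  # TRUE
```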

T-Test

Hypothesis: Airline has no effect on the price of Premium seats and Economy seats.

To test this hypothesis, we run Welch two-sample t-tests.

myt<-table(PriceEconomy)
myp<-table(PricePremium)
myx<-table(Airline)
t.test(myp,myx)
## 
##  Welch Two Sample t-test
## 
## data:  myp and myx
## t = -3.6197, df = 5.0004, p-value = 0.01522
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -125.9611  -21.3488
## sample estimates:
## mean of x mean of y 
##  2.678363 76.333333
t.test(myt,myx)
## 
##  Welch Two Sample t-test
## 
## data:  myt and myx
## t = -3.6329, df = 5.0003, p-value = 0.01501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -126.22904  -21.61658
## sample estimates:
## mean of x mean of y 
##  2.410526 76.333333

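Both p-values are below 0.05. However, table() returns how often each distinct value occurs, so these tests compare the mean frequency of a price value (about 2.7 for Premium and 2.4 for Economy) with the mean number of entries per airline (76.3); they do not compare prices across airlines. The hypothesis is better tested with a one-way ANOVA of price on Airline (for example aov(PricePremium ~ Airline)). In the next section, we use regression to determine which x-values significantly affect price.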
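Two standard tests fit the questions this section raises, sketched here on hypothetical toy prices (none of the numbers come from the real dataset). A one-way ANOVA with `aov()` tests whether the mean Premium price differs between airlines, which is the stated hypothesis. A paired t-test compares Premium and Economy prices on the same flights and is equivalent to a one-sample t-test on the per-flight differences.

```r
# Toy data (hypothetical): three airlines, three flights each
toy <- data.frame(
  Airline      = factor(rep(c("A", "B", "C"), each = 3)),
  PriceEconomy = c(400, 420, 410, 800, 820, 790, 1200, 1180, 1210),
  PricePremium = c(600, 650, 610, 1000, 1100, 1050, 2000, 1900, 2100)
)

# H0: the mean Premium price is the same for every airline (one-way ANOVA)
fit_aov <- aov(PricePremium ~ Airline, data = toy)
p_airline <- summary(fit_aov)[[1]][["Pr(>F)"]][1]  # p-value of the Airline term

# Premium vs Economy on the same flights: paired t-test
tt <- t.test(toy$PricePremium, toy$PriceEconomy, paired = TRUE)
tt$estimate  # mean per-flight difference, Premium minus Economy (420 here)
```

On the real data the equivalent calls would be `aov(PricePremium ~ Airline, data = air.df)` and `t.test(air.df$PricePremium, air.df$PriceEconomy, paired = TRUE)`.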

Regression Analysis

Finally, we look for the independent variables (x-values) that are significant predictors of the Premium and Economy seat prices.

Model1=PricePremium~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PriceEconomy+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats

fit<-lm(Model1,data=air.df)
summary(fit)
## 
## Call:
## lm(formula = Model1, data = air.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -855.46 -127.12   -8.66   89.60 2164.59 
## 
## Coefficients: (3 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.114e+04  1.467e+03   7.597 1.80e-13 ***
## SeatsEconomy        -2.479e+00  7.231e-01  -3.429 0.000662 ***
## FlightDuration       9.836e+00  6.312e+00   1.558 0.119830    
## SeatsPremium         2.308e+01  4.275e+00   5.399 1.09e-07 ***
## PitchEconomy        -2.601e+02  3.748e+01  -6.939 1.40e-11 ***
## PitchPremium        -1.861e+02  1.794e+01 -10.373  < 2e-16 ***
## WidthEconomy         2.172e+02  4.035e+01   5.384 1.18e-07 ***
## WidthPremium        -8.098e+00  2.236e+01  -0.362 0.717363    
## PriceEconomy         1.359e+00  2.292e-02  59.307  < 2e-16 ***
## PriceRelative        1.039e+03  4.255e+01  24.410  < 2e-16 ***
## SeatsTotal                  NA         NA      NA       NA    
## PitchDifference             NA         NA      NA       NA    
## WidthDifference             NA         NA      NA       NA    
## PercentPremiumSeats -3.407e+01  1.025e+01  -3.323 0.000965 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 300.6 on 447 degrees of freedom
## Multiple R-squared:  0.9467, Adjusted R-squared:  0.9455 
## F-statistic: 794.5 on 10 and 447 DF,  p-value: < 2.2e-16
Model2=PriceEconomy~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PricePremium+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats

fit1<-lm(Model2,data=air.df)
summary(fit1)  # summary of the Economy-price model (Model2); the original printed fit again

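We tested all the numeric variables. Based on the regression output above for PricePremium (Model1), we infer that:

  1. SeatsEconomy, SeatsPremium, PitchEconomy, PitchPremium, WidthEconomy, PriceEconomy, PriceRelative and PercentPremiumSeats are highly significant independent variables (p < 0.001).

  2. Significance in the regression does not mean high correlation: apart from PriceEconomy (r = 0.90), each of these variables has only a weak pairwise correlation with PricePremium (below 0.23 in absolute value).

  3. FlightDuration (p = 0.12) and WidthPremium (p = 0.72) are not significant at the 0.05 level. SeatsTotal, PitchDifference and WidthDifference are not estimated at all (NA, "not defined because of singularities"): each is an exact linear combination of other predictors, which the column means are consistent with (SeatsTotal = SeatsEconomy + SeatsPremium, PitchDifference = PitchPremium - PitchEconomy, WidthDifference = WidthPremium - WidthEconomy).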
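The NA rows in the regression output arise because a predictor that is an exact linear combination of other predictors carries no extra information, so `lm()` cannot estimate it separately. The sketch reproduces this on hypothetical toy data, assuming SeatsTotal = SeatsEconomy + SeatsPremium as the column means suggest, and uses `alias()` to report the dependency.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  SeatsEconomy = c(150, 200, 180, 240, 210, 160),
  SeatsPremium = c(20, 40, 25, 44, 30, 21),
  PricePremium = c(900, 1500, 1100, 2000, 1400, 1000)
)
# ASSUMPTION: SeatsTotal is the exact sum of the two seat counts
toy$SeatsTotal <- toy$SeatsEconomy + toy$SeatsPremium

fit <- lm(PricePremium ~ SeatsEconomy + SeatsPremium + SeatsTotal, data = toy)
coef(fit)            # the SeatsTotal coefficient is NA (aliased)
alias(fit)$Complete  # SeatsTotal expressed as 1 * SeatsEconomy + 1 * SeatsPremium
```

Dropping SeatsTotal, PitchDifference and WidthDifference from Model1 would give the same fit without the NA rows.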

Other Observations:

  1. The residual standard error is 300.6 on 447 degrees of freedom (458 observations minus 11 estimated coefficients).
  2. The multiple R-squared of 0.9467 means the model explains 94.67% of the variance in PricePremium. The adjusted R-squared (0.9455) is slightly lower because it penalises the number of predictors in the model.
  3. The F-statistic tests whether the predictors, taken together, explain PricePremium better than an intercept-only model. Here F = 794.5 on 10 and 447 DF, so the 10 estimated predictors jointly have strong explanatory power.

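The very low p-value (< 2.2e-16) and the high R-squared show that the model fits the data closely. Much of this fit comes from PriceEconomy alone: its correlation of 0.90 with PricePremium means it explains about 81% of the variance on its own (0.90² ≈ 0.81). In addition, if PriceRelative is computed from the two prices, it contains the response itself, which inflates the fit further.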
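The R-squared, adjusted R-squared and F-statistic follow standard formulas. With n = 458 observations and p = 10 estimated predictors, the adjusted value is 1 - (1 - 0.9467) * 457 / 447 ≈ 0.9455, matching the output above. The sketch below recomputes all three quantities from the residuals of a small model on hypothetical toy data and checks them against `summary.lm()`.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6, 7, 8),
  x2 = c(2, 1, 4, 3, 6, 5, 8, 9),
  y  = c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9, 15.2, 17.8)
)
fit <- lm(y ~ x1 + x2, data = toy)

n <- nrow(toy)
p <- length(coef(fit)) - 1  # number of predictors, excluding the intercept

rss <- sum(residuals(fit)^2)         # residual sum of squares
tss <- sum((toy$y - mean(toy$y))^2)  # total sum of squares
r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalises extra predictors
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))   # overall F on p and n - p - 1 DF

s <- summary(fit)
rbind(by_hand = c(r2, adj_r2, f_stat),
      summary = c(s$r.squared, s$adj.r.squared, s$fstatistic[["value"]]))
```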

This concludes our report.