This report compares Premium Economy and Economy seats on passenger aircraft. The dataset describes each fare class through parameters such as seat pitch, seat width, seat counts and price.
The main aim of this report is to study the parameters that distinguish Premium seats and to test whether they are worth the extra cost.
We start by reading the downloaded dataset into a data frame.
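# stringsAsFactors = TRUE keeps Airline, Aircraft, etc. as factors (needed in R >= 4.0)
air.df<-read.csv("G:/R Intern/SixAirlinesDataV2.csv", stringsAsFactors = TRUE)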
View(air.df)
dim(air.df)
## [1] 458 18
The air.df data frame has been created with 458 rows and 18 columns.
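As a minimal sketch of the loading step, the snippet below writes a tiny made-up CSV to a temporary file and reads it back. The file contents and the name `toy.df` are illustrative assumptions, not the real dataset. In R 4.0 and later, text columns become factors only when `stringsAsFactors = TRUE` is passed, and `summary()` then reports counts per level.

```r
# Hypothetical three-row file standing in for the real CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Airline,PriceEconomy,PricePremium",
             "British,100,150",
             "Delta,200,260",
             "British,300,330"), tmp)

# Text columns are converted to factors only on request (default is FALSE in R >= 4.0).
toy.df <- read.csv(tmp, stringsAsFactors = TRUE)

dim(toy.df)       # number of rows and columns
str(toy.df)       # column types
summary(toy.df)   # factor counts and numeric quartiles
```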
summary(air.df)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
From the summary above, we make the following observations:
1. The data covers six airlines, with British Airways contributing the most entries (175 of 458).
2. Boeing aircraft appear more often than Airbus aircraft (307 vs. 151 entries).
3. Flight duration has a mean of 7.58 hours and a median of 7.79 hours, with a minimum of 1 hr 15 min and a maximum of about 14 hr 40 min.
4. Most entries are international flights (418 of 458), so Premium seating in this data is mostly observed on international routes.
5. Economy sections have 202 seats on average, ranging from 78 to 389.
6. Premium sections have about 34 seats on average (mean 33.65).
7. Total seats range from 98 to 441, with a mean of 236.
8. The median Premium price is $1737 (Rs. 111021.83); the mean is $1845.
9. Economy prices are lower, with a mean of $1327 (Rs. 84816.83) and a median of $1242.
10. PriceRelative ranges from 0.02 to 1.89. It is a ratio rather than a dollar amount; read as a relative markup, Premium costs between 2% and 189% more than Economy.
Overall, British Airways accounts for the largest share of entries.
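To make the relative price figure concrete, here is a small worked sketch. It assumes PriceRelative is defined as the Premium markup divided by the Economy price; the source does not state the formula, so this definition is an assumption, and the fares are made up.

```r
# Hypothetical fares in USD, not rows from the dataset.
price_economy <- c(1000, 500, 2000)
price_premium <- c(1020, 900, 5780)

# Assumed definition: markup of Premium over Economy, as a fraction of Economy.
price_relative <- (price_premium - price_economy) / price_economy
round(price_relative, 2)
```

Under this reading, a value of 0.02 means Premium costs 2% more than Economy, and 1.89 means it costs 189% more.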
We now come to our main objective: Premium Economy seats.
aggregate(list(PercentPremium=air.df$PercentPremiumSeats),list(name=air.df$Airline),FUN=mean)
## name PercentPremium
## 1 AirFrance 11.58757
## 2 British 17.79074
## 3 Delta 14.48217
## 4 Jet 10.17311
## 5 Singapore 11.83000
## 6 Virgin 15.75484
As the output above shows, Premium seats make up roughly 10-18% of the total seats on average, depending on the airline.
Economy<-aggregate(air.df$SeatsEconomy,list(air.df$Airline),mean)
Premium<-aggregate(air.df$SeatsPremium,list(air.df$Airline),mean)
Economy
## Group.1 x
## 1 AirFrance 214.4595
## 2 British 216.5886
## 3 Delta 137.2174
## 4 Jet 140.3115
## 5 Singapore 243.6000
## 6 Virgin 230.1774
Premium
## Group.1 x
## 1 AirFrance 26.70270
## 2 British 43.18286
## 3 Delta 22.56522
## 4 Jet 15.65574
## 5 Singapore 31.20000
## 6 Virgin 42.53226
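The per-airline means above could also be computed in a single call with the formula interface of `aggregate()`. The sketch below uses a made-up data frame (`toy`, with hypothetical airlines "A" and "B"), not the real data.

```r
# Made-up seat counts for two airlines (illustrative only).
toy <- data.frame(
  Airline      = factor(c("A", "A", "B", "B")),
  SeatsEconomy = c(100, 200, 150, 250),
  SeatsPremium = c(20, 40, 10, 30)
)

# cbind() on the left-hand side aggregates several columns at once,
# and the result keeps the original column names.
seat.means <- aggregate(cbind(SeatsEconomy, SeatsPremium) ~ Airline,
                        data = toy, FUN = mean)
seat.means
```

Compared with two separate calls, this keeps both means in one table with readable column names instead of `Group.1` and `x`.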
attach(air.df)
plot(Airline,SeatsPremium,col="green",main="Airline vs Premium Seats",ylab="Premium Seats")
plot(Airline,SeatsEconomy,col='grey',main="Airline vs Economy Seats",ylab="Economy Seats")
plot(Airline,SeatsTotal,col='orange',main="Airline vs Total Seats",ylab="Total Seats")
The box plots above show how Premium, Economy and total seat counts vary by airline.
plot(Airline,PriceEconomy,col="blue",main="Airline vs Economy Price",ylab="Economy Price (USD)")
plot(Airline,PricePremium,col="darkred",main="Airline vs Premium Price",ylab="Premium Price (USD)")
library(car)
scatterplot(PricePremium~PriceEconomy,main="Economy Price vs Premium Price")
hist(WidthEconomy)
hist(PitchEconomy)
hist(PriceRelative)
Finally, we compare the three relative measures: PriceRelative, PitchDifference and WidthDifference.
scatterplotMatrix(formula = ~ PriceRelative + PitchDifference + WidthDifference , cex=0.6, diagonal="histogram")
Next, we compute the correlation matrix of the numeric variables.
cor.data.table<-cor(air.df[,6:18])
cor.data.table[,7:8]
## PriceEconomy PricePremium
## SeatsEconomy 0.12816722 0.17700093
## SeatsPremium 0.11364218 0.21761238
## PitchEconomy 0.36866123 0.22614179
## PitchPremium 0.05038455 0.08853915
## WidthEconomy 0.06799061 0.15054837
## WidthPremium -0.05704522 0.06402004
## PriceEconomy 1.00000000 0.90138870
## PricePremium 0.90138870 1.00000000
## PriceRelative -0.28856711 0.03184654
## SeatsTotal 0.13243313 0.19232533
## PitchDifference -0.09952511 -0.01806629
## WidthDifference -0.08449975 -0.01151218
## PercentPremiumSeats 0.06532232 0.11639097
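The table above gives the correlation of every numeric variable with PriceEconomy and PricePremium. The two prices are strongly correlated with each other (0.90). Among the remaining variables, PitchEconomy has the largest correlation with PriceEconomy (0.37), and none exceeds 0.23 with PricePremium.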
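One way to read such a table quickly is to rank the variables by the absolute size of their correlation with a target column. The sketch below does this on simulated data; the data frame `toy`, its coefficients and the `Noise` column are assumptions made up for illustration.

```r
set.seed(1)
n <- 50

# Simulated prices: Premium is built from Economy plus noise; Noise is unrelated.
toy <- data.frame(PriceEconomy = runif(n, 100, 3000))
toy$PricePremium <- 1.3 * toy$PriceEconomy + rnorm(n, sd = 200)
toy$Noise <- rnorm(n)

# Correlation of every column with PricePremium, excluding itself,
# sorted from strongest to weakest in absolute value.
cors <- cor(toy)[, "PricePremium"]
ranked <- sort(abs(cors[names(cors) != "PricePremium"]), decreasing = TRUE)
ranked
```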
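# Caution: `+` adds FlightDuration (column 3) to every column in 6:14; use air.df[, c(3, 6:14)] to keep it as a separate variable
x<-air.df[,3]+air.df[,6:14]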
corrr<-round(cor(x),2)
corrr
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy 1.00 0.64 0.25 0.26
## SeatsPremium 0.64 1.00 0.38 0.37
## PitchEconomy 0.25 0.38 1.00 0.90
## PitchPremium 0.26 0.37 0.90 1.00
## WidthEconomy 0.28 0.45 0.98 0.93
## WidthPremium 0.25 0.38 0.92 0.97
## PriceEconomy 0.15 0.25 0.60 0.53
## PricePremium 0.21 0.36 0.65 0.62
## PriceRelative 0.24 0.38 0.97 0.95
## WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy 0.28 0.25 0.15 0.21
## SeatsPremium 0.45 0.38 0.25 0.36
## PitchEconomy 0.98 0.92 0.60 0.65
## PitchPremium 0.93 0.97 0.53 0.62
## WidthEconomy 1.00 0.95 0.54 0.62
## WidthPremium 0.95 1.00 0.51 0.62
## PriceEconomy 0.54 0.51 1.00 0.90
## PricePremium 0.62 0.62 0.90 1.00
## PriceRelative 0.98 0.97 0.52 0.64
## PriceRelative
## SeatsEconomy 0.24
## SeatsPremium 0.38
## PitchEconomy 0.97
## PitchPremium 0.95
## WidthEconomy 0.98
## WidthPremium 0.97
## PriceEconomy 0.52
## PricePremium 0.64
## PriceRelative 1.00
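This gives a correlation matrix, but it should be read with care: `air.df[,3]+air.df[,6:14]` adds FlightDuration to each of the nine columns instead of including it as a tenth variable. The shared FlightDuration term distorts the correlations; for example, PriceRelative and PriceEconomy correlate 0.52 here but -0.29 in cor.data.table, which uses the unmodified columns.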
## corrplot 0.84 loaded
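The difference between adding a column and selecting it can be seen on a small example. The sketch below uses a made-up data frame (`toy`, with hypothetical `Pitch` and `Width` values) in which the two seat measures are uncorrelated by construction.

```r
toy <- data.frame(
  FlightDuration = c(1, 5, 10, 14),
  Pitch          = c(31, 30, 32, 31),
  Width          = c(18, 17, 17, 18)
)

# Element-wise addition: FlightDuration is added to each column, so both
# shifted columns share that term and appear strongly correlated.
added <- toy[, 1] + toy[, 2:3]
cor(added)

# Column selection: FlightDuration is kept as a variable of its own,
# and Pitch and Width keep their original (zero) correlation.
selected <- toy[, c(1, 2:3)]
cor(selected)
```

With addition, the result still has only two columns; with selection it has three, which is usually what a correlation analysis intends.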
Hypothesis: Airline has no effect on the price of Premium seats or Economy seats.
To test the above hypothesis, we run t-tests.
myt<-table(PriceEconomy)
myp<-table(PricePremium)
myx<-table(Airline)
t.test(myp,myx)
##
## Welch Two Sample t-test
##
## data: myp and myx
## t = -3.6197, df = 5.0004, p-value = 0.01522
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -125.9611 -21.3488
## sample estimates:
## mean of x mean of y
## 2.678363 76.333333
t.test(myt,myx)
##
## Welch Two Sample t-test
##
## data: myt and myx
## t = -3.6329, df = 5.0003, p-value = 0.01501
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -126.22904 -21.61658
## sample estimates:
## mean of x mean of y
## 2.410526 76.333333
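Both p-values are below 0.05. Note, however, that `table()` converts each variable into frequency counts, so these tests compare how often each distinct price occurs (mean count about 2.7 and 2.4) with how many entries each airline has (mean 76.3). They do not compare prices across airlines, so they cannot confirm or reject the stated hypothesis; that would require a test on prices grouped by airline, such as a one-way ANOVA.
Lastly, we look for the significant independent variables (predictors) for the price of Premium seats and the price of Economy seats.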
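A test of whether a factor with several levels affects a numeric outcome is commonly done with a one-way ANOVA. The sketch below shows the shape of such a test on simulated data; the three airlines "A", "B", "C" and their price levels are made up, and this is not the analysis run in this report.

```r
set.seed(42)

# Simulated Premium prices for three hypothetical airlines with different means.
toy <- data.frame(
  Airline = factor(rep(c("A", "B", "C"), each = 20)),
  PricePremium = c(rnorm(20, mean = 1000, sd = 100),
                   rnorm(20, mean = 1500, sd = 100),
                   rnorm(20, mean = 2000, sd = 100))
)

# One-way ANOVA: does mean price differ between airlines?
fit.aov <- aov(PricePremium ~ Airline, data = toy)
summary(fit.aov)

# Extract the p-value of the Airline term from the ANOVA table.
p.value <- summary(fit.aov)[[1]][["Pr(>F)"]][1]
p.value
```

On the real data, the same call with `data = air.df` would test prices grouped by airline directly, without converting anything to frequency tables.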
Model1=PricePremium~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PriceEconomy+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats
fit<-lm(Model1,data=air.df)
summary(fit)
##
## Call:
## lm(formula = Model1, data = air.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -855.46 -127.12 -8.66 89.60 2164.59
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.114e+04 1.467e+03 7.597 1.80e-13 ***
## SeatsEconomy -2.479e+00 7.231e-01 -3.429 0.000662 ***
## FlightDuration 9.836e+00 6.312e+00 1.558 0.119830
## SeatsPremium 2.308e+01 4.275e+00 5.399 1.09e-07 ***
## PitchEconomy -2.601e+02 3.748e+01 -6.939 1.40e-11 ***
## PitchPremium -1.861e+02 1.794e+01 -10.373 < 2e-16 ***
## WidthEconomy 2.172e+02 4.035e+01 5.384 1.18e-07 ***
## WidthPremium -8.098e+00 2.236e+01 -0.362 0.717363
## PriceEconomy 1.359e+00 2.292e-02 59.307 < 2e-16 ***
## PriceRelative 1.039e+03 4.255e+01 24.410 < 2e-16 ***
## SeatsTotal NA NA NA NA
## PitchDifference NA NA NA NA
## WidthDifference NA NA NA NA
## PercentPremiumSeats -3.407e+01 1.025e+01 -3.323 0.000965 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 300.6 on 447 degrees of freedom
## Multiple R-squared: 0.9467, Adjusted R-squared: 0.9455
## F-statistic: 794.5 on 10 and 447 DF, p-value: < 2.2e-16
Model2=PriceEconomy~SeatsEconomy+FlightDuration+SeatsPremium+PitchEconomy+PitchPremium+WidthEconomy+WidthPremium+PricePremium+PriceRelative+SeatsTotal+PitchDifference+WidthDifference+PercentPremiumSeats
fit1<-lm(Model2,data=air.df)
summary(fit1)
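We test all numeric variables. The regression summary shown is for PricePremium (Model1); from it we infer that:
SeatsEconomy, SeatsPremium, PitchEconomy, PitchPremium, WidthEconomy, PriceEconomy, PriceRelative and PercentPremiumSeats are highly significant independent variables (p < 0.001).
FlightDuration (p = 0.12) and WidthPremium (p = 0.72) are not significant at the 0.05 level. SeatsTotal, PitchDifference and WidthDifference have no estimates (NA) because they are exact linear combinations of other predictors (for example, SeatsTotal = SeatsEconomy + SeatsPremium), so R drops them from the fit rather than finding them insignificant.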
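The NA coefficients can be reproduced on a small simulated example, and `alias()` reports which predictors are linear combinations of others. The data frame `toy` and the outcome `Price` below are made-up assumptions for illustration.

```r
set.seed(7)

# Simulated seat counts; SeatsTotal is an exact sum of the other two columns.
toy <- data.frame(SeatsEconomy = sample(78:389, 30),
                  SeatsPremium = sample(8:66, 30))
toy$SeatsTotal <- toy$SeatsEconomy + toy$SeatsPremium
toy$Price <- 5 * toy$SeatsPremium + rnorm(30, sd = 10)

fit.toy <- lm(Price ~ SeatsEconomy + SeatsPremium + SeatsTotal, data = toy)

# SeatsTotal gets an NA coefficient because it adds no new information.
coef(fit.toy)

# alias() shows how the dropped term is built from the retained ones.
alias(fit.toy)$Complete
```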
Other Observations:
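The overall F-test p-value is very low (< 2.2e-16), so the model as a whole is significant, and the adjusted R-squared of 0.9455 means it explains about 95% of the variation in PricePremium. A low p-value by itself does not imply high correlation; it is the R-squared that measures fit. Much of that fit comes from PriceEconomy, which alone correlates 0.90 with PricePremium.
This concludes our report.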
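The distinction between a low p-value and a high correlation can be illustrated with simulated data. In this sketch, the sample size and the effect size of 0.1 are arbitrary assumptions: with many observations, a weak relationship yields a very small p-value but a small R-squared.

```r
set.seed(3)
n <- 5000

# Weak true effect (slope 0.1) buried in unit-variance noise.
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)

s <- summary(lm(y ~ x))
s$r.squared                        # small: x explains little of y
s$coefficients["x", "Pr(>|t|)"]    # tiny: the slope is still clearly non-zero
```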