PLEASE BE PATIENT AND GO TILL THE END
What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?
CAUTION: This analysis assumes that all the assumptions of the regression model are satisfied.
Because of time limitations and the nature of this mini project, testing the model assumptions and correcting any violations are beyond the scope of this report. This is not a PhD thesis prepared under the guidance of an expert or renowned professor.
The purpose of this report is to help readers visualize the data, draw some inferences, test some hypotheses and explore possible causation. It is based on the author's own understanding, without expert guidance, and should be read as a learning exercise in DATA ANALYTICS USING R.
MOTIVATION OF THE STUDY
A common phenomenon in airline pricing is that every airline charges more for premium-economy tickets than for economy tickets. Whether the flight is domestic or international, the price difference between the two classes is easy to observe. In this report we analyze which independent factors may contribute to this price difference. With the help of the data and some graphs, we try to identify the potential factors affecting the difference in price between a premium-economy and an economy airline ticket.
DATA DESCRIPTION
DATA FIELD UNITS MEANING
Airline Factor Name of the airline. There are 6 airlines in the data: British Airways, Delta Airlines, Air France, Singapore Airlines, Virgin Airlines, Jet Airways
Aircraft Factor Manufacturer of the aircraft, e.g. Boeing, Airbus
TravelMonth Factor Month of travel: Jul, Aug, Sep, Oct
FlightDuration Hours Flight Duration
IsInternational Factor International or Domestic Flight w.r.t. Airlines’ Home Country
SeatsEconomy Number Number of Economy Seats in the Aircraft
SeatsPremium Number Number of Premium Economy Seats in the Aircraft
PitchEconomy Number (Inches) Distance between two consecutive Economy Seats
PitchPremium Number (Inches) Distance between two consecutive Premium Economy Seats
WidthEconomy Number (Inches) Width between armrests of an Economy Seat
WidthPremium Number (Inches) Width between armrests of a Premium Economy Seat
PriceEconomy Number (USD) Price of Economy Seat
PricePremium Number (USD) Price of Premium Economy Seat
PriceRelative (PricePremium - PriceEconomy) / PriceEconomy
SeatsTotal SeatsEconomy + SeatsPremium
PercentPremiumSeats (SeatsPremium / SeatsTotal) * 100
PitchDifference PitchPremium - PitchEconomy
WidthDifference WidthPremium - WidthEconomy
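The last five fields are derived from the others. As a minimal sketch of how they could be computed, the snippet below applies the definitions listed above to two hypothetical rows; the data frame `toy` and its values are made up for illustration and are not taken from SixAirlinesDataV2.csv.

```r
# Hypothetical rows for illustration only; not taken from SixAirlinesDataV2.csv.
toy <- data.frame(
  SeatsEconomy = c(120, 200),
  SeatsPremium = c(30, 50),
  PitchEconomy = c(31, 32),
  PitchPremium = c(38, 38),
  WidthEconomy = c(18, 17),
  WidthPremium = c(19, 19),
  PriceEconomy = c(1000, 500),
  PricePremium = c(1500, 600)
)

# Derived fields, following the definitions in the data description.
toy$PriceRelative       <- (toy$PricePremium - toy$PriceEconomy) / toy$PriceEconomy
toy$SeatsTotal          <- toy$SeatsEconomy + toy$SeatsPremium
toy$PercentPremiumSeats <- toy$SeatsPremium / toy$SeatsTotal * 100
toy$PitchDifference     <- toy$PitchPremium - toy$PitchEconomy
toy$WidthDifference     <- toy$WidthPremium - toy$WidthEconomy

print(toy[, c("PriceRelative", "SeatsTotal", "PercentPremiumSeats",
              "PitchDifference", "WidthDifference")])
```

Note that PriceRelative is a ratio, so a value of 0.5 means the premium-economy ticket costs 50% more than the economy ticket.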
ABSTRACT
To investigate the factors affecting the price difference between economy and premium-economy tickets, we used the dataset available to us and based our analysis on correlation tests and on regression analysis with two models. We also visualized the data using boxplots, a scatterplot and a correlogram. Some findings from the visualization are:
There is indeed a price difference between premium-economy and economy tickets for all the airlines, for both aircraft manufacturers, and for both international and domestic flights. The difference is greater for Jet Airways, for Boeing aircraft and for international flights. The median relative price difference is about the same in all four months. September shows the lowest relative price differences, while the other three months show roughly equal variation.
On the basis of the correlation tests we found that
FlightDuration, PitchPremium, WidthPremium, WidthDifference and PitchDifference are positively and significantly correlated with the relative price difference,
whereas SeatsPremium, PitchEconomy, PriceEconomy and PercentPremiumSeats are negatively and significantly correlated with it.
So we built our first regression model using these nine variables and got the following result:
FlightDuration and WidthDifference significantly and positively affect the relative price difference, while
PriceEconomy significantly and negatively affects it.
All the other variables selected by the correlation tests came out insignificant.
Then we built our second regression model, including all the numeric variables, and got these astonishing findings:
FlightDuration, PitchPremium, PitchEconomy, WidthDifference, PercentPremiumSeats, SeatsEconomy and PricePremium positively affect the relative price difference.
SeatsPremium, WidthPremium and PriceEconomy negatively affect the relative price difference.
The number of factors affecting the dependent variable increased considerably.
The first model explained 57.4% of the variation in the relative price difference, while the second model explained about 77% of it.
If we compare the two models and look for the factors they have in common, we find that FlightDuration, WidthDifference and PriceEconomy affect the relative price difference in both.
So it is up to the person analyzing the price difference to choose between the models. Which factors may affect the dependent variable, and whether to include them in the model, is a matter of the researcher's judgement.
There is no sure-shot formula for reaching a conclusion. Every model has its own limitations, and the choice depends on what you want and how much deviation is acceptable.
The correlation tests and the regression models point to different sets of factors. It is advisable to rely on the regression model, as it gives more stable results and, more importantly, makes explicit which variable is dependent and which are independent, so it is easier to reason about causation.
RESEARCH OBJECTIVE AND METHODOLOGY
Empirical evidence based on data is generally considered stronger than other kinds of evidence. Let us investigate the factors contributing to the difference between the prices of economy and premium-economy airline tickets.
OBJECTIVE OF RESEARCH
1- To test whether there is any price difference between economy and premium-economy tickets.
2- If yes, what factors explain the difference in price between an economy ticket and a premium-economy airline ticket?
We read the dataset into a data frame called airlines and summarize it. We use boxplots and a scatterplot to visualize the data and look for relationships between the variables, and a correlation matrix to measure the correlation between the variables concerned. A correlogram and corrplot are also used to depict these relationships graphically.
We test the null hypothesis that there is no price difference against the alternative hypothesis that premium-economy tickets cost more than economy tickets, using an independent t-test (right-tailed).
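A minimal sketch of such a right-tailed test with base R's `t.test` follows. The vectors `economy` and `premium` are hypothetical prices, not values from the dataset. Because each row of the real data holds both prices for the same flight, a paired test is shown alongside the independent one as a possible alternative; that comparison is an addition for illustration, not part of the report's method.

```r
# Hypothetical prices in USD for six flights; illustration only.
economy <- c(400, 650, 1200, 1500, 2100, 2800)
premium <- c(520, 900, 1650, 1900, 2600, 3100)

# Right-tailed independent (Welch) test: H1 is mean(premium) > mean(economy).
independent <- t.test(premium, economy, alternative = "greater")

# Right-tailed paired test on the same rows, since both prices describe one flight.
paired <- t.test(premium, economy, alternative = "greater", paired = TRUE)

print(independent$p.value)
print(paired$p.value)
```

On data like these the paired test is far more sensitive, because it removes the large flight-to-flight variation in price level and looks only at the within-flight difference.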
To find the significant factors, we use correlation tests to see which variables are significantly related to the price difference.
Lastly, we fit two regression models: one based on the variables found significant in the correlation tests, and a second based on all the numeric variables.
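The two-step workflow, a correlation test followed by a regression, can be sketched with base R's `cor.test` and `lm`. The data frame `sim` below is simulated, and its coefficients are assumptions chosen only so the example has some signal; it does not reproduce the report's models or results.

```r
set.seed(1)  # simulated data for illustration; not the airline dataset
n <- 100
sim <- data.frame(
  FlightDuration  = runif(n, 1, 15),
  WidthDifference = sample(0:4, n, replace = TRUE),
  PriceEconomy    = runif(n, 100, 3500)
)
# Assumed data-generating process, chosen only so the example has some signal.
sim$PriceRelative <- 0.2 + 0.03 * sim$FlightDuration +
  0.10 * sim$WidthDifference - 0.0001 * sim$PriceEconomy + rnorm(n, sd = 0.1)

# Step 1: pairwise correlation test for one candidate predictor.
ct <- cor.test(sim$PriceRelative, sim$WidthDifference)

# Step 2: multiple regression on the candidate predictors.
fit <- lm(PriceRelative ~ FlightDuration + WidthDifference + PriceEconomy,
          data = sim)

print(ct$p.value)                   # significance of the pairwise correlation
print(summary(fit)$coefficients)    # estimates, standard errors, p-values
print(summary(fit)$r.squared)       # share of variation explained
```

The p-values in the coefficient table are what the report uses to decide which factors are significant, and `r.squared` corresponds to the "variation explained" figure quoted for each model.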
Our variable of concern is PriceRelative, which we take to represent the price difference between tickets of the two classes. On the basis of p-values we decide which factors affect the dependent variable PriceRelative.
We do not use the columns Airline, Aircraft, IsInternational and TravelMonth. The airline and aircraft plots show that the price difference exists, but what we want to find is the factors behind it. Likewise, the difference may be larger for international than for domestic flights, but we are interested in the price difference between the classes, not in why it is larger for a particular airline, aircraft or type of flight. Also, the months in the dataset are ordinary months that carry no special information, such as off season, peak season or festive season, that might have contributed to the difference.
Reading the dataset SixAirlinesData into R
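# stringsAsFactors = TRUE keeps the text columns as factors, as in the output below (the default became FALSE in R 4.0.0)
airlines <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)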
View(airlines)
Let us look at some summary statistics for the data
library(psych)
describe(airlines)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
Here the range of the economy ticket price is $3528 and the range of the premium-economy ticket price is $7328. One could easily conclude that economy prices are more stable than premium-economy prices because their range is smaller. But the range depends only on the two extreme values, so we should compare the standard deviations of the two prices before deciding.
summary(airlines)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
The minimum price of an economy ticket is $65 and the maximum is $3593, so economy prices range between $65 and $3593.
Likewise, the minimum price of a premium-economy ticket is $86 and the maximum is $7414, so premium-economy prices range between $86 and $7414.
Finding the mean and standard deviation of the ticket prices
mean(airlines$PriceEconomy)
## [1] 1327.076
sd(airlines$PriceEconomy)
## [1] 988.2733
mean(airlines$PricePremium)
## [1] 1845.258
sd(airlines$PricePremium)
## [1] 1288.136
mean(airlines$PriceRelative)
## [1] 0.4872052
sd(airlines$PriceRelative)
## [1] 0.4505873
Here we see that the average price of an economy ticket is $1327.08 and the standard deviation of the price is $988.27. The standard deviation shows how far the values are dispersed from the average; in simple terms, it measures the scatter among the data values. A high standard deviation indicates a wide spread, i.e. high fluctuation or volatility in the prices, and that is the case here.
Similarly, the average price of a premium-economy ticket is $1845.26 and the standard deviation is $1288.14. In dollar terms the premium-economy prices are therefore more dispersed than the economy prices, which agrees with their larger range.
The mean relative price difference is 0.4872 and its standard deviation is 0.4506, a reasonable amount of scatter.
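A standard deviation in dollars depends on the price level, so comparing spreads across the two ticket classes may be easier with the coefficient of variation (standard deviation divided by mean). The sketch below uses only the means and standard deviations printed above; the data frame name `stats` is just an illustrative choice.

```r
# Means and standard deviations as printed by mean() and sd() above.
stats <- data.frame(
  variable = c("PriceEconomy", "PricePremium", "PriceRelative"),
  mean     = c(1327.076, 1845.258, 0.4872052),
  sd       = c(988.2733, 1288.136, 0.4505873)
)

# Coefficient of variation: spread expressed as a fraction of the mean.
stats$cv <- stats$sd / stats$mean
print(stats)
```

On this scale-free measure economy prices (about 0.74) vary slightly more than premium-economy prices (about 0.70), even though the premium-economy standard deviation is larger in dollars.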
Let us see the data airline-wise
British Airways
library(psych)
describe(airlines[which(airlines$Airline=="British"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 175 7.85 3.68 1.25 13.83 12.58 0.28
## SeatsEconomy 2 175 216.59 74.68 122.00 312.00 190.00 5.65
## SeatsPremium 3 175 43.18 9.57 24.00 56.00 32.00 0.72
## PitchEconomy 4 175 31.00 0.00 31.00 31.00 0.00 0.00
## PitchPremium 5 175 38.00 0.00 38.00 38.00 0.00 0.00
## WidthEconomy 6 175 18.00 0.00 18.00 18.00 0.00 0.00
## WidthPremium 7 175 19.00 0.00 19.00 19.00 0.00 0.00
## PriceEconomy 8 175 1293.48 781.46 65.00 3102.00 3037.00 59.07
## PricePremium 9 175 1937.03 1340.31 86.00 7414.00 7328.00 101.32
## PriceRelative 10 175 0.44 0.32 0.04 1.39 1.35 0.02
## SeatsTotal 11 175 259.77 80.55 162.00 367.00 205.00 6.09
## PitchDifference 12 175 7.00 0.00 7.00 7.00 0.00 0.00
## WidthDifference 13 175 1.00 0.00 1.00 1.00 0.00 0.00
## PercentPremiumSeats 14 175 17.79 5.19 10.57 24.69 14.12 0.39
Jet Airways
library(psych)
describe(airlines[which(airlines$Airline=="Jet"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 61 4.14 2.07 2.50 9.50 7.00 0.26
## SeatsEconomy 2 61 140.31 16.57 124.00 162.00 38.00 2.12
## SeatsPremium 3 61 15.66 6.50 8.00 28.00 20.00 0.83
## PitchEconomy 4 61 30.23 0.64 30.00 32.00 2.00 0.08
## PitchPremium 5 61 39.77 0.64 38.00 40.00 2.00 0.08
## WidthEconomy 6 61 17.11 0.32 17.00 18.00 1.00 0.04
## WidthPremium 7 61 20.77 0.64 19.00 21.00 2.00 0.08
## PriceEconomy 8 61 276.16 154.52 108.00 676.00 568.00 19.78
## PricePremium 9 61 483.36 185.17 228.00 931.00 703.00 23.71
## PriceRelative 10 61 0.94 0.49 0.12 1.89 1.77 0.06
## SeatsTotal 11 61 155.97 14.40 140.00 170.00 30.00 1.84
## PitchDifference 12 61 9.54 1.29 6.00 10.00 4.00 0.16
## WidthDifference 13 61 3.66 0.96 1.00 4.00 3.00 0.12
## PercentPremiumSeats 14 61 10.17 4.10 4.71 16.87 12.16 0.52
Virgin Airlines
library(psych)
describe(airlines[which(airlines$Airline=="Virgin"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 62 9.25 1.94 6.58 12.58 6.00 0.25
## SeatsEconomy 2 62 230.18 59.26 185.00 375.00 190.00 7.53
## SeatsPremium 3 62 42.53 10.23 35.00 66.00 31.00 1.30
## PitchEconomy 4 62 31.00 0.00 31.00 31.00 0.00 0.00
## PitchPremium 5 62 38.00 0.00 38.00 38.00 0.00 0.00
## WidthEconomy 6 62 18.00 0.00 18.00 18.00 0.00 0.00
## WidthPremium 7 62 21.00 0.00 21.00 21.00 0.00 0.00
## PriceEconomy 8 62 1603.53 532.90 540.00 2445.00 1905.00 67.68
## PricePremium 9 62 2721.69 809.55 594.00 3694.00 3100.00 102.81
## PriceRelative 10 62 0.76 0.48 0.10 1.82 1.72 0.06
## SeatsTotal 11 62 272.71 67.59 233.00 441.00 208.00 8.58
## PitchDifference 12 62 7.00 0.00 7.00 7.00 0.00 0.00
## WidthDifference 13 62 3.00 0.00 3.00 3.00 0.00 0.00
## PercentPremiumSeats 14 62 15.75 2.43 14.02 20.60 6.58 0.31
Air France
library(psych)
describe(airlines[which(airlines$Airline=="AirFrance"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 74 8.99 1.62 6.83 13.00 6.17 0.19
## SeatsEconomy 2 74 214.46 88.24 147.00 389.00 242.00 10.26
## SeatsPremium 3 74 26.70 6.20 21.00 38.00 17.00 0.72
## PitchEconomy 4 74 32.00 0.00 32.00 32.00 0.00 0.00
## PitchPremium 5 74 38.00 0.00 38.00 38.00 0.00 0.00
## WidthEconomy 6 74 17.57 0.50 17.00 18.00 1.00 0.06
## WidthPremium 7 74 19.00 0.00 19.00 19.00 0.00 0.00
## PriceEconomy 8 74 2769.78 749.67 630.00 3593.00 2963.00 87.15
## PricePremium 9 74 3065.22 543.21 1611.00 3972.00 2361.00 63.15
## PriceRelative 10 74 0.20 0.41 0.02 1.64 1.62 0.05
## SeatsTotal 11 74 241.16 94.24 168.00 427.00 259.00 10.96
## PitchDifference 12 74 6.00 0.00 6.00 6.00 0.00 0.00
## WidthDifference 13 74 1.43 0.50 1.00 2.00 1.00 0.06
## PercentPremiumSeats 14 74 11.59 1.42 8.90 12.50 3.60 0.16
Singapore Airlines
library(psych)
describe(airlines[which(airlines$Airline=="Singapore"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 40 10.48 3.58 3.83 14.66 10.83 0.57
## SeatsEconomy 2 40 243.60 73.92 184.00 333.00 149.00 11.69
## SeatsPremium 3 40 31.20 3.97 28.00 36.00 8.00 0.63
## PitchEconomy 4 40 32.00 0.00 32.00 32.00 0.00 0.00
## PitchPremium 5 40 38.00 0.00 38.00 38.00 0.00 0.00
## WidthEconomy 6 40 19.00 0.00 19.00 19.00 0.00 0.00
## WidthPremium 7 40 20.00 0.00 20.00 20.00 0.00 0.00
## PriceEconomy 8 40 860.25 349.42 505.00 1431.00 926.00 55.25
## PricePremium 9 40 1239.92 359.13 619.00 1947.00 1328.00 56.78
## PriceRelative 10 40 0.53 0.35 0.09 1.11 1.02 0.06
## SeatsTotal 11 40 274.80 77.89 212.00 369.00 157.00 12.32
## PitchDifference 12 40 6.00 0.00 6.00 6.00 0.00 0.00
## WidthDifference 13 40 1.00 0.00 1.00 1.00 0.00 0.00
## PercentPremiumSeats 14 40 11.83 1.71 9.76 13.21 3.45 0.27
Delta Airlines
library(psych)
describe(airlines[which(airlines$Airline=="Delta"),][c(3,6:18)],skew=FALSE)
## vars n mean sd min max range se
## FlightDuration 1 46 4.03 2.24 1.57 9.50 7.93 0.33
## SeatsEconomy 2 46 137.22 44.93 78.00 233.00 155.00 6.62
## SeatsPremium 3 46 22.57 6.79 18.00 38.00 20.00 1.00
## PitchEconomy 4 46 31.72 0.66 31.00 33.00 2.00 0.10
## PitchPremium 5 46 34.72 1.34 34.00 38.00 4.00 0.20
## WidthEconomy 6 46 17.39 0.49 17.00 18.00 1.00 0.07
## WidthPremium 7 46 17.78 1.33 17.00 21.00 4.00 0.20
## PriceEconomy 8 46 560.93 547.65 158.00 1999.00 1841.00 80.75
## PricePremium 9 46 684.67 790.56 173.00 2765.00 2592.00 116.56
## PriceRelative 10 46 0.12 0.11 0.03 0.46 0.43 0.02
## SeatsTotal 11 46 159.78 50.97 98.00 271.00 173.00 7.52
## PitchDifference 12 46 3.00 1.63 2.00 7.00 5.00 0.24
## WidthDifference 13 46 0.39 1.02 0.00 3.00 3.00 0.15
## PercentPremiumSeats 14 46 14.48 2.86 12.50 20.41 7.91 0.42
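The six per-airline calls above repeat the same pattern. As a hedged alternative, base R's `aggregate` can produce one summary row per airline in a single call. The sketch uses a hypothetical miniature data frame `toy` with made-up values, not the real data.

```r
# Hypothetical miniature of the airlines data frame; values are made up.
toy <- data.frame(
  Airline      = c("British", "British", "Jet", "Jet", "Delta", "Delta"),
  PriceEconomy = c(1000, 1400, 200, 300, 500, 700),
  PricePremium = c(1500, 1900, 400, 600, 550, 800),
  stringsAsFactors = TRUE
)
toy$PriceRelative <- (toy$PricePremium - toy$PriceEconomy) / toy$PriceEconomy

# One row per airline with the mean of each selected numeric column.
by_airline <- aggregate(cbind(PriceEconomy, PricePremium, PriceRelative) ~ Airline,
                        data = toy, FUN = mean)
print(by_airline)
```

Swapping `FUN = mean` for `sd`, `min` or `max` gives the other columns of the per-airline tables.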
Now let us plot the data
plot(jitter(airlines$PriceEconomy),jitter(airlines$PricePremium), xlab = " economy pricing" ,ylab = "premium pricing", main = "plot visualization")
Most of the data points are clustered in the lower part of the plot, in the range of roughly $1000 to $2000. There are outliers too: a few data points have extreme values. The plot shows a clear positive relationship between premium-economy and economy ticket prices.
boxplot(airlines$PriceEconomy, xlab="economy price", main="price of economy ticket for all airlines",ylab="economy", horizontal = TRUE,col=c("peachpuff"))
There are no outliers in the data. The median price is $1242; 25% of the values are below $413 and 25% are above $1909, so 75% of the time the price is below about $2000. The values are concentrated towards the left and spread out towards the right, so the data are right-skewed: the median ($1242) is below the mean ($1327), which means that more than 50% of the time the airlines charged less than $1327.
boxplot(airlines$PricePremium, xlab="premium price", main="price of premium-economy ticket for all airlines",ylab="premium-economy", horizontal = TRUE,col=c("peachpuff"))
The median price is $1737; 25% of the values are below $528.8 and 25% are above $2989, so 75% of the values are below about $3000. Unlike the economy prices, the premium-economy prices show a high outlier: the maximum of $7414 lies beyond the upper whisker. The data are again right-skewed, with most points clustered towards the left. Because the median ($1737) is below the mean ($1845.3), more than 50% of the time the airlines charged less than $1845.3.
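Whether a boxplot draws a point as an outlier can be checked by hand with the usual 1.5 × IQR rule. The sketch below applies it to the quartiles and maxima printed by summary(airlines); it is an approximation, since `boxplot` computes its whiskers from hinges, which can differ slightly from these quartiles.

```r
# Quartiles and maxima as printed by summary(airlines).
q <- data.frame(
  variable = c("PriceEconomy", "PricePremium"),
  q1  = c(413, 528.8),
  q3  = c(1909, 2989),
  max = c(3593, 7414)
)

# Upper fence of the 1.5 * IQR rule; points above it are drawn as outliers.
q$upper_fence    <- q$q3 + 1.5 * (q$q3 - q$q1)
q$max_is_outlier <- q$max > q$upper_fence
print(q)
```

Under this rule the economy maximum ($3593) stays inside its fence ($4153), while the premium-economy maximum ($7414) lies above its fence (about $6679).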
boxplot(airlines$FlightDuration, xlab="time(hours)", main="Flight duration for all airlines",ylab="Flight duration", horizontal = TRUE,col=c("peachpuff"))
The minimum flight duration is 1.25 hours and the average is 7.578 hours; 25% of the flights are shorter than 4.26 hours and 25% are longer than 10.62 hours. There are no outliers, i.e. no flight takes an extremely long time to reach its destination. The data are roughly symmetric and evenly distributed over the quartiles; note that the two whiskers are of about equal length.
boxplot(airlines$SeatsEconomy, xlab=" no of economy seats", main=" No. of economy seats for all airlines",ylab="economy seats", horizontal = TRUE,col=c("peachpuff"))
The data are again right-skewed: more values are gathered towards the left and they spread out towards the right. On average 202 economy seats are available, a fair number, with 25% of the flights having fewer than 133 economy seats and 25% having more than 243. The dataset has no extremely low or extremely high numbers of seats.
boxplot(airlines$SeatsPremium, xlab="no of premium seats", main="No of premium seats for all airlines",ylab="premium seats", horizontal = TRUE,col=c("peachpuff"))
About 75% of the values are at or below 40.
This indicates that most flights in the data have no more than 40 premium-economy seats.
boxplot(airlines$PitchEconomy, xlab="distance", main="distance between two economy seats: for all airlines",ylab="economy seats", horizontal = TRUE,col=c("peachpuff"))
The boxplot looks strongly right-skewed because the first quartile and the median coincide at 31 inches while the third quartile is 32 inches, although the skewness reported by describe() is close to zero (-0.03). The middle half of the values lies between 31 and 32 inches, and 31 inches is by far the most common economy pitch across the airlines.
boxplot(airlines$PitchPremium, xlab="distance", main="distance between two premium seats: for all airlines",ylab="premium seats", horizontal = TRUE)
Almost all the airlines keep a pitch of 38 inches for most premium-economy seats (the first quartile, median and third quartile are all 38). The pitch is clearly greater than for economy seats.
boxplot(airlines$WidthEconomy, xlab="width(inches)", main="width between armrests of economy seats: for all airlines",ylab="economy seats", horizontal = TRUE,col=c("peachpuff"))
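Here the box collapses to a single line at 18 inches because the first quartile, median and third quartile are all 18, so the only other values, 17 and 19 inches, are drawn as outliers.
For most flights across the airlines, the economy seat is 18 inches wide between the armrests.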
boxplot(airlines$WidthPremium, xlab="width(inches)", main="width between armrests of premium seats: for all airlines",ylab="premium seats", horizontal = TRUE,col=c("peachpuff"))
The values are spread from 17 to 19 inches and from 19 to 21 inches, with the largest cluster at 19 inches (the first quartile and the median).
So the most common premium-economy seat width is 19 inches, only one inch wider than the typical economy seat.
boxplot(airlines$PriceRelative,horizontal=TRUE, xlab="relativeprice",ylab="relation",main=" relative price difference for all airlines", col=c("orchid3"))
The data are positively skewed, with most values bunched towards the left. For most flights the relative price difference falls between about 0.0 and 0.8, with a mean of 0.4872. Because the median (0.365) is below the mean, more than 50% of the flights have a relative price difference below 0.4872.
boxplot(airlines$PercentPremiumSeats,horizontal=TRUE, xlab="percentage of premium seats",main=" percentage of premium seats for all airlines", col=c("chartreuse4"))
A few outliers are observed here. The median is 13.21%, and in 75% of the rows premium-economy seats make up no more than about 15.4% of all seats. This share is fairly low, which may be one of the factors that pushes up premium-economy prices and, in turn, the relative price difference.
Let us now see the price breakup airlines-wise and aircraft-wise
boxplot(airlines$PriceEconomy~airlines$Airline,horizontal=TRUE, xlab="economyprice",ylab="economy",main=" price for economy ticket separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown","orchid3"))
There are a few outliers for Virgin, Jet, Delta and Air France. Prices are on the lower side for Jet and Delta, with most values clustered in the lower range. Virgin, Jet, Delta and Air France price within a fairly narrow range (see the box widths), whereas Singapore and British Airways show more fluctuation, and British Airways has the widest spread. Air France has the highest prices of all the airlines and Jet the lowest. About 75% of the time Virgin set a price higher than the median price of British Airways.
boxplot(airlines$PricePremium~airlines$Airline,horizontal=TRUE, xlab="premium price",ylab="premium",main=" price for premium ticket separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown","orchid3"))
Here too Air France has the highest prices, and more than 80% of the time Virgin charged more than the median price of British Airways. Most prices are clustered on the lower side for Singapore, Jet and Delta. Prices are most dispersed for British Airways. A few outliers can be seen for Virgin, Delta and Air France. In terms of consistency, the prices of Jet, Delta and Air France vary less than those of Singapore, Virgin and British Airways.
boxplot(airlines$PriceEconomy~airlines$Aircraft,horizontal=TRUE, xlab="economyprice",ylab="economy",main=" price for economy ticket separated by aircraft",las=1, col=c("red","blue","green","peachpuff","brown","beige"))
Airbus flights are more expensive than Boeing flights: the median price is higher for Airbus, and roughly 70% of the time the price on a Boeing aircraft was below the Airbus median. As for price consistency, compare the box width for each aircraft: Boeing pricing is somewhat more consistent than Airbus pricing.
boxplot(airlines$PricePremium~airlines$Aircraft,horizontal=TRUE, xlab="premium price",ylab="premium",main=" price for premium ticket separated by aircraft",las=1, col=c("green","peachpuff"))
The same pattern holds for premium-economy tickets: almost 70% of the Boeing prices are below the median premium-economy price on Airbus aircraft. Interestingly, the two manufacturers are about equally consistent in premium-economy pricing.
boxplot(airlines$PriceEconomy~airlines$IsInternational,horizontal=TRUE, xlab="economy price",ylab="economy",main=" price for economy ticket separated by destination",las=1, col=c("green","peachpuff"))
Here we see that economy prices are higher for international flights than for domestic flights. The highest fare for a domestic flight is below the median price for international flights.
boxplot(airlines$PricePremium~airlines$IsInternational,horizontal=TRUE, xlab="premium price",ylab="premium",main=" price for premium ticket separated by destination",las=1, col=c("green","peachpuff"))
The same holds for premium-economy tickets: they are more expensive on international flights. Comparing the two boxplots above, premium-economy prices on international flights are also higher than economy prices on international flights.
boxplot(airlines$PriceRelative~airlines$Airline,horizontal=TRUE, xlab="relativeprice",ylab="airlines",main=" relative price difference separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown","beige"))
For Air France and Delta the relative price difference is on the lower side, and its spread is very small compared with the other airlines. A few outliers are seen for Delta, British Airways and Air France, with the most extreme ones for Air France.
In terms of consistency, Delta, British Airways and Air France vary less than the other three airlines. For Singapore Airlines, 75% of the relative price differences are below the median relative price difference of Jet. Jet and Singapore are the least consistent airlines in this regard, and the maximum relative price difference occurs for Jet.
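"Consistency" here is read off the box width, which is the interquartile range (IQR). As a rough numerical counterpart, the IQR per airline can be computed with `tapply`. The data frame `toy` and its values are hypothetical and only illustrate the call.

```r
# Hypothetical relative price differences; illustration only.
toy <- data.frame(
  Airline       = rep(c("Delta", "Jet"), each = 5),
  PriceRelative = c(0.05, 0.08, 0.10, 0.12, 0.15,
                    0.20, 0.60, 0.90, 1.30, 1.80)
)

# Interquartile range per airline: a smaller value means a narrower box.
spread <- tapply(toy$PriceRelative, toy$Airline, IQR)
print(spread)
```

A smaller IQR corresponds to what the text calls a more consistent airline.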
boxplot(airlines$PriceRelative~airlines$Aircraft,horizontal=TRUE, xlab="relative price",ylab="aircrafts",main="relative price difference separated by aircrafts",las=1, col=c("red","blue","green","peachpuff","brown","orchid3"))
There are very few extreme relative differences for either manufacturer. The relative price difference is lower for Airbus than for Boeing: the median difference is higher for Boeing, and Boeing is also less consistent, i.e. the fluctuation in the relative price difference is higher for Boeing.
boxplot(airlines$PriceRelative~airlines$IsInternational,horizontal=TRUE, xlab="relative price",main=" relative price difference of the tickets separated by destination",las=1, col=c("green","peachpuff"))
A large gap is visible here as well: the relative price difference for international flights is far greater than that for domestic flights. We will investigate what could explain this difference.
boxplot(airlines$PriceRelative~airlines$TravelMonth,horizontal=TRUE, xlab="relative price",main=" relative price difference of the tickets separated by months",las=1, col=c("green","peachpuff","beige","orchid3"))
For all four months the values are more dispersed above the third quartile and more condensed towards the left. The median relative price difference is about the same in every month, and September shows the lowest relative price differences. In other words, for all four months the relative price difference is low most of the time, a little less than 1.
boxplot(airlines$SeatsEconomy~airlines$Airline,horizontal=TRUE, xlab="seats(no.)",ylab="economy",main="no of economy seats separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown","orchid3"))
Virgin, Delta and AirFrance airlines show some extremely high numbers of economy seats. Virgin, Jet and Delta are consistent in the number of seats they offer. The maximum number of seats is offered by Singapore, whereas the fewest are offered by Jet and Delta. The number of seats offered by AirFrance is below even the median number offered by British and Virgin airlines. The fluctuation in seats offered is highest for British Airways, but it still offers more than the other airlines except Singapore.
boxplot(airlines$SeatsPremium~airlines$Airline,horizontal=TRUE, xlab="seats(no.)",ylab="premium-economy",main="no of premium-economy seats separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown","orchid3"))
Jet offers the fewest premium seats, while British Airways offers the most.
boxplot(airlines$SeatsEconomy~airlines$Aircraft,horizontal=TRUE, xlab="seats(no.)",ylab="economy",main="no of economy seats separated by aircrafts",las=1, col=c("peachpuff","orchid3"))
Airbus aircraft offer more seats than Boeing in the economy class.
boxplot(airlines$SeatsPremium~airlines$Aircraft,horizontal=TRUE, xlab="seats(no.)",ylab="premium",main="no of premium seats separated by aircrafts",las=1, col=c("peachpuff","orchid3"))
Again, Airbus offers more seats than Boeing in the premium category as well.
boxplot(airlines$PitchEconomy~airlines$Airline,horizontal=TRUE, xlab="distance(inches.)",ylab="economy",main="distance between seats in economy class separated by airlines",las=1, col=c("peachpuff","orchid3","blue","green","black","red"))
Only Delta airlines shows inconsistency here. Jet has the least distance between the seats. Singapore and AirFrance are the most consistent ones here; they offer a good amount of distance between the seats all the time.
boxplot(airlines$PitchPremium~airlines$Airline,horizontal=TRUE, xlab="distance(inches.)",ylab="premium-economy",main="distance between seats in premium class separated by airlines",las=1, col=c("peachpuff","orchid3","blue","green","black","red"))
Surprisingly, Jet offers the maximum distance between the seats in the premium class, with a few exceptions. Delta is the most inconsistent and also offers the least no. of seats.
boxplot(airlines$PitchEconomy~airlines$Aircraft,horizontal=TRUE, xlab="distance(inches.)",ylab="economy",main="distance between seats in economy class separated by aircrafts",las=1, col=c("peachpuff","orchid3"))
No difference in the aircrafts in this regard. Both are equally good. Distance is in the range of 31 inches to 32 inches for 50% of the data values
boxplot(airlines$PitchPremium~airlines$Aircraft,horizontal=TRUE, xlab="distance(inches.)",ylab="premium",main="distance between seats in premium class separated by aircrafts",las=1, col=c("peachpuff","orchid3"))
Again no difference here aircraft-wise. Almost all the time distance between the seats is 38 inches except some outliers.
boxplot(airlines$PercentPremiumSeats~airlines$Airline,horizontal=TRUE, xlab="% premium seats",main="percentage of premium seats available-separated by airlines",las=1, col=c("red","blue","green","peachpuff","brown"))
Here we see that British Airways is the least consistent in offering premium seats, yet it still offers the highest percentage of premium seats. The percentage of premium seats out of the total is close to the average for Delta and Virgin. Jet offers the lowest percentage of premium seats out of the total seats available. For AirFrance the figure is a little below the average.
boxplot(airlines$PercentPremiumSeats~airlines$Aircraft,horizontal=TRUE, xlab="% premium seats",ylab="premium",main="percentage of premium seats separated by aircrafts",las=1, col=c("peachpuff","orchid3"))
Here the median for Airbus is greater than that of Boeing. Fluctuation is greater for Boeing. Airbus is consistent in offering premium seats, but in lower numbers than Boeing.
Let us move on to the scatterplots to see the pairwise relationships among the variables.
plot(airlines$PriceRelative, airlines$FlightDuration, main = "Scatterplot between pricerelative and flight duration", xlab = "relative price difference",ylab = "flight duration",log="xy",las=1,col="blue")
abline(lm(airlines$FlightDuration~airlines$PriceRelative),untf=TRUE)
Relatively flat and horizontal line is the best fit here. Just by looking at it we can’t infer anything.
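When a fitted line is overlaid on a scatterplot, the formula given to lm() has to put the plot's vertical-axis variable on the left and the horizontal-axis variable on the right, otherwise abline() draws a line that belongs to a different plot. A minimal sketch of why the orientation matters; the vectors x and y are invented toy values, not the report's data:

```r
# Hypothetical toy data (invented values, not the report's data)
x <- c(1, 2, 3, 4, 5, 6)               # variable on the horizontal axis
y <- c(2.1, 1.9, 3.2, 2.8, 4.1, 3.9)   # variable on the vertical axis

fit_yx <- lm(y ~ x)  # the line to overlay on plot(x, y)
fit_xy <- lm(x ~ y)  # a different line: it belongs on plot(y, x)

slope_yx <- coef(fit_yx)[["x"]]
slope_xy <- coef(fit_xy)[["y"]]

# The two slopes are not reciprocals of each other; their product is r^2
r <- cor(x, y)
print(c(slope_yx = slope_yx, slope_xy = slope_xy, r_squared = r^2))

# Usage with base graphics:
#   plot(x, y)
#   abline(fit_yx)               # matches the axes of plot(x, y)
#   abline(fit_yx, untf = TRUE)  # use this form when the plot has log = "xy"
```

With log-scaled axes, untf = TRUE asks abline() to draw the line as fitted on the original scale rather than in the transformed coordinate system.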
plot(airlines$PriceRelative, airlines$SeatsEconomy, main = "Scatterplot between pricerelative and SeatsEconomy", xlab = "relative price difference",ylab = "economy class seats",col="blue")
abline(lm(airlines$SeatsEconomy~airlines$PriceRelative),col="red")
plot(airlines$PriceRelative, airlines$SeatsPremium, main = "Scatterplot between pricerelative and SeatsPremium", xlab = "relative price difference",ylab = "premium class seats",log="xy",col="blue")
abline(lm(airlines$SeatsPremium~airlines$PriceRelative),col="red",untf=TRUE)
plot(airlines$SeatsEconomy, airlines$SeatsTotal, main = "Scatterplot between seatstotal and seatseconomy", xlab = "economy seats",ylab = "total seats",las=1,col="blue")
We see here a nice positive relation between the economy seats and the total seats of the airlines.
plot(airlines$PriceRelative, airlines$PitchPremium, main = "Scatterplot between pricerelative and pitchpremium", xlab = "relative price difference",ylab = "distance between seats",log="xy",las=1,col="blue")
abline(lm(airlines$PitchPremium~airlines$PriceRelative),untf=TRUE)
plot(airlines$PriceRelative, airlines$WidthPremium, main = "Scatterplot between pricerelative and WidthPremium", xlab = "relative price difference",ylab = "width of armrests premium class",log="xy",col="blue")
abline(lm(airlines$WidthPremium~airlines$PriceRelative),untf=TRUE)
Very scattered plot is what we are noticing here.
plot(airlines$PercentPremiumSeats, airlines$PriceRelative, main = "Scatterplot between pricerelative and %premiumSeats", ylab = "relative price difference",xlab = "% premium seats",xlim=c(10,25),log="xy",col="blue")
abline(lm(airlines$PriceRelative~airlines$PercentPremiumSeats),untf=TRUE)
Again the plot is very scattered.
plot(airlines$SeatsPremium, airlines$SeatsTotal, main = "Scatterplot between SeatsPremium and seatsTotal", xlab = "premium seats",ylab = "total seats",las=1,col="blue")
Though the trend is not steeply upward, there is a positive relation between these two.
plot(airlines$PitchEconomy, airlines$SeatsTotal, main = "Scatterplot between PitchEconomy and SeatsTotal", xlab = "distance between seats",ylab = "total seats",las=1,col="blue")
No relation is evident here.
plot(airlines$PitchPremium, airlines$SeatsTotal, main = "Scatterplot between PitchPremium and SeatsTotal", xlab = "distance between seats",ylab = "total seats",las=1,col="blue")
Again, no relation can be inferred on the basis of this plot.
plot(airlines$PriceRelative, airlines$SeatsTotal, main = "Scatterplot between PriceRelative and SeatsTotal", xlab = "relative difference",ylab = "total seats",las=1,col="blue")
plot(airlines$PriceRelative, airlines$PitchEconomy, main = "Scatterplot between pricerelative and PitchEconomy", xlab = "relative price difference",ylab = "distance between seats",las=1,col="blue")
Scatterplots are not giving any sort of correlation between the variables.
plot(airlines$PriceRelative, airlines$WidthEconomy, main = "Scatterplot of PriceRelative and WidthEconomy", xlab = "relative price difference",ylab = "width of armrests",las=1,col="blue")
plot(airlines$PriceRelative, airlines$WidthDifference, main = "Scatterplot relative price and width difference", xlab = "relative price difference",ylab = "width difference",las=1,col="blue")
abline(lm(airlines$WidthDifference~airlines$PriceRelative),col="red")
Here the plot is scattered, but the best-fit line indicates a positive relation between the relative price difference and the width difference of the armrests.
plot(airlines$PriceRelative, airlines$PitchDifference, main = "Scatterplot relative price and pitch difference", xlab = "relative price difference",ylab = "pitch difference",las=1,col="blue")
abline(lm(airlines$PitchDifference~airlines$PriceRelative),col="red")
Scatterplot of the entire data
plot(airlines)
It is difficult to infer any pattern from the above figure, so let us look at the scatterplot of a few variables together.
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplotMatrix(formula=~PriceRelative+FlightDuration+SeatsPremium+WidthPremium+PitchPremium,data=airlines,diagonal="histogram")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
Here WidthPremium and PitchPremium show clear-cut upward trends. The diagonal shows the distribution of each variable itself. There is a weak positive correlation between PriceRelative and FlightDuration, and a somewhat stronger positive correlation between PriceRelative and WidthPremium.
After visualizing the data we reached the conclusion that there is indeed a price difference between the tickets of premium class and economy class for all the airlines, for both aircraft manufacturers, and for both international and domestic flights. The difference is greater for Jet airlines, for Boeing aircraft and for international flights.
Now let us visualize the data using correlogram
library(corrgram)
corrgram(airlines,order=TRUE,lower.panel=panel.shade,upper.panel=panel.pie,text.panel=panel.txt,main="corrgram of airlines intercorrelations")
Here a completely filled blue pie shows a strong positive correlation; SeatsTotal and SeatsEconomy are strongly positively correlated. The red pies and red shaded cells show negative correlation. The extent to which a pie is filled, and the depth of the shading, show the intensity of the relationship. PriceRelative is negatively, though weakly, correlated with PercentPremiumSeats. The other cells can be interpreted likewise.
Now let us see the data through the correlation matrix(only numeric data).
new<-airlines[,-1:-2]
airlines1<-new[,-2:-3]
coor<-cor(airlines1)
coor
## FlightDuration SeatsEconomy SeatsPremium PitchEconomy
## FlightDuration 1.00000000 0.195621187 0.161236400 0.29377174
## SeatsEconomy 0.19562119 1.000000000 0.625056587 0.14412692
## SeatsPremium 0.16123640 0.625056587 1.000000000 -0.03421296
## PitchEconomy 0.29377174 0.144126924 -0.034212963 1.00000000
## PitchPremium 0.09621471 0.119221250 0.004883123 -0.55060624
## WidthEconomy 0.45647720 0.373670252 0.455782883 0.29448586
## WidthPremium 0.10343747 0.102431959 -0.002717527 -0.53929285
## PriceEconomy 0.56664039 0.128167220 0.113642176 0.36866123
## PricePremium 0.64873981 0.177000928 0.217612376 0.22614179
## PriceRelative 0.12107501 0.003956939 -0.097196009 -0.42302204
## SeatsTotal 0.20023299 0.992607966 0.715171053 0.12373524
## PitchDifference -0.03749288 0.035318044 0.016365566 -0.78254993
## WidthDifference -0.11856070 -0.080670148 -0.216168666 -0.63557430
## PercentPremiumSeats 0.06051625 -0.330935223 0.485029771 -0.10280880
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## FlightDuration 0.096214708 0.45647720 0.103437469 0.56664039
## SeatsEconomy 0.119221250 0.37367025 0.102431959 0.12816722
## SeatsPremium 0.004883123 0.45578288 -0.002717527 0.11364218
## PitchEconomy -0.550606241 0.29448586 -0.539292852 0.36866123
## PitchPremium 1.000000000 -0.02374087 0.750259029 0.05038455
## WidthEconomy -0.023740873 1.00000000 0.081918728 0.06799061
## WidthPremium 0.750259029 0.08191873 1.000000000 -0.05704522
## PriceEconomy 0.050384550 0.06799061 -0.057045224 1.00000000
## PricePremium 0.088539147 0.15054837 0.064020043 0.90138870
## PriceRelative 0.417539056 -0.04396116 0.504247591 -0.28856711
## SeatsTotal 0.107512784 0.40545860 0.091297500 0.13243313
## PitchDifference 0.950591466 -0.12722421 0.760121272 -0.09952511
## WidthDifference 0.703281797 -0.39320512 0.884149655 -0.08449975
## PercentPremiumSeats -0.175487414 0.22714172 -0.183312058 0.06532232
## PricePremium PriceRelative SeatsTotal PitchDifference
## FlightDuration 0.64873981 0.121075014 0.20023299 -0.03749288
## SeatsEconomy 0.17700093 0.003956939 0.99260797 0.03531804
## SeatsPremium 0.21761238 -0.097196009 0.71517105 0.01636557
## PitchEconomy 0.22614179 -0.423022038 0.12373524 -0.78254993
## PitchPremium 0.08853915 0.417539056 0.10751278 0.95059147
## WidthEconomy 0.15054837 -0.043961160 0.40545860 -0.12722421
## WidthPremium 0.06402004 0.504247591 0.09129750 0.76012127
## PriceEconomy 0.90138870 -0.288567110 0.13243313 -0.09952511
## PricePremium 1.00000000 0.031846537 0.19232533 -0.01806629
## PriceRelative 0.03184654 1.000000000 -0.01156894 0.46873025
## SeatsTotal 0.19232533 -0.011568942 1.00000000 0.03416915
## PitchDifference -0.01806629 0.468730249 0.03416915 1.00000000
## WidthDifference -0.01151218 0.485802437 -0.10584398 0.76089108
## PercentPremiumSeats 0.11639097 -0.161565556 -0.22091465 -0.09264869
## WidthDifference PercentPremiumSeats
## FlightDuration -0.11856070 0.06051625
## SeatsEconomy -0.08067015 -0.33093522
## SeatsPremium -0.21616867 0.48502977
## PitchEconomy -0.63557430 -0.10280880
## PitchPremium 0.70328180 -0.17548741
## WidthEconomy -0.39320512 0.22714172
## WidthPremium 0.88414965 -0.18331206
## PriceEconomy -0.08449975 0.06532232
## PricePremium -0.01151218 0.11639097
## PriceRelative 0.48580244 -0.16156556
## SeatsTotal -0.10584398 -0.22091465
## PitchDifference 0.76089108 -0.09264869
## WidthDifference 1.00000000 -0.27559416
## PercentPremiumSeats -0.27559416 1.00000000
Here we have generated a correlation matrix. Every numeric variable shows some degree of relationship with the other numeric variables. FlightDuration and PricePremium show a fairly strong positive correlation, with a coefficient of 0.64873981. The relative price difference is positively correlated with FlightDuration, SeatsEconomy, PitchPremium, WidthPremium, PricePremium etc. (see the coefficients in this matrix).
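Reading one column out of a 14 x 14 matrix by eye is error-prone, so it can help to pull out the PriceRelative column and rank it. A minimal sketch, assuming the non-numeric columns are stored as factors or characters; the toy data frame and its values are invented and are not the report's data:

```r
# Hypothetical toy data frame (invented values) mixing a factor with numerics
toy <- data.frame(
  Airline        = factor(c("Jet", "Delta", "British", "Virgin", "Jet")),
  FlightDuration = c(2.5, 9.0, 12.3, 8.1, 4.0),
  WidthPremium   = c(21, 18, 19, 19, 21),
  PriceRelative  = c(1.1, 0.1, 0.4, 0.5, 0.9)
)

# Keep the numeric columns by type instead of by position
num_cols <- toy[, sapply(toy, is.numeric), drop = FALSE]
cm <- cor(num_cols)

# Correlation of every other variable with PriceRelative, largest |r| first
r_pr   <- cm[, "PriceRelative"]
r_pr   <- r_pr[names(r_pr) != "PriceRelative"]
ranked <- r_pr[order(abs(r_pr), decreasing = TRUE)]
print(round(ranked, 3))
```

Selecting columns by type avoids having to remember which positions hold the factor variables.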
Let us visualize the correlation matrix graphically.
library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(airlines1,use = "complete.obs"),method = "ellipse")
The blue shades show positive correlation and the red shades show negative correlation. The intensity of the shade shows how weak or strong the association is. Narrow, almost line-like ellipses show strong correlation, while rounder ovals show weak correlation. There is a strong positive correlation between PitchPremium and WidthPremium. PriceEconomy and PricePremium are strongly positively correlated. PitchEconomy and PitchDifference are strongly negatively correlated. PriceRelative shows negative correlation with PitchEconomy, PriceEconomy and PercentPremiumSeats, and positive correlation with PitchPremium, WidthPremium, PitchDifference and WidthDifference. PriceRelative also has a very weak positive correlation with FlightDuration and a very weak negative correlation with SeatsPremium.
Another way of representing this matrix is
par(mfrow=c(1,1))
corrplot.mixed(corr = cor(airlines1,use = "complete.obs"),upper="ellipse",tl.pos = "lt")
This is the same graph with the numbers written on it. The blue shades show positive correlation and the red shades show negative correlation.
NOW let us test the hypotheses which seem relevant here.
FIRST- Let us set the null hypothesis
H0: there is no price difference between the tickets of the economy and premium-economy classes, i.e. the mean premium-economy ticket price is equal to the mean economy ticket price (PricePremium=PriceEconomy).
Alternate Hypothesis
H1: the premium-economy ticket is costlier than the economy ticket, i.e. PricePremium>PriceEconomy. Since H1 is one-sided, we will run a right-tailed t test here.
Use the independent t-test without the assumption of equality of variances. We have seen earlier that the variances of PriceEconomy and PricePremium are not the same.
t.test(airlines$PricePremium,airlines$PriceEconomy,alternative = "greater")
##
## Welch Two Sample t-test
##
## data: airlines$PricePremium and airlines$PriceEconomy
## t = 6.8304, df = 856.56, p-value = 8.027e-12
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 393.2603 Inf
## sample estimates:
## mean of x mean of y
## 1845.258 1327.076
Here, since the p-value is < 0.05, we reject the null hypothesis and conclude that the price of the premium ticket is higher than the price of the economy ticket, i.e. there is a price difference between the tickets of the two classes.
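Because every row of the data carries both prices for the same flight, a paired t-test on the per-flight differences is another option alongside the Welch test used above. This is a hedged alternative, not the author's method. A minimal sketch on invented toy prices (not the report's data):

```r
# Hypothetical toy prices (invented values); each position is one flight
price_economy <- c(100, 250, 400, 800, 1500, 2200)
price_premium <- c(130, 300, 520, 950, 1900, 2600)

# Welch test, as in the report: treats the two columns as independent samples
welch <- t.test(price_premium, price_economy, alternative = "greater")

# Paired test: works on the per-flight differences premium - economy
paired <- t.test(price_premium, price_economy, paired = TRUE,
                 alternative = "greater")

print(c(welch_p = welch$p.value, paired_p = paired$p.value))
```

When the two prices are strongly correlated across flights (the matrix above reports 0.90138870 between PriceEconomy and PricePremium), the paired version removes the flight-to-flight variation and is usually the more sensitive of the two.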
Now let us test whether the correlations seen above are significant or not. We are interested in finding the factors affecting the relative price difference between the economy and premium tickets of the airlines, so one variable in every test below is PriceRelative.
cor.test(airlines$PriceRelative, airlines$FlightDuration)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$FlightDuration
## t = 2.6046, df = 456, p-value = 0.009498
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.02977856 0.21036806
## sample estimates:
## cor
## 0.121075
Here we can see that the p-value is < 0.05, so the null hypothesis is rejected. The null hypothesis is that there is no correlation between these two variables, but we have found that the true correlation is different from zero. Flight duration is positively correlated with the price difference of tickets. The correlation is weak but significant.
The 95% confidence interval says that we can be 95% confident that the true correlation, of which 0.121075 is our estimate, lies between 0.02977856 and 0.21036806.
Notice that 0 lies outside the range 0.02977856 to 0.21036806, which is another way of saying that there is indeed a correlation between these two variables.
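The t statistic and the confidence interval printed by cor.test() can be recomputed by hand from the reported r and df; cor.test() bases its interval on Fisher's z transform. A minimal sketch using only the numbers shown in the output above:

```r
# Values reported by cor.test() above
r  <- 0.121075
df <- 456
n  <- df + 2                      # cor.test uses df = n - 2

# t statistic implied by r
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval for the true correlation via Fisher's z transform
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 4))
print(round(ci, 4))
```

The interval describes the plausible range of the population correlation; it excludes 0, which agrees with the p-value being below 0.05.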
NOTE- It is tempting to interpret this as an increase of 1% in flight duration increasing the relative price difference by 0.121075%, but that reading is not valid: a correlation coefficient measures only the strength and direction of a linear association, not the size of the change. Correlation does not predict values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
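The difference between a correlation coefficient and an effect size can be made concrete: in a simple regression the slope equals r multiplied by sd(y)/sd(x), so r coincides with the slope only when both variables are standardized. A minimal sketch on invented toy values (not the report's data):

```r
# Hypothetical toy data (invented values, not the report's data)
duration <- c(1.5, 3.0, 4.5, 7.0, 9.5, 12.0)        # hours
rel_diff <- c(0.30, 0.55, 0.35, 0.70, 0.60, 0.95)   # relative price difference

r     <- cor(duration, rel_diff)
slope <- coef(lm(rel_diff ~ duration))[["duration"]]

# Identity for simple regression: slope = r * sd(y) / sd(x)
slope_from_r <- r * sd(rel_diff) / sd(duration)

# On standardized variables the slope equals r itself
zx <- (duration - mean(duration)) / sd(duration)
zy <- (rel_diff - mean(rel_diff)) / sd(rel_diff)
std_slope <- coef(lm(zy ~ zx))[["zx"]]

print(c(r = r, slope = slope, slope_from_r = slope_from_r,
        std_slope = std_slope))
```

So r can be read as "a one standard deviation increase in x goes with an r standard deviation change in y on average", never as a percentage change.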
cor.test(airlines$PriceRelative,airlines$SeatsEconomy)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$SeatsEconomy
## t = 0.084498, df = 456, p-value = 0.9327
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08770167 0.09554911
## sample estimates:
## cor
## 0.003956939
Here the p-value is > 0.05, so we fail to reject the null hypothesis. That means there is no significant correlation between the relative price difference and the number of economy seats. We are not concerned with this variable (SeatsEconomy) from now on.
cor.test(airlines$PriceRelative,airlines$SeatsPremium)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$SeatsPremium
## t = -2.0854, df = 456, p-value = 0.03759
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.18715605 -0.00561924
## sample estimates:
## cor
## -0.09719601
Here the p-value is < 0.05, so we reject the null hypothesis and say that there is indeed a correlation between the number of premium seats and the relative price difference of tickets. The correlation is weak but significant.
We can be 95% confident that the true correlation lies between -0.18715605 and -0.00561924. We notice that 0 is outside this interval, so there is indeed a correlation between these two variables.
The correlation is negative, which means that flights with more premium seats tend to show a narrower gap between the prices of the two ticket classes. The coefficient of -0.09719601 is not a percentage effect.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; we have established a negative relation only. For that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$PitchEconomy)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PitchEconomy
## t = -9.9692, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4954453 -0.3447581
## sample estimates:
## cor
## -0.423022
The p-value is very small (<0.05), so we accept the alternate hypothesis that there is a correlation between the distance between economy seats and the relative price difference. The coefficient of -0.423022 indicates a fairly strong negative linear association: flights with more distance between the seats of the economy class tend to show a smaller relative price difference between the tickets of the two classes. It is not a percentage effect.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$PitchPremium)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PitchPremium
## t = 9.8125, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3388769 0.4904041
## sample estimates:
## cor
## 0.4175391
Here the p-value is very small (<0.05), so we accept the alternate hypothesis that there is a correlation between the distance between premium seats and the relative price difference.
NOTE- The correlation coefficient is 0.4175391, a positive quantity: flights with more distance between the seats of the premium class tend to show a larger relative price difference between the tickets of the two classes. It should not be read as a 1% increase in distance leading to a 0.4175391% increase in the relative price difference. We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
Also, 0 does not belong to the interval (0.3388769, 0.4904041), which shows there is indeed a correlation between these two variables.
cor.test(airlines$PriceRelative,airlines$WidthEconomy)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$WidthEconomy
## t = -0.93966, df = 456, p-value = 0.3479
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.13504401 0.04785843
## sample estimates:
## cor
## -0.04396116
The p-value is > 0.05, so we fail to reject the null hypothesis that there is no correlation between the armrest width of economy seats and the relative price difference. On this evidence, the width of the armrests in the economy class is not associated with the relative price difference of tickets of the two classes, so we do not treat it as a factor in explaining that difference.
cor.test(airlines$PriceRelative,airlines$WidthPremium)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$WidthPremium
## t = 12.469, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4326084 0.5695593
## sample estimates:
## cor
## 0.5042476
The p-value is very small (<0.05), so we accept the alternate hypothesis that there is a correlation between the armrest width of premium seats and the relative price difference. The coefficient of 0.5042476 is positive: flights with wider premium armrests tend to show a larger relative price difference between the tickets of the two classes. It is not a percentage effect. We can be 95% confident that the true correlation lies between 0.4326084 and 0.5695593.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$PriceEconomy)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PriceEconomy
## t = -6.4359, df = 456, p-value = 3.112e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3704004 -0.2022889
## sample estimates:
## cor
## -0.2885671
The p-value is very small (<0.05), so we accept the alternate hypothesis that there is indeed a correlation between the price of the economy class ticket and the relative price difference. The coefficient is negative, i.e. -0.2885671: flights with a higher economy ticket price tend to show a smaller relative price difference between the tickets of the two classes, and flights with a lower economy ticket price tend to show a larger one. It is not a percentage effect. The correlation is relatively weak.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$PricePremium)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PricePremium
## t = 0.6804, df = 456, p-value = 0.4966
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05995522 0.12311410
## sample estimates:
## cor
## 0.03184654
Here we see that the p-value is > 0.05, so we fail to reject the null hypothesis that there is no correlation between the price of the premium ticket and the relative difference between the prices of tickets. So we are not concerned with this particular variable from now on.
cor.test(airlines$PriceRelative,airlines$SeatsTotal)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$SeatsTotal
## t = -0.24706, df = 456, p-value = 0.805
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.10308648 0.08014282
## sample estimates:
## cor
## -0.01156894
SeatsTotal is also an irrelevant variable here. The p-value is > 0.05, so we fail to reject the null hypothesis that there is no correlation between the total number of seats and the relative price difference.
cor.test(airlines$PriceRelative,airlines$PitchDifference)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3940262 0.5372817
## sample estimates:
## cor
## 0.4687302
Here the p-value is < 0.05. That means this variable has a non-zero correlation with the relative price difference; the coefficient is 0.4687302. The larger the gap between the seat distance in the premium class and the seat distance in the economy class, the larger the relative price difference tends to be. The coefficient is not a percentage effect.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$WidthDifference)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4125388 0.5528218
## sample estimates:
## cor
## 0.4858024
The same is the case here: the p-value is < 0.05, so there is a positive correlation (0.4858024) between the difference in armrest width between the premium and economy classes and the relative price difference of the tickets. Flights with a larger width difference tend to show a larger relative price difference. The coefficient is not a percentage effect.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
cor.test(airlines$PriceRelative,airlines$PercentPremiumSeats)
##
## Pearson's product-moment correlation
##
## data: airlines$PriceRelative and airlines$PercentPremiumSeats
## t = -3.496, df = 456, p-value = 0.0005185
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.24949885 -0.07098966
## sample estimates:
## cor
## -0.1615656
Here the p-value is < 0.05, so we accept the alternate hypothesis that there is a non-zero correlation between the % of premium seats available and the relative price difference of the tickets of the two classes. The coefficient is -0.1615656, which says that flights with a higher percentage of premium seats tend to show a smaller relative price difference. It is not a percentage effect. Though the correlation is relatively weak, it is significant.
NOTE- We can’t be sure about the magnitude of the change, as correlation does not predict the values and it does not tell which one is the dependent and which one is the independent variable. In short, we cannot prove causation; for that we need to wait till we do the regression analysis to strengthen our claim.
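The thirteen cor.test() calls above all follow the same pattern, so they can be collected into one table with a loop. A minimal sketch; the toy data frame and its values are invented for illustration and are not the report's data:

```r
# Hypothetical toy data frame (invented values, not the report's data)
toy <- data.frame(
  PriceRelative = c(0.10, 0.25, 0.40, 0.55, 0.80, 1.10, 1.30, 1.60),
  WidthPremium  = c(17, 18, 18, 19, 19, 20, 21, 21),
  SeatsTotal    = c(200, 340, 180, 410, 250, 300, 190, 330)
)

predictors <- setdiff(names(toy), "PriceRelative")

# One row per predictor: r, p-value and 95% CI taken from cor.test()
results <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(toy$PriceRelative, toy[[v]])
  data.frame(variable = v,
             r        = unname(ct$estimate),
             p_value  = ct$p.value,
             ci_low   = ct$conf.int[1],
             ci_high  = ct$conf.int[2])
}))
results$significant <- results$p_value < 0.05
print(results)
```

Applied to the real data, the same loop would run over every numeric column except PriceRelative and give the r, p-value and interval for each in one place.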
CONCLUSION/RESULT BASED ON THE CORRELATION TEST- (THOUGH WE CAN’T ESTABLISH CAUSATION YET, FOR THE TIME BEING WE CAN INFER THE FOLLOWING)-
On the basis of these correlation tests we can say that the factors which are significantly correlated with the relative price difference of the tickets are
A)FlightDuration
B)PitchPremium
C)SeatsPremium
D)PitchEconomy
E)WidthPremium
F)PriceEconomy
G)WidthDifference
H)PitchDifference
I)PercentPremiumSeats
The factors which are not significantly correlated with the relative price difference are-
A)SeatsEconomy
B)WidthEconomy
C)PricePremium
D)SeatsTotal
The factors which are insignificant here will not matter for the rest of the analysis.
The factors which are significant here are the candidate reasons behind the price difference between the ticket of premium class and the ticket of economy class.
Out of the significant factors the factors which are strongly affecting the price difference between the tickets of two classes are-
A)PitchPremium
B)PitchEconomy
C)WidthPremium
D)WidthDifference
E)PitchDifference
Out of the significant factors the factors which are relatively weakly affecting the price difference between the tickets of two classes are-
A)FlightDuration
B)SeatsPremium
C)PriceEconomy
D)PercentPremiumSeats
Out of the significant factors the factors which are positively(+vely) affecting the price difference between the tickets of two classes are-
A)FlightDuration
B)PitchPremium
C)WidthPremium
D)WidthDifference
E)PitchDifference
Out of the significant factors, those which are negatively related to the price difference between the tickets of the two classes are-
A)SeatsPremium
B)PitchEconomy
C)PriceEconomy
D)PercentPremiumSeats
FACTORS MATRIX
         POSITIVE           NEGATIVE
STRONG   PitchPremium       PitchEconomy
         WidthPremium
         WidthDifference
         PitchDifference
WEAK     FlightDuration     SeatsPremium
                            PriceEconomy
                            PercentPremiumSeats
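A matrix like this can be built mechanically from the correlation tests. The following is a minimal sketch on synthetic data. The helper classify, the column names x1 to x3 and the 0.3 cutoff between "weak" and "strong" are all assumptions made for illustration. The report does not state the cutoff it used.

```r
# Synthetic data only; the 0.3 cutoff is an assumption, not the report's rule.
set.seed(4)
n <- 300
z <- rnorm(n)
d <- data.frame(
  x1 = z + rnorm(n, sd = 0.5),     # built to be strongly positive with y
  x2 = -z + rnorm(n, sd = 0.5),    # built to be strongly negative with y
  x3 = 0.2 * z + rnorm(n)          # built to be only weakly related to y
)
y <- z

# Hypothetical helper: label one variable by significance, strength and sign.
classify <- function(x, y, cutoff = 0.3, alpha = 0.05) {
  ct <- cor.test(x, y)
  r <- unname(ct$estimate)
  if (ct$p.value >= alpha) return("insignificant")
  paste(if (abs(r) >= cutoff) "strong" else "weak",
        if (r > 0) "positive" else "negative")
}

res <- sapply(d, classify, y = y)
res
```

Applied to the real data, the same helper would be called on each numeric column against PriceRelative, and the labels could then be arranged into the four cells of the matrix.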
ANALYSIS OF A FEW VARIABLES -
PitchPremium is strongly and positively correlated with the price difference (correlation coefficient 0.4175391): the greater the distance between the seats of the premium class, the larger the relative price difference of the tickets of the two classes tends to be.
The distance between the seats makes passengers comfortable. More spacing means fewer premium seats, which pushes premium prices up and eventually increases the price difference of the tickets of the two classes.
WidthPremium is strongly and positively related to the price difference. A wider premium seat makes the passenger more comfortable, and more comfort comes at a higher price, which increases the price difference.
PitchEconomy is strongly but negatively related to the relative price difference. A greater distance between the economy seats makes the economy passengers more comfortable. This raises the price of economy class tickets, which reduces the relative price difference.
FlightDuration is weakly but positively related to the relative price difference. The more hours a flight takes to reach its destination, the more comfort the passengers expect. This keeps the prices of premium tickets on the higher side compared to economy tickets, which widens the gap between the relative prices.
SeatsPremium is weakly and negatively related to the relative price difference. A larger number of premium seats in the plane eases some pressure on premium pricing, which narrows the gap between the prices of the tickets of the two classes.
PriceEconomy is weakly and negatively related to the relative price difference. If the price of the economy class is somewhat higher, the difference between the economy and premium prices is reduced.
NOW we double-check our findings from the correlation tests with a regression analysis. Regression also gives the magnitude of each effect and makes explicit which variable is dependent (the one we are interested in estimating) and which variables are the explanatory ones.
NOW set up the regression model for analysis.
For the purpose of this analysis we take PriceRelative to represent the difference in price between the premium and economy tickets, and the remaining numeric variables as independent variables. Based on the correlation tests we have excluded those variables which came out to be insignificant. We want to see how robust our correlation tests were.
Our model is
PriceRelative = b0 + b1*FlightDuration + b2*SeatsPremium + b3*PitchEconomy + b4*PitchPremium + b5*WidthPremium + b6*PriceEconomy + b7*PitchDifference + b8*WidthDifference + b9*PercentPremiumSeats + error
and we are testing the hypothesis
H0: b1 = b2 = b3 = b4 = b5 = b6 = b7 = b8 = b9 = 0, i.e. none of these nine quantitative variables has an effect on the relative price difference of the tickets.
H1: at least one bi != 0, i.e. at least one of these variables affects the relative price difference of the tickets.
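# Regress PriceRelative on the nine variables that were significant in the correlation tests
model <- PriceRelative ~ FlightDuration + SeatsPremium + PitchEconomy + PitchPremium + WidthPremium + PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats
fit <- lm(model, data = airlines)
summary(fit)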
##
## Call:
## lm(formula = model, data = airlines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78085 -0.18868 -0.01862 0.11915 0.99189
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.335e+00 1.623e+00 1.438 0.1511
## FlightDuration 6.328e-02 6.342e-03 9.978 < 2e-16 ***
## SeatsPremium -1.079e-04 1.519e-03 -0.071 0.9434
## PitchEconomy -7.930e-02 4.144e-02 -1.914 0.0563 .
## PitchPremium 3.093e-02 1.985e-02 1.558 0.1199
## WidthPremium -4.792e-02 4.254e-02 -1.126 0.2606
## PriceEconomy -2.266e-04 2.281e-05 -9.938 < 2e-16 ***
## PitchDifference NA NA NA NA
## WidthDifference 1.723e-01 4.205e-02 4.098 4.95e-05 ***
## PercentPremiumSeats -4.627e-03 4.103e-03 -1.128 0.2601
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3339 on 449 degrees of freedom
## Multiple R-squared: 0.4606, Adjusted R-squared: 0.451
## F-statistic: 47.92 on 8 and 449 DF, p-value: < 2.2e-16
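The F-statistic on the last line of this output tests H0 (all slope coefficients are zero), and it can be recomputed from the reported R-squared and degrees of freedom. The sketch below uses only numbers copied from the output above. The variable names r2, k and df2 are illustrative.

```r
# Values copied from the summary(fit) output above.
r2  <- 0.4606   # Multiple R-squared
k   <- 8        # estimated slope coefficients (PitchDifference was dropped)
df2 <- 449      # residual degrees of freedom

# Overall F-test: explained variation per slope vs. unexplained variation per residual df
f_stat <- (r2 / k) / ((1 - r2) / df2)
p_val  <- pf(f_stat, k, df2, lower.tail = FALSE)

round(f_stat, 2)   # close to the reported 47.92; the small gap comes from rounding R-squared
p_val < 0.05       # TRUE means H0 is rejected at the 5% level
```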
FINDINGS-
Here the p-value of the F-statistic is less than 0.05, so the overall model is significant: we reject the null hypothesis and conclude that at least one of the variables is affecting the relative price difference.
The asterisks mark the variables that are significantly affecting the relative price difference: FlightDuration, PriceEconomy and WidthDifference.
The coefficient of PitchDifference is NA (not defined because of singularities), most likely because it is an exact linear combination of PitchPremium and PitchEconomy.
An increase in the flight duration increases the relative price difference.
An increase in the economy ticket price decreases the relative price difference.
An increase in the width difference between the two classes increases the relative price difference.
An increase in the distance between the seats in the economy class decreases the relative price difference, but this effect is only marginally significant (p = 0.0563).
Here R-squared is 0.4606, i.e. 46.06% of the variation in the relative price difference is explained by the model.
Adjusted R-squared is 0.451, which is close to R-squared.
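The NA row for PitchDifference in the output above can be reproduced on a small example. The sketch below uses synthetic data and assumes PitchDifference is defined as PitchPremium minus PitchEconomy. The column names mirror the report, but the values are made up. When one predictor is an exact linear combination of others, lm() cannot estimate a separate coefficient for it and reports NA.

```r
# Synthetic data only; column names mirror the report, values are made up.
set.seed(2)
n <- 100
d <- data.frame(
  PitchEconomy = sample(30:33, n, replace = TRUE),
  PitchPremium = sample(34:40, n, replace = TRUE)
)
d$PitchDifference <- d$PitchPremium - d$PitchEconomy   # assumed definition
d$PriceRelative <- 0.05 * d$PitchPremium - 0.03 * d$PitchEconomy +
  rnorm(n, sd = 0.2)

fit_demo <- lm(PriceRelative ~ PitchEconomy + PitchPremium + PitchDifference,
               data = d)
coef(fit_demo)    # PitchDifference is NA: it carries no independent information
alias(fit_demo)   # shows which linear combination caused the aliasing
```

Dropping either the difference or one of its two components from the formula removes the singularity without changing the fitted values.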
Second model by including more variables
# Second model: add the four variables that were insignificant in the correlation tests
modell <- PriceRelative ~ FlightDuration + SeatsPremium + PitchEconomy + PitchPremium + WidthPremium + PriceEconomy + PitchDifference + WidthDifference + PercentPremiumSeats + SeatsEconomy + WidthEconomy + PricePremium + SeatsTotal
fit1 <- lm(modell, data = airlines1)
summary(fit1)
##
## Call:
## lm(formula = modell, data = airlines1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.77611 -0.10464 0.00752 0.07428 0.84432
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.091e+00 1.108e+00 -4.594 5.66e-06 ***
## FlightDuration 2.163e-02 4.490e-03 4.817 2.00e-06 ***
## SeatsPremium -1.501e-02 3.131e-03 -4.793 2.25e-06 ***
## PitchEconomy 1.085e-01 2.824e-02 3.840 0.000141 ***
## PitchPremium 1.151e-01 1.349e-02 8.530 2.28e-16 ***
## WidthPremium -1.401e-01 2.867e-02 -4.885 1.44e-06 ***
## PriceEconomy -8.462e-04 2.939e-05 -28.788 < 2e-16 ***
## PitchDifference NA NA NA NA
## WidthDifference 1.998e-01 2.878e-02 6.940 1.39e-11 ***
## PercentPremiumSeats 2.211e-02 7.480e-03 2.956 0.003285 **
## SeatsEconomy 1.768e-03 5.264e-04 3.358 0.000851 ***
## WidthEconomy NA NA NA NA
## PricePremium 5.500e-04 2.253e-05 24.410 < 2e-16 ***
## SeatsTotal NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2187 on 447 degrees of freedom
## Multiple R-squared: 0.7695, Adjusted R-squared: 0.7643
## F-statistic: 149.2 on 10 and 447 DF, p-value: < 2.2e-16
Here also the p-value of the F-statistic is < 0.05, so we accept our alternate hypothesis that at least one of the variables is affecting the relative price difference.
The asterisks mark the variables that are significantly affecting the relative price difference; here every estimated coefficient is significant.
FINDINGS
FlightDuration, PitchPremium, PitchEconomy, WidthDifference, PercentPremiumSeats, SeatsEconomy and PricePremium are positively affecting the relative price difference.
SeatsPremium, WidthPremium and PriceEconomy are negatively affecting the relative price difference.
R-squared is fairly high in this model (0.7695), i.e. almost 77% of the variation in the relative price difference is explained by the model.
Adjusted R-squared is also fairly high (0.7643).
This model fits the data better than the previous one.
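When a larger model contains all the terms of a smaller one and both are fitted to the same rows, the comparison can be made formal with a partial F-test in addition to comparing adjusted R-squared. The sketch below uses synthetic data with illustrative names (x1, x2, x3, small, large). It is not a rerun of the two models above, and it assumes the two fits share one data set.

```r
# Synthetic data only: a small model nested inside a larger one.
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + 0.8 * d$x3 + rnorm(n)

small <- lm(y ~ x1 + x2, data = d)
large <- lm(y ~ x1 + x2 + x3, data = d)

# Adjusted R-squared penalises extra terms, so it is comparable across models
c(small = summary(small)$adj.r.squared, large = summary(large)$adj.r.squared)

# Partial F-test: do the extra terms improve the fit beyond chance?
cmp <- anova(small, large)
cmp
```

A small p-value in the second row of the anova table indicates that the added variables improve the fit.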
If we compare the two models and look for the factors that are significant in both, we find that FlightDuration, PriceEconomy and WidthDifference are the common factors affecting the relative price difference.
CONCLUSION
We reached the conclusion that there is indeed a price difference between the tickets of the premium class and the economy class for all the airlines, for both aircraft manufacturers, and for both international and domestic flights. The difference is greater for Jet Airways, for Boeing aircraft and for international flights. The median relative price difference is about the same in all four months. September shows the least relative price difference; the other three months show roughly equal relative price variation.