Reading the Airlines case study dataset into a data frame for further investigation and insights.
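library(car)    # provides some()
library(psych)  # provides describe()
dd <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)  # text columns become factors
some(dd)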
## Airline Aircraft FlightDuration TravelMonth IsInternational
## 23 British Boeing 6.66 Aug International
## 79 Delta Boeing 1.75 Jul Domestic
## 98 Delta Boeing 4.26 Jul Domestic
## 115 British AirBus 3.58 Sep International
## 134 British AirBus 2.83 Sep International
## 203 Virgin AirBus 11.33 Sep International
## 209 Virgin AirBus 7.75 Jul International
## 246 British Boeing 9.91 Jul International
## 301 Delta Boeing 2.86 Aug Domestic
## 344 AirFrance Boeing 8.75 Jul International
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium WidthEconomy
## 23 122 40 31 38 18
## 79 78 20 31 34 18
## 98 132 26 32 34 17
## 115 303 55 31 38 18
## 134 303 55 31 38 18
## 203 233 38 31 38 18
## 209 233 38 31 38 18
## 246 243 36 31 38 18
## 301 136 20 33 35 17
## 344 200 28 32 38 17
## WidthPremium PriceEconomy PricePremium PriceRelative SeatsTotal
## 23 19 1651 2191 0.33 162
## 79 18 391 406 0.04 98
## 98 17 458 497 0.09 158
## 115 19 402 442 0.10 358
## 134 19 171 201 0.18 358
## 203 21 2369 3540 0.49 271
## 209 21 540 594 0.10 271
## 246 19 2356 3200 0.36 279
## 301 17 354 378 0.07 156
## 344 19 3026 3226 0.07 228
## PitchDifference WidthDifference PercentPremiumSeats
## 23 7 1 24.69
## 79 3 0 20.41
## 98 2 0 16.46
## 115 7 1 15.36
## 134 7 1 15.36
## 203 7 3 14.02
## 209 7 3 14.02
## 246 7 1 12.90
## 301 2 0 12.82
## 344 6 2 12.28
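The last five columns look as if they are derived from the others. The sketch below recomputes them for row 23 of the sample above. The formulas are inferred from the printed values, not taken from any documentation of the dataset, so treat them as an assumption.

```r
# Row 23 of the sample printed above (values copied from the output).
row23 <- data.frame(
  SeatsEconomy = 122,  SeatsPremium = 40,
  PitchEconomy = 31,   PitchPremium = 38,
  WidthEconomy = 18,   WidthPremium = 19,
  PriceEconomy = 1651, PricePremium = 2191
)

# Assumed definitions of the derived columns (inferred, not documented).
derived <- with(row23, data.frame(
  PriceRelative       = round((PricePremium - PriceEconomy) / PriceEconomy, 2),
  SeatsTotal          = SeatsEconomy + SeatsPremium,
  PitchDifference     = PitchPremium - PitchEconomy,
  WidthDifference     = WidthPremium - WidthEconomy,
  PercentPremiumSeats = round(100 * SeatsPremium / (SeatsEconomy + SeatsPremium), 2)
))
print(derived)
```

For this row the recomputed values agree with the printed ones (0.33, 162, 7, 1 and 24.69).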
describe(dd)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
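The `se` column of the `describe()` output is the standard error of the mean, sd / sqrt(n). A quick check, using only figures printed in the table above:

```r
n <- 458              # number of observations
sd_flight <- 3.54     # sd of FlightDuration, from the table above
sd_price  <- 988.27   # sd of PriceEconomy, from the table above

# Standard error of the mean = sd / sqrt(n)
se_flight <- sd_flight / sqrt(n)
se_price  <- sd_price / sqrt(n)

round(c(FlightDuration = se_flight, PriceEconomy = se_price), 2)
```

Rounded to two decimals these give 0.17 and 46.18, matching the `se` values reported for the two variables.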
Let's find the summary statistics and generate tables for some important variables. First, a detailed look at the variables:
str(dd)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
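The `str()` output lists the TravelMonth levels alphabetically ("Aug", "Jul", "Oct", ...), which is R's default ordering for factors. If chronological order is preferred in tables and plots, the levels can be set explicitly. A minimal sketch on a small hypothetical vector, not the real column:

```r
# Hypothetical month labels standing in for dd$TravelMonth.
months <- factor(c("Jul", "Aug", "Sep", "Oct", "Aug", "Jul"))
levels(months)            # alphabetical by default

# Re-create the factor with the levels in calendar order.
months_ordered <- factor(months, levels = c("Jul", "Aug", "Sep", "Oct"))
levels(months_ordered)    # chronological
table(months_ordered)     # counts now appear in calendar order
```

On the real data the equivalent call would be `dd$TravelMonth <- factor(dd$TravelMonth, levels = c("Jul", "Aug", "Sep", "Oct"))`.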
Some important tables:
xtabs(~Airline + Aircraft, data=dd)
## Aircraft
## Airline AirBus Boeing
## AirFrance 36 38
## British 47 128
## Delta 12 34
## Jet 7 54
## Singapore 16 24
## Virgin 33 29
xtabs(~Airline + IsInternational,data=dd)
## IsInternational
## Airline Domestic International
## AirFrance 0 74
## British 0 175
## Delta 40 6
## Jet 0 61
## Singapore 0 40
## Virgin 0 62
Summary statistics are as follows:
summary(dd$PriceEconomy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 65 413 1242 1327 1909 3593
summary(dd$PricePremium)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 86.0 528.8 1737.0 1845.3 2989.0 7414.0
Let's study some further cross-tabulations and the box plots of the variables.
table(dd$Airline,dd$Aircraft)
##
## AirBus Boeing
## AirFrance 36 38
## British 47 128
## Delta 12 34
## Jet 7 54
## Singapore 16 24
## Virgin 33 29
table(dd$Airline,dd$IsInternational)
##
## Domestic International
## AirFrance 0 74
## British 0 175
## Delta 40 6
## Jet 0 61
## Singapore 0 40
## Virgin 0 62
table(dd$TravelMonth,dd$IsInternational)
##
## Domestic International
## Aug 10 117
## Jul 10 65
## Oct 11 116
## Sep 9 120
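Raw counts are hard to compare across months because July has fewer observations than the other months. Row proportions make the comparison easier. The sketch below rebuilds the TravelMonth by IsInternational table from the counts printed above, so it runs without the CSV file.

```r
# Counts copied from the table above; matrix() fills column by column.
month_counts <- matrix(
  c(10, 10, 11, 9,         # Domestic
    117, 65, 116, 120),    # International
  ncol = 2,
  dimnames = list(TravelMonth     = c("Aug", "Jul", "Oct", "Sep"),
                  IsInternational = c("Domestic", "International"))
)

rowSums(month_counts)                            # observations per month
round(prop.table(month_counts, margin = 1), 3)   # share within each month
```

With the data frame loaded, `prop.table(table(dd$TravelMonth, dd$IsInternational), margin = 1)` should give the same proportions.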
boxplot(dd$FlightDuration,
xlab="Travel duration in Hours",
main="Box plot of Flight Duration",horizontal=TRUE)
Let's draw box plots of the seat count, pitch and width for each of the two classes.
boxplot(dd$SeatsEconomy,
xlab="Number of seats in Economy class",col="yellow",
main="Box plot of Number of seats in Economy class",horizontal=TRUE)
boxplot(dd$SeatsPremium,
xlab="Number of seats in Premium class",col="red",
main="Box plot of Number of seats in Premium class",horizontal=TRUE)
boxplot(dd$PitchEconomy,
xlab="Pitch in Economy class",col="darkolivegreen",
main="Box plot of Pitch in Economy class",horizontal=TRUE)
boxplot(dd$PitchPremium,
xlab="Pitch in Premium class",col="darkorange",
main="Box plot of Pitch in Premium class",horizontal=TRUE)
boxplot(dd$WidthEconomy,
xlab="Width in Economy class",col="hotpink",
main="Box plot of Width in Economy class",horizontal=TRUE)
boxplot(dd$WidthPremium,
xlab="Width in Premium class",col="seagreen",
main="Box plot of Width in Premium class",horizontal=TRUE)
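Repeating near-identical `boxplot()` calls makes it easy for a label to drift out of sync with the variable being plotted. One possible alternative keeps the column names, labels and colours in a small lookup table and loops over it. The data frame `toy` and the table `plot_spec` are hypothetical stand-ins introduced here for illustration.

```r
# Hypothetical stand-in for dd, with a handful of rows.
toy <- data.frame(
  SeatsEconomy = c(122, 78, 132, 303, 233),
  PitchEconomy = c(31, 31, 32, 31, 31),
  WidthEconomy = c(18, 18, 17, 18, 18)
)

# One row per plot: which column to draw, how to label it, which colour to use.
plot_spec <- data.frame(
  column = c("SeatsEconomy", "PitchEconomy", "WidthEconomy"),
  label  = c("Number of seats in Economy class",
             "Pitch in Economy class",
             "Width in Economy class"),
  colour = c("yellow", "darkolivegreen", "hotpink"),
  stringsAsFactors = FALSE
)

# The axis label and the title are built from the same string,
# so they cannot disagree with each other.
for (i in seq_len(nrow(plot_spec))) {
  boxplot(toy[[plot_spec$column[i]]],
          xlab = plot_spec$label[i],
          col = plot_spec$colour[i],
          main = paste("Box plot of", plot_spec$label[i]),
          horizontal = TRUE)
}
```

Adding the Premium-class variables would only mean adding rows to `plot_spec`.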
Let's study the correlations in detail through a correlation matrix.
dd1 <- subset(dd,select=c(SeatsEconomy,SeatsPremium,PitchEconomy,PitchPremium,WidthEconomy,
WidthPremium,PriceEconomy,PricePremium,PitchDifference,WidthDifference))
corrs <- cor(dd1, use="pairwise.complete.obs")
corrs
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy 1.00000000 0.625056587 0.14412692 0.119221250
## SeatsPremium 0.62505659 1.000000000 -0.03421296 0.004883123
## PitchEconomy 0.14412692 -0.034212963 1.00000000 -0.550606241
## PitchPremium 0.11922125 0.004883123 -0.55060624 1.000000000
## WidthEconomy 0.37367025 0.455782883 0.29448586 -0.023740873
## WidthPremium 0.10243196 -0.002717527 -0.53929285 0.750259029
## PriceEconomy 0.12816722 0.113642176 0.36866123 0.050384550
## PricePremium 0.17700093 0.217612376 0.22614179 0.088539147
## PitchDifference 0.03531804 0.016365566 -0.78254993 0.950591466
## WidthDifference -0.08067015 -0.216168666 -0.63557430 0.703281797
## WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy 0.37367025 0.102431959 0.12816722 0.17700093
## SeatsPremium 0.45578288 -0.002717527 0.11364218 0.21761238
## PitchEconomy 0.29448586 -0.539292852 0.36866123 0.22614179
## PitchPremium -0.02374087 0.750259029 0.05038455 0.08853915
## WidthEconomy 1.00000000 0.081918728 0.06799061 0.15054837
## WidthPremium 0.08191873 1.000000000 -0.05704522 0.06402004
## PriceEconomy 0.06799061 -0.057045224 1.00000000 0.90138870
## PricePremium 0.15054837 0.064020043 0.90138870 1.00000000
## PitchDifference -0.12722421 0.760121272 -0.09952511 -0.01806629
## WidthDifference -0.39320512 0.884149655 -0.08449975 -0.01151218
## PitchDifference WidthDifference
## SeatsEconomy 0.03531804 -0.08067015
## SeatsPremium 0.01636557 -0.21616867
## PitchEconomy -0.78254993 -0.63557430
## PitchPremium 0.95059147 0.70328180
## WidthEconomy -0.12722421 -0.39320512
## WidthPremium 0.76012127 0.88414965
## PriceEconomy -0.09952511 -0.08449975
## PricePremium -0.01806629 -0.01151218
## PitchDifference 1.00000000 0.76089108
## WidthDifference 0.76089108 1.00000000
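A 10 x 10 correlation matrix is tedious to scan by eye. One way to surface the strongest relationships is to list each pair of variables once, sorted by absolute correlation. The sketch below does this on a small hypothetical data frame (`toy`, with made-up columns `a`, `b` and `c`). On the real data the same steps could be applied to `corrs`.

```r
# Hypothetical data: b is roughly 2 * a, c is unrelated to both.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2),
  c = c(5, 1, 4, 2, 6, 3)
)
m <- cor(toy)

# Keep only the upper triangle so every pair appears exactly once.
idx <- which(upper.tri(m), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  r    = m[upper.tri(m)]
)

# Strongest relationships first, regardless of sign.
pairs <- pairs[order(-abs(pairs$r)), ]
pairs
```

Sorting on the absolute value keeps strong negative correlations, such as the one between PitchEconomy and PitchDifference above, near the top of the list.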
Let's plot the correlation matrix of columns 8 to 18 of the dataset (PitchEconomy through PercentPremiumSeats).
library(corrplot)  # provides corrplot()
par(mfrow=c(1,1))
# use= belongs to cor(), not corrplot()
corrplot(corr=cor(dd[,c(8:18)],use="complete.obs"),
method="ellipse")
Let us answer some questions about the airlines dataset statistically in order to get better insights.
Q1) What is the maximum price of tickets in Economy and Premium class for the six different airlines? The maximum Economy class ticket price is 3593 USD (AirFrance) and the maximum Premium Economy class ticket price is 7414 USD (British).
p<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),max)
names(p)[2] <- "Max Price"
p
## Name_of_Airline Max Price
## 1 AirFrance 3593
## 2 British 3102
## 3 Delta 1999
## 4 Jet 676
## 5 Singapore 1431
## 6 Virgin 2445
q<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),min)
names(q)[2] <- "Min Price"
q
## Name_of_Airline Min Price
## 1 AirFrance 630
## 2 British 65
## 3 Delta 158
## 4 Jet 108
## 5 Singapore 505
## 6 Virgin 540
The two tables above give the maximum and minimum Economy class ticket prices for each airline.
by(dd$PricePremium,dd$Airline,max)
## dd$Airline: AirFrance
## [1] 3972
## --------------------------------------------------------
## dd$Airline: British
## [1] 7414
## --------------------------------------------------------
## dd$Airline: Delta
## [1] 2765
## --------------------------------------------------------
## dd$Airline: Jet
## [1] 931
## --------------------------------------------------------
## dd$Airline: Singapore
## [1] 1947
## --------------------------------------------------------
## dd$Airline: Virgin
## [1] 3694
r<- aggregate(dd$PricePremium,by=list(Name_of_Airline=dd$Airline),min)
names(r)[2] <- "Min Price"
r
## Name_of_Airline Min Price
## 1 AirFrance 1611
## 2 British 86
## 3 Delta 173
## 4 Jet 228
## 5 Singapore 619
## 6 Virgin 594
The output above shows the maximum and minimum Premium class ticket prices for the six airlines.
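The minimum and maximum per airline can also be collected in a single table instead of four separate calls. A minimal sketch, assuming a hypothetical helper `price_range()` and a tiny stand-in data frame `toy`, neither of which is part of the original analysis:

```r
# Hypothetical stand-in for dd with two airlines.
toy <- data.frame(
  Airline      = c("Jet", "Jet", "Delta", "Delta", "Delta"),
  PriceEconomy = c(108, 676, 158, 1999, 391),
  PricePremium = c(228, 931, 173, 2765, 406)
)

# Hypothetical helper: one row per group, columns for min and max.
price_range <- function(price, group) {
  out <- t(sapply(split(price, group), range))
  colnames(out) <- c("MinPrice", "MaxPrice")
  out
}

economy_range <- price_range(toy$PriceEconomy, toy$Airline)
premium_range <- price_range(toy$PricePremium, toy$Airline)
economy_range
premium_range
```

On the real data the calls would be `price_range(dd$PriceEconomy, dd$Airline)` and `price_range(dd$PricePremium, dd$Airline)`.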
Q2) Which of the six airlines are the cheapest and the costliest to travel with?
s<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),mean)  # mean Economy price per airline
names(s)[2] <- "Average Price of Economy"
s
t<- aggregate(dd$PricePremium,by=list(Name_of_Airline=dd$Airline),mean)
names(t)[2] <- "Average Price of Premium"
t
## Name_of_Airline Average Price of Premium
## 1 AirFrance 3065.2162
## 2 British 1937.0286
## 3 Delta 684.6739
## 4 Jet 483.3607
## 5 Singapore 1239.9250
## 6 Virgin 2721.6935
By average Premium price, Air France is the costliest airline to travel with, while Jet and Delta are the cheapest.
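Both class averages can be computed in one call, which removes the need to keep two nearly identical `aggregate()` statements in step. The sketch below uses a hypothetical data frame `toy` with made-up airlines A, B and C, so its result says nothing about the real airlines.

```r
# Hypothetical prices for three made-up airlines.
toy <- data.frame(
  Airline      = c("A", "A", "B", "B", "C", "C"),
  PriceEconomy = c(100, 300, 1000, 2000, 500, 700),
  PricePremium = c(150, 450, 1500, 2500, 800, 1000)
)

# Mean of both price columns per airline, side by side.
avg <- aggregate(cbind(PriceEconomy, PricePremium) ~ Airline, data = toy, FUN = mean)
avg

# Cheapest and costliest airline by average Economy price.
cheapest  <- as.character(avg$Airline[which.min(avg$PriceEconomy)])
costliest <- as.character(avg$Airline[which.max(avg$PriceEconomy)])
c(cheapest = cheapest, costliest = costliest)
```

The same formula with `data = dd` would produce the Economy and Premium averages for the six airlines in one table.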
Q3) Which is the best month to travel, irrespective of airline? July is the best month to travel.
library(lattice)  # provides bwplot()
tab2 <- bwplot(TravelMonth ~ PriceEconomy, data= dd, horizontal = TRUE)
tab2
tab3<- bwplot(TravelMonth ~ PricePremium, data= dd, horizontal = TRUE)
tab3
The box plots suggest that July is the best month to travel, irrespective of airline and class, even when outliers are taken into account.
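Box plots are read by eye, so a per-month median gives a number to back up the comparison. The sketch below uses hypothetical prices in a stand-in data frame `toy`, so the month it picks is illustrative only.

```r
# Hypothetical prices, three observations per month.
toy <- data.frame(
  TravelMonth  = c("Jul", "Jul", "Jul", "Aug", "Aug", "Aug", "Sep", "Sep", "Sep"),
  PriceEconomy = c(400, 500, 900, 450, 700, 1200, 300, 650, 800)
)

# Median price per month; the median is robust to the outliers a box plot shows.
med <- tapply(toy$PriceEconomy, toy$TravelMonth, median)
sort(med)

# Month with the lowest median price.
cheapest_month <- names(which.min(med))
cheapest_month
```

On the real data, `tapply(dd$PriceEconomy, dd$TravelMonth, median)` and the same call on `dd$PricePremium` would give the figures behind the two box plots.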
Q4) What is the typical number of seats allotted to the Economy and Premium classes, irrespective of airline?
library(modeest)  # provides mlv()
mlv(dd$SeatsEconomy, method = "mfv")
## Mode (most frequent value): 303
## Bickel's modal skewness: -0.7161572
## Call: mlv.integer(x = dd$SeatsEconomy, method = "mfv")
mlv(dd$SeatsPremium, method = "mfv")
## Mode (most frequent value): 36
## Bickel's modal skewness: -0.07641921
## Call: mlv.integer(x = dd$SeatsPremium, method = "mfv")
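The most common configuration is 303 Economy seats and 36 Premium Economy seats (the modes). For comparison, the means reported by describe(dd) above are 202.31 and 33.65 seats respectively.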
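The most frequent value can also be found with base R alone, which makes clear how the mode differs from the mean. A minimal sketch with a hypothetical helper `most_frequent()` and made-up seat counts:

```r
# Hypothetical helper: returns the value(s) that occur most often.
most_frequent <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Made-up seat counts, not the real column.
seats <- c(303, 303, 303, 122, 185, 233, 78)

most_frequent(seats)   # the mode
mean(seats)            # the mean, pulled down by the smaller aircraft
```

If several values tie for the highest count, the helper returns all of them.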
cor(dd$SeatsEconomy,dd$SeatsPremium)
## [1] 0.6250566
cor(dd$PriceEconomy,dd$PricePremium)
## [1] 0.9013887
As the above analysis shows, the prices of the Economy and Premium classes are highly correlated (r = 0.90), while the numbers of Economy and Premium seats are moderately correlated (r = 0.63). These results are as expected.
Let us study the correlations between pitch and width.
cor(dd$PitchPremium,dd$WidthPremium)
## [1] 0.750259
cor(dd$PitchDifference,dd$WidthDifference)
## [1] 0.7608911
The pitch and width of the Premium class are also fairly strongly correlated (r = 0.75), as are the pitch and width differences between the two classes (r = 0.76), as expected.
Let us study the effect of various factors on the prices of the Economy and Premium classes.
cor.test(dd$PriceEconomy,dd$SeatsPremium)
##
## Pearson's product-moment correlation
##
## data: dd$PriceEconomy and dd$SeatsPremium
## t = 2.4426, df = 456, p-value = 0.01496
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.02224701 0.20315362
## sample estimates:
## cor
## 0.1136422
The p-value from Pearson’s test is 0.01496, which is less than 0.05, so the correlation between the Economy ticket price and the number of Premium seats is statistically significant but weak (r = 0.114). By comparison, the Economy and Premium Economy ticket prices are strongly correlated (r = 0.901), with a p-value below 2.2e-16.
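The t statistic, p-value and confidence interval printed by `cor.test()` above can be reproduced by hand from the sample correlation and the degrees of freedom. The sketch below uses the standard formulas for Pearson's correlation: a t statistic with n - 2 degrees of freedom, and a confidence interval from the Fisher z-transformation.

```r
r  <- 0.1136422   # sample correlation, from the output above
df <- 456         # degrees of freedom (n - 2), from the output above
n  <- df + 2

# t statistic and two-sided p-value
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The lower bound of the interval is close to zero, which is another way of seeing that the correlation, although significant at the 5% level, is weak.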