Reading the Airlines case study dataset into a data frame for further investigation and insights.

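# Packages used below: car (some), psych (describe), corrplot, lattice (bwplot), modeest (mlv)
invisible(lapply(c("car", "psych", "corrplot", "lattice", "modeest"), library, character.only = TRUE))
dd <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)  # factors, as in str() below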
some(dd)
##       Airline Aircraft FlightDuration TravelMonth IsInternational
## 23    British   Boeing           6.66         Aug   International
## 79      Delta   Boeing           1.75         Jul        Domestic
## 98      Delta   Boeing           4.26         Jul        Domestic
## 115   British   AirBus           3.58         Sep   International
## 134   British   AirBus           2.83         Sep   International
## 203    Virgin   AirBus          11.33         Sep   International
## 209    Virgin   AirBus           7.75         Jul   International
## 246   British   Boeing           9.91         Jul   International
## 301     Delta   Boeing           2.86         Aug        Domestic
## 344 AirFrance   Boeing           8.75         Jul   International
##     SeatsEconomy SeatsPremium PitchEconomy PitchPremium WidthEconomy
## 23           122           40           31           38           18
## 79            78           20           31           34           18
## 98           132           26           32           34           17
## 115          303           55           31           38           18
## 134          303           55           31           38           18
## 203          233           38           31           38           18
## 209          233           38           31           38           18
## 246          243           36           31           38           18
## 301          136           20           33           35           17
## 344          200           28           32           38           17
##     WidthPremium PriceEconomy PricePremium PriceRelative SeatsTotal
## 23            19         1651         2191          0.33        162
## 79            18          391          406          0.04         98
## 98            17          458          497          0.09        158
## 115           19          402          442          0.10        358
## 134           19          171          201          0.18        358
## 203           21         2369         3540          0.49        271
## 209           21          540          594          0.10        271
## 246           19         2356         3200          0.36        279
## 301           17          354          378          0.07        156
## 344           19         3026         3226          0.07        228
##     PitchDifference WidthDifference PercentPremiumSeats
## 23                7               1               24.69
## 79                3               0               20.41
## 98                2               0               16.46
## 115               7               1               15.36
## 134               7               1               15.36
## 203               7               3               14.02
## 209               7               3               14.02
## 246               7               1               12.90
## 301               2               0               12.82
## 344               6               2               12.28
describe(dd)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Let's find the summary statistics and generate tables for some important variables. The structure of the data frame gives a detailed look at each variable:

str(dd)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...

Some important tables:

xtabs(~Airline + Aircraft, data=dd)
##            Aircraft
## Airline     AirBus Boeing
##   AirFrance     36     38
##   British       47    128
##   Delta         12     34
##   Jet            7     54
##   Singapore     16     24
##   Virgin        33     29
xtabs(~Airline + IsInternational,data=dd)
##            IsInternational
## Airline     Domestic International
##   AirFrance        0            74
##   British          0           175
##   Delta           40             6
##   Jet              0            61
##   Singapore        0            40
##   Virgin           0            62
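Row proportions make the Airline × Aircraft counts easier to compare across airlines with very different numbers of flights. A minimal sketch, rebuilding the table from the counts printed above (rather than from `dd`) so that it runs on its own; the object names `fleet` and `shares` are hypothetical.

```r
# Counts copied from xtabs(~Airline + Aircraft, data = dd) above
fleet <- matrix(c(36,  38,
                  47, 128,
                  12,  34,
                   7,  54,
                  16,  24,
                  33,  29),
                ncol = 2, byrow = TRUE,
                dimnames = list(Airline  = c("AirFrance", "British", "Delta",
                                             "Jet", "Singapore", "Virgin"),
                                Aircraft = c("AirBus", "Boeing")))
fleet <- as.table(fleet)

# Share of each aircraft type within each airline (each row sums to 1)
shares <- prop.table(fleet, margin = 1)
round(shares, 2)

# Raw counts with row and column totals added
addmargins(fleet)
```

With the real data, `prop.table(xtabs(~Airline + Aircraft, data = dd), margin = 1)` should give the same proportions.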

Summary statistics are as follows:

summary(dd$PriceEconomy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      65     413    1242    1327    1909    3593
summary(dd$PricePremium)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    86.0   528.8  1737.0  1845.3  2989.0  7414.0

Let's cross-tabulate the categorical variables and then study box plots of the numeric variables.

table(dd$Airline,dd$Aircraft)
##            
##             AirBus Boeing
##   AirFrance     36     38
##   British       47    128
##   Delta         12     34
##   Jet            7     54
##   Singapore     16     24
##   Virgin        33     29
table(dd$Airline,dd$IsInternational)
##            
##             Domestic International
##   AirFrance        0            74
##   British          0           175
##   Delta           40             6
##   Jet              0            61
##   Singapore        0            40
##   Virgin           0            62
table(dd$TravelMonth,dd$IsInternational)
##      
##       Domestic International
##   Aug       10           117
##   Jul       10            65
##   Oct       11           116
##   Sep        9           120
boxplot(dd$FlightDuration,
        xlab="Travel duration in Hours",
        main="Box plot of Flight Duration",horizontal=TRUE)

Let's draw box plots of the number of seats, the seat pitch and the seat width for each of the two classes.

boxplot(dd$SeatsEconomy,
        xlab="Number of seats in Economy class",col="yellow",
        main="Box plot of Number of seats in Economy class",horizontal=TRUE)

boxplot(dd$SeatsPremium,
        xlab="Number of seats in Premium class",col="red",
        main="Box plot of Number of seats in Premium class",horizontal=TRUE)

boxplot(dd$PitchEconomy,
        xlab="Seat pitch in Economy class (inches)",col="darkolivegreen",
        main="Box plot of pitch in Economy class",horizontal=TRUE)

boxplot(dd$PitchPremium,
        xlab="Seat pitch in Premium class (inches)",col="darkorange",
        main="Box plot of Pitch in Premium class",horizontal=TRUE)

boxplot(dd$WidthEconomy,
        xlab="Seat width in Economy class (inches)",col="hotpink",
        main="Box plot of Width in Economy class",horizontal=TRUE)

boxplot(dd$WidthPremium,
        xlab="Seat width in Premium class (inches)",col="seagreen",
        main="Box plot of Width in Premium class",horizontal=TRUE)
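Each box plot above is drawn from a five-number summary plus the 1.5 × IQR rule for outliers. `boxplot.stats()` returns the numbers behind the picture; the vector `x` below is hypothetical, chosen so that exactly one value is flagged as an outlier.

```r
# Hypothetical values; 30 lies far above the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

b <- boxplot.stats(x)   # default coef = 1.5
b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker: 1.0 3.0 5.5 8.0 9.0
b$out    # points more than 1.5 * (upper hinge - lower hinge) beyond the box: 30
```

On the real data, `boxplot.stats(dd$PitchPremium)$out` would list the Premium-pitch values that the box plot draws as individual points.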

Let's study the correlations between the numeric variables in detail through a correlation matrix.

dd1 <- subset(dd,select=c(SeatsEconomy,SeatsPremium,PitchEconomy,PitchPremium,WidthEconomy,
                          WidthPremium,PriceEconomy,PricePremium,PitchDifference,WidthDifference))
corrs <- cor(dd1, use="pairwise.complete.obs")
corrs
##                 SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy      1.00000000  0.625056587   0.14412692  0.119221250
## SeatsPremium      0.62505659  1.000000000  -0.03421296  0.004883123
## PitchEconomy      0.14412692 -0.034212963   1.00000000 -0.550606241
## PitchPremium      0.11922125  0.004883123  -0.55060624  1.000000000
## WidthEconomy      0.37367025  0.455782883   0.29448586 -0.023740873
## WidthPremium      0.10243196 -0.002717527  -0.53929285  0.750259029
## PriceEconomy      0.12816722  0.113642176   0.36866123  0.050384550
## PricePremium      0.17700093  0.217612376   0.22614179  0.088539147
## PitchDifference   0.03531804  0.016365566  -0.78254993  0.950591466
## WidthDifference  -0.08067015 -0.216168666  -0.63557430  0.703281797
##                 WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy      0.37367025  0.102431959   0.12816722   0.17700093
## SeatsPremium      0.45578288 -0.002717527   0.11364218   0.21761238
## PitchEconomy      0.29448586 -0.539292852   0.36866123   0.22614179
## PitchPremium     -0.02374087  0.750259029   0.05038455   0.08853915
## WidthEconomy      1.00000000  0.081918728   0.06799061   0.15054837
## WidthPremium      0.08191873  1.000000000  -0.05704522   0.06402004
## PriceEconomy      0.06799061 -0.057045224   1.00000000   0.90138870
## PricePremium      0.15054837  0.064020043   0.90138870   1.00000000
## PitchDifference  -0.12722421  0.760121272  -0.09952511  -0.01806629
## WidthDifference  -0.39320512  0.884149655  -0.08449975  -0.01151218
##                 PitchDifference WidthDifference
## SeatsEconomy         0.03531804     -0.08067015
## SeatsPremium         0.01636557     -0.21616867
## PitchEconomy        -0.78254993     -0.63557430
## PitchPremium         0.95059147      0.70328180
## WidthEconomy        -0.12722421     -0.39320512
## WidthPremium         0.76012127      0.88414965
## PriceEconomy        -0.09952511     -0.08449975
## PricePremium        -0.01806629     -0.01151218
## PitchDifference      1.00000000      0.76089108
## WidthDifference      0.76089108      1.00000000
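With ten variables the matrix holds 45 distinct pairs, so it helps to list them by strength. A minimal sketch of a helper (the name `top_pairs` is hypothetical) that reads the upper triangle of a correlation matrix and sorts the pairs by absolute correlation. So that it runs on its own, it uses a 4 × 4 block copied from `corrs` above; `top_pairs(corrs)` should work the same way on the full matrix.

```r
# List each unordered pair of variables once, strongest |r| first
top_pairs <- function(cm) {
  idx <- which(upper.tri(cm), arr.ind = TRUE)   # (row, col) indices above the diagonal
  out <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                    var2 = colnames(cm)[idx[, "col"]],
                    r    = cm[idx])
  out <- out[order(-abs(out$r)), ]
  rownames(out) <- NULL
  out
}

# Block of the correlation matrix printed above (values rounded to 7 decimals)
vars <- c("PitchEconomy", "PitchPremium", "WidthPremium", "PitchDifference")
cm <- matrix(c( 1.0000000, -0.5506062, -0.5392929, -0.7825499,
               -0.5506062,  1.0000000,  0.7502590,  0.9505915,
               -0.5392929,  0.7502590,  1.0000000,  0.7601213,
               -0.7825499,  0.9505915,  0.7601213,  1.0000000),
             nrow = 4, dimnames = list(vars, vars))
top_pairs(cm)
```

Sorting by absolute value keeps strong negative correlations, such as PitchEconomy with PitchDifference, near the top alongside the strong positive ones.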

Let's plot the correlation matrix of the numeric columns 8 to 18 (PitchEconomy through PercentPremiumSeats).

par(mfrow=c(1,1))
# 'use' is an argument of cor(), not of corrplot()
corrplot(cor(dd[, 8:18], use="complete.obs"), method="ellipse")

Let us statistically answer some questions about the airlines dataset in order to get better insights.

Q1) What is the maximum price of tickets in Economy and Premium class for the six different airlines? Across all airlines, the maximum Economy ticket price is 3593 USD (AirFrance) and the maximum Premium ticket price is 7414 USD (British).

p<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),max)
names(p)[2] <- "Max Price"
p
##   Name_of_Airline Max Price
## 1       AirFrance      3593
## 2         British      3102
## 3           Delta      1999
## 4             Jet       676
## 5       Singapore      1431
## 6          Virgin      2445
q<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),min)
names(q)[2] <- "Min Price"
q
##   Name_of_Airline Min Price
## 1       AirFrance       630
## 2         British        65
## 3           Delta       158
## 4             Jet       108
## 5       Singapore       505
## 6          Virgin       540

The tables above give the maximum and the minimum Economy-class ticket price for each airline. The same is done next for the Premium class.

by(dd$PricePremium,dd$Airline,max)
## dd$Airline: AirFrance
## [1] 3972
## -------------------------------------------------------- 
## dd$Airline: British
## [1] 7414
## -------------------------------------------------------- 
## dd$Airline: Delta
## [1] 2765
## -------------------------------------------------------- 
## dd$Airline: Jet
## [1] 931
## -------------------------------------------------------- 
## dd$Airline: Singapore
## [1] 1947
## -------------------------------------------------------- 
## dd$Airline: Virgin
## [1] 3694
r<- aggregate(dd$PricePremium,by=list(Name_of_Airline=dd$Airline),min)
names(r)[2] <- "Min Price"
r
##   Name_of_Airline Min Price
## 1       AirFrance      1611
## 2         British        86
## 3           Delta       173
## 4             Jet       228
## 5       Singapore       619
## 6          Virgin       594

Together, these outputs show the maximum and minimum Premium-class ticket prices for each of the six airlines.
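The separate maximum and minimum tables are easier to compare side by side. A small sketch using `merge()` on the Economy-class tables `p` and `q`, re-created here from the printed values so the snippet runs on its own; with `dd` loaded, `merge(p, q, by = "Name_of_Airline")` is all that is needed. The `Spread` column is an illustrative addition, not part of the original analysis.

```r
airlines <- c("AirFrance", "British", "Delta", "Jet", "Singapore", "Virgin")

# Values copied from the printed tables p (max) and q (min) above
p <- data.frame(Name_of_Airline = airlines,
                `Max Price` = c(3593, 3102, 1999, 676, 1431, 2445),
                check.names = FALSE)
q <- data.frame(Name_of_Airline = airlines,
                `Min Price` = c(630, 65, 158, 108, 505, 540),
                check.names = FALSE)

# One row per airline with both extremes, plus the gap between them
econ_range <- merge(p, q, by = "Name_of_Airline")
econ_range$Spread <- econ_range$`Max Price` - econ_range$`Min Price`
econ_range
```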

Q2) Which of the six airlines is the cheapest and which is the costliest to travel with?

s<- aggregate(dd$PriceEconomy,by=list(Name_of_Airline=dd$Airline),mean)
names(s)[2] <- "Average Price of Economy"
s
t<- aggregate(dd$PricePremium,by=list(Name_of_Airline=dd$Airline),mean)
names(t)[2] <- "Average Price of Premium"
t
##   Name_of_Airline Average Price of Premium
## 1       AirFrance                3065.2162
## 2         British                1937.0286
## 3           Delta                 684.6739
## 4             Jet                 483.3607
## 5       Singapore                1239.9250
## 6          Virgin                2721.6935

Based on the average Premium fares, AirFrance is the costliest airline to travel with, while Jet and Delta are the cheapest.
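The ranking can be read off programmatically rather than by eye. A short sketch using the mean Premium prices copied from table `t` above; the vector name `avg_premium` is hypothetical.

```r
# Mean Premium prices per airline, copied from table t above
avg_premium <- c(AirFrance = 3065.2162, British = 1937.0286, Delta = 684.6739,
                 Jet = 483.3607, Singapore = 1239.9250, Virgin = 2721.6935)

names(which.max(avg_premium))   # costliest: "AirFrance"
names(which.min(avg_premium))   # cheapest: "Jet"
sort(avg_premium)               # full ranking, cheapest first
```

With the data loaded, `t[order(t[[2]]), ]` sorts the aggregated table in the same way.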

Q3) Which is the best month to travel, irrespective of airline? Judging by ticket prices, July is the best month to travel.

tab2 <- bwplot(TravelMonth ~ PriceEconomy, data= dd, horizontal = TRUE)
tab2

tab3<- bwplot(TravelMonth ~ PricePremium, data= dd, horizontal = TRUE)
tab3

The box plots suggest that July is the best (cheapest) month for travelling, irrespective of airline, outliers and class.
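Box plots are read by eye; a numeric cross-check is to compare the median fare per month and pick the smallest. A sketch on a small hypothetical data frame `toy` (the fares are invented for illustration only); with the real data the same call would be `aggregate(PriceEconomy ~ TravelMonth, data = dd, FUN = median)`.

```r
# Hypothetical fares, two flights per month
toy <- data.frame(
  TravelMonth  = c("Jul", "Jul", "Aug", "Aug", "Sep", "Sep", "Oct", "Oct"),
  PriceEconomy = c(300, 500, 900, 1100, 800, 1000, 700, 1300)
)

# Median fare per month (the thick line in each box of bwplot)
med <- aggregate(PriceEconomy ~ TravelMonth, data = toy, FUN = median)
med

# Month with the lowest median fare
med$TravelMonth[which.min(med$PriceEconomy)]
```

Medians are used rather than means because they match what the box plots display and are less sensitive to the outliers mentioned above.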

Q4) What is the typical (most frequent) number of seats allotted to the Economy and Premium classes, irrespective of airline?

mlv(dd$SeatsEconomy, method = "mfv")
## Mode (most frequent value): 303 
## Bickel's modal skewness: -0.7161572 
## Call: mlv.integer(x = dd$SeatsEconomy, method = "mfv")
mlv(dd$SeatsPremium, method = "mfv")
## Mode (most frequent value): 36 
## Bickel's modal skewness: -0.07641921 
## Call: mlv.integer(x = dd$SeatsPremium, method = "mfv")

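The most frequent configurations are 303 Economy seats and 36 Premium seats. These are modes rather than averages; the mean numbers of seats reported by describe() above are 202.31 (Economy) and 33.65 (Premium).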
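The mode that `mlv(..., method = "mfv")` reports can also be found in base R with `table()`. Because the mode need not cover a majority of flights, it is worth checking its share as well. The `seats` vector below is hypothetical.

```r
# Hypothetical seat counts for seven flights
seats <- c(122, 303, 185, 303, 122, 303, 233)

counts   <- table(seats)
mode_val <- as.numeric(names(counts)[which.max(counts)])  # most frequent value: 303
share    <- max(counts) / length(seats)                   # its share of flights: 3/7

mode_val
share
```

Here 303 is the mode even though it covers fewer than half of the flights.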

cor(dd$SeatsEconomy,dd$SeatsPremium)
## [1] 0.6250566
cor(dd$PriceEconomy,dd$PricePremium)
## [1] 0.9013887

As the above output shows, the prices of Economy and Premium class are highly correlated (r = 0.90), and the numbers of Economy and Premium seats are moderately correlated (r = 0.63). These results are as expected.
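Pearson's r is the covariance of the two variables divided by the product of their standard deviations: r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). A small check on hypothetical vectors confirms that this formula matches `cor()`.

```r
# Hypothetical paired values
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Sum of cross-products of deviations over the root of the sums of squares
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual    # 0.8
cor(x, y)   # same value
```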

Let us study the correlation between pitch and width within the Premium class, and between the pitch and width differences of the two classes.

cor(dd$PitchPremium,dd$WidthPremium)
## [1] 0.750259
cor(dd$PitchDifference,dd$WidthDifference)
## [1] 0.7608911

As expected, the pitch and width of the Premium class are positively correlated (r = 0.75), and so are the pitch and width differences between the classes (r = 0.76).

Let us study how various factors relate to the prices of the Economy and Premium classes, starting with the number of Premium seats.

cor.test(dd$PriceEconomy,dd$SeatsPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  dd$PriceEconomy and dd$SeatsPremium
## t = 2.4426, df = 456, p-value = 0.01496
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02224701 0.20315362
## sample estimates:
##       cor 
## 0.1136422

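The p-value from Pearson's test is 0.015, which is less than 0.05, so the correlation between the price of an Economy ticket and the number of Premium seats is statistically significant. It is, however, weak (r = 0.11, 95% confidence interval 0.02 to 0.20). By contrast, the prices of Economy and Premium tickets are strongly correlated (r = 0.901); for that pair, cor.test(dd$PriceEconomy, dd$PricePremium) reports a p-value below 2.2e-16.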
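The p-value and interval printed by `cor.test()` depend only on r and n. The test statistic is t = r · sqrt(n − 2) / sqrt(1 − r²) on n − 2 degrees of freedom, and the 95% interval comes from Fisher's z-transform. The helper `pearson_summary` (a hypothetical name) applies these formulas to the two correlations discussed above, using n = 458.

```r
# t statistic, two-sided p-value and Fisher-z confidence interval for Pearson's r,
# computed from r and n alone
pearson_summary <- function(r, n, conf = 0.95) {
  df   <- n - 2
  t    <- r * sqrt(df) / sqrt(1 - r^2)
  p    <- 2 * pt(-abs(t), df)                   # two-sided p-value
  z    <- atanh(r)                              # Fisher z-transform
  half <- qnorm((1 + conf) / 2) / sqrt(n - 3)   # half-width on the z scale
  c(t = t, df = df, p = p, lower = tanh(z - half), upper = tanh(z + half))
}

# PriceEconomy vs SeatsPremium (r from the cor.test output above)
pearson_summary(0.1136422, 458)

# PriceEconomy vs PricePremium (r from cor() earlier)
pearson_summary(0.9013887, 458)
```

The first call should agree with the printed t = 2.4426, p = 0.01496 and interval 0.022 to 0.203 up to rounding. The second gives t of about 44, which is why the p-value for the price pair is reported only as below 2.2e-16.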