R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

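# Read text columns as factors (not the default since R 4.0); the plots below need factors.
air <- read.csv("SixAirlinesDatav2.csv", stringsAsFactors = TRUE)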

str(air)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...

Data set Description

library(psych)
describe(air)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23
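The se column reported by describe() is the standard error of the mean, sd / sqrt(n). As a quick check, the sketch below recomputes it for three variables from the rounded sd and n values printed above; the vector names are illustrative only.

```r
# n and sd values copied from the describe() output above.
n  <- 458
sd_values <- c(PriceEconomy = 988.27, PricePremium = 1288.14, SeatsEconomy = 76.37)

# Standard error of the mean for each variable.
se_values <- sd_values / sqrt(n)

print(round(se_values, 2))
```

Because the inputs are already rounded to two decimals, small discrepancies in the last digit would not be surprising for other variables.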

Data set Summary

summary(air)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
plot(air$IsInternational,main = "domestic vs international flights",col="grey")

plot(air$TravelMonth,main = "Monthly Travel",col="grey", xlab="month", ylab="count")

plot(air$Airline, air$SeatsEconomy, main="Airline and The Total Number of seats in Economy Class",col = c("light blue","light green","orange","cyan","grey","light yellow"))

plot(air$Airline, air$SeatsPremium, main="Airline and The Total Number of seats in Premium Class",col = c("light blue","light green","orange","cyan","grey","light yellow"))

boxplot(FlightDuration~Aircraft,data=air,xlab="Aircraft type", ylab="Flight duration",col = c("light blue","light green"))

par(mfrow=c(1,2))
hist(air$WidthEconomy, xlab="Economy Seats Width",col = "violet",main="Economy class")
hist(air$WidthPremium, xlab="Premium Seats Width",col = "grey",main="Premium class")

par(mfrow=c(1,2))
hist(air$PitchEconomy, xlab="Economy Seats Pitch",col = "light blue",main="Economy class ")
hist(air$PitchPremium, xlab="Premium Seats Pitch",col = "green",main="Premium class ")

par(mfrow=c(1,2))
hist(air$PriceEconomy, xlab="Economy Seats Price",col = "orange",main="Economy class")
hist(air$PricePremium, xlab="Premium Seats Price",col = "light green",main="Premium class")

aggregate(air$PricePremium~air$Airline, FUN=mean)
##   air$Airline air$PricePremium
## 1   AirFrance        3065.2162
## 2     British        1937.0286
## 3       Delta         684.6739
## 4         Jet         483.3607
## 5   Singapore        1239.9250
## 6      Virgin        2721.6935

At higher percentages of premium seats, relative pricing seems to be lower in general, although the group means vary considerably.

aggregate(air$PriceRelative~air$PercentPremiumSeats, FUN = mean)
##    air$PercentPremiumSeats air$PriceRelative
## 1                     4.71        0.95263158
## 2                     8.90        0.20928571
## 3                     9.76        0.80000000
## 4                    10.00        1.32000000
## 5                    10.57        0.60416667
## 6                    11.43        1.15370370
## 7                    12.12        0.03000000
## 8                    12.28        0.08590909
## 9                    12.50        0.23125000
## 10                   12.82        0.07200000
## 11                   12.90        0.50414634
## 12                   13.04        0.06200000
## 13                   13.13        0.10750000
## 14                   13.21        0.34958333
## 15                   14.02        0.61111111
## 16                   14.50        0.09500000
## 17                   14.97        0.74000000
## 18                   14.99        0.31000000
## 19                   15.02        1.03523810
## 20                   15.36        0.32211538
## 21                   16.46        0.09000000
## 22                   16.87        0.39625000
## 23                   18.73        0.73500000
## 24                   20.41        0.06125000
## 25                   20.60        0.44666667
## 26                   23.49        0.41600000
## 27                   24.69        0.42254902
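One rough way to check the direction of this trend is to correlate the group means themselves. The sketch below copies the 27 rows of the table above and computes a Spearman rank correlation and an unweighted linear slope. This is an illustration rather than part of the original analysis: it ignores how many flights fall in each group, so it only hints at the direction of the relationship.

```r
# Group values copied from the aggregate() output above
# (one entry per distinct PercentPremiumSeats level).
pct <- c(4.71, 8.90, 9.76, 10.00, 10.57, 11.43, 12.12, 12.28, 12.50,
         12.82, 12.90, 13.04, 13.13, 13.21, 14.02, 14.50, 14.97, 14.99,
         15.02, 15.36, 16.46, 16.87, 18.73, 20.41, 20.60, 23.49, 24.69)
rel <- c(0.95263158, 0.20928571, 0.80000000, 1.32000000, 0.60416667,
         1.15370370, 0.03000000, 0.08590909, 0.23125000, 0.07200000,
         0.50414634, 0.06200000, 0.10750000, 0.34958333, 0.61111111,
         0.09500000, 0.74000000, 0.31000000, 1.03523810, 0.32211538,
         0.09000000, 0.39625000, 0.73500000, 0.06125000, 0.44666667,
         0.41600000, 0.42254902)

# Spearman rank correlation: sign and strength of the monotone trend.
# Caveat: every group counts once, regardless of its number of flights.
rho <- cor(pct, rel, method = "spearman")

# Slope of a simple, unweighted linear fit through the group means.
slope <- unname(coef(lm(rel ~ pct))["pct"])

print(round(c(spearman_rho = rho, lm_slope = slope), 3))
```

A negative rho and slope would be consistent with the statement above; running cor.test() on the full air data frame would be the more reliable check.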

The data indicate that greater perks go with a higher relative price. A width difference of 4 led to an almost doubling of the price from economy to premium.

library(lattice)
bwplot(PriceRelative~Airline|IsInternational, data=air)

bwplot(PriceRelative~Aircraft|IsInternational, data=air)

bwplot(PriceRelative~TravelMonth, data=air)

Relative prices were higher for Boeing than for Airbus.

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
par(mfrow=c(1,2))
with(air,plot(Airline,PriceEconomy, xlab = "Airline", ylab="Price Economy"))
with(air,plot(Airline,PricePremium,xlab = "Airline", ylab="Price Premium"))

par(mfrow=c(1,2))
with(air,plot(FlightDuration,PriceEconomy))
with(air,plot(FlightDuration,PricePremium))

We notice that flights lasting 0 to 4 hours have low prices, while prices for flights lasting 5 to 14 hours are widely spread.

par(mfrow=c(1,2))
with(air,plot(TravelMonth,PriceEconomy, ylab = "Economy Price"))
with(air,plot(TravelMonth,PricePremium, ylab="Premium Price"))

Prices in July are typically lower than in the other months (note that these box plots show medians rather than means).

pairs(formula=~PriceRelative+PitchDifference, data=air)

library(corrgram)
corrgram(air, order=NULL, panel=panel.cor,text.panel=panel.txt,main="Corrgram")

corrgram(air, order=TRUE, upper.panel=panel.pie,lower.panel=panel.shade, text.panel=panel.txt,main="Corrgram")

This supports the hypothesis that premium seats with more perks are priced higher relative to economy tickets.

Using the probable factors, a regression model is proposed.

Null hypotheses for the coefficient t-tests:

H1: There is no relation between relative price and width difference.

H2: There is no relation between relative price and pitch difference.

fit<-lm(PriceRelative~WidthDifference+PitchDifference, data=air)
summary(fit)
## 
## Call:
## lm(formula = PriceRelative ~ WidthDifference + PitchDifference, 
##     data = air)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.84163 -0.28484 -0.07241  0.17698  1.18778 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.10514    0.08304  -1.266 0.206077    
## WidthDifference  0.11621    0.02356   4.933 1.14e-06 ***
## PitchDifference  0.06019    0.01590   3.785 0.000174 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3886 on 455 degrees of freedom
## Multiple R-squared:  0.2593, Adjusted R-squared:  0.2561 
## F-statistic: 79.65 on 2 and 455 DF,  p-value: < 2.2e-16
fit<-lm(PricePremium~PriceEconomy+WidthDifference+PitchDifference, data=air)
summary(fit)
## 
## Call:
## lm(formula = PricePremium ~ PriceEconomy + WidthDifference + 
##     PitchDifference, data = air)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -809.9 -325.8  -97.1  176.3 3470.6 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -33.37619  125.69676  -0.266   0.7907    
## PriceEconomy      1.18456    0.02623  45.152   <2e-16 ***
## WidthDifference  26.25535   33.43032   0.785   0.4326    
## PitchDifference  39.43892   22.59939   1.745   0.0816 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 551.5 on 454 degrees of freedom
## Multiple R-squared:  0.8179, Adjusted R-squared:  0.8167 
## F-statistic: 679.9 on 3 and 454 DF,  p-value: < 2.2e-16

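In the first model, both null hypotheses are rejected (p < 0.001 for each coefficient), so relative price is significantly associated with the width and pitch differences of the seats, although the model explains only about 26% of the variance (R-squared = 0.2593). In the second model, once PriceEconomy is included as a predictor of PricePremium, neither WidthDifference (p = 0.4326) nor PitchDifference (p = 0.0816) is significant at the 0.05 level.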
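To make the coefficient tests concrete, the sketch below rebuilds the t values and two-sided p-values of the relative-price model from the printed estimates and standard errors, and evaluates the fitted equation at the sample means reported by summary(air). All numbers are copied from the output above; the variable names are illustrative.

```r
# Estimates and standard errors copied from the summary() output for
# PriceRelative ~ WidthDifference + PitchDifference.
est <- c(Intercept = -0.10514, WidthDifference = 0.11621, PitchDifference = 0.06019)
se  <- c(Intercept = 0.08304,  WidthDifference = 0.02356, PitchDifference = 0.01590)
df_resid <- 455  # residual degrees of freedom from the same output

# t value = estimate / standard error; two-sided p-value from the t distribution.
t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Fitted relative price at the sample means of the two predictors
# (WidthDifference = 1.633, PitchDifference = 6.688, from summary(air)).
pred_at_means <- est[["Intercept"]] +
  est[["WidthDifference"]] * 1.633 +
  est[["PitchDifference"]] * 6.688

print(round(t_value, 3))
print(signif(p_value, 3))
print(round(pred_at_means, 4))
```

A least-squares fit with an intercept passes through the point of means, so the last value should land close to the mean PriceRelative of 0.4872.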

fit1<-lm(air$PriceRelative~air$TravelMonth)
summary(fit1)
## 
## Call:
## lm(formula = air$PriceRelative ~ air$TravelMonth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4908 -0.3779 -0.1179  0.2523  1.4321 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         0.47661    0.04005  11.899   <2e-16 ***
## air$TravelMonthJul  0.02205    0.06574   0.335    0.737    
## air$TravelMonthOct  0.04417    0.05665   0.780    0.436    
## air$TravelMonthSep -0.01871    0.05643  -0.332    0.740    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4514 on 454 degrees of freedom
## Multiple R-squared:  0.002997,   Adjusted R-squared:  -0.003591 
## F-statistic: 0.4549 on 3 and 454 DF,  p-value: 0.714
Model1 <- PricePremium ~ PitchPremium + WidthPremium + SeatsPremium
fit2 <- lm(Model1, data = air)
summary(fit2)
## 
## Call:
## lm(formula = Model1, data = air)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2219.2  -936.9  -120.4  1078.6  5762.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2127.171   1736.937  -1.225    0.221    
## PitchPremium    87.481     67.656   1.293    0.197    
## WidthPremium    -2.744     81.021  -0.034    0.973    
## SeatsPremium    21.095      4.432   4.760 2.61e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1256 on 454 degrees of freedom
## Multiple R-squared:  0.05501,    Adjusted R-squared:  0.04877 
## F-statistic: 8.809 on 3 and 454 DF,  p-value: 1.094e-05

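P-value < 0.05: for fit2, the overall relationship is statistically significant (F-test p-value = 1.094e-05), driven by SeatsPremium; PitchPremium and WidthPremium are not individually significant. The TravelMonth model (fit1) is not significant (p-value = 0.714).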
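The overall F statistic of each model can be recovered from its R-squared value, which shows where the reported p-values come from. The sketch below applies the standard formula to the numbers printed for fit1 and fit2; the helper name f_from_r2 is hypothetical.

```r
# Overall F statistic from R-squared:
#   F = (R2 / k) / ((1 - R2) / df_resid), with k = number of predictors.
f_from_r2 <- function(r2, k, df_resid) (r2 / k) / ((1 - r2) / df_resid)

# R-squared and degrees of freedom copied from the summary() outputs above.
f_fit1 <- f_from_r2(0.002997, k = 3, df_resid = 454)  # TravelMonth model
f_fit2 <- f_from_r2(0.05501,  k = 3, df_resid = 454)  # premium seat model

# Upper-tail probability of the F distribution gives the model p-value.
p_fit1 <- pf(f_fit1, 3, 454, lower.tail = FALSE)
p_fit2 <- pf(f_fit2, 3, 454, lower.tail = FALSE)

print(round(c(F_fit1 = f_fit1, F_fit2 = f_fit2), 3))
print(signif(c(p_fit1 = p_fit1, p_fit2 = p_fit2), 3))
```

Note that a significant F-test says only that the predictors jointly explain some variance; with an R-squared of about 0.055, fit2 still explains little of the variation in PricePremium.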

k<-lm(FlightDuration~PricePremium+PriceEconomy, data=air)
summary(k)
## 
## Call:
## lm(formula = FlightDuration ~ PricePremium + PriceEconomy, data = air)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4012 -2.0051 -0.6418  1.2002  7.6382 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.3037849  0.2208679  19.486   <2e-16 ***
## PricePremium  0.0020235  0.0002262   8.945   <2e-16 ***
## PriceEconomy -0.0003465  0.0002949  -1.175    0.241    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.697 on 455 degrees of freedom
## Multiple R-squared:  0.4226, Adjusted R-squared:  0.4201 
## F-statistic: 166.5 on 2 and 455 DF,  p-value: < 2.2e-16
t.test(I(air$PriceEconomy+air$PricePremium)~air$Aircraft)
## 
##  Welch Two Sample t-test
## 
## data:  I(air$PriceEconomy + air$PricePremium) by air$Aircraft
## t = 0.4542, df = 300.02, p-value = 0.65
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -333.7254  534.0008
## sample estimates:
## mean in group AirBus mean in group Boeing 
##             3239.457             3139.319
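As a sanity check on the Welch test, the sketch below reconstructs the standard error, confidence interval and p-value from the printed group means, t statistic and degrees of freedom. It only uses numbers from the output above, so results can differ slightly because of rounding.

```r
# Values copied from the t.test() output above.
mean_airbus <- 3239.457
mean_boeing <- 3139.319
t_stat      <- 0.4542
df_welch    <- 300.02
ci_reported <- c(-333.7254, 534.0008)

# Difference in means and the standard error implied by t = diff / se.
diff_means <- mean_airbus - mean_boeing
se_diff    <- diff_means / t_stat

# 95% confidence interval and two-sided p-value rebuilt from those pieces.
ci_rebuilt <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se_diff
p_rebuilt  <- 2 * pt(-abs(t_stat), df_welch)

print(round(ci_rebuilt, 1))
print(round(p_rebuilt, 2))
```

The interval spans zero, which is the same information as the p-value of 0.65: the combined economy and premium price does not differ detectably between AirBus and Boeing flights.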

Final Summary: The difference between the maximum and minimum ticket cost (for both classes) depends on the airline.

According to the correlation tests, the Airline factor is statistically related to the economy class ticket price, the premium economy class ticket price, and the relative price between the two classes.

Because the p-value is < 0.05, flight duration is a significant factor in determining the difference in price between economy and premium economy tickets.

The Aircraft factor is negatively correlated with the difference in price between economy and premium economy tickets, as indicated by the negative correlation coefficient. With a p-value > 0.05, however, the Aircraft factor is not a significant contributor to that difference.