R Markdown

Load the airline data (AirlinesData.csv) and summarise every variable with describe() from the psych package:

air <- read.csv("AirlinesData.csv")
library(psych)
## Warning: package 'psych' was built under R version 3.4.3
describe(air)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Data Structure

The structure and a summary of each variable:

str(air)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...
summary(air)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
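The str() output suggests that several columns are derived from others. The following spot check assumes SeatsTotal = SeatsEconomy + SeatsPremium, PercentPremiumSeats = 100 * SeatsPremium / SeatsTotal, PitchDifference = PitchPremium - PitchEconomy, WidthDifference = WidthPremium - WidthEconomy and PriceRelative = (PricePremium - PriceEconomy) / PriceEconomy. It uses only values copied from the first rows printed by str(), so it is a sanity check, not a proof over all 458 rows:

```r
# Distinct seat/pitch/width/price combinations copied from the first ten rows of str(air).
first_rows <- data.frame(
  SeatsEconomy = 122, SeatsPremium = 40,
  PitchEconomy = 31,  PitchPremium = 38,
  WidthEconomy = 18,  WidthPremium = 19,
  PriceEconomy = c(2707, 1793, 1476, 1705),
  PricePremium = c(3725, 2999, 2997, 2989)
)

# Recompute each derived column from its assumed formula (rounded like the data).
derived <- with(first_rows, data.frame(
  SeatsTotal          = SeatsEconomy + SeatsPremium,
  PercentPremiumSeats = round(100 * SeatsPremium / (SeatsEconomy + SeatsPremium), 2),
  PitchDifference     = PitchPremium - PitchEconomy,
  WidthDifference     = WidthPremium - WidthEconomy,
  PriceRelative       = round((PricePremium - PriceEconomy) / PriceEconomy, 2)
))
print(derived)
```

The recomputed values match str(): 162 seats, 24.69% premium seats, pitch difference 7, width difference 1 and PriceRelative 0.38, 0.67, 1.03, 0.75. On the full data frame the same idea could be checked with, for example, all.equal(air$PitchDifference, air$PitchPremium - air$PitchEconomy); PriceRelative appears to be stored rounded to two decimals, so compare it with a tolerance.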

Now plot the dependent variable, PriceRelative (the price difference):

hist(air$PriceRelative,main="Price Difference",xlab="Price Difference")

library(car)
## Warning: package 'car' was built under R version 3.4.3
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(air$PriceRelative~air$FlightDuration,spread=FALSE,smoother.args=list(lty=2))

boxplot(PriceRelative ~ TravelMonth, data = air)

There is not much difference in PriceRelative across travel months.

boxplot(air$FlightDuration~air$Airline,col="purple",horizontal=TRUE,xlab="Flight Duration")

boxplot(PriceRelative ~ Aircraft, data = air, col = "green", horizontal = TRUE)

boxplot(PriceRelative ~ Airline, data = air, horizontal = TRUE, col = "red")

The price difference clearly varies with the airline, although a boxplot is only a visual impression; the regression below tests this formally.
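A boxplot does not by itself establish statistical dependence. A minimal sketch of two formal checks follows: a one-way ANOVA, and a Kruskal-Wallis test as a rank-based alternative, since describe() shows PriceRelative is right-skewed (skew 1.17). The helper name group_difference_tests is hypothetical, and the demo uses simulated data because AirlinesData.csv is not bundled here:

```r
# Hypothetical helper (not part of the original analysis): p-values for a difference
# in a numeric outcome y across the levels of a grouping variable g.
group_difference_tests <- function(y, g) {
  g <- factor(g)
  c(
    anova   = summary(aov(y ~ g))[[1]][["Pr(>F)"]][1],  # assumes roughly normal residuals
    kruskal = kruskal.test(y, g)$p.value                # rank-based, more robust to skew
  )
}

# Simulated example: three groups with clearly different means.
set.seed(1)
g <- rep(c("A", "B", "C"), each = 30)
y <- rnorm(90, mean = rep(c(0, 0.5, 1), each = 30), sd = 0.3)
p_values <- group_difference_tests(y, g)
print(p_values)
```

With the airline data the call would be group_difference_tests(air$PriceRelative, air$Airline); small p-values from both tests would support the visual impression from the boxplot.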

scatterplot(air$PriceRelative~air$PercentPremiumSeats,spread=FALSE,smoother.args=list(lty=2))

boxplot(PriceRelative ~ WidthDifference, data = air, horizontal = TRUE, col = "yellow", xlab = "Price Difference", ylab = "Width Difference")

As WidthDifference increases, the price difference tends to increase; however, a correlation test is the best way to check this trend.
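The Pearson test reported by cor.test() turns the sample correlation r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2). A small sketch (the function name pearson_t is our own) that reproduces cor.test() on simulated data, and that, with r = 0.4858024 and n = 458, gives the t of about 11.87 on 456 df reported for PriceRelative and WidthDifference:

```r
# t statistic and two-sided p-value of the Pearson correlation test,
# computed from the sample correlation r and the sample size n.
pearson_t <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  c(t = t, df = n - 2, p = 2 * pt(-abs(t), df = n - 2))
}

# Compare with cor.test() on simulated data.
set.seed(2)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
ct <- cor.test(x, y)
manual <- pearson_t(cor(x, y), length(x))
print(c(cor_test = unname(ct$statistic), manual = unname(manual["t"])))

# Plugging in the values from the PriceRelative vs WidthDifference test.
print(pearson_t(0.4858024, 458))
```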

library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(air,order=FALSE,lower.panel = panel.shade,upper.panel = panel.pie,text.panel = panel.txt,main="Corrgram")

round(cor(air[,6:18]),2)
##                     SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy                1.00         0.63         0.14         0.12
## SeatsPremium                0.63         1.00        -0.03         0.00
## PitchEconomy                0.14        -0.03         1.00        -0.55
## PitchPremium                0.12         0.00        -0.55         1.00
## WidthEconomy                0.37         0.46         0.29        -0.02
## WidthPremium                0.10         0.00        -0.54         0.75
## PriceEconomy                0.13         0.11         0.37         0.05
## PricePremium                0.18         0.22         0.23         0.09
## PriceRelative               0.00        -0.10        -0.42         0.42
## SeatsTotal                  0.99         0.72         0.12         0.11
## PitchDifference             0.04         0.02        -0.78         0.95
## WidthDifference            -0.08        -0.22        -0.64         0.70
## PercentPremiumSeats        -0.33         0.49        -0.10        -0.18
##                     WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy                0.37         0.10         0.13         0.18
## SeatsPremium                0.46         0.00         0.11         0.22
## PitchEconomy                0.29        -0.54         0.37         0.23
## PitchPremium               -0.02         0.75         0.05         0.09
## WidthEconomy                1.00         0.08         0.07         0.15
## WidthPremium                0.08         1.00        -0.06         0.06
## PriceEconomy                0.07        -0.06         1.00         0.90
## PricePremium                0.15         0.06         0.90         1.00
## PriceRelative              -0.04         0.50        -0.29         0.03
## SeatsTotal                  0.41         0.09         0.13         0.19
## PitchDifference            -0.13         0.76        -0.10        -0.02
## WidthDifference            -0.39         0.88        -0.08        -0.01
## PercentPremiumSeats         0.23        -0.18         0.07         0.12
##                     PriceRelative SeatsTotal PitchDifference
## SeatsEconomy                 0.00       0.99            0.04
## SeatsPremium                -0.10       0.72            0.02
## PitchEconomy                -0.42       0.12           -0.78
## PitchPremium                 0.42       0.11            0.95
## WidthEconomy                -0.04       0.41           -0.13
## WidthPremium                 0.50       0.09            0.76
## PriceEconomy                -0.29       0.13           -0.10
## PricePremium                 0.03       0.19           -0.02
## PriceRelative                1.00      -0.01            0.47
## SeatsTotal                  -0.01       1.00            0.03
## PitchDifference              0.47       0.03            1.00
## WidthDifference              0.49      -0.11            0.76
## PercentPremiumSeats         -0.16      -0.22           -0.09
##                     WidthDifference PercentPremiumSeats
## SeatsEconomy                  -0.08               -0.33
## SeatsPremium                  -0.22                0.49
## PitchEconomy                  -0.64               -0.10
## PitchPremium                   0.70               -0.18
## WidthEconomy                  -0.39                0.23
## WidthPremium                   0.88               -0.18
## PriceEconomy                  -0.08                0.07
## PricePremium                  -0.01                0.12
## PriceRelative                  0.49               -0.16
## SeatsTotal                    -0.11               -0.22
## PitchDifference                0.76               -0.09
## WidthDifference                1.00               -0.28
## PercentPremiumSeats           -0.28                1.00
cor.test(air$PitchDifference,air$PitchPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PitchDifference and air$PitchPremium
## t = 65.387, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9409183 0.9587146
## sample estimates:
##       cor 
## 0.9505915

There is a very high correlation between PitchDifference and PitchPremium (r = 0.95), so they are not independent factors.

fit1<-lm(air$PitchDifference~air$PitchEconomy+air$PitchPremium+air$WidthEconomy)
summary(fit1)
## Warning in summary.lm(fit1): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = air$PitchDifference ~ air$PitchEconomy + air$PitchPremium + 
##     air$WidthEconomy)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.348e-13  3.240e-16  3.240e-16  3.240e-16  4.433e-14 
## 
## Coefficients:
##                    Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)      -4.940e-13  2.640e-14 -1.872e+01   <2e-16 ***
## air$PitchEconomy -1.000e+00  6.272e-16 -1.594e+15   <2e-16 ***
## air$PitchPremium  1.000e+00  2.990e-16  3.345e+15   <2e-16 ***
## air$WidthEconomy -3.045e-16  6.154e-16 -4.950e-01    0.621    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.903e-15 on 454 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 9.921e+30 on 3 and 454 DF,  p-value: < 2.2e-16

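So PitchDifference clearly depends on PitchEconomy as well: the model fits exactly (R-squared = 1, with coefficients of -1 on PitchEconomy and +1 on PitchPremium, hence the "essentially perfect fit" warning), meaning PitchDifference = PitchPremium - PitchEconomy. These are not independent variables.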
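Because PitchDifference is an exact linear combination of PitchPremium and PitchEconomy, a regression cannot estimate all three at once. A small sketch on simulated pitch values (hypothetical numbers, not the airline data) shows what lm() does in that case: it reports NA for the aliased term, and alias() names the combination:

```r
# Simulated pitch values (hypothetical, not taken from AirlinesData.csv).
set.seed(3)
n <- 100
PitchEconomy    <- sample(30:33, n, replace = TRUE)
PitchPremium    <- sample(34:40, n, replace = TRUE)
PitchDifference <- PitchPremium - PitchEconomy   # exact linear combination
y <- 0.1 * PitchDifference + rnorm(n, sd = 0.3)

# lm() detects the rank deficiency and returns NA for the last, aliased term.
fit_alias <- lm(y ~ PitchEconomy + PitchPremium + PitchDifference)
print(coef(fit_alias))

# alias() reports which term is an exact combination of the others.
print(alias(fit_alias)$Complete)
```

This is one reason to keep only one of the pitch variables (here PitchDifference) in the price model.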

cor.test(air$WidthDifference,air$WidthPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  air$WidthDifference and air$WidthPremium
## t = 40.411, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8623863 0.9026511
## sample estimates:
##       cor 
## 0.8841497

The same happens with width: WidthDifference is highly dependent on WidthEconomy and WidthPremium (r = 0.88 with WidthPremium).

boxplot(PriceRelative~IsInternational,data = air,col="blue",horizontal=TRUE)

So international flights appear to have a higher price difference than domestic flights; this still needs to be checked through a regression.

scatterplot(air$PercentPremiumSeats,air$PriceRelative,spread=FALSE,smoother.args=list(lty=2))

cor.test(air$PercentPremiumSeats,air$FlightDuration)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PercentPremiumSeats and air$FlightDuration
## t = 1.2946, df = 456, p-value = 0.1961
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03128403  0.15130409
## sample estimates:
##        cor 
## 0.06051625

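Since p = 0.196 > 0.05, we cannot reject the null hypothesis of zero correlation between PercentPremiumSeats and FlightDuration. This does not prove that the true correlation is zero (the 95% confidence interval runs from about -0.03 to 0.15), but there is no evidence of a linear relationship between them.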
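The confidence interval printed by cor.test() comes from the Fisher z-transform: z = atanh(r) has approximate standard error 1 / sqrt(n - 3), and the interval is transformed back with tanh(). A sketch (fisher_ci is our own name) that reproduces the interval for r = 0.06051625 and n = 458, and checks the same formula against cor.test() on simulated data:

```r
# Confidence interval for a Pearson correlation via the Fisher z-transform.
fisher_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                                 # Fisher z of the sample correlation
  se <- 1 / sqrt(n - 3)                          # approximate standard error of z
  q  <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(lower = -1, upper = 1) * q * se)    # back-transform to the r scale
}

# r and n from the PercentPremiumSeats vs FlightDuration test.
ci <- fisher_ci(0.06051625, 458)
print(ci)

# Same formula as cor.test() uses, checked on simulated data.
set.seed(4)
x <- rnorm(40)
y <- x + rnorm(40)
same_as_cor_test <- isTRUE(all.equal(as.numeric(cor.test(x, y)$conf.int),
                                     unname(fisher_ci(cor(x, y), 40))))
print(same_as_cor_test)
```

The interval contains zero, which is why the test is not significant, but it also contains small positive correlations, so "no evidence of a relationship" is the accurate reading.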

scatterplot(air$SeatsTotal,air$PriceRelative,spread = FALSE,smoother.args = list(lty=2))

cor.test(air$PitchDifference,air$WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PitchDifference and air$WidthDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7194209 0.7969557
## sample estimates:
##       cor 
## 0.7608911

So PitchDifference and WidthDifference are strongly correlated (r = 0.76), meaning they are not independent. Next, test how strongly each of them is correlated with the price difference.

cor.test(air$PriceRelative,air$PitchDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PriceRelative and air$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302

Since p < 2.2e-16 (far below 0.05), the correlation is statistically significant; with r = 0.47 it is moderate rather than high.
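With n = 458, a tiny p-value says the correlation is unlikely to be zero, not that it is large. Inverting t = r * sqrt(n - 2) / sqrt(1 - r^2) gives the smallest |r| that would reach p < 0.05; the function name critical_r is our own:

```r
# Smallest |r| that is significant (two-sided) at level alpha with n observations,
# from solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(n - 2 + t_crit^2)
}

r_crit <- critical_r(458)
print(r_crit)

# Share of variance in PriceRelative linearly explained by PitchDifference alone.
print(0.4687302^2)
```

Any |r| above roughly 0.09 would be significant with this sample size. The observed r = 0.47 corresponds to r^2 of about 0.22, i.e. PitchDifference alone linearly accounts for roughly a fifth of the variation in PriceRelative.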

cor.test(air$PriceRelative,air$WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  air$PriceRelative and air$WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4125388 0.5528218
## sample estimates:
##       cor 
## 0.4858024

Again the correlation is statistically significant (p < 2.2e-16) and moderate in strength (r = 0.49).

model <- lm(air$PriceRelative~air$Airline+air$Aircraft+air$FlightDuration+air$TravelMonth+air$IsInternational+air$PitchDifference+air$SeatsTotal+air$PercentPremiumSeats)
summary(model)
## 
## Call:
## lm(formula = air$PriceRelative ~ air$Airline + air$Aircraft + 
##     air$FlightDuration + air$TravelMonth + air$IsInternational + 
##     air$PitchDifference + air$SeatsTotal + air$PercentPremiumSeats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.86507 -0.20861 -0.05295  0.11137  1.49224 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      -2.063e-01  2.070e-01  -0.996 0.319621
## air$AirlineBritish                2.535e-01  7.252e-02   3.496 0.000520
## air$AirlineDelta                  1.336e-01  1.611e-01   0.829 0.407430
## air$AirlineJet                    5.173e-01  1.411e-01   3.666 0.000276
## air$AirlineSingapore              2.806e-01  7.198e-02   3.899 0.000112
## air$AirlineVirgin                 5.119e-01  7.961e-02   6.430 3.31e-10
## air$AircraftBoeing                2.057e-02  4.429e-02   0.464 0.642582
## air$FlightDuration                3.376e-02  6.588e-03   5.125 4.45e-07
## air$TravelMonthJul               -1.859e-02  5.273e-02  -0.353 0.724580
## air$TravelMonthOct                5.413e-02  4.481e-02   1.208 0.227754
## air$TravelMonthSep               -1.068e-02  4.467e-02  -0.239 0.811092
## air$IsInternationalInternational -3.203e-01  2.617e-01  -1.224 0.221598
## air$PitchDifference               9.854e-02  3.785e-02   2.603 0.009541
## air$SeatsTotal                   -8.353e-05  3.109e-04  -0.269 0.788304
## air$PercentPremiumSeats          -1.400e-02  5.502e-03  -2.544 0.011283
##                                     
## (Intercept)                         
## air$AirlineBritish               ***
## air$AirlineDelta                    
## air$AirlineJet                   ***
## air$AirlineSingapore             ***
## air$AirlineVirgin                ***
## air$AircraftBoeing                  
## air$FlightDuration               ***
## air$TravelMonthJul                  
## air$TravelMonthOct                  
## air$TravelMonthSep                  
## air$IsInternationalInternational    
## air$PitchDifference              ** 
## air$SeatsTotal                      
## air$PercentPremiumSeats          *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.357 on 443 degrees of freedom
## Multiple R-squared:  0.3916, Adjusted R-squared:  0.3724 
## F-statistic: 20.37 on 14 and 443 DF,  p-value: < 2.2e-16
confint(model)
##                                          2.5 %        97.5 %
## (Intercept)                      -0.6131972478  0.2006197173
## air$AirlineBritish                0.1109811169  0.3960187832
## air$AirlineDelta                 -0.1830596880  0.4502772832
## air$AirlineJet                    0.2400326198  0.7946524604
## air$AirlineSingapore              0.1391596548  0.4220731791
## air$AirlineVirgin                 0.3554144754  0.6683273505
## air$AircraftBoeing               -0.0664731340  0.1076088802
## air$FlightDuration                0.0208154606  0.0467110565
## air$TravelMonthJul               -0.1222268861  0.0850439158
## air$TravelMonthOct               -0.0339454272  0.1421986158
## air$TravelMonthSep               -0.0984724448  0.0771063697
## air$IsInternationalInternational -0.8346998221  0.1940148311
## air$PitchDifference               0.0241510134  0.1729306533
## air$SeatsTotal                   -0.0006945588  0.0005274948
## air$PercentPremiumSeats          -0.0248126751 -0.0031863978

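The 95% confidence intervals for Aircraft, TravelMonth, IsInternational and SeatsTotal include zero (as does the interval for the Delta airline term), so these terms are not significant at the 5% level.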
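For an lm fit, confint() uses estimate +/- qt(0.975, residual df) * standard error, so a 95% interval includes zero exactly when the coefficient's two-sided p-value is above 0.05. A sketch (coef_ci and covers_zero are our own names) that rebuilds two intervals from the estimates and standard errors in summary(model), with 443 residual degrees of freedom, and checks the formula against confint() on simulated data:

```r
# Interval used by confint() for lm coefficients: estimate +/- t quantile * standard error.
coef_ci <- function(est, se, df, level = 0.95) {
  q <- qt(1 - (1 - level) / 2, df)
  c(lower = est - q * se, upper = est + q * se)
}
covers_zero <- function(ci) ci[["lower"]] <= 0 && ci[["upper"]] >= 0

# Estimates and standard errors copied from summary(model).
flight <- coef_ci(0.033763259, 6.588e-03, 443)   # FlightDuration
seats  <- coef_ci(-0.000083532, 3.109e-04, 443)  # SeatsTotal
print(rbind(FlightDuration = flight, SeatsTotal = seats))
print(c(FlightDuration = covers_zero(flight), SeatsTotal = covers_zero(seats)))

# Check the formula against confint() on a simulated regression.
set.seed(5)
d <- data.frame(x = rnorm(30))
d$y <- 1 + 2 * d$x + rnorm(30)
f <- lm(y ~ x, data = d)
s <- coef(summary(f))
matches_confint <- isTRUE(all.equal(
  unname(coef_ci(s["x", "Estimate"], s["x", "Std. Error"], df.residual(f))),
  unname(confint(f)["x", ])
))
print(matches_confint)
```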

coefficients(model)
##                      (Intercept)               air$AirlineBritish 
##                     -0.206288765                      0.253499950 
##                 air$AirlineDelta                   air$AirlineJet 
##                      0.133608798                      0.517342540 
##             air$AirlineSingapore                air$AirlineVirgin 
##                      0.280616417                      0.511870913 
##               air$AircraftBoeing               air$FlightDuration 
##                      0.020567873                      0.033763259 
##               air$TravelMonthJul               air$TravelMonthOct 
##                     -0.018591485                      0.054126594 
##               air$TravelMonthSep air$IsInternationalInternational 
##                     -0.010683038                     -0.320342496 
##              air$PitchDifference                   air$SeatsTotal 
##                      0.098540833                     -0.000083532 
##          air$PercentPremiumSeats 
##                     -0.013999536
Summary

The aim was to find out which factors determine the price difference (PriceRelative). In the fitted linear model, Aircraft, TravelMonth, IsInternational and SeatsTotal have 95% confidence intervals that include zero and p-values greater than 0.05, so these factors are excluded. Because WidthDifference and PitchDifference are strongly correlated (r = 0.76), I did not include both in the linear model; considered individually, Pearson tests show that each is significantly correlated with the price difference (r = 0.47 and r = 0.49). The statistically significant factors are therefore Airline (every airline except Delta differs significantly from the AirFrance baseline), FlightDuration, PitchDifference and PercentPremiumSeats (with a negative coefficient) from the regression, together with WidthDifference on the strength of its pairwise correlation.
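The factors above are dropped on the basis of their individual intervals. One way to check that they can be removed together is a partial F-test comparing nested models with anova(); this technique is an addition, not part of the original analysis. Because AirlinesData.csv is not bundled here, the sketch runs on R's built-in mtcars data, and the airline version in the comments is an untested assumption:

```r
# Demonstrated on the built-in mtcars data.
full    <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
reduced <- update(full, . ~ . - qsec - drat)   # drop the candidate terms together

# Partial F-test: does removing qsec and drat significantly worsen the fit?
cmp <- anova(reduced, full)
print(cmp)

# Assumed analogue for the airline model (untested here): refit with data = air,
# then drop the four non-significant factors together.
# full_air <- lm(PriceRelative ~ Airline + Aircraft + FlightDuration + TravelMonth +
#                  IsInternational + PitchDifference + SeatsTotal + PercentPremiumSeats,
#                data = air)
# reduced_air <- update(full_air, . ~ . - Aircraft - TravelMonth - IsInternational - SeatsTotal)
# anova(reduced_air, full_air)
```

A large p-value in the comparison row would support dropping the terms jointly; a small one would suggest that at least one of them still carries information once the others are removed.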