This R Markdown document explores which factors explain the relative price difference (PriceRelative) between premium and economy airline seats. The data are loaded and summarised first:
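# stringsAsFactors = TRUE keeps text columns as factors (the default changed in R 4.0)
air <- read.csv("AirlinesData.csv", stringsAsFactors = TRUE)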
library(psych)
## Warning: package 'psych' was built under R version 3.4.3
describe(air)
## vars n mean sd median trimmed mad min
## Airline* 1 458 3.01 1.65 2.00 2.89 1.48 1.00
## Aircraft* 2 458 1.67 0.47 2.00 1.71 0.00 1.00
## FlightDuration 3 458 7.58 3.54 7.79 7.57 4.81 1.25
## TravelMonth* 4 458 2.56 1.17 3.00 2.58 1.48 1.00
## IsInternational* 5 458 1.91 0.28 2.00 2.00 0.00 1.00
## SeatsEconomy 6 458 202.31 76.37 185.00 194.64 85.99 78.00
## SeatsPremium 7 458 33.65 13.26 36.00 33.35 11.86 8.00
## PitchEconomy 8 458 31.22 0.66 31.00 31.26 0.00 30.00
## PitchPremium 9 458 37.91 1.31 38.00 38.05 0.00 34.00
## WidthEconomy 10 458 17.84 0.56 18.00 17.81 0.00 17.00
## WidthPremium 11 458 19.47 1.10 19.00 19.53 0.00 17.00
## PriceEconomy 12 458 1327.08 988.27 1242.00 1244.40 1159.39 65.00
## PricePremium 13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative 14 458 0.49 0.45 0.36 0.42 0.41 0.02
## SeatsTotal 15 458 235.96 85.29 227.00 228.73 90.44 98.00
## PitchDifference 16 458 6.69 1.76 7.00 6.76 0.00 2.00
## WidthDifference 17 458 1.63 1.19 1.00 1.53 0.00 0.00
## PercentPremiumSeats 18 458 14.65 4.84 13.21 14.31 2.68 4.71
## max range skew kurtosis se
## Airline* 6.00 5.00 0.61 -0.95 0.08
## Aircraft* 2.00 1.00 -0.72 -1.48 0.02
## FlightDuration 14.66 13.41 -0.07 -1.12 0.17
## TravelMonth* 4.00 3.00 -0.14 -1.46 0.05
## IsInternational* 2.00 1.00 -2.91 6.50 0.01
## SeatsEconomy 389.00 311.00 0.72 -0.36 3.57
## SeatsPremium 66.00 58.00 0.23 -0.46 0.62
## PitchEconomy 33.00 3.00 -0.03 -0.35 0.03
## PitchPremium 40.00 6.00 -1.51 3.52 0.06
## WidthEconomy 19.00 2.00 -0.04 -0.08 0.03
## WidthPremium 21.00 4.00 -0.08 -0.31 0.05
## PriceEconomy 3593.00 3528.00 0.51 -0.88 46.18
## PricePremium 7414.00 7328.00 0.50 0.43 60.19
## PriceRelative 1.89 1.87 1.17 0.72 0.02
## SeatsTotal 441.00 343.00 0.70 -0.53 3.99
## PitchDifference 10.00 8.00 -0.54 1.78 0.08
## WidthDifference 4.00 4.00 0.84 -0.53 0.06
## PercentPremiumSeats 24.69 19.98 0.71 0.28 0.23
Next, inspect the structure of the data frame and a per-column summary:
str(air)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
summary(air)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
Now plot the dependent variable, the relative price difference (PriceRelative):
hist(air$PriceRelative,main="Price Difference",xlab="Price Difference")
library(car)
## Warning: package 'car' was built under R version 3.4.3
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(air$PriceRelative~air$FlightDuration,spread=FALSE,smoother.args=list(lty=2))
boxplot(air$PriceRelative~air$TravelMonth)
The relative price difference varies little across travel months.
boxplot(air$FlightDuration~air$Airline,col="purple",horizontal=TRUE,xlab="Flight Duration")
boxplot(air$PriceRelative~air$Aircraft,col="green",horizontal = TRUE)
boxplot(air$PriceRelative~air$Airline,horizontal=TRUE,col="red")
The boxplot suggests that the price difference varies by airline; a plot alone does not establish statistical dependence, so this is checked in the regression further down.
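A boxplot can only hint at a group effect. One common way to test it formally is a one-way ANOVA. The sketch below uses made-up data (the data frame `toy` and its values are hypothetical, only the column names mirror the real data set), so it illustrates the technique rather than the result for the airline data.

```r
# Hypothetical stand-in for the airline data: three airlines with
# different mean relative price differences (all values are made up).
set.seed(1)
toy <- data.frame(
  Airline = factor(rep(c("A", "B", "C"), each = 30)),
  PriceRelative = c(rnorm(30, mean = 0.2, sd = 0.1),
                    rnorm(30, mean = 0.5, sd = 0.1),
                    rnorm(30, mean = 0.8, sd = 0.1))
)

# One-way ANOVA: does the mean PriceRelative differ between airlines?
fit_aov <- aov(PriceRelative ~ Airline, data = toy)
aov_table <- summary(fit_aov)[[1]]
print(aov_table)

# p-value of the Airline term (first row of the ANOVA table)
p_airline <- aov_table[["Pr(>F)"]][1]
print(p_airline)
```

On the real data the same call would presumably be `aov(PriceRelative ~ Airline, data = air)`.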
scatterplot(air$PriceRelative~air$PercentPremiumSeats,spread=FALSE,smoother.args=list(lty=2))
boxplot(air$PriceRelative~air$WidthDifference,horizontal=TRUE,col="yellow",xlab="Price Difference",ylab="Width Difference")
As the width difference increases, the price difference tends to increase; however, a correlation test is the best way to check this trend.
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.4.3
corrgram(air,order=FALSE,lower.panel = panel.shade,upper.panel = panel.pie,text.panel = panel.txt,main="Corrgram")
round(cor(air[,6:18]),2)
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium
## SeatsEconomy 1.00 0.63 0.14 0.12
## SeatsPremium 0.63 1.00 -0.03 0.00
## PitchEconomy 0.14 -0.03 1.00 -0.55
## PitchPremium 0.12 0.00 -0.55 1.00
## WidthEconomy 0.37 0.46 0.29 -0.02
## WidthPremium 0.10 0.00 -0.54 0.75
## PriceEconomy 0.13 0.11 0.37 0.05
## PricePremium 0.18 0.22 0.23 0.09
## PriceRelative 0.00 -0.10 -0.42 0.42
## SeatsTotal 0.99 0.72 0.12 0.11
## PitchDifference 0.04 0.02 -0.78 0.95
## WidthDifference -0.08 -0.22 -0.64 0.70
## PercentPremiumSeats -0.33 0.49 -0.10 -0.18
## WidthEconomy WidthPremium PriceEconomy PricePremium
## SeatsEconomy 0.37 0.10 0.13 0.18
## SeatsPremium 0.46 0.00 0.11 0.22
## PitchEconomy 0.29 -0.54 0.37 0.23
## PitchPremium -0.02 0.75 0.05 0.09
## WidthEconomy 1.00 0.08 0.07 0.15
## WidthPremium 0.08 1.00 -0.06 0.06
## PriceEconomy 0.07 -0.06 1.00 0.90
## PricePremium 0.15 0.06 0.90 1.00
## PriceRelative -0.04 0.50 -0.29 0.03
## SeatsTotal 0.41 0.09 0.13 0.19
## PitchDifference -0.13 0.76 -0.10 -0.02
## WidthDifference -0.39 0.88 -0.08 -0.01
## PercentPremiumSeats 0.23 -0.18 0.07 0.12
## PriceRelative SeatsTotal PitchDifference
## SeatsEconomy 0.00 0.99 0.04
## SeatsPremium -0.10 0.72 0.02
## PitchEconomy -0.42 0.12 -0.78
## PitchPremium 0.42 0.11 0.95
## WidthEconomy -0.04 0.41 -0.13
## WidthPremium 0.50 0.09 0.76
## PriceEconomy -0.29 0.13 -0.10
## PricePremium 0.03 0.19 -0.02
## PriceRelative 1.00 -0.01 0.47
## SeatsTotal -0.01 1.00 0.03
## PitchDifference 0.47 0.03 1.00
## WidthDifference 0.49 -0.11 0.76
## PercentPremiumSeats -0.16 -0.22 -0.09
## WidthDifference PercentPremiumSeats
## SeatsEconomy -0.08 -0.33
## SeatsPremium -0.22 0.49
## PitchEconomy -0.64 -0.10
## PitchPremium 0.70 -0.18
## WidthEconomy -0.39 0.23
## WidthPremium 0.88 -0.18
## PriceEconomy -0.08 0.07
## PricePremium -0.01 0.12
## PriceRelative 0.49 -0.16
## SeatsTotal -0.11 -0.22
## PitchDifference 0.76 -0.09
## WidthDifference 1.00 -0.28
## PercentPremiumSeats -0.28 1.00
cor.test(air$PitchDifference,air$PitchPremium)
##
## Pearson's product-moment correlation
##
## data: air$PitchDifference and air$PitchPremium
## t = 65.387, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9409183 0.9587146
## sample estimates:
## cor
## 0.9505915
PitchDifference and PitchPremium are highly correlated (r = 0.95), so they are not independent factors.
fit1<-lm(air$PitchDifference~air$PitchEconomy+air$PitchPremium+air$WidthEconomy)
summary(fit1)
## Warning in summary.lm(fit1): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = air$PitchDifference ~ air$PitchEconomy + air$PitchPremium +
## air$WidthEconomy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.348e-13 3.240e-16 3.240e-16 3.240e-16 4.433e-14
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.940e-13 2.640e-14 -1.872e+01 <2e-16 ***
## air$PitchEconomy -1.000e+00 6.272e-16 -1.594e+15 <2e-16 ***
## air$PitchPremium 1.000e+00 2.990e-16 3.345e+15 <2e-16 ***
## air$WidthEconomy -3.045e-16 6.154e-16 -4.950e-01 0.621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.903e-15 on 454 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 9.921e+30 on 3 and 454 DF, p-value: < 2.2e-16
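The fit is essentially perfect, with coefficients of -1 for PitchEconomy and +1 for PitchPremium and an R-squared of 1. PitchDifference is therefore just PitchPremium minus PitchEconomy, so these are not independent variables.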
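A derived column like this can also be checked directly, without a regression. The sketch below uses a small hypothetical data frame (`toy`, with made-up values and a made-up response `y`); it also shows what `lm()` does when a redundant predictor is included alongside the columns it is computed from.

```r
# Hypothetical toy data; only the column names mirror the real data set.
toy <- data.frame(
  PitchEconomy = c(31L, 31L, 32L, 30L, 31L),
  PitchPremium = c(38L, 38L, 38L, 40L, 34L)
)
toy$PitchDifference <- toy$PitchPremium - toy$PitchEconomy

# Direct check: is the difference column an exact function of the other two?
is_derived <- all(toy$PitchDifference == toy$PitchPremium - toy$PitchEconomy)
print(is_derived)

# With all three predictors in one model the design matrix is rank-deficient,
# so lm() cannot estimate the redundant term and reports its coefficient as NA.
y <- c(0.4, 0.5, 0.3, 0.9, 0.1)   # made-up response values
fit <- lm(y ~ PitchEconomy + PitchPremium + PitchDifference, data = toy)
print(coef(fit))
```

This is one reason to keep only one of a set of linearly dependent columns in a regression.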
cor.test(air$WidthDifference,air$WidthPremium)
##
## Pearson's product-moment correlation
##
## data: air$WidthDifference and air$WidthPremium
## t = 40.411, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8623863 0.9026511
## sample estimates:
## cor
## 0.8841497
The same holds for width: WidthDifference is strongly correlated with WidthPremium (r = 0.88), as expected for a variable derived from WidthPremium and WidthEconomy.
boxplot(PriceRelative~IsInternational,data = air,col="blue",horizontal=TRUE)
International flights appear to have a higher price difference than domestic ones; this is still to be checked with a regression.
scatterplot(air$PercentPremiumSeats,air$PriceRelative,spread=FALSE,smoother.args=list(lty=2))
cor.test(air$PercentPremiumSeats,air$FlightDuration)
##
## Pearson's product-moment correlation
##
## data: air$PercentPremiumSeats and air$FlightDuration
## t = 1.2946, df = 456, p-value = 0.1961
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03128403 0.15130409
## sample estimates:
## cor
## 0.06051625
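Since p = 0.196 > 0.05, we cannot reject the null hypothesis of zero correlation between PercentPremiumSeats and FlightDuration. This does not prove the correlation is exactly zero; the 95% confidence interval (-0.03 to 0.15) shows it is small at most.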
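The estimate, p-value and confidence interval can be read off the object that `cor.test()` returns rather than from the printed output, which makes it easier to report effect size and significance together. A minimal sketch on simulated data (the vectors `x` and `y` are hypothetical and generated independently of each other):

```r
# Simulated, independent variables (not the airline data).
set.seed(42)
x <- rnorm(100)
y <- rnorm(100)

res <- cor.test(x, y)      # Pearson correlation test by default

r  <- unname(res$estimate) # sample correlation coefficient
p  <- res$p.value          # p-value for H0: true correlation = 0
ci <- res$conf.int         # 95% confidence interval for the correlation

cat(sprintf("r = %.3f, p = %.3f, 95%% CI [%.3f, %.3f]\n", r, p, ci[1], ci[2]))

# A confidence interval that contains 0 means the data are compatible with
# no linear association; it does not show that the correlation is exactly 0.
ci_contains_zero <- ci[1] < 0 && ci[2] > 0
print(ci_contains_zero)
```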
scatterplot(air$SeatsTotal,air$PriceRelative,spread = FALSE,smoother.args = list(lty=2))
cor.test(air$PitchDifference,air$WidthDifference)
##
## Pearson's product-moment correlation
##
## data: air$PitchDifference and air$WidthDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7194209 0.7969557
## sample estimates:
## cor
## 0.7608911
So PitchDifference and WidthDifference are highly correlated (r = 0.76), that is, they are not independent. Next, test how strongly each is correlated with the price difference.
cor.test(air$PriceRelative,air$PitchDifference)
##
## Pearson's product-moment correlation
##
## data: air$PriceRelative and air$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3940262 0.5372817
## sample estimates:
## cor
## 0.4687302
Since p < 2.2e-16, the correlation is statistically significant; at r = 0.47 it is moderate in strength.
cor.test(air$PriceRelative,air$WidthDifference)
##
## Pearson's product-moment correlation
##
## data: air$PriceRelative and air$WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4125388 0.5528218
## sample estimates:
## cor
## 0.4858024
Again the correlation is statistically significant and moderate in strength (r = 0.49).
model <- lm(air$PriceRelative~air$Airline+air$Aircraft+air$FlightDuration+air$TravelMonth+air$IsInternational+air$PitchDifference+air$SeatsTotal+air$PercentPremiumSeats)
summary(model)
##
## Call:
## lm(formula = air$PriceRelative ~ air$Airline + air$Aircraft +
## air$FlightDuration + air$TravelMonth + air$IsInternational +
## air$PitchDifference + air$SeatsTotal + air$PercentPremiumSeats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.86507 -0.20861 -0.05295 0.11137 1.49224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.063e-01 2.070e-01 -0.996 0.319621
## air$AirlineBritish 2.535e-01 7.252e-02 3.496 0.000520
## air$AirlineDelta 1.336e-01 1.611e-01 0.829 0.407430
## air$AirlineJet 5.173e-01 1.411e-01 3.666 0.000276
## air$AirlineSingapore 2.806e-01 7.198e-02 3.899 0.000112
## air$AirlineVirgin 5.119e-01 7.961e-02 6.430 3.31e-10
## air$AircraftBoeing 2.057e-02 4.429e-02 0.464 0.642582
## air$FlightDuration 3.376e-02 6.588e-03 5.125 4.45e-07
## air$TravelMonthJul -1.859e-02 5.273e-02 -0.353 0.724580
## air$TravelMonthOct 5.413e-02 4.481e-02 1.208 0.227754
## air$TravelMonthSep -1.068e-02 4.467e-02 -0.239 0.811092
## air$IsInternationalInternational -3.203e-01 2.617e-01 -1.224 0.221598
## air$PitchDifference 9.854e-02 3.785e-02 2.603 0.009541
## air$SeatsTotal -8.353e-05 3.109e-04 -0.269 0.788304
## air$PercentPremiumSeats -1.400e-02 5.502e-03 -2.544 0.011283
##
## (Intercept)
## air$AirlineBritish ***
## air$AirlineDelta
## air$AirlineJet ***
## air$AirlineSingapore ***
## air$AirlineVirgin ***
## air$AircraftBoeing
## air$FlightDuration ***
## air$TravelMonthJul
## air$TravelMonthOct
## air$TravelMonthSep
## air$IsInternationalInternational
## air$PitchDifference **
## air$SeatsTotal
## air$PercentPremiumSeats *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.357 on 443 degrees of freedom
## Multiple R-squared: 0.3916, Adjusted R-squared: 0.3724
## F-statistic: 20.37 on 14 and 443 DF, p-value: < 2.2e-16
confint(model)
## 2.5 % 97.5 %
## (Intercept) -0.6131972478 0.2006197173
## air$AirlineBritish 0.1109811169 0.3960187832
## air$AirlineDelta -0.1830596880 0.4502772832
## air$AirlineJet 0.2400326198 0.7946524604
## air$AirlineSingapore 0.1391596548 0.4220731791
## air$AirlineVirgin 0.3554144754 0.6683273505
## air$AircraftBoeing -0.0664731340 0.1076088802
## air$FlightDuration 0.0208154606 0.0467110565
## air$TravelMonthJul -0.1222268861 0.0850439158
## air$TravelMonthOct -0.0339454272 0.1421986158
## air$TravelMonthSep -0.0984724448 0.0771063697
## air$IsInternationalInternational -0.8346998221 0.1940148311
## air$PitchDifference 0.0241510134 0.1729306533
## air$SeatsTotal -0.0006945588 0.0005274948
## air$PercentPremiumSeats -0.0248126751 -0.0031863978
The confidence intervals for Aircraft, TravelMonth, IsInternational and SeatsTotal include zero, as does the interval for the Delta level of Airline.
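Rather than reading the table by eye, the terms whose intervals straddle zero can be picked out programmatically. The sketch below fits a small model to simulated data (the data frame `toy` and the variables `x1`, `x2`, `y` are hypothetical) and flags those coefficients.

```r
# Simulated data: y depends on x1 but not on x2 (not the airline data).
set.seed(7)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)
ci <- confint(fit)                 # matrix with columns "2.5 %" and "97.5 %"

# TRUE where the lower bound is negative and the upper bound is positive
includes_zero <- ci[, 1] < 0 & ci[, 2] > 0
print(cbind(ci, includes_zero))

# Names of the terms whose 95% interval contains zero
print(names(includes_zero)[includes_zero])
```

Applied to the model above, the same two lines would presumably be run on `confint(model)`.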
coefficients(model)
## (Intercept) air$AirlineBritish
## -0.206288765 0.253499950
## air$AirlineDelta air$AirlineJet
## 0.133608798 0.517342540
## air$AirlineSingapore air$AirlineVirgin
## 0.280616417 0.511870913
## air$AircraftBoeing air$FlightDuration
## 0.020567873 0.033763259
## air$TravelMonthJul air$TravelMonthOct
## -0.018591485 0.054126594
## air$TravelMonthSep air$IsInternationalInternational
## -0.010683038 -0.320342496
## air$PitchDifference air$SeatsTotal
## 0.098540833 -0.000083532
## air$PercentPremiumSeats
## -0.013999536
SUMMARY
We wanted to find out which factors determine the price difference. After fitting a linear model, we see that Aircraft, TravelMonth, IsInternational and SeatsTotal have confidence intervals that include zero and p-values greater than 0.05, so these factors can be excluded. Since WidthDifference and PitchDifference are highly correlated (r = 0.76), I did not include both in the linear model; taken on its own, however, the Pearson test shows that each is significantly correlated with the price difference (r = 0.49 and r = 0.47 respectively). So the statistically significant factors in the model are Airline, FlightDuration, PitchDifference and PercentPremiumSeats, with WidthDifference significant when tested separately.
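One way to check whether excluded factors can really be dropped is to refit the model without them and compare the two nested fits with an F-test. The sketch below does this on simulated data; the data frame `toy`, its effect sizes and the choice of dropped term are all hypothetical and only illustrate the procedure.

```r
# Simulated data in which SeatsTotal has no effect on PriceRelative.
set.seed(123)
n <- 150
toy <- data.frame(
  Airline = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  FlightDuration = runif(n, 1, 15),
  SeatsTotal = sample(100:440, n, replace = TRUE)
)
airline_effect <- unname(c(A = 0, B = 0.3, C = 0.5)[as.character(toy$Airline)])
toy$PriceRelative <- 0.1 + airline_effect + 0.03 * toy$FlightDuration +
  rnorm(n, sd = 0.2)

# Full model and a reduced model without the candidate term
full    <- lm(PriceRelative ~ Airline + FlightDuration + SeatsTotal, data = toy)
reduced <- lm(PriceRelative ~ Airline + FlightDuration, data = toy)

# F-test for nested models: a large p-value means the dropped term
# does not significantly improve the fit.
cmp <- anova(reduced, full)
print(cmp)
p_drop <- cmp[["Pr(>F)"]][2]
print(p_drop)
```

The same comparison also tests a multi-level factor such as Airline or TravelMonth as a whole, instead of level by level as in the coefficient table.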