setwd("C:/Users/alouk/Downloads")
airlines <- read.csv("SixAirlinesDataV2.csv")
str(airlines)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
attach(airlines)
plot(table(IsInternational))
The majority of the flights are international.
addmargins(xtabs(~IsInternational+Airline))
## Airline
## IsInternational AirFrance British Delta Jet Singapore Virgin Sum
## Domestic 0 0 40 0 0 0 40
## International 74 175 6 61 40 62 418
## Sum 74 175 46 61 40 62 458
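To put numbers on the split, the counts printed above can be turned into percentages. This is a minimal sketch that re-enters the cross-tabulation by hand, so it runs without the CSV; the object names `counts` and `share` are illustrative and not part of the original analysis.

```r
# Counts copied from the addmargins(xtabs(...)) output above (margins omitted)
counts <- matrix(
  c( 0,   0, 40,  0,  0,  0,
    74, 175,  6, 61, 40, 62),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    IsInternational = c("Domestic", "International"),
    Airline = c("AirFrance", "British", "Delta", "Jet", "Singapore", "Virgin")
  )
)

# Share of domestic vs. international flights across all airlines, in percent
share <- 100 * prop.table(rowSums(counts))
round(share, 1)

# Per-airline split: each column sums to 100
round(100 * prop.table(counts, margin = 2), 1)
```

With these counts, about 91.3% of the flights are international, and Delta is the only airline with domestic flights.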
Relationship between price and seat width
For economy seats
boxplot(PriceEconomy~WidthEconomy,horizontal=TRUE,col=c("orange","pink","red"))
For premium seats
boxplot(PricePremium~WidthPremium,horizontal=TRUE,col=c("black","blue","yellow","pink","red"))
Correlation between price and seat width for economy class
cor.test(PriceEconomy,WidthEconomy)
##
## Pearson's product-moment correlation
##
## data: PriceEconomy and WidthEconomy
## t = 1.4552, df = 456, p-value = 0.1463
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02378438 0.15862920
## sample estimates:
## cor
## 0.06799061
The p-value (0.1463) is greater than 0.05, so we fail to reject the null hypothesis of zero correlation: there is no significant evidence that price and seat width are correlated in economy class.
cor.test(PricePremium,WidthPremium)
##
## Pearson's product-moment correlation
##
## data: PricePremium and WidthPremium
## t = 1.3699, df = 456, p-value = 0.1714
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02776966 0.15473916
## sample estimates:
## cor
## 0.06402004
Since the p-value (0.1714) is greater than 0.05, we again fail to reject the null hypothesis of zero correlation: there is no significant evidence that price and seat width are correlated in premium class.
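The t statistics and p-values in the two seat-width tests follow directly from the sample correlation and the degrees of freedom, via t = r * sqrt(df) / sqrt(1 - r^2). The sketch below recomputes them from the printed values only; the helper name `cor_t_test` is hypothetical and not part of base R.

```r
# Recompute the Pearson test statistic and two-sided p-value from r and df.
# cor_t_test is an illustrative helper, not a base R function.
cor_t_test <- function(r, df) {
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value from the t distribution
  c(t = t_stat, p = p_val)
}

econ    <- cor_t_test(r = 0.06799061, df = 456)  # PriceEconomy vs WidthEconomy
premium <- cor_t_test(r = 0.06402004, df = 456)  # PricePremium vs WidthPremium

round(rbind(econ, premium), 4)

# Decision at the 5% level: TRUE means "fail to reject zero correlation"
c(econ = econ[["p"]] > 0.05, premium = premium[["p"]] > 0.05)
```

Both p-values are above 0.05, so neither test gives evidence against a zero correlation.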
Correlation test between pitch difference and width difference
cor.test(PitchDifference,WidthDifference)
##
## Pearson's product-moment correlation
##
## data: PitchDifference and WidthDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7194209 0.7969557
## sample estimates:
## cor
## 0.7608911
The p-value is very low (< 2.2e-16), so there is a statistically significant positive correlation (r ≈ 0.76) between the pitch difference and the width difference.
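The 95 percent confidence interval that `cor.test` reports for a Pearson correlation is based on Fisher's z transformation. As a sketch, the interval for the pitch/width difference can be reproduced from the printed estimate alone, taking n = df + 2 = 458; the variable names are illustrative.

```r
r <- 0.7608911      # sample correlation from the output above
n <- 456 + 2        # sample size, recovered from df = n - 2

z  <- atanh(r)              # Fisher z transform of r
se <- 1 / sqrt(n - 3)       # approximate standard error of z

# Interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
round(ci, 4)
```

The result should agree with the interval printed by `cor.test` (roughly 0.72 to 0.80). The interval excludes zero, which is consistent with the very small p-value.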
library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(airlines[,c(3,6:18)]))
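The column indices `c(3,6:18)` pick out exactly the numeric columns listed in the `str()` output. A slightly more robust alternative, sketched here on a small made-up data frame (the values in `toy` are invented for illustration and are not taken from the data set), is to select columns by type so the call keeps working if the column order changes.

```r
# Toy data frame with made-up values; only the structure mirrors airlines
toy <- data.frame(
  Airline        = factor(c("British", "Delta", "Jet", "Virgin", "Delta")),
  FlightDuration = c(12.25, 8.16, 4.50, 9.00, 2.75),
  PriceEconomy   = c(2707, 1793, 600, 1500, 300),
  PricePremium   = c(3725, 2999, 900, 2100, 450)
)

# Logical vector marking the numeric (num or int) columns
num_cols <- vapply(toy, is.numeric, logical(1))

# Correlation matrix over the numeric columns only
cor_mat <- cor(toy[, num_cols])
round(cor_mat, 2)

# The matrix could then be passed on as corrplot(corr = cor_mat)
```

Applied to `airlines`, the same selection would return columns 3 and 6 to 18, since columns 1, 2, 4 and 5 are factors.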
Correlation between flight duration and price, for economy class
# For Economy
cor.test(FlightDuration,PriceEconomy)
##
## Pearson's product-moment correlation
##
## data: FlightDuration and PriceEconomy
## t = 14.685, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5010266 0.6257772
## sample estimates:
## cor
## 0.5666404
Here the p-value is very small, and there is a moderately strong positive correlation (r ≈ 0.57) between PriceEconomy and FlightDuration.
For Premium class
#For Premium
cor.test(FlightDuration,PricePremium)
##
## Pearson's product-moment correlation
##
## data: FlightDuration and PricePremium
## t = 18.204, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5923218 0.6988270
## sample estimates:
## cor
## 0.6487398
Here too, the p-value is very small, and there is a fairly strong positive correlation (r ≈ 0.65) between PricePremium and FlightDuration.
Now, plotting them
par(mfrow=c(2,2))
plot(FlightDuration~PriceEconomy,col="blue",main="Without Log scale")
plot(FlightDuration~PriceEconomy,col="blue",log="xy",main="With log scale")
plot(FlightDuration~PricePremium,col="red",main="Without Log scale")
plot(FlightDuration~PricePremium,col="red",log="xy",main="With log scale")
par(mfrow=c(1,1))
summary(lm(PriceEconomy~FlightDuration))
##
## Call:
## lm(formula = PriceEconomy ~ FlightDuration)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1731.5 -493.2 -156.9 470.9 1863.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 129.03 90.04 1.433 0.153
## FlightDuration 158.10 10.77 14.685 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 815.2 on 456 degrees of freedom
## Multiple R-squared: 0.3211, Adjusted R-squared: 0.3196
## F-statistic: 215.7 on 1 and 456 DF, p-value: < 2.2e-16
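The p-value for the FlightDuration coefficient is very small, so flight duration is a significant predictor of PriceEconomy, although the model explains only about 32% of the variance (R-squared = 0.3211). Each additional hour of flight duration is associated with an expected increase of $158.10 in PriceEconomy.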
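As a worked example of reading the fitted line, the sketch below plugs the printed coefficients into the regression equation and checks that the R-squared of this one-predictor model equals the squared Pearson correlation found earlier. The function name `predict_price` and the 8-hour input are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from the summary(lm(...)) output above
intercept <- 129.03
slope     <- 158.10

# Fitted line: expected economy price for a given flight duration
predict_price <- function(duration) intercept + slope * duration

predict_price(8)                      # hypothetical duration of 8
predict_price(9) - predict_price(8)   # difference equals the slope

# In simple linear regression, R-squared is the squared correlation
r <- 0.5666404    # cor(FlightDuration, PriceEconomy) from cor.test
r_squared <- r^2
round(r_squared, 4)
```

The residual standard error of 815.2 indicates that individual prices can differ from the fitted line by a large margin, so such predictions are rough estimates only.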