Setting up the directory

setwd("C:/Users/alouk/Downloads")
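# stringsAsFactors = TRUE keeps text columns as factors, matching the str() output below on R >= 4.0
airlines <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)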

Summary of the dataset

str(airlines)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...

Number of international and domestic flights

attach(airlines)
plot(table(IsInternational))

The majority of flights are international (418 of 458).

addmargins(xtabs(~IsInternational+Airline))
##                Airline
## IsInternational AirFrance British Delta Jet Singapore Virgin Sum
##   Domestic              0       0    40   0         0      0  40
##   International        74     175     6  61        40     62 418
##   Sum                  74     175    46  61        40     62 458

Prices and Width

The relation between price and seat width

For Economy seats

boxplot(PriceEconomy~WidthEconomy,horizontal=TRUE,col=c("orange","pink","red"))

For premium seats

boxplot(PricePremium~WidthPremium,horizontal=TRUE,col=c("black","blue","yellow","pink","red"))

Correlation

Correlation between price and seat width for economy class

cor.test(PriceEconomy,WidthEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  PriceEconomy and WidthEconomy
## t = 1.4552, df = 456, p-value = 0.1463
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02378438  0.15862920
## sample estimates:
##        cor 
## 0.06799061

The p-value (0.1463) is greater than 0.05, so we fail to reject the null hypothesis of zero correlation: there is no significant evidence that price and seat width are correlated in economy class.

cor.test(PricePremium,WidthPremium)
## 
##  Pearson's product-moment correlation
## 
## data:  PricePremium and WidthPremium
## t = 1.3699, df = 456, p-value = 0.1714
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.02776966  0.15473916
## sample estimates:
##        cor 
## 0.06402004

Since the p-value (0.1714) is greater than 0.05, we again fail to reject the null hypothesis of zero correlation: there is no significant evidence that price and seat width are correlated in premium class.
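The decision rule used above can be sketched in code. This is a minimal illustration on simulated data, not the airlines dataset: the vectors `price` and `width` and the 0.05 threshold `alpha` are assumptions made for the example. It shows how to pull the estimate, p-value and confidence interval out of the object returned by `cor.test`.

```r
# Simulated stand-ins for price and seat width (NOT the airlines data).
set.seed(42)
n     <- 458
width <- sample(17:19, n, replace = TRUE)
price <- rnorm(n, mean = 1300, sd = 900)   # generated independently of width

res <- cor.test(price, width)              # Pearson correlation test

r     <- unname(res$estimate)              # sample correlation
p     <- res$p.value                       # two-sided p-value
ci    <- res$conf.int                      # 95% confidence interval for the correlation
alpha <- 0.05                              # assumed significance level

# H0: true correlation is 0. A large p-value means we fail to reject H0;
# it does not prove that the correlation is exactly zero.
decision <- if (p < alpha) "reject H0" else "fail to reject H0"
cat(sprintf("r = %.3f, p = %.3f, decision: %s\n", r, p, decision))
```

The same fields (`estimate`, `p.value`, `conf.int`) are available on the `cor.test` results shown in this section.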

1. Relation between pitch difference and width difference

Performing correlation test

cor.test(PitchDifference,WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  PitchDifference and WidthDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7194209 0.7969557
## sample estimates:
##       cor 
## 0.7608911

The p-value is very low (< 2.2e-16), so we reject the null hypothesis of zero correlation: pitch difference and width difference are strongly and positively correlated (r = 0.76).
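As a quick check, the t statistic reported by the Pearson test can be recovered from the correlation and the degrees of freedom with t = r * sqrt(df) / sqrt(1 - r^2). The sketch below uses only the numbers printed in the output above; the variable names are chosen for this example.

```r
# Values copied from the cor.test output for PitchDifference and WidthDifference.
r  <- 0.7608911   # sample correlation
df <- 456         # degrees of freedom (n - 2, with n = 458)

# t statistic of the Pearson correlation test.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_stat, 2))   # agrees with the reported t = 25.04
```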

2. Observing the corrplot of the airlines data frame

library(corrplot)
## corrplot 0.84 loaded
corrplot(corr=cor(airlines[,c(3,6:18)]))
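The index vector `c(3,6:18)` picks out the numeric columns by position. A possible alternative, sketched here on a small made-up data frame (the values in `df` are invented for illustration and are not from the airlines data), is to select the numeric columns by type, so the selection does not depend on column order.

```r
# Tiny made-up data frame with one factor column and three numeric columns.
df <- data.frame(
  Airline        = c("A", "B", "C", "D"),
  FlightDuration = c(2.5, 8.1, 12.2, 4.0),
  SeatsEconomy   = c(122L, 180L, 200L, 140L),
  PriceEconomy   = c(300L, 1500L, 2700L, 650L),
  stringsAsFactors = TRUE
)

num_cols <- sapply(df, is.numeric)   # TRUE for numeric/integer columns only
cor_mat  <- cor(df[, num_cols])      # correlation matrix of the numeric columns
print(round(cor_mat, 2))
```

The resulting matrix could then be passed to `corrplot(corr = cor_mat)` in the same way as above.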

3. Correlation test between price and flight duration

For economy class

# For Economy
cor.test(FlightDuration,PriceEconomy)
## 
##  Pearson's product-moment correlation
## 
## data:  FlightDuration and PriceEconomy
## t = 14.685, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5010266 0.6257772
## sample estimates:
##       cor 
## 0.5666404

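The small p-value shows that the correlation between PriceEconomy and FlightDuration is statistically significant. The strength of the relationship is given by the coefficient itself: r = 0.57, a moderate positive correlation.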

For Premium class

# For Premium

cor.test(FlightDuration,PricePremium)
## 
##  Pearson's product-moment correlation
## 
## data:  FlightDuration and PricePremium
## t = 18.204, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5923218 0.6988270
## sample estimates:
##       cor 
## 0.6487398

Here, too, the p-value is small, so the correlation between PricePremium and FlightDuration is statistically significant. With r = 0.65 it is somewhat stronger than in economy class.

Now, plotting them

par(mfrow=c(2,2))
plot(FlightDuration~PriceEconomy,col="blue",main="Without Log scale")

plot(FlightDuration~PriceEconomy,col="blue",log="xy",main="With log scale")

plot(FlightDuration~PricePremium,col="red")

plot(FlightDuration~PricePremium,col="red",log="xy")

par(mfrow=c(1,1))

4. Regression between PriceEconomy and FlightDuration

summary(lm(PriceEconomy~FlightDuration))
## 
## Call:
## lm(formula = PriceEconomy ~ FlightDuration)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1731.5  -493.2  -156.9   470.9  1863.3 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      129.03      90.04   1.433    0.153    
## FlightDuration   158.10      10.77  14.685   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 815.2 on 456 degrees of freedom
## Multiple R-squared:  0.3211, Adjusted R-squared:  0.3196 
## F-statistic: 215.7 on 1 and 456 DF,  p-value: < 2.2e-16

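The p-value for the FlightDuration coefficient is very small, so flight duration is a significant predictor of PriceEconomy. The expected change in PriceEconomy for a 1 hr increase in flight duration is $158.10. The model explains about 32% of the variance in PriceEconomy (R-squared = 0.3211), and the intercept is not significantly different from zero (p = 0.153).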
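To make the slope concrete, the fitted line can be evaluated by hand. The sketch below hard-codes the two coefficients from the summary above; the helper `predict_price_economy` and the 8-hour example flight are hypothetical and introduced only for illustration.

```r
# Coefficients copied from the regression summary above.
intercept <- 129.03
slope     <- 158.10   # change in PriceEconomy per additional hour of flight

# Hypothetical helper: point prediction from the fitted line.
predict_price_economy <- function(hours) {
  intercept + slope * hours
}

price_8h <- predict_price_economy(8)   # 129.03 + 158.10 * 8
print(price_8h)                        # 1393.83
```

With a residual standard error of 815.2, individual prices can differ substantially from such a point prediction.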

5. Regression between PricePremium and FlightDuration

summary(lm(PricePremium~FlightDuration))
## 
## Call:
## lm(formula = PricePremium ~ FlightDuration)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2292.5  -664.7  -103.8   803.0  4093.7 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       57.45     108.39    0.53    0.596    
## FlightDuration   235.93      12.96   18.20   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 981.4 on 456 degrees of freedom
## Multiple R-squared:  0.4209, Adjusted R-squared:  0.4196 
## F-statistic: 331.4 on 1 and 456 DF,  p-value: < 2.2e-16

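The p-value for the FlightDuration coefficient is very small, so flight duration is a significant predictor of PricePremium. The expected change in PricePremium for a 1 hr increase in flight duration is $235.93. The model explains about 42% of the variance in PricePremium (R-squared = 0.4209).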
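For a regression with a single predictor, the multiple R-squared equals the square of the Pearson correlation between the two variables. The check below uses only the correlations and R-squared values printed earlier in this report; the variable names are chosen for this example.

```r
# Correlations with FlightDuration, copied from the cor.test outputs in section 3.
r_economy <- 0.5666404
r_premium <- 0.6487398

# Squared correlations, rounded to the precision shown by summary(lm(...)).
r2_economy <- round(r_economy^2, 4)
r2_premium <- round(r_premium^2, 4)

print(c(economy = r2_economy, premium = r2_premium))
# Agrees with the reported Multiple R-squared values: 0.3211 and 0.4209
```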