An introduction to the data:
premium <- read.csv("abcd.csv", stringsAsFactors = TRUE)  # since R 4.0, stringsAsFactors = TRUE is needed to get the factor columns shown below
str(premium)
## 'data.frame': 458 obs. of 18 variables:
## $ Airline : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Aircraft : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
## $ FlightDuration : num 12.25 12.25 12.25 12.25 8.16 ...
## $ TravelMonth : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
## $ IsInternational : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
## $ SeatsEconomy : int 122 122 122 122 122 122 122 122 122 122 ...
## $ SeatsPremium : int 40 40 40 40 40 40 40 40 40 40 ...
## $ PitchEconomy : int 31 31 31 31 31 31 31 31 31 31 ...
## $ PitchPremium : int 38 38 38 38 38 38 38 38 38 38 ...
## $ WidthEconomy : int 18 18 18 18 18 18 18 18 18 18 ...
## $ WidthPremium : int 19 19 19 19 19 19 19 19 19 19 ...
## $ PriceEconomy : int 2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
## $ PricePremium : int 3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
## $ PriceRelative : num 0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
## $ SeatsTotal : int 162 162 162 162 162 162 162 162 162 162 ...
## $ PitchDifference : int 7 7 7 7 7 7 7 7 7 7 ...
## $ WidthDifference : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PercentPremiumSeats: num 24.7 24.7 24.7 24.7 24.7 ...
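Several columns appear to be derived from the others. The values in the `str()` output are consistent with the formulas below. These definitions are our inference from the numbers, not something documented with the data, so treat them as an assumption. The sketch rebuilds rows 1, 5, 8 and 10 from the values printed above and recomputes the derived fields.

```r
# Rows 1, 5, 8 and 10 of the data, with values copied from the str() output above.
# The formulas below are inferred from these values; they are not documented in the data.
rows <- data.frame(
  SeatsEconomy = 122, SeatsPremium = 40,
  PitchEconomy = 31,  PitchPremium = 38,
  WidthEconomy = 18,  WidthPremium = 19,
  PriceEconomy = c(2707, 1793, 1476, 1705),
  PricePremium = c(3725, 2999, 2997, 2989)
)

rows$SeatsTotal          <- rows$SeatsEconomy + rows$SeatsPremium
rows$PitchDifference     <- rows$PitchPremium - rows$PitchEconomy
rows$WidthDifference     <- rows$WidthPremium - rows$WidthEconomy
# Share of all seats that are Premium Economy, in percent
rows$PercentPremiumSeats <- round(100 * rows$SeatsPremium / rows$SeatsTotal, 2)
# Premium Economy markup as a fraction of the Economy price
rows$PriceRelative       <- round((rows$PricePremium - rows$PriceEconomy) / rows$PriceEconomy, 2)

print(rows[, c("SeatsTotal", "PitchDifference", "WidthDifference",
               "PercentPremiumSeats", "PriceRelative")])
```

The recomputed values (162, 7, 1, 24.69 and 0.38, 0.67, 1.03, 0.75) match the `str()` output. With the full data loaded, `with(premium, all.equal(PriceRelative, round((PricePremium - PriceEconomy) / PriceEconomy, 2)))` should return TRUE if the inferred definition holds for every row.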
A summary of various fields:
premium$PriceDifference <- premium$PricePremium - premium$PriceEconomy
summary(premium)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats PriceDifference
## Min. :0.000 Min. : 4.71 Min. : 15.0
## 1st Qu.:1.000 1st Qu.:12.28 1st Qu.: 110.0
## Median :1.000 Median :13.21 Median : 275.5
## Mean :1.633 Mean :14.65 Mean : 518.2
## 3rd Qu.:3.000 3rd Qu.:15.36 3rd Qu.: 760.5
## Max. :4.000 Max. :24.69 Max. :4312.0
Descriptive statistics for the numeric fields:
library(psych)  # provides describe() and corr.test()
describe(premium[, c(3, 6:19)], fast = TRUE)
## vars n mean sd min max range se
## FlightDuration 1 458 7.58 3.54 1.25 14.66 13.41 0.17
## SeatsEconomy 2 458 202.31 76.37 78.00 389.00 311.00 3.57
## SeatsPremium 3 458 33.65 13.26 8.00 66.00 58.00 0.62
## PitchEconomy 4 458 31.22 0.66 30.00 33.00 3.00 0.03
## PitchPremium 5 458 37.91 1.31 34.00 40.00 6.00 0.06
## WidthEconomy 6 458 17.84 0.56 17.00 19.00 2.00 0.03
## WidthPremium 7 458 19.47 1.10 17.00 21.00 4.00 0.05
## PriceEconomy 8 458 1327.08 988.27 65.00 3593.00 3528.00 46.18
## PricePremium 9 458 1845.26 1288.14 86.00 7414.00 7328.00 60.19
## PriceRelative 10 458 0.49 0.45 0.02 1.89 1.87 0.02
## SeatsTotal 11 458 235.96 85.29 98.00 441.00 343.00 3.99
## PitchDifference 12 458 6.69 1.76 2.00 10.00 8.00 0.08
## WidthDifference 13 458 1.63 1.19 0.00 4.00 4.00 0.06
## PercentPremiumSeats 14 458 14.65 4.84 4.71 24.69 19.98 0.23
## PriceDifference 15 458 518.18 583.94 15.00 4312.00 4297.00 27.29
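In the table above, `range` is simply `max - min` (e.g. 3593 - 65 = 3528 for PriceEconomy) and `se` is the standard error of the mean, `sd / sqrt(n)`. A quick check using a few of the rounded `sd` values printed above:

```r
# sd and se values copied from the describe() output above (n = 458 for every field)
n <- 458
sd_vals     <- c(FlightDuration = 3.54, PriceEconomy = 988.27,
                 PricePremium = 1288.14, PriceDifference = 583.94)
se_reported <- c(0.17, 46.18, 60.19, 27.29)

# Standard error of the mean: sd / sqrt(n)
se_computed <- round(sd_vals / sqrt(n), 2)
print(se_computed)
```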
Airline-wise distribution of the number of Economy seats:
par(cex.axis = 0.8, las = 2, cex.lab = 0.5)
boxplot(SeatsEconomy ~ Airline, data = premium, horizontal = TRUE, col = rainbow(6), las = 2)
Airline-wise distribution of the number of Premium Economy seats:
par(cex.axis = 0.8, las = 2, cex.lab = 0.5)
boxplot(SeatsPremium ~ Airline, data = premium, horizontal = TRUE, col = rainbow(6), las = 2)
Airline-wise variation in the relative price of Premium Economy compared with Economy:
par(cex.axis = 0.8, las = 2, cex.lab = 0.5)
boxplot(PriceRelative ~ Airline, data = premium, horizontal = TRUE, col = rainbow(6), las = 2)
Airline-wise variation in the share of all seats (in percent) that are Premium Economy:
par(cex.axis = 0.8, las = 2, cex.lab = 0.5)
boxplot(PercentPremiumSeats ~ Airline, data = premium, horizontal = TRUE, col = rainbow(6), las = 2)
Hypothesis:
The number of premium seats depends on the type of flight (domestic or international).
boxplot(SeatsPremium ~ IsInternational, data = premium, horizontal = TRUE, col = rainbow(2))
t.test(SeatsPremium ~ IsInternational, data = premium)
##
## Welch Two Sample t-test
##
## data: SeatsPremium by IsInternational
## t = -17.656, df = 199.99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.32020 -13.04104
## sample estimates:
## mean in group Domestic mean in group International
## 20.25000 34.93062
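The p-value is below 2.2e-16, so we reject the null hypothesis of equal means: the number of premium seats differs significantly between domestic (mean about 20) and international (mean about 35) flights, which supports the hypothesis above.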
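The Welch test used by `t.test()` does not assume equal variances in the two groups. As a sketch of what it computes, the t statistic and the Welch-Satterthwaite degrees of freedom can be reproduced by hand. The built-in `mtcars` data (mpg by transmission type) stand in for the airline data here, since `abcd.csv` is not available in a fresh R session.

```r
# Welch two-sample t-test computed by hand and compared with t.test().
# mtcars (mpg by transmission, am = 0/1) is only a stand-in for the airline data.
groups <- split(mtcars$mpg, factor(mtcars$am, labels = c("automatic", "manual")))
x <- groups$automatic
y <- groups$manual

se2_x <- var(x) / length(x)   # squared standard error of mean(x)
se2_y <- var(y) / length(y)   # squared standard error of mean(y)

# Welch t statistic: difference in means over the unpooled standard error
t_manual <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# t.test() uses Welch's version by default (var.equal = FALSE)
tt <- t.test(mpg ~ am, data = mtcars)
print(c(by_hand = t_manual, t_test = unname(tt$statistic)))
print(c(by_hand = df_manual, t_test = unname(tt$parameter)))
```

As in the airline output, the sign of t follows the order of the factor levels: the first level's mean minus the second's.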
Hypothesis:
The pitch difference between Premium Economy and Economy seats depends on the type of flight (domestic or international).
t.test(PitchDifference ~ IsInternational, data = premium)
##
## Welch Two Sample t-test
##
## data: PitchDifference by IsInternational
## t = -47.917, df = 92.439, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.892803 -4.503369
## sample estimates:
## mean in group Domestic mean in group International
## 2.400000 7.098086
We again reject the null hypothesis of equal means (p < 2.2e-16): the mean pitch difference is about 2.4 on domestic flights and about 7.1 on international flights.
Hypothesis:
The width difference between Premium Economy and Economy seats depends on the type of flight (domestic or international).
t.test(WidthDifference ~ IsInternational, data = premium)
##
## Welch Two Sample t-test
##
## data: WidthDifference by IsInternational
## t = -32.468, df = 417, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.897811 -1.681137
## sample estimates:
## mean in group Domestic mean in group International
## 0.000000 1.789474
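We reject the null hypothesis of equal means here as well (p < 2.2e-16). Note that every domestic flight has a width difference of 0 (group mean 0, minimum 0), so the domestic group has zero variance and the test effectively compares the international mean (about 1.79) with 0.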
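The df = 417 in the width test follows from the Welch-Satterthwaite formula. Writing D for the 40 domestic and I for the 418 international flights, a zero domestic variance collapses the formula to n_I - 1:

$$
\nu = \frac{\left(\dfrac{s_D^2}{n_D} + \dfrac{s_I^2}{n_I}\right)^2}{\dfrac{(s_D^2/n_D)^2}{n_D - 1} + \dfrac{(s_I^2/n_I)^2}{n_I - 1}}
\;\overset{s_D^2 = 0}{=}\;
\frac{(s_I^2/n_I)^2}{(s_I^2/n_I)^2/(n_I - 1)} = n_I - 1 = 418 - 1 = 417
$$

$$
t = \frac{\bar{x}_D - \bar{x}_I}{\sqrt{s_I^2/n_I}} = \frac{0 - \bar{x}_I}{s_I/\sqrt{n_I}}
$$

The second line is exactly the one-sample t statistic for testing the international mean width difference against 0.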
Airline-wise mean price, width and pitch differences:
attach(premium)
## The following objects are masked from premium (pos = 3):
##
## Aircraft, Airline, FlightDuration, IsInternational,
## PercentPremiumSeats, PitchDifference, PitchEconomy,
## PitchPremium, PriceDifference, PriceEconomy, PricePremium,
## PriceRelative, SeatsEconomy, SeatsPremium, SeatsTotal,
## TravelMonth, WidthDifference, WidthEconomy, WidthPremium
# Mean price, width and pitch difference per airline in a single aggregate() call
d3 <- aggregate(cbind(PriceDifference, WidthDifference, PitchDifference) ~ Airline,
                data = premium, FUN = mean)
colnames(d3) <- c("Airline", "Mean Price Difference", "Mean Width Difference", "Mean Pitch Difference")
d3
## Airline Mean Price Difference Mean Width Difference
## 1 AirFrance 295.4324 1.4324324
## 2 British 643.5486 1.0000000
## 3 Delta 123.7391 0.3913043
## 4 Jet 207.1967 3.6557377
## 5 Singapore 379.6750 1.0000000
## 6 Virgin 1118.1613 3.0000000
## Mean Pitch Difference
## 1 6.000000
## 2 7.000000
## 3 3.000000
## 4 9.540984
## 5 6.000000
## 6 7.000000
Correlations between the numeric fields, first as a table from corr.test() and then as a correlation plot, which is much easier to read than the printed matrix:
library(corrgram)
library(corrplot)
## corrplot 0.84 loaded
M <- cor(premium[,c(3,6,7,8,9,10,11,12,13,14,15,17,18,16,19)] , use = "everything")
corr.test(premium[,c(3,6,7,8,9,10,11,12,13,14,15,17,18,16,19)] , use = "everything")
## Call:corr.test(x = premium[, c(3, 6, 7, 8, 9, 10, 11, 12, 13, 14,
## 15, 17, 18, 16, 19)], use = "everything")
## Correlation matrix
## FlightDuration SeatsEconomy SeatsPremium PitchEconomy
## FlightDuration 1.00 0.20 0.16 0.29
## SeatsEconomy 0.20 1.00 0.63 0.14
## SeatsPremium 0.16 0.63 1.00 -0.03
## PitchEconomy 0.29 0.14 -0.03 1.00
## PitchPremium 0.10 0.12 0.00 -0.55
## WidthEconomy 0.46 0.37 0.46 0.29
## WidthPremium 0.10 0.10 0.00 -0.54
## PriceEconomy 0.57 0.13 0.11 0.37
## PricePremium 0.65 0.18 0.22 0.23
## PriceRelative 0.12 0.00 -0.10 -0.42
## SeatsTotal 0.20 0.99 0.72 0.12
## WidthDifference -0.12 -0.08 -0.22 -0.64
## PercentPremiumSeats 0.06 -0.33 0.49 -0.10
## PitchDifference -0.04 0.04 0.02 -0.78
## PriceDifference 0.47 0.17 0.29 -0.13
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## FlightDuration 0.10 0.46 0.10 0.57
## SeatsEconomy 0.12 0.37 0.10 0.13
## SeatsPremium 0.00 0.46 0.00 0.11
## PitchEconomy -0.55 0.29 -0.54 0.37
## PitchPremium 1.00 -0.02 0.75 0.05
## WidthEconomy -0.02 1.00 0.08 0.07
## WidthPremium 0.75 0.08 1.00 -0.06
## PriceEconomy 0.05 0.07 -0.06 1.00
## PricePremium 0.09 0.15 0.06 0.90
## PriceRelative 0.42 -0.04 0.50 -0.29
## SeatsTotal 0.11 0.41 0.09 0.13
## WidthDifference 0.70 -0.39 0.88 -0.08
## PercentPremiumSeats -0.18 0.23 -0.18 0.07
## PitchDifference 0.95 -0.13 0.76 -0.10
## PriceDifference 0.11 0.22 0.24 0.30
## PricePremium PriceRelative SeatsTotal WidthDifference
## FlightDuration 0.65 0.12 0.20 -0.12
## SeatsEconomy 0.18 0.00 0.99 -0.08
## SeatsPremium 0.22 -0.10 0.72 -0.22
## PitchEconomy 0.23 -0.42 0.12 -0.64
## PitchPremium 0.09 0.42 0.11 0.70
## WidthEconomy 0.15 -0.04 0.41 -0.39
## WidthPremium 0.06 0.50 0.09 0.88
## PriceEconomy 0.90 -0.29 0.13 -0.08
## PricePremium 1.00 0.03 0.19 -0.01
## PriceRelative 0.03 1.00 -0.01 0.49
## SeatsTotal 0.19 -0.01 1.00 -0.11
## WidthDifference -0.01 0.49 -0.11 1.00
## PercentPremiumSeats 0.12 -0.16 -0.22 -0.28
## PitchDifference -0.02 0.47 0.03 0.76
## PriceDifference 0.68 0.56 0.20 0.12
## PercentPremiumSeats PitchDifference PriceDifference
## FlightDuration 0.06 -0.04 0.47
## SeatsEconomy -0.33 0.04 0.17
## SeatsPremium 0.49 0.02 0.29
## PitchEconomy -0.10 -0.78 -0.13
## PitchPremium -0.18 0.95 0.11
## WidthEconomy 0.23 -0.13 0.22
## WidthPremium -0.18 0.76 0.24
## PriceEconomy 0.07 -0.10 0.30
## PricePremium 0.12 -0.02 0.68
## PriceRelative -0.16 0.47 0.56
## SeatsTotal -0.22 0.03 0.20
## WidthDifference -0.28 0.76 0.12
## PercentPremiumSeats 1.00 -0.09 0.15
## PitchDifference -0.09 1.00 0.13
## PriceDifference 0.15 0.13 1.00
## Sample Size
## [1] 458
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## FlightDuration SeatsEconomy SeatsPremium PitchEconomy
## FlightDuration 0.00 0.00 0.03 0.00
## SeatsEconomy 0.00 0.00 0.00 0.10
## SeatsPremium 0.00 0.00 0.00 1.00
## PitchEconomy 0.00 0.00 0.47 0.00
## PitchPremium 0.04 0.01 0.92 0.00
## WidthEconomy 0.00 0.00 0.00 0.00
## WidthPremium 0.03 0.03 0.95 0.00
## PriceEconomy 0.00 0.01 0.01 0.00
## PricePremium 0.00 0.00 0.00 0.00
## PriceRelative 0.01 0.93 0.04 0.00
## SeatsTotal 0.00 0.00 0.00 0.01
## WidthDifference 0.01 0.08 0.00 0.00
## PercentPremiumSeats 0.20 0.00 0.00 0.03
## PitchDifference 0.42 0.45 0.73 0.00
## PriceDifference 0.00 0.00 0.00 0.01
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## FlightDuration 1.00 0.00 0.86 0.00
## SeatsEconomy 0.43 0.00 0.86 0.27
## SeatsPremium 1.00 0.00 1.00 0.54
## PitchEconomy 0.00 0.00 0.00 0.00
## PitchPremium 0.00 1.00 0.00 1.00
## WidthEconomy 0.61 0.00 1.00 1.00
## WidthPremium 0.00 0.08 0.00 1.00
## PriceEconomy 0.28 0.15 0.22 0.00
## PricePremium 0.06 0.00 0.17 0.00
## PriceRelative 0.00 0.35 0.00 0.00
## SeatsTotal 0.02 0.00 0.05 0.00
## WidthDifference 0.00 0.00 0.00 0.07
## PercentPremiumSeats 0.00 0.00 0.00 0.16
## PitchDifference 0.00 0.01 0.00 0.03
## PriceDifference 0.02 0.00 0.00 0.00
## PricePremium PriceRelative SeatsTotal WidthDifference
## FlightDuration 0.00 0.39 0.00 0.43
## SeatsEconomy 0.01 1.00 0.00 1.00
## SeatsPremium 0.00 1.00 0.00 0.00
## PitchEconomy 0.00 0.00 0.34 0.00
## PitchPremium 1.00 0.00 0.73 0.00
## WidthEconomy 0.06 1.00 0.00 0.00
## WidthPremium 1.00 0.00 1.00 0.00
## PriceEconomy 0.00 0.00 0.21 1.00
## PricePremium 0.00 1.00 0.00 1.00
## PriceRelative 0.50 0.00 1.00 0.00
## SeatsTotal 0.00 0.80 0.00 0.78
## WidthDifference 0.81 0.00 0.02 0.00
## PercentPremiumSeats 0.01 0.00 0.00 0.00
## PitchDifference 0.70 0.00 0.47 0.00
## PriceDifference 0.00 0.00 0.00 0.01
## PercentPremiumSeats PitchDifference PriceDifference
## FlightDuration 1.00 1.00 0.00
## SeatsEconomy 0.00 1.00 0.01
## SeatsPremium 0.00 1.00 0.00
## PitchEconomy 0.86 0.00 0.32
## PitchPremium 0.01 0.00 0.65
## WidthEconomy 0.00 0.28 0.00
## WidthPremium 0.00 0.00 0.00
## PriceEconomy 1.00 0.96 0.00
## PricePremium 0.47 1.00 0.00
## PriceRelative 0.03 0.00 0.00
## SeatsTotal 0.00 1.00 0.00
## WidthDifference 0.00 0.00 0.45
## PercentPremiumSeats 0.00 1.00 0.08
## PitchDifference 0.05 0.00 0.27
## PriceDifference 0.00 0.01 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
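As the output notes, the entries above the diagonal are adjusted for multiple tests; in `psych::corr.test()` the `adjust` argument defaults to Holm's method. This is why, for example, PriceRelative vs SeatsPremium shows 0.04 below the diagonal but 1.00 above it. A sketch of Holm's step-down adjustment, written out by hand as a hypothetical helper `holm_by_hand` and checked against base R's `p.adjust()` on illustrative p-values (not taken from the airline data):

```r
# Holm's step-down adjustment written out by hand (hypothetical helper),
# checked against base R's p.adjust(method = "holm").
holm_by_hand <- function(p) {
  m <- length(p)
  o <- order(p)                                 # indices of p from smallest to largest
  adj <- cummax((m - seq_len(m) + 1) * p[o])    # multiply i-th smallest by (m - i + 1), keep monotone
  pmin(1, adj)[order(o)]                        # cap at 1 and restore original order
}

p_example <- c(0.01, 0.04, 0.03)                # illustrative values only
print(holm_by_hand(p_example))                  # 0.03 0.06 0.06
print(p.adjust(p_example, method = "holm"))

# A 15 x 15 correlation matrix has choose(15, 2) distinct pairs
n_pairs <- choose(15, 2)
print(n_pairs)                                  # 105
```

Assuming the adjustment runs over one p-value per distinct pair, a raw p-value must be multiplied by up to 105, so moderately small raw values such as 0.04 can easily be pushed up to 1.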
#M <- cor(premium)
corrplot(M , method = "circle")
#corrgram(premium , upper.panel = panel.pie , cex.labels = 0.5 , order = TRUE , outer.labels = list("top"))
#scatterplotMatrix(formula = ~ SeatsEconomy + SeatsPremium + SeatsTotal + WidthEconomy + WidthPremium + PriceRelative)
Based on the correlation plot, we formulate hypotheses and test them with linear regression models.
1. Hypothesis: Price Difference depends on Width Difference
fit1 <- lm(PriceDifference ~ WidthDifference)
summary(fit1)
##
## Call:
## lm(formula = PriceDifference ~ WidthDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -595.9 -408.9 -257.6 243.4 3830.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 423.87 46.11 9.192 <2e-16 ***
## WidthDifference 57.75 22.83 2.529 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 580.5 on 456 degrees of freedom
## Multiple R-squared: 0.01383, Adjusted R-squared: 0.01167
## F-statistic: 6.396 on 1 and 456 DF, p-value: 0.01177
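The slope is significant at the 5% level (p = 0.012), so the price difference does depend on the width difference, although width difference alone explains only about 1.4% of its variance (R-squared = 0.014).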
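With a single predictor, R-squared is the square of the Pearson correlation, and the least-squares line is fixed by the means and standard deviations: slope = r * s_y / s_x and intercept = mean_y - slope * mean_x. As a sanity check, fit1's coefficients can be reconstructed from the rounded summary statistics printed earlier; small discrepancies come from that rounding.

```r
# Inputs copied from the summary()/describe() output and the R-squared of fit1
r      <- sqrt(0.01383)   # positive slope, so r > 0
sd_y   <- 583.94          # sd of PriceDifference
sd_x   <- 1.19            # sd of WidthDifference
mean_y <- 518.18          # mean of PriceDifference
mean_x <- 1.633           # mean of WidthDifference

slope     <- r * sd_y / sd_x
intercept <- mean_y - slope * mean_x
print(round(c(intercept = intercept, slope = slope), 2))
```

The result (about 423.9 and 57.7) is close to the fitted 423.87 and 57.75.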
2. Hypothesis: Price Difference depends on Pitch Difference
fit2 <- lm(PriceDifference ~ PitchDifference)
summary(fit2)
##
## Call:
## lm(formula = PriceDifference ~ PitchDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -600.4 -379.9 -274.4 241.0 3780.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 233.14 106.45 2.190 0.02902 *
## PitchDifference 42.62 15.39 2.769 0.00586 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 579.7 on 456 degrees of freedom
## Multiple R-squared: 0.01653, Adjusted R-squared: 0.01438
## F-statistic: 7.666 on 1 and 456 DF, p-value: 0.005855
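The slope is significant at the 1% level (p = 0.006), so the price difference also depends on the pitch difference; again the fit is weak (R-squared = 0.017).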
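Each `t value` in a coefficient table is the estimate divided by its standard error, and `Pr(>|t|)` is the two-sided tail probability of a t distribution with the residual degrees of freedom (here n - 2 = 456). Recomputing both columns for the slopes of fit1 and fit2:

```r
# Slope estimates and standard errors copied from summary(fit1) and summary(fit2)
est <- c(WidthDifference = 57.75, PitchDifference = 42.62)
se  <- c(22.83, 15.39)
df_resid <- 458 - 2                           # n minus the two estimated coefficients

t_val <- est / se                             # "t value" column
p_val <- 2 * pt(-abs(t_val), df = df_resid)   # two-sided "Pr(>|t|)" column
print(round(rbind(t = t_val, p = p_val), 4))
```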
3. Hypothesis: Price Difference depends on both Width and Pitch Difference
fit3 <- lm(PriceDifference ~ PitchDifference + WidthDifference)
summary(fit3)
##
## Call:
## lm(formula = PriceDifference ~ PitchDifference + WidthDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -615.7 -395.5 -279.4 241.2 3798.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 274.70 123.94 2.216 0.0272 *
## PitchDifference 30.78 23.74 1.297 0.1955
## WidthDifference 23.06 35.16 0.656 0.5123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 580.1 on 455 degrees of freedom
## Multiple R-squared: 0.01746, Adjusted R-squared: 0.01314
## F-statistic: 4.043 on 2 and 455 DF, p-value: 0.01817
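Taken together, these three models are telling: each difference is significant on its own, and the joint model is significant overall (F-test p = 0.018), yet neither coefficient is significant once both are included (p = 0.20 and 0.51). This pattern suggests that width difference and pitch difference are strongly correlated with each other, which we check directly: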
cor.test(WidthDifference,PitchDifference)
##
## Pearson's product-moment correlation
##
## data: WidthDifference and PitchDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7194209 0.7969557
## sample estimates:
## cor
## 0.7608911
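Both the t statistic and the confidence interval above follow from r and n alone. The test of zero correlation uses t = r * sqrt(n - 2) / sqrt(1 - r^2), and the interval is built on Fisher's z scale, where atanh(r) is approximately normal with standard error 1 / sqrt(n - 3). A sketch reproducing the `cor.test()` output:

```r
n <- 458
r <- 0.7608911                              # sample correlation reported by cor.test() above

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 95% confidence interval via Fisher's z-transform, back-transformed with tanh()
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * half_width)

print(round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4))
```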
Next, we examine how the relative price depends on the pitch and width differences:
1. Relative Price vs WidthDifference
fit4 <- lm(PriceRelative ~ WidthDifference)
summary(fit4)
##
## Call:
## lm(formula = PriceRelative ~ WidthDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8028 -0.2907 -0.0766 0.1852 1.1893
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.18660 0.03132 5.958 5.11e-09 ***
## WidthDifference 0.18406 0.01551 11.869 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3943 on 456 degrees of freedom
## Multiple R-squared: 0.236, Adjusted R-squared: 0.2343
## F-statistic: 140.9 on 1 and 456 DF, p-value: < 2.2e-16
2. Relative Price vs PitchDifference
fit5 <- lm(PriceRelative ~ PitchDifference)
summary(fit5)
##
## Call:
## lm(formula = PriceRelative ~ PitchDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7643 -0.3247 -0.1146 0.2052 1.2954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.31456 0.07317 -4.299 2.1e-05 ***
## PitchDifference 0.11989 0.01058 11.331 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3985 on 456 degrees of freedom
## Multiple R-squared: 0.2197, Adjusted R-squared: 0.218
## F-statistic: 128.4 on 1 and 456 DF, p-value: < 2.2e-16
3. Relative Price vs both of them
fit6 <- lm(PriceRelative ~ PitchDifference + WidthDifference)
summary(fit6)
##
## Call:
## lm(formula = PriceRelative ~ PitchDifference + WidthDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.84163 -0.28484 -0.07241 0.17698 1.18778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10514 0.08304 -1.266 0.206077
## PitchDifference 0.06019 0.01590 3.785 0.000174 ***
## WidthDifference 0.11621 0.02356 4.933 1.14e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3886 on 455 degrees of freedom
## Multiple R-squared: 0.2593, Adjusted R-squared: 0.2561
## F-statistic: 79.65 on 2 and 455 DF, p-value: < 2.2e-16
Compare the p-values across the three models. On its own, each predictor has p < 2e-16; with both in the model, the p-values rise (to 1.7e-4 for pitch and 1.1e-6 for width) and both standard errors grow by about 50%, while R-squared improves only from 0.236 (width alone) to 0.259. Both predictors remain significant, but this is again the footprint of the correlation between pitch and width differences, so we keep only one of them (width difference) in the final regression.
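The growth in the standard errors can be quantified with the variance inflation factor. With two predictors whose correlation is r, VIF = 1 / (1 - r^2), and each slope's standard error in the joint model equals its one-predictor standard error times sqrt(VIF) times the ratio of the residual standard errors. A check with the numbers from fit4, fit5, fit6 and `cor.test()` above:

```r
r   <- 0.7608911            # cor(WidthDifference, PitchDifference) from cor.test() above
vif <- 1 / (1 - r^2)        # variance inflation factor with two predictors

# Predicted SE ratio (joint model / one-predictor model) for each slope
ratio_width_pred <- sqrt(vif) * 0.3886 / 0.3943   # residual SEs of fit6 and fit4
ratio_pitch_pred <- sqrt(vif) * 0.3886 / 0.3985   # residual SEs of fit6 and fit5

# Observed SE ratios from the coefficient tables
ratio_width_obs <- 0.02356 / 0.01551              # fit6 vs fit4
ratio_pitch_obs <- 0.01590 / 0.01058              # fit6 vs fit5

print(round(c(vif = vif,
              width_pred = ratio_width_pred, width_obs = ratio_width_obs,
              pitch_pred = ratio_pitch_pred, pitch_obs = ratio_pitch_obs), 3))
```

A VIF of about 2.4 is moderate: it explains the roughly 1.5-fold increase in both standard errors without making either coefficient unstable.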
The correlation plot also shows a weak negative correlation (about -0.10) between relative price and the number of premium seats:
fit7 <- lm(PriceRelative ~ SeatsPremium)
summary(fit7)
##
## Call:
## lm(formula = PriceRelative ~ SeatsPremium)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5023 -0.3862 -0.1129 0.2038 1.3445
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.598328 0.057266 10.448 <2e-16 ***
## SeatsPremium -0.003302 0.001584 -2.085 0.0376 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4489 on 456 degrees of freedom
## Multiple R-squared: 0.009447, Adjusted R-squared: 0.007275
## F-statistic: 4.349 on 1 and 456 DF, p-value: 0.03759
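The SeatsPremium slope is significant only at the 5% level (p = 0.038) and the model explains less than 1% of the variance (R-squared = 0.009), so we do not carry it forward. Next, relative price vs flight duration: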
fit10 <- lm(PriceRelative ~ FlightDuration)
summary(fit10)
##
## Call:
## lm(formula = PriceRelative ~ FlightDuration)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5507 -0.3373 -0.1167 0.2363 1.4694
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.370491 0.049454 7.492 3.56e-13 ***
## FlightDuration 0.015402 0.005913 2.605 0.0095 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4478 on 456 degrees of freedom
## Multiple R-squared: 0.01466, Adjusted R-squared: 0.0125
## F-statistic: 6.784 on 1 and 456 DF, p-value: 0.009498
Flight duration has a significant positive slope (p = 0.0095). So we fit a regression model with relative price as the dependent variable and two independent variables: flight duration and width difference (which also stands in for pitch difference, given their correlation).
fit17 <- lm(PriceRelative ~ FlightDuration + WidthDifference)
summary(fit17)
##
## Call:
## lm(formula = PriceRelative ~ FlightDuration + WidthDifference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70689 -0.30285 -0.01623 0.13842 1.15018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.001382 0.051926 -0.027 0.979
## FlightDuration 0.023053 0.005137 4.487 9.14e-06 ***
## WidthDifference 0.192198 0.015300 12.562 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3863 on 455 degrees of freedom
## Multiple R-squared: 0.2684, Adjusted R-squared: 0.2652
## F-statistic: 83.45 on 2 and 455 DF, p-value: < 2.2e-16
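Hence, the answer to the question: airlines charge more for the extra comfort of Premium Economy, in the form of larger pitch and width differences, and the markup grows with flight duration. Holding the other predictor fixed, each unit of width difference is associated with a relative price about 0.19 higher and each additional unit of flight duration with one about 0.023 higher; together the two predictors explain roughly 27% of the variation in relative price (R-squared = 0.268).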
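To make the final model concrete, fit17's coefficients can be turned into a fitted relative price for given predictor values. This is a sketch: `predict_relative_price` is a hypothetical helper, the coefficients are the rounded values printed in the summary, and the inputs (flight duration 10, width difference 3) are illustrative values inside the observed ranges. With the fitted object available, `predict(fit17, newdata = data.frame(FlightDuration = 10, WidthDifference = 3))` computes the same quantity from the unrounded coefficients.

```r
# Coefficients copied from summary(fit17) (rounded as printed there)
b0         <- -0.001382   # intercept
b_duration <-  0.023053   # per unit of FlightDuration
b_width    <-  0.192198   # per unit of WidthDifference

# Hypothetical helper: fitted relative price for given predictor values
predict_relative_price <- function(flight_duration, width_difference) {
  b0 + b_duration * flight_duration + b_width * width_difference
}

fitted_value <- predict_relative_price(10, 3)
print(round(fitted_value, 3))
```

For these inputs the fitted relative price is about 0.81, i.e. a Premium Economy fare roughly 81% above the Economy fare, reading PriceRelative as (PricePremium - PriceEconomy) / PriceEconomy.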