Loading the dataframe

airlines <- read.csv(paste("BOMDELBOM.csv", sep = ""))
summary(airlines)

##   FlightNumber      Airline    DepartureCityCode ArrivalCityCode
##  6E 129 :  5   Air India: 41   BOM:130           BOM:175        
##  6E 155 :  5   IndiGo   : 80   DEL:175           DEL:130        
##  6E 167 :  5   Jet      :144                                    
##  6E 171 :  5   Spice Jet: 40                                    
##  6E 179 :  5                                                    
##  6E 181 :  5                                                    
##  (Other):275                                                    
##  DepartureTime   ArrivalTime   Departure FlyingMinutes   Aircraft  
##  Min.   : 225   Min.   :  20   AM:169    Min.   :125   Airbus:140  
##  1st Qu.: 755   1st Qu.: 935   PM:136    1st Qu.:135   Boeing:165  
##  Median :1035   Median :1215             Median :135               
##  Mean   :1250   Mean   :1329             Mean   :136               
##  3rd Qu.:1800   3rd Qu.:1925             3rd Qu.:140               
##  Max.   :2320   Max.   :2345             Max.   :145               
##                                                                    
##    PlaneModel     Capacity       SeatPitch       SeatWidth    
##  738    :113   Min.   :138.0   Min.   :29.00   Min.   :17.00  
##  A320   : 80   1st Qu.:156.0   1st Qu.:30.00   1st Qu.:17.00  
##  739    : 36   Median :180.0   Median :30.00   Median :17.00  
##  A321   : 25   Mean   :176.4   Mean   :30.26   Mean   :17.41  
##  A332   : 25   3rd Qu.:189.0   3rd Qu.:30.00   3rd Qu.:18.00  
##  77W    : 10   Max.   :303.0   Max.   :33.00   Max.   :18.00  
##  (Other): 16                                                  
##    DataCollectionDate     DateDeparture IsWeekend     Price      
##  Sep 10 2018:40       Nov 8 2018 : 62   No :264   Min.   : 2607  
##  Sep 13 2018:30       Nov 6 2018 : 59   Yes: 41   1st Qu.: 4051  
##  Sep 14 2018:30       Sep 21 2018: 23             Median : 4681  
##  Sep 15 2018:45       Sep 17 2018: 17             Mean   : 5395  
##  Sep 17 2018:39       Oct 19 2018: 16             3rd Qu.: 5725  
##  Sep 19 2018:81       Sep 26 2018: 16             Max.   :18015  
##  Sep 8 2018 :40       (Other)    :112                            
##  AdvancedBookingDays IsDiwali  DayBeforeDiwali DayAfterDiwali
##  Min.   : 2.0        No :184   No :246         No :243       
##  1st Qu.: 7.0        Yes:121   Yes: 59         Yes: 62       
##  Median :30.0                                                
##  Mean   :28.9                                                
##  3rd Qu.:52.0                                                
##  Max.   :61.0                                                
##                                                              
##   MarketShare      LoadFactor   
##  Min.   :13.20   Min.   :78.73  
##  1st Qu.:13.30   1st Qu.:83.32  
##  Median :15.40   Median :83.32  
##  Mean   :21.18   Mean   :85.13  
##  3rd Qu.:39.60   3rd Qu.:87.20  
##  Max.   :39.60   Max.   :94.06  
##

Linear Model

model1 <- Price~AdvancedBookingDays + Airline + Departure + IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + SeatPitch + SeatWidth
fit1 <- lm(model1, data = airlines)
summary(fit1)

## 
## Call:
## lm(formula = model1, data = airlines)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2671.2 -1266.2  -456.4   517.4 11953.9 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -4292.94    8897.87  -0.482   0.6298    
## AdvancedBookingDays    -87.70      12.47  -7.033 1.43e-11 ***
## AirlineIndiGo         -577.17     778.64  -0.741   0.4591    
## AirlineJet            -120.75     436.69  -0.277   0.7823    
## AirlineSpice Jet     -1118.38     697.85  -1.603   0.1101    
## DeparturePM           -589.79     275.23  -2.143   0.0329 *  
## IsWeekendYes          -345.92     408.06  -0.848   0.3973    
## IsDiwaliYes           4346.80     568.14   7.651 2.90e-13 ***
## DepartureCityCodeDEL -1413.46     351.54  -4.021 7.38e-05 ***
## FlyingMinutes           38.97      29.27   1.331   0.1841    
## SeatPitch             -279.19     226.64  -1.232   0.2190    
## SeatWidth              868.58     507.54   1.711   0.0881 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2079 on 293 degrees of freedom
## Multiple R-squared:  0.2695, Adjusted R-squared:  0.2421 
## F-statistic: 9.828 on 11 and 293 DF,  p-value: 3.604e-15

Log Linear Model

model2 <- log(Price) ~ AdvancedBookingDays + Airline + Departure + IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + SeatPitch + SeatWidth
fit2 <- lm(model2, data = airlines)
summary(fit2)

## 
## Call:
## lm(formula = model2, data = airlines)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57006 -0.19770 -0.05792  0.12935  1.24672 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           6.549474   1.243788   5.266 2.71e-07 ***
## AdvancedBookingDays  -0.014639   0.001743  -8.399 1.97e-15 ***
## AirlineIndiGo        -0.098622   0.108842  -0.906   0.3656    
## AirlineJet            0.001113   0.061043   0.018   0.9855    
## AirlineSpice Jet     -0.127169   0.097548  -1.304   0.1934    
## DeparturePM          -0.055844   0.038473  -1.452   0.1477    
## IsWeekendYes         -0.036748   0.057041  -0.644   0.5199    
## IsDiwaliYes           0.744738   0.079418   9.377  < 2e-16 ***
## DepartureCityCodeDEL -0.264017   0.049140  -5.373 1.58e-07 ***
## FlyingMinutes         0.008717   0.004092   2.131   0.0340 *  
## SeatPitch            -0.032824   0.031681  -1.036   0.3010    
## SeatWidth             0.122364   0.070947   1.725   0.0856 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2906 on 293 degrees of freedom
## Multiple R-squared:  0.3671, Adjusted R-squared:  0.3433 
## F-statistic: 15.45 on 11 and 293 DF,  p-value: < 2.2e-16

3) Log Linear Model is preferred because it has higher R Squared value.

Predicting Price using Model 1

pricemodel1 <- fitted.values(fit1)
airlines <- cbind(airlines, pricemodel1)
pricemodel2 <- fitted.values(fit2)
airlines <- cbind(airlines, pricemodel2)

QQPlot for Model 1

library(car)

## Loading required package: carData

qqPlot(fit1)

## [1] 182 183

QQPlot for Model 2

library(car)
qqPlot(fit2)

## [1] 182 183

4a) We can see significant deviation from the straight line but the difference is subjective, Hence normality tests should be conducted.

4b) Normality test For Model 1 (Anderson’s)

library(nortest)
ad.test(airlines$pricemodel1)

## 
##  Anderson-Darling normality test
## 
## data:  airlines$pricemodel1
## A = 2.9058, p-value = 2.559e-07

Normality test For Model 1 (Shapiro-Wilk’s)

shapiro.test(airlines$pricemodel1)

## 
##  Shapiro-Wilk normality test
## 
## data:  airlines$pricemodel1
## W = 0.96562, p-value = 1.224e-06

Normality test For Model 2 (Anderson’s)

library(nortest)
ad.test(airlines$pricemodel2)

## 
##  Anderson-Darling normality test
## 
## data:  airlines$pricemodel2
## A = 3.4876, p-value = 9.815e-09

Normality test For Model 2 (Shapiro-Wilk’s)

library(nortest)
shapiro.test(airlines$pricemodel2)

## 
##  Shapiro-Wilk normality test
## 
## data:  airlines$pricemodel2
## W = 0.96081, p-value = 2.589e-07

Results of Normality test

Both the models do not conform to normality as the p-values are less than 0.05 in both the tests. Hence the null hypothesis, that both conform has no normality errors can be easily rejected.

5) Linearity Visual Inspection (Model 1)

par(mfrow=c(1,1))
plot(fit1, 1)

Linearity Visual Inspection (Model 2)

par(mfrow=c(1,1))
plot(fit2, 1)

Linearity Visual Inspection Results

As can be seen from the previous diagrams, the first model’s and second model’s residual values do not conform to a linear pattern and are scattered all over the place.

6) Box Cox Transformation

library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

priceTrans <- BoxCoxTrans(airlines$Price)
priceTrans

## Box-Cox Transformation
## 
## 305 data points used to estimate Lambda
## 
## Input data summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2607    4051    4681    5395    5725   18015 
## 
## Largest/Smallest: 6.91 
## Sample Skewness: 2.26 
## 
## Estimated Lambda: -0.8

Refined model after transformation

priceNew = predict(priceTrans, airlines$Price)
airlines <- cbind(airlines, priceNew)

fitPriceTransModel <- lm(priceNew ~ AdvancedBookingDays + Airline + Departure + IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + SeatPitch + SeatWidth, data = airlines)

Refined model after transformation

summary(fitPriceTransModel)

## 
## Call:
## lm(formula = priceNew ~ AdvancedBookingDays + Airline + Departure + 
##     IsWeekend + IsDiwali + DepartureCityCode + FlyingMinutes + 
##     SeatPitch + SeatWidth, data = airlines)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -7.868e-04 -1.930e-04 -2.246e-05  1.777e-04  9.443e-04 
## 
## Coefficients:
##                        Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)           1.246e+00  1.218e-03 1022.900  < 2e-16 ***
## AdvancedBookingDays  -1.551e-05  1.707e-06   -9.084  < 2e-16 ***
## AirlineIndiGo        -1.190e-04  1.066e-04   -1.117  0.26509    
## AirlineJet            6.273e-06  5.979e-05    0.105  0.91652    
## AirlineSpice Jet     -1.075e-04  9.555e-05   -1.125  0.26147    
## DeparturePM          -3.419e-05  3.769e-05   -0.907  0.36506    
## IsWeekendYes         -2.680e-05  5.587e-05   -0.480  0.63183    
## IsDiwaliYes           7.987e-04  7.779e-05   10.267  < 2e-16 ***
## DepartureCityCodeDEL -2.844e-04  4.813e-05   -5.909 9.53e-09 ***
## FlyingMinutes         1.056e-05  4.008e-06    2.635  0.00887 ** 
## SeatPitch            -2.475e-05  3.103e-05   -0.798  0.42579    
## SeatWidth             1.143e-04  6.950e-05    1.645  0.10106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0002847 on 293 degrees of freedom
## Multiple R-squared:  0.4183, Adjusted R-squared:  0.3965 
## F-statistic: 19.16 on 11 and 293 DF,  p-value: < 2.2e-16

7a) Visual comparison using qq plot

Linear Linear Model vs Trans Model

par(mfrow=c(1,2))
library("car")
qqPlot(airlines$pricemodel1)

## [1] 137 148

qqPlot(airlines$priceNew)

## [1] 183  56

Normality test For Trans Model (Anderson’s)

library(nortest)
ad.test(airlines$priceNew)

## 
##  Anderson-Darling normality test
## 
## data:  airlines$priceNew
## A = 1.6763, p-value = 0.0002619

Normality test For Trans model (Shapiro-Wilk’s)

shapiro.test(airlines$priceNew)

## 
##  Shapiro-Wilk normality test
## 
## data:  airlines$priceNew
## W = 0.98545, p-value = 0.003551

Results of Normality test

Both the models do not conform to normality as the p-values are less than 0.05 in both the tests. Hence the null hypothesis, that both has no normality errors can be easily rejected. Normality assumption violated.

7c) Visual comparison using qq plot

Linear Linear Model vs Trans Model

par(mfrow=c(1,2))
plot(fit1, 1)
plot(fitPriceTransModel,1)

Lecture11 Assignment

Loading the dataframe

Linear Model

Log Linear Model

3) Log Linear Model is preferred because it has higher R Squared value.

Predicting Price using Model 1

QQPlot for Model 1

QQPlot for Model 2

4b) Normality test For Model 1 (Anderson’s)

Normality test For Model 1 (Shapiro-Wilk’s)

Normality test For Model 2 (Anderson’s)

Normality test For Model 2 (Shapiro-Wilk’s)

Results of Normality test

5) Linearity Visual Inspection (Model 1)

Linearity Visual Inspection (Model 2)

Linearity Visual Inspection Results

6) Box Cox Transformation

Refined model after transformation

Refined model after transformation

7a) Visual comparison using qq plot

Normality test For Trans Model (Anderson’s)

Normality test For Trans model (Shapiro-Wilk’s)

Results of Normality test

7c) Visual comparison using qq plot