Variable Explanation

# Following are the variables involved in dataset concerning Airline Ticket Prices:
# •   FlightNumber (Factor Variable): These are unique numbers assigned for varied flight routes & airline
# •   Airline (Factor Variable): Dataset consists of four major airlines, with market collective market share of over 80% in domestic Indian market. These airlines are Air India, IndiGo, Jet Airways & Spicejet
# •   DepartureCity (Factor Variable): Dataset consists of 79 different departure cities
# •   ArrivalCity (Factor Variable): Dataset consists of 76 different arrival cities
# •   DepartureTime (Continuous Variable): Departure time is an important variable, since it may have influence on ticket prices. Prices may follow a pattern based on departure time on weekdays and weekends
# •   ArrivalTime (Continuous Variable): Arrival time is an important variable, since it may have influence on ticket prices. Prices may follow a pattern based on Arrival time on weekdays and weekends
# •   Departure (Factor Variable): Consists of only two unique entries – AM & PM. Departure, clubbed with DepartureTime may have a strong pattern of ticket price
# •   Fly (Factor Variable): This is an important variable with four levels. These are combinations of departure & arrival city being Metro & Non-Metro. A metro-to-metro flight is expected to be costliest while non metro-to-non metro cheapest
# •   FlyingTime (Continuous Variable): Flying time may be one of the most important factors with cost & ticket price implication. Higher flying time indicates higher distance & hence, fuel consumption
# •   Aircraft (Factor Variable): These are type of airplanes with some implication on seating capacity, arrangement, size etc.
# •   PlaneModel (Factor Variable): Within aircraft, PlaneModel indicates different models
# •   Capacity (Continuous Variable): Number of seats in airplane: Higher number of seats may tend to lower down price per ticket (economies of scale)
# •   SeatPitch (Continuous Variable): It is distance between two consecutive seats (back and front). Higher the SeatPitch, higher will be the legroom. It may have a small impact on airline ticket price
# •   SeatWidth (Continuous Variable): It is distance between one armrest to the other
# •   DataCollectionDate (Factor Variable): All entries in dataset were collected in a period of 7 days, 18th Oct 2018 – 24th Oct 2018
# •   DateDeparture (Factor Variable): 
# •   DayDeparture (Factor Variable): This variable has seven days in a week as its entries. 
# •   Weekend (Factor Variable): This variable has only two entries, yes or no. On weekend, since the traffic if higher or demand is more, prices are expected to be more
# •   Price (Continuous Variable): Dependent variable
# •   AdBookDays (Continuous Variable): Difference between departure date and date of booking
# •   Diwali (Factor Variable): Diwali festival in India stimulates high air travel, hence higher demand. Therefore, ticket price on Diwali could be much higher than any regular day
# •   DayBeforeDiwali (Factor Variable) Diwali festival in India stimulates high air travel, hence higher demand. Therefore, ticket price one day before Diwali could be much higher than any regular day
# •   DayAfterDiwali (Factor Variable) Diwali festival in India stimulates high air travel, hence higher demand. Therefore, ticket price one day after Diwali could be much higher than any regular day
knitr::opts_chunk$set(echo = TRUE)
setwd("d:/IIML/Term 5/DAM/Project/")
df <- read.csv("FourIndianAirlinesData.csv")
colnames(df)
##  [1] "FlightNumber"       "Airline"            "DepartureCity"     
##  [4] "ArrivalCity"        "DepartureTime"      "ArrivalTime"       
##  [7] "Departure"          "Fly"                "FlyingTime"        
## [10] "Aircraft"           "PlaneModel"         "Capacity"          
## [13] "SeatPitch"          "SeatWidth"          "DataCollectionDate"
## [16] "DateDeparture"      "DayDeparture"       "Weekend"           
## [19] "Price"              "AdvBookDays"        "Diwali"            
## [22] "DayBeforeDiwali"    "DayAfterDiwali"
attach(df)
str(df)
## 'data.frame':    8187 obs. of  23 variables:
##  $ FlightNumber      : Factor w/ 1765 levels "6E 101","6E 102",..: 203 301 404 889 934 1146 1255 1358 1514 790 ...
##  $ Airline           : Factor w/ 4 levels "Air India","IndiGo",..: 2 2 2 3 3 3 1 1 1 2 ...
##  $ DepartureCity     : Factor w/ 79 levels "Agartala","Agatti",..: 54 11 54 4 51 51 51 63 68 29 ...
##  $ ArrivalCity       : Factor w/ 76 levels "Agartala","Ahmedabad",..: 13 66 61 48 8 51 24 51 51 2 ...
##  $ DepartureTime     : int  2250 820 1810 2110 305 900 200 730 1225 500 ...
##  $ ArrivalTime       : int  100 945 2000 2255 445 1125 320 910 205 705 ...
##  $ Departure         : Factor w/ 2 levels "AM","PM": 2 1 2 2 1 1 1 1 2 1 ...
##  $ Fly               : Factor w/ 4 levels "MM","MN","NM",..: 2 4 2 3 2 1 2 3 3 4 ...
##  $ FlyingTime        : int  130 85 110 90 100 145 80 100 100 125 ...
##  $ Aircraft          : Factor w/ 4 levels "Aerospatiale",..: 2 2 2 3 3 3 2 2 2 2 ...
##  $ PlaneModel        : Factor w/ 122 levels "-","   Boeing 737-800 (738)",..: 68 68 68 109 111 111 70 60 70 61 ...
##  $ Capacity          : int  180 180 180 168 168 168 182 144 182 180 ...
##  $ SeatPitch         : num  30 30 30 30 30 30 31.5 29.5 31.5 30 ...
##  $ SeatWidth         : num  18 18 18 17 17 17 17.5 17.8 17.5 18 ...
##  $ DataCollectionDate: Factor w/ 7 levels "Oct 18 2018",..: 6 6 6 6 6 6 6 6 6 3 ...
##  $ DateDeparture     : Factor w/ 29 levels "Nov 03 2018",..: 22 22 22 22 22 22 22 22 22 20 ...
##  $ DayDeparture      : Factor w/ 7 levels "Friday","Monday",..: 6 6 6 6 6 6 6 6 6 2 ...
##  $ Weekend           : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Price             : int  6623 4051 6623 15297 9405 12188 1892 11134 10454 5625 ...
##  $ AdvBookDays       : int  1 1 1 1 1 1 1 1 1 2 ...
##  $ Diwali            : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DayBeforeDiwali   : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ DayAfterDiwali    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

Histogram

# plotting histogram
hist(df$Price ,main = "Histogram of variable Price",
xlab = "Price",col = c("gray"))

hist(df$FlyingTime ,main = "Histogram of variable FlyingTime",
xlab = "FlyingTime",col = c("Blue"))

hist(df$AdvBookDays ,main = "Histogram of variable AdvBookDays",
xlab = "AdvBookDays",col = c("gray"))

Boxplot for Price, FlyingTime and AdvBookDays

boxplot(df$Price,width = 0.5,
horizontal = TRUE,main = "boxplot for variable Price",
xlab = "Price",col = c("lightblue"))

boxplot(df$FlyingTime,width = 0.5,
horizontal = TRUE,main = "boxplot for variable FlyingTime",
xlab = "FlyingTime",col = c("gray"))

boxplot(df$AdvBookDays,width = 0.5,
horizontal = TRUE,main = "boxplot for variable AdvBookDays",
xlab = "AdvBookDays",col = c("lightblue"))

Pie Chart for Airline and Departure

# Pie Chart with Percentages (Aieline)
slices <- c(1543, 3811, 1905, 928)
lbls <- c("Air India", "IndiGo", "Jet Airways", "Spice Jet")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # ad % to labels
pie(slices,labels = lbls, col=rainbow(length(lbls)),
   main="Pie Chart of Airlines")

#Departure
slices <- c(874,8187)
lbls <- c("AM", "PM")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # ad % to labels
pie(slices,labels = lbls, col=rainbow(length(lbls)),
   main="Pie Chart of Departure")

DescribeBy

library(psych)
describeBy(Price,Airline, mat = TRUE)[,c(2,4:7, 10:12)]
##          group1    n     mean       sd median  min   max range
## X11   Air India 1543 6712.027 4455.835   5490 1025 32003 30978
## X12      IndiGo 3811 4979.133 2595.443   4355  637 43454 42817
## X13 Jet Airways 1905 6134.127 3126.986   5569 1145 38560 37415
## X14   Spice Jet  928 4815.122 2435.603   4207 1299 21639 20340
describeBy(Price,Departure, mat = TRUE)[,c(2,4:7, 10:12)]
##     group1    n     mean       sd median  min   max range
## X11     AM 4802 5686.845 3495.559   4780 1025 43454 42429
## X12     PM 3385 5370.117 2772.478   4775  637 32003 31366
describeBy(Price,Weekend, mat = TRUE)[,c(2,4:7, 10:12)]
##     group1    n     mean       sd median min   max range
## X11     No 6774 5500.027 3267.063   4671 637 43454 42817
## X12    Yes 1413 5823.706 2970.390   5363 999 32003 31004
describeBy(Price,DayDeparture, mat = TRUE)[,c(2,4:7, 10:12)]
##        group1    n     mean       sd median  min   max range
## X11    Friday  118 5815.271 2884.999 5434.0 2134 25025 22891
## X12    Monday 1298 6189.549 3564.124 5484.5  999 38560 37561
## X13  Saturday  624 5566.455 2289.683 5240.5  999 20422 19423
## X14    Sunday  789 6027.160 3401.674 5417.0 1529 32003 30474
## X15  Thursday 1786 5243.237 3065.742 4361.0  637 22238 21601
## X16   Tuesday 2761 5353.585 3335.518 4511.0  974 43454 42480
## X17 Wednesday  811 5414.641 2843.221 4750.0 1025 21992 20967

Grouped Summary

tapply(Price,Weekend, summary)
## $No
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     637    3380    4671    5500    6890   43454 
## 
## $Yes
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     999    4051    5363    5824    6954   32003
tapply(Price,Departure, summary)
## $AM
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1025    3491    4780    5687    6989   43454 
## 
## $PM
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     637    3518    4775    5370    6753   32003
tapply(Price,Fly, summary)
## $MM
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2703    4440    6435    6633    7791   43454 
## 
## $MN
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     637    3782    4900    5729    6932   31931 
## 
## $NM
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1336    3579    4750    5594    6756   32003 
## 
## $NN
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     974    2817    3994    4666    5776   38560
tapply(Price,DayDeparture, summary)
## $Friday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2134    4065    5434    5815    6958   25025 
## 
## $Monday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     999    4037    5484    6190    7410   38560 
## 
## $Saturday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     999    4067    5240    5566    6885   20422 
## 
## $Sunday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1529    4043    5417    6027    7046   32003 
## 
## $Thursday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     637    3203    4361    5243    6418   22238 
## 
## $Tuesday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     974    3209    4511    5354    6623   43454 
## 
## $Wednesday
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1025    3585    4750    5415    6934   21992
tapply(Price,Airline, summary)
## $`Air India`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1025    4099    5490    6712    7695   32003 
## 
## $IndiGo
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     637    3208    4355    4979    6361   43454 
## 
## $`Jet Airways`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1145    4052    5569    6134    7488   38560 
## 
## $`Spice Jet`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1299    3156    4207    4815    5923   21639

Boxplots

library(ggpubr)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
## Loading required package: magrittr
ggplot(df, aes(x=Weekend, y=Price)) + geom_boxplot()

ggplot(df, aes(x=Airline, y=Price)) + geom_boxplot()

ggplot(df, aes(x=Fly, y=Price)) + geom_boxplot()

ggplot(df, aes(x=Departure, y=Price)) + geom_boxplot()

ggplot(df, aes(x=DayDeparture, y=Price)) + geom_boxplot()

Mean Plots

library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
plotmeans(Price~Weekend, cex=0.8)

plotmeans(Price~Airline, cex=0.8)
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped

plotmeans(Price~Fly, cex=0.8)

plotmeans(Price~Departure, cex=0.8)

plotmeans(Price~DayDeparture, cex=0.8)

Correlation - Bivariate

cor(Price,AdvBookDays, method = "pearson")
## [1] -0.2401463
cor.test(Price,AdvBookDays, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  Price and AdvBookDays
## t = -22.381, df = 8185, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2604534 -0.2196269
## sample estimates:
##        cor 
## -0.2401463
cor(Price,FlyingTime, method = "pearson")
## [1] 0.3143604
cor.test(Price,FlyingTime, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  Price and FlyingTime
## t = 29.959, df = 8185, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2947053 0.3337496
## sample estimates:
##       cor 
## 0.3143604
cor(Price,FlyingTime, method = "pearson")
## [1] 0.3143604
cor.test(Price,FlyingTime, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  Price and FlyingTime
## t = 29.959, df = 8185, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2947053 0.3337496
## sample estimates:
##       cor 
## 0.3143604

Scatter plots

library(ggplot2)
library(gdata)
## gdata: Unable to locate valid perl interpreter
## gdata: 
## gdata: read.xls() will be unable to read Excel XLS and XLSX files
## gdata: unless the 'perl=' argument is used to specify the location
## gdata: of a valid perl intrpreter.
## gdata: 
## gdata: (To avoid display of this message in the future, please
## gdata: ensure perl is installed and available on the executable
## gdata: search path.)
## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLX' (Excel 97-2004) files.
## 
## gdata: Unable to load perl libaries needed by read.xls()
## gdata: to support 'XLSX' (Excel 2007+) files.
## 
## gdata: Run the function 'installXLSXsupport()'
## gdata: to automatically download and install the perl
## gdata: libaries needed to support Excel XLS and XLSX formats.
## 
## Attaching package: 'gdata'
## The following object is masked from 'package:stats':
## 
##     nobs
## The following object is masked from 'package:utils':
## 
##     object.size
## The following object is masked from 'package:base':
## 
##     startsWith
library(reshape2)

g1<-ggplot(df, aes(x=AdvBookDays, y=Price)) + geom_point()
g1

g2<-ggplot(df, aes(x=FlyingTime, y=Price)) + geom_point()
g2

Two sided tables

t1=as.data.frame(tapply(df$Price, list(df$Airline,df$DayDeparture), mean))
t1=cbind(row.names(t1),t1)
colnames(t1)[1]="Airline"
t1
##                 Airline   Friday   Monday Saturday   Sunday Thursday
## Air India     Air India 8452.333 6940.439 6550.514 8271.277 6301.280
## IndiGo           IndiGo 5482.904 5770.741 5261.912 5254.350 4597.084
## Jet Airways Jet Airways 5240.409 6476.875 5953.511 6138.136 5951.884
## Spice Jet     Spice Jet 4708.826 5941.024 4596.714 5355.231 4414.737
##              Tuesday Wednesday
## Air India   6710.780  5763.346
## IndiGo      4742.719  4769.169
## Jet Airways 6058.924  6395.697
## Spice Jet   4487.682  4731.229
t2=as.data.frame(tapply(df$Price, list(df$Airline,df$Weekend), mean))
t2=cbind(row.names(t2),t2)
colnames(t2)[1]="Airline"
t2
##                 Airline       No      Yes
## Air India     Air India 6536.871 7576.354
## IndiGo           IndiGo 4916.839 5257.935
## Jet Airways Jet Airways 6148.546 6050.439
## Spice Jet     Spice Jet 4744.386 5115.249
t3=as.data.frame(tapply(df$Price, list(df$Airline,df$Fly), mean))
t3=cbind(row.names(t3),t3)
colnames(t3)[1]="Airline"
t3
##                 Airline       MM       MN       NM       NN
## Air India     Air India 7924.236 6726.250 6524.852 5376.669
## IndiGo           IndiGo 5838.798 5007.977 5106.394 4572.920
## Jet Airways Jet Airways 7104.712 6335.525 5834.895 5079.274
## Spice Jet     Spice Jet 5543.304 4989.598 4768.590 4410.424
t4=as.data.frame(tapply(df$Price, list(df$Airline,df$Departure), mean))
t4=cbind(row.names(t4),t4)
colnames(t4)[1]="Airline"
t4
##                 Airline       AM       PM
## Air India     Air India 6701.908 6812.645
## IndiGo           IndiGo 5144.172 4854.327
## Jet Airways Jet Airways 6018.686 6234.288
## Spice Jet     Spice Jet 4741.458 6007.389

Bar plots

data_t1 <- melt(t1, id.vars='Airline')
colnames(data_t1)[2:3]<-c("DayDeparture","Price")
g3<-ggplot(data_t1, aes(fill=Airline, y=Price, x=DayDeparture)) + geom_bar(position="dodge", stat="identity")
g3

data_t2 <- melt(t2, id.vars='Airline')
colnames(data_t2)[2:3]<-c("Weekend","Price")
g4<-ggplot(data_t2, aes(fill=Airline, y=Price, x=Weekend)) + geom_bar(position="dodge", stat="identity")
g4

data_t3 <- melt(t3, id.vars='Airline')
colnames(data_t3)[2:3]<-c("Fly","Price")
g5<-ggplot(data_t3, aes(fill=Airline, y=Price, x=Fly)) + geom_bar(position="dodge", stat="identity")
g5

data_t4 <- melt(t4, id.vars='Airline')
colnames(data_t4)[2:3]<-c("Departure","Price")
g6<-ggplot(data_t4, aes(fill=Airline, y=Price, x=Departure)) + geom_bar(position="dodge", stat="identity")
g6

# frequency table
tab1 <- prop.table(table((Airline)))
tab2 <- round(tab1*100,2)

# bar-plot
bp <- barplot(tab2,
        xlab = "Airline", ylab = "Percent (%)",
        main = "Distribution of Flights",
        col = c("lightblue"), 
        beside = TRUE,
        ylim = c(0, 60))
text(bp, 0, round(tab2, 2),cex=1,pos=3) 

# frequency table
tab1 <- prop.table(table(Fly))
tab2 <- round(tab1*100,2)

# bar-plot
bp <- barplot(tab2,
        xlab = "Fly", ylab = "Percent (%)",
        main = "",
        col = c("lightblue"), 
        beside = TRUE,
        ylim = c(0, 50))
text(bp, 0, round(tab2, 2),cex=1,pos=3)

# frequency table
tab1 <- prop.table(table(Weekend))
tab2 <- round(tab1*100,2)

# bar-plot
bp <- barplot(tab2,
        xlab = "Weekend", ylab = "Percent (%)",
        main = "Flights for Weekend/Weekdays",
        col = c("lightblue"), 
        beside = TRUE,
        ylim = c(0, 90))
text(bp, 0, round(tab2, 2),cex=1,pos=3)

# frequency table
tab1 <- prop.table(table(Departure))
tab2 <- round(tab1*100,2)

# bar-plot
bp <- barplot(tab2,
        xlab = "Departure (AM/PM)", ylab = "Percent (%)",
        main = "Flights for Weekend/Weekdays",
        col = c("lightblue"), 
        beside = TRUE,
        ylim = c(0, 90))
text(bp, 0, round(tab2, 2),cex=1,pos=3)

Bar Chart for Combination of Two Discreate Variable

library(ggplot2)
airlineData.df<-df

# bar plot for variable Airline by fly
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Fly))

 # bar chart for combination of two discreate variable
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Fly), position = "dodge")

# bar plot for variable Airline by weekend/weekday
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Weekend))

 # bar chart for combination of two discreate variable
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Weekend), position = "dodge")

# bar plot for variable Airline by Departure (AM/PM)
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Departure))

 # bar chart for combination of two discreate variable
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Departure), position = "dodge")

# bar plot for variable Airline by weekend/weekday
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Diwali))

 # bar chart for combination of two discreate variable
ggplot(data = airlineData.df) + 
  geom_bar(mapping = aes(x = Airline, fill = Diwali), position = "dodge")

Correlogram

library(corrgram)
## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus
airlineSubset <- df[,c('FlyingTime','Price','AdvBookDays')]

corMat <- cor(airlineSubset, use = "complete")

round(corMat, 3)
##             FlyingTime  Price AdvBookDays
## FlyingTime       1.000  0.314      -0.004
## Price            0.314  1.000      -0.240
## AdvBookDays     -0.004 -0.240       1.000
library(corrplot)
## corrplot 0.84 loaded
corrplot(cor(airlineSubset), method = "circle")

corrgram(df[,c('Price','FlyingTime','AdvBookDays')],
lower.panel=panel.shade,
upper.panel=panel.conf,
text.panel=panel.txt,main="corrgram",)

library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:gdata':
## 
##     first, last
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:gplots':
## 
##     textplot
## The following object is masked from 'package:graphics':
## 
##     legend
chart.Correlation(df[,c('Price','FlyingTime','AdvBookDays')])

Regression

Two sided tables

m2 <- lm(Price ~ AdvBookDays
             + Capacity
             + Airline
             + Departure
             + Weekend
             + Diwali
             + FlyingTime
             + SeatWidth
             + SeatPitch,
            data = df)
summary(m2)
## 
## Call:
## lm(formula = Price ~ AdvBookDays + Capacity + Airline + Departure + 
##     Weekend + Diwali + FlyingTime + SeatWidth + SeatPitch, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6429  -1584   -581    653  37035 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2944.0912  1291.5782   2.279   0.0227 *  
## AdvBookDays          -78.9776     3.3723 -23.420  < 2e-16 ***
## Capacity               4.9094     0.9367   5.241 1.63e-07 ***
## AirlineIndiGo      -1880.4286   126.3666 -14.881  < 2e-16 ***
## AirlineJet Airways  -602.3146   109.8470  -5.483 4.30e-08 ***
## AirlineSpice Jet   -1878.9739   130.5904 -14.388  < 2e-16 ***
## DeparturePM          -73.6311    71.4847  -1.030   0.3030    
## WeekendYes          -231.7869    91.8946  -2.522   0.0117 *  
## DiwaliYes           -200.7778    69.7397  -2.879   0.0040 ** 
## FlyingTime            28.2956     0.8998  31.446  < 2e-16 ***
## SeatWidth            -36.0753   113.2461  -0.319   0.7501    
## SeatPitch             59.1792    44.5663   1.328   0.1843    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2844 on 8175 degrees of freedom
## Multiple R-squared:  0.2208, Adjusted R-squared:  0.2198 
## F-statistic: 210.6 on 11 and 8175 DF,  p-value: < 2.2e-16
summary(m2)$coef
##                        Estimate   Std. Error     t value      Pr(>|t|)
## (Intercept)         2944.091226 1291.5782282   2.2794525  2.266583e-02
## AdvBookDays          -78.977565    3.3722811 -23.4196266 1.862825e-117
## Capacity               4.909391    0.9366660   5.2413461  1.633928e-07
## AirlineIndiGo      -1880.428567  126.3665675 -14.8807442  1.943284e-49
## AirlineJet Airways  -602.314571  109.8469876  -5.4832143  4.301228e-08
## AirlineSpice Jet   -1878.973898  130.5903651 -14.3883042  2.251747e-46
## DeparturePM          -73.631115   71.4846672  -1.0300267  3.030279e-01
## WeekendYes          -231.786925   91.8946219  -2.5223122  1.167745e-02
## DiwaliYes           -200.777812   69.7397416  -2.8789584  4.000263e-03
## FlyingTime            28.295550    0.8998066  31.4462589 5.313716e-205
## SeatWidth            -36.075291  113.2460956  -0.3185566  7.500709e-01
## SeatPitch             59.179242   44.5663329   1.3278912  1.842511e-01
predictions <- predict(m2, df)
# Prediction error, RMSE
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:corrgram':
## 
##     panel.fill
RMSE(predictions, df$Price)
## [1] 2842.118
m3 <- lm(Price ~ AdvBookDays
             + Capacity
             + Airline
             + Weekend
             + Diwali
             + FlyingTime,
            data = df)
summary(m3)
## 
## Call:
## lm(formula = Price ~ AdvBookDays + Capacity + Airline + Weekend + 
##     Diwali + FlyingTime, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6392  -1582   -581    659  37074 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4117.5131   186.8627  22.035  < 2e-16 ***
## AdvBookDays          -78.9636     3.3721 -23.416  < 2e-16 ***
## Capacity               5.0797     0.9252   5.490 4.13e-08 ***
## AirlineIndiGo      -2002.1397    87.1808 -22.965  < 2e-16 ***
## AirlineJet Airways  -669.0198    98.2959  -6.806 1.07e-11 ***
## AirlineSpice Jet   -1965.3467   118.3120 -16.612  < 2e-16 ***
## WeekendYes          -230.7086    91.8949  -2.511  0.01207 *  
## DiwaliYes           -201.1643    69.7427  -2.884  0.00393 ** 
## FlyingTime            28.3205     0.8992  31.494  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2844 on 8178 degrees of freedom
## Multiple R-squared:  0.2204, Adjusted R-squared:  0.2197 
## F-statistic: 289.1 on 8 and 8178 DF,  p-value: < 2.2e-16
summary(m3)$coef
##                        Estimate  Std. Error    t value      Pr(>|t|)
## (Intercept)         4117.513093 186.8627388  22.034961 1.406507e-104
## AdvBookDays          -78.963634   3.3721385 -23.416486 1.989816e-117
## Capacity               5.079675   0.9251836   5.490451  4.129230e-08
## AirlineIndiGo      -2002.139659  87.1807827 -22.965378 3.713179e-113
## AirlineJet Airways  -669.019810  98.2959165  -6.806181  1.072921e-11
## AirlineSpice Jet   -1965.346742 118.3119852 -16.611561  5.694290e-61
## WeekendYes          -230.708572  91.8948700  -2.510571  1.207276e-02
## DiwaliYes           -201.164310  69.7427371  -2.884376  3.932138e-03
## FlyingTime            28.320512   0.8992316  31.494125 1.371434e-205
predictions <- predict(m3, df)
# Prediction error, RMSE
library(caret)
RMSE(predictions, df$Price)
## [1] 2842.805
m4 <- lm(Price ~ AdvBookDays
             + Capacity
             + Airline
             + Weekend
             + Diwali
             + FlyingTime
             +FlyingTime*Weekend
             +FlyingTime*Diwali,
            data = df)
summary(m4)
## 
## Call:
## lm(formula = Price ~ AdvBookDays + Capacity + Airline + Weekend + 
##     Diwali + FlyingTime + FlyingTime * Weekend + FlyingTime * 
##     Diwali, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6483  -1590   -586    646  37367 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.741e+03  2.183e+02  17.137  < 2e-16 ***
## AdvBookDays           -7.893e+01  3.368e+00 -23.436  < 2e-16 ***
## Capacity               5.061e+00  9.242e-01   5.476 4.47e-08 ***
## AirlineIndiGo         -2.003e+03  8.708e+01 -23.004  < 2e-16 ***
## AirlineJet Airways    -6.710e+02  9.818e+01  -6.835 8.79e-12 ***
## AirlineSpice Jet      -1.964e+03  1.182e+02 -16.621  < 2e-16 ***
## WeekendYes            -2.209e+02  2.974e+02  -0.743 0.457700    
## DiwaliYes              7.500e+02  2.276e+02   3.295 0.000987 ***
## FlyingTime             3.169e+01  1.340e+00  23.654  < 2e-16 ***
## WeekendYes:FlyingTime -3.202e-02  2.535e+00  -0.013 0.989921    
## DiwaliYes:FlyingTime  -8.442e+00  1.922e+00  -4.391 1.14e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2841 on 8176 degrees of freedom
## Multiple R-squared:  0.2226, Adjusted R-squared:  0.2216 
## F-statistic:   234 on 10 and 8176 DF,  p-value: < 2.2e-16
summary(m4)$coef
##                            Estimate  Std. Error      t value      Pr(>|t|)
## (Intercept)            3.740609e+03 218.2819884  17.13659210  1.058353e-64
## AdvBookDays           -7.893186e+01   3.3680024 -23.43580832 1.303642e-117
## Capacity               5.061170e+00   0.9242001   5.47627056  4.472722e-08
## AirlineIndiGo         -2.003086e+03  87.0768091 -23.00367184 1.625007e-113
## AirlineJet Airways    -6.710496e+02  98.1792155  -6.83494598  8.792286e-12
## AirlineSpice Jet      -1.964060e+03 118.1706436 -16.62054125  4.929066e-61
## WeekendYes            -2.208859e+02 297.4191904  -0.74267527  4.576996e-01
## DiwaliYes              7.499885e+02 227.5938898   3.29529262  9.873788e-04
## FlyingTime             3.168749e+01   1.3396324  23.65386664 1.055018e-119
## WeekendYes:FlyingTime -3.201871e-02   2.5346557  -0.01263237  9.899214e-01
## DiwaliYes:FlyingTime  -8.441762e+00   1.9224457  -4.39115742  1.141679e-05
predictions <- predict(m4, df)
# Prediction error, RMSE
library(caret)
RMSE(predictions, df$Price)
## [1] 2838.965
library(interactions)
interact_plot(m2, pred = "AdvBookDays", modx = "Diwali", 
              main.title = "Interaction Plot between Advance Booking Days and Diwali",
              x.label = "Advance Booking (Days)", y.label = "Ticket Price (INR)",
              colors = c("black", "black"), 
              modxvals = c("No","Yes"),
              interval = FALSE)
## Warning: AdvBookDays and Diwali are not included in an interaction with one
## another in the model.

library(interactions)
interact_plot(m3, pred = "AdvBookDays", modx = "Diwali", 
              main.title = "Interaction Plot between Advance Booking Days and Diwali",
              x.label = "Advance Booking (Days)", y.label = "Ticket Price (INR)",
              colors = c("black", "black"), 
              modxvals = c("No","Yes"),
              interval = FALSE)
## Warning: AdvBookDays and Diwali are not included in an interaction with one
## another in the model.

library(interactions)
interact_plot(m4, pred = "AdvBookDays", modx = "Diwali", 
              main.title = "Interaction Plot between Advance Booking Days and Diwali",
              x.label = "Advance Booking (Days)", y.label = "Ticket Price (INR)",
              colors = c("black", "black"), 
              modxvals = c("No","Yes"),
              interval = FALSE)
## Warning: AdvBookDays and Diwali are not included in an interaction with one
## another in the model.