Effect of Seasonality and Advance Booking on New Delhi - Mumbai Air Ticket Prices

Sameer Mathur

Delhi-Mumbai Airline Pricing Analysis

Factors influencing ticket prices

The following are the two major factors influencing airline ticket prcing:

  1. Seasonality

  2. Advance Booking

Part 1: Marketing Research Focus

Effect of Seasonality on Ticket Prices

Seasonality

The following are the three types of seasonality in this dataset:

  1. Time of departure (AM vs. PM)

  2. Day of the week (Weekend vs. Weekdays)

  3. Special events (e.g. Diwali)

Seasonality: Research Questions

  1. Are morning flights more expensive than evening flights?

  2. Are weekend flights more expensive than weekday flights?

  3. Are flights more expensive on the day before / after Diwali?

Data

  1. BOM-DEL & DEL-BOM - one way airfares

  2. 4 Airlines

    • Air India
    • IndiGo
    • Jet Airways and
    • Spice Jet
  3. 63 unique flights

Data

Count of flights by airline

# Count of flights by airline
library(data.table)
dt <- data.table(airline.df)
dt[, list(UniqueFlights = length(unique(FlightNumber)), DataCount=.N), by = list(Airline)]
     Airline UniqueFlights DataCount
1:       Jet            30       144
2: Spice Jet             8        40
3:    IndiGo            16        80
4: Air India             9        41
length(unique(FlightNumber)) #Total Unique Flights
[1] 63
nrow(dt) #Total Data
[1] 305

Average Airfare on all DEL-BOM and BOM-DEL flights

library(gplots)
plotmeans(Price ~ MetroArrival, data = airline.df, mean.labels = TRUE,
          main = "Average Airfare on All BOM-DEL, DEL-BOM Flights",
          xlab = "All Airlines", legends = FALSE)

plot of chunk unnamed-chunk-5

Scatterplot of airfare

# scatterplot of airfare
plot(Price, type = "p", data = airline.df,
     main = "Scatterplot of Airfare")
abline(h = mean(airline.df$Price), col="red", lwd=3, lty=2)

plot of chunk unnamed-chunk-7

Average airfares by airline

agg1 <- aggregate(Price, by = list(Airline), mean)
colnames(agg1) <- c("Airline", "AverageAirfares")
agg1
    Airline AverageAirfares
1 Air India        6335.000
2    IndiGo        4879.525
3       Jet        5496.146
4 Spice Jet        5094.850
# Average airfare for all airlines
avg = mean(airline.df$Price)
round(avg,2)
[1] 5394.54

Boxplot of airfares by airline

# boxplot of airfares by airline
boxplot(Price ~ Airline, data = airline.df,
        main = "Distribution of Airfares by Airlines",
        ylab = "Airline", xlab = "Airfare (INR)",
        horizontal = TRUE)

plot of chunk unnamed-chunk-10

Seasonality Data

Time of departure (AM vs. PM)

Average airfares and number of unique flights by departure time

# average airfares and unique flights by departure
library(data.table)
dt <- data.table(airline.df)
dt[, list(UniqueFlights = length(unique(FlightNumber)),
          DataCount = .N,
                   Price = round(mean(Price), 2)), 
            by = list(Departure)]
   Departure UniqueFlights DataCount   Price
1:        AM            34       169 5598.89
2:        PM            29       136 5140.61

Boxplot of airfares by Departure Time (AM/PM)

# boxplot of airfares by Departure Time (AM/PM)
boxplot(Price ~ Departure, data = airline.df,
        main = "Distribution of Airfares by Departure Time (AM/PM)",
        ylab = "Departure", xlab = "Airfare (INR)",
        horizontal = TRUE)

plot of chunk unnamed-chunk-13

Mean plots of airfares by Departure Time (AM/PM)

# mean plot of airfares by Departure Time (AM/PM)
library(gplots)
plotmeans(Price ~ Departure, data = airline.df,
          main = "Mean plot of Airfares by Departure Time (AM/PM)",
          xlab = "Departure", ylab = "Airfare (INR)",
          mean.labels = TRUE, frame = TRUE)

plot of chunk unnamed-chunk-15

Seasonality Data

Day of the week (Weekend vs. Weekdays)

Average airfares by Weekend / Weekday

# average airfares and unique flights by Weekend / Weekday
library(data.table)
dt <- data.table(airline.df)
dt[, list(NoOfFlights = length(unique(FlightNumber)),
                   Price = round(mean(Price), 2)), 
            by = list(IsWeekend)]
   IsWeekend NoOfFlights   Price
1:        No          63 5518.55
2:       Yes          29 4596.07

Boxplot of airfares by weekend

# boxplot of airfares by Weekday/Weekend
boxplot(Price ~ IsWeekend, data = airline.df,
        main = "Distribution of Airfares by Weekday/Weekend",
        ylab = "Weekend", xlab = "Airfare (INR)",
        horizontal = TRUE)

plot of chunk unnamed-chunk-18

Average airfares by Weekend / Weekday

# mean plot of airfares by Weekday/Weekend
library(gplots)
plotmeans(Price ~ IsWeekend, data = airline.df,
          main = "Average Airfares by Weekday/Weekend",
          xlab = "Weekend", ylab = "Airfare (INR)",
          mean.labels = TRUE, frame = TRUE)

plot of chunk unnamed-chunk-20

Seasonality Data

Holiday/Festival (e.g. Diwali)

Average airfares at Diwali

# average airfares at Diwali
library(data.table)
dt <- data.table(airline.df)
dt[, list(NoOfFlights = length(unique(FlightNumber)),
                   Price = round(mean(Price), 2)), 
            by = list(IsDiwali)]
   IsDiwali NoOfFlights   Price
1:        1          62 5897.48
2:        0          63 5063.81

Boxplot of airfares by Diwali

# boxplot of airfares by Diwali
boxplot(Price ~ IsDiwali, data = airline.df,
        main = "Distribution of Airfares by Diwali",
        ylab = "Diwali", xlab = "Airfare (INR)",
        horizontal = TRUE)

plot of chunk unnamed-chunk-23

Mean plots of airfares by Diwali

# mean plot of airfares by Diwali
library(gplots)
plotmeans(Price ~ IsDiwali, data = airline.df,
          main = "Mean plot of Airfares by Diwali",
          xlab = "Diwali", ylab = "Airfare (INR)",
          mean.labels = TRUE, frame = TRUE)

plot of chunk unnamed-chunk-25

Part 2: Marketing Research Focus

Effect of Advanced Booking on Ticket Prices

Advance Booking

  • Tickets are cheaper if you buy them in advance.

  • Last minute travel is expensive. (But by how much?)

Advanced Booking: Research Questions

By how much do ticket prices increase for every additional day's delay in purchasing a ticket?

Advanced Booking Data

  1. Date of data collection (Sep 8-19, 2018)

  2. Data was collected for flights departing

    • 2-days in advance
    • 7-days in advance
    • 30-days in advance
    • Day before Diwali (Nov 6, 2018)
    • Day after Diwali (Nov 8, 2018)

Scatterplot of airfares by advanced booking days

plot(AdvancedBookingDays, Price, data = airline.df,
     main = "Scatter Plot of Airfares by Advanced Booking Days",
     xlab = "Advanced Booking Days", ylab = "Airfare (INR)")

plot of chunk unnamed-chunk-27

Distribution of airline by advanced booking days (2 and 30 days)

# preparing subset of data
subData1 <- subset(airline.df, AdvancedBookingDays >= 2 & AdvancedBookingDays < 7)
subData2 <- subset(airline.df, AdvancedBookingDays > 7 & AdvancedBookingDays <= 30)
# merging subsets of data
subData3 <- rbind(subData1, subData2)
# converting into AdvancedBookingDays factor
subData3$AdvancedBookingDays <- as.factor(subData3$AdvancedBookingDays)
# table of advanced booking days by airline
#table(subData3$Airline, subData3$AdvancedBookingDays)

# average airfares and unique flights by airline and advance booking
library(data.table)
dt <- data.table(subData3)
dt[, list(NoOfFlights = length(unique(FlightNumber)),
                   Price = round(mean(Price), 2)), 
            by = list(AdvancedBookingDays, Airline)]
   AdvancedBookingDays   Airline NoOfFlights   Price
1:                   2    IndiGo          16 8628.06
2:                   2       Jet          30 6259.47
3:                   2 Spice Jet           8 6325.75
4:                   2 Air India           9 5613.78
5:                  30    IndiGo          16 2966.12
6:                  30       Jet          27 4129.37
7:                  30 Spice Jet           8 4324.00
8:                  30 Air India           8 4766.88

Interaction plots

interaction.plot(subData3$AdvancedBookingDays, subData3$Airline, subData3$Price,
                 main = "Interaction Plot of Advance Booking Days and Airline", 
                 xlab = "Advance Booking Days", ylab = "Avaerage Airfare (INR)",
                 col=c("red","black","green", "blue"),
                 fixed=TRUE, lwd = 5,
                 leg.bty = "o")

plot of chunk unnamed-chunk-30

Other control variables

Many additioanal factors are likely to also influence airline ticket prices. Our dataset includes the following additional variables:

  1. DEL-BOM vs. BOM-DEL

  2. Flying Time

  3. Seating Quality

    • SeatPitch
    • SeatWidth

Average Airfare by Departure City (DEL or BOM)

# Average Airfare by Departure City (DEL or BOM)
library(gplots)
plotmeans(Price ~ DepartureCityCode, data = airline.df,
          main = "Mean plot of Airfares by Departure City (New Delhi / Mumbai)",
          xlab = "Departure City", ylab = "Airfare (INR)",
          mean.labels = TRUE, frame = TRUE)

plot of chunk unnamed-chunk-32

Summary of flying time

# descriptive statistics of Flying Time
library(psych)
describe(airline.df$FlyingMinutes)[, c(2:5, 8:9)]
     n   mean   sd median min max
X1 305 136.03 4.71    135 125 145

Scatterplot of flying time

# scatterplot of flying time
plot(FlyingMinutes, type = "p", data = airline.df, 
     main = "Scatterplot of Flying Time")
abline(h = mean(airline.df$FlyingMinutes), col = "blue", lwd=3, lty=2)

plot of chunk unnamed-chunk-35

Summary statistics of Seating Comfort (Seat Pitch and Seat Width)

library(psych)
# descriptive statistics of SeatPitch
describe(airline.df$SeatPitch)[, c(2:5, 8:9)]
     n  mean   sd median min max
X1 305 30.26 0.93     30  29  33
# descriptive statistics of SeatWidth
describe(airline.df$SeatWidth)[, c(2:5, 8:9)]
     n  mean   sd median min max
X1 305 17.41 0.49     17  17  18

Correlation and Corrgram

Correlation matrix for the given continuous variables {"Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth"}

expVar <- airline.df[c("Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth")]
round(cor(expVar), 2)
                    Price AdvancedBookingDays FlyingMinutes Capacity
Price                1.00               -0.01         -0.02    -0.03
AdvancedBookingDays -0.01                1.00          0.01    -0.01
FlyingMinutes       -0.02                0.01          1.00    -0.32
Capacity            -0.03               -0.01         -0.32     1.00
SeatPitch            0.07               -0.01         -0.03     0.51
SeatWidth           -0.06                0.05         -0.18     0.45
                    SeatPitch SeatWidth
Price                    0.07     -0.06
AdvancedBookingDays     -0.01      0.05
FlyingMinutes           -0.03     -0.18
Capacity                 0.51      0.45
SeatPitch                1.00      0.32
SeatWidth                0.32      1.00

Correlation matrix, along with their significance values based on Pearson's correlation test for the given continuous variables {"Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth"}

expVar <- airline.df[c("Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth")]
library(Hmisc)
rcorr(as.matrix(expVar))
                    Price AdvancedBookingDays FlyingMinutes Capacity
Price                1.00               -0.01         -0.02    -0.03
AdvancedBookingDays -0.01                1.00          0.01    -0.01
FlyingMinutes       -0.02                0.01          1.00    -0.32
Capacity            -0.03               -0.01         -0.32     1.00
SeatPitch            0.07               -0.01         -0.03     0.51
SeatWidth           -0.06                0.05         -0.18     0.45
                    SeatPitch SeatWidth
Price                    0.07     -0.06
AdvancedBookingDays     -0.01      0.05
FlyingMinutes           -0.03     -0.18
Capacity                 0.51      0.45
SeatPitch                1.00      0.32
SeatWidth                0.32      1.00

n= 305 


P
                    Price  AdvancedBookingDays FlyingMinutes Capacity
Price                      0.8732              0.7513        0.6513  
AdvancedBookingDays 0.8732                     0.9292        0.8781  
FlyingMinutes       0.7513 0.9292                            0.0000  
Capacity            0.6513 0.8781              0.0000                
SeatPitch           0.1942 0.8052              0.5521        0.0000  
SeatWidth           0.2998 0.3411              0.0013        0.0000  
                    SeatPitch SeatWidth
Price               0.1942    0.2998   
AdvancedBookingDays 0.8052    0.3411   
FlyingMinutes       0.5521    0.0013   
Capacity            0.0000    0.0000   
SeatPitch                     0.0000   
SeatWidth           0.0000             

Corrgram

library(corrgram)
corrgram(expVar, order=TRUE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram")

plot of chunk unnamed-chunk-41

Corrgram using package PerformanceAnalytics

expVar <- airline.df[c("Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth")]
library("PerformanceAnalytics")
chart.Correlation(expVar, histogram = TRUE, pch=19)

plot of chunk unnamed-chunk-43

T-Test

Test whether the ticket prices of Mumbai to Delhi flights are more than INR 5000.

One sample t-test

# subset of data having only Bombay to Delhi flights
depBOMData <- subset(airline.df, DepartureCityCode == "BOM")

# one-sample t-test
t.test(depBOMData$Price, mu = 5000)

    One Sample t-test

data:  depBOMData$Price
t = 6.0784, df = 129, p-value = 1.277e-08
alternative hypothesis: true mean is not equal to 5000
95 percent confidence interval:
 5844.506 6659.601
sample estimates:
mean of x 
 6252.054 

The p-value of the test is 1.277e-08, which is less than the significance level alpha = 0.05. Here, we fail to reject our null hypothesis. We can conclude that the average ticket prices of Mumbai to Delhi flights are greater than INR 5000.

Test whether the ticket prices of morning flights are greater than the afternoon flights

Independent t-test

Visualizing departure using boxplot

# boxplot of departure
boxplot(Price ~ Departure, data = airline.df,
        horizontal = TRUE)

plot of chunk unnamed-chunk-46

Do the two populations have the same variances?

varTest1 <- var.test(Price ~ Departure, data = airline.df)
varTest1

    F test to compare two variances

data:  Price by Departure
F = 2.0941, num df = 168, denom df = 135, p-value = 1.074e-05
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.513842 2.880244
sample estimates:
ratio of variances 
          2.094081 
# independent t-test (AM-PM effect)
t.test(Price ~ Departure, data = airline.df, var.equal = FALSE, alternative = "greater")

    Welch Two Sample t-test

data:  Price by Departure
t = 1.736, df = 296.58, p-value = 0.0418
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 22.71262      Inf
sample estimates:
mean in group AM mean in group PM 
        5598.893         5140.610 

The p-value of the test is 0.04791, which is less than the significance level alpha = 0.05. We can conclude that the ticket prices of morning flights are greater than the afternoon flights.

Test whether the ticket prices around Diwali is more compared to non-Diwali ticket prices.

Independent t-Test

Visualizing Diwali using boxplot

# boxplot of Diwali
boxplot(Price ~ IsDiwali, data = airline.df,
        horizontal = TRUE)

plot of chunk unnamed-chunk-50

Do the two populations have the same variances?

varTest2 <- var.test(Price ~ IsDiwali, data = airline.df)
varTest2

    F test to compare two variances

data:  Price by IsDiwali
F = 0.87328, num df = 183, denom df = 120, p-value = 0.4069
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.626232 1.204170
sample estimates:
ratio of variances 
         0.8732808 
# independent t-test (Diwali effect)
t.test(Price ~ IsDiwali, data = airline.df, var.equal = TRUE, alternative = "less")

    Two Sample t-test

data:  Price by IsDiwali
t = -3.022, df = 303, p-value = 0.001363
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -378.5134
sample estimates:
mean in group 0 mean in group 1 
       5063.810        5897.479 

The p-value of the test is 0.001363, which is less than the significance level alpha = 0.05. We can conclude that airline charges more price on Diwali compared to non-Diwali

Test whether the ticket prices at weekend is more compared to weekdays ticket prices.

Independent t-Test

Visualizing Weekend prices using boxplot

# boxplot of Airfare versus Weekday/Weekend
boxplot(Price ~ IsWeekend, data = airline.df,
        horizontal = TRUE)

plot of chunk unnamed-chunk-54

Do the two populations have the same variances?

varTest2 <- var.test(Price ~ IsWeekend, data = airline.df)
varTest2

    F test to compare two variances

data:  Price by IsWeekend
F = 2.9781, num df = 263, denom df = 40, p-value = 9.145e-05
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.774635 4.596324
sample estimates:
ratio of variances 
          2.978065 
# independent t-test (Weekend effect)
t.test(Price ~ IsWeekend, data = airline.df, var.equal = FALSE, alternative = "greater")

    Welch Two Sample t-test

data:  Price by IsWeekend
t = 3.3951, df = 82.861, p-value = 0.0005276
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 470.5005      Inf
sample estimates:
 mean in group No mean in group Yes 
         5518.549          4596.073 

The p-value of the test is 0.0005276, which is less than the significance level alpha = 0.05. We can conclude that airline charges more price on weekdays compared to weekend.

Test whether the ticket prices on Air India flights are greater than IndiGo flights.

Independent t-test

Visualizing using boxplot

# subset of data having only Air India and IndiGo airline
subAirline <- subset(airline.df, Airline %in% c("Air India", "IndiGo"))
# boxplot of airline
boxplot(Price ~ Airline, data = subAirline,
        horizontal = TRUE)

plot of chunk unnamed-chunk-58

Do the two populations have the same variances?

# subset of data having only Air India and IndiGo airline
subAirline <- subset(airline.df, Airline %in% c("Air India", "IndiGo"))
varTest3 <- var.test(Price ~ Airline, data = subAirline)
varTest3

    F test to compare two variances

data:  Price by Airline
F = 0.83011, num df = 40, denom df = 79, p-value = 0.5233
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.4936707 1.4658591
sample estimates:
ratio of variances 
         0.8301099 
# subset of data having only Air India and IndiGo airline
subAirline <- subset(airline.df, Airline %in% c("Air India", "IndiGo"))
# dependent t-test (Airline effect)
t.test(subAirline$Price ~ Airline, data = subAirline, var.equal = TRUE, alternative = "greater")

    Two Sample t-test

data:  subAirline$Price by Airline
t = 2.6396, df = 119, p-value = 0.004705
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 541.4039      Inf
sample estimates:
mean in group Air India    mean in group IndiGo 
               6335.000                4879.525 

The p-value of the test is 0.004705, which is less than the significance level alpha = 0.05. We can conclude that the ticket prices on Air India are greater than IndiGo.