Changing working directory
setwd("C:/Users/Makka/Desktop/term 5/dam")
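Reading the CSV and displaying table dimensions
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), matching the str() output below
airline.df <- read.csv("C:/Users/Makka/Desktop/term 5/dam/AirlinePricingData.csv", stringsAsFactors = TRUE)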
View(airline.df)
nrow(airline.df)
## [1] 305
ncol(airline.df)
## [1] 25
Inspecting the structure (attach() is left commented out)
#attach(airline.df)
str(airline.df)
## 'data.frame': 305 obs. of 25 variables:
## $ FlightNumber : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
## $ Airline : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
## $ DepartureCityCode : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
## $ ArrivalCityCode : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
## $ DepartureTime : int 225 300 350 455 555 605 635 640 645 700 ...
## $ ArrivalTime : int 435 505 605 710 805 815 850 855 855 915 ...
## $ Departure : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
## $ FlyingMinutes : int 130 125 135 135 130 130 135 135 130 135 ...
## $ Aircraft : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
## $ PlaneModel : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
## $ Capacity : int 156 156 189 180 189 156 189 180 189 138 ...
## $ SeatPitch : int 30 30 29 30 29 30 29 30 29 30 ...
## $ SeatWidth : num 17 17 17 18 17 17 17 18 17 17 ...
## $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
## $ DateDeparture : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ IsWeekend : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Price : int 4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
## $ AdvancedBookingDays: int 54 52 48 59 48 52 48 59 48 52 ...
## $ IsDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
## $ DayBeforeDiwali : int 1 1 1 1 1 1 1 1 1 1 ...
## $ DayAfterDiwali : int 0 0 0 0 0 0 0 0 0 0 ...
## $ MetroDeparture : int 1 1 1 1 1 1 1 1 1 1 ...
## $ MetroArrival : int 1 1 1 1 1 1 1 1 1 1 ...
## $ MarketShare : num 15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
## $ LoadFactor : num 83.3 83.3 94.1 87.2 94.1 ...
Q1a. Write R code to generate the correlation matrix for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}
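# Columns by position: 17 = Price, 18 = AdvancedBookingDays, 8 = FlyingMinutes, 11 = Capacity, 12 = SeatPitch, 13 = SeatWidth
# Note: the name `matrix` shadows base R's matrix() function; it is kept because later code refers to it
matrix<-airline.df[,c(17:18,8,11:13)]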
corrMatrix<-round(cor(matrix),2)
corrMatrix
## Price AdvancedBookingDays FlyingMinutes Capacity
## Price 1.00 -0.01 -0.02 -0.03
## AdvancedBookingDays -0.01 1.00 0.01 -0.01
## FlyingMinutes -0.02 0.01 1.00 -0.32
## Capacity -0.03 -0.01 -0.32 1.00
## SeatPitch 0.07 -0.01 -0.03 0.51
## SeatWidth -0.06 0.05 -0.18 0.45
## SeatPitch SeatWidth
## Price 0.07 -0.06
## AdvancedBookingDays -0.01 0.05
## FlyingMinutes -0.03 -0.18
## Capacity 0.51 0.45
## SeatPitch 1.00 0.32
## SeatWidth 0.32 1.00
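Positional indices such as `c(17:18,8,11:13)` silently select the wrong columns if the column order of the CSV ever changes. A minimal sketch of the same selection by column name follows. The data frame `toy.df` is a hypothetical stand-in holding only the first five values of each column as printed by `str()` above, so its correlations are not those of the full data.

```r
# Toy stand-in for airline.df: first five values of each column from the str() output
toy.df <- data.frame(
  Departure           = c("AM", "AM", "AM", "AM", "AM"),
  Price               = c(4051, 11587, 3977, 4234, 6837),
  AdvancedBookingDays = c(54, 52, 48, 59, 48),
  FlyingMinutes       = c(130, 125, 135, 135, 130),
  Capacity            = c(156, 156, 189, 180, 189),
  SeatPitch           = c(30, 30, 29, 30, 29),
  SeatWidth           = c(17, 17, 17, 18, 17)
)

# Select the continuous variables by name rather than by position
contVars <- c("Price", "AdvancedBookingDays", "FlyingMinutes",
              "Capacity", "SeatPitch", "SeatWidth")
toyCorr <- round(cor(toy.df[, contVars]), 2)
toyCorr
```

On the real data the equivalent call would be `cor(airline.df[, contVars])`.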
Q1b. Write R code to generate the correlation matrix, along with their significance values, for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
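# rcorr() expects the raw observations as a numeric matrix, not an already-computed correlation matrix.
# Intended call: rcorr(as.matrix(matrix), type="pearson")
# The output reproduced below comes from the call on corrMatrix, so it correlates the six rows of the
# rounded correlation matrix (note n= 6) rather than the 305 flights, and should not be interpreted.
rcorr(corrMatrix,type="pearson")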
## Price AdvancedBookingDays FlyingMinutes Capacity
## Price 1.00 -0.22 -0.10 -0.30
## AdvancedBookingDays -0.22 1.00 -0.07 -0.29
## FlyingMinutes -0.10 -0.07 1.00 -0.77
## Capacity -0.30 -0.29 -0.77 1.00
## SeatPitch -0.23 -0.41 -0.46 0.71
## SeatWidth -0.40 -0.21 -0.63 0.68
## SeatPitch SeatWidth
## Price -0.23 -0.40
## AdvancedBookingDays -0.41 -0.21
## FlyingMinutes -0.46 -0.63
## Capacity 0.71 0.68
## SeatPitch 1.00 0.44
## SeatWidth 0.44 1.00
##
## n= 6
##
##
## P
## Price AdvancedBookingDays FlyingMinutes Capacity
## Price 0.6731 0.8545 0.5618
## AdvancedBookingDays 0.6731 0.8991 0.5789
## FlyingMinutes 0.8545 0.8991 0.0754
## Capacity 0.5618 0.5789 0.0754
## SeatPitch 0.6636 0.4228 0.3645 0.1130
## SeatWidth 0.4305 0.6956 0.1832 0.1358
## SeatPitch SeatWidth
## Price 0.6636 0.4305
## AdvancedBookingDays 0.4228 0.6956
## FlyingMinutes 0.3645 0.1832
## Capacity 0.1130 0.1358
## SeatPitch 0.3778
## SeatWidth 0.3778
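Significance values for a correlation matrix have to be computed from the observations themselves, so the reported `n` should equal the number of rows in the data. As a hedged illustration that avoids extra packages, the sketch below swaps in base R's `cor.test()` for each pair of columns on simulated data. Both `sim.df` and the helper `corPValues` are hypothetical names, not part of the original analysis.

```r
set.seed(1)
# Simulated stand-in for the continuous columns (not the airline data)
sim.df <- data.frame(x = rnorm(50), y = rnorm(50))
sim.df$z <- sim.df$x + rnorm(50, sd = 0.5)   # z is built to correlate with x

# Hypothetical helper: matrix of two-sided Pearson p-values for every column pair
corPValues <- function(df) {
  k <- ncol(df)
  p <- matrix(NA_real_, k, k, dimnames = list(names(df), names(df)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      pij <- cor.test(df[[i]], df[[j]], method = "pearson")$p.value
      p[i, j] <- pij
      p[j, i] <- pij
    }
  }
  p
}

round(cor(sim.df), 2)          # correlations from the raw observations
signif(corPValues(sim.df), 3)  # matching p-values; diagonal left as NA
```

With Hmisc loaded, `rcorr(as.matrix(sim.df), type = "pearson")` reports the same kind of information in a single call.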
Q1c. Write R code to visualize the correlation matrix in Q1b.
library(corrgram)
##
## Attaching package: 'corrgram'
## The following object is masked from 'package:lattice':
##
## panel.fill
corrgram(matrix, order=TRUE, lower.panel=panel.conf, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram of Airline data")

Q2a. Test whether the ticket prices of Mumbai to Delhi flights are more than INR 5000.
print("Null Hypothesis: Ticket prices of Mumbai to Delhi flights are not more than INR 5000")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi flights are not more than INR 5000"
oneSampleTest <- t.test(matrix$Price,mu = 5000, alternative = "greater")
oneSampleTest
##
## One Sample t-test
##
## data: matrix$Price
## t = 2.8851, df = 304, p-value = 0.002096
## alternative hypothesis: true mean is greater than 5000
## 95 percent confidence interval:
## 5168.918 Inf
## sample estimates:
## mean of x
## 5394.544
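Result: The p-value (0.002096) is below 0.05, so at the 5% significance level we reject the null hypothesis that the mean ticket price is not more than INR 5000. Note that the test as written uses all 305 flights (df = 304), which include both BOM and DEL departures, rather than Mumbai to Delhi flights only.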
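Because the data contain both `BOM` and `DEL` departures, a test restricted to Mumbai to Delhi flights needs a filter on `DepartureCityCode` first. A minimal sketch on simulated prices is shown below. The data frame `sim.df` and its values are invented; only the column names and the `mu = 5000` threshold come from the analysis above.

```r
set.seed(42)
# Simulated stand-in: 40 departures from each city (not the airline data)
sim.df <- data.frame(
  DepartureCityCode = rep(c("BOM", "DEL"), each = 40),
  Price = c(rnorm(40, mean = 5600, sd = 900), rnorm(40, mean = 5000, sd = 900))
)

# Keep only flights departing from Mumbai before testing
bomPrices <- sim.df$Price[sim.df$DepartureCityCode == "BOM"]
bomTest <- t.test(bomPrices, mu = 5000, alternative = "greater")

bomTest$parameter  # degrees of freedom = number of BOM flights - 1
bomTest$p.value    # one-sided p-value for mean price > 5000
```

On the real data the filter would be `airline.df$Price[airline.df$DepartureCityCode == "BOM"]`, and the degrees of freedom in the output give a quick check that the subset has the expected size.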
Q2b. Test whether the ticket prices of morning flights are greater than the afternoon flights
print("Null Hypothesis: Ticket prices of Mumbai to Delhi morning flights are not greater than afternoon flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi morning flights are not greater than afternoon flights"
library(MASS)
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
boxplot(airline.df$Price~airline.df$Departure,main="Ticket prices from Mumbai to Delhi",xlab="Time of the day")

logTransPrices = log(airline.df$Price)
# One-sided Welch two-sample t-test (independent groups, not a paired test) on log prices: AM vs PM
pairedTwoSample<-t.test(logTransPrices~airline.df$Departure,alternative="greater")
pairedTwoSample
##
## Welch Two Sample t-test
##
## data: logTransPrices by airline.df$Departure
## t = 1.287, df = 302.99, p-value = 0.09955
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.01462919 Inf
## sample estimates:
## mean in group AM mean in group PM
## 8.544260 8.492387
Result: The p-value (0.09955) is above 0.05, so at the 5% significance level we fail to reject the null hypothesis that ticket prices of Mumbai to Delhi morning flights are not greater than afternoon flights.
Q2c. Test whether ticket prices around Diwali are higher than non-Diwali ticket prices.
print("Null Hypothesis: Ticket prices of Mumbai to Delhi Diwali flights are not greater than Non Diwali flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi Diwali flights are not greater than Non Diwali flights"
library(MASS)
library(psych)
boxplot(airline.df$Price~airline.df$IsDiwali,main="Ticket prices from Mumbai to Delhi",xlab="Diwali & Non diwali Days")

logTransPrices = log(airline.df$Price)
diwaliPairedSample<-t.test(logTransPrices~airline.df$IsDiwali,data = airline.df,alternative="greater")
diwaliPairedSample
##
## Welch Two Sample t-test
##
## data: logTransPrices by airline.df$IsDiwali
## t = -3.6909, df = 252.4, p-value = 0.9999
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.2208012 Inf
## sample estimates:
## mean in group 0 mean in group 1
## 8.460605 8.613167
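Result: Because IsDiwali is coded 0/1, t.test() compares group 0 (non-Diwali) minus group 1 (Diwali), so alternative="greater" as written tests whether non-Diwali prices are higher; that is why the p-value is 0.9999. The negative statistic (t = -3.6909) shows that the mean log price is higher in the Diwali group (8.613167 vs 8.460605), and the one-sided p-value in the intended direction (alternative="less") is about 1 - 0.9999 = 0.0001. At the 5% significance level we therefore reject the null hypothesis that ticket prices of Mumbai to Delhi Diwali flights are not greater than non-Diwali flights.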
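The direction of a one-sided two-sample test depends on the order of the group levels: with the formula interface, `t.test()` estimates the mean of the first level minus the mean of the second. The sketch below illustrates this on simulated log prices, where `sim.df` is invented and group 1 is deliberately given the higher mean.

```r
set.seed(7)
# Simulated log prices; group 1 is built to have the higher mean (not the airline data)
sim.df <- data.frame(
  IsDiwali = rep(c(0, 1), each = 60),
  logPrice = c(rnorm(60, mean = 8.4, sd = 0.3), rnorm(60, mean = 8.8, sd = 0.3))
)

# The formula interface estimates mean(group 0) - mean(group 1)
pGreater <- t.test(logPrice ~ IsDiwali, data = sim.df, alternative = "greater")$p.value
pLess    <- t.test(logPrice ~ IsDiwali, data = sim.df, alternative = "less")$p.value

pGreater          # large: no evidence that group 0 exceeds group 1
pLess             # small: evidence that group 1 exceeds group 0
pGreater + pLess  # the two one-sided p-values sum to 1
```

An alternative is to reorder the factor levels so that the group expected to be larger comes first, which lets `alternative = "greater"` read naturally.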
Q2d. Test whether the ticket prices on Air India flights are greater than IndiGo flights
print("Null Hypothesis: Ticket prices of Mumbai to Delhi flights by Air India are not greater than Indigo flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi flights by Air India are not greater than Indigo flights"
library(MASS)
library(psych)
newAirlinesDF <- subset(airline.df, airline.df$Airline %in% c("Air India", "IndiGo"))
newAirlinesDF$Airline<-droplevels(newAirlinesDF$Airline)
boxplot(newAirlinesDF$Price~newAirlinesDF$Airline,main="Ticket prices from Mumbai to Delhi",xlab="Airline")

logTransPrices = log(newAirlinesDF$Price)
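# One-sided Welch two-sample t-test on the Air India / IndiGo subset (first level minus second level)
airlinePairedSample<-t.test(logTransPrices~newAirlinesDF$Airline,data = newAirlinesDF,alternative="greater")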
airlinePairedSample
##
## Welch Two Sample t-test
##
## data: logTransPrices by newAirlinesDF$Airline
## t = 4.4266, df = 97.301, p-value = 1.252e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.1885682 Inf
## sample estimates:
## mean in group Air India mean in group IndiGo
## 8.691907 8.390123
Result: The p-value (1.252e-05) is below 0.05, so at the 5% significance level we reject the null hypothesis that ticket prices of Mumbai to Delhi flights by Air India are not greater than IndiGo flights.
Q3a. Run a simple linear regression of airline ticket Price on the Advanced Booking Days. Write R code to output the summary of the model.
priceAdvancedBooking <- lm(airline.df$Price~airline.df$AdvancedBookingDays,data = airline.df)
summary(priceAdvancedBooking)
##
## Call:
## lm(formula = airline.df$Price ~ airline.df$AdvancedBookingDays,
## data = airline.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2786.5 -1320.8 -688.9 351.2 12594.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5422.959 224.497 24.16 <2e-16 ***
## airline.df$AdvancedBookingDays -0.983 6.154 -0.16 0.873
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2392 on 303 degrees of freedom
## Multiple R-squared: 8.422e-05, Adjusted R-squared: -0.003216
## F-statistic: 0.02552 on 1 and 303 DF, p-value: 0.8732
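Writing the formula with bare column names and passing the data frame through `data =` keeps the coefficient labels short and lets `predict()` accept new data. The sketch below shows this form on simulated values. The data frame `sim.df` is invented, and price is generated independently of booking days to mirror the weak relationship reported above, not to reproduce the airline data.

```r
set.seed(3)
# Simulated stand-in (not the airline data)
sim.df <- data.frame(AdvancedBookingDays = sample(1:60, 100, replace = TRUE))
sim.df$Price <- 5400 + rnorm(100, sd = 2400)  # generated independently of booking days

# Bare column names in the formula; the data frame is supplied once via data =
fit <- lm(Price ~ AdvancedBookingDays, data = sim.df)

coef(summary(fit))      # estimates, standard errors, t values, p-values
summary(fit)$r.squared  # share of price variance explained by the model
newPred <- predict(fit, newdata = data.frame(AdvancedBookingDays = c(7, 30)))
newPred                 # fitted prices for bookings 7 and 30 days ahead
```

On the real data the equivalent call would be `lm(Price ~ AdvancedBookingDays, data = airline.df)`, which fits the same model as the call above with cleaner labels in the summary.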