Changing working directory

setwd("C:/Users/Makka/Desktop/term 5/dam")

Reading csv and displaying table stats

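# stringsAsFactors = TRUE keeps text columns as factors on R >= 4.0, matching the str() output below
airline.df = read.csv("C:/Users/Makka/Desktop/term 5/dam/AirlinePricingData.csv",sep = ",",stringsAsFactors = TRUE)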
View(airline.df)
nrow(airline.df)
## [1] 305
ncol(airline.df)
## [1] 25

Attaching (left commented out) and inspecting the structure

#attach(airline.df)
str(airline.df)
## 'data.frame':    305 obs. of  25 variables:
##  $ FlightNumber       : Factor w/ 63 levels "6E 129","6E 155",..: 25 32 62 4 61 45 57 16 59 17 ...
##  $ Airline            : Factor w/ 4 levels "Air India","IndiGo",..: 3 3 4 2 4 3 4 2 4 3 ...
##  $ DepartureCityCode  : Factor w/ 2 levels "BOM","DEL": 2 1 2 2 1 1 2 2 1 1 ...
##  $ ArrivalCityCode    : Factor w/ 2 levels "BOM","DEL": 1 2 1 1 2 2 1 1 2 2 ...
##  $ DepartureTime      : int  225 300 350 455 555 605 635 640 645 700 ...
##  $ ArrivalTime        : int  435 505 605 710 805 815 850 855 855 915 ...
##  $ Departure          : Factor w/ 2 levels "AM","PM": 1 1 1 1 1 1 1 1 1 1 ...
##  $ FlyingMinutes      : int  130 125 135 135 130 130 135 135 130 135 ...
##  $ Aircraft           : Factor w/ 2 levels "Airbus","Boeing": 2 2 2 1 2 2 2 1 2 2 ...
##  $ PlaneModel         : Factor w/ 9 levels "738","739","77W",..: 1 1 1 6 1 1 1 6 1 2 ...
##  $ Capacity           : int  156 156 189 180 189 156 189 180 189 138 ...
##  $ SeatPitch          : int  30 30 29 30 29 30 29 30 29 30 ...
##  $ SeatWidth          : num  17 17 17 18 17 17 17 18 17 17 ...
##  $ DataCollectionDate : Factor w/ 7 levels "Sep 10 2018",..: 2 4 6 7 6 4 6 7 6 4 ...
##  $ DateDeparture      : Factor w/ 20 levels "Nov 6 2018","Nov 8 2018",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ IsWeekend          : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Price              : int  4051 11587 3977 4234 6837 6518 3189 4234 8623 6833 ...
##  $ AdvancedBookingDays: int  54 52 48 59 48 52 48 59 48 52 ...
##  $ IsDiwali           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DayBeforeDiwali    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DayAfterDiwali     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ MetroDeparture     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ MetroArrival       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ MarketShare        : num  15.4 15.4 13.2 39.6 13.2 15.4 13.2 39.6 13.2 15.4 ...
##  $ LoadFactor         : num  83.3 83.3 94.1 87.2 94.1 ...

Q1a. Write R code to generate the correlation matrix for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}

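# Select the six continuous variables by name (same columns as indices 17:18, 8, 11:13)
matrix<-airline.df[,c("Price","AdvancedBookingDays","FlyingMinutes","Capacity","SeatPitch","SeatWidth")]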
corrMatrix<-round(cor(matrix),2)
corrMatrix
##                     Price AdvancedBookingDays FlyingMinutes Capacity
## Price                1.00               -0.01         -0.02    -0.03
## AdvancedBookingDays -0.01                1.00          0.01    -0.01
## FlyingMinutes       -0.02                0.01          1.00    -0.32
## Capacity            -0.03               -0.01         -0.32     1.00
## SeatPitch            0.07               -0.01         -0.03     0.51
## SeatWidth           -0.06                0.05         -0.18     0.45
##                     SeatPitch SeatWidth
## Price                    0.07     -0.06
## AdvancedBookingDays     -0.01      0.05
## FlyingMinutes           -0.03     -0.18
## Capacity                 0.51      0.45
## SeatPitch                1.00      0.32
## SeatWidth                0.32      1.00

Q1b. Write R code to generate the correlation matrix, along with their significance values, for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}

library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
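# rcorr() needs the raw observations as a numeric matrix, not a correlation matrix:
#   rcorr(as.matrix(matrix),type="pearson")
# The call below passes the 6x6 corrMatrix instead, so the output that follows
# treats its 6 rows as observations (n= 6) and does not reflect the 305 flights.
rcorr(corrMatrix,type="pearson")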
##                     Price AdvancedBookingDays FlyingMinutes Capacity
## Price                1.00               -0.22         -0.10    -0.30
## AdvancedBookingDays -0.22                1.00         -0.07    -0.29
## FlyingMinutes       -0.10               -0.07          1.00    -0.77
## Capacity            -0.30               -0.29         -0.77     1.00
## SeatPitch           -0.23               -0.41         -0.46     0.71
## SeatWidth           -0.40               -0.21         -0.63     0.68
##                     SeatPitch SeatWidth
## Price                   -0.23     -0.40
## AdvancedBookingDays     -0.41     -0.21
## FlyingMinutes           -0.46     -0.63
## Capacity                 0.71      0.68
## SeatPitch                1.00      0.44
## SeatWidth                0.44      1.00
## 
## n= 6 
## 
## 
## P
##                     Price  AdvancedBookingDays FlyingMinutes Capacity
## Price                      0.6731              0.8545        0.5618  
## AdvancedBookingDays 0.6731                     0.8991        0.5789  
## FlyingMinutes       0.8545 0.8991                            0.0754  
## Capacity            0.5618 0.5789              0.0754                
## SeatPitch           0.6636 0.4228              0.3645        0.1130  
## SeatWidth           0.4305 0.6956              0.1832        0.1358  
##                     SeatPitch SeatWidth
## Price               0.6636    0.4305   
## AdvancedBookingDays 0.4228    0.6956   
## FlyingMinutes       0.3645    0.1832   
## Capacity            0.1130    0.1358   
## SeatPitch                     0.3778   
## SeatWidth           0.3778
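
Significance values for a correlation matrix have to be computed from the raw observations. With Hmisc that would be rcorr(as.matrix(matrix)); the sketch below shows the same idea in base R with cor.test(), so it runs without extra packages. The data frame toy and its values are simulated stand-ins for the airline columns, not the real data.

```r
# Simulated stand-in for three of the airline columns (hypothetical values).
set.seed(1)
toy <- data.frame(
  Price    = rnorm(50, mean = 5000, sd = 1000),
  Capacity = rnorm(50, mean = 170, sd = 15)
)
# SeatPitch is built to depend on Capacity so one pair is clearly correlated.
toy$SeatPitch <- 0.05 * toy$Capacity + rnorm(50, mean = 21, sd = 0.5)

# Pairwise Pearson correlations computed on the raw observations.
r <- cor(toy)

# Matching matrix of two-sided p-values, one cor.test() per pair of columns.
vars <- names(toy)
p <- matrix(NA_real_, nrow = length(vars), ncol = length(vars),
            dimnames = list(vars, vars))
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) p[i, j] <- cor.test(toy[[i]], toy[[j]])$p.value
  }
}

round(r, 2)
round(p, 4)
```

The diagonal of p is left as NA because a variable's correlation with itself is not tested, which mirrors the blank diagonal in the rcorr() output.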

Q1c. Write R code to visualize the correlation matrix in Q1b.

library(corrgram)
## 
## Attaching package: 'corrgram'
## The following object is masked from 'package:lattice':
## 
##     panel.fill
corrgram(matrix, order=TRUE, lower.panel=panel.conf, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram of Airline data")

Q1e. Write R code to generate the following corrgram. (Hint: This is a repeat of the previous question, where you had flexibility to create the corrgram of YOUR choice. Here, you will need to use package PerformanceAnalytics.)

library("PerformanceAnalytics")
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
chart.Correlation(matrix, histogram = TRUE, pch=19)

Q2a. Test whether the ticket prices of Mumbai to Delhi flights are more than INR 5000.

print("Null Hypothesis: Ticket prices of Mumbai to Delhi flights are not more than INR 5000")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi flights are not more than INR 5000"
oneSampleTest <- t.test(matrix$Price,mu = 5000, alternative = "greater")
oneSampleTest
## 
##  One Sample t-test
## 
## data:  matrix$Price
## t = 2.8851, df = 304, p-value = 0.002096
## alternative hypothesis: true mean is greater than 5000
## 95 percent confidence interval:
##  5168.918      Inf
## sample estimates:
## mean of x 
##  5394.544

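Result: The p-value (0.0021) is below 0.05, so we reject the null hypothesis at the 5% significance level and conclude that the mean ticket price exceeds INR 5000 (sample mean 5394.54). Note that matrix$Price holds all 305 fares, covering both BOM to DEL and DEL to BOM departures, so this conclusion applies to the route in both directions rather than to Mumbai to Delhi flights alone.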
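
The question asks about Mumbai to Delhi flights specifically, which would mean restricting the sample to departures from BOM before running the test. A minimal sketch of that step follows; toy is a simulated stand-in for airline.df, and bomToDel and bomTest are illustrative names.

```r
# Simulated stand-in for airline.df with only the two columns used here.
set.seed(2)
toy <- data.frame(
  DepartureCityCode = factor(rep(c("BOM", "DEL"), each = 40)),
  Price = c(rnorm(40, mean = 5600, sd = 900),
            rnorm(40, mean = 5200, sd = 900))
)

# Keep only flights departing Mumbai (BOM), i.e. Mumbai to Delhi.
bomToDel <- subset(toy, DepartureCityCode == "BOM")

# H0: mean price <= 5000; H1: mean price > 5000.
bomTest <- t.test(bomToDel$Price, mu = 5000, alternative = "greater")

bomTest$parameter  # degrees of freedom = number of BOM rows - 1
bomTest$p.value
```

On the real data the same subset() call on airline.df would reduce the degrees of freedom below 304, which is a quick check that the filter was applied.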

Q2b. Test whether the ticket prices of morning flights are greater than the afternoon flights

print("Null Hypothesis: Ticket prices of Mumbai to Delhi morning flights are not greater than afternoon flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi morning flights are not greater than afternoon flights"
library(MASS)
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
## 
##     describe
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
boxplot(airline.df$Price~airline.df$Departure,main="Ticket prices from Mumbai to Delhi",xlab="Time of the day")

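# Log-transform the prices (presumably to reduce right skew) before testing
logTransPrices = log(airline.df$Price)
# Despite the variable name this is an unpaired Welch two-sample test;
# AM is the first factor level, so "greater" tests AM > PM
pairedTwoSample<-t.test(logTransPrices~airline.df$Departure,data = airline.df,alternative="greater")
pairedTwoSample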
## 
##  Welch Two Sample t-test
## 
## data:  logTransPrices by airline.df$Departure
## t = 1.287, df = 302.99, p-value = 0.09955
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.01462919         Inf
## sample estimates:
## mean in group AM mean in group PM 
##         8.544260         8.492387

Result: The p-value (0.0996) is above 0.05, so at the 5% significance level we fail to reject the null hypothesis; the data do not show that morning (AM) fares are higher than afternoon (PM) fares. The test compares mean log prices, 8.544 for AM against 8.492 for PM.

Q2c. Test whether ticket prices around Diwali are higher than non-Diwali ticket prices.

print("Null Hypothesis: Ticket prices of Mumbai to Delhi Diwali flights are not greater than Non Diwali flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi Diwali flights are not greater than Non Diwali flights"
library(MASS)
library(psych)
boxplot(airline.df$Price~airline.df$IsDiwali,main="Ticket prices from Mumbai to Delhi",xlab="Diwali & Non diwali Days")

logTransPrices = log(airline.df$Price)
diwaliPairedSample<-t.test(logTransPrices~airline.df$IsDiwali,data = airline.df,alternative="greater")
diwaliPairedSample
## 
##  Welch Two Sample t-test
## 
## data:  logTransPrices by airline.df$IsDiwali
## t = -3.6909, df = 252.4, p-value = 0.9999
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.2208012        Inf
## sample estimates:
## mean in group 0 mean in group 1 
##        8.460605        8.613167

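Result: t.test() orders the groups as 0 (non-Diwali) then 1 (Diwali), so alternative="greater" tests whether non-Diwali prices are higher, which is the reverse of the question; that is why the p-value is 0.9999. For the intended alternative (Diwali prices higher, alternative="less") the same t = -3.6909 gives a one-sided p-value of about 0.0001, well below 0.05. We therefore reject the null hypothesis that ticket prices of Mumbai to Delhi Diwali flights are not greater than non-Diwali flights; the mean log price is 8.613 around Diwali against 8.461 otherwise.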
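
With the formula interface, the direction of a one-sided t-test depends on the order of the group levels: the estimate is the mean of the first group minus the mean of the second. The sketch below, on simulated data with hypothetical names (toy, logPrice), shows that with groups 0 and 1 the alternative "Diwali prices are higher" corresponds to alternative = "less", and that the two one-sided p-values are complements.

```r
# Simulated log prices: group 1 (Diwali) is given a higher mean than group 0.
set.seed(3)
toy <- data.frame(
  IsDiwali = rep(c(0, 1), each = 30),
  logPrice = c(rnorm(30, mean = 8.45, sd = 0.3),
               rnorm(30, mean = 8.60, sd = 0.3))
)

# Groups are ordered 0 then 1, so the tested difference is mean(0) - mean(1).
# "less" therefore asks: is the non-Diwali mean lower than the Diwali mean?
lessTest    <- t.test(logPrice ~ IsDiwali, data = toy, alternative = "less")
greaterTest <- t.test(logPrice ~ IsDiwali, data = toy, alternative = "greater")

# Same t statistic, complementary one-sided p-values.
lessTest$statistic
lessTest$p.value + greaterTest$p.value
```

An alternative design is to relevel the grouping factor so that the Diwali group comes first, after which alternative = "greater" reads in the natural direction.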

Q2d. Test whether the ticket prices on Air India flights are greater than IndiGo flights

print("Null Hypothesis: Ticket prices of Mumbai to Delhi flights by Air India are not greater than Indigo flights")
## [1] "Null Hypothesis: Ticket prices of Mumbai to Delhi flights by Air India are not greater than Indigo flights"
library(MASS)
library(psych)
newAirlinesDF <- subset(airline.df, airline.df$Airline %in% c("Air India", "IndiGo"))
newAirlinesDF$Airline<-droplevels(newAirlinesDF$Airline)
boxplot(newAirlinesDF$Price~newAirlinesDF$Airline,main="Ticket prices from Mumbai to Delhi",xlab="Airline")

logTransPrices = log(newAirlinesDF$Price)
# Air India is the first level after droplevels(), so "greater" tests Air India > IndiGo
airlinePairedSample<-t.test(logTransPrices~newAirlinesDF$Airline,data = newAirlinesDF,alternative="greater")
airlinePairedSample
## 
##  Welch Two Sample t-test
## 
## data:  logTransPrices by newAirlinesDF$Airline
## t = 4.4266, df = 97.301, p-value = 1.252e-05
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.1885682       Inf
## sample estimates:
## mean in group Air India    mean in group IndiGo 
##                8.691907                8.390123

Result: The p-value (1.252e-05) is below 0.05, so we reject the null hypothesis at the 5% significance level and conclude that Air India ticket prices are higher than IndiGo ticket prices (mean log price 8.692 against 8.390).

Q3a. Run a simple linear regression of airline ticket Price on the Advanced Booking Days. Write R code to output the summary of the model.

priceAdvancedBooking <- lm(airline.df$Price~airline.df$AdvancedBookingDays,data = airline.df)
summary(priceAdvancedBooking)
## 
## Call:
## lm(formula = airline.df$Price ~ airline.df$AdvancedBookingDays, 
##     data = airline.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2786.5 -1320.8  -688.9   351.2 12594.0 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    5422.959    224.497   24.16   <2e-16 ***
## airline.df$AdvancedBookingDays   -0.983      6.154   -0.16    0.873    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2392 on 303 degrees of freedom
## Multiple R-squared:  8.422e-05,  Adjusted R-squared:  -0.003216 
## F-statistic: 0.02552 on 1 and 303 DF,  p-value: 0.8732
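
In the summary above the slope on AdvancedBookingDays is -0.983 with a p-value of 0.873 and an R-squared of 8.422e-05, so this model finds no significant linear relationship between booking lead time and price. The sketch below shows a more compact way to write the same kind of model, using bare column names with data = so the coefficient labels stay short, and how to pull the slope and its p-value out of the summary. The data frame toy is simulated and fit is an illustrative name.

```r
# Simulated data in which Price does not depend on AdvancedBookingDays.
set.seed(4)
toy <- data.frame(AdvancedBookingDays = sample(1:60, 100, replace = TRUE))
toy$Price <- 5400 + rnorm(100, sd = 2000)

# Bare column names are resolved inside `data`, so no toy$ prefix is needed.
fit <- lm(Price ~ AdvancedBookingDays, data = toy)

# Coefficient table: one row per term, columns Estimate, Std. Error, t value, Pr(>|t|).
coefTable <- coef(summary(fit))
coefTable["AdvancedBookingDays", c("Estimate", "Pr(>|t|)")]

# Share of price variance explained by the model.
summary(fit)$r.squared
```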