QUESTION 1

Q1a. Write R code to generate the correlation matrix for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}

airline=read.csv("AirlinePricingData.csv")
airlineselected=airline[,c("Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth")]
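# Note: cov() returns the covariance matrix (printed below), not correlations;
# cor(airlineselected) gives the correlation matrix that the question asks for.
matrix = cov(airlineselected)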
matrix
##                             Price AdvancedBookingDays FlyingMinutes
## Price               5703943.28833        -488.6618960  -204.7317192
## AdvancedBookingDays    -488.66190         497.0929034     0.5360224
## FlyingMinutes          -204.73172           0.5360224    22.1370794
## Capacity              -2009.91554          -6.3672239   -48.8078085
## SeatPitch               165.91627          -0.2946829    -0.1498598
## SeatWidth               -69.61279           0.5966782    -0.4212683
##                         Capacity   SeatPitch   SeatWidth
## Price               -2009.915541 165.9162748 -69.6127912
## AdvancedBookingDays    -6.367224  -0.2946829   0.5966782
## FlyingMinutes         -48.807808  -0.1498598  -0.4212683
## Capacity             1049.026467  15.2898835   7.1798857
## SeatPitch              15.289884   0.8685936   0.1456859
## SeatWidth               7.179886   0.1456859   0.2394305
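
The output above is on the covariance scale: the diagonal holds variances (for example, about 5703943 for Price) rather than 1s. The following is a minimal sketch of how a covariance matrix relates to the correlation matrix. It uses a synthetic data frame (`toy`, with invented values), not the contents of AirlinePricingData.csv.

```r
# Synthetic stand-in for the airline data (all values are made up).
set.seed(42)
toy <- data.frame(
  Price     = rnorm(50, mean = 5400, sd = 2400),
  Capacity  = rnorm(50, mean = 170,  sd = 32),
  SeatWidth = rnorm(50, mean = 17.5, sd = 0.5)
)

cov_mat  <- cov(toy)          # covariances; diagonal entries are variances
cor_mat  <- cor(toy)          # Pearson correlations; diagonal entries are 1
rescaled <- cov2cor(cov_mat)  # covariance matrix rescaled to correlations

# Both routes should give the same matrix.
all.equal(cor_mat, rescaled)
```

Applied to the real data, `cor(airlineselected)` or `cov2cor(matrix)` would be the corresponding calls.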

Q1b. Write R code to generate the correlation matrix, along with their significance values, for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}

library(psych)
matrix1=corr.test(airlineselected, use = "complete")
matrix1
## Call:corr.test(x = airlineselected, use = "complete")
## Correlation matrix 
##                     Price AdvancedBookingDays FlyingMinutes Capacity
## Price                1.00               -0.01         -0.02    -0.03
## AdvancedBookingDays -0.01                1.00          0.01    -0.01
## FlyingMinutes       -0.02                0.01          1.00    -0.32
## Capacity            -0.03               -0.01         -0.32     1.00
## SeatPitch            0.07               -0.01         -0.03     0.51
## SeatWidth           -0.06                0.05         -0.18     0.45
##                     SeatPitch SeatWidth
## Price                    0.07     -0.06
## AdvancedBookingDays     -0.01      0.05
## FlyingMinutes           -0.03     -0.18
## Capacity                 0.51      0.45
## SeatPitch                1.00      0.32
## SeatWidth                0.32      1.00
## Sample Size 
## [1] 305
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##                     Price AdvancedBookingDays FlyingMinutes Capacity
## Price                0.00                1.00          1.00        1
## AdvancedBookingDays  0.87                0.00          1.00        1
## FlyingMinutes        0.75                0.93          0.00        0
## Capacity             0.65                0.88          0.00        0
## SeatPitch            0.19                0.81          0.55        0
## SeatWidth            0.30                0.34          0.00        0
##                     SeatPitch SeatWidth
## Price                       1      1.00
## AdvancedBookingDays         1      1.00
## FlyingMinutes               1      0.01
## Capacity                    0      0.00
## SeatPitch                   0      0.00
## SeatWidth                   0      0.00
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
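
In the probability table above, entries below the diagonal are raw p-values and entries above it are adjusted for multiple tests (the default adjustment in `corr.test` is Holm's method). The sketch below builds a table with a similar layout in base R, using `cor.test()` and `p.adjust()` on synthetic data. The `toy` data frame and its column names are invented for illustration, and the code is not a reimplementation of psych's internals.

```r
# Synthetic data: three independent columns plus one correlated with `a`.
set.seed(7)
toy <- data.frame(a = rnorm(40), b = rnorm(40), c = rnorm(40))
toy$d <- toy$a + rnorm(40, sd = 0.5)

vars      <- names(toy)
pairs_idx <- t(combn(length(vars), 2))  # one row per unordered pair (i < j)

# Raw p-value of the Pearson correlation test for each pair.
raw_p <- apply(pairs_idx, 1, function(ij) {
  cor.test(toy[[ij[1]]], toy[[ij[2]]])$p.value
})
adj_p <- p.adjust(raw_p, method = "holm")  # Holm-adjusted p-values

# Adjusted values above the diagonal, raw values below, zeros on the diagonal.
p_mat <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
p_mat[pairs_idx] <- adj_p
p_mat[pairs_idx[, 2:1, drop = FALSE]] <- raw_p
round(p_mat, 2)
```

A Holm-adjusted p-value is never smaller than the raw one, which is why several raw values such as 0.19 or 0.30 in the output above become 1 after adjustment.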

Q1c. Write R code to visualize the correlation matrix in Q1b.

library(corrgram)
corrgram(airlineselected, order=TRUE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of Airlines")

Q1e. Write R code to generate the following corrgram. (Hint: This is a repeat of the previous question, where you had flexibility to create the corrgram of YOUR choice. Here, you will need to use package PerformanceAnalytics.)

library("PerformanceAnalytics")
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
chart.Correlation(airlineselected, histogram = TRUE, pch=19)

QUESTION 2

Q2a. Test whether the ticket prices of Mumbai to Delhi flights are more than INR 5000

mumbai=subset(airline, DepartureCityCode == "BOM", select = "Price")
t.test(mumbai$Price, mu=5000, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  mumbai$Price
## t = 6.0784, df = 129, p-value = 6.385e-09
## alternative hypothesis: true mean is greater than 5000
## 95 percent confidence interval:
##  5910.787      Inf
## sample estimates:
## mean of x 
##  6252.054

Thus, at the 5% level of significance, we reject the null hypothesis (p-value = 6.385e-09) and conclude that the mean ticket price of Mumbai to Delhi flights is more than INR 5000.
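
As a rough check, the p-value and the confidence bound reported by `t.test()` can be recovered from the printed summary numbers alone. The sketch below uses only values copied from the output above; the variable names are illustrative, and small differences from the printed figures are expected because the inputs are rounded.

```r
# Values copied from the one-sample t-test output above.
xbar   <- 6252.054  # sample mean of Price for BOM departures
mu0    <- 5000      # hypothesised mean
t_stat <- 6.0784    # reported t statistic
df     <- 129       # n - 1, so n = 130 flights

# t = (xbar - mu0) / se, so the implied standard error is:
se <- (xbar - mu0) / t_stat

# One-sided ("greater") p-value: upper tail area of the t distribution.
p_value <- pt(t_stat, df, lower.tail = FALSE)

# Lower bound of the one-sided 95% confidence interval.
lower <- xbar - qt(0.95, df) * se

c(se = se, p_value = p_value, lower = lower)
```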

Q2b. Test whether the ticket prices of morning flights are greater than the afternoon flights

airline$Time=factor(airline$Departure, levels = c("AM","PM"), labels = c("morning","afternoon"))
t.test(airline$Price~airline$Time, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  airline$Price by airline$Time
## t = 1.736, df = 296.58, p-value = 0.0418
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  22.71262      Inf
## sample estimates:
##   mean in group morning mean in group afternoon 
##                5598.893                5140.610

Thus, at the 5% level of significance, the mean price of morning flights is greater than that of afternoon flights (p-value = 0.0418).
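
The fractional degrees of freedom (296.58) arise because `t.test()` runs Welch's test by default, which does not assume equal variances. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `t.test()`. It uses synthetic price vectors (invented values), not the airline data.

```r
# Synthetic prices for two groups with unequal sizes and variances.
set.seed(11)
morning   <- rnorm(45, mean = 5600, sd = 2600)
afternoon <- rnorm(60, mean = 5100, sd = 2100)

# Squared standard error contributed by each group.
v1 <- var(morning) / length(morning)
v2 <- var(afternoon) / length(afternoon)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- (mean(morning) - mean(afternoon)) / sqrt(v1 + v2)
df     <- (v1 + v2)^2 /
  (v1^2 / (length(morning) - 1) + v2^2 / (length(afternoon) - 1))
p_val  <- pt(t_stat, df, lower.tail = FALSE)  # one-sided "greater"

# Reference result from the built-in test.
ref <- t.test(morning, afternoon, alternative = "greater")
c(by_hand = p_val, t_test = ref$p.value)
```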

Q2c. Test whether ticket prices around Diwali are higher than non-Diwali ticket prices.

t.test(airline$Price~airline$IsDiwali, alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  airline$Price by airline$IsDiwali
## t = -2.9799, df = 244.52, p-value = 0.001587
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -371.7482
## sample estimates:
## mean in group 0 mean in group 1 
##        5063.810        5897.479

Thus, at the 5% level of significance, the mean price around Diwali is greater than the mean non-Diwali price (p-value = 0.001587).
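
The test uses `alternative = "less"` even though the question asks whether Diwali prices are higher. With the formula interface, the difference is taken as the mean of the first group level (0, non-Diwali) minus the mean of the second (1, Diwali), so "Diwali is higher" corresponds to a difference below zero. The sketch below illustrates this equivalence on synthetic data (the `toy` data frame and its values are invented).

```r
# Synthetic data with a 0/1 indicator, mimicking the IsDiwali coding.
set.seed(3)
toy <- data.frame(
  IsDiwali = rep(c(0, 1), times = c(50, 40)),
  Price    = c(rnorm(50, 5000, 2000), rnorm(40, 5900, 2500))
)

# Formula interface: difference = mean(group 0) - mean(group 1).
by_formula <- t.test(Price ~ IsDiwali, data = toy, alternative = "less")

# Explicit vectors with Diwali prices first, so the alternative flips.
diwali     <- toy$Price[toy$IsDiwali == 1]
non_diwali <- toy$Price[toy$IsDiwali == 0]
by_vectors <- t.test(diwali, non_diwali, alternative = "greater")

c(formula = by_formula$p.value, vectors = by_vectors$p.value)
```

The two calls test the same hypothesis; only the sign of the t statistic differs.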

Q2d. Test whether the ticket prices on Air India flights are greater than IndiGo flights.

airline$flight=factor(airline$Airline, levels = c("Air India", "IndiGo"), labels = c("Air India", "IndiGo"))
t.test(airline$Price~airline$flight, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  airline$Price by airline$flight
## t = 2.7205, df = 87.71, p-value = 0.00393
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  566.0833      Inf
## sample estimates:
## mean in group Air India    mean in group IndiGo 
##                6335.000                4879.525

Thus, at the 5% level of significance, we reject the null hypothesis and conclude that the mean price of Air India flights is higher than that of IndiGo flights.
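
Restricting `levels` to two airlines in `factor()` turns every other airline into `NA`, and the formula interface of `t.test()` then drops those rows. The sketch below shows, on synthetic data, that this gives the same result as subsetting to the two airlines first. The airline labels "Other A" and "Other B" and all prices are invented for illustration.

```r
# Synthetic data with four airlines; only two are of interest.
set.seed(5)
toy <- data.frame(
  Airline = rep(c("Air India", "IndiGo", "Other A", "Other B"),
                times = c(20, 25, 30, 15)),
  Price   = rnorm(90, mean = 5400, sd = 2000),
  stringsAsFactors = FALSE
)

# Values not listed in `levels` become NA.
toy$flight <- factor(toy$Airline, levels = c("Air India", "IndiGo"))
table(toy$flight, useNA = "ifany")

# Route 1: rely on the NA rows being dropped by the formula interface.
via_factor <- t.test(Price ~ flight, data = toy, alternative = "greater")

# Route 2: subset explicitly to the two airlines first.
two        <- subset(toy, Airline %in% c("Air India", "IndiGo"))
via_subset <- t.test(Price ~ Airline, data = two, alternative = "greater")

c(factor = via_factor$p.value, subset = via_subset$p.value)
```

Explicit subsetting makes the exclusion of other airlines visible in the code, which may be easier to audit.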

QUESTION 3

Q3a. Run a simple linear regression of airline ticket Price on the Advanced Booking Days. Write R code to output the summary of the model.

m=lm(airline$Price~airline$AdvancedBookingDays)
summary(m)
## 
## Call:
## lm(formula = airline$Price ~ airline$AdvancedBookingDays)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2786.5 -1320.8  -688.9   351.2 12594.0 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 5422.959    224.497   24.16   <2e-16 ***
## airline$AdvancedBookingDays   -0.983      6.154   -0.16    0.873    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2392 on 303 degrees of freedom
## Multiple R-squared:  8.422e-05,  Adjusted R-squared:  -0.003216 
## F-statistic: 0.02552 on 1 and 303 DF,  p-value: 0.8732
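
The summary above is consistent with the correlation results in Q1b: in a simple linear regression, R-squared equals the squared Pearson correlation between the two variables, and the F statistic equals the square of the slope's t value (here 0.02552 is roughly (-0.16)^2, and the p-value of 0.873 matches the 0.87 reported by `corr.test`). The sketch below checks both identities on synthetic data. The `toy` data frame is invented, and the `data =` form of `lm()` is used as a common alternative to the `airline$` style above.

```r
# Synthetic data with a weak relationship, loosely mimicking the setting.
set.seed(9)
toy <- data.frame(AdvancedBookingDays = sample(1:60, 80, replace = TRUE))
toy$Price <- 5400 - 1 * toy$AdvancedBookingDays + rnorm(80, sd = 2400)

m <- lm(Price ~ AdvancedBookingDays, data = toy)
s <- summary(m)

r       <- cor(toy$Price, toy$AdvancedBookingDays)
t_slope <- s$coefficients["AdvancedBookingDays", "t value"]
f_value <- unname(s$fstatistic["value"])

# R-squared vs squared correlation, and F vs squared t.
c(r_squared = s$r.squared, r_sq_from_cor = r^2,
  F = f_value, t_sq = t_slope^2)
```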