Q1a. Write R code to generate the correlation matrix for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}
airline=read.csv("AirlinePricingData.csv")
airlineselected=airline[,c("Price", "AdvancedBookingDays", "FlyingMinutes", "Capacity", "SeatPitch", "SeatWidth")]
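corMatrix = cor(airlineselected)   # correlation matrix asked for in Q1a
# The output printed below comes from cov(), i.e. it is the covariance matrix
covMatrix = cov(airlineselected)
covMatrix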
## Price AdvancedBookingDays FlyingMinutes
## Price 5703943.28833 -488.6618960 -204.7317192
## AdvancedBookingDays -488.66190 497.0929034 0.5360224
## FlyingMinutes -204.73172 0.5360224 22.1370794
## Capacity -2009.91554 -6.3672239 -48.8078085
## SeatPitch 165.91627 -0.2946829 -0.1498598
## SeatWidth -69.61279 0.5966782 -0.4212683
## Capacity SeatPitch SeatWidth
## Price -2009.915541 165.9162748 -69.6127912
## AdvancedBookingDays -6.367224 -0.2946829 0.5966782
## FlyingMinutes -48.807808 -0.1498598 -0.4212683
## Capacity 1049.026467 15.2898835 7.1798857
## SeatPitch 15.289884 0.8685936 0.1456859
## SeatWidth 7.179886 0.1456859 0.2394305
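Because the matrix printed above holds covariances, the correlations Q1a asks for can also be recovered from it with base R's `cov2cor()`, which divides each entry by the product of the two standard deviations, r_ij = s_ij / sqrt(s_ii * s_jj). The sketch below is self-contained under one assumption: instead of reading `AirlinePricingData.csv`, it retypes the covariances printed above (so precision is limited to the printed digits) and mirrors the lower triangle to make the matrix exactly symmetric. The rounded results should line up with the correlations `corr.test()` reports in Q1b, e.g. 0.51 for Capacity and SeatPitch.

```r
# Variable names in the order used in Q1a
vars <- c("Price", "AdvancedBookingDays", "FlyingMinutes",
          "Capacity", "SeatPitch", "SeatWidth")

# Lower triangle (with diagonal) of the covariance matrix printed above,
# column by column; values retyped from the printout
lower <- c(5703943.28833, -488.66190, -204.73172, -2009.91554, 165.91627, -69.61279,
           497.0929034, 0.5360224, -6.3672239, -0.2946829, 0.5966782,
           22.1370794, -48.8078085, -0.1498598, -0.4212683,
           1049.026467, 15.289884, 7.179886,
           0.8685936, 0.1456859,
           0.2394305)

S <- matrix(0, nrow = 6, ncol = 6, dimnames = list(vars, vars))
S[lower.tri(S, diag = TRUE)] <- lower   # fill lower triangle and diagonal
S[upper.tri(S)] <- t(S)[upper.tri(S)]   # mirror so that S is symmetric

# cov2cor() scales each covariance by the two standard deviations
corMat <- cov2cor(S)
round(corMat, 2)
```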
Q1b. Write R code to generate the correlation matrix, along with the significance values of the correlations, for the given continuous variables {“Price”, “AdvancedBookingDays”, “FlyingMinutes”, “Capacity”, “SeatPitch”, “SeatWidth”}
library(psych)
matrix1=corr.test(airlineselected, use = "complete")
matrix1
## Call:corr.test(x = airlineselected, use = "complete")
## Correlation matrix
## Price AdvancedBookingDays FlyingMinutes Capacity
## Price 1.00 -0.01 -0.02 -0.03
## AdvancedBookingDays -0.01 1.00 0.01 -0.01
## FlyingMinutes -0.02 0.01 1.00 -0.32
## Capacity -0.03 -0.01 -0.32 1.00
## SeatPitch 0.07 -0.01 -0.03 0.51
## SeatWidth -0.06 0.05 -0.18 0.45
## SeatPitch SeatWidth
## Price 0.07 -0.06
## AdvancedBookingDays -0.01 0.05
## FlyingMinutes -0.03 -0.18
## Capacity 0.51 0.45
## SeatPitch 1.00 0.32
## SeatWidth 0.32 1.00
## Sample Size
## [1] 305
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## Price AdvancedBookingDays FlyingMinutes Capacity
## Price 0.00 1.00 1.00 1
## AdvancedBookingDays 0.87 0.00 1.00 1
## FlyingMinutes 0.75 0.93 0.00 0
## Capacity 0.65 0.88 0.00 0
## SeatPitch 0.19 0.81 0.55 0
## SeatWidth 0.30 0.34 0.00 0
## SeatPitch SeatWidth
## Price 1 1.00
## AdvancedBookingDays 1 1.00
## FlyingMinutes 1 0.01
## Capacity 0 0.00
## SeatPitch 0 0.00
## SeatWidth 0 0.00
##
## To see confidence intervals of the correlations, print with the short=FALSE option
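The unadjusted p-values below the diagonal come from the usual Pearson test: under the null hypothesis of zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom (here n = 305, so 303 df); the entries above the diagonal are the same p-values after `corr.test()`'s default Holm adjustment (`adjust = "holm"`). The sketch below checks the formula against base R's `cor.test()`. Since the airline CSV is not bundled here, it uses the built-in `mtcars` data (mpg vs. wt) purely as a stand-in.

```r
# Built-in mtcars data used as a stand-in for the airline variables
x <- mtcars$mpg
y <- mtcars$wt
n <- length(x)

r <- cor(x, y)                                # Pearson correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t statistic with n - 2 df
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value

ct <- cor.test(x, y)  # base R Pearson test, two-sided by default
c(r = r, t = t_stat, p_manual = p_manual, p_cor.test = ct$p.value)
```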
Q1c. Write R code to visualize the correlation matrix in Q1b.
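library(corrgram)
# Lower panels: correlation with its confidence interval; upper panels: pie charts;
# order=TRUE reorders the variables using principal-component analysis
corrgram(airlineselected, order=TRUE, lower.panel=panel.conf,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of Airlines")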
Q1e. Write R code to generate the following corrgram. (Hint: This is a repeat of the previous question, where you had flexibility to create the corrgram of YOUR choice. Here, you will need to use package PerformanceAnalytics.)
library("PerformanceAnalytics")
chart.Correlation(airlineselected, histogram = TRUE, pch=19)
Q2a. Test whether the mean ticket price of Mumbai to Delhi flights is more than INR 5000.
mumbai=subset(airline, DepartureCityCode == "BOM", select = "Price")
t.test(mumbai$Price, mu=5000, alternative = "greater")
##
## One Sample t-test
##
## data: mumbai$Price
## t = 6.0784, df = 129, p-value = 6.385e-09
## alternative hypothesis: true mean is greater than 5000
## 95 percent confidence interval:
## 5910.787 Inf
## sample estimates:
## mean of x
## 6252.054
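Since the p-value (6.385e-09) is below 0.05, we reject the null hypothesis that the mean price is INR 5000: at the 5% level of significance, the mean ticket price of Mumbai to Delhi flights is greater than INR 5000.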
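The numbers in this output are linked: t = (mean - mu0) / SE with SE = s / sqrt(n), the one-sided p-value is P(T > t) for a t distribution with n - 1 = 129 df, and the lower end of the one-sided 95% interval is mean - qt(0.95, 129) * SE. The sketch below recomputes the p-value and that bound using only the mean, t statistic and df printed above; the standard error is back-solved from them, so it is only as precise as the printed t.

```r
# Summary statistics copied from the one-sample t-test output above
xbar  <- 6252.054   # sample mean of Price
mu0   <- 5000       # hypothesised mean (INR)
t_obs <- 6.0784     # reported t statistic
dof   <- 129        # reported degrees of freedom (n - 1)

se <- (xbar - mu0) / t_obs                     # implied standard error s / sqrt(n)
p_upper <- pt(t_obs, dof, lower.tail = FALSE)  # one-sided p-value for H1: mean > mu0
ci_lower <- xbar - qt(0.95, dof) * se          # lower bound of the one-sided 95% CI
c(se = se, p = p_upper, ci_lower = ci_lower)
```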
Q2b. Test whether the ticket prices of morning flights are greater than those of afternoon flights.
airline$Time=factor(airline$Departure, levels = c("AM","PM"), labels = c("morning","afternoon"))
t.test(airline$Price~airline$Time, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: airline$Price by airline$Time
## t = 1.736, df = 296.58, p-value = 0.0418
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 22.71262 Inf
## sample estimates:
## mean in group morning mean in group afternoon
## 5598.893 5140.610
Since the p-value (0.0418) is below 0.05, we reject the null hypothesis: at the 5% level of significance, the mean price of morning flights is greater than that of afternoon flights.
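Welch's test does not assume equal group variances, which is why the degrees of freedom above (296.58) are not an integer: they come from the Welch-Satterthwaite approximation df = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)), where vi = si^2 / ni. The sketch below computes t, df and the one-sided p-value by hand and compares them with `t.test()`. It uses the built-in `mtcars` data (mpg by transmission type `am`) as a stand-in, since the airline file is not available here.

```r
# Built-in mtcars data as a stand-in: mpg for automatic (am = 0) vs manual (am = 1)
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

vx <- var(x) / length(x)   # squared standard error of mean(x)
vy <- var(y) / length(y)   # squared standard error of mean(y)

t_welch  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_greater <- pt(t_welch, df_welch, lower.tail = FALSE)  # H1: mean(x) > mean(y)

# Same test through t.test(); Welch is the default (var.equal = FALSE)
tt <- t.test(mpg ~ am, data = mtcars, alternative = "greater")
c(t = t_welch, df = df_welch, p = p_greater)
```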
Q2c. Test whether the ticket prices around Diwali are higher than non-Diwali ticket prices.
# Groups are ordered 0 (non-Diwali), 1 (Diwali), so "less" tests mean(0) - mean(1) < 0
t.test(airline$Price~airline$IsDiwali, alternative = "less")
##
## Welch Two Sample t-test
##
## data: airline$Price by airline$IsDiwali
## t = -2.9799, df = 244.52, p-value = 0.001587
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -371.7482
## sample estimates:
## mean in group 0 mean in group 1
## 5063.810 5897.479
Since the p-value (0.001587) is below 0.05, we reject the null hypothesis: at the 5% level of significance, the mean ticket price around Diwali is greater than the mean non-Diwali price.
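With the formula interface, `t.test()` compares the groups in factor-level order, so `alternative` refers to mean(first level) - mean(second level). That is why this question uses `"less"` (level 0 minus level 1 below zero), whereas Q2b and Q2d, whose first level is the group expected to be more expensive, use `"greater"`. The sketch below shows, on the built-in `mtcars` data as a stand-in, that reversing the level order and flipping the alternative gives the same p-value.

```r
# mpg by transmission type, with the two possible level orders
f01 <- factor(mtcars$am, levels = c(0, 1))  # first level: automatic
f10 <- factor(mtcars$am, levels = c(1, 0))  # first level: manual

# "less" with levels (0, 1) asks whether mean(0) < mean(1) ...
p_less <- t.test(mtcars$mpg ~ f01, alternative = "less")$p.value
# ... which is the same question as "greater" with levels (1, 0)
p_greater <- t.test(mtcars$mpg ~ f10, alternative = "greater")$p.value

c(p_less = p_less, p_greater = p_greater)
```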
Q2d. Test whether the ticket prices on Air India flights are greater than those on IndiGo flights.
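# Only Air India and IndiGo are kept as levels; other airlines become NA and are dropped by t.test()
airline$flight=factor(airline$Airline, levels = c("Air India", "IndiGo"), labels = c("Air India", "IndiGo"))
t.test(airline$Price~airline$flight, alternative = "greater")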
##
## Welch Two Sample t-test
##
## data: airline$Price by airline$flight
## t = 2.7205, df = 87.71, p-value = 0.00393
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 566.0833 Inf
## sample estimates:
## mean in group Air India mean in group IndiGo
## 6335.000 4879.525
Since the p-value (0.00393) is below 0.05, we reject the null hypothesis: at the 5% level of significance, the mean price of Air India flights is greater than that of IndiGo flights.
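The `factor(..., levels = c("Air India", "IndiGo"))` step in Q2d acts as a filter: any airline not listed in `levels` becomes `NA`, and the formula method of `t.test()` drops those rows under the default `na.action` (`na.omit`), leaving a two-group comparison. The sketch below demonstrates the same trick on the built-in `iris` data, used only as a stand-in, and checks it against an explicit `subset()`.

```r
# Keep only two species as levels; the third (virginica) becomes NA
f <- factor(iris$Species, levels = c("setosa", "versicolor"))
table(f, useNA = "ifany")

# The formula method drops the NA rows, so this is a two-group Welch test
tt_factor <- t.test(iris$Sepal.Length ~ f, alternative = "less")

# The same comparison written as an explicit subset
two <- droplevels(subset(iris, Species %in% c("setosa", "versicolor")))
tt_subset <- t.test(Sepal.Length ~ Species, data = two, alternative = "less")

c(p_factor = tt_factor$p.value, p_subset = tt_subset$p.value)
```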
Q3a. Run a simple linear regression of airline ticket Price on the Advanced Booking Days. Write R code to output the summary of the model.
m=lm(airline$Price~airline$AdvancedBookingDays)
summary(m)
##
## Call:
## lm(formula = airline$Price ~ airline$AdvancedBookingDays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2786.5 -1320.8 -688.9 351.2 12594.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5422.959 224.497 24.16 <2e-16 ***
## airline$AdvancedBookingDays -0.983 6.154 -0.16 0.873
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2392 on 303 degrees of freedom
## Multiple R-squared: 8.422e-05, Adjusted R-squared: -0.003216
## F-statistic: 0.02552 on 1 and 303 DF, p-value: 0.8732
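In a simple regression the OLS slope equals cov(x, y) / var(x), and Multiple R-squared equals the squared correlation, cov(x, y)^2 / (var(x) * var(y)). Plugging in the covariances printed in Q1a reproduces the slope (-0.983) and R-squared (8.422e-05) above. The slope's t-test is also equivalent to the Pearson correlation test, which is why its p-value (0.873) matches the unadjusted 0.87 for Price and AdvancedBookingDays in Q1b. The intercept additionally needs the two sample means, which are not printed, so the sketch leaves it out.

```r
# Sample (co)variances taken from the Q1a covariance matrix
cov_xy <- -488.6618960    # cov(Price, AdvancedBookingDays)
var_x  <- 497.0929034     # var(AdvancedBookingDays)
var_y  <- 5703943.28833   # var(Price)

slope <- cov_xy / var_x                  # OLS slope = cov(x, y) / var(x)
r_squared <- cov_xy^2 / (var_x * var_y)  # R-squared = squared correlation
t_slope <- slope / 6.154                 # slope / its std. error (6.154, from the summary above)

c(slope = slope, r_squared = r_squared, t = t_slope)
```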