1. Read the data.
Airlines.df <- read.csv("C:/interships/SixAirlinesData.csv")
View(Airlines.df)
2. Summary of the data.
summary(Airlines.df)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
The mean price of economy class is 1327, whereas the mean price of premium economy class is 1845.3, so in the collected data premium economy is on average priced higher than economy.
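The two price columns can also be compared directly instead of reading them off the full summary. The following is a minimal sketch, assuming a small made-up data frame `toy.df` (hypothetical values, not the airline data) that reuses the column names `PriceEconomy` and `PricePremium`:

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  PriceEconomy = c(100, 400, 1200, 1900, 3000),
  PricePremium = c(150, 520, 1700, 2900, 4100)
)

# Mean and median of each price column, one column per class
price.summary <- sapply(toy.df[, c("PriceEconomy", "PricePremium")],
                        function(x) c(mean = mean(x), median = median(x)))
print(price.summary)

# Average amount by which premium economy exceeds economy
mean.gap <- mean(toy.df$PricePremium) - mean(toy.df$PriceEconomy)
print(mean.gap)
```

On the real data, replacing `toy.df` with `Airlines.df` should give the same means and medians that `summary()` reports.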
3. Boxplot for the price of Premium Economy Class.
boxplot(Airlines.df$PricePremium, xlab="Price",ylab="Premium Economy Class", horizontal = TRUE)

The boxplot presents the summary statistics graphically; the median price of premium economy is 1737.0.
4. Boxplot for the price of Economy class.
boxplot(Airlines.df$PriceEconomy, xlab="Price",ylab="Economy Class", horizontal = TRUE)

The boxplot presents the summary statistics graphically; the median price of economy is 1242.
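Drawing both classes on one shared price axis can make the two medians easier to compare than two separate plots. This is a minimal sketch on a made-up data frame `toy.df` (hypothetical values, not the airline data):

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  PriceEconomy = c(100, 400, 1200, 1900, 3000),
  PricePremium = c(150, 520, 1700, 2900, 4100)
)

# One call draws both boxes on the same horizontal price axis
bp <- boxplot(toy.df$PriceEconomy, toy.df$PricePremium,
              names = c("Economy", "Premium Economy"),
              xlab = "Price", horizontal = TRUE)

# boxplot() invisibly returns the plotted statistics;
# row 3 of bp$stats holds the median of each box
print(bp$stats[3, ])
```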
5. Testing the null hypothesis that there is no significant difference between the prices of Premium Economy and Economy.
t.test(Airlines.df$PriceEconomy, Airlines.df$PricePremium)
##
## Welch Two Sample t-test
##
## data: Airlines.df$PriceEconomy and Airlines.df$PricePremium
## t = -6.8304, df = 856.56, p-value = 1.605e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -667.0831 -369.2793
## sample estimates:
## mean of x mean of y
## 1327.076 1845.258
The p-value (1.605e-11) is far below 0.05, so we reject the null hypothesis that there is no difference between the mean price of economy and the mean price of premium economy. There is a statistically significant difference between the prices of the two classes: the 95% confidence interval puts the economy mean between about 369 and 667 below the premium economy mean.
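The Welch test above treats the two price columns as independent samples. Because each row of the data frame carries both prices, a paired t-test on the row-wise differences is a common alternative worth checking. This is a sketch on a made-up data frame `toy.df` (hypothetical values, not the airline data), so its output says nothing about the real result:

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  PriceEconomy = c(100, 400, 1200, 1900, 3000),
  PricePremium = c(150, 520, 1700, 2900, 4100)
)

# Paired test: works on the per-row differences PricePremium - PriceEconomy
paired.result <- t.test(toy.df$PricePremium, toy.df$PriceEconomy, paired = TRUE)
print(paired.result)

# The estimate reported by the paired test is the mean of these differences
print(mean(toy.df$PricePremium - toy.df$PriceEconomy))
```

Whether the paired form is appropriate depends on whether each row really describes a single flight offering both fares, which is an assumption about how the data were collected.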
6. Regression analysis to predict the price of Premium Economy Class on the basis of FlightDuration, Airline, Aircraft, PercentPremiumSeats and PriceEconomy.
regg<-lm(Airlines.df$PricePremium~Airlines.df$FlightDuration+Airlines.df$Airline+Airlines.df$Aircraft+Airlines.df$PercentPremiumSeats+Airlines.df$PriceEconomy)
summary(regg)
##
## Call:
## lm(formula = Airlines.df$PricePremium ~ Airlines.df$FlightDuration +
## Airlines.df$Airline + Airlines.df$Aircraft + Airlines.df$PercentPremiumSeats +
## Airlines.df$PriceEconomy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -918.24 -216.98 -57.92 102.04 2911.28
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -541.29751 118.58545 -4.565 6.47e-06 ***
## Airlines.df$FlightDuration 54.60305 8.50687 6.419 3.51e-10 ***
## Airlines.df$AirlineBritish 760.48572 90.75621 8.379 6.92e-16 ***
## Airlines.df$AirlineDelta 535.85205 114.10694 4.696 3.53e-06 ***
## Airlines.df$AirlineJet 602.64373 116.45710 5.175 3.45e-07 ***
## Airlines.df$AirlineSingapore 350.22229 118.70600 2.950 0.00334 **
## Airlines.df$AirlineVirgin 1077.86793 92.94457 11.597 < 2e-16 ***
## Airlines.df$AircraftBoeing 12.77156 47.39150 0.269 0.78768
## Airlines.df$PercentPremiumSeats -13.92410 5.48173 -2.540 0.01142 *
## Airlines.df$PriceEconomy 1.18078 0.03867 30.534 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 438.8 on 448 degrees of freedom
## Multiple R-squared: 0.8862, Adjusted R-squared: 0.884
## F-statistic: 387.8 on 9 and 448 DF, p-value: < 2.2e-16
The regression models the price of premium economy from the chosen predictors. The multiple R-squared of 0.8862 means the model explains 88.62% of the variance in premium economy prices, which indicates a good fit. The F-test p-value (< 2.2e-16) shows that the predictors are jointly significant. Flight duration, airline, percent premium seats and economy price all have significant coefficients (p < 0.05); aircraft type does not (p = 0.788).
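To actually predict a price from a fitted model, it usually helps to write the formula with bare column names and pass the data frame through `data =`, so that `predict()` can match the columns of `newdata`. This is a minimal sketch on a made-up data frame `toy.df` with only two predictors (hypothetical values, not the airline data):

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  FlightDuration = c(2, 4, 6, 8, 10, 12),
  PriceEconomy   = c(150, 420, 800, 1300, 1700, 2600),
  PricePremium   = c(260, 600, 1250, 1800, 2700, 3600)
)

# Bare column names plus data = keep coefficient names short
fit <- lm(PricePremium ~ FlightDuration + PriceEconomy, data = toy.df)

# Share of the variance in PricePremium explained by the model
r.squared <- summary(fit)$r.squared
print(r.squared)

# Predicted premium economy price for one hypothetical new flight
new.flight <- data.frame(FlightDuration = 7, PriceEconomy = 1000)
predicted.price <- predict(fit, newdata = new.flight)
print(predicted.price)
```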
7. Regression analysis to predict the price of Premium Economy Class on the basis of FlightDuration, TravelMonth, PitchPremium, WidthPremium, PercentPremiumSeats and PriceEconomy.
regg1<-lm(Airlines.df$PricePremium~Airlines.df$FlightDuration+Airlines.df$TravelMonth+Airlines.df$PitchPremium+Airlines.df$WidthPremium+Airlines.df$PercentPremiumSeats+Airlines.df$PriceEconomy)
summary(regg1)
##
## Call:
## lm(formula = Airlines.df$PricePremium ~ Airlines.df$FlightDuration +
## Airlines.df$TravelMonth + Airlines.df$PitchPremium + Airlines.df$WidthPremium +
## Airlines.df$PercentPremiumSeats + Airlines.df$PriceEconomy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -891.9 -263.7 -43.9 122.9 3471.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.047e+03 7.052e+02 -1.484 0.138432
## Airlines.df$FlightDuration 6.339e+01 8.053e+00 7.872 2.64e-14 ***
## Airlines.df$TravelMonthJul -4.012e+01 7.240e+01 -0.554 0.579751
## Airlines.df$TravelMonthOct 2.704e+01 6.177e+01 0.438 0.661745
## Airlines.df$TravelMonthSep -3.236e+00 6.152e+01 -0.053 0.958068
## Airlines.df$PitchPremium -8.219e+01 2.703e+01 -3.041 0.002498 **
## Airlines.df$WidthPremium 1.982e+02 3.283e+01 6.036 3.31e-09 ***
## Airlines.df$PercentPremiumSeats 1.803e+01 4.876e+00 3.697 0.000245 ***
## Airlines.df$PriceEconomy 1.058e+00 2.894e-02 36.568 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 492.1 on 449 degrees of freedom
## Multiple R-squared: 0.8566, Adjusted R-squared: 0.8541
## F-statistic: 335.4 on 8 and 449 DF, p-value: < 2.2e-16
The multiple R-squared of 0.8566 means the model explains 85.66% of the variance in premium economy prices, which is still a good fit, though slightly lower than the previous model. Flight duration, pitch premium, width premium, percent premium seats and economy price all have significant coefficients, while travel month does not. The positive flight duration coefficient (about 63 per hour) means longer flights carry higher premium economy prices. The F-test p-value (< 2.2e-16) shows that the predictors are jointly significant.
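The two models can be compared on adjusted R-squared and residual standard error; in the outputs above these are 0.884 and 438.8 for the first model against 0.8541 and 492.1 for the second. Collecting them in one table can be sketched as follows, using a made-up data frame `toy.df` and two simplified models `fit.a` and `fit.b` (all hypothetical, not the models fitted above):

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  FlightDuration = c(2, 4, 6, 8, 10, 12),
  PriceEconomy   = c(150, 420, 800, 1300, 1700, 2600),
  PricePremium   = c(260, 600, 1250, 1800, 2700, 3600)
)

# Two candidate models with different predictor sets
fit.a <- lm(PricePremium ~ FlightDuration + PriceEconomy, data = toy.df)
fit.b <- lm(PricePremium ~ PriceEconomy, data = toy.df)

# Adjusted R-squared penalises extra predictors; sigma is the residual standard error
comparison <- data.frame(
  model         = c("fit.a", "fit.b"),
  adj.r.squared = c(summary(fit.a)$adj.r.squared, summary(fit.b)$adj.r.squared),
  residual.se   = c(summary(fit.a)$sigma, summary(fit.b)$sigma)
)
print(comparison)
```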
8. Corrgram of all the variables.
library("corrgram", lib.loc="~/R/win-library/3.4")
corrgram(Airlines.df, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of Variables")

The corrgram also shows that the price of premium economy is positively correlated with the price of economy: it rises and falls as the economy price does. Prices depend on many other factors as well, but the variables examined here account for a large share of the variation.
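The correlation that the corrgram displays as shading can also be read as a number with base R's `cor()`. This is a minimal sketch on a made-up data frame `toy.df` (hypothetical values, not the airline data):

```r
# Hypothetical toy data standing in for Airlines.df (illustrative values only)
toy.df <- data.frame(
  PriceEconomy = c(100, 400, 1200, 1900, 3000),
  PricePremium = c(150, 520, 1700, 2900, 4100)
)

# Pearson correlation between the two price columns
price.cor <- cor(toy.df$PriceEconomy, toy.df$PricePremium)
print(price.cor)

# Full correlation matrix, restricted to the numeric columns
numeric.cols <- sapply(toy.df, is.numeric)
cor.matrix <- cor(toy.df[, numeric.cols])
print(round(cor.matrix, 3))
```

On `Airlines.df` the `is.numeric` filter would drop factor columns such as Airline and Aircraft, which `cor()` cannot handle.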