1. Read the data.

Airlines.df <- read.csv("C:/interships/SixAirlinesData.csv")
View(Airlines.df)

2. Summary of the data.

summary(Airlines.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

The summary shows a mean economy-class price of 1327 and a mean premium-economy price of 1845.3, so in this data set premium-economy fares are higher on average than economy fares.

3. Boxplot for the price of Premium Economy Class.

boxplot(Airlines.df$PricePremium, xlab="Price",ylab="Premium Economy Class", horizontal = TRUE)

The boxplot presents the summary statistics for the premium-economy price graphically; the median price is 1737.0.

4. Boxplot for the price of Economy class.

boxplot(Airlines.df$PriceEconomy, xlab="Price",ylab="Economy Class", horizontal = TRUE)

The boxplot presents the summary statistics for the economy price graphically; the median price is 1242.
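
The two boxplots above are easier to compare when drawn on one shared price axis. The following is a minimal sketch of that idea, not part of the original analysis: `toy.df` is a made-up stand-in for `Airlines.df` with invented prices, and only the two column names are taken from the data set.

```r
# Synthetic stand-in for Airlines.df (assumption: invented prices, real column names).
set.seed(1)
toy.df <- data.frame(PriceEconomy = round(runif(50, 65, 3593)))
toy.df$PricePremium <- round(toy.df$PriceEconomy * 1.4 + rnorm(50, 0, 100))

# One call draws both boxes on a shared horizontal price axis.
bp <- boxplot(toy.df[, c("PriceEconomy", "PricePremium")],
              horizontal = TRUE, xlab = "Price",
              main = "Economy vs. Premium Economy")

# bp$stats holds the five-number summary of each box; row 3 is the median.
medians <- bp$stats[3, ]
names(medians) <- bp$names
print(medians)
```

With the real data, replacing `toy.df` by `Airlines.df` should reproduce the medians reported in steps 3 and 4.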

5. Testing the null hypothesis that there is no significant difference between the prices of Premium Economy and Economy.

t.test(Airlines.df$PriceEconomy, Airlines.df$PricePremium)
## 
##  Welch Two Sample t-test
## 
## data:  Airlines.df$PriceEconomy and Airlines.df$PricePremium
## t = -6.8304, df = 856.56, p-value = 1.605e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -667.0831 -369.2793
## sample estimates:
## mean of x mean of y 
##  1327.076  1845.258

The p-value (1.605e-11) is well below 0.05, so we reject the null hypothesis that there is no difference between the mean price of economy and the mean price of premium economy. The difference between the prices of the two classes is therefore statistically significant.
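
The call above runs a Welch test, which treats the two price columns as independent samples. Because each row of the data supplies both prices for the same flight, a paired test is arguably a closer match to the data. The sketch below illustrates the difference on invented numbers; `economy` and `premium` are hypothetical vectors, not the airline data.

```r
# Synthetic paired prices (assumption: made-up values, one pair per flight).
set.seed(42)
n <- 30
economy <- runif(n, 100, 3000)
premium <- economy * 1.3 + rnorm(n, mean = 50, sd = 80)

# Welch test: compares the two means as if the samples were unrelated.
welch <- t.test(economy, premium)

# Paired test: works on the per-flight difference economy - premium.
paired <- t.test(economy, premium, paired = TRUE)

# A paired test is the same as a one-sample test on the differences.
one.sample <- t.test(economy - premium)

print(c(welch = welch$p.value, paired = paired$p.value))
```

On the real data the equivalent call would be `t.test(Airlines.df$PriceEconomy, Airlines.df$PricePremium, paired = TRUE)`.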

6. Regression Analysis to predict the price of Premium Economy Class on the basis of FlightDuration, Airline, Aircraft, PercentPremiumSeats and PriceEconomy.

regg<-lm(Airlines.df$PricePremium~Airlines.df$FlightDuration+Airlines.df$Airline+Airlines.df$Aircraft+Airlines.df$PercentPremiumSeats+Airlines.df$PriceEconomy)
summary(regg)
## 
## Call:
## lm(formula = Airlines.df$PricePremium ~ Airlines.df$FlightDuration + 
##     Airlines.df$Airline + Airlines.df$Aircraft + Airlines.df$PercentPremiumSeats + 
##     Airlines.df$PriceEconomy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -918.24 -216.98  -57.92  102.04 2911.28 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -541.29751  118.58545  -4.565 6.47e-06 ***
## Airlines.df$FlightDuration        54.60305    8.50687   6.419 3.51e-10 ***
## Airlines.df$AirlineBritish       760.48572   90.75621   8.379 6.92e-16 ***
## Airlines.df$AirlineDelta         535.85205  114.10694   4.696 3.53e-06 ***
## Airlines.df$AirlineJet           602.64373  116.45710   5.175 3.45e-07 ***
## Airlines.df$AirlineSingapore     350.22229  118.70600   2.950  0.00334 ** 
## Airlines.df$AirlineVirgin       1077.86793   92.94457  11.597  < 2e-16 ***
## Airlines.df$AircraftBoeing        12.77156   47.39150   0.269  0.78768    
## Airlines.df$PercentPremiumSeats  -13.92410    5.48173  -2.540  0.01142 *  
## Airlines.df$PriceEconomy           1.18078    0.03867  30.534  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 438.8 on 448 degrees of freedom
## Multiple R-squared:  0.8862, Adjusted R-squared:  0.884 
## F-statistic: 387.8 on 9 and 448 DF,  p-value: < 2.2e-16

The regression model can be used to predict the price of premium economy. It explains 88.62% of the variance in premium-economy prices (adjusted R-squared 0.884), which indicates a good fit, and the F-test p-value (< 2.2e-16) shows that the predictors are jointly significant. Flight duration, airline, percent premium seats and economy price are all significant at the 0.05 level. Aircraft type is not (p = 0.788), so the data give no evidence that Boeing and AirBus flights differ in premium-economy price once the other variables are accounted for.
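
To make the "variance explained" figure concrete, the sketch below recomputes R-squared by hand as 1 - RSS/TSS and then predicts a price for a new flight. It is an illustration on invented data: `toy.df`, its coefficients and the `new.flight` values are all assumptions, and the model uses only two of the predictors from step 6.

```r
# Synthetic stand-in for Airlines.df (assumption: invented values).
set.seed(7)
toy.df <- data.frame(
  FlightDuration = runif(100, 1, 15),
  PriceEconomy   = runif(100, 65, 3600)
)
toy.df$PricePremium <- 50 * toy.df$FlightDuration +
  1.2 * toy.df$PriceEconomy + rnorm(100, 0, 300)

# Passing data = keeps the formula short and lets predict() accept new rows.
fit <- lm(PricePremium ~ FlightDuration + PriceEconomy, data = toy.df)

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
rss <- sum(residuals(fit)^2)
tss <- sum((toy.df$PricePremium - mean(toy.df$PricePremium))^2)
r.squared <- 1 - rss / tss
print(c(manual = r.squared, from.summary = summary(fit)$r.squared))

# Predicted premium price for one hypothetical flight.
new.flight <- data.frame(FlightDuration = 8, PriceEconomy = 1200)
predicted <- predict(fit, newdata = new.flight)
print(predicted)
```

The `data =` form matters for prediction: when the formula refers to columns as `Airlines.df$...`, `predict()` looks the variables up in the original data frame rather than in `newdata`.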

7. Regression Analysis to predict the price of Premium Economy Class on the basis of FlightDuration, TravelMonth, PitchPremium, WidthPremium, PercentPremiumSeats and PriceEconomy.

regg1<-lm(Airlines.df$PricePremium~Airlines.df$FlightDuration+Airlines.df$TravelMonth+Airlines.df$PitchPremium+Airlines.df$WidthPremium+Airlines.df$PercentPremiumSeats+Airlines.df$PriceEconomy)
summary(regg1)
## 
## Call:
## lm(formula = Airlines.df$PricePremium ~ Airlines.df$FlightDuration + 
##     Airlines.df$TravelMonth + Airlines.df$PitchPremium + Airlines.df$WidthPremium + 
##     Airlines.df$PercentPremiumSeats + Airlines.df$PriceEconomy)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -891.9 -263.7  -43.9  122.9 3471.7 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -1.047e+03  7.052e+02  -1.484 0.138432    
## Airlines.df$FlightDuration       6.339e+01  8.053e+00   7.872 2.64e-14 ***
## Airlines.df$TravelMonthJul      -4.012e+01  7.240e+01  -0.554 0.579751    
## Airlines.df$TravelMonthOct       2.704e+01  6.177e+01   0.438 0.661745    
## Airlines.df$TravelMonthSep      -3.236e+00  6.152e+01  -0.053 0.958068    
## Airlines.df$PitchPremium        -8.219e+01  2.703e+01  -3.041 0.002498 ** 
## Airlines.df$WidthPremium         1.982e+02  3.283e+01   6.036 3.31e-09 ***
## Airlines.df$PercentPremiumSeats  1.803e+01  4.876e+00   3.697 0.000245 ***
## Airlines.df$PriceEconomy         1.058e+00  2.894e-02  36.568  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 492.1 on 449 degrees of freedom
## Multiple R-squared:  0.8566, Adjusted R-squared:  0.8541 
## F-statistic: 335.4 on 8 and 449 DF,  p-value: < 2.2e-16

This model explains 85.66% of the variance in premium-economy prices, which again indicates a good fit. Flight duration, pitch premium, width premium, percent premium seats and economy price are all significant predictors of the premium-economy price. The flight-duration coefficient is positive (about 63 per hour), so longer flights carry higher premium-economy prices. Travel month is not significant. The overall F-test p-value (< 2.2e-16) shows that the model as a whole is significant.
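
The models in steps 6 and 7 use different predictor sets and neither is a subset of the other, so adjusted R-squared (0.884 versus 0.8541 in the outputs above) or AIC is a more natural yardstick than a nested F-test. The following sketch shows one way such a comparison could be tabulated. The data and the two formulas are invented for illustration and are not the models fitted above.

```r
# Synthetic data (assumption: invented values, column names borrowed from the data set).
set.seed(11)
toy.df <- data.frame(
  PriceEconomy   = rnorm(80, 1300, 900),
  FlightDuration = rnorm(80, 7.5, 3),
  WidthPremium   = rnorm(80, 19.5, 1)
)
toy.df$PricePremium <- 1.2 * toy.df$PriceEconomy +
  50 * toy.df$FlightDuration + rnorm(80, 0, 300)

# Two non-nested candidate models with the same response.
fit.a <- lm(PricePremium ~ PriceEconomy + FlightDuration, data = toy.df)
fit.b <- lm(PricePremium ~ PriceEconomy + WidthPremium, data = toy.df)

# Higher adjusted R-squared and lower AIC both point to the better-fitting model.
comparison <- data.frame(
  model         = c("fit.a", "fit.b"),
  adj.r.squared = c(summary(fit.a)$adj.r.squared, summary(fit.b)$adj.r.squared),
  aic           = c(AIC(fit.a), AIC(fit.b))
)
print(comparison)
```

With the real data, the same table could be built from `regg` and `regg1`.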

8. Corrgram of all the variables.

library(corrgram)
corrgram(Airlines.df, order=TRUE, lower.panel=panel.shade,
  upper.panel=panel.pie, text.panel=panel.txt,
  main="Corrgram of Variables")

The corrgram also shows that the price of premium economy is positively correlated with the price of economy: the two rise and fall together. Prices depend on many other factors as well, but the variables examined here account for a large share of the variation.
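
A corrgram shows the correlations as shading and pie charts; the underlying numbers can be read off with `cor()`. The sketch below does this on invented data: `toy.df` is a hypothetical stand-in for `Airlines.df`, with one factor column included to show that non-numeric columns have to be dropped first.

```r
# Synthetic stand-in for Airlines.df (assumption: invented values).
set.seed(3)
toy.df <- data.frame(
  PriceEconomy = runif(60, 65, 3600),
  Airline      = factor(sample(c("A", "B"), 60, replace = TRUE))
)
toy.df$PricePremium <- 1.2 * toy.df$PriceEconomy + rnorm(60, 0, 300)

# cor() needs numeric input, so keep only the numeric columns.
numeric.cols <- toy.df[sapply(toy.df, is.numeric)]
cor.matrix <- cor(numeric.cols)
print(round(cor.matrix, 2))

# cor.test() adds a confidence interval and p-value for a single pair.
price.test <- cor.test(toy.df$PriceEconomy, toy.df$PricePremium)
print(price.test$estimate)
```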