MINI PROJECT - Airline Pricing Analysis

Setting directory

## Parsed with column specification:
## cols(
##   Airline = col_character(),
##   Aircraft = col_character(),
##   FlightDuration = col_double(),
##   TravelMonth = col_character(),
##   IsInternational = col_character(),
##   SeatsEconomy = col_integer(),
##   SeatsPremium = col_integer(),
##   PitchEconomy = col_integer(),
##   PitchPremium = col_integer(),
##   WidthEconomy = col_integer(),
##   WidthPremium = col_integer(),
##   PriceEconomy = col_integer(),
##   PricePremium = col_integer(),
##   PriceRelative = col_double(),
##   SeatsTotal = col_integer(),
##   PitchDifference = col_integer(),
##   WidthDifference = col_integer(),
##   PercentPremiumSeats = col_double()
## )
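The column specification above is what the reader prints when it guesses the column types. As a minimal sketch, the types can instead be declared explicitly when the file is read. The report does not give the file name, so the example writes a throwaway temporary file with two made-up rows and only a subset of the columns; the airline name, the values, and the "Domestic" level are assumptions for illustration.

```r
# Hypothetical two-row sample with a subset of the columns; values are made up.
csv_text <- c(
  "Airline,FlightDuration,IsInternational,SeatsEconomy,SeatsPremium",
  "ExampleAir,12.25,International,122,40",
  "ExampleAir,2.50,Domestic,180,20"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Declare the column types up front instead of relying on type guessing.
airline <- read.csv(
  path,
  colClasses = c(
    Airline         = "character",
    FlightDuration  = "numeric",
    IsInternational = "character",
    SeatsEconomy    = "integer",
    SeatsPremium    = "integer"
  ),
  stringsAsFactors = FALSE
)

# Treat the flight type as categorical for the later test and regression.
airline$IsInternational <- factor(airline$IsInternational)
str(airline)
```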

SUMMARIZING THE DATASET

##    Airline            Aircraft         FlightDuration   TravelMonth       
##  Length:458         Length:458         Min.   : 1.250   Length:458        
##  Class :character   Class :character   1st Qu.: 4.260   Class :character  
##  Mode  :character   Mode  :character   Median : 7.790   Mode  :character  
##                                        Mean   : 7.578                     
##                                        3rd Qu.:10.620                     
##                                        Max.   :14.660                     
##  IsInternational     SeatsEconomy    SeatsPremium    PitchEconomy  
##  Length:458         Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  Class :character   1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##  Mode  :character   Median :185.0   Median :36.00   Median :31.00  
##                     Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                     3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                     Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
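The summary can be used as a quick sanity check on the derived columns. The sketch below assumes, without confirmation from the report, that SeatsTotal is the sum of the two seat counts and that the Difference columns are premium minus economy. Because means are additive, the reported means should then agree up to rounding.

```r
# Column means copied from the summary output above.
m <- c(SeatsEconomy = 202.3, SeatsPremium = 33.65, SeatsTotal = 236,
       PitchEconomy = 31.22, PitchPremium = 37.91, PitchDifference = 6.688,
       WidthEconomy = 17.84, WidthPremium = 19.47, WidthDifference = 1.633)

# Assumed definitions (not stated in the report):
#   total = economy + premium, difference = premium - economy.
implied <- c(
  SeatsTotal      = m[["SeatsEconomy"]] + m[["SeatsPremium"]],
  PitchDifference = m[["PitchPremium"]] - m[["PitchEconomy"]],
  WidthDifference = m[["WidthPremium"]] - m[["WidthEconomy"]]
)
reported <- m[names(implied)]

# Side-by-side comparison; the gaps should be rounding noise only.
print(round(cbind(implied, reported, gap = implied - reported), 3))
```

Small gaps are consistent with the assumed definitions but do not prove them; a row-by-row comparison on the actual data would be the stronger check.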

PLOTS

Visualisation of each variable

Corrgram - Variance-Covariance Matrix
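A corrgram displays correlations, which are the variance-covariance entries rescaled by the standard deviations. The sketch below shows that relationship with base R only. The data are synthetic stand-ins: the column names follow the report, but the values are generated and do not reproduce its results.

```r
set.seed(1)
# Synthetic stand-in data; column names match the report, values do not.
n <- 50
demo <- data.frame(
  FlightDuration = runif(n, 1, 15),
  SeatsEconomy   = round(runif(n, 78, 389))
)
demo$PriceEconomy <- 100 + 200 * demo$FlightDuration + rnorm(n, sd = 150)

v <- cov(demo)   # variance-covariance matrix, in the units of the variables
r <- cor(demo)   # correlation matrix, unit-free, entries between -1 and 1

# Rescaling the covariance matrix by the standard deviations gives the
# correlation matrix.
print(all.equal(cov2cor(v), r))
print(round(r, 2))
```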

Hypothesis: the percentage of premium seats is higher on international flights than on domestic flights

## 
##  Welch Two Sample t-test
## 
## data:  mytable and mytablea
## t = 1.1218, df = 1.0005, p-value = 0.4634
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2186.980  2611.054
## sample estimates:
## mean of x mean of y 
## 229.00000  16.96296

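With p = 0.4634 (greater than 0.05), we fail to reject the null hypothesis: this test shows no significant difference between international and domestic flights. Note, however, that the test was run on the objects mytable and mytablea. Their means (229 and 16.96) and the single degree of freedom suggest that these are frequency tables rather than the per-flight PercentPremiumSeats values, so the result does not directly address the hypothesis. The hypothesis is also one-sided ("higher"), whereas the reported test is two-sided.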
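One way to test the stated hypothesis is to run the Welch test on the per-flight PercentPremiumSeats values, grouped by IsInternational. The sketch below uses synthetic data, so its numbers say nothing about the real data set. The "Domestic" level name, the group sizes, and the generated values are assumptions; only the column names and the "International" level come from the report.

```r
set.seed(42)
# Synthetic stand-in: one row per flight, values made up for illustration.
flights <- data.frame(
  IsInternational = factor(rep(c("Domestic", "International"),
                               times = c(40, 60))),
  PercentPremiumSeats = c(rnorm(40, mean = 13, sd = 3),
                          rnorm(60, mean = 15, sd = 4))
)

# Welch two-sample t-test on the per-flight values, split by flight type.
# Factor levels are alphabetical, so the formula compares
# mean(Domestic) - mean(International); alternative = "less" therefore
# matches the one-sided hypothesis "international is higher".
res <- t.test(PercentPremiumSeats ~ IsInternational, data = flights,
              alternative = "less", var.equal = FALSE)
print(res)
```

With one observation per flight in each group, the Welch degrees of freedom are driven by the group sizes rather than collapsing to about 1.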

## 
## Call:
## lm(formula = PriceRelative ~ FlightDuration + IsInternational + 
##     SeatsEconomy + SeatsPremium + WidthEconomy + WidthPremium + 
##     PriceEconomy + PricePremium + PitchEconomy + PitchPremium + 
##     PercentPremiumSeats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.77203 -0.10435  0.00911  0.07462  0.85167 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -3.627e+00  1.918e+00  -1.891 0.059229 .  
## FlightDuration                2.079e-02  4.580e-03   4.538 7.30e-06 ***
## IsInternationalInternational  1.495e-01  1.597e-01   0.936 0.349857    
## SeatsEconomy                  1.819e-03  5.294e-04   3.437 0.000644 ***
## SeatsPremium                 -1.592e-02  3.279e-03  -4.854 1.68e-06 ***
## WidthEconomy                 -2.124e-01  3.181e-02  -6.678 7.24e-11 ***
## WidthPremium                  6.248e-02  1.630e-02   3.833 0.000145 ***
## PriceEconomy                 -8.512e-04  2.987e-05 -28.495  < 2e-16 ***
## PricePremium                  5.491e-04  2.256e-05  24.338  < 2e-16 ***
## PitchEconomy                  1.015e-01  2.921e-02   3.475 0.000560 ***
## PitchPremium                  8.383e-02  3.599e-02   2.329 0.020291 *  
## PercentPremiumSeats           2.253e-02  7.495e-03   3.006 0.002794 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2188 on 446 degrees of freedom
## Multiple R-squared:   0.77,  Adjusted R-squared:  0.7643 
## F-statistic: 135.7 on 11 and 446 DF,  p-value: < 2.2e-16

Explanatory variable(s) whose beta coefficients are not statistically significant (p > 0.05):
1. IsInternationalInternational (the International level of IsInternational)

Explanatory variable(s) whose beta coefficients are statistically significant (p < 0.05):
1. FlightDuration
2. SeatsEconomy
3. SeatsPremium
4. WidthEconomy
5. WidthPremium
6. PriceEconomy
7. PricePremium
8. PitchEconomy
9. PitchPremium
10. PercentPremiumSeats
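The significance column in the coefficient table comes from dividing each estimate by its standard error and comparing the result with a t distribution on the residual degrees of freedom. A small sketch for FlightDuration, using the printed (rounded) values from the regression output:

```r
# Estimate and standard error for FlightDuration, copied from the lm output.
estimate  <- 2.079e-02
std_error <- 4.580e-03
df_resid  <- 446   # residual degrees of freedom reported by lm

# t value is the estimate measured in units of its standard error.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

cat("t =", round(t_value, 3), " p =", signif(p_value, 3), "\n")
```

Because the inputs are rounded, the result can differ from the printed t value and p-value in the last digit.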

The F-test p-value (< 2.2e-16) is far below 0.05, so the regression model as a whole is statistically significant: at least one coefficient differs from zero. The multiple R-squared is 0.77, which means that 77% of the variation in the dependent variable (PriceRelative) is explained by the independent variables. This value never decreases when another independent variable is added, even an irrelevant one. The adjusted R-squared is 0.7643: after penalizing for the number of independent variables, about 76% of the variation is explained, which makes it the better measure for comparing models with different numbers of predictors.
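The adjusted R-squared and the F-statistic can both be recovered from the multiple R-squared, the number of observations (458, from the summary), and the number of predictors (11, from the F-test degrees of freedom). The sketch below uses the rounded R-squared as printed, so small rounding differences are expected.

```r
r2 <- 0.77   # multiple R-squared, rounded as printed
n  <- 458    # observations
p  <- 11     # predictors, excluding the intercept

# Adjusted R-squared penalizes each additional predictor.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic: explained variation per predictor over
# unexplained variation per residual degree of freedom.
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

cat("adjusted R-squared =", round(adj_r2, 4), "\n")
cat("F-statistic        =", round(f_stat, 1), "\n")
```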