Introduction

This report analyses an airline fares data set that records, for each observation, the economy class and premium economy class air-ticket prices. It also records several related attributes from the airline industry, such as the airline, aircraft type, flight duration, seat counts, seat pitch and seat width.

The aim of this report is to identify which factors contribute to the difference in price between premium economy class and economy class air tickets.

Analysis

Reading the data set

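setwd("C:/Users/Shreyas Jadhav/Downloads")  # adjust to the folder that contains the CSV
# stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0
airlines <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)
#View(airlines)
# as.numeric() on a factor returns its integer level codes (1, 2, ...)
airlines$TravelMonth <- as.numeric(airlines$TravelMonth)
airlines$IsInternational <- as.numeric(airlines$IsInternational)
airlines$Aircraft <- as.numeric(airlines$Aircraft)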
str(airlines)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : num  2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...
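
The first row of the str() output hints at how the derived columns relate to the raw ones. The sketch below recomputes them for that row. The formulas are inferred from the printed values rather than taken from any documentation of the data set, so they should be read as an assumption.

```r
# Values copied from the first row of the str() output above.
price_economy <- 2707
price_premium <- 3725
seats_economy <- 122
seats_premium <- 40
pitch_economy <- 31
pitch_premium <- 38

# Assumed definitions, inferred from the printed values.
price_relative        <- (price_premium - price_economy) / price_economy
seats_total           <- seats_economy + seats_premium
pitch_difference      <- pitch_premium - pitch_economy
percent_premium_seats <- 100 * seats_premium / seats_total

round(price_relative, 2)         # 0.38
seats_total                      # 162
pitch_difference                 # 7
round(percent_premium_seats, 1)  # 24.7
```

If these definitions hold, PriceRelative is the premium mark-up as a fraction of the economy price, which is why it can exceed 1.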

Producing the summary of the data set

summary(airlines)
##       Airline       Aircraft    FlightDuration    TravelMonth   
##  AirFrance: 74   Min.   :1.00   Min.   : 1.250   Min.   :1.000  
##  British  :175   1st Qu.:1.00   1st Qu.: 4.260   1st Qu.:1.000  
##  Delta    : 46   Median :2.00   Median : 7.790   Median :3.000  
##  Jet      : 61   Mean   :1.67   Mean   : 7.578   Mean   :2.563  
##  Singapore: 40   3rd Qu.:2.00   3rd Qu.:10.620   3rd Qu.:4.000  
##  Virgin   : 62   Max.   :2.00   Max.   :14.660   Max.   :4.000  
##  IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Min.   :1.000   Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  1st Qu.:2.000   1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##  Median :2.000   Median :185.0   Median :36.00   Median :31.00  
##  Mean   :1.913   Mean   :202.3   Mean   :33.65   Mean   :31.22  
##  3rd Qu.:2.000   3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##  Max.   :2.000   Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
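
In the summary, Aircraft and IsInternational only take the values 1 and 2. This is what as.numeric() produces when applied to a factor: it returns the integer level codes, not a measurement. The sketch below illustrates this with hypothetical labels, since the real level names in the CSV are not shown in this report.

```r
# Hypothetical labels: the actual level names in the CSV are an assumption.
aircraft <- factor(c("Boeing", "AirBus", "Boeing", "Boeing"))

levels(aircraft)      # levels are sorted alphabetically: "AirBus" "Boeing"
as.numeric(aircraft)  # level codes, not measurements: 2 1 2 2

# Keeping the factor preserves the labels in tables and model output.
table(aircraft)
```

For two-level variables the codes are harmless in correlations and t-tests, but for a variable with more than two levels the codes impose an ordering that may not be meaningful.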

Descriptive statistics of the data set (mean, standard deviation, median, skew, kurtosis, etc.)

library(psych)
describe(airlines)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft               2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth            4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational        5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft               2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth            4.00    3.00 -0.14    -1.46  0.05
## IsInternational        2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Airlines

table(airlines$Airline)
## 
## AirFrance   British     Delta       Jet Singapore    Virgin 
##        74       175        46        61        40        62

Boxplot graph of:

  1. PriceEconomy vs Airline
plot(x=airlines$Airline,y=airlines$PriceEconomy)

  2. PricePremium vs Airline
plot(x=airlines$Airline,y=airlines$PricePremium)

Bar chart of the number of observations per airline (ggplot2)

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
ggplot(airlines, aes(x = Airline, fill = Airline)) + geom_bar()

Correlation between the number of seats and the ticket price within each class

cor(airlines[, c("SeatsEconomy", "PriceEconomy")])
##              SeatsEconomy PriceEconomy
## SeatsEconomy    1.0000000    0.1281672
## PriceEconomy    0.1281672    1.0000000
cor(airlines[, c("SeatsPremium", "PricePremium")])
##              SeatsPremium PricePremium
## SeatsPremium    1.0000000    0.2176124
## PricePremium    0.2176124    1.0000000

Pearson’s Chi-squared tests between:

  1. Airline and PriceEconomy
mytable1<-xtabs(~airlines$Airline+airlines$PriceEconomy)
chisq.test(mytable1)
## Warning in chisq.test(mytable1): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable1
## X-squared = 2156.7, df = 945, p-value < 2.2e-16
  2. Airline and PricePremium
mytable2<-xtabs(~airlines$Airline+airlines$PricePremium)
chisq.test(mytable2)
## Warning in chisq.test(mytable2): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable2
## X-squared = 2195.1, df = 850, p-value < 2.2e-16
  3. Airline and PriceRelative
mytable3<-xtabs(~airlines$Airline+airlines$PriceRelative)
chisq.test(mytable3)
## Warning in chisq.test(mytable3): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable3
## X-squared = 1402.9, df = 485, p-value < 2.2e-16
  4. Aircraft and PriceEconomy
mytable4<-xtabs(~airlines$Aircraft+airlines$PriceEconomy)
chisq.test(mytable4)
## Warning in chisq.test(mytable4): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable4
## X-squared = 396.14, df = 189, p-value < 2.2e-16
  5. Aircraft and PricePremium
mytable5<-xtabs(~airlines$Aircraft+airlines$PricePremium)
chisq.test(mytable5)
## Warning in chisq.test(mytable5): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable5
## X-squared = 399.65, df = 170, p-value < 2.2e-16
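
The warnings above arise because each distinct price is treated as its own category, which leaves most cells of the contingency table nearly empty. When a categorical factor is compared with a continuous price, a one-way ANOVA or a Kruskal-Wallis test is a common alternative. The sketch below runs both on synthetic data; the group names and prices are made up and are not results from this data set.

```r
set.seed(42)

# Synthetic stand-in for the airline data; groups and prices are made up.
demo <- data.frame(
  Airline = factor(rep(c("A", "B", "C"), each = 30)),
  PriceEconomy = c(rnorm(30, mean = 1000, sd = 200),
                   rnorm(30, mean = 1500, sd = 200),
                   rnorm(30, mean = 2000, sd = 200))
)

# One-way ANOVA: does the mean price differ between airlines?
anova_fit <- aov(PriceEconomy ~ Airline, data = demo)
summary(anova_fit)

# Rank-based alternative that does not assume normally distributed residuals.
kw <- kruskal.test(PriceEconomy ~ Airline, data = demo)
kw$p.value
```

On the real data the same calls would take data = airlines, with Airline kept as a factor.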

Linear regression models relating prices to individual factors:

Dependent variable: PriceRelative

Predictor variable: TravelMonth

fit1<-lm(airlines$PriceRelative~airlines$TravelMonth)
summary(fit1)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$TravelMonth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4690 -0.3827 -0.1177  0.2534  1.4073 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.49520    0.05084   9.741   <2e-16 ***
## airlines$TravelMonth -0.00312    0.01805  -0.173    0.863    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4511 on 456 degrees of freedom
## Multiple R-squared:  6.554e-05,  Adjusted R-squared:  -0.002127 
## F-statistic: 0.02989 on 1 and 456 DF,  p-value: 0.8628

Dependent variable: PriceRelative

Predictor variable: FlightDuration

fit2<-lm(airlines$PriceRelative~airlines$FlightDuration)
summary(fit2)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$FlightDuration)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5507 -0.3373 -0.1167  0.2363  1.4694 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.370491   0.049454   7.492 3.56e-13 ***
## airlines$FlightDuration 0.015402   0.005913   2.605   0.0095 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4478 on 456 degrees of freedom
## Multiple R-squared:  0.01466,    Adjusted R-squared:  0.0125 
## F-statistic: 6.784 on 1 and 456 DF,  p-value: 0.009498

Dependent variable: PriceEconomy

Predictor variable: SeatsEconomy

fit3<-lm(airlines$PriceEconomy~airlines$SeatsEconomy)
summary(fit3)
## 
## Call:
## lm(formula = airlines$PriceEconomy ~ airlines$SeatsEconomy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1434.99  -861.30   -49.71   619.12  2312.88 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            991.545    129.941   7.631 1.38e-13 ***
## airlines$SeatsEconomy    1.659      0.601   2.760  0.00602 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 981.2 on 456 degrees of freedom
## Multiple R-squared:  0.01643,    Adjusted R-squared:  0.01427 
## F-statistic: 7.616 on 1 and 456 DF,  p-value: 0.006019

Dependent variable: PricePremium

Predictor variable: SeatsPremium

fit4<-lm(airlines$PricePremium~airlines$SeatsPremium)
summary(fit4)
## 
## Call:
## lm(formula = airlines$PricePremium ~ airlines$SeatsPremium)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2210.6  -999.1  -111.0  1082.6  5772.7 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1134.01     160.55   7.063 6.11e-12 ***
## airlines$SeatsPremium    21.14       4.44   4.761 2.59e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1259 on 456 degrees of freedom
## Multiple R-squared:  0.04736,    Adjusted R-squared:  0.04527 
## F-statistic: 22.67 on 1 and 456 DF,  p-value: 2.591e-06

Dependent variable: PriceRelative

Predictor variable: PitchDifference

fit5<-lm(airlines$PriceRelative~airlines$PitchDifference)
summary(fit5)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$PitchDifference)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7643 -0.3247 -0.1146  0.2052  1.2954 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -0.31456    0.07317  -4.299  2.1e-05 ***
## airlines$PitchDifference  0.11989    0.01058  11.331  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3985 on 456 degrees of freedom
## Multiple R-squared:  0.2197, Adjusted R-squared:  0.218 
## F-statistic: 128.4 on 1 and 456 DF,  p-value: < 2.2e-16

Dependent variable: PriceRelative

Predictor variable: WidthDifference

fit6<-lm(airlines$PriceRelative~airlines$WidthDifference)
summary(fit6)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$WidthDifference)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8028 -0.2907 -0.0766  0.1852  1.1893 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.18660    0.03132   5.958 5.11e-09 ***
## airlines$WidthDifference  0.18406    0.01551  11.869  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3943 on 456 degrees of freedom
## Multiple R-squared:  0.236,  Adjusted R-squared:  0.2343 
## F-statistic: 140.9 on 1 and 456 DF,  p-value: < 2.2e-16

Dependent variable: PriceRelative

Predictor variable: SeatsEconomy

fit7<-lm(airlines$PriceRelative~airlines$SeatsEconomy)
summary(fit7)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$SeatsEconomy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4716 -0.3863 -0.1213  0.2546  1.4046 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.825e-01  5.974e-02   8.077 5.99e-15 ***
## airlines$SeatsEconomy 2.335e-05  2.763e-04   0.084    0.933    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4511 on 456 degrees of freedom
## Multiple R-squared:  1.566e-05,  Adjusted R-squared:  -0.002177 
## F-statistic: 0.00714 on 1 and 456 DF,  p-value: 0.9327

Dependent variable: PriceRelative

Predictor variable: SeatsPremium

fit8<-lm(airlines$PriceRelative~airlines$SeatsPremium)
summary(fit8)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ airlines$SeatsPremium)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5023 -0.3862 -0.1129  0.2038  1.3445 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.598328   0.057266  10.448   <2e-16 ***
## airlines$SeatsPremium -0.003302   0.001584  -2.085   0.0376 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4489 on 456 degrees of freedom
## Multiple R-squared:  0.009447,   Adjusted R-squared:  0.007275 
## F-statistic: 4.349 on 1 and 456 DF,  p-value: 0.03759

Dependent variable: PriceRelative

Predictor variable: SeatsPremium - SeatsEconomy

fit9<-lm(airlines$PriceRelative ~ I(airlines$SeatsPremium-airlines$SeatsEconomy))
summary(fit9)
## 
## Call:
## lm(formula = airlines$PriceRelative ~ I(airlines$SeatsPremium - 
##     airlines$SeatsEconomy))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4948 -0.3848 -0.1150  0.2620  1.4120 
## 
## Coefficients:
##                                                    Estimate Std. Error
## (Intercept)                                       0.4617079  0.0557965
## I(airlines$SeatsPremium - airlines$SeatsEconomy) -0.0001512  0.0003063
##                                                  t value Pr(>|t|)    
## (Intercept)                                        8.275 1.43e-15 ***
## I(airlines$SeatsPremium - airlines$SeatsEconomy)  -0.494    0.622    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.451 on 456 degrees of freedom
## Multiple R-squared:  0.0005338,  Adjusted R-squared:  -0.001658 
## F-statistic: 0.2436 on 1 and 456 DF,  p-value: 0.6219

Dependent variable: PricePremium - PriceEconomy

Predictor variable: PitchDifference

fit10<-lm(I(airlines$PricePremium-airlines$PriceEconomy) ~ airlines$PitchDifference)
summary(fit10)
## 
## Call:
## lm(formula = I(airlines$PricePremium - airlines$PriceEconomy) ~ 
##     airlines$PitchDifference)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -600.4 -379.9 -274.4  241.0 3780.5 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                233.14     106.45   2.190  0.02902 * 
## airlines$PitchDifference    42.62      15.39   2.769  0.00586 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 579.7 on 456 degrees of freedom
## Multiple R-squared:  0.01653,    Adjusted R-squared:  0.01438 
## F-statistic: 7.666 on 1 and 456 DF,  p-value: 0.005855

Dependent variable: PricePremium - PriceEconomy

Predictor variable: WidthDifference

fit11<-lm(I(airlines$PricePremium-airlines$PriceEconomy) ~ airlines$WidthDifference)
summary(fit11)
## 
## Call:
## lm(formula = I(airlines$PricePremium - airlines$PriceEconomy) ~ 
##     airlines$WidthDifference)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -595.9 -408.9 -257.6  243.4 3830.4 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                423.87      46.11   9.192   <2e-16 ***
## airlines$WidthDifference    57.75      22.83   2.529   0.0118 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 580.5 on 456 degrees of freedom
## Multiple R-squared:  0.01383,    Adjusted R-squared:  0.01167 
## F-statistic: 6.396 on 1 and 456 DF,  p-value: 0.01177

Dependent variable: PricePremium - PriceEconomy

Predictor variable: FlightDuration

fit12<-lm(I(airlines$PricePremium-airlines$PriceEconomy) ~ airlines$FlightDuration)
summary(fit12)
## 
## Call:
## lm(formula = I(airlines$PricePremium - airlines$PriceEconomy) ~ 
##     airlines$FlightDuration)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -891.4 -339.0  -53.9  148.4 3307.2 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -71.581     56.918  -1.258    0.209    
## airlines$FlightDuration   77.827      6.806  11.435   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 515.3 on 456 degrees of freedom
## Multiple R-squared:  0.2229, Adjusted R-squared:  0.2212 
## F-statistic: 130.8 on 1 and 456 DF,  p-value: < 2.2e-16
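
Each model above uses a single predictor, so the effects are estimated without adjusting for one another. As a hedged sketch of how several predictors could be combined in one model, the example below fits a multiple regression on synthetic data. Only the predictor names mirror this report; the effect sizes and the PriceGap column are hypothetical.

```r
set.seed(1)

# Synthetic data with made-up effects; only the predictor names mirror the report.
n <- 200
demo <- data.frame(
  FlightDuration  = runif(n, min = 1, max = 15),
  PitchDifference = sample(2:10, n, replace = TRUE),
  WidthDifference = sample(0:4, n, replace = TRUE)
)
# PriceGap is a hypothetical stand-in for PricePremium - PriceEconomy.
demo$PriceGap <- 80 * demo$FlightDuration + 40 * demo$PitchDifference +
  rnorm(n, sd = 100)

# One model with several predictors; data = keeps the coefficient names short.
fit_multi <- lm(PriceGap ~ FlightDuration + PitchDifference + WidthDifference,
                data = demo)
summary(fit_multi)$coefficients
```

Passing data = airlines instead of writing airlines$... in the formula would also shorten the coefficient labels in the summaries above.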

Correlation of FlightDuration and of TravelMonth with the price difference (PricePremium - PriceEconomy)

cor(x=I(airlines$FlightDuration),y=I(airlines$PricePremium-airlines$PriceEconomy))
## [1] 0.4720837
cor(x=I(airlines$TravelMonth),y=I(airlines$PricePremium-airlines$PriceEconomy))
## [1] 0.007286108

Correlation tests of TravelMonth, IsInternational and Aircraft against the price difference

cor.test(airlines$TravelMonth,I(airlines$PricePremium-airlines$PriceEconomy))
## 
##  Pearson's product-moment correlation
## 
## data:  airlines$TravelMonth and I(airlines$PricePremium - airlines$PriceEconomy)
## t = 0.15559, df = 456, p-value = 0.8764
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08439705  0.09884693
## sample estimates:
##         cor 
## 0.007286108
cor.test(airlines$IsInternational,I(airlines$PricePremium-airlines$PriceEconomy))
## 
##  Pearson's product-moment correlation
## 
## data:  airlines$IsInternational and I(airlines$PricePremium - airlines$PriceEconomy)
## t = 5.7328, df = 456, p-value = 1.799e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1717354 0.3427659
## sample estimates:
##       cor 
## 0.2592822
cor.test(airlines$Aircraft,I(airlines$PricePremium-airlines$PriceEconomy))
## 
##  Pearson's product-moment correlation
## 
## data:  airlines$Aircraft and I(airlines$PricePremium - airlines$PriceEconomy)
## t = 0.47848, df = 456, p-value = 0.6325
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.06936787  0.11379457
## sample estimates:
##        cor 
## 0.02240132
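
The correlation output ties in with the regression output through two standard identities: the t statistic of a Pearson correlation is r * sqrt(df) / sqrt(1 - r^2), and for a one-predictor linear model the R-squared equals the squared correlation. The short check below uses only numbers printed in this report.

```r
# Values copied from the cor() and cor.test() output above.
r_international <- 0.2592822
r_duration      <- 0.4720837
df              <- 456   # n - 2 with n = 458 observations

# t statistic of a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r_international * sqrt(df) / sqrt(1 - r_international^2)
round(t_stat, 4)         # 5.7328, as reported by cor.test()

# For a one-predictor regression, R-squared is the squared correlation.
round(r_duration^2, 4)   # 0.2229, the Multiple R-squared of fit12
```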

Welch two-sample t-tests comparing prices between the two Aircraft groups

t.test(I(airlines$PricePremium-airlines$PriceEconomy)~airlines$Aircraft)
## 
##  Welch Two Sample t-test
## 
## data:  I(airlines$PricePremium - airlines$PriceEconomy) by airlines$Aircraft
## t = -0.50194, df = 338.8, p-value = 0.616
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -136.72105   81.12983
## sample estimates:
## mean in group 1 mean in group 2 
##        499.5497        527.3453
t.test(I(airlines$PricePremium)~airlines$Aircraft)
## 
##  Welch Two Sample t-test
## 
## data:  I(airlines$PricePremium) by airlines$Aircraft
## t = 0.28645, df = 310.38, p-value = 0.7747
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -212.2929  284.6350
## sample estimates:
## mean in group 1 mean in group 2 
##        1869.503        1833.332
t.test(I(airlines$PriceEconomy)~airlines$Aircraft)
## 
##  Welch Two Sample t-test
## 
## data:  I(airlines$PriceEconomy) by airlines$Aircraft
## t = 0.64317, df = 289.45, p-value = 0.5206
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -131.7801  259.7135
## sample estimates:
## mean in group 1 mean in group 2 
##        1369.954        1305.987

Scatterplot between FlightDuration and PriceEconomy

plot(airlines$FlightDuration,airlines$PriceEconomy, col="green",main="Price economy vs flight hours",xlab="Hours", ylab="Price")
abline(h=mean(airlines$PriceEconomy), col="black", lty="dotted")
abline(v=mean(airlines$FlightDuration), col="black", lty="dotted")
abline(lm(airlines$PriceEconomy ~ airlines$FlightDuration))

Conclusions

  1. The British airline has the highest frequency in the data set (175 of 458 observations), so it contributes the largest number of rows, each with its own combination of factor values.

  2. The boxplots of premium economy and economy ticket prices by airline have a similar overall shape, which suggests that the spread between the maximum and minimum ticket price (for both classes) depends on the airline.

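  3. The Airline factor is statistically associated with the economy class ticket price, the premium economy class ticket price and the relative price of the two classes, according to the chi-squared tests (p < 2.2e-16 in each case). Because R warns that the chi-squared approximation may be incorrect for these tables, this result should be read with caution.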

  4. In the linear regression models, flight duration is a statistically significant predictor of both the relative price (p = 0.0095) and the price difference between the two classes (p < 2e-16). It explains far more of the variation in the price difference (R-squared = 0.22) than in the relative price (R-squared = 0.015).

  5. The number of seats in the economy class is statistically related to the economy class ticket price in the regression model (p = 0.006), although the correlation is weak (r = 0.13).

  6. The number of seats in the premium economy class is statistically related to the premium economy class ticket price in the regression model (p = 2.6e-06), although the correlation is modest (r = 0.22).

  7. The difference between the number of premium economy and economy seats does not contribute significantly to the relative price of the two classes, since the p-value (0.62) is greater than 0.05 in the corresponding linear regression model.

  8. The difference in seat pitch between the economy class and the premium economy class contributes significantly to both the relative price (p < 2e-16) and the price difference (p = 0.0059) of the two classes, as both p-values are below 0.05 in the corresponding linear regression models.

  9. The difference in seat width between the economy class and the premium economy class contributes significantly to both the price difference (p = 0.012) and the relative price (p < 2e-16) of the two classes, as both p-values are below 0.05 in the corresponding linear regression models.

  10. The travel month is essentially uncorrelated with the price difference between the economy class and premium economy class tickets: the correlation coefficient is positive but very close to zero (r = 0.007) and not significant (p = 0.88). The travel month is also not a statistically significant predictor of the relative price in the regression model (p = 0.86).

  11. Based on the correlation test, the IsInternational factor shows a statistically significant but modest positive correlation with the price difference between the economy and premium economy class tickets (r = 0.26, p = 1.8e-08).

  12. The Aircraft factor has a very small positive correlation with the price difference between the economy and premium economy class tickets (r = 0.022). With p-values above 0.05 in both the correlation test (p = 0.63) and the t-test (p = 0.62), the Aircraft factor is not a significant contributor to the price difference.