Mini-Project

Reading the dataset

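# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0)
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)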
View(airline.df)

Summary and description of the dataset

summary(airline.df)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69
library(psych)
describe(airline.df)
##                     vars   n    mean      sd  median trimmed     mad   min
## Airline*               1 458    3.01    1.65    2.00    2.89    1.48  1.00
## Aircraft*              2 458    1.67    0.47    2.00    1.71    0.00  1.00
## FlightDuration         3 458    7.58    3.54    7.79    7.57    4.81  1.25
## TravelMonth*           4 458    2.56    1.17    3.00    2.58    1.48  1.00
## IsInternational*       5 458    1.91    0.28    2.00    2.00    0.00  1.00
## SeatsEconomy           6 458  202.31   76.37  185.00  194.64   85.99 78.00
## SeatsPremium           7 458   33.65   13.26   36.00   33.35   11.86  8.00
## PitchEconomy           8 458   31.22    0.66   31.00   31.26    0.00 30.00
## PitchPremium           9 458   37.91    1.31   38.00   38.05    0.00 34.00
## WidthEconomy          10 458   17.84    0.56   18.00   17.81    0.00 17.00
## WidthPremium          11 458   19.47    1.10   19.00   19.53    0.00 17.00
## PriceEconomy          12 458 1327.08  988.27 1242.00 1244.40 1159.39 65.00
## PricePremium          13 458 1845.26 1288.14 1737.00 1799.05 1845.84 86.00
## PriceRelative         14 458    0.49    0.45    0.36    0.42    0.41  0.02
## SeatsTotal            15 458  235.96   85.29  227.00  228.73   90.44 98.00
## PitchDifference       16 458    6.69    1.76    7.00    6.76    0.00  2.00
## WidthDifference       17 458    1.63    1.19    1.00    1.53    0.00  0.00
## PercentPremiumSeats   18 458   14.65    4.84   13.21   14.31    2.68  4.71
##                         max   range  skew kurtosis    se
## Airline*               6.00    5.00  0.61    -0.95  0.08
## Aircraft*              2.00    1.00 -0.72    -1.48  0.02
## FlightDuration        14.66   13.41 -0.07    -1.12  0.17
## TravelMonth*           4.00    3.00 -0.14    -1.46  0.05
## IsInternational*       2.00    1.00 -2.91     6.50  0.01
## SeatsEconomy         389.00  311.00  0.72    -0.36  3.57
## SeatsPremium          66.00   58.00  0.23    -0.46  0.62
## PitchEconomy          33.00    3.00 -0.03    -0.35  0.03
## PitchPremium          40.00    6.00 -1.51     3.52  0.06
## WidthEconomy          19.00    2.00 -0.04    -0.08  0.03
## WidthPremium          21.00    4.00 -0.08    -0.31  0.05
## PriceEconomy        3593.00 3528.00  0.51    -0.88 46.18
## PricePremium        7414.00 7328.00  0.50     0.43 60.19
## PriceRelative          1.89    1.87  1.17     0.72  0.02
## SeatsTotal           441.00  343.00  0.70    -0.53  3.99
## PitchDifference       10.00    8.00 -0.54     1.78  0.08
## WidthDifference        4.00    4.00  0.84    -0.53  0.06
## PercentPremiumSeats   24.69   19.98  0.71     0.28  0.23

Attaching the dataframe

attach(airline.df)

Visualizing the distribution of variables

Distribution of data for Flight Duration

boxplot(airline.df$FlightDuration, main="Boxplot-Flight duration",
        xlab="Flight duration", col="maroon")

Comparison of number of seats in Premium and Economy

par(mfrow=c(1, 2))
boxplot(airline.df$SeatsEconomy, main="Boxplot-Seats Economy",
        xlab="Seats Economy", col="maroon")
boxplot(airline.df$SeatsPremium, main="Boxplot-Seats Premium",
        xlab="Seats Premium", col="dark blue")

Comparison of seat pitch (distance between consecutive seats) in Economy and Premium Economy

par(mfrow=c(1, 2))
hist(airline.df$PitchEconomy,xlab="Pitch Economy", col="maroon", 
          main="Histogram for Pitch Economy")
hist(airline.df$PitchPremium,xlab="Pitch Premium", col="dark blue",
          main="Histogram for Pitch Premium")

Comparison of seat width (between armrests) in Economy and Premium Economy

par(mfrow=c(1, 2))
hist(airline.df$WidthEconomy, main="Histogram- Width Economy",
        xlab="Width Economy", col="maroon")
hist(airline.df$WidthPremium, main="Histogram- Width Premium",
        xlab="Width Premium", col="dark blue")

Comparison of Prices of Economy and Premium Economy Seats

par(mfrow=c(1, 2))
boxplot(airline.df$PriceEconomy, main="Boxplot-Price Economy",
        xlab="Price Economy", col="maroon")
boxplot(airline.df$PricePremium, main="Boxplot-Price Premium",
        xlab="Price Premium", col="dark blue")

Number of International and Domestic Flights

library(lattice)
histogram(airline.df$IsInternational, main="Histogram- Number of International & Domestic Flights", xlab="International/Domestic", col="dark blue")

Comparison of Prices in various airlines

par(mfrow=c(2,1))
plot(PriceEconomy~Airline, data=airline.df, main="Prices in Economy seats", col=c("maroon","red","dark blue","green","purple","grey"))
plot(PricePremium~Airline, data=airline.df, main="Prices in Premium seats", col=c("maroon","red","dark blue","green", "purple","grey")) 
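
The per-airline boxplots can be complemented with a numeric summary. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from SixAirlinesDataV2.csv) that reuses the same column names:

```r
# Toy data with the same column names as airline.df (values are made up).
toy <- data.frame(
  Airline      = factor(c("Delta", "Delta", "Jet", "Jet")),
  PriceEconomy = c(100, 200, 50, 70),
  PricePremium = c(150, 250, 60, 100)
)

# Mean Economy and Premium Economy price per airline.
mean_prices <- aggregate(cbind(PriceEconomy, PricePremium) ~ Airline,
                         data = toy, FUN = mean)
print(mean_prices)
```

On the real data, `toy` would be replaced by `airline.df`.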

Comparisons using Scatterplot

Comparison between PriceRelative((PricePremium - PriceEconomy) / PriceEconomy) and SeatsTotal(SeatsEconomy + SeatsPremium)

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(airline.df$PriceRelative,airline.df$SeatsTotal, main="Comparison of Price Relative and Seats Total",
            xlab="Price Relative", ylab="Seats Total")

Comparison between WidthDifference(WidthPremium - WidthEconomy) and PriceRelative((PricePremium - PriceEconomy) / PriceEconomy)

scatterplot(airline.df$WidthDifference,airline.df$PriceRelative, main="Comparison of Price Relative and Width Difference", 
            xlab="Width Difference", ylab="Price Relative")

Comparison between PitchDifference(PitchPremium - PitchEconomy) and PriceRelative((PricePremium - PriceEconomy) / PriceEconomy)

scatterplot(airline.df$PitchDifference,airline.df$PriceRelative, main="Comparison of Price Relative and Pitch Difference",
            xlab="Pitch Difference", ylab="Price Relative")
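
The derived variables used in these scatterplots follow directly from the definitions given in the headings. As a rough sketch, they can be recomputed from the base columns; the data frame `toy` and its values below are made up for illustration only:

```r
# Toy rows with the base columns only (values are made up).
toy <- data.frame(
  SeatsEconomy = c(120, 200),
  SeatsPremium = c(30, 40),
  PitchEconomy = c(31, 32),
  PitchPremium = c(38, 38),
  WidthEconomy = c(17, 18),
  WidthPremium = c(19, 19),
  PriceEconomy = c(1000, 400),
  PricePremium = c(1500, 500)
)

# Recompute the derived columns from their definitions.
toy <- transform(
  toy,
  SeatsTotal      = SeatsEconomy + SeatsPremium,
  PitchDifference = PitchPremium - PitchEconomy,
  WidthDifference = WidthPremium - WidthEconomy,
  PriceRelative   = (PricePremium - PriceEconomy) / PriceEconomy
)
print(toy)
```

A PriceRelative of 0.5 therefore means the Premium Economy ticket costs 50% more than the Economy ticket.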

Corrgram

library(corrgram)
corrgram(airline.df, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of Six Airlines Variables")
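
A corrgram shows correlations as shades and pies; the exact values can be read from a plain correlation matrix. The sketch below assumes a made-up data frame `toy` (hypothetical values) and keeps only the numeric columns before calling `cor()`:

```r
# Toy data frame mixing a factor with numeric columns (values are made up).
toy <- data.frame(
  Airline         = factor(c("Delta", "Jet", "Virgin", "Delta", "Jet")),
  PitchDifference = c(2, 6, 7, 7, 10),
  WidthDifference = c(0, 1, 1, 3, 4),
  PriceRelative   = c(0.05, 0.10, 0.40, 0.70, 1.20)
)

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric_cols <- toy[sapply(toy, is.numeric)]
cor_matrix <- round(cor(numeric_cols), 2)
print(cor_matrix)
```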

T-test: “What factors explain the difference in price between an economy ticket and a premium-economy airline ticket?”

t.test(airline.df$PricePremium, airline.df$PriceEconomy)
## 
##  Welch Two Sample t-test
## 
## data:  airline.df$PricePremium and airline.df$PriceEconomy
## t = 6.8304, df = 856.56, p-value = 1.605e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  369.2793 667.0831
## sample estimates:
## mean of x mean of y 
##  1845.258  1327.076

Inference: The p-value (1.605e-11) is far below 0.05, which is strong evidence against the null hypothesis of equal mean prices. Hence, there is a significant difference between the mean prices of Economy and Premium Economy tickets.
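
Because each row of the dataset holds both an Economy and a Premium Economy price, a paired comparison is one possible alternative to the Welch test used here. The following is only a sketch on made-up prices (the vectors `economy` and `premium` are hypothetical), not a rerun of the analysis:

```r
# Made-up paired prices: element i of each vector belongs to the same row.
economy <- c(100, 200, 300, 400, 500)
premium <- c(150, 260, 390, 480, 640)

# Paired t-test: tests whether the mean of (premium - economy) is zero.
paired_result <- t.test(premium, economy, paired = TRUE)
print(paired_result)
```

A paired test works on the row-wise differences, so its degrees of freedom are the number of pairs minus one.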

Correlation testing

between PriceRelative((PricePremium - PriceEconomy) / PriceEconomy) and Flight Duration

cor.test(airline.df$PriceRelative, airline.df$FlightDuration)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$FlightDuration
## t = 2.6046, df = 456, p-value = 0.009498
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02977856 0.21036806
## sample estimates:
##      cor 
## 0.121075

between PitchDifference(PitchPremium - PitchEconomy) and PriceRelative ((PricePremium - PriceEconomy) / PriceEconomy)

cor.test(airline.df$PriceRelative, airline.df$PitchDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302

between WidthDifference(WidthPremium - WidthEconomy) and PriceRelative ((PricePremium - PriceEconomy) / PriceEconomy)

cor.test(airline.df$PriceRelative, airline.df$WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  airline.df$PriceRelative and airline.df$WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4125388 0.5528218
## sample estimates:
##       cor 
## 0.4858024
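
The three correlation tests differ only in the second variable, so they can also be run in one pass. A minimal sketch, assuming a made-up data frame `toy` (hypothetical values) with the same column names:

```r
# Toy data (values are made up) with the same column names as airline.df.
toy <- data.frame(
  PriceRelative   = c(0.05, 0.10, 0.40, 0.70, 1.20, 0.30),
  FlightDuration  = c(2.5, 9.0, 4.0, 12.0, 8.5, 6.0),
  PitchDifference = c(2, 6, 7, 7, 10, 6),
  WidthDifference = c(0, 1, 1, 3, 4, 1)
)

predictors <- c("FlightDuration", "PitchDifference", "WidthDifference")

# Run cor.test() once per predictor and keep the estimate and p-value.
cor_summary <- t(sapply(predictors, function(v) {
  res <- cor.test(toy$PriceRelative, toy[[v]])
  c(cor = unname(res$estimate), p.value = res$p.value)
}))
print(round(cor_summary, 4))
```

The result is a matrix with one row per predictor, which makes the coefficients easy to compare side by side.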

Regression Model

rmodel <- lm(PricePremium ~ Airline + TravelMonth + FlightDuration + PitchDifference + WidthDifference + PercentPremiumSeats + PriceRelative, data = airline.df)
summary(rmodel)
## 
## Call:
## lm(formula = PricePremium ~ Airline + TravelMonth + FlightDuration + 
##     PitchDifference + WidthDifference + PercentPremiumSeats + 
##     PriceRelative, data = airline.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2216.1  -408.6   102.1   392.2  4277.1 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           624.427    534.907   1.167   0.2437    
## AirlineBritish       -972.777    218.739  -4.447 1.10e-05 ***
## AirlineDelta        -1133.805    265.671  -4.268 2.42e-05 ***
## AirlineJet          -2497.060    241.586 -10.336  < 2e-16 ***
## AirlineSingapore    -2077.175    166.680 -12.462  < 2e-16 ***
## AirlineVirgin       -1026.230    210.448  -4.876 1.51e-06 ***
## TravelMonthJul         78.482    111.257   0.705   0.4809    
## TravelMonthOct        -39.008     94.984  -0.411   0.6815    
## TravelMonthSep         -4.181     94.494  -0.044   0.9647    
## FlightDuration        187.775     12.780  14.693  < 2e-16 ***
## PitchDifference        25.189    116.907   0.215   0.8295    
## WidthDifference       259.734    157.784   1.646   0.1004    
## PercentPremiumSeats    15.601      9.352   1.668   0.0960 .  
## PriceRelative         234.797    100.289   2.341   0.0197 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 755.4 on 444 degrees of freedom
## Multiple R-squared:  0.6658, Adjusted R-squared:  0.6561 
## F-statistic: 68.05 on 13 and 444 DF,  p-value: < 2.2e-16

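The overall model is statistically significant: the p-value of the F-statistic (< 2.2e-16) is below 0.05, and the model explains about 67% of the variance in PricePremium (Multiple R-squared 0.6658).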
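
Overall significance does not mean every term is significant, so the per-term p-values are worth checking as well. The sketch below uses a deterministic made-up data frame (`toy` is hypothetical, not the airline data) to show how the coefficient table can be filtered at the 0.05 level:

```r
# Deterministic toy data (values are made up, not from the airline dataset).
toy <- data.frame(
  FlightDuration  = 1:20,
  WidthDifference = rep(c(1, 3), times = 10)
)
toy$PricePremium <- 200 + 180 * toy$FlightDuration + 10 * sin(1:20)

fit <- lm(PricePremium ~ FlightDuration + WidthDifference, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))

# Keep only terms whose p-value is below 0.05.
significant <- coef_table[coef_table[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
print(significant)
```

Applied to `rmodel`, the same filter would list the terms marked with at least one `*` in the summary output.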

Inferences:

  1. From the tests and analysis we have conducted, we can infer that Premium Economy tickets are priced higher than Economy tickets.

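  2. FlightDuration, WidthDifference, PitchDifference, Airline and PercentPremiumSeats are the factors examined as affecting it. In the regression, Airline and FlightDuration (along with PriceRelative) are significant at the 0.05 level; WidthDifference (p = 0.1004) and PercentPremiumSeats (p = 0.0960) are marginal, and PitchDifference (p = 0.8295) is not significant.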

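  3. There is a high correlation between the prices of Economy tickets, the prices of Premium Economy tickets and the duration of the flight. In the regression, FlightDuration has the largest t value of any term (14.693); its correlation with PriceRelative, by contrast, is weak (r = 0.121).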

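  4. Width difference appears to be a somewhat bigger reason than pitch difference for the higher Premium Economy prices: its correlation with PriceRelative is slightly higher (0.486 vs. 0.469), although neither term is significant in the regression at the 0.05 level.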