Airlines operate in a highly competitive marketplace and aim to reduce the cost per passenger seat and increase revenue per flight by carrying more passengers on board. They do this by reducing the legroom between seats, which lets them accommodate more passengers than before. With the additional seats, they can charge lower prices to attract more passengers. Airlines also offer higher-priced tickets to passengers who want more legroom and other amenities near the seat, such as a small TV, a charging point, reading lights, and a more comfortable seat. In this project, we will examine whether the above-mentioned factors are significant enough for airlines to charge a higher price.

  1. Reading data into R
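setwd("C:/Users/Abhi/Desktop/Data Analytics/Week 3 Day 5")
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), as the output below assumes
airline <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)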
head(airline)
##   Airline Aircraft FlightDuration TravelMonth IsInternational SeatsEconomy
## 1 British   Boeing          12.25         Jul   International          122
## 2 British   Boeing          12.25         Aug   International          122
## 3 British   Boeing          12.25         Sep   International          122
## 4 British   Boeing          12.25         Oct   International          122
## 5 British   Boeing           8.16         Aug   International          122
## 6 British   Boeing           8.16         Sep   International          122
##   SeatsPremium PitchEconomy PitchPremium WidthEconomy WidthPremium
## 1           40           31           38           18           19
## 2           40           31           38           18           19
## 3           40           31           38           18           19
## 4           40           31           38           18           19
## 5           40           31           38           18           19
## 6           40           31           38           18           19
##   PriceEconomy PricePremium PriceRelative SeatsTotal PitchDifference
## 1         2707         3725          0.38        162               7
## 2         2707         3725          0.38        162               7
## 3         2707         3725          0.38        162               7
## 4         2707         3725          0.38        162               7
## 5         1793         2999          0.67        162               7
## 6         1793         2999          0.67        162               7
##   WidthDifference PercentPremiumSeats
## 1               1               24.69
## 2               1               24.69
## 3               1               24.69
## 4               1               24.69
## 5               1               24.69
## 6               1               24.69
  2. Summary statistics of data
summary(airline)
##       Airline      Aircraft   FlightDuration   TravelMonth
##  AirFrance: 74   AirBus:151   Min.   : 1.250   Aug:127    
##  British  :175   Boeing:307   1st Qu.: 4.260   Jul: 75    
##  Delta    : 46                Median : 7.790   Oct:127    
##  Jet      : 61                Mean   : 7.578   Sep:129    
##  Singapore: 40                3rd Qu.:10.620              
##  Virgin   : 62                Max.   :14.660              
##       IsInternational  SeatsEconomy    SeatsPremium    PitchEconomy  
##  Domestic     : 40    Min.   : 78.0   Min.   : 8.00   Min.   :30.00  
##  International:418    1st Qu.:133.0   1st Qu.:21.00   1st Qu.:31.00  
##                       Median :185.0   Median :36.00   Median :31.00  
##                       Mean   :202.3   Mean   :33.65   Mean   :31.22  
##                       3rd Qu.:243.0   3rd Qu.:40.00   3rd Qu.:32.00  
##                       Max.   :389.0   Max.   :66.00   Max.   :33.00  
##   PitchPremium    WidthEconomy    WidthPremium    PriceEconomy 
##  Min.   :34.00   Min.   :17.00   Min.   :17.00   Min.   :  65  
##  1st Qu.:38.00   1st Qu.:18.00   1st Qu.:19.00   1st Qu.: 413  
##  Median :38.00   Median :18.00   Median :19.00   Median :1242  
##  Mean   :37.91   Mean   :17.84   Mean   :19.47   Mean   :1327  
##  3rd Qu.:38.00   3rd Qu.:18.00   3rd Qu.:21.00   3rd Qu.:1909  
##  Max.   :40.00   Max.   :19.00   Max.   :21.00   Max.   :3593  
##   PricePremium    PriceRelative      SeatsTotal  PitchDifference 
##  Min.   :  86.0   Min.   :0.0200   Min.   : 98   Min.   : 2.000  
##  1st Qu.: 528.8   1st Qu.:0.1000   1st Qu.:166   1st Qu.: 6.000  
##  Median :1737.0   Median :0.3650   Median :227   Median : 7.000  
##  Mean   :1845.3   Mean   :0.4872   Mean   :236   Mean   : 6.688  
##  3rd Qu.:2989.0   3rd Qu.:0.7400   3rd Qu.:279   3rd Qu.: 7.000  
##  Max.   :7414.0   Max.   :1.8900   Max.   :441   Max.   :10.000  
##  WidthDifference PercentPremiumSeats
##  Min.   :0.000   Min.   : 4.71      
##  1st Qu.:1.000   1st Qu.:12.28      
##  Median :1.000   Median :13.21      
##  Mean   :1.633   Mean   :14.65      
##  3rd Qu.:3.000   3rd Qu.:15.36      
##  Max.   :4.000   Max.   :24.69

Understanding the summary statistics: The dataset covers 6 airlines flying aircraft from two manufacturers, Boeing (307 records) and Airbus (151 records). Most records (418 of 458) are for international routes. Travel is most frequent in September (129 records), closely followed by August and October (127 each). This period covers the late-summer and fall vacation season. The number of economy-class seats in an aircraft ranges from 78 to 389, while premium class ranges from 8 to 66. The maximum seat width is 19 inches in economy class and 21 inches in premium class. The average price is $1,327 for economy class and $1,845 for premium class (the premium median is $1,737). Histograms of several variables follow.
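
The counts and averages quoted above can also be computed directly instead of being read off the summary() printout. The sketch below uses a small made-up data frame, toy, that borrows four column names from airline; its values are invented for illustration and are not the real dataset.

```r
# Hypothetical stand-in for the airline data (values invented for illustration)
toy <- data.frame(
  Aircraft     = factor(c("Boeing", "Boeing", "AirBus", "Boeing", "AirBus")),
  TravelMonth  = factor(c("Jul", "Aug", "Aug", "Sep", "Sep")),
  PriceEconomy = c(2707, 1793, 540, 1476, 65),
  PricePremium = c(3725, 2999, 790, 2997, 86)
)

# Number of records per aircraft manufacturer and per travel month
aircraft_counts <- table(toy$Aircraft)
month_counts    <- table(toy$TravelMonth)

# Mean fare in each class
mean_prices <- colMeans(toy[, c("PriceEconomy", "PricePremium")])

print(aircraft_counts)
print(month_counts)
print(mean_prices)
```

Replacing toy with airline should give the same counts and means that summary(airline) reports.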

str(airline)
## 'data.frame':    458 obs. of  18 variables:
##  $ Airline            : Factor w/ 6 levels "AirFrance","British",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Aircraft           : Factor w/ 2 levels "AirBus","Boeing": 2 2 2 2 2 2 2 2 2 2 ...
##  $ FlightDuration     : num  12.25 12.25 12.25 12.25 8.16 ...
##  $ TravelMonth        : Factor w/ 4 levels "Aug","Jul","Oct",..: 2 1 4 3 1 4 3 1 4 4 ...
##  $ IsInternational    : Factor w/ 2 levels "Domestic","International": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SeatsEconomy       : int  122 122 122 122 122 122 122 122 122 122 ...
##  $ SeatsPremium       : int  40 40 40 40 40 40 40 40 40 40 ...
##  $ PitchEconomy       : int  31 31 31 31 31 31 31 31 31 31 ...
##  $ PitchPremium       : int  38 38 38 38 38 38 38 38 38 38 ...
##  $ WidthEconomy       : int  18 18 18 18 18 18 18 18 18 18 ...
##  $ WidthPremium       : int  19 19 19 19 19 19 19 19 19 19 ...
##  $ PriceEconomy       : int  2707 2707 2707 2707 1793 1793 1793 1476 1476 1705 ...
##  $ PricePremium       : int  3725 3725 3725 3725 2999 2999 2999 2997 2997 2989 ...
##  $ PriceRelative      : num  0.38 0.38 0.38 0.38 0.67 0.67 0.67 1.03 1.03 0.75 ...
##  $ SeatsTotal         : int  162 162 162 162 162 162 162 162 162 162 ...
##  $ PitchDifference    : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ WidthDifference    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PercentPremiumSeats: num  24.7 24.7 24.7 24.7 24.7 ...
attach(airline)
hist(FlightDuration , xlab="Flight duration", col = "blue")

The most common flight duration is around 8 hours.
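
A reading like this can be checked numerically by asking hist() for its bin counts instead of a plot. The following is a minimal sketch on invented durations; the vector durations and the 2-hour bin width are assumptions chosen for illustration, not values from the dataset.

```r
# Hypothetical flight durations in hours (invented for illustration)
durations <- c(1.25, 2.5, 4.3, 7.8, 8.2, 8.5, 8.9, 9.5, 10.6, 12.25, 14.66)

# Compute the histogram bins without drawing the plot
h <- hist(durations, breaks = seq(0, 16, by = 2), plot = FALSE)

# The modal bin is the one with the highest count
modal <- which.max(h$counts)
modal_range <- c(h$breaks[modal], h$breaks[modal + 1])

print(modal_range)
```

The same call with FlightDuration in place of durations would show which duration band holds the most records.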

hist(SeatsEconomy, xlab="Number of Economy Seats", col = "blue")

No airline offers between 250 and 300 economy seats.

hist(SeatsPremium, xlab="Number of Premium Seats", col = "blue")

The most common premium-class capacity is 35 to 40 seats.

hist(WidthEconomy, xlab="Width of Economy Seats", col = "blue")

Only three seat widths are available in economy class (17, 18, and 19 inches), and 18 inches is the most prevalent.

hist(WidthPremium, xlab="Width of Premium Economy Seats", col = "blue")

A 19-inch seat width is the most common in premium class.

hist(PriceEconomy, xlab="Price of Economy Seats", col = "blue")

As expected, the largest concentration of economy-class fares is below $1,000.

hist(PricePremium, xlab="Price of Premium Economy Seats", col = "blue")

Finding outliers in the data.

# airline is already attached above, so its columns are available by name
plot(Airline,SeatsPremium,main="Airline vs Premium Seats",ylab="Number of Premium Seats",col="green")

library(lattice)
histogram(~PriceEconomy | IsInternational, data=airline, col="yellow", main = "Price of economy class tickets in international and domestic flights")

histogram(~PricePremium | IsInternational, data=airline, col="yellow", main = "Price of premium class tickets in international and domestic flights")

plot(Airline,SeatsPremium,col="black",main="Airline vs Premium Seats",ylab="Number of Premium Seats")

plot(Airline,SeatsEconomy,col='black',main="Airline vs Economy Seats",ylab="Number of Economy Seats")
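
The box plots above show outliers as points beyond the whiskers. To list those values instead of reading them off the plot, boxplot.stats() applies the same 1.5 * IQR rule. This is a minimal sketch on an invented vector, seats_premium, not on the real column.

```r
# Hypothetical premium-seat counts (invented for illustration)
seats_premium <- c(8, 21, 24, 30, 36, 36, 38, 40, 40, 42, 120)

# boxplot.stats() uses the same rule as the whiskers of a box plot:
# points more than 1.5 * IQR beyond the hinges are returned in $out
stats <- boxplot.stats(seats_premium)
outliers <- stats$out

print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(outliers)
```

To mirror the per-airline plots, the same function could be applied within each group, for example with tapply(SeatsPremium, Airline, function(v) boxplot.stats(v)$out).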

library(car)
scatterplot(PriceEconomy, FlightDuration, main="Economy class pricing vs. flight duration", xlab="Price (in $)", ylab = "Duration (in hours)")

library(corrgram)
corrgram(airline, order=TRUE,lower.panel=panel.shade,upper.panel=panel.pie,text.panel=panel.txt, main="Corrgram")

Null Hypothesis: There is no difference in the price with respect to width of the seat.

t.test(WidthPremium, PricePremium, var.equal = TRUE, paired=FALSE)
## 
##  Two Sample t-test
## 
## data:  WidthPremium and PricePremium
## t = -30.333, df = 914, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1943.914 -1707.658
## sample estimates:
##  mean of x  mean of y 
##   19.47162 1845.25764

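Here, the p-value is less than 0.05, so we reject the null hypothesis. Note, however, that this two-sample t-test compares the mean seat width (about 19.5 inches) with the mean premium price (about $1,845), two quantities measured on different scales. The result therefore shows only that these two means differ, and by itself it does not establish that seat width affects price in the premium class.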
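
One way to address the stated null hypothesis more directly is to compare premium prices between groups of seats with different widths. The sketch below does this with Welch's t-test on a made-up data frame; the two width groups and all fares are assumptions for illustration, and the real data contain more than two widths, which would call for a correlation test or ANOVA instead.

```r
# Hypothetical premium fares and seat widths (invented for illustration)
toy <- data.frame(
  WidthPremium = c(19, 19, 19, 19, 19, 21, 21, 21, 21, 21),
  PricePremium = c(520, 610, 700, 850, 905, 1800, 2400, 2950, 3100, 3700)
)

# Group the fares by seat width, then compare the two group means.
# Welch's t-test (the default, var.equal = FALSE) does not assume equal variances.
toy$WidthGroup <- factor(toy$WidthPremium)
res <- t.test(PricePremium ~ WidthGroup, data = toy)

print(res)
```

Here both samples are prices, so a small p-value speaks to whether fares differ between narrow and wide seats.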

cor.test((PricePremium-PriceEconomy),WidthDifference)
## 
##  Pearson's product-moment correlation
## 
## data:  (PricePremium - PriceEconomy) and WidthDifference
## t = 2.5291, df = 456, p-value = 0.01177
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.02627012 0.20700978
## sample estimates:
##       cor 
## 0.1176138
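
The t statistic and p-value in this output follow from the correlation and its degrees of freedom. As a worked check, the sketch below recomputes them from the printed values r = 0.1176138 and df = 456; the variable names are illustrative.

```r
# Values copied from the cor.test() output above
r  <- 0.1176138
df <- 456

# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Share of variance in the price gap associated with width difference
r_squared <- r^2

print(round(c(t = t_stat, p = p_value, r2 = r_squared), 4))
```

The correlation is statistically significant but weak: r squared is about 0.014, so width difference alone is associated with roughly 1.4% of the variation in the price gap.
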
rmdl <- lm((PricePremium-PriceEconomy) ~ PitchDifference + WidthDifference + FlightDuration)
summary(rmdl)
## 
## Call:
## lm(formula = (PricePremium - PriceEconomy) ~ PitchDifference + 
##     WidthDifference + FlightDuration)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -859.4 -324.7  -62.7  150.1 3331.5 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -286.933    117.833  -2.435   0.0153 *  
## PitchDifference   10.387     20.779   0.500   0.6174    
## WidthDifference   74.641     30.977   2.410   0.0164 *  
## FlightDuration    80.992      6.754  11.992   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 506.1 on 454 degrees of freedom
## Multiple R-squared:  0.2538, Adjusted R-squared:  0.2489 
## F-statistic: 51.48 on 3 and 454 DF,  p-value: < 2.2e-16
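
To see what the coefficients mean in practice, the fitted equation can be applied by hand to one record. The sketch below uses the estimates printed above and the first row shown by head(airline) (PitchDifference 7, WidthDifference 1, FlightDuration 12.25, prices 3725 and 2707); the variable names are illustrative.

```r
# Coefficients copied from the summary(rmdl) output above
coefs <- c(intercept = -286.933, PitchDifference = 10.387,
           WidthDifference = 74.641, FlightDuration = 80.992)

# Predictor values for the first row shown by head(airline)
x <- c(PitchDifference = 7, WidthDifference = 1, FlightDuration = 12.25)

# Fitted premium-minus-economy price gap
predicted_gap <- coefs[["intercept"]] + sum(coefs[names(x)] * x)

# Observed gap for the same row: 3725 - 2707
observed_gap <- 3725 - 2707
residual <- observed_gap - predicted_gap

print(c(predicted = predicted_gap, observed = observed_gap, residual = residual))
```

Most of the predicted gap comes from the FlightDuration term (80.992 per hour), which matches its much larger t value in the coefficient table.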

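Variation in prices was greater on international flights. Pitch and width were also greater on international flights. Economy prices increase with width, pitch, and flight duration. Premium-class tickets were priced higher than economy-class tickets. Since there is no absolute measure of price, we take the difference between the two ticket prices as the regression output, with pitch difference, width difference, and flight duration as predictors. Of these, flight duration (p < 2e-16) and width difference (p = 0.0164) are significant at the 5% level, while pitch difference (p = 0.6174) is not. The model explains about 25% of the variation in the price difference (R-squared = 0.2538).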