Airlines Mini Project

In this mini-project, an airlines dataset is studied and analyzed. A basic summary of all the variables is produced, and the variables are visualized to better understand the data.

After this, a corrgram of the data is constructed to identify the variables that are strongly related.

The data needs to be read and attached before it can be analyzed.

setwd("~/Muyeena/Internship/Case Studies/Airlines")
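# stringsAsFactors = TRUE keeps the text columns as factors (no longer the default since R 4.0)
airlines = read.csv("Airlines.csv", stringsAsFactors = TRUE)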
#str(airlines)
attach(airlines)
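As a hedged aside, attach() can hide naming conflicts, so a common alternative is to leave the columns inside the data frame and reach them through with() or a data = argument. The sketch below reads a tiny made-up table from text so that it runs without the file; toy_csv, toy and all of their values are invented for illustration and are not taken from Airlines.csv.

```r
# Toy CSV text standing in for Airlines.csv (all values are invented).
toy_csv <- "Airline,Aircraft,PriceEconomy,PricePremium
British,Boeing,1200,1700
Delta,AirBus,400,450
British,Boeing,1500,2100
Jet,Boeing,300,600"

# stringsAsFactors = TRUE keeps text columns as factors, which summary()
# needs in order to print per-level counts.
toy <- read.csv(text = toy_csv, stringsAsFactors = TRUE)

# with() evaluates an expression inside the data frame, so attach() is not needed.
mean_gap <- with(toy, mean(PricePremium - PriceEconomy))
airline_counts <- table(toy$Airline)

print(mean_gap)
print(airline_counts)
```

The same pattern works with the real data frame, for example with(airlines, mean(PricePremium - PriceEconomy)).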

Basic Analysis of the Data :

We will now try running some basic statistics, to understand the data better.

# Named airline_summary so that the base function sum() is not masked.
airline_summary = summary(airlines)
airline_summary[,c(1,2,4,5)]

##       Airline      Aircraft   TravelMonth      IsInternational
##  AirFrance: 74   AirBus:151   Aug:127     Domestic     : 40   
##  British  :175   Boeing:307   Jul: 75     International:418   
##  Delta    : 46                Oct:127                         
##  Jet      : 61                Sep:129                         
##  Singapore: 40                                                
##  Virgin   : 62

library(psych)
des = describe(airlines)
des[c(3,6:18), c(3,4,5,8,9,10)]

##                        mean      sd  median   min     max   range
## FlightDuration         7.58    3.54    7.79  1.25   14.66   13.41
## SeatsEconomy         202.31   76.37  185.00 78.00  389.00  311.00
## SeatsPremium          33.65   13.26   36.00  8.00   66.00   58.00
## PitchEconomy          31.22    0.66   31.00 30.00   33.00    3.00
## PitchPremium          37.91    1.31   38.00 34.00   40.00    6.00
## WidthEconomy          17.84    0.56   18.00 17.00   19.00    2.00
## WidthPremium          19.47    1.10   19.00 17.00   21.00    4.00
## PriceEconomy        1327.08  988.27 1242.00 65.00 3593.00 3528.00
## PricePremium        1845.26 1288.14 1737.00 86.00 7414.00 7328.00
## PriceRelative          0.49    0.45    0.36  0.02    1.89    1.87
## SeatsTotal           235.96   85.29  227.00 98.00  441.00  343.00
## PitchDifference        6.69    1.76    7.00  2.00   10.00    8.00
## WidthDifference        1.63    1.19    1.00  0.00    4.00    4.00
## PercentPremiumSeats   14.65    4.84   13.21  4.71   24.69   19.98

From this data, we can infer the following about the given dataset:

British has the most records in the dataset (175 of 458 flights), followed by AirFrance (74).
Boeing aircraft appear about twice as often as AirBus aircraft (307 vs. 151).
There are noticeably fewer flights in July (75) than in August, September or October (127 to 129 each).
The given dataset contains far more international flights (418) than domestic flights (40).
The average flight duration is 7.58 hours (median 7.79 hours).
Although the range of PricePremium (7328) is about twice that of PriceEconomy (3528), the difference between their means (1845.26 vs. 1327.08) is much smaller.
On average, about 14.65% of the seats in a flight are premium, so economy seats make up the large majority.
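The counts behind these statements can be turned into shares with a few lines of base R. This is only a sketch: the numbers are copied by hand from the summary() output above, and the object names airline_n, type_n and the two share vectors are made up for the example.

```r
# Counts copied from the summary() output above.
airline_n <- c(AirFrance = 74, British = 175, Delta = 46,
               Jet = 61, Singapore = 40, Virgin = 62)
type_n <- c(Domestic = 40, International = 418)

# Percentage share of each level, rounded to one decimal place.
airline_share <- round(100 * airline_n / sum(airline_n), 1)
type_share <- round(100 * type_n / sum(type_n), 1)

print(sum(airline_n))   # total number of flights in the dataset
print(airline_share)
print(type_share)
```

On the real data frame, prop.table(table(airlines$Airline)) should give the same shares without copying any numbers.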

Visualizing the Data :

We will now try visualizing most of the important data points, so that we can draw better inference from the data.

Visualizing Categorical Variables :

Categorical variables are those that hold labels rather than numerical values. The built-in R function hist() needs numerical data, so the ggplot2 package is used here to get the desired bar charts.

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

ggplot(data.frame(airlines), aes(x=Airline)) +
  geom_bar(fill = "lightblue")

ggplot(data.frame(airlines), aes(x=Aircraft)) +
  geom_bar(fill = "lightgreen")

ggplot(data.frame(airlines), aes(x=IsInternational)) +
  geom_bar(fill = "purple")

ggplot(data.frame(airlines), aes(x=TravelMonth)) +
  geom_bar(fill = "maroon")
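For reference, base R can also draw these bar charts by counting the levels with table() and passing the counts to barplot(). The sketch below uses a hypothetical toy factor (toy_airline) instead of the real column, and it draws to a null PDF device only so that it also runs in a non-interactive session.

```r
# Hypothetical toy factor; with the real data this would be airlines$Airline.
toy_airline <- factor(c("British", "Delta", "British", "Jet", "British", "Delta"))

# table() counts how often each level occurs.
counts <- table(toy_airline)

pdf(NULL)  # null device: nothing is written to disk; drop this line when working interactively
mids <- barplot(counts, col = "lightblue", xlab = "Airline", ylab = "Count")
invisible(dev.off())

print(counts)
```

barplot() returns the x positions of the bars (stored in mids here), which is useful when labels need to be added on top of the bars.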

Visualizing Flight Duration :

par(mfrow=c(1,1))
boxplot(FlightDuration, xlab = "Flight Duration (hours)", col = "pink", cex = 1.5)

Visualizing Price Details

par(mfrow = c(2,1))
boxplot(PriceEconomy, horizontal = TRUE, xlab = "Price of Economy Flights (dollars)", col = "skyblue", cex = 1.5, ylim = c(0, 8000))
boxplot(PricePremium, horizontal = TRUE, xlab = "Price of Premium Flights (dollars)", col = "yellow", cex = 1.5, ylim = c(0, 8000))

As can be seen from above, the price distributions of Economy and Premium flights are similar in shape, with premium prices spread more widely. One premium flight is priced at around $7,400 (the maximum of 7414 in the summary table), which strongly affects the mean and range of PricePremium.
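To illustrate how a single extreme fare pulls the mean but barely moves the median, here is a small sketch. The fares vector is hypothetical; only its largest value, 7414, is borrowed from the summary table.

```r
# Hypothetical premium fares with one extreme value.
fares <- c(500, 900, 1200, 1700, 1900, 2400, 2900, 7414)

# boxplot.stats() flags points lying more than 1.5 * IQR beyond the hinges.
out <- boxplot.stats(fares)$out
kept <- fares[!fares %in% out]

with_outlier <- c(mean = mean(fares), median = median(fares))
without_outlier <- c(mean = mean(kept), median = median(kept))

print(out)
print(rbind(with_outlier, without_outlier))
```

In this toy example the mean shifts by several hundred dollars when the flagged value is removed, while the median shifts by only 100.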

Visualizing Pitch Details

par(mfrow=c(3,1))
hist(PitchEconomy, xlim = c(30,40), col = "skyblue")
hist(PitchPremium, xlim = c(30,40), col = "yellow")
hist(PitchDifference, col = "lightgreen")

As seen, there is a clear gap between economy and premium pitch: the maximum economy pitch is 33 inches and the minimum premium pitch is 34 inches. On any single flight, the pitch difference is at least 2 inches.

Most premium flights have a pitch of 38 inches, except for a few outliers. The pitch difference has a median of 7 inches and a maximum of 10 inches.

Visualizing Width Details

par(mfrow=c(3,1))
hist(WidthEconomy, xlim = c(17,21),col = "skyblue")
hist(WidthPremium, xlim = c(17,21), col = "yellow")
hist(WidthDifference, col = "lightgreen")

When it comes to width, the difference between economy flights and premium flights is less clear. In a few cases, the width difference is zero.

The median width difference is 1 inch, and the maximum is 4 inches.

Visualizing Seat Details

par(mfrow=c(2,2))
hist(SeatsTotal, xlim = c(5,500), col = "pink")
hist(SeatsEconomy, xlim = c(5,390),col = "skyblue")
hist(PercentPremiumSeats, xlim = c(0,30), col = "lightgreen")
hist(SeatsPremium, xlim = c(5,390), col = "yellow")

From the above visualizations, it is noted that in most flights the number of economy seats is above 100, while the number of premium seats is always below 100 (the maximum is 66).

We also notice that the percentage of premium seats averages about 14.65%.
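The following sketch shows how a premium-seat percentage can be computed, under the assumption that SeatsTotal is SeatsEconomy plus SeatsPremium and that PercentPremiumSeats is 100 * SeatsPremium / SeatsTotal (the source does not state these definitions explicitly). The seat counts are made up. It also shows why the mean of the per-flight percentages need not equal the ratio of the mean seat counts: in the summary table, 100 * 33.65 / 235.96 is about 14.26, a little below the 14.65 mean of PercentPremiumSeats.

```r
# Hypothetical seat counts for three flights.
seats_economy <- c(122, 185, 303)
seats_premium <- c(40, 36, 24)

# Assumed definitions (not confirmed by the source).
seats_total <- seats_economy + seats_premium
pct_premium <- 100 * seats_premium / seats_total

# Averaging per-flight percentages is not the same as dividing the averages.
mean_of_ratios <- mean(pct_premium)
ratio_of_means <- 100 * mean(seats_premium) / mean(seats_total)

print(round(pct_premium, 2))
print(c(mean_of_ratios = mean_of_ratios, ratio_of_means = ratio_of_means))
```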

Visualizing the relationships of various variables.

In this section, the relationship of the price of both the types of the flights with respect to other variables is explored.

With respect to airlines :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ Airline, ylim = c(0,4200), col = clr)
plot(PricePremium ~ Airline, ylim = c(0,4200), col = clr)
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ Airline, col = clr)

From the above visualizations, we can infer the following :

AirFrance has the highest fare in Economy, but there isn’t much price difference between the Economy and Premium flights.
British Airlines has the widest range of fares. It has one Premium outlier at around $7,400.
There is not much difference in the fares of premium and economy flights of Delta Airlines, and they are one of the cheapest airlines in the dataset.
There is a considerable difference in the prices of economy flights and those of premium flights in Virgin Airlines.
Jet Airlines and Virgin Airlines have the highest relative prices of premium flights compared with economy flights.

With respect to the aircraft :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ Aircraft, ylim = c(0,4200), col = clr)
plot(PricePremium ~ Aircraft, ylim = c(0,4200), col = clr)
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ Aircraft, col = clr)

From the above visualizations, we can infer the following :

In terms of pricing, AirBus carriers are more costly than Boeing Carriers.
The difference between the two carriers' prices is larger in the Premium flights than in the Economy flights.
But, in relative pricing, AirBus carriers have a slightly lower difference than Boeing carriers.

With respect to Travel Month :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ TravelMonth, ylim = c(0,4200), col = clr)
plot(PricePremium ~ TravelMonth, ylim = c(0,4200), col = clr)
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ TravelMonth, col = clr)

From the above visualizations, we can infer the following :

There is no notable difference in pricing with respect to the travel month.
The fares in July are slightly lower, but this might be because the total number of flights is also low in July.

With respect to international or not :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ IsInternational, ylim = c(0,4200), col = clr)
plot(PricePremium ~ IsInternational, ylim = c(0,4200), col = clr)
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ IsInternational, col = clr)

From the above visualizations, we can infer the following :

The domestic flights are considerably cheaper than the international flights.
There is a negligible difference between the prices of economy flights and those of premium flights.

With Respect to flight duration :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ FlightDuration, ylim = c(0,4200), col = "black")
abline(lm(PriceEconomy ~ FlightDuration))
abline(lm(PricePremium ~ FlightDuration), col = "red")
plot(PricePremium ~ FlightDuration, ylim = c(0,4200), col = "red")
abline(lm(PriceEconomy ~ FlightDuration))
abline(lm(PricePremium ~ FlightDuration), col = "red")
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ FlightDuration, col = "black")
abline(lm(PriceRelative ~ FlightDuration))

Note : An outlier value of around 7500$ is not represented in the above diagram (Price of Premium vs. Flight Duration).

From the above visualizations, we can infer the following :

The shorter the flight duration, the lower the price (for both economy and premium flights).
The fitted lines show that at short flight durations the difference between the premium and economy fares is small, and that this difference grows as the duration increases.
The relative price of premium flights with respect to economy flights does not change much with flight duration.
The data in this case is too dispersed to draw any specific conclusion from it.
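The widening gap between the two fitted lines can be read directly from the slopes returned by lm(). The sketch below is a minimal illustration on a hypothetical six-row table (toy); the numbers are invented and only mimic the pattern described above.

```r
# Hypothetical data in which the premium fare rises faster with duration.
toy <- data.frame(
  FlightDuration = c(2, 4, 6, 8, 10, 12),
  PriceEconomy   = c(300, 620, 880, 1210, 1490, 1800),
  PricePremium   = c(350, 800, 1300, 1750, 2250, 2700)
)

fit_e <- lm(PriceEconomy ~ FlightDuration, data = toy)
fit_p <- lm(PricePremium ~ FlightDuration, data = toy)

# Slope = fitted price change per extra hour of flight.
slope_e <- unname(coef(fit_e)["FlightDuration"])
slope_p <- unname(coef(fit_p)["FlightDuration"])

# The fare gap grows by the difference of the two slopes per hour.
gap_slope <- slope_p - slope_e
fit_gap <- lm(I(PricePremium - PriceEconomy) ~ FlightDuration, data = toy)

print(c(economy = slope_e, premium = slope_p, gap = gap_slope))
```

Because least squares is linear in the response, regressing the fare gap itself on duration (fit_gap) gives the same slope as subtracting the two fitted slopes.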

Relationship of price with respect to Pitch :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ PitchEconomy, ylim = c(0,4200), xlim = c(30,40), col = "black")
abline(lm(PriceEconomy ~ PitchEconomy))
abline(lm(PricePremium ~ PitchPremium), col = "red")
plot(PricePremium ~ PitchPremium, ylim = c(0,4200), xlim = c(30,40), col = "red")
abline(lm(PriceEconomy ~ PitchEconomy))
abline(lm(PricePremium ~ PitchPremium), col = "red")
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ PitchDifference, col = "black")
abline(lm(PriceRelative ~ PitchDifference))

Note : An outlier value of around 7500$ is not represented in the above diagram (Price of Premium vs. Pitch of Premium). It has a pitch of 38 inches.

From the above visualizations, we can infer the following :

The fitted price increase per inch of pitch is much steeper for economy flights than for premium flights.
The relative price increases with the pitch difference: the greater the pitch difference, the greater the price difference.

Relationship of price with respect to Width :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ WidthEconomy, ylim = c(0,4200), xlim = c(17,21), col = "black")
abline(lm(PriceEconomy ~ WidthEconomy))
abline(lm(PricePremium ~ WidthPremium), col = "red")
plot(PricePremium ~ WidthPremium, ylim = c(0,4200), xlim = c(17,21), col = "red")
abline(lm(PriceEconomy ~ WidthEconomy))
abline(lm(PricePremium ~ WidthPremium), col = "red")
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ WidthDifference, col = "darkgreen", pch = 18)
abline(lm(PriceRelative ~ WidthDifference), col = "darkgreen")

Note : An outlier value of around 7500$ is not represented in the above diagram (Price of Premium vs. Width of Premium). It has a width of 19 inches.

From the above visualizations, we can infer the following :

There is a gradual increase in price with respect to increase in width.
Even when the width is similar, the premium flights are more costly than the economy flights.
The width difference is positively related to the relative price.

Relationship of price with respect to Total Number of Seats

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
# SeatsTotal goes up to 441, so the x-axis is extended to 450 to keep every point visible.
plot(PriceEconomy ~ SeatsTotal, ylim = c(0,4200), xlim = c(0,450), col = "black")
abline(lm(PriceEconomy ~ SeatsTotal))
abline(lm(PricePremium ~ SeatsTotal), col = "red")
plot(PricePremium ~ SeatsTotal, ylim = c(0,4200), xlim = c(0,450), col = "red")
abline(lm(PriceEconomy ~ SeatsTotal))
abline(lm(PricePremium ~ SeatsTotal), col = "red")
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)

par(mfrow=c(1,1))
plot(PriceRelative ~ SeatsTotal, col = "darkgreen", pch = 18)
abline(lm(PriceRelative ~ SeatsTotal), col = "darkgreen")

Note : An outlier value of around 7500$ is not represented in the above diagram (Price of Premium vs. Total Seats). It has a total of 220 seats.

From the above visualizations, we can infer the following :

In general, with an increase in total number of seats, the price also increases slightly.
The price of premium flights increases gradually with the total number of seats.
The total number of seats has little effect on the relative pricing of premium flights; at most, there is a very marginal decline.

Relationship of price with respect to Number of Seats in Economy and Premium :

par(mfrow=c(1,2))
clr = c("purple","red","skyblue","green","yellow","pink" )
plot(PriceEconomy ~ SeatsEconomy, ylim = c(0,4200), xlim = c(0,390), col = "black")
abline(lm(PriceEconomy ~ SeatsEconomy))
abline(lm(PricePremium ~ SeatsPremium), col = "red")
plot(PricePremium ~ SeatsPremium, ylim = c(0,4200), col = "red")
abline(lm(PriceEconomy ~ SeatsEconomy))
abline(lm(PricePremium ~ SeatsPremium), col = "red")
text2 = "The x-axis margins are different from adjacent figure."
text =  "An outlier value of around 7500$ is not represented"
mtext(text, side = 1, cex = 0.7, col = grey(0.5), line = 2)
mtext(text2, side = 1, cex = 0.7, col = grey(0.5), line = 4)

par(mfrow=c(1,1))
plot(PriceRelative ~ PercentPremiumSeats, col = "darkgreen", pch = 18)
abline(lm(PriceRelative ~ PercentPremiumSeats), col = "darkgreen")

From the above visualizations, we can infer the following :

As the number of premium seats increases, the premium price also increases sharply.
With an increase in the percentage of premium seats in a flight, there is a DECREASE in the relative pricing. Thus, flights with a lower percentage of premium seats tend to have a larger price difference between their economy and premium fares.

Corrgram Visualization :

par(mfrow=c(1,1))
library(corrgram)
library(corrplot)

## corrplot 0.84 loaded

col = c(3,(6:18))
airlines1 = airlines[,col]
corrplot(corr = cor(airlines1), method = "ellipse", type = "upper")

From the above corrplot, we can infer the following :

Price Premium has a very high correlation with Price Economy.
Price Economy in turn has a significant correlation with Flight Duration and Pitch Economy.
Price Premium also has a very high correlation with Flight Duration.
Price Relative has a significant relationship with Pitch Premium, Width Premium, Pitch Difference and Width difference.
Price Relative is negatively correlated with Pitch Economy and Price Economy.
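Reading the strongest pairs off a plot can be cross-checked numerically by ranking the entries of the correlation matrix. The sketch below does this in base R on a hypothetical toy table; the object names toy, cm and ranked and all values are invented for the example.

```r
# Hypothetical numeric data standing in for the selected airline columns.
toy <- data.frame(
  FlightDuration = c(2, 4, 6, 8, 10, 12),
  PriceEconomy   = c(300, 620, 880, 1210, 1490, 1800),
  PricePremium   = c(350, 800, 1300, 1750, 2250, 2700),
  PitchEconomy   = c(31, 30, 32, 31, 33, 31)
)

cm <- cor(toy)

# Keep each pair once by blanking the diagonal and the lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Reshape to one row per pair and sort by absolute correlation.
ranked <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(ranked) <- c("var1", "var2", "r")
ranked <- ranked[!is.na(ranked$r), ]
ranked <- ranked[order(-abs(ranked$r)), ]

print(ranked, row.names = FALSE)
```

With the real data, cor(airlines1) could replace cor(toy), assuming airlines1 holds only numeric columns as in the corrplot() call above.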

Visualization Summary :

After the exhaustive analysis of all the above visualizations, it is advisable to explore the relationship between the following variables :

Price Premium wrt Flight Duration
Price Premium wrt Price Economy
Price Economy wrt Flight Duration
Price Relative wrt Pitch Difference
Pitch Difference wrt Width Premium
Price Relative wrt Width Difference
Width Difference wrt Pitch Premium

Correlations between various variables

Price Premium wrt Flight Duration

The Null Hypothesis for the same will be :

There is no significant change in Price Premium wrt change in Flight Duration.

cor.test(PricePremium, FlightDuration)

## 
##  Pearson's product-moment correlation
## 
## data:  PricePremium and FlightDuration
## t = 18.204, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5923218 0.6988270
## sample estimates:
##       cor 
## 0.6487398

A low p-value (<0.05) indicates that the null hypothesis can be rejected.
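The numbers in this output can be reproduced from the sample correlation alone, which helps when reading the other tests in this section. The sketch below uses the standard formulas for Pearson's test: t = r * sqrt(df / (1 - r^2)), and a confidence interval built on the Fisher z-transform. The values of r and df are copied from the output above; the variable names are made up.

```r
r  <- 0.6487398   # sample correlation reported above
df <- 456         # degrees of freedom, n - 2
n  <- df + 2

# Test statistic and two-sided p-value for H0: true correlation = 0.
t_stat <- r * sqrt(df / (1 - r^2))
p_val  <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z-transform.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(t_stat, 3))
print(p_val)
print(round(ci, 4))
```

The t statistic and interval obtained this way should agree with the cor.test() output up to rounding.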

Price Premium wrt Price Economy

The Null Hypothesis for the same will be :

There is no significant change in Price Premium wrt change in Economy Prices.

cor.test(PricePremium, PriceEconomy)

## 
##  Pearson's product-moment correlation
## 
## data:  PricePremium and PriceEconomy
## t = 44.452, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8826622 0.9172579
## sample estimates:
##       cor 
## 0.9013887

A low p-value (<0.05) indicates that the null hypothesis can be rejected.

Price Economy wrt Flight Duration

The Null Hypothesis for the same will be :

There is no significant change in Price Economy wrt change in Flight Duration.

cor.test(PriceEconomy, FlightDuration)

## 
##  Pearson's product-moment correlation
## 
## data:  PriceEconomy and FlightDuration
## t = 14.685, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5010266 0.6257772
## sample estimates:
##       cor 
## 0.5666404

A low p-value (<0.05) indicates that the null hypothesis can be rejected.

Price Relative wrt Pitch Difference

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt change in Pitch Difference.

cor.test(PriceRelative, PitchDifference)

## 
##  Pearson's product-moment correlation
## 
## data:  PriceRelative and PitchDifference
## t = 11.331, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3940262 0.5372817
## sample estimates:
##       cor 
## 0.4687302

A low p-value (<0.05) indicates that the null hypothesis can be rejected. However, the correlation is only moderate (r = 0.47).

Price Relative wrt Width Difference

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt change in Width Difference.

cor.test(PriceRelative, WidthDifference)

## 
##  Pearson's product-moment correlation
## 
## data:  PriceRelative and WidthDifference
## t = 11.869, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4125388 0.5528218
## sample estimates:
##       cor 
## 0.4858024

A low p-value (<0.05) indicates that the null hypothesis can be rejected. However, the correlation is only moderate (r = 0.49).

Pitch Difference wrt Width Difference

The Null Hypothesis for the same will be :

There is no significant change in Pitch Difference wrt change in Width Difference.

cor.test(PitchDifference, WidthDifference)

## 
##  Pearson's product-moment correlation
## 
## data:  PitchDifference and WidthDifference
## t = 25.04, df = 456, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7194209 0.7969557
## sample estimates:
##       cor 
## 0.7608911

A low p-value (<0.05) indicates that the null hypothesis can be rejected.

Price Relative wrt Airlines

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt Airline.

airline_pr = xtabs(~PriceRelative+Airline)
chisq.test(airline_pr)

## Warning in chisq.test(airline_pr): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  airline_pr
## X-squared = 1402.9, df = 485, p-value < 2.2e-16

A low p-value (<0.05) indicates that the null hypothesis can be rejected.

Price Relative wrt Aircraft

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt change in Aircraft.

aircraft_pr = xtabs(~PriceRelative+Aircraft)
chisq.test(aircraft_pr)

## Warning in chisq.test(aircraft_pr): Chi-squared approximation may be
## incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  aircraft_pr
## X-squared = 245.44, df = 97, p-value = 7.647e-15

A low p-value (<0.05) indicates that the null hypothesis can be rejected.

Price Relative wrt International/Domestic

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt International/Domestic.

isint_pr = xtabs(~PriceRelative+IsInternational)
chisq.test(isint_pr)

Note : The test in this section was previously run on airline_pr instead of isint_pr, which only repeated the Airline result (X-squared = 1402.9, df = 485, p-value < 2.2e-16). The call above uses the intended table, and its output has to be regenerated before the null hypothesis for International/Domestic can be rejected or retained.

Price Relative wrt Month of Travel

The Null Hypothesis for the same will be :

There is no significant change in Price Relative wrt change in the month of travel.

month_pr = xtabs(~PriceRelative+TravelMonth)
chisq.test(month_pr)

## Warning in chisq.test(month_pr): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  month_pr
## X-squared = 169.32, df = 291, p-value = 1

As the p-value is very high (p = 1), we fail to reject the null hypothesis.
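The "Chi-squared approximation may be incorrect" warnings above are expected: PriceRelative is a continuous variable, so xtabs() creates one row per distinct value (df = 485 = 97 * 5 implies a 98-by-6 table for Airline), and most cells hold a count of 0 or 1. A different technique that is often used for a numeric response and a categorical factor is a one-way ANOVA, or the rank-based Kruskal-Wallis test. The sketch below shows both on a hypothetical toy table; it is an alternative to the chi-squared tests above, not the method used in this project, and the data are invented.

```r
# Hypothetical toy data: a numeric response and a three-level factor.
toy <- data.frame(
  PriceRelative = c(0.10, 0.15, 0.12, 0.18,
                    0.60, 0.70, 0.65, 0.72,
                    1.20, 1.10, 1.30, 1.25),
  Airline = factor(rep(c("Delta", "British", "Jet"), each = 4))
)

# One-way ANOVA: do the group means differ?
aov_fit <- aov(PriceRelative ~ Airline, data = toy)
p_anova <- summary(aov_fit)[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis: rank-based version that does not assume normal residuals.
p_kw <- kruskal.test(PriceRelative ~ Airline, data = toy)$p.value

print(c(anova = p_anova, kruskal = p_kw))
```

With the real data, the same two calls with data = airlines would test PriceRelative against Airline, Aircraft, IsInternational or TravelMonth in turn.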

Correlation Summary :

From the above data, we can conclude the following :

The premium price is heavily dependent on the economy flight price (r = 0.90). The economy flight price in turn depends on the flight duration (r = 0.57).

AND

The relative price of a flight (this variable takes into account the change in economy price), is dependent on a lot of factors, which include :

Airline
Aircraft
IsInternational
FlightDuration
PitchDifference
WidthDifference

When fitting a model for the relative price, we need to account for all the above variables.

Regression Models

A linear model for Premium Price :

form_pp = PricePremium ~ PriceEconomy * FlightDuration
reg_pp = lm(formula = form_pp, data = airlines)
summary(reg_pp)

## 
## Call:
## lm(formula = form_pp, data = airlines)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1000.1  -302.6   -66.4   170.9  3169.3 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -15.772180  77.335761  -0.204    0.838    
## PriceEconomy                  0.930762   0.088313  10.539  < 2e-16 ***
## FlightDuration               65.057640  11.382623   5.716 1.98e-08 ***
## PriceEconomy:FlightDuration   0.011037   0.009767   1.130    0.259    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 515.4 on 454 degrees of freedom
## Multiple R-squared:  0.841,  Adjusted R-squared:  0.8399 
## F-statistic: 800.4 on 3 and 454 DF,  p-value: < 2.2e-16

From the above, we can conclude the following :

Both Price Economy and Flight Duration have a significant effect on the Premium Price (p-value < 0.05), while their interaction term is not significant (p = 0.259). Also, the model explains about 84% of the variance in Premium Price (adjusted R-squared = 0.8399).
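The * operator in the model formula is shorthand: a * b expands to a + b + a:b, which is why the summary lists a PriceEconomy:FlightDuration row. The sketch below checks this on a hypothetical toy table and shows that the interaction column of the design matrix is just the product of the two predictors.

```r
# Hypothetical toy data; only the column names match the real dataset.
toy <- data.frame(
  PriceEconomy   = c(300, 620, 880, 1210, 1490, 1800),
  FlightDuration = c(2, 5, 4, 9, 8, 12),
  PricePremium   = c(350, 800, 1300, 1750, 2250, 2700)
)

form <- PricePremium ~ PriceEconomy * FlightDuration

# Terms that the formula expands to.
term_labels <- attr(terms(form), "term.labels")

# Design matrix actually passed to the least-squares fit.
X <- model.matrix(form, data = toy)
interaction_col <- unname(X[, "PriceEconomy:FlightDuration"])

print(term_labels)
print(colnames(X))
```

Since the interaction term is not significant in the summary above, a model with PriceEconomy + FlightDuration alone would be a natural candidate for comparison.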

A linear model for Relative Price (taking into account all categorical factors) :

form_pr = PriceRelative ~ PitchDifference + WidthDifference + Airline + IsInternational + Aircraft
reg_pr = lm(formula = form_pr, data = airlines)
summary(reg_pr)

## 
## Call:
## lm(formula = form_pr, data = airlines)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.84780 -0.21460 -0.08242  0.11540  1.39717 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -0.268518   0.288101  -0.932 0.351824    
## PitchDifference               0.040276   0.064391   0.625 0.531971    
## WidthDifference               0.001098   0.082734   0.013 0.989415    
## AirlineBritish                0.175432   0.111163   1.578 0.115235    
## AirlineDelta                  0.188034   0.185262   1.015 0.310673    
## AirlineJet                    0.559896   0.143294   3.907 0.000108 ***
## AirlineSingapore              0.318518   0.081863   3.891 0.000115 ***
## AirlineVirgin                 0.517611   0.107450   4.817    2e-06 ***
## IsInternationalInternational  0.188594   0.245605   0.768 0.442966    
## AircraftBoeing                0.080674   0.043792   1.842 0.066105 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3713 on 448 degrees of freedom
## Multiple R-squared:  0.3342, Adjusted R-squared:  0.3208 
## F-statistic: 24.99 on 9 and 448 DF,  p-value: < 2.2e-16

From the above, we can conclude the following :

When all the categorical variables are accounted for, pitch difference and width difference do not have a significant impact on Relative Price.
The coefficients of the categorical variables are measured relative to the first level of each factor (AirFrance, Domestic and AirBus).
With this, we conclude that Jet Airlines, Singapore Airlines and Virgin Airlines differ significantly from AirFrance in relative pricing.

Note : This model only explains about 32% of the variance in Relative Price (adjusted R-squared = 0.3208). So, I am not sure how good the model is.
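How lm() encodes a factor can be seen on a small example. With the default treatment contrasts, the intercept is the mean of the first (baseline) level and every other coefficient is a difference from that baseline; relevel() changes which level plays that role. The sketch uses hypothetical toy data, and the column name Airline2 is made up.

```r
# Hypothetical toy data with three airlines, two flights each.
toy <- data.frame(
  PriceRelative = c(0.10, 0.20, 0.60, 0.70, 1.20, 1.30),
  Airline = factor(c("AirFrance", "AirFrance", "British", "British", "Jet", "Jet"))
)

# Default: the first level (AirFrance) is the baseline.
fit <- lm(PriceRelative ~ Airline, data = toy)

# Same model with British as the baseline instead.
toy$Airline2 <- relevel(toy$Airline, ref = "British")
fit2 <- lm(PriceRelative ~ Airline2, data = toy)

print(coef(fit))
print(coef(fit2))
```

Both fits describe the same group means; only the reference point of the coefficients changes, so a "significant" airline coefficient always means significant relative to the chosen baseline.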

A linear model for Relative Price (with just pitch difference and width difference) :

form_pr = PriceRelative ~ PitchDifference + WidthDifference
reg_pr = lm(formula = form_pr, data = airlines)
summary(reg_pr)

## 
## Call:
## lm(formula = form_pr, data = airlines)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.84163 -0.28484 -0.07241  0.17698  1.18778 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.10514    0.08304  -1.266 0.206077    
## PitchDifference  0.06019    0.01590   3.785 0.000174 ***
## WidthDifference  0.11621    0.02356   4.933 1.14e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3886 on 455 degrees of freedom
## Multiple R-squared:  0.2593, Adjusted R-squared:  0.2561 
## F-statistic: 79.65 on 2 and 455 DF,  p-value: < 2.2e-16

With the above model summary, we can conclude the following :

When only pitch difference and width difference are considered, both have a significant impact on the Relative Price, although the model explains only about 26% of its variance (adjusted R-squared = 0.2561).
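The adjusted R-squared values of the three models can be compared on equal terms because the adjustment penalizes the number of predictors: adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of estimated slopes. The sketch below recomputes the three values from the reported R-squared figures; adj_r2 is a helper written for this example, and n = 458 follows from the 456 degrees of freedom in the cor.test() outputs.

```r
# Helper for this example: adjusted R-squared from R-squared, n and p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 458  # observations (df = 456 in the correlation tests, plus 2)

adj <- c(
  premium_price     = adj_r2(0.8410, n, 3),  # PriceEconomy * FlightDuration
  relative_full     = adj_r2(0.3342, n, 9),  # pitch, width and categorical factors
  relative_pitch_w  = adj_r2(0.2593, n, 2)   # pitch and width differences only
)

print(round(adj, 3))
```

Small differences in the last digit against the summary() output are possible, because the R-squared inputs used here are already rounded. On this measure the full Relative Price model still explains more than the two-variable model, even after the penalty for its extra coefficients.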

Muyeena Khanzada

February 16, 2018