This is an assignment given on Week 3, Day 3 of the Data Analytics Internship under Prof. Sameer Mathur, IIML.
Read the data into R and summarize it.
Airdata <- read.csv("SixAirlinesDataV2.csv")
View(Airdata)
summary(Airdata)
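Before plotting, it can help to confirm the column types and look for missing values. The snippet below is a minimal sketch on a tiny made-up table (`air_demo` and its values are hypothetical, not taken from SixAirlinesDataV2.csv); the same two calls could be applied to `Airdata`.

```r
# Hypothetical stand-in for a few rows of airline data; values are invented.
csv_text <- "Airline,FlightDuration,PriceEconomy,PricePremium
British,12.25,2707,3725
Delta,4.50,NA,620
AirFrance,8.33,1500,1900"
air_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Column types at a glance.
str(air_demo)

# Number of missing values in each column.
na_counts <- colSums(is.na(air_demo))
print(na_counts)
```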
Draw Box Plots / Bar Plots to visualize the distribution of each variable independently
library(psych)
describe(Airdata)
describe(Airdata$FlightDuration)
boxplot(Airdata$FlightDuration, horizontal = TRUE, xlab="Duration(hours)")
table(Airdata$TravelMonth)
barplot(table(Airdata$TravelMonth), xlab="Month", ylab = "No. of Flights", col="yellow" )
table(Airdata$IsInternational)
barplot(table(Airdata$IsInternational), ylab = "No. of Flights", col="blue" )
describe(Airdata$SeatsEconomy)
boxplot(Airdata$SeatsEconomy, horizontal = TRUE, xlab="No. of Seats")
describe(Airdata$PitchEconomy)
barplot(table(Airdata$PitchEconomy),xlab="Pitch(Inches)", ylab = "No. of Flights", col="grey" )
describe(Airdata$WidthEconomy)
barplot(table(Airdata$WidthEconomy),xlab="Width(Inches)", ylab = "No. of Flights", col="red" )
describe(Airdata$PriceEconomy)
boxplot(Airdata$PriceEconomy, horizontal = TRUE, xlab="Ticket Price(USD)")
describe(Airdata$PriceRelative)
boxplot(Airdata$PriceRelative, horizontal = TRUE, xlab="(PricePremium - PriceEconomy) / PriceEconomy")
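The axis label above gives the definition of PriceRelative. As a small worked example, with invented prices that are not taken from the data set, the formula can be evaluated like this:

```r
# Hypothetical prices in USD, used only to illustrate the formula.
price_economy <- c(1000, 250, 80)
price_premium <- c(1500, 300, 100)

# Relative premium: the price gap as a fraction of the economy price.
price_relative <- (price_premium - price_economy) / price_economy
print(price_relative)  # 0.50 0.20 0.25
```

A value of 0.50 means the premium economy ticket costs 50% more than the economy ticket.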
describe(Airdata$SeatsTotal)
boxplot(Airdata$SeatsTotal, horizontal = TRUE, xlab="No. of Seats")
describe(Airdata$PitchDifference)
barplot(table(Airdata$PitchDifference),xlab="PitchDifference(Inches)", ylab = "No. of Flights", col="grey" )
describe(Airdata$WidthDifference)
barplot(table(Airdata$WidthDifference),xlab="WidthDifference(Inches)", ylab = "No. of Flights", col="green" )
library(corrgram)
corrgram(Airdata, order=FALSE,
lower.panel=panel.shade,
upper.panel=panel.pie,
text.panel=panel.txt,
main="Corrgram of the relations between the variables of the data frame")
A. Test of the correlation between the price difference and PitchDifference.
cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$PitchDifference)
library(car)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$PitchDifference)
B. Test of the correlation between the price difference and WidthDifference.
cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$WidthDifference)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$WidthDifference)
C. Test of the correlation between the price difference and FlightDuration.
cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$FlightDuration)
library(car)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$FlightDuration)
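The above correlation tests suggest that the difference in price between the two ticket classes is significantly correlated with FlightDuration, since the p-value is very small (2.2e-16), and also with PitchDifference and WidthDifference (p-value < 0.05). Note that a small p-value indicates that a correlation is unlikely to be zero; the strength of the relation is given by the correlation estimate itself.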
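The three tests can also be run in one pass and collected into a single table, so that the correlation estimates and p-values are easy to compare. This is a sketch on synthetic data: `air_demo`, `PriceDiff` and all values are assumptions made for illustration, with only the column names mirroring the data set.

```r
set.seed(42)
n <- 200

# Synthetic stand-in data; the values are made up.
air_demo <- data.frame(
  PitchDifference = sample(2:10, n, replace = TRUE),
  WidthDifference = sample(0:4, n, replace = TRUE),
  FlightDuration  = runif(n, 1, 14)
)
# Hypothetical price difference that depends only on flight duration.
air_demo$PriceDiff <- 50 * air_demo$FlightDuration + rnorm(n, sd = 100)

predictors <- c("PitchDifference", "WidthDifference", "FlightDuration")

# Run cor.test() for each predictor and keep the estimate and p-value.
results <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(air_demo$PriceDiff, air_demo[[v]])
  data.frame(variable = v, r = unname(ct$estimate), p_value = ct$p.value)
}))
print(results)
```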
Null Hypothesis: There is no difference between the price of an economy class ticket and the price of a premium economy class ticket.
t.test(Airdata$PriceEconomy,Airdata$PricePremium,var.equal = TRUE,paired = FALSE)
The null hypothesis is rejected because the t-test gives a very low p-value, which means that there is a significant difference between the prices of economy class and premium economy class tickets.
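If each row of the data set holds both prices for the same flight, a paired t-test may be a closer match to the question than the two-sample test used above. The following is only a sketch under that assumption, on invented prices (`economy` and `premium` are hypothetical vectors, not columns of the data set):

```r
set.seed(7)

# Hypothetical paired prices in USD: each position is one flight.
economy <- round(runif(30, 100, 3000))
premium <- economy + round(runif(30, 20, 800))

# Paired test: works on the per-flight differences premium - economy.
paired_result <- t.test(premium, economy, paired = TRUE)
print(paired_result)
```

On the real data, the equivalent call would pass `Airdata$PricePremium` and `Airdata$PriceEconomy` with `paired = TRUE`.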
Airdata2 <- (Airdata$PricePremium-Airdata$PriceEconomy) ~ Airdata$PitchDifference+Airdata$WidthDifference+Airdata$FlightDuration
Airdata3 <- lm(Airdata2)
summary(Airdata3)
A. Beta coefficients of the model.
Airdata3$coefficients
B. Confidence intervals on the beta coefficients.
confint(Airdata3)
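The same kind of model can be written with a named response column and the `data` argument of `lm()`, which keeps the coefficient names short. This is a sketch on synthetic data; `air_demo`, `PriceDiff` and the numbers used to generate the prices are assumptions for illustration only.

```r
set.seed(3)
n <- 100

# Synthetic stand-in data; column names mirror the data set, values are made up.
air_demo <- data.frame(
  PitchDifference = sample(2:10, n, replace = TRUE),
  WidthDifference = sample(0:4, n, replace = TRUE),
  FlightDuration  = runif(n, 1, 14)
)
air_demo$PriceEconomy <- round(200 + 90 * air_demo$FlightDuration + rnorm(n, sd = 50))
air_demo$PricePremium <- air_demo$PriceEconomy +
  round(40 * air_demo$FlightDuration + 30 * air_demo$WidthDifference + rnorm(n, sd = 60))

# Store the price difference as its own column.
air_demo$PriceDiff <- air_demo$PricePremium - air_demo$PriceEconomy

fit <- lm(PriceDiff ~ PitchDifference + WidthDifference + FlightDuration,
          data = air_demo)

print(coef(fit))                   # beta coefficients
print(confint(fit, level = 0.95))  # 95% confidence intervals
```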
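C. Plot of the model.
# Airdata2 is a formula, not a fitted model, so abline(Airdata2) fails.
# Plot the fitted model Airdata3 to get the standard diagnostic plots.
plot(Airdata3)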
1. The data set is approximately normally distributed; therefore, we can perform the regression analysis.
2. The difference in price between an economy ticket and a premium-economy ticket (PricePremium - PriceEconomy) depends significantly on FlightDuration and WidthDifference and less significantly on PitchDifference.
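The normality statement in point 1 could be checked on the residuals of the fitted model, for example with a Shapiro-Wilk test. The sketch below uses a synthetic one-predictor model (`x`, `y` and `fit` are hypothetical); on the real analysis the call would be `shapiro.test(residuals(Airdata3))`.

```r
set.seed(11)

# Synthetic data: a linear relation with normally distributed noise.
x <- runif(80, 1, 14)
y <- 40 * x + rnorm(80, sd = 60)
fit <- lm(y ~ x)

# Shapiro-Wilk test on the residuals; a small p-value would suggest
# that the residuals deviate from a normal distribution.
sw <- shapiro.test(residuals(fit))
print(sw)
```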