This is an assignment given in Week 3, Day 3 of the Data Analytics Internship under Prof. Sameer Mathur, IIML.

Task 1

Read the data into R and summarize it.

# Read the CSV file (from the working directory), then inspect and summarize it
Airdata <- read.csv("SixAirlinesDataV2.csv")
View(Airdata)
summary(Airdata)

Task 2

Draw Box Plots / Bar Plots to visualize the distribution of each variable independently

library(psych)
describe(Airdata)
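The per-variable sections that follow apply one rule: a bar plot for variables with only a few distinct values and a box plot for continuous ones. A minimal sketch of that rule as a loop is shown below. It assumes a cut-off of 10 distinct values (an arbitrary choice) and uses a small made-up data frame in place of `Airdata`. The helper name `plot_distributions` is hypothetical.

```r
# Hypothetical helper: one plot per column, a bar plot for non-numeric or
# low-cardinality columns and a horizontal box plot otherwise.
# Returns the plot type chosen for each column.
plot_distributions <- function(df, max_levels = 10) {
  types <- character(0)
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.numeric(x) || length(unique(x)) <= max_levels) {
      barplot(table(x), main = col, ylab = "No. of Flights")
      types[col] <- "bar"
    } else {
      boxplot(x, horizontal = TRUE, main = col)
      types[col] <- "box"
    }
  }
  types
}

# Made-up illustrative data, not taken from SixAirlinesDataV2.csv
toy <- data.frame(
  TravelMonth    = c("Jul", "Aug", "Sep", "Oct", "Jul", "Aug"),
  PitchEconomy   = c(30, 31, 31, 32, 30, 31),
  FlightDuration = c(1.5, 2.25, 3.1, 7.8, 10.4, 12.6)
)
types <- plot_distributions(toy, max_levels = 5)
types
```

On the real data this would be called as `plot_distributions(Airdata)`. Running `par(mfrow = c(2, 2))` beforehand places four plots on each page.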

FlightDuration description

describe(Airdata$FlightDuration)
boxplot(Airdata$FlightDuration, horizontal = TRUE, xlab="Duration(hours)")

TravelMonth description

table(Airdata$TravelMonth)
barplot(table(Airdata$TravelMonth), xlab="Month", ylab = "No. of Flights", col="yellow" )
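`table()` sorts character values alphabetically, so month abbreviations come out as Aug, Jul, Oct, Sep instead of in calendar order. The sketch below assumes that TravelMonth stores three-letter English month abbreviations; this is worth checking with `unique(Airdata$TravelMonth)`. It then uses R's built-in `month.abb` as factor levels so the bars follow the calendar. The sample values are made up.

```r
# Assumes months are stored as three-letter abbreviations ("Jul", "Aug", ...)
months <- c("Jul", "Aug", "Sep", "Oct", "Aug", "Jul", "Oct")  # made-up sample

names(table(months))   # alphabetical order

# month.abb gives calendar order; droplevels() removes months that never occur
months_f <- droplevels(factor(months, levels = month.abb))
table(months_f)
barplot(table(months_f), xlab = "Month", ylab = "No. of Flights", col = "yellow")
```

On the real data the same idea would be `barplot(table(droplevels(factor(Airdata$TravelMonth, levels = month.abb))), ...)`.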

IsInternational description

table(Airdata$IsInternational)
barplot(table(Airdata$IsInternational), ylab = "No. of Flights", col="blue" )

SeatsEconomy description

describe(Airdata$SeatsEconomy)
boxplot(Airdata$SeatsEconomy, horizontal = TRUE, xlab="No. of Seats")

SeatsPremium description

describe(Airdata$SeatsPremium)
boxplot(Airdata$SeatsPremium, horizontal = TRUE, xlab="No. of Seats")

PitchEconomy description

describe(Airdata$PitchEconomy)
barplot(table(Airdata$PitchEconomy),xlab="Pitch(Inches)", ylab = "No. of Flights", col="grey" )

PitchPremium description

describe(Airdata$PitchPremium)
barplot(table(Airdata$PitchPremium),xlab="Pitch(Inches)", ylab = "No. of Flights", col="black" )

WidthEconomy description

describe(Airdata$WidthEconomy)
barplot(table(Airdata$WidthEconomy),xlab="Width(Inches)", ylab = "No. of Flights", col="red" )

WidthPremium description

describe(Airdata$WidthPremium)
barplot(table(Airdata$WidthPremium),xlab="Width(Inches)", ylab = "No. of Flights", col="violet" )

PriceEconomy description

describe(Airdata$PriceEconomy)
boxplot(Airdata$PriceEconomy, horizontal = TRUE, xlab="Ticket Price(USD)")

PricePremium description

describe(Airdata$PricePremium)
boxplot(Airdata$PricePremium, horizontal = TRUE, xlab="Ticket Price(USD)")

PriceRelative description

describe(Airdata$PriceRelative)
boxplot(Airdata$PriceRelative, horizontal = TRUE, xlab="(PricePremium - PriceEconomy) / PriceEconomy")
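The axis label above is the definition of PriceRelative. The sketch below recomputes it with a hypothetical helper, `price_relative()`. It is an assumption that the CSV column was computed exactly this way, and the rounding applied to it is not known.

```r
# Hypothetical helper: premium markup relative to the economy fare
price_relative <- function(price_premium, price_economy) {
  (price_premium - price_economy) / price_economy
}

price_relative(150, 100)                    # premium fare 50% above economy
price_relative(c(1200, 900), c(800, 900))   # vectorised over flights
```

If the assumption holds, `summary(price_relative(Airdata$PricePremium, Airdata$PriceEconomy) - Airdata$PriceRelative)` should show differences close to zero, up to rounding.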

SeatsTotal description

describe(Airdata$SeatsTotal)
boxplot(Airdata$SeatsTotal, horizontal = TRUE, xlab="No. of Seats")

PercentPremiumSeats description

describe(Airdata$PercentPremiumSeats)
boxplot(Airdata$PercentPremiumSeats, horizontal = TRUE, xlab="Percentage of Premium Seats in Aircraft")

PitchDifference description

describe(Airdata$PitchDifference)
barplot(table(Airdata$PitchDifference),xlab="PitchDifference(Inches)", ylab = "No. of Flights", col="grey" )

WidthDifference description

describe(Airdata$WidthDifference)
barplot(table(Airdata$WidthDifference),xlab="WidthDifference(Inches)", ylab = "No. of Flights", col="green" )
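SeatsTotal, PercentPremiumSeats, PitchDifference and WidthDifference appear to be derived from the seat, pitch and width columns. The sketch below shows how they could be recomputed. The formulas are inferred from the column names and axis labels, not confirmed by the dataset documentation. The helper `add_derived_columns()` and the one-row cabin layout are both hypothetical.

```r
# Assumed definitions of the derived columns (inferred from their names)
add_derived_columns <- function(df) {
  df$SeatsTotal          <- df$SeatsEconomy + df$SeatsPremium
  df$PercentPremiumSeats <- 100 * df$SeatsPremium / df$SeatsTotal
  df$PitchDifference     <- df$PitchPremium - df$PitchEconomy
  df$WidthDifference     <- df$WidthPremium - df$WidthEconomy
  df
}

# Made-up cabin layout for one aircraft, not from the dataset
cabin <- data.frame(SeatsEconomy = 150, SeatsPremium = 50,
                    PitchEconomy = 31, PitchPremium = 38,
                    WidthEconomy = 18, WidthPremium = 19)
derived <- add_derived_columns(cabin)
derived
```

Comparing `add_derived_columns(Airdata)` with the original columns is a quick way to test these assumed definitions.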

Task 3 - Corrgram

library(corrgram)
corrgram(Airdata, order=FALSE, 
         lower.panel=panel.shade,
         upper.panel=panel.pie, 
         text.panel=panel.txt,
         main="Corrgram of the relationships between variables in Airdata")
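The shading and pie slices in a corrgram encode pairwise correlation coefficients. Base R's `cor()` prints the same kind of matrix as numbers. Because `cor()` needs numeric input, non-numeric columns (such as the airline name) are dropped first. The sketch below uses a made-up data frame that borrows a few of the dataset's column names.

```r
# Made-up illustrative data, not from SixAirlinesDataV2.csv
flights <- data.frame(
  Airline        = c("A", "B", "C", "A", "B"),
  FlightDuration = c(1.2, 3.5, 7.0, 9.8, 12.1),
  PriceEconomy   = c(150, 320, 610, 890, 1100),
  PricePremium   = c(210, 450, 900, 1400, 1800)
)

# Keep only numeric columns, then compute the Pearson correlation matrix
numeric_cols <- flights[sapply(flights, is.numeric)]
corr <- round(cor(numeric_cols), 2)
corr
```

On the real data, `round(cor(Airdata[sapply(Airdata, is.numeric)]), 2)` gives the numbers behind the corrgram.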

Task 4 - Pearson’s Correlation Tests and Scatter Plots

A. Test of the correlation between the price difference (PricePremium - PriceEconomy) and PitchDifference.

cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$PitchDifference)
library(car)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$PitchDifference)

B. Test of the correlation between the price difference and WidthDifference.

cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$WidthDifference)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$WidthDifference)

C. Test of the correlation between the price difference and FlightDuration.

cor.test((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$FlightDuration)
scatterplot((Airdata$PricePremium-Airdata$PriceEconomy),Airdata$FlightDuration)

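The correlation tests above show a statistically significant correlation between the price difference of the two ticket classes and FlightDuration (p-value < 2.2e-16). The price difference is also significantly correlated with PitchDifference and WidthDifference (p-value < 0.05). A small p-value only indicates that the correlation is unlikely to be zero. The strength of each relationship is given by the estimated correlation coefficient (cor) in the test output.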
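To separate strength from significance, the sketch below computes Pearson's r by hand and then derives the test statistic t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. Both are compared with `cor.test()`. The paired values are made up.

```r
# Made-up paired observations (not from the dataset)
x <- c(1.5, 2.0, 3.2, 4.8, 6.1, 7.5, 9.0, 11.2)
y <- c(40, 55, 60, 95, 110, 150, 160, 210)

n <- length(x)
# Pearson correlation: covariance scaled by both standard deviations
r <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
# t statistic and two-sided p-value used by the Pearson test
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

ct <- cor.test(x, y)   # method = "pearson" is the default
c(r = r, t = t_stat, p = p_value)
c(ct$estimate, ct$statistic, p = ct$p.value)
```

With a large n, even a modest r produces a tiny p-value, so the reported cor values are what indicate how strong each relationship is.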

Task 5 - T-Test

Null hypothesis: there is no difference between the mean price of an economy class ticket and that of a premium economy class ticket.

t.test(Airdata$PriceEconomy,Airdata$PricePremium,var.equal = TRUE,paired = FALSE)

The t-test gives a very low p-value, so the null hypothesis is rejected: the mean prices of economy class and premium economy class tickets differ.
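The test above treats the two price columns as independent samples with pooled variance. Each row, however, gives both fares for the same flight, so a paired t-test is a common alternative. A paired test is equivalent to a one-sample t-test on the per-flight differences. The sketch below shows that equivalence on made-up fares.

```r
# Made-up fares for six flights (not from the dataset)
economy <- c(120, 250, 410, 600, 980, 1500)
premium <- c(180, 330, 520, 820, 1400, 2100)

paired     <- t.test(premium, economy, paired = TRUE)
one_sample <- t.test(premium - economy, mu = 0)

c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
paired$conf.int   # interval for the mean premium-minus-economy difference
```

On the real data this would be `t.test(Airdata$PricePremium, Airdata$PriceEconomy, paired = TRUE)`.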

Task 6 - Regression Analysis

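# Model formula: premium-minus-economy price difference on the three predictors
Airdata2 <- (Airdata$PricePremium - Airdata$PriceEconomy) ~
  Airdata$PitchDifference + Airdata$WidthDifference + Airdata$FlightDuration
# Fit by ordinary least squares and print coefficients, p-values and R-squared
Airdata3 <- lm(Airdata2)
summary(Airdata3)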
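`lm()` estimates the beta coefficients by ordinary least squares, β̂ = (XᵀX)⁻¹Xᵀy. Here X holds a column of ones for the intercept plus one column per predictor. The sketch below checks this on simulated data. The "true" coefficients and noise level are arbitrary and chosen only for illustration.

```r
set.seed(1)
n <- 50
sim <- data.frame(
  PitchDifference = sample(3:10, n, replace = TRUE),
  WidthDifference = sample(0:2, n, replace = TRUE),
  FlightDuration  = runif(n, 1, 14)
)
# Arbitrary coefficients plus noise, for illustration only
sim$PriceDiff <- 20 + 15 * sim$PitchDifference + 40 * sim$WidthDifference +
  25 * sim$FlightDuration + rnorm(n, sd = 30)

fit <- lm(PriceDiff ~ PitchDifference + WidthDifference + FlightDuration, data = sim)

X <- model.matrix(fit)   # intercept column plus the three predictors
y <- sim$PriceDiff
# Solve the normal equations (X'X) beta = X'y
beta_hat <- drop(solve(t(X) %*% X, t(X) %*% y))

cbind(lm = coef(fit), normal_equations = beta_hat)
```

`solve(A, b)` is used in place of explicitly inverting XᵀX, because it is the numerically safer way to solve the same system.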

Observation

A. Beta coefficients of the model.

coef(Airdata3)

B. Confidence intervals for the beta coefficients.

confint(Airdata3)
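For an `lm` fit, `confint()` returns β̂ⱼ ± t(0.975, n − p) · SE(β̂ⱼ) at the default 95% level. The standard errors come from `summary()` and n − p is the residual degrees of freedom. The sketch below reproduces the intervals by hand on a small simulated fit.

```r
set.seed(2)
x <- runif(30, 1, 14)
y <- 50 + 30 * x + rnorm(30, sd = 20)   # simulated, illustrative only
fit <- lm(y ~ x)

est    <- coef(summary(fit))[, "Estimate"]
se     <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value

manual <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
manual
confint(fit, level = 0.95)
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level in `summary()`.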

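C. Plots of the model.

library(car)
avPlots(Airdata3)       # one added-variable plot per predictor
par(mfrow = c(2, 2))
plot(Airdata3)          # residual diagnostics for the fitted model
par(mfrow = c(1, 1))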

Summary

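1. Linear regression assumes that the residuals, not the raw variables, are approximately normally distributed. This should be checked (for example, with a normal Q-Q plot of the residuals) before relying on the regression p-values.
2. The difference in price between an economy ticket and a premium economy ticket (PricePremium - PriceEconomy) depends significantly on FlightDuration and WidthDifference, and less significantly on PitchDifference.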