This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
airline.df <- read.csv("SixAirlinesDataV2.csv")
head(airline.df)
## Airline Aircraft FlightDuration TravelMonth IsInternational SeatsEconomy
## 1 British Boeing 12.25 Jul International 122
## 2 British Boeing 12.25 Aug International 122
## 3 British Boeing 12.25 Sep International 122
## 4 British Boeing 12.25 Oct International 122
## 5 British Boeing 8.16 Aug International 122
## 6 British Boeing 8.16 Sep International 122
## SeatsPremium PitchEconomy PitchPremium WidthEconomy WidthPremium
## 1 40 31 38 18 19
## 2 40 31 38 18 19
## 3 40 31 38 18 19
## 4 40 31 38 18 19
## 5 40 31 38 18 19
## 6 40 31 38 18 19
## PriceEconomy PricePremium PriceRelative SeatsTotal PitchDifference
## 1 2707 3725 0.38 162 7
## 2 2707 3725 0.38 162 7
## 3 2707 3725 0.38 162 7
## 4 2707 3725 0.38 162 7
## 5 1793 2999 0.67 162 7
## 6 1793 2999 0.67 162 7
## WidthDifference PercentPremiumSeats
## 1 1 24.69
## 2 1 24.69
## 3 1 24.69
## 4 1 24.69
## 5 1 24.69
## 6 1 24.69
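The data are then split into a training set and a test set; the code for this chunk is not printed. The first summary below describes the training set (343 rows) and the second the test set (115 rows):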
## Warning: package 'caTools' was built under R version 3.4.3
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 59 AirBus:113 Min. : 1.250 Aug:95
## British :128 Boeing:230 1st Qu.: 4.500 Jul:59
## Delta : 34 Median : 7.830 Oct:92
## Jet : 45 Mean : 7.643 Sep:97
## Singapore: 31 3rd Qu.:10.500
## Virgin : 46 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 28 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:315 1st Qu.:136.0 1st Qu.:22.50 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.2 Mean :33.64 Mean :31.23
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:17.50 1st Qu.:19.00 1st Qu.: 426
## Median :38.00 Median :18.00 Median :19.00 Median :1247
## Mean :37.92 Mean :17.84 Mean :19.48 Mean :1347
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1919
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.020 Min. : 98.0 Min. : 2.000
## 1st Qu.: 581.5 1st Qu.:0.100 1st Qu.:166.0 1st Qu.: 6.000
## Median :1784.0 Median :0.380 Median :227.0 Median : 7.000
## Mean :1878.5 Mean :0.486 Mean :235.8 Mean : 6.688
## 3rd Qu.:2997.0 3rd Qu.:0.730 3rd Qu.:279.0 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.890 Max. :441.0 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.641 Mean :14.64
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
## Airline Aircraft FlightDuration TravelMonth
## AirFrance:15 AirBus:38 Min. : 1.250 Aug:32
## British :47 Boeing:77 1st Qu.: 3.830 Jul:16
## Delta :12 Median : 7.660 Oct:35
## Jet :16 Mean : 7.382 Sep:32
## Singapore: 9 3rd Qu.:10.790
## Virgin :16 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 12 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:103 1st Qu.:126.0 1st Qu.:21.00 1st Qu.:31.00
## Median :198.0 Median :36.00 Median :31.00
## Mean :202.6 Mean :33.66 Mean :31.17
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 77
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 339
## Median :38.00 Median :18.00 Median :19.00 Median :1140
## Mean :37.86 Mean :17.83 Mean :19.44 Mean :1267
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1903
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 99 Min. :0.0300 Min. : 98.0 Min. : 2.000
## 1st Qu.: 444 1st Qu.:0.0950 1st Qu.:162.0 1st Qu.: 6.000
## Median :1603 Median :0.3600 Median :227.0 Median : 7.000
## Mean :1746 Mean :0.4907 Mean :236.3 Mean : 6.687
## 3rd Qu.:2948 3rd Qu.:0.7700 3rd Qu.:279.0 3rd Qu.: 7.000
## Max. :3725 Max. :1.8900 Max. :441.0 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.39
## Median :1.000 Median :13.21
## Mean :1.609 Mean :14.67
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
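The chunk that created `train` and `test` is hidden, so the exact call is unknown. The sizes in the two summaries (113 AirBus and 230 Boeing rows for training, 38 and 77 for testing) are consistent with a 75/25 split stratified on `Aircraft`. Since caTools is loaded, something like `sample.split(airline.df$Aircraft, SplitRatio = 0.75)` is a plausible guess, but that is an assumption and not confirmed by the document. A minimal base-R sketch of the same idea on a stand-in data frame follows; `toy.df`, `stratified_split`, and the seed are hypothetical names and values chosen for illustration.

```r
# Stand-in data with the same class counts as the full airline data
# (151 AirBus + 307 Boeing = 458 rows); the real columns are omitted.
toy.df <- data.frame(
  Aircraft = factor(rep(c("AirBus", "Boeing"), times = c(151, 307)))
)

# Hypothetical helper: draw `ratio` of the rows within each class level,
# so the class proportions are preserved in both parts.
stratified_split <- function(y, ratio) {
  in_train <- logical(length(y))
  for (idx in split(seq_along(y), y)) {
    n_keep <- round(ratio * length(idx))
    in_train[idx[sample.int(length(idx), n_keep)]] <- TRUE
  }
  in_train
}

set.seed(123)  # arbitrary seed, only so the sketch is reproducible
in_train  <- stratified_split(toy.df$Aircraft, ratio = 0.75)
train_toy <- toy.df[in_train, , drop = FALSE]
test_toy  <- toy.df[!in_train, , drop = FALSE]

# Class counts in each part
table(train_toy$Aircraft)
table(test_toy$Aircraft)
```

Which rows land in each part depends on the seed, but the per-class counts do not, because the number of rows drawn from each level is fixed by the ratio.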
library(e1071)
## Warning: package 'e1071' was built under R version 3.4.3
model<-naiveBayes(Aircraft~Airline+IsInternational+TravelMonth, data=train)
pred<-predict(model, newdata=train)
table(pred,train$Aircraft)
##
## pred AirBus Boeing
## AirBus 39 36
## Boeing 74 194
mean(pred==train$Aircraft)
## [1] 0.6793003
pred<-predict(model,newdata=test)
table(pred,test$Aircraft)
##
## pred AirBus Boeing
## AirBus 9 10
## Boeing 29 67
mean(pred==test$Aircraft)
## [1] 0.6608696
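The two accuracy figures can be recomputed directly from the confusion matrices printed above. The sketch below copies the counts by hand and also computes a majority-class baseline (always predicting the most common aircraft). The baseline is an added comparison, not part of the original analysis, and the helper names `accuracy` and `baseline` are hypothetical.

```r
# Confusion matrices copied from the output above
# (rows = predicted class, columns = actual class).
classes  <- c("AirBus", "Boeing")
cm_train <- matrix(c(39, 74, 36, 194), nrow = 2,
                   dimnames = list(pred = classes, actual = classes))
cm_test  <- matrix(c(9, 29, 10, 67), nrow = 2,
                   dimnames = list(pred = classes, actual = classes))

# Accuracy: correctly classified rows (the diagonal) over all rows.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Baseline: share of the most common actual class,
# i.e. the accuracy of always predicting that class.
baseline <- function(cm) max(colSums(cm)) / sum(cm)

acc_train  <- accuracy(cm_train)
acc_test   <- accuracy(cm_test)
base_train <- baseline(cm_train)
base_test  <- baseline(cm_test)

print(c(train = acc_train, test = acc_test))
print(c(train = base_train, test = base_test))
```

On these counts the test accuracy (76 of 115) sits slightly below the always-Boeing baseline (77 of 115), which suggests the three predictors add little information about the aircraft type in this split.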
The a-priori probabilities are the prior probabilities in Bayes’ theorem, that is, how frequently each level of the class occurs in the training dataset. The rationale is that if a level is rare in the training data, it is unlikely to occur in the test dataset. In other words, the prediction of an outcome is influenced not only by the predictors but also by the prevalence of the outcome. Conditional probabilities are calculated for each predictor, given the class. For example, in the output below, the probability that the airline is AirFrance given that the aircraft is AirBus is 0.2478.
model
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## AirBus Boeing
## 0.3294461 0.6705539
##
## Conditional probabilities:
## Airline
## Y AirFrance British Delta Jet Singapore Virgin
## AirBus 0.24778761 0.28318584 0.07964602 0.05309735 0.10619469 0.23008850
## Boeing 0.13478261 0.41739130 0.10869565 0.16956522 0.08260870 0.08695652
##
## IsInternational
## Y Domestic International
## AirBus 0.02654867 0.97345133
## Boeing 0.10869565 0.89130435
##
## TravelMonth
## Y Aug Jul Oct Sep
## AirBus 0.2831858 0.1769912 0.2654867 0.2743363
## Boeing 0.2739130 0.1695652 0.2695652 0.2869565
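To see how the prior and the conditional probabilities combine into a prediction, the sketch below works through one case by hand: a British Airways, international, August flight. All numbers are copied from the model output above; the variable names are hypothetical. This should mirror what `predict()` does for a model fitted without Laplace smoothing, though the package's internal computation may differ in detail.

```r
# Values copied from the printed model above.
prior           <- c(AirBus = 0.3294461,  Boeing = 0.6705539)
p_british       <- c(AirBus = 0.28318584, Boeing = 0.41739130)
p_international <- c(AirBus = 0.97345133, Boeing = 0.89130435)
p_aug           <- c(AirBus = 0.2831858,  Boeing = 0.2739130)

# Naive Bayes: multiply the prior by each conditional probability,
# treating the predictors as independent given the class.
score <- prior * p_british * p_international * p_aug

# Normalise so the two class scores sum to 1.
posterior <- score / sum(score)
print(round(posterior, 4))

# The predicted class is the one with the larger posterior.
names(which.max(posterior))
```

For this flight the Boeing posterior is the larger one, mainly because Boeing has both the higher prior and the higher conditional probability for British.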