This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the prose content and the output of any embedded R code chunks. You can embed an R code chunk like this:
airline.df <- read.csv("SixAirlinesDataV2.csv", stringsAsFactors = TRUE)  # keep text columns as factors (the default became FALSE in R 4.0)
summary(airline.df)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance: 74 AirBus:151 Min. : 1.250 Aug:127
## British :175 Boeing:307 1st Qu.: 4.260 Jul: 75
## Delta : 46 Median : 7.790 Oct:127
## Jet : 61 Mean : 7.578 Sep:129
## Singapore: 40 3rd Qu.:10.620
## Virgin : 62 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 40 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:418 1st Qu.:133.0 1st Qu.:21.00 1st Qu.:31.00
## Median :185.0 Median :36.00 Median :31.00
## Mean :202.3 Mean :33.65 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 413
## Median :38.00 Median :18.00 Median :19.00 Median :1242
## Mean :37.91 Mean :17.84 Mean :19.47 Mean :1327
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86.0 Min. :0.0200 Min. : 98 Min. : 2.000
## 1st Qu.: 528.8 1st Qu.:0.1000 1st Qu.:166 1st Qu.: 6.000
## Median :1737.0 Median :0.3650 Median :227 Median : 7.000
## Mean :1845.3 Mean :0.4872 Mean :236 Mean : 6.688
## 3rd Qu.:2989.0 3rd Qu.:0.7400 3rd Qu.:279 3rd Qu.: 7.000
## Max. :7414.0 Max. :1.8900 Max. :441 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.633 Mean :14.65
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
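The caTools package is then loaded, and the data are split into a training set (train) and a test set (test); the code for this step is not shown in the knitted output. The rows printed below are the 228-row test set: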
## Warning: package 'caTools' was built under R version 3.4.3
## Airline Aircraft FlightDuration TravelMonth IsInternational
## 1 British Boeing 12.25 Jul International
## 5 British Boeing 8.16 Aug International
## 7 British Boeing 8.16 Oct International
## 8 British Boeing 6.50 Aug International
## 9 British Boeing 6.50 Sep International
## 11 British Boeing 11.50 Oct International
## 14 British Boeing 11.58 Aug International
## 17 British Boeing 9.16 Aug International
## 18 British Boeing 9.16 Sep International
## 19 British Boeing 9.16 Oct International
## 20 British Boeing 6.75 Aug International
## 21 British Boeing 6.75 Sep International
## 22 British Boeing 6.75 Oct International
## 23 British Boeing 6.66 Aug International
## 25 British Boeing 6.66 Oct International
## 26 British Boeing 8.75 Aug International
## 28 British Boeing 8.75 Oct International
## 29 British Boeing 4.91 Aug International
## 30 British Boeing 4.91 Sep International
## 31 British Boeing 4.91 Oct International
## 33 British Boeing 3.83 Aug International
## 34 British Boeing 3.83 Sep International
## 36 British Boeing 13.50 Aug International
## 38 British Boeing 13.50 Oct International
## 41 British Boeing 3.83 Sep International
## 43 British Boeing 5.41 Aug International
## 45 British Boeing 5.41 Oct International
## 47 British Boeing 8.25 Sep International
## 48 British Boeing 8.25 Oct International
## 49 British Boeing 12.75 Aug International
## 50 British Boeing 12.75 Sep International
## 52 British Boeing 6.50 Oct International
## 53 British Boeing 11.08 Aug International
## 54 British Boeing 11.08 Sep International
## 55 British Boeing 11.08 Oct International
## 59 British Boeing 12.50 Aug International
## 60 British Boeing 12.05 Sep International
## 61 British Boeing 12.50 Oct International
## 62 Virgin AirBus 8.00 Jul International
## 64 Virgin AirBus 8.00 Sep International
## 65 Virgin AirBus 8.00 Oct International
## 66 Virgin AirBus 8.83 Jul International
## 68 Virgin AirBus 8.83 Sep International
## 70 Virgin AirBus 7.08 Aug International
## 73 Virgin AirBus 7.75 Aug International
## 75 Delta Boeing 2.33 Sep Domestic
## 76 Delta Boeing 2.06 Jul Domestic
## 78 Delta Boeing 2.33 Oct Domestic
## 80 Delta Boeing 2.30 Aug Domestic
## 81 Delta Boeing 2.30 Jul Domestic
## 82 British Boeing 6.83 Jul International
## 84 British Boeing 6.83 Sep International
## 87 British Boeing 7.58 Sep International
## 88 British Boeing 7.58 Oct International
## 93 Jet Boeing 3.08 Oct International
## 94 Jet Boeing 3.08 Aug International
## 95 Jet Boeing 3.08 Sep International
## 96 Jet Boeing 3.08 Oct International
## 97 Jet Boeing 3.08 Jul International
## 100 British AirBus 11.16 Aug International
## 102 British AirBus 11.16 Oct International
## 105 British AirBus 10.50 Sep International
## 108 British AirBus 13.08 Sep International
## 109 British AirBus 13.08 Oct International
## 111 British AirBus 11.16 Oct International
## 112 British AirBus 11.16 Aug International
## 113 British AirBus 4.08 Sep International
## 117 British AirBus 3.25 Aug International
## 120 British AirBus 2.66 Oct International
## 127 British AirBus 4.50 Aug International
## 128 British AirBus 2.41 Oct International
## 130 British AirBus 1.83 Aug International
## 133 British AirBus 4.08 Aug International
## 134 British AirBus 2.83 Sep International
## 135 British AirBus 4.50 Sep International
## 137 British AirBus 4.08 Oct International
## 143 British AirBus 1.25 Sep International
## 144 British Boeing 1.25 Sep International
## 146 British AirBus 2.83 Oct International
## 148 British Boeing 1.33 Sep International
## 152 Delta Boeing 4.33 Aug Domestic
## 156 Virgin Boeing 11.25 Jul International
## 159 Virgin Boeing 11.25 Oct International
## 161 Virgin Boeing 12.08 Sep International
## 163 Virgin Boeing 12.08 Jul International
## 165 Virgin Boeing 9.91 Aug International
## 166 Virgin Boeing 9.91 Sep International
## 168 Virgin Boeing 10.83 Jul International
## 169 Virgin Boeing 10.83 Aug International
## 174 Virgin Boeing 10.75 Jul International
## 175 Virgin Boeing 10.75 Aug International
## 177 Virgin Boeing 10.75 Oct International
## 178 Virgin Boeing 12.58 Oct International
## 182 Virgin Boeing 7.66 Sep International
## 187 Delta AirBus 9.50 Aug International
## 191 Virgin AirBus 6.91 Jul International
## 192 Virgin AirBus 6.91 Aug International
## 193 Virgin AirBus 6.91 Sep International
## 195 Virgin AirBus 6.58 Jul International
## 196 Virgin AirBus 6.58 Aug International
## 200 Virgin AirBus 10.41 Aug International
## 201 Virgin AirBus 10.41 Sep International
## 202 Virgin AirBus 11.33 Aug International
## 203 Virgin AirBus 11.33 Sep International
## 204 Virgin AirBus 7.41 Jul International
## 206 Virgin AirBus 7.41 Sep International
## 212 AirFrance AirBus 9.50 Oct International
## 213 AirFrance AirBus 9.50 Sep International
## 216 AirFrance AirBus 8.33 Aug International
## 217 AirFrance AirBus 8.33 Sep International
## 218 AirFrance AirBus 8.33 Oct International
## 219 AirFrance AirBus 8.33 Jul International
## 222 AirFrance AirBus 8.33 Oct International
## 224 AirFrance AirBus 7.33 Sep International
## 225 AirFrance AirBus 7.33 Oct International
## 226 AirFrance AirBus 6.83 Sep International
## 229 AirFrance AirBus 8.91 Sep International
## 230 AirFrance AirBus 8.91 Oct International
## 233 AirFrance AirBus 9.18 Aug International
## 235 AirFrance AirBus 9.25 Oct International
## 236 AirFrance AirBus 9.16 Jul International
## 237 AirFrance AirBus 9.16 Aug International
## 238 AirFrance AirBus 9.25 Sep International
## 240 British Boeing 10.41 Aug International
## 241 British Boeing 10.41 Oct International
## 243 British Boeing 11.00 Aug International
## 249 British Boeing 9.91 Oct International
## 250 British Boeing 8.58 Aug International
## 251 British Boeing 8.58 Sep International
## 255 British Boeing 11.41 Sep International
## 256 British Boeing 11.41 Oct International
## 258 British Boeing 9.33 Sep International
## 261 British Boeing 8.66 Aug International
## 262 British Boeing 8.66 Sep International
## 263 British Boeing 8.66 Oct International
## 267 British Boeing 8.91 Aug International
## 268 British Boeing 7.08 Aug International
## 270 British Boeing 7.08 Oct International
## 274 British Boeing 11.41 Sep International
## 275 British Boeing 8.91 Oct International
## 276 British Boeing 11.08 Jul International
## 284 Delta Boeing 4.65 Sep Domestic
## 286 Delta Boeing 4.70 Jul Domestic
## 288 Delta Boeing 4.70 Oct Domestic
## 291 Delta Boeing 4.43 Oct Domestic
## 294 Delta Boeing 4.40 Sep Domestic
## 296 Delta AirBus 1.95 Sep Domestic
## 299 Delta Boeing 2.55 Aug Domestic
## 301 Delta Boeing 2.86 Aug Domestic
## 303 Delta AirBus 1.83 Oct Domestic
## 304 Delta Boeing 1.57 Aug Domestic
## 306 Delta AirBus 1.80 Oct Domestic
## 307 Delta Boeing 1.57 Jul Domestic
## 309 Jet AirBus 9.50 Aug International
## 310 Jet AirBus 9.50 Jul International
## 313 Jet AirBus 9.50 Sep International
## 314 Jet AirBus 8.91 Aug International
## 316 Singapore Boeing 13.91 Sep International
## 317 Singapore Boeing 13.91 Oct International
## 319 Singapore Boeing 12.41 Aug International
## 320 Singapore Boeing 12.41 Sep International
## 323 Singapore Boeing 10.83 Aug International
## 324 Singapore Boeing 10.83 Sep International
## 328 Singapore Boeing 9.66 Aug International
## 329 Singapore Boeing 9.66 Sep International
## 336 Singapore Boeing 12.75 Sep International
## 339 AirFrance Boeing 8.33 Aug International
## 340 AirFrance Boeing 8.33 Sep International
## 341 AirFrance Boeing 8.33 Oct International
## 342 AirFrance Boeing 7.50 Oct International
## 343 AirFrance Boeing 6.83 Aug International
## 347 AirFrance Boeing 7.66 Aug International
## 348 AirFrance Boeing 6.83 Sep International
## 349 AirFrance Boeing 6.83 Oct International
## 350 AirFrance Boeing 9.50 Jul International
## 352 AirFrance Boeing 9.50 Sep International
## 353 AirFrance Boeing 9.50 Oct International
## 354 AirFrance Boeing 7.75 Jul International
## 357 AirFrance Boeing 7.83 Oct International
## 358 AirFrance Boeing 9.41 Aug International
## 359 AirFrance Boeing 9.41 Sep International
## 363 AirFrance Boeing 11.91 Jul International
## 364 AirFrance Boeing 11.91 Aug International
## 371 British Boeing 13.33 Sep International
## 372 British Boeing 13.33 Oct International
## 375 British Boeing 8.91 Oct International
## 376 British Boeing 9.58 Aug International
## 378 British Boeing 9.58 Sep International
## 382 Jet Boeing 3.25 Jul International
## 383 Jet Boeing 3.25 Oct International
## 385 Jet Boeing 4.16 Oct International
## 387 Jet Boeing 2.50 Jul International
## 388 Jet Boeing 2.50 Sep International
## 390 Jet Boeing 2.66 Jul International
## 395 Jet Boeing 4.08 Sep International
## 397 Jet Boeing 4.16 Sep International
## 398 Jet Boeing 2.50 Aug International
## 399 Jet Boeing 2.66 Aug International
## 400 Jet Boeing 4.33 Jul International
## 401 Jet Boeing 4.33 Aug International
## 403 Jet Boeing 4.33 Oct International
## 405 Jet Boeing 3.25 Jul International
## 406 AirFrance Boeing 6.91 Sep International
## 407 AirFrance Boeing 6.91 Oct International
## 408 AirFrance Boeing 6.91 Aug International
## 411 Singapore AirBus 13.33 Aug International
## 412 Singapore AirBus 13.33 Sep International
## 415 Singapore AirBus 6.16 Aug International
## 417 Singapore AirBus 6.16 Oct International
## 418 Singapore AirBus 12.66 Jul International
## 419 Singapore AirBus 12.66 Aug International
## 421 Singapore AirBus 12.66 Oct International
## 422 Singapore AirBus 6.50 Jul International
## 423 Singapore AirBus 6.50 Aug International
## 424 Singapore AirBus 6.50 Sep International
## 427 AirFrance AirBus 13.00 Oct International
## 428 AirFrance AirBus 7.50 Aug International
## 430 AirFrance Boeing 10.66 Aug International
## 435 AirFrance AirBus 8.50 Sep International
## 437 AirFrance Boeing 11.50 Sep International
## 446 Jet Boeing 5.66 Oct International
## 447 Jet Boeing 5.66 Jul International
## 450 Jet Boeing 2.58 Sep International
## 452 Jet Boeing 3.16 Sep International
## 453 Jet Boeing 2.58 Jul International
## 456 Jet Boeing 2.58 Sep International
## 457 Jet Boeing 3.25 Jul International
## 458 Jet Boeing 2.58 Jul International
## SeatsEconomy SeatsPremium PitchEconomy PitchPremium WidthEconomy
## 1 122 40 31 38 18
## 5 122 40 31 38 18
## 7 122 40 31 38 18
## 8 122 40 31 38 18
## 9 122 40 31 38 18
## 11 122 40 31 38 18
## 14 122 40 31 38 18
## 17 122 40 31 38 18
## 18 122 40 31 38 18
## 19 122 40 31 38 18
## 20 122 40 31 38 18
## 21 122 40 31 38 18
## 22 122 40 31 38 18
## 23 122 40 31 38 18
## 25 122 40 31 38 18
## 26 122 40 31 38 18
## 28 122 40 31 38 18
## 29 122 40 31 38 18
## 30 122 40 31 38 18
## 31 122 40 31 38 18
## 33 122 40 31 38 18
## 34 122 40 31 38 18
## 36 122 40 31 38 18
## 38 122 40 31 38 18
## 41 122 40 31 38 18
## 43 122 40 31 38 18
## 45 122 40 31 38 18
## 47 122 40 31 38 18
## 48 122 40 31 38 18
## 49 122 40 31 38 18
## 50 122 40 31 38 18
## 52 127 39 31 38 18
## 53 127 39 31 38 18
## 54 127 39 31 38 18
## 55 127 39 31 38 18
## 59 127 39 31 38 18
## 60 127 39 31 38 18
## 61 127 39 31 38 18
## 62 185 48 31 38 18
## 64 185 48 31 38 18
## 65 185 48 31 38 18
## 66 185 48 31 38 18
## 68 185 48 31 38 18
## 70 185 48 31 38 18
## 73 185 48 31 38 18
## 75 78 20 31 34 18
## 76 78 20 31 34 18
## 78 78 20 31 34 18
## 80 78 20 31 34 18
## 81 78 20 31 34 18
## 82 243 56 31 38 18
## 84 243 56 31 38 18
## 87 243 56 31 38 18
## 88 243 56 31 38 18
## 93 138 28 30 40 17
## 94 138 28 30 40 17
## 95 138 28 30 40 17
## 96 138 28 30 40 17
## 97 138 28 30 40 17
## 100 303 55 31 38 18
## 102 303 55 31 38 18
## 105 303 55 31 38 18
## 108 303 55 31 38 18
## 109 303 55 31 38 18
## 111 303 55 31 38 18
## 112 303 55 31 38 18
## 113 303 55 31 38 18
## 117 303 55 31 38 18
## 120 303 55 31 38 18
## 127 303 55 31 38 18
## 128 303 55 31 38 18
## 130 303 55 31 38 18
## 133 303 55 31 38 18
## 134 303 55 31 38 18
## 135 303 55 31 38 18
## 137 303 55 31 38 18
## 143 303 55 31 38 18
## 144 303 55 31 38 18
## 146 303 55 31 38 18
## 148 303 55 31 38 18
## 152 171 29 32 35 18
## 156 198 35 31 38 18
## 159 198 35 31 38 18
## 161 198 35 31 38 18
## 163 198 35 31 38 18
## 165 375 66 31 38 18
## 166 375 66 31 38 18
## 168 198 35 31 38 18
## 169 198 35 31 38 18
## 174 375 66 31 38 18
## 175 375 66 31 38 18
## 177 375 66 31 38 18
## 178 198 35 31 38 18
## 182 198 35 31 38 18
## 187 233 38 31 38 18
## 191 233 38 31 38 18
## 192 233 38 31 38 18
## 193 233 38 31 38 18
## 195 233 38 31 38 18
## 196 233 38 31 38 18
## 200 233 38 31 38 18
## 201 233 38 31 38 18
## 202 233 38 31 38 18
## 203 233 38 31 38 18
## 204 233 38 31 38 18
## 206 233 38 31 38 18
## 212 147 21 32 38 18
## 213 147 21 32 38 18
## 216 147 21 32 38 18
## 217 147 21 32 38 18
## 218 147 21 32 38 18
## 219 147 21 32 38 18
## 222 147 21 32 38 18
## 224 147 21 32 38 18
## 225 147 21 32 38 18
## 226 147 21 32 38 18
## 229 147 21 32 38 18
## 230 147 21 32 38 18
## 233 147 21 32 38 18
## 235 147 21 32 38 18
## 236 147 21 32 38 18
## 237 147 21 32 38 18
## 238 147 21 32 38 18
## 240 243 36 31 38 18
## 241 243 36 31 38 18
## 243 243 36 31 38 18
## 249 243 36 31 38 18
## 250 243 36 31 38 18
## 251 243 36 31 38 18
## 255 243 36 31 38 18
## 256 243 36 31 38 18
## 258 243 36 31 38 18
## 261 243 36 31 38 18
## 262 243 36 31 38 18
## 263 243 36 31 38 18
## 267 243 36 31 38 18
## 268 243 36 31 38 18
## 270 243 36 31 38 18
## 274 243 36 31 38 18
## 275 243 36 31 38 18
## 276 243 36 31 38 18
## 284 126 18 32 34 17
## 286 139 21 31 34 17
## 288 139 21 31 34 17
## 291 126 18 32 34 17
## 294 126 18 32 34 17
## 296 120 18 32 34 17
## 299 136 20 33 35 17
## 301 136 20 33 35 17
## 303 120 18 32 34 17
## 304 126 18 32 34 17
## 306 120 18 32 34 17
## 307 126 18 32 34 17
## 309 147 21 32 38 18
## 310 147 21 32 38 18
## 313 147 21 32 38 18
## 314 147 21 32 38 18
## 316 184 28 32 38 19
## 317 184 28 32 38 19
## 319 184 28 32 38 19
## 320 184 28 32 38 19
## 323 184 28 32 38 19
## 324 184 28 32 38 19
## 328 184 28 32 38 19
## 329 184 28 32 38 19
## 336 184 28 32 38 19
## 339 200 28 32 38 17
## 340 200 28 32 38 17
## 341 200 28 32 38 17
## 342 200 28 32 38 17
## 343 200 28 32 38 17
## 347 200 28 32 38 17
## 348 200 28 32 38 17
## 349 200 28 32 38 17
## 350 200 28 32 38 17
## 352 200 28 32 38 17
## 353 200 28 32 38 17
## 354 174 24 32 38 17
## 357 174 24 32 38 17
## 358 200 28 32 38 17
## 359 200 28 32 38 17
## 363 200 28 32 38 17
## 364 200 28 32 38 17
## 371 203 24 31 38 18
## 372 203 24 31 38 18
## 375 203 24 31 38 18
## 376 203 24 31 38 18
## 378 203 24 31 38 18
## 382 124 16 30 40 17
## 383 124 16 30 40 17
## 385 124 16 30 40 17
## 387 124 16 30 40 17
## 388 124 16 30 40 17
## 390 124 16 30 40 17
## 395 124 16 30 40 17
## 397 124 16 30 40 17
## 398 124 16 30 40 17
## 399 124 16 30 40 17
## 400 124 16 30 40 17
## 401 124 16 30 40 17
## 403 124 16 30 40 17
## 405 124 16 30 40 17
## 406 216 24 32 38 17
## 407 216 24 32 38 17
## 408 216 24 32 38 17
## 411 333 36 32 38 19
## 412 333 36 32 38 19
## 415 333 36 32 38 19
## 417 333 36 32 38 19
## 418 333 36 32 38 19
## 419 333 36 32 38 19
## 421 333 36 32 38 19
## 422 333 36 32 38 19
## 423 333 36 32 38 19
## 424 333 36 32 38 19
## 427 389 38 32 38 18
## 428 389 38 32 38 18
## 430 389 38 32 38 18
## 435 389 38 32 38 18
## 437 389 38 32 38 18
## 446 162 8 30 40 17
## 447 162 8 30 40 17
## 450 162 8 30 40 17
## 452 162 8 30 40 17
## 453 162 8 30 40 17
## 456 162 8 30 40 17
## 457 162 8 30 40 17
## 458 162 8 30 40 17
## WidthPremium PriceEconomy PricePremium PriceRelative SeatsTotal
## 1 19 2707 3725 0.38 162
## 5 19 1793 2999 0.67 162
## 7 19 1793 2999 0.67 162
## 8 19 1476 2997 1.03 162
## 9 19 1476 2997 1.03 162
## 11 19 1705 2989 0.75 162
## 14 19 1750 2656 0.52 162
## 17 19 1813 2504 0.38 162
## 18 19 1813 2504 0.38 162
## 19 19 1813 2504 0.38 162
## 20 19 1634 2195 0.34 162
## 21 19 1634 2195 0.34 162
## 22 19 1634 2195 0.34 162
## 23 19 1651 2191 0.33 162
## 25 19 1651 2191 0.33 162
## 26 19 1542 2084 0.35 162
## 28 19 1566 2084 0.33 162
## 29 19 1356 1820 0.34 162
## 30 19 1356 1820 0.34 162
## 31 19 1356 1820 0.34 162
## 33 19 1242 1764 0.42 162
## 34 19 1242 1764 0.42 162
## 36 19 940 1548 0.65 162
## 38 19 940 1548 0.65 162
## 41 19 1224 1512 0.24 162
## 43 19 1127 1317 0.17 162
## 45 19 1127 1317 0.17 162
## 47 19 1123 1213 0.08 162
## 48 19 1123 1213 0.08 162
## 49 19 509 773 0.52 162
## 50 19 509 773 0.52 162
## 52 19 1476 2997 1.03 166
## 53 19 2156 2933 0.36 166
## 54 19 2156 2933 0.36 166
## 55 19 2156 2933 0.36 166
## 59 19 1038 1259 0.21 166
## 60 19 1038 1259 0.21 166
## 61 19 509 818 0.61 166
## 62 21 1813 3128 0.73 233
## 64 21 1813 3128 0.73 233
## 65 21 1813 3128 0.73 233
## 66 21 2052 2856 0.39 233
## 68 21 2052 2856 0.39 233
## 70 21 1919 2409 0.26 233
## 73 21 540 594 0.10 233
## 75 18 189 204 0.08 98
## 76 18 228 243 0.07 98
## 78 18 216 231 0.07 98
## 80 18 349 364 0.04 98
## 81 18 581 596 0.03 98
## 82 19 1444 2982 1.07 299
## 84 19 1444 2982 1.07 299
## 87 19 1824 2549 0.40 299
## 88 19 1824 2549 0.40 299
## 93 21 354 524 0.48 166
## 94 21 464 616 0.33 166
## 95 21 464 616 0.33 166
## 96 21 464 616 0.33 166
## 97 21 489 616 0.26 166
## 100 19 2384 3563 0.49 358
## 102 19 2384 3563 0.49 358
## 105 19 1848 3536 0.91 358
## 108 19 1758 2592 0.47 358
## 109 19 1758 2592 0.47 358
## 111 19 719 1634 1.27 358
## 112 19 1198 1634 0.36 358
## 113 19 457 486 0.06 358
## 117 19 356 396 0.11 358
## 120 19 297 323 0.09 358
## 127 19 228 263 0.15 358
## 128 19 231 247 0.07 358
## 130 19 201 237 0.18 358
## 133 19 182 211 0.16 358
## 134 19 171 201 0.18 358
## 135 19 168 198 0.18 358
## 137 19 147 175 0.20 358
## 143 19 109 141 0.30 358
## 144 19 109 141 0.30 358
## 146 19 97 125 0.29 358
## 148 19 77 99 0.29 358
## 152 18 298 337 0.13 200
## 156 21 574 1619 1.82 233
## 159 21 574 1619 1.82 233
## 161 21 1086 2964 1.73 233
## 163 21 1247 2964 1.38 233
## 165 21 1781 3509 0.97 441
## 166 21 1781 3509 0.97 441
## 168 21 1580 3019 0.91 233
## 169 21 1580 3019 0.91 233
## 174 21 2445 3694 0.51 441
## 175 21 2445 3694 0.51 441
## 177 21 2445 3694 0.51 441
## 178 21 975 1465 0.50 233
## 182 21 1811 2531 0.40 233
## 187 21 1999 2765 0.38 271
## 191 21 1434 2982 1.08 271
## 192 21 1434 2982 1.08 271
## 193 21 1434 2982 1.08 271
## 195 21 1476 2997 1.03 271
## 196 21 1476 2997 1.03 271
## 200 21 1903 3509 0.84 271
## 201 21 1903 3509 0.84 271
## 202 21 2369 3540 0.49 271
## 203 21 2369 3540 0.49 271
## 204 21 1767 2499 0.41 271
## 206 21 1767 2499 0.41 271
## 212 19 630 1611 1.56 168
## 213 19 743 1611 1.17 168
## 216 19 2659 2859 0.08 168
## 217 19 2659 2859 0.08 168
## 218 19 2659 2859 0.08 168
## 219 19 2659 2859 0.08 168
## 222 19 2659 2859 0.08 168
## 224 19 2607 2807 0.08 168
## 225 19 2607 2807 0.08 168
## 226 19 2860 3063 0.07 168
## 229 19 2609 2787 0.07 168
## 230 19 2609 2787 0.07 168
## 233 19 3165 3275 0.03 168
## 235 19 3165 3275 0.03 168
## 236 19 3165 3275 0.03 168
## 237 19 3165 3275 0.03 168
## 238 19 3165 3275 0.03 168
## 240 19 1651 3509 1.13 279
## 241 19 1651 3509 1.13 279
## 243 19 2230 3227 0.45 279
## 249 19 2356 3200 0.36 279
## 250 19 1562 3099 0.98 279
## 251 19 1562 3099 0.98 279
## 255 19 2281 3025 0.33 279
## 256 19 2281 3025 0.33 279
## 258 19 1813 2472 0.36 279
## 261 19 1609 2292 0.42 279
## 262 19 1609 2292 0.42 279
## 263 19 1609 2292 0.42 279
## 267 19 1140 2049 0.80 279
## 268 19 1736 1866 0.07 279
## 270 19 1736 1866 0.07 279
## 274 19 1485 1784 0.20 279
## 275 19 891 1603 0.80 279
## 276 19 1323 1550 0.17 279
## 284 17 363 407 0.12 144
## 286 17 413 457 0.11 160
## 288 17 413 457 0.11 160
## 291 17 340 379 0.11 144
## 294 17 328 362 0.10 144
## 296 17 166 181 0.09 138
## 299 17 243 262 0.08 156
## 301 17 354 378 0.07 156
## 303 17 293 308 0.05 138
## 304 17 293 308 0.05 144
## 306 17 416 431 0.04 138
## 307 17 349 364 0.04 144
## 309 19 429 841 0.96 168
## 310 19 462 841 0.82 168
## 313 19 661 928 0.40 168
## 314 19 676 931 0.38 168
## 316 20 794 1452 0.83 212
## 317 20 794 1452 0.83 212
## 319 20 1215 1947 0.60 212
## 320 20 1215 1947 0.60 212
## 323 20 609 900 0.48 212
## 324 20 609 900 0.48 212
## 328 20 1247 1407 0.13 212
## 329 20 1247 1407 0.13 212
## 336 20 1431 1564 0.09 212
## 339 19 2918 3972 0.36 228
## 340 19 2918 3972 0.36 228
## 341 19 2918 3972 0.36 228
## 342 19 2581 2781 0.08 228
## 343 19 2860 3063 0.07 228
## 347 19 3057 3167 0.04 228
## 348 19 3057 3167 0.04 228
## 349 19 3057 3167 0.04 228
## 350 19 3414 3524 0.03 228
## 352 19 3414 3524 0.03 228
## 353 19 3414 3524 0.03 228
## 354 19 3215 3325 0.03 198
## 357 19 3215 3325 0.03 198
## 358 19 3480 3589 0.03 228
## 359 19 3480 3589 0.03 228
## 363 19 3159 3243 0.03 228
## 364 19 3159 3243 0.03 228
## 371 19 2166 2470 0.14 227
## 372 19 2166 2470 0.14 227
## 375 19 575 853 0.48 227
## 376 19 797 826 0.04 227
## 378 19 582 797 0.37 227
## 382 21 139 398 1.87 140
## 383 21 149 398 1.67 140
## 385 21 211 534 1.53 140
## 387 21 118 267 1.26 140
## 388 21 118 267 1.26 140
## 390 21 108 228 1.11 140
## 395 21 156 318 1.04 140
## 397 21 324 620 0.91 140
## 398 21 147 267 0.81 140
## 399 21 127 228 0.79 140
## 400 21 154 267 0.74 140
## 401 21 154 267 0.74 140
## 403 21 154 267 0.74 140
## 405 21 594 696 0.17 140
## 406 19 648 1710 1.64 240
## 407 19 648 1710 1.64 240
## 408 19 700 1710 1.44 240
## 411 20 505 1004 0.99 369
## 412 20 505 1004 0.99 369
## 415 20 505 1004 0.99 369
## 417 20 505 1004 0.99 369
## 418 20 690 1110 0.61 369
## 419 20 690 1110 0.61 369
## 421 20 690 1110 0.61 369
## 422 20 690 1110 0.61 369
## 423 20 690 1110 0.61 369
## 424 20 690 1110 0.61 369
## 427 19 1522 3289 1.16 427
## 428 19 2581 2781 0.08 427
## 430 19 2996 3196 0.07 427
## 435 19 2979 3088 0.04 427
## 437 19 3593 3702 0.03 427
## 446 21 187 430 1.30 170
## 447 21 245 545 1.22 170
## 450 21 172 304 0.77 170
## 452 21 293 483 0.65 170
## 453 21 281 451 0.60 170
## 456 21 380 550 0.45 170
## 457 21 505 696 0.38 170
## 458 21 510 569 0.12 170
## PitchDifference WidthDifference PercentPremiumSeats
## 1 7 1 24.69
## 5 7 1 24.69
## 7 7 1 24.69
## 8 7 1 24.69
## 9 7 1 24.69
## 11 7 1 24.69
## 14 7 1 24.69
## 17 7 1 24.69
## 18 7 1 24.69
## 19 7 1 24.69
## 20 7 1 24.69
## 21 7 1 24.69
## 22 7 1 24.69
## 23 7 1 24.69
## 25 7 1 24.69
## 26 7 1 24.69
## 28 7 1 24.69
## 29 7 1 24.69
## 30 7 1 24.69
## 31 7 1 24.69
## 33 7 1 24.69
## 34 7 1 24.69
## 36 7 1 24.69
## 38 7 1 24.69
## 41 7 1 24.69
## 43 7 1 24.69
## 45 7 1 24.69
## 47 7 1 24.69
## 48 7 1 24.69
## 49 7 1 24.69
## 50 7 1 24.69
## 52 7 1 23.49
## 53 7 1 23.49
## 54 7 1 23.49
## 55 7 1 23.49
## 59 7 1 23.49
## 60 7 1 23.49
## 61 7 1 23.49
## 62 7 3 20.60
## 64 7 3 20.60
## 65 7 3 20.60
## 66 7 3 20.60
## 68 7 3 20.60
## 70 7 3 20.60
## 73 7 3 20.60
## 75 3 0 20.41
## 76 3 0 20.41
## 78 3 0 20.41
## 80 3 0 20.41
## 81 3 0 20.41
## 82 7 1 18.73
## 84 7 1 18.73
## 87 7 1 18.73
## 88 7 1 18.73
## 93 10 4 16.87
## 94 10 4 16.87
## 95 10 4 16.87
## 96 10 4 16.87
## 97 10 4 16.87
## 100 7 1 15.36
## 102 7 1 15.36
## 105 7 1 15.36
## 108 7 1 15.36
## 109 7 1 15.36
## 111 7 1 15.36
## 112 7 1 15.36
## 113 7 1 15.36
## 117 7 1 15.36
## 120 7 1 15.36
## 127 7 1 15.36
## 128 7 1 15.36
## 130 7 1 15.36
## 133 7 1 15.36
## 134 7 1 15.36
## 135 7 1 15.36
## 137 7 1 15.36
## 143 7 1 15.36
## 144 7 1 15.36
## 146 7 1 15.36
## 148 7 1 15.36
## 152 3 0 14.50
## 156 7 3 15.02
## 159 7 3 15.02
## 161 7 3 15.02
## 163 7 3 15.02
## 165 7 3 14.97
## 166 7 3 14.97
## 168 7 3 15.02
## 169 7 3 15.02
## 174 7 3 14.97
## 175 7 3 14.97
## 177 7 3 14.97
## 178 7 3 15.02
## 182 7 3 15.02
## 187 7 3 14.02
## 191 7 3 14.02
## 192 7 3 14.02
## 193 7 3 14.02
## 195 7 3 14.02
## 196 7 3 14.02
## 200 7 3 14.02
## 201 7 3 14.02
## 202 7 3 14.02
## 203 7 3 14.02
## 204 7 3 14.02
## 206 7 3 14.02
## 212 6 1 12.50
## 213 6 1 12.50
## 216 6 1 12.50
## 217 6 1 12.50
## 218 6 1 12.50
## 219 6 1 12.50
## 222 6 1 12.50
## 224 6 1 12.50
## 225 6 1 12.50
## 226 6 1 12.50
## 229 6 1 12.50
## 230 6 1 12.50
## 233 6 1 12.50
## 235 6 1 12.50
## 236 6 1 12.50
## 237 6 1 12.50
## 238 6 1 12.50
## 240 7 1 12.90
## 241 7 1 12.90
## 243 7 1 12.90
## 249 7 1 12.90
## 250 7 1 12.90
## 251 7 1 12.90
## 255 7 1 12.90
## 256 7 1 12.90
## 258 7 1 12.90
## 261 7 1 12.90
## 262 7 1 12.90
## 263 7 1 12.90
## 267 7 1 12.90
## 268 7 1 12.90
## 270 7 1 12.90
## 274 7 1 12.90
## 275 7 1 12.90
## 276 7 1 12.90
## 284 2 0 12.50
## 286 3 0 13.13
## 288 3 0 13.13
## 291 2 0 12.50
## 294 2 0 12.50
## 296 2 0 13.04
## 299 2 0 12.82
## 301 2 0 12.82
## 303 2 0 13.04
## 304 2 0 12.50
## 306 2 0 13.04
## 307 2 0 12.50
## 309 6 1 12.50
## 310 6 1 12.50
## 313 6 1 12.50
## 314 6 1 12.50
## 316 6 1 13.21
## 317 6 1 13.21
## 319 6 1 13.21
## 320 6 1 13.21
## 323 6 1 13.21
## 324 6 1 13.21
## 328 6 1 13.21
## 329 6 1 13.21
## 336 6 1 13.21
## 339 6 2 12.28
## 340 6 2 12.28
## 341 6 2 12.28
## 342 6 2 12.28
## 343 6 2 12.28
## 347 6 2 12.28
## 348 6 2 12.28
## 349 6 2 12.28
## 350 6 2 12.28
## 352 6 2 12.28
## 353 6 2 12.28
## 354 6 2 12.12
## 357 6 2 12.12
## 358 6 2 12.28
## 359 6 2 12.28
## 363 6 2 12.28
## 364 6 2 12.28
## 371 7 1 10.57
## 372 7 1 10.57
## 375 7 1 10.57
## 376 7 1 10.57
## 378 7 1 10.57
## 382 10 4 11.43
## 383 10 4 11.43
## 385 10 4 11.43
## 387 10 4 11.43
## 388 10 4 11.43
## 390 10 4 11.43
## 395 10 4 11.43
## 397 10 4 11.43
## 398 10 4 11.43
## 399 10 4 11.43
## 400 10 4 11.43
## 401 10 4 11.43
## 403 10 4 11.43
## 405 10 4 11.43
## 406 6 2 10.00
## 407 6 2 10.00
## 408 6 2 10.00
## 411 6 1 9.76
## 412 6 1 9.76
## 415 6 1 9.76
## 417 6 1 9.76
## 418 6 1 9.76
## 419 6 1 9.76
## 421 6 1 9.76
## 422 6 1 9.76
## 423 6 1 9.76
## 424 6 1 9.76
## 427 6 1 8.90
## 428 6 1 8.90
## 430 6 1 8.90
## 435 6 1 8.90
## 437 6 1 8.90
## 446 10 4 4.71
## 447 10 4 4.71
## 450 10 4 4.71
## 452 10 4 4.71
## 453 10 4 4.71
## 456 10 4 4.71
## 457 10 4 4.71
## 458 10 4 4.71
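The split code itself is not shown. The train and test summaries below are consistent with a roughly 50/50 split stratified on Aircraft (76 vs 75 AirBus, 154 vs 153 Boeing), which is the kind of split `caTools::sample.split(airline.df$Aircraft, SplitRatio = 0.5)` produces; that exact call and ratio are assumptions. The base-R sketch below illustrates the same idea with a hypothetical helper, `stratified_split()`, on a toy factor with the same class counts. It is not caTools' exact algorithm, and the seed is arbitrary.

```r
# Hypothetical helper: a stratified random split that keeps the class
# proportions of y roughly equal in both parts (similar in spirit to
# caTools::sample.split, but not its exact algorithm).
stratified_split <- function(y, ratio = 0.5) {
  y <- as.factor(y)
  in_train <- logical(length(y))            # all FALSE to start
  for (lvl in levels(y)) {
    idx <- which(y == lvl)                   # rows belonging to this class
    n_train <- round(length(idx) * ratio)    # class-specific training count
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(123)  # arbitrary; the seed used for the original split is unknown
# Toy factor with the same class counts as airline.df$Aircraft (151 AirBus, 307 Boeing)
aircraft <- factor(c(rep("AirBus", 151), rep("Boeing", 307)))
is_train <- stratified_split(aircraft, ratio = 0.5)
table(aircraft[is_train])    # AirBus 76, Boeing 154
table(aircraft[!is_train])   # AirBus 75, Boeing 153

# With the real data one would then write (not run here):
# train <- subset(airline.df, is_train)
# test  <- subset(airline.df, !is_train)
```

Stratifying on the response keeps the AirBus/Boeing ratio nearly identical in both sets, which matters when one class is about half the size of the other.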
A bootstrap sample is drawn with replacement from an original sample and is usually the same size as that sample, so some observations appear more than once and others not at all. Bootstrapping is a type of resampling in which a large number of such samples are repeatedly drawn, with replacement, from a single original sample.
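A minimal base-R sketch of drawing one bootstrap sample, using hypothetical toy values: the resample has the same size as the original, some cases repeat, and the cases never drawn are the ones that would be "out of bag".

```r
set.seed(42)  # arbitrary seed for reproducibility
x <- c(65, 413, 1242, 1327, 1909, 2500, 3593, 780, 150, 990)  # hypothetical sample
n <- length(x)

idx <- sample.int(n, size = n, replace = TRUE)  # n draws with replacement
boot_sample <- x[idx]                            # same size as x
oob <- setdiff(seq_len(n), idx)                  # indices never drawn

length(boot_sample)  # equals n
table(idx)           # some indices appear more than once
x[oob]               # values left out of this bootstrap sample
```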
summary(train)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance:32 AirBus: 76 Min. : 1.250 Aug:61
## British :89 Boeing:154 1st Qu.: 3.830 Jul:41
## Delta :27 Median : 7.705 Oct:67
## Jet :30 Mean : 7.474 Sep:61
## Singapore:21 3rd Qu.:10.727
## Virgin :31 Max. :14.660
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 22 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:208 1st Qu.:141.0 1st Qu.:24.00 1st Qu.:31.00
## Median :198.0 Median :36.00 Median :31.00
## Mean :208.2 Mean :33.84 Mean :31.21
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 65.0
## 1st Qu.:38.00 1st Qu.:18.00 1st Qu.:19.00 1st Qu.: 357.8
## Median :38.00 Median :18.00 Median :19.00 Median :1133.5
## Mean :37.88 Mean :17.85 Mean :19.47 Mean :1292.0
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1909.0
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593.0
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 86 Min. :0.0200 Min. : 98.0 Min. : 2.000
## 1st Qu.: 483 1st Qu.:0.1000 1st Qu.:166.5 1st Qu.: 6.000
## Median :1619 Median :0.3600 Median :228.0 Median : 7.000
## Mean :1800 Mean :0.4850 Mean :242.0 Mean : 6.665
## 3rd Qu.:2987 3rd Qu.:0.7375 3rd Qu.:279.0 3rd Qu.: 7.000
## Max. :7414 Max. :1.8900 Max. :441.0 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.34
## Median :1.000 Median :13.21
## Mean :1.622 Mean :14.23
## 3rd Qu.:3.000 3rd Qu.:15.36
## Max. :4.000 Max. :24.69
summary(test)
## Airline Aircraft FlightDuration TravelMonth
## AirFrance:42 AirBus: 75 Min. : 1.250 Aug:66
## British :86 Boeing:153 1st Qu.: 4.612 Jul:34
## Delta :19 Median : 8.160 Oct:60
## Jet :31 Mean : 7.682 Sep:68
## Singapore:19 3rd Qu.:10.410
## Virgin :31 Max. :13.910
## IsInternational SeatsEconomy SeatsPremium PitchEconomy
## Domestic : 18 Min. : 78.0 Min. : 8.00 Min. :30.00
## International:210 1st Qu.:126.0 1st Qu.:21.00 1st Qu.:31.00
## Median :184.0 Median :36.00 Median :31.00
## Mean :196.4 Mean :33.46 Mean :31.22
## 3rd Qu.:243.0 3rd Qu.:40.00 3rd Qu.:32.00
## Max. :389.0 Max. :66.00 Max. :33.00
## PitchPremium WidthEconomy WidthPremium PriceEconomy
## Min. :34.00 Min. :17.00 Min. :17.00 Min. : 77
## 1st Qu.:38.00 1st Qu.:17.00 1st Qu.:19.00 1st Qu.: 501
## Median :38.00 Median :18.00 Median :19.00 Median :1340
## Mean :37.93 Mean :17.82 Mean :19.47 Mean :1362
## 3rd Qu.:38.00 3rd Qu.:18.00 3rd Qu.:21.00 3rd Qu.:1907
## Max. :40.00 Max. :19.00 Max. :21.00 Max. :3593
## PricePremium PriceRelative SeatsTotal PitchDifference
## Min. : 99 Min. :0.0300 Min. : 98.0 Min. : 2.000
## 1st Qu.: 619 1st Qu.:0.1100 1st Qu.:162.0 1st Qu.: 6.000
## Median :1843 Median :0.3800 Median :212.0 Median : 7.000
## Mean :1891 Mean :0.4894 Mean :229.9 Mean : 6.711
## 3rd Qu.:2997 3rd Qu.:0.7400 3rd Qu.:279.0 3rd Qu.: 7.000
## Max. :3972 Max. :1.8700 Max. :441.0 Max. :10.000
## WidthDifference PercentPremiumSeats
## Min. :0.000 Min. : 4.71
## 1st Qu.:1.000 1st Qu.:12.28
## Median :1.000 Median :13.21
## Mean :1.645 Mean :15.06
## 3rd Qu.:3.000 3rd Qu.:16.87
## Max. :4.000 Max. :24.69
Bootstrapping relies, loosely, on the law of large numbers: as a sample grows, its empirical distribution approaches the population distribution, so repeatedly resampling from one sufficiently large sample approximates drawing repeated samples from the population itself. Perhaps surprisingly, this works even though only a single sample is used to generate the data.
Why resample? Ideally, you would draw many large, independent samples from a population to build the sampling distribution of a statistic. However, cost or time may limit you to a single sample. That single sample can then serve as a mini population from which samples of the same size are drawn with replacement, over and over again. As well as saving time and money, bootstrapped samples often give good approximations of the sampling distribution, and hence of standard errors and confidence intervals for population parameters.
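As an illustration, the sketch below approximates the sampling distribution of a mean by resampling a single hypothetical sample (simulated, skewed values standing in for something like ticket prices) and compares the bootstrap standard error with the textbook formula sd/√n. The sample, its size, and B = 2000 resamples are all arbitrary choices.

```r
set.seed(1)                       # arbitrary seed
x <- rexp(50, rate = 1 / 1300)    # hypothetical skewed sample (e.g. prices)
B <- 2000                         # number of bootstrap resamples

# Sampling distribution of the mean, approximated by resampling x itself
boot_means <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

se_boot    <- sd(boot_means)                        # bootstrap standard error
se_formula <- sd(x) / sqrt(length(x))               # textbook standard error of the mean
ci_95      <- quantile(boot_means, c(0.025, 0.975)) # percentile interval

c(bootstrap = se_boot, formula = se_formula)
ci_95
```

For the mean a closed-form standard error exists, so the two estimates agree closely; the bootstrap is most useful for statistics without a simple formula, such as a median or a ratio.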
In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:
Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
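Where "about one-third" comes from can be checked with a short base R calculation, assuming n = 230 (the training-set size used here): each of the n draws misses a given case with probability 1 - 1/n, so the chance it is never drawn is (1 - 1/n)^n, which approaches exp(-1) ≈ 0.368.

```r
n <- 230                      # training-set size used in this document
p_out_exact <- (1 - 1 / n)^n  # probability a case is left out of one bootstrap sample

# empirical check: draw many bootstrap samples and count the cases never drawn
set.seed(123)
p_out_sim <- mean(replicate(1000, {
  idx <- sample.int(n, n, replace = TRUE)
  mean(!(seq_len(n) %in% idx))  # fraction of cases that are out of bag
}))

c(exact = p_out_exact, simulated = p_out_sim, limit = exp(-1))
```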
Each case left out of the construction of the kth tree is put down the kth tree to get a classification. In this way, a test-set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got the most votes across the trees for which case n was out of bag (OOB). The proportion of times that j is not equal to the true class of n, averaged over all cases, is the OOB error estimate. This estimate has proven to be unbiased in many tests.
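The OOB bookkeeping above can be sketched in base R plus `rpart`. This is a swapped-in illustration, not the randomForest implementation: it bags ordinary `rpart` trees (no random subset of variables at each split) and uses the built-in `iris` data as a stand-in for the airline data. Each tree votes only for the cases left out of its bootstrap sample, and the majority class j per case is compared with the true class.

```r
library(rpart)  # recommended package that ships with standard R installations

set.seed(1)
n <- nrow(iris)
K <- 100                                   # number of bagged trees
classes <- levels(iris$Species)
votes <- matrix(0, nrow = n, ncol = length(classes),
                dimnames = list(NULL, classes))

for (k in seq_len(K)) {
  in_bag <- sample.int(n, n, replace = TRUE)  # bootstrap sample for tree k
  oob <- setdiff(seq_len(n), in_bag)          # cases left out of tree k
  fit <- rpart(Species ~ ., data = iris[in_bag, ], method = "class")
  pred <- predict(fit, newdata = iris[oob, ], type = "class")
  cell <- cbind(oob, match(as.character(pred), classes))
  votes[cell] <- votes[cell] + 1              # one OOB vote per left-out case
}

was_oob <- rowSums(votes) > 0                 # cases that were OOB at least once
# j = class with the most OOB votes for each case (ties broken at random)
j <- classes[max.col(votes[was_oob, , drop = FALSE], ties.method = "random")]
oob_error <- mean(j != as.character(iris$Species[was_oob]))
oob_error
```

No separate test set is used: every case is scored only by trees that never saw it during training.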
library(randomForest)
## Warning: package 'randomForest' was built under R version 3.4.3
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
output.forest<-randomForest(Aircraft~.,data=train)
output.forest  # for classification, default mtry (variables tried at each split) = floor(sqrt(no. of predictors)) = floor(sqrt(17)) = 4
##
## Call:
## randomForest(formula = Aircraft ~ ., data = train)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 5.22%
## Confusion matrix:
## AirBus Boeing class.error
## AirBus 70 6 0.07894737
## Boeing 6 148 0.03896104
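The `class.error` column and the OOB error rate follow directly from the confusion matrix printed above (rows are the true classes, columns the OOB predictions). A small base R check, with the counts copied from that output:

```r
# OOB confusion matrix from the default forest above
oob_cm <- matrix(c(70,   6,
                    6, 148),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(true = c("AirBus", "Boeing"),
                                 predicted = c("AirBus", "Boeing")))

# class.error: share of each true class that was misclassified
class_error <- 1 - diag(oob_cm) / rowSums(oob_cm)   # 6/76 and 6/154

# overall OOB error rate: off-diagonal cases over all 230 cases
oob_error <- 1 - sum(diag(oob_cm)) / sum(oob_cm)    # 12/230, i.e. 5.22%
class_error
oob_error
```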
output.forest<-randomForest(Aircraft~.,data=train, ntree=300, importance=TRUE)
output.forest
##
## Call:
## randomForest(formula = Aircraft ~ ., data = train, ntree = 300, importance = TRUE)
## Type of random forest: classification
## Number of trees: 300
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 4.78%
## Confusion matrix:
## AirBus Boeing class.error
## AirBus 70 6 0.07894737
## Boeing 5 149 0.03246753
The mean decrease in Gini measures how much each variable contributes to the homogeneity (purity) of the nodes and leaves in the resulting random forest. Each time a particular variable is used to split a node, the Gini impurity of the child nodes is calculated and compared with that of the parent node. Gini impurity is 0 for a perfectly homogeneous node and grows as the classes become more mixed, up to a maximum of 1 - 1/K for K classes (0.5 for the two classes here). At each node, the tree searches for the variable and split point that give the lowest Gini impurity in the children (lower is ‘purer’ / ‘better’). The decreases from all splits on a variable are summed and averaged over all trees, so variables that produce purer nodes have a higher mean decrease in Gini.
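A base R sketch of the impurity arithmetic for a single split. The parent counts (76 AirBus, 154 Boeing) are the training class totals from the confusion matrices above; the two child nodes are hypothetical. The decrease is written here in one common size-weighted form; randomForest accumulates an equivalent quantity over every split on a variable.

```r
# Gini impurity of a node from its class counts: 1 - sum(p_k^2)
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# decrease in impurity for one split: parent impurity minus the
# size-weighted impurity of the two children
gini_decrease <- function(parent, left, right) {
  n <- sum(parent)
  gini(parent) - (sum(left) / n) * gini(left) - (sum(right) / n) * gini(right)
}

parent <- c(AirBus = 76, Boeing = 154)  # training class counts
left   <- c(AirBus = 70, Boeing = 10)   # hypothetical child nodes
right  <- c(AirBus = 6,  Boeing = 144)

gini(parent)
gini_decrease(parent, left, right)
```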
importance(output.forest)
## AirBus Boeing MeanDecreaseAccuracy
## Airline 13.927475 14.3818401 16.112996587
## FlightDuration 6.056726 13.5347813 13.328116791
## TravelMonth -1.146137 -0.4006632 -0.965550484
## IsInternational 1.001671 -1.0016708 -0.008367982
## SeatsEconomy 15.988219 18.9361428 19.680827820
## SeatsPremium 17.701343 15.6183311 19.054805563
## PitchEconomy 7.040930 4.9636700 7.165053548
## PitchPremium 3.001591 4.4874097 4.338978156
## WidthEconomy 8.549525 9.0310832 9.972399224
## WidthPremium 8.703548 8.3058276 10.583667578
## PriceEconomy 2.265824 10.8122733 10.198272470
## PricePremium 4.181034 9.9236232 10.067368268
## PriceRelative 7.267622 7.0528401 9.727151422
## SeatsTotal 16.774129 16.3879194 18.824094905
## PitchDifference 6.471175 5.7863320 7.235889888
## WidthDifference 9.462643 9.1214451 10.831709100
## PercentPremiumSeats 10.667727 11.8837869 13.795645943
## MeanDecreaseGini
## Airline 7.12078102
## FlightDuration 8.35504315
## TravelMonth 1.58515100
## IsInternational 0.02310154
## SeatsEconomy 16.46969848
## SeatsPremium 13.62092490
## PitchEconomy 1.38747979
## PitchPremium 0.87668405
## WidthEconomy 3.30174049
## WidthPremium 2.75286566
## PriceEconomy 5.98636181
## PricePremium 5.99163613
## PriceRelative 4.44565503
## SeatsTotal 15.41158990
## PitchDifference 1.53583382
## WidthDifference 3.24157706
## PercentPremiumSeats 7.30519936
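caret's varImp() reports the permutation-based mean decrease in accuracy. For this two-class forest it averages the AirBus and Boeing columns of importance() and shows that average in both columns, which is why the two columns below are identical (for Airline, (13.93 + 14.38) / 2 ≈ 14.15). The more the accuracy of the random forest decreases when the values of a single variable are randomly permuted, the more important that variable is deemed; variables with a large mean decrease in accuracy are therefore more important for classifying the data.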
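The permutation idea can be sketched without a forest. This swapped-in example fits a single `rpart` tree to the built-in `iris` data and scores it on its own training data; randomForest instead permutes each tree's OOB cases and scales the result, so only the principle carries over. Shuffling one column breaks its link with the class while keeping its distribution, and the resulting drop in accuracy is that variable's importance.

```r
library(rpart)  # recommended package that ships with standard R installations

set.seed(2024)
fit <- rpart(Species ~ ., data = iris, method = "class")

accuracy <- function(model, data) {
  mean(predict(model, newdata = data, type = "class") == data$Species)
}
base_acc <- accuracy(fit, iris)

predictors <- setdiff(names(iris), "Species")
drop <- sapply(predictors, function(v) {
  shuffled <- iris
  shuffled[[v]] <- sample(shuffled[[v]])  # permute one variable only
  base_acc - accuracy(fit, shuffled)      # decrease in accuracy
})
sort(drop, decreasing = TRUE)
```

A variable the model never uses gets a decrease of zero, since permuting it cannot change any prediction.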
library(caret)
## Warning: package 'caret' was built under R version 3.4.3
## Loading required package: lattice
## Warning: package 'lattice' was built under R version 3.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.3
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
varImp(output.forest)
## AirBus Boeing
## Airline 14.1546575 14.1546575
## FlightDuration 9.7957538 9.7957538
## TravelMonth -0.7733999 -0.7733999
## IsInternational 0.0000000 0.0000000
## SeatsEconomy 17.4621811 17.4621811
## SeatsPremium 16.6598371 16.6598371
## PitchEconomy 6.0023001 6.0023001
## PitchPremium 3.7445003 3.7445003
## WidthEconomy 8.7903043 8.7903043
## WidthPremium 8.5046877 8.5046877
## PriceEconomy 6.5390484 6.5390484
## PricePremium 7.0523287 7.0523287
## PriceRelative 7.1602308 7.1602308
## SeatsTotal 16.5810244 16.5810244
## PitchDifference 6.1287535 6.1287535
## WidthDifference 9.2920443 9.2920443
## PercentPremiumSeats 11.2757571 11.2757571
A type 1 variable importance plot shows the mean decrease in accuracy, while a type 2 plot shows the mean decrease in Gini.
varImpPlot(output.forest)
# varImpPlot(output.forest, type=1) plots only the mean decrease in accuracy (type=2 plots only the mean decrease in Gini)
Pred<-predict(output.forest, newdata=test)
table(Pred,test$Aircraft)
##
## Pred AirBus Boeing
## AirBus 74 2
## Boeing 1 151
Cross-validation is a technique to evaluate predictive models by partitioning the original sample into a training set to train the model, and a test set to evaluate it.
In k-fold cross-validation, the original sample is randomly partitioned into k equal size subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimation. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once.
For classification problems, one typically uses stratified k-fold cross-validation, in which the folds are selected so that each fold contains roughly the same proportions of class labels.
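In repeated cross-validation, the k-fold procedure is repeated n times, each time with a new random partition of the original sample, and the n results are again averaged (or otherwise combined) to produce a single estimate. Here, with 10-fold cross-validation on the 230 training cases, each fold holds about 23 cases, so each model is trained on the remaining nine folds, roughly 207 cases; the summary below shows training sizes of 206 to 208 because the folds cannot all be exactly the same size.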
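A base R sketch of stratified fold assignment and repetition, assuming the training class counts of 76 AirBus and 154 Boeing read off the confusion matrices above. The helper `stratified_folds()` is hypothetical and is not how caret builds its folds internally; it shuffles the cases within each class and deals them out to folds 1 to k in turn, so every fold gets nearly the same class mix.

```r
# hypothetical helper: stratified assignment of cases to k folds
stratified_folds <- function(y, k = 10) {
  folds <- integer(length(y))
  for (cls in unique(as.character(y))) {
    idx <- which(y == cls)
    idx <- idx[sample.int(length(idx))]            # shuffle within the class
    folds[idx] <- rep_len(seq_len(k), length(idx)) # deal out to folds 1..k
  }
  folds
}

set.seed(10)
y <- factor(rep(c("AirBus", "Boeing"), times = c(76, 154)))  # 230 training labels

folds <- stratified_folds(y, k = 10)
table(fold = folds, class = y)                     # class mix per fold
train_sizes <- length(y) - tabulate(folds, nbins = 10)
train_sizes                                        # cases used to train each fold's model

# repeated CV: a fresh random partition per repeat (4 repeats, as in repeats = 4)
fold_sets <- replicate(4, stratified_folds(y, k = 10))  # 230 x 4 matrix of fold ids
```

With these class counts, the folds hold 22 to 24 cases, so each model trains on 206 to 208 cases, the same range caret reports below.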
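Accuracy is the proportion of all cases that are classified correctly. Cohen's kappa adjusts accuracy for the agreement expected by chance alone: kappa = (observed accuracy - expected accuracy) / (1 - expected accuracy), where the expected accuracy is computed from the row and column totals of the confusion matrix. https://stats.stackexchange.com/questions/82162/cohens-kappa-in-plain-english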
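The kappa formula can be checked by hand in base R on the test-set counts reported by confusionMatrix() later in this section (72, 3, 2 and 151); the result reproduces its Accuracy of 0.9781 and Kappa of 0.9502.

```r
# test-set counts from the caret confusion matrix later in this section
cm <- matrix(c(72,   3,
                2, 151), nrow = 2, byrow = TRUE)

n <- sum(cm)
p_observed <- sum(diag(cm)) / n                     # plain accuracy
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)

round(c(accuracy = p_observed, kappa = cohen_kappa), 4)
```

The variable is named `cohen_kappa` to avoid masking base R's unrelated `kappa()` function.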
control <- trainControl(method="repeatedcv", number=10, repeats=4, search="grid")
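tunegrid <- expand.grid(.mtry=c(1:23))  # the formula interface dummy-codes the factors into 23 predictor columns, so mtry cannot exceed 23
# note: tunegrid is not passed to train() here, so caret fell back to its default of three mtry values (2, 12, 23) in the output below;
# add tuneGrid=tunegrid to the call to search the full grid
rf_gridsearch <- train(Aircraft~., data=train, method="rf", trControl=control)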
print(rf_gridsearch)
## Random Forest
##
## 230 samples
## 17 predictor
## 2 classes: 'AirBus', 'Boeing'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 208, 206, 208, 206, 208, 206, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9543890 0.8967021
## 12 0.9457757 0.8735236
## 23 0.9392910 0.8582220
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
plot(rf_gridsearch)
Pred<-predict(rf_gridsearch)
table(Pred,train$Aircraft)
##
## Pred AirBus Boeing
## AirBus 76 3
## Boeing 0 151
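This accuracy (227 of 230 correct, about 98.7%) differs from the cross-validated accuracy above (0.954) because the cross-validated figure is the average accuracy on held-out folds, whereas this table scores the final model on the complete training set it was fitted to, which gives an optimistic estimate.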
Pred<-predict(rf_gridsearch, newdata=test)
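# caution: the signature is confusionMatrix(data = predictions, reference = true classes); with the arguments in this order,
# the "Prediction" rows below are the true test classes and the "Reference" columns are the predictions
confusionMatrix(test$Aircraft, Pred)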
## Confusion Matrix and Statistics
##
## Reference
## Prediction AirBus Boeing
## AirBus 72 3
## Boeing 2 151
##
## Accuracy : 0.9781
## 95% CI : (0.9496, 0.9928)
## No Information Rate : 0.6754
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9502
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9730
## Specificity : 0.9805
## Pos Pred Value : 0.9600
## Neg Pred Value : 0.9869
## Prevalence : 0.3246
## Detection Rate : 0.3158
## Detection Prevalence : 0.3289
## Balanced Accuracy : 0.9767
##
## 'Positive' Class : AirBus
##
control <- trainControl(method="repeatedcv", number=10, repeats=4, search="grid")
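tunegrid <- expand.grid(.mtry=c(4:23))  # mtry cannot exceed the 23 dummy-coded predictor columns
# note: tunegrid is not passed to train() here either, so the output again shows caret's default mtry values (2, 12, 23);
# add tuneGrid=tunegrid to the call to search mtry = 4 to 23
rf_gridsearch <- train(Aircraft~., data=train, method="rf", trControl=control)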
print(rf_gridsearch)
## Random Forest
##
## 230 samples
## 17 predictor
## 2 classes: 'AirBus', 'Boeing'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 207, 207, 208, 207, 208, 207, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9520751 0.8933648
## 12 0.9454093 0.8734085
## 23 0.9420002 0.8654641
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
plot(rf_gridsearch)