NASCAR Data

library(glmnet)
library(varSelRF)
library(FSelector)
library(mlbench)
library(knitr)
library(plot3D)
library(rgl)
library(R.basic)
set.seed(123)

We chose three categories for the change in rank:
A gain in rank (improves rank) -> Black
No change in rank -> Red
A loss in rank (falls behind) -> Green

Load the data

dat <- read.csv("C:/Users/Prashan/Dropbox (MIT)/MIT/Predictive Analytics/code/data/NASCAR_5f/allf_3c_phoenix2_2014_prototype_sel.csv")

Start with 43 features (columns 3 to 45), then create the attribute matrix x and the label vector y

features <- 3:45 
complete <- which(rowSums(is.na(dat[, features]))==0)

datc <- dat[complete, ]
x <- as.matrix(datc[, features])
y <- datc[,2]
cl <- factor(y)
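
As a small illustration of the complete-case filter above, the sketch below applies the same `rowSums(is.na(...)) == 0` idiom to a made-up data frame. The object `toy` and its column names are hypothetical and are not part of the NASCAR file.

```r
# Hypothetical toy data: an id column, a label column, and two feature columns.
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  label = c(1, 2, 3, 1),
  f1    = c(0.5, NA, 1.2, 0.7),
  f2    = c(3, 4, NA, 6)
)
toy_features <- 3:4

# Keep only the rows with no missing value in any feature column.
toy_complete <- which(rowSums(is.na(toy[, toy_features])) == 0)
toyc <- toy[toy_complete, ]

toy_x  <- as.matrix(toyc[, toy_features])  # numeric attribute matrix
toy_cl <- factor(toyc[, 2])                # class labels as a factor
```

Rows with an `NA` in either feature are dropped before the matrix is built, so the attribute matrix and the label factor stay aligned row by row.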

# Column one contains the names of the data points, so drop it
nascar_data_with_label <- datc[, -1]
selected_features <- c()

Variable selection from CFS filter (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf)
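
To give a feel for what a correlation-based filter scores, here is a minimal base-R sketch of the usual CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class association and r_ff the mean feature-feature association. It is not FSelector's implementation: the helper `cfs_merit`, the use of absolute Pearson correlation, the numeric target, and the synthetic data are all assumptions made for illustration.

```r
# Hypothetical helper: CFS merit of a feature subset, using absolute Pearson
# correlation as the association measure (an assumption made for simplicity).
cfs_merit <- function(x, y, subset) {
  k <- length(subset)
  # Mean feature-class correlation.
  r_cf <- mean(abs(cor(x[, subset, drop = FALSE], y)))
  # Mean feature-feature correlation over distinct pairs (0 for one feature).
  r_ff <- 0
  if (k > 1) {
    cc <- abs(cor(x[, subset, drop = FALSE]))
    r_ff <- mean(cc[upper.tri(cc)])
  }
  k * r_cf / sqrt(k + k * (k - 1) * r_ff)
}

# Synthetic data: two informative (and mutually redundant) columns plus noise.
set.seed(1)
n <- 200
signal <- rnorm(n)
toy_x <- cbind(
  a = signal + rnorm(n, sd = 0.3),  # informative
  b = signal + rnorm(n, sd = 0.3),  # informative but redundant with a
  c = rnorm(n)                      # pure noise
)
toy_y <- signal + rnorm(n, sd = 0.3)

merit_a  <- cfs_merit(toy_x, toy_y, "a")
merit_ab <- cfs_merit(toy_x, toy_y, c("a", "b"))
merit_ac <- cfs_merit(toy_x, toy_y, c("a", "c"))
```

Under these assumptions, adding the noise column `c` to `a` should lower the merit, while adding the redundant column `b` is penalised through the feature-feature term in the denominator.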

Variable selection from Consistency-based filter (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf)

knit_hooks$set(webgl = hook_webgl)
subset <- consistency(rank_change_label~., datc[,(-1)])
plot3d(x[,subset[1]],x[,subset[2]],x[,subset[3]],col=y,xlab=subset[1],ylab=subset[2],zlab=subset[3])

## Warning in persp.default(x = xdummy, y = ydummy, z = zdummy, xlim = xlim, :
## surface extends beyond the box

subset

## [1] "X26..before.pit..incoming.rank.before.pit.bef.leg" 
## [2] "X34..before.pit..25th.percentile.rank.upto.bef.pit"
## [3] "X35..before.pit..75th.percentile.rank.upto.bef.pit"
## [4] "X82..before.pit..starting.position.of.the.car"

selected_features<-c(selected_features,subset)
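
The idea behind a consistency-based filter can be sketched with the inconsistency rate of a feature subset: rows that share the same feature pattern but do not belong to that pattern's majority class count as inconsistent. The snippet below is a base-R illustration on a made-up discrete data set. The helper `inconsistency_rate` and the data frame `toy` are hypothetical, and the sketch does not reproduce the subset search or the handling of continuous attributes performed by `consistency()`.

```r
# Hypothetical helper: inconsistency rate of a subset of discrete features.
inconsistency_rate <- function(data, subset, class_col) {
  # One key per row describing its pattern on the chosen features.
  key <- do.call(paste, c(data[, subset, drop = FALSE], sep = "|"))
  counts <- table(key, data[[class_col]])
  # For each pattern: number of rows minus the majority-class count.
  sum(rowSums(counts) - apply(counts, 1, max)) / nrow(data)
}

# Made-up data with two discrete features and a class label.
toy <- data.frame(
  f1    = c("lo", "lo", "hi", "hi", "lo", "hi"),
  f2    = c("x",  "y",  "x",  "y",  "x",  "y"),
  label = c("gain", "loss", "loss", "gain", "gain", "loss")
)

rate_f1   <- inconsistency_rate(toy, "f1", "label")           # 2 of 6 rows
rate_both <- inconsistency_rate(toy, c("f1", "f2"), "label")  # 1 of 6 rows
```

A lower rate means the subset separates the classes more cleanly; a filter of this kind looks for a small subset whose rate is close to that of the full feature set.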