library(glmnet)
library(varSelRF)
library(FSelector)
library(mlbench)
library(knitr)
library(plot3D)
library(rgl)
library(R.basic)
set.seed(123)
We use three categories for the change in rank, each plotted in its own color:
Gain in rank -> Black (improves rank)
No change in rank -> Red
Loss in rank -> Green (falls behind)
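The colors come from passing the label straight to `col=` in the plotting call, where R looks up integer codes in its palette. Below is a minimal sketch of that mapping, assuming the labels are coded 1, 2, 3 in the order listed above (the coding is not shown in the source). `label_colors` and `y_demo` are hypothetical names. Naming the colors explicitly avoids relying on the default palette, whose shades changed in R 4.0.

```r
# Hypothetical mapping, assuming codes 1 = gain, 2 = no change, 3 = loss
label_colors <- c("black", "red", "green")
names(label_colors) <- c("gain", "no change", "loss")

# Toy labels for illustration only, not taken from the NASCAR data
y_demo <- c(1, 3, 2, 1)

# Index the color vector by the integer label to get one color per point
point_cols <- label_colors[y_demo]
print(point_cols)
```

A vector like `point_cols` could then be passed as `col=` in place of the raw label.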
Load the data
dat <- read.csv("C:/Users/Prashan/Dropbox (MIT)/MIT/Predictive Analytics/code/data/NASCAR_5f/allf_3c_phoenix2_2014_prototype_sel.csv")
Start with 43 features (columns 3 to 45), keep only the rows with no missing feature values, and build the attribute matrix x and the label vector y
features <- 3:45
complete <- which(rowSums(is.na(dat[, features]))==0)
datc <- dat[complete, ]
x <- as.matrix(datc[, features])
y <- datc[,2]
cl <- factor(y)
# Column 1 holds the names of the data points, so drop it
nascar_data_with_label=datc[,(-1)]
selected_features<-c()
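The complete-case step above can be checked on a small example. This is a sketch on made-up data: `toy`, `feature_cols`, and the other names are hypothetical and only mirror the layout assumed here (names in column 1, label in column 2, features after that).

```r
# Toy data frame with missing values in rows 2 and 3 (illustrative only)
toy <- data.frame(
  id    = c("a", "b", "c", "d"),
  label = c(1, 2, 3, 1),
  f1    = c(0.5, NA, 1.2, 0.3),
  f2    = c(2, 4, NA, 8)
)
feature_cols <- 3:4

# Keep rows whose feature columns contain no NA
keep <- which(rowSums(is.na(toy[, feature_cols])) == 0)
toy_complete <- toy[keep, ]

# Attribute matrix and class factor, built the same way as x and cl
x_toy  <- as.matrix(toy_complete[, feature_cols])
cl_toy <- factor(toy_complete[, 2])

print(keep)
print(dim(x_toy))
```

`complete.cases(toy[, feature_cols])` returns the equivalent logical mask and may read more clearly.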
Variable selection from CFS filter (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf)
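As a rough sketch only, a CFS filter could be run along the following lines. FSelector's `cfs()` takes a formula and a data frame and returns the selected attribute names as a character vector. The example uses the built-in `iris` data rather than the NASCAR file, `cfs_subset` is a hypothetical name, and this is not necessarily the call used in the original analysis.

```r
library(FSelector)
data(iris)

# cfs() looks for attributes that correlate with the class
# but not strongly with each other
cfs_subset <- cfs(Species ~ ., iris)
print(cfs_subset)

# as.simple.formula() turns the selected names back into a model formula
f <- as.simple.formula(cfs_subset, "Species")
print(f)
```

For the data here, the analogous call would presumably be `cfs(rank_change_label ~ ., datc[, -1])`, mirroring the consistency-based filter call, with the result appended to `selected_features` in the same way.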
Variable selection from Consistency-based filter (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf)
knit_hooks$set(webgl = hook_webgl)
subset <- consistency(rank_change_label~., datc[,(-1)])
# Plot the first three selected features; each axis label matches the column plotted on it
plot3d(x[,subset[1]],x[,subset[2]],x[,subset[3]],col=y,xlab=subset[1],ylab=subset[2],zlab=subset[3])
## Warning in persp.default(x = xdummy, y = ydummy, z = zdummy, xlim = xlim, :
## surface extends beyond the box
subset
## [1] "X26..before.pit..incoming.rank.before.pit.bef.leg"
## [2] "X34..before.pit..25th.percentile.rank.upto.bef.pit"
## [3] "X35..before.pit..75th.percentile.rank.upto.bef.pit"
## [4] "X82..before.pit..starting.position.of.the.car"
selected_features<-c(selected_features,subset)