library("randomForest")
## randomForest 4.6-14
library(Metrics)
mydata <- read.csv("stock_data.csv")
View(mydata)
Checking the dimensions of the data (rows, columns)
dim(mydata)
## [1] 3000 101
Converting the outcome variable Y to a factor so that randomForest fits a classification forest rather than a regression
mydata$Y<-as.factor(mydata$Y)
Splitting the data set by row position into a training set (rows 1 to 2000) and a test set (rows 2001 to 3000)
train<-mydata[1:2000,]
test<-mydata[2001:3000,]
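Because the split is by row position, it can help to check how the two classes are distributed in each part before fitting anything. The sketch below is a hypothetical helper (`majority_baseline` is not part of the original analysis) that reports class proportions and the test accuracy of always predicting the most common training class. It assumes Y takes the values -1 and 1, as in the prediction tables shown here, and runs on toy labels rather than stock_data.csv.

```r
# Hypothetical helper (not from the original analysis): class proportions in
# train and test, plus the accuracy of always predicting the majority class.
majority_baseline <- function(train_y, test_y) {
  train_y <- as.character(train_y)
  test_y <- as.character(test_y)
  majority <- names(which.max(table(train_y)))   # most frequent class in train
  list(
    train_props = prop.table(table(train_y)),
    test_props  = prop.table(table(test_y)),
    majority    = majority,
    accuracy    = mean(test_y == majority)       # baseline test accuracy
  )
}

# Toy labels (not the stock data)
tr <- factor(c(1, 1, -1, 1, -1, 1))
te <- factor(c(-1, 1, 1, -1))
res <- majority_baseline(tr, te)
res$majority   # "1"
res$accuracy   # 0.5
```

On this data set it would be called as `majority_baseline(train$Y, test$Y)`; a model whose test accuracy does not beat `res$accuracy` is not adding predictive value over the trivial rule.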
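Fitting a random forest on the training set and predicting the test set
m_rf <- randomForest(Y ~ ., data = train)     # classification forest (Y is a factor), default ntree = 500
prd <- predict(object = m_rf, test[, -101])   # drop column 101 (Y) from the test predictors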
table(prd)
## prd
## -1 1
## 875 125
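Checking AUC
auc(prd, test$Y)  # Metrics::auc(actual, predicted): here the arguments are swapped and both are class labels, not scores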
## [1] 0.4622857
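`Metrics::auc` expects `auc(actual, predicted)` with a continuous score as `predicted`, so passing two sets of hard class labels yields a coarse summary rather than accuracy. A minimal base-R sketch of both measures follows. The helper names `class_accuracy` and `rank_auc` are hypothetical, and the AUC uses the standard rank-sum (Mann-Whitney) formula with average ranks for ties.

```r
# Hypothetical helpers (not from the original analysis).
# Accuracy from hard class labels.
class_accuracy <- function(actual, predicted) {
  # Compare as character so factor level order does not matter
  mean(as.character(actual) == as.character(predicted))
}

# AUC from a continuous score via the rank-sum (Mann-Whitney) formula.
rank_auc <- function(actual, score, positive = "1") {
  is_pos <- as.character(actual) == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example with made-up labels and scores (not the stock data)
actual <- factor(c(-1, 1, 1, -1, 1))
pred_class <- factor(c(-1, 1, -1, -1, 1))
score <- c(0.2, 0.9, 0.25, 0.3, 0.7)

class_accuracy(actual, pred_class)  # 0.8
rank_auc(actual, score)             # 5/6, about 0.833
```

Applied to this analysis, accuracy would come from `class_accuracy(test$Y, prd)` (or `Metrics::accuracy(test$Y, prd)`), and a score-based AUC from `rank_auc(test$Y, predict(m_rf, test[, -101], type = "prob")[, "1"])`, where `type = "prob"` makes `predict` return the forest's class-vote proportions with one column per class level.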
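The value we get here is approximately 0.46. This is an AUC computed from hard class labels (with the arguments in reverse order), not classification accuracy, and it is close to, and slightly below, the 0.5 expected from random guessing. Next, we create 5 clusters from the independent variables using k-means clustering and refit the random forest with the cluster label as an extra predictor.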
Combining the training and test sets so that all 3,000 rows are clustered together
all<-rbind(train,test)
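Creating 5 clusters with k-means on the predictor columns (column 101, Y, is excluded)
Cluster <- kmeans(all[, -101], 5)   # no seed is set, so cluster assignments vary between runs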
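The call above sets no random seed, so k-means starts from different random centers on each run and the cluster labels (and the downstream AUC) can change. The predictors are also left unscaled, so columns with large variance dominate the distances. Below is a sketch of a more reproducible variant, assuming all predictor columns are numeric and non-constant (`scale` returns NaN for a zero-variance column). The helper `cluster_features` and its `seed = 42` and `nstart = 25` defaults are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): reproducible k-means
# on standardized features. Assumes every column of x is numeric and non-constant.
cluster_features <- function(x, k = 5, seed = 42, nstart = 25) {
  set.seed(seed)                 # fix the random initial centers
  x_scaled <- scale(x)           # center and scale each column
  fit <- kmeans(x_scaled, centers = k, nstart = nstart)  # keep best of nstart starts
  factor(fit$cluster)            # one cluster label per row, as a factor
}

# Synthetic demo data (not the stock data): 300 rows, 4 numeric features
set.seed(1)
demo <- as.data.frame(matrix(rnorm(300 * 4), ncol = 4))

c1 <- cluster_features(demo, k = 5)
c2 <- cluster_features(demo, k = 5)
identical(c1, c2)   # TRUE: same seed gives the same assignment
table(c1)
```

Here, `all$cluster <- cluster_features(all[, -101], k = 5)` would stand in for the `kmeans` call and the `as.factor` step. Standardizing is a judgment call: if the raw column scales are meaningful for the stock features, leaving them unscaled is also defensible.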
Adding the cluster assignment to the dataset as an additional (factor) independent variable
all$cluster<-as.factor(Cluster$cluster)
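Before refitting, one quick way to see whether the clusters carry any information about the outcome is to cross-tabulate cluster membership against Y on the training rows only. The helper below (`cluster_outcome_table`, hypothetical) returns the share of each class within each cluster; it runs on toy data here.

```r
# Hypothetical helper (not from the original analysis): share of each outcome
# class within each cluster. Intended for training rows only.
cluster_outcome_table <- function(cluster, y) {
  tab <- table(cluster = cluster, Y = y)
  prop.table(tab, margin = 1)   # each row (cluster) sums to 1
}

# Toy example (not the stock data)
cl <- factor(c(1, 1, 2, 2, 2, 3))
y  <- factor(c(-1, 1, 1, 1, -1, -1))
cluster_outcome_table(cl, y)
```

On this data set it would be called as `cluster_outcome_table(all$cluster[1:2000], all$Y[1:2000])`. If every cluster's class shares sit close to the overall training proportions, the cluster variable is unlikely to help the forest.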
Splitting the dataset back into the same training and test rows
train<- all[1:2000,]
test<- all[2001:3000,]
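Refitting the random forest with the cluster variable as an extra predictor
m_rf1 <- randomForest(Y ~ ., data = train)     # Y ~ . now also includes cluster
prd2 <- predict(object = m_rf1, test[, -101])  # column 101 is still Y; cluster is column 102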
table(prd2)
## prd2
## -1 1
## 180 820
Checking AUC
auc(prd2, test$Y)  # same call as before: arguments swapped, and both are class labels, not scores
## [1] 0.4788618