R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com.

When you click the Knit button, R generates a document that includes both the content and the output of any R code chunks embedded in it. You can embed an R code chunk like this:

x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)
d <- data.frame(x, y)
plot(d, pch = 18, col = "green")  # pch sets the plotting symbol; 18 is a filled diamond

Linear Regression Model

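model <- lm(y ~ x)          # least-squares fit of y on x
pred <- predict(model, d)   # fitted values at the observed x
plot(d)
abline(model)               # overlay the fitted regression line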
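For a single predictor, lm(y ~ x) chooses the intercept and slope that minimize the sum of squared residuals, and those coefficients solve the normal equations (X^T X) beta = X^T y. The base-R sketch below recomputes them directly to show what the line drawn by abline(model) is; the names X and beta are introduced here only for illustration.

```r
# Data from the section above
x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)

# Design matrix: a column of ones for the intercept, then x
X <- cbind("(Intercept)" = 1, x = x)

# Solve the normal equations (X'X) beta = X'y for the least-squares coefficients
beta <- drop(solve(crossprod(X), crossprod(X, y)))

# lm() returns the same intercept and slope, up to floating-point rounding
model <- lm(y ~ x)
rbind(normal_equations = beta, lm = coef(model))
```

lm() itself uses a QR decomposition rather than forming X^T X, which is numerically safer, but for data this small both routes give the same coefficients.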

SVM Plot

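The code that fits the SVM and draws this plot is not shown; the chunk's only visible output is a warning that package 'e1071' was built under R version 3.4.3.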
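The error computation below uses pred2 as the SVM's predictions. A plausible reconstruction of the missing code is sketched here, assuming the author fit e1071::svm() with its default settings (for a numeric response that means eps-regression with a radial kernel, cost = 1 and epsilon = 0.1). The name model2 is hypothetical, and the original call may have differed.

```r
library(e1071)  # provides svm() and tune()

x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)
d <- data.frame(x, y)

# Assumed: support vector regression with e1071's default settings
model2 <- svm(y ~ x, data = d)
pred2 <- predict(model2, d)

# Observed data as green diamonds, SVM predictions as blue crosses
plot(d, pch = 18, col = "green")
points(d$x, pred2, col = "blue", pch = 4)
```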

The SVM is more accurate than the linear regression: its root-mean-square error (RMSE) on these data is lower.

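error <- model$residuals          # residuals of the linear regression
rmse_lm <- sqrt(mean(error^2))    # root-mean-square error (RMSE)
rmse_lm
## [1] 23.97027
error2 <- d$y - pred2             # residuals of the SVM
rmse_svm <- sqrt(mean(error2^2))  # RMSE of the SVM
rmse_svm
## [1] 10.93888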
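Both error measures above are root-mean-square errors, sqrt(mean(residual^2)). A small helper makes such comparisons less repetitive; rmse() is a name introduced here for illustration, not a base R function. The check in the middle confirms that model$residuals equals d$y - pred for the linear model, because predict(model, d) on the training data returns the fitted values.

```r
# Hypothetical helper: root-mean-square error between observed and predicted values
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)
d <- data.frame(x, y)
model <- lm(y ~ x, data = d)
pred <- predict(model, d)

# On the training data, the lm residuals are exactly observed minus predicted
stopifnot(isTRUE(all.equal(unname(residuals(model)), unname(d$y - pred))))

rmse(d$y, pred)  # same computation as sqrt(mean(model$residuals^2))
```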

Choosing the best model by tuning: we can improve the SVM further by tuning it so that the error is even lower. To do this, we look more closely at the svm() and tune() functions. We can specify values for the cost parameter and for epsilon, which is 0.1 by default. A simple approach is to try every value of epsilon between 0 and 1 in steps of 0.01, and every cost from 4 to 2^9 = 512 in powers of 2. That gives 101 values of epsilon and 8 values of cost, so 808 models are compared to see which performs best. The code may take a short while to run all the models and find the best one. The corresponding code is:

# Grid search over epsilon and cost; tune() scores each pair by 10-fold cross validation
svm_tune <- tune(svm, y ~ x, data = d,
                 ranges = list(epsilon = seq(0, 1, 0.01), cost = 2^(2:9)))
print(svm_tune)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  epsilon cost
##     0.16   16
## 
## - best performance: 60.12774
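The grid in this call is every combination of the two ranges, and, as the output shows, tune() scores each combination with 10-fold cross validation, so the number of model fits is ten times the number of grid points. For a regression, tune()'s default error measure is the mean squared error, so the reported best performance of 60.12774 is on the squared scale; taking its square root puts it back in the units of y, like the RMSE values above (although those were computed on the training data, not by cross validation). The arithmetic in base R:

```r
# The same grid that was passed to tune(): every (epsilon, cost) pair
grid <- expand.grid(epsilon = seq(0, 1, 0.01), cost = 2^(2:9))
n_models <- nrow(grid)   # 101 epsilon values x 8 cost values
n_fits <- n_models * 10  # each pair is fit once per fold of 10-fold CV

# tune() reports the cross-validated mean squared error; its root is an RMSE
best_mse <- 60.12774
best_rmse <- sqrt(best_mse)

c(models = n_models, fits = n_fits, cv_rmse = round(best_rmse, 3))
```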

This tuning method is known as grid search: R fits a model for every combination of epsilon and cost in the specified ranges and returns the one with the lowest error. We can also plot the tuning result to see the performance of all the models together. The plot color-codes the performance of each model, and darker regions indicate lower error, that is, better accuracy. Its main use is to identify the region where the search can be narrowed and tuned further if required. For instance, the plot suggests rerunning the tuning for epsilon over a narrower range around the darker region, and while doing so I could use an even finer step (say 0.002). However, going further may lead to overfitting, so I stop at this point.

plot(svm_tune)
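To make the grid-search idea concrete, the loop below is a hand-written sketch of what tune() does: for every (epsilon, cost) pair it fits svm() on nine folds, measures the mean squared error on the held-out fold, and keeps the pair with the lowest average. It is a simplified illustration, not tune()'s actual code; the seed, the fold assignment and the coarser epsilon step of 0.1 are assumptions chosen to keep it fast, so its best pair need not match the output above.

```r
library(e1071)

x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)
d <- data.frame(x, y)

set.seed(1)  # assumed seed, for reproducible folds
k <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(d)))  # random fold label per row

# A coarser grid than in the text, to keep the example quick
grid <- expand.grid(epsilon = seq(0, 1, 0.1), cost = 2^(2:9))

# Mean cross-validated squared error for one (epsilon, cost) pair
cv_mse <- function(epsilon, cost) {
  fold_mse <- sapply(seq_len(k), function(f) {
    train <- d[folds != f, ]
    test <- d[folds == f, ]
    fit <- svm(y ~ x, data = train, epsilon = epsilon, cost = cost)
    mean((test$y - predict(fit, test))^2)
  })
  mean(fold_mse)
}

# Evaluate every grid point, then keep the one with the lowest CV error
grid$mse <- mapply(cv_mse, grid$epsilon, grid$cost)
best <- grid[which.min(grid$mse), ]
best
```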

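The best model is expected to reduce the error further. Its RMSE is computed the same way, this time from the tuned model's predictions:

pred3 <- predict(svm_tune$best.model, d)  # predictions of the best tuned SVM
error3 <- d$y - pred3                      # residuals of the tuned SVM
rmse_best <- sqrt(mean(error3^2))          # RMSE of the tuned SVM
rmse_best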
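The RMSE values compared in this section are computed on the same 12 points the models were fitted to, which tends to flatter flexible models. One way to put the linear regression on a footing closer to tune()'s cross-validated figure is to cross-validate it too. The base-R sketch below uses leave-one-out cross validation, a choice made here for simplicity (with 12 points it amounts to 12-fold CV, not the 10-fold CV tune() used), so its number is only roughly comparable.

```r
x <- c(1, 3, 5, 6, 7, 9, 11, 15, 17, 22, 100, 50)
y <- c(2.5, 7, 9, 14, 30, 20, 34, 60, 70, 75, 20, 50)
d <- data.frame(x, y)
fit <- lm(y ~ x, data = d)

# Leave-one-out CV: refit without point i, then predict the held-out point i
loo_pred <- sapply(seq_len(nrow(d)), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  predict(fit_i, d[i, , drop = FALSE])
})

train_rmse <- sqrt(mean(residuals(fit)^2))
loo_rmse <- sqrt(mean((d$y - loo_pred)^2))
c(training = train_rmse, loo_cv = loo_rmse)
```

For least squares, each held-out residual equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage (hatvalues(fit)); since 0 < h_ii < 1, the leave-one-out RMSE can never fall below the training RMSE.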

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.