ML_Homework6: Topics on Support Vector Machines

R packages used in this assignment:

1. e1071: library for Support Vector Machine

2. randomForest: Breiman and Cutler’s random forests for classification and regression

3. glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models

4. mlbench: Machine Learning Benchmark Problems

Question 1: In a past homework, you performed ridge regression on the wine quality data set. Now use a support vector machine to classify these data.

1a) First classify the data treating the last column as an ordered factor (the wine tasters score). Next treat the last column as a numeric. Which SVM implementation is better? Why do you think it is better?

1b) Using the best version choose two attributes and a slice through the data to plot. Choose a different set of attributes and another set of slices to plot.

1c) Compare and contrast the best version of the SVM with the ridge regression model.

rm(list=ls())
setwd("c:/Users/Andrew/SkyDrive/AGZ_Home/workspace_R/UCSC/MachinLearning/All_data")
wine<-read.csv("winequality-red.csv",header=TRUE,sep=";")
summary(wine)

##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000

#library for Support Vector Machine
library(e1071)

## Warning: package 'e1071' was built under R version 3.1.3

str(wine$quality)

##  int [1:1599] 5 5 5 6 5 5 5 7 7 5 ...

x <- subset(wine, select = -quality)
y <-as.numeric(wine$quality)

# 1a) First classify the data treating the last column as an ordered factor
# (the wine tasters score). 

wine_factor<-cbind(x, quality=as.factor(y))
str(wine_factor)

## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : Factor w/ 6 levels "3","4","5","6",..: 3 3 3 4 3 3 3 5 5 3 ...

wineTrain<-wine_factor[1:1400,]
wineTest<-wine_factor[1401:1599,]

x_factor <- subset(wineTest, select = -quality)
y_factor <- wineTest$quality

wine_svm <- svm(quality ~ ., data = wineTrain)
summary(wine_svm)

## 
## Call:
## svm(formula = quality ~ ., data = wineTrain)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.09090909 
## 
## Number of Support Vectors:  1166
## 
##  ( 430 496 172 46 15 7 )
## 
## 
## Number of Classes:  6 
## 
## Levels: 
##  3 4 5 6 7 8

# gamma: 0.0909 cost: 1
wine_factor_predict <- predict(wine_svm, x_factor);
1-sum(wine_factor_predict == y_factor)/length(y_factor)

## [1] 0.4371859
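A single error rate hides which scores are being confused. The sketch below uses base R only and made-up toy labels (not the wine predictions) to show how a confusion matrix and the misclassification rate relate; `truth` and `pred` stand in for `y_factor` and `wine_factor_predict`.

```r
# Toy labels; both factors share the six quality levels
lv <- c("3", "4", "5", "6", "7", "8")
truth <- factor(c("5", "5", "6", "6", "7", "5", "6", "4"), levels = lv)
pred  <- factor(c("5", "6", "6", "6", "6", "5", "5", "5"), levels = lv)

# Rows are the true class, columns the predicted class
conf_mat <- table(truth = truth, predicted = pred)
print(conf_mat)

# Misclassification rate: share of observations off the diagonal
misclass <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
print(misclass)
```

On real predictions, the off-diagonal cells would show whether errors fall mostly on neighbouring scores (5 vs 6) or on distant ones.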

## Tune `svm' for classification with the RBF kernel (the default in svm),
## using 10-fold cross-validation on the training set
# Grid: gamma = 0.05, 0.06, ..., 0.11; cost = 1.0, 1.5, ..., 4.0
wine_svm_tuned <- tune(svm, quality~., data = wineTrain,
            ranges = list(gamma = seq(.05,.11,.01), cost = seq(1,4,0.5)),
            tunecontrol = tune.control(sampling = "cross")
)
summary(wine_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##    0.1  1.5
## 
## - best performance: 0.3621429 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1   0.05  1.0 0.3764286 0.03595948
## 2   0.06  1.0 0.3742857 0.03353679
## 3   0.07  1.0 0.3728571 0.03363806
## 4   0.08  1.0 0.3735714 0.03384807
## 5   0.09  1.0 0.3735714 0.04041312
## 6   0.10  1.0 0.3678571 0.04127366
## 7   0.11  1.0 0.3678571 0.03842860
## 8   0.05  1.5 0.3771429 0.03746503
## 9   0.06  1.5 0.3757143 0.03502510
## 10  0.07  1.5 0.3714286 0.03779645
## 11  0.08  1.5 0.3642857 0.03897787
## 12  0.09  1.5 0.3628571 0.03981249
## 13  0.10  1.5 0.3621429 0.03810268
## 14  0.11  1.5 0.3628571 0.03850965
## 15  0.05  2.0 0.3750000 0.03661561
## 16  0.06  2.0 0.3707143 0.03649154
## 17  0.07  2.0 0.3692857 0.03719929
## 18  0.08  2.0 0.3678571 0.03722975
## 19  0.09  2.0 0.3635714 0.03904326
## 20  0.10  2.0 0.3628571 0.03512207
## 21  0.11  2.0 0.3650000 0.03554723
## 22  0.05  2.5 0.3757143 0.03598312
## 23  0.06  2.5 0.3721429 0.03756325
## 24  0.07  2.5 0.3700000 0.03654587
## 25  0.08  2.5 0.3642857 0.03955535
## 26  0.09  2.5 0.3671429 0.03486287
## 27  0.10  2.5 0.3664286 0.03532326
## 28  0.11  2.5 0.3657143 0.03607752
## 29  0.05  3.0 0.3714286 0.03688556
## 30  0.06  3.0 0.3728571 0.03761603
## 31  0.07  3.0 0.3692857 0.03970555
## 32  0.08  3.0 0.3664286 0.03467537
## 33  0.09  3.0 0.3671429 0.03598312
## 34  0.10  3.0 0.3685714 0.03566663
## 35  0.11  3.0 0.3657143 0.03821410
## 36  0.05  3.5 0.3742857 0.03782643
## 37  0.06  3.5 0.3728571 0.03670066
## 38  0.07  3.5 0.3700000 0.03607752
## 39  0.08  3.5 0.3671429 0.03453612
## 40  0.09  3.5 0.3700000 0.03654587
## 41  0.10  3.5 0.3685714 0.03767627
## 42  0.11  3.5 0.3721429 0.03875178
## 43  0.05  4.0 0.3728571 0.03731341
## 44  0.06  4.0 0.3692857 0.04027260
## 45  0.07  4.0 0.3707143 0.03602248
## 46  0.08  4.0 0.3700000 0.03463447
## 47  0.09  4.0 0.3692857 0.03689324
## 48  0.10  4.0 0.3707143 0.03933258
## 49  0.11  4.0 0.3714286 0.04123930

plot(wine_svm_tuned)

# took long time ...

wine_svm_tuned$best.parameters

##    gamma cost
## 13   0.1  1.5
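To make the grid search above less of a black box, here is a hand-rolled 10-fold cross-validation in base R. It is only a sketch of the idea: it uses synthetic data and tunes the degree of a polynomial `lm` fit as a stand-in for gamma and cost, so none of the names below come from e1071. For reference, `tune` reports the misclassification rate for factor responses and the mean squared error for numeric ones by default.

```r
set.seed(1)

# Synthetic data standing in for a training set
n <- 100
d <- data.frame(x = runif(n, 0.1, 5))
d$y <- log(d$x) + rnorm(n, sd = 0.2)

k <- 10
fold <- sample(rep(1:k, length.out = n))  # random fold assignment
grid <- data.frame(degree = 1:4)          # hyperparameter grid (stand-in for gamma/cost)

# For each grid point, average the validation MSE over the k folds
grid$error <- sapply(grid$degree, function(deg) {
  fold_mse <- sapply(1:k, function(i) {
    train <- d[fold != i, ]
    valid <- d[fold == i, ]
    fit <- lm(y ~ poly(x, deg), data = train)
    mean((valid$y - predict(fit, newdata = valid))^2)
  })
  mean(fold_mse)
})

# The "best parameters" are the grid row with the smallest CV error
best <- grid[which.min(grid$error), ]
print(grid)
print(best)
```

Because the fold assignment is random, repeated runs without a fixed seed can select different grid points when several have nearly equal error, as in the table above.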

# Refit with gamma = 0.07, cost = 1.5 (the tuning run above selected gamma = 0.1, cost = 1.5)
wine_svm <- svm(quality ~ ., data = wineTrain, gamma = 0.07, cost = 1.5)
wine_factor_predict <- predict(wine_svm, x_factor);
1-sum(wine_factor_predict == y_factor)/length(y_factor)

## [1] 0.4371859

# Next treat the last column as a numeric. 
wine_numeric<-cbind(x, quality=y)
str(wine_numeric)

## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : num  5 5 5 6 5 5 5 7 7 5 ...

wineTrain<-wine_numeric[1:1400,]
wineTest<-wine_numeric[1401:1599,]

x_factor <- subset(wineTest, select = -quality)
y_factor <- wineTest$quality

wine_svm <- svm(quality ~ ., data = wineTrain)
summary(wine_svm)

## 
## Call:
## svm(formula = quality ~ ., data = wineTrain)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.09090909 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  1162

# gamma: 0.0909 cost: 1

wine_factor_predict <- predict(wine_svm, x_factor);
# Note: this is sqrt(SSE)/n, not the RMSE; RMSE = sqrt(SSE/n), i.e. this value times sqrt(n)
sqrt( sum((wineTest$quality-wine_factor_predict)^2))/length(wine_factor_predict)

## [1] 0.04847373

#[1] 0.04847373
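The statistic computed above divides the square root of the summed squared errors by n, which is smaller than the usual root-mean-square error by a factor of sqrt(n). The sketch below uses made-up numbers (not the wine predictions) to show the relationship, alongside the mean absolute error for comparison.

```r
# Toy values standing in for wineTest$quality and the SVM predictions
actual    <- c(5, 6, 5, 7, 6, 5, 4, 6)
predicted <- c(5.4, 5.8, 5.1, 6.2, 6.3, 5.6, 4.9, 5.7)

n   <- length(actual)
sse <- sum((actual - predicted)^2)

report_stat <- sqrt(sse) / n               # the quantity used in this report
rmse        <- sqrt(sse / n)               # root-mean-square error
mae         <- mean(abs(actual - predicted))  # mean absolute error

print(c(report_stat = report_stat, rmse = rmse, mae = mae))

# The two differ exactly by a factor of sqrt(n)
print(all.equal(rmse, report_stat * sqrt(n)))
```

The RMSE is expressed in the units of the target (quality points here), which makes it easier to interpret than the scaled-down statistic.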

## Tune `svm' for regression with the RBF kernel (the default in svm),
## using 10-fold cross-validation on the training set
# Grid: gamma = 0.05, 0.06, ..., 0.11; cost = 1.0, 1.5, ..., 4.0
wine_svm_tuned <- tune(svm, quality~., data = wineTrain,
                       ranges = list(gamma = seq(.05,.11,.01), cost = seq(1,4,0.5)),
                       tunecontrol = tune.control(sampling = "cross")
) # This took a long time also.

summary(wine_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##   0.11  1.5
## 
## - best performance: 0.3779477 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1   0.05  1.0 0.3933767 0.03111975
## 2   0.06  1.0 0.3894143 0.03040395
## 3   0.07  1.0 0.3859173 0.02794516
## 4   0.08  1.0 0.3836139 0.02557613
## 5   0.09  1.0 0.3820125 0.02484204
## 6   0.10  1.0 0.3806449 0.02437718
## 7   0.11  1.0 0.3798551 0.02417274
## 8   0.05  1.5 0.3887931 0.02951885
## 9   0.06  1.5 0.3860247 0.02670351
## 10  0.07  1.5 0.3827143 0.02435098
## 11  0.08  1.5 0.3805294 0.02319417
## 12  0.09  1.5 0.3789467 0.02276341
## 13  0.10  1.5 0.3781318 0.02331553
## 14  0.11  1.5 0.3779477 0.02422690
## 15  0.05  2.0 0.3876178 0.02826725
## 16  0.06  2.0 0.3834809 0.02450302
## 17  0.07  2.0 0.3809698 0.02302295
## 18  0.08  2.0 0.3788462 0.02231663
## 19  0.09  2.0 0.3783697 0.02356869
## 20  0.10  2.0 0.3786191 0.02504706
## 21  0.11  2.0 0.3797492 0.02677891
## 22  0.05  2.5 0.3860613 0.02628725
## 23  0.06  2.5 0.3824666 0.02356673
## 24  0.07  2.5 0.3795438 0.02291436
## 25  0.08  2.5 0.3787625 0.02406585
## 26  0.09  2.5 0.3791854 0.02545808
## 27  0.10  2.5 0.3804509 0.02705919
## 28  0.11  2.5 0.3820752 0.02840122
## 29  0.05  3.0 0.3845686 0.02497025
## 30  0.06  3.0 0.3809081 0.02367164
## 31  0.07  3.0 0.3790676 0.02372830
## 32  0.08  3.0 0.3790282 0.02550304
## 33  0.09  3.0 0.3804384 0.02712539
## 34  0.10  3.0 0.3824476 0.02829878
## 35  0.11  3.0 0.3835673 0.02974205
## 36  0.05  3.5 0.3835795 0.02423519
## 37  0.06  3.5 0.3802913 0.02390804
## 38  0.07  3.5 0.3786601 0.02483396
## 39  0.08  3.5 0.3800767 0.02652691
## 40  0.09  3.5 0.3820321 0.02776698
## 41  0.10  3.5 0.3831677 0.02949035
## 42  0.11  3.5 0.3854873 0.03128188
## 43  0.05  4.0 0.3822812 0.02415505
## 44  0.06  4.0 0.3797801 0.02470262
## 45  0.07  4.0 0.3792003 0.02592417
## 46  0.08  4.0 0.3811046 0.02724750
## 47  0.09  4.0 0.3829279 0.02873294
## 48  0.10  4.0 0.3847974 0.03081793
## 49  0.11  4.0 0.3886604 0.03264213

plot(wine_svm_tuned)

wine_svm_tuned$best.parameters

##    gamma cost
## 14  0.11  1.5

# Refit with gamma = 0.1, cost = 2 (the tuning run above selected gamma = 0.11, cost = 1.5)
wine_svm <- svm(quality ~ ., data = wineTrain, gamma = 0.1, cost = 2)
wine_factor_predict <- predict(wine_svm, x_factor);
sqrt(sum((wineTest$quality-wine_factor_predict)^2))/length(wine_factor_predict)

## [1] 0.04952251

#[1] 0.04952251

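# Which SVM implementation is better? Why do you think it is better?
# Caveat: 0.437 is a misclassification rate, while 0.0495 is sqrt(SSE)/n from the
# regression fit; the two are on different scales, so the comparison is only indicative.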
print("quality as factor had error = 0.437 but numeric quality had small error = 0.0495")

## [1] "quality as factor had error = 0.437 but numeric quality had small error = 0.0495"

print("Regression is better than classification")

## [1] "Regression is better than classification"
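The two figures quoted above measure different things, so one way to put both models on the same scale is to round the regression predictions to the nearest score and compute a misclassification rate for them as well. This is a sketch with toy numbers, not a rerun of the wine models; `truth` and `reg_pred` are hypothetical stand-ins.

```r
# Toy true scores and continuous regression predictions
truth    <- c(5, 6, 5, 7, 6, 5, 4, 6)
reg_pred <- c(5.4, 5.8, 5.1, 6.2, 6.3, 5.6, 4.9, 5.7)

# Round to the nearest score and clamp to the observed range 3..8
reg_class <- pmin(pmax(round(reg_pred), 3), 8)

# Misclassification rate, directly comparable to the factor-based SVM
misclass_reg <- mean(reg_class != truth)
print(reg_class)
print(misclass_reg)
```

Applied to the actual test-set predictions, this would give a rate that can be set against the 0.437 of the classification SVM.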

Question 2: Classify the sonar data set.

2a) Use a support vector machine to classify the sonar data set. First tune an SVM employing a radial basis function kernel (the default). Next tune an SVM employing a linear kernel. Compare the results.

setwd("c:/Users/Andrew/SkyDrive/AGZ_Home/workspace_R/UCSC/MachinLearning/All_data")
sonarTest<-read.csv("sonar_test.csv",header=FALSE)
sonarTest$V61[sonarTest$V61 == -1]<-0

sonarTrain<-read.csv("sonar_train.csv",header=FALSE)
sonarTrain$V61[sonarTrain$V61 == -1]<-0


x <- subset(sonarTest, select = -V61)
y <- sonarTest$V61

sonar_svm <- svm(V61 ~ ., data = sonarTrain)
summary(sonar_svm)

## 
## Call:
## svm(formula = V61 ~ ., data = sonarTrain)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.01666667 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  111

# cost: 1 gamma: 0.01666667

sonar_predict <- predict(sonar_svm, x);
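# Note: length(sonarTest) is the number of columns of the data frame, not the number
# of test rows; the later calls divide by length(y) instead
sqrt(sum((y-sonar_predict)^2))/length(sonarTest)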

## [1] 0.05062342

#[1] 0.05062342
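Because V61 is stored as a numeric 0/1 column, `svm` fits an eps-regression and returns continuous scores rather than class labels. One way to recover a classification error is to threshold the scores at 0.5; another is to convert V61 to a factor before fitting, in which case `svm` performs C-classification. The sketch below illustrates the thresholding route with made-up scores (not the sonar predictions).

```r
# Toy 0/1 labels and continuous regression scores
truth <- c(0, 1, 1, 0, 1, 0)
score <- c(0.12, 0.81, 0.43, 0.30, 0.95, 0.62)

# Scores above 0.5 are assigned to class 1, the rest to class 0
pred_class <- as.integer(score > 0.5)

# Misclassification rate, comparable to error rates from tree models
misclass <- mean(pred_class != truth)
print(pred_class)
print(misclass)
```

The 0.5 cut-off is an assumption that suits labels coded as 0 and 1; a different coding would need a different threshold.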

## Tune `svm' with the RBF kernel (the default in svm), using 10-fold cross-validation
## on the training set; V61 is numeric, so svm fits an eps-regression
# Grid: gamma = 0.00, 0.01, ..., 0.05; cost = 1.0, 1.5, ..., 4.0
sonar_svm_tuned <- tune(svm, V61~., data = sonarTrain,
                       ranges = list(gamma = seq(0,.05,.01), cost = seq(1,4,0.5)),
                       tunecontrol = tune.control(sampling = "cross")
)

summary(sonar_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##   0.02    4
## 
## - best performance: 0.1170109 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1   0.00  1.0 0.5007608 0.06359573
## 2   0.01  1.0 0.1544112 0.05267355
## 3   0.02  1.0 0.1407517 0.05224512
## 4   0.03  1.0 0.1395238 0.04443672
## 5   0.04  1.0 0.1448254 0.03794580
## 6   0.05  1.0 0.1523087 0.03336457
## 7   0.00  1.5 0.5007608 0.06359573
## 8   0.01  1.5 0.1484291 0.05476681
## 9   0.02  1.5 0.1327762 0.04671820
## 10  0.03  1.5 0.1307349 0.03825207
## 11  0.04  1.5 0.1382642 0.03405481
## 12  0.05  1.5 0.1484242 0.03128048
## 13  0.00  2.0 0.5007608 0.06359573
## 14  0.01  2.0 0.1445690 0.05426878
## 15  0.02  2.0 0.1256375 0.04116307
## 16  0.03  2.0 0.1267979 0.03552814
## 17  0.04  2.0 0.1372200 0.03343265
## 18  0.05  2.0 0.1484109 0.03130288
## 19  0.00  2.5 0.5007608 0.06359573
## 20  0.01  2.5 0.1420114 0.05169829
## 21  0.02  2.5 0.1216924 0.03737395
## 22  0.03  2.5 0.1256515 0.03480375
## 23  0.04  2.5 0.1372200 0.03343265
## 24  0.05  2.5 0.1484109 0.03130288
## 25  0.00  3.0 0.5007608 0.06359573
## 26  0.01  3.0 0.1399669 0.04958882
## 27  0.02  3.0 0.1191985 0.03629600
## 28  0.03  3.0 0.1256522 0.03480335
## 29  0.04  3.0 0.1372200 0.03343265
## 30  0.05  3.0 0.1484109 0.03130288
## 31  0.00  3.5 0.5007608 0.06359573
## 32  0.01  3.5 0.1371179 0.04736768
## 33  0.02  3.5 0.1175965 0.03528741
## 34  0.03  3.5 0.1256522 0.03480335
## 35  0.04  3.5 0.1372200 0.03343265
## 36  0.05  3.5 0.1484109 0.03130288
## 37  0.00  4.0 0.5007608 0.06359573
## 38  0.01  4.0 0.1340051 0.04544560
## 39  0.02  4.0 0.1170109 0.03499190
## 40  0.03  4.0 0.1256522 0.03480335
## 41  0.04  4.0 0.1372200 0.03343265
## 42  0.05  4.0 0.1484109 0.03130288

plot(sonar_svm_tuned)

sonar_svm_tuned$best.parameters

##    gamma cost
## 39  0.02    4

# gamma = 0.02, cost = 4
sonar_svm <- svm(V61 ~ ., data = sonarTrain, gamma = 0.02, cost = 4)
sonar_predict <- predict(sonar_svm, x);
sqrt(sum((y-sonar_predict)^2))/length(y)

## [1] 0.03728963

#[1] 0.03728963

#2a) Next tune an SVM employing a linear kernel. Compare the results.
sonar_svm <- svm(V61 ~ ., data = sonarTrain, kernel="linear")
summary(sonar_svm)

## 
## Call:
## svm(formula = V61 ~ ., data = sonarTrain, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.01666667 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  120

# Defaults: gamma = 0.01666667, cost = 1
## Tune `svm' with kernel = "linear",
## using 10-fold cross-validation on the training set
# Grid: gamma = 0.00, 0.01, ..., 0.05; cost = 1.0, 1.5, ..., 4.0
# (the linear kernel does not use gamma, so only cost changes the fit)
sonar_svm_tuned <- tune(svm, V61~., data = sonarTrain, kernel="linear", 
                        ranges = list(gamma = seq(0,.05,.01), cost = seq(1,4,0.5)),
                        tunecontrol = tune.control(sampling = "cross")
)
summary(sonar_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##      0    1
## 
## - best performance: 0.3313263 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1   0.00  1.0 0.3313263  0.1460075
## 2   0.01  1.0 0.3313263  0.1460075
## 3   0.02  1.0 0.3313263  0.1460075
## 4   0.03  1.0 0.3313263  0.1460075
## 5   0.04  1.0 0.3313263  0.1460075
## 6   0.05  1.0 0.3313263  0.1460075
## 7   0.00  1.5 0.3454587  0.1468720
## 8   0.01  1.5 0.3454587  0.1468720
## 9   0.02  1.5 0.3454587  0.1468720
## 10  0.03  1.5 0.3454587  0.1468720
## 11  0.04  1.5 0.3454587  0.1468720
## 12  0.05  1.5 0.3454587  0.1468720
## 13  0.00  2.0 0.3516189  0.1494875
## 14  0.01  2.0 0.3516189  0.1494875
## 15  0.02  2.0 0.3516189  0.1494875
## 16  0.03  2.0 0.3516189  0.1494875
## 17  0.04  2.0 0.3516189  0.1494875
## 18  0.05  2.0 0.3516189  0.1494875
## 19  0.00  2.5 0.3605445  0.1509295
## 20  0.01  2.5 0.3605445  0.1509295
## 21  0.02  2.5 0.3605445  0.1509295
## 22  0.03  2.5 0.3605445  0.1509295
## 23  0.04  2.5 0.3605445  0.1509295
## 24  0.05  2.5 0.3605445  0.1509295
## 25  0.00  3.0 0.3644351  0.1505470
## 26  0.01  3.0 0.3644351  0.1505470
## 27  0.02  3.0 0.3644351  0.1505470
## 28  0.03  3.0 0.3644351  0.1505470
## 29  0.04  3.0 0.3644351  0.1505470
## 30  0.05  3.0 0.3644351  0.1505470
## 31  0.00  3.5 0.3680924  0.1524489
## 32  0.01  3.5 0.3680924  0.1524489
## 33  0.02  3.5 0.3680924  0.1524489
## 34  0.03  3.5 0.3680924  0.1524489
## 35  0.04  3.5 0.3680924  0.1524489
## 36  0.05  3.5 0.3680924  0.1524489
## 37  0.00  4.0 0.3726178  0.1532045
## 38  0.01  4.0 0.3726178  0.1532045
## 39  0.02  4.0 0.3726178  0.1532045
## 40  0.03  4.0 0.3726178  0.1532045
## 41  0.04  4.0 0.3726178  0.1532045
## 42  0.05  4.0 0.3726178  0.1532045

plot(sonar_svm_tuned)

sonar_svm_tuned$best.parameters

##   gamma cost
## 1     0    1
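The tuning table above shows identical errors for every gamma at a given cost. That follows from the kernel definitions: the linear kernel is a plain dot product with no gamma term, whereas the radial kernel is exp(-gamma * ||u - v||^2). A small base-R sketch, with hypothetical helper names, makes this concrete.

```r
# Kernel functions written out by hand (illustrative helpers, not e1071 functions)
linear_kernel <- function(u, v) sum(u * v)
rbf_kernel    <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

u <- c(1, 2)
v <- c(2, 0)

# The linear kernel has no gamma argument at all
print(linear_kernel(u, v))

# The radial kernel changes with gamma
print(rbf_kernel(u, v, gamma = 0.02))
print(rbf_kernel(u, v, gamma = 0.05))

# With gamma = 0 every pair of points gets kernel value 1,
# so the model cannot tell observations apart
print(rbf_kernel(u, v, gamma = 0))
```

The last line also explains the gamma = 0 rows in the radial tuning table earlier, where the error stays at about 0.50 regardless of cost.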

# gamma = 0, cost = 1
sonar_svm <- svm(V61 ~ ., data = sonarTrain, gamma = 0, cost = 1, kernel="linear")
sonar_predict <- predict(sonar_svm, x);
error <- sqrt(sum((y-sonar_predict)^2))/length(y)
#[1] 0.05814681

2b) In past homework, trees were used to classify the sonar data. Compare the best result using trees with the best result using SVM.

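# Caveat: the tree figure is a misclassification rate, whereas `error` below is
# sqrt(SSE)/n from the linear-kernel SVM regression fit, so the two numbers are
# not directly comparable.
print("In Homework 2 Problem 4 Sonar Test Error using trees was 0.2564103")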

## [1] "In Homework 2 Problem 4 Sonar Test Error using trees was 0.2564103"

paste("Smaller Sonar Test Error using SVM was",error)

## [1] "Smaller Sonar Test Error using SVM was 0.0581468133184799"

Question 3. The in-class example (svm1.r) used the glass data set. Use the Random Forest technique on the glass data. Compare the Random Forest results with the results obtained in class with SVM.

library(randomForest)

## Warning: package 'randomForest' was built under R version 3.1.3

## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.

library(mlbench)

## Warning: package 'mlbench' was built under R version 3.1.3

#install.packages("mlbench")
data(Glass, package = "mlbench")
str(Glass)

## 'data.frame':    214 obs. of  10 variables:
##  $ RI  : num  1.52 1.52 1.52 1.52 1.52 ...
##  $ Na  : num  13.6 13.9 13.5 13.2 13.3 ...
##  $ Mg  : num  4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
##  $ Al  : num  1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
##  $ Si  : num  71.8 72.7 73 72.6 73.1 ...
##  $ K   : num  0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
##  $ Ca  : num  8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
##  $ Ba  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Fe  : num  0 0 0 0 0 0.26 0 0 0 0.11 ...
##  $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...

Glass$Type

##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [106] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [141] 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5
## [176] 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
## [211] 7 7 7 7
## Levels: 1 2 3 5 6 7

index <- 1:nrow(Glass)
set.seed(pi)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]

x <- subset(trainset, select=-Type)
y <- trainset$Type
rf_Glass_Model <- randomForest(x,y)

xTest <- subset(testset, select=-Type)
yTest <- testset$Type
predictGlass <- predict(rf_Glass_Model, xTest)
error <- 1-sum(predictGlass == yTest)/length(yTest)
# Test-set misclassification rate of the random forest
paste("Random Forest, with seed = pi, error =",error)

## [1] "Random Forest, with seed = pi, error = 0.183098591549296"

print("From class example, with seed = pi, error = 0.3239437")

## [1] "From class example, with seed = pi, error = 0.3239437"

Question 4. Choose a new data set which we haven’t used in class yet (suggestion: choose one from http://archive.ics.uci.edu/ml/.) Use SVM to classify the data set. Try different kernels. Does changing the kernel make a difference? Which kernel resulted in the smallest error? Use another technique to classify the data set. Which resulted in the better model? (Make sure you describe the data set)

setwd("c:/Users/Andrew/SkyDrive/AGZ_Home/workspace_R/UCSC/MachinLearning/All_data")
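# Abalone data set (UCI Machine Learning Repository): sex plus seven physical
# measurements per abalone; the target Rings counts shell rings, a proxy for age
abalone <- read.csv(file="abalone.data",header=FALSE)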
head(abalone)

##   V1    V2    V3    V4     V5     V6     V7    V8 V9
## 1  M 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.150 15
## 2  M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.070  7
## 3  F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.210  9
## 4  M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.155 10
## 5  I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.055  7
## 6  I 0.425 0.300 0.095 0.3515 0.1410 0.0775 0.120  8

colnames(abalone)<- c("Sex","Length","Diameter","Height","Whole weight", "Shucked weight", "Viscera weight", "Shell weight", "Rings")
str(abalone)

## 'data.frame':    4177 obs. of  9 variables:
##  $ Sex           : Factor w/ 3 levels "F","I","M": 3 3 1 3 2 2 1 1 3 1 ...
##  $ Length        : num  0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
##  $ Diameter      : num  0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
##  $ Height        : num  0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
##  $ Whole weight  : num  0.514 0.226 0.677 0.516 0.205 ...
##  $ Shucked weight: num  0.2245 0.0995 0.2565 0.2155 0.0895 ...
##  $ Viscera weight: num  0.101 0.0485 0.1415 0.114 0.0395 ...
##  $ Shell weight  : num  0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
##  $ Rings         : int  15 7 9 10 7 8 20 16 9 19 ...

# Use SVM to classify the data set. Try different kernels. 
abaloneTrain<-abalone[1:2500,]
abaloneTest<-abalone[2501:4177,]

x <- subset(abaloneTest, select = -Rings)
y <- abaloneTest$Rings

abalone_svm <- svm(Rings ~ ., data = abaloneTrain)
summary(abalone_svm)

## 
## Call:
## svm(formula = Rings ~ ., data = abaloneTrain)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.1 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  2045
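The split above takes the first 2500 rows for training and the rest for testing, which is only safe if the file has no systematic row order. A random split avoids that assumption. The sketch below shows the index bookkeeping in base R; the seed and variable names are arbitrary choices for illustration.

```r
set.seed(42)

n_rows  <- 4177   # observations in the abalone data
n_train <- 2500   # same training size as the sequential split

# Draw training rows at random; the remaining rows form the test set
train_idx <- sample(n_rows, n_train)
test_idx  <- setdiff(seq_len(n_rows), train_idx)

print(length(train_idx))
print(length(test_idx))
print(length(intersect(train_idx, test_idx)))
```

With these indices, `abalone[train_idx, ]` and `abalone[test_idx, ]` would replace the sequential subsets.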

# cost: 1 gamma: 0.1
abalone_predict <- predict(abalone_svm, x);
sqrt(sum((y-abalone_predict)^2))/length(y)

## [1] 0.04347119

#[1] 0.04347144
## Tune `svm' for regression with kernel = "radial" (the default),
## using 10-fold cross-validation on the training set
# Grid: gamma = 0.0, 0.1, ..., 0.4; cost = 1.0, 1.5, ..., 3.0
abalone_svm_tuned <- tune(svm, Rings ~., data = abaloneTrain,  
                        ranges = list(gamma = seq(0,.4,.1), cost = seq(1,3,0.5)),
                        tunecontrol = tune.control(sampling = "cross")
)
summary(abalone_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##    0.1    3
## 
## - best performance: 5.281191 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1    0.0  1.0 12.414036  1.3636946
## 2    0.1  1.0  5.329160  0.5056530
## 3    0.2  1.0  5.323441  0.5037028
## 4    0.3  1.0  5.347942  0.5134076
## 5    0.4  1.0  5.408453  0.5359393
## 6    0.0  1.5 12.414036  1.3636946
## 7    0.1  1.5  5.308079  0.4842634
## 8    0.2  1.5  5.305380  0.4857427
## 9    0.3  1.5  5.345411  0.5042479
## 10   0.4  1.5  5.400286  0.5293523
## 11   0.0  2.0 12.414036  1.3636946
## 12   0.1  2.0  5.293217  0.4800094
## 13   0.2  2.0  5.308880  0.4833429
## 14   0.3  2.0  5.361479  0.5074635
## 15   0.4  2.0  5.403508  0.5277766
## 16   0.0  2.5 12.414036  1.3636946
## 17   0.1  2.5  5.284263  0.4767232
## 18   0.2  2.5  5.312076  0.4807368
## 19   0.3  2.5  5.368984  0.5045782
## 20   0.4  2.5  5.420778  0.5335321
## 21   0.0  3.0 12.414036  1.3636946
## 22   0.1  3.0  5.281191  0.4707592
## 23   0.2  3.0  5.319004  0.4801740
## 24   0.3  3.0  5.374947  0.5039710
## 25   0.4  3.0  5.440579  0.5386105

abalone_svm_tuned$best.parameters

##    gamma cost
## 22   0.1    3

# Refit with gamma = 0.1, cost = 2 (the tuning run above selected gamma = 0.1, cost = 3)
abalone_svm <- svm(Rings ~ ., data = abaloneTrain, gamma = 0.1, cost = 2)
abalone_predict <- predict(abalone_svm, x);
rb_error <- sqrt(sum((y-abalone_predict)^2))/length(y)
rb_error

## [1] 0.04341996

# Change kernel to "linear"
abalone_svm <- svm(Rings ~ ., data = abaloneTrain, kernel="linear")
summary(abalone_svm)

## 
## Call:
## svm(formula = Rings ~ ., data = abaloneTrain, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.1 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  2082

# cost: 1 gamma: 0.1
abalone_predict <- predict(abalone_svm, x);
sqrt(sum((y-abalone_predict)^2))/length(y)

## [1] 0.0471688

#[1]0.0471688

# Change kernel to "polynomial"
abalone_svm <- svm(Rings ~ ., data = abaloneTrain, kernel="polynomial")
summary(abalone_svm)

## 
## Call:
## svm(formula = Rings ~ ., data = abaloneTrain, kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  3 
##       gamma:  0.1 
##      coef.0:  0 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  2083

# cost: 1 gamma: 0.1
abalone_predict <- predict(abalone_svm, x);
sqrt(sum((y-abalone_predict)^2))/length(y)

## [1] 0.04901388

#[1] 0.04901388

# Does changing the kernel make a difference? 
print("Changing kernel has not made much difference")

## [1] "Changing kernel has not made much difference"

# Which kernel resulted in the smallest error? 
print("Radial basis kernel has the smallest error")

## [1] "Radial basis kernel has the smallest error"

# Use another technique to classify the data set. 
library(randomForest)
xTrain <- subset(abaloneTrain, select = -Rings)
yTrain <- abaloneTrain$Rings

rf_Abalone_Model <- randomForest(xTrain,yTrain)

predictAbalone <- predict(rf_Abalone_Model, x)
sqrt(sum((y-predictAbalone)^2))/length(y)

## [1] 0.04631641

#[1] 0.04631641

# Which resulted in the better model? 
print("SVM with Radial basis kernel is better model")

## [1] "SVM with Radial basis kernel is better model"

Question 5. Use SVM with kernel = “linear” to create regression predictions on the data set created using these lines of code:

x <- seq(0.1, 5, by = 0.05) # the observed feature

y <- log(x) + rnorm(x, sd = 0.2) # the target for the observed feature

Next try various kernels and added features with SVM. Can you improve the model by adding an extra feature which might be a function of the first feature? Compare both lm.ridge and svm. Which method produced a better model? (don’t forget to tune your models)

x <- seq(0.1, 5, by = 0.05) # the observed feature
y <- log(x) + rnorm(x, sd = 0.2) # the target for the observed feature
dataset <- as.data.frame(cbind(x,y))
str(dataset)

## 'data.frame':    99 obs. of  2 variables:
##  $ x: num  0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
##  $ y: num  -2.3 -2.29 -1.81 -1.44 -1.11 ...
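Since the target is generated as log(x) plus noise, a model that is linear in x cannot follow the curve, but one that also sees log(x) as a feature can. The sketch below illustrates the effect of the added feature with ordinary least squares (`lm`) as a simple stand-in for a linear-kernel SVM; the seed, the `log_x` column, and the `rmse` helper are assumptions made for the example.

```r
set.seed(123)

# Same data-generating recipe as in the assignment
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(length(x), sd = 0.2)
dataset <- data.frame(x = x, y = y)

# Added feature: a function of the first feature
dataset$log_x <- log(dataset$x)

fit_lin <- lm(y ~ x, data = dataset)          # original feature only
fit_log <- lm(y ~ x + log_x, data = dataset)  # with the extra feature

# In-sample root-mean-square error of a fitted model
rmse <- function(fit) sqrt(mean(residuals(fit)^2))

print(rmse(fit_lin))
print(rmse(fit_log))
```

The same extra column could be passed to `svm(y ~ ., data = dataset, kernel = "linear")`, letting the linear kernel work in the transformed feature space.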

dataset_svm <- svm(y ~ ., data = dataset, kernel = "linear")
summary(dataset_svm)

## 
## Call:
## svm(formula = y ~ ., data = dataset, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  1 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  80

#gamma = 1 cost = 1
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.04381142

#[1] 0.04381142

## Tune `svm' for regression with kernel = "linear",
## using 10-fold cross-validation
# Grid: gamma = 0.0, 0.5, ..., 2.0; cost = 1.0, 1.5, ..., 3.5
# (the linear kernel does not use gamma, so only cost changes the fit)
dataset_svm_tuned <- tune(svm, y~., data = dataset, kernel="linear", 
                        ranges = list(gamma = seq(0,2,.5), cost = seq(1,3.5,0.5)),
                        tunecontrol = tune.control(sampling = "cross")
)
summary(dataset_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##      0    3
## 
## - best performance: 0.1908039 
## 
## - Detailed performance results:
##    gamma cost     error dispersion
## 1    0.0  1.0 0.1912521  0.1463126
## 2    0.5  1.0 0.1912521  0.1463126
## 3    1.0  1.0 0.1912521  0.1463126
## 4    1.5  1.0 0.1912521  0.1463126
## 5    2.0  1.0 0.1912521  0.1463126
## 6    0.0  1.5 0.1908978  0.1460247
## 7    0.5  1.5 0.1908978  0.1460247
## 8    1.0  1.5 0.1908978  0.1460247
## 9    1.5  1.5 0.1908978  0.1460247
## 10   2.0  1.5 0.1908978  0.1460247
## 11   0.0  2.0 0.1913567  0.1456044
## 12   0.5  2.0 0.1913567  0.1456044
## 13   1.0  2.0 0.1913567  0.1456044
## 14   1.5  2.0 0.1913567  0.1456044
## 15   2.0  2.0 0.1913567  0.1456044
## 16   0.0  2.5 0.1912236  0.1456750
## 17   0.5  2.5 0.1912236  0.1456750
## 18   1.0  2.5 0.1912236  0.1456750
## 19   1.5  2.5 0.1912236  0.1456750
## 20   2.0  2.5 0.1912236  0.1456750
## 21   0.0  3.0 0.1908039  0.1450031
## 22   0.5  3.0 0.1908039  0.1450031
## 23   1.0  3.0 0.1908039  0.1450031
## 24   1.5  3.0 0.1908039  0.1450031
## 25   2.0  3.0 0.1908039  0.1450031
## 26   0.0  3.5 0.1908219  0.1450436
## 27   0.5  3.5 0.1908219  0.1450436
## 28   1.0  3.5 0.1908219  0.1450436
## 29   1.5  3.5 0.1908219  0.1450436
## 30   2.0  3.5 0.1908219  0.1450436

plot(dataset_svm_tuned)
dataset_svm_tuned$best.parameters

##    gamma cost
## 21     0    3

# Refit with gamma = 0, cost = 2.5 (the tuning run above selected cost = 3)
dataset_svm <- svm(y ~ ., data = dataset, kernel = "linear", gamma=0, cost=2.5)
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.04361607

#[1]  0.04361607

#5) Next try various kernels and added features with SVM. 
# Kernel = Radial Basis (default)
dataset_svm <- svm(y ~ ., data = dataset)
summary(dataset_svm)

## 
## Call:
## svm(formula = y ~ ., data = dataset)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  1 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  73

#gamma = 1 cost = 1
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.0282729

#[1] 0.0282729
## tune `svm' for regression with kernel = radial basis,
## using 10-fold cross validation
# gamma = 0.0 0.5 1.0 1.5 2.0; cost = 1.0 1.5 2.0 2.5 3.0 3.5
dataset_svm_tuned <- tune(svm, y~., data = dataset, 
                          ranges = list(gamma = seq(0,2,.5), cost = seq(1,3.5,0.5)),
                          tunecontrol = tune.control(sampling = "cross")
)
dataset_svm_tuned$best.parameters

##    gamma cost
## 25     2    3

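# Refit with gamma = 1.5, cost = 3.5. The tuning run above selected gamma = 2, cost = 3;
# tune() draws random CV folds, so the selected values can change between runs.
dataset_svm <- svm(y ~ ., data = dataset, gamma=1.5, cost=3.5)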
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.02374744

#[1] 0.02374744
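
The refits in this section retype gamma and cost by hand, and the typed values do not always match the best parameters that tune() reported. A minimal sketch that reuses the tuned result directly, assuming e1071 is installed; the seed, the reduced grid (gamma = 0 is left out) and the names x_demo, demo, tuned, best_svm and best_pred are illustrative assumptions:

```r
library(e1071)

set.seed(1)  # assumed seed so the cross-validation folds are repeatable
x_demo <- seq(0.1, 5, by = 0.05)
demo <- data.frame(x = x_demo, y = log(x_demo) + rnorm(x_demo, sd = 0.2))

tuned <- tune(svm, y ~ ., data = demo,
              ranges = list(gamma = seq(0.5, 2, 0.5), cost = seq(1, 3.5, 0.5)),
              tunecontrol = tune.control(sampling = "cross"))

# best.model is the SVM refit on the full data with the winning parameters
best_svm  <- tuned$best.model
best_pred <- predict(best_svm, demo)

# Same in-sample metric as the rest of the report
sqrt(sum((demo$y - best_pred)^2)) / length(demo$y)
```

Taking the model from the tuning object keeps the reported parameters and the fitted model in step even when the folds, and therefore the winner, change between runs.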

dataset_svm <- svm(y ~ ., data = dataset, kernel="polynomial")
summary(dataset_svm)

## 
## Call:
## svm(formula = y ~ ., data = dataset, kernel = "polynomial")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  polynomial 
##        cost:  1 
##      degree:  3 
##       gamma:  1 
##      coef.0:  0 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  84

#gamma = 1 cost = 1
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.04557566

#[1]  0.04557566
## tune `svm' for regression with kernel = "polynomial",
## using 10-fold cross validation
# gamma = 0.0 0.5 1.0 1.5 2.0; cost = 1.0 1.5 2.0 2.5 3.0 3.5
dataset_svm_tuned <- tune(svm, y~., data = dataset, kernel="polynomial",
                          ranges = list(gamma = seq(0,2,.5), cost = seq(1,3.5,0.5)),
                          tunecontrol = tune.control(sampling = "cross")
)
dataset_svm_tuned$best.parameters

##   gamma cost
## 2   0.5    1

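# Refit with gamma = 0.5, cost = 1.5. The tuning run above selected gamma = 0.5, cost = 1;
# tune() draws random CV folds, so the selected values can change between runs.
dataset_svm <- svm(y ~ ., data = dataset, kernel="polynomial", gamma=0.5, cost=1.5)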
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.04518411

#[1] 0.04518411

dataset_svm <- svm(y ~ ., data = dataset, kernel = "sigmoid")
summary(dataset_svm)

## 
## Call:
## svm(formula = y ~ ., data = dataset, kernel = "sigmoid")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  sigmoid 
##        cost:  1 
##       gamma:  1 
##      coef.0:  0 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  99

#gamma = 1 cost = 1
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.3460026

#[1] 0.3460026
## tune `svm' for regression with kernel = "sigmoid",
## using 10-fold cross validation
# gamma = 0.0 0.5 1.0 1.5 2.0; cost = 1.0 1.5 2.0 2.5 3.0 3.5
dataset_svm_tuned <- tune(svm, y~., data = dataset, kernel="sigmoid",
                          ranges = list(gamma = seq(0,2,.5), cost = seq(1,3.5,0.5)),
                          tunecontrol = tune.control(sampling = "cross")
)
dataset_svm_tuned$best.parameters

##   gamma cost
## 1     0    1

dataset_svm <- svm(y ~ ., data = dataset, kernel="sigmoid", gamma=0, cost=1)
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.0946527

#[1] 0.0946527

#5) Can you improve the model by adding an extra feature which might be a 
# function of the first feature? Compare both lm.ridge and svm. 
xlog<-log(x)
dataset <- as.data.frame(cbind(x,xlog,y))
str(dataset)

## 'data.frame':    99 obs. of  3 variables:
##  $ x   : num  0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 ...
##  $ xlog: num  -2.3 -1.9 -1.61 -1.39 -1.2 ...
##  $ y   : num  -2.3 -2.29 -1.81 -1.44 -1.11 ...

x <- dataset[,1:2]
dataset_svm <- svm(y ~ ., data = dataset, kernel = "linear")
summary(dataset_svm)

## 
## Call:
## svm(formula = y ~ ., data = dataset, kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  linear 
##        cost:  1 
##       gamma:  0.5 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  70

#gamma = 0.5 cost = 1
dataset_predict <- predict(dataset_svm, x);
sqrt(sum((y-dataset_predict)^2))/length(y)

## [1] 0.02213159

#[1] 0.02213159
## tune `svm' for regression with kernel = "linear",
## using 10-fold cross validation
# gamma = 0.0 0.5 1.0 1.5 2.0; cost = 1.0 1.5 2.0 2.5 3.0 3.5
dataset_svm_tuned <- tune(svm, y~., data = dataset, kernel="linear", 
                          ranges = list(gamma = seq(0,2,.5), cost = seq(1,3.5,0.5)),
                          tunecontrol = tune.control(sampling = "cross")
)
summary(dataset_svm_tuned)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  gamma cost
##      0  1.5
## 
## - best performance: 0.05167865 
## 
## - Detailed performance results:
##    gamma cost      error dispersion
## 1    0.0  1.0 0.05197314 0.01480749
## 2    0.5  1.0 0.05197314 0.01480749
## 3    1.0  1.0 0.05197314 0.01480749
## 4    1.5  1.0 0.05197314 0.01480749
## 5    2.0  1.0 0.05197314 0.01480749
## 6    0.0  1.5 0.05167865 0.01508452
## 7    0.5  1.5 0.05167865 0.01508452
## 8    1.0  1.5 0.05167865 0.01508452
## 9    1.5  1.5 0.05167865 0.01508452
## 10   2.0  1.5 0.05167865 0.01508452
## 11   0.0  2.0 0.05170011 0.01509153
## 12   0.5  2.0 0.05170011 0.01509153
## 13   1.0  2.0 0.05170011 0.01509153
## 14   1.5  2.0 0.05170011 0.01509153
## 15   2.0  2.0 0.05170011 0.01509153
## 16   0.0  2.5 0.05194368 0.01518225
## 17   0.5  2.5 0.05194368 0.01518225
## 18   1.0  2.5 0.05194368 0.01518225
## 19   1.5  2.5 0.05194368 0.01518225
## 20   2.0  2.5 0.05194368 0.01518225
## 21   0.0  3.0 0.05195528 0.01515975
## 22   0.5  3.0 0.05195528 0.01515975
## 23   1.0  3.0 0.05195528 0.01515975
## 24   1.5  3.0 0.05195528 0.01515975
## 25   2.0  3.0 0.05195528 0.01515975
## 26   0.0  3.5 0.05198968 0.01517651
## 27   0.5  3.5 0.05198968 0.01517651
## 28   1.0  3.5 0.05198968 0.01517651
## 29   1.5  3.5 0.05198968 0.01517651
## 30   2.0  3.5 0.05198968 0.01517651

dataset_svm_tuned$best.parameters

##   gamma cost
## 6     0  1.5

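# Refit with gamma = 0, cost = 2.5. The tuning run above selected cost = 1.5;
# tune() draws random CV folds, so the selected cost can change between runs.
dataset_svm <- svm(y ~ ., data = dataset, kernel = "linear", gamma=0, cost=2.5)
dataset_predict <- predict(dataset_svm, x);
svm_error<-sqrt(sum((y-dataset_predict)^2))/length(y)
# svm_error is printed further down, next to the ridge regression error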

#install.packages("glmnet")
library(glmnet)

## Warning: package 'glmnet' was built under R version 3.1.3

## Loading required package: Matrix

## Warning: package 'Matrix' was built under R version 3.1.2

## Loaded glmnet 1.9-8

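# NOTE: x was redefined above as dataset[,1:2] (columns x and xlog), so this
# matrix holds x, xlog, xlog again, and y. Passing it to glmnet as the predictor
# matrix gives the ridge model the target y as an input, so the ridge error
# below is not comparable with the SVM error.
dataset <- cbind(x,xlog,y)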

grid=10^seq(10,-2,length=100)
cv.out=cv.glmnet(as.matrix(dataset),y,alpha=0,lambda=grid)
cv.out$lambda.min

## [1] 0.01

#[1] 0.01
ridgeMod =glmnet(as.matrix(dataset),y,alpha=0,lambda=0.01)
ridgePredict <- predict(ridgeMod, newx = as.matrix(dataset))
error <-sqrt( sum((y-ridgePredict)^2))/length(y)
paste("Ridge Regression error",error)

## [1] "Ridge Regression error 0.00319279474519351"

paste("SVM error",svm_error)

## [1] "SVM error 0.0221504098233445"

print("lm.ridge produced a better model than SVM")

## [1] "lm.ridge produced a better model than SVM"
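
The ridge fit above uses a predictor matrix built with cbind(x, xlog, y), which contains the target y. A minimal sketch of a comparison in which both models see only x and log(x), assuming e1071 and glmnet are installed; the seed, the lambda grid bounds, the cost value (taken from the tuning table above) and all names are illustrative, and no output is shown because this sketch was not run for the report:

```r
library(e1071)
library(glmnet)

set.seed(1)  # assumed seed for repeatable noise and CV folds
x_obs <- seq(0.1, 5, by = 0.05)
y_obs <- log(x_obs) + rnorm(x_obs, sd = 0.2)
features <- cbind(x = x_obs, xlog = log(x_obs))   # predictors only, no y

# Same in-sample metric as the rest of the report
err <- function(actual, predicted) sqrt(sum((actual - predicted)^2)) / length(actual)

# Ridge regression: alpha = 0, lambda chosen by cross-validation
cv_fit <- cv.glmnet(features, y_obs, alpha = 0, lambda = 10^seq(2, -3, length = 50))
ridge_pred <- predict(cv_fit, newx = features, s = "lambda.min")

# SVM regression with a linear kernel on the same two features
svm_fit <- svm(features, y_obs, kernel = "linear", cost = 1.5)
svm_pred <- predict(svm_fit, features)

c(ridge = err(y_obs, ridge_pred), svm = err(y_obs, svm_pred))
```

Both numbers are in-sample errors, so they indicate fit rather than predictive accuracy; a held-out split would be needed for the latter.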

ML_Homework6: Topics on Support Vector

Andrew Zhang

Tuesday, April 07, 2015

Question 2: Classify the sonar data set.

2a) Use a support vector machine to classify the sonar data set. First tune an SVM employing the radial basis function kernel (the default). Next tune an SVM employing a linear kernel. Compare the results.

2b) In past homework, trees were used to classify the sonar data. Compare the best result using trees with the best result using SVM.

Question 3. The in-class example (svm1.r) used the glass data set. Use the Random Forest technique on the glass data. Compare the Random Forest results with the results obtained in class with SVM.

Question 5. Use SVM with kernel = “linear” to create regression predictions on the data set created using these lines of code:

x <- seq(0.1, 5, by = 0.05) # the observed feature

y <- log(x) + rnorm(x, sd = 0.2) # the target for the observed feature

Next try various kernels and added features with SVM. Can you improve the model by adding an extra feature which might be a function of the first feature? Compare both lm.ridge and svm. Which method produced a better model? (don’t forget to tune your models)