Support vector machines (SVM), large-scale data classification, and machine learning software design. The package e1071 offers an interface to the award-winning C++ implementation by Chih-Chung Chang and Chih-Jen Lin (林智仁), libsvm (2011) (current version: 2.6).
LIBSVM: a simple and easy-to-use support vector machine tool for classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It includes a GUI for both classification and regression. Version 1.0 was released in April 2000.
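The formulations listed above can be selected from R through the `type` argument of `svm()` in e1071. The following is a minimal sketch on the built-in iris data; the object names (`fit_c`, `fit_nu`, and so on) and the choice of predictors in the regression formulas are illustrative assumptions, and all other settings are left at the package defaults.

```r
library(e1071)

x <- subset(iris, select = -Species)  # the four numeric features
y <- iris$Species                     # the class labels

# Classification: C-SVC (the default for a factor response) and nu-SVC
fit_c  <- svm(x, y, type = "C-classification")
fit_nu <- svm(x, y, type = "nu-classification")

# Regression: epsilon-SVR and nu-SVR on a numeric response
# (predictors chosen only for illustration)
fit_eps   <- svm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                 data = iris, type = "eps-regression")
fit_nureg <- svm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                 data = iris, type = "nu-regression")

# Distribution estimation: one-class SVM, fitted without labels
fit_one <- svm(x, type = "one-classification")
```

Calling `summary()` on any of these objects prints the SVM type that was actually fitted.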
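Applying an Intelligent SVM Classifier to Improve Article Publishing Efficiency on an Enterprise Knowledge-Sharing Platform
Classifying documents takes time, because each one must be read to understand its subject, and it sometimes also takes specialized knowledge to understand the content. Document classification is therefore time-consuming work that only particular experts can complete. Now that information technology is widespread, document storage platforms and readers' habits have shifted from printed books to digital material, so it is increasingly important to use the automation that computing offers to solve the classification problem, saving classification time and reducing the difficulty of manual classification.
This study applies an SVM classifier to the document publishing workflow of an enterprise knowledge-sharing platform and evaluates classification performance using the platform's manually classified documents as test data. The test documents are technology-industry news articles from an industry intelligence website. The experiments show that the SVM classifier reaches a classification accuracy of 86% on such documents, and also reaches 86% accuracy on the multi-class classification problem, so the SVM classifier is well suited to classifying technology-industry news documents of this kind.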
library("e1071")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# Split the iris data into x (all of the features) and y (only the class labels).
# iris$Species is referenced directly, so attach(iris) is not needed.
x <- subset(iris, select = -Species)
y <- iris$Species
svm_model <- svm(Species ~ ., data=iris)
summary(svm_model)
##
## Call:
## svm(formula = Species ~ ., data = iris)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.25
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
svm_model1 <- svm(x,y)
summary(svm_model1)
##
## Call:
## svm.default(x = x, y = y)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.25
##
## Number of Support Vectors: 51
##
## ( 8 22 21 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
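Both summaries report the same parameters because the formula interface and the x/y interface fit the same default model. A short sketch of where the reported `gamma: 0.25` comes from: when `gamma` is not supplied, e1071 sets it to one over the number of feature columns, which is 1/4 for iris. The object name `fit_default` is an illustrative assumption.

```r
library(e1071)

x <- subset(iris, select = -Species)  # four feature columns
y <- iris$Species

fit_default <- svm(x, y)  # no gamma or cost supplied

# Default gamma is 1 / (number of feature columns)
default_gamma <- 1 / ncol(x)
fit_default$gamma
fit_default$cost
```

With more or fewer features the default `gamma` changes accordingly, which is one reason to tune it explicitly.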
pred <- predict(svm_model1,x)
system.time(pred <- predict(svm_model1,x))
##    user  system elapsed
##       0       0       0
table(pred,y)
##             y
## pred         setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         2
##   virginica       0          2        48
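The confusion matrix above can be reduced to a single accuracy figure. The sketch below rebuilds the printed table by hand so that it runs without any package; the names `conf_mat`, `accuracy`, and `per_class` are illustrative assumptions. With a fitted model, `table(pred, y)` can be used in place of the hand-built matrix.

```r
# Confusion matrix copied from the table(pred, y) output above
classes <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(c(50,  0,  0,
                      0, 48,  2,
                      0,  2, 48),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(pred = classes, y = classes))

# Overall accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions over the true count of each class
per_class <- diag(conf_mat) / colSums(conf_mat)

accuracy   # 146 / 150
per_class
```

Note that these figures describe the fit on the same data the model was trained on.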
svm_tune <- tune(svm, train.x = x, train.y = y,
                 kernel = "radial",
                 ranges = list(cost = 10^(-1:2), gamma = c(.5, 1, 2)))
print(svm_tune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 10 0.5
##
## - best performance: 0.04
## - best performance: 0.05333
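The values printed by `print(svm_tune)` can also be read from the tuning object directly. The sketch below is one way to do that; the seed value is an arbitrary assumption added because the cross-validation folds are drawn at random, so the best parameters and error can differ from run to run and from the output shown above.

```r
library(e1071)

set.seed(1)  # assumption: any fixed seed, only to make the folds repeatable

x <- subset(iris, select = -Species)
y <- iris$Species

svm_tune <- tune(svm, train.x = x, train.y = y,
                 kernel = "radial",
                 ranges = list(cost = 10^(-1:2), gamma = c(.5, 1, 2)))

best_params <- svm_tune$best.parameters   # one-row data frame: cost, gamma
best_error  <- svm_tune$best.performance  # cross-validated error rate
best_model  <- svm_tune$best.model        # model refitted with best_params

best_params
best_error
```

Using `best_model`, or passing `best_params$cost` and `best_params$gamma` to `svm()`, avoids retyping the tuned values by hand.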
# After finding the best cost and gamma, fit the SVM model again and rerun the prediction.
# Note: the call below uses cost = 1, although the tuning output above reports cost = 10 as best.
svm_model_after_tune <- svm(Species ~ ., data=iris, kernel="radial", cost=1, gamma=0.5)
summary(svm_model_after_tune)
##
## Call:
## svm(formula = Species ~ ., data = iris, kernel = "radial", cost = 1,
## gamma = 0.5)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.5
##
## Number of Support Vectors: 59
##
## ( 11 23 25 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
pred <- predict(svm_model_after_tune,x)
system.time(predict(svm_model_after_tune,x))
##    user  system elapsed
##       0       0       0
table(pred,y)
##             y
## pred         setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         48         2
##   virginica       0          2        48
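Both confusion matrices above are computed on the data used to fit the models. As a hedged sketch of an alternative check, the same model settings can be evaluated on rows held out from training. The 100/50 split, the seed, and the names `train_idx`, `train`, `test`, and `fit` are assumptions made for illustration, not part of the original walkthrough.

```r
library(e1071)

set.seed(42)  # assumption: arbitrary seed so the split is repeatable

# Hypothetical split: 100 rows for training, the remaining 50 for testing
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Same settings as svm_model_after_tune, fitted on the training rows only
fit <- svm(Species ~ ., data = train, kernel = "radial", cost = 1, gamma = 0.5)

# Predict the held-out rows and compare with their true labels
test_pred <- predict(fit, newdata = test)
table(pred = test_pred, y = test$Species)
test_accuracy <- mean(test_pred == test$Species)
test_accuracy
```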