Perform an analysis of the dataset used in Homework #2 using the SVM algorithm. Compare the results with those from the previous homework. Base the discussion on the following articles:
https://www.hindawi.com/journals/complexity/2021/5550344/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137961/
Search for academic content (at least 3 articles) that compares the use of decision trees vs. SVMs in your current area of expertise. Which algorithm is recommended for more accurate results? Is it better suited to classification or regression scenarios? Do you agree with the recommendations? Why? Format: R file & essay. Due date: Sunday, April 24, 2022, end of day.
Based on the topics presented, bring a dataset of your choice and create a decision tree to solve a classification or regression problem and predict the outcome for a particular feature of the data. Switch variables to generate two decision trees and compare the results. Create a random forest for regression and analyze the results.
Description: Gas mileage, horsepower, and other information for 392 vehicles (the Auto data set from the ISLR package).
Format: A data frame with 392 observations on the following 9 variables:
- mpg: miles per gallon
- cylinders: number of cylinders, between 4 and 8
- displacement: engine displacement (cu. inches)
- horsepower: engine horsepower
- weight: vehicle weight (lbs.)
- acceleration: time to accelerate from 0 to 60 mph (sec.)
- year: model year (modulo 100)
- origin: origin of car (1. American, 2. European, 3. Japanese)
- name: vehicle name
The original data contained 408 observations, but 16 observations with missing values were removed.
Source: This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition.
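The analysis below relies on several R packages whose library() calls are not shown in the knitted output, so a setup chunk along the following lines is assumed:
# assumed setup chunk (library calls not shown in the original output)
library(ISLR)          # Carseats and Auto data sets
library(rpart)         # decision trees
library(rpart.plot)    # tree plots
library(randomForest)  # random forests (also called below via randomForest::)
library(e1071)         # svm()
library(dplyr)         # %>% and mutate()
library(skimr)         # skim() summaries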
# Get the list of data sets contained in package
x <- data(package = "ISLR")
x$results[, "Item"]
## [1] "Auto" "Caravan" "Carseats" "College" "Credit" "Default"
## [7] "Hitters" "Khan" "NCI60" "OJ" "Portfolio" "Smarket"
## [13] "Wage" "Weekly"
colnames(x)
## NULL
As you can see, the ISLR package provides data sets ranging from Auto to Weekly.
data(Carseats)
# Get the variable names
names(Carseats)
## [1] "Sales" "CompPrice" "Income" "Advertising" "Population"
## [6] "Price" "ShelveLoc" "Age" "Education" "Urban"
## [11] "US"
dim(Carseats)
## [1] 400 11
head(Carseats)
## Sales CompPrice Income Advertising Population Price ShelveLoc Age Education
## 1 9.50 138 73 11 276 120 Bad 42 17
## 2 11.22 111 48 16 260 83 Good 65 10
## 3 10.06 113 35 10 269 80 Medium 59 12
## 4 7.40 117 100 4 466 97 Medium 55 14
## 5 4.15 141 64 3 340 128 Bad 38 13
## 6 10.81 124 113 13 501 72 Bad 78 16
## Urban US
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes No
## 6 No Yes
As suggested by the assignment instructions, I will build a classification tree to analyze the Carseats data set. Carseats is a simulated data set containing sales of child car seats at 400 different stores, so it has 400 observations and 11 variables. I am interested in predicting Sales from the other variables in the data set. Since Sales is a continuous variable, we first need to recode it as a binary variable: the new variable, High, takes the value Yes if Sales exceeds 8 and No otherwise.
High = ifelse(Carseats$Sales <=8, "No", "Yes")
Carseats=data.frame(Carseats,High)
Carseats.H <- Carseats[,-1]
Carseats.H$High = as.factor(Carseats$High)
class(Carseats.H$High)
## [1] "factor"
set.seed(888)
thetrain = sample(1:nrow(Carseats.H), 200)
Carseats.thetrain=Carseats.H[thetrain,]
Carseats.thetest=Carseats.H[-thetrain,]
High.thetest=High[-thetrain]
My first step is to fit a classification tree on the training set to predict High using all variables except Sales (remember that High was derived from Sales).
The cp value (complexity parameter) is a stopping parameter. It speeds up the search for splits because splits that do not improve the fit by at least cp are pruned away before the search goes too far.
If the goal is to grow a deep tree, the default value of 0.01 may be too restrictive, which is why a smaller value of 0.008 is used below.
fit.thetree = rpart(High ~ ., data=Carseats.thetrain, method = "class", cp=0.008)
fit.thetree
## n= 200
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 200 87 No (0.56500000 0.43500000)
## 2) Price>=96.5 161 57 No (0.64596273 0.35403727)
## 4) ShelveLoc=Bad 42 3 No (0.92857143 0.07142857) *
## 5) ShelveLoc=Good,Medium 119 54 No (0.54621849 0.45378151)
## 10) Advertising< 8.5 65 19 No (0.70769231 0.29230769)
## 20) CompPrice< 144.5 51 9 No (0.82352941 0.17647059) *
## 21) CompPrice>=144.5 14 4 Yes (0.28571429 0.71428571) *
## 11) Advertising>=8.5 54 19 Yes (0.35185185 0.64814815)
## 22) ShelveLoc=Medium 36 18 No (0.50000000 0.50000000)
## 44) Education>=13.5 21 6 No (0.71428571 0.28571429) *
## 45) Education< 13.5 15 3 Yes (0.20000000 0.80000000) *
## 23) ShelveLoc=Good 18 1 Yes (0.05555556 0.94444444) *
## 3) Price< 96.5 39 9 Yes (0.23076923 0.76923077)
## 6) CompPrice< 99 8 3 No (0.62500000 0.37500000) *
## 7) CompPrice>=99 31 4 Yes (0.12903226 0.87096774) *
# Visualizing
rpart.plot(fit.thetree)
pred.thetree = predict(fit.thetree, Carseats.thetest, type = "class")
table(pred.thetree,High.thetest)
## High.thetest
## pred.thetree No Yes
## No 90 32
## Yes 33 45
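From the confusion matrix above, the test accuracy of the unpruned classification tree can be computed directly (a small check using only the counts shown):
# accuracy = correct predictions / total test observations
conf <- table(pred.thetree, High.thetest)
sum(diag(conf)) / sum(conf)   # (90 + 45) / 200 = 0.675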
#plotcp(fit.thetree)
printcp(fit.thetree)
##
## Classification tree:
## rpart(formula = High ~ ., data = Carseats.thetrain, method = "class",
## cp = 0.008)
##
## Variables actually used in tree construction:
## [1] Advertising CompPrice Education Price ShelveLoc
##
## Root node error: 87/200 = 0.435
##
## n= 200
##
## CP nsplit rel error xerror xstd
## 1 0.241379 0 1.00000 1.00000 0.080587
## 2 0.091954 1 0.75862 0.98851 0.080476
## 3 0.068966 3 0.57471 0.87356 0.078901
## 4 0.051724 4 0.50575 0.73563 0.075827
## 5 0.022989 6 0.40230 0.73563 0.075827
## 6 0.008000 7 0.37931 0.65517 0.073379
# cp value with the lowest cross-validated error (xerror)
fit.thetree$cptable[which.min(fit.thetree$cptable[,"xerror"]),"CP"]
## [1] 0.008
Next we prune the classification tree, choosing the value of cp (the complexity parameter) that leads to the lowest estimated test error.
Note that the optimal value of cp is the one with the lowest xerror in the previous output, which is the error estimated on the cross-validation data.
bestcp <-fit.thetree$cptable[which.min(fit.thetree$cptable[,"xerror"]),"CP"]
pruned.thetree <- prune(fit.thetree, cp = bestcp)
rpart.plot(pruned.thetree)
pred.prune = predict(pruned.thetree, Carseats.thetest, type="class")
table(pred.prune, High.thetest)
## High.thetest
## pred.prune No Yes
## No 90 32
## Yes 33 45
# drop the High variable (column 12) so Sales can be modeled directly
Carseats.S <- Carseats[,-12]
set.seed(999)
thetrain = sample(1:nrow(Carseats.S), 200)
Carseats.thetrain=Carseats.S[thetrain,]
Carseats.thetest=Carseats.S[-thetrain,]
Analysis of variance (ANOVA) consists of calculations that quantify the levels of variability within a regression model and form the basis for tests of significance. The basic regression idea, DATA = FIT + RESIDUAL, is written observation by observation as (y_i − ȳ) = (ŷ_i − ȳ) + (y_i − ŷ_i).
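As a quick numeric illustration of this decomposition (a hypothetical check using a simple lm() fit on the same training data, not part of the tree analysis itself):
# total SS = model SS + residual SS for a least-squares fit with an intercept
fit.lm <- lm(Sales ~ Price, data = Carseats.thetrain)
sst <- sum((Carseats.thetrain$Sales - mean(Carseats.thetrain$Sales))^2)
ssr <- sum((fitted(fit.lm) - mean(Carseats.thetrain$Sales))^2)
sse <- sum(residuals(fit.lm)^2)
all.equal(sst, ssr + sse)   # TRUE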
fit.thetree = rpart(Sales ~ ., data=Carseats.thetrain, method="anova", cp=0.008)
#summary(fit.thetree)
fit.thetree
## n= 200
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 200 1605.306000 7.289650
## 2) ShelveLoc=Bad,Medium 161 982.688100 6.655714
## 4) Price>=94.5 135 646.254000 6.136593
## 8) ShelveLoc=Bad 46 163.515600 4.679565
## 16) Population< 106 11 48.512490 3.319091 *
## 17) Population>=106 35 88.244510 5.107143
## 34) Age>=33.5 28 65.860070 4.787857 *
## 35) Age< 33.5 7 8.112371 6.384286 *
## 9) ShelveLoc=Medium 89 334.610500 6.889663
## 18) Price>=127 29 83.957080 5.622069
## 36) Advertising< 3.5 13 18.137710 4.636154 *
## 37) Advertising>=3.5 16 42.915940 6.423125 *
## 19) Price< 127 60 181.534500 7.502333
## 38) Age>=60.5 25 70.170620 6.565600
## 76) CompPrice< 118.5 9 10.483560 5.307778 *
## 77) CompPrice>=118.5 16 37.438540 7.273125 *
## 39) Age< 60.5 35 73.758030 8.171429
## 78) Advertising< 6 18 29.197650 7.171667 *
## 79) Advertising>=6 17 7.519200 9.230000 *
## 5) Price< 94.5 26 111.153100 9.351154
## 10) Advertising< 9 15 70.662090 8.597333 *
## 11) Advertising>=9 11 20.344090 10.379090 *
## 3) ShelveLoc=Good 39 290.814100 9.906667
## 6) Price>=135 9 15.959890 6.391111 *
## 7) Price< 135 30 130.252300 10.961330
## 14) Age>=62 7 12.926490 9.168571 *
## 15) Age< 62 23 87.980690 11.506960
## 30) Urban=No 9 22.282160 9.887778 *
## 31) Urban=Yes 14 26.934240 12.547860 *
rpart.plot(fit.thetree)
fit.thetree$variable.importance
## ShelveLoc Price Age Advertising CompPrice Income
## 483.59509 474.09259 139.48793 87.61880 78.87432 57.47487
## US Population Urban Education
## 55.24076 54.08655 38.76430 30.12734
pred.thetree = predict(fit.thetree, Carseats.thetest)
The mean squared error (MSE) measures how close a model's predictions are to the observed values. It takes the differences between predictions and observations (the "errors"), squares them (which removes negative signs and gives more weight to larger differences), and averages them: MSE = (1/n) Σ (y_i − ŷ_i)². The lower the MSE, the better the forecast.
# mean square error
mse <- mean((pred.thetree - Carseats.thetest$Sales)^2)
mse
## [1] 4.530078
# CP value
printcp(fit.thetree)
##
## Regression tree:
## rpart(formula = Sales ~ ., data = Carseats.thetrain, method = "anova",
## cp = 0.008)
##
## Variables actually used in tree construction:
## [1] Advertising Age CompPrice Population Price ShelveLoc
## [7] Urban
##
## Root node error: 1605.3/200 = 8.0265
##
## n= 200
##
## CP nsplit rel error xerror xstd
## 1 0.2066921 0 1.00000 1.00668 0.094663
## 2 0.1403352 1 0.79331 0.87310 0.081991
## 3 0.0922740 2 0.65297 0.69819 0.067697
## 4 0.0900774 3 0.56070 0.69214 0.067207
## 5 0.0430565 4 0.47062 0.57741 0.056675
## 6 0.0234260 5 0.42756 0.57185 0.061614
## 7 0.0230742 6 0.40414 0.59280 0.063329
## 8 0.0212139 7 0.38106 0.59789 0.063341
## 9 0.0166688 9 0.33864 0.60690 0.064391
## 10 0.0142673 10 0.32197 0.64061 0.063234
## 11 0.0138594 11 0.30770 0.62653 0.061521
## 12 0.0125502 12 0.29384 0.63228 0.061349
## 13 0.0088906 13 0.28129 0.64558 0.060469
## 14 0.0080000 14 0.27240 0.64923 0.060476
bestcp <- fit.thetree$cptable[which.min(fit.thetree$cptable[,"xerror"]),"CP"]
bestcp
## [1] 0.02342595
Pruning at the cp value with the lowest cross-validated error yields a much simpler tree that should generalize about as well as the full tree; a simpler model is usually better suited to a production environment even when its test error is similar (here the pruned regression tree's test MSE, computed below, is in fact slightly higher than the unpruned tree's). However, other factors can also influence decision tree modeling, such as building a tree on unbalanced classes. These factors were not accounted for in this demonstration, but it is important to examine them when formulating a live model.
pruned.thetree <- prune(fit.thetree, cp = bestcp)
# Visualize the pruned tree
rpart.plot(pruned.thetree)
# Checking the order of variable importance
pruned.thetree$variable.importance
## ShelveLoc Price Age Income CompPrice Education
## 479.932017 442.221990 35.353913 33.396181 29.246370 7.150235
## Advertising Population
## 3.220173 2.383412
Decision trees are easy to validate and interpret as predictive models, they are useful for quantitative analysis of business problems, they can be checked against the results of statistical tests, and they naturally support classification problems with several classes.
# Use the test data to evaluate performance of the pruned regression tree
pred.prune = predict(pruned.thetree, Carseats.thetest)
# Calculate the MSE for the pruned tree
mse <- mean((pred.prune - Carseats.thetest$Sales)^2)
mse
## [1] 4.897713
Random forest is an ensemble learning technique based on the bagging algorithm. It builds many trees on bootstrap samples of the data and combines their outputs, which reduces the overfitting and variance of a single decision tree and therefore improves accuracy.
Random forests can be used to solve both classification and regression problems (a classification sketch follows after this list).
Random forests work well with both categorical and continuous variables.
Some random forest implementations can handle missing values (the randomForest package in R requires them to be imputed or omitted, e.g. with rfImpute or na.omit).
No feature scaling is required: random forests use rule-based splits rather than distance calculations, so standardization and normalization are unnecessary.
# random forest using all predictors
# columns 1:11 are all the original Carseats variables (High was already dropped)
modFit.rf <- randomForest::randomForest(Carseats.thetrain$Sales ~ ., data = Carseats.thetrain[,c(1:11)])
modFit.rf
##
## Call:
## randomForest(formula = Carseats.thetrain$Sales ~ ., data = Carseats.thetrain[, c(1:11)])
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 3
##
## Mean of squared residuals: 3.232277
## % Var explained: 59.73
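The forest's variable importance can also be inspected, analogous to the variable.importance vectors examined for the rpart trees (a sketch; this output was not part of the original run):
# node-impurity (IncNodePurity) importance for the fitted forest
randomForest::importance(modFit.rf)
randomForest::varImpPlot(modFit.rf)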
forest_pred <- predict(modFit.rf, Carseats.thetest)
table(forest_pred)
## forest_pred
## 4.49766066666666         4.553272 4.74268461904762 4.84429633333333
##                1                1                1                1
## ... (remaining output omitted: the forest's predictions are continuous, so nearly every one of the 200 predicted values is unique and each count is 1)
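Since tabulating continuous predictions is not very informative, a more direct comparison with the single-tree results is the forest's test-set MSE (a sketch; this value was not computed in the original run, so no number is reported):
# test-set MSE for the random forest, comparable to the tree MSEs of 4.53 and 4.90 above
forest_mse <- mean((forest_pred - Carseats.thetest$Sales)^2)
forest_mse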
# install.packages("skimr")
skim(Carseats.thetrain)
Name | Carseats.thetrain |
Number of rows | 200 |
Number of columns | 11 |
_______________________ | |
Column type frequency: | |
factor | 3 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
ShelveLoc | 0 | 1 | FALSE | 3 | Med: 107, Bad: 54, Goo: 39 |
Urban | 0 | 1 | FALSE | 2 | Yes: 142, No: 58 |
US | 0 | 1 | FALSE | 2 | Yes: 125, No: 75 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Sales | 0 | 1 | 7.29 | 2.84 | 0.37 | 5.20 | 7.38 | 9.31 | 15.63 | ▁▇▇▃▁ |
CompPrice | 0 | 1 | 125.44 | 15.43 | 77.00 | 115.00 | 125.00 | 135.00 | 162.00 | ▁▃▇▆▂ |
Income | 0 | 1 | 66.49 | 27.17 | 22.00 | 42.00 | 67.00 | 87.00 | 120.00 | ▇▆▇▆▅ |
Advertising | 0 | 1 | 6.12 | 6.30 | 0.00 | 0.00 | 5.00 | 11.00 | 26.00 | ▇▂▃▁▁ |
Population | 0 | 1 | 261.65 | 151.17 | 12.00 | 128.25 | 271.00 | 398.50 | 508.00 | ▇▆▅▇▆ |
Price | 0 | 1 | 116.46 | 24.04 | 24.00 | 100.00 | 118.00 | 131.00 | 191.00 | ▁▂▇▅▁ |
Age | 0 | 1 | 53.95 | 16.58 | 25.00 | 39.75 | 55.00 | 66.00 | 80.00 | ▇▆▇▇▇ |
Education | 0 | 1 | 13.81 | 2.54 | 10.00 | 12.00 | 14.00 | 16.00 | 18.00 | ▇▇▃▇▆ |
Finally, following the assignment, I fit a support vector machine on the same training data, this time predicting Income from the remaining variables with a linear kernel and cost 10.
set.seed(69)
my_svm <- svm(Income ~ ., data = Carseats.thetrain, kernel = "linear", cost = 10, scale = TRUE)
summary(my_svm)
##
## Call:
## svm(formula = Income ~ ., data = Carseats.thetrain, kernel = "linear",
## cost = 10, scale = TRUE)
##
##
## Parameters:
## SVM-Type: eps-regression
## SVM-Kernel: linear
## cost: 10
## gamma: 0.08333333
## epsilon: 0.1
##
##
## Number of Support Vectors: 185
set.seed(888)
pred <- predict(my_svm, newdata=Carseats.thetest)
plot(my_svm, Carseats.thetest)  # note: plot.svm is designed for classification fits, so it may not produce a useful plot for this eps-regression model
Carseats.thetest$pred <- predict(my_svm, newdata=Carseats.thetest)
the_rmse <- Carseats.thetrain %>%
mutate(residual = Income - pred) %>%
summarize(rmse = sqrt(mean(residual^2)))
print(the_rmse)
## rmse
## 1 32.36961
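Note that the pipeline above takes Income from the training frame while pred holds predictions for the test set (the two just happen to have the same length), so the reported RMSE mixes the two splits. A corrected test-set RMSE would look like this (sketch only; the value is not reported here):
# RMSE of the SVM on the held-out test data
the_rmse_test <- Carseats.thetest %>%
  mutate(residual = Income - pred) %>%
  summarize(rmse = sqrt(mean(residual^2)))
print(the_rmse_test)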
summary: SVMs use the kernel trick to solve non-linear problems, whereas decision trees partition the input space into hyper-rectangles. Decision trees are better suited to categorical data and deal with collinearity better than SVMs.
summary: The biggest difference between the two algorithms is that SVM uses the kernel trick to turn a linearly non-separable problem into a linearly separable one (unless of course we use the linear kernel), while decision trees (and forests and boosted trees based on them, both to a lesser extent due to the nature of the ensemble algorithms) split the input space into hyper-rectangles according to the target. Usually one will work better than the other in a given situation, but in high-dimensional spaces it is hard to tell which unless something about the data suggests one over the other. Trying both is the preferred method, though the winner is hardly obvious in most cases. Most of the time, people use a validation set not only to optimize hyperparameters but also to choose between algorithms. It is not perfect, but it often works. Also, categorical inputs cannot be fed to an SVM directly; SVMs operate only on numeric data, so the categories must first be encoded numerically.
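For example, the factor columns in Carseats can be expanded into numeric dummy columns before being passed to an SVM (a small sketch with base R's model.matrix; when the formula interface of svm() is used, a similar expansion is typically done internally):
# expand the factors (ShelveLoc, Urban, US) into 0/1 indicator columns
X <- model.matrix(Income ~ . - 1, data = Carseats.thetrain)
head(colnames(X))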
summary: In the end, if you have the computational resources to do so, try both. See which class of models performs best on your holdout/validation/test dataset(s). If your data is highly structured, gradient boosting methods will likely perform very well, and oftentimes you can train a high-performing booster in less time than it takes to fit an SVM. If your data includes categorical features, be aware that the performance of tree-based methods often suffers when these features are one-hot encoded (see here for a great discussion of why this occurs). So either use another encoding strategy such as target encoding, or use a library with native handling of categorical features, such as H2O (see here).
If the data are mostly categorical, I would first prefer a decision tree. Decision trees have many advantages: they are highly interpretable and they automatically handle the multicollinearity problem. If the data are very sparse, I would prefer an SVM. Some people ask why an SVM is not as good as a decision tree on the same data; possibilities include an inappropriate kernel (e.g. a linear kernel for a non-linear problem) or a poor choice of kernel and regularisation hyper-parameters. Good model selection (choice of kernel and hyper-parameter tuning) is the key to getting good performance from SVMs; they can only be expected to give good results when used correctly. SVMs can also take a long time to train, especially when the choice of kernel and, in particular, the regularisation parameter means that almost all of the data end up as support vectors (the sparsity of SVMs is a handy by-product, nothing more). Lastly, there is no a priori superiority of any classifier over the others, so the best classifier for a particular task is itself task-dependent. However, there is compelling theory for the SVM suggesting it is likely to be a better choice than many other approaches for many problems.
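To illustrate the model-selection point, the kernel and regularisation hyper-parameters can be searched by cross-validation with e1071::tune (a hypothetical sketch on the same training data; not run here, so no results are shown):
# cross-validated grid search over cost and gamma for a radial-kernel SVM
set.seed(123)
tuned <- e1071::tune(svm, Income ~ ., data = Carseats.thetrain,
                     kernel = "radial",
                     ranges = list(cost = c(0.1, 1, 10, 100),
                                   gamma = c(0.01, 0.1, 1)))
summary(tuned)              # table of cross-validation errors
best_svm <- tuned$best.model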
SVM is one of the supervised algorithms mostly used for classification problems, and it has several general advantages. SVM is a very helpful method when we do not know much about the data; it can be used for data such as images, text, and audio, and for data that is not regularly distributed or has an unknown distribution. SVM provides a very useful technique known as the kernel: by applying an appropriate kernel function we can solve complex problems. A kernel lets us choose a function that is not necessarily linear and can take different forms depending on the data it operates on, so it is non-parametric. Classification normally carries the strong assumption that the samples are linearly separable, but with the introduction of a kernel the input data can be mapped into a high-dimensional space, avoiding the need for this assumption: K(x1, x2) = 〈f(x1), f(x2)〉, where K is the kernel function, x1 and x2 are n-dimensional inputs, f is a function that maps the n-dimensional space into an m-dimensional space, and 〈·, ·〉 denotes the dot product. SVMs generally do not suffer from overfitting and perform well when there is a clear margin of separation between classes. SVMs can be used when the number of samples is less than the number of dimensions and perform well in terms of memory. SVMs also generalize well to out-of-sample data, and prediction is relatively fast because classifying a new sample only requires evaluating the kernel function against the support vectors. Another important advantage of the SVM algorithm is that it can handle high-dimensional data, which is a great help given its many uses and applications in the machine learning field.
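As a tiny numeric illustration of the kernel identity above (a hypothetical toy example, unrelated to the Carseats data): for the homogeneous quadratic kernel K(x, z) = (x . z)^2 on 2-dimensional inputs, the implicit feature map is f(x) = (x1^2, sqrt(2)*x1*x2, x2^2), and both sides of the identity agree without ever forming f explicitly.
# kernel trick check: K(u, w) computed directly equals <f(u), f(w)>
u <- c(1, 2); w <- c(3, 4)
K_direct  <- sum(u * w)^2                                  # (u . w)^2 = 121
f <- function(v) c(v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2)  # explicit feature map
K_feature <- sum(f(u) * f(w))                              # also 121
all.equal(K_direct, K_feature)                             # TRUE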