Read and study the following SVM algorithms and articles:
1."AI4OPT - DE&M II - SVM Algorithms from J. Brownlee 000001.pdf" (12.4.3 Support Vector Machine - Classification and Regression Examples - Listing 12.17 to 12.20)
2. "SVM Algorithm to Run - A Project" (There is an algorithm here: "Classifying data using support vector machines (SVMs) in R) - Read down to "Visualizing the Test Set results" and "Output."
Then be prepared to give a presentation, in class, on your efforts and findings.
Tasks to do:
Question:
Discuss the nuts and bolts of each algorithm/model
Ans:
Algorithm 1: Classification problem
Data: PimaIndiansDiabetes
package: kernlab
Response: diabetes
Covariates: "pregnant" "glucose" "pressure" "triceps" "insulin" "mass" "pedigree" "age"
Syntax:
ksvm(diabetes ~., data = PimaIndiansDiabetes, kernel = "rbfdot")
This fits an SVM with ksvm() from the kernlab package, using the rbfdot kernel (the Radial Basis Function, or Gaussian, kernel).
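A minimal runnable sketch of this fit (assuming, as in the Brownlee listings, that the dataset comes from the mlbench package):
library(mlbench)                  # provides the PimaIndiansDiabetes data
library(kernlab)                  # provides ksvm()
data(PimaIndiansDiabetes)
# fit the classifier with the Gaussian (RBF) kernel
fit <- ksvm(diabetes ~ ., data = PimaIndiansDiabetes, kernel = "rbfdot")
# in-sample predictions and accuracy
pred <- predict(fit, PimaIndiansDiabetes)
mean(pred == PimaIndiansDiabetes$diabetes)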
Algorithm 2: Regression problem
Data: BostonHousing
package: kernlab
Response: medv (median value of owner-occupied homes in USD 1000's)
Covariates: All other variables in dataset.
fit <- ksvm( medv ~., data = BostonHousing, kernel = "rbfdot")
As in Algorithm 1, this fits the SVM with ksvm() from the kernlab package using the rbfdot (Gaussian RBF) kernel; here medv is a continuous response, so ksvm() performs support vector regression.
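A corresponding sketch for the regression fit, with an in-sample error check (again assuming the mlbench data):
library(mlbench)                  # provides the BostonHousing data
library(kernlab)
data(BostonHousing)
fit  <- ksvm(medv ~ ., data = BostonHousing, kernel = "rbfdot")
pred <- predict(fit, BostonHousing)
sqrt(mean((BostonHousing$medv - pred)^2))   # in-sample RMSE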
Algorithm 3: Classification problem
Data: PimaIndiansDiabetes
package: caret
Cross-validation: 5-fold
Response: diabetes
Covariates: "pregnant" "glucose" "pressure" "triceps" "insulin" "mass" "pedigree" "age"
Syntax:
trainControl <- trainControl(method="cv", number = 5)
train(diabetes ~., data = PimaIndiansDiabetes, method = "svmRadial", metric = "Accuracy", trControl = trainControl)
This fits an SVM with train() from the caret package; method = "svmRadial" selects the Radial Basis Function (Gaussian) kernel, and the cost parameter C is tuned by 5-fold cross-validation.
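A self-contained version of this workflow (the seed is an arbitrary choice, added only so the folds are reproducible; caret calls kernlab internally for method = "svmRadial"):
library(caret)                    # provides train() and trainControl()
library(mlbench)
data(PimaIndiansDiabetes)
set.seed(7)                       # arbitrary seed for reproducible folds
trainControl <- trainControl(method = "cv", number = 5)
fit <- train(diabetes ~ ., data = PimaIndiansDiabetes, method = "svmRadial",
             metric = "Accuracy", trControl = trainControl)
print(fit)                        # cross-validated accuracy and kappa per C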
Algorithm 4: Regression problem
Data: BostonHousing
Cross-validation: 5-fold
package: caret
Response: medv (median value of owner-occupied homes in USD 1000's)
Covariates: All other variables in the dataset
trainControl <- trainControl(method="cv", number = 5)
train(medv ~., data = BostonHousing, method = "svmRadial", metric = "RMSE", trControl = trainControl)
As in Algorithm 3, train() from the caret package fits the SVM with the Radial Basis Function (Gaussian) kernel; here the best model is selected by RMSE because the response is continuous.
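The regression version of the same sketch differs only in the dataset, the response, and the selection metric:
library(caret)
library(mlbench)
data(BostonHousing)
set.seed(7)                       # arbitrary seed for reproducible folds
trainControl <- trainControl(method = "cv", number = 5)
fit <- train(medv ~ ., data = BostonHousing, method = "svmRadial",
             metric = "RMSE", trControl = trainControl)
print(fit)                        # cross-validated RMSE and R-squared per C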
----------------------- X -----------------------
Question:
Discuss what type of data the model works best on
Ans:
The SVM works with either a categorical response (classification) or a continuous response (regression). It performs best when the covariates are numeric and on similar scales, which is why scaling is usually recommended.
Question:
Discuss how the model will perform on the data: list some of the steps involved, though not necessarily all of the steps
Ans:
The model's performance is measured as follows:
Classification: accuracy rate
Regression: Mean Squared Error (MSE) or Root Mean Squared Error (RMSE)
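For concreteness, these metrics can be computed directly (toy vectors below, not taken from the assignment data):
# classification: accuracy = proportion of correct predictions
obs_class  <- factor(c("pos", "neg", "pos", "neg"))
pred_class <- factor(c("pos", "neg", "neg", "neg"))
mean(pred_class == obs_class)               # accuracy = 0.75
# regression: MSE and RMSE
obs  <- c(24.0, 21.6, 34.7, 33.4)
pred <- c(25.1, 20.9, 33.0, 34.2)
mean((obs - pred)^2)                        # Mean Squared Error
sqrt(mean((obs - pred)^2))                  # Root Mean Squared Error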
Question:
Discuss some of the strengths of the model
Ans:
It does not require computing distances from all observations; the decision boundary depends only on the support vectors.
It can map the data into a high-dimensional feature space to separate the groups (the kernel trick).
Question:
Discuss how well the model performed (error rate, accuracy, ...)
Ans:
We choose the model with the highest accuracy for classification problems and the lowest Mean Squared Error (MSE) for regression problems.
Question:
Discuss some of the arguments found in the function statements
Ans:
kernel = "rbfdot" in ksvm() and method = "svmRadial" in train() both select the Radial Basis Function (Gaussian) kernel, which transforms low-dimensional data into a high-dimensional feature space and lets the SVM handle non-linearity, even when the number of covariates exceeds the number of observations.
metric tells train() which criterion to optimize: "Accuracy" for classification, "RMSE" for regression.
trControl passes the resampling scheme, here 5-fold cross-validation from trainControl(method = "cv", number = 5).
Question:
Discuss prediction
Ans:
The following is one output from the SVM model fitted with the caret package
using the train() function with 5-fold cross-validation.
--------------------
Support Vector Machines with Radial Basis Function Kernel
768 samples
8 predictor
2 classes: 'neg', 'pos'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 614, 614, 615, 614, 615
Resampling results across tuning parameters:
  C     Accuracy   Kappa
  0.25  0.7604278  0.4360310
  0.50  0.7656056  0.4552142
  1.00  0.7590952  0.4409422
Tuning parameter 'sigma' was held constant at a value of 0.124824
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.124824 and C = 0.5.
----------------------------
The C value in a Support Vector Machine (SVM) is the regularization parameter that controls the trade-off between achieving a low error on the training data and maximizing the margin of the decision boundary. It essentially determines the penalty for misclassifying training examples.
Large C Value (High Penalty): A large value of C places a high penalty on misclassifications, forcing the model to classify as many training examples correctly as possible. This often leads to a smaller margin and a more complex decision boundary that closely fits the training data, increasing the risk of overfitting.
Small C Value (Low Penalty): A small value of C results in a lower penalty for misclassification, allowing the model to prioritize a wider margin, even if it means some training points are misclassified. This creates a smoother, more generalizable decision boundary, which helps in preventing overfitting and can be beneficial with noisy data or outliers.
Properly tuning the C parameter is a crucial step in building a robust and accurate SVM model; it helps strike a balance between bias and variance.
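One way to tune C explicitly in caret is through tuneGrid; the sigma value below is the one reported in the output above, while the extra C values are illustrative assumptions:
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)
grid <- expand.grid(sigma = 0.124824, C = c(0.25, 0.5, 1, 2, 4))
fit  <- train(diabetes ~ ., data = PimaIndiansDiabetes, method = "svmRadial",
              metric = "Accuracy", tuneGrid = grid,
              trControl = trainControl(method = "cv", number = 5))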
The model with the highest accuracy, 0.7656056 (at C = 0.50), was selected.
Kappa is a measure of agreement beyond the level of agreement expected by chance alone. Common interpretations of the kappa statistic are as follows: < 0.2 slight agreement, 0.2-0.4 fair agreement, 0.4-0.6 moderate agreement, 0.6-0.8 substantial agreement, > 0.8 almost perfect agreement.
The sigma value is a parameter of the Radial Basis Function (RBF) kernel that controls the "width" or "bandwidth" of the kernel's influence, and hence the model's complexity and flexibility. Note that kernlab parameterizes the kernel as k(x, x') = exp(-sigma * ||x - x'||^2), so its sigma acts as an inverse width (playing the role of gamma in other software): a large sigma creates a highly non-linear decision boundary that closely follows the data, while a small sigma creates a smoother, more nearly linear boundary.
Relationship between gamma and the classical bandwidth sigma: gamma = 1 / (2 * sigma^2).
Question:
Discuss the objective of the model: what is the model trying to do with the data
Ans:
To find the optimal hyperplane that maximizes the margin between classes while minimizing classification errors.
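In symbols, the standard soft-margin formulation (textbook form, not quoted from the readings) is:
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{subject to}\quad y_{i}\,(w^{\top}x_{i}+b)\ \ge\ 1-\xi_{i},\qquad \xi_{i}\ \ge\ 0,
where the slack variables \xi_{i} measure margin violations and C is the penalty parameter discussed above.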
Question:
Discuss why pruning was (or was not) necessary
Ans:
An SVM searches for the hyperplane with the largest separation margin; unlike a decision tree, it does not grow branches, so there is nothing to prune in the tree sense.
Pruning is therefore not strictly necessary for SVMs, because the model is already determined by a selected subset of data points (the support vectors). That said, removing redundant support vectors can improve efficiency and reduce overfitting, yielding a faster model with lower memory requirements (useful in resource-constrained deployments) and a simpler model that may generalize better to new data.
Question:
Discuss the possibility of the model overfitting
Ans:
Overfitting is always a risk with flexible models; however, the SVM's tuning parameters help to lessen it.
Tuning parameters: C, sigma
Question:
Discuss when confusion matrices are used
Ans:
Confusion matrices are used in classification problems to find the accuracy rate (and related measures such as sensitivity and specificity).
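In R, caret's confusionMatrix() tabulates predicted against observed classes (toy vectors below, not from the assignment data):
library(caret)
pred <- factor(c("pos", "neg", "pos", "pos"), levels = c("neg", "pos"))
obs  <- factor(c("pos", "neg", "neg", "pos"), levels = c("neg", "pos"))
confusionMatrix(pred, obs)        # reports accuracy, kappa, sensitivity, specificity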
Question:
Discuss when a measure of central tendency may be used
Ans:
Whenever we want a single representative value for the center of the data, e.g., the mean or median.
Question:
What were some of the weaknesses of the model?
Ans:
The SVM is not easily interpretable, and its performance depends on the tuning parameters.
SVMs are difficult to use on large datasets due to long training times and high memory usage. They also struggle with noisy data and require careful kernel selection, and they can be challenging to interpret and tune. Furthermore, SVMs do not provide direct probability estimates and are not natively suited to multi-class problems (extensions such as one-vs-one or one-vs-rest are needed).
Strengths and typical applications:
*Unstructured Data: SVMs can be applied to unstructured data like text and images, but these often require pre-processing steps to extract relevant features and convert them into a structured format suitable for the SVM algorithm.
*Image Classification: SVMs are utilized for recognizing and classifying images based on pixel intensity and patterns.
*Text Categorization: SVM algorithms are applied in natural language processing to categorize text documents into predefined classes.
*Bioinformatics: In genomics, SVMs help in classifying proteins and genes based on their sequences and structural features.
*Fraud Detection: SVMs are used in finance to detect fraudulent transactions by analyzing patterns in transaction data.
*Handwriting Recognition: SVM models are employed in optical character recognition (OCR) systems.
Question:
Is preprocessing the data necessary?
Ans:
If the data are not in the format required by the function, then yes, preprocessing is needed.
Yes, data preprocessing is necessary for SVM models because it significantly improves their performance and accuracy. Preprocessing helps handle issues like outliers, missing values, and different feature scales, which can negatively impact SVM's ability to find the optimal hyperplane. Key preprocessing steps for SVM include feature scaling and handling missing or categorical data.
Why preprocessing is necessary for SVM
*Handles different scales: SVM is sensitive to the scale of features. Preprocessing methods like normalization or standardization bring all features to a similar range, preventing features with larger values from disproportionately influencing the model.
*Improves performance: Preprocessing cleans the data by handling missing values, which can lead to a more robust and accurate model.
*Addresses outliers: Preprocessing can help deal with outliers, which can distort the hyperplane and negatively affect the model's performance.
*Enables non-linear classification: For non-linear data, preprocessing can be used to map the data to a higher-dimensional space, making it easier to find a linear separation using the kernel trick.
Key preprocessing steps for SVM
*Feature scaling: Standardizing or normalizing features to a common range (e.g., [0, 1] or [-1, 1]) is crucial for SVM to avoid bias towards features with larger values.
*Handling missing values: Methods like imputation (using the mean, median, or mode) or deletion can be used to deal with missing data points.
*Handling categorical data: Categorical variables must be encoded into a numerical format (e.g., using one-hot encoding) before they can be used in an SVM model.
*Dimensionality reduction: Techniques like Principal Component Analysis (PCA) can be used to reduce the number of features while retaining important information, which can make the model more robust and computationally efficient.
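A small sketch of the scaling step with caret's preProcess() (the column indices assume the eight covariates occupy columns 1-8, as in PimaIndiansDiabetes):
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)
pp     <- preProcess(PimaIndiansDiabetes[, 1:8], method = c("center", "scale"))
scaled <- predict(pp, PimaIndiansDiabetes[, 1:8])
# the same effect can be requested inside train() via preProcess = c("center", "scale")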