Support Vector Machines (SVMs) are powerful supervised learning algorithms primarily used for classification tasks. The goal of an SVM is to find the separating hyperplane that divides the classes of data points with the largest possible margin.
A working knowledge of basic linear algebra (dot products, vector norms, and the geometry of hyperplanes) is essential for understanding how SVMs work.
A hyperplane in \(n\)-dimensional space is defined as:
\[ \beta_0 + \beta^T x = 0 \]
where:
- \(\beta\) is the normal vector (perpendicular to the hyperplane),
- \(x\) is a feature vector,
- \(\beta_0\) is the bias term.
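To make the geometry concrete, the short sketch below (the hyperplane coefficients and the query point are arbitrary illustrative values, not taken from any dataset) evaluates \(\beta^T x + \beta_0\) for a point and divides by \(\|\beta\|\) to obtain its signed distance from the hyperplane; the sign tells us which side the point lies on.

import numpy as np

# Arbitrary example hyperplane in 2-D: beta^T x + beta_0 = 0
beta = np.array([2.0, -1.0])   # normal vector
beta_0 = 0.5                   # bias term

x = np.array([1.0, 3.0])       # query point

activation = beta @ x + beta_0                        # sign gives the side of the hyperplane
signed_distance = activation / np.linalg.norm(beta)   # geometric (signed) distance

print(f"activation = {activation:.2f}, signed distance = {signed_distance:.2f}")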
For two classes labeled \(y_i \in \{-1, 1\}\), the SVM requires every training point to lie on the correct side of the hyperplane with a margin of at least \(M\): \[ y_i (\beta^T x_i + \beta_0) \geq M \] where \(M\) is the margin, i.e. the distance from the hyperplane to the closest training point (taking \(\|\beta\| = 1\)).
The objective is to maximize the margin. Rescaling \(\beta\) and \(\beta_0\) so that the closest points satisfy \(y_i (\beta^T x_i + \beta_0) = 1\) makes the margin equal to \(1/\|\beta\|\), so maximizing the margin is equivalent to minimizing \(||\beta||\) subject to: \[ y_i (\beta^T x_i + \beta_0) \geq 1 \] This is a convex optimization problem, typically solved using Lagrange multipliers.
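To make the Lagrange-multiplier step concrete, here is the standard textbook sketch of the dual derivation. Introducing multipliers \(\alpha_i \geq 0\) gives the Lagrangian
\[
L(\beta, \beta_0, \alpha) = \frac{1}{2}\|\beta\|^2 - \sum_i \alpha_i \left[ y_i(\beta^T x_i + \beta_0) - 1 \right].
\]
Setting the derivatives with respect to \(\beta\) and \(\beta_0\) to zero yields \(\beta = \sum_i \alpha_i y_i x_i\) and \(\sum_i \alpha_i y_i = 0\), which leads to the dual problem
\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
\quad \text{subject to } \alpha_i \geq 0, \; \sum_i \alpha_i y_i = 0.
\]
Training points with \(\alpha_i > 0\) are the support vectors; they alone determine the hyperplane.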
The soft-margin SVM introduces slack variables \(\xi_i\) to allow some points to violate the margin or be misclassified: \[ y_i (\beta^T x_i + \beta_0) \geq 1 - \xi_i \] where \(\xi_i \geq 0\) measures how far point \(i\) falls on the wrong side of its margin.
The optimization objective becomes: \[ \min_{\beta, \beta_0, \xi} \frac{1}{2} ||\beta||^2 + C \sum_i \xi_i \] where \(C\) is a regularization parameter controlling the trade-off between maximizing the margin and penalizing classification errors.
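Because at the optimum \(\xi_i = \max(0,\, 1 - y_i(\beta^T x_i + \beta_0))\), the soft-margin objective can be evaluated directly as a hinge-loss penalty plus a norm term. The minimal sketch below (the values of \(\beta\), \(\beta_0\), and the toy data are arbitrary illustrative choices, not taken from any fitted model) computes it with NumPy:

import numpy as np

def soft_margin_objective(beta, beta_0, X, y, C):
    # The hinge loss plays the role of the slack variables xi_i
    margins = y * (X @ beta + beta_0)
    slack = np.maximum(0.0, 1.0 - margins)
    return 0.5 * beta @ beta + C * slack.sum()

# Arbitrary illustrative values
X_toy = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]])
y_toy = np.array([1, 1, -1])
beta = np.array([0.5, 0.5])
beta_0 = -0.2
print(soft_margin_objective(beta, beta_0, X_toy, y_toy, C=1.0))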
If a dataset is not linearly separable, we map it to a higher-dimensional feature space via a feature map \(\phi\), which is accessed only through a kernel function: \[ K(x_i, x_j) = \phi(x_i)^T \phi(x_j) \]
The kernel trick enables SVM to operate in higher-dimensional spaces without explicit transformations.
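For example, the homogeneous polynomial kernel of degree 2 in two dimensions, \(K(x, z) = (x^T z)^2\), corresponds to the explicit feature map \(\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\). The short sketch below (with arbitrary example points) verifies numerically that the kernel value equals the inner product of the mapped vectors, which is exactly what the kernel trick exploits:

import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, -1.0])

explicit = phi(x_i) @ phi(x_j)   # inner product after the explicit mapping
implicit = (x_i @ x_j) ** 2      # polynomial kernel, no mapping required

print(explicit, implicit)        # both print the same value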
In summary, the hard-margin (linearly separable) formulation is: \[ \min_{\beta, \beta_0} \frac{1}{2} ||\beta||^2 \] subject to: \[ y_i (\beta^T x_i + \beta_0) \geq 1 \]
and the soft-margin formulation is: \[ \min_{\beta, \beta_0, \xi} \frac{1}{2} ||\beta||^2 + C \sum_i \xi_i \] subject to: \[ y_i (\beta^T x_i + \beta_0) \geq 1 - \xi_i, \quad \xi_i \geq 0 \]
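In the dual form of the soft-margin problem, the training data enter only through inner products \(x_i^T x_j\), so the kernel substitution \(x_i^T x_j \to K(x_i, x_j)\) applies directly (again a standard result):
\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)
\quad \text{subject to } 0 \leq \alpha_i \leq C, \; \sum_i \alpha_i y_i = 0,
\]
with decision function \( f(x) = \operatorname{sign}\Big( \sum_i \alpha_i y_i K(x_i, x) + \beta_0 \Big). \)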
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_redundant=0, random_state=42)
y = np.where(y == 0, -1, 1) # Convert labels to {-1, 1}
# Split into train-test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train SVM classifier
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)
# Predictions
y_pred = svm.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')
# Plot decision boundary
def plot_svm_decision_boundary(X, y, model):
    """Plot the data, the decision boundary, and the margins of a fitted SVM."""
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='k')
    ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # Evaluate the decision function on a grid covering the plot area
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
                         np.linspace(ylim[0], ylim[1], 50))
    Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Solid line: decision boundary (Z = 0); dashed lines: margins (Z = +/-1)
    ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], linestyles=['--', '-', '--'])
    plt.title("SVM Decision Boundary")
    plt.show()
plot_svm_decision_boundary(X_train, y_train, svm)
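The same plotting helper works for non-linear kernels. As a quick illustration of the kernel trick on data that no straight line can separate, the sketch below (assuming scikit-learn's make_moons dataset; the gamma value is an illustrative choice, not a tuned one) trains an RBF-kernel SVM and plots its curved decision boundary:

from sklearn.datasets import make_moons

# Non-linearly separable, two-moon data
X_m, y_m = make_moons(n_samples=200, noise=0.2, random_state=42)
y_m = np.where(y_m == 0, -1, 1)  # convert labels to {-1, 1}

# The RBF kernel handles the mapping to a higher-dimensional space implicitly
svm_rbf = SVC(kernel='rbf', C=1.0, gamma=0.5)
svm_rbf.fit(X_m, y_m)
print(f'Training accuracy: {svm_rbf.score(X_m, y_m):.2f}')

plot_svm_decision_boundary(X_m, y_m, svm_rbf)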