Update on 2024-06-12

Support vector machine

Maximum Margin Classifiers

这是一个用于解决分类和回归问题的流行方法. 它基于凸优化, 因此局部最优解也是全局最优解.

对于概率模型, 在二类问题的情况下, 最方便的表示是二进制表示, 其中只有一个目标, \(t = 1\) 表示类别 \(C_1\), \(t = 0\) 表示类别 \(C_2\). \(t\) 可以被视为概率.

SVM 是一个决策机器, 因此通常将 \(t\) 设置为 \(\in \{-1,\ 1\}\).

优势和劣势

支持向量机(Support Vector Machines,SVM)的优势包括:

  1. 在高维空间中表现有效.
  2. 在维度数量大于样本数量的情况下仍然有效.
  3. 在决策函数中使用训练点的子集(称为支持向量),因此在内存利用方面也很高效.
  4. 全面:可以为决策函数指定不同的核函数.提供了常见的核函数,同时也可以指定自定义的核函数.

支持向量机的缺点包括:

  1. 如果特征数量远大于样本数量,则在选择核函数和正则化项时要避免过拟合.
  2. SVM不直接提供概率估计.

Decision boundary (two classes)

A linear function of input vector:

\[ y(\mathbf x) = \mathbf x^T \mathbf w + w_0. \]

Here \(\mathbf w\) is a weight vector, \(w_0\) is a bias. if \(y > 0\), \(\mathbf x\) is assigned to class \(C_1\) and to class \(C_2\) otherwise. \(y = 0\) is therefore the corresponding decision boundary.

Distance to origin

Consider two points \(\mathbf x_A\) and \(\mathbf x_B\) both of which lie on the decision surface.

\[ \mathbf x_A^T \mathbf w + w_0 = \mathbf x_B^T \mathbf w + w_0 = 0. \] \(\mathbf x_A^T - \mathbf x_B^T\) is a vector lie within the decision surface because all points lie within the decision surface. So

\[ (\mathbf x_A^T - \mathbf x_B^T) \mathbf w = 0. \] The vector \(\mathbf w\) is orthogonal to the decision surface.

The distance from the origin to the decision surface \(\lambda\):

\[ \begin{aligned} & \mathbf 0 - \mathbf x_d = \lambda \frac{\mathbf w}{||\mathbf w||}, \\ & \mathbf x_d^T \mathbf w + w_0 = 0. \end{aligned} \]

Solve them:

\[ \lambda = \frac{w_0}{||\mathbf w||}. \]

Distance

For any point \(\mathbf x\) to the decision surface:

\[ \begin{aligned} & \mathbf x - \mathbf x_d = \lambda \frac{\mathbf w}{||\mathbf w||}, \\ & \mathbf x_d^T \mathbf w + w_0 = 0. \end{aligned} \]

Solve them: \[ \lambda = \frac{\mathbf x^T \mathbf w + w_0}{||\mathbf w||} = \frac{y(\mathbf x)}{||\mathbf w||}. \]

So the absolute distance is \(\frac{t(x)y(\mathbf x)}{||\mathbf w||}\).

For titally \(N\) samples \(\{\mathbf x_1, \cdots, \mathbf x_n, \cdots \mathbf x_N\}\) the margin is

\[ \mbox{min}_n \frac{t_n (\mathbf x_n^T \mathbf w + w_0)}{||\mathbf w||}. \]

The maximum margin

The maximum margin solution is found by solving:

\[ \mbox{argmax}_{\mathbf w,\ w_0} \bigg\{ \mbox{min}_n \frac{t_n (\mathbf x_n^T \mathbf w + w_0)}{||\mathbf w||} \bigg\}. \] Direct solution of this optimization problem would be very complex.

if we set

\[ t_n (\mathbf x_n^T \mathbf w + w_0) \geq 1, \tag{1} \]

for all \(n\), then we have to solve the optimization problem

\[ \mbox{argmin}_{\mathbf w,\ w_0} \frac{1}{2}||\mathbf w||^2, \]

subject to the constraints given by \((1)\).