Maximum Margin Classifiers
这是一个用于解决分类和回归问题的流行方法. 它基于凸优化, 因此局部最优解也是全局最优解.
对于概率模型, 在二类问题的情况下, 最方便的表示是二进制表示, 其中只有一个目标, \(t = 1\) 表示类别 \(C_1\), \(t = 0\) 表示类别 \(C_2\). \(t\) 可以被视为概率.
SVM 是一个决策机器, 因此通常将 \(t\) 设置为 \(\in \{-1,\ 1\}\).
Update on 2024-06-12
这是一个用于解决分类和回归问题的流行方法. 它基于凸优化, 因此局部最优解也是全局最优解.
对于概率模型, 在二类问题的情况下, 最方便的表示是二进制表示, 其中只有一个目标, \(t = 1\) 表示类别 \(C_1\), \(t = 0\) 表示类别 \(C_2\). \(t\) 可以被视为概率.
SVM 是一个决策机器, 因此通常将 \(t\) 设置为 \(\in \{-1,\ 1\}\).
支持向量机(Support Vector Machines,SVM)的优势包括:
支持向量机的缺点包括:
A linear function of input vector:
\[ y(\mathbf x) = \mathbf x^T \mathbf w + w_0. \]
Here \(\mathbf w\) is a weight vector, \(w_0\) is a bias. if \(y > 0\), \(\mathbf x\) is assigned to class \(C_1\) and to class \(C_2\) otherwise. \(y = 0\) is therefore the corresponding decision boundary.
Consider two points \(\mathbf x_A\) and \(\mathbf x_B\) both of which lie on the decision surface.
\[ \mathbf x_A^T \mathbf w + w_0 = \mathbf x_B^T \mathbf w + w_0 = 0. \] \(\mathbf x_A^T - \mathbf x_B^T\) is a vector lie within the decision surface because all points lie within the decision surface. So
\[ (\mathbf x_A^T - \mathbf x_B^T) \mathbf w = 0. \] The vector \(\mathbf w\) is orthogonal to the decision surface.
The distance from the origin to the decision surface \(\lambda\):
\[ \begin{aligned} & \mathbf 0 - \mathbf x_d = \lambda \frac{\mathbf w}{||\mathbf w||}, \\ & \mathbf x_d^T \mathbf w + w_0 = 0. \end{aligned} \]
Solve them:
\[ \lambda = \frac{w_0}{||\mathbf w||}. \]
For any point \(\mathbf x\) to the decision surface:
\[ \begin{aligned} & \mathbf x - \mathbf x_d = \lambda \frac{\mathbf w}{||\mathbf w||}, \\ & \mathbf x_d^T \mathbf w + w_0 = 0. \end{aligned} \]
Solve them: \[ \lambda = \frac{\mathbf x^T \mathbf w + w_0}{||\mathbf w||} = \frac{y(\mathbf x)}{||\mathbf w||}. \]
So the absolute distance is \(\frac{t(x)y(\mathbf x)}{||\mathbf w||}\).
For titally \(N\) samples \(\{\mathbf x_1, \cdots, \mathbf x_n, \cdots \mathbf x_N\}\) the margin is
\[ \mbox{min}_n \frac{t_n (\mathbf x_n^T \mathbf w + w_0)}{||\mathbf w||}. \]
The maximum margin solution is found by solving:
\[ \mbox{argmax}_{\mathbf w,\ w_0} \bigg\{ \mbox{min}_n \frac{t_n (\mathbf x_n^T \mathbf w + w_0)}{||\mathbf w||} \bigg\}. \] Direct solution of this optimization problem would be very complex.
if we set
\[ t_n (\mathbf x_n^T \mathbf w + w_0) \geq 1, \tag{1} \]
for all \(n\), then we have to solve the optimization problem
\[ \mbox{argmin}_{\mathbf w,\ w_0} \frac{1}{2}||\mathbf w||^2, \]
subject to the constraints given by \((1)\).