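\[ \begin{aligned} \hat{\boldsymbol \beta}^{\rm ridge} &= \underset{\boldsymbol \beta}{\rm argmin} \bigg\{ \sum_{i = 1}^n \Big(y_i - \sum_{j = 1}^{p} x_{ij} \beta_j\Big)^2 \bigg\} \\ &\mbox{subject to } \sum_{j = 1}^p \beta_j^2 \leq s. \end{aligned} \]
The constraint only matters for \(0 \leq s \leq \sum_{j=1}^p \hat{\beta}_j^2\), where \(\hat{\beta}_j\) is taken here to be the unconstrained least-squares estimate; for any larger \(s\) the constraint is inactive and the solution is the least-squares one. A more common form is the penalized one:
\[ \hat {\boldsymbol \beta}^{\rm ridge} = \underset{\boldsymbol \beta} {\rm argmin} \bigg\{\frac{1}{2}\sum_{i = 1}^n \Big(y_i - \sum_{j = 1}^{p} x_{ij} \beta_j\Big)^2 \color{red}{+ \lambda\sum_{j = 1}^p \beta_j^2} \bigg\}. \]
Each \(s\) with \(0 < s < \sum_{j=1}^p \hat{\beta}_j^2\) corresponds one-to-one to a \(\lambda > 0\). The smaller \(s\) is, the larger the corresponding \(\lambda\), and the more strongly the coefficients \(\boldsymbol \beta\) are shrunk toward zero.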
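
The link between \(\lambda\) and \(s\) can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the function name `ridge_coefficients` and the chosen values of \(\lambda\) are illustrative and not part of the original text. It solves the penalized form in closed form: setting the gradient to zero gives \((X^\top X + 2\lambda I)\boldsymbol\beta = X^\top y\), where the factor 2 comes from the \(\frac{1}{2}\) in front of the squared-error term. It then reports the implied budget \(s = \sum_j \beta_j^2\) for each \(\lambda\).

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||^2 in closed form."""
    p = X.shape[1]
    # Setting the gradient -X^T (y - X b) + 2 * lam * b to zero gives
    # (X^T X + 2 * lam * I) b = X^T y.
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(p), X.T @ y)

# Synthetic data, for illustration only (assumed, not from the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0]
# For each lambda, the implied constraint level s is the squared norm
# of the corresponding ridge solution.
budgets = [float(np.sum(ridge_coefficients(X, y, lam) ** 2)) for lam in lambdas]

for lam, s in zip(lambdas, budgets):
    print(f"lambda = {lam:6.1f}  ->  s = {s:.4f}")
```

With \(\lambda = 0\) the solution is the least-squares estimate, so the first printed \(s\) is the upper end of the useful range; each larger \(\lambda\) yields a smaller \(s\), matching the correspondence described above.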