Support Vector Machines: Principles and a Worked Example

Published: 2025-08-16

1. The Core Goal of the SVM: Margin Maximization

The core idea of the support vector machine (SVM) is to find, between two classes of data, an optimal separating hyperplane that maximizes the "margin" between the two classes and the hyperplane. We derive the linearly separable case first (the nonlinear case extends via kernel functions).

Step 1: Define the separating hyperplane

In an n-dimensional space, a linear separating hyperplane is given by:

$$\boldsymbol{w} \cdot \boldsymbol{x} + b = 0 \tag{1}$$

  • $\boldsymbol{w} = (w_1, w_2, \ldots, w_n)^T$ is the normal vector of the hyperplane (it determines the orientation);
  • $b$ is the bias term (it determines the position);
  • $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)^T$ is a sample's feature vector.

For binary classification, let the labels be $y \in \{+1, -1\}$ (positive and negative class, respectively). The hyperplane must then satisfy:

  • positive samples: $\boldsymbol{w} \cdot \boldsymbol{x} + b > 0$ (i.e. $y = +1$);
  • negative samples: $\boldsymbol{w} \cdot \boldsymbol{x} + b < 0$ (i.e. $y = -1$).
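The sign rule above can be sketched in a few lines of NumPy (the hyperplane $\boldsymbol{w} = (1, -1)$, $b = 0$ here is an arbitrary made-up example, not one fitted to any data):

```python
import numpy as np

# A made-up hyperplane: w = (1, -1), b = 0 (for illustration only).
w = np.array([1.0, -1.0])
b = 0.0

def predict(x):
    """Classify a point by the sign of w.x + b (0 is mapped to +1 here)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([2.0, 1.0])))   # on the positive side of x1 = x2 -> +1
print(predict(np.array([1.0, 3.0])))   # on the negative side -> -1
```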
Step 2: Distance from a sample to the hyperplane (the margin)

To measure how well the hyperplane separates the classes, we define the distance from a sample to the hyperplane.

  • Functional margin: for a sample $(\boldsymbol{x}_i, y_i)$, the functional margin is $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)$.

    • Interpretation: a positive value means the sample is classified correctly ($y_i$ and $\boldsymbol{w} \cdot \boldsymbol{x}_i + b$ have the same sign); the larger its absolute value, the more confident the classification.
  • Geometric margin: the functional margin changes under rescaling of $\boldsymbol{w}$ and $b$ (e.g. $\boldsymbol{w} \to 2\boldsymbol{w}, b \to 2b$ leaves the hyperplane unchanged but doubles the functional margin), so it must be normalized:
    $$\gamma_i = \frac{y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)}{\|\boldsymbol{w}\|} \tag{2}$$
    where $\|\boldsymbol{w}\| = \sqrt{\boldsymbol{w} \cdot \boldsymbol{w}}$ is the L2 norm of $\boldsymbol{w}$. The geometric margin is the actual Euclidean distance from the sample to the hyperplane.
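The scale-invariance argument is easy to verify numerically. A minimal sketch (the point and hyperplane below are made up for illustration): doubling $(\boldsymbol{w}, b)$ doubles the functional margin but leaves the geometric margin unchanged.

```python
import numpy as np

def functional_margin(w, b, x, y):
    """y * (w.x + b), eq. before (2)."""
    return y * (np.dot(w, x) + b)

def geometric_margin(w, b, x, y):
    """Eq. (2): functional margin normalized by ||w||."""
    return functional_margin(w, b, x, y) / np.linalg.norm(w)

w, b = np.array([1.0, -1.0]), 0.5
x, y = np.array([3.0, 1.0]), 1

f1 = functional_margin(w, b, x, y)          # 2.5
f2 = functional_margin(2 * w, 2 * b, x, y)  # 5.0 (doubled)
g1 = geometric_margin(w, b, x, y)
g2 = geometric_margin(2 * w, 2 * b, x, y)   # unchanged
print(f1, f2, g1, g2)
```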

Step 3: The margin-maximization objective

The optimal hyperplane must make the smallest geometric margin over all samples (the "minimum margin") as large as possible.

Let $\gamma = \min_i \gamma_i$ be the minimum geometric margin of the dataset. The goal is to maximize $\gamma$:

$$\max_{\boldsymbol{w}, b} \gamma \quad \text{s.t.} \quad \frac{y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b)}{\|\boldsymbol{w}\|} \geq \gamma \quad (\forall i) \tag{3}$$

Step 4: Simplifying the objective

Since $(\boldsymbol{w}, b)$ and $(k\boldsymbol{w}, kb)$ (for $k > 0$) describe the same hyperplane, we can rescale so that the minimum functional margin equals 1 (i.e. $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1$), in which case the minimum geometric margin is $\gamma = \frac{1}{\|\boldsymbol{w}\|}$.

Maximizing $\gamma$ is therefore equivalent to minimizing $\|\boldsymbol{w}\|$ (or $\frac{1}{2}\|\boldsymbol{w}\|^2$, which is easier to differentiate), and the objective simplifies to:

$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 \quad \text{s.t.} \quad y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1 \quad (\forall i) \tag{4}$$
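For any candidate $(\boldsymbol{w}, b)$, the constraints in (4) can be checked mechanically. A small helper (the two-point dataset and candidate values are made up for illustration):

```python
import numpy as np

def feasible(w, b, X, y):
    """True iff every sample satisfies y_i * (w . x_i + b) >= 1."""
    margins = y * (X @ w + b)
    return bool(np.all(margins >= 1 - 1e-9))

X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])

print(feasible(np.array([0.5, 0.5]), -2.0, X, y))   # True: both margins are exactly 1
print(feasible(np.array([0.1, 0.1]), -0.4, X, y))   # False: margins are only 0.2
```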

2. Solving the Optimization Problem with Lagrange Multipliers

Problem (4) is a convex optimization problem with inequality constraints; it can be converted to a dual problem via the method of Lagrange multipliers.

Step 5: Construct the Lagrangian

Introducing a Lagrange multiplier $\alpha_i \geq 0$ for each constraint, the Lagrangian is:

$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_{i=1}^N \alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] \tag{5}$$

  • the first term is the original objective;
  • the second term penalizes constraint violations ($\alpha_i \geq 0$ ensures the constraints are enforced).
Step 6: The dual problem (KKT conditions)

At the optimum of this convex problem, the KKT (Karush-Kuhn-Tucker) conditions must hold:

  1. Primal feasibility: $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) \geq 1$;
  2. Dual feasibility: $\alpha_i \geq 0$;
  3. Complementary slackness: $\alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] = 0$ (if $\alpha_i > 0$, the constraint is active, i.e. $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) = 1$);
  4. Stationarity: $\nabla_{\boldsymbol{w}} \mathcal{L} = 0$ and $\nabla_b \mathcal{L} = 0$.
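Conditions 1-3 can likewise be checked mechanically for a candidate solution. A sketch (the tolerance and toy inputs below are my own choices, not from the article):

```python
import numpy as np

def check_kkt(alpha, w, b, X, y, tol=1e-8):
    """Check KKT conditions 1-3 for a candidate (alpha, w, b)."""
    margins = y * (X @ w + b)
    primal = np.all(margins >= 1 - tol)                   # condition 1
    dual = np.all(alpha >= -tol)                          # condition 2
    slack = np.all(np.abs(alpha * (margins - 1)) < tol)   # condition 3
    return bool(primal and dual and slack)

# Toy check: two points, both support vectors sitting exactly on the margin.
X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
print(check_kkt(np.array([0.25, 0.25]), np.array([0.5, 0.5]), -2.0, X, y))  # True
print(check_kkt(np.array([0.25, 0.25]), np.array([0.1, 0.1]), 0.0, X, y))   # False: primal fails
```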
Step 7: Differentiate and simplify (the core derivation)

Take partial derivatives with respect to $\boldsymbol{w}$ and $b$ and set them to zero:

  • With respect to $\boldsymbol{w}$:
    $$\nabla_{\boldsymbol{w}} \mathcal{L} = \boldsymbol{w} - \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i = 0 \implies \boldsymbol{w} = \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i \tag{6}$$
    So $\boldsymbol{w}$ is a linear combination of the samples, with coefficients $\alpha_i y_i$.

  • With respect to $b$:
    $$\nabla_b \mathcal{L} = -\sum_{i=1}^N \alpha_i y_i = 0 \implies \sum_{i=1}^N \alpha_i y_i = 0 \tag{7}$$

Step 8: The dual objective

Substitute (6) into the Lagrangian (5) and simplify:

  1. Expand $\frac{1}{2}\|\boldsymbol{w}\|^2$:
    $$\frac{1}{2}\|\boldsymbol{w}\|^2 = \frac{1}{2} \left( \sum_{i=1}^N \alpha_i y_i \boldsymbol{x}_i \right) \cdot \left( \sum_{j=1}^N \alpha_j y_j \boldsymbol{x}_j \right) = \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$$

  2. Expand the penalty term:
    $$\sum_{i=1}^N \alpha_i \left[ y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) - 1 \right] = \sum_{i=1}^N \alpha_i y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i) + \sum_{i=1}^N \alpha_i y_i b - \sum_{i=1}^N \alpha_i$$
    By (6) and (7), the second term is $\sum \alpha_i y_i b = b \cdot 0 = 0$, and the first term is:
    $$\sum_{i=1}^N \alpha_i y_i (\boldsymbol{w} \cdot \boldsymbol{x}_i) = \sum_{i=1}^N \alpha_i y_i \left( \sum_{j=1}^N \alpha_j y_j \boldsymbol{x}_j \cdot \boldsymbol{x}_i \right) = \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$$

  3. Combine and simplify the Lagrangian:
    $$\mathcal{L} = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \left[ \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) - \sum_i \alpha_i \right] = -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) + \sum_i \alpha_i$$

The dual problem is therefore to maximize the following function subject to $\alpha_i \geq 0$ and $\sum \alpha_i y_i = 0$:

$$\max_{\boldsymbol{\alpha}} \left( \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j) \right) \tag{8}$$
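In matrix form, the objective in (8) is $\boldsymbol{1}^T\boldsymbol{\alpha} - \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha}$ with $Q_{ij} = y_i y_j (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)$, which is convenient to evaluate with NumPy (the inputs below are arbitrary illustration values):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Eq. (8): sum(alpha) - 0.5 * alpha^T Q alpha, with Q_ij = y_i y_j (x_i . x_j)."""
    Q = np.outer(y, y) * (X @ X.T)   # element-wise product of y_i y_j with the Gram matrix
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
print(dual_objective(alpha, X, y))   # 0.5 - 0.5 * 0.5 = 0.25
```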

Step 9: Recovering the hyperplane parameters

Once the dual problem yields $\boldsymbol{\alpha}$, we can compute:

  • $\boldsymbol{w}$: from (6), $\boldsymbol{w} = \sum \alpha_i y_i \boldsymbol{x}_i$;
  • $b$: by complementary slackness, any sample with $\alpha_i > 0$ (a support vector) satisfies $y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) = 1$, which gives:
    $$b = y_i - \boldsymbol{w} \cdot \boldsymbol{x}_i = y_i - \sum_{j=1}^N \alpha_j y_j (\boldsymbol{x}_j \cdot \boldsymbol{x}_i) \tag{9}$$
Step 10: The decision function

A new sample $\boldsymbol{x}$ is classified by the sign of the hyperplane function:

$$f(\boldsymbol{x}) = \operatorname{sign}(\boldsymbol{w} \cdot \boldsymbol{x} + b) = \operatorname{sign}\left( \sum_{i=1}^N \alpha_i y_i (\boldsymbol{x}_i \cdot \boldsymbol{x}) + b \right) \tag{10}$$
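Since only samples with $\alpha_i > 0$ contribute to the sum, the decision function needs just the support vectors. A sketch with a hypothetical two-point solution ($\alpha = (0.25, 0.25)$, $b = -2$ for the points $(3,3)^+$ and $(1,1)^-$, chosen for illustration):

```python
import numpy as np

# Hypothetical solved values for a 2-point problem: (3,3) positive, (1,1) negative.
X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = -2.0

def decide(x):
    """Eq. (10): sign of sum_i alpha_i y_i (x_i . x) + b, support vectors only."""
    sv = alpha > 0                      # keep only support vectors
    s = np.sum(alpha[sv] * y[sv] * (X[sv] @ x)) + b
    return 1 if s >= 0 else -1

print(decide(np.array([4.0, 4.0])))   # deep in the positive region -> +1
print(decide(np.array([0.0, 0.0])))   # deep in the negative region -> -1
```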

3. A Worked Example: Solving an SVM on Linearly Separable Data

Consider the following two-dimensional, linearly separable dataset:

  • positive class ($y = +1$): $\boldsymbol{x}_1 = (3, 3)^T$, $\boldsymbol{x}_2 = (4, 3)^T$;
  • negative class ($y = -1$): $\boldsymbol{x}_3 = (1, 1)^T$, $\boldsymbol{x}_4 = (2, 1)^T$.
Step 1: Intuition

The optimal hyperplane lies between the two classes. The closest opposite-class pair is $\boldsymbol{x}_1$ (positive) and $\boldsymbol{x}_4$ (negative), at distance $\|\boldsymbol{x}_1 - \boldsymbol{x}_4\| = \sqrt{5}$, so we conjecture that these are the support vectors: $\alpha_1 > 0$, $\alpha_4 > 0$, and $\alpha_2 = \alpha_3 = 0$ (non-support vectors).

Step 2: Apply the dual constraint

From $\sum \alpha_i y_i = 0$:

$$\alpha_1 \cdot 1 + \alpha_4 \cdot (-1) = 0 \implies \alpha_1 = \alpha_4 \tag{11}$$

Step 3: Compute the dual objective

Keeping only $\alpha_1, \alpha_4$, the objective (8) reduces to:

$$W(\alpha) = \alpha_1 + \alpha_4 - \frac{1}{2} \left[ \alpha_1^2 (\boldsymbol{x}_1 \cdot \boldsymbol{x}_1) + \alpha_4^2 (\boldsymbol{x}_4 \cdot \boldsymbol{x}_4) + 2\alpha_1 \alpha_4 y_1 y_4 (\boldsymbol{x}_1 \cdot \boldsymbol{x}_4) \right]$$

Plug in the data:

  • $\boldsymbol{x}_1 \cdot \boldsymbol{x}_1 = 3^2 + 3^2 = 18$, $\boldsymbol{x}_4 \cdot \boldsymbol{x}_4 = 2^2 + 1^2 = 5$;
  • $\boldsymbol{x}_1 \cdot \boldsymbol{x}_4 = 3 \cdot 2 + 3 \cdot 1 = 9$, $y_1 y_4 = 1 \cdot (-1) = -1$;
  • with $\alpha_1 = \alpha_4 = \alpha$, this gives:
    $$W(\alpha) = 2\alpha - \frac{1}{2} \left[ 18\alpha^2 + 5\alpha^2 + 2\alpha^2 \cdot (-1) \cdot 9 \right] = 2\alpha - \frac{1}{2} \left[ 23\alpha^2 - 18\alpha^2 \right] = 2\alpha - \frac{5}{2}\alpha^2$$
Step 4: Maximize the dual function

Differentiate $W(\alpha)$ and set the derivative to zero:

$$\frac{dW}{d\alpha} = 2 - 5\alpha = 0 \implies \alpha = \frac{2}{5}$$

Hence $\alpha_1 = \alpha_4 = \frac{2}{5}$ and $\alpha_2 = \alpha_3 = 0$.

Step 5: Compute $\boldsymbol{w}$ and $b$

  • $\boldsymbol{w} = \alpha_1 y_1 \boldsymbol{x}_1 + \alpha_4 y_4 \boldsymbol{x}_4 = \frac{2}{5} \cdot 1 \cdot (3,3) + \frac{2}{5} \cdot (-1) \cdot (2,1) = (0.4, 0.8)$;
  • from the support vector $\boldsymbol{x}_1$, solve $y_1(\boldsymbol{w} \cdot \boldsymbol{x}_1 + b) = 1$ for $b$:
    $$1 \cdot (0.4 \cdot 3 + 0.8 \cdot 3 + b) = 1 \implies 3.6 + b = 1 \implies b = -2.6$$
Step 6: Verify the hyperplane

Hyperplane equation: $0.4x_1 + 0.8x_2 - 2.6 = 0$ (i.e. $x_1 + 2x_2 = 6.5$), with $\|\boldsymbol{w}\| = \sqrt{0.4^2 + 0.8^2} = \sqrt{0.8}$.

  • distance from the positive support vector $\boldsymbol{x}_1$: $\frac{|0.4 \cdot 3 + 0.8 \cdot 3 - 2.6|}{\sqrt{0.8}} = \frac{1}{\sqrt{0.8}} = \frac{\sqrt{5}}{2}$;
  • distance from the negative support vector $\boldsymbol{x}_4$: $\frac{|0.4 \cdot 2 + 0.8 \cdot 1 - 2.6|}{\sqrt{0.8}} = \frac{1}{\sqrt{0.8}} = \frac{\sqrt{5}}{2}$;
  • the non-support vectors lie strictly outside the margin: $y_2(\boldsymbol{w} \cdot \boldsymbol{x}_2 + b) = 1.4 > 1$ and $y_3(\boldsymbol{w} \cdot \boldsymbol{x}_3 + b) = 1.4 > 1$.

All constraints hold and both support vectors sit exactly on the margin, so the margin is maximized. (Note that $\boldsymbol{x}_1$ and $\boldsymbol{x}_3$ alone would give $w = (0.5, 0.5)$, $b = -2$, but that candidate violates the constraint for $\boldsymbol{x}_4$, whose margin would be only $0.5 < 1$.)
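The example can be reproduced end to end with the same procedure: for each opposite-label pair assumed to be the support set, (11) forces both multipliers to be equal, the one-variable quadratic is maximized in closed form at $\alpha = 2/\|\boldsymbol{x}_i - \boldsymbol{x}_j\|^2$, and the first candidate that satisfies every margin constraint is the solution. A NumPy sketch of that search:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def solve_two_sv(X, y):
    """Try each opposite-label pair as the support set; by (11) the two
    multipliers are equal, and the 1-D dual maximum is a = 2 / ||x_i - x_j||^2.
    Return the first candidate satisfying all constraints y_k(w.x_k + b) >= 1."""
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] == 1 and y[j] == -1:
                a = 2.0 / np.sum((X[i] - X[j]) ** 2)
                w = a * X[i] - a * X[j]
                b = y[i] - w @ X[i]
                if np.all(y * (X @ w + b) >= 1 - 1e-9):
                    return w, b, i, j
    return None

w, b, i, j = solve_two_sv(X, y)
print(w, b, i, j)   # w = [0.4, 0.8], b = -2.6, support vectors x_1 and x_4
```

The candidate built from $\boldsymbol{x}_1$ and $\boldsymbol{x}_3$ is tried first but rejected as infeasible, and the search settles on $\boldsymbol{x}_1$ and $\boldsymbol{x}_4$.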

Summary

The SVM determines the optimal hyperplane by maximizing the margin. The method of Lagrange multipliers converts the primal problem into a dual problem, and in the end only the support vectors (samples with $\alpha_i > 0$) are needed to recover the hyperplane parameters. This property keeps the method efficient even in high-dimensional spaces, and kernel functions extend it to nonlinear classification.
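The kernel extension amounts to replacing every inner product $\boldsymbol{x}_i \cdot \boldsymbol{x}_j$ in (8) and (10) with a kernel value $K(\boldsymbol{x}_i, \boldsymbol{x}_j)$. A sketch of the decision function with a Gaussian (RBF) kernel (the support vectors, multipliers, $b$, and $\gamma$ below are placeholders for illustration, not values derived in this article):

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_decide(x, X_sv, y_sv, alpha_sv, b, kernel=rbf):
    """Eq. (10) with x_i . x replaced by K(x_i, x)."""
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha_sv, y_sv, X_sv)) + b
    return 1 if s >= 0 else -1

# Placeholder support vectors and multipliers, purely for illustration.
X_sv = np.array([[3.0, 3.0], [1.0, 1.0]])
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([1.0, 1.0])
print(kernel_decide(np.array([3.0, 2.9]), X_sv, y_sv, alpha_sv, b=0.0))  # near (3,3) -> +1
print(kernel_decide(np.array([1.0, 1.1]), X_sv, y_sv, alpha_sv, b=0.0)) # near (1,1) -> -1
```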

