Supplementary Material for 《机器学习数学基础》 (Mathematical Foundations of Machine Learning): Orthogonal Bases

Published: 2025-02-10

Two earlier articles introduced the first two fundamental theorems of linear algebra. This article presents the third: the orthogonal-basis theorem.

Theorem 3: Orthogonal bases

Let $\pmb{A}$ be an $m\times n$ matrix with $r=\text{rank}\pmb{A}\le\min\{m,n\}$. Then there exist an orthonormal basis $\{\pmb{v}_1,\cdots,\pmb{v}_r\}$ of the row space $C(\pmb{A}^{\text{T}})$ and an orthonormal basis $\{\pmb{u}_1,\cdots,\pmb{u}_r\}$ of the column space $C(\pmb{A})$ such that $\pmb{A}\pmb{v}_i=\sigma_i\pmb{u}_i$, $i=1,\cdots,r$.

Since the $n\times n$ matrix $\pmb{A}^{\text{T}}\pmb{A}$ is symmetric positive semidefinite, its eigenvalues are real and nonnegative; write them as $\sigma_1^2,\cdots,\sigma_n^2$ (with $\sigma_i\ge0$), and let $\pmb{e}_1,\cdots,\pmb{e}_n$ be the corresponding unit eigenvectors. Then:

$$\pmb{A}^{\text{T}}\pmb{A}\pmb{e}_i=\sigma_i^2\pmb{e}_i,\quad i=1,\cdots,n \tag{3.1}$$

where $\begin{cases}\pmb{e}_i^{\text{T}}\pmb{e}_j=1, & i=j\\\pmb{e}_i^{\text{T}}\pmb{e}_j=0, & i\ne j\end{cases}$
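As a quick numerical illustration of (3.1), the sketch below uses NumPy; the matrix `A` and the tolerance are illustrative assumptions made up for this example, not anything from the text:

```python
import numpy as np

# A deliberately rank-deficient 4x3 matrix: the third column is the
# sum of the first two, so rank A = 2 < min(4, 3).
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])

# eigh is for symmetric matrices such as A^T A; it returns eigenvalues
# in ascending order together with orthonormal eigenvectors.
lam, E = np.linalg.eigh(A.T @ A)

print(np.all(lam > -1e-12))                 # True: all sigma_i^2 >= 0
print(np.allclose(A.T @ A @ E, E * lam))    # True: A^T A e_i = sigma_i^2 e_i
print(np.allclose(E.T @ E, np.eye(3)))      # True: the e_i are orthonormal
```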

Writing (3.1) in matrix form:

$$\pmb{A}^{\text{T}}\pmb{A}\begin{bmatrix}\pmb{e}_1&\cdots&\pmb{e}_n\end{bmatrix}=\begin{bmatrix}\pmb{e}_1&\cdots&\pmb{e}_n\end{bmatrix}\begin{bmatrix}\sigma^2_1&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\sigma^2_n\end{bmatrix}$$
E = [ e 1 ⋯ e n ] \pmb{E}=\begin{bmatrix}\pmb{e}_1&\cdots&\pmb{e}_n\end{bmatrix} E=[e1en] ,则 E T E = I \pmb{E}^{\text{T}}\pmb{E}=\pmb{I} ETE=I ,故 E T = E − 1 \pmb{E}^{\text{T}}=\pmb{E}^{-1} ET=E1 ,则 E \pmb{E} E 是正交矩阵。对上式右乘 E T \pmb{E}^{\text{T}} ET 后,得:

$$\pmb{A}^{\text{T}}\pmb{A}=\pmb{E}\begin{bmatrix}\sigma^2_1&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\sigma^2_n\end{bmatrix}\pmb{E}^{\text{T}}$$

Since $\pmb{E}$ is invertible, it follows that $\text{rank}(\pmb{A}^{\text{T}}\pmb{A})=\text{rank}(\text{diag}(\sigma^2_1,\cdots,\sigma^2_n))$.

Given the assumption $r=\text{rank}\pmb{A}$, we may order the eigenvalues so that $\sigma_1\ge\sigma_2\ge\cdots\ge\sigma_r\gt0$ and $\sigma_{r+1}=\cdots=\sigma_n=0$; hence $\text{rank}(\pmb{A}^{\text{T}}\pmb{A})=r$.
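A sketch checking both the factorization and the rank count, with the same assumed example matrix as above (the threshold `1e-10` is an illustrative choice):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])
lam, E = np.linalg.eigh(A.T @ A)

# E is orthogonal and A^T A = E diag(sigma_i^2) E^T.
print(np.allclose(A.T @ A, E @ np.diag(lam) @ E.T))   # True

# rank(A^T A) = rank(A) = number of nonzero eigenvalues = r.
print(np.linalg.matrix_rank(A))        # 2
print(int(np.sum(lam > 1e-10)))        # 2
print(np.linalg.matrix_rank(A.T @ A))  # 2
```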

By Theorem 1, $\dim N(\pmb{A})=n-\text{rank}\pmb{A}=n-r$.

Moreover, $\|\pmb{A}\pmb{e}_i\|^2=\pmb{e}_i^{\text{T}}\pmb{A}^{\text{T}}\pmb{A}\pmb{e}_i=\sigma_i^2\pmb{e}^{\text{T}}_i\pmb{e}_i=\sigma_i^2$,

so $\|\pmb{A}\pmb{e}_i\|=\sigma_i$ for $1\le i\le n$.

i = r + 1 , ⋯   , n i=r+1,\cdots,n i=r+1,,n ∥ A e i ∥ = 0 \begin{Vmatrix}\pmb{Ae}_i\end{Vmatrix}=0 Aei =0 ,即 A e i = 0 \pmb{Ae}_i=\pmb{0} Aei=0

Since $\{\pmb{e}_1,\cdots,\pmb{e}_n\}$ is linearly independent and $\dim N(\pmb{A})=n-r$, the $n-r$ vectors $\{\pmb{e}_{r+1},\cdots,\pmb{e}_n\}$ form a basis of the null space $N(\pmb{A})$ of $\pmb{A}$.

By Theorem 2 (orthogonal complements of subspaces), $\{\pmb{e}_1,\cdots,\pmb{e}_r\}$ is a basis of the row space $C(\pmb{A}^{\text{T}})$ of $\pmb{A}$.

Left-multiplying (3.1) by $\pmb{A}$ gives:

$$\pmb{A}\pmb{A}^{\text{T}}\pmb{A}\pmb{e}_i=\sigma_i^2\pmb{A}\pmb{e}_i,\quad i=1,\cdots,n$$
Thus the $m\times m$ matrix $\pmb{A}\pmb{A}^{\text{T}}$ has nonzero eigenvalues $\sigma_1^2,\cdots,\sigma_r^2$, with corresponding eigenvectors $\pmb{A}\pmb{e}_i,\ i=1,\cdots,r$.
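The shared nonzero spectrum of $\pmb{A}\pmb{A}^{\text{T}}$ and $\pmb{A}^{\text{T}}\pmb{A}$ can be confirmed directly (same assumed example matrix):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])

lam_left  = np.linalg.eigvalsh(A @ A.T)   # eigenvalues of the m x m matrix
lam_right = np.linalg.eigvalsh(A.T @ A)   # eigenvalues of the n x n matrix

def nonzero(lam, tol=1e-10):
    # keep only the numerically nonzero eigenvalues, sorted
    return np.sort(lam[lam > tol])

# Both Gram matrices share the nonzero eigenvalues sigma_1^2, ..., sigma_r^2.
print(np.allclose(nonzero(lam_left), nonzero(lam_right)))   # True
```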

Let $\pmb{u}_i=\dfrac{\pmb{A}\pmb{e}_i}{\sigma_i},\ i=1,\cdots,r$.

For $1\le i,j\le r$:

$$\pmb{u}_i^{\text{T}}\pmb{u}_j=\left(\frac{\pmb{A}\pmb{e}_i}{\sigma_i}\right)^{\text{T}}\left(\frac{\pmb{A}\pmb{e}_j}{\sigma_j}\right)=\frac{\pmb{e}_i^{\text{T}}\pmb{A}^{\text{T}}\pmb{A}\pmb{e}_j}{\sigma_i\sigma_j}=\begin{cases}1&(i=j)\\0&(i\ne j)\end{cases}$$
{ u 1 , ⋯   , u r } \{\pmb{u}_1,\cdots,\pmb{u}_r\} {u1,,ur} 构成了 A \pmb{A} A 的列空间 C ( A ) C(\pmb{A}) C(A) 的一组单位正交基。

Since $\pmb{A}\pmb{A}^{\text{T}}$ and $\pmb{A}^{\text{T}}\pmb{A}$ have the same nonzero eigenvalues, $\pmb{A}\pmb{A}^{\text{T}}$ has $m-r$ additional zero eigenvalues.

Using the Gram-Schmidt orthogonalization process (see Section 3.5.1 of Chapter 3 of 《机器学习数学基础》), we obtain an orthonormal basis $\{\pmb{u}_{r+1},\cdots,\pmb{u}_m\}$ of the left null space $N(\pmb{A}^{\text{T}})=N(\pmb{A}\pmb{A}^{\text{T}})$.
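In code, the Gram-Schmidt step can be delegated to a library routine; the sketch below swaps in `scipy.linalg.null_space`, which returns an orthonormal null-space basis, rather than implementing the book's construction by hand (same assumed example matrix):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])

# Orthonormal basis of the left null space N(A^T); here m - r = 4 - 2 = 2.
U_left = null_space(A.T)

print(U_left.shape)                                # (4, 2)
print(np.allclose(U_left.T @ U_left, np.eye(2)))   # True: orthonormal
print(np.allclose(A.T @ U_left, 0.0))              # True: columns lie in N(A^T)
```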

Since $N(\pmb{A}^{\text{T}})=C(\pmb{A})^{\bot}$ (Theorem 2), $\{\pmb{u}_1,\cdots,\pmb{u}_m\}$ is an orthonormal basis of $\mathbb{R}^m$.

Putting everything together:

For an $m\times n$ matrix $\pmb{A}$ of rank $r$:

• $r=\text{rank}\pmb{A}=\text{rank}\pmb{A}^{\text{T}}=\text{rank}(\pmb{A}^{\text{T}}\pmb{A})=\text{rank}(\pmb{A}\pmb{A}^{\text{T}})$
• $C(\pmb{A}^{\text{T}})=C(\pmb{A}^{\text{T}}\pmb{A}),\quad C(\pmb{A})=C(\pmb{A}\pmb{A}^{\text{T}})$
• $N(\pmb{A})=N(\pmb{A}^{\text{T}}\pmb{A}),\quad N(\pmb{A}^{\text{T}})=N(\pmb{A}\pmb{A}^{\text{T}})$
• The eigenvalues of $\pmb{A}^{\text{T}}\pmb{A}$ are $\sigma_1^2,\cdots,\sigma_n^2$, with corresponding orthonormal eigenvectors $\pmb{e}_1,\cdots,\pmb{e}_n$.
• The eigenvalues of $\pmb{A}\pmb{A}^{\text{T}}$ are $\sigma_1^2,\cdots,\sigma_m^2$, with corresponding orthonormal eigenvectors $\pmb{u}_1,\cdots,\pmb{u}_m$.
• $\pmb{A}\pmb{e}_i=\sigma_i\pmb{u}_i$ with $\sigma_i\gt0$ for $i=1,\cdots,r$, and $\pmb{A}\pmb{e}_i=\pmb{0}$ for $i=r+1,\cdots,n$.
• $\pmb{A}^{\text{T}}\pmb{u}_j=\sigma_j\pmb{e}_j$ with $\sigma_j\gt0$ for $j=1,\cdots,r$, and $\pmb{A}^{\text{T}}\pmb{u}_j=\pmb{0}$ for $j=r+1,\cdots,m$ (the last two relations are checked numerically in the sketch after this list).
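The last two relations can be verified with a full SVD, where the columns of $\pmb{V}$ play the role of the $\pmb{e}_i$ (assumed example matrix as before):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])

# Full SVD: U is m x m, Vh is n x n; s holds min(m, n) singular values,
# of which only the first r are (numerically) nonzero.
U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
V = Vh.T

print(np.allclose(A @ V[:, :r], U[:, :r] * s[:r]))     # A e_i = sigma_i u_i
print(np.allclose(A.T @ U[:, :r], V[:, :r] * s[:r]))   # A^T u_j = sigma_j e_j
```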

Orthonormal bases of the fundamental subspaces

• Row space $C(\pmb{A}^{\text{T}})$: basis $\{\pmb{e}_1,\cdots,\pmb{e}_r\}$, with $\dim C(\pmb{A}^{\text{T}})=r$
• Null space $N(\pmb{A})$: basis $\{\pmb{e}_{r+1},\cdots,\pmb{e}_n\}$, with $\dim N(\pmb{A})=n-r$
• Column space $C(\pmb{A})$: basis $\{\pmb{u}_1,\cdots,\pmb{u}_r\}$, with $\dim C(\pmb{A})=r$
• Left null space $N(\pmb{A}^{\text{T}})$: basis $\{\pmb{u}_{r+1},\cdots,\pmb{u}_m\}$, with $\dim N(\pmb{A}^{\text{T}})=m-r$ (all four bases can be read off a full SVD, as the sketch below shows)
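A minimal sketch extracting all four bases from one full SVD (assumed example matrix, illustrative tolerance):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])
m, n = A.shape

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
V = Vh.T

row_basis   = V[:, :r]   # C(A^T): first r right singular vectors
null_basis  = V[:, r:]   # N(A):   remaining n - r right singular vectors
col_basis   = U[:, :r]   # C(A):   first r left singular vectors
lnull_basis = U[:, r:]   # N(A^T): remaining m - r left singular vectors

print(row_basis.shape, null_basis.shape)     # (3, 2) (3, 1)
print(col_basis.shape, lnull_basis.shape)    # (4, 2) (4, 2)
print(np.allclose(A @ null_basis, 0.0))      # True: columns lie in N(A)
print(np.allclose(A.T @ lnull_basis, 0.0))   # True: columns lie in N(A^T)
```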

Theorem 4: Singular value decomposition

This theorem is covered in detail in Section 3.5.3 of Chapter 3 of 《机器学习数学基础》 and is not repeated here.

Terminology comparison

| Linear transformation $\pmb{T}:\mathbb{V}\to\mathbb{W}$ | $m\times n$ matrix $\pmb{A}:\mathbb{R}^n\to\mathbb{R}^m$ |
| --- | --- |
| Range: $R(\pmb{T})=\{\pmb{T}(\pmb{x})\mid\pmb{x}\in\mathbb{V}\}\subseteq\mathbb{W}$ | Column space: $C(\pmb{A})=\{\pmb{A}\pmb{x}\mid\pmb{x}\in\mathbb{R}^n\}\subseteq\mathbb{R}^m$ |
| Kernel: $\ker(\pmb{T})=\{\pmb{x}\in\mathbb{V}\mid\pmb{T}(\pmb{x})=\pmb{0}\}\subseteq\mathbb{V}$ | Null space: $N(\pmb{A})=\{\pmb{x}\in\mathbb{R}^n\mid\pmb{A}\pmb{x}=\pmb{0}\}\subseteq\mathbb{R}^n$ |
| Rank: $\text{rank}\pmb{T}=\dim R(\pmb{T})$ | Rank: $\text{rank}\pmb{A}=\dim C(\pmb{A})$ |
| Nullity: $\text{nullity}\,\pmb{T}=\dim\ker(\pmb{T})$ | Nullity: $\text{nullity}\,\pmb{A}=\dim N(\pmb{A})$ |
| Surjective: $R(\pmb{T})=\mathbb{W}$, i.e. $\text{rank}\pmb{T}=\dim\mathbb{W}$ | Full row rank: $C(\pmb{A})=\mathbb{R}^m$, i.e. $\text{rank}\pmb{A}=m$ |
| Injective: $\ker(\pmb{T})=\{\pmb{0}\}$, i.e. $\text{rank}\pmb{T}=\dim\mathbb{V}$ | Full column rank: $N(\pmb{A})=\{\pmb{0}\}$, i.e. $\text{rank}\pmb{A}=n$ |
| Isomorphism: $\text{rank}\pmb{T}=\dim\mathbb{W}=\dim\mathbb{V}$ | Full rank: $\text{rank}\pmb{A}=m=n$ |
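The matrix-side conditions in the table reduce to simple rank checks; a small sketch (the matrices are made-up examples):

```python
import numpy as np

# Full column rank (the map x -> Ax is injective): N(A) = {0}.
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
print(np.linalg.matrix_rank(A) == A.shape[1])   # True: rank A = n = 2

# Not full row rank (not surjective): C(A) is a proper subspace of R^3.
print(np.linalg.matrix_rank(A) == A.shape[0])   # False: rank A = 2 < m = 3

# A square invertible matrix has full rank: an isomorphism R^2 -> R^2.
B = np.array([[2., 1.],
              [1., 1.]])
print(np.linalg.matrix_rank(B) == B.shape[0] == B.shape[1])   # True
```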