机器学习笔记之降维(二)样本均值与样本方差的矩阵表示

发布于:2022-11-28 ⋅ 阅读:(281) ⋅ 点赞:(0)

引言

上一节介绍了维数灾难,随着维数增加对计算量样本稀疏性造成的影响。本节将介绍降维的预备知识——对样本均值和样本方差进行矩阵表示

场景介绍

已知数据集合 X \mathcal X X N N N个样本构成:
X = { x ( 1 ) , x ( 2 ) , ⋯   , x ( N ) } \mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\} X={x(1),x(2),,x(N)}
任意样本 x ( i ) ( i = 1 , 2 , ⋯   , N ) x^{(i)}(i=1,2,\cdots,N) x(i)(i=1,2,,N)均包含 p p p维特征:
x ( i ) = ( x 1 ( i ) x 2 ( i ) ⋮ x p ( i ) ) x ( i ) ∈ R p , i = 1 , 2 , ⋯   , N x^{(i)} = \begin{pmatrix}x_1^{(i)} \\ x_2^{(i)} \\ \vdots \\ x_p^{(i)}\end{pmatrix} \quad x^{(i)} \in \mathbb R^p,i=1,2,\cdots,N x(i)= x1(i)x2(i)xp(i) x(i)Rp,i=1,2,,N
至此,数据集合 X \mathcal X X可表示为 N × p N \times p N×p的矩阵形式:
X = ( x ( 1 ) , x ( 2 ) , ⋯   , x ( N ) ) T = ( x 1 ( 1 ) , x 2 ( 1 ) , ⋯   , x p ( 1 ) x 1 ( 2 ) , x 2 ( 2 ) , ⋯   , x p ( 2 ) ⋮ x 1 ( N ) , x 2 ( N ) , ⋯   , x p ( N ) ) N × p \mathcal X = \begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix}^{T} = \begin{pmatrix} x_1^{(1)},x_2^{(1)},\cdots,x_p^{(1)} \\ x_1^{(2)},x_2^{(2)},\cdots,x_p^{(2)} \\ \vdots \\ x_1^{(N)},x_2^{(N)},\cdots,x_p^{(N)}\end{pmatrix}_{N \times p} X=(x(1),x(2),,x(N))T= x1(1),x2(1),,xp(1)x1(2),x2(2),,xp(2)x1(N),x2(N),,xp(N) N×p

样本均值与样本方差

假设数据集合 X \mathcal X X一维数据集合

  • 样本均值(Sample Mean) X ˉ \bar {\mathcal X} Xˉ表示具体如下:
    X ˉ = 1 N ∑ i = 1 N x ( i ) \bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} Xˉ=N1i=1Nx(i)
  • 样本方差(Sample Convariance) S \mathcal S S具体表示如下:
    S = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) 2 \mathcal S = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar{\mathcal X})^2 S=N1i=1N(x(i)Xˉ)2

对应地,如果数据集合 X \mathcal X X中的样本 x ( i ) ( i = 1 , 2 , ⋯   , N ) x^{(i)}(i=1,2,\cdots,N) x(i)(i=1,2,,N) p p p维向量。这样一个集合,它的 样本均值样本方差如何表示呢

  • 样本均值
    X ˉ = 1 N ∑ i = 1 N x ( i ) \bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} Xˉ=N1i=1Nx(i)

  • 样本方差
    S = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) ( x ( i ) − X ˉ ) T \mathcal S = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})(x^{(i)} - \bar {\mathcal X})^{T} S=N1i=1N(x(i)Xˉ)(x(i)Xˉ)T

由于 x ( i ) x^{(i)} x(i)是一个 p p p维向量,因而样本均值 X ˉ \bar {\mathcal X} Xˉ同样是一个 p p p维向量形式
X ˉ = ( 1 N ∑ i = 1 N x 1 ( i ) 1 N ∑ i = 1 N x 2 ( i ) ⋮ 1 N ∑ i = 1 N x p ( i ) ) p × 1 \bar {\mathcal X} = \begin{pmatrix} \frac{1}{N} \sum_{i=1}^{N} x_1^{(i)} \\ \frac{1}{N} \sum_{i=1}^{N} x_2^{(i)} \\ \vdots \\ \frac{1}{N} \sum_{i=1}^{N} x_p^{(i)}\end{pmatrix}_{p \times 1} Xˉ= N1i=1Nx1(i)N1i=1Nx2(i)N1i=1Nxp(i) p×1

对应的样本方差是一个 p × p p\times p p×p的方阵形式:
S p × p = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) p × 1 ⋅ ( x ( i ) − X ˉ ) 1 × p T \mathcal S_{p \times p} = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})_{p \times 1} \cdot (x^{(i)} - \bar {\mathcal X})_{1 \times p}^{T} Sp×p=N1i=1N(x(i)Xˉ)p×1(x(i)Xˉ)1×pT

样本均值与样本方差的矩阵表示

样本均值的矩阵表达

样本均值公式展开,写成如下向量乘法形式
X ˉ = 1 N ∑ i = 1 N x ( i ) = 1 N ( x ( 1 ) , x ( 2 ) , ⋯   , x ( N ) ) p × N ⋅ ( 1 1 ⋮ 1 ) N × 1 \bar {\mathcal X} = \frac{1}{N} \sum_{i=1}^N x^{(i)} = \frac{1}{N}\begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix}_{p \times N} \cdot \begin{pmatrix}1 \\ 1\\ \vdots \\1\end{pmatrix}_{N \times 1} Xˉ=N1i=1Nx(i)=N1(x(1),x(2),,x(N))p×N 111 N×1
定义如下符号:
元素均为1的 N N N维列向量。
( 1 1 ⋮ 1 ) N × 1 = I N \begin{pmatrix}1 \\ 1\\ \vdots \\1\end{pmatrix}_{N \times 1} = \mathcal I_N 111 N×1=IN
样本均值 X ˉ \bar {\mathcal X} Xˉ可表示为如下形式:
X ˉ = 1 N ( x ( 1 ) , x ( 2 ) , ⋯   , x ( N ) ) ⋅ I N = 1 N X T ⋅ I N \begin{aligned} \bar{\mathcal X} & = \frac{1}{N} \begin{pmatrix}x^{(1)},x^{(2)},\cdots,x^{(N)}\end{pmatrix} \cdot \mathcal I_N \\ & = \frac{1}{N} \mathcal X^{T} \cdot \mathcal I_N \end{aligned} Xˉ=N1(x(1),x(2),,x(N))IN=N1XTIN

样本方差的矩阵表达

样本均值,将样本方差公式进行展开
S = 1 N ∑ i = 1 N ( x ( i ) − X ˉ ) ( x ( i ) − X ˉ ) T = 1 N [ ( x ( 1 ) − X ˉ ) ( x ( 1 ) − X ˉ ) T + ⋯ + ( x ( N ) − X ˉ ) ( x ( N ) − X ˉ ) T ] \begin{aligned} \mathcal S & = \frac{1}{N} \sum_{i=1}^N (x^{(i)} - \bar {\mathcal X})(x^{(i)} - \bar {\mathcal X})^{T} \\ & = \frac{1}{N} \left[ (x^{(1)} - \bar{\mathcal X})(x^{(1)} - \bar{\mathcal X})^{T} + \cdots + (x^{(N)} - \bar{\mathcal X})(x^{(N)} - \bar{\mathcal X})^{T}\right] \end{aligned} S=N1i=1N(x(i)Xˉ)(x(i)Xˉ)T=N1[(x(1)Xˉ)(x(1)Xˉ)T++(x(N)Xˉ)(x(N)Xˉ)T]
将上述中括号中的元素表示为向量乘积的形式:
S = 1 N ( x ( 1 ) − X ˉ , ⋯   , x ( N ) − X ˉ ) ⋅ [ ( x ( 1 ) − X ˉ ) T ( x ( 2 ) − X ˉ ) T ⋮ ( x ( N ) − X ˉ ) T ] \mathcal S = \frac{1}{N} \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) \cdot \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix} S=N1(x(1)Xˉ,,x(N)Xˉ) (x(1)Xˉ)T(x(2)Xˉ)T(x(N)Xˉ)T
其中, ( x ( 1 ) − X ˉ , ⋯   , x ( N ) − X ˉ ) \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) (x(1)Xˉ,,x(N)Xˉ)可拆成两个行向量的差值
( x ( 1 ) − X ˉ , ⋯   , x ( N ) − X ˉ ) = ( x ( 1 ) , ⋯   , x ( N ) ) − ( X ˉ , ⋯   , X ˉ ) = X T − X ˉ ⋅ ( 1 , 1 , ⋯   , 1 ) 1 × N = X T − X ˉ ⋅ I N T \begin{aligned} \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) & = \left(x^{(1)}, \cdots ,x^{(N)}\right) - \left(\bar {\mathcal X},\cdots, \bar {\mathcal X}\right) \\ & = \mathcal X^{T} - \bar {\mathcal X} \cdot (1,1, \cdots,1)_{1 \times N} \\ & = \mathcal X^{T} - \bar {\mathcal X} \cdot \mathcal I_N^{T} \end{aligned} (x(1)Xˉ,,x(N)Xˉ)=(x(1),,x(N))(Xˉ,,Xˉ)=XTXˉ(1,1,,1)1×N=XTXˉINT
样本均值 X ˉ \bar {\mathcal X} Xˉ的矩阵表示结果代入上式,并将 X T \mathcal X^{T} XT提出来:
注意该位置提取公因式的时候, X T \mathcal X^{T} XT对应的公因式结果是 E N \mathcal E_N EN而不是单纯的1。其中 E N \mathcal E_N EN是N维单位矩阵。即: X T E N = X T \mathcal X^{T}\mathcal E_N = \mathcal X^{T} XTEN=XT

X T − 1 N X T ⋅ I N ⋅ I N T = X T ( E N − 1 N I N I N T ) \mathcal X^{T} - \frac{1}{N} \mathcal X^{T}\cdot \mathcal I_N \cdot \mathcal I_N^{T} = \mathcal X^{T} \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right) XTN1XTININT=XT(ENN1ININT)

由于 [ ( x ( 1 ) − X ˉ ) T ( x ( 2 ) − X ˉ ) T ⋮ ( x ( N ) − X ˉ ) T ] \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix} (x(1)Xˉ)T(x(2)Xˉ)T(x(N)Xˉ)T ( x ( 1 ) − X ˉ , ⋯   , x ( N ) − X ˉ ) \left(x^{(1)} - \bar {\mathcal X} , \cdots, x^{(N)} - \bar{\mathcal X}\right) (x(1)Xˉ,,x(N)Xˉ)转置形式,则有:
[ ( x ( 1 ) − X ˉ ) T ( x ( 2 ) − X ˉ ) T ⋮ ( x ( N ) − X ˉ ) T ] = [ X T ( E N − 1 N I N I N T ) ] T = ( E N − 1 N I N I N T ) T ⋅ X \begin{bmatrix} \left(x^{(1)} - \bar {\mathcal X}\right)^{T} \\ \left(x^{(2)} - \bar {\mathcal X}\right)^{T} \\ \vdots \\ \left(x^{(N)} - \bar {\mathcal X}\right)^{T}\end{bmatrix} = \left[\mathcal X^{T} \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right)\right]^{T} = \left(\mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T}\right)^{T} \cdot \mathcal X (x(1)Xˉ)T(x(2)Xˉ)T(x(N)Xˉ)T =[XT(ENN1ININT)]T=(ENN1ININT)TX

至此,样本方差 S \mathcal S S的矩阵结果表示如下:
S = 1 N X T ( E N − 1 N I N I N T ) ⋅ ( E N − 1 N I N I N T ) T ⋅ X \mathcal S = \frac{1}{N} \mathcal X^{T} (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T}) \cdot (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})^{T} \cdot \mathcal X S=N1XT(ENN1ININT)(ENN1ININT)TX

我们将 E N − 1 N I N I N T \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} ENN1ININT称作中心矩阵(Centering Matrix),用 H N \mathcal H_N HN表示。至此,样本方差 S \mathcal S S的矩阵表示可化简为:
S = 1 N X T H ⋅ H T X \mathcal S = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal H^{T} \mathcal X S=N1XTHHTX

中心矩阵的性质

中心矩阵的物理意义相当于对样本空间中的样本点进行平移,平移至样本空间的原点位置

示例:
已知 K \mathcal K K是由两个2维样本组成的样本集合,具体表示如下:
K = ( 1 2 3 4 ) \mathcal K = \begin{pmatrix} 1 \quad 2 \\ 3 \quad 4 \end{pmatrix} K=(1234)
此时,数据集合中仅包含两个样本: A : ( 1 , 3 ) \mathcal A:(1,3) A:(1,3) B : ( 2 , 4 ) \mathcal B:(2,4) B:(2,4)。其对应的中心矩阵 H 2 \mathcal H_2 H2表示如下:
H 2 = E 2 − 1 2 I 2 ⋅ I 2 T = ( 1 0 0 1 ) − 1 2 ( 1 1 1 1 ) = ( 1 2 − 1 2 − 1 2 1 2 ) \begin{aligned} \mathcal H_2 & = \mathcal E_2 - \frac{1}{2} \mathcal I_2 \cdot \mathcal I_2^{T} \\ & = \begin{pmatrix}1 \quad 0 \\ 0 \quad 1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1 \quad 1 \\ 1 \quad 1\end{pmatrix} \\ & = \begin{pmatrix}\frac{1}{2} \quad -\frac{1}{2} \\ -\frac{1}{2} \quad \frac{1}{2}\end{pmatrix} \end{aligned} H2=E221I2I2T=(1001)21(1111)=(21212121)
此时,样本 K \mathcal K K左乘中心矩阵 H 2 \mathcal H_2 H2,具体结果如下:
K ⋅ H 2 = ( − 1 2 1 2 − 1 2 1 2 ) \mathcal K \cdot \mathcal H_2 = \begin{pmatrix}-\frac{1}{2} \quad \frac{1}{2} \\ -\frac{1}{2} \quad \frac{1}{2}\end{pmatrix} KH2=(21212121)
此时得到两个新的样本点 A ′ : ( − 1 2 , − 1 2 ) \mathcal A':(-\frac{1}{2},-\frac{1}{2}) A:(21,21) B ′ : ( 1 2 , 1 2 ) \mathcal B':(\frac{1}{2},\frac{1}{2}) B:(21,21)原始样本点新样本点在样本空间中的表示如下:
中心矩阵
其中橙色点是原始样本点;蓝色点是左乘中心矩阵后的新样本点。可以观察得到相比于原始样本点,新样本点向原点位置偏移,但样本点之间的相对位置没有变化。

观察中心矩阵 H \mathcal H H
H N × N = E N − 1 N I N I N T \mathcal H_{N \times N} = \mathcal E_N - \frac{1}{N} \mathcal I_N\mathcal I_N^{T} HN×N=ENN1ININT

  • 中心矩阵的转置 H T \mathcal H^{T} HT
    由于 E \mathcal E E是单位矩阵,因此有: E T = E \mathcal E^{T} = \mathcal E ET=E
    H T = ( E N − 1 N I N I N T ) T = ( E N ) T − 1 N ⋅ ( I N T ) T ⋅ I N T = E N − 1 N I N I N T = H \begin{aligned} \mathcal H^{T} & = (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})^{T} \\ & = (\mathcal E_N)^{T} - \frac{1}{N} \cdot (\mathcal I_N^{T})^{T} \cdot \mathcal I_N^{T} \\ & = \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} \\ & = \mathcal H \end{aligned} HT=(ENN1ININT)T=(EN)TN1(INT)TINT=ENN1ININT=H
    由此可知,中心矩阵 H \mathcal H H是一个对称矩阵

  • 中心矩阵的平方 H 2 \mathcal H^2 H2
    H 2 = H ⋅ H = ( E N − 1 N I N I N T ) ⋅ ( E N − 1 N I N I N T ) = E N − 2 N I N I N T + 1 N 2 I N I N T ⋅ I N I N T \begin{aligned} \mathcal H^2 & = \mathcal H \cdot \mathcal H \\ & = (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T})\cdot (\mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T}) \\ & = \mathcal E_N - \frac{2}{N}\mathcal I_N\mathcal I_N^{T} + \frac{1}{N^2} \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} \end{aligned} H2=HH=(ENN1ININT)(ENN1ININT)=ENN2ININT+N21ININTININT
    因为 I N I N T \mathcal I_N\mathcal I_N^{T} ININT它的结果是一个各元素均为1的 N × N N\times N N×N的方阵,因而 I N I N T ⋅ I N I N T \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} ININTININT结果是一个各元素均为 N N N N × N N \times N N×N的方阵。即:
    I N I N T ⋅ I N I N T = N ⋅ I N I N T \mathcal I_N\mathcal I_N^{T}\cdot \mathcal I_N\mathcal I_N^{T} = N \cdot \mathcal I_N\mathcal I_N^{T} ININTININT=NININT
    将该结果带入上式:
    H 2 = E N − 2 N I N I N T + 1 N I N I N T = E N − 1 N I N I N T = H \begin{aligned} \mathcal H^2 & = \mathcal E_N - \frac{2}{N}\mathcal I_N\mathcal I_N^{T} + \frac{1}{N}\mathcal I_N\mathcal I_N^{T} \\ & = \mathcal E_N - \frac{1}{N} \mathcal I_N \mathcal I_N^{T} \\ & = \mathcal H \end{aligned} H2=ENN2ININT+N1ININT=ENN1ININT=H

综上,因而有:
H 2 = H ⋅ H = H T ⋅ H = H ⋅ H T = H \mathcal H^2 = \mathcal H \cdot \mathcal H = \mathcal H^{T} \cdot \mathcal H = \mathcal H \cdot \mathcal H^{T} = \mathcal H H2=HH=HTH=HHT=H

回顾样本方差的矩阵表示,可以继续化简为如下形式:
S = 1 N X T H ⋅ H T X = 1 N X T H ⋅ X \begin{aligned} \mathcal S & = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal H^{T}\mathcal X \\ & = \frac{1}{N} \mathcal X^{T} \mathcal H \cdot \mathcal X \end{aligned} S=N1XTHHTX=N1XTHX

至此,我们介绍了样本均值样本方差的矩阵表达,以及中心矩阵的相关性质。下一节将介绍PCA降维。

相关参考:
机器学习-降维2-样本均值&样本方差的矩阵表示

本文含有隐藏内容,请 开通VIP 后查看

网站公告

今日签到

点亮在社区的每一天
去签到