1. Computation Graph and Parameter Setup
1.1 Computation Graph
Correction on 2022.10.17:
Following other references, $W_{12}$ and $W_{21}$ in the figure should swap positions: the second subscript $n$ uniformly indicates the connection coming from node $X_n$. Only then is the notation consistent with the matrix $W_0$ operations later on.
Thanks to the friend in the comments for pointing this out!
- Bias terms are omitted for simplicity.
- The hidden layer is fully connected to the input; its pre-activations $h$ pass through a sigmoid layer, giving the hidden outputs denoted $Z$; the final output $\hat{Y}$ is obtained by applying one more sigmoid activation.
- The loss function is MSE.
- The derivative of $\sigma()$ is $\sigma^{\prime}(x)=\sigma(x)(1-\sigma(x))$ (see the quick check below).
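A minimal check of this identity (my own addition, not from the original post), comparing the analytic derivative against a central finite difference:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x0 = 0.755                                             # arbitrary test point
eps = 1e-6
analytic = sigmoid(x0) * (1 - sigmoid(x0))             # sigma'(x) = sigma(x)(1 - sigma(x))
numeric = (sigmoid(x0 + eps) - sigmoid(x0 - eps)) / (2 * eps)
print(analytic, numeric)                               # both are roughly 0.2175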
1.2 Parameter Setup
$X=[0.35,0.9]^T$
$y_{true}=0.5$
$W_0=\left[\begin{array}{ll}w_{11} & w_{12} \\ w_{21} & w_{22}\end{array}\right]=\left[\begin{array}{ll}0.1 & 0.8 \\ 0.4 & 0.6\end{array}\right]$
$W_1 = [w_{h1},w_{h2}]=[0.3,0.9]$
Learning rate $\alpha=0.01$ (see the NumPy setup right below).
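The same setup written in NumPy (a small sketch of my own; the variable names are assumptions, not from the original post), reused by the snippets in the following sections:

import numpy as np

X = np.array([0.35, 0.9])                  # input vector
y_true = 0.5                               # target value
W0 = np.array([[0.1, 0.8],
               [0.4, 0.6]])                # input -> hidden weights
W1 = np.array([0.3, 0.9])                  # hidden -> output weights
alpha = 0.01                               # learning rate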
2. Forward Propagation
- $h = W_0\cdot X=[0.755,0.68]^T$
- $Z = \sigma(h)=[0.680,0.664]^T$
- $Z_y=W_1\cdot Z=0.8014$
- $\hat{Y}=\sigma(Z_y)=0.6903$
- $L = \frac{1}{2}(Y_{true}-\hat{Y})^2 =\frac{1}{2}(0.5-0.6903)^2 =0.0181$ (reproduced in the snippet below)
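Continuing from the setup snippet above, a minimal sketch that reproduces these forward-pass numbers:

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

h = W0 @ X                             # [0.755, 0.68]
Z = sigmoid(h)                         # approx. [0.680, 0.664]
Z_y = W1 @ Z                           # approx. 0.8014
y_hat = sigmoid(Z_y)                   # approx. 0.6903
L = 0.5 * (y_true - y_hat) ** 2        # approx. 0.0181
print(h, Z, Z_y, y_hat, L)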
The core idea of neural network training:
Loss $\rightarrow$ adjust parameters such as $W_0, W_1$ $\rightarrow$ update them along the negative gradient using gradient descent (GD) or similar methods.
3. Backpropagation (BP)
Step 1: to obtain $\frac{\partial L}{\partial W_{h1}}$, trace back along the computation path:
$$\left\{\begin{array}{l}L=\frac{1}{2}\left(y-\hat{y}\right)^2 \\ \hat{y}=\sigma(Z_y) \\ Z_y=W_{h1}Z_1+W_{h2} Z_2\end{array}\right.$$
Applying the chain rule:
$$\begin{array}{l} \frac{\partial L}{\partial W_{h1}}=\frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_y} \cdot \frac{\partial Z_y}{\partial W_{h1}} \\ =(\hat{Y}-Y_{true}) \cdot \sigma(Z_y)(1-\sigma(Z_y)) \cdot Z_1 \\ =(0.6903-0.5) \times 0.69 \times (1-0.69) \times 0.68 \\ =0.02768 \end{array}$$
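The same product can be evaluated directly with the forward-pass variables from the sketch above (the variable names are my own):

dL_dyhat = y_hat - y_true                  # derivative of 0.5*(y_true - y_hat)^2 w.r.t. y_hat
dyhat_dZy = y_hat * (1 - y_hat)            # sigmoid derivative evaluated at Z_y
grad_Wh1 = dL_dyhat * dyhat_dZy * Z[0]     # dZ_y/dW_h1 = Z_1
print(grad_Wh1)                            # roughly 0.0277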
Step 2: continue tracing back. To obtain $\frac{\partial L}{\partial W_{11}}$, the path gives:
$$\left\{\begin{array}{l}\cdots \\ Z_1=\sigma(h_1) \\ h_1=W_{11}X_1+W_{12} X_2\end{array}\right.$$
Applying the chain rule again:
$$\begin{array}{l} \frac{\partial L}{\partial W_{11}}=\frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_y} \cdot \frac{\partial Z_y}{\partial Z_1}\cdot\frac{\partial Z_1}{\partial h_1} \cdot\frac{\partial h_1}{\partial W_{11}} \\ =(\hat{Y}-Y_{true}) \cdot \sigma(Z_y)(1-\sigma(Z_y)) \cdot W_{h1}\cdot Z_1(1-Z_1)\cdot X_1 \\ =(0.6903-0.5) \times 0.69 \times (1-0.69) \times 0.3 \times 0.68 \times (1-0.68) \times 0.35\\ =0.00093 \end{array}$$
Similarly,
$$\nabla L_{W_0}=\left[\begin{array}{ll}\frac{\partial L}{\partial W_{11}} & \frac{\partial L}{\partial W_{12}} \\ \frac{\partial L}{\partial W_{21}} & \frac{\partial L}{\partial W_{22}}\end{array}\right]=\left[\begin{array}{ll}0.00093 & 0.002392 \\ 0.002861 & 0.00736\end{array}\right]$$
$$\nabla L_{W_1}=\left[\frac{\partial L}{\partial W_{h1}} , \frac{\partial L}{\partial W_{h2}}\right]=[0.02768,0.02703]$$
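Both gradients can be computed compactly with the variables from the earlier sketches (an illustration of my own; it follows the corrected convention where the second subscript indexes the input node):

delta_y = (y_hat - y_true) * y_hat * (1 - y_hat)   # error signal at the output node
grad_W1 = delta_y * Z                              # roughly [0.0277, 0.0270]
delta_h = delta_y * W1 * Z * (1 - Z)               # error signal at each hidden node
grad_W0 = np.outer(delta_h, X)                     # roughly [[0.00093, 0.00239], [0.00286, 0.00736]]
print(grad_W1)
print(grad_W0)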
So, combining this with gradient descent, e.g. $W_{11}^{\prime}=W_{11}-\alpha\cdot \frac{\partial L}{\partial W_{11}} = 0.1-0.01\times 0.00093 = 0.099991$, the updated weight matrices are
$$W_0^{\prime}=W_0-\alpha\nabla L_{W_0}=\left[\begin{array}{ll}0.099991 & 0.799976 \\ 0.399971 & 0.599926\end{array}\right]$$
$$W_1^{\prime}=W_1-\alpha\nabla L_{W_1}=[0.299723,0.899730]$$
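Continuing the sketch, the update step itself is a single line per weight matrix:

W0_new = W0 - alpha * grad_W0      # roughly [[0.099991, 0.799976], [0.399971, 0.599926]]
W1_new = W1 - alpha * grad_W1      # roughly [0.299723, 0.899730]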
4. Hand-Coding It with NumPy
import numpy as np
import matplotlib.pyplot as plt

def sigmoid_derive(x, derive=False):
    # Returns sigmoid(x); if derive=True, x is assumed to already be a sigmoid
    # output and the derivative x*(1-x) is returned instead.
    if derive:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0.35], [0.9]])              # input, shape (2, 1)
y = np.array([[0.5]])                      # ground truth
epochs = 200
W0 = np.array([[0.1, 0.8], [0.4, 0.6]])    # input -> hidden weights
W1 = np.array([[0.3, 0.9]])                # hidden -> output weights
print("original:\n", "W0:\n", W0, "\n W1:\n", W1)

loss = []
for epoch in range(epochs):
    print("In the process of %sth epoch" % (epoch + 1))
    l0 = X
    l1 = sigmoid_derive(np.dot(W0, l0))    # hidden-layer outputs, shape (2, 1)
    l2 = sigmoid_derive(np.dot(W1, l1))    # network output, shape (1, 1)
    l2_error = y - l2                      # negative of dL/dy_hat
    Loss = 0.5 * l2_error ** 2             # MSE loss
    loss.append(Loss[0][0])
    print("The Current Loss:", Loss)
    # Backprop; because l2_error = y - l2, the "+=" updates below descend the
    # gradient, with an implicit learning rate of 1 (not the 0.01 used above).
    l2_delta = l2_error * sigmoid_derive(l2, derive=True)   # (1, 1)
    l1_error = np.dot(W1.T, l2_delta)                        # (2, 1)
    l1_delta = l1_error * sigmoid_derive(l1, derive=True)    # (2, 1)
    W1 += l2_delta * l1.T                                    # (1, 2) update
    W0 += np.dot(l1_delta, l0.T)                             # (2, 2) update
    print("After BackProp:\n", "W0:\n", W0, "\n W1:\n", W1)
    print('=========================================')

plt.plot(loss)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Decreasing')
plt.show()
Output:
In the process of 1th epoch
The Current Loss: [[0.21810748]]
After BackProp:
W0:
[[0.09661944 0.78985831]
[0.39661944 0.58985831]]
W1:
[[0.27232597 0.87299836]]
=========================================
In the process of 2th epoch
The Current Loss: [[0.21319219]]
After BackProp:
W0:
[[0.09363393 0.78028763]
[0.39363393 0.58028763]]
W1:
[[0.2455836 0.84691021]]
=========================================
In the process of 3th epoch
The Current Loss: [[0.20830283]]
After BackProp:
W0:
[[0.09102066 0.7712756 ]
[0.39102066 0.5712756 ]]
W1:
[[0.21978966 0.82175133]]
Training loss curve:
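As an optional sanity check on the hand-rolled backprop (my own addition, reusing sigmoid_derive, X, y, W0 and W1 from the script above), the analytic gradient for a single weight can be compared against a central finite difference:

def loss_at(W0_, W1_):
    # forward pass and MSE loss for the given weights
    l1_ = sigmoid_derive(np.dot(W0_, X))
    l2_ = sigmoid_derive(np.dot(W1_, l1_))
    return (0.5 * (y - l2_) ** 2).item()

# analytic gradient of the loss w.r.t. W0[0, 0]
l1 = sigmoid_derive(np.dot(W0, X))
l2 = sigmoid_derive(np.dot(W1, l1))
delta2 = (l2 - y) * l2 * (1 - l2)            # error signal at the output, shape (1, 1)
delta1 = (W1.T * delta2) * l1 * (1 - l1)     # error signal at the hidden layer, shape (2, 1)
analytic = delta1[0, 0] * X[0, 0]

# numeric gradient via central finite difference
eps = 1e-6
W0p, W0m = W0.copy(), W0.copy()
W0p[0, 0] += eps
W0m[0, 0] -= eps
numeric = (loss_at(W0p, W1) - loss_at(W0m, W1)) / (2 * eps)
print(analytic, numeric)                     # the two values should agree closely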