Required Exercises
Exercise 5-2: Prove that the wide convolution is commutative, i.e., Equation (5.13).
Manual derivation:
First, take a single-channel input $X: 1\times3\times3$ and kernel $W: 1\times2\times2$:

$$X=\left[ \begin{matrix} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ x_{20} & x_{21} & x_{22} \end{matrix} \right],\qquad W=\left[ \begin{matrix} w_{00} & w_{01} \\ w_{10} & w_{11} \end{matrix} \right]$$

The left-hand side of Equation (5.13):

$$\begin{aligned} \mathrm{rot180}(W)\,\tilde{\otimes}\,X &=\left[ \begin{matrix} w_{11} & w_{10} \\ w_{01} & w_{00} \end{matrix} \right] \otimes \left[ \begin{matrix} 0 & 0 & 0 & 0 & 0\\ 0 & x_{00} & x_{01} & x_{02} & 0\\ 0 & x_{10} & x_{11} & x_{12} & 0 \\ 0 & x_{20} & x_{21} & x_{22} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{matrix} \right]\\ &= \left[ \begin{matrix} w_{00}x_{00} & w_{01}x_{00}+w_{00}x_{01} & w_{01}x_{01}+w_{00}x_{02} & w_{01}x_{02}\\ w_{10}x_{00}+w_{00}x_{10} & w_{11}x_{00}+w_{10}x_{01}+w_{01}x_{10}+w_{00}x_{11} & w_{11}x_{01}+w_{10}x_{02}+w_{01}x_{11}+w_{00}x_{12} & w_{11}x_{02}+w_{01}x_{12} \\ w_{10}x_{10}+w_{00}x_{20} & w_{11}x_{10}+w_{10}x_{11}+w_{01}x_{20}+w_{00}x_{21} & w_{11}x_{11}+w_{10}x_{12}+w_{01}x_{21}+w_{00}x_{22} & w_{11}x_{12}+w_{01}x_{22} \\ w_{10}x_{20} & w_{11}x_{20}+w_{10}x_{21} & w_{11}x_{21}+w_{10}x_{22} & w_{11}x_{22} \end{matrix} \right] \end{aligned}$$

The right-hand side:

$$\begin{aligned} \mathrm{rot180}(X)\,\tilde{\otimes}\,W &=\left[ \begin{matrix} x_{22} & x_{21} & x_{20}\\ x_{12} & x_{11} & x_{10}\\ x_{02} & x_{01} & x_{00} \end{matrix} \right] \otimes \left[ \begin{matrix} 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & w_{00} & w_{01} & 0 & 0\\ 0 & 0 & w_{10} & w_{11} & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{matrix} \right]\\ &= \left[ \begin{matrix} w_{00}x_{00} & w_{01}x_{00}+w_{00}x_{01} & w_{01}x_{01}+w_{00}x_{02} & w_{01}x_{02}\\ w_{10}x_{00}+w_{00}x_{10} & w_{11}x_{00}+w_{10}x_{01}+w_{01}x_{10}+w_{00}x_{11} & w_{11}x_{01}+w_{10}x_{02}+w_{01}x_{11}+w_{00}x_{12} & w_{11}x_{02}+w_{01}x_{12} \\ w_{10}x_{10}+w_{00}x_{20} & w_{11}x_{10}+w_{10}x_{11}+w_{01}x_{20}+w_{00}x_{21} & w_{11}x_{11}+w_{10}x_{12}+w_{01}x_{21}+w_{00}x_{22} & w_{11}x_{12}+w_{01}x_{22} \\ w_{10}x_{20} & w_{11}x_{20}+w_{10}x_{21} & w_{11}x_{21}+w_{10}x_{22} & w_{11}x_{22} \end{matrix} \right] \end{aligned}$$
The two results are identical, which proves the commutativity.
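As a quick numeric spot check (a minimal sketch assuming SciPy is available; full-mode cross-correlation plays the role of $\tilde{\otimes}$ with zero padding):

import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))                 # input
W = rng.standard_normal((2, 2))                 # kernel
rot180 = lambda A: np.rot90(A, 2)               # 180-degree rotation

lhs = correlate2d(X, rot180(W), mode='full')    # rot180(W) wide-convolved with X
rhs = correlate2d(W, rot180(X), mode='full')    # rot180(X) wide-convolved with W
print(np.allclose(lhs, rhs))                    # True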
Exercise 5-3: Analyze the role of 1×1 convolution kernels in convolutional neural networks.
Answer: A 1×1 convolution kernel acts like a square pipe with a one-pixel cross-section running through the whole input: each 1×1 kernel produces a fused representation of the features at the same spatial position across all input channels. It can therefore be used to raise or lower the channel dimension, and it adds nonlinearity to the network, improving its fitting capacity.
A 1×1 convolution filter works like any other filter; the only difference is that its size is 1×1, so it does not consider the relationships within a local spatial neighborhood of the previous layer.
Because 3×3 or 5×5 convolutions are very costly on layers with several hundred filters, a 1×1 convolution is often applied first to reduce the dimensionality before the 3×3 or 5×5 convolution.
The main roles of 1×1 convolution are therefore:
1. Dimensionality reduction. For example, a 500×500 input with depth 100 convolved with 20 filters of size 1×1 produces an output of size 500×500×20 (see the sketch after this list).
2. Adding nonlinearity. The convolution layer is followed by an activation layer, so a 1×1 convolution adds a non-linear activation on top of the previous layer's representation, improving the expressive power of the network.
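A minimal NumPy sketch of point 1 (shapes chosen only for illustration): a 1×1 convolution is simply a per-pixel linear map over the channel dimension, so it can be written as a matrix product applied at every spatial position.

import numpy as np

x = np.random.randn(100, 100, 256)   # H x W x C_in feature map
w = np.random.randn(256, 64)         # sixty-four 1x1 kernels, i.e. a 256 -> 64 channel map
y = x @ w                            # the same channel mixing applied at every pixel
print(y.shape)                       # (100, 100, 64): channels reduced, spatial size unchanged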
Exercise 5-4
For a convolutional layer whose input is a 100×100×256 feature-map group and whose output is a 100×100×256 feature-map group, using 3×3 kernels, compute its time and space complexity. If a 1×1 convolution is introduced first to obtain a 100×100×64 feature map, followed by a 3×3 convolution that produces the 100×100×256 feature-map group, compute the time and space complexity of that design.
Answer:
Time complexity 1: 256×100×100×256×3×3 = 5,898,240,000
Space complexity 1: 256×100×100 = 2,560,000
Time complexity 2: 64×100×100×256 + 256×100×100×64×3×3 = 1,638,400,000
Space complexity 2: 64×100×100 + 256×100×100 = 3,200,000
Clearly, the 1×1 kernel greatly reduces the time complexity, at the cost of a slight increase in space complexity (a quick arithmetic check follows).
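A quick check of the arithmetic above:

H, W, C_in, C_out, K, C_mid = 100, 100, 256, 256, 3, 64
t1 = C_out * H * W * C_in * K * K                           # direct 3x3 convolution
s1 = C_out * H * W
t2 = C_mid * H * W * C_in + C_out * H * W * C_mid * K * K   # 1x1 bottleneck, then 3x3
s2 = C_mid * H * W + C_out * H * W
print(t1, s1)   # 5898240000 2560000
print(t2, s2)   # 1638400000 3200000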
Exercise 5-7: Ignoring activation functions, show that the forward computation and the back-propagation (Equation (5.39)) of a convolutional layer form a transpose relation.
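A sketch of the argument (assuming a single channel, stride 1 and no padding): write the convolution as a matrix-vector product. Flattening the input into $\operatorname{vec}(X)$, the forward computation $Z = W \otimes X$ can be written as

$$\operatorname{vec}(Z) = C\,\operatorname{vec}(X),$$

where $C$ is a sparse matrix whose rows contain the entries of $W$ placed at the input positions covered by the corresponding output element. By the chain rule, the error $\delta = \partial \mathcal{L}/\partial \operatorname{vec}(Z)$ propagates back to the input as

$$\frac{\partial \mathcal{L}}{\partial \operatorname{vec}(X)} = C^{\top}\delta,$$

so the backward mapping is the transpose of the forward mapping; written as a convolution, this is the wide convolution with $\mathrm{rot180}(W)$ in Equation (5.39).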



Optional Exercises
Derivation of the CNN back-propagation algorithm
1. Given the error of the pooling layer, derive the error of the previous hidden layer
In the forward pass, the pooling layer applies MAX or Average pooling to its input over a pooling region of known size. Going backward, we must take the error of the smaller, pooled region and restore it to the larger region of the previous layer; this step is called upsample. Suppose the pooling region is 2×2 and the k-th sub-matrix of the layer-$l$ error is $\delta_k^l$; for example:
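For instance (pooling region $2\times 2$, numbers chosen only for illustration), suppose

$$\delta_k^l=\left[ \begin{matrix} 2 & 8 \\ 4 & 6 \end{matrix} \right]$$

For average pooling, each error value is divided by the region size $2\times 2=4$ and spread evenly over its region:

$$\mathrm{upsample}(\delta_k^l)=\left[ \begin{matrix} 0.5 & 0.5 & 2 & 2 \\ 0.5 & 0.5 & 2 & 2 \\ 1 & 1 & 1.5 & 1.5 \\ 1 & 1 & 1.5 & 1.5 \end{matrix} \right]$$

For max pooling, each error value is routed to the position that held the maximum in the forward pass (positions assumed here for illustration), and all other entries are zero:

$$\mathrm{upsample}(\delta_k^l)=\left[ \begin{matrix} 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 8 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 6 & 0 \end{matrix} \right]$$

The error of the previous layer is then $\delta_k^{l-1} = \mathrm{upsample}(\delta_k^l) \odot f'(z_k^{l-1})$.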





2. Given the error of the convolutional layer, derive the error of the previous hidden layer.
The formula is as follows:
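In the notation used above ($\tilde{\otimes}$ for wide convolution, $f$ for the activation), the standard result, which is also what the code further below implements, is

$$\delta^{l-1} = \left(\mathrm{rot180}(W^{l})\,\tilde{\otimes}\,\delta^{l}\right) \odot f'(z^{l-1}),$$

i.e. the error of layer $l$ is zero-padded, convolved with the $180^{\circ}$-rotated kernel, and multiplied element-wise by the derivative of the activation of layer $l-1$.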



3. Given the error of the convolutional layer, derive the gradients of that layer's W and b
With the steps above we have computed the error of every layer, so:
- For a fully connected layer, the gradients of W and b are obtained exactly as in the back-propagation algorithm of an ordinary feed-forward network.
- A pooling layer has no W or b, so there are no gradients to compute for it.
- Only the convolutional layer's W and b remain. First consider W (see the formulas right after this list):
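For a single channel, the standard results (consistent with grad_w and grad_b in the code further below) are

$$\frac{\partial \mathcal{L}}{\partial W^{l}} = \delta^{l} \otimes a^{l-1},\qquad \frac{\partial \mathcal{L}}{\partial b^{l}} = \sum_{i,j}\delta^{l}_{i,j},$$

that is, the weight gradient is the narrow convolution obtained by sliding $\delta^{l}$ over the previous layer's activation $a^{l-1}$, and the bias gradient is the sum of the layer's error over all spatial positions.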



Design a simple CNN model, implement the backward operators of the convolutional layer and the pooling layer in Python with NumPy, and test them with numeric inputs.
Backward pass of the convolutional layer
import numpy as np
import torch.nn as nn


class Conv2D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(Conv2D, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.ksize = kernel_size
        self.stride = stride
        self.padding = padding
        self.weights = np.random.standard_normal((out_channels, in_channels, kernel_size, kernel_size))
        self.bias = np.zeros(out_channels)
        self.grad_w = np.zeros(self.weights.shape)
        self.grad_b = np.zeros(self.bias.shape)

    def forward(self, x):
        self.x = x
        weights = self.weights.reshape(self.out_channels, -1)  # (oc, c*k*k)
        x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding), (self.padding, self.padding)),
                   'constant', constant_values=0)
        b, c, h, w = x.shape
        self.out = np.zeros(
            (b, self.out_channels, (h - self.ksize) // self.stride + 1, (w - self.ksize) // self.stride + 1))
        self.col_img = self.im2col(x, self.ksize, self.stride)  # (b*h_out*w_out, c*k*k)
        # convolution as a matrix product over the unfolded patches, plus a per-channel bias
        out = (np.dot(weights, self.col_img.T) + self.bias[:, None]).reshape(self.out_channels, b, -1).transpose(1, 0, 2)
        self.out = np.reshape(out, self.out.shape)
        return self.out

    def backward(self, grad_out):
        b, c, h, w = self.out.shape
        # gradient w.r.t. the weights: correlate the upstream error with the cached im2col patches
        grad_out_ = grad_out.transpose(1, 0, 2, 3)  # (oc, b, h, w)
        grad_out_flat = np.reshape(grad_out_, [self.out_channels, -1])
        self.grad_w = np.dot(grad_out_flat, self.col_img).reshape(self.grad_w.shape)
        # gradient w.r.t. the bias: sum the upstream error over batch and spatial positions
        self.grad_b = np.sum(grad_out_flat, axis=1)
        # gradient w.r.t. the input: full convolution of the zero-padded upstream error
        # with the 180-degree-rotated kernels
        tmp = self.ksize - self.padding - 1
        grad_out_pad = np.pad(grad_out, ((0, 0), (0, 0), (tmp, tmp), (tmp, tmp)), 'constant', constant_values=0)
        flip_weights = np.flip(self.weights, (2, 3))  # rot180 of every kernel
        flip_weights = flip_weights.swapaxes(0, 1)    # (in, oc, k, k)
        col_flip_weights = flip_weights.reshape([self.in_channels, -1])
        col_grad = self.im2col(grad_out_pad, self.ksize, 1)  # (b*h*w, oc*k*k)
        # (in, oc*k*k) x (oc*k*k, b*h*w) -> (in, b*h*w)
        next_eta = np.dot(col_flip_weights, col_grad.T).reshape(self.in_channels, b, -1).transpose(1, 0, 2)
        next_eta = np.reshape(next_eta, self.x.shape)
        return next_eta

    def zero_grad(self):
        self.grad_w = np.zeros_like(self.grad_w)
        self.grad_b = np.zeros_like(self.grad_b)

    def update(self, lr=1e-3):
        self.weights -= lr * self.grad_w
        self.bias -= lr * self.grad_b

    def im2col(self, x, k_size, stride):
        # unfold every k_size x k_size window (over all channels) into one row
        b, c, h, w = x.shape
        image_col = []
        for n in range(b):
            for i in range(0, h - k_size + 1, stride):
                for j in range(0, w - k_size + 1, stride):
                    col = x[n, :, i:i + k_size, j:j + k_size].reshape(-1)
                    image_col.append(col)
        return np.array(image_col)
class Layers():
    def __init__(self, name):
        self.name = name

    # forward pass
    def forward(self, x):
        pass

    # reset gradients to zero
    def zero_grad(self):
        pass

    # backward pass
    def backward(self, grad_out):
        pass

    # parameter update
    def update(self, lr=1e-3):
        pass
class Module():
    def __init__(self):
        self.layers = []  # all layers, in forward order

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            layer.zero_grad()
            grad = layer.backward(grad)

    def step(self, lr=1e-3):
        for layer in reversed(self.layers):
            layer.update(lr)
# test_conv
if __name__ == '__main__':
    x = np.array([[[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
                   [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]])
    conv = Conv2D(2, 3, 2, 1, 0)
    y = conv.forward(x)
    print(y.shape)
    loss = y - (y + 1)        # a constant upstream gradient dL/dy of -1 everywhere
    grad = conv.backward(loss)
    print(grad.shape)
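A finite-difference spot check can be appended to the test above to verify grad_w numerically (a sketch; the weight index checked is arbitrary):

    # finite-difference check: for L = 0.5 * sum(y**2), dL/dy = y
    eps = 1e-5
    y = conv.forward(x)
    conv.backward(y)                       # analytic gradients for this loss
    idx = (0, 0, 0, 0)                     # check a single weight entry
    w0 = conv.weights[idx]
    conv.weights[idx] = w0 + eps
    loss_plus = 0.5 * np.sum(conv.forward(x) ** 2)
    conv.weights[idx] = w0 - eps
    loss_minus = 0.5 * np.sum(conv.forward(x) ** 2)
    conv.weights[idx] = w0
    print((loss_plus - loss_minus) / (2 * eps), conv.grad_w[idx])  # should match closely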
Backward pass of the pooling layer
import numpy as np
import torch.nn as nn


class MaxPooling(nn.Module):
    def __init__(self, ksize=2, stride=2):
        super(MaxPooling, self).__init__()
        self.ksize = ksize
        self.stride = stride

    def forward(self, x):
        n, c, h, w = x.shape
        out = np.zeros([n, c, h // self.stride, w // self.stride])
        # self.index marks, inside every pooling window, the position of its maximum;
        # the backward pass routes the upstream gradient only to these positions
        self.index = np.zeros_like(x)
        for b in range(n):
            for d in range(c):
                for i in range(h // self.stride):
                    for j in range(w // self.stride):
                        _x = i * self.stride
                        _y = j * self.stride
                        out[b, d, i, j] = np.max(x[b, d, _x:_x + self.ksize, _y:_y + self.ksize])
                        index = np.argmax(x[b, d, _x:_x + self.ksize, _y:_y + self.ksize])
                        self.index[b, d, _x + index // self.ksize, _y + index % self.ksize] = 1
        return out

    def backward(self, grad_out):
        # upsample the gradient to the input size, then keep it only at the max positions
        return np.repeat(np.repeat(grad_out, self.stride, axis=2), self.stride, axis=3) * self.index
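A small numeric test, mirroring the convolution test above (values chosen arbitrarily):

# test_maxpool
if __name__ == '__main__':
    x = np.array([[[[1.0, 2.0, 3.0, 4.0],
                    [5.0, 6.0, 7.0, 8.0],
                    [9.0, 10.0, 11.0, 12.0],
                    [13.0, 14.0, 15.0, 16.0]]]])
    pool = MaxPooling(ksize=2, stride=2)
    y = pool.forward(x)
    print(y)      # [[[[ 6.  8.] [14. 16.]]]]
    grad = pool.backward(np.ones_like(y))
    print(grad)   # ones at the positions of 6, 8, 14, 16; zeros elsewhere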