Summary

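A convolutional layer represents local spatial structure efficiently by reusing the same kernel across the whole image (parameter sharing, which cuts the number of parameters and helps against overfitting). The value a kernel (filter) produces on an image patch can be read as a similarity score: in the binary illustration it counts how many shaded cells of the patch coincide with those of the kernel, and the larger the value, the better the patch matches what the kernel prefers. The matrix of these values is therefore called a feature map. Note that the Hadamard product is not the inner product: the Hadamard product multiplies two matrices A and B of the same shape element by element and yields a matrix of that shape, whereas the inner product sums those elementwise products into a single number.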
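The difference between the two products can be checked in a few lines. This is a minimal sketch; the 3×3 `patch` and `kernel` are made-up values chosen only for illustration.

```python
import torch

# Hypothetical 3x3 image patch and 3x3 kernel, chosen only for illustration.
patch = torch.tensor([[1., 0., 1.],
                      [0., 1., 0.],
                      [1., 0., 1.]])
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]])

hadamard = patch * kernel        # elementwise product, still a 3x3 matrix
inner = (patch * kernel).sum()   # sum of the elementwise products, a scalar

print(hadamard.shape)  # torch.Size([3, 3])
print(inner.item())    # 5.0
```

One output value of a convolution is exactly the second quantity: the sum of the Hadamard product of the kernel with one patch.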
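When the input to a convolutional layer is a three-channel color image, each kernel of the first layer is three-dimensional with shape $3 \times M \times M$, where $M$ is the kernel size. From the second convolutional layer on, the input is the previous layer's feature maps, and the number of feature maps is set by the number of kernels in the previous layer.
- Example: if the previous layer has 8 kernels, it produces 8 feature maps as input to the next layer, so every kernel of the next layer must be three-dimensional with shape $8 \times M \times M$ (how many such kernels the next layer has is set by its own number of output channels).
To define a convolutional layer you specify: the number of input channels, the number of output channels, and the kernel size (height and width). In PyTorch a convolutional layer takes and returns four-dimensional tensors (B, C, H, W), whereas a fully connected layer takes and returns two-dimensional tensors (B, Input_feature).
After a convolution, C (channels) changes, while W (width) and H (height) may or may not change, depending on the padding (and the stride). After subsampling (pooling), C stays the same and W and H change. Padding usually means zero-padding, i.e. filling the border of the input with zeros. To keep the spatial size unchanged at stride 1, the padding is $P = (F - 1)/2$, where $F$ is the (odd) kernel size.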
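The size bookkeeping above follows the standard output-size formula $\lfloor (N + 2P - F)/S \rfloor + 1$. The sketch below writes it out in plain Python; the helper names `conv_output_size` and `same_padding` are illustrative, not part of any library.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width (or height) for input size n, kernel size f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1


def same_padding(f):
    """Padding that keeps the size unchanged at stride 1, for an odd kernel size f."""
    return (f - 1) // 2


print(conv_output_size(100, 3))                   # 98
print(conv_output_size(5, 3, p=same_padding(3)))  # 5
print(conv_output_size(5, 3, s=2))                # 2
```

The three calls correspond to the 100×100 input with a 3×3 kernel, the padded 5×5 example, and the stride-2 example that appear later in these notes.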
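To get m output channels, use m kernels:
1) each kernel must have the same number of channels as the input;
2) the number of kernels equals the number of output channels;
3) the kernel size is your own choice and is independent of the image size; it is usually square with an odd side length (a rectangular kernel also works).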

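0. Quick Review

Difference between classical algorithms and machine learning:
[figure]
Taxonomy from the "flower book" (Deep Learning by Goodfellow, Bengio, and Courville):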

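Feature extraction and the curse of dimensionality: the more features there are, the more samples are needed, but collecting labeled data is expensive. Representation learning is used to reduce the feature dimension.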

The code for a deep learning task has four parts: Dataset, Model, Training, Inference. One possible loss function is the mean squared error (MSE).
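For reference, the MSE over N samples is $\frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$. The sketch below computes it by hand and with `torch.nn.MSELoss`; the prediction and target values are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical predictions and targets, for illustration only.
y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

manual_mse = ((y_pred - y_true) ** 2).mean()  # average of the squared differences
builtin_mse = nn.MSELoss()(y_pred, y_true)    # default reduction is 'mean'

print(manual_mse.item(), builtin_mse.item())  # 0.375 0.375
```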

1. Fully Connected Networks

[figure]

2. Convolutional Neural Networks (CNN)

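In the fully connected network above, the input image is flattened into a vector. As a result, two pixels that are adjacent in the image can end up far apart in the vector, so the original spatial structure is lost. A CNN instead keeps the data in the spatial layout of the image.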
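How far apart neighboring pixels end up can be made concrete. The sketch below assumes a 28×28 image (the MNIST size used later) and row-major flattening, which is what `view(-1)` does on a contiguous tensor; `flat_index` is an illustrative helper.

```python
import torch

height, width = 28, 28  # MNIST-sized image, used as an example
image = torch.arange(height * width).view(height, width)
flat = image.view(-1)   # row-major flattening, shape (784,)


def flat_index(row, col):
    """Position of pixel (row, col) in the flattened vector."""
    return row * width + col


right_gap = flat_index(0, 1) - flat_index(0, 0)  # horizontal neighbor
below_gap = flat_index(1, 0) - flat_index(0, 0)  # vertical neighbor
print(right_gap, below_gap)  # 1 28
```

A pixel and the one directly below it are a full row (28 positions) apart after flattening, even though they touch in the image.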

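To make the network work: be clear about the dimensions of the input and output tensors, and design how the dimensions change from layer to layer so that the data is finally mapped to the space we want.
A convolutional layer operates on one patch of pixels at a time.

To define a convolutional layer you specify: the number of input channels, the number of output channels, and the kernel size (height and width).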

2.1 Single-channel convolution

[figure]
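A single-channel convolution can be written as an explicit sliding window. The sketch below is not the library's implementation, only a loop that mirrors the definition (no padding, stride 1, no kernel flip, which is what deep learning frameworks call convolution). It reuses the 5×5 input and the 1 to 9 kernel from section 3.1 and compares the result with `torch.nn.functional.conv2d`.

```python
import torch
import torch.nn.functional as F


def conv2d_single_channel(image, kernel):
    """Slide kernel over image (no padding, stride 1), summing elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = (patch * kernel).sum()
    return out


image = torch.tensor([[3., 4., 6., 5., 7.],
                      [2., 4., 6., 8., 2.],
                      [1., 6., 7., 8., 4.],
                      [9., 7., 4., 6., 2.],
                      [3., 7., 5., 4., 1.]])
kernel = torch.arange(1., 10.).view(3, 3)

manual = conv2d_single_channel(image, kernel)
reference = F.conv2d(image.view(1, 1, 5, 5), kernel.view(1, 1, 3, 3)).view(3, 3)
print(manual)
print(torch.allclose(manual, reference))  # True
```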

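2.2 Multi-channel convolution

The input color image starts with 3 channels, while layers in the middle of a network may have several hundred. Three-channel convolution is used as the example below; note that the number of channels in each kernel equals the number of input channels.

If the input has n channels, each kernel also has n channels: every input channel is convolved with its own slice of the kernel, and the n per-channel results are added into a single output channel.

To get m output channels, use m kernels:
(1) each kernel must have the same number of channels as the input;
(2) the number of kernels equals the number of output channels;
(3) the kernel size is your own choice and is independent of the image size; it is usually square with an odd side length (a rectangular kernel also works).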
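The three rules can be checked by rebuilding a multi-channel convolution from single-channel ones. This is a sketch with example values n = 3 and m = 2 and random data; it only illustrates the bookkeeping and is not how the library computes the result internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, m = 3, 2                       # n input channels, m output channels (example values)
x = torch.randn(1, n, 5, 5)       # one image: (batch, channels, height, width)
weight = torch.randn(m, n, 3, 3)  # m kernels, each with n channels of size 3x3

reference = F.conv2d(x, weight)   # shape (1, m, 3, 3)

# Rebuild the result by hand: for each kernel, convolve every input channel
# with the matching kernel slice and add up the per-channel results.
manual = torch.zeros_like(reference)
for k in range(m):
    for c in range(n):
        manual[:, k] += F.conv2d(x[:, c:c + 1], weight[k:k + 1, c:c + 1])[:, 0]

print(reference.shape)                               # torch.Size([1, 2, 3, 3])
print(torch.allclose(manual, reference, atol=1e-5))  # True
```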

2.3 Convolutional layer

In the figure, the red text in the middle gives the size of each kernel.
[figure]

2.4 Code example

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 12 09:07:16 2021

@author: 86493
"""
import torch
in_channels, out_channels = 5, 10
width, height = 100, 100
kernel_size = 3  # 3×3 kernel
batch_size = 1

# Fix the shape of the input tensor; the values themselves are random,
# drawn by randn from the standard normal distribution
input = torch.randn(batch_size,
                    in_channels,
                    width,
                    height)

conv_layer = torch.nn.Conv2d(in_channels,
                             out_channels,
                             kernel_size = kernel_size)

output = conv_layer(input)

print(input.shape)
# prints torch.Size([1, 5, 100, 100])
# i.e. 5 channels, 100×100 image
print(output.shape)
# prints torch.Size([1, 10, 98, 98])
# the output has 10 channels and is 98×98: 100 - 2, where 2 = 3 - 1
print(conv_layer.weight.shape)
# torch.Size([10, 5, 3, 3])
# shape of the layer's weights: 10 output channels, 5 input channels,
# 3×3 kernel

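Convolutional and pooling layers place no requirement on the spatial size of the input image (they do constrain the number of input channels: feeding the layer above an input with 6 channels raises an error). The final classifier is the part that depends on the spatial size.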
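Both halves of that statement can be tried directly. The sketch below reuses the 5-in, 10-out layer from the code above with random inputs; the second spatial size (32) is an arbitrary choice.

```python
import torch

conv_layer = torch.nn.Conv2d(5, 10, kernel_size=3)  # expects 5 input channels

# Different spatial sizes are accepted; only the output size changes.
for size in (100, 32):
    out = conv_layer(torch.randn(1, 5, size, size))
    print(out.shape)  # (1, 10, size - 2, size - 2)

# A wrong channel count is rejected with a RuntimeError.
channel_error = None
try:
    conv_layer(torch.randn(1, 6, 100, 100))
except RuntimeError as err:
    channel_error = err
print(type(channel_error).__name__)  # RuntimeError
```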
Because the cross-entropy loss is applied at the end, the last layer is left without an activation.
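The reason is that PyTorch's cross-entropy loss already applies log-softmax to its input, so it expects raw scores (logits). A minimal check with random logits and made-up labels:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw outputs of the last linear layer: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])  # hypothetical class labels

loss_ce = F.cross_entropy(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_manual))  # True
```

Applying a softmax or ReLU before this loss would therefore transform the scores twice.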

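3. Key Parameters

3.1 Padding in detail

When padding, a 3×3 kernel needs 1 ring of zeros and a 5×5 kernel needs 2 rings (the number of rings follows the rule in the green box of the figure below, i.e. the kernel size integer-divided by 2).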
[figure]
Padding the original input with one ring of zeros and then convolving corresponds to the following code:

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 12 09:52:08 2021

@author: 86493
"""
import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]
input = torch.Tensor(input).view(1, 1, 5, 5)
# batch=1, channel=1, size=5×5

# 1 input channel, 1 output channel
conv_layer = torch.nn.Conv2d(1, 1,
                             kernel_size = 3,
                             padding = 1,
                             bias = False)
# no bias is added to the output channel, so bias is set to False

# weight shape: (out_channels=1, in_channels=1, 3, 3)
kernel = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8,
                       9]).view(1, 1, 3, 3)

conv_layer.weight.data = kernel.data
# assign the kernel to the layer's weights

output = conv_layer(input)
print(output)

Output:

tensor([[[[ 91., 168., 224., 215., 127.],
          [114., 211., 295., 262., 149.],
          [192., 259., 282., 214., 122.],
          [194., 251., 253., 169.,  86.],
          [ 96., 112., 110.,  68.,  31.]]]], grad_fn=<ThnnConv2DBackward>)
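The rule "pad by the kernel size integer-divided by 2" can be checked for several odd kernel sizes. A minimal sketch with a random 5×5 input; the list `shapes` only collects the results.

```python
import torch

x = torch.randn(1, 1, 5, 5)
shapes = []
for k in (3, 5, 7):
    conv = torch.nn.Conv2d(1, 1, kernel_size=k, padding=k // 2, bias=False)
    shapes.append(tuple(conv(x).shape))
    print(k, k // 2, shapes[-1])  # the spatial size stays 5x5 each time
```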

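3.2 Stride in detail

Changing the stride is an effective way to reduce the width and height of the output. For example, take the code from 3.1 and replace padding=1 with stride=2: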

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 12 09:52:08 2021

@author: 86493
"""
import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]
input = torch.Tensor(input).view(1, 1, 5, 5)
# batch=1, channel=1, size=5×5

# 1 input channel, 1 output channel
conv_layer = torch.nn.Conv2d(1, 1,
                             kernel_size = 3,
                             stride = 2,
                             bias = False)
# no bias is added to the output channel, so bias is set to False

# weight shape: (out_channels=1, in_channels=1, 3, 3)
kernel = torch.Tensor([1, 2, 3, 4, 5, 6, 7, 8,
                       9]).view(1, 1, 3, 3)

conv_layer.weight.data = kernel.data
# assign the kernel to the layer's weights

output = conv_layer(input)
print(output)
# tensor([[[[211., 262.],
#          [251., 169.]]]], grad_fn=<ThnnConv2DBackward>)
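One way to read the stride: without padding, a stride-2 convolution keeps every second position of the stride-1 result. The sketch below checks this on the same input and kernel, using the functional form `F.conv2d` instead of the layer object for brevity.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[3., 4., 6., 5., 7.],
                  [2., 4., 6., 8., 2.],
                  [1., 6., 7., 8., 4.],
                  [9., 7., 4., 6., 2.],
                  [3., 7., 5., 4., 1.]]).view(1, 1, 5, 5)
kernel = torch.arange(1., 10.).view(1, 1, 3, 3)

dense = F.conv2d(x, kernel, stride=1)    # shape (1, 1, 3, 3)
strided = F.conv2d(x, kernel, stride=2)  # shape (1, 1, 2, 2)

print(strided)
print(torch.equal(strided, dense[:, :, ::2, ::2]))  # True
```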

3.3 nn.Conv2d in detail

This part mainly covers channels in a CNN.
(1) PyTorch's two-dimensional convolution, nn.Conv2d, is used for 2-D images.

class torch.nn.Conv2d(in_channels,
	out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1,
	bias=True)

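Parameters:
in_channels: number of channels of the input
out_channels: number of channels of the output, i.e. the number of kernels
kernel_size: size of the kernel
stride: step of the sliding window
padding: zero-padding added around the image
dilation: spacing between the elements of the kernel
groups: grouped convolution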
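The last two parameters are easiest to understand from shapes. A small sketch with arbitrary example sizes (4 input channels, 8 output channels, a 7×7 input):

```python
import torch

x = torch.randn(1, 4, 7, 7)

# groups=2: the 4 input channels are split into 2 groups of 2,
# so each of the 8 kernels only sees 2 input channels.
grouped = torch.nn.Conv2d(4, 8, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([8, 2, 3, 3])

# dilation=2: a 3x3 kernel is spread over a 5x5 window,
# so a 7x7 input gives a 3x3 output.
dilated = torch.nn.Conv2d(4, 8, kernel_size=3, dilation=2)
print(dilated(x).shape)      # torch.Size([1, 8, 3, 3])
```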

(2) TensorFlow's description of what channels means for input samples: an ordinary RGB image has 3 channels (red, green, blue), while a monochrome image has 1 channel.

channels : Number of color channels in the example images. For color images, the number of channels is 3 (red, green, blue). For monochrome images, there is just 1 channel (black). ——tensorflow

(3) In MXNet, channels usually refers to the number of kernels in a convolutional layer.

channels (int) : The dimensionality of the output space, i.e. the number of output channels (filters) in the convolution. ——mxnet

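As in the figure below, suppose we have an image sample of size 6×6×3 and convolve it with a 3×3×3 kernel (filter). The input image has 3 channels, and the kernel's in_channels must match the channels of the data being convolved (here the image sample, so 3).
[figure]
Next comes the convolution itself: the 27 numbers in the kernel are multiplied with the corresponding values of the sample and then summed, giving the first result. Sliding the kernel over the whole image yields a 4×4 result.

Since there is only one kernel, the final result after these steps is 4×4×1, so out_channels is 1.

In practice several kernels are always used. Adding one more kernel here gives a 4×4×2 result.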
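The same walk-through in PyTorch, as a sketch with a random image. Note that PyTorch stores the channel dimension first, so the 6×6×3 sample becomes a tensor of shape (1, 3, 6, 6).

```python
import torch

image = torch.randn(1, 3, 6, 6)  # PyTorch layout: (batch, channels, height, width)

one_kernel = torch.nn.Conv2d(3, 1, kernel_size=3, bias=False)
two_kernels = torch.nn.Conv2d(3, 2, kernel_size=3, bias=False)

print(one_kernel.weight[0].numel())  # 27 weights in a single 3x3x3 kernel
print(one_kernel(image).shape)       # torch.Size([1, 1, 4, 4])
print(two_kernels(image).shape)      # torch.Size([1, 2, 4, 4])
```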

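The channels mentioned above fall into three kinds:

(1) The channels of the original input image, which depend on the image type, e.g. RGB;
(2) The out_channels produced by a convolution, which depend on the number of kernels. These out_channels also serve as the in_channels of the kernels in the next convolution;
(3) The in_channels of a kernel, which, as item (2) says, equal the out_channels of the previous convolution, or, for the first convolution, the channels of the sample image in item (1).

In a CNN, understanding how data passes from layer to layer mainly comes down to tracking how height and width change and how channels change.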

4. Pooling Layers

4.1 Max pooling

The number of channels stays the same while the image size shrinks; PyTorch provides MaxPool2d for this.

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 12 10:14:35 2021

@author: 86493
"""
import torch

input = [3, 4, 6, 5,
         2, 4, 6, 8,
         1, 6, 7, 8,
         9, 7, 4, 6,
         ]
# batch=1, channel=1, size=4×4
input = torch.Tensor(input).view(1, 1, 4, 4)

maxpooling_layer = torch.nn.MaxPool2d(kernel_size = 2)

output = maxpooling_layer(input)
print(output)
# tensor([[[[4., 8.],
#           [9., 8.]]]])
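What MaxPool2d computes here can be reproduced by cutting the 4×4 image into 2×2 blocks and taking the maximum of each block. The reshape trick below is only an illustration and works because the image size is divisible by the pooling size; it is not the library's implementation.

```python
import torch

x = torch.tensor([[3., 4., 6., 5.],
                  [2., 4., 6., 8.],
                  [1., 6., 7., 8.],
                  [9., 7., 4., 6.]]).view(1, 1, 4, 4)

builtin = torch.nn.MaxPool2d(kernel_size=2)(x)

# Split height and width into (block index, offset inside the block),
# then take the maximum over the two offset axes.
blocks = x.view(1, 1, 2, 2, 2, 2)
manual = blocks.amax(dim=(3, 5))

print(manual)
print(torch.equal(manual, builtin))  # True
```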

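5. A Complete CNN Example

To define a convolutional layer you specify: the number of input channels, the number of output channels, and the kernel size (height and width).
Convolutional and pooling layers place no requirement on the spatial size of the input image (they do constrain the number of input channels: for the layer in section 2.4, an input with 6 channels raises an error). The final classifier is the part that depends on it. A shortcut is possible here: define the model without the fully connected layer first, feed in a random input, print the size of the final output, and then add the FC layer with that size before training the model.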
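The shortcut can be sketched as follows, using the same convolution and pooling layers as the MNIST model in this section. The names `dummy` and `flat_features` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same convolution and pooling layers as the MNIST model in this section.
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)
pooling = nn.MaxPool2d(2)

dummy = torch.randn(1, 1, 28, 28)      # one fake MNIST-sized image
with torch.no_grad():
    x = F.relu(pooling(conv1(dummy)))  # (1, 10, 12, 12)
    x = F.relu(pooling(conv2(x)))      # (1, 20, 4, 4)
    flat_features = x.view(1, -1).size(1)

print(x.shape, flat_features)          # torch.Size([1, 20, 4, 4]) 320
fc = nn.Linear(flat_features, 10)      # the classifier can now be defined
```

The value 320 is the input size of the fully connected layer in the model below.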

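[figure]
The max pooling layer MaxPool2d in the figure above only needs to be instantiated once: like sigmoid and relu it has no weights, so a single instance can be reused, whereas every layer that does have weights needs its own instance.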
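That the pooling layer has nothing to learn can be seen by counting parameters. In this sketch `num_params` is an illustrative helper, not a PyTorch function.

```python
import torch.nn as nn


def num_params(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())


pool = nn.MaxPool2d(2)
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

print(num_params(pool))   # 0
print(num_params(conv1))  # 260  = 10*1*5*5 + 10
print(num_params(conv2))  # 5020 = 20*10*5*5 + 20
```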
PS: Because the cross-entropy loss is applied at the end, the last layer has no activation.

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 19 15:02:11 2021

@author: 86493
"""
import torch 
import torch.nn as nn
from torchvision import transforms
from torchvision import datasets 
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.optim as optim 
import matplotlib.pyplot as plt 

# Prepare the data
batch_size = 64
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
train_dataset = datasets.MNIST(root = '../dataset/mnist/', 
                               train = True,
                               download = True,
                               transform = transform)
train_loader = DataLoader(train_dataset,
                          shuffle = True,
                          batch_size = batch_size)
test_dataset = datasets.MNIST(root = '../dataset/mnist/',
                              train = False,
                              download = True,
                              transform = transform)
test_loader = DataLoader(test_dataset,
                         shuffle = False,
                         batch_size = batch_size)

# CNN model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size = 5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size = 5)
        self.pooling = nn.MaxPool2d(2)
        self.fc = nn.Linear(320, 10)
        
    def forward(self, x):
        # x has shape (n, 1, 28, 28); keep n for the flatten step below
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        # Flatten (n, 20, 4, 4) to (n, 320) for the fully connected layer
        x = x.view(batch_size, -1)
        # print("x.shape", x.shape)
        x = self.fc(x)
        return x
    
model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# with several GPUs, another CUDA index can be used here
model.to(device)
# move the model's parameters to the chosen device

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(),
                      lr = 0.01,
                      momentum = 0.5)

def train(epoch):
    running_loss = 0.0 
    for batch_idx, data in enumerate(train_loader, 0):
        # 1. Prepare the data
        inputs, target = data
        # Move to the GPU; the data must be on the same device as the model
        inputs, target = inputs.to(device), target.to(device)
        # 2. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, target)
        # 3. Backward pass
        optimizer.zero_grad()
        loss.backward()
        # 4. Update the parameters
        optimizer.step()
        
        running_loss += loss.item()
        if batch_idx % 300 == 299:
            print('[%d, %5d] loss:%.3f'%
                  (epoch + 1,
                   batch_idx + 1,
                   running_loss / 300))
            running_loss = 0.0
            

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
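            # For each row (sample), take the maximum over dim=1, the class dimension;
            # torch.max returns the maximum values and their indices
            _, predicted = torch.max(outputs.data, dim = 1)
            # labels has shape (N,), so size(0) is the number of samples in the batch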
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        print('accuracy on test set :%d %% ' % (100 * correct / total))
        return correct / total


if __name__ == '__main__':
    epoch_list = []
    acc_list = []
    
    for epoch in range(10):
        train(epoch)
        acc = test()
        epoch_list.append(epoch)
        acc_list.append(acc)
        
    plt.plot(epoch_list, acc_list)
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.show()

The output is shown below; the script also plots accuracy against epoch.

[1,   300] loss:0.675
[1,   600] loss:0.180
[1,   900] loss:0.131
accuracy on test set :96 % 
[2,   300] loss:0.103
[2,   600] loss:0.093
[2,   900] loss:0.082
accuracy on test set :97 % 
[3,   300] loss:0.075
[3,   600] loss:0.070
[3,   900] loss:0.069
accuracy on test set :98 % 
[4,   300] loss:0.058
[4,   600] loss:0.059
[4,   900] loss:0.061
accuracy on test set :98 % 
[5,   300] loss:0.050
[5,   600] loss:0.055
[5,   900] loss:0.051
accuracy on test set :98 % 
[6,   300] loss:0.048
[6,   600] loss:0.050
[6,   900] loss:0.043
accuracy on test set :98 % 
[7,   300] loss:0.038
[7,   600] loss:0.042
[7,   900] loss:0.047
accuracy on test set :98 % 
[8,   300] loss:0.038
[8,   600] loss:0.039
[8,   900] loss:0.041
accuracy on test set :98 % 
[9,   300] loss:0.039
[9,   600] loss:0.035
[9,   900] loss:0.037
accuracy on test set :98 % 
[10,   300] loss:0.035
[10,   600] loss:0.035
[10,   900] loss:0.034
accuracy on test set :98 %

6. Homework

[figure]

Reference

(1) PyTorch 深度学习实践 (PyTorch Deep Learning Practice), Lecture 10, 刘二 series
(2) Bilibili video: https://www.bilibili.com/video/BV1Y7411d7Ys?p=10
(3) Official documentation: https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d
(4) Andrew Ng's course on NetEase Cloud Classroom: https://study.163.com/my#/smarts
(5) Liu Hongpu's blog: https://liuii.github.io/
(6) A classmate's notes: http://biranda.top/archives/page/2/


[PyTorch Basics Tutorial 11] CNN Convolutional Neural Networks Explained
