1. The SE Attention Module
ResNet (Residual Network) is a classic deep convolutional neural network architecture. Its residual (skip) connections alleviate the vanishing-gradient problem in deep networks, allowing much deeper models to be trained. The SE (Squeeze-and-Excitation) module is an attention mechanism that strengthens feature representations by learning dependencies between channels. Fusing SE modules into ResNet can further improve model performance.
The basic idea of fusing SE modules into ResNet
The core idea of the SE module is to recalibrate each channel's weight dynamically via global pooling (Squeeze) and fully connected layers (Excitation), amplifying informative features while suppressing less useful ones. Inserting an SE module into each of ResNet's residual blocks strengthens the block's feature extraction.
Implementation steps
Structure of the SE module:
Squeeze: global average pooling compresses each channel's spatial dimensions into a single scalar.
Excitation: two fully connected (FC) layers learn the dependencies between channels and produce a weight for each channel.
Scale: the learned weights are multiplied with the original feature map, yielding a re-weighted feature map.
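In equations (following the SENet paper's standard formulation), for an input feature map x with C channels of spatial size H x W:

z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)    (Squeeze)
s = \sigma\left( W_2 \, \delta(W_1 z) \right)    (Excitation, with \delta = ReLU and \sigma = sigmoid)
\tilde{x}_c = s_c \cdot x_c    (Scale)

where W_1 \in \mathbb{R}^{(C/r) \times C}, W_2 \in \mathbb{R}^{C \times (C/r)}, and r is the reduction ratio (16 in the code below).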
Inserting the SE module into ResNet's residual blocks:
Within each residual block, the SE module is usually inserted after the convolutional layers and before the residual addition.
The SE module re-weights the convolution output, which is then added to the shortcut branch.
Code (PyTorch):
Below is a simple implementation of a ResNet residual block fused with an SE module:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SEBlock, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),  # reduce to C/r
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),  # restore to C
            nn.Sigmoid()  # per-channel weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # Squeeze
        y = self.fc(y).view(b, c, 1, 1)   # Excitation
        return x * y.expand_as(x)         # Scale

class ResNetBlockWithSE(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, reduction=16):
        super(ResNetBlockWithSE, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.se = SEBlock(out_channels, reduction)
        # project the shortcut with a 1x1 convolution when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)  # apply the SE module before the residual addition
        out += self.shortcut(x)
        out = F.relu(out)
        return out
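A quick shape check of the block (a minimal sketch; the 56x56 input size matches ResNet's first stage):

# Downsampling block: 64 -> 128 channels, spatial size halved by stride=2.
block = ResNetBlockWithSE(64, 128, stride=2)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 128, 28, 28])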
Network structure:
After inserting SE modules into the residual blocks, the overall network topology stays the same; only each block's feature extraction is strengthened.
You can choose which residual blocks receive an SE module; deeper layers are a common choice, since their features are more abstract and high-level (see the sketch below).
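One simple way to make the insertion configurable is to let the SE branch be switched off per block. The sketch below uses a hypothetical use_se flag (not part of the original design) with nn.Identity as a no-op stand-in:

# Sketch: optional SE branch, e.g. to enable SE only in deeper stages.
class ResNetBlockOptionalSE(ResNetBlockWithSE):
    def __init__(self, in_channels, out_channels, stride=1, reduction=16, use_se=True):
        super().__init__(in_channels, out_channels, stride, reduction)
        if not use_se:
            self.se = nn.Identity()  # no-op: falls back to a plain residual block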
Advantages
Stronger feature representation: by dynamically re-weighting channels, the SE module emphasizes informative features.
Lightweight: the SE module's computational overhead is small, adding little to the model's parameter count or complexity (a rough count follows this list).
Generality: the SE module can be dropped into other architectures, such as ResNet or Inception, with little effort.
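To put a number on "lightweight": one SEBlock contains only the two bias-free FC layers, i.e. 2·C²/r parameters. A small sketch (the helper se_params is made up for illustration):

# Parameters of one SEBlock: C*(C/r) + (C/r)*C = 2*C^2/r (both Linear layers are bias-free).
def se_params(channels, reduction=16):
    return 2 * channels * channels // reduction

for c in (64, 128, 256, 512):
    print(c, se_params(c))  # 512, 2048, 8192, 32768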
2. ResNet-34 + SE
Adding SE (Squeeze-and-Excitation) modules to ResNet-34 means modifying each of its residual blocks to include an SE module. Below is a complete implementation of ResNet-34 fused with SE modules (in PyTorch).
The code:
import torch
import torch.nn as nn
import torch.nn.functional as F
# Define the SE module
class SEBlock(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SEBlock, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),  # reduce dimensionality
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),  # restore dimensionality
            nn.Sigmoid()  # activation that produces the channel weights
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # Squeeze
        y = self.fc(y).view(b, c, 1, 1)   # Excitation
        return x * y.expand_as(x)         # Scale
# Define the residual block (with SE module)
class ResNetBlockWithSE(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None, reduction=16):
        super(ResNetBlockWithSE, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.se = SEBlock(out_channels, reduction)  # add the SE module
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)  # apply the SE module
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = F.relu(out)
        return out
# Define ResNet-34
class ResNet34WithSE(nn.Module):
    def __init__(self, num_classes=1000, reduction=16):
        super(ResNet34WithSE, self).__init__()
        self.in_channels = 64
        # initial convolution (stem)
        self.conv1 = nn.Conv2d(3, self.in_channels, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # the four ResNet stages (3, 4, 6, 3 blocks for ResNet-34)
        self.layer1 = self._make_layer(64, 3, stride=1, reduction=reduction)
        self.layer2 = self._make_layer(128, 4, stride=2, reduction=reduction)
        self.layer3 = self._make_layer(256, 6, stride=2, reduction=reduction)
        self.layer4 = self._make_layer(512, 3, stride=2, reduction=reduction)
        # global average pooling and classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
    def _make_layer(self, out_channels, num_blocks, stride, reduction):
        downsample = None
        # when the shape changes, project the identity branch with a 1x1 convolution
        if stride != 1 or self.in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
        layers = []
        # the first block of a stage may downsample; the remaining blocks keep the shape
        layers.append(ResNetBlockWithSE(self.in_channels, out_channels, stride, downsample, reduction))
        self.in_channels = out_channels
        for _ in range(1, num_blocks):
            layers.append(ResNetBlockWithSE(self.in_channels, out_channels, reduction=reduction))
        return nn.Sequential(*layers)
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
# Test the model
if __name__ == "__main__":
    model = ResNet34WithSE(num_classes=1000)
    input_tensor = torch.randn(1, 3, 224, 224)  # input tensor (batch_size, channels, height, width)
    output = model(input_tensor)
    print(model)          # print the network structure
    print(output.shape)   # expected output shape: (1, 1000)
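As an optional sanity check (assuming torchvision is installed), you can compare the parameter count with the plain torchvision ResNet-34; since the two architectures otherwise match, the difference should be just the SE overhead, roughly 157k parameters (under 1% of the ~21.8M baseline):

from torchvision.models import resnet34

def count(m):
    return sum(p.numel() for p in m.parameters())

se_model = ResNet34WithSE(num_classes=1000)
plain_model = resnet34(num_classes=1000)
print(count(se_model) - count(plain_model))  # ~157,184 extra parameters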
Code notes
SE module: SEBlock implements the Squeeze-and-Excitation operation, generating per-channel weights via global average pooling and two fully connected layers. Inside each residual block, the SE module re-weights the convolutional feature map.
Residual block: ResNetBlockWithSE is ResNet's basic residual block, containing two convolutional layers and one SE module. When the input and output differ in channel count or spatial size, the identity branch is adjusted through downsample.
ResNet-34 structure: ResNet34WithSE defines the complete ResNet-34, with four stages (layer1 to layer4), each composed of several residual blocks. The _make_layer method builds each stage's blocks, inserting an SE module into every one.
Test: given a random input tensor of shape (1, 3, 224, 224), the model outputs a tensor of shape (1, 1000), i.e. scores for 1000 classes.
Running the script prints the network structure followed by the output shape:
ResNet34WithSE(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): ResNetBlockWithSE(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=64, out_features=4, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=4, out_features=64, bias=False)
(3): Sigmoid()
)
)
)
(1): ResNetBlockWithSE(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=64, out_features=4, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=4, out_features=64, bias=False)
(3): Sigmoid()
)
)
)
(2): ResNetBlockWithSE(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=64, out_features=4, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=4, out_features=64, bias=False)
(3): Sigmoid()
)
)
)
)
(layer2): Sequential(
(0): ResNetBlockWithSE(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=128, out_features=8, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=8, out_features=128, bias=False)
(3): Sigmoid()
)
)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ResNetBlockWithSE(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=128, out_features=8, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=8, out_features=128, bias=False)
(3): Sigmoid()
)
)
)
(2): ResNetBlockWithSE(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=128, out_features=8, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=8, out_features=128, bias=False)
(3): Sigmoid()
)
)
)
(3): ResNetBlockWithSE(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=128, out_features=8, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=8, out_features=128, bias=False)
(3): Sigmoid()
)
)
)
)
(layer3): Sequential(
(0): ResNetBlockWithSE(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ResNetBlockWithSE(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
)
(2): ResNetBlockWithSE(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
)
(3): ResNetBlockWithSE(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
)
(4): ResNetBlockWithSE(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
)
(5): ResNetBlockWithSE(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=256, out_features=16, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=16, out_features=256, bias=False)
(3): Sigmoid()
)
)
)
)
(layer4): Sequential(
(0): ResNetBlockWithSE(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=512, out_features=32, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=32, out_features=512, bias=False)
(3): Sigmoid()
)
)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): ResNetBlockWithSE(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=512, out_features=32, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=32, out_features=512, bias=False)
(3): Sigmoid()
)
)
)
(2): ResNetBlockWithSE(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(se): SEBlock(
(avg_pool): AdaptiveAvgPool2d(output_size=1)
(fc): Sequential(
(0): Linear(in_features=512, out_features=32, bias=False)
(1): ReLU(inplace=True)
(2): Linear(in_features=32, out_features=512, bias=False)
(3): Sigmoid()
)
)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
torch.Size([1, 1000])