Introduction: The Bottleneck of Deep Neural Networks
As deep learning advanced, researchers found that simply adding more layers can actually make a model perform worse, a phenomenon known as the degradation problem. Plain deep networks face three main challenges:
Vanishing/exploding gradients: gradients decay or grow exponentially during backpropagation
Training difficulty: very deep networks are hard to optimize to a good solution
Performance saturation: accuracy drops as depth increases
1. The Core Idea of Residual Learning
In 2015, Kaiming He's team proposed the Residual Network (ResNet), which tackles this problem by introducing skip connections. The core formula is:
$\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}$
where:
$\mathbf{x}$: the input features
$\mathcal{F}$: the residual function with weights $\{W_i\}$
$\mathbf{y}$: the output features
This design lets the network learn the residual (the difference) between input and output rather than the direct mapping. When the desired mapping is close to an identity mapping, learning the residual $\mathcal{F} = \mathbf{y} - \mathbf{x}$ is easier than learning the full mapping: the residual branch only needs to drive its output toward zero.
2. Anatomy of the Residual Block
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # First convolutional layer (may downsample via stride)
        self.conv1 = nn.Conv2d(in_channels, out_channels,
                               kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # Second convolutional layer
        self.conv2 = nn.Conv2d(out_channels, out_channels,
                               kernel_size=3, stride=1,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Skip connection: project the input when the spatial size or channel count changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = self.shortcut(x)  # keep the original input
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity  # key step: add the skip connection
        out = self.relu(out)
        return out
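A quick shape check of the block defined above; the input size (a 56×56 feature map with 64 channels) is illustrative:

block = ResidualBlock(in_channels=64, out_channels=128, stride=2)
x = torch.randn(1, 64, 56, 56)   # N, C, H, W
y = block(x)
print(y.shape)                   # torch.Size([1, 128, 28, 28]): spatially downsampled, channels doubled;
                                 # the 1x1 projection shortcut keeps the addition shape-compatible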
3. The ResNet Architecture
A ResNet is built by stacking residual blocks; the configurations for different depths are listed below:
Layer | ResNet-18 | ResNet-34 | ResNet-50 |
---|---|---|---|
conv1 | 7×7, 64, stride 2 | 7×7, 64, stride 2 | 7×7, 64, stride 2 |
pool1 | 3×3 max pool, stride 2 | 3×3 max pool, stride 2 | 3×3 max pool, stride 2 |
conv2_x | [3×3, 64; 3×3, 64] ×2 | [3×3, 64; 3×3, 64] ×3 | [1×1, 64; 3×3, 64; 1×1, 256] ×3 |
conv3_x | [3×3, 128; 3×3, 128] ×2 | [3×3, 128; 3×3, 128] ×4 | [1×1, 128; 3×3, 128; 1×1, 512] ×4 |
conv4_x | [3×3, 256; 3×3, 256] ×2 | [3×3, 256; 3×3, 256] ×6 | [1×1, 256; 3×3, 256; 1×1, 1024] ×6 |
conv5_x | [3×3, 512; 3×3, 512] ×2 | [3×3, 512; 3×3, 512] ×3 | [1×1, 512; 3×3, 512; 1×1, 2048] ×3 |
Fully connected | 1000-d fc | 1000-d fc | 1000-d fc |
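For reference, the per-stage block counts in the table can be summarized as a small configuration map. The dictionary below is an illustrative convenience, not part of the original code; note that ResNet-50 additionally requires a bottleneck block (1×1–3×3–1×1), which this article does not implement:

# Number of residual blocks in conv2_x ... conv5_x, matching the table above
RESNET_STAGE_BLOCKS = {
    "resnet18": [2, 2, 2, 2],
    "resnet34": [3, 4, 6, 3],
    "resnet50": [3, 4, 6, 3],   # uses bottleneck blocks, not the BasicBlock-style ResidualBlock above
}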
4. Implementing ResNet-18 in PyTorch
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super().__init__()
        self.in_channels = 64
        # Stem: initial convolution
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7,
                               stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Residual stages
        self.layer1 = self._make_layer(block, 64, layers[0], stride=1)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        layers = []
        # The first block of a stage may downsample and change the channel count
        layers.append(block(self.in_channels, out_channels, stride))
        self.in_channels = out_channels
        # Remaining blocks keep the resolution and channel count
        for _ in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

# Build ResNet-18: two residual blocks per stage
def resnet18(num_classes=1000):
    return ResNet(ResidualBlock, [2, 2, 2, 2], num_classes)
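A quick sanity check of the full model, assuming the ResidualBlock and ResNet classes above are defined in the same file; the 224×224 input matches the standard ImageNet setting:

model = resnet18(num_classes=1000)
dummy = torch.randn(1, 3, 224, 224)            # one 224x224 RGB image
logits = model(dummy)
print(logits.shape)                            # torch.Size([1, 1000])
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")   # about 11.7M for this ResNet-18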
5. Key Advantages
Better gradient propagation: the skip connection acts as a gradient highway (see the derivation after this list)
Identity-mapping guarantee: when the residual approaches 0, the block degenerates to an identity function
Parameter efficiency: compared with plain VGG-style networks, ResNets reach higher accuracy at lower cost (e.g., ResNet-34 uses far fewer FLOPs than VGG-19 yet achieves better ImageNet accuracy)
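The first item can be made precise with one line of calculus. For a block $\mathbf{y} = \mathbf{x} + \mathcal{F}(\mathbf{x})$ and loss $L$, the chain rule gives

$\dfrac{\partial L}{\partial \mathbf{x}} = \dfrac{\partial L}{\partial \mathbf{y}}\left(\mathbf{I} + \dfrac{\partial \mathcal{F}}{\partial \mathbf{x}}\right) = \dfrac{\partial L}{\partial \mathbf{y}} + \dfrac{\partial L}{\partial \mathbf{y}}\dfrac{\partial \mathcal{F}}{\partial \mathbf{x}}$

The first term passes the upstream gradient to the input unchanged, no matter how small $\partial \mathcal{F} / \partial \mathbf{x}$ becomes, so the gradient cannot vanish along the skip path even when many blocks are stacked.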
6. Training Tips and Notes
Weight initialization: use He (Kaiming) initialization
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
Learning-rate schedule: cosine annealing
Data augmentation: random cropping and horizontal flipping
Optimizer: SGD with momentum (0.9); a combined setup sketch follows below
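A minimal sketch of how these pieces fit together, assuming the resnet18 factory defined above and an ImageNet-style dataset; the specific hyperparameter values (lr=0.1, weight_decay=1e-4, 90 epochs) are common choices, not prescriptions from this article:

import torch.optim as optim
from torchvision import transforms

# Data augmentation: random crop and horizontal flip
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = resnet18(num_classes=1000)
# SGD with momentum 0.9 (weight decay added as a common companion setting)
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)
# Cosine annealing of the learning rate over the training run
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)

for epoch in range(90):
    # ... run one training epoch over data loaded through train_transform ...
    scheduler.step()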