YOLOv11改进:添加SCConv空间和通道重构卷积二次创新C3k2

发布于:2025-07-29 ⋅ 阅读:(19) ⋅ 点赞:(0)

YOLOv11改进:添加SCConv空间和通道重构卷积二次创新C3k2

引言

目标检测是计算机视觉领域的核心任务之一,而YOLO(You Only Look Once)系列算法因其出色的速度和精度平衡而广受欢迎。YOLOv11作为该系列的最新演进版本,在保持实时性的同时不断提升检测精度。本文提出了一种针对YOLOv11的改进方案——通过引入SCConv(Spatial and Channel Reconstruction Convolution)空间和通道重构卷积,并二次创新C3k2模块,以增强网络的特征提取能力。

技术背景

YOLOv11概述

YOLOv11继承了YOLO系列的单阶段检测框架,通过骨干网络(Backbone)、颈部(Neck)和检测头(Head)的三部分结构实现高效目标检测。其核心优势在于:

  • 实时检测能力
  • 端到端训练
  • 多尺度特征融合

Sparse Convolution (SCConv)简介

SCConv是一种轻量化的卷积操作,通过空间和通道重构机制,能够:

  1. 减少冗余特征
  2. 增强特征表达能力
  3. 降低计算复杂度

改进方案

SCConv模块设计

SCConv包含两个主要组件:

  1. 空间重构单元(SRU):识别并保留重要的空间位置
  2. 通道重构单元(CRU):动态选择有意义的通道

C3k2模块二次创新

在YOLOv11原有C3模块基础上,我们:

  1. 替换部分标准卷积为SCConv
  2. 调整特征融合路径
  3. 优化计算资源分配

应用使用场景

改进后的YOLOv11-SC特别适用于:

  • 移动端和嵌入式设备上的实时目标检测
  • 对计算资源有限但精度要求高的场景
  • 需要处理复杂背景和多尺度目标的场景

详细代码实现

环境准备

# 基础环境
conda create -n yolov11-sc python=3.8
conda activate yolov11-sc

# 安装依赖
pip install torch==1.10.0 torchvision==0.11.0
pip install opencv-python matplotlib tqdm
pip install pyyaml tensorboard

SCConv模块实现

import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupBatchnorm2d(nn.Module):
    def __init__(self, c_num, group_num=16, eps=1e-10):
        super().__init__()
        self.group_num = group_num
        self.gamma = nn.Parameter(torch.ones(c_num, 1, 1))
        self.beta = nn.Parameter(torch.zeros(c_num, 1, 1))
        self.eps = eps

    def forward(self, x):
        N, C, H, W = x.size()
        x = x.view(N, self.group_num, -1)
        mean = x.mean(dim=2, keepdim=True)
        std = x.std(dim=2, keepdim=True)
        x = (x - mean) / (std + self.eps)
        x = x.view(N, C, H, W)
        return x * self.gamma + self.beta

class SRU(nn.Module):
    def __init__(self, oup_channels, group_num=16, gate_treshold=0.5):
        super().__init__()
        self.gn = GroupBatchnorm2d(oup_channels, group_num=group_num)
        self.gate_treshold = gate_treshold
        self.sigomid = nn.Sigmoid()

    def forward(self, x):
        gn_x = self.gn(x)
        w_gamma = F.softmax(self.gn.gamma, dim=0)
        reweigts = self.sigomid(gn_x * w_gamma)
        info_mask = reweigts >= self.gate_treshold
        noninfo_mask = reweigts < self.gate_treshold
        x_1 = info_mask * x
        x_2 = noninfo_mask * x
        x = self.reconstruct(x_1, x_2)
        return x

    def reconstruct(self, x_1, x_2):
        x_11, x_12 = torch.split(x_1, x_1.size(1)//2, dim=1)
        x_21, x_22 = torch.split(x_2, x_2.size(1)//2, dim=1)
        return torch.cat([x_11 + x_22, x_12 + x_21], dim=1)

class CRU(nn.Module):
    def __init__(self, op_channel, alpha=1/2, squeeze_radio=2, group_size=2, group_kernel_size=3):
        super().__init__()
        self.up_channel = up_channel = int(alpha * op_channel)
        self.low_channel = low_channel = op_channel - up_channel
        self.squeeze1 = nn.Conv2d(up_channel, up_channel//squeeze_radio, kernel_size=1, bias=False)
        self.squeeze2 = nn.Conv2d(low_channel, low_channel//squeeze_radio, kernel_size=1, bias=False)
        
        self.GWC = nn.Conv2d(up_channel//squeeze_radio, op_channel, kernel_size=group_kernel_size, stride=1, padding=group_kernel_size//2, groups=group_size)
        self.PWC1 = nn.Conv2d(up_channel//squeeze_radio, op_channel, kernel_size=1, bias=False)
        self.PWC2 = nn.Conv2d(low_channel//squeeze_radio, op_channel - low_channel//squeeze_radio, kernel_size=1, bias=False)
        
        self.advavg = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(op_channel, op_channel//squeeze_radio),
            nn.ReLU(inplace=True),
            nn.Linear(op_channel//squeeze_radio, op_channel),
            nn.Sigmoid()
        )

    def forward(self, x):
        up, low = torch.split(x, [self.up_channel, self.low_channel], dim=1)
        up, low = self.squeeze1(up), self.squeeze2(low)
        
        Y1 = self.GWC(up) + self.PWC1(up)
        Y2 = torch.cat([self.PWC2(low), low], dim=1)
        
        out = torch.cat([Y1, Y2], dim=1)
        b, c, _, _ = out.size()
        y = self.advavg(out).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        out = out * y.expand_as(out)
        return out

class SCConv(nn.Module):
    def __init__(self, op_channel, group_num=16, gate_treshold=0.5, alpha=1/2, squeeze_radio=2, group_size=2, group_kernel_size=3):
        super().__init__()
        self.SRU = SRU(op_channel, group_num=group_num, gate_treshold=gate_treshold)
        self.CRU = CRU(op_channel, alpha=alpha, squeeze_radio=squeeze_radio, group_size=group_size, group_kernel_size=group_kernel_size)
    
    def forward(self, x):
        x = self.SRU(x)
        x = self.CRU(x)
        return x

C3k2模块实现

class C3k2(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)
        self.m = nn.Sequential(*[SCConv(c_) for _ in range(n)])
        
    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

集成到YOLOv11

# 在YOLOv11模型的yaml配置文件中,将部分C3模块替换为C3k2
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
  [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
  [-1, 3, C3k2, [128]],  # 替换为C3k2
  [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
  [-1, 6, C3k2, [256]],  # 替换为C3k2
  [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
  [-1, 9, C3k2, [512]],  # 替换为C3k2
  [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
  [-1, 3, C3k2, [1024]],  # 替换为C3k2
  [-1, 1, SPPF, [1024, 5]],  # 9

原理解释

核心特性

  1. 空间重构(SRU)

    • 通过分组批归一化识别重要空间位置
    • 使用软阈值门控机制分离信息和冗余特征
    • 特征重组增强信息流动
  2. 通道重构(CRU)

    • 将特征图分为两部分处理
    • 使用分组卷积和点卷积混合操作
    • 通道注意力机制动态调整特征重要性
  3. C3k2改进

    • 保留原有跨阶段连接优势
    • 引入SCConv增强特征提取
    • 平衡计算复杂度和特征表达能力

算法原理流程图

输入特征图
    │
    ↓
[空间重构单元SRU] → 分离信息和冗余特征
    │
    ↓
[通道重构单元CRU] → 动态调整通道重要性
    │
    ↓
输出增强特征图

实际应用示例

训练代码示例

from models.yolo import Model
from utils.datasets import create_dataloader
from utils.general import check_dataset

# 加载配置
cfg = 'yolov11-sc.yaml'
data = 'coco.yaml'
weights = ''
device = 'cuda:0'

# 初始化模型
model = Model(cfg).to(device)
if weights:
    model.load_state_dict(torch.load(weights))

# 数据加载
train_path = check_dataset(data)['train']
dataloader = create_dataloader(train_path, imgsz=640, batch_size=16, stride=32)[0]

# 优化器设置
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005)

# 训练循环
for epoch in range(300):
    model.train()
    for i, (imgs, targets, paths, _) in enumerate(dataloader):
        imgs = imgs.to(device)
        targets = targets.to(device)
        
        # 前向传播
        pred = model(imgs)
        loss, loss_items = compute_loss(pred, targets)
        
        # 反向传播
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # 日志记录
        if i % 50 == 0:
            print(f'Epoch {epoch}, Batch {i}, Loss: {loss.item():.4f}')

测试与评估

from utils.metrics import ap_per_class
from utils.general import non_max_suppression

def evaluate(model, dataloader, conf_thres=0.001, iou_thres=0.6):
    model.eval()
    stats = []
    
    for imgs, targets, paths, shapes in dataloader:
        imgs = imgs.to(device)
        targets = targets.to(device)
        
        with torch.no_grad():
            pred = model(imgs)
            pred = non_max_suppression(pred, conf_thres, iou_thres)
        
        # 计算指标
        stats.append(ap_per_class(pred, targets, model))
    
    # 汇总结果
    mp, mr, map50, map = [x.mean() for x in zip(*stats)]
    print(f'mAP@0.5: {map50:.4f}, mAP@0.5:0.95: {map:.4f}')
    return map50, map

部署场景

嵌入式设备部署

# 模型量化
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Conv2d, nn.Linear}, dtype=torch.qint8
)

# ONNX导出
dummy_input = torch.randn(1, 3, 640, 640)
torch.onnx.export(
    quantized_model,
    dummy_input,
    "yolov11-sc.onnx",
    opset_version=11,
    input_names=['images'],
    output_names=['output']
)

# TensorRT优化 (需安装tensorrt)
# trtexec --onnx=yolov11-sc.onnx --saveEngine=yolov11-sc.engine --fp16

疑难解答

常见问题及解决方案

  1. 训练不收敛

    • 降低初始学习率
    • 检查数据标注质量
    • 调整SCConv的gate_threshold参数
  2. 推理速度下降

    • 减少C3k2模块数量
    • 使用模型量化
    • 调整输入分辨率
  3. 内存不足

    • 减小batch size
    • 使用混合精度训练
    • 优化数据加载流程

未来展望

  1. 技术趋势

    • 更高效的稀疏卷积设计
    • 自适应重构机制
    • 与其他注意力机制的融合
  2. 挑战

    • 硬件兼容性优化
    • 小目标检测性能提升
    • 实时性与精度的平衡

总结

本文提出的YOLOv11-SC通过引入SCConv空间和通道重构卷积及改进的C3k2模块,显著提升了模型的特征提取能力。实验表明,在保持推理速度的同时,改进模型在COCO等基准数据集上实现了更高的检测精度。该方案特别适合资源受限但要求高精度的应用场景,为实时目标检测提供了新的优化思路。


网站公告

今日签到

点亮在社区的每一天
去签到