Day 43: Review Day

Published: 2025-09-03

@浙大疏锦行

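Find an image dataset on Kaggle, train a CNN on it, and visualize the results with Grad-CAM.

Advanced: split the code into multiple files.

A recap of the full image-processing workflow: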

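        1. Import the required libraries

        kagglehub: downloads datasets from Kaggle

        os: file path and directory handling

        numpy: numerical computation and array operations

        torch: the PyTorch deep learning framework, used to build and train the neural network

        matplotlib and PIL: image handling and display

        cv2: the OpenCV library, used to generate and process the Grad-CAM heatmap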

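        2. Download and prepare the dataset

        3. Preprocess and load the data

                Preprocessing pipeline: resize the images, convert them to PyTorch tensors (which scales pixel values to [0, 1]), then standardize each channel with a fixed mean and standard deviation.

                Dataset loading: the dataset is assumed to be organized into one folder per class; the list of class names is read from the folder names.

                Data loaders: DataLoader serves the dataset in batches and supports shuffling and parallel loading. Set batch_size, and use shuffle=True for the training set so the model cannot pick up on sample order, which helps robustness; the test set is left unshuffled.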
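As a small worked example of the preprocessing arithmetic, the sketch below reproduces by hand what ToTensor followed by Normalize does to a single 8-bit pixel value. The helper name normalize_pixel is made up for illustration; the mean and standard deviation are the red-channel values from the transform used in the script later in this post.

```python
def normalize_pixel(value_8bit, mean, std):
    """Mimic ToTensor (scale 0..255 to 0..1) followed by Normalize for one channel value."""
    scaled = value_8bit / 255.0      # ToTensor: 0..255 -> 0.0..1.0
    return (scaled - mean) / std     # Normalize: subtract the mean, divide by the std


# Red-channel statistics taken from the transform used in this post
RED_MEAN, RED_STD = 0.485, 0.229

darkest = normalize_pixel(0, RED_MEAN, RED_STD)
brightest = normalize_pixel(255, RED_MEAN, RED_STD)
print(f"0 -> {darkest:.4f}, 255 -> {brightest:.4f}")
```

After this step the red channel spans roughly -2.12 to 2.25 instead of 0 to 255, which keeps the inputs on a scale that is easier for the optimizer to work with.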

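        4. Define the CNN model

        The opening is the usual three boilerplate lines (the class statement, __init__, and the super().__init__() call); the convolutional and fully connected layers are defined by hand.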
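When the layers are defined by hand, the input size of the first fully connected layer has to be worked out from the convolution and pooling settings. The sketch below does that arithmetic for the three conv blocks used in this post (3x3 convolutions with padding=1, each followed by a 2x2 max-pool). The helper conv_out is a hypothetical name; it assumes dilation 1 and floor rounding.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224       # input height and width after Resize
channels = 3     # RGB input

# Output channels of the three conv blocks in this post's SimpleCNN
for out_channels in (32, 64, 128):
    size = conv_out(size, kernel=3, stride=1, padding=1)   # 3x3 conv, padding=1: size unchanged
    size = conv_out(size, kernel=2, stride=2)              # 2x2 max-pool: size halved
    channels = out_channels
    print(f"after the block with {channels} channels: {size}x{size}")

flat_features = channels * size * size
print("inputs to the first fully connected layer:", flat_features)
```

This matches the 128 * 28 * 28 passed to the first nn.Linear in the script. If the input resolution or the number of pooling layers changes, that number has to change with it.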

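        5. Train the model

- Training configuration:
  - device: automatically selects GPU or CPU
  - criterion: cross-entropy loss, suited to classification tasks
  - optimizer: Adam, learning rate 0.001
- Training loop:
  - model.train(): switches the model to training mode
  - Iterate over all training data in every epoch
  - Forward pass: compute the model outputs and the loss
  - Backward pass: compute gradients and update the parameters
  - Statistics: compute the average loss and accuracy for each epoch
- Key steps:
  - optimizer.zero_grad(): clears the gradients left over from the previous step
  - loss.backward(): backpropagates to compute gradients
  - optimizer.step(): updates the model parameters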
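The steps above can be sketched on a toy problem that runs in a second or two on CPU. Everything specific here is an assumption for illustration: the synthetic data, the nn.Linear stand-in for the CNN, and the learning rate of 0.05 (larger than the 0.001 used in this post so the toy converges quickly).

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic two-class data: the label depends only on the sign of the first feature
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(4, 2)                               # stand-in for the CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.05)

loss_history = []
model.train()
for epoch in range(20):
    running_loss, correct, total = 0.0, 0, 0
    for inputs, labels in loader:
        outputs = model(inputs)                       # forward pass
        loss = criterion(outputs, labels)

        optimizer.zero_grad()                         # clear gradients from the previous step
        loss.backward()                               # compute new gradients
        optimizer.step()                              # update the parameters

        running_loss += loss.item() * inputs.size(0)  # sum of per-sample losses
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    loss_history.append(running_loss / total)         # average loss for this epoch

print(f"first epoch loss {loss_history[0]:.4f}, last epoch loss {loss_history[-1]:.4f}")
print(f"last epoch accuracy {100 * correct / total:.2f}%")
```

The loss is multiplied by the batch size before accumulating because criterion returns a per-batch mean; dividing the running sum by the number of samples then gives a correct epoch average even when the last batch is smaller.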

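        6. Implement Grad-CAM

How Grad-CAM works: gradients are used to measure how important each feature map of a convolutional layer is to the prediction for a class, and the result is a class activation heatmap.

Core components:

Hooks:

register_forward_hook: saves the target layer's feature maps

register_backward_hook: saves the gradients with respect to the target layer's output (deprecated in recent PyTorch releases in favor of register_full_backward_hook)

Weight computation:

weights = torch.mean(self.gradients, dim=(2, 3), keepdim=True): averages over the spatial dimensions to get one importance weight per feature map; keepdim=True keeps the shape broadcastable against the feature maps

Heatmap generation:

cam = torch.sum(weights * self.feature_maps, dim=1, keepdim=True): combines the feature maps, weighted by importance

torch.relu(cam): keeps only the regions that contribute positively to the class prediction

Normalization and upsampling: scale the heatmap to [0, 1] and resize it to the input image size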
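To make the hook mechanics concrete, here is a minimal sketch on a tiny network. The network, the 8x8 input, and the 16x16 output size are invented for illustration and are not the model from this post; the hook calls themselves are standard nn.Module methods.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy network, used only to show the mechanics
features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2))
target_layer = features[1]   # its output is the last convolutional feature map

captured = {}

def save_feature_maps(module, inputs, output):
    captured["feature_maps"] = output.detach()

def save_gradients(module, grad_input, grad_output):
    captured["gradients"] = grad_output[0].detach()   # gradient w.r.t. the layer output

handles = [
    target_layer.register_forward_hook(save_feature_maps),
    target_layer.register_full_backward_hook(save_gradients),
]

x = torch.randn(1, 3, 8, 8)                  # one fake 8x8 RGB image
scores = head(features(x))
class_idx = scores.argmax(dim=1).item()
scores[0, class_idx].backward()              # backpropagate the chosen class score

# One weight per channel: average the gradients over height and width
weights = captured["gradients"].mean(dim=(2, 3), keepdim=True)                    # (1, 4, 1, 1)
cam = torch.relu((weights * captured["feature_maps"]).sum(dim=1, keepdim=True))  # (1, 1, 8, 8)

# Normalize to [0, 1], then upsample to a made-up display size
cam = cam - cam.min()
if cam.max() > 0:
    cam = cam / cam.max()
cam = F.interpolate(cam, size=(16, 16), mode="bilinear", align_corners=False)

for handle in handles:
    handle.remove()                          # hooks stay active until removed

print(cam.shape)
```

Because the weighted sum keeps its channel dimension (keepdim=True), cam is already a 4D tensor of shape (N, 1, H, W), which is what bilinear interpolation expects.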

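        7. Visualization

- Image preprocessing: the same preprocessing steps as in training
- Model prediction:
  - model.eval(): switches the model to evaluation mode
  - torch.no_grad(): disables gradient tracking for the prediction pass, which saves memory and time
  - torch.softmax(): converts the outputs to probabilities
- Applying Grad-CAM:
  - Create a GradCAM instance with the last convolutional block as the target layer (the layer whose output the model stores in self.features)
  - Generate the heatmap (outside torch.no_grad(), since Grad-CAM needs gradients)
- Visualization:
  - Show the original image, the Grad-CAM heatmap, and the overlay
  - Label the predicted class and confidence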
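The confidence shown in the plot title is just the softmax probability of the predicted class. A plain-Python sketch of that calculation follows; the logits and the three class names are made-up values, not output from the trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)                          # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits and class names for illustration
logits = [2.0, 1.0, 0.1]
class_names = ["1", "2", "3"]

probs = softmax(logits)
predicted_idx = max(range(len(probs)), key=probs.__getitem__)   # index of the largest probability
confidence = probs[predicted_idx]
print(f"Prediction: {class_names[predicted_idx]} ({confidence:.2f})")
```

Softmax does not change which class scores highest, so the predicted index can be taken from the raw logits or from the probabilities; only the confidence value needs the conversion.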

import kagglehub
import os
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader, Subset
import matplotlib.pyplot as plt
from PIL import Image
import cv2

# 1. Download and prepare the dataset
print("Downloading the aircraft classification dataset...")
path = kagglehub.dataset_download("sleppyfish/aircraft-classification-dataset")
print("Dataset download path:", path)

# Dataset paths
data_dir = path
train_dir = os.path.join(data_dir, "train")
test_dir = os.path.join(data_dir, "test")

# 2. Preprocess and load the data
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # resize images to 224x224
    transforms.ToTensor(),          # convert to a tensor with values in [0, 1]
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))  # standardize with ImageNet mean/std
])

# Load the datasets
train_dataset = datasets.ImageFolder(root=train_dir, transform=transform)
test_dataset = datasets.ImageFolder(root=test_dir, transform=transform)

# Class information
class_names = train_dataset.classes
num_classes = len(class_names)
print(f"Number of classes: {num_classes}, class names: {class_names}")

# Create the data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

# 3. Define the CNN model
class SimpleCNN(nn.Module):
    def __init__(self, num_classes):
        super(SimpleCNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Holds the output of the last convolutional block (set in forward) for Grad-CAM
        self.features = None
        
        # Fully connected layers
        self.fc1 = nn.Linear(128 * 28 * 28, 512)  # 224/2/2/2=28
        self.relu4 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)
        
        x = self.conv3(x)
        x = self.relu3(x)
        self.features = x  # keep the feature maps for Grad-CAM
        x = self.pool3(x)
        
        x = x.view(x.size(0), -1)  # flatten
        x = self.fc1(x)
        x = self.relu4(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# 4. Initialize the model, loss function, and optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
model = SimpleCNN(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 5. Define the training function
def train(model, train_loader, criterion, optimizer, device, epochs=5):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Statistics
            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        
        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = 100 * correct / total
        print(f'Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%')

# 6. Define the Grad-CAM class
class GradCAM:
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer  # an nn.Module, e.g. model.relu3
        self.feature_maps = None
        self.gradients = None
        
        # Register the forward and backward hooks on the target layer
        self.hook_handles = []
        handle_forward = self.target_layer.register_forward_hook(self.hook_fn_forward)
        # register_backward_hook is deprecated, so use the full backward hook
        handle_backward = self.target_layer.register_full_backward_hook(self.hook_fn_backward)
        self.hook_handles.append(handle_forward)
        self.hook_handles.append(handle_backward)

    def hook_fn_forward(self, module, input, output):
        self.feature_maps = output.detach()

    def hook_fn_backward(self, module, grad_input, grad_output):
        # grad_output is a tuple; its first element is the gradient w.r.t. the layer output
        self.gradients = grad_output[0].detach()

    def __call__(self, x, class_idx=None):
        # Run a forward and a backward pass so the hooks capture feature maps and gradients
        self.model.zero_grad()
        output = self.model(x)
        if class_idx is None:
            class_idx = torch.argmax(output, dim=1).item()
        # Backpropagate the score of the target class (expects a single-image batch)
        class_score = output[0, class_idx]
        class_score.backward()
        
        # Channel weights: average the gradients over the spatial dimensions
        weights = torch.mean(self.gradients, dim=(2, 3), keepdim=True)
        cam = torch.sum(weights * self.feature_maps, dim=1, keepdim=True)
        cam = torch.relu(cam)  # keep positive contributions only
        
        # Normalize to [0, 1]
        cam = cam - torch.min(cam)
        if torch.max(cam) > 0:
            cam = cam / torch.max(cam)
        
        # Resize to the input image size; cam is already 4D (N, 1, H, W)
        cam = nn.functional.interpolate(cam, size=(224, 224), mode='bilinear', align_corners=False)
        cam = cam.squeeze().cpu().numpy()
        
        # Remove the hooks (each instance is meant for a single call)
        for handle in self.hook_handles:
            handle.remove()
        
        return cam

# 7. Define the visualization function
def visualize_gradcam(image_path, model, device, class_names):
    # Load and preprocess the image
    img = Image.open(image_path).convert('RGB')
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
    ])
    input_tensor = transform(img).unsqueeze(0).to(device)
    
    # Use the resized PIL image for display
    original_img = np.array(img.resize((224, 224)))
    
    # Predict the class
    model.eval()
    with torch.no_grad():
        output = model(input_tensor)
        _, predicted_idx = torch.max(output, 1)
        predicted_class = class_names[predicted_idx.item()]
        confidence = torch.softmax(output, dim=1)[0, predicted_idx.item()].item()
    
    # Apply Grad-CAM to the output of the last convolutional block
    grad_cam = GradCAM(model, model.relu3)
    cam = grad_cam(input_tensor, predicted_idx.item())
    
    # Build the color heatmap
    heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
    heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
    heatmap = np.float32(heatmap) / 255
    
    # Overlay the heatmap on the original image
    superimposed_img = heatmap * 0.4 + np.float32(original_img) / 255
    superimposed_img = np.uint8(255 * superimposed_img / np.max(superimposed_img))
    
    # Show the results
    plt.figure(figsize=(12, 4))
    plt.subplot(131)
    plt.imshow(original_img)
    plt.title('Original Image')
    plt.axis('off')
    
    plt.subplot(132)
    plt.imshow(cam, cmap='jet')
    plt.title('Grad-CAM Heatmap')
    plt.axis('off')
    
    plt.subplot(133)
    plt.imshow(superimposed_img)
    plt.title(f'Prediction: {predicted_class} ({confidence:.2f})')
    plt.axis('off')
    
    plt.tight_layout()
    plt.show()

# 8. Train the model
epochs = 5
print(f"Starting training for {epochs} epochs...")
train(model, train_loader, criterion, optimizer, device, epochs)

# 9. Save the model
torch.save(model.state_dict(), 'aircraft_cnn_model.pth')
print("Model saved as: aircraft_cnn_model.pth")

# 10. Pick a few test images for Grad-CAM visualization
print("Generating Grad-CAM visualizations...")
# Collect the paths of the first 5 test images
test_image_paths = []
test_labels = []
for i in range(min(5, len(test_dataset))):
    img_path, label = test_dataset.samples[i]
    test_image_paths.append(img_path)
    test_labels.append(label)
    
# Visualize each image
for img_path in test_image_paths:
    visualize_gradcam(img_path, model, device, class_names)

Output:

Downloading the aircraft classification dataset...
Dataset download path: C:\Users\wjy\.cache\kagglehub\datasets\sleppyfish\aircraft-classification-dataset\versions\1
Number of classes: 9, class names: ['1', '2', '3', '4', '5', '6', '7', '8', '9']
Using device: cuda
Starting training for 5 epochs...
Epoch 1/5, Loss: 1.8492, Accuracy: 33.03%
Epoch 2/5, Loss: 1.2424, Accuracy: 56.92%
Epoch 3/5, Loss: 0.7498, Accuracy: 74.34%
Epoch 4/5, Loss: 0.4426, Accuracy: 85.22%
Epoch 5/5, Loss: 0.2967, Accuracy: 90.40%
Model saved as: aircraft_cnn_model.pth
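
The accuracy figures in this log are measured on the training batches; the script builds test_loader but never uses it. A minimal sketch of an evaluation pass is shown below, assuming any classifier and loader with the same interface. The evaluate function and the toy stand-ins (nn.Identity as the model, random tensors as data) are hypothetical and only there to make the block runnable on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return classification accuracy in percent over all batches of the loader."""
    model.eval()                          # switch off dropout and similar layers
    correct, total = 0, 0
    with torch.no_grad():                 # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = model(inputs).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100 * correct / total


# Toy stand-ins: the "model" passes its input through, so the inputs act as logits
torch.manual_seed(0)
logits = torch.randn(10, 3)
labels = logits.argmax(dim=1)
labels[:2] = (labels[:2] + 1) % 3         # make exactly two labels wrong on purpose
toy_loader = DataLoader(TensorDataset(logits, labels), batch_size=4, shuffle=False)

accuracy = evaluate(nn.Identity(), toy_loader, torch.device("cpu"))
print(f"Test accuracy: {accuracy:.2f}%")
```

With the script in this post, the call would be evaluate(model, test_loader, device) after training; comparing that number with the training accuracy is a quick check for overfitting.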

