Slimming Down Large Models: The NNI Method Framework

Published: 2024-12-08

NNI (Neural Network Intelligence) is an open-source framework for automated neural network tuning. It also ships a model compression toolkit that helps users design and apply pruning and quantization algorithms. The main ways NNI performs model compression are:

  1. Pruning

    • Pruning algorithms compress the original network by removing redundant weights or channels, which lowers model complexity and helps mitigate overfitting.
    • NNI supports many pruning algorithms, such as SlimPruner, L1FilterPruner, L2FilterPruner, FPGMPruner, LevelPruner, and AGP_Pruner (newer NNI releases expose these as L1NormPruner, L2NormPruner, FPGMPruner, etc., as used in the example later in this post).
    • A typical pruning workflow is: pre-train the model, apply the pruning algorithm, then fine-tune the pruned model.
    • Pruning can also be applied iteratively, updating the masks step by step during training.
  2. Quantization

    • Quantization algorithms compress the original network by reducing the number of bits used to represent weights or activations, which cuts computation and inference time.
    • Quantization methods fall into two broad families: low-precision methods, which train, test, or store the model with lower-bit floating-point or integer types; and re-encoding methods, which re-encode the original data so that fewer bits are needed to represent it.
    • NNI supports several quantization algorithms; users specify the quantization type, bit width, operation types, and so on through a configuration list (see the config sketch after this list).
  3. Model speedup

    • NNI's speedup tool actually shrinks the model and reduces latency. A detailed tutorial on mask-based model speedup can be found in the NNI documentation.
    • The goal of model compression is lower inference latency and a smaller model, but most existing compression algorithms only simulate compression (e.g. through masks) when checking the compressed model's performance, such as accuracy, which is why the speedup step is needed.
  4. Compression utilities

    • NNI provides utilities that help users understand and analyze the model to be compressed, for example checking each layer's sensitivity to pruning and counting the model's FLOPs and parameters.
  5. Advanced usage

    • NNI's compression module exposes a concise interface for customizing new compression algorithms. Users can dig further into NNI's compression framework and build their own pruning or quantization algorithms on top of it.
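
As a rough illustration of item 2, the snippet below sketches what a quantization config_list could look like in the NNI 3.x-style API that the pruning example later in this post uses. The key names ('quant_dtype', 'target_names') and the quantizer class the config is passed to vary between NNI versions, so treat this as an assumption to be checked against the installed version's documentation, not a drop-in recipe.

# Hypothetical quantization config in the NNI 3.x style -- the key names are an
# assumption and should be verified against the installed NNI version's docs.
quant_config_list = [{
    'op_types': ['Conv2d', 'Linear'],       # which module types to quantize
    'quant_dtype': 'int8',                  # target bit width / data type
    'target_names': ['weight', '_input_'],  # quantize weights and layer inputs
}]
# The config_list is then handed to a quantizer class (for example a QAT-style
# quantizer) together with an evaluator, as described in the NNI documentation.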
from pathlib import Path

import torch
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim.lr_scheduler import _LRScheduler
from torch.utils.data import DataLoader

from torchvision import datasets, transforms
from torchvision.models.mobilenetv3 import mobilenet_v3_small
from torchvision.models.resnet import resnet18

import nni

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def build_mobilenet_v3():
    # Load an ImageNet-pretrained MobileNetV3-Small and replace the classifier
    # head with a 10-class head for CIFAR10. (pretrained=True is the older
    # torchvision API; newer releases use the weights= argument instead.)
    model = mobilenet_v3_small(pretrained=True)
    model.classifier[-1] = torch.nn.Linear(1024, 10)
    return model.to(device)


def build_resnet18():
    # Load an ImageNet-pretrained ResNet18 and replace the fully connected
    # head with a 10-class head for CIFAR10.
    model = resnet18(pretrained=True)
    model.fc = torch.nn.Linear(512, 10)
    return model.to(device)


def prepare_dataloader(batch_size: int = 128):
    normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    train_loader = DataLoader(
        datasets.CIFAR10(Path(__file__).parent / 'data', train=True, transform=transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomCrop(32, 4),
            transforms.ToTensor(),
            normalize,
        ]), download=True),
        batch_size=batch_size, shuffle=True, num_workers=8)

    test_loader = DataLoader(
        datasets.CIFAR10(Path(__file__).parent / 'data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            normalize,
        ])),
        batch_size=batch_size, shuffle=False, num_workers=8)
    return train_loader, test_loader


def prepare_optimizer(model: torch.nn.Module):
    optimize_params = [param for param in model.parameters() if param.requires_grad]
    # nni.trace records the optimizer's constructor arguments so that NNI's
    # compression tools can re-create the optimizer when they need to.
    optimizer = nni.trace(Adam)(optimize_params, lr=0.001)
    return optimizer


def train(model: torch.nn.Module, optimizer: torch.optim.Optimizer, training_step,
          lr_scheduler: _LRScheduler, max_steps: int, max_epochs: int):
    assert max_epochs is not None or max_steps is not None
    train_loader, test_loader = prepare_dataloader()
    max_steps = max_steps if max_steps else max_epochs * len(train_loader)
    max_epochs = max_steps // len(train_loader) + (0 if max_steps % len(train_loader) == 0 else 1)
    count_steps = 0

    for epoch in range(max_epochs):
        # evaluate() switches the model to eval mode, so re-enable train mode each epoch
        model.train()
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            loss = training_step((data, target), model)
            loss.backward()
            optimizer.step()
            if lr_scheduler is not None:
                lr_scheduler.step()
            count_steps += 1
            if count_steps >= max_steps:
                acc = evaluate(model, test_loader)
                print(f'[Training Epoch {epoch} / Step {count_steps}] Final Acc: {acc}%')
                return
        acc = evaluate(model, test_loader)
        print(f'[Training Epoch {epoch} / Step {count_steps}] Final Acc: {acc}%')


def evaluate(model: torch.nn.Module, test_loader):
    model.eval()
    correct = 0.0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    return 100 * correct / len(test_loader.dataset)


def training_step(batch, model: torch.nn.Module):
    output = model(batch[0])
    loss = F.cross_entropy(output, batch[1])
    return loss

The file above is models.py. It shows a complete training and evaluation pipeline: data preprocessing, model construction, training, evaluation, and optional learning-rate scheduling. Because the optimizer is wrapped with NNI's trace function, the same code can also be plugged into NNI's model compression and automated tuning workflows.
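
To make the role of nni.trace in prepare_optimizer a bit more concrete, here is a minimal, self-contained sketch; the toy Linear model is only a stand-in for illustration.

import torch
import nni
from torch.optim import Adam

toy_model = torch.nn.Linear(4, 2)  # stand-in model, just for illustration
# Wrapping the class with nni.trace records the constructor arguments, so NNI's
# compression tooling can later re-create an equivalent optimizer for a
# compressed copy of the model.
traced_optimizer = nni.trace(Adam)(toy_model.parameters(), lr=1e-3)
# The traced instance still behaves like a regular Adam optimizer.
traced_optimizer.zero_grad()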

from pathlib import Path
import sys
sys.path.append(str(Path(__file__).absolute().parents[1]))

import torch

from models import (
    build_resnet18,
    prepare_dataloader,
    prepare_optimizer,
    train,
    training_step,
    evaluate,
    device
)

from nni.compression.pruning import (
    L1NormPruner,
    L2NormPruner,
    FPGMPruner
)
from nni.compression.utils import auto_set_denpendency_group_ids
from nni.compression.speedup import ModelSpeedup

# choose the pruning criterion: 'l1', 'l2', or anything else for FPGM
prune_type = 'l1'


if __name__ == '__main__':
    # Step 1: fine-tune ResNet18 on CIFAR10 before pruning
    model = build_resnet18()
    optimizer = prepare_optimizer(model)
    train(model, optimizer, training_step, lr_scheduler=None, max_steps=None, max_epochs=10)
    _, test_loader = prepare_dataloader()
    print('Original model parameter number: ', sum([param.numel() for param in model.parameters()]))
    print('Original model after 10 epochs finetuning acc: ', evaluate(model, test_loader), '%')

    # Step 2: prune 50% of the output channels of every Conv2d layer
    config_list = [{
        'op_types': ['Conv2d'],
        'sparse_ratio': 0.5
    }]
    # The dummy input is only used to trace the model; auto_set_denpendency_group_ids
    # groups layers whose channels are coupled (e.g. through residual additions) so
    # that they get pruned consistently.
    dummy_input = torch.rand(8, 3, 224, 224).to(device)
    config_list = auto_set_denpendency_group_ids(model, config_list, dummy_input)
    optimizer = prepare_optimizer(model)

    if prune_type == 'l1':
        pruner = L1NormPruner(model, config_list)
    elif prune_type == 'l2':
        pruner = L2NormPruner(model, config_list)
    else:
        pruner = FPGMPruner(model, config_list)

    # compress() computes the pruning masks; unwrap_model() removes the pruning
    # wrappers so that ModelSpeedup can trace the original module structure.
    _, masks = pruner.compress()
    pruner.unwrap_model()

    # ModelSpeedup physically removes the masked channels, yielding a smaller model.
    model = ModelSpeedup(model, dummy_input, masks).speedup_model()
    print('Pruned model parameter number: ', sum([param.numel() for param in model.parameters()]))
    print('Pruned model without finetuning acc: ', evaluate(model, test_loader), '%')

    # Step 3: fine-tune the pruned model to recover accuracy
    optimizer = prepare_optimizer(model)
    train(model, optimizer, training_step, lr_scheduler=None, max_steps=None, max_epochs=10)
    _, test_loader = prepare_dataloader()
    print('Pruned model after 10 epochs finetuning acc: ', evaluate(model, test_loader), '%')

The script above is the pruning example: it fine-tunes ResNet18 on CIFAR10, compresses the model with L1-norm, L2-norm, or FPGM pruning (selected via prune_type), applies speedup, and then fine-tunes the pruned model again.
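
One detail the example does not show: after ModelSpeedup the pruned network is an ordinary, physically smaller torch.nn.Module, so it can be saved and reloaded with standard PyTorch calls. A minimal sketch, with illustrative file names:

# The model returned by speedup_model() is a regular nn.Module with a modified
# architecture, so saving the whole module (not just the state_dict) is the
# simplest way to preserve the pruned structure. File names are illustrative.
torch.save(model, 'resnet18_pruned_cifar10.pth')

# Reload later for inference. On newer PyTorch releases (>= 2.6) pass
# weights_only=False, since a full pickled module is being loaded.
pruned_model = torch.load('resnet18_pruned_cifar10.pth', map_location=device)
pruned_model.eval()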

Reference link: https://github.com/microsoft/nni. The runs above haven't finished yet, results to follow...