PyTorch Usage Details


model.eval(): switches layers such as BatchNorm and Dropout to evaluation behavior (Dropout is disabled, BatchNorm uses its running statistics);

with torch.no_grad(): activations are no longer cached for backward, which saves GPU memory;
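
A typical inference pattern combines the two (a sketch; model and inputs are assumed to be defined elsewhere):

model.eval()               # Dropout disabled, BatchNorm uses running statistics
with torch.no_grad():      # no graph is recorded, activations are not kept
    preds = model(inputs)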

Matrix multiplication:

y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)

y3 = torch.rand_like(y1)
torch.matmul(tensor, tensor.T, out=y3)

Element-wise multiplication:

z1 = tensor * tensor
z2 = tensor.mul(tensor)

z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)

A single-element tensor can be converted to a plain Python value:

agg = tensor.sum()
agg_item = agg.item()

A Tensor and its NumPy counterpart share the underlying memory; modifying one also modifies the other:

import numpy as np

t = torch.ones(5)
n = t.numpy()            # Tensor -> NumPy array, shared memory

n = np.ones(5)
t = torch.from_numpy(n)  # NumPy array -> Tensor, shared memory

If the dataset is already present in the local root folder, it is read from disk; otherwise, if download=True is specified, it is downloaded from the remote source:

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

Dataset class: returns one sample, addressed by index;

        the data can all stay on disk and be loaded lazily, only when a sample is actually requested;

        a custom dataset subclasses Dataset and overrides __init__, __len__ and __getitem__.
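
A minimal sketch of such a subclass (here the samples are assumed to already be in memory; a disk-backed version would do its loading inside __getitem__):

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # return exactly one (sample, label) pair for the given index
        return self.features[idx], self.labels[idx]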

DataLoader class: handles batching, shuffling (the sampling strategy), multiprocess loading, pinned memory, and so on.
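
Typical usage with the training_data defined above (batch size and worker count are illustrative):

from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True,
                              num_workers=2, pin_memory=True)
X, y = next(iter(train_dataloader))
print(X.shape, y.shape)   # torch.Size([64, 1, 28, 28]) torch.Size([64, 10])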

ToTensor(): converts a PIL-format Image into a Tensor (scaling pixel values to [0, 1]);

Lambda: converts the integer label y into a 10-dimensional one-hot vector;

All model layers inherit from torch.nn.Module

class NeuralNetwork(nn.Module):
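
A minimal complete definition in the spirit of the official tutorial (layer widths are illustrative; the input size matches the 28×28 FashionMNIST images above):

import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)                # (N, 28, 28) -> (N, 784)
        return self.linear_relu_stack(x)   # (N, 10) logits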

The Module must be copied onto the device

model = NeuralNetwork().to(device)

The input data must also be copied onto the device

X = torch.rand(1, 28, 28, device=device)

Do not call Module.forward directly; using the Module(input) syntax ensures that the registered pre- and post-forward hooks run

logits = model(X)
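
For example, a forward hook (the hook name below is made up) fires when the model is called, but not when forward is invoked directly:

def log_output_shape(module, inputs, output):
    print(type(module).__name__, output.shape)

handle = model.register_forward_hook(log_output_shape)
logits = model(X)      # hook runs
_ = model.forward(X)   # hook does not run
handle.remove()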

model.parameters(): the trainable parameters;

model.named_parameters(): the trainable parameters, together with their names;

state_dict: contains both the trainable parameters and the non-trainable buffers;
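
A quick way to see the difference (sketch):

for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)   # trainable parameters only

print(model.state_dict().keys())   # parameters and buffers (e.g. BatchNorm running_mean)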

By subclassing Function you can write a custom forward and backward; inputs or outputs can be stashed in ctx:

from torch.autograd import Function

class Exp(Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        ctx.save_for_backward(result)   # stash the output for use in backward
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result     # d(exp(i))/di = exp(i)

# Use it by calling the apply method:
output = Exp.apply(input)

Building the computation graph:

Key members of a Tensor: grad, grad_fn, is_leaf, requires_grad

Tensor.grad_fn is the Function that will be used for gradient computation during backward:
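
For reference, the z and loss below come from a setup like the following (shapes as in the standard autograd tutorial):

x = torch.ones(5)     # input
y = torch.zeros(3)    # target
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)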

print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

# Output:
Gradient function for z = <AddBackward0 object at 0x7f5e9fb64e20>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f5e99b11b40>

Note that backward accumulates (adds) gradients into Tensor.grad; this is exactly what the chain rule requires where paths merge, and it also makes gradient-accumulation steps possible;

only tensors with is_leaf == True && requires_grad == True accumulate into their grad;

a tensor with requires_grad == True but is_leaf == False passes gradients onward through the graph, while its own grad attribute stays unset.
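
A small illustration of the leaf condition:

x = torch.randn(3, requires_grad=True)   # is_leaf=True, requires_grad=True
y = x * 2                                # produced by an op, so is_leaf=False
y.sum().backward()
print(x.is_leaf, x.grad)                 # True tensor([2., 2., 2.])
print(y.is_leaf, y.grad)                 # False None (the gradient flowed through but was not stored)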

For inference only, use "with torch.no_grad()" to skip building the computation graph (benefits: the forward pass is slightly faster, and activations are not saved to ctx, which saves GPU memory):

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

Output: False

In some training setups, certain parameters should be frozen and excluded from weight updates; simply set their requires_grad = False by hand.
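
A sketch of freezing part of a model (the parameter-name prefix used here is hypothetical):

for name, param in model.named_parameters():
    if name.startswith("linear_relu_stack.0"):   # hypothetical: freeze the first linear layer
        param.requires_grad = False

# usually paired with handing the optimizer only the still-trainable parameters
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)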

detach() creates a new reference to the same data that is cut off from the original computation graph, so the original graph can be garbage-collected:

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

Output: False

The backward DAG is rebuilt from scratch on every forward pass, so the graph can change freely from step to step (for example, taking different control-flow branches depending on tensor values), as in the sketch below.
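
A toy illustration (the function and shapes are made up):

def forward_fn(x, w):
    h = torch.relu(x @ w)
    if h.sum() > 0:        # control flow can depend on tensor values
        h = h * 2
    return h.sum()

w = torch.randn(4, 3, requires_grad=True)
for _ in range(2):
    loss = forward_fn(torch.randn(2, 4), w)   # a fresh graph is recorded on every call
    loss.backward()
    w.grad = None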

Differentiating a vector with respect to a vector yields a Jacobian matrix; calling out.backward(v) actually computes the vector-Jacobian product vᵀ·J rather than materializing J itself.

The following example demonstrates the vector-Jacobian product (here v = ones_like(out)), gradient accumulation, and zeroing gradients:

inp = torch.eye(4, 5, requires_grad=True)
out = (inp+1).pow(2).t()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")

Output:

First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Optimizer usage example:

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

def train(...):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()  # reset every Tensor.grad to 0

torch.save: serializes an object (typically a dict) with Python's pickle and writes it to a file;

torch.load: reads the file and uses Python's pickle to deserialize the bytes back into an object (typically a dict);

torch.nn.Module.state_dict: a Python dict whose keys are strings and whose values are Tensors; it contains both the learnable parameters and the non-learnable buffers (e.g. the running mean needed by batch normalization);

the optimizer also has a state_dict (learning rate, momentum buffers, etc.);
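
For example (sketch):

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
print(optimizer.state_dict().keys())                      # dict_keys(['state', 'param_groups'])
print(optimizer.state_dict()['param_groups'][0]['lr'])    # 0.01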

Saving for inference only (note: you must call model.eval() after loading, otherwise Dropout and BatchNorm will misbehave):

# save:
torch.save(model.state_dict(), PATH)

# load:
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

Saving a checkpoint that can be used to resume training:

# save:
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            ...
            }, PATH)

# load:
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.train()

With the state_dict approach, the model must be constructed before loading (memory for the parameters is already allocated; the weights are simply random).

For map_location, model.to(device), etc., see: Saving and Loading Models — PyTorch Tutorials 2.3.0+cu121 documentation
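
A common case is loading a checkpoint saved on GPU onto the CPU (a sketch reusing the names from above):

device = torch.device("cpu")
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))
model.to(device)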

A less common approach (the model does not need to be constructed first):

# save:
torch.save(model, PATH)

# load:
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()