Dive into Deep Learning: Linear Regression Implementation from Scratch


Section 3.2 of Dive into Deep Learning, "Linear Regression Implementation from Scratch", contains the following code:

import random
import torch

# Generate a synthetic dataset
def synthetic_data(w, b, num_examples):  #@save
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))  # mean 0, std 1; num_examples rows, len(w) columns
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)  # add Gaussian noise with std 0.01
    return X, y.reshape((-1, 1))

# Yield minibatches of size batch_size
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)     # shuffle the example indices
    for i in range(0, num_examples, batch_size):    # step through the examples in strides of batch_size
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

batch_size = 10
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# Initialize the model parameters
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Define the model
def linreg(X, w, b):
    """线性回归模型。"""
    return torch.matmul(X, w) + b

# Define the loss function
def squared_loss(y_hat, y):
    """Squared loss."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

# Define the optimization algorithm
def sgd(params, lr, batch_size):    # lr is the learning rate
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size   # update, averaging the gradient over the minibatch
            param.grad.zero_()                      # reset the gradient for the next step


# Training loop
lr = 0.03
num_epochs = 1  # number of passes over the dataset
net = linreg    # alias for the model function (not a call)
loss = squared_loss     # alias for the loss function

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)   # per-example losses on the minibatch
        l.sum().backward()
        # print('[w,b]:',[w,b])
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

# Evaluate training by comparing the true parameters with the learned ones
print(f'estimation error of w: {true_w - w.reshape(true_w.shape)}')
print(f'estimation error of b: {true_b - b}')
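
The derivation that follows works through one concrete minibatch and then compares the hand-derived gradients with w.grad and b.grad. As a rough sketch of how to capture such a minibatch yourself (run in place of the training loop above, right after the parameters are initialized), the loop below prints the first minibatch and the gradients autograd computes for it; the exact numbers differ from run to run, since both the data and the initial parameters are random:

# Minimal sketch: inspect one minibatch and its gradients, then stop.
for X, y in data_iter(batch_size, features, labels):
    l = loss(net(X, w, b), y)   # per-example losses on this minibatch
    l.sum().backward()          # populates w.grad and b.grad
    print('X =', X)
    print('y =', y)
    print('w.grad =', w.grad)
    print('b.grad =', b.grad)
    break                       # only the first minibatch is needed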

Gradient Computation and Verification for a PyTorch Linear Regression Model

Below I derive the gradient computation in this code with explicit formulas:

1. Defining the variables and the model

  • Let X = tensor([[ 0.3742, 1.0514],[-0.5108, -2.9390],[-0.6907, 2.3641],[-0.5569, 0.4298],[-0.4228, -1.0638],[-1.3704, -1.6127],[ 1.3422, 0.9927],[-1.6255, 0.5072],[ 1.3470, -0.5777],[ 1.6256, 0.8769]])

  • Let y = tensor([[ 1.3774],[13.1741],[-5.2062],[ 1.6101],[ 6.9709],[ 6.9469],[ 3.5065],[-0.7634],[ 8.8503],[ 4.4814]])

  • Input matrix $X \in \mathbb{R}^{10 \times 2}$

  • Weight vector $w \in \mathbb{R}^{2 \times 1}$

  • Bias scalar $b \in \mathbb{R}$

  • Target vector $y \in \mathbb{R}^{10 \times 1}$

2. The linear regression model

$$\hat{y} = Xw + b$$

where $\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + b$ for $i = 1, \dots, 10$.
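
As a quick sanity check (a minimal sketch; the two sample rows are made up for illustration), the matrix form and the per-example form produce identical predictions:

import torch

X = torch.tensor([[0.3742, 1.0514],
                  [-0.5108, -2.9390]])   # two illustrative rows
w = torch.tensor([[2.0], [-3.4]])
b = 4.2

y_hat_matrix = torch.matmul(X, w) + b    # matrix form: Xw + b
y_hat_rowwise = (w[0] * X[:, 0] + w[1] * X[:, 1] + b).reshape(-1, 1)  # w1*x_i1 + w2*x_i2 + b
print(torch.allclose(y_hat_matrix, y_hat_rowwise))  # True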

3. The squared loss function

$$L(w, b) = \frac{1}{2} \sum_{i=1}^{10} (\hat{y}_i - y_i)^2$$
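
In the code, this L is exactly what squared_loss followed by .sum() computes. A minimal sketch with made-up predictions and targets:

import torch

y_hat = torch.tensor([[1.0], [2.0], [3.0]])    # illustrative predictions
y = torch.tensor([[1.5], [1.0], [2.0]])        # illustrative targets

l = (y_hat - y.reshape(y_hat.shape)) ** 2 / 2  # per-example losses, as in squared_loss
L = l.sum()                                    # L = (1/2) * sum of squared errors
print(L)                                       # tensor(1.1250)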

4. Computing the gradients

4.1 Computing $\frac{\partial L}{\partial w}$

The loss function is defined as:

$$L(w, b) = \frac{1}{2} \sum_{i=1}^{10} (\hat{y}_i - y_i)^2$$

where the prediction is:

$$\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + b$$

By the chain rule for composite functions:

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{10} \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial w_j}$$

Step by step:
  1. Compute $\frac{\partial L}{\partial \hat{y}_i}$

$$\frac{\partial L}{\partial \hat{y}_i} = \frac{\partial}{\partial \hat{y}_i} \left[ \frac{1}{2} (\hat{y}_i - y_i)^2 \right] = \hat{y}_i - y_i$$

  2. Compute $\frac{\partial \hat{y}_i}{\partial w_j}$

$$\frac{\partial \hat{y}_i}{\partial w_j} = \frac{\partial}{\partial w_j} \left[ w_1 x_{i1} + w_2 x_{i2} + b \right] = x_{ij}$$

  3. Combine the results

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{10} (\hat{y}_i - y_i) \cdot x_{ij}$$

Matrix-form derivation:

Define the error vector $e = \hat{y} - y$, with components $e_i = \hat{y}_i - y_i$; then:

$$\frac{\partial L}{\partial w_j} = \sum_{i=1}^{10} e_i \cdot x_{ij}$$

Writing the gradients of all the weights as one matrix product:

$$\frac{\partial L}{\partial w} = \begin{bmatrix} \sum_{i=1}^{10} e_i \cdot x_{i1} \\ \sum_{i=1}^{10} e_i \cdot x_{i2} \end{bmatrix} = X^T e = X^T (\hat{y} - y)$$
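
This identity is easy to confirm numerically. A minimal sketch with made-up tensors, computing each component as an explicit sum and comparing it with the matrix product:

import torch

torch.manual_seed(1)              # illustrative random data
X = torch.normal(0, 1, (10, 2))
e = torch.normal(0, 1, (10, 1))   # stands in for y_hat - y

# component j: sum_i e_i * x_ij, stacked into a 2x1 vector
grad_by_sum = torch.stack([(e[:, 0] * X[:, j]).sum() for j in range(2)]).reshape(2, 1)
print(torch.allclose(grad_by_sum, X.T @ e))  # True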

4.2 Computing $\frac{\partial L}{\partial b}$

Again by the chain rule:

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{10} \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial b}$$

Step by step:
  1. Compute $\frac{\partial L}{\partial \hat{y}_i}$

$$\frac{\partial L}{\partial \hat{y}_i} = \hat{y}_i - y_i$$

  2. Compute $\frac{\partial \hat{y}_i}{\partial b}$

$$\frac{\partial \hat{y}_i}{\partial b} = \frac{\partial}{\partial b} \left[ w_1 x_{i1} + w_2 x_{i2} + b \right] = 1$$

  3. Combine the results

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{10} (\hat{y}_i - y_i) \cdot 1 = \sum_{i=1}^{10} (\hat{y}_i - y_i)$$
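
Both closed forms (the weight gradient from 4.1 and the bias gradient just derived) can be checked against autograd. A minimal sketch on random data; the manual expressions must match what backward on the summed loss leaves in w.grad and b.grad:

import torch

torch.manual_seed(0)                          # reproducible illustrative data
X = torch.normal(0, 1, (10, 2))
y = torch.normal(0, 1, (10, 1))
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

y_hat = torch.matmul(X, w) + b
L = ((y_hat - y) ** 2 / 2).sum()
L.backward()                                  # autograd fills w.grad and b.grad

e = y_hat.detach() - y                        # error vector e = y_hat - y
print(torch.allclose(w.grad, X.T @ e))             # dL/dw = X^T (y_hat - y): True
print(torch.allclose(b.grad, e.sum().reshape(1)))  # dL/db = sum_i e_i: True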

Intuitive meaning of the gradients

  • Weight gradient $\frac{\partial L}{\partial w} = X^T (\hat{y} - y)$:
    each feature's gradient is the sum of that feature's contributions over all samples, each scaled by the prediction error $\hat{y}_i - y_i$.

  • Bias gradient $\frac{\partial L}{\partial b} = \sum_{i=1}^{10} (\hat{y}_i - y_i)$:
    the gradient of the bias is the sum of the prediction errors over all samples, indicating the overall direction of the error.

5. Mapping the derivation onto the code

Step 1: forward pass to compute the predictions

$$\hat{y} = Xw + b$$

Step 2: compute the loss

$$l_i = \frac{1}{2} (\hat{y}_i - y_i)^2$$

Step 3: backpropagate to compute the gradients

  • Weight gradient

$$\frac{\partial L}{\partial w} = X^T (\hat{y} - y)$$

  • Bias gradient

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{10} (\hat{y}_i - y_i)$$

6. Verifying that the code matches the derivation

The code calls l.sum().backward(), which computes the gradients of the total loss $L = \sum_{i=1}^{10} l_i$ with respect to w and b, exactly matching the derivation above.
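
A minimal sketch of this equivalence (small made-up minibatch; the bias is omitted for brevity): backward on the summed loss yields the same gradient as accumulating backward over each per-example loss:

import torch

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])    # two illustrative samples
y = torch.tensor([[1.0], [2.0]])

# Route 1: backward on the summed loss, as the training loop does
w1 = torch.zeros(2, 1, requires_grad=True)
l = (torch.matmul(X, w1) - y) ** 2 / 2
l.sum().backward()

# Route 2: backward on each per-example loss, letting gradients accumulate
w2 = torch.zeros(2, 1, requires_grad=True)
for i in range(len(X)):
    li = ((torch.matmul(X[i:i+1], w2) - y[i:i+1]) ** 2 / 2).sum()
    li.backward()                             # adds into w2.grad

print(torch.allclose(w1.grad, w2.grad))       # True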

Final gradient results

  • The gradient of w is $X^T (\hat{y} - y)$

  • The gradient of b is $\sum_{i=1}^{10} (\hat{y}_i - y_i)$

These match the results of PyTorch's automatic differentiation for the minibatch above:

  • w.grad = tensor([[-9.0000],[65.4477]])
  • b.grad = tensor([-40.9567])
