Machine Learning Notes [Week 3] - EW帮帮网

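I. Logistic Regression

How it differs from linear regression:

Problem type	Output type	Examples
Regression	Continuous real value	House price prediction, temperature prediction
Classification	Discrete class (0 or 1)	Has a disease or not, clicks an ad or not, passes or fails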

We want a model that takes an input $x$ and outputs a probability:
$h_\theta(x) = P(y=1 \mid x;\theta)$

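Use cases

Logistic regression is used for binary classification tasks, for example:

Whether an email is spam
Whether a patient has a disease
Whether a borrower will default on credit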

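II. Hypothesis Function

The main difference from linear regression is that the output must lie in [0, 1], so that it can be read as a probability.

We use the sigmoid function (also called the logistic function):
$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$
where:

$g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function
The output $h_\theta(x)$ is the probability that the input belongs to the positive class ($y = 1$)

Python implementation: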

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
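
As a quick check of the hypothesis, the sketch below evaluates $h_\theta(x) = g(\theta^T x)$ for a few inputs. The parameter vector `theta_demo` and the rows of `X_demo` are made-up values for illustration only, and `hypothesis` is a helper name that does not appear elsewhere in these notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def hypothesis(theta, X):
    """Return h_theta(x) = g(X @ theta) for every row of X."""
    return sigmoid(X @ theta)

# Hypothetical parameters and inputs; the first column of X_demo is x0 = 1.
theta_demo = np.array([-1.0, 2.0])
X_demo = np.array([[1.0, 0.0],   # theta^T x = -1
                   [1.0, 0.5],   # theta^T x =  0
                   [1.0, 2.0]])  # theta^T x =  3

probs = hypothesis(theta_demo, X_demo)
print(probs)  # the middle entry is 0.5 because theta^T x = 0 there
```

Every output lies strictly between 0 and 1, and it rises with $\theta^T x$. For large arrays, `scipy.special.expit` computes the same function.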

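III. Classification Decision

A logistic regression model outputs a probability. To turn it into a class label we usually threshold at 0.5:

$h_\theta(x) \ge 0.5$ ⇒ predict 1
$h_\theta(x) < 0.5$ ⇒ predict 0

Decision boundary:

Because $g(z) \ge 0.5$ exactly when $z \ge 0$, the condition $h_\theta(x) = 0.5$ is equivalent to:
$\theta^T x = 0$
This is a line (or, in higher dimensions, a hyperplane) that splits the input space into the two predicted classes.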
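
The decision rule can be sketched as a small `predict` helper. The function name, the `threshold` argument, and the example parameters are assumptions made for illustration, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    """Return 0/1 labels by thresholding h_theta(x) = g(X @ theta)."""
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Hypothetical parameters: the boundary theta^T x = 0 is the line x1 + x2 = 3.
theta_demo = np.array([-3.0, 1.0, 1.0])
X_demo = np.array([[1.0, 1.0, 1.0],   # x1 + x2 = 2 -> below the boundary
                   [1.0, 2.0, 2.0],   # x1 + x2 = 4 -> above the boundary
                   [1.0, 1.5, 1.5]])  # x1 + x2 = 3 -> exactly on the boundary

labels = predict(theta_demo, X_demo)
# Thresholding the probability at 0.5 matches checking the sign of theta^T x.
same_as_sign_rule = np.array_equal(labels, (X_demo @ theta_demo >= 0).astype(int))
print(labels, same_as_sign_rule)
```

A point exactly on the boundary has $h_\theta(x) = 0.5$ and is assigned to class 1 under the $\ge$ convention used above.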

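IV. Cost Function

The squared-error cost from linear regression is a poor fit for classification: combined with the sigmoid hypothesis it makes $J(\theta)$ non-convex, so gradient descent can get stuck in local minima. Logistic regression uses the log loss instead.

For a single example:
$\text{Cost}(h_\theta(x), y) = \begin{cases} - \log(h_\theta(x)) & \text{if } y = 1 \\ - \log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
The two cases combine into one expression, $\text{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$, and averaging over the $m$ training examples gives:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))\right]$
$J(\theta)$ is convex, so it can be minimized with gradient descent.

For each example:

If $y = 1$: the loss is $-\log(h_\theta(x))$, which is 0 when $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$
If $y = 0$: the loss is $-\log(1 - h_\theta(x))$, which is 0 when $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$

Python implementation: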

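def compute_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    epsilon = 1e-5  # small offset so that log(0) is never evaluated
    cost = (-1 / m) * (y.T @ np.log(h + epsilon) + (1 - y).T @ np.log(1 - h + epsilon))
    return cost.item()  # return a plain float rather than a 1x1 array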
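
A useful sanity check on a cost implementation: with $\theta = 0$ every prediction is 0.5, so $J(\theta)$ should be close to $\log 2 \approx 0.693$ regardless of the data. The sketch below restates the cost for 1-D label vectors and uses a tiny made-up dataset (`X_demo`, `y_demo`).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(theta, X, y, epsilon=1e-5):
    """Average log loss, same formula as in the notes (1-D theta and y)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return float(-(y @ np.log(h + epsilon) + (1 - y) @ np.log(1 - h + epsilon)) / m)

# Per-example loss for a positive example (y = 1) at two predicted probabilities.
loss_confident_right = -np.log(0.9)   # about 0.105
loss_confident_wrong = -np.log(0.1)   # about 2.303

# Made-up data: 4 examples, a bias column plus one feature.
X_demo = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])

# With theta = 0 every prediction is 0.5, so the cost should be close to log(2).
cost_at_zero = compute_cost(np.zeros(2), X_demo, y_demo)
print(loss_confident_right, loss_confident_wrong, cost_at_zero, np.log(2))
```

The small gap between `cost_at_zero` and $\log 2$ comes from the `epsilon` offset inside the logarithms.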

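V. Optimizing the Parameters with Gradient Descent

Because the logistic regression cost function is convex, gradient descent applies. Update every $\theta_j$ simultaneously:
$\theta_j := \theta_j - \alpha \cdot \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) \cdot x_j^{(i)}$

Vectorized form:
$\theta := \theta - \frac{\alpha}{m} X^T (h_\theta(x) - y)$
where $h_\theta(x) = g(X\theta)$ is evaluated for all $m$ rows of $X$ at once and $\alpha$ is the learning rate.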
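
The update rule follows from differentiating $J(\theta)$. A short derivation, using only the definitions above and writing $h = h_\theta(x)$ for a single example:

$g'(z) = g(z)\,(1 - g(z))$
$\frac{\partial h}{\partial \theta_j} = h\,(1 - h)\,x_j$
$\frac{\partial}{\partial \theta_j}\left[-y \log h - (1 - y) \log(1 - h)\right] = -\frac{y}{h}\,h(1 - h)\,x_j + \frac{1 - y}{1 - h}\,h(1 - h)\,x_j = (h - y)\,x_j$

Averaging over the $m$ examples gives $\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$, which is the term multiplied by $\alpha$ in the update.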

Vectorized Python implementation:

def gradient(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return (1 / m) * (X.T @ (h - y))
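
The gradient above is only one step of the update rule. A minimal sketch of the full batch gradient descent loop might look like the following; `gradient_descent`, the learning rate `alpha=0.1`, the iteration count, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(theta, X, y, epsilon=1e-5):
    m = len(y)
    h = sigmoid(X @ theta)
    return float(-(y @ np.log(h + epsilon) + (1 - y) @ np.log(1 - h + epsilon)) / m)

def gradient(theta, X, y):
    m = len(y)
    return (X.T @ (sigmoid(X @ theta) - y)) / m

def gradient_descent(X, y, alpha=0.1, num_iters=500):
    """Batch gradient descent; returns the final theta and the cost history."""
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta = theta - alpha * gradient(theta, X, y)  # simultaneous update of all theta_j
        history.append(compute_cost(theta, X, y))
    return theta, history

# Made-up 1-D data drawn around -1 (class 0) and +1 (class 1), plus a bias column.
rng = np.random.default_rng(0)
x1 = np.concatenate([rng.normal(-1.0, 1.0, 50), rng.normal(1.0, 1.0, 50)])
X_demo = np.c_[np.ones(100), x1]
y_demo = np.concatenate([np.zeros(50), np.ones(50)])

theta_gd, history = gradient_descent(X_demo, y_demo)
print(theta_gd, history[0], history[-1])
```

Tracking the cost per iteration is a cheap way to confirm that the learning rate is small enough: the history should decrease steadily.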

VI. Training Example (Using sklearn-Generated Data)

from sklearn.datasets import make_classification
from scipy.optimize import minimize

# Generate synthetic data
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)
X = np.c_[np.ones((X.shape[0], 1)), X]  # add the bias feature x0 = 1
y = y.reshape(-1, 1)
theta_init = np.zeros((X.shape[1], 1))

# Wrap the cost and gradient so that minimize can pass theta as a 1-D array
def cost_func(t):
    return compute_cost(t.reshape(-1, 1), X, y)

def grad_func(t):
    return gradient(t.reshape(-1, 1), X, y).flatten()

# Optimize
result = minimize(fun=cost_func, x0=theta_init.flatten(), jac=grad_func)
theta_optimized = result.x.reshape(-1, 1)

VII. Visualizing the Decision Boundary

import matplotlib.pyplot as plt

def plot_decision_boundary(X, y, theta):
    plt.scatter(X[:, 1], X[:, 2], c=y.flatten(), cmap='bwr')
    x_vals = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
    # On the boundary theta0 + theta1*x1 + theta2*x2 = 0, so solve for x2
    y_vals = -(theta[0] + theta[1]*x_vals) / theta[2]
    plt.plot(x_vals, y_vals, 'g--')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Decision Boundary')
    plt.grid(True)
    plt.show()

plot_decision_boundary(X, y, theta_optimized)

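VIII. Overfitting vs Underfitting

Underfitting

The model is too simple to fit the training data well.
Training error is high, and the model also generalizes poorly.

Overfitting

The model is too complex (for example, a high-degree polynomial): training error is low, but performance on new data is poor.
In other words, it generalizes poorly.

Illustration:

Underfitting: the model is a straight line
Good fit: the model is a smooth curve
Overfitting: the model is a rapidly oscillating curve that passes exactly through every training point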
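
The three cases can be reproduced numerically. The sketch below fits polynomials of degree 1, 3, and 9 to a small noisy sample and compares the error on the training points with the error on fresh points. The sine-shaped target, the noise level, and the sample sizes are made-up choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a smooth curve (a made-up target for illustration)."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(12)
x_new, y_new = make_data(200)   # unseen data from the same distribution

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

errors = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_new, y_new))
    print(degree, errors[degree])  # (training error, error on new data)
```

Training error can only go down as the degree increases, because each lower-degree model is a special case of the higher-degree one. The error on new data typically does not follow it down, which is the signature of overfitting.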

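Two main ways to address overfitting

Method 1: Reduce the number of features (manually or with PCA)

Remove noisy features
Apply dimensionality reduction (e.g. PCA)

Method 2: Regularization

Penalize large parameter values
Keep the model from becoming overly complex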

IX. Polynomial Regression

Add higher-order features, for example:
$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \cdots$
High-degree models overfit easily, so they should be combined with regularization.

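X. Regularization

Add a penalty term (the squared L2 norm of the parameters) to the cost function to keep the parameters from growing too large:

1. Regularized cost function for linear regression:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

First term: the model's prediction error

Second term: the sum of squared parameters, which penalizes large values

$\lambda$ is the regularization coefficient (it controls the strength of the penalty)

Note: $\theta_0$ is not regularized

2. Corresponding gradient descent updates (with regularization):

$j = 0$ (bias term):

$\theta_0 := \theta_0 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$

$j \ge 1$:

$\theta_j := \theta_j - \alpha \cdot \left[ \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]$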
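
The two update rules can be sketched for linear regression as follows. The function names, the learning rate, the iteration count, and the synthetic data are assumptions made for illustration. Comparing $\lambda = 0$ with a large $\lambda$ shows the penalty shrinking $\theta_1, \dots, \theta_n$ while $\theta_0$ is left unpenalized.

```python
import numpy as np

def cost_linear_reg(theta, X, y, lambda_):
    """Squared-error cost plus the L2 penalty; theta[0] is not penalized."""
    m = len(y)
    residual = X @ theta - y
    return float((residual @ residual + lambda_ * np.sum(theta[1:] ** 2)) / (2 * m))

def gradient_descent_linear_reg(X, y, lambda_, alpha=0.1, num_iters=2000):
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        grad = (X.T @ (X @ theta - y)) / m
        grad[1:] += (lambda_ / m) * theta[1:]   # penalty term only for j >= 1
        theta = theta - alpha * grad
    return theta

# Made-up data: y depends linearly on two standardized features plus noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))
y_demo = 1.0 + 3.0 * features[:, 0] - 2.0 * features[:, 1] + rng.normal(0.0, 0.5, 50)
X_demo = np.c_[np.ones(50), features]   # bias column x0 = 1

theta_plain = gradient_descent_linear_reg(X_demo, y_demo, lambda_=0.0)
theta_reg = gradient_descent_linear_reg(X_demo, y_demo, lambda_=50.0)
print(theta_plain, cost_linear_reg(theta_plain, X_demo, y_demo, 0.0))
print(theta_reg, cost_linear_reg(theta_reg, X_demo, y_demo, 0.0))
```

The regularized solution has smaller weights but a higher unregularized training error, which is the trade-off that $\lambda$ controls.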

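XI. Regularization in Logistic Regression

The same penalty applies to logistic regression:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1-y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

Python implementation (regularized logistic regression):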

def cost_regularized(theta, X, y, lambda_):
    m = len(y)
    h = sigmoid(X @ theta)
    epsilon = 1e-5  # small offset so that log(0) is never evaluated
    reg_term = (lambda_ / (2 * m)) * np.sum(np.square(theta[1:]))  # skip theta_0
    cost = (-1 / m) * (y.T @ np.log(h + epsilon) + (1 - y).T @ np.log(1 - h + epsilon))
    return cost.item() + reg_term

def gradient_regularized(theta, X, y, lambda_):
    m = len(y)
    h = sigmoid(X @ theta)
    grad = (1 / m) * (X.T @ (h - y))
    reg = (lambda_ / m) * theta
    reg[0] = 0  # theta_0 is not regularized
    return grad + reg
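
One way to gain confidence in a hand-written gradient is to compare it with a finite-difference approximation of the cost. The sketch below restates the regularized cost and gradient under the hypothetical names `cost_reg` and `grad_reg` (without the epsilon offset, so the two match exactly) and checks them at an arbitrary point on made-up data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_reg(theta, X, y, lambda_):
    """Regularized log loss (no epsilon offset, so it matches grad_reg exactly)."""
    m = len(y)
    h = sigmoid(X @ theta)
    data_term = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return float(data_term + (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2))

def grad_reg(theta, X, y, lambda_):
    m = len(y)
    grad = (X.T @ (sigmoid(X @ theta) - y)) / m
    grad[1:] += (lambda_ / m) * theta[1:]   # theta_0 is not regularized
    return grad

def numerical_gradient(f, theta, step=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = step
        grad[j] = (f(theta + e) - f(theta - e)) / (2 * step)
    return grad

# Made-up data and an arbitrary test point.
rng = np.random.default_rng(0)
X_demo = np.c_[np.ones(30), rng.normal(size=(30, 2))]
y_demo = (rng.uniform(size=30) < 0.5).astype(float)
theta_test = np.array([0.3, -0.5, 0.8])

analytic = grad_reg(theta_test, X_demo, y_demo, lambda_=1.0)
numeric = numerical_gradient(lambda t: cost_reg(t, X_demo, y_demo, 1.0), theta_test)
max_diff = float(np.max(np.abs(analytic - numeric)))
print(max_diff)
```

A large `max_diff` usually points to a missing factor of $\frac{1}{m}$ or to $\theta_0$ being regularized by mistake.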

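XII. Polynomial Features with sklearn

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Build polynomial features from the raw features only: X[:, 1:] drops the
# manually added x0 column, and Ridge fits its own unpenalized intercept
poly = PolynomialFeatures(degree=5, include_bias=False)
X_poly = poly.fit_transform(X[:, 1:])

# Ridge regression (linear regression with L2 regularization)
model = Ridge(alpha=1.0)  # alpha plays the role of λ
model.fit(X_poly, y.ravel())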
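
Since the labels in these notes are binary, the same idea can also be sketched with a regularized logistic regression instead of Ridge. In sklearn's `LogisticRegression` the parameter `C` is the inverse of the regularization strength, so `C = 1 / λ` for the cost function used above. The degree, the value of `lambda_`, and the use of `StandardScaler` are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as in the training example, without the manual x0 column.
X_raw, y_raw = make_classification(n_samples=100, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=42)

lambda_ = 1.0  # hypothetical regularization strength
clf = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),                      # keeps the polynomial terms on similar scales
    LogisticRegression(C=1.0 / lambda_, max_iter=1000),  # L2 penalty by default
)
clf.fit(X_raw, y_raw)

train_accuracy = clf.score(X_raw, y_raw)
probabilities = clf.predict_proba(X_raw[:3])   # columns: P(y=0), P(y=1)
print(train_accuracy, probabilities)
```

Training accuracy alone says nothing about overfitting; it needs to be compared with accuracy on held-out data.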

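XIII. Training Set vs Validation Set vs Test Set

Training set: used to fit the model parameters
Validation set (cross-validation set): used to choose hyperparameters such as λ and the model complexity
Test set: used to estimate the final model's generalization performance

A common split is 60% / 20% / 20%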
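
A 60/20/20 split can be sketched with two calls to sklearn's `train_test_split`. The data here is made up, and the fixed `random_state` is only there to make the example repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 examples with 2 features and binary labels.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(100, 2))
y_all = (X_all[:, 0] + X_all[:, 1] > 0).astype(int)

# First hold out 40% of the data, then cut that part in half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X_all, y_all, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```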

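XIV. Model Selection and Evaluation Workflow

Model selection steps:

Train several models on the training set, one for each candidate value of λ
Evaluate each model on the validation set and pick the λ that performs best
Evaluate the chosen model on the test set to estimate its generalization error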
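
The three steps can be sketched as a loop over candidate values of λ. The grid of values, the synthetic dataset, and the use of sklearn's `LogisticRegression` (with `C = 1 / λ`) are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_all, y_all = make_classification(n_samples=300, n_features=5, n_informative=3,
                                   n_redundant=0, random_state=0)  # made-up data
X_train, X_rest, y_train, y_rest = train_test_split(
    X_all, y_all, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

candidate_lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical grid
val_scores = {}
models = {}
for lambda_ in candidate_lambdas:
    # Step 1: fit on the training set only (C is the inverse of lambda).
    model = LogisticRegression(C=1.0 / lambda_, max_iter=1000).fit(X_train, y_train)
    models[lambda_] = model
    # Step 2: score on the validation set.
    val_scores[lambda_] = model.score(X_val, y_val)

best_lambda = max(val_scores, key=val_scores.get)
# Step 3: touch the test set once, with the selected model.
test_accuracy = models[best_lambda].score(X_test, y_test)
print(best_lambda, val_scores[best_lambda], test_accuracy)
```

Because λ was chosen to maximize the validation score, that score is an optimistic estimate. The test set gives the unbiased number as long as it played no part in the choice.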
