A Comprehensive Guide to Advanced SQL Techniques (MySQL)

Published: 2025-09-12

Database Initialization

Before working through this guide, run the database initialization script to set up a test environment:

📁 Initialization script location

🚀 Execution steps

  1. Make sure your MySQL server is running
  2. Create a new database (suggested name: sql_advanced_guide)
  3. Execute the initialization script against that database
  4. The script automatically creates all required tables and test data
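
For reference, a minimal setup session could look like the following (the script path is a placeholder for wherever you saved the file):

-- Run in the mysql client; adjust the script path to your local copy
CREATE DATABASE sql_advanced_guide CHARACTER SET utf8mb4;
USE sql_advanced_guide;
SOURCE /path/to/init_script.sql;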

1. Introduction

1.1 Scope

This guide is an advanced SQL reference focused on MySQL 8.0. It digs into MySQL 8.0's advanced features, performance-optimization techniques, and best practices, using code examples drawn from real business scenarios together with performance analysis to help readers build advanced MySQL 8.0 development skills.

1.2 Intended Audience

Primary audience:

  • Developers with two or more years of SQL experience
  • Database administrators (DBAs)
  • System architects and technical leads
  • Teams evaluating or migrating databases

Prerequisites:

  • Solid command of basic SQL (SELECT, INSERT, UPDATE, DELETE)
  • Familiarity with core MySQL concepts (tables, indexes, storage engines)
  • Basic database design experience
  • An understanding of transactions and concurrency control

What you will learn:

  • MySQL 8.0's advanced features and enterprise-grade best practices
  • Professional-level skills in complex query optimization and performance tuning
  • Applying MySQL 8.0's new features to real business problems
  • Optimization strategies and architecture design for large data volumes

Suggested learning path:

  1. Foundation: review Chapter 2 (advanced indexing) to solidify how indexes work
  2. Intermediate: work through Chapter 3 (complex query optimization) and learn to read execution plans
  3. Practice: study Chapter 4 (data manipulation) to improve day-to-day development efficiency
  4. Performance: dig into Chapter 5 (MySQL-specific optimization) to resolve bottlenecks
  5. Best practices: finish with Chapter 6 to avoid common pitfalls

1.3 Database Version

This guide covers the current release of the following database system:

  • MySQL 8.0 - the leading open-source relational database

1.4 Environment Setup and Test Data

To make the examples easy to follow and reproduce, a complete test-database initialization script is provided.

📁 Database initialization script

The initialization script includes:

  • Full table definitions
  • Tuned index definitions
  • Localized (Chinese) test data
  • Sample data with complete business logic

📊 Test Data Overview

Data volumes:

  • t_departments: 15 departments
  • t_employees: 74 employees (including former employees)
  • t_products: 30 products
  • t_sales: 2,000 sales records
  • t_sales_targets: 48 sales targets
  • t_training_records: 24 training records
  • t_employee_history: 16 employee history records

Characteristics of the test data:

  • A varied set of Chinese names and department names
  • Data covering complete business scenarios
  • Supports complex queries and analytical workloads
  • Fully linked relationships, convenient for practicing JOINs

🚀 Usage

  1. Pick the SQL file for your database system
  2. Run the script in your database management tool
  3. The script creates all tables, indexes, and test data automatically
  4. You can start practicing immediately

Note: the scripts define full foreign-key relationships, check constraints, and business rules to keep the data consistent and complete.

Table design notes (a DDL sketch follows the list):

  1. Primary keys: every table defines an explicit primary key to guarantee row uniqueness
  2. Foreign keys: inter-table relationships preserve referential integrity
  3. NOT NULL: critical columns are declared NOT NULL to avoid null-value problems
  4. Check constraints: value ranges are restricted, e.g., salary must be greater than 0
  5. Unique constraints: columns such as email and department name are unique
  6. Defaults: status columns have default values; timestamps are auto-filled
  7. Indexes: frequently queried columns are indexed
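
As a minimal sketch, a table definition combining these seven constraint types could look like the following. Names follow this guide's conventions, but the actual definitions in the initialization script may differ; note that CHECK constraints are enforced only from MySQL 8.0.16 onward.

-- Sketch only; the real initialization script may define these tables differently
CREATE TABLE t_departments_demo (
    department_id_   INT PRIMARY KEY,                           -- 1. primary key
    department_name_ VARCHAR(50) NOT NULL UNIQUE,               -- 3. NOT NULL + 5. unique
    budget_          DECIMAL(12,2) CHECK (budget_ >= 0)         -- 4. check constraint
);

CREATE TABLE t_employees_demo (
    employee_id_   INT PRIMARY KEY,                             -- 1. primary key
    name_          VARCHAR(50) NOT NULL,                        -- 3. NOT NULL
    email_         VARCHAR(100) UNIQUE,                         -- 5. unique
    salary_        DECIMAL(10,2) CHECK (salary_ > 0),           -- 4. check constraint
    status_        VARCHAR(20) DEFAULT 'ACTIVE',                -- 6. default value
    created_at_    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,         -- 6. auto-filled timestamp
    department_id_ INT,
    FOREIGN KEY (department_id_) REFERENCES t_departments_demo (department_id_),  -- 2. foreign key
    INDEX idx_dept (department_id_)                             -- 7. index on a frequent filter column
);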

With that in place, let's dive into the advanced SQL topics.


2. Advanced Indexing Techniques

Indexing is one of the core techniques of database performance optimization. Database systems implement indexes differently, and understanding those differences is essential for writing high-performance SQL.

2.1 Composite Indexes

A composite index covers multiple columns and can dramatically speed up multi-condition queries. Systems differ in how they implement and optimize composite indexes.

Composite indexes in detail

Purpose:

  • Speed up WHERE clauses that filter on several columns
  • Reduce the amount of data a query must scan
  • Support fast ORDER BY and GROUP BY execution
  • Avoid lookups back to the base table, improving query efficiency

When to use:

  • Queries that routinely filter on several columns together (e.g., department + salary range)
  • Queries that sort on multiple columns
  • Join conditions in complex multi-table queries
  • Frequent grouping and aggregation

Performance impact:

  • Query speed: multi-condition queries can become 10-100x faster
  • Storage: each composite index consumes extra disk space
  • Maintenance: every INSERT/UPDATE/DELETE must also maintain the index
  • Memory: index pages take up buffer-pool memory

Caveats:

  • The "leftmost prefix" rule applies; column order is critical
  • Put high-selectivity columns first
  • Avoid creating too many composite indexes, which hurts write performance
  • Monitor index usage regularly and drop unused indexes

2.1.1 Composite Indexes in MySQL 8.0

MySQL composite indexes follow the "leftmost prefix" rule; the order of the indexed columns strongly affects query performance.

-- Business scenario: an HR system frequently filters employees by department and salary range
-- Column order by query frequency and selectivity: department_id_ (frequent + selective) -> salary_ (range filter) -> hire_date_ (sorting)
CREATE INDEX idx_emp_dept_salary ON t_employees (department_id_, salary_, hire_date_);

-- Good: efficient query (uses the full composite index)
-- Scenario: find highly paid employees in one department, for salary analysis and talent review
SELECT employee_id_, name_, salary_, hire_date_
FROM t_employees
WHERE department_id_ = 10 AND salary_ > 10000;

-- Bad (not recommended): inefficient query (cannot use the index prefix)
-- Problem: it skips the leading column department_id_, so the index is unusable and a full table scan results
SELECT * FROM t_employees
WHERE salary_ > 10000 AND hire_date_ > '2024-06-01';

-- Scenario: verify that a query actually uses the composite index (performance tuning)
EXPLAIN SELECT employee_id_, name_, salary_
FROM t_employees
WHERE department_id_ = 10 AND salary_ BETWEEN 5000 AND 12000;

-- Scenario: MySQL 8.0 feature - test an index safely without disturbing production queries
-- Create the index invisible first, evaluate it, then make it visible
CREATE INDEX idx_emp_status ON t_employees (status_) INVISIBLE;

-- Bad (not recommended): creating the index visible right away can change existing execution plans
-- CREATE INDEX idx_emp_status ON t_employees (status_); -- the optimizer might start picking the wrong index

-- Make the index visible once its benefit is confirmed
ALTER TABLE t_employees ALTER INDEX idx_emp_status VISIBLE;

MySQL composite-index tuning tips:

-- Scenario: performance tuning - analyze the data distribution to choose the best column order
-- Higher-selectivity columns should come first so the index filters more effectively
SELECT
    COUNT(DISTINCT department_id_) / COUNT(*) as dept_selectivity,
    COUNT(DISTINCT status_) / COUNT(*) as status_selectivity,
    COUNT(DISTINCT salary_) / COUNT(*) as salary_selectivity
FROM t_employees;

-- Bad (not recommended): creating a composite index without analyzing the data first
-- CREATE INDEX idx_bad_order ON t_employees (status_, department_id_);
-- Problem: if status_ has very low selectivity (e.g., only ACTIVE/INACTIVE), leading with it weakens the index

-- Scenario: employee lookup pages - a covering index avoids table lookups (often 50-80% less I/O)
CREATE INDEX idx_covering ON t_employees (department_id_, salary_, name_);

-- Good: query fully served by the covering index (every needed column is in the index)
SELECT name_, salary_
FROM t_employees
WHERE department_id_ = 10 AND salary_ > 5000;

-- Bad (not recommended): selecting extra columns forces lookups back to the table
-- SELECT name_, salary_, email_, hire_date_
-- FROM t_employees
-- WHERE department_id_ = 10 AND salary_ > 5000;
-- Problem: email_ and hire_date_ are not in the index, so each row requires a table lookup, losing the covering benefit

2.2 Partial Indexes

A partial index only indexes the rows that satisfy a given condition, which can significantly shrink the index and improve performance.

Partial indexes in detail

Purpose:

  • Reduce index storage by indexing only the rows that matter
  • Make index maintenance cheaper by avoiding unnecessary updates
  • Speed up queries that target a specific subset of the data
  • Avoid indexing NULL or otherwise irrelevant values

When to use:

  • Large tables where only part of the data is queried frequently (e.g., only active users)
  • Status columns with clear business meaning (e.g., only valid orders)
  • Time-bounded queries (e.g., only the last year of data)
  • Queries that exclude anomalous or test data

Performance impact:

  • Index size: can shrink by 50%-90%
  • Query speed: queries on the targeted subset become significantly faster
  • Maintenance: less index-maintenance overhead
  • Memory: more effective use of the buffer pool

Caveats:

  • Make sure query predicates match the index condition
  • Avoid overly complex condition expressions
  • Periodically re-check that the condition is still relevant
  • Mind the syntax differences between database systems

2.2.1 Where Conditional Indexes Apply

A conditional index (also called a partial or filtered index) indexes only the rows that satisfy a specific predicate. MySQL does not support conditional indexes directly, but a similar effect can be achieved in several ways.

MySQL alternatives:

-- Business scenario: a large HR system where 90% of queries only touch active employees (status_='ACTIVE')
-- A conventional index would also carry every former employee, wasting space and slowing queries

-- Option 1: emulate a conditional index with a generated column (recommended)
-- Payoff: the index shrinks by roughly 60-80% and queries speed up by 30-50%
ALTER TABLE t_employees
ADD COLUMN active_flag TINYINT AS (CASE WHEN status_ = 'ACTIVE' THEN 1 ELSE NULL END) STORED;

CREATE INDEX idx_active_employees ON t_employees (active_flag, department_id_, salary_);

-- Good: efficient lookup of active employees (uses the conditional index)
SELECT employee_id_, name_, salary_
FROM t_employees
WHERE active_flag = 1 AND department_id_ = 1;

-- Bad (not recommended): the conventional approach indexes employees of every status
-- CREATE INDEX idx_all_status ON t_employees (status_, department_id_, salary_);
-- SELECT * FROM t_employees WHERE status_ = 'ACTIVE' AND department_id_ = 1;
-- Problem: the index also stores INACTIVE, TERMINATED, etc., wasting space and lowering efficiency

-- Option 2: emulate a conditional index with a functional index (for more complex conditions)
-- Scenario: index department information for active employees only
CREATE INDEX idx_active_dept ON t_employees ((CASE WHEN status_ = 'ACTIVE' THEN department_id_ ELSE NULL END));

-- Scenario: per-department headcount report using the functional index
-- The query expression must match the index expression exactly for the index to be used
SELECT
    department_id_,
    COUNT(*) as active_employee_count,
    AVG(salary_) as avg_salary
FROM t_employees
WHERE (CASE WHEN status_ = 'ACTIVE' THEN department_id_ ELSE NULL END) = 1
GROUP BY department_id_;

-- Scenario: detail lookup of active employees in one department
SELECT employee_id_, name_, salary_, hire_date_
FROM t_employees
WHERE (CASE WHEN status_ = 'ACTIVE' THEN department_id_ ELSE NULL END) = 2
  AND salary_ > 10000;

-- Verify that the functional index is used
EXPLAIN SELECT employee_id_, name_, department_id_
FROM t_employees
WHERE (CASE WHEN status_ = 'ACTIVE' THEN department_id_ ELSE NULL END) = 1;
-- Expected: the key column shows idx_active_dept and type is ref, i.e. the functional index is used

-- Bad (not recommended): a predicate that does not match the index expression cannot use the index
-- SELECT * FROM t_employees WHERE status_ = 'ACTIVE' AND department_id_ = 1;
-- Problem: the predicate differs from the indexed expression, so a full table scan results

-- Option 3: split the table (for very large data volumes)
-- Scenario: physically separate active and historical data for faster queries and maintenance
CREATE TABLE t_active_employees LIKE t_employees;
CREATE INDEX idx_active_dept_salary ON t_active_employees (department_id_, salary_);

-- Scenario: initial sync - copy active employees into the dedicated table
-- Note: if the table carries the active_flag generated column, SELECT * cannot be inserted into it;
-- either list the base columns explicitly, or temporarily convert active_flag to a regular column,
-- run the INSERT below, and convert it back to a generated column afterward
INSERT INTO t_active_employees
SELECT * FROM t_employees WHERE status_ = 'ACTIVE';

-- Scenario: hot-path lookups go straight to the split table for the best performance
SELECT employee_id_, name_, salary_
FROM t_active_employees
WHERE department_id_ = 1
  AND salary_ BETWEEN 8500 AND 20000
ORDER BY salary_ DESC;

-- Scenario: per-department salary analysis using the split table's index
SELECT
    department_id_,
    COUNT(*) as employee_count,
    AVG(salary_) as avg_salary,
    MAX(salary_) as max_salary,
    MIN(salary_) as min_salary
FROM t_active_employees
GROUP BY department_id_
ORDER BY avg_salary DESC;

-- Verify index usage on the split table
EXPLAIN SELECT employee_id_, name_, salary_
FROM t_active_employees
WHERE department_id_ = 1 AND salary_ > 10000;
-- Expected: key shows idx_active_dept_salary and type is range - an efficient composite-index scan

-- Performance comparison: split-table query vs. filtered query on the full table
-- Full table (contains employees of every status)
EXPLAIN SELECT employee_id_, name_, salary_
FROM t_employees
WHERE status_ = 'ACTIVE' AND department_id_ = 1 AND salary_ > 10000;

-- Split table (active employees only) - performs better
EXPLAIN SELECT employee_id_, name_, salary_
FROM t_active_employees
WHERE department_id_ = 1 AND salary_ > 10000;
-- Advantage: less data, a more compact index, faster queries

-- Scenario: maintaining the split-table strategy
-- Periodically sync newly active employees
INSERT INTO t_active_employees
SELECT * FROM t_employees
WHERE status_ = 'ACTIVE'
  AND employee_id_ NOT IN (SELECT employee_id_ FROM t_active_employees);

-- Remove employees who have since left
DELETE FROM t_active_employees
WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE status_ != 'ACTIVE'
);

-- Bad (not recommended): creating a plain index without looking at the data distribution
-- CREATE INDEX idx_status_dept ON t_employees (status_, department_id_);
-- Problem: if inactive employees are only a small fraction, most of this index is wasted space

-- Choosing between the options:
-- Option 1 (generated column): fixed query patterns with simple conditions
-- Option 2 (functional index): complex conditions, but queries must match the index expression exactly
-- Option 3 (table split): huge tables where the active subset is a small fraction

Typical use cases:

  • Large tables where only a small subset is queried frequently
  • Highly discriminating status columns
  • Optimizing time-range queries

2.2.2 Implementation Differences Across Database Systems

MySQL has no direct support for conditional indexes, but several techniques approximate them. Below are the MySQL-specific approaches and a performance test.

MySQL approaches:

  • Generated columns emulating a conditional index
  • Functional indexes (MySQL 8.0+)
  • Table splitting to separate the data physically

MySQL test-data generation:

-- Generate a large volume of test data (MySQL version)
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, manager_id_, status_)
SELECT
    CONCAT('Employee', n.num) as name_,
    CONCAT('email_', n.num, '@company.com') as email_,
    (n.num % 10) + 1 as department_id_,
    30000 + (n.num % 70000) as salary_,
    DATE_ADD('2020-01-01', INTERVAL (n.num % 1000) DAY) as hire_date_,
    CASE WHEN n.num % 10 = 0 THEN NULL
         ELSE (n.num % 100) + 1 END as manager_id_,
    CASE WHEN n.num % 5 = 0 THEN 'INACTIVE'
         ELSE 'ACTIVE' END as status_
FROM (
    SELECT a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + 1 as num
    FROM
        (SELECT 0 as N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
         UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a,
        (SELECT 0 as N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
         UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b,
        (SELECT 0 as N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
         UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c,
        (SELECT 0 as N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
         UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) d,
        (SELECT 0 as N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
         UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) e
) n
WHERE n.num <= 100000;

-- Performance test of the conditional-index alternatives
-- Option 1: generated-column index (skip the ADD COLUMN if active_flag was already added in 2.2.1)
ALTER TABLE t_employees
ADD COLUMN active_flag TINYINT AS (CASE WHEN status_ = 'ACTIVE' THEN 1 ELSE NULL END) STORED;

CREATE INDEX idx_active_virtual ON t_employees (active_flag, department_id_, salary_);

-- Option 2: functional index (MySQL 8.0+)
CREATE INDEX idx_active_func ON t_employees ((CASE WHEN status_ = 'ACTIVE' THEN department_id_ ELSE NULL END));

-- Compare the plans
EXPLAIN SELECT * FROM t_employees WHERE active_flag = 1 AND department_id_ = 1;
EXPLAIN SELECT * FROM t_employees WHERE status_ = 'ACTIVE' AND department_id_ = 1;

2.3 Function-based Indexes

A function-based index is built on the result of an expression or function, which suits complex query conditions.

Function-based indexes in detail

Purpose:

  • Speed up predicates built on functions or expressions
  • Support case-insensitive string lookups
  • Accelerate queries on computed values
  • Enable fast retrieval for complex business rules

When to use:

  • Case-insensitive name or email lookups
  • Date-function queries (e.g., grouping by year or month)
  • String-function queries (e.g., SUBSTRING, CONCAT)
  • Arithmetic queries (e.g., price calculations, percentages)
  • Queries on specific attributes of JSON columns

Performance impact:

  • Query speed: function-based predicates can become 5-50x faster
  • Computation: function values must be computed when the index is built
  • Storage: the computed results must be stored
  • Maintenance: data changes require recomputing index entries

Caveats:

  • The function must be deterministic (same input, same output)
  • Avoid overly complex expressions
  • Consider the computational cost of the function
  • Support for function-based indexes varies across database systems
  • Monitor the index's effectiveness regularly

2.3.1 Creating and Using Expression Indexes

An expression index (function-based index) is created on the result of an expression or function; it suits queries that routinely apply functions in the WHERE clause.

Core idea:

  • Index the computed result instead of the raw column value
  • Improve the performance of function-based predicates
  • Avoid repeated computation

Typical use cases:

  • Case-insensitive lookups
  • Date-function queries
  • Arithmetic queries
  • String-processing queries

-- Business scenario: an international HR system that needs case-insensitive name search
-- Without an index, every query applies UPPER to every row - very poor performance

-- Bad (not recommended): inefficient query without a function-based index
-- Problem: UPPER runs on every row of the table, O(n) per query
SELECT employee_id_, name_, department_id_ FROM t_employees
WHERE UPPER(name_) = 'JOHN SMITH';

-- Bad (not recommended): a case-sensitive comparison misses user input in other cases
-- SELECT * FROM t_employees WHERE name_ = 'john smith'; -- fails to match 'John Smith'

-- Scenario: mail-system integration - case-insensitive email lookup
SELECT employee_id_, name_, email_ FROM t_employees
WHERE LOWER(email_) = 'john.smith@company.com';

-- Good: MySQL 8.0 functional indexes - index the function result; lookups can be 10-100x faster
CREATE INDEX idx_emp_name_upper ON t_employees ((UPPER(name_)));
CREATE INDEX idx_emp_email_lower ON t_employees ((LOWER(email_)));

-- Good: query served by the functional index
SELECT employee_id_, name_, department_id_
FROM t_employees
WHERE UPPER(name_) = 'JOHN SMITH'; -- now resolved via the index

-- Bad (not recommended): using a different function than the one indexed
-- SELECT * FROM t_employees WHERE LOWER(name_) = 'john smith';
-- Problem: the index is on UPPER(name_); a LOWER predicate cannot use it

2.3.2 A Performance Case Study

A practical look at how expression indexes pay off in different scenarios, including response-time comparisons and execution-plan analysis.

Guiding principles:

  • Identify frequently executed function-based predicates
  • Weigh the cost of creating the index
  • Monitor the index's effect
  • Maintain and revisit regularly

Case study: optimizing date-range queries

-- Business scenario: an HR monthly-report system that counts hires per month at high frequency
-- Date-function predicates cannot use the index on hire_date_, so every query scans the full table

-- Bad (not recommended): function-based predicate defeats the index
SELECT COUNT(*) FROM t_employees
WHERE YEAR(hire_date_) = 2022
  AND MONTH(hire_date_) = 6;
-- Problem: YEAR() and MONTH() disable the index and force a full table scan

-- Bad (not recommended): DATE_FORMAT defeats the index just the same
SELECT COUNT(*) FROM t_employees
WHERE DATE_FORMAT(hire_date_, '%Y-%m') = '2022-06';
-- Problem: DATE_FORMAT makes the hire_date_ index unusable
-- (A sargable range predicate would also use a plain hire_date_ index:
--  WHERE hire_date_ >= '2022-06-01' AND hire_date_ < '2022-07-01')

-- Good: MySQL solution - a generated column for date lookups
-- Payoff: queries can run 50-200x faster, especially date-range queries on large tables
ALTER TABLE t_employees ADD hire_year_month INT AS (YEAR(hire_date_) * 100 + MONTH(hire_date_)) VIRTUAL;
CREATE INDEX idx_hire_ym_mysql ON t_employees (hire_year_month);

-- Good: efficient query through the generated column
SELECT COUNT(*) FROM t_employees
WHERE hire_year_month = 202206; -- straight index lookup

-- Bad (not recommended): still using the original functions after creating the generated column
-- SELECT COUNT(*) FROM t_employees WHERE YEAR(hire_date_) = 2022 AND MONTH(hire_date_) = 6;
-- Problem: ignores the generated-column index and throws the optimization away

-- Scenario: quarterly reports
ALTER TABLE t_employees ADD hire_quarter INT AS (YEAR(hire_date_) * 10 + QUARTER(hire_date_)) VIRTUAL;
CREATE INDEX idx_hire_quarter ON t_employees (hire_quarter);

2.4 Covering Indexes

A covering index contains every column a query needs, so the query is answered entirely from the index with no lookup back to the table.

Covering indexes in detail

Purpose:

  • Avoid table lookups and reduce I/O
  • Improve query performance, especially on large tables
  • Reduce the number of data pages touched
  • Optimize queries with short SELECT lists

When to use:

  • The queried column combinations are relatively fixed
  • Queries need only a few columns
  • Paginated queries over large tables
  • Reporting and statistics queries
  • Key tables in join-heavy queries

Performance impact:

  • Query speed: typically 2-10x faster
  • I/O: only index pages are read, never data pages
  • Caching: index pages achieve higher buffer-pool hit rates
  • Storage: the included columns take extra space

Caveats:

  • Balance index size against query speed
  • Don't include too many columns, or index maintenance suffers
  • Prefer column combinations used by frequent queries
  • Re-evaluate the index's usefulness periodically

MySQL covering-index examples:

-- Business scenario: an employee listing page that frequently shows name and salary of active employees per department
-- A covering index avoids table lookups - roughly 50-80% less I/O and 2-5x faster queries

-- Good: a covering index containing the WHERE columns and the SELECT columns
CREATE INDEX idx_emp_covering ON t_employees (department_id_, status_, name_, salary_);

-- Good: query answered entirely from the index
SELECT name_, salary_
FROM t_employees
WHERE department_id_ = 1 AND status_ = 'ACTIVE';

-- Bad (not recommended): selecting extra columns forces table lookups and loses the covering benefit
-- SELECT name_, salary_, email_, hire_date_
-- FROM t_employees
-- WHERE department_id_ = 1 AND status_ = 'ACTIVE';
-- Problem: email_ and hire_date_ are not in the index, so each row needs a lookup

-- Scenario: tuning check - confirm the query actually uses the covering index
EXPLAIN SELECT name_, salary_
FROM t_employees
WHERE department_id_ = 1 AND status_ = 'ACTIVE';
-- Expected: the Extra column shows "Using index", i.e. a covering scan

-- Bad (not recommended): poor column order undermines the covering index
-- CREATE INDEX idx_bad_covering ON t_employees (name_, salary_, department_id_, status_);
-- Problem: the WHERE columns are not a prefix of the index, so it cannot filter effectively

Covering-index strategy:

-- Before: table lookups required
CREATE INDEX idx_dept_status ON t_employees (department_id_, status_);

-- After: a covering index
CREATE INDEX idx_dept_status_covering ON t_employees (department_id_, status_, name_, salary_, hire_date_);

-- Compare the performance
SELECT name_, salary_, hire_date_
FROM t_employees
WHERE department_id_ = 1 AND status_ = 'ACTIVE'
ORDER BY salary_ DESC;

Caveats:

  • A covering index costs extra storage
  • Balance query speed against maintenance cost
  • Best for read-heavy, write-light workloads

2.5 Index Optimization Strategy

2.5.1 Index Selectivity Analysis

Index selectivity is the ratio of distinct values in the indexed column to the total number of rows; it is a key measure of how effective an index will be. Selectivity near 1 means the index filters well; selectivity near 0 means it filters poorly.

Strategy:

  • Index high-selectivity columns first
  • Put high-selectivity columns first within composite indexes
  • Re-check selectivity periodically as the data changes

-- Scenario: measure selectivity in MySQL to decide the best index column order
-- Value: lets data, not guesswork, drive index decisions
-- Formula: selectivity = distinct values / total rows; closer to 1 is better
SELECT
    COUNT(DISTINCT department_id_) * 1.0 / COUNT(*) as dept_selectivity,
    COUNT(DISTINCT status_) * 1.0 / COUNT(*) as status_selectivity,
    COUNT(DISTINCT salary_) * 1.0 / COUNT(*) as salary_selectivity
FROM t_employees;

-- Selectivity-driven indexing strategy
-- High-selectivity columns (employee_id_, email_) suit single-column indexes
-- Low-selectivity columns (status_, department_id_) work best inside composite indexes
CREATE INDEX idx_optimal_composite ON t_employees (status_, department_id_, salary_);

2.5.2 Index Maintenance and Rebuilding

As rows are inserted, updated, and deleted, indexes fragment and need periodic maintenance to stay fast.

Maintenance tasks:

  • Monitor index fragmentation
  • Rebuild heavily fragmented indexes
  • Refresh index statistics
  • Drop unused indexes

When to maintain:

  • After heavy data churn
  • When query performance degrades
  • During scheduled maintenance windows

-- MySQL index maintenance
-- Check index size in leaf pages (a rough fragmentation indicator)
SELECT
    table_name,
    index_name,
    stat_value as pages,
    stat_description
FROM mysql.innodb_index_stats
WHERE table_name = 't_employees' AND stat_name = 'n_leaf_pages';

-- Scenario: inspect index statistics to judge index health
-- Value: cardinality and structure help identify indexes to rebuild or drop
SELECT
    TABLE_NAME,
    INDEX_NAME,
    CARDINALITY,
    SUB_PART,
    NULLABLE,
    INDEX_TYPE,
    -- COLUMN_NAME is NULL for MySQL 8.0+ functional indexes (their expression is in the EXPRESSION column)
    CASE
        WHEN COLUMN_NAME IS NULL THEN 'Functional Index'
        ELSE COLUMN_NAME
    END as index_column
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't_employees'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;

-- Refresh the statistics
ANALYZE TABLE t_employees;

-- Defragment the whole table (rebuilds the clustered index and all secondary indexes)
OPTIMIZE TABLE t_employees;

-- Check table and index sizes
-- Method 1: INFORMATION_SCHEMA.TABLES for overall data and index size (recommended)
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_size_mb,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 2) as total_size_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't_employees';

-- Method 2: SHOW INDEX for per-index detail
SHOW INDEX FROM t_employees;

-- Method 3: MySQL 8.0+ INFORMATION_SCHEMA.INNODB_TABLESTATS, where available (sizes are in 16 KB pages)
SELECT
    NAME as table_name,
    NUM_ROWS,
    CLUST_INDEX_SIZE,
    OTHER_INDEX_SIZE,
    ROUND((CLUST_INDEX_SIZE + OTHER_INDEX_SIZE) * 16 / 1024, 2) as total_index_size_mb
FROM INFORMATION_SCHEMA.INNODB_TABLESTATS
WHERE NAME LIKE '%t_employees%';

-- Rebuilding an index
-- Method 1: drop and recreate (simple, but the index is unavailable in between)
ALTER TABLE t_employees DROP INDEX idx_emp_dept_salary;
ALTER TABLE t_employees ADD INDEX idx_emp_dept_salary (department_id_, salary_, hire_date_);

-- Method 2: online DDL (recommended)
-- Note: an index cannot be dropped and re-added under the same name in one statement; use a temporary name
ALTER TABLE t_employees
ADD INDEX idx_emp_dept_salary_new (department_id_, salary_, hire_date_),
ALGORITHM=INPLACE, LOCK=NONE;

-- Drop the old index
ALTER TABLE t_employees DROP INDEX idx_emp_dept_salary;

-- Rename the new index (supported since MySQL 5.7)
ALTER TABLE t_employees RENAME INDEX idx_emp_dept_salary_new TO idx_emp_dept_salary;

-- Method 3: if your MySQL version lacks RENAME INDEX
-- ALTER TABLE t_employees DROP INDEX idx_emp_dept_salary_new;
-- ALTER TABLE t_employees ADD INDEX idx_emp_dept_salary (department_id_, salary_, hire_date_);

2.5.3 Monitoring Index Usage
-- MySQL index-usage monitoring
-- Business scenario: track how indexes are actually used and find unused ones

-- 1. Confirm the Performance Schema is enabled (on by default since MySQL 5.6)
SELECT @@performance_schema;

-- 2. Index I/O statistics
-- Scenario: measure read/write frequency per index to separate hot indexes from cold ones
SELECT
    object_schema as database_name,
    object_name as table_name,
    index_name,
    -- count_read: index read operations, e.g. index lookups during SELECTs
    count_read as read_operations,
    -- count_write: index writes caused by INSERT/UPDATE/DELETE maintenance
    count_write as write_operations,
    -- count_fetch: row fetches through the index (usually tracks count_read)
    count_fetch as fetch_operations,
    -- count_insert: index entries added by INSERTs
    count_insert as insert_operations,
    -- count_update: index entries modified by UPDATEs
    count_update as update_operations,
    -- count_delete: index entries removed by DELETEs
    count_delete as delete_operations
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = DATABASE()
  AND object_name = 't_employees'
  AND index_name IS NOT NULL
ORDER BY count_read DESC;

-- Reading the I/O statistics:
-- 1. Reads (count_read):
--    - Heavy (>10,000): core business indexes - monitor and tune these first
--    - Moderate (1,000-10,000): routinely used - keep an eye on query performance
--    - Light (<1,000): possibly cold - consider whether they are worth keeping
--    - Zero: unused - strong candidates for removal to cut maintenance overhead

-- 2. Writes (count_write):
--    - High write counts mean heavy DML on the table and expensive index maintenance
--    - If writes far outnumber reads, question whether the index earns its keep
--    - Zero writes usually indicate an index on read-only or historical data

-- 3. Tuning guidance:
--    - Most-read indexes: verify their design first - they carry the workload
--    - Zero-read indexes: drop them to speed up INSERT/UPDATE/DELETE
--    - Expensive-to-maintain indexes: consider merging or redesigning them

-- 3. Index wait-event statistics
-- Scenario: find indexes whose operations take disproportionately long
SELECT
    object_schema,
    object_name,
    index_name,
    -- count_star: total I/O events recorded for the index
    count_star as total_events,
    -- sum_timer_wait: total wait time (Performance Schema timers are in picoseconds; divide by 1e12 for seconds)
    sum_timer_wait/1000000000000 as total_wait_seconds,
    -- avg_timer_wait: average wait per operation, converted to seconds the same way
    avg_timer_wait/1000000000000 as avg_wait_seconds,
    -- rough intensity measure: events per second of accumulated wait time
    ROUND(count_star / (sum_timer_wait/1000000000000), 2) as events_per_second
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = DATABASE()
  AND object_name = 't_employees'
  AND count_star > 0
ORDER BY sum_timer_wait DESC;

-- Reading the wait statistics:
-- 1. Total wait time (total_wait_seconds):
--    - High (>10 s): the index is a system bottleneck - optimize it first
--    - Medium (1-10 s): review the index design and query patterns
--    - Low (<1 s): healthy - leave as is

-- 2. Average wait (avg_wait_seconds):
--    - High (>0.1 s): each operation is slow; likely causes:
--      * severe index fragmentation - rebuild the index
--      * poor index design with weak selectivity
--      * a hardware I/O bottleneck
--    - Medium (0.01-0.1 s): acceptable, with room to improve
--    - Low (<0.01 s): excellent

-- 3. Event frequency (total_events):
--    - Frequent + slow: a hotspot dragging down overall performance
--    - Rare + slow: sporadic problems with a large per-hit cost
--    - Frequent + fast: an efficient, core index

-- 4. Concrete tuning steps:
--    - Average wait > 0.1 s:
--      * inspect the index: SHOW INDEX FROM table_name;
--      * rebuild it: OPTIMIZE TABLE table_name; (or drop and recreate the index)
--      * analyze the plan: EXPLAIN SELECT ...;
--    - Total wait dominated by one index:
--      * consider merging or redesigning indexes
--      * evaluate table partitioning
--      * check the hardware's I/O capacity

-- 5. Correlate with other signals:
--    - I/O statistics: many reads + long waits = a query hotspot
--    - The slow query log: pinpoint the offending SQL statements
--    - System monitoring: CPU, memory, and disk I/O together

-- 4. Find unused indexes (zero reads and zero writes)
SELECT
    s.TABLE_SCHEMA,
    s.TABLE_NAME,
    s.INDEX_NAME,
    s.CARDINALITY,
    COALESCE(t.count_read, 0) as read_count,
    COALESCE(t.count_write, 0) as write_count
FROM INFORMATION_SCHEMA.STATISTICS s
LEFT JOIN performance_schema.table_io_waits_summary_by_index_usage t
    ON s.TABLE_SCHEMA = t.object_schema
    AND s.TABLE_NAME = t.object_name
    AND s.INDEX_NAME = t.index_name
WHERE s.TABLE_SCHEMA = DATABASE()
  AND s.TABLE_NAME = 't_employees'
  AND s.INDEX_NAME != 'PRIMARY'
  AND (t.count_read IS NULL OR t.count_read = 0)
  AND (t.count_write IS NULL OR t.count_write = 0);

-- 5. Reset the statistics (to start a fresh monitoring window)
-- TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;

-- 6. Basic index information
SHOW INDEX FROM t_employees;

Index-optimization best practices, summarized:

  1. Design

    • Index the columns used in WHERE, JOIN, and ORDER BY first
    • Put high-selectivity columns first within composite indexes
    • Avoid piling indexes onto small tables
  2. Maintenance

    • Monitor usage regularly and drop unused indexes (see the sys-schema sketch below)
    • Rebuild indexes as fragmentation warrants
    • Keep statistics up to date
  3. Monitoring

    • Use each database system's built-in monitoring tools
    • Watch the read/write ratio of each index
    • Watch for changes in query execution plans
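
As a shortcut for the monitoring and cleanup items above, MySQL 5.7+ ships the sys schema, whose views wrap the Performance Schema queries from section 2.5.3:

-- Indexes with no recorded reads since server start (candidates for removal)
SELECT * FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();

-- Indexes made redundant by another index sharing the same leading columns
SELECT * FROM sys.schema_redundant_indexes
WHERE table_schema = DATABASE();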

3. Complex Query Optimization

Complex query optimization is the heart of advanced SQL work, spanning window functions, CTEs, subquery rewriting, and more. Query optimizers differ across database systems.

3.1 Window Functions

Window functions, introduced in the SQL:2003 standard, compute over a window of the result set without collapsing rows the way GROUP BY does.

3.1.1 Ranking Functions (ROW_NUMBER, RANK, DENSE_RANK)
-- Business scenario: an HR compensation system - rank employees within each department by salary
-- Supports annual reviews, salary adjustments, and talent-pipeline planning
-- Unlike GROUP BY, window functions keep the detail rows while computing the analysis

SELECT
    employee_id_,
    name_,
    department_id_,
    salary_,
    hire_date_,

    -- ROW_NUMBER(): a unique sequence number per row; ties never share a number
    -- Use for: pagination, unique ordering keys, de-duplication
    ROW_NUMBER() OVER (
        PARTITION BY department_id_
        ORDER BY salary_ DESC, hire_date_ ASC  -- break salary ties by hire date
    ) as row_num,

    -- RANK(): ties share a rank and the next rank is skipped
    -- Use for: classic ranking (1,2,2,4) - performance rankings, bonus tiers
    RANK() OVER (
        PARTITION BY department_id_
        ORDER BY salary_ DESC
    ) as rank_num,

    -- DENSE_RANK(): ties share a rank and no rank is skipped
    -- Use for: contiguous ranks (1,2,2,3) - grade levels, tier assignment
    DENSE_RANK() OVER (
        PARTITION BY department_id_
        ORDER BY salary_ DESC
    ) as dense_rank_num

FROM t_employees
WHERE status_ = 'ACTIVE'  -- analyze current employees only
ORDER BY department_id_, salary_ DESC;

-- Bad (not recommended): emulating ranks with GROUP BY plus a self-join - convoluted and slow
-- SELECT e1.employee_id_, e1.name_, e1.salary_,
--        COUNT(e2.employee_id_) + 1 as rank_num
-- FROM t_employees e1
-- LEFT JOIN t_employees e2 ON e1.department_id_ = e2.department_id_
--                          AND e1.salary_ < e2.salary_
-- WHERE e1.status_ = 'ACTIVE'
-- GROUP BY e1.employee_id_, e1.name_, e1.salary_
-- Problem: the self-join is slow, the code is complex, and it is hard to maintain

-- Business scenario: talent review - the top 3 earners per department, to identify key people
-- A classic window-function pattern that replaces complex subqueries and self-joins
SELECT * FROM (
    SELECT
        employee_id_,
        name_,
        department_id_,
        salary_,
        ROW_NUMBER() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as rn
    FROM t_employees
    WHERE status_ = 'ACTIVE'
) ranked
WHERE rn <= 3;

-- Bad (not recommended): a correlated subquery for Top-N - extremely slow
-- SELECT e1.employee_id_, e1.name_, e1.department_id_, e1.salary_
-- FROM t_employees e1
-- WHERE (
--     SELECT COUNT(*)
--     FROM t_employees e2
--     WHERE e2.department_id_ = e1.department_id_
--       AND e2.salary_ > e1.salary_
-- ) < 3;
-- Problem: the subquery runs once per row - O(n^2) - painfully slow on large tables

-- Business scenario: compensation analysis - each employee's salary percentile across the company
SELECT
    employee_id_,
    name_,
    department_id_,
    salary_,
    -- PERCENT_RANK(): percentile rank between 0 and 1
    PERCENT_RANK() OVER (ORDER BY salary_) as salary_percentile,
    -- CUME_DIST(): cumulative distribution - the fraction of rows at or below this value
    CUME_DIST() OVER (ORDER BY salary_) as cumulative_distribution,
    -- NTILE(): split rows into N equal buckets - here, salary quartiles
    NTILE(4) OVER (ORDER BY salary_) as salary_quartile,
    -- Interpretation: quartile 1 = lowest pay band, quartile 4 = highest
    CASE NTILE(4) OVER (ORDER BY salary_)
        WHEN 1 THEN 'Low band'
        WHEN 2 THEN 'Lower-middle band'
        WHEN 3 THEN 'Upper-middle band'
        WHEN 4 THEN 'High band'
    END as salary_level
FROM t_employees
WHERE status_ = 'ACTIVE';

3.1.2 Aggregate Window Functions

Aggregate window functions compute aggregates over a moving window - the standard tool for trend analysis and running totals.

-- Business scenario: finance - the company's cumulative payroll growth, for budget planning
-- Moving averages smooth out noise and reveal the underlying salary trend
SELECT
    employee_id_,
    name_,
    hire_date_,
    salary_,
    -- Running total: cumulative payroll as of each hire
    SUM(salary_) OVER (ORDER BY hire_date_ ROWS UNBOUNDED PRECEDING) as cumulative_salary,
    -- 3-row moving average: smooths fluctuations in starting salaries
    AVG(salary_) OVER (ORDER BY hire_date_ ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg_3,
    -- Running headcount: company growth over time
    COUNT(*) OVER (ORDER BY hire_date_ ROWS UNBOUNDED PRECEDING) as running_count
FROM t_employees
WHERE status_ = 'ACTIVE'
ORDER BY hire_date_;

-- Bad (not recommended): running totals via subquery - extremely slow
-- SELECT e1.employee_id_, e1.hire_date_, e1.salary_,
--        (SELECT SUM(e2.salary_) FROM t_employees e2 WHERE e2.hire_date_ <= e1.hire_date_) as cumulative_salary
-- FROM t_employees e1
-- ORDER BY e1.hire_date_;
-- Problem: one subquery per row - O(n^2)

-- Business scenario: pay-equity analysis - each employee vs. the department average
-- Used to spot salary outliers, plan adjustments, and keep internal pay fair
SELECT
    employee_id_,
    name_,
    department_id_,
    salary_,
    -- Department average, for horizontal comparison
    AVG(salary_) OVER (PARTITION BY department_id_) as dept_avg_salary,
    -- Difference from the average: positive = above, negative = below
    salary_ - AVG(salary_) OVER (PARTITION BY department_id_) as salary_diff_from_avg,
    -- Relative position within the department
    CASE
        WHEN salary_ > AVG(salary_) OVER (PARTITION BY department_id_) THEN 'Above dept average'
        WHEN salary_ < AVG(salary_) OVER (PARTITION BY department_id_) THEN 'Below dept average'
        ELSE 'At dept average'
    END as salary_position,
    -- Department salary range
    MAX(salary_) OVER (PARTITION BY department_id_) as dept_max_salary,
    MIN(salary_) OVER (PARTITION BY department_id_) as dept_min_salary
FROM t_employees
WHERE status_ = 'ACTIVE';

-- Business scenario: sales time-series analysis - trends and anomalies
-- Supports forecasting, performance monitoring, campaign evaluation, and anomaly detection
SELECT
    sale_date_,
    amount_,
    -- 7-day rolling sum: smooths short-term noise, shows the weekly trend
    SUM(amount_) OVER (ORDER BY sale_date_ ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as rolling_7day_sum,
    -- 30-day moving average: the monthly trend, noise filtered out
    AVG(amount_) OVER (ORDER BY sale_date_ ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) as rolling_30day_avg,
    -- Day-over-day change
    amount_ - LAG(amount_, 1) OVER (ORDER BY sale_date_) as day_over_day_change,
    -- Week-over-week change: reveals weekly patterns (e.g., weekend effects)
    amount_ - LAG(amount_, 7) OVER (ORDER BY sale_date_) as week_over_week_change,
    -- Simple trend flag vs. the 30-day average
    CASE
        WHEN amount_ > AVG(amount_) OVER (ORDER BY sale_date_ ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
        THEN 'Above 30-day avg'
        ELSE 'Below 30-day avg'
    END as performance_vs_avg
FROM t_sales
ORDER BY sale_date_;

-- Bad (not recommended): moving averages via subquery - extremely slow
-- SELECT sale_date_, amount_,
--        (SELECT AVG(amount_) FROM t_sales s2
--         WHERE s2.sale_date_ BETWEEN DATE_SUB(s1.sale_date_, INTERVAL 29 DAY) AND s1.sale_date_) as moving_avg
-- FROM t_sales s1
-- ORDER BY sale_date_;
-- Problem: one subquery per row - unacceptable on large data sets

3.1.3 Offset Functions (LAG, LEAD)

Offset functions read values from rows before or after the current row - the standard tool for year-over-year and period-over-period analysis.

-- Business scenario: workforce-cost trend analysis - how starting salaries have shifted over time
-- Helps HR set salary policy, forecast labor costs, and spot salary inflation
SELECT
    employee_id_,
    name_,
    hire_date_,
    salary_,
    -- LAG(): the previous hire's salary, for computing the change between hires
    LAG(salary_, 1) OVER (ORDER BY hire_date_) as prev_hire_salary,
    -- LEAD(): the next hire's salary, for looking ahead in the series
    LEAD(salary_, 1) OVER (ORDER BY hire_date_) as next_hire_salary,
    -- Absolute change: positive = higher than the previous hire, negative = lower
    salary_ - LAG(salary_, 1) OVER (ORDER BY hire_date_) as salary_change_amount,
    -- Percentage change: a more intuitive measure
    ROUND((salary_ - LAG(salary_, 1) OVER (ORDER BY hire_date_)) /
          LAG(salary_, 1) OVER (ORDER BY hire_date_) * 100, 2) as salary_change_percent
FROM t_employees
WHERE status_ = 'ACTIVE'
ORDER BY hire_date_;

-- Bad (not recommended): emulating offsets with a self-join - convoluted and slow
-- SELECT e1.employee_id_, e1.hire_date_, e1.salary_,
--        e2.salary_ as prev_salary,
--        e1.salary_ - e2.salary_ as salary_change
-- FROM t_employees e1
-- LEFT JOIN t_employees e2 ON e2.hire_date_ = (
--     SELECT MAX(hire_date_) FROM t_employees e3
--     WHERE e3.hire_date_ < e1.hire_date_
-- )
-- ORDER BY e1.hire_date_;
-- Problem: needs a correlated subquery plus a self-join - slow and hard to follow

-- Business scenario: year-over-year sales analysis - growth and seasonality
-- Supports annual reviews, budgeting, market-trend analysis, and investment decisions
WITH monthly_sales AS (
    SELECT
        YEAR(sale_date_) as year,
        MONTH(sale_date_) as month,
        SUM(amount_) as monthly_total,
        COUNT(*) as transaction_count,
        AVG(amount_) as avg_transaction_amount
    FROM t_sales
    GROUP BY YEAR(sale_date_), MONTH(sale_date_)
)
SELECT
    year,
    month,
    monthly_total,
    transaction_count,
    -- LAG(12): the same month last year, for YoY comparison
    LAG(monthly_total, 12) OVER (ORDER BY year, month) as same_month_last_year,
    LAG(transaction_count, 12) OVER (ORDER BY year, month) as transactions_last_year,
    -- YoY growth rate: the core growth metric
    CASE
        WHEN LAG(monthly_total, 12) OVER (ORDER BY year, month) IS NOT NULL
        THEN ROUND((monthly_total - LAG(monthly_total, 12) OVER (ORDER BY year, month)) * 100.0 /
                   LAG(monthly_total, 12) OVER (ORDER BY year, month), 2)
        ELSE NULL
    END as yoy_growth_percent,
    -- MoM growth rate: the month-to-month trend
    CASE
        WHEN LAG(monthly_total, 1) OVER (ORDER BY year, month) IS NOT NULL
        THEN ROUND((monthly_total - LAG(monthly_total, 1) OVER (ORDER BY year, month)) * 100.0 /
                   LAG(monthly_total, 1) OVER (ORDER BY year, month), 2)
        ELSE NULL
    END as mom_growth_percent,
    -- Trend label
    CASE
        WHEN LAG(monthly_total, 12) OVER (ORDER BY year, month) IS NULL THEN 'No YoY baseline'
        WHEN monthly_total > LAG(monthly_total, 12) OVER (ORDER BY year, month) THEN 'YoY up'
        WHEN monthly_total < LAG(monthly_total, 12) OVER (ORDER BY year, month) THEN 'YoY down'
        ELSE 'YoY flat'
    END as growth_trend
FROM monthly_sales
ORDER BY year, month;

-- Bad (not recommended): YoY via self-join - complex and error-prone
-- SELECT s1.year, s1.month, s1.monthly_total,
--        s2.monthly_total as last_year_same_month,
--        (s1.monthly_total - s2.monthly_total) * 100.0 / s2.monthly_total as growth_rate
-- FROM monthly_sales s1
-- LEFT JOIN monthly_sales s2 ON s1.year = s2.year + 1 AND s1.month = s2.month
-- ORDER BY s1.year, s1.month;
-- Problem: fiddly join conditions invite logic errors; the window form is far clearer

-- Business scenario: salary peak/valley detection - spotting turning points in hiring-salary policy
-- Useful for analyzing policy changes, market swings, and recruiting budgets
SELECT
    employee_id_,
    name_,
    hire_date_,
    salary_,
    -- the previous hire's salary
    LAG(salary_) OVER (ORDER BY hire_date_) as prev_salary,
    -- the next hire's salary
    LEAD(salary_) OVER (ORDER BY hire_date_) as next_salary,
    -- Trend classification: find the turning points
    CASE
        WHEN salary_ > LAG(salary_) OVER (ORDER BY hire_date_)
         AND salary_ > LEAD(salary_) OVER (ORDER BY hire_date_)
        THEN 'Peak' -- a local high - perhaps a key hire or a hot market
        WHEN salary_ < LAG(salary_) OVER (ORDER BY hire_date_)
         AND salary_ < LEAD(salary_) OVER (ORDER BY hire_date_)
        THEN 'Valley' -- a local low - perhaps cost control or a soft market
        WHEN salary_ > LAG(salary_) OVER (ORDER BY hire_date_)
        THEN 'Rising'
        WHEN salary_ < LAG(salary_) OVER (ORDER BY hire_date_)
        THEN 'Falling'
        ELSE 'Flat'
    END as salary_trend,
    -- magnitude of the change
    ROUND((salary_ - LAG(salary_) OVER (ORDER BY hire_date_)) /
          LAG(salary_) OVER (ORDER BY hire_date_) * 100, 2) as salary_change_rate
FROM t_employees
WHERE status_ = 'ACTIVE'
ORDER BY hire_date_;

-- Bad (not recommended): trend analysis via multiple self-joins - muddled logic
-- SELECT e1.employee_id_, e1.hire_date_, e1.salary_,
--        CASE
--            WHEN e1.salary_ > COALESCE(e2.salary_, 0) AND e1.salary_ > COALESCE(e3.salary_, 0) THEN 'Peak'
--            WHEN e1.salary_ < COALESCE(e2.salary_, 999999) AND e1.salary_ < COALESCE(e3.salary_, 999999) THEN 'Valley'
--            ELSE 'Normal'
--        END as trend
-- FROM t_employees e1
-- LEFT JOIN t_employees e2 ON e2.hire_date_ = (SELECT MAX(hire_date_) FROM t_employees WHERE hire_date_ < e1.hire_date_)
-- LEFT JOIN t_employees e3 ON e3.hire_date_ = (SELECT MIN(hire_date_) FROM t_employees WHERE hire_date_ > e1.hire_date_)
-- ORDER BY e1.hire_date_;
-- Problem: stacked self-joins are complex, slow, and hard to maintain

3.2 Common Table Expressions (CTEs)

A CTE creates a named temporary result set, making complex queries easier to read and maintain.

3.2.1 Non-recursive CTEs
-- Business scenario: high-earner report - where the well-paid employees sit, per department
-- CTEs keep the logic in readable steps; far clearer than nested subqueries

-- Good: build the query in stages with CTEs
WITH high_earners AS (
    -- Step 1: employees earning more than 60,000
    SELECT employee_id_, name_, department_id_, salary_
    FROM t_employees
    WHERE salary_ > 60000 AND status_ = 'ACTIVE'
),
dept_stats AS (
    -- Step 2: per-department statistics over the high earners
    SELECT
        department_id_,
        COUNT(*) as high_earner_count,
        AVG(salary_) as avg_high_salary,
        MAX(salary_) as max_salary,
        MIN(salary_) as min_salary
    FROM high_earners
    GROUP BY department_id_
)
-- Step 3: the final report
SELECT
    d.department_name_,
    ds.high_earner_count,
    ds.avg_high_salary,
    ds.max_salary,
    ds.min_salary,
    -- Salary index: department high-earner average relative to the company-wide average
    ROUND(ds.avg_high_salary / (SELECT AVG(salary_) FROM t_employees WHERE status_ = 'ACTIVE') * 100, 2) as salary_index
FROM dept_stats ds
JOIN t_departments d ON ds.department_id_ = d.department_id_
ORDER BY ds.avg_high_salary DESC;

-- Bad (not recommended): the same report with nested subqueries - unreadable
-- SELECT
--     d.department_name_,
--     (SELECT COUNT(*) FROM t_employees e WHERE e.department_id_ = d.department_id_ AND e.salary_ > 60000) as high_earner_count,
--     (SELECT AVG(salary_) FROM t_employees e WHERE e.department_id_ = d.department_id_ AND e.salary_ > 60000) as avg_high_salary
-- FROM t_departments d
-- WHERE EXISTS (SELECT 1 FROM t_employees e WHERE e.department_id_ = d.department_id_ AND e.salary_ > 60000);
-- Problem: each subquery rescans the table - slow, and hard to understand or maintain

-- A deeper multi-stage CTE
WITH sales_summary AS (
    SELECT
        employee_id_,
        YEAR(sale_date_) as year,
        MONTH(sale_date_) as month,
        SUM(amount_) as monthly_sales,
        COUNT(*) as transaction_count
    FROM t_sales
    GROUP BY employee_id_, YEAR(sale_date_), MONTH(sale_date_)
),
employee_performance AS (
    SELECT
        ss.employee_id_,
        e.name_,
        e.department_id_,
        ss.year,
        ss.month,
        ss.monthly_sales,
        ss.transaction_count,
        AVG(ss.monthly_sales) OVER (PARTITION BY ss.employee_id_ ORDER BY ss.year, ss.month
                                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as rolling_3month_avg
    FROM sales_summary ss
    JOIN t_employees e ON ss.employee_id_ = e.employee_id_
),
top_performers AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY year, month ORDER BY monthly_sales DESC) as sales_rank
    FROM employee_performance
)
SELECT
    year,
    month,
    name_,
    monthly_sales,
    rolling_3month_avg,
    sales_rank
FROM top_performers
WHERE sales_rank <= 5
ORDER BY year, month, sales_rank;

3.2.2 Recursive CTEs for Complex Queries

Recursive CTEs are the tool of choice for hierarchical and graph-shaped data.

-- Org-chart hierarchy traversal
WITH RECURSIVE employee_hierarchy AS (
    -- Anchor: all top-level managers
    SELECT
        employee_id_,
        name_,
        manager_id_,
        0 as level,
        CAST(name_ AS CHAR(1000)) as hierarchy_path  -- MySQL CAST targets CHAR, not VARCHAR
    FROM t_employees
    WHERE manager_id_ IS NULL

    UNION ALL

    -- Recursive step: attach direct reports
    SELECT
        e.employee_id_,
        e.name_,
        e.manager_id_,
        eh.level + 1,
        CAST(CONCAT(eh.hierarchy_path, ' -> ', e.name_) AS CHAR(1000)) as hierarchy_path
    FROM t_employees e
    JOIN employee_hierarchy eh ON e.manager_id_ = eh.employee_id_
    WHERE eh.level < 10  -- guard against runaway recursion
)
SELECT
    employee_id_,
    CONCAT(REPEAT('  ', level), name_) as indented_name,
    level,
    hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;

-- Count each manager's total (direct + indirect) subordinates
WITH RECURSIVE subordinate_count AS (
    -- Anchor: every employee paired with their direct manager
    SELECT
        employee_id_,
        name_,
        manager_id_,
        1 as depth
    FROM t_employees

    UNION ALL

    -- Recursive step: walk each employee up the management chain
    SELECT
        sc.employee_id_,
        sc.name_,
        e.manager_id_,
        sc.depth + 1
    FROM subordinate_count sc
    JOIN t_employees e ON sc.manager_id_ = e.employee_id_
    WHERE e.manager_id_ IS NOT NULL
)
-- Each (employee, ancestor) pair is one row, so COUNT(*) per ancestor = total subordinates
SELECT
    manager_id_,
    COUNT(*) as total_subordinates
FROM subordinate_count
WHERE manager_id_ IS NOT NULL
GROUP BY manager_id_
ORDER BY total_subordinates DESC;

-- Generating a number series (handy for test data)
WITH RECURSIVE number_series AS (
    SELECT 1 as n
    UNION ALL
    SELECT n + 1
    FROM number_series
    WHERE n < 1000
)
SELECT n FROM number_series;
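
Note that MySQL caps recursive CTE iteration at cte_max_recursion_depth (default 1000), so the series above just fits within the default. To generate a longer series, raise the limit for the session first:

-- Raise the recursion limit for this session only (default 1000)
SET SESSION cte_max_recursion_depth = 100000;

WITH RECURSIVE number_series AS (
    SELECT 1 as n
    UNION ALL
    SELECT n + 1 FROM number_series WHERE n < 100000
)
SELECT COUNT(*) as generated_rows FROM number_series;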

3.2.3 CTE Performance Tips
-- Scenario: CTE performance considerations in MySQL
-- Note: MySQL has no MATERIALIZED hint; the optimizer decides between merging and materialization
-- Task: compare each employee's salary with the department average

-- CTE vs. derived-table comparison
-- Using a CTE
WITH dept_avg AS (
    SELECT department_id_, AVG(salary_) as avg_salary
    FROM t_employees
    GROUP BY department_id_
)
SELECT e.name_, e.salary_, da.avg_salary
FROM t_employees e
JOIN dept_avg da ON e.department_id_ = da.department_id_
WHERE e.salary_ > da.avg_salary;

-- The equivalent derived table
SELECT e.name_, e.salary_, sub.avg_salary
FROM t_employees e
JOIN (
    SELECT department_id_, AVG(salary_) as avg_salary
    FROM t_employees
    GROUP BY department_id_
) sub ON e.department_id_ = sub.department_id_
WHERE e.salary_ > sub.avg_salary;
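
MySQL 8.0's optimizer typically handles both forms the same way, merging or materializing as it sees fit. You can confirm this on your own data by comparing the plans (output varies by version and statistics):

-- Compare the plans of the CTE and derived-table forms
EXPLAIN FORMAT=TREE
WITH dept_avg AS (
    SELECT department_id_, AVG(salary_) as avg_salary
    FROM t_employees
    GROUP BY department_id_
)
SELECT e.name_, e.salary_, da.avg_salary
FROM t_employees e
JOIN dept_avg da ON e.department_id_ = da.department_id_
WHERE e.salary_ > da.avg_salary;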

3.3 Subquery Optimization

Subquery optimization is a major part of query tuning: knowing when to use correlated vs. non-correlated subqueries, and when to rewrite them entirely.

3.3.1 Correlated vs. Non-correlated Subqueries
-- Business scenario: salary-anomaly detection - employees earning above their department average
-- Used in salary audits, performance reviews, and talent identification

-- Bad (not recommended): a correlated subquery - poor performance
-- Problem: the subquery re-runs for every outer row - O(n^2)
SELECT employee_id_, name_, salary_, department_id_
FROM t_employees e1
WHERE salary_ > (
    SELECT AVG(salary_)
    FROM t_employees e2
    WHERE e2.department_id_ = e1.department_id_
      AND e2.status_ = 'ACTIVE'
);
-- With 1,000 employees, the subquery may execute 1,000 times

-- Good: rewrite with a window function - dramatically faster
-- Advantage: a single table scan - O(n)
SELECT employee_id_, name_, salary_, department_id_, dept_avg_salary
FROM (
    SELECT
        employee_id_,
        name_,
        salary_,
        department_id_,
        -- one pass computes every department's average
        AVG(salary_) OVER (PARTITION BY department_id_) as dept_avg_salary
    FROM t_employees
    WHERE status_ = 'ACTIVE'
) t
WHERE salary_ > dept_avg_salary;
-- On the same data, typically 5-50x faster

-- Business scenario: employees of well-funded departments - for staffing and cost analysis

-- Good: a non-correlated subquery - runs once, and its result can be cached and reused
SELECT employee_id_, name_, salary_, department_id_
FROM t_employees
WHERE department_id_ IN (
    SELECT department_id_
    FROM t_departments
    WHERE budget_ > 1000000
      AND status_ = 'ACTIVE'
)
AND status_ = 'ACTIVE';

-- Alternative: a JOIN - usually faster and more direct
SELECT e.employee_id_, e.name_, e.salary_, e.department_id_
FROM t_employees e
INNER JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE d.budget_ > 1000000
  AND e.status_ = 'ACTIVE'
  AND d.status_ = 'ACTIVE';

3.3.2 EXISTS vs. IN
-- Business scenario: identify employees with sales activity, for reviews and incentives

-- Good: EXISTS - usually the better performer
-- Advantage: stops at the first match; suits large data volumes
-- Fits when the subquery would return many rows, or when only existence matters
SELECT e.employee_id_, e.name_, e.department_id_
FROM t_employees e
WHERE EXISTS (
    SELECT 1  -- a constant; no actual data needs returning
    FROM t_sales s
    WHERE s.employee_id_ = e.employee_id_
      AND s.sale_date_ >= '2023-01-01'
      AND s.amount_ > 0
);
-- Short-circuits: stops scanning at the first matching row

-- Alternative: IN - good for small result sets
-- Advantage: when the subquery yields few distinct values, IN can beat EXISTS
SELECT e.employee_id_, e.name_, e.department_id_
FROM t_employees e
WHERE e.employee_id_ IN (
    SELECT DISTINCT s.employee_id_  -- DISTINCT removes duplicates
    FROM t_sales s
    WHERE s.sale_date_ >= '2023-01-01'
      AND s.amount_ > 0
);
-- Note: IN materializes the full result set before matching

-- Business scenario: employees with no sales - candidates for training or reassignment

-- Good: NOT EXISTS - NULL-safe and clear
SELECT e.employee_id_, e.name_, e.department_id_, e.hire_date_
FROM t_employees e
WHERE NOT EXISTS (
    SELECT 1
    FROM t_sales s
    WHERE s.employee_id_ = e.employee_id_
      AND s.sale_date_ >= '2023-01-01'
)
AND e.status_ = 'ACTIVE';

-- Bad (not recommended): NOT IN - tricky around NULLs
-- Problem: a single NULL in the subquery makes the whole NOT IN return no rows
SELECT e.employee_id_, e.name_, e.department_id_
FROM t_employees e
WHERE e.employee_id_ NOT IN (
    SELECT s.employee_id_
    FROM t_sales s
    WHERE s.employee_id_ IS NOT NULL  -- NULLs must be excluded explicitly
      AND s.sale_date_ >= '2023-01-01'
)
AND e.status_ = 'ACTIVE';
-- Forget the NULL filter and the query silently returns zero rows

-- Summary:
-- 1. EXISTS vs IN:
--    - large data: EXISTS usually wins (short-circuit evaluation)
--    - small data: IN may win (hash table built once)
-- 2. NOT EXISTS vs NOT IN:
--    - NOT EXISTS: recommended; NULL-safe
--    - NOT IN: handle NULLs carefully; easy to get wrong

-- Business scenario: overperformers - employees with at least one sale above their personal average
-- For spotting potential, designing incentives, and performance analysis
SELECT e.employee_id_, e.name_, e.department_id_
FROM t_employees e
WHERE EXISTS (
    SELECT 1
    FROM t_sales s
    WHERE s.employee_id_ = e.employee_id_
      AND s.amount_ > (
          -- nested subquery: the employee's historical average sale
          SELECT AVG(amount_)
          FROM t_sales s2
          WHERE s2.employee_id_ = s.employee_id_
      )
      AND s.sale_date_ >= '2023-01-01'
);

-- Suggested rewrite: a window function performs better
SELECT DISTINCT e.employee_id_, e.name_, e.department_id_
FROM t_employees e
JOIN (
    SELECT
        employee_id_,
        amount_,
        AVG(amount_) OVER (PARTITION BY employee_id_) as avg_amount
    FROM t_sales
    WHERE sale_date_ >= '2023-01-01'
) s ON e.employee_id_ = s.employee_id_
WHERE s.amount_ > s.avg_amount;

3.3.3 Subquery Rewriting
-- Business scenario: employee roster with department names - HR reports, org charts, exports

-- Bad (not recommended): a scalar subquery - poor performance
-- Problem: the subquery runs once per employee - the classic N+1 pattern
SELECT
    employee_id_,
    name_,
    salary_,
    hire_date_,
    -- scalar subquery: executed for every row
    (SELECT department_name_
     FROM t_departments d
     WHERE d.department_id_ = e.department_id_) as dept_name
FROM t_employees e
WHERE status_ = 'ACTIVE';
-- 1,000 employees = 1,001 queries (1 outer + 1,000 subqueries)

-- Good: rewrite as a JOIN - dramatically faster
-- Advantage: one join instead of repeated lookups
SELECT
    e.employee_id_,
    e.name_,
    e.salary_,
    e.hire_date_,
    d.department_name_
FROM t_employees e
LEFT JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE e.status_ = 'ACTIVE';
-- Two table accesses in total; typically 10-100x faster

-- Business scenario: top sellers - employees whose sales beat the department average
-- For reviews, bonuses, promotions, and team incentives

-- Bad (not recommended): deeply nested subqueries - very slow and unmaintainable
-- Problem: several correlated subqueries, each recomputed per employee
SELECT
    e.employee_id_,
    e.name_,
    e.department_id_,
    -- subquery 1: the employee's total sales
    (SELECT SUM(s.amount_)
     FROM t_sales s
     WHERE s.employee_id_ = e.employee_id_
       AND s.sale_date_ >= '2023-01-01') as total_sales
FROM t_employees e
WHERE (
    -- subquery 2: the same total computed again (duplicated work!)
    SELECT SUM(s.amount_)
    FROM t_sales s
    WHERE s.employee_id_ = e.employee_id_
      AND s.sale_date_ >= '2023-01-01'
) > (
    -- subquery 3: the department average, recomputed for every employee!
    SELECT AVG(dept_sales.total)
    FROM (
        SELECT
            e2.employee_id_,
            SUM(s2.amount_) as total
        FROM t_employees e2
        JOIN t_sales s2 ON e2.employee_id_ = s2.employee_id_
        WHERE e2.department_id_ = e.department_id_
          AND s2.sale_date_ >= '2023-01-01'
        GROUP BY e2.employee_id_
    ) dept_sales
)
AND e.status_ = 'ACTIVE';
-- Cost explodes toward O(n^3): 1,000 employees can trigger millions of subquery executions

-- Good: rewrite with CTEs - fast and legible
-- Advantage: staged computation, no duplicated work, roughly O(n)
WITH employee_sales AS (
    -- Step 1: each employee's total sales
    SELECT
        e.employee_id_,
        e.name_,
        e.department_id_,
        COALESCE(SUM(s.amount_), 0) as total_sales
    FROM t_employees e
    LEFT JOIN t_sales s ON e.employee_id_ = s.employee_id_
                        AND s.sale_date_ >= '2023-01-01'
    WHERE e.status_ = 'ACTIVE'
    GROUP BY e.employee_id_, e.name_, e.department_id_
),
dept_avg_sales AS (
    -- Step 2: each department's average
    SELECT
        department_id_,
        AVG(total_sales) as avg_dept_sales,
        COUNT(*) as employee_count
    FROM employee_sales
    GROUP BY department_id_
)
-- Step 3: employees above their department's average
SELECT
    es.employee_id_,
    es.name_,
    es.department_id_,
    es.total_sales,
    das.avg_dept_sales,
    ROUND((es.total_sales - das.avg_dept_sales) / das.avg_dept_sales * 100, 2) as performance_vs_avg_percent
FROM employee_sales es
JOIN dept_avg_sales das ON es.department_id_ = das.department_id_
WHERE es.total_sales > das.avg_dept_sales
ORDER BY performance_vs_avg_percent DESC;
-- On the same data, typically 50-500x faster - and far easier to maintain

3.4 JOIN Strategy Optimization

Understanding how the different join algorithms work is essential for optimizing complex queries.

3.4.1 Nested Loop Join
-- Nested loop joins fit best when a small table drives a large one
-- Example: employees of a particular department

-- MySQL 8.0 optimizer hint: JOIN_ORDER suggests a join order
-- (MySQL has no Oracle-style USE_NL hint; this is shown only as an example)
SELECT /*+ JOIN_ORDER(d, e) */
    e.employee_id_,
    e.name_,
    d.department_name_
FROM t_departments d
JOIN t_employees e ON d.department_id_ = e.department_id_
WHERE d.department_name_ = 'Sales';

-- Plain MySQL query (recommended; let the optimizer choose)
SELECT
    e.employee_id_,
    e.name_,
    d.department_name_
FROM t_departments d
JOIN t_employees e ON d.department_id_ = e.department_id_
WHERE d.department_name_ = 'Sales';

3.4.2 Hash Join
-- Hash joins suit joins between large tables

SELECT
    e.employee_id_,
    e.name_,
    SUM(s.amount_) as total_sales
FROM t_employees e
JOIN t_sales s ON e.employee_id_ = s.employee_id_
GROUP BY e.employee_id_, e.name_;
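
MySQL has supported hash joins since 8.0.18, typically for equi-joins where no suitable index exists. Whether the query above actually uses one depends on the available indexes and statistics; EXPLAIN FORMAT=TREE shows the chosen algorithm:

EXPLAIN FORMAT=TREE
SELECT e.employee_id_, e.name_, SUM(s.amount_) as total_sales
FROM t_employees e
JOIN t_sales s ON e.employee_id_ = s.employee_id_
GROUP BY e.employee_id_, e.name_;
-- Look for "Inner hash join" in the output; with a usable index on s.employee_id_
-- the optimizer may prefer a nested loop with index lookups instead.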

3.4.3 Sort-Merge Join
-- Sort-merge joins suit large tables whose join columns are already sorted

-- A complex multi-table join to optimize
SELECT
    e.employee_id_,
    e.name_,
    d.department_name_,
    SUM(s.amount_) as total_sales,
    COUNT(s.sale_id_) as sale_count
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
LEFT JOIN t_sales s ON e.employee_id_ = s.employee_id_
WHERE d.budget_ > 500000
GROUP BY e.employee_id_, e.name_, d.department_name_
HAVING SUM(s.amount_) > 10000
ORDER BY total_sales DESC;

4. Efficient Data Manipulation

Efficient data manipulation is a key factor in application performance. This chapter digs into bulk inserts, UPSERT operations, partitioned-table management, and transaction handling.

4.1 Bulk Insert Techniques

Bulk inserting is the main lever for data-loading performance; enterprise applications constantly face large imports. Done right, bulk loading can be 10-1000x faster than naive inserts.

4.1.1 MySQL LOAD DATA INFILE

Business scenarios: enterprise data migration, ERP initialization, bulk log import, synchronizing data from third-party systems

When it fits: more than ~10,000 rows, import speed matters, and the data is well-formed
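
Before relying on LOAD DATA INFILE, check the server's secure_file_priv setting, which restricts where the server may read files from (an empty value allows any path, a directory value restricts reads to that directory, and NULL disables server-side loading entirely):

SHOW VARIABLES LIKE 'secure_file_priv';

-- To load a file that lives on the client machine instead of the server, use
-- LOAD DATA LOCAL INFILE (requires local_infile to be enabled on both sides):
-- LOAD DATA LOCAL INFILE '/client/path/employees.csv' INTO TABLE employee_import ...;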

-- Business scenario 1: a newly founded company imports 100,000 employee records
-- Payoff: hours of row-by-row inserting reduced to minutes of bulk loading

-- Staging table for the import
CREATE TABLE employee_import (
    employee_id_ INT PRIMARY KEY,
    name_ VARCHAR(50) NOT NULL,
    email_ VARCHAR(100) UNIQUE,
    department_id_ INT,
    salary_ DECIMAL(10,2),
    hire_date_ DATE,
    status_ ENUM('ACTIVE', 'INACTIVE', 'TERMINATED') DEFAULT 'ACTIVE',
    created_at_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_dept_status (department_id_, status_),
    INDEX idx_hire_date (hire_date_)
);

-- ✅ Correct: LOAD DATA INFILE (fastest)
-- Characteristics: roughly 1M rows in ~30 s - 50-100x faster than INSERT VALUES
LOAD DATA INFILE '/secure/path/employees.csv'
INTO TABLE employee_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS  -- skip the CSV header row
(employee_id_, name_, email_, department_id_, salary_, @hire_date_, status_)
SET hire_date_ = STR_TO_DATE(@hire_date_, '%Y-%m-%d'),
    created_at_ = NOW();

-- Business scenario 2: daily bulk log import (millions of rows per day)
-- Requirement: load the previous day's user-behavior log at 2 AM
LOAD DATA INFILE '/logs/user_behavior_20240101.csv'
INTO TABLE user_behavior_log
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(user_id_, action_type_, page_url_, @timestamp_, session_id_, ip_address_)
SET action_timestamp_ = FROM_UNIXTIME(@timestamp_),
    import_date_ = CURDATE();

-- ❌ Wrong 1: row-by-row INSERTs (terrible performance)
-- Problem: 100,000 rows can take hours and disrupt the business
-- Cause: each INSERT is its own transaction with its own disk I/O and log writes
/*
INSERT INTO employee_import VALUES (1, 'John Doe', 'john@company.com', 1, 50000, '2023-01-01', 'ACTIVE');
INSERT INTO employee_import VALUES (2, 'Jane Smith', 'jane@company.com', 2, 55000, '2023-01-02', 'ACTIVE');
-- ... repeated 100,000 times - a performance disaster
*/

-- ❌ Wrong 2: ill-chosen batch sizes
-- Problem: tiny batches (<100 rows) gain little; huge batches (>100,000 rows) risk exhausting memory
/*
INSERT INTO employee_import VALUES (1, 'John'), (2, 'Jane'); -- batch too small
INSERT INTO employee_import VALUES (1, 'John'), (2, 'Jane'), ... (100000, 'Last'); -- batch too large
*/

-- ✅ Correct: sensible multi-row INSERTs (when LOAD DATA is unavailable)
-- Characteristics: 10-50x faster than single-row inserts; suits programmatic bulk loading
-- Sweet spot: 1,000-5,000 rows per batch
INSERT INTO employee_import VALUES
(1, 'John Doe', 'john.doe@company.com', 1, 50000, '2023-01-01', 'ACTIVE'),
(2, 'Jane Smith', 'jane.smith@company.com', 2, 55000, '2023-01-02', 'ACTIVE'),
(3, 'Bob Johnson', 'bob.johnson@company.com', 1, 48000, '2023-01-03', 'ACTIVE'),
-- ... continue up to 1,000 rows per batch
(1000, 'Employee 1000', 'emp1000@company.com', 3, 52000, '2023-01-10', 'ACTIVE');

-- Business scenario 3: tuning very large imports (millions of rows and up)
-- Use cases: data-warehouse ETL, historical migrations, system consolidation
-- Payoff: roughly another 20-50% on top of plain LOAD DATA

-- Step 1: save the current settings, then relax them for the import
SET @old_autocommit = @@autocommit;
SET @old_unique_checks = @@unique_checks;
SET @old_foreign_key_checks = @@foreign_key_checks;
SET @old_sql_log_bin = @@sql_log_bin;

-- Temporary import-only settings
SET autocommit = 0;           -- avoid per-statement commits
SET unique_checks = 0;        -- suspend unique-key checks
SET foreign_key_checks = 0;   -- suspend foreign-key checks
SET sql_log_bin = 0;          -- skip binary logging (only if replication doesn't need it)

-- Step 2: run the bulk import
LOAD DATA INFILE '/data/massive_employee_data.csv'
INTO TABLE employee_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(employee_id_, name_, email_, department_id_, salary_, @hire_date_, status_)
SET hire_date_ = STR_TO_DATE(@hire_date_, '%Y-%m-%d');

-- Step 3: restore the original settings (essential for data integrity)
SET autocommit = @old_autocommit;
SET unique_checks = @old_unique_checks;
SET foreign_key_checks = @old_foreign_key_checks;
SET sql_log_bin = @old_sql_log_bin;
COMMIT;

-- ❌ Serious mistake: forgetting to restore the settings
-- Risk: later operations lose their integrity guarantees
-- Impact: foreign keys and unique constraints unenforced; replication anomalies

-- Business scenario 4: table-to-table bulk copy with INSERT ... SELECT
-- Use cases: backups, schema changes, migrating cleaned data
INSERT INTO t_employees_backup (employee_id_, name_, email_, department_id_, salary_, hire_date_, status_)
SELECT employee_id_, name_, email_, department_id_, salary_, hire_date_, status_
FROM employee_import
WHERE status_ = 'ACTIVE'
  AND hire_date_ >= '2023-01-01';

-- Business scenario 5: bulk insert with data transformation
-- Use cases: normalization, applying business rules, data cleansing
INSERT INTO t_employees_normalized (employee_id_, full_name_, email_domain_, department_name_, salary_level_)
SELECT
    ei.employee_id_,
    UPPER(TRIM(ei.name_)) as full_name_,
    SUBSTRING_INDEX(ei.email_, '@', -1) as email_domain_,
    d.department_name_,
    CASE
        WHEN ei.salary_ < 40000 THEN 'JUNIOR'
        WHEN ei.salary_ < 80000 THEN 'SENIOR'
        ELSE 'EXECUTIVE'
    END as salary_level_
FROM employee_import ei
JOIN t_departments d ON ei.department_id_ = d.department_id_
WHERE ei.status_ = 'ACTIVE';

4.1.2 Bulk-Insert Performance Comparison and Best Practices
-- Measured comparison (1,000,000 rows)
-- Test environment: MySQL 8.0, 16 GB RAM, SSD storage

/*
Insert method                     Time       Relative    Best for
-----------------------------------------------------------------
Row-by-row INSERT                 8 hours    1x          not recommended
Multi-row INSERT (100/batch)      45 min     10x         small programmatic loads
Multi-row INSERT (1000/batch)     8 min      60x         medium programmatic loads
Multi-row INSERT (5000/batch)     4 min      120x        large programmatic loads
LOAD DATA INFILE                  30 s       960x        file imports (recommended)
LOAD DATA INFILE (tuned)          20 s       1440x       very large imports (recommended)
*/

-- Best practice 1: pick the method by data volume
-- < 1,000 rows: multi-row INSERT
-- 1,000-100,000 rows: LOAD DATA INFILE
-- > 100,000 rows: LOAD DATA INFILE plus the tuning from 4.1.1

-- Best practice 2: error handling around bulk imports
-- Business scenario: make imports reliable and recoverable

-- Create the import-log table first: DDL causes an implicit commit in MySQL,
-- so it must not sit inside the import transaction
CREATE TABLE IF NOT EXISTS import_log (
    import_id_ INT AUTO_INCREMENT PRIMARY KEY,
    table_name_ VARCHAR(64),
    file_path_ VARCHAR(255),
    start_time_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    end_time_ TIMESTAMP NULL,
    records_processed_ INT DEFAULT 0,
    records_failed_ INT DEFAULT 0,
    status_ ENUM('RUNNING', 'SUCCESS', 'FAILED') DEFAULT 'RUNNING',
    error_message_ TEXT
);

START TRANSACTION;

-- Record the start of the import
INSERT INTO import_log (table_name_, file_path_)
VALUES ('employee_import', '/data/employees.csv');
SET @import_id = LAST_INSERT_ID();

-- Run the import (with error handling)
-- Note: in practice, exceptions should be handled in the application layer
LOAD DATA INFILE '/data/employees.csv'
INTO TABLE employee_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(employee_id_, name_, email_, department_id_, salary_, @hire_date_, status_)
SET hire_date_ = STR_TO_DATE(@hire_date_, '%Y-%m-%d');

-- Capture the affected-row count immediately, before any other statement resets it
SET @rows_loaded = ROW_COUNT();

-- Record the result
UPDATE import_log
SET end_time_ = NOW(),
    records_processed_ = @rows_loaded,
    status_ = 'SUCCESS'
WHERE import_id_ = @import_id;

COMMIT;

-- Best practice 3: validate the data after the import
-- Value: confirm the loaded data is complete and sane
SELECT
    COUNT(*) as total_imported,
    COUNT(DISTINCT employee_id_) as unique_employees,
    COUNT(*) - COUNT(DISTINCT employee_id_) as duplicate_count,
    MIN(hire_date_) as earliest_hire_date,
    MAX(hire_date_) as latest_hire_date,
    AVG(salary_) as average_salary
FROM employee_import;

-- Data-quality checks
SELECT
    'Missing Email' as issue_type,
    COUNT(*) as issue_count
FROM employee_import
WHERE email_ IS NULL OR email_ = ''
UNION ALL
SELECT
    'Invalid Salary' as issue_type,
    COUNT(*) as issue_count
FROM employee_import
WHERE salary_ <= 0 OR salary_ > 1000000
UNION ALL
SELECT
    'Future Hire Date' as issue_type,
    COUNT(*) as issue_count
FROM employee_import
WHERE hire_date_ > CURDATE();  -- hire dates in the future indicate bad source data

4.2 Conditional Updates and UPSERT

UPSERT (INSERT or UPDATE) is a core requirement of modern database applications, indispensable for data synchronization, cache refreshes, counters, and similar workloads. Used correctly, it avoids race conditions and improves concurrency.

4.2.1 MySQL ON DUPLICATE KEY UPDATE

Business scenarios: data sync, cache refresh, counter maintenance, configuration management, user-status updates

When it fits: the table has a primary key or unique index, and you need an atomic insert-or-update

-- Business scenario 1: user login-state management
-- Requirement: update the last-login time on login; create the record on first login
-- Payoff: no existence check needed, and the data stays consistent under high concurrency

CREATE TABLE user_login_status (
    user_id_ INT PRIMARY KEY,
    username_ VARCHAR(50) NOT NULL,
    last_login_time_ TIMESTAMP,
    login_count_ INT DEFAULT 1,
    last_ip_ VARCHAR(45),
    status_ ENUM('ONLINE', 'OFFLINE') DEFAULT 'ONLINE',
    updated_at_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- ✅ Correct: ON DUPLICATE KEY UPDATE (one atomic operation)
-- Characteristics: a single statement, no race conditions, high-concurrency safe
INSERT INTO user_login_status (user_id_, username_, last_login_time_, login_count_, last_ip_, status_)
VALUES (12345, 'john_doe', NOW(), 1, '192.168.1.100', 'ONLINE')
ON DUPLICATE KEY UPDATE
    last_login_time_ = NOW(),
    login_count_ = login_count_ + 1,  -- accumulate the login count
    last_ip_ = VALUES(last_ip_),
    status_ = 'ONLINE',
    updated_at_ = NOW();

-- ❌ Wrong 1: query first, then decide (race condition)
-- Problem: under concurrency, two requests can both see "not found", and the duplicate INSERT then fails
/*
-- Step 1: does the user exist?
SELECT COUNT(*) FROM user_login_status WHERE user_id_ = 12345;

-- Step 2: branch on the result (dangerous: another session can act in between)
IF found THEN
    UPDATE user_login_status SET last_login_time_ = NOW() WHERE user_id_ = 12345;
ELSE
    INSERT INTO user_login_status (user_id_, username_, last_login_time_) VALUES (12345, 'john_doe', NOW());
END IF;
*/

-- ❌ Wrong 2: REPLACE (risks losing data)
-- Problem: REPLACE deletes the old row and inserts a new one, wiping accumulated values like login_count_
/*
REPLACE INTO user_login_status (user_id_, username_, last_login_time_, login_count_)
VALUES (12345, 'john_doe', NOW(), 1);  -- login_count_ always resets to 1; history is lost
*/

-- Business scenario 2: product-inventory management
-- Requirement: on goods receipt, add to stock if the product exists, otherwise create it
CREATE TABLE product_inventory (
    product_id_ VARCHAR(50) PRIMARY KEY,
    product_name_ VARCHAR(100) NOT NULL,
    current_stock_ INT DEFAULT 0,
    reserved_stock_ INT DEFAULT 0,
    last_updated_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    version_ INT DEFAULT 1  -- optimistic-lock version number
);

-- ✅ Goods receipt (handles both new products and restocking)
INSERT INTO product_inventory (product_id_, product_name_, current_stock_)
VALUES ('PROD-001', 'iPhone 15 Pro', 100)
ON DUPLICATE KEY UPDATE
    current_stock_ = current_stock_ + VALUES(current_stock_),  -- accumulate stock
    version_ = version_ + 1,  -- bump the version
    last_updated_ = NOW();

-- Business scenario 3: real-time counters
-- Requirement: hourly page-view statistics with live updates
CREATE TABLE hourly_statistics (
    stat_date_ DATE,
    stat_hour_ TINYINT,
    page_path_ VARCHAR(255),
    page_views_ BIGINT DEFAULT 0,
    unique_visitors_ BIGINT DEFAULT 0,
    PRIMARY KEY (stat_date_, stat_hour_, page_path_),
    INDEX idx_date_hour (stat_date_, stat_hour_)
);

-- ✅ Real-time counter update (hot path; performance-critical)
-- Note: adding 1 per hit counts page views; true unique visitors need de-duplication upstream (e.g., per session)
INSERT INTO hourly_statistics (stat_date_, stat_hour_, page_path_, page_views_, unique_visitors_)
VALUES (CURDATE(), HOUR(NOW()), '/product/detail', 1, 1)
ON DUPLICATE KEY UPDATE
    page_views_ = page_views_ + VALUES(page_views_),
    unique_visitors_ = unique_visitors_ + VALUES(unique_visitors_);

-- 业务场景4:批量数据同步(ETL场景)
-- 业务需求:从外部系统同步员工数据,支持新增和更新
-- 业务价值:一次操作处理混合的新增和更新数据,简化ETL逻辑
INSERT INTO t_employees (employee_id_, name_, email_, department_id_, salary_, hire_date_, status_)
VALUES
    (1001, 'John Doe', 'john.updated@company.com', 1, 52000, '2023-01-01', 'ACTIVE'),
    (1002, 'Jane Smith', 'jane.smith@company.com', 2, 55000, '2023-01-02', 'ACTIVE'),
    (2001, 'New Employee', 'new.employee@company.com', 3, 45000, '2023-06-01', 'ACTIVE')
ON DUPLICATE KEY UPDATE
    name_ = VALUES(name_),
    email_ = VALUES(email_),
    department_id_ = VALUES(department_id_),
    -- 业务规则:薪资只能向上调整,防止数据错误导致降薪
    salary_ = GREATEST(salary_, VALUES(salary_)),
    status_ = VALUES(status_),
    updated_at_ = NOW();

-- 业务场景5:条件性UPSERT(复杂业务规则)
-- 业务需求:员工信息更新时应用复杂的业务规则
INSERT INTO t_employees (employee_id_, name_, email_, department_id_, salary_, hire_date_, status_)
VALUES (1001, 'John Doe', 'john.doe@company.com', 1, 45000, '2023-01-01', 'ACTIVE')
ON DUPLICATE KEY UPDATE
    -- 规则1:薪资只在新值更高时更新
    salary_ = CASE
        WHEN VALUES(salary_) > salary_ THEN VALUES(salary_)
        ELSE salary_
    END,
    -- 规则2:邮箱只在原值为空时更新
    email_ = CASE
        WHEN email_ IS NULL OR email_ = '' THEN VALUES(email_)
        ELSE email_
    END,
    -- 规则3:部门变更需要记录变更时间
    department_id_ = VALUES(department_id_),
    dept_change_date_ = CASE
        WHEN department_id_ != VALUES(department_id_) THEN NOW()
        ELSE dept_change_date_
    END,
    -- 规则4:状态变更日志
    status_ = VALUES(status_),
    status_change_date_ = CASE
        WHEN status_ != VALUES(status_) THEN NOW()
        ELSE status_change_date_
    END;

-- MySQL 8.0 新语法:INSERT ... AS alias ON DUPLICATE KEY UPDATE
-- 优势:语法更清晰,避免重复的VALUES()调用
-- 说明:自MySQL 8.0.20起,ON DUPLICATE KEY UPDATE中的VALUES()已被标记为弃用,推荐改用此行别名语法
INSERT INTO t_employees (employee_id_, name_, email_, salary_, department_id_)
VALUES (1001, 'John Doe', 'john.doe@company.com', 50000, 1) AS new_data
ON DUPLICATE KEY UPDATE
    name_ = new_data.name_,
    email_ = new_data.email_,
    salary_ = GREATEST(salary_, new_data.salary_),  -- 应用业务规则
    department_id_ = new_data.department_id_,
    updated_at_ = NOW();

4.2.2 UPSERT性能优化和最佳实践

-- 性能优化1:批量UPSERT操作
-- 业务场景:批量处理用户行为数据,每分钟处理10万条记录
-- 性能提升:比逐条UPSERT快50-100倍

-- ✅ 批量UPSERT(推荐)
INSERT INTO user_behavior_stats (user_id_, action_date_, action_count_, last_action_time_)
VALUES
    (1001, '2024-01-01', 5, '2024-01-01 10:30:00'),
    (1002, '2024-01-01', 3, '2024-01-01 10:31:00'),
    (1003, '2024-01-01', 8, '2024-01-01 10:32:00'),
    -- ... 批量数据(建议每批1000-5000条)
    (2000, '2024-01-01', 2, '2024-01-01 10:45:00')
ON DUPLICATE KEY UPDATE
    action_count_ = action_count_ + VALUES(action_count_),
    last_action_time_ = GREATEST(last_action_time_, VALUES(last_action_time_));

-- ❌ 逐条UPSERT(性能差)
/*
INSERT INTO user_behavior_stats (user_id_, action_date_, action_count_) VALUES (1001, '2024-01-01', 5)
ON DUPLICATE KEY UPDATE action_count_ = action_count_ + VALUES(action_count_);
INSERT INTO user_behavior_stats (user_id_, action_date_, action_count_) VALUES (1002, '2024-01-01', 3)
ON DUPLICATE KEY UPDATE action_count_ = action_count_ + VALUES(action_count_);
-- ... 重复10万次,性能灾难
*/

-- 性能优化2:索引优化
-- 确保UPSERT操作涉及的列有适当的索引
CREATE TABLE user_preferences (
    user_id_ INT,
    preference_key_ VARCHAR(50),
    preference_value_ TEXT,
    updated_at_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id_, preference_key_),  -- 复合主键支持UPSERT
    INDEX idx_updated (updated_at_)  -- 支持按更新时间查询
);

-- 性能优化3:避免不必要的字段更新
-- ✅ 只更新真正变化的字段(减少写入开销)
INSERT INTO user_preferences (user_id_, preference_key_, preference_value_)
VALUES (1001, 'theme', 'dark_mode')
ON DUPLICATE KEY UPDATE
    preference_value_ = CASE
        WHEN preference_value_ != VALUES(preference_value_) THEN VALUES(preference_value_)
        ELSE preference_value_
    END,
    updated_at_ = CASE
        WHEN preference_value_ != VALUES(preference_value_) THEN NOW()
        ELSE updated_at_
    END;

-- 最佳实践1:UPSERT的事务处理
-- 业务场景:确保相关数据的一致性
START TRANSACTION;

-- 更新用户积分
INSERT INTO user_points (user_id_, points_, last_earned_date_)
VALUES (1001, 100, NOW())
ON DUPLICATE KEY UPDATE
    points_ = points_ + VALUES(points_),
    last_earned_date_ = NOW();

-- 记录积分变更日志
INSERT INTO point_change_log (user_id_, change_amount_, change_type_, change_date_)
VALUES (1001, 100, 'EARNED', NOW());

COMMIT;

-- 最佳实践2:UPSERT的错误处理和监控
-- 创建UPSERT操作监控表
CREATE TABLE upsert_monitoring (
    operation_id_ INT AUTO_INCREMENT PRIMARY KEY,
    table_name_ VARCHAR(64),
    operation_type_ ENUM('INSERT', 'UPDATE'),
    affected_rows_ INT,
    execution_time_ms_ INT,
    created_at_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 监控UPSERT性能的存储过程示例
DELIMITER //
CREATE PROCEDURE MonitoredUpsert(
    IN p_user_id INT,
    IN p_username VARCHAR(50),
    IN p_last_login TIMESTAMP
)
BEGIN
    DECLARE v_start_time BIGINT;
    DECLARE v_affected_rows INT;
    DECLARE v_operation_type VARCHAR(10);

    SET v_start_time = UNIX_TIMESTAMP(NOW(3)) * 1000;

    -- 执行UPSERT操作
    INSERT INTO user_login_status (user_id_, username_, last_login_time_, login_count_)
    VALUES (p_user_id, p_username, p_last_login, 1)
    ON DUPLICATE KEY UPDATE
        last_login_time_ = p_last_login,
        login_count_ = login_count_ + 1;

    SET v_affected_rows = ROW_COUNT();
    -- ON DUPLICATE KEY UPDATE下的ROW_COUNT():1=插入,2=更新,0=命中但值未变化(此处将0归入UPDATE)
    SET v_operation_type = IF(v_affected_rows = 1, 'INSERT', 'UPDATE');

    -- 记录监控数据
    INSERT INTO upsert_monitoring (table_name_, operation_type_, affected_rows_, execution_time_ms_)
    VALUES ('user_login_status', v_operation_type, v_affected_rows,
            UNIX_TIMESTAMP(NOW(3)) * 1000 - v_start_time);
END //
DELIMITER ;

-- 使用监控存储过程
CALL MonitoredUpsert(1001, 'john_doe', NOW());

-- 查看UPSERT性能统计
SELECT
    table_name_,
    operation_type_,
    COUNT(*) as operation_count,
    AVG(execution_time_ms_) as avg_execution_time_ms,
    MAX(execution_time_ms_) as max_execution_time_ms,
    MIN(execution_time_ms_) as min_execution_time_ms
FROM upsert_monitoring
WHERE created_at_ >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY table_name_, operation_type_
ORDER BY avg_execution_time_ms DESC;

4.3 分区表数据操作

分区表是处理大数据量的重要技术,能够显著提升查询和维护性能。

4.3.1 分区表的创建和管理

-- MySQL 8.0 分区表
-- 按范围分区
CREATE TABLE sales_partitioned (
    sale_id_ INT NOT NULL,
    employee_id_ INT,
    product_id_ INT,
    sale_date_ DATE NOT NULL,
    amount_ DECIMAL(10,2),
    quantity_ INT,
    region_ VARCHAR(50),
    PRIMARY KEY (sale_id_, sale_date_)
) PARTITION BY RANGE (YEAR(sale_date_)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- 按哈希分区
CREATE TABLE employees_hash_partitioned (
    employee_id_ INT NOT NULL,
    name_ VARCHAR(50),
    email_ VARCHAR(100),
    department_id_ INT,
    salary_ DECIMAL(10,2),
    hire_date_ DATE,
    status_ VARCHAR(20),
    PRIMARY KEY (employee_id_)
) PARTITION BY HASH(employee_id_) PARTITIONS 4;
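
哈希分区建表后,可以通过INFORMATION_SCHEMA快速验证数据在各分区间的分布是否均匀(以下为示例查询):

-- 验证哈希分区的数据分布
SELECT
    PARTITION_NAME,
    TABLE_ROWS,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'employees_hash_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_NAME;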

4.3.1.1 REORGANIZE PARTITION处理MAXVALUE分区

当分区表已经创建了包含MAXVALUE的分区时,不能直接使用ADD PARTITION语法添加新分区。必须使用REORGANIZE PARTITION来重新组织现有分区。

-- ❌ 错误示例:当存在MAXVALUE分区时,不能直接ADD PARTITION
-- ALTER TABLE sales_partitioned ADD PARTITION (
--     PARTITION p2024 VALUES LESS THAN (2025)
-- );
-- 错误信息:ERROR 1481 (HY000): MAXVALUE can only be used in last partition definition

-- ✅ 正确方法:使用REORGANIZE PARTITION重新组织包含MAXVALUE的分区

-- 业务场景1:年度销售数据分区扩展
-- 当前分区结构:p2020, p2021, p2022, p2023, p_future(MAXVALUE)
-- 需求:为2024年和2025年添加新分区

-- 步骤1:查看当前分区状态
-- 业务价值:了解数据分布,评估重组影响范围
SELECT
    PARTITION_NAME as partition_name,
    PARTITION_DESCRIPTION as partition_range,
    TABLE_ROWS as row_count,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_size_mb,
    CREATE_TIME as created_time
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 步骤2:重新组织分区 - 拆分MAXVALUE分区
-- 注意事项:
-- 1. 此操作会锁定表,建议在业务低峰期执行
-- 2. 数据量大时可能耗时较长,需要监控进度
-- 3. 确保有足够的磁盘空间用于临时数据存储
ALTER TABLE sales_partitioned
REORGANIZE PARTITION p_future INTO (
    -- 新增2024年分区
    PARTITION p2024 VALUES LESS THAN (2025),
    -- 新增2025年分区
    PARTITION p2025 VALUES LESS THAN (2026),
    -- 保留MAXVALUE分区用于未来数据
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- 步骤3:验证分区重组结果
-- 业务价值:确认分区创建成功,数据完整性保持
SELECT
    PARTITION_NAME as partition_name,
    PARTITION_DESCRIPTION as partition_range,
    TABLE_ROWS as row_count,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    CREATE_TIME as created_time,
    -- 业务解读:分区状态评估
    CASE
        WHEN TABLE_ROWS = 0 THEN '新分区-等待数据'
        WHEN PARTITION_NAME = 'p_future' THEN 'MAXVALUE分区-捕获未来数据'
        ELSE '历史分区-数据稳定'
    END as partition_status
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 业务场景2:按月份分区的复杂重组
-- 创建按月份分区的表(用于演示)
CREATE TABLE monthly_sales (
    sale_id_ INT NOT NULL,
    sale_date_ DATE NOT NULL,
    amount_ DECIMAL(10,2),
    customer_id_ INT,
    region_ VARCHAR(50),
    PRIMARY KEY (sale_id_, sale_date_)
) PARTITION BY RANGE (YEAR(sale_date_) * 100 + MONTH(sale_date_)) (
    PARTITION p202301 VALUES LESS THAN (202302),  -- 2023年1月
    PARTITION p202302 VALUES LESS THAN (202303),  -- 2023年2月
    PARTITION p202303 VALUES LESS THAN (202304),  -- 2023年3月
    PARTITION p_future VALUES LESS THAN MAXVALUE  -- 未来数据
);

-- 为2023年4-6月添加新分区
-- 业务需求:随着业务发展,需要为新的月份创建分区
ALTER TABLE monthly_sales
REORGANIZE PARTITION p_future INTO (
    PARTITION p202304 VALUES LESS THAN (202305),  -- 2023年4月
    PARTITION p202305 VALUES LESS THAN (202306),  -- 2023年5月
    PARTITION p202306 VALUES LESS THAN (202307),  -- 2023年6月
    PARTITION p_future VALUES LESS THAN MAXVALUE  -- 保留MAXVALUE分区
);

-- 高级场景:重新组织多个分区
-- 业务场景:将多个月份分区合并为季度分区,简化管理
-- 注意:此操作会重新分布数据,需要充足的维护时间窗口
ALTER TABLE monthly_sales
REORGANIZE PARTITION p202301, p202302, p202303 INTO (
    PARTITION p2023q1 VALUES LESS THAN (202304)  -- 2023年第一季度
);

-- 业务场景3:处理数据倾斜的分区重组
-- 当某个分区数据量过大时,可以将其拆分为多个更细粒度的分区
-- 注意:sales_partitioned按RANGE (YEAR(sale_date_))分区,分区边界必须是整数,
-- 在"年"粒度的分区表达式下无法直接拆分出季度(诸如VALUES LESS THAN (2023.25)的小数边界是非法的)
-- 正确做法:先将分区表达式改为季度粒度,再划分边界
-- (ALTER TABLE ... PARTITION BY会重建整张表,需在维护窗口执行,2023年以前的数据在此示例中合并进p_hist)
ALTER TABLE sales_partitioned
PARTITION BY RANGE (YEAR(sale_date_) * 10 + QUARTER(sale_date_)) (
    PARTITION p_hist   VALUES LESS THAN (20231),  -- 2023年以前的历史数据
    PARTITION p2023q1  VALUES LESS THAN (20232),  -- 第一季度
    PARTITION p2023q2  VALUES LESS THAN (20233),  -- 第二季度
    PARTITION p2023q3  VALUES LESS THAN (20234),  -- 第三季度
    PARTITION p2023q4  VALUES LESS THAN (20241),  -- 第四季度
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

4.3.1.2 分区维护最佳实践

-- 最佳实践1:自动化分区管理
-- 创建存储过程自动添加新分区
DELIMITER //
CREATE PROCEDURE AddMonthlyPartition(
    IN table_name VARCHAR(64),
    IN target_year INT,
    IN target_month INT
)
BEGIN
    DECLARE partition_name VARCHAR(64);
    DECLARE next_value INT;
    DECLARE sql_stmt TEXT;

    -- 生成分区名称
    SET partition_name = CONCAT('p', target_year, LPAD(target_month, 2, '0'));

    -- 计算下个月的值
    IF target_month = 12 THEN
        SET next_value = (target_year + 1) * 100 + 1;
    ELSE
        SET next_value = target_year * 100 + target_month + 1;
    END IF;

    -- 构建REORGANIZE PARTITION语句
    SET sql_stmt = CONCAT(
        'ALTER TABLE ', table_name,
        ' REORGANIZE PARTITION p_future INTO (',
        'PARTITION ', partition_name, ' VALUES LESS THAN (', next_value, '),',
        'PARTITION p_future VALUES LESS THAN MAXVALUE)'
    );

    -- 执行分区重组
    SET @sql = sql_stmt;
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    -- 记录操作日志
    SELECT CONCAT('成功添加分区: ', partition_name, ' 到表 ', table_name) as result;
END //
DELIMITER ;

-- 使用存储过程添加分区
CALL AddMonthlyPartition('monthly_sales', 2024, 1);  -- 添加2024年1月分区

-- 最佳实践2:分区健康检查
-- 定期检查分区数据分布和性能
SELECT
    TABLE_NAME as table_name,
    PARTITION_NAME as partition_name,
    TABLE_ROWS as row_count,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_mb,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 2) as total_mb,
    -- 计算数据分布百分比
    ROUND(
        TABLE_ROWS * 100.0 / (
            SELECT SUM(TABLE_ROWS)
            FROM INFORMATION_SCHEMA.PARTITIONS p2
            WHERE p2.TABLE_SCHEMA = p.TABLE_SCHEMA
            AND p2.TABLE_NAME = p.TABLE_NAME
            AND p2.PARTITION_NAME IS NOT NULL
        ), 2
    ) as row_percentage,
    -- 业务解读:分区状态评估
    CASE
        WHEN TABLE_ROWS = 0 THEN '空分区-可考虑删除'
        WHEN TABLE_ROWS > (
            SELECT AVG(TABLE_ROWS) * 5
            FROM INFORMATION_SCHEMA.PARTITIONS p2
            WHERE p2.TABLE_SCHEMA = p.TABLE_SCHEMA
            AND p2.TABLE_NAME = p.TABLE_NAME
            AND p2.PARTITION_NAME IS NOT NULL
        ) THEN '数据倾斜-需要拆分'
        WHEN ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 2) > 1000 THEN '大分区-监控性能'
        ELSE '正常状态'
    END as health_status
FROM INFORMATION_SCHEMA.PARTITIONS p
WHERE TABLE_SCHEMA = DATABASE()
  AND PARTITION_NAME IS NOT NULL
  AND TABLE_NAME IN ('sales_partitioned', 'monthly_sales')
ORDER BY TABLE_NAME, PARTITION_ORDINAL_POSITION;

4.3.1.3 REORGANIZE PARTITION重要注意事项

-- 注意事项1:性能影响和锁定时间
-- REORGANIZE PARTITION操作的性能特征:
-- 1. 表级锁定:整个操作期间表被锁定,影响并发访问
-- 2. 数据迁移:需要物理移动数据,耗时与数据量成正比
-- 3. 磁盘空间:需要额外空间存储临时数据,约为原数据的1.5-2倍

-- 监控REORGANIZE PARTITION进度
-- 在另一个会话中执行以下查询监控进度
SELECT
    ID,
    USER,
    HOST,
    DB,
    COMMAND,
    TIME as duration_seconds,
    STATE,
    SUBSTRING(INFO, 1, 100) as current_operation
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE INFO LIKE '%REORGANIZE PARTITION%'
   OR STATE LIKE '%partition%';

-- 注意事项2:事务和一致性
-- REORGANIZE PARTITION是原子操作,要么全部成功,要么全部回滚
-- 操作期间的数据一致性由MySQL自动保证

-- 注意事项3:外键约束影响
-- 如果表有外键约束,需要特别注意:
-- 1. 子表的外键约束可能影响分区操作
-- 2. 建议在操作前临时禁用外键检查(谨慎使用)

-- 临时禁用外键检查(仅在必要时使用)
SET foreign_key_checks = 0;
-- 执行分区操作
ALTER TABLE sales_partitioned
REORGANIZE PARTITION p_future INTO (
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- 重新启用外键检查
SET foreign_key_checks = 1;

-- 注意事项4:索引和统计信息
-- REORGANIZE PARTITION后,MySQL会自动:
-- 1. 重建受影响分区的索引
-- 2. 更新表统计信息
-- (注:MySQL 8.0已移除查询缓存,无需再考虑查询缓存失效问题)

-- 手动更新统计信息(可选,用于确保最新统计)
ANALYZE TABLE sales_partitioned;

-- 注意事项5:binlog和复制影响
-- REORGANIZE PARTITION操作会:
-- 1. 生成大量binlog记录
-- 2. 影响主从复制的延迟
-- 3. 在从库上同样执行相同的重组操作

-- 检查binlog大小增长
SHOW BINARY LOGS;

-- 监控主从复制延迟(MySQL 8.0.22+推荐SHOW REPLICA STATUS,SHOW SLAVE STATUS已弃用)
SHOW REPLICA STATUS;

4.3.1.4 常见错误和解决方案

-- 错误1:MAXVALUE分区不是最后一个分区
-- 错误信息:ERROR 1481 (HY000): MAXVALUE can only be used in last partition definition
-- 原因:尝试在MAXVALUE分区后添加新分区
-- 解决方案:使用REORGANIZE PARTITION重新组织

-- 错误示例:
-- ALTER TABLE sales_partitioned ADD PARTITION (
--     PARTITION p2024 VALUES LESS THAN (2025)
-- );

-- 正确方法:
ALTER TABLE sales_partitioned
REORGANIZE PARTITION p_future INTO (
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- 错误2:分区值重叠或顺序错误
-- 错误信息:ERROR 1493 (HY000): VALUES LESS THAN value must be strictly increasing for each partition
-- 原因:新分区的VALUES LESS THAN值不正确

-- 错误示例:
-- ALTER TABLE sales_partitioned
-- REORGANIZE PARTITION p_future INTO (
--     PARTITION p2024 VALUES LESS THAN (2023),  -- 错误:值小于已存在的分区
--     PARTITION p_future VALUES LESS THAN MAXVALUE
-- );

-- 正确方法:确保分区值严格递增
ALTER TABLE sales_partitioned
REORGANIZE PARTITION p_future INTO (
    PARTITION p2024 VALUES LESS THAN (2025),  -- 正确:大于p2023的2024
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- 错误3:磁盘空间不足
-- 错误信息:ERROR 1114 (HY000): The table is full
-- 原因:临时空间不足以完成分区重组
-- 解决方案:
-- 1. 清理磁盘空间
-- 2. 调整tmpdir配置
-- 3. 分批处理大表

-- 检查磁盘空间使用
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024, 2) as size_gb,
    ROUND(DATA_FREE/1024/1024/1024, 2) as free_gb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'sales_partitioned';

-- 错误4:表被锁定
-- 错误信息:ERROR 1205 (HY000): Lock wait timeout exceeded
-- 原因:其他会话持有表锁
-- 解决方案:
-- 1. 等待其他操作完成
-- 2. 终止阻塞的会话
-- 3. 在业务低峰期执行

-- 查找阻塞的会话
SELECT
    r.trx_id as blocking_trx_id,
    r.trx_mysql_thread_id as blocking_thread,
    r.trx_query as blocking_query,
    b.trx_id as blocked_trx_id,
    b.trx_mysql_thread_id as blocked_thread,
    b.trx_query as blocked_query
FROM INFORMATION_SCHEMA.INNODB_TRX r
JOIN performance_schema.data_lock_waits w ON r.trx_id = w.BLOCKING_ENGINE_TRANSACTION_ID
JOIN INFORMATION_SCHEMA.INNODB_TRX b ON w.REQUESTING_ENGINE_TRANSACTION_ID = b.trx_id;

4.3.1.5 分区自动化管理脚本

-- 自动化脚本1:定期添加未来分区
-- 创建事件调度器自动添加分区
SET GLOBAL event_scheduler = ON;

DELIMITER //
CREATE EVENT auto_add_monthly_partitions
ON SCHEDULE EVERY 1 MONTH
STARTS '2024-01-01 02:00:00'
DO
BEGIN
    DECLARE next_year INT;
    DECLARE next_month INT;
    DECLARE partition_name VARCHAR(64);
    DECLARE partition_value INT;

    -- 计算两个月后的目标分区(提前预留)
    SET next_year = YEAR(DATE_ADD(NOW(), INTERVAL 2 MONTH));
    SET next_month = MONTH(DATE_ADD(NOW(), INTERVAL 2 MONTH));
    SET partition_name = CONCAT('p', next_year, LPAD(next_month, 2, '0'));
    -- 分区上界为下一个月的值,12月需要跨年进位
    IF next_month = 12 THEN
        SET partition_value = (next_year + 1) * 100 + 1;
    ELSE
        SET partition_value = next_year * 100 + next_month + 1;
    END IF;

    -- 检查分区是否已存在
    IF NOT EXISTS (
        SELECT 1 FROM INFORMATION_SCHEMA.PARTITIONS
        WHERE TABLE_SCHEMA = DATABASE()
        AND TABLE_NAME = 'monthly_sales'
        AND PARTITION_NAME = partition_name
    ) THEN
        -- 添加新分区
        SET @sql = CONCAT(
            'ALTER TABLE monthly_sales REORGANIZE PARTITION p_future INTO (',
            'PARTITION ', partition_name, ' VALUES LESS THAN (', partition_value, '),',
            'PARTITION p_future VALUES LESS THAN MAXVALUE)'
        );
        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;

        -- 记录日志
        INSERT INTO partition_maintenance_log (
            table_name, operation, partition_name, created_at
        ) VALUES (
            'monthly_sales', 'ADD_PARTITION', partition_name, NOW()
        );
    END IF;
END //
DELIMITER ;

-- 创建分区维护日志表(上面的事件会写入该表,实际部署时应先建此表再创建事件)
CREATE TABLE IF NOT EXISTS partition_maintenance_log (
    id INT AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(64),
    operation VARCHAR(32),
    partition_name VARCHAR(64),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_table_created (table_name, created_at)
);

-- 自动化脚本2:清理历史分区
DELIMITER //
CREATE PROCEDURE CleanupOldPartitions(
    IN table_name VARCHAR(64),
    IN retention_months INT
)
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE partition_name VARCHAR(64);
    DECLARE partition_desc VARCHAR(255);
    DECLARE cutoff_date DATE;

    -- 游标定义
    DECLARE partition_cursor CURSOR FOR
        SELECT PARTITION_NAME, PARTITION_DESCRIPTION
        FROM INFORMATION_SCHEMA.PARTITIONS
        WHERE TABLE_SCHEMA = DATABASE()
        AND TABLE_NAME = table_name
        AND PARTITION_NAME IS NOT NULL
        AND PARTITION_NAME != 'p_future'
        ORDER BY PARTITION_ORDINAL_POSITION;

    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    -- 计算保留截止日期
    SET cutoff_date = DATE_SUB(CURDATE(), INTERVAL retention_months MONTH);

    OPEN partition_cursor;

    read_loop: LOOP
        FETCH partition_cursor INTO partition_name, partition_desc;
        IF done THEN
            LEAVE read_loop;
        END IF;

        -- 检查分区是否超过保留期
        -- 这里需要根据实际的分区命名规则调整逻辑
        IF partition_name < CONCAT('p', YEAR(cutoff_date), LPAD(MONTH(cutoff_date), 2, '0')) THEN
            -- 备份分区数据
            SET @backup_sql = CONCAT(
                'CREATE TABLE ', partition_name, '_backup AS ',
                'SELECT * FROM ', table_name, ' PARTITION (', partition_name, ')'
            );
            PREPARE stmt FROM @backup_sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;

            -- 删除分区
            SET @drop_sql = CONCAT('ALTER TABLE ', table_name, ' DROP PARTITION ', partition_name);
            PREPARE stmt FROM @drop_sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;

            -- 记录日志
            INSERT INTO partition_maintenance_log (
                table_name, operation, partition_name, created_at
            ) VALUES (
                table_name, 'DROP_PARTITION', partition_name, NOW()
            );
        END IF;
    END LOOP;

    CLOSE partition_cursor;
END //
DELIMITER ;

-- 使用清理存储过程
CALL CleanupOldPartitions('monthly_sales', 24);  -- 保留24个月的数据
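
上面的清理过程使用 CREATE TABLE ... AS SELECT 逐行备份分区数据,大分区上会比较慢。MySQL还支持用EXCHANGE PARTITION近乎瞬时地把分区数据换出到普通表,以下是一个示意(归档表 monthly_sales_archive 为假设命名):

-- 前提:交换目标表必须与分区表结构完全一致,且自身不分区
CREATE TABLE monthly_sales_archive LIKE monthly_sales;
ALTER TABLE monthly_sales_archive REMOVE PARTITIONING;

-- 交换后:p202301分区变空,数据进入归档表
ALTER TABLE monthly_sales EXCHANGE PARTITION p202301 WITH TABLE monthly_sales_archive;

-- 分区已为空,可安全删除
ALTER TABLE monthly_sales DROP PARTITION p202301;
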
4.3.2 分区剪枝优化

分区剪枝是分区表查询优化的关键技术,能够显著减少扫描的数据量。

-- 分区剪枝示例查询

-- ❌ 错误语法:EXPLAIN PARTITIONS 在MySQL 8.0+中已废弃
-- EXPLAIN PARTITIONS SELECT * FROM sales_partitioned WHERE ...;
-- 错误信息:You have an error in your SQL syntax

-- ✅ 正确方法1:使用标准EXPLAIN查看分区信息
-- MySQL会在partitions列显示访问的分区
EXPLAIN
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN STR_TO_DATE('2023-01-01', '%Y-%m-%d')
  AND STR_TO_DATE('2023-12-31', '%Y-%m-%d');

-- ✅ 正确方法2:使用EXPLAIN FORMAT=JSON获取详细分区信息
-- 场景:分析MySQL分区表的执行计划,验证分区剪枝效果
-- 业务价值:确认查询是否正确利用了分区特性,避免全表扫描
-- 输出:JSON格式的详细执行计划,包含分区访问信息
EXPLAIN FORMAT=JSON
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN STR_TO_DATE('2023-01-01', '%Y-%m-%d')
  AND STR_TO_DATE('2023-12-31', '%Y-%m-%d');

-- ✅ 正确方法3:使用EXPLAIN ANALYZE查看实际执行统计(MySQL 8.0+)
-- 提供实际的分区访问统计信息
EXPLAIN ANALYZE
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN STR_TO_DATE('2023-01-01', '%Y-%m-%d')
  AND STR_TO_DATE('2023-12-31', '%Y-%m-%d');

-- 分区剪枝验证查询
-- 查看哪些分区被访问
SELECT
    PARTITION_NAME,
    PARTITION_DESCRIPTION,
    TABLE_ROWS,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 复杂的分区剪枝查询分析
-- 多条件分区剪枝验证
EXPLAIN FORMAT=JSON
SELECT
    s.sale_date_,
    s.amount_,
    e.name_
FROM sales_partitioned s
JOIN t_employees e ON s.employee_id_ = e.employee_id_
WHERE s.sale_date_ >= STR_TO_DATE('2023-06-01', '%Y-%m-%d')
  AND s.sale_date_ < STR_TO_DATE('2023-07-01', '%Y-%m-%d')
  AND s.amount_ > 1000;

-- 分区剪枝效果对比分析
-- 业务场景:对比有无分区条件的查询性能差异

-- 查询1:利用分区剪枝(只扫描特定分区)
EXPLAIN
SELECT COUNT(*), AVG(amount_)
FROM sales_partitioned
WHERE sale_date_ BETWEEN STR_TO_DATE('2023-06-01', '%Y-%m-%d')
  AND STR_TO_DATE('2023-06-30', '%Y-%m-%d');

-- 查询2:无法利用分区剪枝(扫描所有分区)
EXPLAIN
SELECT COUNT(*), AVG(amount_)
FROM sales_partitioned
WHERE amount_ > 5000;  -- 非分区键条件

-- 分区统计信息和健康检查
-- MySQL分区详细信息
SELECT
    TABLE_NAME as table_name,
    PARTITION_NAME as partition_name,
    PARTITION_DESCRIPTION as partition_range,
    TABLE_ROWS as estimated_rows,
    ROUND(AVG_ROW_LENGTH, 2) as avg_row_length_bytes,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_size_mb,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 2) as total_size_mb,
    CREATE_TIME as partition_created,
    UPDATE_TIME as last_updated,
    -- 业务解读:分区状态评估
    CASE
        WHEN TABLE_ROWS = 0 THEN '空分区-无数据'
        WHEN ROUND(DATA_LENGTH/1024/1024, 2) > 1000 THEN '大分区-需监控'
        WHEN UPDATE_TIME < DATE_SUB(NOW(), INTERVAL 30 DAY) THEN '冷数据-可归档'
        ELSE '正常状态'
    END as partition_status
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 分区剪枝效果验证方法
-- 方法1:通过EXPLAIN的partitions列查看访问的分区
-- 方法2:通过EXPLAIN FORMAT=JSON的"partitions"字段查看详细信息
-- 方法3:通过performance_schema监控实际的表访问统计

-- 监控分区表的访问模式
SELECT
    OBJECT_SCHEMA as database_name,
    OBJECT_NAME as table_name,
    INDEX_NAME as index_or_partition,
    COUNT_READ as read_operations,
    COUNT_WRITE as write_operations,
    COUNT_FETCH as fetch_operations,
    ROUND(SUM_TIMER_WAIT/1000000000000, 3) as total_wait_seconds  -- performance_schema计时器单位为皮秒
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND OBJECT_NAME = 'sales_partitioned'
ORDER BY (COUNT_READ + COUNT_WRITE) DESC;

4.3.2.1 分区表EXPLAIN分析常见错误和解决方案

MySQL分区表的执行计划分析有一些特殊的语法要求和常见陷阱,需要特别注意。

-- ❌ 常见错误1:使用已废弃的EXPLAIN PARTITIONS语法
-- 错误示例:
-- EXPLAIN PARTITIONS SELECT * FROM sales_partitioned WHERE sale_date_ = '2023-01-01';
-- 错误信息:You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version

-- ✅ 正确方法:使用标准EXPLAIN语法
-- MySQL会自动在partitions列显示访问的分区
EXPLAIN
SELECT * FROM sales_partitioned
WHERE sale_date_ = '2023-01-01';

-- ❌ 常见错误2:日期字符串格式问题
-- 错误示例:
-- EXPLAIN SELECT * FROM sales_partitioned
-- WHERE sale_date_ BETWEEN str_to_date('2023-01-01', '%Y-%m-%d') AND '2023-12-31';
-- 问题:函数名大小写不一致,可能导致语法错误

-- ✅ 正确方法:统一使用正确的函数名和格式
EXPLAIN
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN STR_TO_DATE('2023-01-01', '%Y-%m-%d')
  AND STR_TO_DATE('2023-12-31', '%Y-%m-%d');

-- 或者使用更简单的日期字面量(推荐)
EXPLAIN
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-01-01' AND '2023-12-31';

-- ❌ 常见错误3:分区键类型不匹配
-- 错误示例:假设分区键是INT类型的年份
-- EXPLAIN SELECT * FROM sales_partitioned WHERE year_column = '2023';  -- 字符串比较
-- 问题:类型不匹配可能导致分区剪枝失效

-- ✅ 正确方法:确保数据类型匹配
EXPLAIN
SELECT * FROM sales_partitioned
WHERE year_column = 2023;  -- 数值比较

-- 分区剪枝效果验证的完整流程
-- 步骤1:查看表的分区结构
SELECT
    PARTITION_NAME,
    PARTITION_EXPRESSION,
    PARTITION_DESCRIPTION,
    TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 步骤2:分析查询的执行计划
EXPLAIN FORMAT=JSON
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-06-01' AND '2023-06-30';

-- 步骤3:验证分区剪枝效果
-- 在JSON输出中查找"partitions"字段,确认只访问了相关分区

-- 步骤4:性能对比测试
-- 测试1:利用分区剪枝的查询
-- 注:MySQL 8.0已移除查询缓存,旧版的SQL_NO_CACHE提示已无必要
SELECT COUNT(*) FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-06-01' AND '2023-06-30';

-- 测试2:无法利用分区剪枝的查询
SELECT COUNT(*) FROM sales_partitioned
WHERE amount_ > 1000;  -- 非分区键条件

-- 分区剪枝失效的常见原因和解决方案
-- 原因1:使用函数包装分区键
-- ❌ 错误:YEAR(sale_date_) = 2023  -- 函数包装导致剪枝失效
-- ✅ 正确:sale_date_ BETWEEN '2023-01-01' AND '2023-12-31'

-- 原因2:使用OR条件连接非连续分区
-- ❌ 可能低效:sale_date_ = '2023-01-01' OR sale_date_ = '2023-12-01'
-- ✅ 更好:使用UNION或IN操作

-- 原因3:复杂的WHERE条件
-- ❌ 可能低效:WHERE (sale_date_ > '2023-01-01' AND amount_ > 1000) OR (sale_date_ < '2022-12-31')
-- ✅ 优化:简化条件逻辑,优先使用分区键条件
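
可以用EXPLAIN直观对比上述两种写法的分区访问范围(示例,实际输出以环境为准):

-- 函数包装分区键:partitions列通常会列出所有分区(剪枝失效)
EXPLAIN SELECT COUNT(*) FROM sales_partitioned
WHERE YEAR(sale_date_) = 2023;

-- 改写为范围条件:partitions列应只显示p2023(剪枝生效)
EXPLAIN SELECT COUNT(*) FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-01-01' AND '2023-12-31';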

-- 分区表性能监控查询
-- 监控各分区的访问频率
SELECT
    OBJECT_NAME as table_name,
    INDEX_NAME as partition_or_index,
    COUNT_READ as read_count,
    COUNT_WRITE as write_count,
    ROUND(SUM_TIMER_READ/1000000000000, 3) as read_time_seconds,   -- 计时器单位为皮秒
    ROUND(SUM_TIMER_WRITE/1000000000000, 3) as write_time_seconds,
    -- 业务解读:访问模式分析
    CASE
        WHEN COUNT_READ = 0 AND COUNT_WRITE = 0 THEN '未访问分区'
        WHEN COUNT_READ > COUNT_WRITE * 10 THEN '读密集分区'
        WHEN COUNT_WRITE > COUNT_READ THEN '写密集分区'
        ELSE '读写均衡'
    END as access_pattern
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND OBJECT_NAME = 'sales_partitioned'
  AND INDEX_NAME IS NOT NULL
ORDER BY (COUNT_READ + COUNT_WRITE) DESC;

-- MySQL版本兼容性说明
-- MySQL 5.7及以下:支持EXPLAIN PARTITIONS语法
-- MySQL 8.0及以上:EXPLAIN PARTITIONS已废弃,使用标准EXPLAIN
-- 推荐:统一使用EXPLAIN FORMAT=JSON获取最详细的分区信息

4.3.3 跨分区查询性能

跨分区查询是分区表性能优化的重要考虑因素。不当的跨分区操作可能导致严重的性能问题。

4.3.3.1 跨分区查询的性能特征

-- 跨分区查询的性能特征分析

-- 1. 分区扫描成本分析
-- 查看分区表的分区分布
SELECT
    PARTITION_NAME,
    PARTITION_DESCRIPTION,
    TABLE_ROWS,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_mb,
    -- 业务解读:分区访问成本评估
    CASE
        WHEN TABLE_ROWS = 0 THEN '空分区-无扫描成本'
        WHEN TABLE_ROWS < 10000 THEN '小分区-低扫描成本'
        WHEN TABLE_ROWS < 100000 THEN '中分区-中等扫描成本'
        ELSE '大分区-高扫描成本'
    END as scan_cost_level
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales_partitioned'
  AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION;

-- 2. 跨分区查询类型和性能影响

-- 类型1:单分区查询(最优)
-- 业务场景:查询特定日期的销售数据
-- 性能特征:只访问一个分区,性能最佳
EXPLAIN FORMAT=JSON
SELECT * FROM sales_partitioned
WHERE sale_date_ = '2023-06-15';  -- 只访问p2023分区

-- 类型2:多分区范围查询(良好)
-- 业务场景:查询一个季度的销售数据
-- 性能特征:访问连续的少数分区,性能较好
EXPLAIN FORMAT=JSON
SELECT * FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-04-01' AND '2023-06-30';  -- 按年分区时仍只命中p2023;若按月分区则访问4-6月共3个分区

-- 类型3:跨分区JOIN查询(需要优化)
-- 业务场景:比较不同时期的销售数据
-- 性能特征:需要访问多个分区并进行JOIN,性能较差
EXPLAIN FORMAT=JSON
SELECT
    s1.sale_id_,
    s1.amount_ as current_amount,
    s2.amount_ as prev_amount
FROM sales_partitioned s1
JOIN sales_partitioned s2 ON s1.employee_id_ = s2.employee_id_
WHERE s1.sale_date_ = '2023-06-15'  -- 访问p2023分区
  AND s2.sale_date_ = '2022-06-15'; -- 访问p2022分区,构成真正的跨分区JOIN

-- 类型4:全分区扫描查询(最差)
-- 业务场景:基于非分区键的查询
-- 性能特征:需要扫描所有分区,性能最差
EXPLAIN FORMAT=JSON
SELECT * FROM sales_partitioned
WHERE amount_ > 10000;  -- 非分区键条件,扫描所有分区
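
对类型4这种按非分区键过滤的查询,常见的缓解手段是为过滤列建立二级索引:虽然仍会访问所有分区,但每个分区内部可走索引范围扫描而非全量扫描(示例):

-- 分区表的二级索引按分区本地维护
CREATE INDEX idx_sales_amount ON sales_partitioned (amount_);

-- 再次查看执行计划:partitions列仍列出全部分区,但type应由ALL变为range
EXPLAIN SELECT * FROM sales_partitioned WHERE amount_ > 10000;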

4.3.3.2 跨分区JOIN操作优化策略

-- 跨分区JOIN优化策略

-- ❌ 低效方法:直接自关联JOIN
-- 问题:需要对同一张大表扫描两遍并做JOIN,I/O开销大
-- (注:在按年分区的sales_partitioned上,5月与6月同属p2023;真正的跨分区场景可类比为跨年对比)
SELECT
    s1.sale_id_,
    s1.amount_ as june_amount,
    s2.amount_ as may_amount,
    (s1.amount_ - s2.amount_) as amount_diff
FROM sales_partitioned s1
JOIN sales_partitioned s2 ON s1.employee_id_ = s2.employee_id_
WHERE s1.sale_date_ = '2023-06-15'
  AND s2.sale_date_ = '2023-05-15';

-- ✅ 优化方法1:使用窗口函数避免跨分区JOIN
-- 优势:在单次扫描中完成计算,减少分区间数据交换
WITH sales_with_prev AS (
    SELECT
        sale_id_,
        employee_id_,
        sale_date_,
        amount_,
        -- 使用窗口函数获取上一次销售金额
        LAG(amount_, 1) OVER (
            PARTITION BY employee_id_
            ORDER BY sale_date_
        ) as prev_amount,
        -- 计算与上次销售的时间差
        DATEDIFF(
            sale_date_,
            LAG(sale_date_, 1) OVER (
                PARTITION BY employee_id_
                ORDER BY sale_date_
            )
        ) as days_since_prev_sale
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-05-01' AND '2023-06-30'  -- 限制扫描范围
)
SELECT
    sale_id_,
    amount_ as june_amount,
    prev_amount as may_amount,
    (amount_ - prev_amount) as amount_diff,
    days_since_prev_sale
FROM sales_with_prev
WHERE sale_date_ = '2023-06-15'
  AND prev_amount IS NOT NULL;

-- ✅ 优化方法2:分步查询策略
-- 适用场景:复杂的跨分区分析,需要多步骤处理

-- 步骤1:提取6月数据
CREATE TEMPORARY TABLE temp_june_sales AS
SELECT employee_id_, sale_id_, amount_, sale_date_
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-06-01' AND '2023-06-30';

-- 步骤2:提取5月数据
CREATE TEMPORARY TABLE temp_may_sales AS
SELECT employee_id_, sale_id_, amount_, sale_date_
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-05-01' AND '2023-05-31';

-- 步骤3:在临时表上进行JOIN(内存操作,速度快)
SELECT
    j.employee_id_,
    j.sale_id_ as june_sale_id,
    j.amount_ as june_amount,
    m.amount_ as may_avg_amount,
    (j.amount_ - m.amount_) as amount_diff
FROM temp_june_sales j
JOIN (
    -- 计算5月平均销售额
    SELECT employee_id_, AVG(amount_) as amount_
    FROM temp_may_sales
    GROUP BY employee_id_
) m ON j.employee_id_ = m.employee_id_
WHERE j.sale_date_ = '2023-06-15';

-- 清理临时表
DROP TEMPORARY TABLE temp_june_sales;
DROP TEMPORARY TABLE temp_may_sales;

-- ✅ 优化方法3:使用分区键优化JOIN条件
-- 当JOIN条件包含分区键时,可以显著提升性能
SELECT
    s1.sale_id_,
    s1.amount_,
    s2.amount_ as same_day_other_sale
FROM sales_partitioned s1
JOIN sales_partitioned s2 ON s1.sale_date_ = s2.sale_date_  -- 分区键JOIN
                           AND s1.region_ = s2.region_
                           AND s1.sale_id_ != s2.sale_id_
WHERE s1.sale_date_ = '2023-06-15'  -- 利用分区剪枝
  AND s1.amount_ > 5000;

4.3.3.3 分区并行查询配置和优化

-- 分区并行查询配置

-- 1. 查看当前并行查询配置
SHOW VARIABLES LIKE '%parallel%';
SHOW VARIABLES LIKE '%thread%';

-- 2. MySQL并行查询相关参数
-- 注意:MySQL的并行查询支持有限,主要依赖存储引擎层面的优化

-- 查看InnoDB并行读取配置
SHOW VARIABLES LIKE 'innodb_parallel_read_threads';
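
-- 调整当前会话的并行读取线程数(示例)
-- 注:该参数为会话级,默认值4,目前仅作用于无谓词的聚集索引并行扫描
-- (如CHECK TABLE、不带WHERE的SELECT COUNT(*)),对一般查询无效
SET SESSION innodb_parallel_read_threads = 8;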

-- 3. 分区表扫描示例
-- 注:MySQL对一般查询的并行执行支持有限,下列聚合查询主要受益于分区剪枝,而非多线程并行扫描

-- 大数据量聚合查询(利用分区剪枝限制扫描范围)
-- 业务场景:计算全年销售统计
SELECT
    YEAR(sale_date_) as sale_year,
    MONTH(sale_date_) as sale_month,
    COUNT(*) as total_sales,
    SUM(amount_) as total_amount,
    AVG(amount_) as avg_amount,
    MIN(amount_) as min_amount,
    MAX(amount_) as max_amount
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY YEAR(sale_date_), MONTH(sale_date_)
ORDER BY sale_year, sale_month;

-- 4. 分区并行查询性能监控
-- 监控分区表的并行执行情况
SELECT
    EVENT_NAME,
    COUNT_STAR as execution_count,
    ROUND(SUM_TIMER_WAIT/1000000000000, 3) as total_time_seconds,  -- 计时器单位为皮秒
    ROUND(AVG_TIMER_WAIT/1000000000000, 3) as avg_time_seconds,
    ROUND(MAX_TIMER_WAIT/1000000000000, 3) as max_time_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE '%partition%'
   OR EVENT_NAME LIKE '%parallel%'
ORDER BY total_time_seconds DESC;

-- 5. 分区表I/O性能监控
-- 监控分区表的I/O性能
-- 注:table_io_waits_summary_by_index_usage按索引聚合,INDEX_NAME列是索引名而非分区名
SELECT
    OBJECT_NAME as table_name,
    INDEX_NAME as index_name,
    COUNT_READ,
    COUNT_WRITE,
    ROUND(SUM_TIMER_READ/1000000000000, 3) as read_time_seconds,   -- 计时器单位为皮秒
    ROUND(SUM_TIMER_WRITE/1000000000000, 3) as write_time_seconds,
    ROUND(SUM_TIMER_READ/COUNT_READ/1000000000, 3) as avg_read_time_ms,  -- 皮秒转毫秒
    -- 业务解读:I/O性能评估
    CASE
        WHEN ROUND(SUM_TIMER_READ/COUNT_READ/1000000000, 3) > 10 THEN '读取较慢-需优化'
        WHEN COUNT_READ > 1000000 THEN '高频访问-热点分区'
        WHEN COUNT_READ = 0 THEN '未访问分区'
        ELSE '正常性能'
    END as io_performance_status
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND OBJECT_NAME = 'sales_partitioned'
  AND COUNT_READ > 0
ORDER BY read_time_seconds DESC;

4.3.3.4 避免跨分区操作的最佳实践

-- 避免跨分区操作的最佳实践

-- 最佳实践1:合理的分区策略设计
-- 原则:让大部分查询都能利用分区剪枝

-- ❌ 不良分区设计:按随机字段分区
-- CREATE TABLE sales_bad_partition (
--     sale_id_ INT,
--     sale_date_ DATE,
--     amount_ DECIMAL(10,2)
-- ) PARTITION BY HASH(sale_id_) PARTITIONS 4;  -- 大部分查询都会跨分区

-- ✅ 良好分区设计:按业务查询模式分区
CREATE TABLE sales_good_partition (
    sale_id_ INT NOT NULL,
    sale_date_ DATE NOT NULL,
    amount_ DECIMAL(10,2),
    region_ VARCHAR(50),
    PRIMARY KEY (sale_id_, sale_date_)
) PARTITION BY RANGE (YEAR(sale_date_)) (  -- 按时间分区,符合查询模式
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- 最佳实践2:查询条件优化
-- 原则:尽可能在WHERE条件中包含分区键

-- ❌ 低效查询:不包含分区键
SELECT COUNT(*), AVG(amount_)
FROM sales_partitioned
WHERE region_ = 'North';  -- 需要扫描所有分区

-- ✅ 高效查询:包含分区键
SELECT COUNT(*), AVG(amount_)
FROM sales_partitioned
WHERE sale_date_ >= '2023-01-01'  -- 分区键条件
  AND sale_date_ < '2024-01-01'
  AND region_ = 'North';

-- 最佳实践3:避免跨分区的复杂JOIN
-- 使用应用层逻辑或ETL过程预处理数据

-- ❌ 复杂跨分区JOIN
SELECT
    s1.employee_id_,
    s1.amount_ as q1_total,
    s2.amount_ as q2_total,
    s3.amount_ as q3_total,
    s4.amount_ as q4_total
FROM (
    SELECT employee_id_, SUM(amount_) as amount_
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-01-01' AND '2023-03-31'
    GROUP BY employee_id_
) s1
JOIN (
    SELECT employee_id_, SUM(amount_) as amount_
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-04-01' AND '2023-06-30'
    GROUP BY employee_id_
) s2 ON s1.employee_id_ = s2.employee_id_
JOIN (
    SELECT employee_id_, SUM(amount_) as amount_
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-07-01' AND '2023-09-30'
    GROUP BY employee_id_
) s3 ON s1.employee_id_ = s3.employee_id_
JOIN (
    SELECT employee_id_, SUM(amount_) as amount_
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-10-01' AND '2023-12-31'
    GROUP BY employee_id_
) s4 ON s1.employee_id_ = s4.employee_id_;

-- ✅ 优化方法:使用聚合和条件表达式
SELECT
    employee_id_,
    SUM(CASE WHEN sale_date_ BETWEEN '2023-01-01' AND '2023-03-31' THEN amount_ ELSE 0 END) as q1_total,
    SUM(CASE WHEN sale_date_ BETWEEN '2023-04-01' AND '2023-06-30' THEN amount_ ELSE 0 END) as q2_total,
    SUM(CASE WHEN sale_date_ BETWEEN '2023-07-01' AND '2023-09-30' THEN amount_ ELSE 0 END) as q3_total,
    SUM(CASE WHEN sale_date_ BETWEEN '2023-10-01' AND '2023-12-31' THEN amount_ ELSE 0 END) as q4_total
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-01-01' AND '2023-12-31'  -- 一次扫描完成
GROUP BY employee_id_;

-- 最佳实践4:使用汇总表减少跨分区查询
-- 创建按月汇总的表,减少对原始分区表的跨分区访问
CREATE TABLE sales_monthly_summary (
    summary_year INT,
    summary_month INT,
    employee_id_ INT,
    region_ VARCHAR(50),
    total_sales_count INT,
    total_amount DECIMAL(15,2),
    avg_amount DECIMAL(10,2),
    PRIMARY KEY (summary_year, summary_month, employee_id_, region_),  -- 同一员工可能在多个区域有销售,region_需纳入主键
    INDEX idx_region (region_)
);

-- 定期更新汇总表(可以通过定时任务执行)
INSERT INTO sales_monthly_summary
SELECT
    YEAR(sale_date_) as summary_year,
    MONTH(sale_date_) as summary_month,
    employee_id_,
    region_,
    COUNT(*) as total_sales_count,
    SUM(amount_) as total_amount,
    AVG(amount_) as avg_amount
FROM sales_partitioned
WHERE sale_date_ >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)
  AND sale_date_ < CURDATE()
GROUP BY YEAR(sale_date_), MONTH(sale_date_), employee_id_, region_
ON DUPLICATE KEY UPDATE
    total_sales_count = VALUES(total_sales_count),
    total_amount = VALUES(total_amount),
    avg_amount = VALUES(avg_amount);

-- 使用汇总表进行快速查询
SELECT
    employee_id_,
    SUM(total_amount) as yearly_total,
    AVG(avg_amount) as yearly_avg
FROM sales_monthly_summary
WHERE summary_year = 2023
GROUP BY employee_id_
ORDER BY yearly_total DESC;

4.3.3.5 跨分区查询性能对比和测试

-- 跨分区查询性能对比测试

-- 测试环境准备
-- 创建测试数据(假设已有大量数据)

-- 性能测试1:单分区 vs 跨分区查询
-- 测试场景:统计特定时期的销售数据

-- 测试1.1:单分区查询(最优性能)
SET @start_time = NOW(6);
SELECT COUNT(*), SUM(amount_), AVG(amount_)
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-06-01' AND '2023-06-30';  -- 只访问一个分区
SET @end_time = NOW(6);
SELECT TIMESTAMPDIFF(MICROSECOND, @start_time, @end_time) as single_partition_microseconds;

-- 测试1.2:跨分区查询(性能较差)
SET @start_time = NOW(6);
SELECT COUNT(*), SUM(amount_), AVG(amount_)
FROM sales_partitioned
WHERE sale_date_ BETWEEN '2023-05-15' AND '2023-07-15';  -- 按月分区时跨3个分区;在本例的按年分区下仍只命中p2023
SET @end_time = NOW(6);
SELECT TIMESTAMPDIFF(MICROSECOND, @start_time, @end_time) as cross_partition_microseconds;

-- 性能测试2:JOIN操作对比
-- 测试场景:员工销售数据关联分析

-- 测试2.1:跨分区JOIN(低效)
SET @start_time = NOW(6);
SELECT
    s1.employee_id_,
    COUNT(s1.sale_id_) as june_sales,
    COUNT(s2.sale_id_) as may_sales
FROM sales_partitioned s1
LEFT JOIN sales_partitioned s2 ON s1.employee_id_ = s2.employee_id_
WHERE s1.sale_date_ BETWEEN '2023-06-01' AND '2023-06-30'
  AND s2.sale_date_ BETWEEN '2023-05-01' AND '2023-05-31'
GROUP BY s1.employee_id_;
SET @end_time = NOW(6);
SELECT TIMESTAMPDIFF(MICROSECOND, @start_time, @end_time) as cross_partition_join_microseconds;

-- 测试2.2:窗口函数优化(高效)
SET @start_time = NOW(6);
WITH monthly_sales AS (
    SELECT
        employee_id_,
        YEAR(sale_date_) as sale_year,
        MONTH(sale_date_) as sale_month,
        COUNT(*) as monthly_count
    FROM sales_partitioned
    WHERE sale_date_ BETWEEN '2023-05-01' AND '2023-06-30'
    GROUP BY employee_id_, YEAR(sale_date_), MONTH(sale_date_)
)
SELECT
    employee_id_,
    SUM(CASE WHEN sale_month = 6 THEN monthly_count ELSE 0 END) as june_sales,
    SUM(CASE WHEN sale_month = 5 THEN monthly_count ELSE 0 END) as may_sales
FROM monthly_sales
GROUP BY employee_id_;
SET @end_time = NOW(6);
SELECT TIMESTAMPDIFF(MICROSECOND, @start_time, @end_time) as window_function_microseconds;

-- 性能测试结果分析查询
-- 创建性能测试结果表
CREATE TEMPORARY TABLE performance_test_results (
    test_name VARCHAR(100),
    execution_time_microseconds BIGINT,
    relative_performance DECIMAL(5,2)
);

-- 插入测试结果(实际使用时需要替换为真实的测试结果)
INSERT INTO performance_test_results VALUES
('单分区查询', 1000, 1.00),
('跨分区查询', 5000, 5.00),
('跨分区JOIN', 15000, 15.00),
('窗口函数优化', 3000, 3.00);

-- 性能对比分析
SELECT
    test_name,
    execution_time_microseconds,
    ROUND(execution_time_microseconds / 1000, 2) as execution_time_ms,
    relative_performance,
    -- 业务解读:性能等级评估
    CASE
        WHEN relative_performance <= 1.5 THEN '优秀性能'
        WHEN relative_performance <= 3.0 THEN '良好性能'
        WHEN relative_performance <= 5.0 THEN '可接受性能'
        WHEN relative_performance <= 10.0 THEN '需要优化'
        ELSE '严重性能问题'
    END as performance_level,
    -- 优化建议
    CASE
        WHEN test_name LIKE '%跨分区JOIN%' THEN '建议使用窗口函数或汇总表'
        WHEN test_name LIKE '%跨分区查询%' THEN '建议优化分区策略或查询条件'
        WHEN relative_performance > 5.0 THEN '建议重新设计查询逻辑'
        ELSE '性能表现良好'
    END as optimization_suggestion
FROM performance_test_results
ORDER BY relative_performance;

-- 清理测试表
DROP TEMPORARY TABLE performance_test_results;

-- 分区查询性能监控和告警
-- 创建性能监控视图
CREATE VIEW partition_performance_monitor AS
SELECT
    OBJECT_NAME as table_name,
    INDEX_NAME as index_name,  -- 注:此表按索引聚合统计,并非逐分区统计
    COUNT_READ + COUNT_WRITE as total_operations,
    ROUND((SUM_TIMER_READ + SUM_TIMER_WRITE)/1000000000000, 3) as total_time_seconds,  -- 计时器单位为皮秒
    ROUND((SUM_TIMER_READ + SUM_TIMER_WRITE)/(COUNT_READ + COUNT_WRITE)/1000000000, 3) as avg_operation_time_ms,
    -- 性能告警级别
    CASE
        WHEN ROUND((SUM_TIMER_READ + SUM_TIMER_WRITE)/(COUNT_READ + COUNT_WRITE)/1000000000, 3) > 50 THEN 'CRITICAL'
        WHEN ROUND((SUM_TIMER_READ + SUM_TIMER_WRITE)/(COUNT_READ + COUNT_WRITE)/1000000000, 3) > 20 THEN 'WARNING'
        WHEN ROUND((SUM_TIMER_READ + SUM_TIMER_WRITE)/(COUNT_READ + COUNT_WRITE)/1000000000, 3) > 10 THEN 'INFO'
        ELSE 'NORMAL'
    END as alert_level
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND OBJECT_NAME LIKE '%partitioned%'
  AND (COUNT_READ + COUNT_WRITE) > 0;

-- 查看分区性能监控结果
SELECT * FROM partition_performance_monitor
WHERE alert_level IN ('CRITICAL', 'WARNING')
ORDER BY avg_operation_time_ms DESC;

4.4 事务处理和并发控制

事务处理是数据库系统的核心功能,正确理解和使用事务机制对于构建高可靠、高并发的应用系统至关重要。本节将深入分析各种事务隔离级别、锁机制和并发控制策略。

4.4.1 事务隔离级别对比

业务场景: 金融系统、电商订单处理、库存管理、账户余额操作

核心问题: 脏读、不可重复读、幻读的预防和性能平衡

-- 业务场景1:银行转账系统的隔离级别选择
-- 业务需求:确保转账过程中账户余额的一致性和准确性

-- 查看当前隔离级别
SELECT @@transaction_isolation as current_isolation_level;

-- 创建账户表用于演示
CREATE TABLE bank_accounts (
    account_id_ INT PRIMARY KEY,
    account_holder_ VARCHAR(100),
    balance_ DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    account_status_ ENUM('ACTIVE', 'FROZEN', 'CLOSED') DEFAULT 'ACTIVE',
    last_updated_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    version_ INT DEFAULT 1,  -- 乐观锁版本号
    INDEX idx_status (account_status_)
);

-- 插入测试数据
INSERT INTO bank_accounts (account_id_, account_holder_, balance_) VALUES
(1001, 'Alice Johnson', 10000.00),
(1002, 'Bob Smith', 5000.00),
(1003, 'Charlie Brown', 15000.00);

-- 隔离级别1:READ UNCOMMITTED(读未提交)
-- ❌ 问题:存在脏读风险,不适用于金融业务
-- 业务风险:可能读取到未提交的错误数据,导致业务决策错误

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- 会话A:开始转账但不提交
START TRANSACTION;
UPDATE bank_accounts SET balance_ = balance_ - 1000 WHERE account_id_ = 1001;
-- 此时不提交事务

-- 会话B:在READ UNCOMMITTED级别下会看到未提交的数据(脏读)
SELECT balance_ FROM bank_accounts WHERE account_id_ = 1001;  -- 看到9000.00(脏数据)

-- 如果会话A回滚,会话B读取的数据就是错误的
-- ROLLBACK;  -- 会话A回滚

-- 隔离级别2:READ COMMITTED(读已提交)
-- ✅ 适用场景:大多数OLTP系统的默认选择
-- 优势:避免脏读,性能较好
-- 问题:可能出现不可重复读

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 业务场景:账户余额查询和风险评估
-- 会话A:查询账户余额进行风险评估
START TRANSACTION;
SELECT balance_ FROM bank_accounts WHERE account_id_ = 1001;  -- 第一次读取:10000.00

-- 会话B:在此期间修改了账户余额
-- START TRANSACTION;
-- UPDATE bank_accounts SET balance_ = balance_ - 2000 WHERE account_id_ = 1001;
-- COMMIT;

-- 会话A:再次读取同一账户(不可重复读)
SELECT balance_ FROM bank_accounts WHERE account_id_ = 1001;  -- 第二次读取:8000.00(数据不一致)
COMMIT;

-- 隔离级别3:REPEATABLE READ(可重复读)- MySQL InnoDB默认级别
-- ✅ 适用场景:需要事务内数据一致性的业务
-- 优势:避免脏读和不可重复读
-- 问题:可能出现幻读(MySQL InnoDB通过Next-Key Lock避免了幻读)

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- 业务场景:月度账户报表生成
-- 需要确保报表生成过程中数据的一致性
START TRANSACTION;

-- 第一次统计活跃账户数量
SELECT COUNT(*) as active_accounts FROM bank_accounts WHERE account_status_ = 'ACTIVE';

-- 第一次计算总余额
SELECT SUM(balance_) as total_balance FROM bank_accounts WHERE account_status_ = 'ACTIVE';

-- 即使其他会话在此期间插入了新的活跃账户,
-- 在REPEATABLE READ级别下,当前事务看到的数据保持一致

-- 再次统计(结果与第一次相同,保证了可重复读)
SELECT COUNT(*) as active_accounts FROM bank_accounts WHERE account_status_ = 'ACTIVE';
SELECT SUM(balance_) as total_balance FROM bank_accounts WHERE account_status_ = 'ACTIVE';

COMMIT;

-- 隔离级别4:SERIALIZABLE(串行化)
-- ✅ 适用场景:对数据一致性要求极高的关键业务
-- 优势:完全避免并发问题
-- 问题:性能最差,并发度最低

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- 业务场景:年度审计或关键财务操作
-- 需要完全的数据一致性,可以接受较低的并发性能
START TRANSACTION;

-- 在SERIALIZABLE级别下,所有读取都会加共享锁
SELECT * FROM bank_accounts WHERE balance_ > 5000;

-- 其他会话的任何修改操作都会被阻塞,直到当前事务提交
COMMIT;

-- 业务场景对比总结和选择建议
/*
隔离级别          脏读    不可重复读    幻读      性能    适用场景
----------------------------------------------------------------
READ UNCOMMITTED   ✗        ✗         ✗       最高    数据分析、报表(非关键)
READ COMMITTED     ✓        ✗         ✗       高      大多数OLTP应用
REPEATABLE READ    ✓        ✓         ✓*      中      金融交易、库存管理
SERIALIZABLE       ✓        ✓         ✓       最低    审计、关键财务操作

(✓ = 该问题可被避免,✗ = 该问题可能发生)
注:MySQL InnoDB在REPEATABLE READ级别通过Next-Key Lock机制避免了幻读(即表中的✓*)
*/
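
补充:隔离级别可以在三种作用域设置,影响范围不同(示例):

-- 全局默认:只影响之后新建立的会话
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 会话级:当前会话内的所有后续事务
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- 事务级:仅对下一个事务生效
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;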

-- 实际业务中的隔离级别选择示例
-- 电商系统的不同业务场景

-- 场景1:商品浏览、搜索 - 使用READ COMMITTED
-- 原因:对数据一致性要求不高,优先考虑性能
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT * FROM products WHERE category_id_ = 1 AND status_ = 'ACTIVE';

-- 场景2:订单处理、库存扣减 - 使用REPEATABLE READ
-- 原因:需要确保订单处理过程中数据的一致性
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT stock_quantity_ FROM products WHERE product_id_ = 1001 FOR UPDATE;
UPDATE products SET stock_quantity_ = stock_quantity_ - 1 WHERE product_id_ = 1001;
INSERT INTO orders (customer_id_, product_id_, quantity_) VALUES (2001, 1001, 1);
COMMIT;

-- 场景3:财务结算、对账 - 使用SERIALIZABLE
-- 原因:对数据准确性要求极高,可以接受性能损失
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT SUM(order_amount_) FROM orders WHERE order_date_ = CURDATE();
UPDATE daily_summary SET total_sales_ = (SELECT SUM(order_amount_) FROM orders WHERE order_date_ = CURDATE());
COMMIT;

4.4.2 锁机制和死锁处理

业务场景: 高并发系统、金融交易、库存管理、订单处理、资源竞争场景

核心问题: 数据一致性保证、死锁预防、锁等待优化、并发性能平衡

-- 锁机制监控和诊断工具
-- 查看当前锁状态(MySQL 8.0+)
-- 注:data_locks.THREAD_ID是performance_schema线程ID,需经performance_schema.threads映射到PROCESSLIST的ID
SELECT
    dl.OBJECT_SCHEMA as database_name,
    dl.OBJECT_NAME as table_name,
    dl.LOCK_TYPE,
    dl.LOCK_MODE,
    dl.LOCK_STATUS,
    dl.LOCK_DATA,
    p.USER as lock_holder,
    p.HOST as client_host,
    p.TIME as lock_duration_seconds
FROM performance_schema.data_locks dl
LEFT JOIN performance_schema.threads t ON dl.THREAD_ID = t.THREAD_ID
LEFT JOIN INFORMATION_SCHEMA.PROCESSLIST p ON t.PROCESSLIST_ID = p.ID
WHERE dl.OBJECT_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- 查看锁等待情况
-- 注:data_lock_waits本身不含对象信息,需通过ENGINE_LOCK_ID关联data_locks获取表名和锁类型
SELECT
    dl.OBJECT_NAME as table_name,
    dl.LOCK_TYPE,
    dl.LOCK_MODE,
    p1.USER as waiting_user,
    p1.INFO as waiting_query,
    p2.USER as blocking_user,
    p2.INFO as blocking_query,
    p2.TIME as blocking_duration_seconds
FROM performance_schema.data_lock_waits dlw
JOIN performance_schema.data_locks dl ON dlw.REQUESTING_ENGINE_LOCK_ID = dl.ENGINE_LOCK_ID
LEFT JOIN performance_schema.threads t1 ON dlw.REQUESTING_THREAD_ID = t1.THREAD_ID
LEFT JOIN INFORMATION_SCHEMA.PROCESSLIST p1 ON t1.PROCESSLIST_ID = p1.ID
LEFT JOIN performance_schema.threads t2 ON dlw.BLOCKING_THREAD_ID = t2.THREAD_ID
LEFT JOIN INFORMATION_SCHEMA.PROCESSLIST p2 ON t2.PROCESSLIST_ID = p2.ID;

-- 业务场景1:电商库存管理的悲观锁应用
-- 业务需求:确保高并发下单时库存扣减的准确性
-- 业务价值:防止超卖,保证库存数据的一致性

DROP TABLE IF EXISTS product_inventory;  -- 4.2.1节已创建同名演示表,此处按新结构重建
CREATE TABLE product_inventory (
    product_id_ INT PRIMARY KEY,
    product_name_ VARCHAR(100),
    available_stock_ INT NOT NULL DEFAULT 0,
    reserved_stock_ INT NOT NULL DEFAULT 0,
    total_stock_ INT GENERATED ALWAYS AS (available_stock_ + reserved_stock_) STORED,
    last_updated_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    version_ INT DEFAULT 1,
    INDEX idx_stock (available_stock_)
);

-- 插入测试数据
INSERT INTO product_inventory (product_id_, product_name_, available_stock_) VALUES
(1001, 'iPhone 15 Pro', 100),
(1002, 'MacBook Pro', 50),
(1003, 'iPad Air', 200);

-- ✅ 正确方法:使用悲观锁确保库存扣减的原子性
-- 适用场景:高并发下单,对数据一致性要求极高
START TRANSACTION;

-- 锁定商品库存记录,防止并发修改
SELECT product_id_, product_name_, available_stock_
FROM product_inventory
WHERE product_id_ = 1001
FOR UPDATE;

-- 检查并扣减库存
-- 注:IF ... END IF只能在存储程序中使用,普通SQL脚本中应将"检查+扣减"合并为一条条件UPDATE
SET @order_quantity = 5;

UPDATE product_inventory
SET available_stock_ = available_stock_ - @order_quantity,
    version_ = version_ + 1
WHERE product_id_ = 1001
  AND available_stock_ >= @order_quantity;  -- 库存不足时不会更新任何行

-- ROW_COUNT() = 1 表示扣减成功;= 0 表示库存不足,应用层应据此决定回滚还是继续
SELECT IF(ROW_COUNT() = 1, 'Stock deducted, order can be created', 'Insufficient stock') as result;

-- 扣减成功后再创建订单记录(此处为演示,实际应由应用层在确认扣减成功后执行)
INSERT INTO orders (customer_id_, product_id_, quantity_, order_status_)
VALUES (2001, 1001, @order_quantity, 'CONFIRMED');

COMMIT;

-- ❌ 错误方法:不使用锁的库存扣减(存在竞态条件)
-- 问题:多个并发请求可能同时读取到相同的库存数量,导致超卖
/*
START TRANSACTION;
-- 危险:读取库存时没有加锁
SELECT available_stock_ FROM product_inventory WHERE product_id_ = 1001;
-- 其他会话可能在此期间修改了库存
UPDATE product_inventory SET available_stock_ = available_stock_ - 5 WHERE product_id_ = 1001;
COMMIT;
*/

-- 业务场景2:银行转账的锁顺序优化
-- 业务需求:避免转账操作中的死锁问题
-- 解决方案:按账户ID顺序获取锁,确保所有事务以相同顺序访问资源

CREATE TABLE bank_accounts_demo (
    account_id_ INT PRIMARY KEY,
    account_holder_ VARCHAR(100),
    balance_ DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    account_status_ ENUM('ACTIVE', 'FROZEN') DEFAULT 'ACTIVE',
    last_transaction_time_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

INSERT INTO bank_accounts_demo VALUES
(1001, 'Alice Johnson', 10000.00, 'ACTIVE', NOW()),
(1002, 'Bob Smith', 5000.00, 'ACTIVE', NOW()),
(1003, 'Charlie Brown', 15000.00, 'ACTIVE', NOW());
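
下面的SafeTransfer存储过程和后文的报表查询都会用到转账流水表 transfer_log,原文未给出其结构,这里补充一个最小化的示例定义(字段按过程中的引用推断):

CREATE TABLE IF NOT EXISTS transfer_log (
    log_id_ INT AUTO_INCREMENT PRIMARY KEY,
    from_account_ INT NOT NULL,
    to_account_ INT NOT NULL,
    amount_ DECIMAL(15,2) NOT NULL,
    transfer_time_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_transfer_time (transfer_time_)
);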

-- ✅ 正确方法:按账户ID顺序加锁,避免死锁
-- 转账函数:从账户A转账到账户B
DELIMITER //
CREATE PROCEDURE SafeTransfer(
    IN from_account INT,
    IN to_account INT,
    IN transfer_amount DECIMAL(15,2)
)
BEGIN
    DECLARE min_account INT;
    DECLARE max_account INT;
    DECLARE from_balance DECIMAL(15,2);
    DECLARE exit handler FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    -- 确定锁的顺序(总是按账户ID升序加锁)
    SET min_account = LEAST(from_account, to_account);
    SET max_account = GREATEST(from_account, to_account);

    START TRANSACTION;

    -- 按顺序锁定账户(避免死锁)
    SELECT balance_ INTO @temp FROM bank_accounts_demo WHERE account_id_ = min_account FOR UPDATE;
    SELECT balance_ INTO @temp FROM bank_accounts_demo WHERE account_id_ = max_account FOR UPDATE;

    -- 检查转出账户余额
    SELECT balance_ INTO from_balance FROM bank_accounts_demo WHERE account_id_ = from_account;

    IF from_balance >= transfer_amount THEN
        -- 执行转账
        UPDATE bank_accounts_demo SET balance_ = balance_ - transfer_amount WHERE account_id_ = from_account;
        UPDATE bank_accounts_demo SET balance_ = balance_ + transfer_amount WHERE account_id_ = to_account;

        -- 记录转账日志
        INSERT INTO transfer_log (from_account_, to_account_, amount_, transfer_time_)
        VALUES (from_account, to_account, transfer_amount, NOW());

        SELECT 'Transfer completed successfully' as result;
    ELSE
        SELECT 'Insufficient balance' as result;
    END IF;

    COMMIT;
END //
DELIMITER ;

-- 使用安全转账函数
CALL SafeTransfer(1001, 1002, 1000.00);

-- ❌ 错误方法:不按顺序加锁(死锁风险)
-- 会话1:A→B转账,先锁1001再锁1002
-- 会话2:B→A转账,先锁1002再锁1001
-- 结果:两个会话互相等待对方释放锁,形成死锁

-- 业务场景3:共享锁的正确使用
-- 业务需求:生成财务报表时确保数据一致性
-- 使用场景:需要读取多个相关表的数据,确保读取期间数据不被修改

START TRANSACTION;

-- 使用共享锁读取账户数据
SELECT account_id_, account_holder_, balance_
FROM bank_accounts_demo
WHERE account_status_ = 'ACTIVE'
FOR SHARE;  -- MySQL 8.0推荐语法,旧写法LOCK IN SHARE MODE已弃用

-- 使用共享锁读取交易数据
SELECT from_account_, to_account_, amount_, transfer_time_
FROM transfer_log
WHERE transfer_time_ >= CURDATE()
FOR SHARE;

-- 生成报表(此期间数据不会被修改)
SELECT
    'Daily Financial Report' as report_title,
    COUNT(*) as total_active_accounts,
    SUM(balance_) as total_balance,
    (SELECT COUNT(*) FROM transfer_log WHERE transfer_time_ >= CURDATE()) as daily_transfers
FROM bank_accounts_demo
WHERE account_status_ = 'ACTIVE';

COMMIT;

-- 业务场景4:死锁检测和处理
-- MySQL自动死锁检测和处理机制

-- 查看死锁信息
SHOW ENGINE INNODB STATUS;

-- 创建死锁监控表
CREATE TABLE deadlock_log (
    deadlock_id_ INT AUTO_INCREMENT PRIMARY KEY,
    detection_time_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    victim_thread_id_ BIGINT,
    victim_query_ TEXT,
    deadlock_info_ JSON,
    INDEX idx_detection_time (detection_time_)
);

-- 死锁预防最佳实践
-- 1. 统一资源访问顺序
-- 2. 缩短事务持续时间
-- 3. 降低事务隔离级别(如果业务允许)
-- 4. 使用乐观锁替代悲观锁(适当场景)

-- 乐观锁示例:使用版本号控制并发更新
-- 先读取当前版本号(无需加锁)
SELECT version_ INTO @original_version
FROM product_inventory
WHERE product_id_ = 1001;

UPDATE product_inventory
SET available_stock_ = available_stock_ - 5,
    version_ = version_ + 1
WHERE product_id_ = 1001
  AND version_ = @original_version;  -- 乐观锁检查

-- 检查更新是否成功(ROW_COUNT() = 0说明版本已被其他事务修改,需要重试)
SELECT IF(ROW_COUNT() = 0,
          'Concurrent update detected, please retry',
          'Update successful') as result;

-- 查看锁信息(MySQL语法)
SELECT
    OBJECT_SCHEMA as schema_name,
    OBJECT_NAME as table_name,
    LOCK_TYPE as lock_type,
    LOCK_MODE as lock_mode,
    LOCK_STATUS as lock_status,
    LOCK_DATA as lock_data
FROM performance_schema.data_locks
WHERE OBJECT_SCHEMA IS NOT NULL;

-- 查看当前活跃事务(用于分析死锁/阻塞关系,MySQL语法)
SELECT
    r.trx_id,
    r.trx_mysql_thread_id,
    r.trx_query,
    r.trx_state,
    r.trx_started,
    r.trx_isolation_level
FROM INFORMATION_SCHEMA.INNODB_TRX r;

-- 查看阻塞查询(MySQL语法)
SELECT
    r.trx_id as blocking_trx_id,
    r.trx_mysql_thread_id as blocking_thread,
    r.trx_query as blocking_query,
    b.trx_id as blocked_trx_id,
    b.trx_mysql_thread_id as blocked_thread,
    b.trx_query as blocked_query,
    w.REQUESTING_ENGINE_TRANSACTION_ID as requesting_trx_id,
    w.REQUESTING_ENGINE_LOCK_ID as requested_lock_id,
    w.BLOCKING_ENGINE_TRANSACTION_ID as blocking_transaction,
    w.BLOCKING_ENGINE_LOCK_ID as blocking_lock_id
FROM INFORMATION_SCHEMA.INNODB_TRX r
JOIN performance_schema.data_lock_waits w ON r.trx_id = w.BLOCKING_ENGINE_TRANSACTION_ID
JOIN INFORMATION_SCHEMA.INNODB_TRX b ON w.REQUESTING_ENGINE_TRANSACTION_ID = b.trx_id;

4.4.3 MVCC实现原理

-- MySQL InnoDB MVCC
-- 查看事务信息
SELECT
    trx_id,
    trx_state,
    trx_started,
    trx_isolation_level,
    trx_rows_locked,
    trx_rows_modified
FROM information_schema.innodb_trx;

-- 查看回滚段信息
SELECT
    space,
    page_no,
    type,
    n_owned,
    heap_no
FROM information_schema.innodb_buffer_page
WHERE page_type = 'UNDO_LOG';

-- 查看undo表空间信息(MySQL 8.0语法)
SELECT
    SPACE,
    NAME,
    SPACE_TYPE,
    STATE,
    ROUND(FILE_SIZE/1024/1024, 2) as file_size_mb
FROM information_schema.innodb_tablespaces
WHERE SPACE_TYPE = 'Undo';

-- 查看事务信息(MySQL语法)
SELECT
    trx_id,
    trx_state,
    trx_started,
    trx_requested_lock_id,
    trx_wait_started,
    trx_weight,
    trx_mysql_thread_id,
    trx_query,
    trx_operation_state,
    trx_tables_in_use,
    trx_tables_locked,
    trx_lock_structs,
    trx_lock_memory_bytes,
    trx_rows_locked,
    trx_rows_modified,
    trx_isolation_level,
    trx_is_read_only
FROM INFORMATION_SCHEMA.INNODB_TRX;

-- 查看磁盘空间使用情况(MySQL语法)
SELECT
    TABLE_SCHEMA as database_name,
    TABLE_NAME as table_name,
    ROUND(((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024), 2) as total_size_mb,
    ROUND((DATA_LENGTH / 1024 / 1024), 2) as data_size_mb,
    ROUND((INDEX_LENGTH / 1024 / 1024), 2) as index_size_mb,
    ROUND((DATA_FREE / 1024 / 1024), 2) as free_space_mb
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;

-- 查看InnoDB状态信息(MySQL语法)
SELECT
    VARIABLE_NAME,
    VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'Innodb_trx%'
   OR VARIABLE_NAME LIKE 'Innodb_lock%'
   OR VARIABLE_NAME LIKE 'Innodb_row_lock%'
   OR VARIABLE_NAME LIKE 'Innodb_buffer_pool%';

-- 查看表的统计信息(MySQL语法)
SELECT
    TABLE_SCHEMA as schema_name,
    TABLE_NAME as table_name,
    TABLE_ROWS as estimated_rows,
    AVG_ROW_LENGTH as avg_row_length,
    DATA_LENGTH as data_length,
    INDEX_LENGTH as index_length,
    DATA_FREE as data_free,
    AUTO_INCREMENT as auto_increment,
    CREATE_TIME as create_time,
    UPDATE_TIME as update_time,
    CHECK_TIME as check_time
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- 手动优化表(MySQL语法)
-- 注:对InnoDB表,OPTIMIZE TABLE实际映射为ALTER TABLE ... FORCE,重建表并回收碎片空间
OPTIMIZE TABLE t_employees;

4.5 多表操作详解

多表操作是企业级数据库应用的核心技术,涉及复杂的业务逻辑处理、数据一致性保证和性能优化。正确掌握多表操作技术对于构建高效、可靠的数据库应用至关重要。

4.5.1 多表更新操作

业务场景: 绩效管理、数据同步、批量调整、业务规则应用、数据清洗

核心价值: 基于复杂关联条件的批量数据更新,避免多次单表操作的性能损失

-- 业务场景1:基于销售业绩的员工薪资调整系统
-- 业务需求:根据年度销售业绩自动调整销售人员薪资
-- 业务价值:自动化绩效管理,提高HR工作效率,确保薪资调整的公平性

-- 创建相关表结构
CREATE TABLE employee_performance (
    employee_id_ INT PRIMARY KEY,
    name_ VARCHAR(100),
    department_id_ INT,
    current_salary_ DECIMAL(10,2),
    performance_score_ DECIMAL(3,2),  -- 0.00-5.00
    last_review_date_ DATE,
    salary_adjustment_rate_ DECIMAL(5,4) DEFAULT 0.0000,
    updated_at_ TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_dept_score (department_id_, performance_score_)
);

CREATE TABLE sales_performance (
    employee_id_ INT,
    sales_year_ YEAR,
    total_sales_ DECIMAL(15,2),
    sales_target_ DECIMAL(15,2),
    achievement_rate_ DECIMAL(5,4),  -- 销售达成率
    commission_earned_ DECIMAL(10,2),
    PRIMARY KEY (employee_id_, sales_year_),
    INDEX idx_achievement (achievement_rate_)
);

-- 插入测试数据
INSERT INTO employee_performance VALUES
(1001, 'Alice Johnson', 1, 50000.00, 4.2, '2023-12-01', 0.0000, NOW()),
(1002, 'Bob Smith', 1, 48000.00, 3.8, '2023-12-01', 0.0000, NOW()),
(1003, 'Charlie Brown', 2, 52000.00, 4.5, '2023-12-01', 0.0000, NOW());

INSERT INTO sales_performance VALUES
(1001, 2023, 150000.00, 120000.00, 1.2500, 15000.00),
(1002, 2023, 95000.00, 100000.00, 0.9500, 9500.00),
(1003, 2023, 180000.00, 150000.00, 1.2000, 18000.00);

-- ✅ 正确方法:多表关联更新(高效,一次性完成)
-- 业务规则:
-- 1. 销售达成率 >= 1.2:薪资上调15%
-- 2. 销售达成率 >= 1.0:薪资上调8%
-- 3. 销售达成率 < 1.0:薪资上调3%(基本调整)
-- 4. 绩效评分 >= 4.0:额外奖励5%

UPDATE employee_performance ep
JOIN sales_performance sp ON ep.employee_id_ = sp.employee_id_
JOIN t_departments d ON ep.department_id_ = d.department_id_
SET
    -- 基于销售达成率的薪资调整
    ep.salary_adjustment_rate_ = CASE
        WHEN sp.achievement_rate_ >= 1.20 THEN 0.15  -- 超额完成20%以上
        WHEN sp.achievement_rate_ >= 1.00 THEN 0.08  -- 完成目标
        ELSE 0.03  -- 未完成目标的基本调整
    END +
    -- 基于绩效评分的额外奖励
    CASE
        WHEN ep.performance_score_ >= 4.5 THEN 0.05  -- 优秀员工额外奖励
        WHEN ep.performance_score_ >= 4.0 THEN 0.03  -- 良好员工额外奖励
        ELSE 0.00
    END,
    -- 应用薪资调整
    ep.current_salary_ = ep.current_salary_ * (1 +
        CASE
            WHEN sp.achievement_rate_ >= 1.20 THEN 0.15
            WHEN sp.achievement_rate_ >= 1.00 THEN 0.08
            ELSE 0.03
        END +
        CASE
            WHEN ep.performance_score_ >= 4.5 THEN 0.05
            WHEN ep.performance_score_ >= 4.0 THEN 0.03
            ELSE 0.00
        END
    ),
    ep.last_review_date_ = CURDATE(),
    ep.updated_at_ = NOW()
WHERE sp.sales_year_ = 2023
  AND d.department_name_ IN ('Sales', 'Business Development')  -- 只调整销售相关部门
  AND ep.performance_score_ >= 3.0;  -- 绩效评分达标的员工

-- ❌ 错误方法:多次单表更新(低效,存在一致性风险)
-- 问题:需要多次查询和更新,性能差,可能出现数据不一致
/*
-- 步骤1:查询销售业绩
SELECT employee_id_, achievement_rate_ FROM sales_performance WHERE sales_year_ = 2023;

-- 步骤2:逐个更新员工薪资(需要循环处理)
UPDATE employee_performance SET current_salary_ = current_salary_ * 1.15 WHERE employee_id_ = 1001;
UPDATE employee_performance SET current_salary_ = current_salary_ * 1.08 WHERE employee_id_ = 1002;
-- ... 重复处理每个员工,效率极低
*/

-- 业务场景2:库存管理中的批量价格调整
-- 业务需求:根据供应商成本变化和市场策略调整商品价格
-- 业务价值:快速响应市场变化,保持合理的利润率

CREATE TABLE product_pricing (
    product_id_ INT PRIMARY KEY,
    product_name_ VARCHAR(100),
    supplier_id_ INT,
    cost_price_ DECIMAL(10,2),
    selling_price_ DECIMAL(10,2),
    profit_margin_ DECIMAL(5,4),
    price_update_date_ DATE,
    INDEX idx_supplier (supplier_id_)
);

CREATE TABLE supplier_cost_changes (
    supplier_id_ INT,
    cost_change_rate_ DECIMAL(5,4),  -- 成本变化率
    effective_date_ DATE,
    change_reason_ VARCHAR(200),
    PRIMARY KEY (supplier_id_, effective_date_)
);

-- ✅ 基于供应商成本变化的智能价格调整
-- 注意:多表UPDATE中各SET赋值的求值顺序没有保证,直接引用刚被更新的cost_price_,
-- 拿到的可能是新值也可能是旧值。因此先在派生表中算好新成本,再统一赋值;
-- 这同时也规避了"更新目标表时不能在子查询中直接引用同一张表"的限制
UPDATE product_pricing pp
JOIN (
    SELECT
        p.product_id_,
        p.cost_price_ * (1 + lc.cost_change_rate_) AS new_cost
    FROM product_pricing p
    JOIN (
        -- 计算每个供应商的最新成本变化
        SELECT
            supplier_id_,
            cost_change_rate_,
            ROW_NUMBER() OVER (PARTITION BY supplier_id_ ORDER BY effective_date_ DESC) as rn
        FROM supplier_cost_changes
        WHERE effective_date_ <= CURDATE()
    ) lc ON p.supplier_id_ = lc.supplier_id_ AND lc.rn = 1
) calc ON pp.product_id_ = calc.product_id_
SET
    -- 调整成本价格
    pp.cost_price_ = calc.new_cost,
    -- 基于新成本、按目标利润率重算销售价格
    pp.selling_price_ = calc.new_cost * (1 + pp.profit_margin_),
    pp.price_update_date_ = CURDATE();

-- 业务场景3:客户等级升级和折扣调整
-- 业务需求:根据客户年度消费金额自动调整客户等级和享受的折扣率
CREATE TABLE customer_levels (
    customer_id_ INT PRIMARY KEY,
    customer_name_ VARCHAR(100),
    current_level_ ENUM('BRONZE', 'SILVER', 'GOLD', 'PLATINUM') DEFAULT 'BRONZE',
    discount_rate_ DECIMAL(4,4) DEFAULT 0.0000,
    annual_spending_ DECIMAL(15,2) DEFAULT 0.00,
    level_update_date_ DATE,
    INDEX idx_level_spending (current_level_, annual_spending_)
);

CREATE TABLE customer_orders_summary (
    customer_id_ INT,
    order_year_ YEAR,
    total_orders_ INT,
    total_amount_ DECIMAL(15,2),
    avg_order_value_ DECIMAL(10,2),
    PRIMARY KEY (customer_id_, order_year_)
);

-- ✅ 客户等级和折扣的智能升级
UPDATE customer_levels cl
JOIN customer_orders_summary cos ON cl.customer_id_ = cos.customer_id_
SET
    cl.annual_spending_ = cos.total_amount_,
    cl.current_level_ = CASE
        WHEN cos.total_amount_ >= 100000 THEN 'PLATINUM'
        WHEN cos.total_amount_ >= 50000 THEN 'GOLD'
        WHEN cos.total_amount_ >= 20000 THEN 'SILVER'
        ELSE 'BRONZE'
    END,
    cl.discount_rate_ = CASE
        WHEN cos.total_amount_ >= 100000 THEN 0.15  -- 白金客户15%折扣
        WHEN cos.total_amount_ >= 50000 THEN 0.10   -- 金牌客户10%折扣
        WHEN cos.total_amount_ >= 20000 THEN 0.05   -- 银牌客户5%折扣
        ELSE 0.00  -- 铜牌客户无折扣
    END,
    cl.level_update_date_ = CURDATE()
WHERE cos.order_year_ = YEAR(CURDATE())
  AND cos.total_amount_ > 0;

-- 业务场景4:多表更新的事务安全性
-- 业务需求:确保相关表数据的一致性更新
START TRANSACTION;

-- 更新员工薪资
UPDATE employee_performance ep
JOIN sales_performance sp ON ep.employee_id_ = sp.employee_id_
SET ep.current_salary_ = ep.current_salary_ * 1.1
WHERE sp.achievement_rate_ >= 1.0;

-- 同步更新薪资历史记录
INSERT INTO salary_history (employee_id_, old_salary_, new_salary_, change_reason_, change_date_)
SELECT
    ep.employee_id_,
    ep.current_salary_ / 1.1 as old_salary_,
    ep.current_salary_ as new_salary_,
    'Performance-based adjustment' as change_reason_,
    NOW() as change_date_
FROM employee_performance ep
JOIN sales_performance sp ON ep.employee_id_ = sp.employee_id_
WHERE sp.achievement_rate_ >= 1.0;

-- 更新部门薪资预算
UPDATE t_departments d
SET d.salary_budget_used_ = (
    SELECT SUM(ep.current_salary_)
    FROM employee_performance ep
    WHERE ep.department_id_ = d.department_id_
)
WHERE d.department_id_ IN (
    SELECT DISTINCT ep.department_id_
    FROM employee_performance ep
    JOIN sales_performance sp ON ep.employee_id_ = sp.employee_id_
    WHERE sp.achievement_rate_ >= 1.0
);

COMMIT;

-- 业务场景5:复杂的多表更新(基于部门预算的薪资调整)
-- 业务需求:根据部门预算情况和员工薪资水平进行差异化调整
UPDATE t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
JOIN (
    SELECT
        department_id_,
        AVG(salary_) as avg_salary,
        COUNT(*) as emp_count
    FROM t_employees
    WHERE status_ = 'ACTIVE'
    GROUP BY department_id_
) dept_stats ON e.department_id_ = dept_stats.department_id_
SET e.salary_ = CASE
    WHEN d.budget_ > 2000000 AND e.salary_ < dept_stats.avg_salary THEN e.salary_ * 1.05
    WHEN d.budget_ < 1000000 AND e.salary_ > dept_stats.avg_salary THEN e.salary_ * 0.98
    ELSE e.salary_
END
WHERE e.status_ = 'ACTIVE';

-- 性能分析和优化建议:
-- 优点:语法简洁,支持复杂的JOIN条件
-- 注意事项:
-- 1. 确保JOIN条件有适当的索引
-- 2. 避免更新大量数据时的锁等待
-- 3. 考虑使用LIMIT分批更新大表
-- 4. 在事务中执行以保证数据一致性

-- 分批更新示例(避免长时间锁表)
-- 注意:MySQL的多表UPDATE不支持LIMIT,需先用派生表选出一批主键再更新
UPDATE t_employees
SET salary_ = salary_ * 1.1
WHERE employee_id_ IN (
    SELECT employee_id_ FROM (
        SELECT e.employee_id_
        FROM t_employees e
        JOIN t_departments d ON e.department_id_ = d.department_id_
        WHERE d.department_name_ = 'Sales'
          AND e.status_ = 'ACTIVE'
        ORDER BY e.employee_id_
        LIMIT 100
    ) batch
);

-- 业务场景6:使用相关子查询的复杂更新
-- 业务需求:基于部门预算和平均薪资进行个性化调整
-- 适用场景:需要复杂计算逻辑的薪资调整

UPDATE t_employees e
SET salary_ = (
    SELECT
        CASE
            WHEN d.budget_ > 2000000 AND e.salary_ < dept_avg.avg_salary THEN e.salary_ * 1.05
            WHEN d.budget_ < 1000000 AND e.salary_ > dept_avg.avg_salary THEN e.salary_ * 0.98
            ELSE e.salary_
        END
    FROM t_departments d
    JOIN (
        SELECT department_id_, AVG(salary_) as avg_salary
        FROM t_employees
        WHERE status_ = 'ACTIVE'
        GROUP BY department_id_
    ) dept_avg ON d.department_id_ = dept_avg.department_id_
    WHERE d.department_id_ = e.department_id_
)
WHERE e.status_ = 'ACTIVE'
  AND EXISTS (
    SELECT 1 FROM t_departments d
    WHERE d.department_id_ = e.department_id_
  );

-- MySQL多表更新性能优化建议:
-- 1. 优先使用JOIN语法,避免相关子查询
-- 2. 确保JOIN条件列有适当的索引
-- 3. 大批量更新时考虑分批处理
-- 4. 使用EXPLAIN分析执行计划
-- 5. 监控锁等待和死锁情况

-- 高性能批量更新(MySQL优化版本)
-- 多表UPDATE不支持LIMIT,同样借助派生表分批选取主键,避免长时间锁表
UPDATE t_employees
SET salary_ = salary_ * 1.1,
    updated_at_ = NOW()
WHERE employee_id_ IN (
    SELECT employee_id_ FROM (
        SELECT e.employee_id_
        FROM t_employees e
        JOIN t_departments d ON e.department_id_ = d.department_id_
        WHERE d.budget_ > 2000000
          AND e.status_ = 'ACTIVE'
        ORDER BY e.employee_id_
        LIMIT 1000
    ) batch
);
4.5.1.2 高级多表更新技术

业务场景: 复杂的业务规则应用、多维度数据更新、条件性批量调整

-- 业务场景7:基于多维度条件的复杂薪资调整
-- 业务需求:结合销售业绩、部门预算、员工级别进行差异化薪资调整

-- MySQL实现方案(使用多表JOIN)
UPDATE t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
JOIN (
    SELECT
        employee_id_,
        SUM(amount_) as total_sales,
        COUNT(*) as sale_count,
        AVG(amount_) as avg_sale_amount
    FROM t_sales
    WHERE sale_date_ >= '2023-01-01'
    GROUP BY employee_id_
) s ON e.employee_id_ = s.employee_id_
SET e.salary_ = CASE
    -- 高业绩 + 高预算部门:15%涨幅
    WHEN s.total_sales > 150000 AND d.budget_ > 2000000 THEN e.salary_ * 1.15
    -- 中等业绩 + 中等预算:10%涨幅
    WHEN s.total_sales > 100000 AND d.budget_ > 1000000 THEN e.salary_ * 1.10
    -- 基础调整:5%涨幅
    WHEN s.total_sales > 50000 THEN e.salary_ * 1.05
    -- 无销售业绩:基本调整3%
    ELSE e.salary_ * 1.03
END,
e.updated_at_ = NOW()
WHERE e.status_ = 'ACTIVE'
  AND e.hire_date_ < DATE_SUB(NOW(), INTERVAL 6 MONTH);  -- 入职满6个月

-- 业务场景8:批量更新员工薪资(分批处理避免锁表)
-- 业务需求:为指定部门的所有员工加薪,但要避免长时间锁表
-- 解决方案:分批处理,每批处理1000条记录

-- 创建临时表记录处理进度
CREATE TEMPORARY TABLE salary_update_progress (
    batch_id INT AUTO_INCREMENT PRIMARY KEY,
    processed_count INT,
    update_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- 分批更新存储过程
DELIMITER $$
CREATE PROCEDURE BatchUpdateSalary(
    IN target_department_ids VARCHAR(100),
    IN salary_increase_rate DECIMAL(5,4),
    IN batch_size INT  -- 注意:MySQL存储过程参数不支持DEFAULT默认值
)
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE batch_count INT DEFAULT 0;
    DECLARE total_updated INT DEFAULT 0;

    -- 错误处理
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    -- 开始分批处理
    batch_loop: LOOP
        START TRANSACTION;

        -- 更新一批数据
        -- FIND_IN_SET用于匹配'1,2,3'形式的ID列表;写成 IN (字符串) 只会比较第一个数字
        UPDATE t_employees
        SET salary_ = salary_ * (1 + salary_increase_rate),
            salary_updated_flag = 1,  -- 置位标记,防止同一行在后续批次被重复更新
            updated_at_ = NOW()
        WHERE FIND_IN_SET(department_id_, target_department_ids)
          AND status_ = 'ACTIVE'
          AND salary_updated_flag = 0
        LIMIT batch_size;

        SET batch_count = ROW_COUNT();
        SET total_updated = total_updated + batch_count;

        -- 记录进度
        INSERT INTO salary_update_progress (processed_count) VALUES (batch_count);

        COMMIT;

        -- 如果没有更多记录需要处理,退出循环
        IF batch_count = 0 THEN
            LEAVE batch_loop;
        END IF;

        -- 短暂休息,避免持续占用资源
        SELECT SLEEP(0.1);

    END LOOP;

    -- 任务结束后将标记复位,便于下次批量调整复用
    UPDATE t_employees
    SET salary_updated_flag = 0
    WHERE FIND_IN_SET(department_id_, target_department_ids);

    SELECT CONCAT('Total updated: ', total_updated, ' employees') as result;
END $$
DELIMITER ;

-- 使用分批更新存储过程
CALL BatchUpdateSalary('1,2,3', 0.10, 1000);  -- 为部门1,2,3加薪10%,每批1000条

-- MySQL多表更新的性能优化总结:
-- 1. 使用JOIN语法进行多表更新(MySQL标准语法)
-- 2. 通过子查询实现复杂的业务逻辑
-- 3. 使用存储过程实现批量处理和错误处理
-- 4. 合理使用事务确保数据一致性

-- 性能优化建议:
-- 1. 确保JOIN条件列有适当的索引
-- 2. 避免更新大量数据时的长时间锁表
-- 3. 考虑使用LIMIT分批更新大表
-- 4. 在事务中执行以保证数据一致性
-- 5. 监控锁等待和死锁情况
-- 6. 使用EXPLAIN分析执行计划
-- 7. 定期收集表统计信息以优化查询计划
4.5.2 多表插入操作

多表插入操作允许同时向多个表插入相关数据,确保数据的一致性和完整性。

MySQL 多表插入:

-- MySQL 不直接支持多表INSERT,但可以通过事务实现
-- 场景:新员工入职,同时插入员工信息和初始销售目标

START TRANSACTION;

-- 插入员工信息
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, manager_id_, status_)
VALUES ('Alice Cooper', 'alice.cooper@company.com', 1, 65000, '2024-01-15', 1, 'ACTIVE');

-- 获取新插入的员工ID
SET @new_employee_id = LAST_INSERT_ID();

-- 插入销售目标
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_, created_at_)
VALUES (@new_employee_id, 120000, YEAR(CURDATE()), NOW());

-- 插入员工培训记录
INSERT INTO t_training_records (employee_id_, training_type_, status_, created_at_)
VALUES (@new_employee_id, 'Orientation', 'Scheduled', NOW());

COMMIT;

-- 批量插入相关数据的另一种方法
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
SELECT
    name_,
    email_,
    department_id_,
    salary_,
    hire_date_,
    'ACTIVE'
FROM temp_new_employees;

-- 然后基于刚插入的数据插入相关表
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_)
SELECT
    e.employee_id_,
    100000,  -- 默认目标
    YEAR(CURDATE())
FROM t_employees e
JOIN temp_new_employees t ON e.email_ = t.email_
WHERE e.created_at_ >= CURDATE();

-- MySQL多表插入的性能考虑:
-- 优点:通过事务保证数据一致性
-- 注意事项:
-- 1. 使用事务确保原子性
-- 2. 合理设置外键约束
-- 3. 考虑使用批量插入提高性能
-- 4. 监控锁等待情况
-- MySQL多表插入的正确实现方法

-- 业务场景9:新员工入职的完整数据插入流程
-- 业务需求:新员工入职时需要同时创建员工记录、销售目标、培训记录
-- MySQL实现:使用事务确保数据一致性

START TRANSACTION;

-- 1. 插入员工基本信息
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
VALUES ('John Doe', 'john.doe@company.com', 1, 65000, NOW(), 'ACTIVE');

-- 获取新插入的员工ID
SET @new_employee_id = LAST_INSERT_ID();

-- 2. 根据部门创建相应的记录
-- 销售部门:创建销售目标
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_)
SELECT @new_employee_id, 100000, YEAR(NOW()) FROM DUAL  -- 不带FROM的SELECT不能使用WHERE子句,故加FROM DUAL
WHERE (SELECT department_id_ FROM t_employees WHERE employee_id_ = @new_employee_id) = 1;

-- 技术部门:创建技能认证记录
INSERT INTO tech_certifications (employee_id_, required_cert, deadline)
SELECT @new_employee_id, 'Basic Programming', DATE_ADD(NOW(), INTERVAL 6 MONTH) FROM DUAL
WHERE (SELECT department_id_ FROM t_employees WHERE employee_id_ = @new_employee_id) = 3;

-- 3. 为所有新员工创建培训记录
INSERT INTO t_training_records (employee_id_, training_type_, status_)
VALUES (@new_employee_id, 'Orientation', 'Scheduled');

-- 4. 创建员工历史记录
INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, details_)
VALUES (@new_employee_id, 'HIRED', NOW(), 'New employee hired');

COMMIT;

-- 业务场景10:批量员工数据同步(从临时表到正式表)
-- 业务需求:从HR系统导入的临时数据批量同步到正式表
-- 使用存储过程实现复杂的多表插入逻辑

DELIMITER $$
CREATE PROCEDURE BatchEmployeeSync()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE v_name VARCHAR(100);
    DECLARE v_email VARCHAR(100);
    DECLARE v_dept_id INT;
    DECLARE v_salary DECIMAL(10,2);
    DECLARE v_hire_date DATE;
    DECLARE v_employee_id INT;

    -- 声明游标
    DECLARE emp_cursor CURSOR FOR
        SELECT name_, email_, department_id_, salary_, hire_date_
        FROM staging_employees
        WHERE processed = 'N';

    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    -- 错误处理
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    OPEN emp_cursor;

    read_loop: LOOP
        FETCH emp_cursor INTO v_name, v_email, v_dept_id, v_salary, v_hire_date;

        IF done THEN
            LEAVE read_loop;
        END IF;

        -- 插入或更新员工信息
        -- 技巧:employee_id_ = LAST_INSERT_ID(employee_id_) 使走到更新分支时,
        -- LAST_INSERT_ID()也能返回已存在行的ID(否则只在插入分支有效)
        INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
        VALUES (v_name, v_email, v_dept_id, v_salary, v_hire_date, 'ACTIVE')
        ON DUPLICATE KEY UPDATE
            employee_id_ = LAST_INSERT_ID(employee_id_),
            salary_ = VALUES(salary_),
            department_id_ = VALUES(department_id_),
            updated_at_ = NOW();

        SET v_employee_id = LAST_INSERT_ID();

        -- 根据部门创建相应记录
        IF v_dept_id IN (1, 2) THEN
            -- 销售相关部门:创建销售目标
            INSERT IGNORE INTO t_sales_targets (employee_id_, target_amount_, target_year_)
            VALUES (v_employee_id, v_salary * 2, YEAR(NOW()));
        END IF;

        IF v_dept_id = 3 THEN
            -- 技术部门:创建技能要求
            INSERT IGNORE INTO tech_certifications (employee_id_, required_cert, deadline)
            VALUES (v_employee_id, 'Basic Programming', DATE_ADD(NOW(), INTERVAL 6 MONTH));
        END IF;

        -- 创建培训记录
        INSERT IGNORE INTO t_training_records (employee_id_, training_type_, status_)
        VALUES (v_employee_id, 'Orientation', 'Scheduled');

    END LOOP;

    CLOSE emp_cursor;

    -- 标记已处理,并在COMMIT前记下行数(COMMIT之后ROW_COUNT()不再反映该UPDATE)
    UPDATE staging_employees SET processed = 'Y' WHERE processed = 'N';
    SET @processed_count = ROW_COUNT();

    COMMIT;

    SELECT @processed_count as processed_count;
END $$
DELIMITER ;

-- 使用存储过程进行批量同步
CALL BatchEmployeeSync();

-- 同时插入相关的历史记录
INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, old_value_, new_value_)
SELECT
    e.employee_id_,
    'SALARY_CHANGE',
    NOW(),
    CAST(e.salary_ AS CHAR),
    CAST(s.salary_ AS CHAR)
FROM t_employees e
JOIN staging_employees s ON e.email_ = s.email_
WHERE e.salary_ != s.salary_;

-- 优点:通过事务和LAST_INSERT_ID()保证多表数据的一致写入
-- 注意事项:
-- 1. innodb_autoinc_lock_mode=2(8.0默认)下,INSERT...SELECT生成的自增ID不保证连续
-- 2. ON DUPLICATE KEY UPDATE场景需配合 id = LAST_INSERT_ID(id) 技巧才能取到已存在行的ID
-- 3. 大批量操作时分批提交,控制单个事务的大小
-- 4. 监控binlog和undo日志的增长情况

-- 业务场景11:使用临时表的批量多表插入
-- 业务需求:批量导入新员工数据并同时创建相关记录
-- MySQL实现:使用临时表和事务确保数据一致性

START TRANSACTION;

-- 创建临时表存储新员工信息
CREATE TEMPORARY TABLE temp_new_employees (
    temp_id INT AUTO_INCREMENT PRIMARY KEY,
    name_ VARCHAR(100),
    email_ VARCHAR(100),
    department_id_ INT,
    salary_ DECIMAL(10,2),
    hire_date_ DATE
);

-- 插入待处理的员工数据
INSERT INTO temp_new_employees (name_, email_, department_id_, salary_, hire_date_)
VALUES
    ('Alice Johnson', 'alice.johnson@company.com', 1, 65000, '2024-01-15'),
    ('Bob Smith', 'bob.smith@company.com', 2, 70000, '2024-01-15'),
    ('Carol Davis', 'carol.davis@company.com', 3, 75000, '2024-01-15');

-- 批量插入员工数据
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
SELECT name_, email_, department_id_, salary_, hire_date_, 'ACTIVE'
FROM temp_new_employees;

-- 获取新插入员工的ID范围
-- 注意:innodb_autoinc_lock_mode=2(8.0默认)下INSERT...SELECT生成的ID不保证连续,
-- 因此下方同时用email_关联temp_new_employees做二次校验
SET @start_id = LAST_INSERT_ID();
SET @end_id = @start_id + (SELECT COUNT(*) FROM temp_new_employees) - 1;

-- 为销售部门员工创建销售目标
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_, created_at_)
SELECT
    e.employee_id_,
    e.salary_ * 1.5,
    YEAR(CURDATE()),
    NOW()
FROM t_employees e
JOIN temp_new_employees t ON e.email_ = t.email_
WHERE e.employee_id_ BETWEEN @start_id AND @end_id
  AND e.department_id_ IN (1, 2);

-- 为所有新员工创建培训记录
INSERT INTO t_training_records (employee_id_, training_type_, status_, created_at_)
SELECT
    e.employee_id_,
    'Orientation',
    'Scheduled',
    NOW()
FROM t_employees e
JOIN temp_new_employees t ON e.email_ = t.email_
WHERE e.employee_id_ BETWEEN @start_id AND @end_id;

-- 创建员工历史记录
INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, details_)
SELECT
    e.employee_id_,
    'HIRED',
    NOW(),
    CONCAT('Batch hire: ', e.name_)
FROM t_employees e
JOIN temp_new_employees t ON e.email_ = t.email_
WHERE e.employee_id_ BETWEEN @start_id AND @end_id;

-- 清理临时表
DROP TEMPORARY TABLE temp_new_employees;

COMMIT;

-- MySQL多表插入的性能优化总结:
-- 1. 使用事务确保数据一致性
-- 2. 利用LAST_INSERT_ID()获取新插入记录的ID
-- 3. 使用临时表处理复杂的批量操作
-- 4. 合理设置外键约束和索引
-- 5. 监控事务日志的增长
-- 业务场景12:单个员工入职的完整流程
-- 业务需求:新员工入职时需要在多个相关表中创建记录
-- MySQL实现:使用事务和变量确保数据一致性

START TRANSACTION;

-- 插入新员工基本信息
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
VALUES ('Emma Brown', 'emma.brown@company.com', 1, 67000, '2024-01-15', 'ACTIVE');

-- 获取新插入的员工ID
SET @new_employee_id = LAST_INSERT_ID();

-- 根据部门创建销售目标(仅限销售部门)
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_, created_at_)
SELECT @new_employee_id, 67000 * 1.8, YEAR(CURDATE()), NOW()
WHERE (SELECT department_id_ FROM t_employees WHERE employee_id_ = @new_employee_id) IN (1, 2);

-- 创建培训记录
INSERT INTO t_training_records (employee_id_, training_type_, status_, created_at_)
VALUES (@new_employee_id, 'Orientation', 'Scheduled', NOW());

-- 创建员工历史记录
INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, details_)
VALUES (@new_employee_id, 'HIRED', NOW(),
        CONCAT('New employee hired: ', (SELECT name_ FROM t_employees WHERE employee_id_ = @new_employee_id)));

COMMIT;

-- 业务场景13:从临时表批量同步到正式表
-- 业务需求:从外部系统导入的数据需要批量同步到多个相关表
-- MySQL实现:分步骤执行,确保数据一致性

START TRANSACTION;

-- 步骤1:批量插入员工数据
INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
SELECT
    name_,
    email_,
    department_id_,
    salary_,
    hire_date_,
    'ACTIVE'
FROM staging_employees
WHERE processed = 0;

-- 步骤2:获取新插入的员工ID范围
-- 注意:按created_at_近1分钟筛选只是演示用的近似做法,存在并发写入时不可靠,
-- 生产环境建议在staging表中回写生成的employee_id_
SET @min_employee_id = (SELECT MIN(employee_id_) FROM t_employees WHERE created_at_ >= NOW() - INTERVAL 1 MINUTE);
SET @max_employee_id = (SELECT MAX(employee_id_) FROM t_employees WHERE created_at_ >= NOW() - INTERVAL 1 MINUTE);

-- 步骤3:为销售部门员工插入销售目标
INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_)
SELECT
    e.employee_id_,
    e.salary_ * 1.5,
    YEAR(CURDATE())
FROM t_employees e
WHERE e.employee_id_ BETWEEN @min_employee_id AND @max_employee_id
  AND e.department_id_ IN (1, 2);

-- 步骤4:为所有新员工插入培训记录
INSERT INTO t_training_records (employee_id_, training_type_, status_)
SELECT
    e.employee_id_,
    'Orientation',
    'Scheduled'
FROM t_employees e
WHERE e.employee_id_ BETWEEN @min_employee_id AND @max_employee_id;

-- 步骤5:插入历史记录
INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, details_)
SELECT
    e.employee_id_,
    'HIRED',
    NOW(),
    CONCAT('Batch hire: ', e.name_)
FROM t_employees e
WHERE e.employee_id_ BETWEEN @min_employee_id AND @max_employee_id;

-- 步骤6:标记临时表数据为已处理
UPDATE staging_employees SET processed = 1 WHERE processed = 0;

COMMIT;

-- MySQL存储过程实现复杂的多表插入
DELIMITER $$
CREATE PROCEDURE HireEmployee(
    IN p_first_name VARCHAR(50),
    IN p_last_name VARCHAR(50),
    IN p_email VARCHAR(100),
    IN p_department_id INT,
    IN p_salary DECIMAL(10,2),
    IN p_hire_date DATE,
    OUT new_employee_id INT
)
BEGIN
    DECLARE dept_name VARCHAR(100);
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    -- 获取部门名称
    SELECT department_name_ INTO dept_name
    FROM t_departments
    WHERE department_id_ = p_department_id;

    -- 插入员工
    INSERT INTO t_employees (name_, email_, department_id_, salary_, hire_date_, status_)
    VALUES (CONCAT(p_first_name, ' ', p_last_name), p_email, p_department_id, p_salary, p_hire_date, 'ACTIVE');

    SET new_employee_id = LAST_INSERT_ID();

    -- 插入销售目标(如果是销售部门)
    IF dept_name = 'Sales' THEN
        INSERT INTO t_sales_targets (employee_id_, target_amount_, target_year_)
        VALUES (new_employee_id, p_salary * 2, YEAR(CURRENT_DATE));
    END IF;

    -- 插入培训记录
    INSERT INTO t_training_records (employee_id_, training_type_, status_)
    VALUES (new_employee_id, 'Orientation', 'Scheduled');

    -- 插入历史记录
    INSERT INTO t_employee_history (employee_id_, action_type_, action_date_, details_)
    VALUES (new_employee_id, 'HIRED', NOW(),
            CONCAT('New employee hired in ', dept_name, ' department'));

    COMMIT;
END $$
DELIMITER ;

-- 使用存储过程
CALL HireEmployee('Frank', 'Miller', 'frank.miller@company.com', 1, 72000, '2024-01-15', @new_id);
SELECT @new_id as new_employee_id;

-- MySQL多表插入的优势:
-- 1. 使用存储过程确保事务一致性
-- 2. 通过LAST_INSERT_ID()获取新插入的记录ID
-- 3. 支持复杂的业务逻辑和条件判断
-- 4. 提供完整的错误处理和回滚机制

-- 注意事项:
-- 1. 大批量操作时考虑分批处理
-- 2. 监控binlog日志的增长
-- 3. 使用存储过程封装复杂的业务逻辑
-- 4. 合理设置事务隔离级别
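-- 事务隔离级别设置示例:批量写入会话可降为READ COMMITTED以减少间隙锁(是否适用需按业务一致性要求评估)
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- ... 批量INSERT/UPDATE操作 ...
COMMIT;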
4.5.3 多表删除操作

多表删除操作用于删除相关联的数据,确保数据的完整性和一致性。不同数据库系统在语法和实现上有显著差异。

MySQL 多表删除:

-- MySQL 多表删除语法
-- 场景:删除离职员工及其相关数据

-- 基本多表删除语法
DELETE e, s, t
FROM t_employees e
LEFT JOIN t_sales s ON e.employee_id_ = s.employee_id_
LEFT JOIN t_sales_targets t ON e.employee_id_ = t.employee_id_
WHERE e.status_ = 'TERMINATED'
  AND e.hire_date_ < '2020-01-01';

-- 复杂的多表删除:删除低绩效员工及相关数据
DELETE e, st, tr
FROM t_employees e
LEFT JOIN t_sales_targets st ON e.employee_id_ = st.employee_id_
LEFT JOIN t_training_records tr ON e.employee_id_ = tr.employee_id_
WHERE e.employee_id_ IN (
    SELECT emp_id FROM (
        SELECT
            e2.employee_id_ as emp_id,
            COALESCE(SUM(s.amount_), 0) as total_sales,
            COUNT(s.sale_id_) as sale_count
        FROM t_employees e2
        LEFT JOIN t_sales s ON e2.employee_id_ = s.employee_id_
            AND s.sale_date_ >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
        WHERE e2.department_id_ = 1  -- 销售部门
          AND e2.status_ = 'ACTIVE'
        GROUP BY e2.employee_id_
        HAVING total_sales < 50000 OR sale_count < 10
    ) low_performers
);

-- 安全的级联删除(使用事务)
START TRANSACTION;

-- 首先删除子表数据
DELETE FROM t_sales WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED'
);

DELETE FROM t_sales_targets WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED'
);

DELETE FROM t_training_records WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED'
);

DELETE FROM t_employee_history WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED'
);

-- 最后删除主表数据
DELETE FROM t_employees WHERE status_ = 'TERMINATED';

COMMIT;

-- 批量删除避免锁表
DELIMITER //
CREATE PROCEDURE BatchDeleteTerminatedEmployees()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE emp_id INT;
    DECLARE emp_cursor CURSOR FOR
        SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED' LIMIT 100;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN emp_cursor;

    delete_loop: LOOP
        FETCH emp_cursor INTO emp_id;
        IF done THEN
            LEAVE delete_loop;
        END IF;

        -- 删除相关数据
        DELETE FROM t_sales WHERE employee_id_ = emp_id;
        DELETE FROM t_sales_targets WHERE employee_id_ = emp_id;
        DELETE FROM t_training_records WHERE employee_id_ = emp_id;
        DELETE FROM t_employee_history WHERE employee_id_ = emp_id;
        DELETE FROM t_employees WHERE employee_id_ = emp_id;

    END LOOP;

    CLOSE emp_cursor;
END //
DELIMITER ;

-- MySQL多表删除的性能考虑:
-- 优点:语法直观,支持多表同时删除
-- 注意事项:
-- 1. 注意外键约束的影响
-- 2. 大批量删除时使用LIMIT分批处理
-- 3. 删除前备份重要数据
-- 4. 监控binlog的增长
-- 5. 考虑使用软删除替代物理删除
-- 业务场景14:使用存储过程的安全多表删除
-- 业务需求:清理历史离职员工数据,释放存储空间
-- MySQL实现:使用存储过程和游标处理复杂的多表删除

DELIMITER $$
CREATE PROCEDURE DeleteTerminatedEmployees()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE emp_id INT;
    DECLARE deleted_count INT DEFAULT 0;

    -- 声明游标
    DECLARE emp_cursor CURSOR FOR
        SELECT employee_id_
        FROM t_employees
        WHERE status_ = 'TERMINATED'
          AND hire_date_ < '2020-01-01';

    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    -- 错误处理
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    -- 开始事务
    START TRANSACTION;

    -- 打开游标
    OPEN emp_cursor;

    read_loop: LOOP
        FETCH emp_cursor INTO emp_id;
        IF done THEN
            LEAVE read_loop;
        END IF;

        -- 删除相关数据(按外键依赖顺序)
        DELETE FROM t_sales WHERE employee_id_ = emp_id;
        DELETE FROM t_sales_targets WHERE employee_id_ = emp_id;
        DELETE FROM t_training_records WHERE employee_id_ = emp_id;
        DELETE FROM t_employee_history WHERE employee_id_ = emp_id;
        DELETE FROM t_employees WHERE employee_id_ = emp_id;

        SET deleted_count = deleted_count + 1;
    END LOOP;

    -- 关闭游标
    CLOSE emp_cursor;

    -- 提交事务
    COMMIT;

    -- 输出结果
    SELECT CONCAT('Deleted ', deleted_count, ' employees and related data') AS result;
END$$
DELIMITER ;

-- 业务场景15:使用EXISTS的相关删除
-- 业务需求:删除特定条件的员工相关数据
-- MySQL实现:使用EXISTS子查询确保数据一致性

DELETE FROM t_sales s
WHERE EXISTS (
    SELECT 1 FROM t_employees e
    WHERE e.employee_id_ = s.employee_id_
      AND e.status_ = 'TERMINATED'
      AND e.hire_date_ < '2020-01-01'
);

DELETE FROM t_sales_targets st
WHERE EXISTS (
    SELECT 1 FROM t_employees e
    WHERE e.employee_id_ = st.employee_id_
      AND e.status_ = 'TERMINATED'
      AND e.hire_date_ < '2020-01-01'
);

-- 业务场景16:基于业绩的条件删除
-- 业务需求:删除低绩效员工及其相关数据
-- MySQL实现:使用子查询识别低绩效员工(注意:若存在外键引用,应先按依赖顺序删除子表数据)

DELETE FROM t_employees
WHERE employee_id_ IN (
    SELECT emp_id FROM (
        SELECT
            e2.employee_id_ as emp_id,
            IFNULL(SUM(s.amount_), 0) as total_sales,
            COUNT(s.sale_id_) as sale_count
        FROM t_employees e2
        LEFT JOIN t_sales s ON e2.employee_id_ = s.employee_id_
            AND s.sale_date_ >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
        WHERE e2.department_id_ = 1  -- 销售部门
          AND e2.status_ = 'ACTIVE'
        GROUP BY e2.employee_id_
        HAVING IFNULL(SUM(s.amount_), 0) < 50000 OR COUNT(s.sale_id_) < 10
    ) low_performers
);

-- 业务场景17:分步骤的安全删除流程
-- 业务需求:先标记后删除,确保数据安全
-- MySQL实现:两步操作,先更新状态再删除

-- 第一步:标记低绩效员工为离职状态
UPDATE t_employees e
JOIN (
    SELECT
        e2.employee_id_,
        IFNULL(SUM(s.amount_), 0) as total_sales
    FROM t_employees e2
    LEFT JOIN t_sales s ON e2.employee_id_ = s.employee_id_
        AND s.sale_date_ >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
    WHERE e2.department_id_ = 1
    GROUP BY e2.employee_id_
    HAVING total_sales < 30000
) performance ON e.employee_id_ = performance.employee_id_
SET e.status_ = 'TERMINATED',
    e.updated_at_ = NOW();

-- 第二步:删除已标记的员工(可选)
-- DELETE FROM t_employees WHERE status_ = 'TERMINATED' AND updated_at_ >= CURDATE();

-- MySQL多表删除的性能优化策略:
-- 1. 按外键依赖顺序删除,避免约束冲突
-- 2. 使用LIMIT分批处理,避免长时间锁表
-- 3. 监控binlog增长情况,控制日志大小
-- 4. 考虑使用分区表提高删除性能(见下方示例)
-- 5. 删除后执行OPTIMIZE TABLE清理空间
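-- 分区表删除示例(假设存在按sale_date_年份做RANGE分区的归档表t_sales_archive):
-- 删除整个分区是DDL操作,远快于逐行DELETE,且不产生逐行的undo/binlog开销
ALTER TABLE t_sales_archive DROP PARTITION p2019;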
-- 业务场景18:MySQL标准的多表删除语法
-- 业务需求:删除离职员工及其相关数据
-- MySQL实现:使用JOIN语法进行多表删除

-- MySQL多表删除的正确实现(按依赖顺序删除)
-- 删除2020年前入职的已离职员工及其相关数据

-- 步骤1:删除销售记录
DELETE s FROM t_sales s
JOIN t_employees e ON s.employee_id_ = e.employee_id_
WHERE e.status_ = 'TERMINATED'
  AND e.hire_date_ < '2020-01-01';

-- 步骤2:删除销售目标
DELETE st FROM t_sales_targets st
JOIN t_employees e ON st.employee_id_ = e.employee_id_
WHERE e.status_ = 'TERMINATED'
  AND e.hire_date_ < '2020-01-01';

-- 步骤3:删除培训记录
DELETE tr FROM t_training_records tr
JOIN t_employees e ON tr.employee_id_ = e.employee_id_
WHERE e.status_ = 'TERMINATED'
  AND e.hire_date_ < '2020-01-01';

-- 步骤4:删除历史记录
DELETE eh FROM t_employee_history eh
JOIN t_employees e ON eh.employee_id_ = e.employee_id_
WHERE e.status_ = 'TERMINATED'
  AND e.hire_date_ < '2020-01-01';

-- 步骤5:最后删除员工记录
DELETE FROM t_employees
WHERE status_ = 'TERMINATED'
  AND hire_date_ < '2020-01-01';

-- MySQL存储过程实现安全的多表删除
DELIMITER $$
CREATE PROCEDURE DeleteEmployeeCascade(IN p_employee_id INT)
BEGIN
    DECLARE v_count INT DEFAULT 0;
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    -- 检查员工是否存在
    SELECT COUNT(*) INTO v_count
    FROM t_employees
    WHERE employee_id_ = p_employee_id;

    IF v_count = 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Employee not found';
    END IF;

    -- 删除相关数据(按依赖顺序)
    DELETE FROM t_sales WHERE employee_id_ = p_employee_id;
    SET v_count = ROW_COUNT();
    SELECT CONCAT('Deleted ', v_count, ' sales records') as info;

    DELETE FROM t_sales_targets WHERE employee_id_ = p_employee_id;
    SET v_count = ROW_COUNT();
    SELECT CONCAT('Deleted ', v_count, ' sales targets') as info;

    DELETE FROM t_training_records WHERE employee_id_ = p_employee_id;
    SET v_count = ROW_COUNT();
    SELECT CONCAT('Deleted ', v_count, ' training records') as info;

    DELETE FROM t_employee_history WHERE employee_id_ = p_employee_id;
    SET v_count = ROW_COUNT();
    SELECT CONCAT('Deleted ', v_count, ' history records') as info;

    -- 删除员工记录
    DELETE FROM t_employees WHERE employee_id_ = p_employee_id;
    SET v_count = ROW_COUNT();
    SELECT CONCAT('Deleted ', v_count, ' employee record') as info;

    COMMIT;
    SELECT 'Employee deletion completed successfully' as result;
END $$
DELIMITER ;

-- 使用存储过程删除员工
CALL DeleteEmployeeCascade(123);

-- MySQL批量删除存储过程
-- 注意:前文已创建过同名过程,重建前需先删除,否则CREATE会报错
DROP PROCEDURE IF EXISTS BatchDeleteTerminatedEmployees;
DELIMITER $$
CREATE PROCEDURE BatchDeleteTerminatedEmployees(IN p_batch_size INT)
BEGIN
    DECLARE v_total_deleted INT DEFAULT 0;
    DECLARE v_batch_count INT DEFAULT 0;
    DECLARE done INT DEFAULT FALSE;

    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    -- 多表DELETE不支持LIMIT,先把一批待删员工ID放入临时表,再关联删除
    CREATE TEMPORARY TABLE IF NOT EXISTS tmp_del_batch (employee_id_ INT PRIMARY KEY);

    batch_loop: LOOP
        START TRANSACTION;

        -- 取出本批待删除的员工ID
        DELETE FROM tmp_del_batch;
        INSERT INTO tmp_del_batch
        SELECT employee_id_ FROM t_employees
        WHERE status_ = 'TERMINATED'
        ORDER BY employee_id_
        LIMIT p_batch_size;

        -- 按外键依赖顺序删除这一批员工的相关数据
        DELETE s FROM t_sales s
        JOIN tmp_del_batch b ON s.employee_id_ = b.employee_id_;

        DELETE st FROM t_sales_targets st
        JOIN tmp_del_batch b ON st.employee_id_ = b.employee_id_;

        DELETE tr FROM t_training_records tr
        JOIN tmp_del_batch b ON tr.employee_id_ = b.employee_id_;

        DELETE eh FROM t_employee_history eh
        JOIN tmp_del_batch b ON eh.employee_id_ = b.employee_id_;

        -- 最后删除员工记录
        DELETE e FROM t_employees e
        JOIN tmp_del_batch b ON e.employee_id_ = b.employee_id_;

        SET v_batch_count = ROW_COUNT();
        SET v_total_deleted = v_total_deleted + v_batch_count;

        COMMIT;

        -- 如果没有更多记录需要删除,退出循环
        IF v_batch_count = 0 THEN
            LEAVE batch_loop;
        END IF;

        -- 短暂休息
        SELECT SLEEP(0.1);

    END LOOP;

    SELECT CONCAT('Total deleted: ', v_total_deleted, ' employees') as result;
END $$
DELIMITER ;

-- 执行批量删除
CALL BatchDeleteTerminatedEmployees(500);

-- 业务场景19:软删除替代方案
-- 业务需求:保留数据历史,避免误删除
-- MySQL实现:使用状态标记替代物理删除

UPDATE t_employees
SET status_ = 'DELETED',
    updated_at_ = CURRENT_TIMESTAMP,
    deleted_at_ = CURRENT_TIMESTAMP
WHERE status_ = 'TERMINATED'
  AND hire_date_ < '2020-01-01';

-- 创建视图隐藏已删除的记录
CREATE VIEW active_employees AS
SELECT *
FROM t_employees
WHERE status_ != 'DELETED' OR status_ IS NULL;

-- MySQL多表删除的最佳实践总结:
-- 1. 按外键依赖顺序删除,避免约束冲突
-- 2. 大批量删除时使用LIMIT分批处理
-- 3. 监控binlog日志增长情况(见下方示例)
-- 4. 删除后执行OPTIMIZE TABLE清理空间
-- 5. 考虑使用软删除避免数据丢失
-- 6. 使用事务确保删除操作的原子性
-- 7. 删除前备份重要数据
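-- binlog增长监控示例:列出当前全部binlog文件及其大小
SHOW BINARY LOGS;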

-- 删除后的维护操作
OPTIMIZE TABLE t_employees;
OPTIMIZE TABLE t_sales;
OPTIMIZE TABLE t_sales_targets;

4.6 多表操作的性能分析和最佳实践

4.6.1 性能影响因素分析

索引对多表操作的影响:

-- 多表更新中的索引使用分析
-- 场景:根据销售业绩更新员工薪资

-- 1. 确保JOIN条件有适当的索引
CREATE INDEX idx_sales_employee_date ON t_sales(employee_id_, sale_date_);
CREATE INDEX idx_employees_dept_status ON t_employees(department_id_, status_);

-- 2. 分析执行计划(以MySQL为例)
EXPLAIN FORMAT=JSON
UPDATE t_employees e
JOIN (
    SELECT
        employee_id_,
        SUM(amount_) as total_sales
    FROM t_sales
    WHERE sale_date_ >= '2023-01-01'
    GROUP BY employee_id_
    HAVING SUM(amount_) > 100000
) s ON e.employee_id_ = s.employee_id_
SET e.salary_ = e.salary_ * 1.1;

-- 执行计划分析要点:
-- - 检查是否使用了索引扫描而非全表扫描
-- - 关注JOIN算法的选择(Nested Loop vs Hash Join)
-- - 注意临时表的使用情况
-- - 观察行数估算的准确性
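-- 也可用EXPLAIN ANALYZE(MySQL 8.0.18+)实际验证上述要点:
-- 把UPDATE的JOIN部分改写成等价SELECT,即可看到每个算子的实际行数与耗时
EXPLAIN ANALYZE
SELECT e.employee_id_, s.total_sales
FROM t_employees e
JOIN (
    SELECT
        employee_id_,
        SUM(amount_) as total_sales
    FROM t_sales
    WHERE sale_date_ >= '2023-01-01'
    GROUP BY employee_id_
    HAVING SUM(amount_) > 100000
) s ON e.employee_id_ = s.employee_id_;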

-- 3. 索引优化建议
-- 为多表操作创建覆盖索引
CREATE INDEX idx_sales_covering ON t_sales(employee_id_, sale_date_, amount_);

-- 这样可以避免回表查询,提高性能
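-- 可用EXPLAIN验证覆盖索引是否生效:Extra列出现"Using index"即表示未回表
EXPLAIN SELECT employee_id_, sale_date_, amount_
FROM t_sales
WHERE employee_id_ = 1001
  AND sale_date_ >= '2023-01-01';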

锁机制对多表操作的影响:

-- 多表操作中的锁分析

-- MySQL中的锁影响
-- 1. 行锁 vs 表锁
SELECT @@innodb_lock_wait_timeout;  -- 查看锁等待超时时间

-- 2. 减少锁等待的策略
-- 按主键顺序更新,避免死锁
UPDATE t_employees
SET salary_ = salary_ * 1.1
WHERE employee_id_ IN (1, 2, 3, 4, 5)  -- 按ID顺序
ORDER BY employee_id_;  -- 确保按顺序加锁

-- 3. 使用较低的隔离级别(如果业务允许)
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 1. 查看锁等待情况(MySQL 8.0,基于sys库视图)
SELECT
    waiting_pid,
    waiting_query,
    blocking_pid,
    blocking_query,
    wait_age_secs,
    locked_table,
    locked_type
FROM sys.innodb_lock_waits;

-- 2. 减少锁竞争的策略
-- NOWAIT只能配合SELECT ... FOR UPDATE使用(MySQL 8.0+):拿不到锁立即报错,而不是长时间等待
START TRANSACTION;
SELECT employee_id_ FROM t_employees
WHERE department_id_ = 1
FOR UPDATE NOWAIT;

UPDATE t_employees SET salary_ = salary_ * 1.1
WHERE department_id_ = 1;
COMMIT;
4.6.2 多表操作的最佳实践

1. 操作顺序优化:

-- 正确的删除顺序(从子表到父表)
-- 错误的做法:先删除父表
DELETE FROM t_employees WHERE status_ = 'TERMINATED';  -- 可能违反外键约束

-- 正确的做法:按依赖关系删除
DELETE FROM t_sales WHERE employee_id_ IN (SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED');
DELETE FROM t_sales_targets WHERE employee_id_ IN (SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED');
DELETE FROM t_employees WHERE status_ = 'TERMINATED';

-- 插入顺序(从父表到子表)
INSERT INTO t_departments (department_name_, location_) VALUES ('New Dept', 'New York');
INSERT INTO t_employees (name_, department_id_) VALUES ('John Doe', LAST_INSERT_ID());

2. 事务管理最佳实践:

-- 合理的事务边界
-- 避免长事务
BEGIN;
-- 只包含相关的操作
UPDATE t_employees SET salary_ = salary_ * 1.1 WHERE department_id_ = 1;
UPDATE t_sales_targets SET target_amount_ = target_amount_ * 1.1 WHERE employee_id_ IN (
    SELECT employee_id_ FROM t_employees WHERE department_id_ = 1
);
COMMIT;

-- MySQL批量操作中的事务管理
-- 每处理一定数量的记录就提交一次
DELIMITER $$
CREATE PROCEDURE BatchDeleteWithTransactionControl(IN p_batch_size INT)
BEGIN
    DECLARE v_rows_processed INT DEFAULT 0;
    DECLARE v_batch_count INT DEFAULT 0;
    DECLARE done INT DEFAULT FALSE;

    batch_loop: LOOP
        START TRANSACTION;

        -- 删除一批销售记录(示例仅处理t_sales,实际应按全部子表的外键依赖顺序逐一处理)
        DELETE FROM t_sales
        WHERE employee_id_ IN (
            SELECT employee_id_ FROM (
                SELECT employee_id_ FROM t_employees
                WHERE status_ = 'TERMINATED'
                ORDER BY employee_id_
                LIMIT p_batch_size
            ) tmp
        );

        -- 删除一批员工记录
        DELETE FROM t_employees
        WHERE employee_id_ IN (
            SELECT employee_id_ FROM (
                SELECT employee_id_ FROM t_employees
                WHERE status_ = 'TERMINATED'
                ORDER BY employee_id_
                LIMIT p_batch_size
            ) tmp2
        );

        SET v_batch_count = ROW_COUNT();
        SET v_rows_processed = v_rows_processed + v_batch_count;

        COMMIT;

        -- 如果没有更多记录,退出循环
        IF v_batch_count = 0 THEN
            LEAVE batch_loop;
        END IF;

        -- 避免长时间占用资源,短暂休息
        IF v_rows_processed % 10000 = 0 THEN
            SELECT SLEEP(1);
        END IF;

    END LOOP;

    SELECT CONCAT('Total processed: ', v_rows_processed, ' records') as result;
END $$
DELIMITER ;

3. 错误处理和回滚策略:

-- MySQL错误处理和回滚策略示例
DELIMITER $$
CREATE PROCEDURE SafeDeleteEmployees()
BEGIN
    DECLARE v_employee_count INT DEFAULT 0;
    DECLARE v_sales_count INT DEFAULT 0;
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
        RESIGNAL;
    END;

    START TRANSACTION;

    -- 删除销售记录
    DELETE FROM t_sales WHERE employee_id_ IN (
        SELECT employee_id_ FROM t_employees WHERE status_ = 'TERMINATED'
    );
    SET v_sales_count = ROW_COUNT();

    -- 删除员工记录
    DELETE FROM t_employees WHERE status_ = 'TERMINATED';
    SET v_employee_count = ROW_COUNT();

    -- 检查结果的合理性
    IF v_employee_count = 0 THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'No employees were deleted, rolling back';
    END IF;

    COMMIT;

    SELECT CONCAT('Successfully deleted ', v_employee_count, ' employees and ', v_sales_count, ' sales records') as result;
END $$
DELIMITER ;

4. 性能监控和调优:

-- 监控多表操作的性能

-- MySQL 性能监控(标准化字段名)
SELECT
    SCHEMA_NAME as database_name,
    SUBSTRING(DIGEST_TEXT, 1, 100) as query_pattern,
    COUNT_STAR as execution_count,
    AVG_TIMER_WAIT/1000000000 as avg_time_seconds,
    SUM_TIMER_WAIT/1000000000 as total_time_seconds,
    SUM_ROWS_EXAMINED as total_rows_examined,
    SUM_ROWS_SENT as total_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE (DIGEST_TEXT LIKE '%UPDATE%t_employees%'
   OR DIGEST_TEXT LIKE '%DELETE%t_employees%')
  AND SCHEMA_NAME IS NOT NULL
ORDER BY AVG_TIMER_WAIT DESC;
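-- 如需从零开始观测一段时间内的语句表现,可先清空digest统计(需相应权限,历史数据会丢失)
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;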

5. 常见陷阱和避免方法:

-- 陷阱1:忘记WHERE条件导致全表更新
-- 错误示例
UPDATE t_employees SET salary_ = salary_ * 1.1;  -- 危险!更新所有员工

-- 正确做法:始终包含WHERE条件
UPDATE t_employees SET salary_ = salary_ * 1.1
WHERE department_id_ = 1 AND status_ = 'ACTIVE';

-- 陷阱2:外键约束导致的删除失败
-- 错误示例:直接删除被引用的记录
DELETE FROM t_departments WHERE department_id_ = 1;  -- 可能失败

-- 正确做法:先处理引用关系
UPDATE t_employees SET department_id_ = NULL WHERE department_id_ = 1;
-- 或者先删除引用记录
DELETE FROM t_employees WHERE department_id_ = 1;
DELETE FROM t_departments WHERE department_id_ = 1;

-- 陷阱3:大事务导致的锁等待
-- 错误示例:在一个事务中处理大量数据
BEGIN;
UPDATE t_employees SET salary_ = salary_ * 1.1;  -- 可能锁定大量行
-- ... 其他复杂操作 ...
COMMIT;

-- 正确做法:分批处理(MySQL实现,用存储过程循环,直到没有待更新记录)
DELIMITER $$
CREATE PROCEDURE BatchSalaryAdjust(IN p_batch_size INT)
BEGIN
    REPEAT
        UPDATE t_employees
        SET salary_ = salary_ * 1.1,
            salary_updated = 1
        WHERE salary_updated = 0
        ORDER BY employee_id_
        LIMIT p_batch_size;  -- 单表UPDATE支持ORDER BY和LIMIT
    UNTIL ROW_COUNT() = 0 END REPEAT;
END $$
DELIMITER ;

CALL BatchSalaryAdjust(1000);

以上对多表操作的详细分析和最佳实践,可以帮助开发者在实际项目中更好地处理复杂的数据操作需求,规避常见的性能问题和数据一致性问题。


7.4 数据库迁移注意事项

7.4.1 MySQL迁移策略和最佳实践

业务场景: 系统升级、数据中心迁移、云平台迁移、MySQL版本升级

-- MySQL迁移前的准备工作

-- 1. 检查当前MySQL版本和配置
SELECT VERSION() as mysql_version;
SHOW VARIABLES LIKE 'innodb%';
SHOW VARIABLES LIKE 'sql_mode';

-- 2. 分析数据库大小和表结构
SELECT
    table_schema as database_name,
    ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) as size_mb,
    COUNT(*) as table_count
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
GROUP BY table_schema
ORDER BY size_mb DESC;

-- 3. 检查存储引擎使用情况
SELECT
    engine,
    COUNT(*) as table_count,
    ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) as total_size_mb
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
GROUP BY engine;

-- 4. 检查字符集和排序规则
SELECT
    table_schema,
    table_name,
    table_collation,
    COUNT(*) as column_count
FROM information_schema.tables t
JOIN information_schema.columns c ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE t.table_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
GROUP BY table_schema, table_name, table_collation
ORDER BY table_schema, table_name;

-- 5. 检查外键约束
SELECT
    constraint_schema,
    table_name,
    constraint_name,
    referenced_table_name
FROM information_schema.referential_constraints
WHERE constraint_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys');

-- MySQL迁移数据导出
-- 使用mysqldump进行逻辑备份
-- mysqldump -u root -p --single-transaction --routines --triggers --events database_name > backup.sql
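-- 对应的恢复命令(逻辑备份直接用mysql客户端导入)
-- mysql -u root -p database_name < backup.sql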

-- 大表的分批导出策略
-- 对于超大表,使用WHERE条件分批导出
-- mysqldump -u root -p --single-transaction --where="id >= 1 AND id < 100000" database_name table_name > table_part1.sql
-- mysqldump -u root -p --single-transaction --where="id >= 100000 AND id < 200000" database_name table_name > table_part2.sql

-- 物理备份方案(适用于大数据量)
-- 使用MySQL Enterprise Backup或Percona XtraBackup
-- xtrabackup --backup --target-dir=/backup/full-backup
7.4.2 MySQL版本兼容性处理
-- MySQL 5.7 到 MySQL 8.0 迁移注意事项

-- 1. SQL_MODE变化处理
-- MySQL 8.0默认启用了更严格的SQL_MODE
SET sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO';

-- 检查可能受影响的查询
-- 说明:information_schema无法静态找出"GROUP BY未包含全部非聚合列"的查询,
-- 可行做法是在开启ONLY_FULL_GROUP_BY的测试环境回放业务SQL,
-- 或从语句摘要中筛出含GROUP BY的语句逐一审查
SELECT
    DIGEST_TEXT,
    COUNT_STAR
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%GROUP BY%'
ORDER BY COUNT_STAR DESC;

-- 2. 密码验证插件变化
-- MySQL 8.0使用caching_sha2_password作为默认认证插件
-- 如果需要兼容旧客户端,可以修改用户认证方式
ALTER USER 'username'@'host' IDENTIFIED WITH mysql_native_password BY 'password';
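-- 辅助检查:列出仍在使用非默认认证插件、需要评估兼容性的账号(需有读取mysql.user的权限)
SELECT user, host, plugin
FROM mysql.user
WHERE plugin <> 'caching_sha2_password';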

-- 3. 保留字变化检查
-- MySQL 8.0新增了一批保留字(多与窗口函数相关),检查表名和列名是否冲突
SELECT
    table_schema,
    table_name,
    column_name
FROM information_schema.columns
WHERE UPPER(column_name) IN ('RANK', 'DENSE_RANK', 'ROW_NUMBER', 'LEAD', 'LAG',
                             'FIRST_VALUE', 'LAST_VALUE', 'NTILE', 'CUME_DIST',
                             'PERCENT_RANK', 'GROUPING', 'OVER', 'WINDOW', 'LATERAL')
  AND table_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys');

-- 4. 字符集和排序规则升级
-- MySQL 8.0默认字符集从latin1改为utf8mb4
ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 批量修改表的字符集
SELECT CONCAT('ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;') as alter_sql
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_type = 'BASE TABLE';

-- 5. 时间戳默认值处理
-- MySQL 8.0对TIMESTAMP的默认值处理更严格
-- 检查可能有问题的TIMESTAMP列
SELECT
    table_schema,
    table_name,
    column_name,
    column_default,
    is_nullable
FROM information_schema.columns
WHERE data_type = 'timestamp'
  AND column_default IS NULL
  AND is_nullable = 'NO'
  AND table_schema NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys');
7.4.3 MySQL数据迁移性能优化
-- 大数据量迁移的性能优化策略

-- 1. 迁移前的性能调优
-- 临时调整MySQL配置以提高导入性能(导入完成后务必恢复)
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2GB,8.0中可在线调整
-- 注意:innodb_log_file_size不是动态参数,需修改配置文件后重启生效;
-- MySQL 8.0.30+可改用动态的innodb_redo_log_capacity:
-- SET GLOBAL innodb_redo_log_capacity = 2147483648;
SET GLOBAL innodb_flush_log_at_trx_commit = 2;    -- 降低持久性要求
SET GLOBAL sync_binlog = 0;                       -- 临时关闭binlog同步
SET SESSION foreign_key_checks = 0;               -- 在导入会话中临时关闭外键检查
SET SESSION unique_checks = 0;                    -- 在导入会话中临时关闭唯一性检查

-- 2. 分批迁移大表的策略
-- 创建迁移进度跟踪表
CREATE TABLE migration_progress (
    table_name VARCHAR(64) PRIMARY KEY,
    total_rows BIGINT,
    migrated_rows BIGINT DEFAULT 0,
    batch_size INT DEFAULT 10000,
    last_id BIGINT DEFAULT 0,
    start_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_update TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    status ENUM('PENDING', 'IN_PROGRESS', 'COMPLETED', 'FAILED') DEFAULT 'PENDING'
);

-- 分批迁移存储过程
DELIMITER $$
CREATE PROCEDURE MigrateTableInBatches(
    IN source_table VARCHAR(64),
    IN target_table VARCHAR(64),
    IN batch_size INT,
    IN primary_key_column VARCHAR(64)
)
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE current_id BIGINT DEFAULT 0;
    DECLARE max_id BIGINT;
    DECLARE batch_count INT DEFAULT 0;
    DECLARE total_migrated BIGINT DEFAULT 0;

    -- 获取最大ID
    SET @sql = CONCAT('SELECT MAX(', primary_key_column, ') INTO @max_id FROM ', source_table);
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET max_id = @max_id;

    -- 初始化进度记录(TABLE_ROWS是InnoDB的估算行数,仅用于展示进度)
    INSERT INTO migration_progress (table_name, total_rows, batch_size)
    SELECT table_name, table_rows, batch_size
    FROM information_schema.tables
    WHERE table_schema = DATABASE() AND table_name = source_table
    ON DUPLICATE KEY UPDATE
        total_rows = VALUES(total_rows),
        batch_size = VALUES(batch_size),
        status = 'IN_PROGRESS';

    migration_loop: WHILE current_id < max_id DO
        -- 分批插入数据
        SET @sql = CONCAT(
            'INSERT INTO ', target_table,
            ' SELECT * FROM ', source_table,
            ' WHERE ', primary_key_column, ' > ', current_id,
            ' AND ', primary_key_column, ' <= ', current_id + batch_size
        );

        PREPARE stmt FROM @sql;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;

        SET batch_count = ROW_COUNT();
        SET total_migrated = total_migrated + batch_count;
        SET current_id = current_id + batch_size;

        -- 更新进度
        UPDATE migration_progress
        SET migrated_rows = total_migrated,
            last_id = current_id,
            last_update = NOW()
        WHERE table_name = source_table;

        -- 短暂休息,避免系统负载过高
        SELECT SLEEP(0.1);

        -- 注意:主键ID不连续时某个区间可能是空批次,不能以batch_count=0作为结束条件,
        -- 循环由 current_id < max_id 的WHILE条件自然终止

    END WHILE;

    -- 标记完成
    UPDATE migration_progress
    SET status = 'COMPLETED',
        last_update = NOW()
    WHERE table_name = source_table;

    SELECT CONCAT('Migration completed for table: ', source_table, ', Total rows: ', total_migrated) as result;
END $$
DELIMITER ;
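-- 使用示例:按主键sale_id_将t_sales分批迁移到t_sales_new(目标表名为示例,需提前建好同构表)
CALL MigrateTableInBatches('t_sales', 't_sales_new', 10000, 'sale_id_');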

-- 3. 迁移后的数据验证
-- 数据一致性检查
SELECT
    'source_table' as table_type,
    COUNT(*) as row_count,
    SUM(CRC32(CONCAT_WS('|', col1, col2, col3))) as checksum
FROM source_table
UNION ALL
SELECT
    'target_table' as table_type,
    COUNT(*) as row_count,
    SUM(CRC32(CONCAT_WS('|', col1, col2, col3))) as checksum
FROM target_table;

-- 4. 迁移后的性能恢复
-- 恢复原始配置
SET SESSION foreign_key_checks = 1;
SET SESSION unique_checks = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;

-- 重建统计信息
ANALYZE TABLE target_table;

-- 检查索引使用情况
SELECT
    table_schema,
    table_name,
    index_name,
    cardinality,
    sub_part,
    packed,
    nullable,
    index_type
FROM information_schema.statistics
WHERE table_schema = DATABASE()
  AND table_name = 'target_table'
ORDER BY table_name, seq_in_index;
7.4.4 MySQL迁移常见问题和解决方案
-- 常见迁移问题的诊断和解决

-- 1. 字符集问题诊断
-- 检查数据中的字符集问题
SELECT
    table_schema,
    table_name,
    column_name,
    character_set_name,
    collation_name
FROM information_schema.columns
WHERE character_set_name IS NOT NULL
  AND table_schema = DATABASE()
ORDER BY table_name, ordinal_position;

-- 修复字符集问题
-- 先备份数据,然后修改字符集
ALTER TABLE problem_table MODIFY COLUMN text_column TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 2. 自增ID冲突解决
-- 检查自增ID的当前值
SELECT
    table_schema,
    table_name,
    auto_increment
FROM information_schema.tables
WHERE auto_increment IS NOT NULL
  AND table_schema = DATABASE();

-- 调整自增ID起始值
ALTER TABLE target_table AUTO_INCREMENT = 1000000;

-- 3. 外键约束问题
-- 临时禁用外键检查进行数据导入
SET foreign_key_checks = 0;
-- 执行数据导入
-- ...
SET foreign_key_checks = 1;

-- 检查外键约束的完整性
SELECT
    table_name,
    constraint_name,
    referenced_table_name,
    referenced_column_name
FROM information_schema.key_column_usage
WHERE referenced_table_name IS NOT NULL
  AND table_schema = DATABASE();

-- 4. 大事务导致的锁等待
-- 监控长时间运行的事务
SELECT
    p.id,
    p.user,
    p.host,
    p.db,
    p.command,
    p.time,
    p.state,
    p.info
FROM information_schema.processlist p
WHERE p.command != 'Sleep'
  AND p.time > 300  -- 超过5分钟的事务
ORDER BY p.time DESC;

-- 5. 迁移性能监控
-- 创建迁移性能监控视图
CREATE VIEW migration_performance AS
SELECT
    table_name,
    total_rows,
    migrated_rows,
    ROUND((migrated_rows / total_rows) * 100, 2) as progress_percent,
    batch_size,
    TIMESTAMPDIFF(SECOND, start_time, last_update) as elapsed_seconds,
    ROUND(migrated_rows / NULLIF(TIMESTAMPDIFF(SECOND, start_time, last_update), 0), 2) as rows_per_second,
    status
FROM migration_progress
WHERE status IN ('IN_PROGRESS', 'COMPLETED');

-- 查看迁移进度
SELECT * FROM migration_performance ORDER BY progress_percent DESC;
7.4.5 跨平台SQL兼容性处理
-- 业务场景:从其他数据库系统迁移到MySQL时的SQL语法兼容性处理

-- 1. PostgreSQL到MySQL的语法转换

-- PostgreSQL语法(不兼容)
-- CREATE OR REPLACE FUNCTION get_employee_count(dept_id INT)
-- RETURNS INT AS $$
-- BEGIN
--     RETURN (SELECT COUNT(*) FROM employees WHERE department_id = dept_id);
-- END;
-- $$ LANGUAGE plpgsql;

-- ✅ MySQL兼容语法
DELIMITER $$
CREATE FUNCTION get_employee_count(dept_id INT)
RETURNS INT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE emp_count INT DEFAULT 0;
    SELECT COUNT(*) INTO emp_count
    FROM t_employees
    WHERE department_id_ = dept_id;
    RETURN emp_count;
END $$
DELIMITER ;

-- 2. Oracle到MySQL的语法转换

-- Oracle语法(不兼容)
-- SELECT employee_id, name,
--        ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rank
-- FROM employees
-- WHERE ROWNUM <= 10;

-- ✅ MySQL兼容语法
SELECT
    employee_id_,
    name_,
    ROW_NUMBER() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as rank_num
FROM t_employees
ORDER BY department_id_, salary_ DESC
LIMIT 10;

-- 3. SQL Server到MySQL的语法转换

-- SQL Server语法(不兼容)
-- SELECT TOP 10 employee_id, name, salary
-- FROM employees
-- WHERE department_id = 1
-- ORDER BY salary DESC;

-- ✅ MySQL兼容语法
SELECT employee_id_, name_, salary_
FROM t_employees
WHERE department_id_ = 1
ORDER BY salary_ DESC
LIMIT 10;

-- 4. 日期函数兼容性处理

-- PostgreSQL/SQL Server语法(不兼容)
-- SELECT * FROM employees WHERE EXTRACT(YEAR FROM hire_date) = 2023;
-- SELECT * FROM employees WHERE YEAR(hire_date) = 2023;  -- SQL Server

-- ✅ MySQL兼容语法
SELECT * FROM t_employees WHERE YEAR(hire_date_) = 2023;
-- 或者使用更高效的范围查询
SELECT * FROM t_employees
WHERE hire_date_ >= '2023-01-01'
  AND hire_date_ < '2024-01-01';

-- 5. 字符串函数兼容性处理

-- PostgreSQL语法(不兼容)
-- SELECT * FROM employees WHERE name ILIKE '%john%';

-- ✅ MySQL兼容语法
-- 说明:utf8mb4的*_ci排序规则本身不区分大小写,通常直接用LIKE即可
SELECT * FROM t_employees WHERE name_ LIKE '%john%';
-- 若列使用了区分大小写的排序规则,可用UPPER()配合函数索引(MySQL 8.0.13+)
-- 注意:'%john%'这类前置通配符无法利用B-tree索引,函数索引只对'JOHN%'前缀匹配有效
CREATE INDEX idx_name_upper ON t_employees ((UPPER(name_)));

-- 6. 递归查询兼容性(MySQL 8.0+)

-- PostgreSQL语法
-- WITH RECURSIVE employee_hierarchy AS (
--     SELECT employee_id, name, manager_id, 1 as level
--     FROM employees WHERE manager_id IS NULL
--     UNION ALL
--     SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
--     FROM employees e
--     JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
-- )
-- SELECT * FROM employee_hierarchy;

-- ✅ MySQL 8.0兼容语法
WITH RECURSIVE employee_hierarchy AS (
    SELECT employee_id_, name_, manager_id_, 1 as level_
    FROM t_employees WHERE manager_id_ IS NULL
    UNION ALL
    SELECT e.employee_id_, e.name_, e.manager_id_, eh.level_ + 1
    FROM t_employees e
    JOIN employee_hierarchy eh ON e.manager_id_ = eh.employee_id_
)
SELECT * FROM employee_hierarchy;
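-- 补充:递归CTE的迭代次数受cte_max_recursion_depth限制(默认1000),组织层级很深时可按需调大
SET SESSION cte_max_recursion_depth = 10000;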

-- 7. 窗口函数兼容性处理

-- Oracle语法(部分不兼容)
-- SELECT employee_id, salary,
--        FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC
--                                  ROWS UNBOUNDED PRECEDING) as max_salary
-- FROM employees;

-- ✅ MySQL 8.0兼容语法
SELECT
    employee_id_,
    salary_,
    FIRST_VALUE(salary_) OVER (
        PARTITION BY department_id_
        ORDER BY salary_ DESC
        ROWS UNBOUNDED PRECEDING
    ) as max_salary
FROM t_employees;

-- 8. 批量操作兼容性处理

-- PostgreSQL语法(不兼容)
-- INSERT INTO employees (name, department_id)
-- VALUES ('John', 1), ('Jane', 2)
-- ON CONFLICT (employee_id) DO UPDATE SET
--     name = EXCLUDED.name,
--     department_id = EXCLUDED.department_id;

-- ✅ MySQL兼容语法
INSERT INTO t_employees (name_, department_id_)
VALUES ('John', 1), ('Jane', 2)
ON DUPLICATE KEY UPDATE
    name_ = VALUES(name_),
    department_id_ = VALUES(department_id_);
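-- 补充:VALUES()写法自MySQL 8.0.20起已废弃,8.0.19+推荐使用行别名
INSERT INTO t_employees (name_, department_id_)
VALUES ('John', 1), ('Jane', 2) AS new_rows
ON DUPLICATE KEY UPDATE
    name_ = new_rows.name_,
    department_id_ = new_rows.department_id_;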

-- 9. 事务隔离级别兼容性

-- PostgreSQL语法
-- SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- ✅ MySQL兼容语法
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- 或者设置会话级别
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 10. 索引创建兼容性

-- PostgreSQL语法(部分不兼容)
-- CREATE INDEX CONCURRENTLY idx_employee_name ON employees (name);

-- ✅ MySQL兼容语法(MySQL 8.0.12+支持在线DDL)
CREATE INDEX idx_employee_name ON t_employees (name_);
-- 对于大表,使用在线DDL
ALTER TABLE t_employees ADD INDEX idx_employee_name (name_), ALGORITHM=INPLACE, LOCK=NONE;
7.4.6 数据类型映射和转换
-- 业务场景:从其他数据库系统迁移到MySQL时的数据类型映射和转换

-- 1. PostgreSQL到MySQL数据类型映射

-- PostgreSQL: SERIAL -> MySQL: INT AUTO_INCREMENT
-- PostgreSQL语法
-- CREATE TABLE employees (
--     id SERIAL PRIMARY KEY,
--     name VARCHAR(100)
-- );

-- ✅ MySQL兼容语法
CREATE TABLE t_employees (
    employee_id_ INT AUTO_INCREMENT PRIMARY KEY,
    name_ VARCHAR(100)
);

-- PostgreSQL: BOOLEAN -> MySQL: TINYINT(1) 或 BOOLEAN
-- PostgreSQL语法
-- ALTER TABLE employees ADD COLUMN is_active BOOLEAN DEFAULT TRUE;

-- ✅ MySQL兼容语法
ALTER TABLE t_employees ADD COLUMN is_active_ BOOLEAN DEFAULT TRUE;
-- 或者使用TINYINT
ALTER TABLE t_employees ADD COLUMN is_active_ TINYINT(1) DEFAULT 1;

-- PostgreSQL: TEXT -> MySQL: TEXT 或 LONGTEXT
-- PostgreSQL语法
-- ALTER TABLE employees ADD COLUMN description TEXT;

-- ✅ MySQL兼容语法
ALTER TABLE t_employees ADD COLUMN description_ TEXT;
-- 对于更大的文本,使用LONGTEXT
ALTER TABLE t_employees ADD COLUMN large_description_ LONGTEXT;

-- 2. Oracle到MySQL数据类型映射

-- Oracle: NUMBER -> MySQL: DECIMAL/INT
-- Oracle语法
-- CREATE TABLE products (
--     id NUMBER(10),
--     price NUMBER(10,2),
--     quantity NUMBER
-- );

-- ✅ MySQL兼容语法
CREATE TABLE t_products (
    product_id_ INT,
    price_ DECIMAL(10,2),
    quantity_ INT
);

-- Oracle: VARCHAR2 -> MySQL: VARCHAR
-- Oracle语法
-- ALTER TABLE products ADD product_name VARCHAR2(255);

-- ✅ MySQL兼容语法
ALTER TABLE t_products ADD COLUMN product_name_ VARCHAR(255);

-- Oracle: CLOB -> MySQL: LONGTEXT
-- Oracle语法
-- ALTER TABLE products ADD description CLOB;

-- ✅ MySQL兼容语法
ALTER TABLE t_products ADD COLUMN description_ LONGTEXT;

-- Oracle: DATE -> MySQL: DATETIME
-- Oracle语法
-- ALTER TABLE products ADD created_date DATE;

-- ✅ MySQL兼容语法
ALTER TABLE t_products ADD COLUMN created_date_ DATETIME DEFAULT CURRENT_TIMESTAMP;

-- 3. SQL Server到MySQL数据类型映射

-- SQL Server: IDENTITY -> MySQL: AUTO_INCREMENT
-- SQL Server语法
-- CREATE TABLE customers (
--     id INT IDENTITY(1,1) PRIMARY KEY,
--     name NVARCHAR(100)
-- );

-- ✅ MySQL兼容语法
CREATE TABLE t_customers (
    customer_id_ INT AUTO_INCREMENT PRIMARY KEY,
    name_ VARCHAR(100) CHARACTER SET utf8mb4
);

-- SQL Server: NVARCHAR -> MySQL: VARCHAR with utf8mb4
-- SQL Server语法
-- ALTER TABLE customers ADD address NVARCHAR(500);

-- ✅ MySQL兼容语法
ALTER TABLE t_customers ADD COLUMN address_ VARCHAR(500) CHARACTER SET utf8mb4;

-- SQL Server: DATETIME2 -> MySQL: DATETIME(6)
-- SQL Server语法
-- ALTER TABLE customers ADD created_at DATETIME2;

-- ✅ MySQL兼容语法
ALTER TABLE t_customers ADD COLUMN created_at_ DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6);

-- SQL Server: BIT -> MySQL: TINYINT(1)
-- SQL Server语法
-- ALTER TABLE customers ADD is_vip BIT DEFAULT 0;

-- ✅ MySQL兼容语法
ALTER TABLE t_customers ADD COLUMN is_vip_ TINYINT(1) DEFAULT 0;

-- 4. 数据类型转换最佳实践

-- 创建数据类型映射参考表
CREATE TABLE data_type_mapping (
    source_db VARCHAR(20),
    source_type VARCHAR(50),
    mysql_type VARCHAR(50),
    notes TEXT,
    example_conversion TEXT
);

INSERT INTO data_type_mapping VALUES
('PostgreSQL', 'SERIAL', 'INT AUTO_INCREMENT', '自增主键', 'id SERIAL -> employee_id_ INT AUTO_INCREMENT'),
('PostgreSQL', 'BOOLEAN', 'TINYINT(1)', '布尔值', 'is_active BOOLEAN -> is_active_ TINYINT(1)'),
('PostgreSQL', 'TEXT', 'TEXT/LONGTEXT', '长文本', 'description TEXT -> description_ TEXT'),
('Oracle', 'NUMBER(p,s)', 'DECIMAL(p,s)', '精确数值', 'price NUMBER(10,2) -> price_ DECIMAL(10,2)'),
('Oracle', 'VARCHAR2(n)', 'VARCHAR(n)', '变长字符串', 'name VARCHAR2(100) -> name_ VARCHAR(100)'),
('Oracle', 'CLOB', 'LONGTEXT', '大文本对象', 'content CLOB -> content_ LONGTEXT'),
('SQL Server', 'IDENTITY', 'AUTO_INCREMENT', '自增标识', 'id INT IDENTITY -> id_ INT AUTO_INCREMENT'),
('SQL Server', 'NVARCHAR(n)', 'VARCHAR(n) utf8mb4', 'Unicode字符串', 'name NVARCHAR(100) -> name_ VARCHAR(100) utf8mb4'),
('SQL Server', 'DATETIME2', 'DATETIME(6)', '高精度日期时间', 'created DATETIME2 -> created_ DATETIME(6)');

-- 查看数据类型映射参考
SELECT * FROM data_type_mapping WHERE source_db = 'PostgreSQL';

-- 5. 字符集和排序规则转换

-- 设置数据库默认字符集
ALTER DATABASE your_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 转换现有表的字符集
ALTER TABLE t_employees CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- 检查字符集转换结果
SELECT
    table_name,
    table_collation,
    column_name,
    character_set_name,
    collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 't_employees'
  AND data_type IN ('varchar', 'char', 'text');

-- 6. 数值精度转换处理

-- 创建精度转换验证函数
DELIMITER $$
CREATE FUNCTION validate_numeric_precision(
    original_value DECIMAL(65,30),
    target_precision INT,
    target_scale INT
) RETURNS BOOLEAN
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE max_value DECIMAL(65,30);
    DECLARE min_value DECIMAL(65,30);

    SET max_value = POW(10, target_precision - target_scale) - POW(10, -target_scale);
    SET min_value = -max_value;

    RETURN (original_value BETWEEN min_value AND max_value);
END $$
DELIMITER ;

-- 使用示例:验证Oracle NUMBER(10,2)到MySQL DECIMAL(10,2)的转换
SELECT
    product_id_,
    price_,
    validate_numeric_precision(price_, 10, 2) as is_valid_precision
FROM t_products
WHERE NOT validate_numeric_precision(price_, 10, 2);

-- 7. 日期时间格式转换

-- 创建日期格式转换函数
DELIMITER $$
CREATE FUNCTION convert_oracle_date(oracle_date_str VARCHAR(50))
RETURNS DATETIME
DETERMINISTIC
BEGIN
    -- Oracle: DD-MON-YYYY -> MySQL: YYYY-MM-DD HH:MM:SS
    DECLARE mysql_datetime DATETIME;

    -- 简化示例:实际实现需要处理各种Oracle日期格式
    SET mysql_datetime = STR_TO_DATE(oracle_date_str, '%d-%b-%Y');

    RETURN mysql_datetime;
END $$
DELIMITER ;

-- 使用示例
SELECT convert_oracle_date('15-JAN-2023') as converted_date;

-- 8. 数据类型转换验证脚本

-- 创建转换验证存储过程
-- 注意:参数名必须与information_schema.columns的列名区分开(此处加p_前缀),
-- 否则 WHERE table_name = table_name 恒为真,验证结果不可靠
DELIMITER $$
CREATE PROCEDURE validate_data_type_conversion(
    IN p_table_name VARCHAR(64),
    IN p_column_name VARCHAR(64),
    IN p_expected_type VARCHAR(50)
)
BEGIN
    DECLARE actual_type VARCHAR(50);

    SELECT data_type INTO actual_type
    FROM information_schema.columns
    WHERE table_schema = DATABASE()
      AND table_name = p_table_name
      AND column_name = p_column_name;

    IF actual_type = p_expected_type THEN
        SELECT CONCAT('✅ ', p_table_name, '.', p_column_name, ' 类型转换正确: ', actual_type) as result;
    ELSE
        SELECT CONCAT('❌ ', p_table_name, '.', p_column_name, ' 类型转换错误: 期望 ', p_expected_type, ', 实际 ', actual_type) as result;
    END IF;
END $$
DELIMITER ;

-- 验证转换结果
CALL validate_data_type_conversion('t_employees', 'employee_id_', 'int');
CALL validate_data_type_conversion('t_employees', 'name_', 'varchar');
CALL validate_data_type_conversion('t_employees', 'salary_', 'decimal');

5. 性能优化实践

性能优化是数据库管理的核心技能,需要深入理解MySQL的特性和优化策略。本章将详细介绍MySQL 8.0的特定优化技巧。

5.1 MySQL 8.0 特定优化

MySQL 8.0引入了许多新特性和改进,为性能优化提供了更多选择。

5.1.1 InnoDB存储引擎优化
-- InnoDB缓冲池优化
-- 查看缓冲池状态
SELECT
    pool_id,
    pool_size,
    free_buffers,
    database_pages,
    old_database_pages,
    modified_database_pages
FROM information_schema.innodb_buffer_pool_stats;

-- 缓冲池命中率(修复版本:处理除零错误和数据类型转换)
SELECT
    CASE
        WHEN CAST(b.Innodb_buffer_pool_read_requests AS UNSIGNED) = 0 THEN 0
        ELSE ROUND((1 - (CAST(a.Innodb_buffer_pool_reads AS UNSIGNED) / CAST(b.Innodb_buffer_pool_read_requests AS UNSIGNED))) * 100, 2)
    END as buffer_pool_hit_rate,
    CAST(a.Innodb_buffer_pool_reads AS UNSIGNED) as total_reads,
    CAST(b.Innodb_buffer_pool_read_requests AS UNSIGNED) as total_requests
FROM
    (SELECT variable_value as Innodb_buffer_pool_reads FROM performance_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_reads') a,
    (SELECT variable_value as Innodb_buffer_pool_read_requests FROM performance_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_read_requests') b;

-- InnoDB配置优化示例
[mysqld]
# 缓冲池大小(建议设为专用数据库服务器物理内存的70-80%)
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8

# 日志文件优化
innodb_log_file_size = 1G
innodb_log_buffer_size = 64M
# 设为2时每秒刷盘一次,以牺牲部分持久性换取写入性能;对持久性要求高的场景应保持默认值1
innodb_flush_log_at_trx_commit = 2

# 并发优化
innodb_thread_concurrency = 0
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# 页面大小(只能在初始化数据目录时设置,实例创建后不可更改)
innodb_page_size = 16K

# 自适应哈希索引
innodb_adaptive_hash_index = ON
-- 查看InnoDB状态
SHOW ENGINE INNODB STATUS;

-- 分析表碎片
SELECT
    table_schema,
    table_name,
    ROUND(((data_length + index_length) / 1024 / 1024), 2) as table_size_mb,
    ROUND((data_free / 1024 / 1024), 2) as free_space_mb,
    ROUND((data_free / (data_length + index_length)) * 100, 2) as fragmentation_percent
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND data_free > 0
ORDER BY fragmentation_percent DESC;

-- 优化表碎片
OPTIMIZE TABLE t_employees;
ALTER TABLE t_employees ENGINE=InnoDB;

-- MySQL 8.0 不可见索引特性
CREATE INDEX idx_emp_invisible ON t_employees (hire_date_) INVISIBLE;

-- 测试查询性能(索引不可见)
EXPLAIN SELECT * FROM t_employees WHERE hire_date_ > '2020-01-01';

-- 使索引可见
ALTER TABLE t_employees ALTER INDEX idx_emp_invisible VISIBLE;

-- MySQL 8.0 降序索引
CREATE INDEX idx_salary_desc ON t_employees (salary_ DESC, hire_date_ ASC);

-- MySQL 8.0 函数索引
CREATE INDEX idx_upper_name ON t_employees ((UPPER(name_)));
SELECT * FROM t_employees WHERE UPPER(name_) = 'JOHN SMITH';

-- MySQL 8.0 多值索引(JSON数组)
ALTER TABLE t_employees ADD COLUMN skills JSON;
CREATE INDEX idx_skills ON t_employees ((CAST(skills->'$[*]' AS CHAR(50) ARRAY)));
5.1.2 查询缓存和缓冲池调优
-- MySQL 8.0移除了查询缓存,但可以使用其他缓存策略

-- 预编译语句缓存
-- 查看预编译语句缓存状态
SELECT
    variable_name,
    variable_value
FROM performance_schema.global_status
WHERE variable_name LIKE 'Com_stmt%'
   OR variable_name LIKE 'Prepared_stmt%';

-- 临时表优化
SELECT
    variable_name,
    variable_value
FROM performance_schema.global_status
WHERE variable_name IN (
    'Created_tmp_tables',
    'Created_tmp_disk_tables'
);

-- 如果Created_tmp_disk_tables占比过高,需要调大内存临时表上限
-- 注意:SET GLOBAL必须使用字节数,不支持'256M'这类带单位的写法
-- SET GLOBAL tmp_table_size = 268435456;      -- 256MB
-- SET GLOBAL max_heap_table_size = 268435456; -- 256MB

-- 排序缓冲区优化
SELECT
    variable_name,
    variable_value
FROM performance_schema.global_status
WHERE variable_name LIKE 'Sort%';
5.1.3 MySQL 8.0新特性应用
-- MySQL 8.0 窗口函数高级应用
-- 计算每个部门的薪资排名和百分位数
SELECT
    employee_id_,
    name_,
    department_id_,
    salary_,
    ROW_NUMBER() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as salary_rank,
    RANK() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as salary_rank_with_ties,
    PERCENT_RANK() OVER (PARTITION BY department_id_ ORDER BY salary_) as salary_percentile,
    CUME_DIST() OVER (PARTITION BY department_id_ ORDER BY salary_) as cumulative_dist,
    NTILE(4) OVER (PARTITION BY department_id_ ORDER BY salary_) as salary_quartile
FROM t_employees;

-- 使用LAG和LEAD函数分析薪资变化趋势
SELECT
    employee_id_,
    name_,
    salary_,
    LAG(salary_, 1) OVER (PARTITION BY employee_id_ ORDER BY hire_date_) as prev_salary,
    LEAD(salary_, 1) OVER (PARTITION BY employee_id_ ORDER BY hire_date_) as next_salary,
    salary_ - LAG(salary_, 1) OVER (PARTITION BY employee_id_ ORDER BY hire_date_) as salary_increase
FROM t_employee_history;

-- MySQL 8.0 递归CTE(公用表表达式)
-- 构建部门层次结构
WITH RECURSIVE dept_hierarchy AS (
    -- 锚点:顶级部门
    SELECT department_id_, department_name_, parent_department_id_, 0 as level
    FROM t_departments
    WHERE parent_department_id_ IS NULL

    UNION ALL

    -- 递归:子部门
    SELECT d.department_id_, d.department_name_, d.parent_department_id_, dh.level + 1
    FROM t_departments d
    INNER JOIN dept_hierarchy dh ON d.parent_department_id_ = dh.department_id_
)
SELECT
    CONCAT(REPEAT('  ', level), department_name_) as hierarchy,
    department_id_,
    level
FROM dept_hierarchy
ORDER BY level, department_name_;

-- MySQL 8.0 JSON函数高级应用
-- 创建包含JSON数据的表
ALTER TABLE t_employees ADD COLUMN profile JSON;

-- 更新JSON数据
UPDATE t_employees
SET profile = JSON_OBJECT(
    'skills', JSON_ARRAY('SQL', 'Python', 'Java'),
    'certifications', JSON_ARRAY('MySQL Certified', 'AWS Certified'),
    'performance_rating', 4.5,
    'last_review_date', '2024-01-15'
)
WHERE employee_id_ = 1;

-- 查询JSON数据
SELECT
    name_,
    JSON_EXTRACT(profile, '$.skills') as skills,
    JSON_UNQUOTE(JSON_EXTRACT(profile, '$.performance_rating')) as rating,
    JSON_LENGTH(profile, '$.skills') as skill_count
FROM t_employees
WHERE JSON_EXTRACT(profile, '$.performance_rating') > 4.0;

-- JSON路径查询
SELECT name_
FROM t_employees
WHERE JSON_CONTAINS(profile, '"SQL"', '$.skills');

-- MySQL 8.0 角色和权限管理
CREATE ROLE 'app_developer', 'app_read', 'app_write';

GRANT SELECT ON hr.* TO 'app_read';
GRANT INSERT, UPDATE, DELETE ON hr.* TO 'app_write';
GRANT ALL PRIVILEGES ON hr.* TO 'app_developer';

-- 为用户分配角色
GRANT 'app_read', 'app_write' TO 'john'@'localhost';
SET DEFAULT ROLE 'app_read' TO 'john'@'localhost';

-- MySQL 8.0 资源组管理
CREATE RESOURCE GROUP batch_jobs
  TYPE = USER
  VCPU = 0-3
  THREAD_PRIORITY = 19;  -- USER类型资源组的优先级范围为0-19(数值越大优先级越低),负值仅允许用于SYSTEM类型

-- 为会话设置资源组
SET RESOURCE GROUP batch_jobs;

-- MySQL 8.0 克隆插件(这里就不详细展开,只是告诉大家有这个功能)
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- 本地克隆
CLONE LOCAL DATA DIRECTORY = '/path/to/clone';

-- MySQL 8.0 直方图统计
ANALYZE TABLE t_employees UPDATE HISTOGRAM ON salary_, hire_date_ WITH 100 BUCKETS;

-- 查看直方图信息
SELECT
    SCHEMA_NAME,
    TABLE_NAME,
    COLUMN_NAME,
    JSON_EXTRACT(HISTOGRAM, '$.buckets[0]') as first_bucket
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 't_employees';

-- MySQL 8.0 窗口函数性能优化
SELECT
    employee_id_,
    name_,
    salary_,
    AVG(salary_) OVER (PARTITION BY department_id_) as dept_avg,
    RANK() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as salary_rank
FROM t_employees;

-- JSON函数优化
-- 为JSON路径创建函数索引(假设profile列已存在)
CREATE INDEX idx_emp_skills ON t_employees ((CAST(profile->'$.skills[0]' AS CHAR(50))));

-- 查询JSON数据
SELECT
    employee_id_,
    name_,
    JSON_EXTRACT(profile, '$.skills') as skills
FROM t_employees
WHERE JSON_CONTAINS(profile->'$.skills', '"MySQL"');

-- 不可见索引测试
CREATE INDEX idx_test ON t_employees (hire_date_) INVISIBLE;
-- 测试性能后决定是否设为可见
-- ALTER TABLE t_employees ALTER INDEX idx_test VISIBLE;

-- 降序索引优化ORDER BY DESC查询
-- 注意:前文已创建过名为idx_salary_desc的复合降序索引,此处换用新名称避免重名报错
CREATE INDEX idx_salary_only_desc ON t_employees (salary_ DESC);
SELECT * FROM t_employees ORDER BY salary_ DESC LIMIT 10;

5.2 跨平台性能对比

5.2.1 基准测试方法
-- 标准化测试查询集合

-- 1. 简单选择查询
-- MySQL(8.0已移除查询缓存,SQL_NO_CACHE已废弃、仅为兼容保留,可直接省略)
SELECT * FROM t_employees WHERE employee_id_ = 1000;

-- 2. 复杂连接查询
SELECT
    e.employee_id_,
    e.name_,
    d.department_name_,
    SUM(s.amount_) as total_sales
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
LEFT JOIN t_sales s ON e.employee_id_ = s.employee_id_
WHERE e.hire_date_ >= '2020-01-01'
GROUP BY e.employee_id_, e.name_, d.department_name_
HAVING SUM(s.amount_) > 10000
ORDER BY total_sales DESC
LIMIT 100;

-- 3. 窗口函数查询
SELECT
    employee_id_,
    name_,
    salary_,
    department_id_,
    ROW_NUMBER() OVER (PARTITION BY department_id_ ORDER BY salary_ DESC) as dept_rank,
    AVG(salary_) OVER (PARTITION BY department_id_) as dept_avg_salary,
    salary_ - AVG(salary_) OVER (PARTITION BY department_id_) as salary_diff
FROM t_employees;

-- 4. 递归查询(层次结构)
-- 场景:MySQL 8.0递归查询,构建员工层次结构
WITH RECURSIVE employee_hierarchy AS (
    SELECT employee_id_, name_, manager_id_, 0 as level
    FROM t_employees
    WHERE manager_id_ IS NULL

    UNION ALL

    SELECT e.employee_id_, e.name_, e.manager_id_, eh.level + 1
    FROM t_employees e
    JOIN employee_hierarchy eh ON e.manager_id_ = eh.employee_id_
    WHERE eh.level < 5
)
SELECT * FROM employee_hierarchy;

5.2.2 实际场景性能分析
-- 性能测试脚本模板

-- 测试1:大批量插入性能
-- 准备测试数据
CREATE TABLE performance_test (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    value DECIMAL(10,2),
    created_date DATE
);

-- MySQL批量插入测试
SET autocommit = 0;
INSERT INTO performance_test VALUES
(1, 'Test1', 100.00, '2023-01-01'),
(2, 'Test2', 200.00, '2023-01-02');
-- ...(实际测试中以多行VALUES批量插入至10000行,再统一提交)
COMMIT;

-- 测试2:复杂查询性能
-- 创建测试索引
CREATE INDEX idx_perf_name ON performance_test (name);
CREATE INDEX idx_perf_value ON performance_test (value);
CREATE INDEX idx_perf_date ON performance_test (created_date);

-- 执行复杂查询并记录时间
SELECT
    YEAR(created_date) as year,
    MONTH(created_date) as month,
    COUNT(*) as record_count,
    AVG(value) as avg_value,
    SUM(value) as total_value,
    MIN(value) as min_value,
    MAX(value) as max_value
FROM performance_test
WHERE created_date BETWEEN '2023-01-01' AND '2023-12-31'
  AND value > 50
GROUP BY YEAR(created_date), MONTH(created_date)
HAVING COUNT(*) > 100
ORDER BY year, month;

-- 测试3:并发性能测试
-- 使用多个连接同时执行更新操作
-- 连接1
BEGIN;
UPDATE performance_test SET value = value * 1.1 WHERE id BETWEEN 1 AND 1000;
-- 延迟提交

-- 连接2
BEGIN;
UPDATE performance_test SET value = value * 1.1 WHERE id BETWEEN 1001 AND 2000;
-- 延迟提交

-- 测试4:内存使用效率
-- 查看各数据库系统的内存使用情况
-- MySQL(注意:8.0已移除查询缓存,query_cache_size变量不再存在)
SELECT
    (@@innodb_buffer_pool_size / 1024 / 1024) as buffer_pool_mb,
    (@@tmp_table_size / 1024 / 1024) as tmp_table_mb;

5.2.3 选型建议

基于性能测试结果和特性对比,以下是不同场景的MySQL优化建议:

场景1:高并发OLTP系统

  • 推荐配置: MySQL 8.0 + InnoDB存储引擎
  • 优化重点: 连接池、索引优化、读写分离
  • 关键参数: innodb_buffer_pool_size、max_connections(8.0已移除查询缓存,无需再设置query_cache_size)

场景2:数据分析和报表系统

  • 推荐配置: MySQL 8.0(如需真正的列式分析能力,可评估HeatWave等配套方案)
  • 优化重点: 窗口函数、CTE、索引覆盖
  • 关键参数: tmp_table_size、max_heap_table_size、sort_buffer_size

场景3:大数据量存储系统

  • 推荐配置: MySQL 8.0 + 分区表
  • 优化重点: 分区策略、批量操作、归档策略
  • 关键参数: innodb_log_file_size、innodb_flush_log_at_trx_commit

场景4:混合负载系统

  • 推荐配置: MySQL 8.0 + 读写分离 + 缓存层
  • 优化重点: 负载均衡、缓存策略、监控告警
  • 关键参数: 根据具体负载特征调整

性能优化总结:

  1. 硬件选择: SSD存储、充足内存、多核CPU
  2. 配置优化: 根据业务特点调整MySQL参数
  3. 架构设计: 读写分离、分库分表、缓存层
  4. 监控运维: 完善的监控体系和自动化运维
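
下面给出一段示意查询,用于查看上文各场景提到的关键参数的当前值。参数名均为MySQL 8.0内置系统变量;具体取值需结合实际硬件和负载评估,注释中的数值仅为说明:

-- 查看关键参数当前值(MySQL 8.0系统变量)
SELECT
    @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb,
    @@max_connections AS max_connections,
    @@tmp_table_size / 1024 / 1024 AS tmp_table_mb,
    @@max_heap_table_size / 1024 / 1024 AS max_heap_table_mb,
    @@sort_buffer_size / 1024 AS sort_buffer_kb,
    @@innodb_log_file_size / 1024 / 1024 AS log_file_mb,
    @@innodb_flush_log_at_trx_commit AS flush_log_policy;

-- 运行时调整示例(仅动态变量可用SET GLOBAL;innodb_log_file_size等静态变量需修改配置文件后重启)
-- SET GLOBAL tmp_table_size = 268435456;       -- 256MB,需以字节为单位
-- SET GLOBAL max_heap_table_size = 268435456;  -- 256MB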

6. MySQL系统表和查询分析工具详解

MySQL提供了丰富的系统表和分析工具,用于监控数据库性能、诊断问题和优化查询。本章将全面介绍这些重要的系统资源,帮助您成为MySQL性能调优专家。

6.1 MySQL系统表概述

MySQL系统表分布在四个主要的系统数据库中,每个都有特定的用途和功能:

6.1.1 系统数据库分类
| 系统数据库 | 主要用途 | 表数量 | 访问权限 | 使用频率 |
|---|---|---|---|---|
| INFORMATION_SCHEMA | 元数据查询 | 60+ | SELECT | 🔴 高频 |
| performance_schema | 性能监控 | 100+ | SELECT | 🔴 高频 |
| mysql | 系统配置 | 30+ | 受限 | 🟡 中频 |
| sys | 系统视图 | 100+ | SELECT | 🟢 推荐 |
6.1.2 权限要求
-- 基本权限配置
-- 查看当前用户权限
SHOW GRANTS FOR CURRENT_USER();

-- 性能监控所需的最小权限
GRANT SELECT ON performance_schema.* TO 'monitor_user'@'%';
GRANT SELECT ON INFORMATION_SCHEMA.* TO 'monitor_user'@'%';
GRANT PROCESS ON *.* TO 'monitor_user'@'%';  -- 查看进程列表
GRANT REPLICATION CLIENT ON *.* TO 'monitor_user'@'%';  -- 查看复制状态

-- 检查performance_schema是否启用
SELECT @@performance_schema;

-- 检查系统表可用性
SHOW TABLES FROM performance_schema LIKE '%events_statements%';
SHOW TABLES FROM INFORMATION_SCHEMA LIKE '%INNODB%';
6.1.3 版本兼容性
| MySQL版本 | 支持特性 | 重要变化 |
|---|---|---|
| 5.7 | 基础performance_schema | sys库引入 |
| 8.0 | 完整功能支持 | 新增多个监控表 |
| 8.0.13+ | 增强的锁监控 | data_locks表改进 |
| 8.0.20+ | 改进的直方图统计 | COLUMN_STATISTICS增强 |
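
在按本章示例排查问题前,建议先确认实例版本与相关系统表是否可用。下面是一个简单的检查示意:

-- 确认MySQL版本
SELECT VERSION() AS mysql_version;

-- 确认8.0特有的系统表是否存在(COLUMN_STATISTICS需8.0+,data_locks需8.0+)
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE (TABLE_SCHEMA = 'information_schema' AND TABLE_NAME = 'COLUMN_STATISTICS')
   OR (TABLE_SCHEMA = 'performance_schema' AND TABLE_NAME = 'data_locks');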

6.2 INFORMATION_SCHEMA系统表

INFORMATION_SCHEMA是MySQL的元数据信息库,提供了数据库结构、表信息、索引统计等重要信息。

6.2.1 表结构和索引相关表
6.2.1.1 INFORMATION_SCHEMA.STATISTICS - 索引统计信息

表用途: 提供所有索引的详细统计信息,包括索引基数、列顺序等关键性能指标。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| TABLE_SCHEMA | VARCHAR(64) | 数据库名 | 定位具体数据库 |
| TABLE_NAME | VARCHAR(64) | 表名 | 定位具体表 |
| INDEX_NAME | VARCHAR(64) | 索引名称 | 索引标识 |
| COLUMN_NAME | VARCHAR(64) | 列名 | 索引包含的列 |
| SEQ_IN_INDEX | INT | 列在索引中的位置 | 复合索引顺序 |
| CARDINALITY | BIGINT | 索引基数(唯一值数量) | 索引选择性评估 |
| SUB_PART | INT | 前缀索引长度 | 前缀索引优化 |
| NULLABLE | VARCHAR(3) | 是否允许NULL | 索引设计参考 |
| INDEX_TYPE | VARCHAR(16) | 索引类型 | BTREE/HASH等 |

使用场景:

  • 分析索引选择性,识别低效索引
  • 检查复合索引的列顺序是否合理
  • 监控索引基数变化,判断是否需要重建统计信息

查询示例:

-- 业务场景:索引选择性分析 - 识别低选择性索引,优化索引设计
-- 用途:找出基数较低的索引,考虑删除或重新设计
SELECT
    TABLE_SCHEMA as database_name,
    TABLE_NAME as table_name,
    INDEX_NAME as index_name,
    COLUMN_NAME as column_name,
    CARDINALITY as unique_values,
    -- 计算选择性(基数/表行数)
    ROUND(CARDINALITY / (SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES t
                        WHERE t.TABLE_SCHEMA = s.TABLE_SCHEMA
                        AND t.TABLE_NAME = s.TABLE_NAME), 4) as selectivity,
    SUB_PART as prefix_length,
    NULLABLE,
    INDEX_TYPE,
    -- 业务解读:选择性评估
    CASE
        WHEN CARDINALITY / (SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES t
                           WHERE t.TABLE_SCHEMA = s.TABLE_SCHEMA
                           AND t.TABLE_NAME = s.TABLE_NAME) > 0.8 THEN '高选择性-优秀'
        WHEN CARDINALITY / (SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES t
                           WHERE t.TABLE_SCHEMA = s.TABLE_SCHEMA
                           AND t.TABLE_NAME = s.TABLE_NAME) > 0.3 THEN '中选择性-良好'
        ELSE '低选择性-需优化'
    END as selectivity_assessment
FROM INFORMATION_SCHEMA.STATISTICS s
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't_employees'
  AND INDEX_NAME != 'PRIMARY'
ORDER BY selectivity DESC, INDEX_NAME, SEQ_IN_INDEX;

-- 反例(不推荐):忽视索引选择性分析
-- 问题:创建了大量低选择性索引,浪费存储空间和维护成本
-- CREATE INDEX idx_low_selectivity ON t_employees(status_); -- 假设status_只有2-3个值
6.2.1.2 INFORMATION_SCHEMA.TABLES - 表基本信息

表用途: 提供数据库中所有表的基本信息,包括存储引擎、行数估算、数据大小等。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| TABLE_SCHEMA | VARCHAR(64) | 数据库名 | 定位数据库 |
| TABLE_NAME | VARCHAR(64) | 表名 | 表标识 |
| ENGINE | VARCHAR(64) | 存储引擎 | InnoDB/MyISAM等 |
| TABLE_ROWS | BIGINT | 行数估算 | 数据量评估 |
| DATA_LENGTH | BIGINT | 数据大小(字节) | 存储空间使用 |
| INDEX_LENGTH | BIGINT | 索引大小(字节) | 索引空间使用 |
| DATA_FREE | BIGINT | 碎片空间(字节) | 碎片率分析 |
| CREATE_TIME | DATETIME | 创建时间 | 表生命周期 |
| UPDATE_TIME | DATETIME | 最后更新时间 | 数据活跃度 |

查询示例:

-- 业务场景:表空间使用分析 - 监控数据库存储使用情况,制定容量规划
-- 用途:识别大表、高碎片率表,制定数据归档和优化策略
SELECT
    TABLE_SCHEMA as database_name,
    TABLE_NAME as table_name,
    ENGINE as storage_engine,
    TABLE_ROWS as estimated_rows,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_size_mb,
    ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024, 2) as total_size_mb,
    ROUND(DATA_FREE/1024/1024, 2) as free_space_mb,
    -- 计算碎片率
    ROUND((DATA_FREE / (DATA_LENGTH + INDEX_LENGTH)) * 100, 2) as fragmentation_percent,
    CREATE_TIME,
    UPDATE_TIME,
    -- 业务解读:存储状态评估
    CASE
        WHEN (DATA_FREE / (DATA_LENGTH + INDEX_LENGTH)) * 100 > 25 THEN '高碎片-需整理'
        WHEN (DATA_LENGTH + INDEX_LENGTH)/1024/1024 > 1000 THEN '大表-需关注'
        WHEN UPDATE_TIME < DATE_SUB(NOW(), INTERVAL 30 DAY) THEN '冷数据-可归档'
        ELSE '正常状态'
    END as storage_status
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('information_schema', 'mysql', 'performance_schema', 'sys')
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 20;

-- 反例(不推荐):忽视表碎片整理
-- 问题:长期不整理碎片,导致存储空间浪费和查询性能下降
-- 解决方案:定期执行 OPTIMIZE TABLE table_name; 或 ALTER TABLE table_name ENGINE=InnoDB;
6.2.1.3 INFORMATION_SCHEMA.PARTITIONS - 分区信息

表用途: 提供表分区的详细信息,用于分区表的管理和优化。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| TABLE_SCHEMA | VARCHAR(64) | 数据库名 | 定位数据库 |
| TABLE_NAME | VARCHAR(64) | 表名 | 表标识 |
| PARTITION_NAME | VARCHAR(64) | 分区名称 | 分区标识 |
| PARTITION_METHOD | VARCHAR(18) | 分区方法 | RANGE/HASH/LIST等 |
| PARTITION_EXPRESSION | LONGTEXT | 分区表达式 | 分区依据 |
| TABLE_ROWS | BIGINT | 分区行数 | 数据分布 |
| DATA_LENGTH | BIGINT | 分区数据大小 | 存储使用 |
| CREATE_TIME | DATETIME | 分区创建时间 | 分区生命周期 |

查询示例:

-- 业务场景:分区表数据分布分析 - 监控分区数据均衡性,优化分区策略
-- 用途:识别数据倾斜的分区,制定分区维护计划
SELECT
    TABLE_SCHEMA as database_name,
    TABLE_NAME as table_name,
    PARTITION_NAME as partition_name,
    PARTITION_METHOD as partition_method,
    PARTITION_EXPRESSION as partition_key,
    TABLE_ROWS as partition_rows,
    ROUND(DATA_LENGTH/1024/1024, 2) as data_size_mb,
    ROUND(INDEX_LENGTH/1024/1024, 2) as index_size_mb,
    CREATE_TIME as partition_created,
    -- 计算分区数据占比
    ROUND((TABLE_ROWS / (SELECT SUM(TABLE_ROWS) FROM INFORMATION_SCHEMA.PARTITIONS p2
                        WHERE p2.TABLE_SCHEMA = p.TABLE_SCHEMA
                        AND p2.TABLE_NAME = p.TABLE_NAME)) * 100, 2) as data_distribution_percent,
    -- 业务解读:分区状态评估
    CASE
        WHEN TABLE_ROWS = 0 THEN '空分区-可删除'
        WHEN TABLE_ROWS > (SELECT AVG(TABLE_ROWS) * 3 FROM INFORMATION_SCHEMA.PARTITIONS p2
                          WHERE p2.TABLE_SCHEMA = p.TABLE_SCHEMA
                          AND p2.TABLE_NAME = p.TABLE_NAME) THEN '数据倾斜-需调整'
        ELSE '数据均衡'
    END as partition_status
FROM INFORMATION_SCHEMA.PARTITIONS p
WHERE TABLE_SCHEMA = DATABASE()
  AND PARTITION_NAME IS NOT NULL
ORDER BY TABLE_NAME, PARTITION_ORDINAL_POSITION;
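
如果当前库中还没有分区表,可以先创建一个示例分区表来验证上述查询。表名t_sales_partitioned为本例自拟,仅用于演示:

-- 按销售年份做RANGE分区的示例表
CREATE TABLE t_sales_partitioned (
    sale_id_ INT NOT NULL,
    sale_date_ DATE NOT NULL,
    amount_ DECIMAL(10,2),
    PRIMARY KEY (sale_id_, sale_date_)  -- 分区键必须包含在主键/唯一键中
)
PARTITION BY RANGE (YEAR(sale_date_)) (
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);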
6.2.1.4 INFORMATION_SCHEMA.COLUMN_STATISTICS - 列统计信息

表用途: 提供列的直方图统计信息,用于查询优化器的成本估算(MySQL 8.0+)。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| SCHEMA_NAME | VARCHAR(64) | 数据库名 | 定位数据库 |
| TABLE_NAME | VARCHAR(64) | 表名 | 表标识 |
| COLUMN_NAME | VARCHAR(64) | 列名 | 列标识 |
| HISTOGRAM | JSON | 直方图数据 | 数据分布信息 |

查询示例:

-- 业务场景:列数据分布分析 - 分析列值分布,优化查询条件和索引设计
-- 用途:了解数据倾斜情况,为查询优化提供依据
SELECT
    SCHEMA_NAME as database_name,
    TABLE_NAME as table_name,
    COLUMN_NAME as column_name,
    JSON_EXTRACT(HISTOGRAM, '$.buckets') as histogram_buckets,
    JSON_EXTRACT(HISTOGRAM, '$.data-type') as data_type,
    JSON_EXTRACT(HISTOGRAM, '$.null-values') as null_values_fraction,
    JSON_EXTRACT(HISTOGRAM, '$.collation-id') as collation_id,
    JSON_EXTRACT(HISTOGRAM, '$.last-updated') as last_updated,
    -- 业务解读:数据分布特征
    CASE
        WHEN JSON_EXTRACT(HISTOGRAM, '$.null-values') > 0.5 THEN '高NULL值比例'
        WHEN JSON_LENGTH(JSON_EXTRACT(HISTOGRAM, '$.buckets')) < 10 THEN '数据分布集中'
        ELSE '数据分布均匀'
    END as distribution_characteristic
FROM INFORMATION_SCHEMA.COLUMN_STATISTICS
WHERE SCHEMA_NAME = DATABASE()
  AND TABLE_NAME = 't_employees'
ORDER BY TABLE_NAME, COLUMN_NAME;

-- 创建和更新直方图统计信息
-- ANALYZE TABLE t_employees UPDATE HISTOGRAM ON salary_, department_id_;
-- DROP HISTOGRAM ON t_employees.salary_;
6.2.2 InnoDB引擎相关表
6.2.2.1 INFORMATION_SCHEMA.INNODB_TRX - 事务信息

表用途: 提供当前活跃事务的详细信息,用于事务监控和死锁分析。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| trx_id | VARCHAR(18) | 事务ID | 事务标识 |
| trx_state | VARCHAR(13) | 事务状态 | RUNNING/LOCK WAIT等 |
| trx_started | DATETIME | 事务开始时间 | 事务持续时间 |
| trx_requested_lock_id | VARCHAR(105) | 请求的锁ID | 锁等待分析 |
| trx_wait_started | DATETIME | 等待开始时间 | 等待时长 |
| trx_weight | BIGINT | 事务权重 | 回滚成本 |
| trx_mysql_thread_id | BIGINT | MySQL线程ID | 关联进程 |
| trx_query | VARCHAR(1024) | 当前执行的SQL | 问题定位 |
| trx_isolation_level | VARCHAR(16) | 隔离级别 | 并发控制 |
| trx_rows_locked | BIGINT | 锁定行数 | 锁影响范围 |
| trx_rows_modified | BIGINT | 修改行数 | 事务影响 |

查询示例:

-- 业务场景:长事务监控 - 识别长时间运行的事务,避免锁等待和性能问题
-- 用途:监控事务健康状态,及时发现和处理问题事务
SELECT
    trx_id as transaction_id,
    trx_state as transaction_state,
    trx_started as start_time,
    TIMESTAMPDIFF(SECOND, trx_started, NOW()) as duration_seconds,
    trx_mysql_thread_id as thread_id,
    SUBSTRING(trx_query, 1, 100) as current_query,
    trx_isolation_level as isolation_level,
    trx_rows_locked as rows_locked,
    trx_rows_modified as rows_modified,
    trx_weight as transaction_weight,
    -- 等待锁信息
    CASE
        WHEN trx_state = 'LOCK WAIT' THEN CONCAT('等待锁: ', trx_requested_lock_id)
        ELSE '正常运行'
    END as lock_status,
    -- 业务解读:事务状态评估
    CASE
        WHEN TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 300 THEN '长事务-需关注'
        WHEN trx_rows_locked > 10000 THEN '大量锁定-影响并发'
        WHEN trx_state = 'LOCK WAIT' THEN '锁等待-需处理'
        ELSE '正常状态'
    END as transaction_assessment
FROM INFORMATION_SCHEMA.INNODB_TRX
ORDER BY trx_started ASC;

-- 反例(不推荐):忽视长事务监控
-- 问题:长事务占用大量锁资源,影响系统并发性能
-- 解决方案:设置事务超时时间,定期监控和终止异常事务
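
作为补充,下面是控制锁等待与处置异常事务的一个示意(阈值仅为示例,需按业务特点调整):

-- 控制行锁等待超时,避免事务长时间阻塞(单位:秒,默认50)
-- SET GLOBAL innodb_lock_wait_timeout = 10;

-- 对确认异常的长事务,可依据上面查询出的trx_mysql_thread_id终止(谨慎操作)
-- KILL 12345;  -- 12345为示例线程ID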
6.2.3 进程和连接相关表
6.2.3.1 INFORMATION_SCHEMA.PROCESSLIST - 进程列表

表用途: 显示当前所有MySQL连接和正在执行的查询,用于连接监控和问题诊断。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| ID | BIGINT | 连接ID | 连接标识 |
| USER | VARCHAR(32) | 用户名 | 用户识别 |
| HOST | VARCHAR(261) | 客户端主机 | 连接来源 |
| DB | VARCHAR(64) | 当前数据库 | 操作范围 |
| COMMAND | VARCHAR(16) | 命令类型 | Query/Sleep等 |
| TIME | INT | 执行时间(秒) | 性能指标 |
| STATE | VARCHAR(64) | 连接状态 | 执行阶段 |
| INFO | LONGTEXT | 执行的SQL语句 | 问题定位 |

查询示例:

-- 业务场景:活跃连接监控 - 监控数据库连接状态,识别慢查询和异常连接
-- 用途:实时监控数据库负载,快速定位性能问题
SELECT
    ID as connection_id,
    USER as username,
    HOST as client_host,
    DB as current_database,
    COMMAND as command_type,
    TIME as execution_time_seconds,
    STATE as connection_state,
    SUBSTRING(COALESCE(INFO, ''), 1, 100) as current_query,
    -- 业务解读:连接状态评估(注意:范围条件需从大到小判断,否则大阈值分支永远不会命中)
    CASE
        WHEN COMMAND = 'Sleep' THEN '空闲连接'
        WHEN TIME > 300 THEN '超长查询-需终止'
        WHEN TIME > 60 THEN '慢查询-需关注'
        WHEN STATE LIKE '%lock%' THEN '锁等待-需处理'
        ELSE '正常执行'
    END as connection_assessment
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE COMMAND != 'Sleep'
   OR (COMMAND = 'Sleep' AND TIME > 3600)  -- 显示长时间空闲的连接
ORDER BY TIME DESC;

-- 终止异常连接的命令(谨慎使用)
-- KILL CONNECTION connection_id;
-- KILL QUERY connection_id;  -- 只终止查询,保留连接
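
在此基础上,可以用一条查询批量生成针对慢查询的KILL语句,人工审核后再逐条执行(阈值300秒仅为示意):

-- 批量生成终止慢查询的语句(仅生成文本,不会自动执行)
SELECT CONCAT('KILL QUERY ', ID, ';') AS kill_statement
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE COMMAND = 'Query'
  AND TIME > 300;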

6.3 performance_schema系统表

performance_schema是MySQL的性能监控核心,提供了详细的性能统计信息。

6.3.1 语句执行统计表
6.3.1.1 performance_schema.events_statements_summary_by_digest - 语句摘要统计

表用途: 按SQL语句模式聚合的执行统计信息,是慢查询分析的核心工具。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| SCHEMA_NAME | VARCHAR(64) | 数据库名 | 定位数据库 |
| DIGEST_TEXT | LONGTEXT | SQL语句模式 | 查询模式识别 |
| COUNT_STAR | BIGINT | 执行次数 | 频率统计 |
| SUM_TIMER_WAIT | BIGINT | 总执行时间(纳秒) | 总耗时 |
| AVG_TIMER_WAIT | BIGINT | 平均执行时间(纳秒) | 平均性能 |
| MIN_TIMER_WAIT | BIGINT | 最小执行时间(纳秒) | 最佳性能 |
| MAX_TIMER_WAIT | BIGINT | 最大执行时间(纳秒) | 最差性能 |
| SUM_ROWS_EXAMINED | BIGINT | 总检查行数 | I/O成本 |
| SUM_ROWS_SENT | BIGINT | 总返回行数 | 结果集大小 |
| SUM_CREATED_TMP_TABLES | BIGINT | 创建临时表次数 | 内存使用 |
| SUM_CREATED_TMP_DISK_TABLES | BIGINT | 创建磁盘临时表次数 | 磁盘I/O |

查询示例:

-- 业务场景:慢查询TOP分析 - 识别系统中最耗时的SQL语句模式
-- 用途:性能优化的重点目标识别,资源分配优化
SELECT
    SCHEMA_NAME as database_name,
    SUBSTRING(DIGEST_TEXT, 1, 100) as query_pattern,
    COUNT_STAR as execution_count,
    ROUND(SUM_TIMER_WAIT/1000000000, 3) as total_time_seconds,
    ROUND(AVG_TIMER_WAIT/1000000000, 3) as avg_time_seconds,
    ROUND(MIN_TIMER_WAIT/1000000000, 3) as min_time_seconds,
    ROUND(MAX_TIMER_WAIT/1000000000, 3) as max_time_seconds,
    SUM_ROWS_EXAMINED as total_rows_examined,
    SUM_ROWS_SENT as total_rows_sent,
    ROUND(SUM_ROWS_EXAMINED/COUNT_STAR, 2) as avg_rows_examined_per_query,
    SUM_CREATED_TMP_DISK_TABLES as disk_tmp_tables,
    -- 计算查询效率指标
    ROUND((SUM_ROWS_SENT / SUM_ROWS_EXAMINED) * 100, 2) as efficiency_percent,
    -- 业务解读:性能评估
    CASE
        WHEN AVG_TIMER_WAIT/1000000000 > 10 THEN '严重慢查询-优先优化'
        WHEN AVG_TIMER_WAIT/1000000000 > 1 THEN '慢查询-需优化'
        WHEN SUM_CREATED_TMP_DISK_TABLES > 0 THEN '磁盘临时表-内存不足'
        WHEN (SUM_ROWS_SENT / SUM_ROWS_EXAMINED) < 0.1 THEN '低效率查询-需优化'
        ELSE '性能良好'
    END as performance_assessment
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
  AND COUNT_STAR > 10  -- 过滤执行次数少的查询
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;

-- 反例(不推荐):忽视慢查询监控
-- 问题:不定期分析慢查询,导致性能问题积累
-- 解决方案:建立定期的慢查询分析流程,设置性能监控告警
6.3.2 索引和表I/O统计表
6.3.2.1 performance_schema.table_io_waits_summary_by_index_usage - 索引I/O统计

表用途: 提供按索引统计的I/O操作信息,用于分析索引使用效率和识别未使用的索引。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| OBJECT_SCHEMA | VARCHAR(64) | 数据库名 | 定位数据库 |
| OBJECT_NAME | VARCHAR(64) | 表名 | 表标识 |
| INDEX_NAME | VARCHAR(64) | 索引名称 | 索引标识 |
| COUNT_READ | BIGINT | 读操作次数 | 查询频率 |
| COUNT_WRITE | BIGINT | 写操作次数 | 维护成本 |
| COUNT_FETCH | BIGINT | 获取操作次数 | 访问模式 |
| COUNT_INSERT | BIGINT | 插入操作次数 | 插入影响 |
| COUNT_UPDATE | BIGINT | 更新操作次数 | 更新影响 |
| COUNT_DELETE | BIGINT | 删除操作次数 | 删除影响 |
| SUM_TIMER_WAIT | BIGINT | 总等待时间(纳秒) | 总耗时 |
| SUM_TIMER_READ | BIGINT | 读操作总时间(纳秒) | 读性能 |
| SUM_TIMER_WRITE | BIGINT | 写操作总时间(纳秒) | 写性能 |

查询示例:

-- 业务场景:索引使用效率分析 - 识别热点索引和冷门索引,优化索引设计
-- 用途:发现未使用的索引(可删除)和高频使用的索引(需优化)
SELECT
    OBJECT_SCHEMA as database_name,
    OBJECT_NAME as table_name,
    INDEX_NAME as index_name,
    COUNT_READ as read_operations,
    COUNT_WRITE as write_operations,
    COUNT_FETCH as fetch_operations,
    COUNT_INSERT as insert_operations,
    COUNT_UPDATE as update_operations,
    COUNT_DELETE as delete_operations,
    ROUND(SUM_TIMER_WAIT/1000000000, 3) as total_wait_seconds,
    ROUND(SUM_TIMER_READ/1000000000, 3) as read_wait_seconds,
    ROUND(SUM_TIMER_WRITE/1000000000, 3) as write_wait_seconds,
    -- 计算读写比例
    ROUND((COUNT_READ / (COUNT_READ + COUNT_WRITE + 1)) * 100, 2) as read_percentage,
    -- 业务解读:索引使用状态评估
    CASE
        WHEN COUNT_READ = 0 AND COUNT_WRITE = 0 THEN '未使用索引-可删除'
        WHEN COUNT_READ > 100000 THEN '高频读取-核心索引'
        WHEN COUNT_WRITE > COUNT_READ * 2 THEN '写入密集-考虑优化'
        WHEN SUM_TIMER_WAIT/1000000000 > 60 THEN '高等待时间-性能瓶颈'
        ELSE '正常使用'
    END as index_assessment
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = DATABASE()
  AND OBJECT_NAME = 't_employees'
  AND INDEX_NAME IS NOT NULL
ORDER BY (COUNT_READ + COUNT_WRITE) DESC;

-- 反例(不推荐):保留大量未使用的索引
-- 问题:未使用的索引浪费存储空间,增加DML操作的维护成本
-- 解决方案:定期检查索引使用情况,删除长期未使用的索引
6.3.3 锁和并发控制表
6.3.3.1 performance_schema.data_locks - 数据锁信息

表用途: 显示当前所有数据锁的状态,用于锁等待分析和死锁诊断。

主要字段:

| 字段名 | 数据类型 | 含义 | 业务价值 |
|---|---|---|---|
| ENGINE | VARCHAR(32) | 存储引擎 | InnoDB等 |
| ENGINE_LOCK_ID | VARCHAR(128) | 引擎锁ID | 锁标识 |
| ENGINE_TRANSACTION_ID | BIGINT | 事务ID | 事务关联 |
| THREAD_ID | BIGINT | 线程ID | 线程关联 |
| OBJECT_SCHEMA | VARCHAR(64) | 数据库名 | 锁定对象 |
| OBJECT_NAME | VARCHAR(64) | 表名 | 锁定表 |
| PARTITION_NAME | VARCHAR(64) | 分区名 | 分区锁 |
| SUBPARTITION_NAME | VARCHAR(64) | 子分区名 | 子分区锁 |
| INDEX_NAME | VARCHAR(64) | 索引名 | 索引锁 |
| LOCK_TYPE | VARCHAR(32) | 锁类型 | TABLE/RECORD |
| LOCK_MODE | VARCHAR(32) | 锁模式 | S/X/IS/IX等 |
| LOCK_STATUS | VARCHAR(32) | 锁状态 | GRANTED/WAITING |
| LOCK_DATA | VARCHAR(8192) | 锁定数据 | 具体锁定内容 |

查询示例:

-- 业务场景:锁等待分析 - 实时监控数据库锁状态,快速定位锁等待问题
-- 用途:识别锁冲突,分析死锁原因,优化并发性能
SELECT
    ENGINE as storage_engine,
    OBJECT_SCHEMA as database_name,
    OBJECT_NAME as table_name,
    INDEX_NAME as index_name,
    LOCK_TYPE as lock_type,
    LOCK_MODE as lock_mode,
    LOCK_STATUS as lock_status,
    SUBSTRING(LOCK_DATA, 1, 100) as lock_data_sample,
    ENGINE_TRANSACTION_ID as transaction_id,
    THREAD_ID as thread_id,
    -- 业务解读:锁状态分析
    CASE
        WHEN LOCK_STATUS = 'WAITING' THEN '锁等待-需关注'
        WHEN LOCK_MODE IN ('X', 'S') AND LOCK_TYPE = 'TABLE' THEN '表级锁-影响并发'
        WHEN LOCK_MODE = 'X' AND LOCK_TYPE = 'RECORD' THEN '行级排他锁-正常'
        ELSE '正常锁定'
    END as lock_assessment
FROM performance_schema.data_locks
WHERE OBJECT_SCHEMA IS NOT NULL
ORDER BY
    CASE WHEN LOCK_STATUS = 'WAITING' THEN 1 ELSE 2 END,
    OBJECT_SCHEMA, OBJECT_NAME;

-- 查找锁等待关系
SELECT
    blocking.ENGINE_TRANSACTION_ID as blocking_trx_id,
    waiting.ENGINE_TRANSACTION_ID as waiting_trx_id,
    blocking.OBJECT_SCHEMA as schema_name,
    blocking.OBJECT_NAME as table_name,
    blocking.LOCK_MODE as blocking_lock_mode,
    waiting.LOCK_MODE as waiting_lock_mode,
    blocking.LOCK_DATA as blocking_lock_data
FROM performance_schema.data_locks blocking
JOIN performance_schema.data_lock_waits w ON blocking.ENGINE_LOCK_ID = w.BLOCKING_ENGINE_LOCK_ID
JOIN performance_schema.data_locks waiting ON w.REQUESTING_ENGINE_LOCK_ID = waiting.ENGINE_LOCK_ID;

6.5 查询执行计划分析工具

查询执行计划分析是SQL优化的核心技能,MySQL提供了强大的EXPLAIN工具来帮助开发者理解查询的执行过程。

6.5.1 MySQL EXPLAIN详解

EXPLAIN工具概述:
MySQL的EXPLAIN命令是查询优化的重要工具,它可以显示MySQL如何执行SELECT语句,包括表的连接顺序、使用的索引、扫描的行数等关键信息。

EXPLAIN的三种格式:

-- 1. 标准格式EXPLAIN - 表格形式输出,易于阅读
EXPLAIN SELECT
    e.employee_id_,
    e.name_,
    d.department_name_
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE e.salary_ > 50000;

-- 2. JSON格式EXPLAIN - 详细信息,包含成本估算
EXPLAIN FORMAT=JSON SELECT
    e.employee_id_,
    e.name_,
    d.department_name_,
    (SELECT COUNT(*) FROM t_sales s WHERE s.employee_id_ = e.employee_id_) as sale_count
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE e.salary_ > 50000;

-- 3. EXPLAIN ANALYZE - 实际执行统计(MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT
    e.employee_id_,
    e.name_,
    d.department_name_
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE e.salary_ > 50000;
6.5.2 EXPLAIN输出字段详解
6.5.2.1 MySQL EXPLAIN标准输出字段
| 字段名 | 含义 | 常见值 | 性能分析要点 |
|---|---|---|---|
| id | SELECT标识符 | 1, 2, 3… | 数字越大越先执行;相同id从上到下执行 |
| select_type | SELECT类型 | SIMPLE, PRIMARY, SUBQUERY, DERIVED | SIMPLE最优;DEPENDENT SUBQUERY需优化 |
| table | 访问的表名 | 表名或别名 | 显示查询涉及的表 |
| partitions | 匹配的分区 | p0, p1, p2… | 分区剪枝效果,NULL表示非分区表 |
| type | 连接类型 | system, const, eq_ref, ref, range, index, ALL | 性能从左到右递减,ALL最差 |
| possible_keys | 可能使用的索引 | 索引名列表 | 候选索引,NULL表示无可用索引 |
| key | 实际使用的索引 | 索引名 | NULL表示未使用索引,需要优化 |
| key_len | 索引长度 | 字节数 | 越短越好,显示索引使用的精确度 |
| ref | 索引比较的列 | const, column名 | 显示索引查找的参考值 |
| rows | 扫描行数估算 | 数字 | 估算值,实际可能不同 |
| filtered | 过滤百分比 | 0.00-100.00 | 显示WHERE条件过滤效果 |
| Extra | 额外信息 | 详见下表 | 包含重要的执行细节 |
6.5.2.2 type字段详细说明(性能关键指标)
| type值 | 性能等级 | 含义 | 优化建议 | 使用场景 |
|---|---|---|---|---|
| system | 🟢 最优 | 表只有一行记录(系统表) | 无需优化 | 系统表查询 |
| const | 🟢 最优 | 通过主键或唯一索引访问,最多返回一行 | 理想状态 | 主键等值查询 |
| eq_ref | 🟢 优秀 | 唯一索引扫描,对于前表的每一行,后表只有一行匹配 | JOIN优化良好 | 主键/唯一键JOIN |
| ref | 🟡 良好 | 非唯一索引扫描,返回匹配某个单独值的所有行 | 可接受的性能 | 普通索引等值查询 |
| fulltext | 🟡 良好 | 全文索引检索 | 全文搜索场景 | MATCH AGAINST查询 |
| ref_or_null | 🟡 良好 | 类似ref,但包含NULL值的查找 | 注意NULL值处理 | 包含NULL的索引查询 |
| index_merge | 🟡 一般 | 使用了索引合并优化 | 考虑创建复合索引 | 多个单列索引OR条件 |
| range | 🟡 一般 | 索引范围扫描 | 可接受,注意范围大小 | BETWEEN, >, <, IN查询 |
| index | 🔴 较差 | 全索引扫描 | 考虑添加WHERE条件 | 覆盖索引但无WHERE |
| ALL | 🔴 最差 | 全表扫描 | 急需优化,添加索引 | 无可用索引 |
6.5.2.3 Extra字段重要值说明
| Extra值 | 性能影响 | 含义 | 优化建议 |
|---|---|---|---|
| Using index | 🟢 优秀 | 覆盖索引,无需回表 | 理想状态,保持 |
| Using where | 🟡 一般 | WHERE条件过滤 | 正常情况 |
| Using index condition | 🟢 良好 | 索引条件下推(ICP) | MySQL 5.6+优化特性 |
| Using temporary | 🔴 较差 | 使用临时表 | 考虑索引优化,避免GROUP BY/ORDER BY临时表 |
| Using filesort | 🔴 较差 | 文件排序 | 添加ORDER BY索引 |
| Using join buffer | 🔴 较差 | 使用连接缓冲 | 添加JOIN索引 |
| Using MRR | 🟢 良好 | 多范围读优化 | MySQL优化特性,保持 |
| Using sort_union | 🟡 一般 | 索引合并排序联合 | 考虑复合索引 |
| Using union | 🟡 一般 | 索引合并联合 | 考虑复合索引 |
| Using intersect | 🟡 一般 | 索引合并交集 | 考虑复合索引 |
6.5.2.4 EXPLAIN ANALYZE输出解读

EXPLAIN ANALYZE输出示例:

-- 示例查询
EXPLAIN ANALYZE SELECT
    e.employee_id_, e.name_, d.department_name_
FROM t_employees e
JOIN t_departments d ON e.department_id_ = d.department_id_
WHERE e.salary_ > 5000;

-- 输出示例解读:
-- -> Nested loop inner join  (cost=2.75 rows=5) (actual time=0.043..0.068 rows=5 loops=1)
--     -> Filter: (e.salary_ > 5000)  (cost=1.25 rows=5) (actual time=0.028..0.041 rows=5 loops=1)
--         -> Table scan on e  (cost=1.25 rows=10) (actual time=0.024..0.035 rows=10 loops=1)
--     -> Single-row index lookup on d using PRIMARY (department_id_=e.department_id_)  (cost=0.30 rows=1) (actual time=0.003..0.004 rows=1 loops=5)

EXPLAIN ANALYZE关键指标解读:

| 指标 | 含义 | 分析要点 |
|---|---|---|
| cost | 优化器估算的成本 | 相对值,用于比较不同执行计划 |
| rows | 估算返回行数 | 与actual rows对比,评估估算准确性 |
| actual time | 实际执行时间(毫秒) | 第一个值是首行时间,第二个是总时间 |
| actual rows | 实际返回行数 | 真实的行数,用于验证估算 |
| loops | 执行循环次数 | 嵌套循环的执行次数 |
6.5.3 性能瓶颈识别和优化策略
6.5.3.1 常见性能瓶颈识别

1. 全表扫描问题

-- 问题症状:type=ALL, rows很大
-- 示例问题查询
EXPLAIN SELECT * FROM t_employees WHERE salary_ > 50000;
-- 可能输出:type=ALL, rows=100000

-- 解决方案:添加索引
CREATE INDEX idx_employees_salary ON t_employees(salary_);
-- 优化后:type=range, rows=5000

2. 排序性能问题

-- 问题症状:Extra包含"Using filesort"
-- 示例问题查询
EXPLAIN SELECT * FROM t_employees ORDER BY hire_date_, salary_;
-- 可能输出:Extra: Using filesort

-- 解决方案:创建与排序列顺序一致的复合索引
CREATE INDEX idx_employees_hire_salary ON t_employees(hire_date_, salary_);
-- 优化后:Extra中不再出现Using filesort(本查询为SELECT *需回表,故不会显示Using index)

3. 临时表问题

-- 问题症状:Extra包含"Using temporary"
-- 示例问题查询
EXPLAIN SELECT department_id_, COUNT(*) FROM t_employees GROUP BY department_id_;
-- 可能输出:Extra: Using temporary

-- 解决方案:创建合适的索引
CREATE INDEX idx_employees_dept ON t_employees(department_id_);
-- 优化后:Extra: Using index
6.5.3.2 优化策略工作流程

步骤1:收集执行计划信息

-- 获取基础执行计划
EXPLAIN SELECT ...;

-- 获取详细成本信息
EXPLAIN FORMAT=JSON SELECT ...;

-- 获取实际执行统计(MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT ...;

步骤2:识别性能瓶颈

-- 检查关键指标
-- 1. type字段:避免ALL和index
-- 2. Extra字段:关注Using filesort, Using temporary
-- 3. rows字段:检查扫描行数是否合理
-- 4. key字段:确认使用了合适的索引

步骤3:制定优化方案

-- 索引优化
CREATE INDEX idx_name ON table_name(column1, column2);

-- 查询重写
-- 将子查询改写为JOIN
-- 优化WHERE条件顺序

-- 统计信息更新
ANALYZE TABLE table_name;

步骤4:验证优化效果

-- 对比优化前后的执行计划
-- 测试实际执行时间
-- 监控资源使用变化
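
以一个薪资范围查询为例,下面是步骤4的验证示意(索引名idx_emp_salary_verify为本例自拟,薪资区间仅为示例值):

-- 优化前:记录EXPLAIN ANALYZE输出中的actual time与actual rows
EXPLAIN ANALYZE SELECT employee_id_, name_, salary_
FROM t_employees
WHERE salary_ BETWEEN 8000 AND 12000;

-- 创建候选索引后,重复执行同一条EXPLAIN ANALYZE,对比两次的执行时间和扫描行数
-- CREATE INDEX idx_emp_salary_verify ON t_employees (salary_);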

6.6 系统表使用最佳实践

6.6.1 权限配置和安全考虑

基础权限配置:

-- 创建专门的监控用户
CREATE USER 'db_monitor'@'%' IDENTIFIED BY 'secure_password';

-- 授予必要的权限
GRANT SELECT ON performance_schema.* TO 'db_monitor'@'%';
GRANT SELECT ON INFORMATION_SCHEMA.* TO 'db_monitor'@'%';
GRANT PROCESS ON *.* TO 'db_monitor'@'%';
GRANT REPLICATION CLIENT ON *.* TO 'db_monitor'@'%';

-- 限制权限范围(可选)
GRANT SELECT ON performance_schema.events_statements_summary_by_digest TO 'db_monitor'@'%';
GRANT SELECT ON performance_schema.table_io_waits_summary_by_index_usage TO 'db_monitor'@'%';
6.6.2 性能监控查询模板

模板1:系统整体性能监控

-- 综合性能监控仪表板
SELECT
    '连接状态' as metric_category,
    VARIABLE_NAME as metric_name,
    VARIABLE_VALUE as current_value,
    CASE
        WHEN VARIABLE_NAME = 'Threads_connected' AND CAST(VARIABLE_VALUE AS UNSIGNED) > 100 THEN '需关注'
        WHEN VARIABLE_NAME = 'Threads_running' AND CAST(VARIABLE_VALUE AS UNSIGNED) > 10 THEN '需关注'
        ELSE '正常'
    END as status
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN ('Threads_connected', 'Threads_running', 'Max_used_connections')

UNION ALL

SELECT
    '缓冲池性能' as metric_category,
    '缓冲池命中率' as metric_name,
    CONCAT(ROUND((1 - (
        (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
        (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
    )) * 100, 2), '%') as current_value,
    CASE
        WHEN (1 - (
            (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
            (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
        )) * 100 > 99 THEN '优秀'
        WHEN (1 - (
            (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
            (SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
        )) * 100 > 95 THEN '良好'
        ELSE '需优化'
    END as status;

模板2:慢查询TOP10监控

-- 慢查询TOP10监控模板
SELECT
    RANK() OVER (ORDER BY SUM_TIMER_WAIT DESC) as ranking,
    SCHEMA_NAME as database_name,
    SUBSTRING(DIGEST_TEXT, 1, 80) as query_pattern,
    COUNT_STAR as execution_count,
    ROUND(AVG_TIMER_WAIT/1000000000, 3) as avg_seconds,
    ROUND(SUM_TIMER_WAIT/1000000000, 3) as total_seconds,
    ROUND(SUM_ROWS_EXAMINED/COUNT_STAR, 0) as avg_rows_examined,
    FIRST_SEEN as first_execution,
    LAST_SEEN as last_execution
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
  AND COUNT_STAR > 5
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
6.6.3 常见问题和解决方案

问题1:performance_schema占用内存过多

-- 检查performance_schema内存使用
SELECT
    EVENT_NAME,
    COUNT_ALLOC,
    COUNT_FREE,
    SUM_NUMBER_OF_BYTES_ALLOC,
    SUM_NUMBER_OF_BYTES_FREE,
    LOW_COUNT_USED,
    HIGH_COUNT_USED
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/performance_schema/%'
ORDER BY SUM_NUMBER_OF_BYTES_ALLOC DESC
LIMIT 10;

-- 解决方案:调整performance_schema参数
-- 在my.cnf中设置:
-- performance_schema_max_digest_length = 1024
-- performance_schema_digests_size = 10000

问题2:系统表查询性能慢

-- 问题:大量并发查询系统表导致性能下降
-- 解决方案:
-- 1. 使用LIMIT限制结果集
-- 2. 在业务低峰期执行复杂查询
-- 3. 缓存查询结果,避免频繁查询

-- 示例:优化后的查询
SELECT * FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = DATABASE()
  AND LAST_SEEN > DATE_SUB(NOW(), INTERVAL 1 HOUR)
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;

问题3:统计信息不准确

-- 问题:INFORMATION_SCHEMA.TABLES中的TABLE_ROWS不准确
-- 原因:InnoDB的行数是估算值
-- 解决方案:使用精确计数

-- 不准确的方法
SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'your_table';

-- 准确的方法(但性能较慢)
SELECT COUNT(*) FROM your_table;

-- 折中方案:定期更新统计信息
ANALYZE TABLE your_table;

第6章小结

本章全面介绍了MySQL系统表和查询分析工具,包括:

  1. 系统表分类:INFORMATION_SCHEMA、performance_schema、mysql系统库的详细介绍
  2. 核心表详解:每个重要系统表的字段含义、使用场景和查询示例
  3. EXPLAIN工具:查询执行计划分析的完整指南
  4. 最佳实践:权限配置、监控模板和常见问题解决方案

掌握这些工具和技术,将大大提升您的MySQL性能调优能力!


7. 最佳实践和常见陷阱

7.1 SQL编写最佳实践

7.1.1 查询优化原则
-- 1. 避免SELECT *,明确指定需要的列
-- 不推荐
SELECT * FROM t_employees WHERE department_id_ = 1;

-- 推荐
SELECT employee_id_, name_, salary_
FROM t_employees WHERE department_id_ = 1;

-- 2. 合理组织WHERE条件
-- 说明:MySQL优化器会自行评估各条件的代价,书写顺序不决定执行顺序,
-- 但按"等值条件在前、范围条件在后"组织,有利于可读性和复合索引设计
SELECT * FROM t_employees
WHERE department_id_ = 1    -- 等值条件,配合索引过滤效果好
  AND status_ = 'ACTIVE'    -- 低基数列,单独使用时选择性差
  AND salary_ > 30000;      -- 范围条件,通常对应复合索引的最后一列

-- 3. 避免在WHERE子句中使用函数
-- 不推荐
SELECT * FROM t_employees WHERE YEAR(hire_date_) = 2023;

-- 推荐
SELECT * FROM t_employees
WHERE hire_date_ >= '2023-01-01' AND hire_date_ < '2024-01-01';

-- 4. 使用EXISTS替代IN(当子查询返回大量结果时)
-- 不推荐(当sales表很大时)
SELECT * FROM t_employees
WHERE employee_id_ IN (SELECT employee_id_ FROM t_sales WHERE amount_ > 1000);

-- 推荐
SELECT * FROM t_employees e
WHERE EXISTS (SELECT 1 FROM t_sales s WHERE s.employee_id_ = e.employee_id_ AND s.amount_ > 1000);

-- 5. 合理使用UNION vs UNION ALL
-- 如果确定没有重复数据,使用UNION ALL
SELECT employee_id_, name_ FROM t_employees WHERE department_id_ = 1
UNION ALL
SELECT employee_id_, name_ FROM t_employees WHERE department_id_ = 2;

-- 6. 避免隐式类型转换
-- 不推荐
SELECT * FROM t_employees WHERE employee_id_ = '123';  -- 字符串比较数字

-- 推荐
SELECT * FROM t_employees WHERE employee_id_ = 123;
7.1.2 索引使用最佳实践
-- 1. 复合索引的列顺序很重要
-- 创建索引时考虑查询模式
CREATE INDEX idx_emp_dept_salary_status ON t_employees (department_id_, salary_, status_);

-- 可以使用索引的查询
SELECT * FROM t_employees WHERE department_id_ = 1;
SELECT * FROM t_employees WHERE department_id_ = 1 AND salary_ > 50000;
SELECT * FROM t_employees WHERE department_id_ = 1 AND salary_ > 50000 AND status_ = 'ACTIVE';

-- 无法使用索引的查询
SELECT * FROM t_employees WHERE salary_ > 50000;  -- 跳过了第一列
SELECT * FROM t_employees WHERE status_ = 'ACTIVE';  -- 跳过了前两列

-- 2. 避免在索引列上使用函数
-- 不推荐
SELECT * FROM t_employees WHERE UPPER(name_) = 'JOHN';

-- 推荐:创建函数索引(MySQL 8.0.13+,表达式两侧需要双括号)或使用LIKE前缀匹配
CREATE INDEX idx_emp_first_name_upper ON t_employees ((UPPER(name_)));
-- 或者
SELECT * FROM t_employees WHERE name_ LIKE 'John%';

-- 3. 合理使用覆盖索引
-- 创建覆盖索引避免回表查询(MySQL语法)
CREATE INDEX idx_emp_covering ON t_employees (department_id_, salary_, name_);

SELECT name_, salary_
FROM t_employees
WHERE department_id_ = 1 AND salary_ > 50000;
7.1.3 事务处理最佳实践
-- 业务场景:事务设计最佳实践 - 确保高并发环境下的系统稳定性

-- 反例(不推荐):长事务,严重影响系统性能和并发能力
-- 业务影响:长时间持有锁,阻塞其他事务,可能导致系统响应缓慢
BEGIN;
SELECT * FROM t_employees;  -- 大量数据处理,占用大量内存
-- ... 复杂业务逻辑处理,耗时可能几分钟 ...
UPDATE t_employees SET salary_ = salary_ * 1.1;  -- 长时间持有表锁
-- ... 更多操作 ...
COMMIT;
-- 问题:事务时间过长,锁定资源时间长,影响并发性能

-- 正例:短事务,提升系统并发能力
-- 业务价值:减少锁等待时间,提高系统吞吐量
BEGIN;
UPDATE t_employees SET salary_ = salary_ * 1.1 WHERE department_id_ = 1 AND status_ = 'ACTIVE';
COMMIT;

-- 业务场景:根据业务特点选择合适的事务隔离级别
-- 大多数OLTP业务场景下,READ COMMITTED级别可以平衡一致性和性能
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- 避免脏读,性能较好;注意该级别不能防止不可重复读和幻读

-- 反例(不推荐):盲目使用最高隔离级别
-- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- 性能最差,只在特殊场景使用

-- 业务场景:死锁预防策略 - 统一资源访问顺序
-- 所有涉及多个员工记录的事务都按employee_id_升序访问
BEGIN;
UPDATE t_employees SET salary_ = 50000 WHERE employee_id_ = 1;
UPDATE t_employees SET salary_ = 51000 WHERE employee_id_ = 2;
COMMIT;

-- 反例(不推荐):不同会话以不同顺序访问相同资源
-- 会话A: 先更新ID=2,再更新ID=1
-- 会话B: 先更新ID=1,再更新ID=2
-- 问题:容易形成循环等待,导致死锁

-- 业务场景:选择合适的锁粒度,平衡并发性和一致性
-- 行级锁:高并发场景的首选
SELECT employee_id_, salary_ FROM t_employees WHERE employee_id_ = 1 FOR UPDATE;

-- 反例(不推荐):不必要的表级锁
-- LOCK TABLES t_employees WRITE;  -- 阻塞所有其他操作,并发性极差

7.2 性能监控和诊断

7.2.1 关键性能指标
-- 业务场景:数据库性能监控和故障诊断 - 识别系统瓶颈和优化机会

-- 业务场景:慢查询识别 - 找出响应时间最长的SQL语句进行优化
-- 用于日常性能监控和故障排查,识别需要优化的查询
SELECT
    SCHEMA_NAME as database_name,
    SUBSTRING(DIGEST_TEXT, 1, 100) as query_sample,  -- 截取查询示例
    COUNT_STAR as execution_count,
    AVG_TIMER_WAIT/1000000000 as avg_response_time_seconds,
    SUM_TIMER_WAIT/1000000000 as total_time_seconds,
    -- 业务指标:平均每次执行的逻辑读次数
    ROUND(SUM_ROWS_EXAMINED/COUNT_STAR, 2) as avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;

-- 反例(不推荐):不监控查询性能,问题发生后才被动处理
-- 问题:缺乏主动监控,性能问题可能长期存在影响用户体验

-- 业务场景:连接池监控 - 确保数据库连接资源充足,避免连接耗尽
-- 关键指标:当前连接数不应超过最大连接数的80%
SELECT
    variable_name,
    variable_value,
    CASE variable_name
        WHEN 'Threads_connected' THEN '当前连接数'
        WHEN 'Threads_running' THEN '活跃连接数'
        WHEN 'Max_used_connections' THEN '历史最大连接数'
    END as description
FROM performance_schema.global_status
WHERE variable_name IN ('Threads_connected', 'Threads_running', 'Max_used_connections');
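
在上述状态值的基础上,可进一步计算连接使用率并与max_connections对比(80%阈值为常用经验值,仅供参考):

-- 计算当前连接使用率
SELECT
    CAST(gs.variable_value AS UNSIGNED) AS threads_connected,
    @@max_connections AS max_connections,
    ROUND(CAST(gs.variable_value AS UNSIGNED) / @@max_connections * 100, 2) AS connection_usage_percent
FROM performance_schema.global_status gs
WHERE gs.variable_name = 'Threads_connected';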

-- 业务场景:缓冲池性能监控 - 确保内存配置合理,避免频繁磁盘I/O
-- 目标:缓冲池命中率应该 > 99%,低于95%需要调整内存配置

-- 修复版本:处理除零错误和数据类型转换
SELECT
    CASE
        WHEN CAST(bp_requests.variable_value AS UNSIGNED) = 0 THEN 0
        ELSE ROUND((1 - (
            CAST(bp_reads.variable_value AS UNSIGNED) /
            CAST(bp_requests.variable_value AS UNSIGNED)
        )) * 100, 2)
    END as buffer_pool_hit_rate_percent,

    CAST(bp_reads.variable_value AS UNSIGNED) as buffer_pool_reads,
    CAST(bp_requests.variable_value AS UNSIGNED) as buffer_pool_read_requests,

    -- 业务解读
    CASE
        WHEN CAST(bp_requests.variable_value AS UNSIGNED) = 0 THEN '无数据'
        WHEN (1 - (CAST(bp_reads.variable_value AS UNSIGNED) / CAST(bp_requests.variable_value AS UNSIGNED))) * 100 > 99 THEN '优秀'
        WHEN (1 - (CAST(bp_reads.variable_value AS UNSIGNED) / CAST(bp_requests.variable_value AS UNSIGNED))) * 100 > 95 THEN '良好'
        ELSE '需要优化'
    END as performance_level
FROM
    (SELECT variable_value FROM performance_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_reads') bp_reads,
    (SELECT variable_value FROM performance_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_read_requests') bp_requests;

-- 反例(不推荐):忽视缓冲池命中率,导致I/O性能问题
-- 问题:低命中率会导致大量磁盘I/O,严重影响查询性能

-- 等待事件与阻塞分析(以下查询均基于performance_schema和INFORMATION_SCHEMA)

-- 1. MySQL等待事件统计
SELECT
    EVENT_NAME,
    COUNT_STAR as total_events,
    SUM_TIMER_WAIT/1000000000 as total_wait_seconds,
    AVG_TIMER_WAIT/1000000000 as avg_wait_seconds,
    MIN_TIMER_WAIT/1000000000 as min_wait_seconds,
    MAX_TIMER_WAIT/1000000000 as max_wait_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE COUNT_STAR > 0
  AND EVENT_NAME NOT LIKE 'wait/synch/mutex/innodb%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;

-- 2. MySQL阻塞查询分析
SELECT
    r.trx_id as blocking_trx_id,
    r.trx_mysql_thread_id as blocking_thread,
    SUBSTRING(r.trx_query, 1, 100) as blocking_query,
    b.trx_id as blocked_trx_id,
    b.trx_mysql_thread_id as blocked_thread,
    SUBSTRING(b.trx_query, 1, 100) as blocked_query,
    TIMESTAMPDIFF(SECOND, b.trx_started, NOW()) as wait_time_seconds
FROM INFORMATION_SCHEMA.INNODB_TRX r
JOIN performance_schema.data_lock_waits w ON r.trx_id = w.BLOCKING_ENGINE_TRANSACTION_ID
JOIN INFORMATION_SCHEMA.INNODB_TRX b ON w.REQUESTING_ENGINE_TRANSACTION_ID = b.trx_id;

-- 3. MySQL慢查询分析
SELECT
    SCHEMA_NAME as database_name,
    SUBSTRING(DIGEST_TEXT, 1, 100) as query_pattern,
    COUNT_STAR as execution_count,
    SUM_TIMER_WAIT/1000000000 as total_time_seconds,
    AVG_TIMER_WAIT/1000000000 as avg_time_seconds,
    SUM_ROWS_EXAMINED as total_rows_examined,
    SUM_ROWS_SENT as total_rows_sent,
    ROUND(SUM_ROWS_EXAMINED/COUNT_STAR, 2) as avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
  AND COUNT_STAR > 10
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- 4. MySQL锁等待详细监控(修复版本:正确关联线程信息)
SELECT
    dl.OBJECT_SCHEMA as schema_name,
    dl.OBJECT_NAME as table_name,
    dl.LOCK_TYPE,
    dl.LOCK_MODE,
    dl.LOCK_STATUS,
    dl.LOCK_DATA,
    t.PROCESSLIST_HOST as host,
    t.PROCESSLIST_USER as user,
    SUBSTRING(t.PROCESSLIST_INFO, 1, 100) as current_query,
    t.PROCESSLIST_TIME as query_time_seconds
FROM performance_schema.data_locks dl
LEFT JOIN performance_schema.threads t ON dl.THREAD_ID = t.THREAD_ID
WHERE dl.LOCK_STATUS = 'WAITING'
  AND t.PROCESSLIST_ID IS NOT NULL  -- 只显示有进程ID的线程
ORDER BY dl.OBJECT_SCHEMA, dl.OBJECT_NAME;

7.3 常见性能陷阱避免

7.3.1 查询陷阱
-- 陷阱1:N+1查询问题
-- 不推荐:在循环中执行查询
-- 伪代码示例
/*
t_departments = SELECT * FROM t_departments;
for each department in t_departments:
    t_employees = SELECT * FROM t_employees WHERE department_id_ = department.id;
*/

-- 推荐:使用JOIN一次性获取数据
SELECT
    d.department_id_,
    d.department_name_,
    e.employee_id_,
    e.name_
FROM t_departments d
LEFT JOIN t_employees e ON d.department_id_ = e.department_id_;

-- 陷阱2:不必要的ORDER BY
-- 不推荐:在子查询中使用ORDER BY
SELECT * FROM (
    SELECT * FROM t_employees ORDER BY salary_ DESC  -- 不必要的排序
) t
WHERE department_id_ = 1;

-- 推荐:只在最终结果中排序
SELECT * FROM t_employees
WHERE department_id_ = 1
ORDER BY salary_ DESC;

-- 陷阱3:使用OFFSET进行深度分页
-- 不推荐:大偏移量分页
SELECT * FROM t_employees ORDER BY employee_id_ LIMIT 100 OFFSET 10000;

-- 推荐:使用游标分页
SELECT * FROM t_employees
WHERE employee_id_ > 10000  -- 上一页的最后一个ID
ORDER BY employee_id_
LIMIT 100;

-- 陷阱4:不合理的GROUP BY
-- 不推荐:GROUP BY后再过滤
SELECT department_id_, COUNT(*) as emp_count
FROM t_employees
GROUP BY department_id_
HAVING emp_count > 10;

-- 推荐:先过滤再GROUP BY(如果可能)
SELECT department_id_, COUNT(*) as emp_count
FROM t_employees
WHERE status_ = 'ACTIVE'  -- 先过滤
GROUP BY department_id_
HAVING COUNT(*) > 10;
7.3.2 索引陷阱
-- 陷阱1:过多的索引
-- 不推荐:创建大量重复和冗余的索引
CREATE INDEX idx1 ON t_employees (name_);
CREATE INDEX idx2 ON t_employees (name_);               -- 与idx1完全重复,只增加维护成本
CREATE INDEX idx3 ON t_employees (department_id_);
CREATE INDEX idx4 ON t_employees (salary_);
CREATE INDEX idx5 ON t_employees (department_id_, salary_);  -- 按最左前缀原则,idx3随之成为冗余索引

-- 推荐:创建合理的复合索引
CREATE INDEX idx_emp_dept_salary ON t_employees (department_id_, salary_);
CREATE INDEX idx_emp_name ON t_employees (name_);

-- 陷阱2:在小表上创建索引
-- 不推荐:为只有几百行的表创建多个索引
-- 小表全表扫描通常比索引查找更快
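
可以用EXPLAIN直观验证这一点:即使存在可用索引,优化器对小表也常直接选择全表扫描,type=ALL属正常现象(部门名称仅为示例值):

EXPLAIN SELECT * FROM t_departments WHERE department_name_ = '技术部';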

-- 陷阱3:忽略索引维护
-- 定期检查和维护索引
-- MySQL
OPTIMIZE TABLE t_employees;


结语

高级SQL技术是数据库专业人员必须掌握的核心技能。随着数据量的不断增长和业务复杂性的提升,深入理解各数据库系统的特性和优化技术变得越来越重要。

本文提供的技术指南和最佳实践,希望能够帮助读者在实际工作中更好地设计、优化和管理数据库系统。记住,性能优化是一个持续的过程,需要根据具体的业务场景和数据特点进行调整和改进。


本指南到此结束。希望这份全面的MySQL技术指南能够帮助您在数据库开发和优化的道路上更进一步!

