MySQL 8.0 窗口函数详解:让数据分析更简单高效

发布于:2025-09-03 ⋅ 阅读:(22) ⋅ 点赞:(0)

        在日常的数据分析工作中,我们经常需要对数据进行分组排序、计算移动平均值、统计累计求和等操作。在MySQL 8.0之前,这类需求通常需要编写复杂的子查询或连接查询才能实现。而MySQL 8.0引入的窗口函数(Window Functions)极大地简化了这类操作,让数据分析变得更加简单高效。

        本文将通过通俗易懂的方式,带你全面了解MySQL 8.0中的窗口函数,包括聚合类、排名类和跨行类窗口函数的使用方法。

什么是窗口函数?

        窗口函数是一种特殊的SQL函数,它能够在不改变原有行数的情况下,对查询结果的某个"窗口"(一组相关的行)进行计算,并为每一行返回一个值。

        可以把窗口函数想象成:你有一张全班学生的成绩表,窗口函数允许你同时看到每个学生的成绩、他在班级中的排名、他与前一名同学的分数差等信息,而不需要改变原始数据表的行数。这个排名是新加的一行。

窗口函数的基本语法

SELECT 
    列1, 
    列2,
    窗口函数() OVER (
        [PARTITION BY 分区字段] 
        [ORDER BY 排序字段]
        [frame_clause]
    ) AS 别名
FROM 表名;
  • PARTITION BY:将数据分成多个分区(类似于GROUP BY,但不合并行)

  • ORDER BY:确定分区内数据的排序方式

  • frame_clause:定义窗口框架,即计算范围

一、聚合类窗口函数

聚合类窗口函数可以在保留所有行的同时,计算分组的聚合值。

1. SUM() OVER()

计算分区内的总和:

-- 计算每个部门的工资总额,同时显示每个员工的详细信息
SELECT 
    employee_id,
    name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total_salary
FROM employees;

2. AVG() OVER()

计算分区内的平均值:

-- 计算每个部门的平均工资,同时显示每个员工的详细信息
SELECT 
    employee_id,
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;

3. COUNT() OVER()

计算分区内的行数:

-- 计算每个部门的员工数量,同时显示每个员工的详细信息
SELECT 
    employee_id,
    name,
    department,
    salary,
    COUNT(*) OVER (PARTITION BY department) AS dept_employee_count
FROM employees;

二、排名类窗口函数

排名类窗口函数用于为分区内的行分配排名、序号等。

1. ROW_NUMBER()

为分区内的每一行分配一个唯一的序号:1,2,3,4,5,6,7

-- 为每个部门的员工按工资从高到低编号
SELECT 
    employee_id,
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_dept
FROM employees;

2. RANK()

计算分区内的排名,相同值会有相同排名,并跳过后续排名:1,1,3,3,5,6,7

-- 计算每个部门内的工资排名(允许并列)
SELECT 
    employee_id,
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;

3. DENSE_RANK()

计算分区内的排名,相同值有相同排名,但不跳过后续排名:1,1,2,2,3,4,5,6,6

-- 计算每个部门内的工资密集排名(允许并列但不跳号)
SELECT 
    employee_id,
    name,
    department,
    salary,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;

4. NTILE()

将分区内的数据分成指定数量的组:

-- 将每个部门的员工按工资高低分成4个组
SELECT 
    employee_id,
    name,
    department,
    salary,
    NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) as quartile
FROM employees;

三、跨行类窗口函数

跨行类窗口函数可以访问分区内其他行的数据。

1. LAG()

访问分区中当前行之前的数据:

-- 查看每位员工和上一名员工的工资差异
SELECT 
    employee_id,
    name,
    department,
    salary,
    LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) as prev_salary,
    salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_prev
FROM employees;

2. LEAD()

访问分区中当前行之后的数据:

-- 查看每位员工和下一名员工的工资差异
SELECT 
    employee_id,
    name,
    department,
    salary,
    LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) as next_salary,
    salary - LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_next
FROM employees;

3. FIRST_VALUE()

获取分区内第一行的值:

-- 查看每位员工与部门最高工资的差异
SELECT 
    employee_id,
    name,
    department,
    salary,
    FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as highest_salary,
    salary - FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_highest
FROM employees;

4. LAST_VALUE()

获取分区内最后一行的值:

-- 查看每位员工与部门最低工资的差异
SELECT 
    employee_id,
    name,
    department,
    salary,
    LAST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as lowest_salary,
    salary - LAST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as diff_from_lowest
FROM employees;

注意:使用LAST_VALUE()时需要特别注意窗口框架的定义,否则可能得不到预期结果。

窗口框架(Window Frame)详解

窗口框架定义了窗口函数计算时使用的行范围,常用的有两种:

  1. ROWS模式:基于物理行偏移

  2. RANGE模式:基于逻辑值偏移

示例:计算移动平均值

-- 计算每个员工与前2行、当前行、后2行共5行的平均工资
SELECT 
    employee_id,
    name,
    department,
    salary,
    AVG(salary) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
    ) as moving_avg
FROM employees;

实际应用场景

场景1:计算同比/环比增长率

-- 计算每月销售额与上月相比的增长率
SELECT 
    year,
    month,
    sales,
    LAG(sales) OVER (ORDER BY year, month) as prev_month_sales,
    (sales - LAG(sales) OVER (ORDER BY year, month)) / LAG(sales) OVER (ORDER BY year, month) * 100 as growth_rate
FROM monthly_sales;

场景2:获取Top N记录

-- 获取每个部门工资前三名的员工
WITH ranked_employees AS (
    SELECT 
        employee_id,
        name,
        department,
        salary,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_dept
    FROM employees
)
SELECT * FROM ranked_employees WHERE rank_in_dept <= 3;

场景3:计算累计百分比

-- 计算每个部门工资的累计百分比
SELECT 
    employee_id,
    name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) as running_total,
    SUM(salary) OVER (PARTITION BY department) as dept_total,
    SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) / SUM(salary) OVER (PARTITION BY department) * 100 as cumulative_percent
FROM employees;

性能优化建议

  1. 合理使用索引:为PARTITION BY和ORDER BY涉及的字段创建索引

  2. 避免过度使用窗口函数:在数据量大的表中,窗口函数可能影响性能

  3. 使用CTE(公用表表达式):将复杂查询分解为多个步骤,提高可读性和性能

  4. 限制窗口大小:对于移动平均等计算,尽量限制窗口框架的范围

总结

MySQL 8.0的窗口函数为数据分析提供了强大而灵活的工具,使我们能够在保留原始行细节的同时,进行各种复杂的计算和分析。通过本文的学习,你应该已经掌握了:

  1. 窗口函数的基本概念和语法结构

  2. 聚合类、排名类和跨行类窗口函数的使用方法

  3. 窗口框架的定义和使用场景

  4. 窗口函数在实际业务中的应用示例

窗口函数的学习曲线可能稍陡,但一旦掌握,将极大地提升你的数据处理能力和工作效率。建议在实际工作中多加练习,逐步掌握这些强大的功能。

希望本文对你理解和使用MySQL 8.0窗口函数有所帮助!如有任何疑问,欢迎留言讨论。