描述
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分),示例数据如下:
id | uid | exam_id | start_time | submit_time | score |
1 | 1001 | 9001 | 2021-07-02 09:01:01 | 2021-07-02 09:21:01 | 80 |
2 | 1002 | 9001 | 2021-09-05 19:01:01 | 2021-09-05 19:40:01 | 81 |
3 | 1002 | 9002 | 2021-09-02 12:01:01 | (NULL) | (NULL) |
4 | 1002 | 9003 | 2021-09-01 12:01:01 | (NULL) | (NULL) |
5 | 1002 | 9001 | 2021-07-02 19:01:01 | 2021-07-02 19:30:01 | 82 |
6 | 1002 | 9002 | 2021-07-05 18:01:01 | 2021-07-05 18:59:02 | 90 |
7 | 1003 | 9002 | 2021-07-06 12:01:01 | (NULL) | (NULL) |
8 | 1003 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:31:01 | 86 |
9 | 1004 | 9003 | 2021-09-06 12:01:01 | (NULL) | (NULL) |
10 | 1002 | 9003 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 81 |
11 | 1005 | 9001 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 88 |
12 | 1005 | 9002 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 88 |
13 | 1006 | 9002 | 2021-09-02 12:11:01 | 2021-09-02 12:31:01 | 89 |
还有一张试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间),示例数据如下:
id | exam_id | tag | difficulty | duration | release_time |
1 | 9001 | SQL | hard | 60 | 2020-01-01 10:00:00 |
2 | 9002 | SQL | easy | 60 | 2020-02-01 10:00:00 |
3 | 9003 | 算法 | medium | 80 | 2020-08-02 10:00:00 |
请统计2021年每个未完成试卷作答数大于1的有效用户的数据(有效用户指完成试卷作答数至少为1且未完成数小于5),输出用户ID、未完成试卷作答数、完成试卷作答数、作答过的试卷tag集合,按未完成试卷数量由多到少排序。示例数据的输出结果如下:
uid | incomplete_cnt | complete_cnt | detail |
1002 | 2 | 4 | 2021-09-01:算法;2021-07-02:SQL;2021-09-02:SQL;2021-09-05:SQL;2021-07-05:SQL |
解释:2021年的作答记录中,除了1004,其他用户均满足有效用户定义,但只有1002未完成试卷数大于1,因此只输出1002,detail中是1002作答过的试卷{日期:tag}集合,日期和tag间用:连接,多元素间用;连接。
SELECT uid, count(incomplete) as incomplete_cnt,
count(complete) as complete_cnt,
group_concat(distinct concat_ws(':', date(start_time), tag) SEPARATOR ';') as detail
from (
SELECT uid, tag, start_time,
if(submit_time is null, 1, null) as incomplete,
if(submit_time is null, null, 1) as complete
from exam_record
left join examination_info using(exam_id)
where year(start_time)=2021
) as exam_complete_rec
group by uid
having complete_cnt >= 1 and incomplete_cnt BETWEEN 2 and 4
order by incomplete_cnt DESC
🔍 分步解析
Step 1:子查询 —— 打标签(完成 / 未完成)
SELECT
uid, tag, start_time,
IF(submit_time IS NULL, 1, NULL) AS incomplete,
IF(submit_time IS NULL, NULL, 1) AS complete
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
WHERE YEAR(start_time) = 2021
字段 | 说明 |
---|---|
IF(submit_time IS NULL, 1, NULL) |
如果未交卷 → 记为 1 ,否则 NULL |
IF(submit_time IS NULL, NULL, 1) |
如果已交卷 → 记为 1 ,否则 NULL |
LEFT JOIN ... USING(exam_id) |
关联试卷信息,获取 tag 类别 |
WHERE YEAR(start_time)=2021 |
只看 2021 年的数据 |
💡 为什么用
NULL
?因为COUNT()
只统计非空值!
Step 2:外层分组统计
GROUP BY uid
按用户分组,准备聚合。
Step 3:用 COUNT()
区分统计“完成”和“未完成”
COUNT(incomplete) AS incomplete_cnt -- 只统计 non-NULL 的
COUNT(complete) AS complete_cnt
函数 | 行为 |
---|---|
COUNT(列) |
只统计该列 非 NULL 的行数 |
SUM(列) |
会把所有值加起来(如 1+1+1=3) |
COUNT(*) |
统计所有行 |
👉 所以你用 COUNT(incomplete)
就等于“未完成的次数”!
Step 4:HAVING
筛选用户
HAVING
complete_cnt >= 1
AND incomplete_cnt BETWEEN 2 AND 4
✅ 只保留满足条件的用户。
⚠️ 注意:必须用
HAVING
,不能用WHERE
,因为complete_cnt
是聚合结果。
Step 5:生成明细字符串
GROUP_CONCAT(
DISTINCT
CONCAT_WS(':', DATE(start_time), tag)
SEPARATOR ';'
) AS detail
函数 | 说明 |
---|---|
CONCAT_WS(':', A, B) |
把 A 和 B 用 : 拼接 → A:B |
DATE(start_time) |
只取日期部分,如 2021-09-01 |
DISTINCT |
去重,避免重复记录 |
SEPARATOR ';' |
多个记录之间用 ; 分隔 |
GROUP_CONCAT(...) |
把一组字符串拼成一个长串 |
👉 输出示例:2021-09-01:SQL:2021-09-02:Python
🧠 核心知识点总结
技巧 | 说明 | 适用场景 |
---|---|---|
✅ IF(条件, 值1, NULL) + COUNT() |
实现条件计数 | 统计满足某条件的行数 |
✅ COUNT(列) vs COUNT(*) |
前者忽略 NULL | 条件聚合 |
✅ GROUP_CONCAT + CONCAT_WS |
生成结构化字符串 | 报表、明细导出 |
✅ HAVING 筛选聚合结果 |
过滤分组后数据 | 用户行为筛选 |
✅ LEFT JOIN + USING |
关联获取维度信息 | 主表+维度表 |
✅ YEAR(date) |
提取年份 | 时间范围筛选 |
🛠️ 常见变体(面试常考)
1. 统计“连续未完成”考试
-- 可用 LAG() 判断连续
LAG(submit_time) OVER (PARTITION BY uid ORDER BY start_time)
2. 找出“从未完成过考试”的用户
HAVING complete_cnt = 0
3. 找出“未完成率 > 50%”的用户
HAVING incomplete_cnt * 1.0 / (incomplete_cnt + complete_cnt) > 0.5