SparkSQL_数组排序函数 array_sort用法详解
官网:https://spark.apache.org/docs/latest/api/sql/index.html#array_sort
array_sort(expr, [func])
参数 | 类型 | 必须 | 说明 |
---|---|---|---|
expr |
ARRAY<T> |
✅ | 要排序的数组 |
func |
Lambda 函数 (T, T) -> INT |
❌ | 自定义比较函数(可选) |
默认排序行为(不传 func
)
如果不提供比较函数,array_sort
会:
- 按 升序 排序
NULL
值排在最后- 对于
float/double
类型,NaN > 任何非 NaN 值
1:基本排序(升序)
SELECT array_sort(array(5, 6, 1)) AS sorted;
-- 输出: [1, 5, 6]
2:字符串排序(字典序)
SELECT array_sort(array('bc', 'ab', 'dc')) AS sorted;
-- 输出: ["ab", "bc", "dc"]
3:带 NULL
的排序
SELECT array_sort(array('b', 'd', null, 'c', 'a')) AS sorted;
-- 输出: ["a", "b", "c", "d", null]
自定义比较函数格式:
array_sort(array_col, (left, right) ->
CASE
WHEN left < right THEN -1
WHEN left > right THEN 1
ELSE 0
END
)
比较函数返回值含义:
返回值 | 含义 |
---|---|
负数 | left < right |
0 | left == right |
正数 | left > right |
⚠️ 如果比较函数返回
NULL
,会抛出错误!
1:降序排序(数字)
SELECT array_sort(
array(5, 6, 1),
(left, right) -> CASE
WHEN left < right THEN 1
WHEN left > right THEN -1
ELSE 0
END
) AS sorted_desc;
-- 输出: [6, 5, 1]
2:字符串按长度排序(长的在前)
SELECT array_sort(
array('a', 'bb', 'ccc', 'dddd'),
(left, right) -> CASE
WHEN length(left) < length(right) THEN 1
WHEN length(left) > length(right) THEN -1
ELSE 0
END
) AS sorted_by_length;
3:字符串按字典序降序排序
SELECT array_sort(
array('bc', 'ab', 'dc'),
(left, right) -> CASE
WHEN left < right THEN 1
WHEN left > right THEN -1
ELSE 0
END
) AS sorted_desc;
-- 输出: ["dc", "bc", "ab"]
4:处理 NULL
值的自定义排序(把 NULL
放在最前面)
SELECT array_sort(
array('b', 'd', null, 'c', 'a'),
(left, right) -> CASE
WHEN left IS NULL AND right IS NULL THEN 0
WHEN left IS NULL THEN -1
WHEN right IS NULL THEN 1
WHEN left < right THEN -1
WHEN left > right THEN 1
ELSE 0
END
) AS sorted_null_first;
-- 输出: [null, "a", "b", "c", "d"]