dremio数据湖sql行列转换及转置

发布于:2024-04-29 ⋅ 阅读:(39) ⋅ 点赞:(0)

1、行转列 (扁平化)

数据准备 表 aa

1.1 cross join unnest

在Dremio中,UNNEST 函数用于将数组或复杂类型的列(如JSON、Map或Array类型)中的值“炸裂”(分解)成多行.

with aa as (
select '上海' as city, ARRAY['浦东新区','黄浦区']        as area
union 
select '北京' as city, ARRAY['朝阳区','海淀区','昌平区'] as area
)
select city
      ,area_b
from aa
CROSS JOIN UNNEST(aa.area) AS t(area_b)

1.2  flatten

将复合值分解为多行。FLATTEN 函数采用一 LIST 列并生成横向视图(即,包含引用 FROM 子句中它前面的其他表的相关性的内联视图)。

表达式的数据类型 必须为 LIST

with aa as (
select '上海' as city,ARRAY['浦东新区', '黄浦区'] as areas
union 
select '北京' as city,ARRAY['朝阳区', '海淀区','昌平区'] as areas
)
SELECT city,FLATTEN(areas) AS area
FROM aa;

两种函数 都能 把 数组 行转为列,结果如下

然而我们的初始数据可能是 带有分割符的字符串 而不是 数组

dremio 提供 字符 转换 为 数组正则表达式函数  REGEXP_SPLIT ( ) 实例如下:

with aa as(
select '上海' as id,'浦东新区,黄浦区'       as name
union
select '北京' as id,'朝阳区,海淀区,昌平区'  as name
)

SELECT REGEXP_SPLIT(name, 'r(?<=,)(?<=,$)', 'ALL', 1) AS "list"
from aa

结果:

更多dremio 图标类型识别:dremio数据类型图标识别

2 、列转行 

数据准备

2.1  LISTAGG

 将一组行连接成一个字符串列表,并在它们之间放置一个分隔符。 返回字符类型

with aa as (
select '上海' as city,'浦东区新' as area union select '上海' as city,'黄浦区' as area
union 
select '北京' as city,'朝阳区' as area   union select '北京' as city,'海淀区' as area
)
select city,LISTAGG(DISTINCT area, ' | ')
WITHIN GROUP (ORDER BY area) "city_list"
FROM aa
group by city

返回结果:

2.2 ARRAY_AGG

将提供的表达式聚合到一个数组中。返回数组类型

with aa as(
select '上海' as id,'浦东新区'  as name union select '上海' as id,'黄浦区'  as name
union
select '北京' as id,'朝阳区'  as name union select '北京' as id,'海淀区'  as name
union
select '北京' as id,'昌平区'  as name
)
SELECT id, ARRAY_AGG(name) 
FROM aa 
GROUP BY id 

结果:

3、转置 ,反转置

数据准备:

3.1 转置 ( PIVOT )

SELECT *
FROM aa
PIVOT (SUM(sales) FOR region IN ('North', 'South', 'East', 'West'))
order by product

结果:

3.2 反转置 (UNPIVOT

数据准备:

SELECT product, region, sales
FROM aa
UNPIVOT (sales FOR region IN (North, South, East, West)) 
order by product 

结果: