时序数据库:timescaledb
大模型:通义千问2.5
对话开始前提示词:
我正在做数据分析,以下是已知信息:
数据库:timescaledb,表名:dm_tag_value,tag_name列是位号名,app_time列是时间,tag_value列是值计算前要转换下CAST(tag_value AS FLOAT),
其中DRLZM.FIC3101A.DACA.PV是位号1,DRLZM.AT0502A.DACA.PV是位号2,TAG82是位号3
表数据如下:
---
"id","ds_id","tag_name","tag_value","tag_time","app_time","quality","create_time"
1002364859414740992,,DRLZM.FIC3101A.DACA.PV,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364859435712513,,DRLZM.AT0502A.DACA.PV,-24.399999618530273,2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364859460878336,,TAG40,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:05.000
1002364859469266945,,DRLZM.AT0502A.DACA.PV,"21.059999465942383",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364863663570944,,DRLZM.AT0502A.DACA.PV,-62.939998626708984,2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863684542465,,TAG7000_ISYS7.1,-6.4000000953674316,2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863709708288,,TAG6000_ISYS7.1,-84.94000244140625,2024-05-29 00:00:05.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863709708288,,TAG6000_ISYS7.1,-84.94000244140625,2024-05-29 00:00:05.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364880306569217,,DRLZM.FIC3101A.DACA.PV,"4.3653912544250488",2024-05-29 00:00:09.000,2024-05-29 00:00:09.000,192,2024-05-29 00:00:10.000
1002364901273894912,,DRLZM.FIC3101A.DACA.PV,"0.26307547092437744",2024-05-29 00:00:14.000,2024-05-29 00:00:14.000,192,2024-05-29 00:00:15.000
1002364922245414913,,DRLZM.FIC3101A.DACA.PV,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:20.000
1002364943279849472,,TAG81,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
1002364943279849473,,TAG81,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:25.000
1002364943279849474,,TAG82,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
1002364943279849475,,TAG82,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:25.000
1002364943279849476,,TAG83,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
---
根据上面的已知信息,我会问一系列数据分析的问题,分析位号、时间和值,每个问题的回答要求:
生成SQL语句,要避免ambiguous错, 注意:(仅输出SQL查询语句,不要包含任何其他文本或解释)
测试CASE
**提示词:**将今天位号1数据从小到大排序,请注意输出格式,我只需要SQL查询语句,不需要其他解释或信息.
返回:
SELECT app_time, CAST(tag_value AS FLOAT) AS value
FROM dm_tag_value
WHERE tag_name = 'DRLZM.FIC3101A.DACA.PV' AND DATE(app_time) = CURRENT_DATE
ORDER BY value ASC;
**提示词:**使用线性拟合今天位号1与位号2数据,我只需要SQL查询语句,不需要其他解释或信息.
不支持,需要UDF配合
**提示词:**使用均值滤波对位号1进行处理,我只需要SQL查询语句,不需要其他解释或信息.
返回:
WITH ordered_data AS (
SELECT
app_time,
CAST(tag_value AS FLOAT) AS value,
ROW_NUMBER() OVER (ORDER BY app_time) AS row_num
FROM dm_tag_value
WHERE tag_name = 'DRLZM.FIC3101A.DACA.PV' AND DATE(app_time) = CURRENT_DATE
),
smoothed_data AS (
SELECT
o1.app_time,
AVG(o1.value + COALESCE(o2.value, o1.value) + COALESCE(o3.value, o1.value)) AS mean_filtered_value
FROM ordered_data o1
LEFT JOIN ordered_data o2 ON o1.row_num = o2.row_num - 1
LEFT JOIN ordered_data o3 ON o1.row_num = o3.row_num + 1
GROUP BY o1.app_time, o1.value
)
SELECT * FROM smoothed_data;
**提示词:**针对今天A位号1与位号2数据做相关性分析,请注意输出格式,我只需要SQL查询语句,不需要其他解释或信息.
返回:
WITH today_data AS (
SELECT
CAST(tag_value AS FLOAT) AS value,
LAG(CAST(tag_value AS FLOAT)) OVER (PARTITION BY tag_name ORDER BY app_time) AS prev_value,
app_time
FROM dm_tag_value
WHERE
tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
AND DATE(app_time) = CURRENT_DATE
),
correlation_data AS (
SELECT
CASE WHEN tag_name = 'DRLZM.FIC3101A.DACA.PV' THEN value ELSE prev_value END AS value_a,
CASE WHEN tag_name = 'DRLZM.AT0502A.DACA.PV' THEN value ELSE prev_value END AS value_b
FROM today_data
WHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
),
agg_data AS (
SELECT
SUM(value_a * value_b) AS sum_ab,
SUM(value_a * value_a) AS sum_aa,
SUM(value_b * value_b) AS sum_bb,
COUNT(*) AS count
FROM correlation_data
)
SELECT
(sum_ab * count - SUM(value_a) * SUM(value_b)) /
(SQRT(sum_aa * count - POWER(SUM(value_a), 2)) * SQRT(sum_bb * count - POWER(SUM(value_b), 2))) AS correlation_coefficient
FROM agg_data;
------------------------------需要修改后才能运行
WITH today_data AS (
SELECT
tag_name,
CAST(tag_value AS FLOAT) AS value,
LAG(CAST(tag_value AS FLOAT)) OVER (PARTITION BY tag_name ORDER BY app_time) AS prev_value,
app_time
FROM dm_tag_value
WHERE
tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
AND DATE(app_time) = CURRENT_DATE
),
correlation_data AS (
SELECT
CASE WHEN tag_name = 'DRLZM.FIC3101A.DACA.PV' THEN value ELSE prev_value END AS value_a,
CASE WHEN tag_name = 'DRLZM.AT0502A.DACA.PV' THEN value ELSE prev_value END AS value_b
FROM today_data
WHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
),
agg_data AS (
select
sum(value_a) as sum_a,
sum(value_b) as sum_b,
SUM(value_a * value_b) AS sum_ab,
SUM(value_a * value_a) AS sum_aa,
SUM(value_b * value_b) AS sum_bb,
COUNT(*) AS count
FROM correlation_data
)
SELECT
(sum_ab * count - sum_a * sum_b) /
(SQRT(sum_aa * count - POWER(sum_a, 2)) * SQRT(sum_bb * count - POWER(sum_b, 2))) AS correlation_coefficient
FROM agg_data;
**提示词:**我们把超过平均值1个标准差的数据定义为异常值,请你给出今天位号1异常值有哪些
请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
WITH today_data AS (
SELECT
CAST(tag_value AS FLOAT) AS value,
app_time
FROM dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND DATE(app_time) = CURRENT_DATE
),
stats AS (
SELECT
AVG(value) AS average,
STDDEV(value) AS std_dev
FROM today_data
)
SELECT
td.app_time,
td.value
FROM today_data td
JOIN stats ON TRUE
WHERE td.value > stats.average + stats.std_dev OR td.value < stats.average - stats.std_dev;
**提示词:**给出本月位号1与位号2对比趋势,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
SELECT
dv1.app_time,
CAST(dv1.tag_value AS FLOAT) AS DRLZM_FIC3101A_DACA_PV_value,
CAST(dv2.tag_value AS FLOAT) AS DRLZM_AT0502A_DACA_PV_value
FROM
dm_tag_value dv1
JOIN
dm_tag_value dv2
ON dv1.app_time = dv2.app_time
WHERE
dv1.tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND dv2.tag_name = 'DRLZM.AT0502A.DACA.PV'
AND dv1.app_time >= DATE_TRUNC('month', CURRENT_DATE)
AND dv1.app_time < DATE_TRUNC('month', CURRENT_DATE) + INTERVAL '1 month'
ORDER BY
dv1.app_time;
**提示词:**分析今日位号1的平均值和最大值,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
SELECT
AVG(CAST(tag_value AS FLOAT)) AS TAG80_avg,
MAX(CAST(tag_value AS FLOAT)) AS TAG80_max
FROM
dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND app_time >= CURRENT_DATE
AND app_time < CURRENT_DATE + INTERVAL '1 day';
**提示词:**分析今日位号1如何变化,按小时取平均,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
SELECT
date_trunc('hour', app_time) AS hour_bucket,
AVG(CAST(tag_value AS FLOAT)) AS TAG80_avg
FROM
dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND app_time >= CURRENT_DATE
AND app_time < CURRENT_DATE + INTERVAL '1 day'
GROUP BY
hour_bucket
ORDER BY
hour_bucket;
**提示词:**分析位号1和位号2这个月的趋势,按小时分组取平均,每个位号一列,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
SELECT
DATE_TRUNC('hour', t1.app_time) AS hour,
AVG(CAST(t1.tag_value AS FLOAT)) AS FIC3101A_avg_value,
AVG(CAST(t2.tag_value AS FLOAT)) AS AT0502A_avg_value
FROM
dm_tag_value t1
JOIN
dm_tag_value t2 ON DATE_TRUNC('hour', t1.app_time) = DATE_TRUNC('hour', t2.app_time)
WHERE
t1.tag_name = 'DRLZM.FIC3101A.DACA.PV' AND
t2.tag_name = 'DRLZM.AT0502A.DACA.PV' AND
EXTRACT(MONTH FROM t1.app_time) = EXTRACT(MONTH FROM CURRENT_DATE) AND
EXTRACT(YEAR FROM t1.app_time) = EXTRACT(YEAR FROM CURRENT_DATE)
GROUP BY
DATE_TRUNC('hour', t1.app_time)
ORDER BY
hour;
**提示词:**分析位号1,取整点数据,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:
SELECT
app_time,
CAST(tag_value AS FLOAT) AS value
FROM
dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV' AND
EXTRACT(MINUTE FROM app_time) = 0 AND
EXTRACT(SECOND FROM app_time) = 0
ORDER BY
app_time;
总结
用大模型对时序数据进行分析,对于简单的数据分析没有问题,对于复杂或高级的分析有概率说错或处理不了