TimeGPT: A Comprehensive Guide to Time Series Forecasting
Introduction to TimeGPT
TimeGPT, developed by Nixtla, is a generative pre-trained Transformer model specialized for forecasting tasks. It was trained on what the authors describe as the largest collection of publicly available time series, over 100 billion data points of financial, weather, energy, and web data, bringing foundation-model capabilities to time series analysis. The tool can identify patterns and produce forecasts for future data points in seconds.
How TimeGPT Works
TimeGPT is the first time series foundation model capable of zero-shot inference. The general idea is to train the model on vast datasets from various domains (pre-trained model) and then generate zero-shot inferences on unseen data.
This approach relies on transfer learning, where the model leverages knowledge acquired during training to solve new tasks. This method is effective only when the model is sufficiently large and trained on extensive data.
To this end, the authors trained TimeGPT on over 100 billion data points drawn from open-source time series data. The dataset spans diverse domains, including finance, economics, weather, web traffic, energy, and sales. The authors did not disclose the specific public sources that make up these 100 billion data points.
This diversity is crucial for the success of the foundation model, as it enables learning different temporal patterns, thus improving generalization. For example, weather data may exhibit daily seasonality (hotter during the day than at night) and yearly seasonality, while traffic data may show daily seasonality (more cars on the road during the day than at night) and weekly seasonality (more cars on weekdays than weekends).
To ensure model robustness and generalization, preprocessing was kept minimal: missing values were imputed, and the series were otherwise retained in their original form. The authors did not specify the imputation method; common choices include linear, spline, or moving-average interpolation.
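As a rough illustration of what such imputation could look like (this is an assumption; Nixtla has not published its preprocessing code), pandas supports all three approaches:
import numpy as np
import pandas as pd
# Toy series with gaps
s = pd.Series([1.0, np.nan, 3.0, 4.5, np.nan, np.nan, 7.0, 8.0])
linear = s.interpolate(method='linear')                                  # straight line between known points
spline = s.interpolate(method='spline', order=2)                         # smooth polynomial fit (requires scipy)
moving_avg = s.fillna(s.rolling(3, min_periods=1, center=True).mean())   # moving-average fill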
The model was trained over multiple days, during which hyperparameters and learning rates were optimized. While the exact duration and GPU resources were not disclosed, we know the model was implemented in PyTorch, using the Adam optimizer and a learning rate decay strategy.
TimeGPT utilizes a Transformer architecture, specifically a full encoder-decoder structure. Inputs can include windows of historical data as well as exogenous covariates, such as event indicators or other related series. The encoder's attention mechanism learns representations of the input, which are fed into the decoder to generate predictions iteratively until the user-specified forecast horizon is reached.
Notably, the authors implemented conformal prediction in TimeGPT, allowing the model to estimate prediction intervals based on historical errors.
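The paper does not detail the exact procedure, but the general idea of split conformal prediction can be sketched as follows (a simplified illustration, not TimeGPT's internal code): compute absolute errors on a held-out slice of history, take their empirical quantile at the desired coverage level, and widen the point forecast by that amount.
import numpy as np
def conformal_interval(y_true, y_pred, y_future_pred, level=90):
    # Absolute errors on the calibration window
    residuals = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    # Error quantile corresponding to the requested coverage level
    q = np.quantile(residuals, level / 100)
    y_future_pred = np.asarray(y_future_pred)
    return y_future_pred - q, y_future_pred + q  # lower and upper bounds
# Calibration window vs. its one-step predictions, then bounds for a new forecast
lo, hi = conformal_interval([10, 12, 11, 13], [9.5, 12.5, 10.0, 13.5], [14, 15], level=90)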
TimeGPT Features
- Pre-trained Model: Generates predictions without specific training, though fine-tuning is possible.
- Exogenous Variables: Supports multivariate forecasting tasks by incorporating external variables.
- Conformal Prediction: Estimates prediction intervals, enabling anomaly detection (e.g., flagging data points outside a 99% prediction interval as anomalies).
All these tasks can be achieved through zero-shot inference or minimal fine-tuning, marking a paradigm shift in time series forecasting.
Currently, TimeGPT is accessible only via its API, which launched in closed beta. As mentioned, the model was trained on 100 billion data points from publicly available sources. Since the authors did not disclose which datasets were used, evaluating it on well-known benchmarks (e.g., ETT or weather) is unreliable: the model may have seen them during training.
Preparation
Install Required Packages
Install the nixtlats package used throughout this tutorial (note: nixtlats is deprecated in favor of the newer nixtla package, discussed below):
!pip install nixtlats
Output:
Collecting nixtlats
Downloading nixtlats-0.5.2-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: httpx in /usr/local/lib/python3.11/dist-packages (from nixtlats) (0.28.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.2.2)
Requirement already satisfied: pydantic in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.11.4)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.32.3)
Requirement already satisfied: tenacity in /usr/local/lib/python3.11/dist-packages (from nixtlats) (9.1.2)
Collecting utilsforecast>=0.1.7 (from nixtlats)
Downloading utilsforecast-0.2.12-py3-none-any.whl.metadata (7.6 kB)
...
Installing collected packages: utilsforecast, nixtlats
Successfully installed nixtlats-0.5.2 utilsforecast-0.2.12
Obtain API Token
Register at Nixtla's dashboard to obtain an API token. The free tier allows 1,000 API calls per month.
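Hard-coding the key in notebooks is risky; a common pattern is to export it as an environment variable and read it at runtime (the variable name below is a convention chosen here, not required by the package):
import os
from nixtlats import TimeGPT
# Assumes the key was exported beforehand, e.g. `export NIXTLA_API_KEY=...`
timegpt = TimeGPT(token=os.environ['NIXTLA_API_KEY'])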
Usage Tutorial
Univariate Forecasting
Load and Initialize
from nixtlats import TimeGPT
import pandas as pd
# Create TimeGPT object with token
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
Note: The nixtlats package and the TimeGPT class are deprecated; new projects should use nixtla and NixtlaClient instead.
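For reference, the rough equivalent with the current nixtla package (based on its documented client interface; details may differ across versions) looks like this, once a DataFrame like the one loaded in the next step is available:
from nixtla import NixtlaClient
client = NixtlaClient(api_key='YOUR_API_KEY')   # replaces TimeGPT(token=...)
fcst_df = client.forecast(df=df, h=24, time_col='ds', target_col='y', level=[80, 90])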
Load Dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df = df.sort_values(['unique_id', 'ds'])
df.head()
Output:
unique_id ds y
0 BE 2016-10-22 00:00:00 70.00
1 BE 2016-10-22 01:00:00 37.10
2 BE 2016-10-22 02:00:00 37.10
3 BE 2016-10-22 03:00:00 44.75
4 BE 2016-10-22 04:00:00 37.10
Visualize Data
timegpt.plot(df, time_col='ds', target_col='y')
This generates a plot of the time series data.
Forecast with Prediction Intervals
fcst_df = timegpt.forecast(df, h=24, level=[80, 90])
fcst_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 BE 2016-12-31 00:00:00 45.190582 33.011285 35.508618 54.872547 57.369880
1 BE 2016-12-31 01:00:00 43.244987 30.388532 35.376340 51.113635 56.101443
2 BE 2016-12-31 02:00:00 41.958897 29.285654 35.340688 48.577106 54.632139
3 BE 2016-12-31 03:00:00 39.796680 29.909487 32.327371 47.265990 49.683874
4 BE 2016-12-31 04:00:00 39.204865 30.731904 30.998638 47.411091 47.677825
Forecast with Different Parameters
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7],
    time_col='ds', target_col='y',
)
print(timegpt_fcst_pred_int_df.shape)
timegpt_fcst_pred_int_df.head()
Output:
(48, 9)
unique_id ds TimeGPT TimeGPT-lo-99.7 TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90 TimeGPT-hi-99.7
0 BE 2016-12-31 00:00:00 45.190453 28.008072 33.011395 35.508424 54.872481 57.369510 62.372833
1 BE 2016-12-31 01:00:00 43.244446 27.750938 30.387266 35.374624 51.114267 56.101625 58.737954
2 BE 2016-12-31 02:00:00 41.958389 25.092357 29.283794 35.340795 48.575984 54.632985 58.824421
3 BE 2016-12-31 03:00:00 39.796486 26.072040 29.910928 32.326250 47.266722 49.682044 53.520932
4 BE 2016-12-31 04:00:00 39.204536 18.367774 30.731239 30.998955 47.410118 47.677833 60.041299
Plot Forecasts
timegpt.plot(fcst_df, time_col='ds', target_col='TimeGPT')
timegpt.plot(df, fcst_df, level=[80, 90], max_insample_length=24 * 5)
Air Passengers Dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
Output:
timestamp value
0 1949-01-01 112
1 1949-02-01 118
2 1949-03-01 132
3 1949-04-01 129
4 1949-05-01 121
timegpt.plot(df, time_col='timestamp', target_col='value')
Forecast Air Passengers
timegpt_fcst_df = timegpt.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
timegpt_fcst_df.head()
Output:
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Long-Term Forecasting
timegpt_fcst_df = timegpt.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='MS')
timegpt_fcst_df.head()
Output:
WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Short-Term Forecasting
timegpt_fcst_df = timegpt.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='MS')
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Setting Frequency
The freq parameter is critical: it specifies the time unit between consecutive observations. If you omit it, make sure the DataFrame has a DatetimeIndex with the appropriate frequency so it can be inferred:
df_time_index = df.set_index('timestamp')
df_time_index.index = pd.DatetimeIndex(df_time_index.index, freq='MS')
timegpt.forecast(df=df_time_index, h=36, time_col='timestamp', target_col='value').head()
Output:
WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
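If you are unsure of the frequency, you can check what pandas infers before calling the API (a quick sanity check on the air passengers frame used above):
# 'MS' (month start) is expected for this monthly dataset
print(pd.infer_freq(pd.to_datetime(df['timestamp'])))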
Validate Token
timegpt.validate_token()
Output:
True
Anomaly Detection
Anomaly detection in time series data is critical in fields like finance, healthcare, security, and infrastructure. TimeGPT's detect_anomalies method automatically identifies anomalies by evaluating each observation's context within the time series, using a 99% prediction interval by default. Observations outside this interval are flagged as anomalies (labeled as 1 in the anomaly column).
import pandas as pd
from nixtlats import TimeGPT
pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D')
timegpt_anomalies_df.head()
Output:
timestamp anomaly TimeGPT-lo-99 TimeGPT TimeGPT-hi-99
0 2008-01-10 0 6.936009 8.224194 9.512378
1 2008-01-11 0 6.863336 8.151521 9.439705
2 2008-01-12 0 6.839064 8.127249 9.415433
3 2008-01-13 0 7.629072 8.917256 10.205441
4 2008-01-14 0 7.714111 9.002295 10.290480
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
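To work with the flagged points directly, filter on the anomaly column shown in the output above:
anomalies = timegpt_anomalies_df[timegpt_anomalies_df['anomaly'] == 1]  # rows outside the 99% interval
print(len(anomalies), 'anomalous observations detected')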
Adjusting Anomaly Detection Threshold
Adjust the level parameter to modify the prediction interval:
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=90)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
A higher level (e.g., 99.99) widens the interval, detecting fewer anomalies:
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=99.99)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
Including Date Features
Incorporate date features for better anomaly detection:
from nixtlats.date_features import CountryHolidays
timegpt_anomalies_df_x = timegpt.detect_anomalies(
pm_df, time_col='timestamp', target_col='value', freq='D', date_features=True, level=99.99,
)
timegpt.plot(pm_df, timegpt_anomalies_df_x, time_col='timestamp', target_col='value')
Forecasting with Exogenous Variables
Exogenous variables provide additional context that can improve predictions. For example, temperature data can enhance ice cream sales forecasts. Add exogenous variables as columns after the target column.
Example: Electricity Price Forecasting
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()
Output:
unique_id ds y Exogenous1 Exogenous2 day_0 day_1 day_2 day_3 day_4 day_5 day_6
0 BE 2016-10-22 00:00:00 70.00 57253.0 49593.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
1 BE 2016-10-22 01:00:00 37.10 51887.0 46073.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 BE 2016-10-22 02:00:00 37.10 51896.0 44927.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
3 BE 2016-10-22 03:00:00 44.75 48428.0 44483.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 BE 2016-10-22 04:00:00 37.10 46721.0 44338.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
Load future exogenous variables:
future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
print(future_ex_vars_df.shape)
future_ex_vars_df.head()
Output:
(96, 11)
unique_id ds Exogenous1 Exogenous2 day_0 day_1 day_2 day_3 day_4 day_5 day_6
0 BE 2016-12-31 00:00:00 70318.0 64108.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
1 BE 2016-12-31 01:00:00 67898.0 62492.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 BE 2016-12-31 02:00:00 68379.0 61571.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
3 BE 2016-12-31 03:00:00 64972.0 60381.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 BE 2016-12-31 04:00:00 62900.0 60298.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
Forecast with exogenous variables:
timegpt_fcst_ex_vars_df = timegpt.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 BE 2016-12-31 00:00:00 51.633533 37.170360 41.667443 61.599622 66.096706
1 BE 2016-12-31 01:00:00 45.751707 31.324216 36.895095 54.608318 60.179197
2 BE 2016-12-31 02:00:00 39.651087 26.457148 33.045684 46.256490 52.845026
3 BE 2016-12-31 03:00:00 34.000518 20.566910 23.985966 44.015071 47.434127
4 BE 2016-12-31 04:00:00 33.785968 18.989039 24.427422 43.144514 48.582897
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90],
)
Feature Importance
timegpt.weights_x.plot.barh(x='features', y='weights')
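Since weights_x is a regular DataFrame, you can sort it before plotting so the most influential features are ranked in the bar chart (a small usage note based on the features/weights columns used above):
# Rank features by absolute weight before plotting
(timegpt.weights_x
    .assign(abs_weight=lambda d: d['weights'].abs())
    .sort_values('abs_weight')
    .plot.barh(x='features', y='weights'))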
Adding Country Holidays
from nixtlats.date_features import CountryHolidays
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df, X_df=future_ex_vars_df, h=24, level=[80, 90],
date_features=[CountryHolidays(['US'])]
)
timegpt.weights_x.plot.barh(x='features', y='weights')
Multivariate Forecasting
Install the neuralforecast package for comparison:
!pip install neuralforecast
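The rolling-forecast code below assumes a daily article-views dataset with columns unique_id, ds, y, published, and is_holiday, already split into the full frame df, a hold-out frame test (the last 168 days), and a future exogenous frame future_exog. A minimal preparation sketch, with a hypothetical file path and a 1213-row training cut-off assumed to match the loop below:
import pandas as pd
# Hypothetical file; expected columns: unique_id, ds, y, published, is_holiday (daily frequency)
views = pd.read_csv('medium_views.csv', parse_dates=['ds'])
df = views                                  # full history; the loop slices it as df.iloc[:1213+i]
test = views.iloc[1213:].copy()             # last 168 days held out for evaluation
future_exog = test[['unique_id', 'ds', 'published', 'is_holiday']]  # known future exogenous values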
Perform rolling forecasts with TimeGPT:
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
timegpt_preds = []
# Roll forward one week at a time: refit on all history up to the forecast origin,
# then forecast the next 7 days using the known future exogenous variables.
for i in range(0, 162, 7):
    timegpt_preds_df = timegpt.forecast(
        df=df.iloc[:1213+i],
        X_df=future_exog[i:i+7],
        h=7,
        finetune_steps=50,
        id_col='unique_id',
        time_col='ds',
        target_col='y'
    )
    preds = timegpt_preds_df['TimeGPT']
    timegpt_preds.extend(preds)
len(timegpt_preds)
Output:
168
test['TimeGPT'] = timegpt_preds
test.tail(100)
Output:
unique_id ds y published is_holiday TimeGPT
1281 0 2023-07-05 1864 0.0 0 2479.581543
1282 0 2023-07-06 1706 0.0 0 2305.177979
1283 0 2023-07-07 1468 0.0 0 2074.042725
1284 0 2023-07-08 977 0.0 0 969.086182
1285 0 2023-07-09 1063 0.0 0 1131.523193
... ... ... ... ... ... ...
1376 0 2023-10-08 737 0.0 0 692.883728
1377 0 2023-10-09 1237 0.0 1 1092.072266
1378 0 2023-10-10 1755 1.0 0 1192.525146
1379 0 2023-10-11 3241 0.0 0 926.260010
1380 0 2023-10-12 2262 0.0 0 922.098145
test.to_csv('medium_views_test.csv', header=True, index=False)
Visualize Predictions
import matplotlib.pyplot as plt
published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(test['ds'], test['y'])
ax.plot(test['ds'], test['TimeGPT'], label='TimeGPT')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()
Comparison with Other Models
Compare TimeGPT with N-BEATS, N-HiTS, and PatchTST:
from neuralforecast.models import NHITS, NBEATS, PatchTST
from neuralforecast import NeuralForecast
horizon = 7
models = [
NHITS(h=horizon, input_size=5*horizon, max_steps=50),
NBEATS(h=horizon, input_size=5*horizon, max_steps=50),
PatchTST(h=horizon, input_size=5*horizon, max_steps=50)
]
nf = NeuralForecast(models=models, freq='D')
future_exog = test[['unique_id', 'published', 'is_holiday']]
preds_df = nf.cross_validation(df=df, static_df=future_exog, step_size=7, n_windows=24)
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y
0 0 2023-04-28 2023-04-27 1571.078491 1479.276855 1484.261597 1470
1 0 2023-04-29 2023-04-27 1208.617920 1015.555298 1099.190552 1004
2 0 2023-04-30 2023-04-27 1308.625122 1204.944092 1200.549072 1051
3 0 2023-05-01 2023-04-27 1811.447754 1830.838379 1797.462524 1333
4 0 2023-05-02 2023-04-27 1952.458862 1857.008911 1853.445679 1778
preds_df['TimeGPT'] = test['TimeGPT'].values  # .values avoids index misalignment between the two frames
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y TimeGPT
0 0 2023-04-28 2023-04-27 1571.078491 1479.276855 1484.261597 1470 1351.176636
1 0 2023-04-29 2023-04-27 1208.617920 1015.555298 1099.190552 1004 962.566650
2 0 2023-04-30 2023-04-27 1308.625122 1204.944092 1200.549072 1051 1148.006470
3 0 2023-05-01 2023-04-27 1811.447754 1830.838379 1797.462524 1333 1856.734009
4 0 2023-05-02 2023-04-27 1952.458862 1857.008911 1853.445679 1778 1914.413452
Visualize Model Comparison
published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(preds_df['ds'], preds_df['y'], label='actual')
ax.plot(preds_df['ds'], preds_df['TimeGPT'], ls='--', label='TimeGPT')
ax.plot(preds_df['ds'], preds_df['NHITS'], ls=':', label='NHiTS')
ax.plot(preds_df['ds'], preds_df['NBEATS'], ls='-.', label='NBEATS')
ax.plot(preds_df['ds'], preds_df['PatchTST'], ls='-', label='PatchTST')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()
Observation: visually, N-HiTS predicts peaks that never materialize and PatchTST tends to under-predict, while TimeGPT tracks the actual data more closely.
Evaluation
Evaluate model performance using Mean Absolute Error (MAE) and Mean Squared Error (MSE). Round predictions to integers for daily view counts:
from neuralforecast.losses.numpy import mae, mse
preds_df = preds_df.round({
'NHITS': 0, 'NBEATS': 0, 'PatchTST': 0, 'TimeGPT': 0
})
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y TimeGPT
0 0 2023-04-28 2023-04-27 1571 1479 1484 1470 1351
1 0 2023-04-29 2023-04-27 1209 1016 1099 1004 963
2 0 2023-04-30 2023-04-27 1309 1205 1201 1051 1148
3 0 2023-05-01 2023-04-27 1811 1831 1797 1333 1857
4 0 2023-05-02 2023-04-27 1952 1857 1853 1778 1914
data = {
'N-HiTS': [mae(preds_df['NHITS'], preds_df['y']), mse(preds_df['NHITS'], preds_df['y'])],
'N-BEATS': [mae(preds_df['NBEATS'], preds_df['y']), mse(preds_df['NBEATS'], preds_df['y'])],
'PatchTST': [mae(preds_df['PatchTST'], preds_df['y']), mse(preds_df['PatchTST'], preds_df['y'])],
'TimeGPT': [mae(preds_df['TimeGPT'], preds_df['y']), mse(preds_df['TimeGPT'], preds_df['y'])]
}
metrics_df = pd.DataFrame(data=data)
metrics_df.index = ['mae', 'mse']
metrics_df.style.highlight_min(color='lightgreen', axis=1)
Output:
N-HiTS N-BEATS PatchTST TimeGPT
mae 300.125000 267.815476 269.113095 295.547619
mse 219075.267857 183030.005952 185470.077381 231426.928571
By these metrics, N-BEATS performs best on this dataset with PatchTST close behind, while TimeGPT trails despite its visually plausible forecasts, a reminder that a pre-trained foundation model is not guaranteed to beat task-specific models on every series.
Load Forecasting
Case Study 1
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
df = pd.read_csv('/content/drive/MyDrive/datasets/test2.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
print(df.shape)
df.tail()
Output:
unique_id 0
ds 0
y 0
DAY 0
dtype: int64
(33408, 4)
unique_id ds y DAY
33403 |210000005481 2023-12-29 22:45:00 964.2 363
33404 |210000005481 2023-12-29 23:00:00 1036.8 363
33405 |210000005481 2023-12-29 23:15:00 1030.8 363
33406 |210000005481 2023-12-29 23:30:00 1049.4 363
33407 |210000005481 2023-12-29 23:45:00 1054.8 363
future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000005481.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()
Output:
(96, 3)
unique_id 0
ds 0
DAY 0
dtype: int64
unique_id ds DAY
33408 |210000005481 2023-12-30 00:00:00 364
33409 |210000005481 2023-12-30 00:15:00 364
33410 |210000005481 2023-12-30 00:30:00 364
33411 |210000005481 2023-12-30 00:45:00 364
33412 |210000005481 2023-12-30 01:00:00 364
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df,
X_df=future_ex_vars_df,
finetune_steps=100,
h=96,
time_col='ds',
target_col='y',
model='timegpt-1-long-horizon',
level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00 1022.289181 741.859170 837.336947 1207.241414 1302.719192
1 |210000005481 2023-12-30 00:15:00 963.901790 446.810781 603.597085 1324.206496 1480.992800
2 |210000005481 2023-12-30 00:30:00 933.234249 400.233163 554.460030 1312.008468 1466.235335
3 |210000005481 2023-12-30 00:45:00 972.456844 521.245904 613.786275 1331.127413 1423.667785
4 |210000005481 2023-12-30 01:00:00 962.090755 505.215810 593.242837 1330.938674 1418.965700
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90]
)
timegpt_fcst_ex_vars_df.shape
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364]
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
cols = ['unique_id', 'ds', 'LOAD']
real_data = real_data[cols]
print(real_data.shape)
real_data.head()
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
merge_data.head()
Output:
(96, 3)
unique_id ds LOAD TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00 1279.8 1022.289181 741.859170 837.336947 1207.241414 1302.719192
1 |210000005481 2023-12-30 00:15:00 1211.4 963.901790 446.810781 603.597085 1324.206496 1480.992800
2 |210000005481 2023-12-30 00:30:00 1412.4 933.234249 400.233163 554.460030 1312.008468 1466.235335
3 |210000005481 2023-12-30 00:45:00 1349.4 972.456844 521.245904 613.786275 1331.127413 1423.667785
4 |210000005481 2023-12-30 01:00:00 964.2 962.090755 505.215810 593.242837 1330.938674 1418.965700
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend(loc='best')  # show the real vs. predicted labels set above
plt.show()
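To quantify the fit rather than judging the plot by eye, the merged frame can be scored directly (a small sketch using the columns shown above; the same check applies to Case Study 2):
errors = merge_data['LOAD'] - merge_data['TimeGPT']
mae_value = errors.abs().mean()                                  # mean absolute error
mape_value = (errors.abs() / merge_data['LOAD']).mean() * 100    # mean absolute percentage error
print(f'MAE: {mae_value:.2f}, MAPE: {mape_value:.2f}%')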
Case Study 2
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
df = pd.read_csv('/content/drive/MyDrive/datasets/test.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
df.head()
Output:
unique_id 0
ds 0
y 0
DAY 0
dtype: int64
unique_id ds y DAY
0 |210000003901 2023-01-16 00:00:00 68.19 16
1 |210000003901 2023-01-16 00:15:00 51.39 16
2 |210000003901 2023-01-16 00:30:00 47.24 16
3 |210000003901 2023-01-16 00:45:00 47.17 16
4 |210000003901 2023-01-16 01:00:00 49.31 16
future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000003901.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()
Output:
(96, 3)
unique_id 0
ds 0
DAY 0
dtype: int64
unique_id ds DAY
33408 |210000003901 2023-12-30 00:00:00 364
33409 |210000003901 2023-12-30 00:15:00 364
33410 |210000003901 2023-12-30 00:30:00 364
33411 |210000003901 2023-12-30 00:45:00 364
33412 |210000003901 2023-12-30 01:00:00 364
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df,
X_df=future_ex_vars_df,
finetune_steps=100,
h=96,
time_col='ds',
target_col='y',
model='timegpt-1-long-horizon',
level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00 75.555785 59.431775 66.383720 84.727850 91.679795
1 |210000003901 2023-12-30 00:15:00 66.986273 44.116121 52.079720 81.892827 89.856426
2 |210000003901 2023-12-30 00:30:00 61.534644 40.231499 46.371637 76.697651 82.837788
3 |210000003901 2023-12-30 00:45:00 55.279120 34.012068 40.644038 69.914202 76.546172
4 |210000003901 2023-12-30 01:00:00 52.130294 28.080968 37.955519 66.305068 76.179619
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90]
)
timegpt_fcst_ex_vars_df.shape
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364]
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
cols = ['unique_id', 'ds', 'LOAD']
real_data = real_data[cols]
print(real_data.shape)
real_data.head()
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
print(merge_data.isnull().sum())
merge_data.head()
Output:
(96, 3)
unique_id 0
ds 0
LOAD 0
TimeGPT 0
TimeGPT-lo-90 0
TimeGPT-lo-80 0
TimeGPT-hi-80 0
TimeGPT-hi-90 0
dtype: int64
unique_id ds LOAD TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00 87.44 75.555785 59.431775 66.383720 84.727850 91.679795
1 |210000003901 2023-12-30 00:15:00 53.65 66.986273 44.116121 52.079720 81.892827 89.856426
2 |210000003901 2023-12-30 00:30:00 52.12 61.534644 40.231499 46.371637 76.697651 82.837788
3 |210000003901 2023-12-30 00:45:00 52.81 55.279120 34.012068 40.644038 69.914202 76.546172
4 |210000003901 2023-12-30 01:00:00 52.56 52.130294 28.080968 37.955519 66.305068 76.179619
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend(loc='best')  # show the real vs. predicted labels set above
plt.show()
References
* https://nixtla.dev/docs/intro
* https://blog.csdn.net/qq_53123875/article/details/136620743
* https://zhuanlan.zhihu.com/p/674814155
* https://github.com/marcopeix/time-series-analysis/blob/master/TimeGPT.ipynb