TimeGPT: A Comprehensive Guide to Time Series Forecasting
Introduction to TimeGPT
TimeGPT, developed by Nixtla, is a generative pre-trained Transformer model specialized for forecasting tasks. It was trained on what the authors describe as the largest collection of publicly available time series, over 100 billion data points of financial, weather, energy, and web data, bringing foundation-model capabilities to time series analysis. The tool can identify patterns and produce forecasts for future data points in seconds.
How TimeGPT Works
TimeGPT is the first time series foundation model capable of zero-shot inference. The general idea is to train the model on vast datasets from various domains (pre-trained model) and then generate zero-shot inferences on unseen data.
This approach relies on transfer learning, where the model leverages knowledge acquired during training to solve new tasks. This method is effective only when the model is sufficiently large and trained on extensive data.
To this end, the authors trained TimeGPT on over 100 billion data points drawn from open-source time series data. The dataset spans diverse domains, including finance, economics, weather, web traffic, energy, and sales. The authors did not disclose the specific public sources that make up these 100 billion data points.
This diversity is crucial for the success of the foundation model, as it enables learning different temporal patterns, thus improving generalization. For example, weather data may exhibit daily seasonality (hotter during the day than at night) and yearly seasonality, while traffic data may show daily seasonality (more cars on the road during the day than at night) and weekly seasonality (more cars on weekdays than weekends).
To ensure model robustness and generalization, preprocessing was kept minimal: missing values were imputed, and the series were otherwise retained in their original form. The authors did not specify the imputation method; common choices include linear, spline, or moving-average interpolation.
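As a rough illustration of what such imputation could look like (this is an assumption; Nixtla has not published its preprocessing code), pandas supports all three approaches:
import numpy as np
import pandas as pd
# Toy series with gaps
s = pd.Series([1.0, np.nan, 3.0, 4.5, np.nan, np.nan, 7.0, 8.0])
linear = s.interpolate(method='linear')                                  # straight line between known points
spline = s.interpolate(method='spline', order=2)                         # smooth polynomial fit (requires scipy)
moving_avg = s.fillna(s.rolling(3, min_periods=1, center=True).mean())   # moving-average fill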
The model was trained over multiple days, during which hyperparameters and learning rates were optimized. While the exact duration and GPU resources were not disclosed, we know the model was implemented in PyTorch, using the Adam optimizer and a learning rate decay strategy.
TimeGPT utilizes a Transformer architecture, specifically a full encoder-decoder structure. Inputs can include windows of historical data as well as exogenous covariates, such as event indicators or other related series. The encoder's attention mechanism learns representations of the input, which are fed into the decoder to generate predictions iteratively until the user-specified forecast horizon is reached.
Notably, the authors implemented conformal prediction in TimeGPT, allowing the model to estimate prediction intervals based on historical errors.
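The paper does not detail the exact procedure, but the general idea of split conformal prediction can be sketched as follows (a simplified illustration, not TimeGPT's internal code): compute absolute errors on a held-out slice of history, take their empirical quantile at the desired coverage level, and widen the point forecast by that amount.
import numpy as np
def conformal_interval(y_true, y_pred, y_future_pred, level=90):
    # Absolute errors on the calibration window
    residuals = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    # Error quantile corresponding to the requested coverage level
    q = np.quantile(residuals, level / 100)
    y_future_pred = np.asarray(y_future_pred)
    return y_future_pred - q, y_future_pred + q  # lower and upper bounds
# Calibration window vs. its one-step predictions, then bounds for a new forecast
lo, hi = conformal_interval([10, 12, 11, 13], [9.5, 12.5, 10.0, 13.5], [14, 15], level=90)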
TimeGPT Features
- Pre-trained Model: Generates predictions without specific training, though fine-tuning is possible.
- Exogenous Variables: Supports multivariate forecasting tasks by incorporating external variables.
- Conformal Prediction: Estimates prediction intervals, enabling anomaly detection (e.g., flagging data points outside a 99% prediction interval as anomalies).
All these tasks can be achieved through zero-shot inference or minimal fine-tuning, marking a paradigm shift in time series forecasting.
Currently, TimeGPT is accessible only via its API, which launched in closed beta. As mentioned, the model was trained on 100 billion data points from publicly available sources. Since the authors did not disclose which datasets were used, evaluating it on well-known benchmarks (e.g., ETT or weather) is unreliable: the model may have seen them during training.
Preparation
Install Required Packages
Install the nixtlats package used throughout this tutorial (note: nixtlats is deprecated in favor of the newer nixtla package, discussed below):
!pip install nixtlats
Output:
Collecting nixtlats
Downloading nixtlats-0.5.2-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: httpx in /usr/local/lib/python3.11/dist-packages (from nixtlats) (0.28.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.2.2)
Requirement already satisfied: pydantic in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.11.4)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from nixtlats) (2.32.3)
Requirement already satisfied: tenacity in /usr/local/lib/python3.11/dist-packages (from nixtlats) (9.1.2)
Collecting utilsforecast>=0.1.7 (from nixtlats)
Downloading utilsforecast-0.2.12-py3-none-any.whl.metadata (7.6 kB)
...
Installing collected packages: utilsforecast, nixtlats
Successfully installed nixtlats-0.5.2 utilsforecast-0.2.12
Obtain API Token
Register at Nixtla's dashboard to obtain an API token. The free tier allows 1,000 API calls per month.
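Hard-coding the key in notebooks is risky; a common pattern is to export it as an environment variable and read it at runtime (the variable name below is a convention chosen here, not required by the package):
import os
from nixtlats import TimeGPT
# Assumes the key was exported beforehand, e.g. `export NIXTLA_API_KEY=...`
timegpt = TimeGPT(token=os.environ['NIXTLA_API_KEY'])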
Usage Tutorial
Univariate Forecasting
Load and Initialize
from nixtlats import TimeGPT
import pandas as pd
# Create TimeGPT object with token
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
Note: The nixtlats package and the TimeGPT class are deprecated; new projects should use nixtla and NixtlaClient instead.
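For reference, the rough equivalent with the current nixtla package (based on its documented client interface; details may differ across versions) looks like this, once a DataFrame like the one loaded in the next step is available:
from nixtla import NixtlaClient
client = NixtlaClient(api_key='YOUR_API_KEY')   # replaces TimeGPT(token=...)
fcst_df = client.forecast(df=df, h=24, time_col='ds', target_col='y', level=[80, 90])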
Load Dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')
df = df.sort_values(['unique_id', 'ds'])
df.head()
Output:
unique_id ds y
0 BE 2016-10-22 00:00:00 70.00
1 BE 2016-10-22 01:00:00 37.10
2 BE 2016-10-22 02:00:00 37.10
3 BE 2016-10-22 03:00:00 44.75
4 BE 2016-10-22 04:00:00 37.10
Visualize Data
timegpt.plot(df, time_col='ds', target_col='y')
This generates a plot of the time series data.
Forecast with Prediction Intervals
fcst_df = timegpt.forecast(df, h=24, level=[80, 90])
fcst_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 BE 2016-12-31 00:00:00 45.190582 33.011285 35.508618 54.872547 57.369880
1 BE 2016-12-31 01:00:00 43.244987 30.388532 35.376340 51.113635 56.101443
2 BE 2016-12-31 02:00:00 41.958897 29.285654 35.340688 48.577106 54.632139
3 BE 2016-12-31 03:00:00 39.796680 29.909487 32.327371 47.265990 49.683874
4 BE 2016-12-31 04:00:00 39.204865 30.731904 30.998638 47.411091 47.677825
Forecast with Different Parameters
timegpt_fcst_pred_int_df = timegpt.forecast(
    df=df, h=12, level=[80, 90, 99.7],
    time_col='ds', target_col='y',
)
print(timegpt_fcst_pred_int_df.shape)
timegpt_fcst_pred_int_df.head()
Output:
(48, 9)
unique_id ds TimeGPT TimeGPT-lo-99.7 TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90 TimeGPT-hi-99.7
0 BE 2016-12-31 00:00:00 45.190453 28.008072 33.011395 35.508424 54.872481 57.369510 62.372833
1 BE 2016-12-31 01:00:00 43.244446 27.750938 30.387266 35.374624 51.114267 56.101625 58.737954
2 BE 2016-12-31 02:00:00 41.958389 25.092357 29.283794 35.340795 48.575984 54.632985 58.824421
3 BE 2016-12-31 03:00:00 39.796486 26.072040 29.910928 32.326250 47.266722 49.682044 53.520932
4 BE 2016-12-31 04:00:00 39.204536 18.367774 30.731239 30.998955 47.410118 47.677833 60.041299
Plot Forecasts
timegpt.plot(fcst_df, time_col='ds', target_col='TimeGPT')
timegpt.plot(df, fcst_df, level=[80, 90], max_insample_length=24 * 5)
Air Passengers Dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
Output:
timestamp value
0 1949-01-01 112
1 1949-02-01 118
2 1949-03-01 132
3 1949-04-01 129
4 1949-05-01 121
timegpt.plot(df, time_col='timestamp', target_col='value')
Forecast Air Passengers
timegpt_fcst_df = timegpt.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
timegpt_fcst_df.head()
Output:
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Long-Term Forecasting
timegpt_fcst_df = timegpt.forecast(df=df, h=36, time_col='timestamp', target_col='value', freq='MS')
timegpt_fcst_df.head()
Output:
WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Short-Term Forecasting
timegpt_fcst_df = timegpt.forecast(df=df, h=6, time_col='timestamp', target_col='value', freq='MS')
timegpt.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
Setting Frequency
The freq parameter is critical: it specifies the time unit between consecutive observations. If you omit it, make sure the DataFrame has a DatetimeIndex with the appropriate frequency so it can be inferred:
df_time_index = df.set_index('timestamp')
df_time_index.index = pd.DatetimeIndex(df_time_index.index, freq='MS')
timegpt.forecast(df=df_time_index, h=36, time_col='timestamp', target_col='value').head()
Output:
WARNING:nixtlats.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon.
timestamp TimeGPT
0 1961-01-01 437.837921
1 1961-02-01 426.062744
2 1961-03-01 463.116547
3 1961-04-01 478.244507
4 1961-05-01 505.646484
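If you are unsure of the frequency, you can check what pandas infers before calling the API (a quick sanity check on the air passengers frame used above):
# 'MS' (month start) is expected for this monthly dataset
print(pd.infer_freq(pd.to_datetime(df['timestamp'])))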
Validate Token
timegpt.validate_token()
Output:
True
Anomaly Detection
Anomaly detection in time series data is critical in fields like finance, healthcare, security, and infrastructure. TimeGPT's detect_anomalies method automatically identifies anomalies by evaluating each observation's context within the time series, using a 99% prediction interval by default. Observations outside this interval are flagged as anomalies (labeled as 1 in the anomaly column).
import pandas as pd
from nixtlats import TimeGPT
pm_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D')
timegpt_anomalies_df.head()
Output:
timestamp anomaly TimeGPT-lo-99 TimeGPT TimeGPT-hi-99
0 2008-01-10 0 6.936009 8.224194 9.512378
1 2008-01-11 0 6.863336 8.151521 9.439705
2 2008-01-12 0 6.839064 8.127249 9.415433
3 2008-01-13 0 7.629072 8.917256 10.205441
4 2008-01-14 0 7.714111 9.002295 10.290480
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
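To work with the flagged points directly, filter on the anomaly column shown in the output above:
anomalies = timegpt_anomalies_df[timegpt_anomalies_df['anomaly'] == 1]  # rows outside the 99% interval
print(len(anomalies), 'anomalous observations detected')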
Adjusting Anomaly Detection Threshold
Adjust the level parameter to modify the prediction interval:
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=90)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
A higher level (e.g., 99.99) widens the interval, detecting fewer anomalies:
timegpt_anomalies_df = timegpt.detect_anomalies(pm_df, time_col='timestamp', target_col='value', freq='D', level=99.99)
timegpt.plot(pm_df, timegpt_anomalies_df, time_col='timestamp', target_col='value')
Including Date Features
Incorporate date features for better anomaly detection:
from nixtlats.date_features import CountryHolidays
timegpt_anomalies_df_x = timegpt.detect_anomalies(
pm_df, time_col='timestamp', target_col='value', freq='D', date_features=True, level=99.99,
)
timegpt.plot(pm_df, timegpt_anomalies_df_x, time_col='timestamp', target_col='value')
Forecasting with Exogenous Variables
Exogenous variables provide additional context that can improve predictions. For example, temperature data can enhance ice cream sales forecasts. Add exogenous variables as columns after the target column.
Example: Electricity Price Forecasting
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df.head()
Output:
unique_id ds y Exogenous1 Exogenous2 day_0 day_1 day_2 day_3 day_4 day_5 day_6
0 BE 2016-10-22 00:00:00 70.00 57253.0 49593.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
1 BE 2016-10-22 01:00:00 37.10 51887.0 46073.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 BE 2016-10-22 02:00:00 37.10 51896.0 44927.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
3 BE 2016-10-22 03:00:00 44.75 48428.0 44483.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 BE 2016-10-22 04:00:00 37.10 46721.0 44338.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
Load future exogenous variables:
future_ex_vars_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv')
print(future_ex_vars_df.shape)
future_ex_vars_df.head()
Output:
(96, 11)
unique_id ds Exogenous1 Exogenous2 day_0 day_1 day_2 day_3 day_4 day_5 day_6
0 BE 2016-12-31 00:00:00 70318.0 64108.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
1 BE 2016-12-31 01:00:00 67898.0 62492.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
2 BE 2016-12-31 02:00:00 68379.0 61571.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
3 BE 2016-12-31 03:00:00 64972.0 60381.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
4 BE 2016-12-31 04:00:00 62900.0 60298.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
Forecast with exogenous variables:
timegpt_fcst_ex_vars_df = timegpt.forecast(df=df, X_df=future_ex_vars_df, h=24, level=[80, 90])
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 BE 2016-12-31 00:00:00 51.633533 37.170360 41.667443 61.599622 66.096706
1 BE 2016-12-31 01:00:00 45.751707 31.324216 36.895095 54.608318 60.179197
2 BE 2016-12-31 02:00:00 39.651087 26.457148 33.045684 46.256490 52.845026
3 BE 2016-12-31 03:00:00 34.000518 20.566910 23.985966 44.015071 47.434127
4 BE 2016-12-31 04:00:00 33.785968 18.989039 24.427422 43.144514 48.582897
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90],
)
Feature Importance
timegpt.weights_x.plot.barh(x='features', y='weights')
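Since weights_x is a regular DataFrame, you can sort it before plotting so the most influential features are ranked in the bar chart (a small usage note based on the features/weights columns used above):
# Rank features by absolute weight before plotting
(timegpt.weights_x
    .assign(abs_weight=lambda d: d['weights'].abs())
    .sort_values('abs_weight')
    .plot.barh(x='features', y='weights'))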
Adding Country Holidays
from nixtlats.date_features import CountryHolidays
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df, X_df=future_ex_vars_df, h=24, level=[80, 90],
date_features=[CountryHolidays(['US'])]
)
timegpt.weights_x.plot.barh(x='features', y='weights')
Multivariate Forecasting
Install the neuralforecast package for comparison:
!pip install neuralforecast
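The rolling-forecast code below assumes a daily article-views dataset with columns unique_id, ds, y, published, and is_holiday, already split into the full frame df, a hold-out frame test (the last 168 days), and a future exogenous frame future_exog. A minimal preparation sketch, with a hypothetical file path and a 1213-row training cut-off assumed to match the loop below:
import pandas as pd
# Hypothetical file; expected columns: unique_id, ds, y, published, is_holiday (daily frequency)
views = pd.read_csv('medium_views.csv', parse_dates=['ds'])
df = views                                  # full history; the loop slices it as df.iloc[:1213+i]
test = views.iloc[1213:].copy()             # last 168 days held out for evaluation
future_exog = test[['unique_id', 'ds', 'published', 'is_holiday']]  # known future exogenous values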
Perform rolling forecasts with TimeGPT:
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
timegpt_preds = []
# Roll forward one week at a time: refit on all history up to the forecast origin,
# then forecast the next 7 days using the known future exogenous variables.
for i in range(0, 162, 7):
    timegpt_preds_df = timegpt.forecast(
        df=df.iloc[:1213+i],
        X_df=future_exog[i:i+7],
        h=7,
        finetune_steps=50,
        id_col='unique_id',
        time_col='ds',
        target_col='y'
    )
    preds = timegpt_preds_df['TimeGPT']
    timegpt_preds.extend(preds)
len(timegpt_preds)
Output:
168
test['TimeGPT'] = timegpt_preds
test.tail(100)
Output:
unique_id ds y published is_holiday TimeGPT
1281 0 2023-07-05 1864 0.0 0 2479.581543
1282 0 2023-07-06 1706 0.0 0 2305.177979
1283 0 2023-07-07 1468 0.0 0 2074.042725
1284 0 2023-07-08 977 0.0 0 969.086182
1285 0 2023-07-09 1063 0.0 0 1131.523193
... ... ... ... ... ... ...
1376 0 2023-10-08 737 0.0 0 692.883728
1377 0 2023-10-09 1237 0.0 1 1092.072266
1378 0 2023-10-10 1755 1.0 0 1192.525146
1379 0 2023-10-11 3241 0.0 0 926.260010
1380 0 2023-10-12 2262 0.0 0 922.098145
test.to_csv('medium_views_test.csv', header=True, index=False)
Visualize Predictions
import matplotlib.pyplot as plt
published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(test['ds'], test['y'])
ax.plot(test['ds'], test['TimeGPT'], label='TimeGPT')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()
Comparison with Other Models
Compare TimeGPT with N-BEATS, N-HiTS, and PatchTST:
from neuralforecast.models import NHITS, NBEATS, PatchTST
from neuralforecast import NeuralForecast
horizon = 7
models = [
NHITS(h=horizon, input_size=5*horizon, max_steps=50),
NBEATS(h=horizon, input_size=5*horizon, max_steps=50),
PatchTST(h=horizon, input_size=5*horizon, max_steps=50)
]
nf = NeuralForecast(models=models, freq='D')
future_exog = test[['unique_id', 'published', 'is_holiday']]
preds_df = nf.cross_validation(df=df, static_df=future_exog, step_size=7, n_windows=24)
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y
0 0 2023-04-28 2023-04-27 1571.078491 1479.276855 1484.261597 1470
1 0 2023-04-29 2023-04-27 1208.617920 1015.555298 1099.190552 1004
2 0 2023-04-30 2023-04-27 1308.625122 1204.944092 1200.549072 1051
3 0 2023-05-01 2023-04-27 1811.447754 1830.838379 1797.462524 1333
4 0 2023-05-02 2023-04-27 1952.458862 1857.008911 1853.445679 1778
preds_df['TimeGPT'] = test['TimeGPT'].values  # .values avoids index misalignment between the two frames
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y TimeGPT
0 0 2023-04-28 2023-04-27 1571.078491 1479.276855 1484.261597 1470 1351.176636
1 0 2023-04-29 2023-04-27 1208.617920 1015.555298 1099.190552 1004 962.566650
2 0 2023-04-30 2023-04-27 1308.625122 1204.944092 1200.549072 1051 1148.006470
3 0 2023-05-01 2023-04-27 1811.447754 1830.838379 1797.462524 1333 1856.734009
4 0 2023-05-02 2023-04-27 1952.458862 1857.008911 1853.445679 1778 1914.413452
Visualize Model Comparison
published_dates = test[test['published'] == 1]
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(preds_df['ds'], preds_df['y'], label='actual')
ax.plot(preds_df['ds'], preds_df['TimeGPT'], ls='--', label='TimeGPT')
ax.plot(preds_df['ds'], preds_df['NHITS'], ls=':', label='NHiTS')
ax.plot(preds_df['ds'], preds_df['NBEATS'], ls='-.', label='NBEATS')
ax.plot(preds_df['ds'], preds_df['PatchTST'], ls='-', label='PatchTST')
ax.scatter(published_dates['ds'], published_dates['y'], marker='o', color='red', label='New article')
ax.set_xlabel('Day')
ax.set_ylabel('Total views')
ax.legend(loc='best')
fig.autofmt_xdate()
plt.tight_layout()
Observation: visually, N-HiTS predicts peaks that never materialize and PatchTST tends to under-predict, while TimeGPT tracks the actual data more closely.
Evaluation
Evaluate model performance using Mean Absolute Error (MAE) and Mean Squared Error (MSE). Round predictions to integers for daily view counts:
from neuralforecast.losses.numpy import mae, mse
preds_df = preds_df.round({
'NHITS': 0, 'NBEATS': 0, 'PatchTST': 0, 'TimeGPT': 0
})
preds_df.head()
Output:
unique_id ds cutoff NHITS NBEATS PatchTST y TimeGPT
0 0 2023-04-28 2023-04-27 1571 1479 1484 1470 1351
1 0 2023-04-29 2023-04-27 1209 1016 1099 1004 963
2 0 2023-04-30 2023-04-27 1309 1205 1201 1051 1148
3 0 2023-05-01 2023-04-27 1811 1831 1797 1333 1857
4 0 2023-05-02 2023-04-27 1952 1857 1853 1778 1914
data = {
'N-HiTS': [mae(preds_df['NHITS'], preds_df['y']), mse(preds_df['NHITS'], preds_df['y'])],
'N-BEATS': [mae(preds_df['NBEATS'], preds_df['y']), mse(preds_df['NBEATS'], preds_df['y'])],
'PatchTST': [mae(preds_df['PatchTST'], preds_df['y']), mse(preds_df['PatchTST'], preds_df['y'])],
'TimeGPT': [mae(preds_df['TimeGPT'], preds_df['y']), mse(preds_df['TimeGPT'], preds_df['y'])]
}
metrics_df = pd.DataFrame(data=data)
metrics_df.index = ['mae', 'mse']
metrics_df.style.highlight_min(color='lightgreen', axis=1)
Output:
N-HiTS N-BEATS PatchTST TimeGPT
mae 300.125000 267.815476 269.113095 295.547619
mse 219075.267857 183030.005952 185470.077381 231426.928571
By these metrics, N-BEATS performs best on this dataset with PatchTST close behind, while TimeGPT trails despite its visually plausible forecasts, a reminder that a pre-trained foundation model is not guaranteed to beat task-specific models on every series.
Load Forecasting
Case Study 1
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
df = pd.read_csv('/content/drive/MyDrive/datasets/test2.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
print(df.shape)
df.tail()
Output:
unique_id 0
ds 0
y 0
DAY 0
dtype: int64
(33408, 4)
unique_id ds y DAY
33403 |210000005481 2023-12-29 22:45:00 964.2 363
33404 |210000005481 2023-12-29 23:00:00 1036.8 363
33405 |210000005481 2023-12-29 23:15:00 1030.8 363
33406 |210000005481 2023-12-29 23:30:00 1049.4 363
33407 |210000005481 2023-12-29 23:45:00 1054.8 363
future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000005481.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()
Output:
(96, 3)
unique_id 0
ds 0
DAY 0
dtype: int64
unique_id ds DAY
33408 |210000005481 2023-12-30 00:00:00 364
33409 |210000005481 2023-12-30 00:15:00 364
33410 |210000005481 2023-12-30 00:30:00 364
33411 |210000005481 2023-12-30 00:45:00 364
33412 |210000005481 2023-12-30 01:00:00 364
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df,
X_df=future_ex_vars_df,
finetune_steps=100,
h=96,
time_col='ds',
target_col='y',
model='timegpt-1-long-horizon',
level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00 1022.289181 741.859170 837.336947 1207.241414 1302.719192
1 |210000005481 2023-12-30 00:15:00 963.901790 446.810781 603.597085 1324.206496 1480.992800
2 |210000005481 2023-12-30 00:30:00 933.234249 400.233163 554.460030 1312.008468 1466.235335
3 |210000005481 2023-12-30 00:45:00 972.456844 521.245904 613.786275 1331.127413 1423.667785
4 |210000005481 2023-12-30 01:00:00 962.090755 505.215810 593.242837 1330.938674 1418.965700
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90]
)
timegpt_fcst_ex_vars_df.shape
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364]
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
cols = ['unique_id', 'ds', 'LOAD']
real_data = real_data[cols]
print(real_data.shape)
real_data.head()
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
merge_data.head()
Output:
(96, 3)
unique_id ds LOAD TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000005481 2023-12-30 00:00:00 1279.8 1022.289181 741.859170 837.336947 1207.241414 1302.719192
1 |210000005481 2023-12-30 00:15:00 1211.4 963.901790 446.810781 603.597085 1324.206496 1480.992800
2 |210000005481 2023-12-30 00:30:00 1412.4 933.234249 400.233163 554.460030 1312.008468 1466.235335
3 |210000005481 2023-12-30 00:45:00 1349.4 972.456844 521.245904 613.786275 1331.127413 1423.667785
4 |210000005481 2023-12-30 01:00:00 964.2 962.090755 505.215810 593.242837 1330.938674 1418.965700
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend(loc='best')  # show the real vs. predicted labels set above
plt.show()
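To quantify the fit rather than judging the plot by eye, the merged frame can be scored directly (a small sketch using the columns shown above; the same check applies to Case Study 2):
errors = merge_data['LOAD'] - merge_data['TimeGPT']
mae_value = errors.abs().mean()                                  # mean absolute error
mape_value = (errors.abs() / merge_data['LOAD']).mean() * 100    # mean absolute percentage error
print(f'MAE: {mae_value:.2f}, MAPE: {mape_value:.2f}%')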
Case Study 2
import warnings
from nixtlats import TimeGPT
import pandas as pd
warnings.filterwarnings('ignore')
timegpt = TimeGPT(token='YOUR_API_KEY')  # replace with your Nixtla API token
df = pd.read_csv('/content/drive/MyDrive/datasets/test.csv')
df = df.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
df['ds'] = pd.to_datetime(df['date'] + ' ' + df['DATA_TIME'])
cols = ['unique_id', 'ds', 'y', 'DAY']
df = df[cols]
print(df.isnull().sum())
df.head()
Output:
unique_id 0
ds 0
y 0
DAY 0
dtype: int64
unique_id ds y DAY
0 |210000003901 2023-01-16 00:00:00 68.19 16
1 |210000003901 2023-01-16 00:15:00 51.39 16
2 |210000003901 2023-01-16 00:30:00 47.24 16
3 |210000003901 2023-01-16 00:45:00 47.17 16
4 |210000003901 2023-01-16 01:00:00 49.31 16
future_ex_vars_df_origin = pd.read_csv('/content/drive/MyDrive/datasets/210000003901.csv')
future_ex_vars_df = future_ex_vars_df_origin.rename(columns={'CONS_ID': 'unique_id', 'LOAD': 'y'})
future_ex_vars_df['ds'] = pd.to_datetime(future_ex_vars_df['date'] + ' ' + future_ex_vars_df['DATA_TIME'])
future_ex_vars_df = future_ex_vars_df[future_ex_vars_df['DAY'] == 364]
cols = ['unique_id', 'ds', 'DAY']
future_ex_vars_df = future_ex_vars_df[cols]
print(future_ex_vars_df.shape)
print(future_ex_vars_df.isnull().sum())
future_ex_vars_df.head()
Output:
(96, 3)
unique_id 0
ds 0
DAY 0
dtype: int64
unique_id ds DAY
33408 |210000003901 2023-12-30 00:00:00 364
33409 |210000003901 2023-12-30 00:15:00 364
33410 |210000003901 2023-12-30 00:30:00 364
33411 |210000003901 2023-12-30 00:45:00 364
33412 |210000003901 2023-12-30 01:00:00 364
timegpt_fcst_ex_vars_df = timegpt.forecast(
df=df,
X_df=future_ex_vars_df,
finetune_steps=100,
h=96,
time_col='ds',
target_col='y',
model='timegpt-1-long-horizon',
level=[80, 90]
)
timegpt_fcst_ex_vars_df.head()
Output:
unique_id ds TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00 75.555785 59.431775 66.383720 84.727850 91.679795
1 |210000003901 2023-12-30 00:15:00 66.986273 44.116121 52.079720 81.892827 89.856426
2 |210000003901 2023-12-30 00:30:00 61.534644 40.231499 46.371637 76.697651 82.837788
3 |210000003901 2023-12-30 00:45:00 55.279120 34.012068 40.644038 69.914202 76.546172
4 |210000003901 2023-12-30 01:00:00 52.130294 28.080968 37.955519 66.305068 76.179619
timegpt.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_ex_vars_df,
max_insample_length=365,
level=[80, 90]
)
timegpt_fcst_ex_vars_df.shape
timegpt_fcst_ex_vars_df['ds'] = pd.to_datetime(timegpt_fcst_ex_vars_df['ds'])
real_data = future_ex_vars_df_origin[future_ex_vars_df_origin['DAY'] == 364]
real_data['ds'] = pd.to_datetime(real_data['date'] + ' ' + real_data['DATA_TIME'])
real_data = real_data.rename(columns={'CONS_ID': 'unique_id'})
cols = ['unique_id', 'ds', 'LOAD']
real_data = real_data[cols]
print(real_data.shape)
real_data.head()
merge_data = pd.merge(real_data, timegpt_fcst_ex_vars_df, how='left', on=['unique_id', 'ds'])
print(merge_data.isnull().sum())
merge_data.head()
Output:
(96, 3)
unique_id 0
ds 0
LOAD 0
TimeGPT 0
TimeGPT-lo-90 0
TimeGPT-lo-80 0
TimeGPT-hi-80 0
TimeGPT-hi-90 0
dtype: int64
unique_id ds LOAD TimeGPT TimeGPT-lo-90 TimeGPT-lo-80 TimeGPT-hi-80 TimeGPT-hi-90
0 |210000003901 2023-12-30 00:00:00 87.44 75.555785 59.431775 66.383720 84.727850 91.679795
1 |210000003901 2023-12-30 00:15:00 53.65 66.986273 44.116121 52.079720 81.892827 89.856426
2 |210000003901 2023-12-30 00:30:00 52.12 61.534644 40.231499 46.371637 76.697651 82.837788
3 |210000003901 2023-12-30 00:45:00 52.81 55.279120 34.012068 40.644038 69.914202 76.546172
4 |210000003901 2023-12-30 01:00:00 52.56 52.130294 28.080968 37.955519 66.305068 76.179619
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(merge_data['ds'], merge_data['LOAD'], label='real_data', color='red', marker='o', linestyle='-')
ax.plot(merge_data['ds'], merge_data['TimeGPT'], label='predict_data', color='blue', marker='s', linestyle='--')
ax.legend(loc='best')  # show the real vs. predicted labels set above
plt.show()
References
* https://nixtla.dev/docs/intro
* https://blog.csdn.net/qq_53123875/article/details/136620743
* https://zhuanlan.zhihu.com/p/674814155
* https://github.com/marcopeix/time-series-analysis/blob/master/TimeGPT.ipynb