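Transforming exogenous features
Transform your exogenous features for use with MLForecast
The MLForecast class lets you compute lag transformations on your target, but sometimes you also want to compute transformations on your dynamic exogenous features. This guide shows you how to do that.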
Data setup
from mlforecast.utils import generate_series, generate_prices_for_series
series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)
| | ds | unique_id | price |
|---|---|---|---|
| 0 | 2000-10-05 | 0 | 0.548814 |
| 1 | 2000-10-06 | 0 | 0.715189 |
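Suppose you have some series along with their prices for each id and date, and you want to compute forecasts for the next 7 days. Since the price is a dynamic feature, you have to provide its future values to MLForecast.predict through the X_df argument.
If you want to use not only the price but also, for example, the lag 7 of the price and the expanding mean of its lag 1, you can compute them before training, merge them with your series, and then provide the future values through X_df. Consider the following example.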
Computing the transformations
from mlforecast.lag_transforms import ExpandingMean
from mlforecast.feature_engineering import transform_exog
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})
transformed_prices.head(10)
| | ds | unique_id | price | price_lag7 | price_expanding_mean_lag1 |
|---|---|---|---|---|---|
| 0 | 2000-10-05 | 0 | 0.548814 | NaN | NaN |
| 1 | 2000-10-06 | 0 | 0.715189 | NaN | 0.548814 |
| 2 | 2000-10-07 | 0 | 0.602763 | NaN | 0.632001 |
| 3 | 2000-10-08 | 0 | 0.544883 | NaN | 0.622255 |
| 4 | 2000-10-09 | 0 | 0.423655 | NaN | 0.602912 |
| 5 | 2000-10-10 | 0 | 0.645894 | NaN | 0.567061 |
| 6 | 2000-10-11 | 0 | 0.437587 | NaN | 0.580200 |
| 7 | 2000-10-12 | 0 | 0.891773 | 0.548814 | 0.559827 |
| 8 | 2000-10-13 | 0 | 0.963663 | 0.715189 | 0.601320 |
| 9 | 2000-10-14 | 0 | 0.383442 | 0.602763 | 0.641580 |
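To make the two derived columns concrete, here is a rough pandas equivalent applied to the ten prices of series 0 listed in this guide. It is only a sketch of what the columns mean, not how transform_exog is implemented; the name toy_prices is made up for the illustration.

```python
import pandas as pd

# First ten prices of series 0, copied from the guide's output.
toy_prices = pd.DataFrame({
    "unique_id": [0] * 10,
    "ds": pd.date_range("2000-10-05", periods=10, freq="D"),
    "price": [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
              0.645894, 0.437587, 0.891773, 0.963663, 0.383442],
})

grouped = toy_prices.groupby("unique_id")["price"]

# Lag 7: the price observed seven rows earlier within the same series.
toy_prices["price_lag7"] = grouped.shift(7)

# Expanding mean of lag 1: the mean of all prices strictly before the current row.
toy_prices["price_expanding_mean_lag1"] = grouped.transform(
    lambda s: s.shift(1).expanding().mean()
)

print(toy_prices.round(6))
```

Shifting within each group keeps values from one series from leaking into another, which is why the first rows of every series are null.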
You can now merge this with your original series.
series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 |
|---|---|---|---|---|---|---|
| 0 | 0 | 2000-10-05 | 0.322947 | 0.548814 | NaN | NaN |
| 1 | 0 | 2000-10-06 | 1.218794 | 0.715189 | NaN | 0.548814 |
| 2 | 0 | 2000-10-07 | 2.445887 | 0.602763 | NaN | 0.632001 |
| 3 | 0 | 2000-10-08 | 3.481831 | 0.544883 | NaN | 0.622255 |
| 4 | 0 | 2000-10-09 | 4.191721 | 0.423655 | NaN | 0.602912 |
| 5 | 0 | 2000-10-10 | 5.395863 | 0.645894 | NaN | 0.567061 |
| 6 | 0 | 2000-10-11 | 6.264447 | 0.437587 | NaN | 0.580200 |
| 7 | 0 | 2000-10-12 | 0.284022 | 0.891773 | 0.548814 | 0.559827 |
| 8 | 0 | 2000-10-13 | 1.462798 | 0.963663 | 0.715189 | 0.601320 |
| 9 | 0 | 2000-10-14 | 2.035518 | 0.383442 | 0.602763 | 0.641580 |
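Because the default merge is an inner join, any training row without a matching feature row would be dropped silently. The sketch below, on small made-up frames (toy_series and toy_features are hypothetical), shows one way to guard against that with pandas alone: validate the key uniqueness and compare row counts.

```python
import pandas as pd

# Hypothetical training series: two ids, two days each.
toy_series = pd.DataFrame({
    "unique_id": [0, 0, 1, 1],
    "ds": pd.to_datetime(["2000-10-05", "2000-10-06"] * 2),
    "y": [0.3, 1.2, 0.5, 1.1],
})

# Hypothetical features: same ids, plus one future day per id.
toy_features = pd.DataFrame({
    "unique_id": [0, 0, 0, 1, 1, 1],
    "ds": pd.to_datetime(["2000-10-05", "2000-10-06", "2000-10-07"] * 2),
    "price": [0.55, 0.72, 0.60, 0.42, 0.65, 0.44],
})

# validate="one_to_one" raises if (unique_id, ds) is duplicated on either side.
merged = toy_series.merge(toy_features, on=["unique_id", "ds"], validate="one_to_one")

# Every training row should have found its features; future rows are simply left out.
rows_lost = len(toy_series) - len(merged)
print(f"training rows lost in the merge: {rows_lost}")
```

The future feature rows do not appear in the merged frame, which is expected: they are only needed later, when predicting.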
You can then define your forecast object. Note that you can still compute lag features based on the target as usual.
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
fcst.preprocess(series_with_prices, static_features=[], dropna=True).head()
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 | lag1 | dayofweek |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 2000-10-06 | 1.218794 | 0.715189 | NaN | 0.548814 | 0.322947 | 4 |
| 2 | 0 | 2000-10-07 | 2.445887 | 0.602763 | NaN | 0.632001 | 1.218794 | 5 |
| 3 | 0 | 2000-10-08 | 3.481831 | 0.544883 | NaN | 0.622255 | 2.445887 | 6 |
| 4 | 0 | 2000-10-09 | 4.191721 | 0.423655 | NaN | 0.602912 | 3.481831 | 0 |
| 5 | 0 | 2000-10-10 | 5.395863 | 0.645894 | NaN | 0.567061 | 4.191721 | 1 |
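Note that the dropna argument only considers the null values generated by the lag features based on the target. If you want to drop all rows containing null values, you have to do that in your original series.
series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()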
| | unique_id | ds | y | price | price_lag7 | price_expanding_mean_lag1 | lag1 | dayofweek |
|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 2000-10-13 | 1.462798 | 0.963663 | 0.715189 | 0.601320 | 0.284022 | 4 |
| 9 | 0 | 2000-10-14 | 2.035518 | 0.383442 | 0.602763 | 0.641580 | 1.462798 | 5 |
| 10 | 0 | 2000-10-15 | 3.043565 | 0.791725 | 0.544883 | 0.615766 | 2.035518 | 6 |
| 11 | 0 | 2000-10-16 | 4.010109 | 0.528895 | 0.423655 | 0.631763 | 3.043565 | 0 |
| 12 | 0 | 2000-10-17 | 5.416310 | 0.568045 | 0.645894 | 0.623190 | 4.010109 | 1 |
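The difference between the two kinds of null handling can be illustrated with plain pandas. This sketch only mimics the behavior described here on a made-up frame; the column price_lag2 is a hypothetical exogenous lag, and none of this reflects mlforecast internals.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "unique_id": [0] * 4,
    "ds": pd.date_range("2000-10-05", periods=4, freq="D"),
    "y": [0.32, 1.22, 2.45, 3.48],
    # Hypothetical exogenous lag computed beforehand; its first two rows are null.
    "price_lag2": [np.nan, np.nan, 0.55, 0.72],
})

# A target-based lag, similar in spirit to lags=[1].
toy["lag1"] = toy.groupby("unique_id")["y"].shift(1)

# Dropping only the nulls that come from the target lag keeps the row
# where price_lag2 is still null.
only_target_nulls = toy.dropna(subset=["lag1"])

# Dropping every row with any null removes that row as well.
all_nulls = toy.dropna()

print(len(only_target_nulls), len(all_nulls))
```

If your model cannot handle missing values, the second variant is the one you need, applied before fitting.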
You can now train the model.
fcst.fit(series_with_prices2, static_features=[])
MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)
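And predict using the prices. Note that you can provide the dataframe with the full history, and mlforecast will filter the dates required for the forecasting horizon.
fcst.predict(1, X_df=transformed_prices).head()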
| | unique_id | ds | LinearRegression |
|---|---|---|---|
| 0 | 0 | 2001-05-15 | 3.803967 |
| 1 | 1 | 2001-05-15 | 3.512489 |
| 2 | 2 | 2001-05-15 | 3.170019 |
| 3 | 3 | 2001-05-15 | 4.307121 |
| 4 | 4 | 2001-05-15 | 3.018758 |
In this example we have prices for the next 7 days. If you try to forecast a longer horizon, you'll get an error.
from fastcore.test import test_fail
test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
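To avoid hitting that error, you can check beforehand how many future rows your feature frame holds for each series. The helper below is a hypothetical pandas-only sketch (available_horizon is not part of mlforecast); it counts rows dated after the last training timestamp and assumes those rows are consecutive at the series frequency.

```python
import pandas as pd

def available_horizon(train, X_df, id_col="unique_id", time_col="ds"):
    """Count, per series, the rows of X_df dated after the last training timestamp."""
    last_train = train.groupby(id_col)[time_col].max()
    # Rows whose id is absent from the training set compare against NaT and are excluded.
    is_future = X_df[time_col] > X_df[id_col].map(last_train)
    counts = X_df.loc[is_future].groupby(id_col).size()
    # Series with no future rows at all get a horizon of 0.
    return counts.reindex(last_train.index, fill_value=0)

# Made-up data: both series end on 2000-01-03.
toy_train = pd.DataFrame({
    "unique_id": [0] * 3 + [1] * 3,
    "ds": list(pd.date_range("2000-01-01", periods=3, freq="D")) * 2,
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
# Series 0 has two future days of features, series 1 only one.
toy_X = pd.DataFrame({
    "unique_id": [0] * 5 + [1] * 4,
    "ds": list(pd.date_range("2000-01-01", periods=5, freq="D"))
          + list(pd.date_range("2000-01-01", periods=4, freq="D")),
    "price": [0.5] * 9,
})

horizons = available_horizon(toy_train, toy_X)
max_h = int(horizons.min())
print(horizons.to_dict(), "-> longest safe horizon:", max_h)
```

The longest horizon you can request is bounded by the series with the fewest future rows, which is why the sketch takes the minimum.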