Transforming exogenous features

Compute transformations of your exogenous features for use with MLForecast

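The MLForecast class lets you compute lag transformations on the target, but sometimes you also want to transform your dynamic exogenous features. This guide shows you how to do that.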

Data setup

from mlforecast.utils import generate_series, generate_prices_for_series
series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)
ds unique_id price
0 2000-10-05 0 0.548814
1 2000-10-06 0 0.715189

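Suppose you have some series along with a price for each id and date, and you want to compute forecasts for the next 7 days. Since the price is a dynamic feature, you must provide its future values to MLForecast.predict through X_df.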

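If you want to use not only the price but also, for example, its lag7 and the expanding mean of its lag1, you can compute them before training, merge them with your series, and then provide the future values through X_df. Consider the following example.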

Computing the transformations

from mlforecast.lag_transforms import ExpandingMean

from mlforecast.feature_engineering import transform_exog
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})
transformed_prices.head(10)
ds unique_id price price_lag7 price_expanding_mean_lag1
0 2000-10-05 0 0.548814 NaN NaN
1 2000-10-06 0 0.715189 NaN 0.548814
2 2000-10-07 0 0.602763 NaN 0.632001
3 2000-10-08 0 0.544883 NaN 0.622255
4 2000-10-09 0 0.423655 NaN 0.602912
5 2000-10-10 0 0.645894 NaN 0.567061
6 2000-10-11 0 0.437587 NaN 0.580200
7 2000-10-12 0 0.891773 0.548814 0.559827
8 2000-10-13 0 0.963663 0.715189 0.601320
9 2000-10-14 0 0.383442 0.602763 0.641580
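
To make the meaning of the two new columns concrete, the sketch below rebuilds them with plain pandas from the first ten prices of series 0 shown above. It only illustrates what the columns contain; it is not how transform_exog is implemented, and the name prices_toy is introduced here for illustration.

```python
import pandas as pd

# First ten prices of series 0, copied from the output above.
prices_toy = pd.DataFrame({
    "unique_id": [0] * 10,
    "ds": pd.date_range("2000-10-05", periods=10, freq="D"),
    "price": [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
              0.645894, 0.437587, 0.891773, 0.963663, 0.383442],
})

# lag7: the price observed seven rows earlier within the same series.
prices_toy["price_lag7"] = prices_toy.groupby("unique_id")["price"].shift(7)

# Expanding mean of lag1: the mean of all prices strictly before the current row.
prices_toy["price_expanding_mean_lag1"] = (
    prices_toy.groupby("unique_id")["price"]
    .transform(lambda s: s.shift(1).expanding().mean())
)

print(prices_toy.round(6))
```

The first seven rows of price_lag7 and the first row of price_expanding_mean_lag1 are null because there is no earlier price to draw from, which matches the NaN pattern in the table above.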

You can now merge this with your original series.

series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)
unique_id ds y price price_lag7 price_expanding_mean_lag1
0 0 2000-10-05 0.322947 0.548814 NaN NaN
1 0 2000-10-06 1.218794 0.715189 NaN 0.548814
2 0 2000-10-07 2.445887 0.602763 NaN 0.632001
3 0 2000-10-08 3.481831 0.544883 NaN 0.622255
4 0 2000-10-09 4.191721 0.423655 NaN 0.602912
5 0 2000-10-10 5.395863 0.645894 NaN 0.567061
6 0 2000-10-11 6.264447 0.437587 NaN 0.580200
7 0 2000-10-12 0.284022 0.891773 0.548814 0.559827
8 0 2000-10-13 1.462798 0.963663 0.715189 0.601320
9 0 2000-10-14 2.035518 0.383442 0.602763 0.641580
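
Because the prices frame also holds future dates, it can help to confirm what the merge keeps. The sketch below uses small made-up frames (series_toy and prices_toy are hypothetical, not the generated data) and the validate argument of pandas merge to guard against duplicated (unique_id, ds) pairs.

```python
import pandas as pd

# Hypothetical toy data: 3 observed days, and prices for those days plus 2 future days.
series_toy = pd.DataFrame({
    "unique_id": [0] * 3,
    "ds": pd.date_range("2000-10-05", periods=3, freq="D"),
    "y": [0.3, 1.2, 2.4],
})
prices_toy = pd.DataFrame({
    "unique_id": [0] * 5,
    "ds": pd.date_range("2000-10-05", periods=5, freq="D"),
    "price": [0.5, 0.7, 0.6, 0.5, 0.4],
})

# validate raises a MergeError if an (id, date) pair is duplicated on either side.
merged = series_toy.merge(prices_toy, on=["unique_id", "ds"], validate="one_to_one")

# The default inner join keeps only dates present in both frames,
# so the two future price rows do not end up in the training data.
print(len(series_toy), len(prices_toy), len(merged))
print(merged)
```

Here the merged frame has as many rows as the series, which is the expected outcome when every observed date has a price.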

You can then define your forecast object. Note that you can still compute lag features based on the target as usual.

from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
fcst.preprocess(series_with_prices, static_features=[], dropna=True).head()
unique_id ds y price price_lag7 price_expanding_mean_lag1 lag1 dayofweek
1 0 2000-10-06 1.218794 0.715189 NaN 0.548814 0.322947 4
2 0 2000-10-07 2.445887 0.602763 NaN 0.632001 1.218794 5
3 0 2000-10-08 3.481831 0.544883 NaN 0.622255 2.445887 6
4 0 2000-10-09 4.191721 0.423655 NaN 0.602912 3.481831 0
5 0 2000-10-10 5.395863 0.645894 NaN 0.567061 4.191721 1

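Note that the dropna argument only considers the null values generated by the lag features based on the target. If you want to drop every row containing a null value, you have to do that in your original series.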

series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()
unique_id ds y price price_lag7 price_expanding_mean_lag1 lag1 dayofweek
8 0 2000-10-13 1.462798 0.963663 0.715189 0.601320 0.284022 4
9 0 2000-10-14 2.035518 0.383442 0.602763 0.641580 1.462798 5
10 0 2000-10-15 3.043565 0.791725 0.544883 0.615766 2.035518 6
11 0 2000-10-16 4.010109 0.528895 0.423655 0.631763 3.043565 0
12 0 2000-10-17 5.416310 0.568045 0.645894 0.623190 4.010109 1
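
If dropping every row with any null is too broad, one option is to inspect where the nulls come from and drop only on the exogenous feature columns. The following is a minimal pandas sketch on a made-up frame; the column price_lag2 and the values are hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical toy frame with nulls introduced by exogenous transformations.
df = pd.DataFrame({
    "unique_id": [0, 0, 0, 0],
    "ds": pd.date_range("2000-10-05", periods=4, freq="D"),
    "y": [0.3, 1.2, 2.4, 3.5],
    "price": [0.5, 0.7, 0.6, 0.5],
    "price_lag2": [None, None, 0.5, 0.7],
    "price_expanding_mean_lag1": [None, 0.5, 0.6, 0.6],
})

# Count nulls per column to see which features are responsible.
null_counts = df.isna().sum()
print(null_counts[null_counts > 0])

# Drop rows only where the chosen exogenous columns are null.
exog_cols = ["price_lag2", "price_expanding_mean_lag1"]
clean = df.dropna(subset=exog_cols)
print(clean)
```

Restricting dropna to a subset leaves nulls in any other column untouched, so unrelated missing values can be handled separately.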

You can now train the model.

fcst.fit(series_with_prices2, static_features=[])
MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

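And predict using the prices. Note that you can provide the dataframe with the full history, and mlforecast will filter out the dates required for the forecasting horizon.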

fcst.predict(1, X_df=transformed_prices).head()
unique_id ds LinearRegression
0 0 2001-05-15 3.803967
1 1 2001-05-15 3.512489
2 2 2001-05-15 3.170019
3 3 2001-05-15 4.307121
4 4 2001-05-15 3.018758

In this example we have prices for the next 7 days. If you try to forecast a longer horizon, you will get an error.

from fastcore.test import test_fail
test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
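
To avoid hitting that error, you could check in advance how many future steps the exogenous frame covers. The helper below, max_horizon, is a hypothetical name and not part of mlforecast. It is a rough sketch that assumes the columns are named unique_id and ds and that the future rows are contiguous with no gaps, so that counting rows after the last observed date is enough.

```python
import pandas as pd

def max_horizon(history: pd.DataFrame, exog: pd.DataFrame) -> int:
    """Smallest number of future exogenous rows available across all series."""
    # Last observed date per series.
    last_seen = history.groupby("unique_id")["ds"].max()
    # Rows of exog dated strictly after the last observation of their series.
    is_future = exog["ds"] > exog["unique_id"].map(last_seen)
    counts = exog[is_future].groupby("unique_id").size()
    # Series with no future rows at all count as zero.
    return int(counts.reindex(last_seen.index, fill_value=0).min())

# Hypothetical toy data: two series with 3 observed days each.
history = pd.DataFrame({
    "unique_id": [0, 0, 0, 1, 1, 1],
    "ds": list(pd.date_range("2000-10-05", periods=3, freq="D")) * 2,
    "y": [0.3, 1.2, 2.4, 0.1, 0.9, 2.2],
})
# Series 0 has prices for 2 future days, series 1 for only 1.
exog = pd.DataFrame({
    "unique_id": [0] * 5 + [1] * 4,
    "ds": list(pd.date_range("2000-10-05", periods=5, freq="D"))
    + list(pd.date_range("2000-10-05", periods=4, freq="D")),
    "price": [0.5] * 9,
})

horizon = max_horizon(history, exog)
print(horizon)
```

The minimum across series is used because a single series lacking future values is enough to make a longer forecast fail.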
