Transforming exogenous features

Compute transformations on your exogenous features for MLForecast

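The MLForecast class lets you compute lag transformations on your target. Sometimes, however, you also want to transform your dynamic exogenous features. This guide shows you how to do that.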

Data setup

from mlforecast.utils import generate_series, generate_prices_for_series

series = generate_series(10, equal_ends=True)
prices = generate_prices_for_series(series)
prices.head(2)

	ds	unique_id	price
0	2000-10-05	0	0.548814
1	2000-10-06	0	0.715189

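Suppose you have some series along with a price for each id and date, and you want to forecast the next 7 days. Because the price is a dynamic feature, you have to pass its future values to MLForecast.predict through X_df.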

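Suppose you want to use more than the price itself, for example its lag 7 and the expanding mean of its lag 1. You can compute these features before training, merge them with your series, and then provide their future values through X_df. Consider the following example.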

Computing the transformations

from mlforecast.lag_transforms import ExpandingMean

from mlforecast.feature_engineering import transform_exog

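# lags=[7] adds the price from seven days earlier; lag_transforms={1: [ExpandingMean()]}
# adds the expanding mean of the lag-1 price (computed per series)
transformed_prices = transform_exog(prices, lags=[7], lag_transforms={1: [ExpandingMean()]})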
transformed_prices.head(10)

	ds	price	price_lag7	price_expanding_mean_lag1
0	2000-10-05	0.548814	NaN	NaN
1	2000-10-06	0.715189	NaN	0.548814
2	2000-10-07	0.602763	NaN	0.632001
3	2000-10-08	0.544883	NaN	0.622255
4	2000-10-09	0.423655	NaN	0.602912
5	2000-10-10	0.645894	NaN	0.567061
6	2000-10-11	0.437587	NaN	0.580200
7	2000-10-12	0.891773	0.548814	0.559827
8	2000-10-13	0.963663	0.715189	0.601320
9	2000-10-14	0.383442	0.602763	0.641580
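For a single series, the two new columns have a simple definition. price_lag7 at row t is the price at row t-7. price_expanding_mean_lag1 at row t is the mean of all prices strictly before row t.

The sketch below reproduces the first ten rows of series 0 with plain pandas. It assumes that groupby shift plus expanding().mean() matches what transform_exog computes; it is an illustration, not mlforecast's implementation. The prices are copied from the output above. The second series is made up, and is there only to show that the lags never cross series boundaries.

```python
import pandas as pd

# First ten prices of series 0, copied from the transform_exog output above.
prices_0 = [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
            0.645894, 0.437587, 0.891773, 0.963663, 0.383442]
# A second, made-up series: its lags must not pick up values from series 0.
prices_1 = [float(v) for v in range(10)]

df = pd.DataFrame({
    'unique_id': [0] * 10 + [1] * 10,
    'ds': list(pd.date_range('2000-10-05', periods=10, freq='D')) * 2,
    'price': prices_0 + prices_1,
})

grouped = df.groupby('unique_id')['price']
# Lag 7: the price observed seven rows earlier within the same series.
df['price_lag7'] = grouped.shift(7)
# Expanding mean of lag 1: mean of all strictly earlier prices in the same series.
df['price_expanding_mean_lag1'] = grouped.transform(
    lambda s: s.shift(1).expanding().mean()
)
print(df[df['unique_id'] == 0].round(6))
```

The lambda inside transform keeps both the shift and the expanding window inside each unique_id group. Applying shift(1).expanding() to the ungrouped column would leak the end of series 0 into the start of series 1.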

You can now merge this with your original series.

series_with_prices = series.merge(transformed_prices, on=['unique_id', 'ds'])
series_with_prices.head(10)

	ds	y	price	price_lag7	price_expanding_mean_lag1
0	2000-10-05	0.322947	0.548814	NaN	NaN
1	2000-10-06	1.218794	0.715189	NaN	0.548814
2	2000-10-07	2.445887	0.602763	NaN	0.632001
3	2000-10-08	3.481831	0.544883	NaN	0.622255
4	2000-10-09	4.191721	0.423655	NaN	0.602912
5	2000-10-10	5.395863	0.645894	NaN	0.567061
6	2000-10-11	6.264447	0.437587	NaN	0.580200
7	2000-10-12	0.284022	0.891773	0.548814	0.559827
8	2000-10-13	1.462798	0.963663	0.715189	0.601320
9	2000-10-14	2.035518	0.383442	0.602763	0.641580
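DataFrame.merge performs an inner join by default, so only the (unique_id, ds) pairs present in both frames survive. The price frame also covers the days after the end of each series. Those are the future values that later go into X_df. They are therefore dropped from series_with_prices but stay in transformed_prices.

The toy frames below are made up to show this on one series; they are not the output of generate_series.

```python
import pandas as pd

# Toy history: one series with three observed days (made-up values).
series = pd.DataFrame({
    'unique_id': [0, 0, 0],
    'ds': pd.date_range('2000-10-05', periods=3, freq='D'),
    'y': [1.0, 2.0, 3.0],
})
# Toy prices: the same three days plus two future days.
prices = pd.DataFrame({
    'unique_id': [0] * 5,
    'ds': pd.date_range('2000-10-05', periods=5, freq='D'),
    'price': [0.5, 0.6, 0.7, 0.8, 0.9],
})

# Default how='inner' keeps only the dates present in both frames;
# validate='one_to_one' raises if either side has duplicated keys.
merged = series.merge(prices, on=['unique_id', 'ds'], validate='one_to_one')

# Price rows after the last observed date: the values needed later as X_df.
future = prices[prices['ds'] > series['ds'].max()]
print(merged)
print(future)
```

Comparing len(merged) with len(series) is a cheap way to catch historical dates that have no price.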

Then you can define your forecast object. Note that you can still compute lag features based on the target as you normally would.

from sklearn.linear_model import LinearRegression

from mlforecast import MLForecast

fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lags=[1],
    date_features=['dayofweek'],
)
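# static_features=[] marks price and its transformations as dynamic
# (by default, every extra column would be treated as static)
fcst.preprocess(series_with_prices, static_features=[], dropna=True).head()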

	ds	y	price	price_lag7	price_expanding_mean_lag1	lag1	dayofweek
1	2000-10-06	1.218794	0.715189	NaN	0.548814	0.322947	4
2	2000-10-07	2.445887	0.602763	NaN	0.632001	1.218794	5
3	2000-10-08	3.481831	0.544883	NaN	0.622255	2.445887	6
4	2000-10-09	4.191721	0.423655	NaN	0.602912	3.481831	0
5	2000-10-10	5.395863	0.645894	NaN	0.567061	4.191721	1

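Note that the dropna argument only considers the null values produced by the target-based lag features. If you want to drop every row that contains null values, you have to do so in the original series.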
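The difference between the two approaches can be approximated in plain pandas. The helper add_target_features below is a hypothetical stand-in for preprocess, not mlforecast's code. It builds the target lag and the day of week, then drops only the rows where the target lag is missing. The data are the first ten rows of series 0 from the merged output above.

```python
import pandas as pd

# First ten rows of series 0 from the merged output above (y and price).
y = [0.322947, 1.218794, 2.445887, 3.481831, 4.191721,
     5.395863, 6.264447, 0.284022, 1.462798, 2.035518]
price = [0.548814, 0.715189, 0.602763, 0.544883, 0.423655,
         0.645894, 0.437587, 0.891773, 0.963663, 0.383442]
df = pd.DataFrame({
    'unique_id': [0] * 10,
    'ds': pd.date_range('2000-10-05', periods=10, freq='D'),
    'y': y,
    'price': price,
})
# Exogenous transformation: NaN in the first seven rows of the series.
df['price_lag7'] = df.groupby('unique_id')['price'].shift(7)


def add_target_features(frame):
    """Hypothetical stand-in for preprocess with lags=[1], date_features=['dayofweek']."""
    out = frame.copy()
    out['lag1'] = out.groupby('unique_id')['y'].shift(1)
    out['dayofweek'] = out['ds'].dt.dayofweek
    # Mimic dropna=True: only rows whose target lag is missing are removed.
    return out.dropna(subset=['lag1'])


kept = add_target_features(df)              # price_lag7 NaNs survive
cleaned = add_target_features(df.dropna())  # drop every null first, as in the guide
print(kept.index.tolist())     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(cleaned.index.tolist())  # [8, 9]
```

In the second call, dropping nulls first removes rows 0 to 6. The lag is then computed on the shorter series, so row 7 loses its lag1 and the first remaining row is 8, as in the output that follows.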

series_with_prices2 = series_with_prices.dropna()
fcst.preprocess(series_with_prices2, dropna=True, static_features=[]).head()

	ds	y	price	price_lag7	price_expanding_mean_lag1	lag1	dayofweek
8	2000-10-13	1.462798	0.963663	0.715189	0.601320	0.284022	4
9	2000-10-14	2.035518	0.383442	0.602763	0.641580	1.462798	5
10	2000-10-15	3.043565	0.791725	0.544883	0.615766	2.035518	6
11	2000-10-16	4.010109	0.528895	0.423655	0.631763	3.043565	0
12	2000-10-17	5.416310	0.568045	0.645894	0.623190	4.010109	1

You can now train the model.

fcst.fit(series_with_prices2, static_features=[])

MLForecast(models=[LinearRegression], freq=D, lag_features=['lag1'], date_features=['dayofweek'], num_threads=1)

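And predict using the prices. Note that you can provide the dataframe with the full history, and mlforecast will select the dates required for the forecast horizon.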

fcst.predict(1, X_df=transformed_prices).head()

	unique_id	ds	LinearRegression
0	0	2001-05-15	3.803967
1	1	2001-05-15	3.512489
2	2	2001-05-15	3.170019
3	3	2001-05-15	4.307121
4	4	2001-05-15	3.018758

In this example we only have prices for the next 7 days, so trying to forecast a longer horizon raises an error.

from fastcore.test import test_fail

test_fail(lambda: fcst.predict(8, X_df=transformed_prices), contains='Found missing inputs in X_df')
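Both behaviors can be sketched with pandas: filtering a full-history X_df down to the horizon, and failing when a horizon date is absent. The helper select_future_exog below is hypothetical and only mirrors the idea. mlforecast's own check, with its "Found missing inputs in X_df" message, lives inside MLForecast.predict.

The sketch assumes a daily frequency and that each series needs the h dates immediately after its last training date. The toy frame is made up. Its last training date, 2001-05-14, is chosen to match the 2001-05-15 forecast date above.

```python
import pandas as pd


def select_future_exog(X_df, last_dates, h, freq='D'):
    """Hypothetical helper: keep the h dates after each series' last training date.

    X_df: frame with unique_id, ds and exogenous columns (may include history).
    last_dates: mapping unique_id -> last timestamp seen during training.
    Raises ValueError if any (unique_id, ds) pair in the horizon is missing.
    """
    expected = pd.DataFrame([
        {'unique_id': uid, 'ds': ds}
        for uid, last in last_dates.items()
        # Skip the last training date itself, then take the next h periods.
        for ds in pd.date_range(last, periods=h + 1, freq=freq)[1:]
    ])
    # Left join keeps every expected pair; indicator flags pairs absent from X_df.
    merged = expected.merge(X_df, on=['unique_id', 'ds'], how='left', indicator=True)
    missing = merged[merged['_merge'] == 'left_only']
    if not missing.empty:
        raise ValueError(f'missing exogenous values for {len(missing)} horizon rows')
    return merged.drop(columns='_merge')


# Toy X_df: 5 days of history plus 3 future days for a single series.
X_df = pd.DataFrame({
    'unique_id': [0] * 8,
    'ds': pd.date_range('2001-05-10', periods=8, freq='D'),
    'price': [0.1 * i for i in range(8)],
})
last_dates = {0: pd.Timestamp('2001-05-14')}

print(select_future_exog(X_df, last_dates, h=3))  # 3 future days exist
try:
    select_future_exog(X_df, last_dates, h=4)     # only 3 future days exist
except ValueError as err:
    print(err)
```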