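Lag transformations
Compute features based on lags
mlforecast lets you define transformations on the lags to use as features. These are provided through the lag_transforms argument, a dict whose keys are the lags and whose values are lists of the transformations to apply to that lag.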
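To make the idea of "a transformation applied to a lag" concrete, here is a minimal NumPy sketch for a single series. The helpers `lag` and `rolling_mean` are hypothetical illustrations, not mlforecast functions, and the sketch assumes a window produces NaN until it is completely filled with valid values.

```python
import numpy as np

def lag(x: np.ndarray, n: int) -> np.ndarray:
    """Delay x by n steps, padding the start with NaN (illustrative helper)."""
    out = np.full(x.size, np.nan)
    if n < x.size:
        out[n:] = x[: x.size - n]
    return out

def rolling_mean(x: np.ndarray, window_size: int) -> np.ndarray:
    """Trailing mean over window_size values; NaN wherever the window is incomplete."""
    out = np.full(x.size, np.nan)
    for i in range(window_size - 1, x.size):
        out[i] = x[i - window_size + 1 : i + 1].mean()
    return out

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
# Roughly what an entry such as {1: [<rolling mean, window 2>]} describes:
# take lag 1 of the target first, then apply the rolling mean to the lagged values.
feature = rolling_mean(lag(y, 1), window_size=2)
print(feature)  # [nan nan 1.5 2.5 3.5 4.5]
```

Because the lag is applied first, each feature value only uses observations that precede the row it belongs to.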
Data setup
import numpy as np
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
data = generate_daily_series(10)
Built-in transformations
The built-in lag transformations are in the mlforecast.lag_transforms module.
from mlforecast.lag_transforms import RollingMean, ExpandingStd
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [ExpandingStd()],
        7: [RollingMean(window_size=7, min_samples=1), RollingMean(window_size=14)]
    },
)
Once you have defined your transformations, you can use MLForecast.preprocess to see what they look like.
fcst.preprocess(data).head(2)
| | unique_id | ds | y | expanding_std_lag1 | rolling_mean_lag7_window_size7_min_samples1 | rolling_mean_lag7_window_size14 |
|---|---|---|---|---|---|---|
| 20 | id_0 | 2000-01-21 | 6.319961 | 1.956363 | 3.234486 | 3.283064 |
| 21 | id_0 | 2000-01-22 | 0.071677 | 2.028545 | 3.256055 | 3.291068 |
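Extending the built-in transformations
You can compose the built-in transformations with the Combine class, which takes two transformations and an operator.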
import operator
from mlforecast.lag_transforms import Combine
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size14 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14 |
|---|---|---|---|---|---|---|
| 14 | id_0 | 2000-01-15 | 0.435006 | 3.234486 | 3.283064 | 0.985204 |
| 15 | id_0 | 2000-01-16 | 1.489309 | 3.256055 | 3.291068 | 0.989361 |
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag1_window_size14'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag1_window_size14']
)
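The composition that Combine expresses can be pictured as "apply both transformations, then join their outputs element-wise with the operator". The following is a minimal sketch of that idea only; `combine` and `rolling_mean` are hypothetical helpers written for illustration (not mlforecast code), and the lag step is omitted for brevity.

```python
import operator
import numpy as np

def rolling_mean(x, window_size):
    """Trailing mean via cumulative sums; NaN until the window is full."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window_size - 1:] = (csum[window_size:] - csum[:-window_size]) / window_size
    return out

def combine(tfm1, tfm2, op):
    """Hypothetical helper: build a transform that applies op to the outputs of tfm1 and tfm2."""
    return lambda x: op(tfm1(x), tfm2(x))

x = np.arange(1.0, 9.0)  # 1, 2, ..., 8
short_over_long = combine(
    lambda v: rolling_mean(v, 2),
    lambda v: rolling_mean(v, 4),
    operator.truediv,
)
ratio = short_over_long(x)
print(ratio[3])  # 1.4, i.e. mean(3, 4) / mean(1, 2, 3, 4)
```

A ratio of a short window to a long window like this one is a common way to expose recent trend relative to the longer-run level; any binary function from the operator module can take the place of truediv in the sketch.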
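If you want one of the transformations in Combine to be applied to a different lag, you can use the Offset class, which applies the offset first and then the transformation.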
from mlforecast.lag_transforms import Offset
fcst = MLForecast(
    models=[],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            Combine(
                RollingMean(window_size=7),
                Offset(RollingMean(window_size=7), n=1),
                operator.truediv,
            )
        ],
        2: [RollingMean(window_size=7)]
    },
)
prep = fcst.preprocess(data)
prep.head(2)
| | unique_id | ds | y | rolling_mean_lag1_window_size7 | rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7 | rolling_mean_lag2_window_size7 |
|---|---|---|---|---|---|---|
| 8 | id_0 | 2000-01-09 | 1.462798 | 3.326081 | 0.998331 | 3.331641 |
| 9 | id_0 | 2000-01-10 | 2.035518 | 3.360938 | 1.010480 | 3.326081 |
np.testing.assert_allclose(
    prep['rolling_mean_lag1_window_size7'] / prep['rolling_mean_lag2_window_size7'],
    prep['rolling_mean_lag1_window_size7_truediv_rolling_mean_lag2_window_size7']
)
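The check above relies on the fact that offsetting a transformation by n at lag k gives the same values as applying it at lag k + n. Here is a small sketch of that identity; `shift`, `rolling_sum2`, and `offset` are hypothetical stand-ins written for illustration, not mlforecast's implementation.

```python
import numpy as np

def shift(x, n):
    """Delay x by n steps, padding the start with NaN (stand-in for taking lag n)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, np.nan)
    if n < x.size:
        out[n:] = x[: x.size - n]
    return out

def rolling_sum2(x):
    """Toy transform: each value plus the one before it."""
    return x + shift(x, 1)

def offset(transform, n):
    """Hypothetical analogue of Offset: shift the input by n first, then transform."""
    return lambda x: transform(shift(x, n))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
via_offset = offset(rolling_sum2, 1)(shift(y, 1))  # lag 1, then an extra offset of 1
direct = rolling_sum2(shift(y, 2))                 # lag 2 directly
np.testing.assert_allclose(via_offset, direct)     # NaN positions compare equal by default
```

This is why, in the example above, the combined feature is named after lag 2 even though it is declared under lag 1.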
from sklearn.linear_model import LinearRegression
fcst = MLForecast(
    models=[LinearRegression()],
    freq='D',
    lag_transforms={
        1: [
            RollingMean(window_size=7),
            RollingMean(window_size=14),
            Combine(
                RollingMean(window_size=7),
                RollingMean(window_size=14),
                operator.truediv,
            )
        ],
    },
)
fcst.fit(data)
fcst.predict(2)
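numba-based transformations
The window-ops package provides transformations defined as numba JIT-compiled functions. We use numba because it makes them really fast and can also bypass Python's GIL, which allows running them concurrently with multithreading.
The main benefit of these transformations is that they are very easy to implement. However, they can be very slow when their values need to be updated in the predict step, because the function has to be called again on the complete history and only the last value is kept. If performance is a concern, try to use the built-in transformations, or set keep_last_n in MLForecast.preprocess or MLForecast.fit to the minimum number of samples that your transformations require.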
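The effect of limiting the history can be sketched as follows. This is an illustration of the idea only, not mlforecast's update code: `rolling_mean_full` and `last_value` are hypothetical helpers, and the `keep_last_n` parameter here merely mimics the role of the library argument of the same name.

```python
import numpy as np

def rolling_mean_full(x, window_size):
    """Function-style transform: returns the rolling mean for every position in x."""
    out = np.full(len(x), np.nan)
    for i in range(window_size - 1, len(x)):
        out[i] = np.mean(x[i - window_size + 1 : i + 1])
    return out

def last_value(transform, history, keep_last_n=None):
    """Recompute transform over the (optionally truncated) history and keep only the final value."""
    if keep_last_n is not None:
        history = history[-keep_last_n:]
    return transform(history)[-1]

rng = np.random.default_rng(0)
history = rng.normal(size=1_000)
tfm = lambda x: rolling_mean_full(x, 7)

full = last_value(tfm, history)                      # processes 1,000 samples
truncated = last_value(tfm, history, keep_last_n=7)  # processes only 7 samples
assert np.isclose(full, truncated)
```

For a rolling window applied at lag 1, the last window_size observations are enough; at a larger lag the requirement grows to lag + window_size - 1 observations. Expanding transformations depend on the entire history, so truncating it changes their values.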
from numba import njit
from window_ops.expanding import expanding_mean
from window_ops.shift import shift_array
@njit
def ratio_over_previous(x, offset=1):
    """Computes the ratio between the current value and its `offset` lag"""
    return x / shift_array(x, offset=offset)

@njit
def diff_over_previous(x, offset=1):
    """Computes the difference between the current value and its `offset` lag"""
    return x - shift_array(x, offset=offset)
If your function takes more arguments than the input array, you can provide a tuple like: (func, arg1, arg2, ...)
fcst = MLForecast(
    models=[],
    freq='D',
    lags=[1, 2, 3],
    lag_transforms={
        1: [expanding_mean, ratio_over_previous, (ratio_over_previous, 2)],  # the second ratio sets offset=2
        2: [diff_over_previous],
    },
)
prep = fcst.preprocess(data)
prep.head(2)
| | unique_id | ds | y | lag1 | lag2 | lag3 | expanding_mean_lag1 | ratio_over_previous_lag1 | ratio_over_previous_lag1_offset2 | diff_over_previous_lag2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | id_0 | 2000-01-04 | 3.481831 | 2.445887 | 1.218794 | 0.322947 | 1.329209 | 2.006809 | 7.573645 | 0.895847 |
| 4 | id_0 | 2000-01-05 | 4.191721 | 3.481831 | 2.445887 | 1.218794 | 1.867365 | 1.423546 | 2.856785 | 1.227093 |
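As you can see, the name of the function is used as the transformation name, followed by the _lag suffix and the lag number. If the function has other arguments and they are not set to their default values, they are included in the name as well, as with offset=2 here.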
np.testing.assert_allclose(prep['lag1'] / prep['lag2'], prep['ratio_over_previous_lag1'])
np.testing.assert_allclose(prep['lag1'] / prep['lag3'], prep['ratio_over_previous_lag1_offset2'])
np.testing.assert_allclose(prep['lag2'] - prep['lag3'], prep['diff_over_previous_lag2'])
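The naming rule described above can be reproduced with a few lines of plain Python. This sketch is a reading of the rule as stated in the text and of the column names in the table, not the library's actual code; `feature_name` is a hypothetical helper, and `ratio_over_previous` is redefined without numba purely so the signature can be inspected.

```python
import inspect

def ratio_over_previous(x, offset=1):
    """Plain-Python stand-in with the same signature as the numba function above."""
    return [float("nan")] * offset + [cur / prev for prev, cur in zip(x, x[offset:])]

def feature_name(func, lag, *args):
    """Hypothetical helper: function name, _lag suffix, then any non-default arguments."""
    name = f"{func.__name__}_lag{lag}"
    # Skip the first parameter, which is the input array.
    extra_params = list(inspect.signature(func).parameters.values())[1:]
    for param, value in zip(extra_params, args):
        if value != param.default:
            name += f"_{param.name}{value}"
    return name

print(feature_name(ratio_over_previous, 1))     # ratio_over_previous_lag1
print(feature_name(ratio_over_previous, 1, 2))  # ratio_over_previous_lag1_offset2
```

Knowing the pattern is handy when selecting feature columns programmatically after preprocessing.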