使用 `sktime` 进行层次、面板和全局预测#

本笔记本概述#

介绍 - 分层时间序列
sktime 中的层次和面板数据格式
预测器和变换器的自动化矢量化
sktime 中的层次对账
层次预测/使用总结/简化模型进行全局预测，“M5 获胜者”
扩展指南
介绍 - 全球预测
无外生数据的全球预测
带有外生数据的全球预测
贡献者荣誉

[1]:

%%capture
import warnings

warnings.filterwarnings("ignore")

层次时间序列#

时间序列通常以“层次”形式存在，例如：

箭头头

示例包括：

许多层次时间序列数据集可以在这里找到：https://forecastingdata.org/ （访问加载器正在开发中）

关于文献，另请参见：https://otexts.com/fpp2/hierarchical.html

时间序列也可以展示多个独立的层次结构：

箭头头

层次结构和面板数据类型的表示#

sktime 区分了时间序列的抽象科学数据类型（=”scitypes”）：

Series 类型 = 一个时间序列（单变量或多变量）
Panel 类型 = 多个时间序列的集合，扁平层次结构
Hierarchical 类型 = 时间序列的分层集合，如上所述

每个科学类型都有多种mtype表示形式。

为了简单起见，我们下面只关注基于 pandas.DataFrame 的表示。

[2]:

# import to retrieve examples

from sktime.datatypes import get_examples

在 "pd.DataFrame" mtype 中，时间序列由内存中的容器 obj: pandas.DataFrame 表示，如下所示。

结构约定：obj.index 必须是单调的，并且是 Int64Index、RangeIndex、DatetimeIndex、PeriodIndex 之一。
变量：obj 的列对应于不同的变量
变量名：列名 obj.columns
时间点：obj 的行对应于不同、明确的时间点
时间索引: obj.index 被解释为时间索引。
功能：可以表示多元序列；可以表示不等间距的序列

"pd.DataFrame" 表示中的单变量序列示例。

这个系列有两个变量，分别命名为 "a" 和 "b"。两者都在相同的时间点 0、1、2、3 被观察。

[3]:

get_examples(mtype="pd.DataFrame")[0]

[3]:

	a
0	1.0
1	4.0
2	0.5
3	-3.0

"pd.DataFrame" 表示中的双变量序列示例。

这个系列有两个变量，分别命名为 "a" 和 "b"。两者都在相同的时间点 0、1、2、3 被观察。

[4]:

get_examples(mtype="pd.DataFrame")[1]

[4]:

	a	b
0	1.0	3.000000
1	4.0	7.000000
2	0.5	2.000000
3	-3.0	-0.428571

在 "pd-multiindex" mtype 中，时间序列面板由内存中的容器 obj: pandas.DataFrame 表示，如下所示。

结构约定：obj.index 必须是一个类型为 (RangeIndex, t) 的成对多索引，其中 t 是 Int64Index、RangeIndex、DatetimeIndex、PeriodIndex 之一，并且是单调的。
实例：具有相同 "instances" 索引的行对应于同一个实例；具有不同 "instances" 索引的行对应于不同的实例。
实例索引：obj.index 中的成对的第一个元素被解释为实例索引。
变量：obj 的列对应于不同的变量
变量名：列名 obj.columns
时间点：具有相同 "timepoints" 索引的 obj 行对应于同一时间点；具有不同 "timepoints" 索引的 obj 行对应于不同的时间点。
时间索引：obj.index 中的成对的第二个元素被解释为时间索引。
功能：可以表示多变量序列的面板；可以表示不等间隔的序列；可以表示不等支持的序列面板；不能表示具有不同变量集的序列面板。

"pd-multiindex" mtype 表示法中多元时间序列面板的示例。该面板包含三个多元时间序列，实例索引分别为 0、1、2。所有序列都有两个变量，名称为 "var_0"、"var_1"。所有序列都在三个时间点 0、1、2 进行了观测。

[5]:

get_examples(mtype="pd-multiindex")[0]

[5]:

		var_0	var_1
instances	timepoints
0	0	1	4
	1	2	5
	2	3	6
1	0	1	4
	1	2	55
	2	3	6
2	0	1	42
	1	2	5
	2	3	6

结构约定：obj.index 必须是一个类型为 (RangeIndex, ..., RangeIndex, t) 的3级或更多级的多重索引，其中 t 是 Int64Index、RangeIndex、DatetimeIndex、PeriodIndex 之一且是单调的。我们称最后一个索引为“类时间”索引
层次级别：具有相同非时间索引的行对应于相同的层次单元；具有不同非时间索引组合的行对应于不同的层次单元。
层次结构：在 obj.index 中，非时间类索引被解释为标识层次结构的索引。
变量：obj 的列对应于不同的变量
变量名：列名 obj.columns
时间点：具有相同 "timepoints" 索引的 obj 行对应于同一时间点；具有不同 "timepoints" 索引的 obj 行对应于不同的时间点。
时间索引：obj.index 中元组的最后一个元素被解释为时间索引。
功能：可以表示分层序列；可以表示不等间距序列；可以表示不等支持的分层序列；不能表示具有不同变量集的分层序列。

[6]:

X = get_examples(mtype="pd_multiindex_hier")[0]
X

[6]:

			var_0	var_1
foo	bar	timepoints
a	0	0	1	4
		1	2	5
		2	3	6
	1	0	1	4
		1	2	55
		2	3	6
	2	0	1	42
		1	2	5
		2	3	6
b	0	0	1	4
		1	2	5
		2	3	6
	1	0	1	4
		1	2	55
		2	3	6
	2	0	1	42
		1	2	5
		2	3	6

[7]:

X.index.get_level_values(level=-1)

Int64Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2], dtype='int64', name='timepoints')

自动向量化（转换）预测器和转换器#

许多为单一系列实现的变压器和预测器

sktime 会自动将它们向量化/“向上转换”为分层数据

在层次结构中按系列/面板单独应用

构建层次化时间序列：

[8]:

from sktime.utils._testing.hierarchical import _make_hierarchical

y = _make_hierarchical()
y

[8]:

			c0
h0	h1	time
h0_0	h1_0	2000-01-01	1.954666
		2000-01-02	4.749032
		2000-01-03	3.116761
		2000-01-04	3.951981
		2000-01-05	3.789698
...	...	...	...
h0_1	h1_3	2000-01-08	2.044946
		2000-01-09	2.609192
		2000-01-10	4.107474
		2000-01-11	3.194098
		2000-01-12	3.712494

96 rows × 1 columns

所有预测器和转换器都适用于此！

如果预测器仅实现了单一系列的逻辑，则“应用于所有层次单位”

[9]:

from sktime.forecasting.arima import ARIMA

forecaster = ARIMA()

y_pred = forecaster.fit(y, fh=[1, 2]).predict()
y_pred

[9]:

			c0
h0	h1
h0_0	h1_0	2000-01-13	2.972898
	h1_0	2000-01-14	2.990252
	h1_1	2000-01-13	3.108714
	h1_1	2000-01-14	3.641990
	h1_2	2000-01-13	3.263830
	h1_2	2000-01-14	3.040534
	h1_3	2000-01-13	3.186436
	h1_3	2000-01-14	3.031391
h0_1	h1_0	2000-01-13	2.483968
	h1_0	2000-01-14	2.568648
	h1_1	2000-01-13	2.583254
	h1_1	2000-01-14	2.591526
	h1_2	2000-01-13	3.114990
	h1_2	2000-01-14	3.078702
	h1_3	2000-01-13	2.979505
	h1_3	2000-01-14	3.029302

每个层次结构级别的单独预测器存储在 forecasters_ 属性中
这包含一个 pandas.DataFrame，其格式与层次级别相同，并且包含拟合的预测器。
（对于转换器，该属性称为 transformers_）

[10]:

forecaster.forecasters_

[10]:

		forecasters
h0	h1
h0_0	h1_0	ARIMA()
	h1_1	ARIMA()
	h1_2	ARIMA()
	h1_3	ARIMA()
h0_1	h1_0	ARIMA()
	h1_1	ARIMA()
	h1_2	ARIMA()
	h1_3	ARIMA()

要访问或检查单个预测器，请访问 forecasters_ 数据框的元素：

[11]:

forecaster.forecasters_.iloc[0, 0].summary()

[11]:

SARIMAX Results
Dep. Variable:	y	No. Observations:	12
Model:	SARIMAX(1, 0, 0)	Log Likelihood	-15.852
Date:	Mon, 11 Apr 2022	AIC	37.704
Time:	06:55:46	BIC	39.159
Sample:	0	HQIC	37.165
	- 12
Covariance Type:	opg

	coef	std err	z	P>\|z\|	[0.025	0.975]
intercept	2.9242	1.285	2.275	0.023	0.405	5.443
ar.L1	0.0222	0.435	0.051	0.959	-0.831	0.876
sigma2	0.8221	0.676	1.217	0.224	-0.502	2.146

Ljung-Box (L1) (Q):	0.00	Jarque-Bera (JB):	1.13
Prob(Q):	0.99	Prob(JB):	0.57
Heteroskedasticity (H):	0.52	Skew:	0.55
Prob(H) (two-sided):	0.55	Kurtosis:	1.98

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

自动向量化也应用于概率预测（参见笔记本2）：

[12]:

forecaster.predict_interval()

[12]:

			Coverage
			0.9
			lower	upper
h0	h1
h0_0	h1_0	2000-01-13	1.481552	4.464243
	h1_0	2000-01-14	1.498538	4.481966
	h1_1	2000-01-13	1.807990	4.409437
	h1_1	2000-01-14	2.247328	5.036651
	h1_2	2000-01-13	1.630340	4.897320
	h1_2	2000-01-14	1.378545	4.702523
	h1_3	2000-01-13	2.001023	4.371850
	h1_3	2000-01-14	1.785088	4.277694
h0_1	h1_0	2000-01-13	1.057036	3.910900
	h1_0	2000-01-14	1.121161	4.016136
	h1_1	2000-01-13	1.825167	3.341341
	h1_1	2000-01-14	1.818342	3.364710
	h1_2	2000-01-13	1.521553	4.708427
	h1_2	2000-01-14	1.484741	4.672664
	h1_3	2000-01-13	1.826515	4.132494
	h1_3	2000-01-14	1.873655	4.184949

预测器/本机实现哪个级别以及何时发生矢量化是由“内部mtype”属性决定的。

对于预报员，检查 y_inner_mtype，这是一个内部支持的机器类型列表：

[13]:

forecaster.get_tag("y_inner_mtype")

[13]:

'pd.Series'

ARIMA 仅支持一种 Series 级别的 mtype。对于所有 mtype 代码及其对应的 scitype（序列、面板或层次结构）的注册，

参见 sktime.datatypes.MTYPE_REGISTER （转换为 pandas.DataFrame 以进行美观打印）

[14]:

import pandas as pd

from sktime.datatypes import MTYPE_REGISTER

pd.DataFrame(MTYPE_REGISTER)

[14]:

	0	1	2
0	pd.Series	Series	pd.Series representation of a univariate series
1	pd.DataFrame	Series	pd.DataFrame representation of a uni- or multi...
2	np.ndarray	Series	2D numpy.ndarray with rows=samples, cols=varia...
3	nested_univ	Panel	pd.DataFrame with one column per variable, pd....
4	numpy3D	Panel	3D np.array of format (n_instances, n_columns,...
5	numpyflat	Panel	WARNING: only for internal use, not a fully su...
6	pd-multiindex	Panel	pd.DataFrame with multi-index (instances, time...
7	pd-wide	Panel	pd.DataFrame in wide format, cols = (instance*...
8	pd-long	Panel	pd.DataFrame in long format, cols = (index, ti...
9	df-list	Panel	list of pd.DataFrame
10	pd_multiindex_hier	Hierarchical	pd.DataFrame with MultiIndex
11	alignment	Alignment	pd.DataFrame in alignment format, values are i...
12	alignment_loc	Alignment	pd.DataFrame in alignment format, values are l...
13	pd_DataFrame_Table	Table	pd.DataFrame representation of a data table
14	numpy1D	Table	1D np.narray representation of a univariate table
15	numpy2D	Table	2D np.narray representation of a univariate table
16	pd_Series_Table	Table	pd.Series representation of a data table
17	list_of_dict	Table	list of dictionaries with primitive entries
18	pred_interval	Proba	predictive intervals
19	pred_quantiles	Proba	quantile predictions
20	pred_var	Proba	variance predictions

路线图项目：

与本地分层支持接口和实现常见方法
- 使用层次因子、GEE、混合效应的ARIMA
用于将“使用层次级别”作为协变量或外生特征的包装器
对变量进行完全矢量化，以使单变量预测器变为多变量

欢迎贡献！

层次化协调#

预测协调 = 确保满足线性层次依赖关系，
例如，“柏林各店铺销售额之和必须等于柏林总销售额”
需要层次结构（或面板）数据，通常涉及总计

自底向上对账 通过仅在最底层生成预测，然后向上累加到所有更高层的总计来工作。

* Arguably the most simple of all algorithms to reconcile across hierarchies.
* Advantages: easy to implement
* Disadvantages: lower levels of hierarchy are prone to excess volatility. This excess volatility is aggregated up, often producing much less accurate top level forecasts.

自上而下的协调 通过生成顶层预测并将它们基于例如较低级别的相对比例分解到最低级别来工作。

* Advantages: still easy to implement, top level is stable
* Disadvantages: peculiarities of lower levels of hierarchy ignored

最优预测协调 通过在层次结构的所有层级上生成预测，并以最小化预测误差的方式调整所有预测来工作。

* Advantages: often found to be most accurate implementation
* Disadvantages: more difficult to implement

sktime 提供了协调功能：

节点聚合的数据容器约定
计算节点级聚合的功能 - Aggregator
- 可用于自底向上的对账
实现协调逻辑的转换器 - Reconciler
- 实现自顶向下的协调
- 实现类似变压器的最优预测协调

sktime 使用 pd_multiindex_hier 格式的一个特殊情况来存储节点级别的聚合：

在一个实例（非类时）级别中的 __total 索引元素表示对该级别以下所有实例的求和。
__total 索引元素是保留的，不能用于其他任何用途
在 __total 索引元素下的条目是同一级别中所有其他实例的条目总和，其中找到 __total 元素。

通过应用 Aggregator 转换器，可以获得节点级聚合格式。

在非聚合输入的管道中，这允许通过总计进行预测。

[15]:

from sktime.datatypes import get_examples

y_hier = get_examples("pd_multiindex_hier")[1]
y_hier

[15]:

			var_0
foo	bar	timepoints
a	0	0	1
		1	2
		2	3
	1	0	1
		1	2
		2	3
	2	0	1
		1	2
		2	3
b	0	0	1
		1	2
		2	3
	1	0	1
		1	2
		2	3
	2	0	1
		1	2
		2	3

[16]:

from sktime.transformations.hierarchical.aggregate import Aggregator

Aggregator().fit_transform(y_hier)

[16]:

			var_0
foo	bar	timepoints
__total	__total	0	6
		1	12
		2	18
a	0	0	1
		1	2
		2	3
	1	0	1
		1	2
		2	3
	2	0	1
		1	2
		2	3
	__total	0	3
		1	6
		2	9
b	0	0	1
		1	2
		2	3
	1	0	1
		1	2
		2	3
	2	0	1
		1	2
		2	3
	__total	0	3
		1	6
		2	9

如果在管道开始时使用，预测不仅会针对节点 __total 进行，还会针对各个实例进行。

注意：通常，这不会产生一个协调的预测，即，预测总数将不会相加。

[17]:

from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = Aggregator() * NaiveForecaster()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

[17]:

			0
foo	bar
__total	__total	3	18
__total	__total	4	18
a	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9
b	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9

如果在管道末端使用，预测将自下而上地协调。

这将产生一个协调的预测，尽管自下而上可能不是首选的方法。

[18]:

from sktime.forecasting.naive import NaiveForecaster

pipeline_to_forecast_totals = NaiveForecaster() * Aggregator()

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

[18]:

			0
foo	bar
__total	__total	3	18
__total	__total	4	18
a	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9
b	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9

对于类似变压器的协调，使用 Reconciler 。它支持诸如 OLS 和 WLS 等高级技术：

[19]:

from sktime.transformations.hierarchical.reconcile import Reconciler

pipeline_with_reconciliation = (
    Aggregator() * NaiveForecaster() * Reconciler(method="ols")
)

[20]:

pipeline_to_forecast_totals.fit(y_hier, fh=[1, 2])
pipeline_to_forecast_totals.predict()

[20]:

			0
foo	bar
__total	__total	3	18
__total	__total	4	18
a	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9
b	0	3	3
	0	4	3
	1	3	3
	1	4	3
	2	3	3
	2	4	3
	__total	3	9
	__total	4	9

路线图项目：

包装类型的和解
对账 & 全球预测
概率协调

`sktime` 中的全局预测#

全局预测/面板预测 - 简介#

全局预测 = 在 时间序列集合 上进行训练，即在时间序列面板上进行训练。

通常比“为每个时间序列实例拟合一个预测器”更好。

也称为：

“面板预测”，如果预测所有时间序列
“跨学习”，如果训练时间序列与部署集不相交

为什么全球预测很重要？

在实践中，我们经常有时间序列的有限范围
估计是困难的，我们无法模拟复杂的依赖关系
全局预测的假设：我们可以多次观察到相同的数据生成过程（DGP）
非相同的DGP也可以很好，只要异质性的程度能被外生信息所捕捉。
现在我们有了更多的信息，可以更可靠地估计更复杂的模型（注意：除非复杂性纯粹由时间动态驱动）

由于这些优势，全球预测模型在比赛中取得了巨大成功，例如：* Rossmann 商店销售 * 沃尔玛在恶劣天气下的销售 * M5 比赛

实践中许多商业问题本质上都是全局预测问题——通常也反映出层次信息（见上文）

多元预测与全局预测的区别 * 多元预测侧重于建模时间序列之间的相互依赖关系 * 全局预测可以建模相互依赖关系，但重点在于增强观测空间

sktime 中的实现

所有模型都支持面板预测——这些预测可以是独立的或相关的
- 某些模型的面板预测使用所有时间序列，例如，减少预测者
能够在不相关样本上训练的模型（“跨学习”）具有 capability:global_forecasting 标签

使用简化预测器的面板预测#

在下面的示例中，我们将使用 Panel 科学类型的 pd-multiindex 表示（见上文）

行多重索引的第0级是各个时间序列的唯一标识符，第1级包含时间索引。

[21]:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

from sktime.forecasting.compose import make_reduction
from sktime.split import temporal_train_test_split
from sktime.transformations.series.date import DateTimeFeatures

pd.options.mode.chained_assignment = None
pd.set_option("display.max_columns", None)

# %%
# Load M5 Data and prepare
y = pd.read_pickle("global_fc/y.pkl")
X = pd.read_pickle("global_fc/X.pkl")

y/X 基于 M5 竞赛。数据特征包括不同商店、不同州和不同产品类别的销售情况。

有关比赛的详细分析，请参阅论文《M5 精度竞赛：结果、发现和结论》。

https://doi.org/10.1016/j.ijforecast.2021.11.013

你可以在这里看到数据的概览：

[22]:

print(y.head())
print(X.head())

                            y
instances timepoints
1         2016-03-15   756.67
          2016-03-16   679.13
          2016-03-17   633.40
          2016-03-18  1158.04
          2016-03-19   914.24
                      dept_id  cat_id  store_id  state_id  event_name_1  \
instances timepoints
1         2016-03-15        1       1        10         3             1
          2016-03-16        1       1        10         3             1
          2016-03-17        1       1        10         3             7
          2016-03-18        1       1        10         3             1
          2016-03-19        1       1        10         3             1

                      event_type_1  event_name_2  event_type_2  snap  \
instances timepoints
1         2016-03-15             1             1             1     3
          2016-03-16             1             1             1     0
          2016-03-17             3             1             1     0
          2016-03-18             1             1             1     0
          2016-03-19             1             1             1     0

                      no_stock_days
instances timepoints
1         2016-03-15              0
          2016-03-16              0
          2016-03-17              0
          2016-03-18              0
          2016-03-19              0

通过第一列中的实例参数分组的时间序列 = 面板
专注于建模单个产品
层次信息作为外部信息提供。

在M5比赛中，获胜的解决方案使用了关于层次结构的外生特征，如 "dept_id", "store_id" 等，以捕捉产品的相似性和差异性。其他特征包括假日事件和SNAP日（美国社会保障特定援助计划在某些天支付）。

现在使用 temporal_train_test_split 将数据集分为测试集和训练集。

我们可以单独切割时间序列的每个实例：

[23]:

y_train, y_test, X_train, X_test = temporal_train_test_split(y, X)
print(y_train.head(5))
print(y_test.head(5))

		y
instances	timepoints
1	2016-03-15	756.67
	2016-03-16	679.13
	2016-03-17	633.40
	2016-03-18	1158.04
	2016-03-19	914.24

		y
instances	timepoints
1	2016-04-14	874.57
	2016-04-15	895.29
	2016-04-16	1112.63
	2016-04-17	1014.86
	2016-04-18	691.91

y 和 X 以相同的方式分割，并且层次结构得以保留。

树集合利用时间序列和协变量之间复杂的非线性关系/依赖关系。

在**单变量时间序列预测**中，由于数据缺乏，基于树的模型通常表现不佳。

由于 全球预测 中的有效样本量较大，树集成可以成为一个不错的选择（例如，M5竞赛中有42,840个时间序列）。

sktime 可以通过简化接口任何与 sklearn 兼容的模型，例如 RandomForestRegressor。

[24]:

regressor = make_pipeline(
    RandomForestRegressor(random_state=1),
)

注意：reduction 将一个监督回归器应用于时间序列，即应用于它们原本未设计用于的任务。

这个问题可以通过生成捕捉时间序列动态的特征来解决。
"WindowSummarizer" 可以用于生成对时间序列预测有用的特征，
基于提供的函数字典、窗口偏移和窗口长度：

[25]:

import pandas as pd

from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import ForecastingPipeline
from sktime.transformations.series.summarize import WindowSummarizer

kwargs = {
    "lag_feature": {
        "lag": [1],
        "mean": [[1, 3], [3, 6]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

y_transformed.head(10)

		y_lag_1	y_mean_1_3	y_mean_3_6	y_std_1_4
instances	timepoints
1	2016-03-15	NaN	NaN	NaN	NaN
	2016-03-16	756.67	NaN	NaN	NaN
	2016-03-17	679.13	NaN	NaN	NaN
	2016-03-18	633.40	689.733333	NaN	NaN
	2016-03-19	1158.04	823.523333	NaN	239.617572
	2016-03-20	914.24	901.893333	NaN	241.571143
	2016-03-21	965.27	1012.516667	NaN	216.690775
	2016-03-22	630.77	836.760000	NaN	217.842052
	2016-03-23	702.79	766.276667	851.125000	161.669232
	2016-03-24	728.15	687.236667	830.141667	145.007117

符号 "mean": [[1, 3]]``（捕获在列 ``"y_mean_1_3" 中）的读法是：

摘要功能 "mean": [[1, 3]] 应用于：

长度为 3 的窗口
start (包含) 滞后一个周期

可视化：

对于 z = 目标时间戳：

window = [1, 3]，表示 lag 为 1 和 window_length 为 3，选择最近的三天（不包括 z）。

摘要是在这样的窗口中完成的：

x x x x x x x x * * * z x

默认情况下，"WindowSummarizer" 使用 pandas 的滚动窗口函数来实现特征的快速生成。* “求和”, * “均值”, * “中位数”, * “标准差”, * “方差”, * “峰度”, * “最小值”, * “最大值”, * “相关系数”, * “协方差”, * “偏度”, * “标准误差”

通常非常快，因为针对滚动、分组操作进行了优化

在M5竞赛中，可以说最相关的特征是：

均值计算以捕捉水平变化，例如上周销售额、上个月前一周的销售额等。
标准差 用于捕捉销售波动性的增加/减少，以及它如何影响未来的销售
滚动偏度 / 峰度计算，以捕捉商店销售趋势的变化。
各种不同的计算方法来捕捉零销售期（例如，缺货场景）

前三个可以通过原生的 pandas 函数获得。

WindowSummarizer 也可以提供任意的总结函数。

示例：函数 count_gt130 用于计算在长度为3的窗口内，滞后2个周期的情况下，有多少观测值高于130的阈值。

[26]:

import numpy as np


def count_gt130(x):
    """Count how many observations lie above threshold 130."""
    return np.sum((x > 700)[::-1])


kwargs = {
    "lag_feature": {
        "lag": [1],
        count_gt130: [[2, 3]],
        "std": [[1, 4]],
    }
}

transformer = WindowSummarizer(**kwargs)
y_transformed = transformer.fit_transform(y_train)

y_transformed.head(10)

		y_lag_1	y_count_gt130_2_3	y_std_1_4
instances	timepoints
1	2016-03-15	NaN	NaN	NaN
	2016-03-16	756.67	NaN	NaN
	2016-03-17	679.13	NaN	NaN
	2016-03-18	633.40	NaN	NaN
	2016-03-19	1158.04	1.0	239.617572
	2016-03-20	914.24	1.0	241.571143
	2016-03-21	965.27	2.0	216.690775
	2016-03-22	630.77	3.0	217.842052
	2016-03-23	702.79	2.0	161.669232
	2016-03-24	728.15	2.0	145.007117

上述代码将 "WindowSummarizer" 应用于预测目标 y。

要将 "WindowSummarizer" 应用于 X 中的列，请在 "ForecastingPipeline" 中使用 "WindowSummarizer" 并指定 "target_cols"。

在M5竞赛中，外生特征的滞后性对于节假日虚拟变量（通常在主要节假日前后几天销售会受到影响）以及商品价格变化（包括折扣和持续的价格变化）特别有用。

[27]:

from sktime.datasets import load_longley
from sktime.forecasting.naive import NaiveForecaster

y_ll, X_ll = load_longley()
y_train_ll, y_test_ll, X_train_ll, X_test_ll = temporal_train_test_split(y_ll, X_ll)
fh = ForecastingHorizon(X_test_ll.index, is_relative=False)
# Example transforming only X
pipe = ForecastingPipeline(
    steps=[
        ("a", WindowSummarizer(n_jobs=1, target_cols=["POP", "GNPDEFL"])),
        ("b", WindowSummarizer(n_jobs=1, target_cols=["GNP"], **kwargs)),
        ("forecaster", NaiveForecaster(strategy="drift")),
    ]
)
pipe_return = pipe.fit(y_train_ll, X_train_ll)
y_pred1 = pipe_return.predict(fh=fh, X=X_test_ll)

y_pred1

1959    67075.727273
1960    67638.454545
1961    68201.181818
1962    68763.909091
Freq: A-DEC, dtype: float64

为了提高效率，相关功能以并行方式计算。

为了最大化并行化，将 WindowSummarizer 作为 make_reduction 中的单个转换器传递。

在这种情况下，window_length 是从 WindowSummarizer 推断出来的，不需要传递给 make_reduction。

[28]:

forecaster = make_reduction(
    regressor,
    transformers=[WindowSummarizer(**kwargs, n_jobs=1)],
    window_length=None,
    strategy="recursive",
    pooling="global",
)

与日历季节性相关的概念需要通过特征工程提供给大多数模型。例如：

历史上观察到的股票价格的星期效应（周五的价格通常与周一的价格不同）。
二手车价格在春季比夏季更高
月初支出与月末支出不同，这是由于工资效应造成的。

日历的季节性可以通过虚拟变量或傅里叶项来建模。根据经验法则，对于不连续效应使用虚拟变量，而当你认为季节性存在一定程度的平滑性时，使用傅里叶/周期/季节性项。

sktime 通过 DateTimeFeatures 转换器支持日历虚拟特征。可以手动指定所需的季节性，或提供时间序列的基本频率（每日、每周等）和所需的复杂性（少量特征与多特征），DateTimeFeatures 可以推断出合理的季节性。

[29]:

transformer = DateTimeFeatures(ts_freq="D")
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
X_hat[new_cols]

[29]:

		year	month	weekday
instances	timepoints
1	2016-03-15	2016	3	1
	2016-03-16	2016	3	2
	2016-03-17	2016	3	3
	2016-03-18	2016	3	4
	2016-03-19	2016	3	5
	2016-03-20	2016	3	6
	2016-03-21	2016	3	0
	2016-03-22	2016	3	1
	2016-03-23	2016	3	2
	2016-03-24	2016	3	3
	2016-03-25	2016	3	4
	2016-03-26	2016	3	5
	2016-03-27	2016	3	6
	2016-03-28	2016	3	0
	2016-03-29	2016	3	1
	2016-03-30	2016	3	2
	2016-03-31	2016	3	3
	2016-04-01	2016	4	4
	2016-04-02	2016	4	5
	2016-04-03	2016	4	6
	2016-04-04	2016	4	0
	2016-04-05	2016	4	1
	2016-04-06	2016	4	2
	2016-04-07	2016	4	3
	2016-04-08	2016	4	4
	2016-04-09	2016	4	5
	2016-04-10	2016	4	6
	2016-04-11	2016	4	0
	2016-04-12	2016	4	1
	2016-04-13	2016	4	2
2	2016-03-15	2016	3	1
	2016-03-16	2016	3	2
	2016-03-17	2016	3	3
	2016-03-18	2016	3	4
	2016-03-19	2016	3	5
	2016-03-20	2016	3	6
	2016-03-21	2016	3	0
	2016-03-22	2016	3	1
	2016-03-23	2016	3	2
	2016-03-24	2016	3	3
	2016-03-25	2016	3	4
	2016-03-26	2016	3	5
	2016-03-27	2016	3	6
	2016-03-28	2016	3	0
	2016-03-29	2016	3	1
	2016-03-30	2016	3	2
	2016-03-31	2016	3	3
	2016-04-01	2016	4	4
	2016-04-02	2016	4	5
	2016-04-03	2016	4	6
	2016-04-04	2016	4	0
	2016-04-05	2016	4	1
	2016-04-06	2016	4	2
	2016-04-07	2016	4	3
	2016-04-08	2016	4	4
	2016-04-09	2016	4	5
	2016-04-10	2016	4	6
	2016-04-11	2016	4	0
	2016-04-12	2016	4	1
	2016-04-13	2016	4	2

DateTimeFeatures 支持以下频率：

你可以通过例如“day_of_month”、“day_of_week”、“week_of_quarter”等符号来指定手动生成虚拟特征。

[30]:

transformer = DateTimeFeatures(manual_selection=["week_of_month", "day_of_quarter"])
X_hat = transformer.fit_transform(X_train)

new_cols = [i for i in X_hat if i not in X_train.columns]
X_hat[new_cols]

		day_of_quarter	week_of_month
instances	timepoints
1	2016-03-15	75	3
	2016-03-16	76	3
	2016-03-17	77	3
	2016-03-18	78	3
	2016-03-19	79	3
	2016-03-20	80	3
	2016-03-21	81	3
	2016-03-22	82	4
	2016-03-23	83	4
	2016-03-24	84	4
	2016-03-25	85	4
	2016-03-26	86	4
	2016-03-27	87	4
	2016-03-28	88	4
	2016-03-29	89	5
	2016-03-30	90	5
	2016-03-31	91	5
	2016-04-01	1	1
	2016-04-02	2	1
	2016-04-03	3	1
	2016-04-04	4	1
	2016-04-05	5	1
	2016-04-06	6	1
	2016-04-07	7	1
	2016-04-08	8	2
	2016-04-09	9	2
	2016-04-10	10	2
	2016-04-11	11	2
	2016-04-12	12	2
	2016-04-13	13	2
2	2016-03-15	75	3
	2016-03-16	76	3
	2016-03-17	77	3
	2016-03-18	78	3
	2016-03-19	79	3
	2016-03-20	80	3
	2016-03-21	81	3
	2016-03-22	82	4
	2016-03-23	83	4
	2016-03-24	84	4
	2016-03-25	85	4
	2016-03-26	86	4
	2016-03-27	87	4
	2016-03-28	88	4
	2016-03-29	89	5
	2016-03-30	90	5
	2016-03-31	91	5
	2016-04-01	1	1
	2016-04-02	2	1
	2016-04-03	3	1
	2016-04-04	4	1
	2016-04-05	5	1
	2016-04-06	6	1
	2016-04-07	7	1
	2016-04-08	8	2
	2016-04-09	9	2
	2016-04-10	10	2
	2016-04-11	11	2
	2016-04-12	12	2
	2016-04-13	13	2

使用 "WindowSummarizer"、"DateTimeFeatures" 和 "make_reduction" 函数，我们现在可以基于 M5 竞赛数据的一个样本设置一个端到端全球预测管道的示例：

[31]:

pipe = ForecastingPipeline(
    steps=[
        (
            "event_dynamics",
            WindowSummarizer(
                n_jobs=-1, **kwargs, target_cols=["event_type_1", "event_type_2"]
            ),
        ),
        ("snap_dynamics", WindowSummarizer(n_jobs=-1, target_cols=["snap"])),
        ("daily_season", DateTimeFeatures(ts_freq="D")),
        ("forecaster", forecaster),
    ]
)

pipe_return = pipe.fit(y_train, X_train)
y_pred1 = pipe_return.predict(fh=1, X=X_test)

y_pred1

		y
instances	timepoints
1	2016-03-15	756.67
2	2016-03-15	1901.15

构建你自己的层次预测器#

入门：

遵循 “实现估计器” 开发者指南
使用高级预测扩展模板

扩展模板 = 带有待办块的 Python “填空” 模板，允许您实现自己的、与 sktime 兼容的预测算法。

使用 check_estimator 检查估计器

对于分层预测：

确保为面板和层次数据选择支持的m类型
推荐：y_inner_mtype = ["pd.DataFrame", "pd-multiindex", "pd_multiindex_hier"]，X_inner_mtype 同理。
- 这确保了在 _fit 和 _predict 中看到的输入 y 和 X 是 pd.DataFrame ，具有1、2、3或更多行级别。
如果存在高效的实现，您可以对行进行向量化处理。
- 但：自动矢量化已经循环遍历行索引集，如果“分层”是指这个，就不要实现它。
- 为了确保自动矢量化，请勿在 y_inner_mtype 和 X_inner_mtype 中包含分层或面板的 mtype。
仔细思考你的估计器是一个预测器，还是可以分解为一个转换器
- “执行X，然后应用sktime中已有的预测器”是一个强烈的提示，表明你实际上想要实现一个转换器

使用深度学习模型和变换器的全球预测#

全球预测是关于什么的？

许多深度学习模型将会
- 在包含来自多个实例的许多时间序列的大型数据集上进行训练
- 预测训练数据之外的其他实例的时间序列。
在多个序列上进行训练并在训练数据之外进行预测的能力被称为全局预测。
全球预报是如何制定的？
- 在包含 \(n\) 个时间序列 \(\{x_{i_1}, x_{i_2}, x_{i_3}, ..., x_{i_n}\}\) 的数据集上进行训练
- 在另一组时间序列上进行预测 \(\{x_{j_1}, x_{j_2}, x_{j_3}, ..., x_{j_m}\}\)
- \(m\) 不需要等于 \(n\)
- 集合 \(\{i_1,.., i_n\}\) 和 \(\{j_1, ... j_m\}\) 之间可能存在也可能不存在交集。

要检查 sktime 中某个预测器的全局预测能力，可以查看 capability:global_forecasting 标签。

[32]:

from sktime.forecasting.pytorchforecasting import PytorchForecastingDeepAR

PytorchForecastingDeepAR().get_tag(
    "capability:global_forecasting"
)  # should return True

[32]:

True

你可以传递 \(n\) 系列的 y 给 fit，然后在训练数据之外的 \(m\) 系列上进行预测。

首先我们生成90个序列用于训练，10个序列用于预测。

[33]:

from sklearn.model_selection import train_test_split

from sktime.utils._testing.hierarchical import _make_hierarchical

data = _make_hierarchical(
    hierarchy_levels=(100, 1), max_timepoints=10, min_timepoints=10, n_columns=1
)
data = data.droplevel(1)
y_train, y_test = train_test_split(data, test_size=0.1, train_size=0.9, shuffle=False)

y_train 是我们想要拟合的序列，其中包含时间序列实例的前90%。

[34]:

y_train

[34]:

		c0
h0	time
h0_0	2000-01-01	2.800129
	2000-01-02	5.385355
	2000-01-03	3.938048
	2000-01-04	4.838710
	2000-01-05	3.744334
...	...	...
h0_89	2000-01-06	5.662998
	2000-01-07	3.283564
	2000-01-08	3.652488
	2000-01-09	6.173749
	2000-01-10	5.208331

900 rows × 1 columns

y_test 是我们想要预测的序列，其中包含时间序列实例的最后10%。

[35]:

y_test

[35]:

		c0
h0	time
h0_90	2000-01-01	3.761360
	2000-01-02	3.931633
	2000-01-03	4.486252
	2000-01-04	3.225777
	2000-01-05	4.567175
...	...	...
h0_99	2000-01-06	3.954103
	2000-01-07	3.168133
	2000-01-08	4.330946
	2000-01-09	4.586232
	2000-01-10	4.600275

100 rows × 1 columns

你可以看到 y_train 和 y_test 在层次级别 h0 上是不同的，这表明它们来自不同的实例。

我们可以初始化一个全局预测器来拟合 y_train 并在 y_test 上进行预测。

[36]:

from sktime.forecasting.pytorchforecasting import PytorchForecastingNBeats

model = PytorchForecastingNBeats(
    trainer_params={
        "max_epochs": 5,  # for quick test
        "limit_train_batches": 50,  # for quick test
    },
    dataset_params={
        "max_encoder_length": 3,
    },
)

[37]:

from sktime.forecasting.base import ForecastingHorizon

fh = ForecastingHorizon([1, 2, 3], is_relative=True)
model.fit(y=y_train, fh=fh)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You are using a CUDA device ('NVIDIA GeForce RTX 3060') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
2024-08-01 16:35:52.693686: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-01 16:35:52.719912: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-01 16:35:53.197118: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name            | Type       | Params
-----------------------------------------------
0 | loss            | MASE       | 0
1 | logging_metrics | ModuleList | 0
2 | net_blocks      | ModuleList | 1.6 M
-----------------------------------------------
1.6 M     Trainable params
0         Non-trainable params
1.6 M     Total params
6.375     Total estimated model params size (MB)

`Trainer.fit` stopped: `max_epochs=5` reached.

[37]:

PytorchForecastingNBeats(dataset_params={'max_encoder_length': 3},
                         trainer_params={'limit_train_batches': 50,
                                         'max_epochs': 5})

Please rerun this cell to show the HTML repr or trust the notebook.

然后我们可以在 y_test 上进行预测，这个 y_test 不包含在 y_train 中。

[38]:

y_pred = model.predict(y=y_test, fh=fh)
y_pred

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

[38]:

		c0
h0	time
h0_90	2000-01-11	3.747893
	2000-01-12	3.716210
	2000-01-13	3.680586
h0_91	2000-01-11	4.273325
	2000-01-12	4.335873
	2000-01-13	4.329363
h0_92	2000-01-11	3.879993
	2000-01-12	3.900126
	2000-01-13	3.889120
h0_93	2000-01-11	4.310719
	2000-01-12	4.048112
	2000-01-13	3.862309
h0_94	2000-01-11	3.713766
	2000-01-12	3.528116
	2000-01-13	3.399841
h0_95	2000-01-11	4.167168
	2000-01-12	4.198434
	2000-01-13	4.184916
h0_96	2000-01-11	3.765751
	2000-01-12	3.565860
	2000-01-13	3.430732
h0_97	2000-01-11	3.898326
	2000-01-12	3.843762
	2000-01-13	3.793458
h0_98	2000-01-11	3.939527
	2000-01-12	3.951651
	2000-01-13	3.939065
h0_99	2000-01-11	3.916081
	2000-01-12	3.825488
	2000-01-13	3.755118

y_pred 具有与 y_test 相同的实例索引，即从 h0_90 到 h0_99。

我们可以绘制一个序列来查看结果。由于我们使用的是随机数据，并且只训练模型几个周期，因此我们不能期望太多。

[39]:

from sktime.utils.plotting import plot_series

plot_series(
    y_test.loc[("h0_99")],
    y_pred.loc[("h0_99")],
    labels=["y", "y_pred"],
)

[39]:

(<Figure size 1600x400 with 1 Axes>, <Axes: ylabel='c0'>)

../_images/examples_01c_forecasting_hierarchical_global_97_1.png

当我们使用外生数据进行预测时，我们需要将 X 和 y 传递给 predict。

X 必须包含所有历史值和要预测的时间点，而 y 应仅包含历史值，但不包含要预测的时间点。

[40]:

from sklearn.model_selection import train_test_split

from sktime.utils._testing.hierarchical import _make_hierarchical

data = _make_hierarchical(
    hierarchy_levels=(100, 1), max_timepoints=10, min_timepoints=10, n_columns=2
)
data = data.droplevel(1)
x = data["c0"].to_frame()
y = data["c1"].to_frame()
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, train_size=0.9, shuffle=False
)
y_test = y_test.groupby(level=0).apply(lambda x: x.droplevel(0).iloc[:-3])

X_train 和 y_train 具有相同的时间索引，从2000-01-01到2000-01-10。

然而 y_test 比 X_test 更短。

X_test 的时间索引从2000-01-01到2000-01-10，但 y_test 的时间索引仅从2000-01-01到2000-01-07。

这是因为我们不知道 y_test 在 2000-01-08 到 2000-01-10 之间的值，这些值将被预测。

[41]:

y_test

[41]:

		c1
h0	time
h0_90	2000-01-01	4.697726
	2000-01-02	4.376547
	2000-01-03	4.419372
	2000-01-04	6.045622
	2000-01-05	4.986501
...	...	...
h0_99	2000-01-03	3.633788
	2000-01-04	5.149747
	2000-01-05	5.850804
	2000-01-06	3.724629
	2000-01-07	4.151905

70 rows × 1 columns

y_test 比 X_test 短，因为 y_test 只包含历史值，而不包含要预测的时间点。

让我们初始化一个可以处理外生数据的全局预测器。

[42]:

from sktime.forecasting.pytorchforecasting import PytorchForecastingTFT

[43]:

model = PytorchForecastingTFT(
    trainer_params={
        "max_epochs": 5,  # for quick test
        "limit_train_batches": 10,  # for quick test
    },
    dataset_params={
        "max_encoder_length": 3,
    },
)

[44]:

model.fit(y=y_train, X=X_train, fh=fh)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

   | Name                               | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                               | QuantileLoss                    | 0
1  | logging_metrics                    | ModuleList                      | 0
2  | input_embeddings                   | MultiEmbedding                  | 0
3  | prescalers                         | ModuleDict                      | 32
4  | static_variable_selection          | VariableSelectionNetwork        | 0
5  | encoder_variable_selection         | VariableSelectionNetwork        | 1.2 K
6  | decoder_variable_selection         | VariableSelectionNetwork        | 528
7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
11 | lstm_encoder                       | LSTM                            | 2.2 K
12 | lstm_decoder                       | LSTM                            | 2.2 K
13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm                | GateAddNorm                     | 576
18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
19 | pre_output_gate_norm               | GateAddNorm                     | 576
20 | output_layer                       | Linear                          | 119
----------------------------------------------------------------------------------------
15.5 K    Trainable params
0         Non-trainable params
15.5 K    Total params
0.062     Total estimated model params size (MB)

`Trainer.fit` stopped: `max_epochs=5` reached.

[44]:

PytorchForecastingTFT(dataset_params={'max_encoder_length': 3},
                      trainer_params={'limit_train_batches': 10,
                                      'max_epochs': 5})