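StackingRegressor

class sklearn.ensemble.StackingRegressor(estimators, final_estimator=None, *, cv=None, n_jobs=None, passthrough=False, verbose=0)

Stack of estimators with a final regressor.

Stacked generalization consists in stacking the output of individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by feeding its output as input to a final estimator.

Note that estimators_ are fitted on the full X, while final_estimator_ is trained on cross-validated predictions of the base estimators, obtained with cross_val_predict.

Read more in the User Guide.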

Added in version 0.22.
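
The two-stage training described above can be reproduced by hand. The following is a minimal sketch, not the library's actual implementation: the estimator names ("ridge", "tree"), the choice of base models and the helper stacked_predict are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Illustrative base estimators (names and models are assumptions).
base_estimators = [
    ("ridge", RidgeCV()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]

# Step 1: out-of-fold predictions of each base estimator (5 folds, as with cv=None).
oof_predictions = np.column_stack(
    [cross_val_predict(clone(est), X, y, cv=5) for _, est in base_estimators]
)

# Step 2: the final regressor is trained on those out-of-fold predictions.
final_estimator = RidgeCV().fit(oof_predictions, y)

# Step 3: the base estimators used at predict time are refit on the full X.
fitted_estimators = [clone(est).fit(X, y) for _, est in base_estimators]


def stacked_predict(X_new):
    """Hypothetical helper: stack base predictions, then apply the final regressor."""
    meta_features = np.column_stack(
        [est.predict(X_new) for est in fitted_estimators]
    )
    return final_estimator.predict(meta_features)


manual_pred = stacked_predict(X[:5])
```

Training the final regressor on out-of-fold predictions keeps it from seeing base-model outputs for samples those models were fitted on.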

Parameters:

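estimators : list of (str, estimator)

Base estimators which will be stacked together. Each element of the list is defined as a tuple of a string (i.e. name) and an estimator instance. An estimator can be set to 'drop' using set_params.

final_estimator : estimator, default=None

A regressor which will be used to combine the base estimators. The default regressor is a RidgeCV.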

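cv : int, cross-validation generator, iterable, or "prefit", default=None

Determines the cross-validation splitting strategy used in cross_val_predict to train final_estimator. Possible inputs for cv are:

None, to use the default 5-fold cross-validation,
integer, to specify the number of folds in a (Stratified) KFold,
an object to be used as a cross-validation generator,
an iterable yielding train, test splits,
"prefit", to assume the estimators are already fitted and skip cross-validation.

For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False, so the splits will be the same across calls.

Refer to the User Guide for the various cross-validation strategies that can be used here.

If "prefit" is passed, it is assumed that all estimators have been fitted already. The final_estimator_ is trained on the estimators' predictions on the full training set, which are not cross-validated predictions. Note that if the models were trained on the same data that is used to train the stacking model, there is a very high risk of overfitting.

Added in version 1.1: The "prefit" option was added in 1.1.

Note

A larger number of splits will provide no benefit if the number of training samples is large enough; it will only increase the training time. cv is not used for model evaluation but for prediction.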

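n_jobs : int, default=None

The number of jobs to run in parallel for the fit of all estimators. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the Glossary for more details.

passthrough : bool, default=False

When False, only the predictions of the estimators are used as training data for final_estimator. When True, the final_estimator is trained on the predictions as well as the original training data.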
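
The effect of passthrough can be seen in the width of the matrix handed to the final estimator. The sketch below compares both settings; the base estimators and variable names are illustrative assumptions.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
estimators = [
    ("ridge", RidgeCV()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]

plain = StackingRegressor(estimators=estimators, passthrough=False).fit(X, y)
with_features = StackingRegressor(estimators=estimators, passthrough=True).fit(X, y)

# transform() returns the inputs that the final estimator receives.
n_plain = plain.transform(X).shape[1]  # one column per base estimator
n_with_features = with_features.transform(X).shape[1]  # predictions + original features
```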

verbose : int, default=0

Verbosity level.

Attributes:

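estimators_ : list of estimators

The elements of the estimators parameter, having been fitted on the training data. If an estimator has been set to 'drop', it will not appear in estimators_. When cv="prefit", estimators_ is set to estimators and is not fitted again.

named_estimators_ : Bunch

Attribute to access any fitted sub-estimator by name.

n_features_in_ : int

Number of features seen during fit.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimators expose such an attribute when fit.

Added in version 1.0.

final_estimator_ : estimator

The fitted regressor that stacks the base estimators.

stack_method_ : list of str

The method used by each base estimator.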
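
A short sketch of how these fitted attributes can be inspected after calling fit. The estimator names "ridge" and "tree" are assumptions chosen for the example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
estimators = [
    ("ridge", RidgeCV()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]
reg = StackingRegressor(estimators=estimators).fit(X, y)

# Look up a fitted base estimator by the name given in `estimators`.
fitted_tree = reg.named_estimators_["tree"]

# For regressors, every base estimator is stacked through its predict method.
methods = reg.stack_method_

n_features = reg.n_features_in_
final_name = type(reg.final_estimator_).__name__  # default final estimator
```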

See also

StackingClassifier: Stack of estimators with a final classifier.

References

[1]

Wolpert, David H. “Stacked generalization.” Neural networks 5.2 (1992): 241-259.

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

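fit(X, y, *, sample_weight=None, **fit_params)

Fit the estimators.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target values.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if all underlying estimators support sample weights.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.

Returns:

self : object

Returns a fitted instance.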

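fit_transform(X, y, *, sample_weight=None, **fit_params)

Fit the estimators and return the predictions for X for each estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)

Target values.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if all underlying estimators support sample weights.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.6: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.

Returns:

y_preds : ndarray of shape (n_samples, n_estimators)

Prediction outputs for each estimator.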

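get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features. The input feature names are only used when passthrough is True.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then names are generated: [x0, x1, ..., x(n_features_in_ - 1)].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

If passthrough is False, then only the names of estimators are used to generate the output feature names.

Returns:

feature_names_out : ndarray of str objects

Transformed feature names.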

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.6.

Returns:

routing : MetadataRouter

A MetadataRouter encapsulating routing information.

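get_params(deep=True)

Get the parameters of an estimator from the ensemble.

Returns the parameters given in the constructor as well as the estimators contained within the estimators parameter.

Parameters:

deep : bool, default=True

Setting it to True gets the various estimators and the parameters of the estimators as well.

Returns:

params : dict

Parameter and estimator names mapped to their values, or parameter names mapped to their values.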
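
A brief sketch of the difference between deep=False and deep=True. The estimator names are assumptions; nested keys follow the usual `<name>__<parameter>` pattern.

```python
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

estimators = [
    ("ridge", RidgeCV()),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]
reg = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())

# Constructor arguments only.
shallow = reg.get_params(deep=False)

# Also exposes each named estimator and its own parameters.
deep = reg.get_params(deep=True)
tree_depth = deep["tree__max_depth"]
```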

property n_features_in_

Number of features seen during fit.

property named_estimators

Dictionary to access any fitted sub-estimators by name.

Returns:

Bunch

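predict(X, **predict_params)

Predict target for X.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input vectors, where n_samples is the number of samples and n_features is the number of features.

**predict_params : dict of str -> obj

Parameters to the predict method called by the final_estimator. Note that this may be used to return uncertainties from some estimators with return_std or return_cov. Be aware that it will only account for uncertainty in the final estimator.

If enable_metadata_routing=False (default): parameters are passed directly to the predict method of the final_estimator.
If enable_metadata_routing=True: parameters are safely routed to the predict method of the final_estimator. See the Metadata Routing User Guide for more details.

Changed in version 1.6: **predict_params can be routed via the metadata routing API.

Returns:

y_pred : ndarray of shape (n_samples,) or (n_samples, n_output)

Predicted targets.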
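
As an example of forwarding predict parameters, the sketch below uses BayesianRidge as the final estimator, since its predict method accepts return_std. The choice of models is an assumption, and metadata routing is left at its default (disabled), so the keyword goes straight to the final estimator.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import BayesianRidge, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ],
    final_estimator=BayesianRidge(),
).fit(X_train, y_train)

# return_std is passed on to BayesianRidge.predict; the standard deviation
# reflects the final estimator only, not the base estimators.
y_mean, y_std = reg.predict(X_test, return_std=True)
```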

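score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination $R^2$ is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:

score : float

$R^2$ of self.predict(X) with respect to y.

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).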
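
The definition of $R^2$ above can be checked numerically against score. This is a small sketch for the single-output, unweighted case; the models and the train/test split are assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
).fit(X_train, y_train)

y_pred = reg.predict(X_test)
u = ((y_test - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

r2_from_score = reg.score(X_test, y_test)
```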

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StackingRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:

self : object

The updated object.

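set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance

Estimator instance.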

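set_params(**params)

Set the parameters of an estimator from the ensemble.

Valid parameter keys can be listed with get_params(). Note that you can directly set the parameters of the estimators contained in estimators.

Parameters:

**params : keyword arguments

Specific parameters using e.g. set_params(parameter_name=new_value). In addition to setting the parameters of the estimator, the individual estimators contained in estimators can also be set, or removed by setting them to 'drop'.

Returns:

self : object

Estimator instance.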
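
The following sketch sets a nested parameter of one base estimator and drops another. The estimator names "ridge" and "tree" are assumptions made for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)

# Nested parameter of a base estimator: <name>__<parameter>.
reg.set_params(tree__max_depth=5)

# Remove a base estimator from the stack by name.
reg.set_params(ridge="drop")

reg.fit(X, y)
n_fitted = len(reg.estimators_)  # dropped estimators are not fitted
```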

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → StackingRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:

self : object

The updated object.

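transform(X)

Return the predictions for X for each estimator.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input vectors, where n_samples is the number of samples and n_features is the number of features.

Returns:

y_preds : ndarray of shape (n_samples, n_estimators)

Prediction outputs for each estimator.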
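
With the default passthrough=False, the output of transform should match the column-wise stack of each fitted base estimator's predictions. The sketch below checks this; the models and split are assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = StackingRegressor(
    estimators=[
        ("ridge", RidgeCV()),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
).fit(X_train, y_train)

# One column of predictions per fitted base estimator.
meta_features = reg.transform(X_test)

# The same matrix, built by hand from estimators_.
expected = np.column_stack([est.predict(X_test) for est in reg.estimators_])
matches = np.allclose(meta_features, expected)
```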

Gallery examples

Combine predictors using stacking