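ClassifierChain

class sklearn.multioutput.ClassifierChain(base_estimator, *, order=None, cv=None, chain_method='predict', random_state=None, verbose=False)

A multi-label model that arranges binary classifiers into a chain.

Each model makes its prediction in the order specified by the chain, using all of the available features provided to the model plus the predictions of the models that come earlier in the chain.

For an example of how to use ClassifierChain and benefit from its ensemble, see the example of ClassifierChain on a yeast dataset.

Read more in the User Guide.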

Added in version 0.19.

Parameters:

base_estimator : estimator

The base estimator from which the classifier chain is built.

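order : array-like of shape (n_outputs,) or 'random', default=None

If None, the order is determined by the order of the columns in the label matrix Y:

order = [0, 1, 2, ..., Y.shape[1] - 1]

The order of the chain can be set explicitly by providing a list of integers. For example, for a chain of length 5:

order = [1, 3, 2, 4, 0]

means that the first model in the chain will make predictions for column 1 of the Y matrix, the second model will make predictions for column 3, and so on.

If order is 'random', a random ordering will be used.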
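As a minimal sketch of the explicit ordering described above, the following fits a chain on synthetic data and inspects the fitted order_ attribute. The dataset size and the choice of LogisticRegression are illustrative assumptions, not part of the original documentation.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Synthetic multilabel data: Y has one column per label (5 columns here).
X, Y = make_multilabel_classification(n_samples=100, n_classes=5, random_state=0)

# The first link predicts column 1 of Y, the second column 3, and so on.
explicit_order = [1, 3, 2, 4, 0]
chain = ClassifierChain(LogisticRegression(max_iter=1000), order=explicit_order)
chain.fit(X, Y)

# order_ records which column of Y each link in the chain predicts.
fitted_order = list(chain.order_)

# Predictions come back in the original column order of Y, not in chain order.
Y_pred = chain.predict(X)
```

The chain order only changes which earlier predictions each link can see; the columns of the returned array still line up with the columns of Y.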

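cv : int, cross-validation generator or an iterable, default=None

Determines whether to use cross-validated predictions or true labels for the results of previous estimators in the chain. Possible inputs for cv are:

None, to use true labels when fitting,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter,
an iterable yielding (train, test) splits as arrays of indices.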
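The two most common cv settings can be compared with a short sketch. The data and the base estimator below are assumptions chosen for illustration.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)
base = LogisticRegression(max_iter=1000)

# cv=None: later links are trained on the true labels of earlier links.
chain_true = ClassifierChain(base, cv=None).fit(X, Y)

# cv=5: later links are trained on 5-fold cross-validated predictions
# of the earlier links instead of the true labels.
chain_cv = ClassifierChain(base, cv=5).fit(X, Y)

# Either way, link k is fitted on the original features plus k extra columns.
n_inputs_per_link = [est.coef_.shape[1] for est in chain_cv.estimators_]
```

Training on cross-validated predictions makes the features seen at fit time resemble those seen at predict time, at the cost of extra fits.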

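chain_method : {'predict', 'predict_proba', 'predict_log_proba', 'decision_function'} or list of such str's, default='predict'

Prediction method that estimators in the chain use to produce the prediction features passed on from previous estimators.

if str, the name of the method;
if a list of str, the method names in order of preference. The method used is the first one in the list that base_estimator implements.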

Added in version 1.5.

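random_state : int, RandomState instance or None, default=None

If order='random', determines the random number generation for the chain order. In addition, it controls the random seed given to each base_estimator at each chaining iteration, so it is only used when base_estimator exposes a random_state. Pass an int for reproducible output across multiple function calls. See Glossary.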
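To illustrate the reproducibility point, the sketch below fits two chains with order='random' and the same integer seed. The seed value and dataset are arbitrary assumptions.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=100, n_classes=4, random_state=0)
base = LogisticRegression(max_iter=1000)

# Two chains with a random order but the same integer seed.
chain_a = ClassifierChain(base, order="random", random_state=42).fit(X, Y)
chain_b = ClassifierChain(base, order="random", random_state=42).fit(X, Y)

# The sampled order is a permutation of the label columns, and it is
# identical across the two fits because the seed is the same.
order_a = list(chain_a.order_)
order_b = list(chain_b.order_)
```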

verbose : bool, default=False

If True, chain progress is output as each model is completed.

Added in version 1.2.

Attributes:

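classes_ : list

A list of arrays of length len(estimators_) containing the class labels for each estimator in the chain.

estimators_ : list

A list of clones of base_estimator.

order_ : list

The order of labels in the classifier chain.

chain_method_ : str

Prediction method used by estimators in the chain for the prediction features.

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying base_estimator exposes such an attribute when fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.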

Added in version 1.0.

See also

RegressorChain: Equivalent for regression.
MultiOutputClassifier: Classifies each output independently rather than chaining.

References

Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank, “Classifier Chains for Multi-label Classification”, 2009.

Examples

>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.multioutput import ClassifierChain
>>> X, Y = make_multilabel_classification(
...    n_samples=12, n_classes=3, random_state=0
... )
>>> X_train, X_test, Y_train, Y_test = train_test_split(
...    X, Y, random_state=0
... )
>>> base_lr = LogisticRegression(solver='lbfgs', random_state=0)
>>> chain = ClassifierChain(base_lr, order='random', random_state=0)
>>> chain.fit(X_train, Y_train).predict(X_test)
array([[1., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.]])
>>> chain.predict_proba(X_test)
array([[0.8387..., 0.9431..., 0.4576...],
       [0.8878..., 0.3684..., 0.2640...],
       [0.0321..., 0.9935..., 0.0626...]])

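decision_function(X)

Evaluate the decision_function of the models in the chain.

Parameters:

X : array-like of shape (n_samples, n_features)

The input data.

Returns:

Y_decision : array-like of shape (n_samples, n_classes)

Returns the decision function of the sample for each model in the chain.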
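A small sketch of how the returned scores relate to the hard predictions, assuming a LogisticRegression base estimator and the default chain_method='predict'. For that estimator, a positive score corresponds to predicting the label as present.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=100, n_classes=3, random_state=0)
chain = ClassifierChain(LogisticRegression(max_iter=1000)).fit(X, Y)

# One column of scores per label, in the original column order of Y.
scores = chain.decision_function(X)

# With logistic regression links, the sign of the score matches predict.
agree = np.array_equal(scores > 0, chain.predict(X).astype(bool))
```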

fit(X, Y, **fit_params)

Fit the model to data matrix X and targets Y.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Y : array-like of shape (n_samples, n_classes)

The target values.

**fit_params : dict of string -> object

Parameters passed to the fit method of each step.

Only available if enable_metadata_routing=True. See the User Guide.

Added in version 1.3.

Returns:

self : object

Class instance.

get_metadata_routing()

Get the metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Added in version 1.3.

Returns:

routing : MetadataRouter

A MetadataRouter encapsulating routing information.

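get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.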
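The effect of deep can be seen in a short sketch; LogisticRegression is an assumed base estimator chosen for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

chain = ClassifierChain(LogisticRegression(C=2.0), order="random")

# deep=True also exposes the base estimator's parameters, prefixed
# with "base_estimator__".
deep_params = chain.get_params(deep=True)

# deep=False lists only the chain's own constructor parameters.
shallow_params = chain.get_params(deep=False)
```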

predict(X)

Predict on the data matrix X using the ClassifierChain model.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Returns:

Y_pred : array-like of shape (n_samples, n_classes)

The predicted values.

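predict_log_proba(X)

Predict the logarithm of the probability estimates.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Returns:

Y_log_prob : array-like of shape (n_samples, n_classes)

The logarithm of the predicted probabilities.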

predict_proba(X)

Predict probability estimates.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data.

Returns:

Y_prob : array-like of shape (n_samples, n_classes)

The predicted probabilities.

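score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be predicted correctly.

Parameters:

X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) with respect to y.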
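Subset accuracy can be reproduced by hand, which makes its strictness concrete. The sketch below assumes synthetic data and a LogisticRegression base estimator.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

chain = ClassifierChain(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
Y_pred = chain.predict(X_test)

# A sample counts as correct only if every one of its labels is correct.
subset_accuracy = np.mean(np.all(Y_pred == Y_test, axis=1))

# score reports the same quantity, as does accuracy_score on 2D labels.
score = chain.score(X_test, Y_test)
reference = accuracy_score(Y_test, Y_pred)
```

Because a single wrong label makes the whole sample count as an error, this score is usually lower than any per-label accuracy.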

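set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that each component of a nested object can be updated.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.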
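The <component>__<parameter> form applies to the chain's base estimator, as this sketch shows. LogisticRegression and the values used are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

chain = ClassifierChain(LogisticRegression(C=1.0))

# Update a parameter of the chain itself and, through the double
# underscore syntax, a parameter of the nested base estimator.
returned = chain.set_params(order="random", base_estimator__C=0.1)
```

Since set_params returns the estimator itself, calls can be chained, for example directly followed by fit.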

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ClassifierChain

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config ). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True : metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False : metadata is not requested and the meta-estimator will not pass it to score .
None : metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str : metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default ( sklearn.utils.metadata_routing.UNCHANGED ) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline . Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:

self : object

The updated object.

Gallery examples

Multilabel classification using a classifier chain