ComplementNB#

class sklearn.naive_bayes.ComplementNB(*, alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None, norm=False)#

The Complement Naive Bayes classifier described in Rennie et al. (2003).

The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets.

See also

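BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
CategoricalNB: Naive Bayes classifier for categorical features.
GaussianNB: Gaussian Naive Bayes.
MultinomialNB: Naive Bayes classifier for multinomial models.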

References

Rennie, J. D., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of naive bayes text classifiers. In ICML (Vol. 3, pp. 616-623). https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf.fit(X, y)
ComplementNB()
>>> print(clf.predict(X[2:3]))
[3]

fit(X, y, sample_weight=None)#

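Fit the Naive Bayes classifier according to X, y.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)): Training vectors, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,)): Target values.
sample_weight (array-like of shape (n_samples,), default=None): Weights applied to individual samples (1. for unweighted).

Returns:

self (object): Returns the instance itself.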
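
As a minimal sketch of fitting with per-sample weights, the snippet below uses a made-up count matrix and made-up weights; only fit and its sample_weight parameter come from the description above.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical toy data: 6 samples, 100 count features, 3 classes.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(6, 100))
y = np.array([0, 0, 1, 1, 2, 2])

# Illustrative weights: the last two samples count twice as much.
weights = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0])

clf = ComplementNB()
fitted = clf.fit(X, y, sample_weight=weights)

# fit returns the estimator itself, so calls can be chained.
print(fitted is clf)
print(clf.classes_)
```

Omitting sample_weight treats every sample as having weight 1.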

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)#

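Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.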
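
One common use of the returned dict, sketched here with arbitrary parameter values, is to rebuild an unfitted estimator with the same configuration:

```python
from sklearn.naive_bayes import ComplementNB

# Illustrative, arbitrary settings.
clf = ComplementNB(alpha=0.5, norm=True)

params = clf.get_params()
print(params["alpha"], params["norm"])

# The dict maps constructor argument names to values, so it can be
# unpacked to build a new, unfitted estimator with the same settings.
rebuilt = ComplementNB(**params)
print(rebuilt.get_params() == params)
```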

partial_fit(X, y, classes=None, sample_weight=None)#

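Incremental fit on a batch of samples.

This method is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning.

This is especially useful when the whole dataset is too big to fit in memory at once.

This method has some performance overhead, so it is better to call partial_fit on chunks of data that are as large as possible (as long as they fit in the memory budget) to hide the overhead.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features)): Training vectors, where n_samples is the number of samples and n_features is the number of features.
y (array-like of shape (n_samples,)): Target values.
classes (array-like of shape (n_classes,), default=None): List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
sample_weight (array-like of shape (n_samples,), default=None): Weights applied to individual samples (1. for unweighted).

Returns:

self (object): Returns the instance itself.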
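
The chunked usage described above might look like the following sketch. The data, the chunk size of 20, and the variable names are assumptions made for illustration; a real out-of-core loop would read each chunk from disk instead of slicing an in-memory array.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical data standing in for a dataset read chunk by chunk.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(60, 20))
y = rng.randint(3, size=60)
all_classes = np.array([0, 1, 2])

clf = ComplementNB()
chunk_size = 20  # illustrative; larger chunks amortize the overhead
for start in range(0, X.shape[0], chunk_size):
    X_chunk = X[start:start + chunk_size]
    y_chunk = y[start:start + chunk_size]
    # classes is required on the first call only.
    classes = all_classes if start == 0 else None
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

# Total (weighted) number of samples seen across all chunks.
print(clf.class_count_.sum())
```

Passing classes up front matters because an early chunk may not contain every label.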

predict(X)#

Perform classification on an array of test vectors X.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

C (ndarray of shape (n_samples,)): Predicted target values for X.

predict_joint_log_proba(X)#

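Return joint log probability estimates for the test vector X.

For each row x of X and class y, the joint log probability is given by log P(x, y) = log P(y) + log P(x|y),

where log P(y) is the class prior probability and log P(x|y) is the class-conditional probability.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

C (ndarray of shape (n_samples, n_classes)): Returns the joint log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.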

predict_log_proba(X)#

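Return log-probability estimates for the test vector X.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

C (array-like of shape (n_samples, n_classes)): Returns the log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.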

predict_proba(X)#

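Return probability estimates for the test vector X.

Parameters:

X (array-like of shape (n_samples, n_features)): The input samples.

Returns:

C (array-like of shape (n_samples, n_classes)): Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.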
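
A short sketch, again on the toy data from the Examples section, of how the probability and log-probability outputs relate:

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
clf = ComplementNB().fit(X, y)

proba = clf.predict_proba(X[:2])
log_proba = clf.predict_log_proba(X[:2])

# One row per sample, one column per entry of clf.classes_.
print(proba.shape)

# Each row is a distribution over classes, and the two methods
# agree up to exponentiation.
print(np.allclose(proba.sum(axis=1), 1.0))
print(np.allclose(np.exp(log_proba), proba))
```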

score(X, y, sample_weight=None)#

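Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires each sample's entire label set to be correctly predicted.

Parameters:

X (array-like of shape (n_samples, n_features)): Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)): True labels for X.
sample_weight (array-like of shape (n_samples,), default=None): Sample weights.

Returns:

score (float): Mean accuracy of self.predict(X) with respect to y.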
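
To make the definition concrete, this sketch compares score with sklearn.metrics.accuracy_score applied to the predictions. It scores on the training data only to keep the example short; a real evaluation would use held-out data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import ComplementNB

rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
clf = ComplementNB().fit(X, y)

mean_accuracy = clf.score(X, y)

# score is the accuracy of self.predict(X) with respect to y.
same_value = accuracy_score(y, clf.predict(X))
print(np.isclose(mean_accuracy, same_value))
```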

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in fit.

Returns:

self (object): The updated object.

set_params(**params)#

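Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.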
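
Both forms can be sketched as follows. The step name "nb" and the parameter values are arbitrary choices for illustration.

```python
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline

# Simple estimator: parameters are set by their own names.
clf = ComplementNB()
clf.set_params(alpha=0.5, norm=True)
print(clf.alpha, clf.norm)

# Nested object: <component>__<parameter> reaches into a pipeline step.
pipe = Pipeline([("nb", ComplementNB())])
pipe.set_params(nb__alpha=0.1)
print(pipe.named_steps["nb"].alpha)
```

The double-underscore form is the same naming that parameter grids for model selection use to address nested estimators.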

set_partial_fit_request(*, classes: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB#

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

classes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the classes parameter in partial_fit.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in partial_fit.

Returns:

self (object): The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → ComplementNB#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for the sample_weight parameter in score.

Returns:

self (object): The updated object.

Gallery examples#

Sample pipeline for text feature extraction and evaluation

Classification of text documents using sparse features