MultinomialNB#

class sklearn.naive_bayes.MultinomialNB(*, alpha=1.0, force_alpha=True, fit_prior=True, class_prior=None)#

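Naive Bayes classifier for multinomial models.

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

Read more in the User Guide.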
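As a minimal sketch of the text-classification use case mentioned above, the snippet below feeds word counts from CountVectorizer into MultinomialNB. The four-document corpus and its labels are invented for illustration and are not part of the original documentation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus and labels, invented for illustration only.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each document into a vector of integer word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

clf = MultinomialNB()
clf.fit(X_counts, labels)

# New documents must go through the same fitted vectorizer.
new_doc = vectorizer.transform(["buy cheap pills"])
prediction = clf.predict(new_doc)
print(prediction)
```

Swapping CountVectorizer for TfidfVectorizer would give the fractional tf-idf features the description refers to.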

Parameters:

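alpha : float or array-like of shape (n_features,), default=1.0: Additive (Laplace/Lidstone) smoothing parameter (set alpha=0 and force_alpha=True for no smoothing).
force_alpha : bool, default=True: If False and alpha is less than 1e-10, alpha is set to 1e-10. If True, alpha is left unchanged, which may cause numerical errors if alpha is too close to 0.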

Added in version 1.2.

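Changed in version 1.4: The default value of force_alpha changed to True.
fit_prior : bool, default=True: Whether to learn class prior probabilities. If False, a uniform prior is used.
class_prior : array-like of shape (n_classes,), default=None: Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.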
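The following sketch illustrates how these constructor parameters change the fitted model. The count matrix is invented for illustration; the variable names (learned, uniform, fixed, heavy, light) are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data, invented for illustration: 3 samples of class 0, 2 of class 1.
X = np.array([[3, 0, 1],
              [2, 0, 2],
              [4, 0, 1],
              [0, 4, 1],
              [0, 3, 2]])
y = np.array([0, 0, 0, 1, 1])

# fit_prior=True (default): priors follow the class frequencies in y.
learned = MultinomialNB().fit(X, y)
# fit_prior=False: uniform prior over the classes.
uniform = MultinomialNB(fit_prior=False).fit(X, y)
# class_prior given: the supplied prior is used as-is.
fixed = MultinomialNB(class_prior=[0.9, 0.1]).fit(X, y)

print(np.exp(learned.class_log_prior_))
print(np.exp(uniform.class_log_prior_))
print(np.exp(fixed.class_log_prior_))

# Feature 1 never occurs in class 0; alpha controls how much
# probability mass such an unseen (class, feature) pair receives.
heavy = MultinomialNB(alpha=1.0).fit(X, y)
light = MultinomialNB(alpha=0.1).fit(X, y)
p_heavy = np.exp(heavy.feature_log_prob_[0, 1])
p_light = np.exp(light.feature_log_prob_[0, 1])
print(p_heavy, p_light)
```

A smaller alpha pushes the probability of unseen features closer to zero, which makes the model more sensitive to words that were absent from a class during training.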

Attributes:

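class_count_ : ndarray of shape (n_classes,): Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
class_log_prior_ : ndarray of shape (n_classes,): Smoothed empirical log probability for each class.
classes_ : ndarray of shape (n_classes,): Class labels known to the classifier.
feature_count_ : ndarray of shape (n_classes, n_features): Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
feature_log_prob_ : ndarray of shape (n_classes, n_features): Empirical log probability of features given a class, P(x_i|y).
n_features_in_ : int: Number of features seen during fit.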

Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,): Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.
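To make the relationship between these attributes concrete, the sketch below recomputes feature_log_prob_ and class_log_prior_ from the stored counts, assuming a scalar alpha and the default fit_prior=True. The data and the names manual_log_prob and manual_log_prior are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data and string labels, invented for illustration.
X = np.array([[2, 1, 0],
              [1, 0, 3],
              [0, 2, 2],
              [4, 0, 1]])
y = np.array(["a", "b", "a", "b"])

alpha = 1.0
clf = MultinomialNB(alpha=alpha).fit(X, y)

print(clf.classes_)        # sorted class labels
print(clf.class_count_)    # samples per class
print(clf.feature_count_)  # summed feature counts per class

# Additive smoothing: add alpha to every count, then normalise per class.
smoothed = clf.feature_count_ + alpha
manual_log_prob = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))

# With fit_prior=True the prior is the relative class frequency.
manual_log_prior = np.log(clf.class_count_ / clf.class_count_.sum())
```

Each row of exp(feature_log_prob_) sums to 1, since it is a distribution over features for one class.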

See also

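BernoulliNB: Naive Bayes classifier for multivariate Bernoulli models.
CategoricalNB: Naive Bayes classifier for categorical features.
ComplementNB: Complement Naive Bayes classifier.
GaussianNB: Gaussian Naive Bayes.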

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB()
>>> print(clf.predict(X[2:3]))
[3]

fit(X, y, sample_weight=None)#

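Fit Naive Bayes classifier according to X, y.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target values.
sample_weight : array-like of shape (n_samples,), default=None: Weights applied to individual samples (1. for unweighted).

Returns:

self : object: Returns the instance itself.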
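The effect of sample_weight on the stored counts can be sketched as follows. The three-sample dataset and the weights are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy data and weights, invented for illustration.
X = np.array([[1, 0],
              [0, 1],
              [1, 1]])
y = np.array([0, 1, 1])
weights = np.array([2.0, 1.0, 0.5])

clf = MultinomialNB().fit(X, y, sample_weight=weights)

# Each sample contributes its weight instead of 1 to the class count,
# and its weighted feature vector to the feature counts.
print(clf.class_count_)
print(clf.feature_count_)
```

Here class 0 has a weighted count of 2.0 from a single sample, while class 1 has 1.5 from two samples.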

get_metadata_routing()#

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)#

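Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.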
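A short sketch of what get_params returns for this estimator, and how the resulting dict can be used to build an equivalent unfitted copy. The variable names are hypothetical.

```python
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB(alpha=0.5, fit_prior=False)
params = clf.get_params()
print(params)

# The dict maps constructor argument names to values, so it can be
# unpacked to create a new, unfitted estimator with the same settings.
copy = MultinomialNB(**params)
```

MultinomialNB has no nested estimators, so deep=True and deep=False return the same keys here.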

partial_fit(X, y, classes=None, sample_weight=None)#

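Incremental fit on a batch of samples.

This method is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning.

This is especially useful when the whole dataset is too big to fit in memory at once.

This method has some performance overhead, hence it is better to call partial_fit on chunks of data that are as large as possible (as long as they fit in the memory budget) to hide the overhead.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features): Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,): Target values.
classes : array-like of shape (n_classes,), default=None: List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit, can be omitted in subsequent calls.
sample_weight : array-like of shape (n_samples,), default=None: Weights applied to individual samples (1. for unweighted).

Returns:

self : object: Returns the instance itself.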
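The chunked workflow described above might look like the sketch below. The dataset is synthetic and held in memory only for illustration; in a real out-of-core setting each chunk would be read from disk or a stream. The chunk size of 20 is an arbitrary assumption.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data, invented for illustration.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(60, 10))
y = np.arange(60) % 3          # labels 0, 1, 2 in rotation
all_classes = np.array([0, 1, 2])

incremental = MultinomialNB()
chunk_size = 20
for start in range(0, X.shape[0], chunk_size):
    chunk = slice(start, start + chunk_size)
    if start == 0:
        # The full list of classes is required on the first call.
        incremental.partial_fit(X[chunk], y[chunk], classes=all_classes)
    else:
        incremental.partial_fit(X[chunk], y[chunk])

# For comparison: a single fit on the whole dataset.
full = MultinomialNB().fit(X, y)
```

Because the model only accumulates counts, the incrementally fitted estimator ends up with the same parameters as the one fitted in a single pass.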

predict(X)#

Perform classification on an array of test vectors X.

Parameters:

X : array-like of shape (n_samples, n_features): The input samples.

Returns:

C : ndarray of shape (n_samples,): Predicted target values for X.
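As a sketch of how the predicted labels relate to the per-class scores, the snippet below picks the highest-scoring column of predict_log_proba and maps it back through classes_. The data and string labels are synthetic, invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data with string labels, invented for illustration.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(30, 8))
y = np.array(["x", "y", "z"])[np.arange(30) % 3]

clf = MultinomialNB().fit(X, y)
predicted = clf.predict(X)

# Column j of the log-probability matrix belongs to clf.classes_[j],
# so the most probable class is the argmax mapped through classes_.
log_proba = clf.predict_log_proba(X)
manual = clf.classes_[np.argmax(log_proba, axis=1)]
```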

predict_joint_log_proba(X)#

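Return joint log probability estimates for the test vector X.

For each row x of X and class y, the joint log probability is given by log P(x, y) = log P(y) + log P(x|y),

where log P(y) is the log of the class prior probability and log P(x|y) is the log of the class-conditional probability.

Parameters:

X : array-like of shape (n_samples, n_features): The input samples.

Returns:

C : ndarray of shape (n_samples, n_classes): Returns the joint log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.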

predict_log_proba(X)#

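Return log-probability estimates for the test vector X.

Parameters:

X : array-like of shape (n_samples, n_features): The input samples.

Returns:

C : array-like of shape (n_samples, n_classes): Returns the log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.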
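One way to understand these values is as the joint log probabilities normalised per sample in log space. The sketch below reproduces them with scipy.special.logsumexp; the data is synthetic and the name joint is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data, invented for illustration.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(12, 6))
y = np.arange(12) % 3

clf = MultinomialNB().fit(X, y)
log_proba = clf.predict_log_proba(X)

# Joint log probability per class, then subtract log P(x) = logsumexp over classes.
joint = clf.class_log_prior_ + X @ clf.feature_log_prob_.T
manual = joint - logsumexp(joint, axis=1, keepdims=True)
```

Working in log space avoids the underflow that multiplying many small probabilities would cause.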

predict_proba(X)#

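Return probability estimates for the test vector X.

Parameters:

X : array-like of shape (n_samples, n_features): The input samples.

Returns:

C : array-like of shape (n_samples, n_classes): Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.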
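A brief sketch of the output, using synthetic data: each row is a distribution over classes_, and the values are the exponentials of predict_log_proba.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data, invented for illustration.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(12, 6))
y = np.arange(12) % 3

clf = MultinomialNB().fit(X, y)
proba = clf.predict_proba(X)

row_sums = proba.sum(axis=1)             # one value per sample
from_log = np.exp(clf.predict_log_proba(X))
print(proba[:2].round(3))
```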

score(X, y, sample_weight=None)#

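Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) with respect to y.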
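The sketch below shows that score matches accuracy_score applied to the predictions, with and without sample weights. The data and weights are synthetic, invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data and weights, invented for illustration.
rng = np.random.RandomState(0)
X = rng.randint(5, size=(30, 8))
y = np.arange(30) % 3
weights = rng.uniform(0.5, 2.0, size=30)

clf = MultinomialNB().fit(X, y)

acc = clf.score(X, y)
acc_weighted = clf.score(X, y, sample_weight=weights)

# Equivalent computation through sklearn.metrics.
ref = accuracy_score(y, clf.predict(X))
ref_weighted = accuracy_score(y, clf.predict(X), sample_weight=weights)
```

Scoring on the training data, as done here for brevity, overestimates performance on unseen data.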

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MultinomialNB#

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in fit.

Returns:

self : object: The updated object.

set_params(**params)#

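Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.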
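The two forms described above can be sketched as follows. The pipeline is built with make_pipeline, which names each step after its lowercased class name; the choice of CountVectorizer as the first step is an assumption for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Simple estimator: parameter names match the constructor arguments.
clf = MultinomialNB()
clf.set_params(alpha=0.5, fit_prior=False)

# Nested object: address the step with <component>__<parameter>.
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.set_params(multinomialnb__alpha=0.1)

print(clf.alpha, pipe.named_steps["multinomialnb"].alpha)
```

The same double-underscore names are what a grid search over a pipeline expects in its parameter grid.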

set_partial_fit_request(*, classes: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → MultinomialNB#

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

classes : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for classes parameter in partial_fit.
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in partial_fit.

Returns:

self : object: The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MultinomialNB#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for sample_weight parameter in score.

Returns:

self : object: The updated object.

Gallery examples#

Out-of-core classification of text documents