compute_sample_weight

sklearn.utils.class_weight.compute_sample_weight(class_weight, y, *, indices=None)

Estimate sample weights by class for unbalanced datasets.

Parameters:

class_weight : dict, list of dicts, "balanced", or None

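Weights associated with classes, in the form {class_label: weight}. If not given, all classes have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.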
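
A minimal sketch of the single-output dict case on made-up string labels: each sample receives the weight of its own class, and class_weight=None gives every sample weight one. The labels and weight values are illustrative assumptions, not from the original page.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Illustrative single-output target; string class labels are accepted
y = np.array(["cat", "dog", "dog", "bird"])

# {class_label: weight}, with an entry for every label that occurs in y
weights = {"bird": 3.0, "cat": 2.0, "dog": 1.0}
sw = compute_sample_weight(class_weight=weights, y=y)
print(sw)  # each sample gets the weight of its own class

# class_weight=None: every sample gets weight one
sw_uniform = compute_sample_weight(class_weight=None, y=y)
print(sw_uniform)
```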

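Note that for multi-output problems (including multilabel), the weights for each column must be defined in that column's own dict, with an entry for every class in the column. For example, for four-class multilabel classification the weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].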
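
The list-of-dicts form can be sketched on a made-up two-column multilabel target. Each column gets its own dict covering both of its classes (0 and 1), and a sample's final weight is the product of its per-column weights. The data and weight values are illustrative assumptions.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Made-up multilabel target: 4 samples, 2 label columns with classes 0 and 1
Y = np.array([
    [0, 1],
    [1, 1],
    [0, 0],
    [1, 0],
])

# One dict per column, in column order; each dict covers every class of its column
class_weight = [
    {0: 1, 1: 1},  # column 0: no reweighting
    {0: 1, 1: 5},  # column 1: label 1 weighted 5x
]

sw = compute_sample_weight(class_weight=class_weight, y=Y)
print(sw)  # samples with label 1 in column 1 get weight 5, the rest weight 1
```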

The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to the class frequencies in the input data: n_samples / (n_classes * np.bincount(y)).

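To check the "balanced" formula by hand, the sketch below evaluates n_samples / (n_classes * np.bincount(y)) directly and compares it with compute_sample_weight. It assumes the labels are exactly the integers 0..n_classes-1, because np.bincount counts every integer up to max(y) and would give zero counts (and a division by zero) for any gap; compute_sample_weight itself does not need that assumption. A consequence of the formula is that every class ends up with the same total weight, n_samples / n_classes.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Assumes labels are exactly 0..n_classes-1 so np.bincount has no empty bins
y = np.array([1, 1, 1, 1, 0, 0])

n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]

# Per-class weights indexed by label: n_samples / (n_classes * count)
class_weights = n_samples / (n_classes * np.bincount(y))
# Look up each sample's class weight
manual = class_weights[y]

auto = compute_sample_weight(class_weight="balanced", y=y)
print(np.allclose(manual, auto))

# Each class ends up with the same total weight, n_samples / n_classes
totals = [auto[y == c].sum() for c in np.unique(y)]
print(totals)
```
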
For multi-output, the weights computed for each column of y are multiplied together.
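
A small sketch of the multi-output rule on a made-up two-column target: "balanced" weights computed on the full 2-D y should match the element-wise product of the "balanced" weights computed on each column separately.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Made-up two-output target
Y = np.array([
    [0, 1],
    [0, 0],
    [0, 0],
    [1, 0],
])

# "balanced" weights computed on the full 2-D target
joint = compute_sample_weight("balanced", Y)

# "balanced" weights computed on each column on its own
per_column = [compute_sample_weight("balanced", Y[:, k]) for k in range(Y.shape[1])]

# Element-wise product across columns
product = np.prod(per_column, axis=0)
print(np.allclose(joint, product))
```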

y : {array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)

Array of original class labels per sample.

indices : array-like of shape (n_subsample,), default=None

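Array of indices to be used in a subsample. Its length can be less than n_samples for a subsample, or equal to n_samples for a bootstrap subsample with repeated indices. If None, the sample weights are calculated over the full sample. If indices is provided, only "balanced" is supported for class_weight.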
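
A sketch of the indices parameter with a hand-picked, hypothetical bootstrap draw of the same length as y. Based on how recent scikit-learn versions behave (worth re-checking against the installed version), class frequencies are taken from the subsample y[indices], the returned vector still has one entry per sample of the original y, and a class that never appears in the subsample (class 2 here) gets weight zero. A class_weight other than "balanced" combined with indices raises a ValueError.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([0, 0, 0, 1, 1, 2])

# Hypothetical bootstrap draw: length n_samples, repeated indices, class 2 never drawn
indices = np.array([0, 0, 1, 2, 3, 3])

sw = compute_sample_weight("balanced", y, indices=indices)
# Class weights come from y[indices] = [0, 0, 0, 0, 1, 1],
# but the result has one entry per sample of the original y
print(sw)

# Any class_weight other than "balanced" is rejected when indices is given
rejected = False
try:
    compute_sample_weight({0: 1.0, 1: 2.0, 2: 3.0}, y, indices=indices)
except ValueError:
    rejected = True
print(rejected)
```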

Returns:

sample_weight_vect : ndarray of shape (n_samples,)

Array with sample weights as applied to the original y.

Examples

>>> from sklearn.utils.class_weight import compute_sample_weight
>>> y = [1, 1, 1, 1, 0, 0]
>>> compute_sample_weight(class_weight="balanced", y=y)
array([0.75, 0.75, 0.75, 0.75, 1.5 , 1.5 ])
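
The returned vector is aligned with the original y (entry i is the weight of sample i), so it can be passed to an estimator whose fit method accepts sample_weight. A minimal sketch with LogisticRegression on a made-up one-feature dataset; the data and the choice of estimator are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

# Made-up imbalanced data: four samples of class 1, two of class 0
X = np.array([[0.0], [0.2], [0.4], [0.6], [2.0], [2.2]])
y = np.array([1, 1, 1, 1, 0, 0])

sw = compute_sample_weight(class_weight="balanced", y=y)

# sw[i] is the weight of row i of X, so it is passed straight to fit
clf = LogisticRegression().fit(X, y, sample_weight=sw)
print(clf.predict(X))
```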