OneClassSVM

class sklearn.svm.OneClassSVM(*, kernel='rbf', degree=3, gamma='scale', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1)

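Unsupervised Outlier Detection.

Estimate the support of a high-dimensional distribution.

The implementation is based on libsvm.

Read more in the User Guide.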

Parameters:

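kernel : {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf'

Specifies the kernel type to be used in the algorithm. If none is given, 'rbf' will be used. If a callable is given, it is used to precompute the kernel matrix.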
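A minimal sketch of the callable option, assuming a hypothetical user-defined function `linear_kernel(A, B)` that returns the `(len(A), len(B))` matrix of dot products. Because the callable is used to precompute the kernel matrix, the fitted model should agree with one trained on `kernel="precomputed"` with the same Gram matrices. The random data is illustrative only.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def linear_kernel(A, B):
    """Hypothetical user-defined kernel: dot products, shape (len(A), len(B))."""
    return A @ B.T


rng = np.random.RandomState(0)
X_train = rng.randn(50, 3)
X_test = rng.randn(10, 3)

# Callable kernel: the estimator evaluates linear_kernel(X_train, X_train) in fit
# and linear_kernel(X_test, X_train) in decision_function / predict.
clf_callable = OneClassSVM(kernel=linear_kernel, nu=0.2).fit(X_train)

# Equivalent model built from Gram matrices supplied by hand.
clf_pre = OneClassSVM(kernel="precomputed", nu=0.2).fit(X_train @ X_train.T)

dec_callable = clf_callable.decision_function(X_test)
# For "precomputed", X must have shape (n_samples_test, n_samples_train).
dec_pre = clf_pre.decision_function(X_test @ X_train.T)

print(np.allclose(dec_callable, dec_pre))
```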

degree : int, default=3

Degree of the polynomial kernel function ('poly'). Must be non-negative. Ignored by all other kernels.

gamma : {'scale', 'auto'} or float, default='scale'

Kernel coefficient for 'rbf', 'poly' and 'sigmoid'.

If gamma='scale' (default) is passed, then it uses 1 / (n_features * X.var()) as the value of gamma;
if 'auto', it uses 1 / n_features;
if float, it must be non-negative.

Changed in version 0.22: The default value of gamma changed from 'auto' to 'scale'.
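The two string options reduce to simple formulas. This sketch only evaluates the documented formulas with NumPy on the toy data from the Examples section; it does not read any value stored inside the estimator.

```python
import numpy as np

# Toy data from the Examples section: 5 samples, 1 feature.
X = np.array([[0.0], [0.44], [0.45], [0.46], [1.0]])
n_features = X.shape[1]

gamma_scale = 1.0 / (n_features * X.var())  # gamma='scale' (the default)
gamma_auto = 1.0 / n_features               # gamma='auto'

print(round(gamma_scale, 3), gamma_auto)    # 9.936 1.0
```

Because X.var() is taken over all entries of X, rescaling the features changes gamma='scale' accordingly, while gamma='auto' depends only on the number of features.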

coef0 : float, default=0.0

Independent term in the kernel function. It is only significant in 'poly' and 'sigmoid'.

tol : float, default=1e-3

Tolerance for the stopping criterion.

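nu : float, default=0.5

An upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 is used.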
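A quick empirical check of the two bounds, assuming synthetic Gaussian data (illustrative only): the fraction of training points predicted as outliers should stay at or near nu, and the fraction of support vectors should be at least nu. On finite samples the outlier bound is only approximate, so small overshoots are possible.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X = rng.randn(300, 2)  # illustrative data

results = {}
for nu in (0.05, 0.2, 0.5):
    clf = OneClassSVM(nu=nu, gamma="scale").fit(X)
    outlier_frac = np.mean(clf.predict(X) == -1)  # training points flagged as outliers
    sv_frac = len(clf.support_) / len(X)          # fraction of support vectors
    results[nu] = (outlier_frac, sv_frac)
    print(f"nu={nu:.2f}  outliers={outlier_frac:.3f}  support vectors={sv_frac:.3f}")
```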

shrinking : bool, default=True

Whether to use the shrinking heuristic. See the User Guide.

cache_size : float, default=200

Specify the size of the kernel cache (in MB).

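verbose : bool, default=False

Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.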

max_iter : int, default=-1

Hard limit on iterations within the solver, or -1 for no limit.

Attributes:

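coef_ : ndarray of shape (1, n_features)

Weights assigned to the features when kernel="linear".

dual_coef_ : ndarray of shape (1, n_SV)

Coefficients of the support vectors in the decision function.

fit_status_ : int

0 if correctly fitted, 1 otherwise (will raise a warning).

intercept_ : ndarray of shape (1,)

Constant in the decision function.

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

n_iter_ : int

Number of iterations run by the optimization routine to fit the model.

Added in version 1.1.

n_support_ : ndarray of shape (n_classes,), dtype=int32

Number of support vectors for each class.

offset_ : float

Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset is the opposite of intercept_ and is provided for consistency with other outlier detection algorithms.

Added in version 0.20.

shape_fit_ : tuple of int of shape (n_dimensions_of_X,)

Array dimensions of the training vector X.

support_ : ndarray of shape (n_SV,)

Indices of the support vectors.

support_vectors_ : ndarray of shape (n_SV, n_features)

Support vectors.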
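The relation between offset_, intercept_, score_samples and decision_function stated for offset_ can be checked directly. A minimal sketch, reusing the toy data from the Examples section:

```python
import numpy as np
from sklearn.svm import OneClassSVM

X = [[0], [0.44], [0.45], [0.46], [1]]
clf = OneClassSVM(gamma="auto").fit(X)

raw = clf.score_samples(X)      # unshifted scores
dec = clf.decision_function(X)  # shifted scores

# decision_function = score_samples - offset_
print(np.allclose(dec, raw - clf.offset_))
# offset_ is the opposite of intercept_
print(np.allclose(clf.offset_, -clf.intercept_))
```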

See also

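sklearn.linear_model.SGDOneClassSVM: Solves a linear One-Class SVM using Stochastic Gradient Descent.
sklearn.neighbors.LocalOutlierFactor: Unsupervised Outlier Detection using the Local Outlier Factor (LOF).
sklearn.ensemble.IsolationForest: Isolation Forest algorithm.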

Examples

>>> from sklearn.svm import OneClassSVM
>>> X = [[0], [0.44], [0.45], [0.46], [1]]
>>> clf = OneClassSVM(gamma='auto').fit(X)
>>> clf.predict(X)
array([-1,  1,  1,  1, -1])
>>> clf.score_samples(X)
array([1.7798..., 2.0547..., 2.0556..., 2.0561..., 1.7332...])

property coef_

Weights assigned to the features when kernel="linear".

Returns:

ndarray of shape (1, n_features)
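For kernel="linear" the weights can be recovered from the dual representation. A minimal sketch, assuming illustrative Gaussian data shifted away from the origin, that checks coef_ against dual_coef_ @ support_vectors_ and the decision function against the affine form X @ w + intercept_:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(100, 4) + 3.0  # illustrative data, shifted away from the origin

clf = OneClassSVM(kernel="linear", nu=0.1).fit(X)
print(clf.coef_.shape)  # (1, 4)

# Primal weights = dual coefficients times support vectors.
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(clf.coef_, w))

# With a linear kernel the decision function is affine in X.
affine = X @ clf.coef_.ravel() + clf.intercept_
print(np.allclose(clf.decision_function(X), affine))
```

Accessing coef_ with any other kernel raises an AttributeError, since no explicit weight vector exists in that case.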

decision_function(X)

Signed distance to the separating hyperplane.

Signed distance is positive for an inlier and negative for an outlier.

Parameters:

X : array-like of shape (n_samples, n_features)

The data matrix.

Returns:

dec : ndarray of shape (n_samples,)

Returns the decision function of the samples.
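A short sketch of the sign convention on illustrative data: a point far from the training data gets a negative decision value, and the labels from predict are assumed to be the sign of the decision function (+1 where it is strictly positive, -1 otherwise).

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(100, 2)           # illustrative training data
X_new = np.array([[0.0, 0.0], [4.0, 4.0]])  # near the data / far from the data

clf = OneClassSVM(nu=0.1, gamma="scale").fit(X_train)
dec = clf.decision_function(X_new)
labels = clf.predict(X_new)

print(dec)     # the far-away point should get a negative value
print(labels)
# +1 where the decision function is > 0, -1 otherwise
print(np.array_equal(labels, np.where(dec > 0, 1, -1)))
```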

fit(X, y=None, sample_weight=None)

Detect the soft boundary of the set of samples X.

Parameters:

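X : {array-like, sparse matrix} of shape (n_samples, n_features)

Set of samples, where n_samples is the number of samples and n_features is the number of features.

y : Ignored

Not used, present for API consistency by convention.

sample_weight : array-like of shape (n_samples,), default=None

Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.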

Returns:

self : object

Fitted estimator.

Notes

If X is not a C-ordered contiguous array, it is copied.
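One way to see the per-sample weighting in action is a weight of 0, which should remove a sample's influence entirely. This sketch assumes that zero-weight rows are simply dropped by the libsvm-based solver, so the result should match fitting on the remaining rows. It uses a fixed float gamma because gamma='scale' is computed from all rows of X regardless of the weights. The data is illustrative only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(60, 2)
X_test = rng.randn(5, 2)

weights = np.ones(len(X))
weights[:10] = 0.0  # give the first 10 samples zero weight

# Fixed gamma so both fits use exactly the same kernel.
clf_weighted = OneClassSVM(gamma=0.5, nu=0.3).fit(X, sample_weight=weights)
clf_subset = OneClassSVM(gamma=0.5, nu=0.3).fit(X[10:])

dec_weighted = clf_weighted.decision_function(X_test)
dec_subset = clf_subset.decision_function(X_test)
print(np.allclose(dec_weighted, dec_subset))
```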

fit_predict(X, y=None, **kwargs)

Perform fit on X and return labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:

y : ndarray of shape (n_samples,)

1 for inliers, -1 for outliers.
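fit_predict is a shortcut: with a deterministic solver it should return the same labels as calling fit and then predict on the same data. A minimal sketch on illustrative random data:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(50, 2)  # illustrative data

labels = OneClassSVM(nu=0.1).fit_predict(X)
same = OneClassSVM(nu=0.1).fit(X).predict(X)

print(np.array_equal(labels, same))
print("outliers:", int(np.sum(labels == -1)), "inliers:", int(np.sum(labels == 1)))
```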

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.
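The returned dictionary mirrors the constructor arguments in the class signature. Since OneClassSVM contains no sub-estimators, deep=True and deep=False give the same result here. A minimal sketch:

```python
from sklearn.svm import OneClassSVM

clf = OneClassSVM(nu=0.1)
params = clf.get_params()

print(sorted(params))
print(params["nu"], params["kernel"])  # 0.1 rbf
```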

property n_support_

Number of support vectors for each class.

predict(X)

Perform classification on samples in X.

For a one-class model, +1 or -1 is returned.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples_test, n_samples_train)

For kernel="precomputed", the expected shape of X is (n_samples_test, n_samples_train).

Returns:

y_pred : ndarray of shape (n_samples,)

Class labels for samples in X.

score_samples(X)

Raw scoring function of the samples.

Parameters:

X : array-like of shape (n_samples, n_features)

The data matrix.

Returns:

score_samples : ndarray of shape (n_samples,)

Returns the (unshifted) scoring function of the samples.
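Lower raw scores mean more anomalous samples, so the scores can be used to rank points without choosing a threshold. A minimal sketch, reusing the toy data and fitted model from the Examples section:

```python
import numpy as np
from sklearn.svm import OneClassSVM

X = np.array([[0], [0.44], [0.45], [0.46], [1]])
clf = OneClassSVM(gamma="auto").fit(X)

scores = clf.score_samples(X)
order = np.argsort(scores)  # most anomalous (lowest score) first
print(X[order].ravel())     # the two extreme points 1 and 0 come first
```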

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OneClassSVM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True : metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False : metadata is not requested and the meta-estimator will not pass it to fit.
None : metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str : metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for the sample_weight parameter in fit.

Returns:

self : object

The updated object.

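set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.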

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.
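A minimal sketch of both forms. The step names "scale" and "ocsvm" are illustrative choices: in the <component>__<parameter> syntax the part before the double underscore is the pipeline step name.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

pipe = Pipeline([("scale", StandardScaler()), ("ocsvm", OneClassSVM())])

# Nested form: "ocsvm" is the step name, "nu" and "gamma" are its parameters.
pipe.set_params(ocsvm__nu=0.05, ocsvm__gamma=0.1)
print(pipe.named_steps["ocsvm"].nu, pipe.named_steps["ocsvm"].gamma)  # 0.05 0.1

# Plain form on a single estimator; set_params returns the estimator itself.
clf = OneClassSVM().set_params(kernel="linear")
print(clf.kernel)  # linear
```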

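Gallery examples

Species distribution modeling

Outlier detection on a real data set

One-Class SVM versus One-Class SVM using Stochastic Gradient Descent

Comparing anomaly detection algorithms for outlier detection on toy datasets

One-class SVM with non-linear kernel (RBF)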