.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/neighbors/plot_lof_outlier_detection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_neighbors_plot_lof_outlier_detection.py:


=================================================
Outlier detection with Local Outlier Factor (LOF)
=================================================

The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection
method that computes the local density deviation of a given data point with
respect to its neighbors. Samples whose density is substantially lower than
that of their neighbors are considered outliers. This example shows how to use
LOF for outlier detection, which is the default use case of this estimator in
scikit-learn. Note that when LOF is used for outlier detection it has no
`predict`, `decision_function` and `score_samples` methods. See the
:ref:`User Guide ` for details on the difference between outlier detection and
novelty detection, and on how to use LOF for novelty detection.

The number of neighbors considered (parameter `n_neighbors`) is typically set
1) greater than the minimum number of samples a cluster has to contain, so that
other samples can be local outliers relative to this cluster, and 2) smaller
than the maximum number of nearby samples that can potentially be local
outliers. In practice, such information is generally not available, and taking
`n_neighbors=20` appears to work well in general.

.. GENERATED FROM PYTHON SOURCE LINES 13-15

Generate data with outliers
---------------------------

.. GENERATED FROM PYTHON SOURCE LINES 18-31

.. code-block:: Python


    import numpy as np

    np.random.seed(42)

    X_inliers = 0.3 * np.random.randn(100, 2)
    X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
    X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
    X = np.r_[X_inliers, X_outliers]

    n_outliers = len(X_outliers)
    ground_truth = np.ones(len(X), dtype=int)
    ground_truth[-n_outliers:] = -1


.. GENERATED FROM PYTHON SOURCE LINES 32-35

Fit the model for outlier detection (default)
---------------------------------------------

Use `fit_predict` to compute the predicted labels of the training samples
(when LOF is used for outlier detection, the estimator has no `predict`,
`decision_function` and `score_samples` methods).

.. GENERATED FROM PYTHON SOURCE LINES 35-43

.. code-block:: Python


    from sklearn.neighbors import LocalOutlierFactor

    clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
    y_pred = clf.fit_predict(X)
    n_errors = (y_pred != ground_truth).sum()
    X_scores = clf.negative_outlier_factor_


.. GENERATED FROM PYTHON SOURCE LINES 44-46

Plot results
------------

.. GENERATED FROM PYTHON SOURCE LINES 48-78

.. code-block:: Python


    import matplotlib.pyplot as plt
    from matplotlib.legend_handler import HandlerPathCollection


    def update_legend_marker_size(handle, orig):
        "Customize size of the legend marker"
        handle.update_from(orig)
        handle.set_sizes([20])


    plt.scatter(X[:, 0], X[:, 1], color="k", s=3.0, label="Data points")
    # plot circles with radius proportional to the outlier scores
    radius = (X_scores.max() - X_scores) / (X_scores.max() - X_scores.min())
    scatter = plt.scatter(
        X[:, 0],
        X[:, 1],
        s=1000 * radius,
        edgecolors="r",
        facecolors="none",
        label="Outlier scores",
    )
    plt.axis("tight")
    plt.xlim((-5, 5))
    plt.ylim((-5, 5))
    plt.xlabel("prediction errors: %d" % (n_errors))
    plt.legend(
        handler_map={scatter: HandlerPathCollection(update_func=update_legend_marker_size)}
    )
    plt.title("Local Outlier Factor (LOF)")
    plt.show()



.. image-sg:: /auto_examples/neighbors/images/sphx_glr_plot_lof_outlier_detection_001.png
   :alt: Local Outlier Factor (LOF)
   :srcset: /auto_examples/neighbors/images/sphx_glr_plot_lof_outlier_detection_001.png
   :class: sphx-glr-single-img



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.036 seconds)


.. _sphx_glr_download_auto_examples_neighbors_plot_lof_outlier_detection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/neighbors/plot_lof_outlier_detection.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lof_outlier_detection.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lof_outlier_detection.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_lof_outlier_detection.zip `


.. include:: plot_lof_outlier_detection.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
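As a supplement to the fitting step above, the following minimal sketch shows how the labels returned by `fit_predict` relate to the scores in `negative_outlier_factor_`. It assumes a numeric `contamination` (here 0.1, as in the example) and uses the fitted `offset_` attribute as the score threshold. The data follows the same recipe as the example, and the variable names `scores` and `flagged_by_threshold` are illustrative only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data built with the same recipe as the example:
# two tight clusters of inliers plus uniformly scattered outliers.
rng = np.random.RandomState(42)
X_inliers = 0.3 * rng.randn(100, 2)
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X_inliers, X_outliers]

clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = clf.fit_predict(X)

# negative_outlier_factor_ is the negated LOF of each training sample:
# values near -1 are typical inliers, more negative values are more abnormal.
scores = clf.negative_outlier_factor_

# offset_ is the threshold on those scores; samples scoring below it
# are the ones labelled -1 by fit_predict.
flagged_by_threshold = scores < clf.offset_

print("threshold (offset_):", clf.offset_)
print("samples flagged as outliers:", int((y_pred == -1).sum()))
print(
    "labels match threshold rule:",
    np.array_equal(y_pred == -1, flagged_by_threshold),
)
```

Inspecting the scores this way can help when choosing `contamination`, since the parameter only moves the threshold and does not change the scores themselves.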