.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_isolation_forest.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code, or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_isolation_forest.py:


=======================
IsolationForest example
=======================

An example using :class:`~sklearn.ensemble.IsolationForest` for anomaly
detection.

The :ref:`isolation_forest` is an ensemble of "Isolation Trees" that "isolate"
observations by recursive random partitioning, which can be represented by a
tree structure. The number of splits required to isolate a sample is lower for
outliers and higher for inliers.

In this example we demonstrate two ways to visualize the decision boundary of
an Isolation Forest trained on a toy dataset.

.. GENERATED FROM PYTHON SOURCE LINES 15-21

Data generation
---------------

We generate two clusters (each one containing `n_samples` samples) by randomly
sampling the standard normal distribution as returned by
:func:`numpy.random.randn`. One of them is spherical and the other one is
slightly deformed.

For consistency with the :class:`~sklearn.ensemble.IsolationForest` notation,
the inliers (i.e. the Gaussian clusters) are assigned a ground truth label `1`,
whereas the outliers (created with :func:`numpy.random.uniform`) are assigned
the label `-1`.

.. GENERATED FROM PYTHON SOURCE LINES 21-40

.. code-block:: Python

    import numpy as np

    from sklearn.model_selection import train_test_split

    n_samples, n_outliers = 120, 40
    rng = np.random.RandomState(0)
    covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
    cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])  # general
    cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])  # spherical
    outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))

    X = np.concatenate([cluster_1, cluster_2, outliers])
    y = np.concatenate(
        [np.ones((2 * n_samples), dtype=int), -np.ones((n_outliers), dtype=int)]
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

.. GENERATED FROM PYTHON SOURCE LINES 41-42

We can visualize the resulting clusters:

.. GENERATED FROM PYTHON SOURCE LINES 42-53
.. code-block:: Python

    import matplotlib.pyplot as plt

    scatter = plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    handles, labels = scatter.legend_elements()
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    plt.title("Gaussian inliers with \nuniformly distributed outliers")
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png
   :alt: Gaussian inliers with uniformly distributed outliers
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 54-56

Training of the model
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 56-63

.. code-block:: Python

    from sklearn.ensemble import IsolationForest

    clf = IsolationForest(max_samples=100, random_state=0)
    clf.fit(X_train)

.. code-block:: none

    IsolationForest(max_samples=100, random_state=0)

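The example splits the data into a training and a test set but only uses the
training part. The following minimal sketch, which is not part of the original
example, rebuilds the same toy data so that it runs on its own and then scores
the held-out split. It relies on two behaviors of scikit-learn's
`IsolationForest` that hold in recent releases as far as we know: `predict`
returns `-1` exactly where `decision_function` is negative, and
`decision_function` equals `score_samples` minus the fitted `offset_`
attribute. Using :func:`~sklearn.metrics.balanced_accuracy_score` is our own
choice, made because inliers outnumber outliers six to one.

.. code-block:: Python

    import numpy as np

    from sklearn.ensemble import IsolationForest
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split

    # Rebuild the same toy data as in the data-generation step
    n_samples, n_outliers = 120, 40
    rng = np.random.RandomState(0)
    covariance = np.array([[0.5, -0.1], [0.7, 0.4]])
    cluster_1 = 0.4 * rng.randn(n_samples, 2) @ covariance + np.array([2, 2])
    cluster_2 = 0.3 * rng.randn(n_samples, 2) + np.array([-2, -2])
    outliers = rng.uniform(low=-4, high=4, size=(n_outliers, 2))
    X = np.concatenate([cluster_1, cluster_2, outliers])
    y = np.concatenate(
        [np.ones(2 * n_samples, dtype=int), -np.ones(n_outliers, dtype=int)]
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

    clf = IsolationForest(max_samples=100, random_state=0).fit(X_train)

    # predict() uses the same {1, -1} convention as the ground truth labels in y
    y_pred = clf.predict(X_test)

    # decision_function() is score_samples() shifted by offset_; negative means outlier
    scores = clf.decision_function(X_test)
    assert np.allclose(scores, clf.score_samples(X_test) - clf.offset_)

    bal_acc = balanced_accuracy_score(y_test, y_pred)
    print(f"balanced accuracy on the held-out split: {bal_acc:.3f}")

With the default `contamination="auto"`, `offset_` is fixed at `-0.5`, so the
threshold between predicted inliers and outliers is not tuned to the actual
proportion of outliers in the data.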
.. GENERATED FROM PYTHON SOURCE LINES 64-68

Plot discrete decision boundary
-------------------------------

We use the class :class:`~sklearn.inspection.DecisionBoundaryDisplay` to
visualize a discrete decision boundary. The background color represents
whether a sample in that area is predicted to be an outlier or not. The
scatter plot displays the true labels.

.. GENERATED FROM PYTHON SOURCE LINES 68-85

.. code-block:: Python

    import matplotlib.pyplot as plt

    from sklearn.inspection import DecisionBoundaryDisplay

    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict",
        alpha=0.5,
    )
    disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    disp.ax_.set_title("Binary decision boundary \nof IsolationForest")
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png
   :alt: Binary decision boundary of IsolationForest
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 86-92

Plot path length decision boundary
----------------------------------

By setting `response_method="decision_function"`, the background of the
:class:`~sklearn.inspection.DecisionBoundaryDisplay` represents the measure of
normality of an observation. This score is derived from the path length
averaged over a forest of random trees, where the path length of a sample in a
single tree is the depth of the leaf (or, equivalently, the number of splits)
required to isolate it.

When a forest of random trees collectively produces short path lengths for
isolating some particular samples, those samples are highly likely to be
anomalies and their measure of normality is low (negative values of
`decision_function`). Conversely, long paths correspond to high (positive)
values, and such samples are more likely to be inliers.

.. GENERATED FROM PYTHON SOURCE LINES 92-105

.. code-block:: Python

    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="decision_function",
        alpha=0.5,
    )
    disp.ax_.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor="k")
    disp.ax_.set_title("Path length decision boundary \nof IsolationForest")
    plt.axis("square")
    plt.legend(handles=handles, labels=["outliers", "inliers"], title="true class")
    # surface_ holds the filled contours of the decision function
    plt.colorbar(disp.surface_)
    plt.show()

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png
   :alt: Path length decision boundary of IsolationForest
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_isolation_forest_003.png
   :class: sphx-glr-single-img

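To make the notion of path length concrete, here is a from-scratch sketch of
the isolation procedure; it is an illustration, not scikit-learn's
implementation. Each random tree picks a random feature and a split value
drawn uniformly between that feature's minimum and maximum, and we count how
many splits it takes until the query sample is alone. Trees are grown lazily
along the query sample's path only, depth is capped at `ceil(log2(n))`, and a
leaf that still holds several samples at that depth contributes the average
path length `c(n) = 2 * (ln(n - 1) + gamma) - 2 * (n - 1) / n` used by the
original Isolation Forest algorithm. The helper names `average_path_length`,
`path_length` and `mean_path_length`, the toy data and the number of trees are
illustrative assumptions.

.. code-block:: Python

    import numpy as np


    def average_path_length(n):
        """c(n): expected path length of an unsuccessful search in a binary search tree."""
        if n <= 1:
            return 0.0
        if n == 2:
            return 1.0
        return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n


    def path_length(x, data, rng, max_depth, depth=0):
        """Number of random splits needed to isolate `x` among the rows of `data`."""
        n = len(data)
        if n <= 1 or depth >= max_depth:
            # Leaf reached: add the expected depth of the remaining, unsplit subtree
            return depth + average_path_length(n)
        feature = rng.integers(data.shape[1])
        lo, hi = data[:, feature].min(), data[:, feature].max()
        if lo == hi:
            # All samples share this feature value, so no split is possible here
            return depth + average_path_length(n)
        threshold = rng.uniform(lo, hi)
        goes_left = data[:, feature] < threshold
        # Only follow the branch containing `x`; the other branch is never built
        branch = data[goes_left] if x[feature] < threshold else data[~goes_left]
        return path_length(x, branch, rng, max_depth, depth + 1)


    rng = np.random.default_rng(0)
    inliers = rng.standard_normal((200, 2))
    outlier = np.array([6.0, 6.0])
    data = np.vstack([inliers, outlier])
    max_depth = int(np.ceil(np.log2(len(data))))  # height limit ceil(log2 n)
    n_trees = 100


    def mean_path_length(x):
        # E[h(x)]: path length averaged over independent random trees
        return np.mean([path_length(x, data, rng, max_depth) for _ in range(n_trees)])


    central = inliers[np.argmin(np.linalg.norm(inliers, axis=1))]
    h_outlier = mean_path_length(outlier)
    h_central = mean_path_length(central)

    # Anomaly score s = 2 ** (-E[h] / c(n)): close to 1 for anomalies
    c_n = average_path_length(len(data))
    s_outlier = 2.0 ** (-h_outlier / c_n)
    s_central = 2.0 ** (-h_central / c_n)
    print(f"outlier: E[h]={h_outlier:.2f}, s={s_outlier:.2f}")
    print(f"central: E[h]={h_central:.2f}, s={s_central:.2f}")

The isolated point at `(6, 6)` should need far fewer splits than the sample
closest to the origin, so its score lies closer to `1`. To our understanding,
scikit-learn's `score_samples` returns the negated score, which is why lower
values mean more abnormal there, and `decision_function` additionally
subtracts `offset_`.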
.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.208 seconds)


.. _sphx_glr_download_auto_examples_ensemble_plot_isolation_forest.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_isolation_forest.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_isolation_forest.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_isolation_forest.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_isolation_forest.zip `


.. include:: plot_isolation_forest.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_