.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_forest_importances_faces.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code
        or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_forest_importances_faces.py:


=================================================
Pixel importances with a parallel forest of trees
=================================================

This example shows how a forest of trees can be used to evaluate the
impurity-based importance of the pixels in an image classification task on
the faces dataset. The hotter the pixel, the more important it is.

The code below also illustrates how the construction of the forest and the
computation of the predictions can be parallelized across multiple jobs.

.. GENERATED FROM PYTHON SOURCE LINES 13-16

Loading the data and model fitting
----------------------------------
First, we load the olivetti faces dataset and restrict it to the first five
classes. Then we train a random forest on the dataset and evaluate the
impurity-based feature importance. One drawback of this method is that it
cannot be evaluated on a separate test set. In this example, we are interested
in representing the information learned from the full dataset. We also set
the number of cores to use for the tasks.

.. GENERATED FROM PYTHON SOURCE LINES 16-19

.. code-block:: Python

    from sklearn.datasets import fetch_olivetti_faces


.. GENERATED FROM PYTHON SOURCE LINES 20-21

We select the number of cores used to fit the forest model in parallel.
`-1` means use all available cores.

.. GENERATED FROM PYTHON SOURCE LINES 21-24

.. code-block:: Python

    n_jobs = -1


.. GENERATED FROM PYTHON SOURCE LINES 25-26

Load the faces dataset.

.. GENERATED FROM PYTHON SOURCE LINES 26-30

.. code-block:: Python

    data = fetch_olivetti_faces()
    X, y = data.data, data.target


.. GENERATED FROM PYTHON SOURCE LINES 31-32

Limit the dataset to 5 classes.

.. GENERATED FROM PYTHON SOURCE LINES 32-37

.. code-block:: Python

    mask = y < 5
    X = X[mask]
    y = y[mask]


.. GENERATED FROM PYTHON SOURCE LINES 38-39

A random forest classifier is fitted to compute the feature importances.

.. GENERATED FROM PYTHON SOURCE LINES 39-45

.. code-block:: Python

    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(n_estimators=750, n_jobs=n_jobs, random_state=42)

    forest.fit(X, y)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    RandomForestClassifier(n_estimators=750, n_jobs=-1, random_state=42)
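
The effect of ``n_jobs`` described above can be checked with a minimal sketch.
It is not part of the original example: it assumes a small synthetic dataset
from ``make_classification`` (so that nothing is downloaded), and the names
``X_demo``, ``y_demo``, ``serial`` and ``parallel`` are hypothetical. With the
same ``random_state``, the number of jobs is expected to change only how the
work is scheduled, not the fitted model.

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the faces data (an assumption of this sketch).
    X_demo, y_demo = make_classification(
        n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=0
    )

    # Same seed, different number of parallel jobs.
    serial = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=42)
    parallel = RandomForestClassifier(n_estimators=50, n_jobs=2, random_state=42)
    serial.fit(X_demo, y_demo)
    parallel.fit(X_demo, y_demo)

    # Compare the importances and the predicted probabilities of both forests.
    same_importances = np.allclose(
        serial.feature_importances_, parallel.feature_importances_
    )
    same_probabilities = np.allclose(
        serial.predict_proba(X_demo), parallel.predict_proba(X_demo)
    )
    print(same_importances, same_probabilities)

The comparison uses ``np.allclose`` rather than exact equality because
floating-point results accumulated in parallel may differ in the last digits.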


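The impurity-based importances of a fitted forest are aggregated over its
individual trees. The following is a minimal sketch of that aggregation, not
part of the original example: it assumes a small synthetic dataset from
``make_classification``, and the names ``demo_forest``, ``per_tree``,
``mean_importances`` and ``std_importances`` are hypothetical.

.. code-block:: Python

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the faces data (an assumption of this sketch).
    X_demo, y_demo = make_classification(
        n_samples=200, n_features=16, n_informative=4, n_classes=3, random_state=0
    )
    demo_forest = RandomForestClassifier(n_estimators=50, random_state=0)
    demo_forest.fit(X_demo, y_demo)

    # One row of importances per tree, one column per feature.
    per_tree = np.array(
        [tree.feature_importances_ for tree in demo_forest.estimators_]
    )

    # Mean across trees, and the spread of each feature's importance.
    mean_importances = per_tree.mean(axis=0)
    std_importances = per_tree.std(axis=0)

    print(np.allclose(mean_importances, demo_forest.feature_importances_))

The standard deviation across trees gives a rough idea of how stable each
feature's importance is from one tree to the next.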
.. GENERATED FROM PYTHON SOURCE LINES 46-53

Feature importance based on mean decrease in impurity (MDI)
-----------------------------------------------------------
Feature importances are provided by the fitted attribute
`feature_importances_`. They are computed as the mean and standard deviation
of the accumulated impurity decrease within each tree.

.. warning::
    Impurity-based feature importances can be misleading for **high
    cardinality** features (many unique values). See
    :ref:`permutation_importance` as an alternative.

.. GENERATED FROM PYTHON SOURCE LINES 53-69

.. code-block:: Python

    import time

    import matplotlib.pyplot as plt

    # Time how long it takes to read the importances from the fitted forest.
    start_time = time.time()
    img_shape = data.images[0].shape
    importances = forest.feature_importances_
    elapsed_time = time.time() - start_time

    print(f"Elapsed time to compute the importances: {elapsed_time:.3f} seconds")

    # Reshape the flat importance vector back into the image grid and plot it.
    imp_reshaped = importances.reshape(img_shape)
    plt.matshow(imp_reshaped, cmap=plt.cm.hot)
    plt.title("Pixel importances using impurity values")
    plt.colorbar()
    plt.show()


.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_faces_001.png
   :alt: Pixel importances using impurity values
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_forest_importances_faces_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Elapsed time to compute the importances: 0.133 seconds


.. GENERATED FROM PYTHON SOURCE LINES 70-71

Can you still recognize a face?

.. GENERATED FROM PYTHON SOURCE LINES 74-80

The limitations of MDI are not a problem for this dataset because:

1. All features are (ordered) numeric and are therefore not affected by the
   cardinality bias.
2. We are only interested in representing the knowledge the forest acquired
   on the training set.

If these two conditions are not met, it is recommended to use
:func:`~sklearn.inspection.permutation_importance` instead.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.819 seconds)


.. _sphx_glr_download_auto_examples_ensemble_plot_forest_importances_faces.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_forest_importances_faces.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_forest_importances_faces.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_forest_importances_faces.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_forest_importances_faces.zip `


.. include:: plot_forest_importances_faces.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_