.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/feature_selection/plot_rfe_with_cross_validation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_feature_selection_plot_rfe_with_cross_validation.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_feature_selection_plot_rfe_with_cross_validation.py:


===================================================
Recursive feature elimination with cross-validation
===================================================

A Recursive Feature Elimination (RFE) example that uses cross-validation to tune the number of selected features automatically.

.. GENERATED FROM PYTHON SOURCE LINES 11-15

Data generation
---------------

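We build a classification task using 3 informative features. Introducing 2 additional redundant (i.e. correlated) features causes the selected features to vary depending on the cross-validation fold. The remaining features are drawn at random and are therefore non-informative.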

.. GENERATED FROM PYTHON SOURCE LINES 15-30

.. code-block:: Python


    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_samples=500,
        n_features=15,
        n_informative=3,
        n_redundant=2,
        n_repeated=0,
        n_classes=8,
        n_clusters_per_class=1,
        class_sep=0.8,
        random_state=0,
    )
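
As a quick sanity check on the generated data, the sketch below rebuilds the same dataset and inspects its shape and class counts. The variable names ``classes`` and ``counts`` are illustrative and not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as the example above.
X, y = make_classification(
    n_samples=500,
    n_features=15,
    n_informative=3,
    n_redundant=2,
    n_repeated=0,
    n_classes=8,
    n_clusters_per_class=1,
    class_sep=0.8,
    random_state=0,
)

# One row per sample, one column per feature.
print("X shape:", X.shape)

# Number of samples in each of the 8 classes.
classes, counts = np.unique(y, return_counts=True)
print("Samples per class:", dict(zip(classes.tolist(), counts.tolist())))
```

Note that ``make_classification`` shuffles the feature columns by default, so the informative and redundant features are not necessarily the first five columns.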








.. GENERATED FROM PYTHON SOURCE LINES 31-35

Model training and selection
----------------------------

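We create the RFE object and compute the cross-validated scores. The scoring strategy "accuracy" optimizes the proportion of correctly classified samples.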
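
To make the "accuracy" score concrete, here is a minimal sketch on a toy set of labels (the arrays ``y_true`` and ``y_pred`` are made up for illustration). It compares the manual proportion of correct predictions with ``sklearn.metrics.accuracy_score``.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 3 of the 4 predictions match the true class.
y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2])

# Accuracy is the fraction of samples whose predicted label is correct.
manual_accuracy = np.mean(y_true == y_pred)
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)
```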

.. GENERATED FROM PYTHON SOURCE LINES 35-56

.. code-block:: Python


    from sklearn.feature_selection import RFECV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    min_features_to_select = 1  # Minimum number of features to consider
    clf = LogisticRegression()
    cv = StratifiedKFold(5)

    rfecv = RFECV(
        estimator=clf,
        step=1,
        cv=cv,
        scoring="accuracy",
        min_features_to_select=min_features_to_select,
        n_jobs=2,
    )
    rfecv.fit(X, y)

    print(f"Optimal number of features: {rfecv.n_features_}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Optimal number of features: 3




.. GENERATED FROM PYTHON SOURCE LINES 57-60

In the present case, the model with 3 features (which corresponds to the true generative model) is found to be optimal.
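
Beyond the number of features, the fitted selector also records which features were kept. The sketch below refits the same configuration (without ``n_jobs``, for simplicity) and reads the ``support_`` and ``ranking_`` attributes; the names ``selected`` and ``X_reduced`` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(
    n_samples=500,
    n_features=15,
    n_informative=3,
    n_redundant=2,
    n_repeated=0,
    n_classes=8,
    n_clusters_per_class=1,
    class_sep=0.8,
    random_state=0,
)

rfecv = RFECV(
    estimator=LogisticRegression(),
    step=1,
    cv=StratifiedKFold(5),
    scoring="accuracy",
    min_features_to_select=1,
)
rfecv.fit(X, y)

# support_ is a boolean mask over the input columns; True marks a kept feature.
selected = np.flatnonzero(rfecv.support_)
print("Selected column indices:", selected)

# ranking_ assigns rank 1 to every selected feature; larger ranks were dropped earlier.
print("Feature ranking:", rfecv.ranking_)

# transform() keeps only the selected columns.
X_reduced = rfecv.transform(X)
print("Reduced shape:", X_reduced.shape)
```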

Plot number of features vs. cross-validation scores
---------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 60-76

.. code-block:: Python


    import matplotlib.pyplot as plt
    import pandas as pd

    cv_results = pd.DataFrame(rfecv.cv_results_)
    plt.figure()
    plt.xlabel("Number of features selected")
    plt.ylabel("Mean test accuracy")
    plt.errorbar(
        x=cv_results["n_features"],
        y=cv_results["mean_test_score"],
        yerr=cv_results["std_test_score"],
    )
    plt.title("Recursive Feature Elimination \nwith correlated features")
    plt.show()




.. image-sg:: /auto_examples/feature_selection/images/sphx_glr_plot_rfe_with_cross_validation_001.png
   :alt: Recursive Feature Elimination  with correlated features
   :srcset: /auto_examples/feature_selection/images/sphx_glr_plot_rfe_with_cross_validation_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 77-78

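From the plot above one can further notice a plateau of equivalent scores (similar mean value and overlapping error bars) for 3 to 5 selected features. This is the result of introducing correlated features. Indeed, the optimal model selected by the RFE can lie within this range, depending on the cross-validation technique. The test accuracy decreases above 5 selected features; that is, keeping non-informative features leads to over-fitting and is therefore detrimental to the statistical performance of the models.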


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.329 seconds)


.. _sphx_glr_download_auto_examples_feature_selection_plot_rfe_with_cross_validation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/feature_selection/plot_rfe_with_cross_validation.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_rfe_with_cross_validation.ipynb <plot_rfe_with_cross_validation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_rfe_with_cross_validation.py <plot_rfe_with_cross_validation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_rfe_with_cross_validation.zip <plot_rfe_with_cross_validation.zip>`


.. include:: plot_rfe_with_cross_validation.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_