.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_permutation_tests_for_classification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_permutation_tests_for_classification.py>`
        to download the full example code. or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_permutation_tests_for_classification.py:


=================================================================
通过排列检验分类评分的显著性
=================================================================

本示例演示了如何使用 :func:`~sklearn.model_selection.permutation_test_score` 通过排列来评估交叉验证评分的显著性。

.. GENERATED FROM PYTHON SOURCE LINES 9-13

.. code-block:: Python


    # 作者:scikit-learn 开发者
    # SPDX 许可证标识符:BSD-3-Clause








.. GENERATED FROM PYTHON SOURCE LINES 14-18

Dataset
-------

我们将使用 :ref:`iris_dataset` ,该数据集包含从3种鸢尾花中获取的测量值。

.. GENERATED FROM PYTHON SOURCE LINES 18-25

.. code-block:: Python


    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data
    y = iris.target








.. GENERATED FROM PYTHON SOURCE LINES 26-27

我们还将生成一些随机特征数据(即20个特征),这些特征与鸢尾花数据集中的类别标签不相关。

.. GENERATED FROM PYTHON SOURCE LINES 27-36

.. code-block:: Python



    import numpy as np

    n_uncorrelated_features = 20
    rng = np.random.RandomState(seed=0)
    # 使用与鸢尾花数据集相同数量的样本和20个特征
    X_rand = rng.normal(size=(X.shape[0], n_uncorrelated_features))








.. GENERATED FROM PYTHON SOURCE LINES 37-43

置换检验得分
----------------------

接下来,我们使用原始的鸢尾花数据集计算 :func:`~sklearn.model_selection.permutation_test_score` ,该数据集可以很好地预测标签;同时使用随机生成的特征和鸢尾花标签进行计算,这些特征和标签之间不应有任何依赖关系。我们使用 :class:`~sklearn.svm.SVC` 分类器和 :ref:`accuracy_score` 在每一轮中评估模型。

:func:`~sklearn.model_selection.permutation_test_score` 通过计算分类器在数据集的1000种不同排列上的准确率来生成一个零分布,其中特征保持不变但标签进行不同的排列。这是零假设的分布,零假设认为特征和标签之间没有依赖关系。然后计算一个经验p值,即在多少百分比的排列中获得的分数大于使用原始数据获得的分数。

.. GENERATED FROM PYTHON SOURCE LINES 43-58

.. code-block:: Python


    from sklearn.model_selection import StratifiedKFold, permutation_test_score
    from sklearn.svm import SVC

    clf = SVC(kernel="linear", random_state=7)
    cv = StratifiedKFold(2, shuffle=True, random_state=0)

    score_iris, perm_scores_iris, pvalue_iris = permutation_test_score(
        clf, X, y, scoring="accuracy", cv=cv, n_permutations=1000
    )

    score_rand, perm_scores_rand, pvalue_rand = permutation_test_score(
        clf, X_rand, y, scoring="accuracy", cv=cv, n_permutations=1000
    )








.. GENERATED FROM PYTHON SOURCE LINES 59-63

原始数据
^^^^^^^^^

下面我们绘制了置换得分(原假设分布)的直方图。红线表示分类器在原始数据上获得的得分。该得分远高于使用置换数据获得的得分,因此 p 值非常低。这表明仅凭偶然性获得如此高得分的可能性很低。这证明了鸢尾花数据集在特征和标签之间存在真实的依赖关系,并且分类器能够利用这一点来获得良好的结果。

.. GENERATED FROM PYTHON SOURCE LINES 63-75

.. code-block:: Python


    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()

    ax.hist(perm_scores_iris, bins=20, density=True)
    ax.axvline(score_iris, ls="--", color="r")
    score_label = f"Score on original\ndata: {score_iris:.2f}\n(p-value: {pvalue_iris:.3f})"
    ax.text(0.7, 10, score_label, fontsize=12)
    ax.set_xlabel("Accuracy score")
    _ = ax.set_ylabel("Probability density")




.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_permutation_tests_for_classification_001.png
   :alt: plot permutation tests for classification
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_permutation_tests_for_classification_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 76-80

Random data
^^^^^^^^^^^

下面我们绘制了随机数据的零分布。置换得分与使用原始鸢尾花数据集获得的得分相似,因为置换总是破坏任何存在的特征标签依赖性。然而,在这种情况下,原始随机数据上获得的得分非常差。这导致了一个很大的p值,证实了原始数据中没有特征标签依赖性。

.. GENERATED FROM PYTHON SOURCE LINES 80-92

.. code-block:: Python


    fig, ax = plt.subplots()

    ax.hist(perm_scores_rand, bins=20, density=True)
    ax.set_xlim(0.13)
    ax.axvline(score_rand, ls="--", color="r")
    score_label = f"Score on original\ndata: {score_rand:.2f}\n(p-value: {pvalue_rand:.3f})"
    ax.text(0.14, 7.5, score_label, fontsize=12)
    ax.set_xlabel("Accuracy score")
    ax.set_ylabel("Probability density")
    plt.show()




.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_permutation_tests_for_classification_002.png
   :alt: plot permutation tests for classification
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_permutation_tests_for_classification_002.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 93-103

另一种可能导致高p值的原因是分类器无法利用数据中的结构。在这种情况下,只有能够利用现有依赖关系的分类器才会有低p值。在我们上面的例子中,由于数据是随机的,所有分类器的p值都会很高,因为数据中不存在结构。

最后请注意,即使数据中只有微弱的结构,该测试也已被证明会产生较低的p值[1]_。

.. rubric:: References

.. [1] Ojala 和 Garriga. `用于研究分类器性能的置换检验
<http://www.jmlr.org/papers/volume11/ojala10a/ojala10a.pdf>`_  .
《机器学习研究杂志》(2010) 第11卷



.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 6.144 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_permutation_tests_for_classification.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_permutation_tests_for_classification.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_permutation_tests_for_classification.ipynb <plot_permutation_tests_for_classification.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_permutation_tests_for_classification.py <plot_permutation_tests_for_classification.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_permutation_tests_for_classification.zip <plot_permutation_tests_for_classification.zip>`


.. include:: plot_permutation_tests_for_classification.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_