.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_grid_search_refit_callable.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_grid_search_refit_callable.py>`
        to download the full example code. or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_grid_search_refit_callable.py:


==================================================
平衡模型复杂性和交叉验证得分
==================================================

此示例通过在最佳准确性得分的1个标准差内找到一个不错的准确性，同时最小化PCA组件的数量来平衡模型复杂性和交叉验证得分[1]。

图中显示了交叉验证得分和PCA组件数量之间的权衡。平衡的情况是当n_components=10且accuracy=0.88时，这落在最佳准确性得分的1个标准差范围内。

[1] Hastie, T., Tibshirani, R., Friedman, J. (2001). 模型评估与选择. 统计学习的要素 (第219-260页). 纽约, 美国: 纽约施普林格公司.

.. GENERATED FROM PYTHON SOURCE LINES 13-114


.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_grid_search_refit_callable_001.png
   :alt: Balance model complexity and cross-validated score
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_grid_search_refit_callable_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    The best_index_ is 2
    The n_components selected is 10
    The corresponding accuracy score is 0.88


|

.. code-block:: Python


    # Author: Wenhao Zhang <wenhaoz@ucla.edu>

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC


    def lower_bound(cv_results):
        """计算最佳 `mean_test_scores` 在1个标准差内的下界。

    Parameters
    ----------
    cv_results : numpy(masked) ndarrays 的字典
        参见 `GridSearchCV` 的属性 cv_results_

    返回
    -------
    float
        最佳 `mean_test_score` 在1个标准差内的下界。
    """
        best_score_idx = np.argmax(cv_results["mean_test_score"])

        return (
            cv_results["mean_test_score"][best_score_idx]
            - cv_results["std_test_score"][best_score_idx]
        )


    def best_low_complexity(cv_results):
        """平衡模型复杂性与交叉验证得分。

    Parameters
    ----------
    cv_results : dict of numpy(masked) ndarrays
        请参阅 `GridSearchCV` 的属性 cv_results_。

    返回
    ------
    int
        返回一个模型的索引，该模型在测试得分在最佳 `mean_test_score` 的1个标准差内的同时，具有最少的PCA组件。
    """
        threshold = lower_bound(cv_results)
        candidate_idx = np.flatnonzero(cv_results["mean_test_score"] >= threshold)
        best_idx = candidate_idx[
            cv_results["param_reduce_dim__n_components"][candidate_idx].argmin()
        ]
        return best_idx


    pipe = Pipeline(
        [
            ("reduce_dim", PCA(random_state=42)),
            ("classify", LinearSVC(random_state=42, C=0.01)),
        ]
    )

    param_grid = {"reduce_dim__n_components": [6, 8, 10, 12, 14]}

    grid = GridSearchCV(
        pipe,
        cv=10,
        n_jobs=1,
        param_grid=param_grid,
        scoring="accuracy",
        refit=best_low_complexity,
    )
    X, y = load_digits(return_X_y=True)
    grid.fit(X, y)

    n_components = grid.cv_results_["param_reduce_dim__n_components"]
    test_scores = grid.cv_results_["mean_test_score"]

    plt.figure()
    plt.bar(n_components, test_scores, width=1.3, color="b")

    lower = lower_bound(grid.cv_results_)
    plt.axhline(np.max(test_scores), linestyle="--", color="y", label="Best score")
    plt.axhline(lower, linestyle="--", color=".5", label="Best score - 1 std")

    plt.title("Balance model complexity and cross-validated score")
    plt.xlabel("Number of PCA components used")
    plt.ylabel("Digit classification accuracy")
    plt.xticks(n_components.tolist())
    plt.ylim((0, 1.0))
    plt.legend(loc="upper left")

    best_index_ = grid.best_index_

    print("The best_index_ is %d" % best_index_)
    print("The n_components selected is %d" % n_components[best_index_])
    print(
        "The corresponding accuracy score is %.2f"
        % grid.cv_results_["mean_test_score"][best_index_]
    )
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.724 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_grid_search_refit_callable.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_grid_search_refit_callable.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_grid_search_refit_callable.ipynb <plot_grid_search_refit_callable.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_grid_search_refit_callable.py <plot_grid_search_refit_callable.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_grid_search_refit_callable.zip <plot_grid_search_refit_callable.zip>`


.. include:: plot_grid_search_refit_callable.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_