.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_successive_halving_iterations.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_model_selection_plot_successive_halving_iterations.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_successive_halving_iterations.py:


Successive Halving Iterations
=============================

This example illustrates how a successive halving search
(:class:`~sklearn.model_selection.HalvingGridSearchCV` and
:class:`~sklearn.model_selection.HalvingRandomSearchCV`) iteratively selects
the best parameter combination from multiple candidates.

.. GENERATED FROM PYTHON SOURCE LINES 8-19

.. code-block:: Python

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy.stats import randint

    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.experimental import enable_halving_search_cv  # noqa
    from sklearn.model_selection import HalvingRandomSearchCV

.. GENERATED FROM PYTHON SOURCE LINES 20-21

We first define the parameter space and train a
:class:`~sklearn.model_selection.HalvingRandomSearchCV` instance.

.. GENERATED FROM PYTHON SOURCE LINES 21-42

.. code-block:: Python

    rng = np.random.RandomState(0)

    X, y = datasets.make_classification(n_samples=400, n_features=12, random_state=rng)

    clf = RandomForestClassifier(n_estimators=20, random_state=rng)

    # Lists are sampled uniformly; randint(low, high) draws integers in [low, high).
    param_dist = {
        "max_depth": [3, None],
        "max_features": randint(1, 6),
        "min_samples_split": randint(2, 11),
        "bootstrap": [True, False],
        "criterion": ["gini", "entropy"],
    }

    # factor=2: keep half of the candidates and double the resources per iteration.
    rsh = HalvingRandomSearchCV(
        estimator=clf, param_distributions=param_dist, factor=2, random_state=rng
    )
    rsh.fit(X, y)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    HalvingRandomSearchCV(estimator=RandomForestClassifier(n_estimators=20,
                                                           random_state=RandomState(MT19937) at 0xFFFFA3943E40),
                          factor=2,
                          param_distributions={'bootstrap': [True, False],
                                               'criterion': ['gini', 'entropy'],
                                               'max_depth': [3, None],
                                               'max_features': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0xffff4c3fa710>,
                                               'min_samples_split': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0xffff400ceb10>},
                          random_state=RandomState(MT19937) at 0xFFFFA3943E40)


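With ``factor=2``, a successive halving search keeps roughly the best half of
the candidates at each iteration and doubles the number of samples they are
evaluated on. The following is a minimal pure-Python sketch of such a schedule.
It is not scikit-learn's implementation: the function name
``halving_schedule`` and the starting values (16 candidates, 50 samples, a cap
of 400 samples) are assumptions chosen only for illustration.

.. code-block:: Python

    from math import ceil


    def halving_schedule(n_candidates, min_resources, max_resources, factor=2):
        """Return a list of (iteration, n_candidates, n_resources) tuples.

        Hypothetical helper: illustrates the halving idea only.
        """
        schedule = []
        n_resources = min_resources
        iteration = 0
        while True:
            schedule.append((iteration, n_candidates, n_resources))
            # Stop once at most `factor` candidates remain, or when growing
            # the resources again would exceed the available budget.
            if n_candidates <= factor or n_resources * factor > max_resources:
                break
            n_candidates = ceil(n_candidates / factor)  # keep the best fraction
            n_resources *= factor  # evaluate survivors on more samples
            iteration += 1
        return schedule


    # Illustrative values only (assumed, not taken from the fitted search).
    for iteration, n_cand, n_res in halving_schedule(16, 50, 400):
        print(f"iter={iteration}  n_candidates={n_cand}  n_samples={n_res}")

The values actually used by a fitted search are exposed as
``rsh.n_candidates_`` and ``rsh.n_resources_``, and may differ from this
simplified schedule.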
.. GENERATED FROM PYTHON SOURCE LINES 43-44

We can now use the ``cv_results_`` attribute of the search estimator to inspect
and plot the evolution of the search.

.. GENERATED FROM PYTHON SOURCE LINES 44-67

.. code-block:: Python

    results = pd.DataFrame(rsh.cv_results_)
    # Use the string form of each parameter dict as a hashable candidate key.
    results["params_str"] = results.params.apply(str)
    results.drop_duplicates(subset=("params_str", "iter"), inplace=True)
    # One row per iteration, one column per candidate.
    mean_scores = results.pivot(
        index="iter", columns="params_str", values="mean_test_score"
    )
    ax = mean_scores.plot(legend=False, alpha=0.6)

    labels = [
        f"iter={i}\nn_samples={rsh.n_resources_[i]}\nn_candidates={rsh.n_candidates_[i]}"
        for i in range(rsh.n_iterations_)
    ]

    ax.set_xticks(range(rsh.n_iterations_))
    ax.set_xticklabels(labels, rotation=45, multialignment="left")
    ax.set_title("Scores of candidates over iterations")
    ax.set_ylabel("mean test score", fontsize=15)
    ax.set_xlabel("iterations", fontsize=15)
    plt.tight_layout()
    plt.show()

.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png
   :alt: Scores of candidates over iterations
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_successive_halving_iterations_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 68-76

Number of candidates and amount of resources at each iteration
--------------------------------------------------------------

At the first iteration, a small amount of resources is used. The resource here
is the number of samples the estimators are trained on. All candidates are
evaluated.

At the second iteration, only the best half of the candidates is evaluated.
The amount of allocated resources is doubled: candidates are evaluated on
twice as many samples.

This process is repeated until the last iteration, where only 2 candidates
are left. The best candidate is the one with the highest score at the last
iteration.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.333 seconds)


.. _sphx_glr_download_auto_examples_model_selection_plot_successive_halving_iterations.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/model_selection/plot_successive_halving_iterations.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_successive_halving_iterations.ipynb <plot_successive_halving_iterations.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_successive_halving_iterations.py <plot_successive_halving_iterations.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_successive_halving_iterations.zip <plot_successive_halving_iterations.zip>`


.. include:: plot_successive_halving_iterations.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_