.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/ensemble/plot_gradient_boosting_regression.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regression.py: ============================ 梯度提升回归 ============================ 本示例演示了如何使用梯度提升从一组弱预测模型中生成一个预测模型。梯度提升可以用于回归和分类问题。在这里,我们将训练一个模型来解决糖尿病回归任务。我们将使用最小二乘损失和500棵深度为4的回归树,从:class:`~sklearn.ensemble.GradientBoostingRegressor` 中获得结果。 注意:对于较大的数据集(n_samples >= 10000),请参考:class:`~sklearn.ensemble.HistGradientBoostingRegressor` 。有关展示:class:`~ensemble.HistGradientBoostingRegressor` 其他优势的示例,请参见:ref:`sphx_glr_auto_examples_ensemble_plot_hgbt_regression.py` 。 .. GENERATED FROM PYTHON SOURCE LINES 11-23 .. code-block:: Python # 作者:scikit-learn 开发者 # SPDX-License-Identifier: BSD-3-Clause import matplotlib.pyplot as plt import numpy as np from sklearn import datasets, ensemble from sklearn.inspection import permutation_importance from sklearn.metrics import mean_squared_error from sklearn.model_selection import train_test_split .. GENERATED FROM PYTHON SOURCE LINES 24-28 加载数据 ------------------------------------- 首先我们需要加载数据。 .. GENERATED FROM PYTHON SOURCE LINES 28-32 .. code-block:: Python diabetes = datasets.load_diabetes() X, y = diabetes.data, diabetes.target .. GENERATED FROM PYTHON SOURCE LINES 33-47 数据预处理 ------------------------------------- 接下来,我们将拆分数据集,使用90%进行训练,剩余部分用于测试。我们还将设置回归模型参数。你可以调整这些参数,观察结果的变化。 `n_estimators` :将执行的提升阶段的数量。稍后,我们将绘制偏差与提升迭代次数的关系图。 `max_depth` :限制树中的节点数量。最佳值取决于输入变量的相互作用。 `min_samples_split` :拆分内部节点所需的最小样本数量。 `learning_rate` : 每棵树的贡献将缩减多少。 `loss` :要优化的损失函数。在这种情况下使用的是最小二乘函数,不过还有许多其他选项(参见::class:`~sklearn.ensemble.GradientBoostingRegressor` )。 .. GENERATED FROM PYTHON SOURCE LINES 47-60 .. code-block:: Python X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.1, random_state=13 ) params = { "n_estimators": 500, "max_depth": 4, "min_samples_split": 5, "learning_rate": 0.01, "loss": "squared_error", } .. GENERATED FROM PYTHON SOURCE LINES 61-65 拟合回归模型 -------------------- 现在我们将初始化梯度提升回归器并用我们的训练数据进行拟合。我们还将查看测试数据的均方误差。 .. GENERATED FROM PYTHON SOURCE LINES 65-72 .. code-block:: Python reg = ensemble.GradientBoostingRegressor(**params) reg.fit(X_train, y_train) mse = mean_squared_error(y_test, reg.predict(X_test)) print("The mean squared error (MSE) on test set: {:.4f}".format(mse)) .. rst-class:: sphx-glr-script-out .. code-block:: none The mean squared error (MSE) on test set: 3027.4544 .. GENERATED FROM PYTHON SOURCE LINES 73-77 绘制训练偏差图 ----------------- 最后,我们将对结果进行可视化。为此,我们将首先计算测试集偏差,然后将其与提升迭代次数进行对比绘图。 .. GENERATED FROM PYTHON SOURCE LINES 77-100 .. code-block:: Python test_score = np.zeros((params["n_estimators"],), dtype=np.float64) for i, y_pred in enumerate(reg.staged_predict(X_test)): test_score[i] = mean_squared_error(y_test, y_pred) fig = plt.figure(figsize=(6, 6)) plt.subplot(1, 1, 1) plt.title("Deviance") plt.plot( np.arange(params["n_estimators"]) + 1, reg.train_score_, "b-", label="Training Set Deviance", ) plt.plot( np.arange(params["n_estimators"]) + 1, test_score, "r-", label="Test Set Deviance" ) plt.legend(loc="upper right") plt.xlabel("Boosting Iterations") plt.ylabel("Deviance") fig.tight_layout() plt.show() .. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_regression_001.png :alt: Deviance :srcset: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_regression_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 101-108 绘制特征重要性 ----------------------- .. warning:: 注意,对于 **高基数** 特征(具有许多唯一值),基于杂质的特征重要性可能会产生误导。作为替代,可以在保留的测试集上计算 ``reg`` 的置换重要性。更多详情请参见 :ref:`permutation_importance` 。 对于这个例子,基于不纯度的方法和置换方法识别出了相同的两个强预测特征,但顺序不同。第三个最具预测性的特征 "bp" 对于这两种方法也是相同的。其余特征的预测性较低,并且置换图的误差条显示它们与0重叠。 .. GENERATED FROM PYTHON SOURCE LINES 108-131 .. code-block:: Python feature_importance = reg.feature_importances_ sorted_idx = np.argsort(feature_importance) pos = np.arange(sorted_idx.shape[0]) + 0.5 fig = plt.figure(figsize=(12, 6)) plt.subplot(1, 2, 1) plt.barh(pos, feature_importance[sorted_idx], align="center") plt.yticks(pos, np.array(diabetes.feature_names)[sorted_idx]) plt.title("Feature Importance (MDI)") result = permutation_importance( reg, X_test, y_test, n_repeats=10, random_state=42, n_jobs=2 ) sorted_idx = result.importances_mean.argsort() plt.subplot(1, 2, 2) plt.boxplot( result.importances[sorted_idx].T, vert=False, labels=np.array(diabetes.feature_names)[sorted_idx], ) plt.title("Permutation Importance (test set)") fig.tight_layout() plt.show() .. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_regression_002.png :alt: Feature Importance (MDI), Permutation Importance (test set) :srcset: /auto_examples/ensemble/images/sphx_glr_plot_gradient_boosting_regression_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none /app/scikit-learn-main-origin/examples/ensemble/plot_gradient_boosting_regression.py:123: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 1.509 seconds) .. _sphx_glr_download_auto_examples_ensemble_plot_gradient_boosting_regression.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_gradient_boosting_regression.ipynb :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_gradient_boosting_regression.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_gradient_boosting_regression.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_gradient_boosting_regression.zip ` .. include:: plot_gradient_boosting_regression.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_