.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/preprocessing/plot_discretization.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_preprocessing_plot_discretization.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_preprocessing_plot_discretization.py:

================================================================
Using KBinsDiscretizer to discretize continuous features
================================================================

This example compares the predictions of linear regression (a linear model)
and a decision tree (a tree-based model) before and after discretizing a
real-valued feature.

As the result before discretization shows, a linear model is fast to fit and
relatively easy to interpret, but it can only model linear relationships,
whereas a decision tree can build a much more complex model of the data. One
way to make a linear model more powerful on continuous data is discretization
(also known as binning). In this example, we discretize the feature and
one-hot encode the transformed data. Note that if the bins are not reasonably
wide, the risk of overfitting increases substantially, so the discretizer
parameters should usually be tuned under cross-validation.

After discretization, linear regression and the decision tree make exactly the
same predictions. Because the features are constant within each bin, any model
must predict the same value for all points in a bin. Compared with the result
before discretization, the linear model becomes much more flexible, while the
decision tree becomes much less flexible. Note that binning features generally
has no beneficial effect for tree-based models, because these models can learn
to split the data anywhere.

.. GENERATED FROM PYTHON SOURCE LINES 13-75

.. image-sg:: /auto_examples/preprocessing/images/sphx_glr_plot_discretization_001.png
   :alt: Result before discretization, Result after discretization
   :srcset: /auto_examples/preprocessing/images/sphx_glr_plot_discretization_001.png
   :class: sphx-glr-single-img

.. code-block:: Python

    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.tree import DecisionTreeRegressor

    # construct the dataset
    rnd = np.random.RandomState(42)
    X = rnd.uniform(-3, 3, size=100)
    y = np.sin(X) + rnd.normal(size=len(X)) / 3
    X = X.reshape(-1, 1)

    # transform the dataset with KBinsDiscretizer
    enc = KBinsDiscretizer(n_bins=10, encode="onehot")
    X_binned = enc.fit_transform(X)

    # predict with the original dataset
    fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True, figsize=(10, 4))
    line = np.linspace(-3, 3, 1000, endpoint=False).reshape(-1, 1)
    reg = LinearRegression().fit(X, y)
    ax1.plot(line, reg.predict(line), linewidth=2, color="green", label="linear regression")
    reg = DecisionTreeRegressor(min_samples_split=3, random_state=0).fit(X, y)
    ax1.plot(line, reg.predict(line), linewidth=2, color="red", label="decision tree")
    ax1.plot(X[:, 0], y, "o", c="k")
    ax1.legend(loc="best")
    ax1.set_ylabel("Regression output")
    ax1.set_xlabel("Input feature")
    ax1.set_title("Result before discretization")

    # predict with the transformed dataset
    line_binned = enc.transform(line)
    reg = LinearRegression().fit(X_binned, y)
    ax2.plot(
        line,
        reg.predict(line_binned),
        linewidth=2,
        color="green",
        linestyle="-",
        label="linear regression",
    )
    reg = DecisionTreeRegressor(min_samples_split=3, random_state=0).fit(X_binned, y)
    ax2.plot(
        line,
        reg.predict(line_binned),
        linewidth=2,
        color="red",
        linestyle=":",
        label="decision tree",
    )
    ax2.plot(X[:, 0], y, "o", c="k")
    # mark the bin edges learned by the discretizer
    ax2.vlines(enc.bin_edges_[0], *plt.gca().get_ylim(), linewidth=1, alpha=0.2)
    ax2.legend(loc="best")
    ax2.set_xlabel("Input feature")
    ax2.set_title("Result after discretization")

    plt.tight_layout()
    plt.show()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.102 seconds)

.. _sphx_glr_download_auto_examples_preprocessing_plot_discretization.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/preprocessing/plot_discretization.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_discretization.ipynb <plot_discretization.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_discretization.py <plot_discretization.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_discretization.zip <plot_discretization.zip>`

.. include:: plot_discretization.recommendations

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
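
The advice to tune the discretizer under cross-validation could be sketched as
follows. This is a minimal illustration rather than part of the original
example: the candidate values for ``n_bins``, the 5-fold split, and the
mean-squared-error scoring are assumptions chosen for demonstration only.

.. code-block:: Python

    import numpy as np

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import KBinsDiscretizer

    # Rebuild the same synthetic dataset as in the example.
    rnd = np.random.RandomState(42)
    X = rnd.uniform(-3, 3, size=100).reshape(-1, 1)
    y = np.sin(X[:, 0]) + rnd.normal(size=len(X)) / 3

    # Chain binning (with one-hot encoding) and the linear model so that the
    # bin edges are re-learned on the training part of every fold.
    pipe = make_pipeline(KBinsDiscretizer(encode="onehot"), LinearRegression())

    # Hypothetical grid: these candidate bin counts are illustrative only.
    param_grid = {"kbinsdiscretizer__n_bins": [2, 5, 10, 20]}

    search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)

    best_n_bins = search.best_params_["kbinsdiscretizer__n_bins"]
    print("selected n_bins:", best_n_bins)

Placing the discretizer inside the pipeline keeps the bin edges from being
computed on held-out data, so the cross-validated score reflects how the
chosen number of bins generalizes.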