.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/model_selection/plot_underfitting_overfitting.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_model_selection_plot_underfitting_overfitting.py:

==============================
Underfitting vs. Overfitting
==============================

This example demonstrates the problems of underfitting and overfitting and
how we can use linear regression with polynomial features to approximate
nonlinear functions. The plot shows the function that we want to approximate,
which is a part of the cosine function, together with samples from the true
function and the approximations of models whose polynomial features have
different degrees. We can see that a linear function (a polynomial of
degree 1) is not sufficient to fit the training samples; this is called
**underfitting**. A polynomial of degree 4 approximates the true function
almost perfectly. For higher degrees, however, the model **overfits** the
training data, i.e. it learns the noise of the training data.

We evaluate **overfitting** / **underfitting** quantitatively with
cross-validation: we compute the mean squared error (MSE) on the validation
set. The higher the error, the less likely the model generalizes correctly
from the training data.

.. GENERATED FROM PYTHON SOURCE LINES 10-67

.. image-sg:: /auto_examples/model_selection/images/sphx_glr_plot_underfitting_overfitting_001.png
   :alt: Degree 1 MSE = 4.08e-01(+/- 4.25e-01), Degree 4 MSE = 4.32e-02(+/- 7.08e-02), Degree 15 MSE = 1.83e+08(+/- 5.48e+08)
   :srcset: /auto_examples/model_selection/images/sphx_glr_plot_underfitting_overfitting_001.png
   :class: sphx-glr-single-img

.. code-block:: Python

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures


    def true_fun(X):
        return np.cos(1.5 * np.pi * X)


    np.random.seed(0)

    n_samples = 30
    degrees = [1, 4, 15]

    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1

    plt.figure(figsize=(14, 5))
    for i in range(len(degrees)):
        ax = plt.subplot(1, len(degrees), i + 1)
        plt.setp(ax, xticks=(), yticks=())

        polynomial_features = PolynomialFeatures(degree=degrees[i], include_bias=False)
        linear_regression = LinearRegression()
        pipeline = Pipeline(
            [
                ("polynomial_features", polynomial_features),
                ("linear_regression", linear_regression),
            ]
        )
        pipeline.fit(X[:, np.newaxis], y)

        # Evaluate the models using cross-validation
        scores = cross_val_score(
            pipeline, X[:, np.newaxis], y, scoring="neg_mean_squared_error", cv=10
        )

        X_test = np.linspace(0, 1, 100)
        plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
        plt.plot(X_test, true_fun(X_test), label="True function")
        plt.scatter(X, y, edgecolor="b", s=20, label="Samples")
        plt.xlabel("x")
        plt.ylabel("y")
        plt.xlim((0, 1))
        plt.ylim((-2, 2))
        plt.legend(loc="best")
        plt.title(
            "Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
                degrees[i], -scores.mean(), scores.std()
            )
        )
    plt.show()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.102 seconds)
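``PolynomialFeatures`` is what turns the single input column into the powers
of ``x`` that ``LinearRegression`` then fits with one coefficient per column.
The snippet below is not part of the generated example; it is a minimal
sketch, using a few hypothetical sample values, of what that expansion looks
like for ``degree=4``:

.. code-block:: Python

    import numpy as np

    from sklearn.preprocessing import PolynomialFeatures

    # Three hypothetical values of the single feature x, as a column vector.
    x = np.array([[0.1], [0.5], [0.9]])

    # With include_bias=False each row becomes [x, x**2, x**3, x**4],
    # so the "linear" model fits one coefficient per power of x.
    print(PolynomialFeatures(degree=4, include_bias=False).fit_transform(x))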
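To locate where the fit passes from underfitting to overfitting without
inspecting the plots, one option (not used in the example above, but built on
the same data and pipeline) is ``validation_curve``, which repeats the
cross-validation for a whole range of degrees. A minimal sketch, assuming the
same random seed and noise level as in the example:

.. code-block:: Python

    import numpy as np

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import validation_curve
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import PolynomialFeatures


    def true_fun(X):
        return np.cos(1.5 * np.pi * X)


    # Same synthetic data as in the example above.
    np.random.seed(0)
    n_samples = 30
    X = np.sort(np.random.rand(n_samples))
    y = true_fun(X) + np.random.randn(n_samples) * 0.1

    pipeline = Pipeline(
        [
            ("polynomial_features", PolynomialFeatures(include_bias=False)),
            ("linear_regression", LinearRegression()),
        ]
    )

    # Cross-validated training and validation error for every degree 1..15.
    degrees = np.arange(1, 16)
    train_scores, valid_scores = validation_curve(
        pipeline,
        X[:, np.newaxis],
        y,
        param_name="polynomial_features__degree",
        param_range=degrees,
        scoring="neg_mean_squared_error",
        cv=10,
    )

    # High error on both splits suggests underfitting; a tiny training error
    # paired with a large validation error suggests overfitting.
    for degree, train_mse, valid_mse in zip(
        degrees, -train_scores.mean(axis=1), -valid_scores.mean(axis=1)
    ):
        print(f"degree={degree:2d}  train MSE={train_mse:.2e}  valid MSE={valid_mse:.2e}")

With the same data and the same 10-fold split, the validation MSE printed for
degrees 1, 4 and 15 should match the values reported in the figure titles
above.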