.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/gaussian_process/plot_gpr_noisy.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code. or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_gaussian_process_plot_gpr_noisy.py:


=========================================================================
Ability of Gaussian process regression (GPR) to estimate data noise-level
=========================================================================

This example shows how
:class:`~sklearn.gaussian_process.kernels.WhiteKernel` can estimate the noise
level in the data. It also illustrates why the initialization of the kernel
hyperparameters matters.

.. GENERATED FROM PYTHON SOURCE LINES 8-12

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause


.. GENERATED FROM PYTHON SOURCE LINES 13-17

Data generation
---------------

We work in a setting where `X` contains a single feature. We create a
function that generates the target to be predicted, with an option to add
Gaussian noise to the generated target.

.. GENERATED FROM PYTHON SOURCE LINES 17-28

.. code-block:: Python

    import numpy as np


    def target_generator(X, add_noise=False):
        # Noise-free signal: a sine wave shifted upwards by 0.5.
        target = 0.5 + np.sin(3 * X)
        if add_noise:
            # Gaussian noise with a standard deviation of 0.3 and a fixed seed.
            rng = np.random.RandomState(1)
            target += rng.normal(0, 0.3, size=target.shape)
        return target.squeeze()


.. GENERATED FROM PYTHON SOURCE LINES 29-30

Let's look at the target generator without any noise, to see the signal that
we would like to predict.

.. GENERATED FROM PYTHON SOURCE LINES 30-34

.. code-block:: Python

    X = np.linspace(0, 5, num=30).reshape(-1, 1)
    y = target_generator(X, add_noise=False)


.. GENERATED FROM PYTHON SOURCE LINES 35-42

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.plot(X, y, label="Expected signal")
    plt.legend()
    plt.xlabel("X")
    _ = plt.ylabel("y")


.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_001.png
   :alt: plot gpr noisy
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 43-44

The target is obtained by transforming the input `X` with a sine function.
Now, we generate a few noisy training samples. To illustrate the noise level,
we plot the true signal together with the noisy training samples.

.. GENERATED FROM PYTHON SOURCE LINES 44-49

.. code-block:: Python

    rng = np.random.RandomState(0)
    X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
    y_train = target_generator(X_train, add_noise=True)


.. GENERATED FROM PYTHON SOURCE LINES 50-62

.. code-block:: Python

    plt.plot(X, y, label="Expected signal")
    plt.scatter(
        x=X_train[:, 0],
        y=y_train,
        color="black",
        alpha=0.4,
        label="Observations",
    )
    plt.legend()
    plt.xlabel("X")
    _ = plt.ylabel("y")


.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_002.png
   :alt: plot gpr noisy
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 63-71

Optimisation of kernel hyperparameters in GPR
---------------------------------------------

Now, we create a
:class:`~sklearn.gaussian_process.GaussianProcessRegressor` using an additive
kernel that sums an :class:`~sklearn.gaussian_process.kernels.RBF` kernel and
a :class:`~sklearn.gaussian_process.kernels.WhiteKernel`. The
:class:`~sklearn.gaussian_process.kernels.WhiteKernel` estimates the amount
of noise present in the data, while the
:class:`~sklearn.gaussian_process.kernels.RBF` fits the non-linear
relationship between the data and the target.

However, we will show that the hyperparameter space contains several local
minima, which highlights the importance of the initial hyperparameter values.

We first create a model using a kernel initialized with a high noise level
and a large length scale, which tends to explain all variations in the data
by noise.

.. GENERATED FROM PYTHON SOURCE LINES 71-81

.. code-block:: Python

    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    kernel = 1.0 * RBF(length_scale=1e1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(
        noise_level=1, noise_level_bounds=(1e-5, 1e1)
    )
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0)
    gpr.fit(X_train, y_train)
    y_mean, y_std = gpr.predict(X, return_std=True)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /app/scikit-learn-main-origin/sklearn/gaussian_process/kernels.py:431: ConvergenceWarning: The optimal value found for dimension 0 of parameter k1__k2__length_scale is close to the specified upper bound 1000.0. Increasing the bound and calling fit again may find a better value.


.. GENERATED FROM PYTHON SOURCE LINES 82-95

.. code-block:: Python

    plt.plot(X, y, label="Expected signal")
    plt.scatter(x=X_train[:, 0], y=y_train, color="black", alpha=0.4, label="Observations")
    plt.errorbar(X, y_mean, y_std)
    plt.legend()
    plt.xlabel("X")
    plt.ylabel("y")
    _ = plt.title(
        (
            f"Initial: {kernel}\nOptimum: {gpr.kernel_}\nLog-Marginal-Likelihood: "
            f"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}"
        ),
        fontsize=8,
    )


.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_003.png
   :alt: Initial: 1**2 * RBF(length_scale=10) + WhiteKernel(noise_level=1) Optimum: 0.763**2 * RBF(length_scale=1e+03) + WhiteKernel(noise_level=0.525) Log-Marginal-Likelihood: -23.499266455424184
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 96-99

We see that the optimum kernel found still has a high noise level and an even
larger length scale: the length scale ends up at its upper bound of `1e3`,
which is what the `ConvergenceWarning` above reports. Furthermore, the model
does not provide faithful predictions.

Now, we initialize the :class:`~sklearn.gaussian_process.kernels.RBF` with a
smaller `length_scale` and the
:class:`~sklearn.gaussian_process.kernels.WhiteKernel` with a smaller initial
noise level and a smaller noise level lower bound.

.. GENERATED FROM PYTHON SOURCE LINES 99-106

.. code-block:: Python

    kernel = 1.0 * RBF(length_scale=1e-1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(
        noise_level=1e-2, noise_level_bounds=(1e-10, 1e1)
    )
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0)
    gpr.fit(X_train, y_train)
    y_mean, y_std = gpr.predict(X, return_std=True)


.. GENERATED FROM PYTHON SOURCE LINES 107-121

.. code-block:: Python

    plt.plot(X, y, label="Expected signal")
    plt.scatter(x=X_train[:, 0], y=y_train, color="black", alpha=0.4, label="Observations")
    plt.errorbar(X, y_mean, y_std)
    plt.legend()
    plt.xlabel("X")
    plt.ylabel("y")
    _ = plt.title(
        (
            f"Initial: {kernel}\nOptimum: {gpr.kernel_}\nLog-Marginal-Likelihood: "
            f"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}"
        ),
        fontsize=8,
    )


.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_004.png
   :alt: Initial: 1**2 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01) Optimum: 1.05**2 * RBF(length_scale=0.569) + WhiteKernel(noise_level=0.134) Log-Marginal-Likelihood: -18.429732528984054
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 122-127

First, we see that this model's predictions are more accurate than those of
the previous model: the new model is able to estimate the noise-free
functional relationship.

Looking at the kernel hyperparameters, the best combination found has a
smaller noise level (0.134 versus 0.525) and a shorter length scale (0.569
versus 1e3) than the first model, together with a higher
log-marginal-likelihood (about -18.4 versus -23.5).

We can inspect the log-marginal-likelihood (LML) of
:class:`~sklearn.gaussian_process.GaussianProcessRegressor` over a grid of
hyperparameters to get a sense of the local minima. The plot below shows the
negative LML, so local minima of the plotted surface correspond to local
maxima of the LML.

.. GENERATED FROM PYTHON SOURCE LINES 127-141

.. code-block:: Python

    from matplotlib.colors import LogNorm

    length_scale = np.logspace(-2, 4, num=50)
    noise_level = np.logspace(-2, 1, num=50)
    length_scale_grid, noise_level_grid = np.meshgrid(length_scale, noise_level)

    # theta holds the log-transformed hyperparameters; the constant kernel value
    # is held fixed at 0.36 while the length scale and noise level vary.
    log_marginal_likelihood = [
        gpr.log_marginal_likelihood(theta=np.log([0.36, scale, noise]))
        for scale, noise in zip(length_scale_grid.ravel(), noise_level_grid.ravel())
    ]
    # The shape is passed positionally because the `newshape` keyword is
    # deprecated in recent NumPy releases.
    log_marginal_likelihood = np.reshape(
        log_marginal_likelihood, noise_level_grid.shape
    )


.. GENERATED FROM PYTHON SOURCE LINES 142-159

.. code-block:: Python

    # Contour levels are log-spaced over the negative LML, capped at 50.
    vmin, vmax = (-log_marginal_likelihood).min(), 50
    level = np.around(np.logspace(np.log10(vmin), np.log10(vmax), num=50), decimals=1)
    plt.contour(
        length_scale_grid,
        noise_level_grid,
        -log_marginal_likelihood,
        levels=level,
        norm=LogNorm(vmin=vmin, vmax=vmax),
    )
    plt.colorbar()
    plt.xscale("log")
    plt.yscale("log")
    plt.xlabel("Length-scale")
    plt.ylabel("Noise-level")
    plt.title("Log-marginal-likelihood")
    plt.show()


.. image-sg:: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_005.png
   :alt: Log-marginal-likelihood
   :srcset: /auto_examples/gaussian_process/images/sphx_glr_plot_gpr_noisy_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 160-161

We see two local minima, which correspond to the two hyperparameter
combinations found previously. Depending on the initial hyperparameter
values, the gradient-based optimization may or may not converge to the best
model. It is therefore important to repeat the optimization several times
with different initializations.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.993 seconds)


.. _sphx_glr_download_auto_examples_gaussian_process_plot_gpr_noisy.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/gaussian_process/plot_gpr_noisy.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_gpr_noisy.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_gpr_noisy.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_gpr_noisy.zip `


.. include:: plot_gpr_noisy.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
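
The advice to repeat the optimization from several initializations can be
sketched with the `n_restarts_optimizer` parameter of
:class:`~sklearn.gaussian_process.GaussianProcessRegressor`, which reruns the
optimizer from starting points sampled within the kernel bounds and keeps the
fit with the highest log-marginal-likelihood. The snippet below is a minimal
sketch and not part of the original example: it rebuilds the training data so
that it runs on its own, and the helper `make_kernel`, the number of restarts
(10) and `random_state=0` are illustrative assumptions.

.. code-block:: Python

    import numpy as np

    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Rebuild the noisy training set used in this example.
    rng = np.random.RandomState(0)
    X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
    noise = np.random.RandomState(1).normal(0, 0.3, size=(20, 1))
    y_train = (0.5 + np.sin(3 * X_train) + noise).squeeze()


    def make_kernel():
        # Hypothetical helper: the "bad" initialization from the first model
        # (high noise level, large length scale).
        return 1.0 * RBF(
            length_scale=1e1, length_scale_bounds=(1e-2, 1e3)
        ) + WhiteKernel(noise_level=1, noise_level_bounds=(1e-5, 1e1))


    # A single optimizer run, started from the initial hyperparameters only.
    single = GaussianProcessRegressor(kernel=make_kernel(), alpha=0.0)
    single.fit(X_train, y_train)

    # The same initialization, plus 10 extra runs from sampled starting points.
    restarted = GaussianProcessRegressor(
        kernel=make_kernel(), alpha=0.0, n_restarts_optimizer=10, random_state=0
    )
    restarted.fit(X_train, y_train)

    print("single start:", single.kernel_, single.log_marginal_likelihood_value_)
    print("with restarts:", restarted.kernel_, restarted.log_marginal_likelihood_value_)

Because the first run always starts from the kernel's initial
hyperparameters, the restarted model's LML should not be lower than the
single-start one. Whether it reaches the better optimum depends on the
sampled starting points, so comparing the two printed kernels is a quick
check.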