.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_ard.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end
        <sphx_glr_download_auto_examples_linear_model_plot_ard.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_ard.py:

====================================
Comparing Linear Bayesian Regressors
====================================

This example compares two different Bayesian regressors:

- :ref:`automatic_relevance_determination`
- :ref:`bayesian_ridge_regression`

In the first part, we use an :ref:`ordinary_least_squares` (OLS) model as a
baseline for comparing the models' coefficients with respect to the true
coefficients. Thereafter, we show that the estimation of such models is done
by iteratively maximizing the marginal log-likelihood of the observations.

In the last section we plot predictions and uncertainties for the ARD and the
Bayesian Ridge regressions using a polynomial feature expansion to fit a
non-linear relationship between `X` and `y`.

.. GENERATED FROM PYTHON SOURCE LINES 16-19

.. code-block:: Python


    # Author: Arturo Amor


.. GENERATED FROM PYTHON SOURCE LINES 20-27

Models robustness to recover the ground truth weights
=====================================================

Generate synthetic dataset
--------------------------

We generate a dataset where `X` and `y` are linearly linked: 10 of the
features of `X` will be used to generate `y`. The other features are not
useful at predicting `y`. In addition, we generate a dataset where
`n_samples == n_features`. Such a setting is challenging for an OLS model
and potentially leads to arbitrarily large weights. Having a prior on the
weights and a penalty alleviates the problem. Finally, Gaussian noise is
added.

.. GENERATED FROM PYTHON SOURCE LINES 27-39

.. code-block:: Python

    from sklearn.datasets import make_regression

    X, y, true_weights = make_regression(
        n_samples=100,
        n_features=100,
        n_informative=10,
        noise=8,
        coef=True,
        random_state=42,
    )

.. GENERATED FROM PYTHON SOURCE LINES 40-44

Fit the regressors
------------------

We now fit both Bayesian models and the OLS to later compare the models'
coefficients.

.. GENERATED FROM PYTHON SOURCE LINES 44-61

.. code-block:: Python

    import pandas as pd

    from sklearn.linear_model import ARDRegression, BayesianRidge, LinearRegression

    olr = LinearRegression().fit(X, y)
    brr = BayesianRidge(compute_score=True, max_iter=30).fit(X, y)
    ard = ARDRegression(compute_score=True, max_iter=30).fit(X, y)
    df = pd.DataFrame(
        {
            "Weights of true generative process": true_weights,
            "ARDRegression": ard.coef_,
            "BayesianRidge": brr.coef_,
            "LinearRegression": olr.coef_,
        }
    )

.. GENERATED FROM PYTHON SOURCE LINES 62-66

Plot the true and estimated coefficients
----------------------------------------

Now we compare the coefficients of each model with the weights of the true
generative model.

.. GENERATED FROM PYTHON SOURCE LINES 66-82

.. code-block:: Python

    import matplotlib.pyplot as plt
    import seaborn as sns
    from matplotlib.colors import SymLogNorm

    plt.figure(figsize=(10, 6))
    ax = sns.heatmap(
        df.T,
        norm=SymLogNorm(linthresh=10e-4, vmin=-80, vmax=80),
        cbar_kws={"label": "coefficients' values"},
        cmap="seismic_r",
    )
    plt.ylabel("linear model")
    plt.xlabel("coefficients")
    plt.tight_layout(rect=(0, 0, 1, 0.95))
    _ = plt.title("Models' coefficients")

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_ard_001.png
   :alt: Models' coefficients
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_ard_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 83-84

Due to the added noise, none of the models recover the true weights. Indeed,
all models always have more than 10 non-zero coefficients. Compared to the
OLS estimator, the coefficients using a Bayesian Ridge regression are
slightly shifted toward zero, which stabilises them. The ARD regression
provides a sparser solution: some of the non-informative coefficients are set
exactly to zero, while others are shifted closer to zero. Some
non-informative coefficients are still present and retain large values.

.. GENERATED FROM PYTHON SOURCE LINES 87-89

Plot the marginal log-likelihood
--------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 89-101

.. code-block:: Python

    import numpy as np

    ard_scores = -np.array(ard.scores_)
    brr_scores = -np.array(brr.scores_)
    plt.plot(ard_scores, color="navy", label="ARD")
    plt.plot(brr_scores, color="red", label="BayesianRidge")
    plt.ylabel("Log-likelihood")
    plt.xlabel("Iterations")
    plt.xlim(1, 30)
    plt.legend()
    _ = plt.title("Models log-likelihood")

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_ard_002.png
   :alt: Models log-likelihood
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_ard_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 102-110

Indeed, both models minimize the log-likelihood up to an arbitrary cutoff
defined by the `max_iter` parameter.
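Before moving on to the polynomial example, the sparsity observation from the
coefficients section can be checked numerically. The following is a minimal
sketch, not part of the original example, that reuses the estimators fitted
above; the `1e-3` threshold for treating a coefficient as practically zero is
an arbitrary choice.

.. code-block:: Python

    import numpy as np

    for name, coef in [
        ("LinearRegression", olr.coef_),
        ("BayesianRidge", brr.coef_),
        ("ARDRegression", ard.coef_),
    ]:
        # count coefficients that are not practically zero (arbitrary threshold)
        n_large = int(np.sum(np.abs(coef) > 1e-3))
        # mean squared distance to the weights of the true generative process
        mse = np.mean((coef - true_weights) ** 2)
        print(f"{name}: {n_large} coefficients above 1e-3, "
              f"MSE to true weights: {mse:.2f}")

In line with the heatmap, ARDRegression should report the fewest coefficients
above the threshold.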
Bayesian regressions with polynomial feature expansion
=======================================================

Generate synthetic dataset
--------------------------

We create a target that is a non-linear function of the input feature. Noise
following a standard normal distribution, scaled by a factor of 1.35, is
added.

.. GENERATED FROM PYTHON SOURCE LINES 110-132

.. code-block:: Python

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    rng = np.random.RandomState(0)
    n_samples = 110

    # sort the data to make plotting easier later
    X = np.sort(-10 * rng.rand(n_samples) + 10)
    noise = rng.normal(0, 1, n_samples) * 1.35
    y = np.sqrt(X) * np.sin(X) + noise
    full_data = pd.DataFrame({"input_feature": X, "target": y})
    X = X.reshape((-1, 1))

    # extrapolation
    X_plot = np.linspace(10, 10.4, 10)
    y_plot = np.sqrt(X_plot) * np.sin(X_plot)
    X_plot = np.concatenate((X, X_plot.reshape((-1, 1))))
    y_plot = np.concatenate((y - noise, y_plot))

.. GENERATED FROM PYTHON SOURCE LINES 133-137

Fit the regressors
------------------

Here we try a degree 10 polynomial to potentially overfit, though the
Bayesian linear models regularize the size of the polynomial coefficients. As
`fit_intercept=True` by default for
:class:`~sklearn.linear_model.ARDRegression` and
:class:`~sklearn.linear_model.BayesianRidge`,
:class:`~sklearn.preprocessing.PolynomialFeatures` should not introduce an
additional bias feature. By setting `return_std=True`, the Bayesian
regressors return the standard deviation of the posterior distribution for
the model parameters.

.. GENERATED FROM PYTHON SOURCE LINES 137-152

.. code-block:: Python

    ard_poly = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        ARDRegression(),
    ).fit(X, y)
    brr_poly = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        BayesianRidge(),
    ).fit(X, y)

    y_ard, y_ard_std = ard_poly.predict(X_plot, return_std=True)
    y_brr, y_brr_std = brr_poly.predict(X_plot, return_std=True)

.. GENERATED FROM PYTHON SOURCE LINES 153-155

Plotting polynomial regressions with std errors of the scores
--------------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 155-180

.. code-block:: Python

    ax = sns.scatterplot(
        data=full_data, x="input_feature", y="target", color="black", alpha=0.75
    )
    ax.plot(X_plot, y_plot, color="black", label="Ground Truth")
    ax.plot(X_plot, y_brr, color="red", label="BayesianRidge with polynomial features")
    ax.plot(X_plot, y_ard, color="navy", label="ARD with polynomial features")
    ax.fill_between(
        X_plot.ravel(),
        y_ard - y_ard_std,
        y_ard + y_ard_std,
        color="navy",
        alpha=0.3,
    )
    ax.fill_between(
        X_plot.ravel(),
        y_brr - y_brr_std,
        y_brr + y_brr_std,
        color="red",
        alpha=0.3,
    )
    ax.legend()
    _ = ax.set_title("Polynomial fit of a non-linear feature")

.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_ard_003.png
   :alt: Polynomial fit of a non-linear feature
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_ard_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 181-184

The error bars represent one standard deviation of the predicted Gaussian
distribution of the query points. Notice that the ARD regression captures the
ground truth the best when using the default parameters in both models, but
further reducing the `lambda_init` hyperparameter of the Bayesian Ridge can
reduce its bias (see example
:ref:`sphx_glr_auto_examples_linear_model_plot_bayesian_ridge_curvefit.py`).
Finally, due to the intrinsic limitations of a polynomial regression, both
models fail when extrapolating.
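As noted above, the `lambda_init` hyperparameter of
:class:`~sklearn.linear_model.BayesianRidge` can be lowered to reduce the
bias of the fit. The following is a minimal sketch of such a variant, not
part of the original example: the value `1e-3` is purely illustrative, and
`brr_poly_low_lambda`, `y_brr_low` and `y_brr_low_std` are hypothetical
names.

.. code-block:: Python

    # Sketch (assumption): refit the Bayesian Ridge pipeline with a smaller
    # initial precision of the weights, i.e. weaker initial regularization.
    # The value 1e-3 is illustrative, not taken from the original example.
    brr_poly_low_lambda = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        BayesianRidge(lambda_init=1e-3),
    ).fit(X, y)

    y_brr_low, y_brr_low_std = brr_poly_low_lambda.predict(X_plot, return_std=True)

Overlaying `y_brr_low` on the plot above would show how the weaker initial
regularization changes the Bayesian Ridge fit; see
:ref:`sphx_glr_auto_examples_linear_model_plot_bayesian_ridge_curvefit.py`
for a fuller treatment of this hyperparameter.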
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.747 seconds)


.. _sphx_glr_download_auto_examples_linear_model_plot_ard.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_ard.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_ard.ipynb <plot_ard.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_ard.py <plot_ard.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_ard.zip <plot_ard.zip>`

.. include:: plot_ard.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_