.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_lasso_lars_ic.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_linear_model_plot_lasso_lars_ic.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_lasso_lars_ic.py:


==============================================
Lasso model selection via information criteria
==============================================

This example reproduces the example of Fig. 2 of [ZHT2007]_. A
:class:`~sklearn.linear_model.LassoLarsIC` estimator is fit on the diabetes
dataset, and the AIC and BIC criteria are used to select the best model.

.. NOTE::
    It is important to note that the optimization to find `alpha` with
    :class:`~sklearn.linear_model.LassoLarsIC` relies on the AIC or BIC
    criteria computed in-sample, that is, directly on the training set. This
    approach differs from the cross-validation procedure. For a comparison of
    the two approaches, you can refer to the following example:
    :ref:`sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py`.

.. rubric:: References

.. [ZHT2007] :arxiv:`Zou, Hui, Trevor Hastie, and Robert Tibshirani.
    "On the degrees of freedom of the lasso."
    The Annals of Statistics 35.5 (2007): 2173-2192.
    <0712.0881>`

.. GENERATED FROM PYTHON SOURCE LINES 20-24

.. code-block:: Python


    # Author: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 25-26

We will use the diabetes dataset.

.. GENERATED FROM PYTHON SOURCE LINES 26-33

.. code-block:: Python


    from sklearn.datasets import load_diabetes

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    n_samples = X.shape[0]
    X.head()

.. raw:: html

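    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th><th>age</th><th>sex</th><th>bmi</th><th>bp</th><th>s1</th><th>s2</th><th>s3</th><th>s4</th><th>s5</th><th>s6</th>
        </tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>0.038076</td><td>0.050680</td><td>0.061696</td><td>0.021872</td><td>-0.044223</td><td>-0.034821</td><td>-0.043401</td><td>-0.002592</td><td>0.019907</td><td>-0.017646</td></tr>
        <tr><th>1</th><td>-0.001882</td><td>-0.044642</td><td>-0.051474</td><td>-0.026328</td><td>-0.008449</td><td>-0.019163</td><td>0.074412</td><td>-0.039493</td><td>-0.068332</td><td>-0.092204</td></tr>
        <tr><th>2</th><td>0.085299</td><td>0.050680</td><td>0.044451</td><td>-0.005670</td><td>-0.045599</td><td>-0.034194</td><td>-0.032356</td><td>-0.002592</td><td>0.002861</td><td>-0.025930</td></tr>
        <tr><th>3</th><td>-0.089063</td><td>-0.044642</td><td>-0.011595</td><td>-0.036656</td><td>0.012191</td><td>0.024991</td><td>-0.036038</td><td>0.034309</td><td>0.022688</td><td>-0.009362</td></tr>
        <tr><th>4</th><td>0.005383</td><td>-0.044642</td><td>-0.036385</td><td>0.021872</td><td>0.003935</td><td>0.015596</td><td>0.008142</td><td>-0.002592</td><td>-0.031988</td><td>-0.046641</td></tr>
      </tbody>
    </table>
    </div>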


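The feature values printed above are small and centered around zero. According to the `load_diabetes` documentation, with the default `scaled=True` each feature is mean-centered and divided by its standard deviation times the square root of `n_samples`, so every column should have a sum of squares of 1. The short sketch below checks this property; it reloads the data as plain NumPy arrays (no `as_frame`) so that it runs on its own, and the names `max_abs_mean` and `col_sum_squares` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Reload the same data as NumPy arrays so this snippet is self-contained.
X, y = load_diabetes(return_X_y=True)

# Largest absolute column mean: close to zero if the features are centered.
max_abs_mean = np.abs(X.mean(axis=0)).max()

# Per-column sum of squares: close to one if each column has unit L2 norm.
col_sum_squares = (X**2).sum(axis=0)

print(X.shape)
print(max_abs_mean)
print(col_sum_squares.round(6))
```

Scaling with `StandardScaler`, as done when fitting the model, rescales each column to unit variance instead of unit norm.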
.. GENERATED FROM PYTHON SOURCE LINES 34-37

Scikit-learn provides an estimator called
:class:`~sklearn.linear_model.LassoLarsIC` that uses either Akaike's
information criterion (AIC) or the Bayesian information criterion (BIC) to
select the best model. Before fitting this model, we will scale the dataset.

In the following, we are going to fit two models to compare the values
reported by AIC and BIC.

.. GENERATED FROM PYTHON SOURCE LINES 37-44

.. code-block:: Python


    from sklearn.linear_model import LassoLarsIC
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    lasso_lars_ic = make_pipeline(StandardScaler(), LassoLarsIC(criterion="aic")).fit(X, y)

.. GENERATED FROM PYTHON SOURCE LINES 45-46

To be in line with the definition in [ZHT2007]_, we need to rescale the AIC
and the BIC. Indeed, Zou et al. ignore some constant terms compared to the
original definition of the AIC derived from the maximum log-likelihood of a
linear model. Concretely, the function below subtracts
``n_samples * log(2 * pi * noise_variance)`` and ``n_samples`` from the
criterion reported by scikit-learn. You can refer to the
:ref:`mathematical detail section of the User Guide `.

.. GENERATED FROM PYTHON SOURCE LINES 46-52

.. code-block:: Python


    import numpy as np


    def zou_et_al_criterion_rescaling(criterion, n_samples, noise_variance):
        """Rescale the information criterion to follow the definition of Zou et al."""
        return criterion - n_samples * np.log(2 * np.pi * noise_variance) - n_samples

.. GENERATED FROM PYTHON SOURCE LINES 53-65

.. code-block:: Python


    import numpy as np

    # Rescale the AIC values computed along the regularization path.
    aic_criterion = zou_et_al_criterion_rescaling(
        lasso_lars_ic[-1].criterion_,
        n_samples,
        lasso_lars_ic[-1].noise_variance_,
    )

    # Position, on the path of alphas, of the alpha selected by the AIC.
    index_alpha_path_aic = np.flatnonzero(
        lasso_lars_ic[-1].alphas_ == lasso_lars_ic[-1].alpha_
    )[0]

.. GENERATED FROM PYTHON SOURCE LINES 66-78

.. code-block:: Python


    # Refit the same pipeline, this time selecting alpha with the BIC.
    lasso_lars_ic.set_params(lassolarsic__criterion="bic").fit(X, y)

    bic_criterion = zou_et_al_criterion_rescaling(
        lasso_lars_ic[-1].criterion_,
        n_samples,
        lasso_lars_ic[-1].noise_variance_,
    )

    # Position, on the path of alphas, of the alpha selected by the BIC.
    index_alpha_path_bic = np.flatnonzero(
        lasso_lars_ic[-1].alphas_ == lasso_lars_ic[-1].alpha_
    )[0]

.. GENERATED FROM PYTHON SOURCE LINES 79-80

Now that we have collected the AIC and BIC, we can also check that the minima
of both criteria occur at the same alpha. This lets us simplify the plot below.

.. GENERATED FROM PYTHON SOURCE LINES 80-83

.. code-block:: Python


    index_alpha_path_aic == index_alpha_path_bic


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    np.True_

.. GENERATED FROM PYTHON SOURCE LINES 84-85

Finally, we can plot the AIC and BIC criteria together with the selected
regularization parameter.

.. GENERATED FROM PYTHON SOURCE LINES 85-102

.. code-block:: Python


    import matplotlib.pyplot as plt

    plt.plot(aic_criterion, color="tab:blue", marker="o", label="AIC criterion")
    plt.plot(bic_criterion, color="tab:orange", marker="o", label="BIC criterion")
    plt.vlines(
        index_alpha_path_bic,
        aic_criterion.min(),
        aic_criterion.max(),
        color="black",
        linestyle="--",
        label="Selected alpha",
    )
    plt.legend()
    plt.ylabel("Information criterion")
    plt.xlabel("Lasso model sequence")
    _ = plt.title("Lasso model selection via AIC and BIC")



.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_lars_ic_001.png
   :alt: Lasso model selection via AIC and BIC
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_lars_ic_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.045 seconds)


.. _sphx_glr_download_auto_examples_linear_model_plot_lasso_lars_ic.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_lasso_lars_ic.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lasso_lars_ic.ipynb <plot_lasso_lars_ic.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lasso_lars_ic.py <plot_lasso_lars_ic.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_lasso_lars_ic.zip <plot_lasso_lars_ic.zip>`


.. include:: plot_lasso_lars_ic.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_