.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/linear_model/plot_lasso_and_elasticnet.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_linear_model_plot_lasso_and_elasticnet.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_linear_model_plot_lasso_and_elasticnet.py:

==================================
L1-based models for Sparse Signals
==================================

The present example compares three l1-based regression models on a synthetic
signal obtained from sparse and correlated features that are further corrupted
with additive Gaussian noise:

- a :ref:`lasso`;
- an :ref:`automatic_relevance_determination`;
- an :ref:`elastic_net`.

The Lasso estimates are known to approach the model selection estimates as the
data dimension grows, provided that the irrelevant variables are not too
correlated with the relevant ones. In the presence of correlated features,
Lasso itself cannot select the correct sparsity pattern [1]_.

Here we compare the performance of the three models in terms of the :math:`R^2`
score, the fitting time and the sparsity of the estimated coefficients when
compared with the ground truth.

.. GENERATED FROM PYTHON SOURCE LINES 16-19

.. code-block:: Python


    # Author: Arturo Amor <david-arturo.amor-quiroz@inria.fr>


.. GENERATED FROM PYTHON SOURCE LINES 20-26

Generate synthetic dataset
--------------------------

We generate a dataset where the number of samples is lower than the total
number of features. This leads to an underdetermined system, i.e. the solution
is not unique, and thus we cannot apply ordinary least squares by itself.
Regularization introduces a penalty term to the objective function, which
modifies the optimization problem and can help alleviate the underdetermined
nature of the system.

The target `y` is a linear combination with alternating signs of sinusoidal
signals. Only the 10 lowest out of the 100 frequencies in `X` are used to
generate `y`, while the rest of the features are not informative. This results
in a high-dimensional sparse feature space, where some degree of
l1-penalization is necessary.

.. GENERATED FROM PYTHON SOURCE LINES 26-43

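
To make the non-uniqueness of an underdetermined system concrete, here is a
minimal sketch on hypothetical random data (not the dataset of this example).
It builds two different coefficient vectors that both fit the same system
exactly. All names ending in ``_demo``, as well as ``coef_a`` and ``coef_b``,
are assumptions introduced for illustration only.

.. code-block:: Python


    import numpy as np

    # Hypothetical data: fewer samples (50) than features (100).
    rng_demo = np.random.RandomState(0)
    X_demo = rng_demo.normal(size=(50, 100))
    y_demo = rng_demo.normal(size=50)

    # Minimum-norm least-squares solution of the underdetermined system.
    coef_a, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

    # The last right-singular vectors span the null space of X_demo, so adding
    # one of them changes the coefficients but not the predictions.
    _, _, vt = np.linalg.svd(X_demo)
    coef_b = coef_a + vt[-1]

    fits_a = np.allclose(X_demo @ coef_a, y_demo)
    fits_b = np.allclose(X_demo @ coef_b, y_demo)
    print(f"coef_a fits exactly: {fits_a}, coef_b fits exactly: {fits_b}")
    print(f"identical coefficients: {np.allclose(coef_a, coef_b)}")

A penalty term favors some of these equally well-fitting solutions over
others, which is what makes the regularized problem usable here. The dataset
of this example is generated next.
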

.. code-block:: Python


    import numpy as np

    rng = np.random.RandomState(0)
    n_samples, n_features, n_informative = 50, 100, 10
    time_step = np.linspace(-2, 2, n_samples)
    freqs = 2 * np.pi * np.sort(rng.rand(n_features)) / 0.01
    X = np.zeros((n_samples, n_features))

    for i in range(n_features):
        X[:, i] = np.sin(freqs[i] * time_step)

    idx = np.arange(n_features)
    true_coef = (-1) ** idx * np.exp(-idx / 10)
    true_coef[n_informative:] = 0  # sparsify coef
    y = np.dot(X, true_coef)


.. GENERATED FROM PYTHON SOURCE LINES 44-45

Some of the informative features have close frequencies, which induces
(anti-)correlations.

.. GENERATED FROM PYTHON SOURCE LINES 45-49

.. code-block:: Python


    freqs[:n_informative]


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([ 2.9502547 , 11.8059798 , 12.63394388, 12.70359377, 24.62241605,
           37.84077985, 40.30506066, 44.63327171, 54.74495357, 59.02456369])


.. GENERATED FROM PYTHON SOURCE LINES 50-51

A random phase is introduced using :func:`numpy.random.random_sample` and some
Gaussian noise (implemented by :func:`numpy.random.normal`) is added to both
the features and the target.

.. GENERATED FROM PYTHON SOURCE LINES 51-58

.. code-block:: Python


    for i in range(n_features):
        X[:, i] = np.sin(freqs[i] * time_step + 2 * (rng.random_sample() - 0.5))
        X[:, i] += 0.2 * rng.normal(0, 1, n_samples)

    y += 0.2 * rng.normal(0, 1, n_samples)


.. GENERATED FROM PYTHON SOURCE LINES 59-61

Such sparse, noisy and correlated features can be obtained, for instance, from
sensor nodes monitoring some environmental variables, as they typically
register similar values depending on their positions (spatial correlations).
We can visualize the target.

.. GENERATED FROM PYTHON SOURCE LINES 61-69

.. code-block:: Python


    import matplotlib.pyplot as plt

    plt.plot(time_step, y)
    plt.ylabel("target signal")
    plt.xlabel("time")
    _ = plt.title("Superposition of sinusoidal signals")


.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_and_elasticnet_001.png
   :alt: Superposition of sinusoidal signals
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_and_elasticnet_001.png
   :class: sphx-glr-single-img

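
As a rough check of the (anti-)correlations among informative features with
close frequencies, the following sketch computes pairwise Pearson correlations
between the informative sinusoids. It is an illustration only: it rebuilds the
frequencies with the same seed, but uses zero-phase, noise-free columns, and
the names ``rng_check``, ``freqs_check``, ``X_clean``, ``corr`` and
``off_diag`` are assumptions that are not part of the original example.

.. code-block:: Python


    import numpy as np

    rng_check = np.random.RandomState(0)
    n_samples, n_features, n_informative = 50, 100, 10
    time_step = np.linspace(-2, 2, n_samples)
    freqs_check = 2 * np.pi * np.sort(rng_check.rand(n_features)) / 0.01

    # Noise-free, zero-phase sinusoids for the informative frequencies only.
    X_clean = np.sin(np.outer(time_step, freqs_check[:n_informative]))

    # Pairwise Pearson correlations between the informative columns.
    corr = np.corrcoef(X_clean, rowvar=False)

    # Zero out the diagonal to look for the most strongly related pair.
    off_diag = corr - np.diag(np.diag(corr))
    i, j = np.unravel_index(np.argmax(np.abs(off_diag)), off_diag.shape)
    print(f"largest |correlation|: features {i} and {j}, r = {corr[i, j]:.2f}")

The random phases and the noise added in the example change the exact values,
so this check only indicates which pairs of frequencies are likely to be
related.
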

.. GENERATED FROM PYTHON SOURCE LINES 70-71

We split the data into train and test sets for simplicity. In practice one
should use a :class:`~sklearn.model_selection.TimeSeriesSplit`
cross-validation to estimate the variance of the test score. Here we set
`shuffle=False` because, when dealing with data that have a temporal
relationship, we must not train on data that come after the test data.

.. GENERATED FROM PYTHON SOURCE LINES 71-77

.. code-block:: Python


    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)


.. GENERATED FROM PYTHON SOURCE LINES 78-84

In the following, we compute the performance of three l1-based models in terms
of the goodness of fit :math:`R^2` score and the fitting time. Then we make a
plot to compare the sparsity of the estimated coefficients with respect to the
ground-truth coefficients, and finally we analyze the results.

Lasso
-----

In this example, we demo a :class:`~sklearn.linear_model.Lasso` with a fixed
value of the regularization parameter `alpha`. In practice, the optimal
parameter `alpha` should be selected by passing a
:class:`~sklearn.model_selection.TimeSeriesSplit` cross-validation strategy to
a :class:`~sklearn.linear_model.LassoCV`. To keep the example simple and fast
to execute, we directly set the optimal value for `alpha` here.

.. GENERATED FROM PYTHON SOURCE LINES 84-97

.. code-block:: Python


    from time import time

    from sklearn.linear_model import Lasso
    from sklearn.metrics import r2_score

    t0 = time()
    lasso = Lasso(alpha=0.14).fit(X_train, y_train)
    print(f"Lasso fit done in {(time() - t0):.3f}s")

    y_pred_lasso = lasso.predict(X_test)
    r2_score_lasso = r2_score(y_test, y_pred_lasso)
    print(f"Lasso r^2 on test data : {r2_score_lasso:.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Lasso fit done in 0.001s
    Lasso r^2 on test data : 0.480


.. GENERATED FROM PYTHON SOURCE LINES 98-102

Automatic Relevance Determination (ARD)
---------------------------------------

An ARD regression is the Bayesian version of the Lasso. It can produce
interval estimates for all of the parameters, including the error variance, if
required. It is a suitable option when the signals have Gaussian noise. See the
example :ref:`sphx_glr_auto_examples_linear_model_plot_ard.py` for a
comparison of :class:`~sklearn.linear_model.ARDRegression` and
:class:`~sklearn.linear_model.BayesianRidge` regressors.

.. GENERATED FROM PYTHON SOURCE LINES 102-113

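
As a side illustration of the uncertainty estimates mentioned above, the
following sketch shows one way to read them from a fitted
:class:`~sklearn.linear_model.ARDRegression`: the predictive standard
deviation via ``predict(..., return_std=True)`` and the error variance as the
inverse of the estimated noise precision ``alpha_``. It runs on a small
hypothetical dataset rather than the one of this example, and every name
ending in ``_demo`` is an assumption introduced for illustration.

.. code-block:: Python


    import numpy as np
    from sklearn.linear_model import ARDRegression

    # Hypothetical data: 5 features, only two of them informative.
    rng_demo = np.random.RandomState(0)
    X_demo = rng_demo.normal(size=(60, 5))
    coef_demo = np.array([1.5, 0.0, -2.0, 0.0, 0.0])
    y_demo = X_demo @ coef_demo + 0.1 * rng_demo.normal(size=60)

    ard_demo = ARDRegression().fit(X_demo, y_demo)

    # Predictive mean and standard deviation for the first five samples.
    y_mean, y_std = ard_demo.predict(X_demo[:5], return_std=True)

    # alpha_ is the estimated precision of the noise, so its inverse is the
    # estimated error variance.
    noise_variance = 1.0 / ard_demo.alpha_

    print("predictive std:", np.round(y_std, 3))
    print(f"estimated error variance: {noise_variance:.4f}")

The fit on the dataset of this example follows.
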

.. code-block:: Python


    from sklearn.linear_model import ARDRegression

    t0 = time()
    ard = ARDRegression().fit(X_train, y_train)
    print(f"ARD fit done in {(time() - t0):.3f}s")

    y_pred_ard = ard.predict(X_test)
    r2_score_ard = r2_score(y_test, y_pred_ard)
    print(f"ARD r^2 on test data : {r2_score_ard:.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ARD fit done in 0.018s
    ARD r^2 on test data : 0.543


.. GENERATED FROM PYTHON SOURCE LINES 114-120

ElasticNet
----------

:class:`~sklearn.linear_model.ElasticNet` is a middle ground between
:class:`~sklearn.linear_model.Lasso` and :class:`~sklearn.linear_model.Ridge`,
as it combines an L1 and an L2 penalty. The amount of regularization is
controlled by the two hyperparameters `l1_ratio` and `alpha`. For
`l1_ratio = 0` the penalty is pure L2 and the model is equivalent to a
:class:`~sklearn.linear_model.Ridge`. Similarly, `l1_ratio = 1` is a pure L1
penalty and the model is equivalent to a :class:`~sklearn.linear_model.Lasso`.
For `0 < l1_ratio < 1`, the penalty is a combination of L1 and L2.

As done before, we train the model with fixed values for `alpha` and
`l1_ratio`. To select their optimal values we used an
:class:`~sklearn.linear_model.ElasticNetCV`, not shown here to keep the
example simple.

.. GENERATED FROM PYTHON SOURCE LINES 120-131

.. code-block:: Python


    from sklearn.linear_model import ElasticNet

    t0 = time()
    enet = ElasticNet(alpha=0.08, l1_ratio=0.5).fit(X_train, y_train)
    print(f"ElasticNet fit done in {(time() - t0):.3f}s")

    y_pred_enet = enet.predict(X_test)
    r2_score_enet = r2_score(y_test, y_pred_enet)
    print(f"ElasticNet r^2 on test data : {r2_score_enet:.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ElasticNet fit done in 0.001s
    ElasticNet r^2 on test data : 0.636


.. GENERATED FROM PYTHON SOURCE LINES 132-136

Plot and analysis of the results
--------------------------------

In this section, we use a heatmap to visualize the sparsity of the true and
estimated coefficients of the respective linear models.

.. GENERATED FROM PYTHON SOURCE LINES 136-167

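
Besides a heatmap, sparsity can be summarized numerically as the number of
non-zero estimated coefficients. The sketch below does this on a small
hypothetical dataset (not the one of this example) with arbitrarily chosen
hyperparameters. The names ending in ``_demo`` and ``nonzero_counts`` are
assumptions introduced for illustration.

.. code-block:: Python


    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso

    # Hypothetical data: 20 features, only the first three are informative.
    rng_demo = np.random.RandomState(0)
    X_demo = rng_demo.normal(size=(50, 20))
    coef_demo = np.zeros(20)
    coef_demo[:3] = [3.0, -2.0, 1.5]
    y_demo = X_demo @ coef_demo + 0.1 * rng_demo.normal(size=50)

    # Hyperparameter values are arbitrary and only serve the illustration.
    models_demo = {
        "Lasso": Lasso(alpha=0.1),
        "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    }

    nonzero_counts = {}
    for name, model in models_demo.items():
        model.fit(X_demo, y_demo)
        # coef_ holds one estimated coefficient per feature.
        nonzero_counts[name] = int(np.count_nonzero(model.coef_))
        print(f"{name}: {nonzero_counts[name]} non-zero coefficients out of 20")

The same count could be applied to the ``coef_`` attribute of the models
fitted in this example. The heatmap of their coefficients follows.
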

.. code-block:: Python


    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    from matplotlib.colors import SymLogNorm

    df = pd.DataFrame(
        {
            "True coefficients": true_coef,
            "Lasso": lasso.coef_,
            "ARDRegression": ard.coef_,
            "ElasticNet": enet.coef_,
        }
    )

    plt.figure(figsize=(10, 6))
    ax = sns.heatmap(
        df.T,
        norm=SymLogNorm(linthresh=10e-4, vmin=-1, vmax=1),
        cbar_kws={"label": "coefficients' values"},
        cmap="seismic_r",
    )
    plt.ylabel("linear model")
    plt.xlabel("coefficients")
    plt.title(
        f"Models' coefficients\nLasso $R^2$: {r2_score_lasso:.3f}, "
        f"ARD $R^2$: {r2_score_ard:.3f}, "
        f"ElasticNet $R^2$: {r2_score_enet:.3f}"
    )
    plt.tight_layout()


.. image-sg:: /auto_examples/linear_model/images/sphx_glr_plot_lasso_and_elasticnet_002.png
   :alt: Models' coefficients Lasso $R^2$: 0.480, ARD $R^2$: 0.543, ElasticNet $R^2$: 0.636
   :srcset: /auto_examples/linear_model/images/sphx_glr_plot_lasso_and_elasticnet_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 168-183

In the present example :class:`~sklearn.linear_model.ElasticNet` yields the
best score and captures most of the predictive features, yet still fails to
find all the true components. Notice that both
:class:`~sklearn.linear_model.ElasticNet` and
:class:`~sklearn.linear_model.ARDRegression` result in a less sparse model
than a :class:`~sklearn.linear_model.Lasso`.

Conclusions
-----------

:class:`~sklearn.linear_model.Lasso` is known to recover sparse data
effectively but does not perform well with highly correlated features. Indeed,
if several correlated features contribute to the target,
:class:`~sklearn.linear_model.Lasso` would end up selecting a single one of
them. In the case of sparse yet non-correlated features, a
:class:`~sklearn.linear_model.Lasso` model would be more suitable.

:class:`~sklearn.linear_model.ElasticNet` introduces some sparsity on the
coefficients and shrinks their values toward zero. Thus, in the presence of
correlated features that contribute to the target, the model is still able to
reduce their weights without setting them exactly to zero. This results in a
less sparse model than a pure :class:`~sklearn.linear_model.Lasso` and may
capture non-predictive features as well.

:class:`~sklearn.linear_model.ARDRegression` is better when handling Gaussian
noise, but is still unable to handle correlated features and requires more
time because it has to fit a prior.

References
----------

.. [1] :doi:`"Lasso-type recovery of sparse representations for
   high-dimensional data" N. Meinshausen, B. Yu - The Annals of Statistics
   2009, Vol. 37, No. 1, 246-270 <10.1214/07-AOS582>`


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.230 seconds)


.. _sphx_glr_download_auto_examples_linear_model_plot_lasso_and_elasticnet.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/linear_model/plot_lasso_and_elasticnet.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lasso_and_elasticnet.ipynb <plot_lasso_and_elasticnet.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lasso_and_elasticnet.py <plot_lasso_and_elasticnet.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_lasso_and_elasticnet.zip <plot_lasso_and_elasticnet.zip>`


.. include:: plot_lasso_and_elasticnet.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_