.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_bias_variance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_ensemble_plot_bias_variance.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_bias_variance.py:

============================================================
Single estimator versus bagging: bias-variance decomposition
============================================================

This example illustrates and compares the bias-variance decomposition of the
expected mean squared error of a single estimator against a bagging ensemble.

In regression, the expected mean squared error of an estimator can be
decomposed in terms of bias, variance and noise. On average over datasets of
the regression problem, the bias term measures the average amount by which
the predictions of the estimator differ from the predictions of the best
possible estimator for the problem (i.e., the Bayes model). The variance term
measures the variability of the predictions of the estimator when fit over
different random instances of the problem. Each problem instance is noted
"LS", for "Learning Sample", in the following. Finally, the noise measures
the irreducible part of the error which is due to the variability in the
data.

The upper left figure illustrates the predictions (in dark red) of a single
decision tree trained over a random dataset LS (the blue dots) of a toy 1d
regression problem. It also illustrates the predictions (in light red) of
other single decision trees trained over other (and different) randomly drawn
instances LS of the problem. Intuitively, the variance term here corresponds
to the width of the beam of predictions (in light red) of the individual
estimators. The larger the variance, the more sensitive the predictions are
to small changes in the training set. The bias term corresponds to the
difference between the average prediction of the estimator (in cyan) and the
best possible model (in dark blue). On this problem, we can observe that the
bias is quite low (both the cyan and the blue curves are close to each other)
while the variance is large (the red beam is rather wide).

The lower left figure plots the pointwise decomposition of the expected mean
squared error of a single decision tree. It confirms that the bias term (in
blue) is low while the variance is large (in green). It also illustrates the
noise part of the error which, as expected, appears to be constant and around
`0.01`.

The right figures correspond to the same plots but using instead a bagging
ensemble of decision trees. In both figures, we can observe that the bias
term is larger than in the previous case. In the upper right figure, the
difference between the average prediction (in cyan) and the best possible
model is larger (e.g., notice the offset around `x=2`). In the lower right
figure, the bias curve is also slightly higher than in the lower left figure.
In terms of variance, however, the beam of predictions is narrower, which
suggests that the variance is lower. Indeed, as the lower right figure
confirms, the variance term (in green) is lower than for single decision
trees. Overall, the bias-variance decomposition is therefore no longer the
same. The tradeoff is better for bagging: averaging several decision trees
fit on bootstrap copies of the dataset slightly increases the bias term but
allows for a larger reduction of the variance, which results in a lower
overall mean squared error (compare the red curves in the lower figures). The
script output also confirms this intuition: the total error of the bagging
ensemble is lower than the total error of a single decision tree, and this
difference indeed mainly stems from a reduced variance.

For further details on bias-variance decomposition, see section 7.3 of [1]_.

References
----------

.. [1] T. Hastie, R. Tibshirani and J. Friedman,
       "Elements of Statistical Learning", Springer, 2009.

.. GENERATED FROM PYTHON SOURCE LINES 25-149

.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_bias_variance_001.png
   :alt: Tree, Bagging(Tree)
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_bias_variance_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Tree: 0.0255 (error) = 0.0003 (bias^2)  + 0.0152 (var) + 0.0098 (noise)
    Bagging(Tree): 0.0196 (error) = 0.0004 (bias^2)  + 0.0092 (var) + 0.0098 (noise)

|

.. code-block:: Python

    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.ensemble import BaggingRegressor
    from sklearn.tree import DecisionTreeRegressor

    # Settings
    n_repeat = 50  # Number of iterations for computing expectations
    n_train = 50  # Size of the training set
    n_test = 1000  # Size of the test set
    noise = 0.1  # Standard deviation of the noise
    np.random.seed(0)

    # Change this for exploring the bias-variance decomposition of other
    # estimators. This should work well for estimators with high variance (e.g.,
    # decision trees or KNN), but poorly for estimators with low variance (e.g.,
    # linear models).
    estimators = [
        ("Tree", DecisionTreeRegressor()),
        ("Bagging(Tree)", BaggingRegressor(DecisionTreeRegressor())),
    ]

    n_estimators = len(estimators)


    # Generate data
    def f(x):
        x = x.ravel()
        return np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2))


    def generate(n_samples, noise, n_repeat=1):
        X = np.random.rand(n_samples) * 10 - 5
        X = np.sort(X)

        if n_repeat == 1:
            y = f(X) + np.random.normal(0.0, noise, n_samples)
        else:
            y = np.zeros((n_samples, n_repeat))

            for i in range(n_repeat):
                y[:, i] = f(X) + np.random.normal(0.0, noise, n_samples)

        X = X.reshape((n_samples, 1))

        return X, y


    X_train = []
    y_train = []

    for i in range(n_repeat):
        X, y = generate(n_samples=n_train, noise=noise)
        X_train.append(X)
        y_train.append(y)

    X_test, y_test = generate(n_samples=n_test, noise=noise, n_repeat=n_repeat)

    plt.figure(figsize=(10, 8))

    # Loop over estimators to compare
    for n, (name, estimator) in enumerate(estimators):
        # Compute predictions
        y_predict = np.zeros((n_test, n_repeat))

        for i in range(n_repeat):
            estimator.fit(X_train[i], y_train[i])
            y_predict[:, i] = estimator.predict(X_test)

        # Bias^2 + Variance + Noise decomposition of the mean squared error
        y_error = np.zeros(n_test)

        for i in range(n_repeat):
            for j in range(n_repeat):
                y_error += (y_test[:, j] - y_predict[:, i]) ** 2

        y_error /= n_repeat * n_repeat

        y_noise = np.var(y_test, axis=1)
        y_bias = (f(X_test) - np.mean(y_predict, axis=1)) ** 2
        y_var = np.var(y_predict, axis=1)

        print(
            "{0}: {1:.4f} (error) = {2:.4f} (bias^2) "
            " + {3:.4f} (var) + {4:.4f} (noise)".format(
                name, np.mean(y_error), np.mean(y_bias), np.mean(y_var), np.mean(y_noise)
            )
        )

        # Plot figures
        plt.subplot(2, n_estimators, n + 1)
        plt.plot(X_test, f(X_test), "b", label="$f(x)$")
        plt.plot(X_train[0], y_train[0], ".b", label="LS ~ $y = f(x)+noise$")

        for i in range(n_repeat):
            if i == 0:
                plt.plot(X_test, y_predict[:, i], "r", label=r"$\^y(x)$")
            else:
                plt.plot(X_test, y_predict[:, i], "r", alpha=0.05)

        plt.plot(X_test, np.mean(y_predict, axis=1), "c", label=r"$\mathbb{E}_{LS} \^y(x)$")

        plt.xlim([-5, 5])
        plt.title(name)

        if n == n_estimators - 1:
            plt.legend(loc=(1.1, 0.5))

        plt.subplot(2, n_estimators, n_estimators + n + 1)
        plt.plot(X_test, y_error, "r", label="$error(x)$")
        plt.plot(X_test, y_bias, "b", label="$bias^2(x)$")
        plt.plot(X_test, y_var, "g", label="$variance(x)$")
        plt.plot(X_test, y_noise, "c", label="$noise(x)$")

        plt.xlim([-5, 5])
        plt.ylim([0, 0.1])

        if n == n_estimators - 1:
            plt.legend(loc=(1.1, 0.5))

    plt.subplots_adjust(right=0.75)
    plt.show()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.558 seconds)

.. _sphx_glr_download_auto_examples_ensemble_plot_bias_variance.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/ensemble/plot_bias_variance.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_bias_variance.ipynb <plot_bias_variance.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_bias_variance.py <plot_bias_variance.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_bias_variance.zip <plot_bias_variance.zip>`

.. include:: plot_bias_variance.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
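As a small complement to the full example, the pairwise-MSE estimate used in the script's double loop satisfies an exact sample identity: averaging ``(y_test[:, j] - y_predict[:, i]) ** 2`` over all pairs ``(i, j)`` equals the target variance (noise) plus the prediction variance plus the squared gap between the two sample means (squared bias, measured against the sample mean of the targets rather than the true ``f(x)``). The sketch below checks this numerically at a single point ``x``; the concrete numbers (``f(x) = 1.0``, noise and prediction spreads) are illustrative assumptions, not values from the example:

```python
import numpy as np

rng = np.random.default_rng(0)
R = 200  # number of repetitions (illustrative)

# At one fixed test point x: t holds R noisy targets, p holds the
# predictions of R models trained on independent learning samples.
t = rng.normal(loc=1.0, scale=0.1, size=R)  # targets around f(x) = 1.0
p = rng.normal(loc=0.9, scale=0.2, size=R)  # predictions, slightly biased

# Pairwise mean squared error, as in the script's double loop over (i, j)
error = np.mean((t[None, :] - p[:, None]) ** 2)

# Sample decomposition: noise + variance + squared bias (w.r.t. sample means)
noise = np.var(t)
var = np.var(p)
bias2 = (t.mean() - p.mean()) ** 2

assert np.isclose(error, noise + var + bias2)
```

For large ``n_repeat`` the sample mean of the targets approaches ``f(x)``, so this identity explains why the script's printed ``error`` is close to, but not exactly, the printed ``bias^2 + var + noise``, which uses the true ``f(X_test)`` in the bias term.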