.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/covariance/plot_sparse_cov.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_covariance_plot_sparse_cov.py>`
        to download the full example code, or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_covariance_plot_sparse_cov.py:

======================================
Sparse inverse covariance estimation
======================================

Using the GraphicalLasso estimator to learn a covariance and sparse precision
from a small number of samples.

To estimate a probabilistic model (e.g. a Gaussian model), estimating the
precision matrix, that is the inverse covariance matrix, is as important as
estimating the covariance matrix. Indeed, a Gaussian model is parametrized by
the precision matrix.

To be in favorable recovery conditions, we sample the data from a model with a
sparse inverse covariance matrix. In addition, we ensure that the data is not
too correlated (limiting the largest coefficient of the precision matrix) and
that there are no small coefficients in the precision matrix that cannot be
recovered. Moreover, with a small number of observations, it is easier to
recover a correlation matrix than a covariance matrix, so we scale the time
series.

Here, the number of samples is slightly larger than the number of dimensions,
so the empirical covariance is still invertible. However, as the observations
are strongly correlated, the empirical covariance matrix is ill-conditioned,
and as a result its inverse, the empirical precision matrix, is very far from
the ground truth.

If we use l2 shrinkage, as with the Ledoit-Wolf estimator, the small number of
samples forces us to shrink a lot. As a result, the Ledoit-Wolf precision is
fairly close to the ground-truth precision, which is not far from being
diagonal, but the off-diagonal structure is lost.

The l1-penalized estimator can recover part of this off-diagonal structure: it
learns a sparse precision matrix. It cannot recover the exact sparsity
pattern, as it detects too many non-zero coefficients. However, the highest
non-zero coefficients of the l1 estimate correspond to non-zero coefficients
in the ground truth. Finally, the coefficients of the l1 precision estimate
are biased toward zero: because of the penalty, they are all smaller than the
corresponding ground-truth values, as can be seen in the figure.

Note that the color range of the precision matrices is tweaked to improve the
readability of the figure. The full range of values of the empirical precision
is not displayed.

The alpha parameter of the GraphicalLasso, which sets the sparsity of the
model, is chosen by internal cross-validation in GraphicalLassoCV. As can be
seen in figure 2, the grid on which the cross-validation score is computed is
iteratively refined in the neighborhood of the maximum.

.. GENERATED FROM PYTHON SOURCE LINES 23-27

.. code-block:: Python

    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 28-30

Generate the data
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 30-53

.. code-block:: Python

    import numpy as np
    from scipy import linalg

    from sklearn.datasets import make_sparse_spd_matrix

    n_samples = 60
    n_features = 20

    prng = np.random.RandomState(1)
    prec = make_sparse_spd_matrix(
        n_features, alpha=0.98, smallest_coef=0.4, largest_coef=0.7, random_state=prng
    )
    cov = linalg.inv(prec)
    d = np.sqrt(np.diag(cov))
    cov /= d
    cov /= d[:, np.newaxis]
    prec *= d
    prec *= d[:, np.newaxis]
    X = prng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)
    X -= X.mean(axis=0)
    X /= X.std(axis=0)
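The recovery conditions described above can be checked directly on the
generated matrices. Below is a minimal sketch (assuming scikit-learn and SciPy
are installed; the seed and the sparsity check threshold are arbitrary) that
verifies the ground-truth precision is positive definite and mostly zero off
the diagonal, and that rescaling the covariance yields a correlation matrix
with a unit diagonal. Note that in ``make_sparse_spd_matrix``, ``alpha`` is
the probability that a coefficient is zero, so ``alpha=0.98`` gives a very
sparse precision.

```python
import numpy as np
from scipy import linalg
from sklearn.datasets import make_sparse_spd_matrix

n_features = 20
prng = np.random.RandomState(1)

# Same settings as in the example: a sparse symmetric positive-definite
# ground-truth precision matrix.
prec = make_sparse_spd_matrix(
    n_features, alpha=0.98, smallest_coef=0.4, largest_coef=0.7, random_state=prng
)

# Positive definite: all eigenvalues are strictly positive.
assert np.linalg.eigvalsh(prec).min() > 0

# Rescaling the covariance by its diagonal turns it into a correlation
# matrix, which has ones on the diagonal.
cov = linalg.inv(prec)
d = np.sqrt(np.diag(cov))
cov /= d
cov /= d[:, np.newaxis]
print("unit diagonal:", np.allclose(np.diag(cov), 1.0))

# Most off-diagonal precision entries are exactly zero.
off_diag = prec[~np.eye(n_features, dtype=bool)]
print("off-diagonal zeros: %.0f%%" % (100 * np.mean(off_diag == 0)))
```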
.. GENERATED FROM PYTHON SOURCE LINES 54-56

Estimate the covariance
-----------------------

.. GENERATED FROM PYTHON SOURCE LINES 56-69

.. code-block:: Python

    from sklearn.covariance import GraphicalLassoCV, ledoit_wolf

    emp_cov = np.dot(X.T, X) / n_samples

    model = GraphicalLassoCV()
    model.fit(X)
    cov_ = model.covariance_
    prec_ = model.precision_

    lw_cov_, _ = ledoit_wolf(X)
    lw_prec_ = linalg.inv(lw_cov_)

.. GENERATED FROM PYTHON SOURCE LINES 70-72

Plot the results
----------------

.. GENERATED FROM PYTHON SOURCE LINES 72-120

.. code-block:: Python

    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 6))
    plt.subplots_adjust(left=0.02, right=0.98)

    # plot the covariances
    covs = [
        ("Empirical", emp_cov),
        ("Ledoit-Wolf", lw_cov_),
        ("GraphicalLassoCV", cov_),
        ("True", cov),
    ]
    vmax = cov_.max()
    for i, (name, this_cov) in enumerate(covs):
        plt.subplot(2, 4, i + 1)
        plt.imshow(
            this_cov, interpolation="nearest", vmin=-vmax, vmax=vmax, cmap=plt.cm.RdBu_r
        )
        plt.xticks(())
        plt.yticks(())
        plt.title("%s covariance" % name)

    # plot the precisions
    precs = [
        ("Empirical", linalg.inv(emp_cov)),
        ("Ledoit-Wolf", lw_prec_),
        ("GraphicalLasso", prec_),
        ("True", prec),
    ]
    vmax = 0.9 * prec_.max()
    for i, (name, this_prec) in enumerate(precs):
        ax = plt.subplot(2, 4, i + 5)
        plt.imshow(
            np.ma.masked_equal(this_prec, 0),
            interpolation="nearest",
            vmin=-vmax,
            vmax=vmax,
            cmap=plt.cm.RdBu_r,
        )
        plt.xticks(())
        plt.yticks(())
        plt.title("%s precision" % name)
        if hasattr(ax, "set_facecolor"):
            ax.set_facecolor(".7")
        else:
            ax.set_axis_bgcolor(".7")

.. image-sg:: /auto_examples/covariance/images/sphx_glr_plot_sparse_cov_001.png
   :alt: Empirical covariance, Ledoit-Wolf covariance, GraphicalLassoCV covariance, True covariance, Empirical precision, Ledoit-Wolf precision, GraphicalLasso precision, True precision
   :srcset: /auto_examples/covariance/images/sphx_glr_plot_sparse_cov_001.png
   :class: sphx-glr-single-img
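The contrast between the l1 penalty and l2 shrinkage can also be checked
numerically rather than visually: the graphical lasso sets some precision
coefficients exactly to zero, while the inverse of the Ledoit-Wolf estimate
stays dense. The sketch below uses hypothetical toy data and a fixed
``alpha=0.2`` for ``GraphicalLasso`` instead of the cross-validated value, so
it only illustrates the qualitative behavior:

```python
import numpy as np
from scipy import linalg
from sklearn.covariance import GraphicalLasso, ledoit_wolf
from sklearn.datasets import make_sparse_spd_matrix

prng = np.random.RandomState(0)
n_features, n_samples = 10, 50

# Toy data with a sparse ground-truth precision, in the spirit of the example.
prec = make_sparse_spd_matrix(n_features, alpha=0.9, random_state=prng)
X = prng.multivariate_normal(np.zeros(n_features), linalg.inv(prec), size=n_samples)
X -= X.mean(axis=0)
X /= X.std(axis=0)

# l1-penalized estimate: the penalty drives coefficients exactly to zero.
gl = GraphicalLasso(alpha=0.2).fit(X)
gl_zeros = int(np.sum(gl.precision_ == 0))

# l2-shrunk estimate: its inverse has no exact zeros.
lw_cov, _ = ledoit_wolf(X)
lw_zeros = int(np.sum(linalg.inv(lw_cov) == 0))

print("graphical lasso precision zeros:", gl_zeros)
print("ledoit-wolf precision zeros:", lw_zeros)
```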
.. GENERATED FROM PYTHON SOURCE LINES 121-132

.. code-block:: Python

    # plot the model selection metric
    plt.figure(figsize=(4, 3))
    plt.axes([0.2, 0.15, 0.75, 0.7])
    plt.plot(model.cv_results_["alphas"], model.cv_results_["mean_test_score"], "o-")
    plt.axvline(model.alpha_, color=".5")
    plt.title("Model selection")
    plt.ylabel("Cross-validation score")
    plt.xlabel("alpha")

    plt.show()

.. image-sg:: /auto_examples/covariance/images/sphx_glr_plot_sparse_cov_002.png
   :alt: Model selection
   :srcset: /auto_examples/covariance/images/sphx_glr_plot_sparse_cov_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.242 seconds)

.. _sphx_glr_download_auto_examples_covariance_plot_sparse_cov.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/covariance/plot_sparse_cov.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_sparse_cov.ipynb <plot_sparse_cov.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_sparse_cov.py <plot_sparse_cov.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_sparse_cov.zip <plot_sparse_cov.zip>`

.. include:: plot_sparse_cov.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_