.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/decomposition/plot_pca_vs_fa_model_selection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_decomposition_plot_pca_vs_fa_model_selection.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_decomposition_plot_pca_vs_fa_model_selection.py:

===============================================================
Model selection with Probabilistic PCA and Factor Analysis (FA)
===============================================================

Probabilistic PCA and Factor Analysis are probabilistic models. The
consequence is that the likelihood of new data can be used for model
selection and covariance estimation. Here we compare PCA and FA with
cross-validation on low rank data corrupted with homoscedastic noise
(noise variance is the same for each feature) or heteroscedastic noise
(noise variance is different for each feature). In a second step we
compare the model likelihood to the likelihoods obtained from shrinkage
covariance estimators.

One can observe that with homoscedastic noise both FA and PCA succeed
in recovering the size of the low rank subspace. The likelihood with
PCA is higher than FA in this case. However PCA fails and overestimates
the rank when heteroscedastic noise is present. Under appropriate
circumstances (choice of the number of components), the held-out data
is more likely for low rank models than for shrinkage models.

The automatic estimation from "Automatic Choice of Dimensionality for
PCA", Thomas P. Minka, NIPS 2000: 598-604, is also compared.

.. GENERATED FROM PYTHON SOURCE LINES 20-24

.. code-block:: Python

    # Authors: the scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

.. GENERATED FROM PYTHON SOURCE LINES 25-27

Create the data
---------------

.. GENERATED FROM PYTHON SOURCE LINES 27-45

.. code-block:: Python

    import numpy as np
    from scipy import linalg

    n_samples, n_features, rank = 500, 25, 5
    sigma = 1.0
    rng = np.random.RandomState(42)
    U, _, _ = linalg.svd(rng.randn(n_features, n_features))
    X = np.dot(rng.randn(n_samples, rank), U[:, :rank].T)

    # Adding homoscedastic noise
    X_homo = X + sigma * rng.randn(n_samples, n_features)

    # Adding heteroscedastic noise
    sigmas = sigma * rng.rand(n_features) + sigma / 2.0
    X_hetero = X + rng.randn(n_samples, n_features) * sigmas
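The comparison below hinges on the data being exactly rank 5 before the
noise is added. As a quick sanity check (a minimal sketch reusing the
variables defined above, not part of the generated example), one can
inspect the singular spectrum: the noiseless matrix has exactly ``rank``
non-negligible singular values, while the noisy variants fill in the
rest of the spectrum.

.. code-block:: Python

    # Sanity-check sketch: count the non-negligible singular values of the
    # noiseless X, and look at the leading spectrum of the noisy X_homo,
    # where an elbow appears after the first `rank` values.
    s_clean = linalg.svdvals(X)
    s_homo = linalg.svdvals(X_homo)
    print("non-negligible singular values in X:", int((s_clean > 1e-8).sum()))
    print("leading singular values of X_homo:", np.round(s_homo[:7], 1))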
.. GENERATED FROM PYTHON SOURCE LINES 46-48

Fit the models
--------------

.. GENERATED FROM PYTHON SOURCE LINES 48-139

.. code-block:: Python

    import matplotlib.pyplot as plt

    from sklearn.covariance import LedoitWolf, ShrunkCovariance
    from sklearn.decomposition import PCA, FactorAnalysis
    from sklearn.model_selection import GridSearchCV, cross_val_score

    n_components = np.arange(0, n_features, 5)  # options for n_components


    def compute_scores(X):
        pca = PCA(svd_solver="full")
        fa = FactorAnalysis()

        pca_scores, fa_scores = [], []
        for n in n_components:
            pca.n_components = n
            fa.n_components = n
            pca_scores.append(np.mean(cross_val_score(pca, X)))
            fa_scores.append(np.mean(cross_val_score(fa, X)))

        return pca_scores, fa_scores


    def shrunk_cov_score(X):
        shrinkages = np.logspace(-2, 0, 30)
        cv = GridSearchCV(ShrunkCovariance(), {"shrinkage": shrinkages})
        return np.mean(cross_val_score(cv.fit(X).best_estimator_, X))


    def lw_score(X):
        return np.mean(cross_val_score(LedoitWolf(), X))


    for X, title in [(X_homo, "Homoscedastic Noise"), (X_hetero, "Heteroscedastic Noise")]:
        pca_scores, fa_scores = compute_scores(X)
        n_components_pca = n_components[np.argmax(pca_scores)]
        n_components_fa = n_components[np.argmax(fa_scores)]

        pca = PCA(svd_solver="full", n_components="mle")
        pca.fit(X)
        n_components_pca_mle = pca.n_components_

        print("best n_components by PCA CV = %d" % n_components_pca)
        print("best n_components by FactorAnalysis CV = %d" % n_components_fa)
        print("best n_components by PCA MLE = %d" % n_components_pca_mle)

        plt.figure()
        plt.plot(n_components, pca_scores, "b", label="PCA scores")
        plt.plot(n_components, fa_scores, "r", label="FA scores")
        plt.axvline(rank, color="g", label="TRUTH: %d" % rank, linestyle="-")
        plt.axvline(
            n_components_pca,
            color="b",
            label="PCA CV: %d" % n_components_pca,
            linestyle="--",
        )
        plt.axvline(
            n_components_fa,
            color="r",
            label="FactorAnalysis CV: %d" % n_components_fa,
            linestyle="--",
        )
        plt.axvline(
            n_components_pca_mle,
            color="k",
            label="PCA MLE: %d" % n_components_pca_mle,
            linestyle="--",
        )

        # compare with other covariance estimators
        plt.axhline(
            shrunk_cov_score(X),
            color="violet",
            label="Shrunk Covariance MLE",
            linestyle="-.",
        )
        plt.axhline(
            lw_score(X),
            color="orange",
            label="LedoitWolf MLE",
            linestyle="-.",
        )

        plt.xlabel("nb of components")
        plt.ylabel("CV scores")
        plt.legend(loc="lower right")
        plt.title(title)

    plt.show()

.. rst-class:: sphx-glr-horizontal

    *

      .. image-sg:: /auto_examples/decomposition/images/sphx_glr_plot_pca_vs_fa_model_selection_001.png
         :alt: Homoscedastic Noise
         :srcset: /auto_examples/decomposition/images/sphx_glr_plot_pca_vs_fa_model_selection_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/decomposition/images/sphx_glr_plot_pca_vs_fa_model_selection_002.png
         :alt: Heteroscedastic Noise
         :srcset: /auto_examples/decomposition/images/sphx_glr_plot_pca_vs_fa_model_selection_002.png
         :class: sphx-glr-multi-img

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    best n_components by PCA CV = 5
    best n_components by FactorAnalysis CV = 5
    best n_components by PCA MLE = 5
    best n_components by PCA CV = 20
    best n_components by FactorAnalysis CV = 5
    best n_components by PCA MLE = 18
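The "CV scores" plotted on the y-axis are average held-out
log-likelihoods: ``cross_val_score`` calls each estimator's ``score``
method, and for both ``PCA`` and ``FactorAnalysis`` the ``score`` method
returns the average log-likelihood of the samples under the fitted
Gaussian model. The following is a minimal sketch of what happens in a
single fold; the explicit hold-out split is illustrative and not part of
the generated example.

.. code-block:: Python

    from sklearn.model_selection import train_test_split

    # Illustrative single hold-out split, mirroring what cross_val_score
    # does per fold (this split is not part of the original example).
    X_train, X_test = train_test_split(X_homo, random_state=0)

    model = PCA(svd_solver="full", n_components=rank).fit(X_train)
    # PCA.score returns the average log-likelihood of the held-out
    # samples under the probabilistic PCA model.
    print("held-out average log-likelihood: %.3f" % model.score(X_test))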
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.812 seconds)

.. _sphx_glr_download_auto_examples_decomposition_plot_pca_vs_fa_model_selection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/decomposition/plot_pca_vs_fa_model_selection.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pca_vs_fa_model_selection.ipynb <plot_pca_vs_fa_model_selection.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pca_vs_fa_model_selection.py <plot_pca_vs_fa_model_selection.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_pca_vs_fa_model_selection.zip <plot_pca_vs_fa_model_selection.zip>`

.. include:: plot_pca_vs_fa_model_selection.recommendations

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_