
Image denoising using kernel PCA#

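This example shows how to use `KernelPCA` to denoise images. In short, we take advantage of the approximation function learned during `fit` to reconstruct the original image.

We will compare the results with an exact reconstruction using `PCA`.

We will use the USPS digits dataset to reproduce the results presented in Sect. 4 of [1].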

References

# Authors: Guillaume Lemaitre <guillaume.lemaitre@inria.fr>
# License: BSD 3 clause

Load the dataset via OpenML#

The USPS digits dataset is available on OpenML. We use `sklearn.datasets.fetch_openml` to fetch it. In addition, we normalize the dataset so that all pixel values lie in the range [0, 1].

import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = fetch_openml(data_id=41082, as_frame=False, return_X_y=True)
X = MinMaxScaler().fit_transform(X)
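The scaling step above can be checked on a small array. The sketch below uses a hypothetical toy matrix (`X_toy`, not part of the example) to show that `MinMaxScaler` rescales each feature, here each pixel position, independently rather than applying one global minimum and maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy "images": 3 samples with 4 pixels each, on arbitrary scales.
X_toy = np.array(
    [
        [-1.0, 0.0, 2.0, 10.0],
        [0.0, 5.0, 2.0, 20.0],
        [1.0, 10.0, 2.0, 30.0],
    ]
)

X_toy_scaled = MinMaxScaler().fit_transform(X_toy)

# Every column is mapped to [0, 1] on its own; the constant third column
# has no range to stretch and ends up as all zeros.
print(X_toy_scaled)
print("column minima:", X_toy_scaled.min(axis=0))
print("column maxima:", X_toy_scaled.max(axis=0))
```

Because the scaling is per feature, a pixel that is constant across the dataset carries no information after the transform.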

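The idea is to learn a PCA basis (with and without a kernel) on noisy images and then use these models to reconstruct and denoise these images.

Thus, we split our dataset into a training set of 1,000 samples and a test set of 100 samples. These images are noise-free, and we will use them to evaluate the efficiency of the denoising approaches. In addition, we create a copy of the original dataset and add Gaussian noise to it.

The aim of this application is to show that we can denoise corrupted images with a PCA basis learned from data. Both models are fit on the noisy training images, and the uncorrupted images are kept only as a reference for evaluation. We will use both PCA and kernel PCA to solve this problem.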

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0, train_size=1_000, test_size=100
)

rng = np.random.RandomState(0)
noise = rng.normal(scale=0.25, size=X_test.shape)
X_test_noisy = X_test + noise

noise = rng.normal(scale=0.25, size=X_train.shape)
X_train_noisy = X_train + noise
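As a sanity check on the noise model, the MSE between an image and its noisy copy should be close to the noise variance, here 0.25² = 0.0625. The sketch below verifies this on hypothetical uniform random data (`clean_demo` is a stand-in, not the USPS images).

```python
import numpy as np

rng_demo = np.random.RandomState(0)

# Stand-in for scaled images: values in [0, 1], same width as a 16x16 digit.
clean_demo = rng_demo.uniform(size=(1_000, 256))
noisy_demo = clean_demo + rng_demo.normal(scale=0.25, size=clean_demo.shape)

# Zero-mean noise with standard deviation sigma gives an MSE of about sigma**2.
mse_noise = np.mean((clean_demo - noisy_demo) ** 2)
print(f"empirical MSE: {mse_noise:.4f}, sigma**2: {0.25 ** 2:.4f}")

# After adding noise, pixel values are no longer confined to [0, 1].
print("values below 0:", bool(noisy_demo.min() < 0))
print("values above 1:", bool(noisy_demo.max() > 1))
```

This gives a baseline: any reconstruction with an MSE well below about 0.06 has removed part of the noise.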

In addition, we will create a helper function to qualitatively assess the image reconstruction by plotting the test images.

import matplotlib.pyplot as plt


def plot_digits(X, title):
    """Small helper function to plot 100 digits."""
    fig, axs = plt.subplots(nrows=10, ncols=10, figsize=(8, 8))
    for img, ax in zip(X, axs.ravel()):
        ax.imshow(img.reshape((16, 16)), cmap="Greys")
        ax.axis("off")
    fig.suptitle(title, fontsize=24)

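In addition, we will use the mean squared error (MSE) to quantitatively assess the image reconstruction.

Let's first look at the difference between the noise-free and the noisy images. We will inspect the test set for this purpose.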
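The example computes the MSE inline with `np.mean`. As a minimal sketch, the same quantity can be wrapped in a small helper; the name `reconstruction_mse` is hypothetical and is not used elsewhere in this example.

```python
import numpy as np


def reconstruction_mse(X_reference, X_reconstructed):
    """Mean squared error averaged over all samples and all pixels."""
    X_reference = np.asarray(X_reference, dtype=float)
    X_reconstructed = np.asarray(X_reconstructed, dtype=float)
    return float(np.mean((X_reference - X_reconstructed) ** 2))


# One sample with two pixels: squared errors are 0 and 1, so the mean is 0.5.
print(reconstruction_mse([[0.0, 1.0]], [[0.0, 0.0]]))
```

Averaging over every pixel yields a single number per method, which makes the two reconstructions easy to compare but hides where in the image the errors occur.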

plot_digits(X_test, "Uncorrupted test images")
plot_digits(
    X_test_noisy, f"Noisy test images\nMSE: {np.mean((X_test - X_test_noisy) ** 2):.2f}"
)

Learn the `PCA` basis#

We can now learn our PCA basis using both a linear PCA and a kernel PCA that uses a radial basis function (RBF) kernel.

from sklearn.decomposition import PCA, KernelPCA

pca = PCA(n_components=32, random_state=42)
kernel_pca = KernelPCA(
    n_components=400,
    kernel="rbf",
    gamma=1e-3,
    fit_inverse_transform=True,
    alpha=5e-3,
    random_state=42,
)

pca.fit(X_train_noisy)
_ = kernel_pca.fit(X_train_noisy)
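With `fit_inverse_transform=True`, `KernelPCA` learns the mapping back to the input space by ridge regression, with `alpha` as the regularization strength, so its inverse transform is only approximate. The sketch below contrasts this with linear PCA on hypothetical random data (`X_demo`); the values of `gamma` and `alpha` are illustrative assumptions, not the settings used in this example.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng_demo = np.random.RandomState(0)
X_demo = rng_demo.normal(size=(50, 5))

# Linear PCA keeping all 5 components: the round trip is exact up to rounding.
pca_full = PCA(n_components=5).fit(X_demo)
X_pca_back = pca_full.inverse_transform(pca_full.transform(X_demo))
err_pca = np.max(np.abs(X_demo - X_pca_back))

# Kernel PCA: the inverse map is fitted by regularized regression, so even
# the training points are not recovered exactly.
kpca_demo = KernelPCA(
    n_components=10,
    kernel="rbf",
    gamma=0.1,
    alpha=1.0,
    fit_inverse_transform=True,
).fit(X_demo)
X_kpca_back = kpca_demo.inverse_transform(kpca_demo.transform(X_demo))
err_kpca = np.max(np.abs(X_demo - X_kpca_back))

print(f"max round-trip error, PCA:        {err_pca:.2e}")
print(f"max round-trip error, kernel PCA: {err_kpca:.2e}")
```

For denoising, this approximation is not a drawback in itself: the goal is a smoother estimate of the image rather than a faithful copy of the noisy input.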

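Reconstruct and denoise test images#

We can now transform and reconstruct the noisy test set. Since PCA keeps fewer components (32) than the number of original features (256 pixels), we obtain an approximation of the original set. Indeed, by dropping the components that explain the least variance in PCA, we hope to remove noise. Kernel PCA follows a similar idea; however, we expect a better reconstruction because we use a non-linear kernel to learn the PCA basis and a kernel ridge regression to learn the mapping function.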
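The effect of dropping low-variance components can be illustrated on synthetic data before applying it to the digits. The sketch below assumes a hypothetical clean signal that lies in a 3-dimensional subspace of a 50-dimensional space; all names and sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng_demo = np.random.RandomState(0)

# Hypothetical clean signal: 3 latent factors mixed into 50 features.
latent = rng_demo.normal(size=(500, 3))
mixing = rng_demo.normal(size=(3, 50))
X_clean = latent @ mixing
X_noisy = X_clean + rng_demo.normal(scale=0.5, size=X_clean.shape)

# Keep only the 3 leading components, then map back to feature space.
pca_demo = PCA(n_components=3).fit(X_noisy)
X_denoised = pca_demo.inverse_transform(pca_demo.transform(X_noisy))

mse_before = np.mean((X_clean - X_noisy) ** 2)
mse_after = np.mean((X_clean - X_denoised) ** 2)
print(f"MSE of noisy data:      {mse_before:.4f}")
print(f"MSE after PCA denoising: {mse_after:.4f}")
```

The noise spreads evenly over all 50 directions while the signal occupies only 3, so discarding the remaining directions removes most of the noise. Real images are not exactly low-rank, which is why the improvement on the digits is less clear-cut.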

X_reconstructed_kernel_pca = kernel_pca.inverse_transform(
    kernel_pca.transform(X_test_noisy)
)
X_reconstructed_pca = pca.inverse_transform(pca.transform(X_test_noisy))

plot_digits(X_test, "Uncorrupted test images")
plot_digits(
    X_reconstructed_pca,
    f"PCA reconstruction\nMSE: {np.mean((X_test - X_reconstructed_pca) ** 2):.2f}",
)
plot_digits(
    X_reconstructed_kernel_pca,
    (
        "Kernel PCA reconstruction\n"
        f"MSE: {np.mean((X_test - X_reconstructed_kernel_pca) ** 2):.2f}"
    ),
)

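PCA has a lower MSE than kernel PCA. However, the qualitative analysis might not favor PCA over kernel PCA: we observe that kernel PCA removes background noise and provides a smoother image.

However, it should be noted that the results of the denoising with kernel PCA depend on the parameters `n_components`, `gamma`, and `alpha`.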
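One way to see this dependence is to scan a small grid of values and compare the reconstruction MSE against a clean reference. The sketch below does so on hypothetical noisy points around a circle; the grid values are arbitrary assumptions, and selecting parameters this way requires clean reference data, which is not always available in practice.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng_demo = np.random.RandomState(0)

# Hypothetical data: points on the unit circle corrupted by Gaussian noise.
theta = rng_demo.uniform(0, 2 * np.pi, size=300)
X_circle = np.c_[np.cos(theta), np.sin(theta)]
X_circle_noisy = X_circle + rng_demo.normal(scale=0.1, size=X_circle.shape)

# Fit on the first 200 noisy points, evaluate on the remaining 100.
X_fit, X_eval = X_circle_noisy[:200], X_circle_noisy[200:]
X_eval_clean = X_circle[200:]

results = {}
for gamma in (0.1, 1.0, 10.0):
    for alpha in (1e-3, 1e-1):
        model = KernelPCA(
            n_components=4,
            kernel="rbf",
            gamma=gamma,
            alpha=alpha,
            fit_inverse_transform=True,
            random_state=0,
        ).fit(X_fit)
        X_back = model.inverse_transform(model.transform(X_eval))
        results[(gamma, alpha)] = float(np.mean((X_eval_clean - X_back) ** 2))

# Print the settings from lowest to highest reconstruction error.
for (gamma, alpha), mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"gamma={gamma:<5} alpha={alpha:<6} MSE={mse:.4f}")
```

The ranking obtained on this toy problem says nothing about the digits; it only shows that the reconstruction error changes with the settings.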
