
Color Quantization using K-Means

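Performs a pixel-wise Vector Quantization (VQ) of an image of the Summer Palace (China), reducing the number of colors required to show the image from 96,615 unique colors to 64, while preserving the overall appearance quality.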

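In this example, pixels are represented in a 3D space and K-means is used to find 64 color clusters. In the image processing literature, the codebook obtained from K-means (the cluster centers) is called the color palette. A single byte can address up to 256 colors, whereas an RGB encoding requires 3 bytes per pixel. The GIF file format, for example, uses such a palette.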
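The storage argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original example: the helper `palette_storage_bytes` is hypothetical, the image size of 427 x 640 pixels is an assumption about the sample photo, and the calculation ignores any file-format compression.

```python
def palette_storage_bytes(n_pixels, n_colors, channels=3):
    """Return (raw_bytes, indexed_bytes) for an uncompressed image.

    raw_bytes: one byte per channel for every pixel.
    indexed_bytes: a bit-packed palette index per pixel plus the palette itself.
    """
    raw_bytes = n_pixels * channels
    # Number of bits needed to address n_colors palette entries (at least 1).
    bits_per_index = max(1, (n_colors - 1).bit_length())
    # Round the packed index stream up to a whole number of bytes.
    index_bytes = (n_pixels * bits_per_index + 7) // 8
    palette_bytes = n_colors * channels
    return raw_bytes, index_bytes + palette_bytes


# Assumed image size (height x width); not stated in the text above.
n_pixels = 427 * 640
raw, indexed = palette_storage_bytes(n_pixels, n_colors=64)
print(f"raw RGB: {raw} bytes, 64-color palette: {indexed} bytes")
```

With 64 colors each index needs only 6 bits, so under these assumptions the indexed form takes roughly a quarter of the raw RGB size. Storing one full byte per index instead would still cut the size to about a third.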

For comparison, a quantized image using a random codebook (colors picked at random) is also shown.

Fitting model on a small sub-sample of the data
done in 0.010s.
Predicting color indices on the full image (k-means)
done in 0.031s.
Predicting color indices on the full image (random)
done in 0.012s.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

from time import time

import matplotlib.pyplot as plt
import numpy as np

from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image
from sklearn.metrics import pairwise_distances_argmin
from sklearn.utils import shuffle

n_colors = 64

# Load the Summer Palace photo
china = load_sample_image("china.jpg")

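# Convert to floats instead of the default 8-bit integer coding. Dividing by
# 255 is important so that plt.imshow works well on float data (values need
# to be in the range [0, 1]).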
china = np.array(china, dtype=np.float64) / 255

# Transform the image into a 2D numpy array of shape (n_pixels, 3).
w, h, d = original_shape = tuple(china.shape)
assert d == 3
image_array = np.reshape(china, (w * h, d))

print("Fitting model on a small sub-sample of the data")
t0 = time()
image_array_sample = shuffle(image_array, random_state=0, n_samples=1_000)
kmeans = KMeans(n_clusters=n_colors, random_state=0).fit(image_array_sample)
print(f"done in {time() - t0:0.3f}s.")

# Get labels for all points
print("Predicting color indices on the full image (k-means)")
t0 = time()
labels = kmeans.predict(image_array)
print(f"done in {time() - t0:0.3f}s.")


codebook_random = shuffle(image_array, random_state=0, n_samples=n_colors)
print("Predicting color indices on the full image (random)")
t0 = time()
labels_random = pairwise_distances_argmin(codebook_random, image_array, axis=0)
print(f"done in {time() - t0:0.3f}s.")


def recreate_image(codebook, labels, w, h):
    """Recreate the (compressed) image from the code book & labels"""
    return codebook[labels].reshape(w, h, -1)


# Display all results, alongside the original image
plt.figure(1)
plt.clf()
plt.axis("off")
plt.title("Original image (96,615 colors)")
plt.imshow(china)

plt.figure(2)
plt.clf()
plt.axis("off")
plt.title(f"Quantized image ({n_colors} colors, K-Means)")
plt.imshow(recreate_image(kmeans.cluster_centers_, labels, w, h))

plt.figure(3)
plt.clf()
plt.axis("off")
plt.title(f"Quantized image ({n_colors} colors, Random)")
plt.imshow(recreate_image(codebook_random, labels_random, w, h))
plt.show()
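The script above compares the two codebooks only visually. As a rough illustration of what the label assignment and `recreate_image` do, here is a small NumPy-only sketch on toy data. The function `quantize` and the toy arrays are hypothetical and not part of the original example, and the mean squared error is just one possible way to score a codebook.

```python
import numpy as np


def quantize(pixels, codebook):
    """Map each pixel to its nearest codebook color (squared Euclidean distance).

    pixels: array of shape (n_pixels, 3); codebook: array of shape (n_colors, 3).
    Returns the index of the chosen color per pixel and the quantized pixels.
    """
    # Broadcast to shape (n_pixels, n_colors, 3), then sum over the channels.
    diffs = pixels[:, np.newaxis, :] - codebook[np.newaxis, :, :]
    distances = np.sum(diffs**2, axis=2)
    labels = np.argmin(distances, axis=1)
    # Fancy indexing looks up one palette color per label.
    return labels, codebook[labels]


# Toy data: two dark and two bright pixels, and a two-color palette.
pixels = np.array(
    [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1], [0.9, 1.0, 1.0], [1.0, 1.0, 1.0]]
)
codebook = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

labels, quantized = quantize(pixels, codebook)
mse = np.mean((pixels - quantized) ** 2)
print(labels, mse)
```

Applying the same error measure to `image_array` with `kmeans.cluster_centers_` and with `codebook_random` would give a numeric counterpart to the side-by-side figures. The broadcasted distance array grows as n_pixels x n_colors x 3, so for a full image a chunked routine such as `pairwise_distances_argmin` is the more memory-friendly choice.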