.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py:

====================================================================
Comparison of the K-Means and MiniBatchKMeans clustering algorithms
====================================================================

We want to compare the performance of MiniBatchKMeans and KMeans:
MiniBatchKMeans is faster, but gives slightly different results (see
:ref:`mini_batch_kmeans`).

We first cluster a set of data with KMeans and then with MiniBatchKMeans, and plot the results.
We also plot the points that are labelled differently by the two algorithms.
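
As a rough illustration of where the speed-up comes from, the sketch below feeds
small batches to ``MiniBatchKMeans`` one at a time through ``partial_fit``. It is
not a description of what ``fit`` does internally: the explicit loop, the
sequential batching and the ``random_state`` values are assumptions made for
illustration only.

.. code-block:: Python

    from sklearn.cluster import MiniBatchKMeans
    from sklearn.datasets import make_blobs

    # Same blob layout as in this example; random_state is fixed here only
    # to make the sketch reproducible.
    X_demo, _ = make_blobs(
        n_samples=3000,
        centers=[[1, 1], [-1, -1], [1, -1]],
        cluster_std=0.7,
        random_state=0,
    )

    demo_batch_size = 45
    mbk_demo = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)

    # Each call updates the centers using only one small batch of samples,
    # so no single step has to look at the whole dataset.
    for start in range(0, X_demo.shape[0], demo_batch_size):
        mbk_demo.partial_fit(X_demo[start : start + demo_batch_size])

    print(mbk_demo.cluster_centers_)

Because every update sees only part of the data, the resulting centers are
expected to be close to, but not identical to, those found by ``KMeans``.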

.. GENERATED FROM PYTHON SOURCE LINES 15-19

Generate the data
-----------------

We start by generating the blobs of data to be clustered.

.. GENERATED FROM PYTHON SOURCE LINES 19-31

.. code-block:: Python


    import numpy as np

    from sklearn.datasets import make_blobs

    np.random.seed(0)

    batch_size = 45
    centers = [[1, 1], [-1, -1], [1, -1]]
    n_clusters = len(centers)
    X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7)


.. GENERATED FROM PYTHON SOURCE LINES 32-34

Compute clustering with KMeans
------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 34-45

.. code-block:: Python


    import time

    from sklearn.cluster import KMeans

    k_means = KMeans(init="k-means++", n_clusters=3, n_init=10)
    t0 = time.time()
    k_means.fit(X)
    t_batch = time.time() - t0


.. GENERATED FROM PYTHON SOURCE LINES 46-48

Compute clustering with MiniBatchKMeans
---------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: Python


    from sklearn.cluster import MiniBatchKMeans

    mbk = MiniBatchKMeans(
        init="k-means++",
        n_clusters=3,
        batch_size=batch_size,
        n_init=10,
        max_no_improvement=10,
        verbose=0,
    )
    t0 = time.time()
    mbk.fit(X)
    t_mini_batch = time.time() - t0


.. GENERATED FROM PYTHON SOURCE LINES 65-69

Establishing parity between clusters
------------------------------------

We want the same cluster to have the same color in both the MiniBatchKMeans and
the KMeans plots. To do so, we pair each KMeans cluster center with the closest
MiniBatchKMeans cluster center.

.. GENERATED FROM PYTHON SOURCE LINES 69-79

.. code-block:: Python


    from sklearn.metrics.pairwise import pairwise_distances_argmin

    k_means_cluster_centers = k_means.cluster_centers_
    order = pairwise_distances_argmin(k_means.cluster_centers_, mbk.cluster_centers_)
    mbk_means_cluster_centers = mbk.cluster_centers_[order]

    k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers)
    mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers)


.. GENERATED FROM PYTHON SOURCE LINES 80-82

Plotting the results
--------------------

.. GENERATED FROM PYTHON SOURCE LINES 82-142

.. code-block:: Python


    import matplotlib.pyplot as plt

    fig = plt.figure(figsize=(8, 3))
    fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9)
    colors = ["#4EACC5", "#FF9C34", "#4E9A06"]

    # KMeans
    ax = fig.add_subplot(1, 3, 1)
    for k, col in zip(range(n_clusters), colors):
        my_members = k_means_labels == k
        cluster_center = k_means_cluster_centers[k]
        ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".")
        ax.plot(
            cluster_center[0],
            cluster_center[1],
            "o",
            markerfacecolor=col,
            markeredgecolor="k",
            markersize=6,
        )
    ax.set_title("KMeans")
    ax.set_xticks(())
    ax.set_yticks(())
    plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_batch, k_means.inertia_))

    # MiniBatchKMeans
    ax = fig.add_subplot(1, 3, 2)
    for k, col in zip(range(n_clusters), colors):
        my_members = mbk_means_labels == k
        cluster_center = mbk_means_cluster_centers[k]
        ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".")
        ax.plot(
            cluster_center[0],
            cluster_center[1],
            "o",
            markerfacecolor=col,
            markeredgecolor="k",
            markersize=6,
        )
    ax.set_title("MiniBatchKMeans")
    ax.set_xticks(())
    ax.set_yticks(())
    plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_mini_batch, mbk.inertia_))

    # Start from an all-False mask; points whose labels disagree are flagged below.
    different = np.zeros(X.shape[0], dtype=bool)
    ax = fig.add_subplot(1, 3, 3)

    for k in range(n_clusters):
        different += (k_means_labels == k) != (mbk_means_labels == k)

    identical = np.logical_not(different)
    ax.plot(X[identical, 0], X[identical, 1], "w", markerfacecolor="#bbbbbb", marker=".")
    ax.plot(X[different, 0], X[different, 1], "w", markerfacecolor="m", marker=".")
    ax.set_title("Difference")
    ax.set_xticks(())
    ax.set_yticks(())

    plt.show()




.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png
   :alt: KMeans, MiniBatchKMeans, Difference
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png
   :class: sphx-glr-single-img
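
The "Difference" panel shows where the two labelings disagree; the following
self-contained sketch puts a number on that disagreement. It regenerates the
data and refits both estimators, so the names ``X_check``, ``km_check`` and
``mbk_check`` and the fixed ``random_state`` values are assumptions introduced
here and are not part of the example above.

.. code-block:: Python

    import numpy as np

    from sklearn.cluster import KMeans, MiniBatchKMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics.pairwise import pairwise_distances_argmin

    X_check, _ = make_blobs(
        n_samples=3000,
        centers=[[1, 1], [-1, -1], [1, -1]],
        cluster_std=0.7,
        random_state=0,
    )

    km_check = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_check)
    mbk_check = MiniBatchKMeans(
        n_clusters=3, batch_size=45, n_init=10, random_state=0
    ).fit(X_check)

    # Pair each KMeans center with its closest MiniBatchKMeans center so that
    # label k refers to the same cluster in both models.
    order_check = pairwise_distances_argmin(
        km_check.cluster_centers_, mbk_check.cluster_centers_
    )
    km_labels = pairwise_distances_argmin(X_check, km_check.cluster_centers_)
    mbk_labels = pairwise_distances_argmin(
        X_check, mbk_check.cluster_centers_[order_check]
    )

    # Count the points whose aligned labels differ between the two models.
    n_different = int(np.sum(km_labels != mbk_labels))
    fraction_different = n_different / X_check.shape[0]
    print(
        f"{n_different} of {X_check.shape[0]} points "
        f"({fraction_different:.1%}) are labelled differently"
    )

The exact count depends on the random seeds and the batch size, so it should be
read as an indication rather than a fixed property of the two algorithms.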


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.120 seconds)