.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/cluster/plot_mini_batch_kmeans.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. or to run this example in your browser via Binder .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py: ==================================================================== K-Means 和 MiniBatchKMeans 聚类算法的比较 ==================================================================== 我们想要比较 MiniBatchKMeans 和 KMeans 的性能: MiniBatchKMeans 更快,但会产生略有不同的结果(参见 :ref:`mini_batch_kmeans` )。 我们将首先使用 KMeans 对一组数据进行聚类,然后使用 MiniBatchKMeans 进行聚类,并绘制结果。 我们还将绘制在两个算法之间标签不同的点。 .. GENERATED FROM PYTHON SOURCE LINES 15-19 生成数据 --------- 我们首先生成要进行聚类的数据斑点。 .. GENERATED FROM PYTHON SOURCE LINES 19-31 .. code-block:: Python import numpy as np from sklearn.datasets import make_blobs np.random.seed(0) batch_size = 45 centers = [[1, 1], [-1, -1], [1, -1]] n_clusters = len(centers) X, labels_true = make_blobs(n_samples=3000, centers=centers, cluster_std=0.7) .. GENERATED FROM PYTHON SOURCE LINES 32-34 计算KMeans聚类 ----------------- .. GENERATED FROM PYTHON SOURCE LINES 34-45 .. code-block:: Python import time from sklearn.cluster import KMeans k_means = KMeans(init="k-means++", n_clusters=3, n_init=10) t0 = time.time() k_means.fit(X) t_batch = time.time() - t0 .. GENERATED FROM PYTHON SOURCE LINES 46-48 计算使用 MiniBatchKMeans 的聚类 --------------------------------------- .. GENERATED FROM PYTHON SOURCE LINES 48-64 .. code-block:: Python from sklearn.cluster import MiniBatchKMeans mbk = MiniBatchKMeans( init="k-means++", n_clusters=3, batch_size=batch_size, n_init=10, max_no_improvement=10, verbose=0, ) t0 = time.time() mbk.fit(X) t_mini_batch = time.time() - t0 .. GENERATED FROM PYTHON SOURCE LINES 65-69 在集群之间建立一致性 ------------------------------------ 我们希望 MiniBatchKMeans 和 KMeans 算法中的同一簇具有相同的颜色。让我们将每个簇中心与最近的一个配对。 .. GENERATED FROM PYTHON SOURCE LINES 69-79 .. code-block:: Python from sklearn.metrics.pairwise import pairwise_distances_argmin k_means_cluster_centers = k_means.cluster_centers_ order = pairwise_distances_argmin(k_means.cluster_centers_, mbk.cluster_centers_) mbk_means_cluster_centers = mbk.cluster_centers_[order] k_means_labels = pairwise_distances_argmin(X, k_means_cluster_centers) mbk_means_labels = pairwise_distances_argmin(X, mbk_means_cluster_centers) .. GENERATED FROM PYTHON SOURCE LINES 80-82 绘制结果 -------------------- .. GENERATED FROM PYTHON SOURCE LINES 82-142 .. code-block:: Python import matplotlib.pyplot as plt fig = plt.figure(figsize=(8, 3)) fig.subplots_adjust(left=0.02, right=0.98, bottom=0.05, top=0.9) colors = ["#4EACC5", "#FF9C34", "#4E9A06"] # KMeans ax = fig.add_subplot(1, 3, 1) for k, col in zip(range(n_clusters), colors): my_members = k_means_labels == k cluster_center = k_means_cluster_centers[k] ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".") ax.plot( cluster_center[0], cluster_center[1], "o", markerfacecolor=col, markeredgecolor="k", markersize=6, ) ax.set_title("KMeans") ax.set_xticks(()) ax.set_yticks(()) plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_batch, k_means.inertia_)) # MiniBatchKMeans ax = fig.add_subplot(1, 3, 2) for k, col in zip(range(n_clusters), colors): my_members = mbk_means_labels == k cluster_center = mbk_means_cluster_centers[k] ax.plot(X[my_members, 0], X[my_members, 1], "w", markerfacecolor=col, marker=".") ax.plot( cluster_center[0], cluster_center[1], "o", markerfacecolor=col, markeredgecolor="k", markersize=6, ) ax.set_title("MiniBatchKMeans") ax.set_xticks(()) ax.set_yticks(()) plt.text(-3.5, 1.8, "train time: %.2fs\ninertia: %f" % (t_mini_batch, mbk.inertia_)) # 将不同的数组初始化为全False different = mbk_means_labels == 4 ax = fig.add_subplot(1, 3, 3) for k in range(n_clusters): different += (k_means_labels == k) != (mbk_means_labels == k) identical = np.logical_not(different) ax.plot(X[identical, 0], X[identical, 1], "w", markerfacecolor="#bbbbbb", marker=".") ax.plot(X[different, 0], X[different, 1], "w", markerfacecolor="m", marker=".") ax.set_title("Difference") ax.set_xticks(()) ax.set_yticks(()) plt.show() .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png :alt: KMeans, MiniBatchKMeans, Difference :srcset: /auto_examples/cluster/images/sphx_glr_plot_mini_batch_kmeans_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.120 seconds) .. _sphx_glr_download_auto_examples_cluster_plot_mini_batch_kmeans.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: binder-badge .. image:: images/binder_badge_logo.svg :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_mini_batch_kmeans.ipynb :alt: Launch binder :width: 150 px .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_mini_batch_kmeans.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_mini_batch_kmeans.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_mini_batch_kmeans.zip ` .. include:: plot_mini_batch_kmeans.recommendations .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_