.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_agglomerative_clustering_metrics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_cluster_plot_agglomerative_clustering_metrics.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_agglomerative_clustering_metrics.py:


Agglomerative clustering with different metrics
===============================================

Demonstrates the effect of different metrics on hierarchical clustering.

The example is designed to show the effect of the choice of metric. It is
applied to waveforms, which can be seen as high-dimensional vectors. Indeed,
the difference between metrics is usually more pronounced in high dimensions
(in particular for the Euclidean and cityblock distances).

We generate data from three groups of waveforms. Two of the waveforms
(waveform 1 and waveform 2) are proportional to each other. The cosine
distance is invariant to a scaling of the data, so it cannot distinguish
these two waveforms. Thus, even with no noise, clustering with this distance
will not separate waveforms 1 and 2.

We add observation noise to these waveforms. The noise is very sparse: the
threshold used in the code keeps only about 0.3% of the time points (roughly
6 of the 2000) as noisy. As a result, relative to the distances between the
waveforms themselves, this noise contributes far less to the l1 ("cityblock")
distance than to the l2 ("euclidean") distance. This can be seen in the
interclass distance matrices: the values on the diagonal, which characterize
the spread of each class, are much larger for the Euclidean distance than for
the cityblock distance.

When we apply clustering to the data, we find that the clustering reflects
what is in the distance matrices. Indeed, for the Euclidean distance, the
classes are poorly separated because of the noise, so the clustering does not
separate the waveforms. For the cityblock distance, the separation is good
and the waveform classes are recovered. Finally, the cosine distance does not
separate waveforms 1 and 2 at all, so the clustering puts them in the same
cluster.

.. GENERATED FROM PYTHON SOURCE LINES 16-127

.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_001.png
         :alt: Ground truth
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_002.png
         :alt: Interclass cosine distances
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_002.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_003.png
         :alt: Interclass euclidean distances
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_003.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_004.png
         :alt: Interclass cityblock distances
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_005.png
         :alt: AgglomerativeClustering(metric=cosine)
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_005.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_006.png
         :alt: AgglomerativeClustering(metric=euclidean)
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_006.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_007.png
         :alt: AgglomerativeClustering(metric=cityblock)
         :srcset: /auto_examples/cluster/images/sphx_glr_plot_agglomerative_clustering_metrics_007.png
         :class: sphx-glr-multi-img




.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import matplotlib.patheffects as PathEffects
    import matplotlib.pyplot as plt
    import numpy as np

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.metrics import pairwise_distances

    np.random.seed(0)

    # Generate waveform data
    n_features = 2000
    t = np.pi * np.linspace(0, 1, n_features)


    def sqr(x):
        return np.sign(np.cos(x))


    X = list()
    y = list()
    for i, (phi, a) in enumerate([(0.5, 0.15), (0.5, 0.6), (0.3, 0.2)]):
        for _ in range(30):
            phase_noise = 0.01 * np.random.normal()
            amplitude_noise = 0.04 * np.random.normal()
            additional_noise = 1 - 2 * np.random.rand(n_features)
            # Make the noise sparse
            additional_noise[np.abs(additional_noise) < 0.997] = 0

            X.append(
                12
                * (
                    (a + amplitude_noise) * (sqr(6 * (t + phi + phase_noise)))
                    + additional_noise
                )
            )
            y.append(i)

    X = np.array(X)
    y = np.array(y)

    n_clusters = 3

    labels = ("Waveform 1", "Waveform 2", "Waveform 3")

    colors = ["#f7bd01", "#377eb8", "#f781bf"]

    # Plot the ground-truth labelling
    plt.figure()
    plt.axes([0, 0, 1, 1])
    for l, color, n in zip(range(n_clusters), colors, labels):
        lines = plt.plot(X[y == l].T, c=color, alpha=0.5)
        lines[0].set_label(n)

    plt.legend(loc="best")

    plt.axis("tight")
    plt.axis("off")
    plt.suptitle("Ground truth", size=20, y=1)


    # Plot the distances
    for index, metric in enumerate(["cosine", "euclidean", "cityblock"]):
        avg_dist = np.zeros((n_clusters, n_clusters))
        plt.figure(figsize=(5, 4.5))
        for i in range(n_clusters):
            for j in range(n_clusters):
                avg_dist[i, j] = pairwise_distances(
                    X[y == i], X[y == j], metric=metric
                ).mean()
        avg_dist /= avg_dist.max()
        for i in range(n_clusters):
            for j in range(n_clusters):
                t = plt.text(
                    i,
                    j,
                    "%5.3f" % avg_dist[i, j],
                    verticalalignment="center",
                    horizontalalignment="center",
                )
                t.set_path_effects(
                    [PathEffects.withStroke(linewidth=5, foreground="w", alpha=0.5)]
                )

        plt.imshow(avg_dist, interpolation="nearest", cmap="cividis", vmin=0)
        plt.xticks(range(n_clusters), labels, rotation=45)
        plt.yticks(range(n_clusters), labels)
        plt.colorbar()
        plt.suptitle("Interclass %s distances" % metric, size=18, y=1)
        plt.tight_layout()


    # Plot clustering results
    for index, metric in enumerate(["cosine", "euclidean", "cityblock"]):
        model = AgglomerativeClustering(
            n_clusters=n_clusters, linkage="average", metric=metric
        )
        model.fit(X)
        plt.figure()
        plt.axes([0, 0, 1, 1])
        for l, color in zip(np.arange(model.n_clusters), colors):
            plt.plot(X[model.labels_ == l].T, c=color, alpha=0.5)
        plt.axis("tight")
        plt.axis("off")
        plt.suptitle("AgglomerativeClustering(metric=%s)" % metric, size=20, y=1)


    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.528 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_agglomerative_clustering_metrics.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_agglomerative_clustering_metrics.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_agglomerative_clustering_metrics.ipynb <plot_agglomerative_clustering_metrics.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_agglomerative_clustering_metrics.py <plot_agglomerative_clustering_metrics.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_agglomerative_clustering_metrics.zip <plot_agglomerative_clustering_metrics.zip>`


.. include:: plot_agglomerative_clustering_metrics.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
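
The two effects described in the introduction, the scale invariance of the
cosine distance and the different weight that sparse noise carries under the
l1 and l2 norms, can be checked numerically. The following is a minimal
sketch and not part of the generated example. It uses only NumPy, the helper
``cosine_distance`` is a hypothetical name introduced here, and the noise is
simplified to exactly six spikes of magnitude 12. That is an assumption
standing in for the thresholded uniform noise of the example, which keeps
about 0.3% of the 2000 samples on average.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    n_features = 2000
    t = np.pi * np.linspace(0, 1, n_features)


    def cosine_distance(u, v):
        """Hypothetical helper: 1 - cosine similarity of two vectors."""
        return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


    # Noise-free versions of waveforms 1 and 2: same shape, amplitudes 0.15 and 0.6.
    base = np.sign(np.cos(6 * (t + 0.5)))
    wave_1 = 12 * 0.15 * base
    wave_2 = 12 * 0.6 * base

    cos_d = cosine_distance(wave_1, wave_2)  # close to 0: scaling is invisible
    euc_d = np.linalg.norm(wave_1 - wave_2)  # clearly nonzero

    # Assumption: exactly 6 spikes of magnitude 12 at random positions.
    noise = np.zeros(n_features)
    spike_idx = rng.choice(n_features, size=6, replace=False)
    noise[spike_idx] = 12 * rng.choice([-1.0, 1.0], size=6)
    nonzero_fraction = np.count_nonzero(noise) / n_features

    # Size of the noise relative to the dense difference between the waveforms.
    signal = wave_1 - wave_2
    l1_ratio = np.abs(noise).sum() / np.abs(signal).sum()
    l2_ratio = np.linalg.norm(noise) / np.linalg.norm(signal)

    print(f"cosine distance between waveforms 1 and 2: {cos_d:.3g}")
    print(f"euclidean distance between waveforms 1 and 2: {euc_d:.3g}")
    print(f"fraction of noisy time points: {nonzero_fraction:.4f}")
    print(f"noise / signal under l1: {l1_ratio:.4f}")
    print(f"noise / signal under l2: {l2_ratio:.4f}")

Under these assumptions the noise-to-signal ratio is markedly smaller for the
l1 norm than for the l2 norm, which is consistent with the cityblock metric
recovering the waveform classes while the Euclidean metric does not.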