
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_ward_structured_vs_unstructured.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_cluster_plot_ward_structured_vs_unstructured.py>`
        to download the full example code or to run this example in your browser via Binder.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_ward_structured_vs_unstructured.py:


===========================================================
Hierarchical clustering: structured vs unstructured ward
===========================================================

This example builds a swiss roll dataset and runs hierarchical clustering
on the positions of its points.

For more information, see :ref:`hierarchical_clustering`.

In a first step, the hierarchical clustering is performed without connectivity
constraints on the structure and is based solely on distance. In a second step,
the clustering is restricted to the k-nearest neighbors graph: it is a
hierarchical clustering with a structure prior.

Some of the clusters learned without connectivity constraints do not respect
the structure of the swiss roll and extend across different folds of the
manifold. In contrast, when connectivity constraints are imposed, the clusters
form a nice partition of the swiss roll.

.. GENERATED FROM PYTHON SOURCE LINES 15-26

.. code-block:: Python


    # Authors: The scikit-learn developers
    # SPDX-License-Identifier: BSD-3-Clause

    import time

    # The following import is required
    # for 3D projection to work with matplotlib < 3.2
    import mpl_toolkits.mplot3d  # noqa: F401
    import numpy as np


.. GENERATED FROM PYTHON SOURCE LINES 27-31

Generate data
-------------

We start by generating the swiss roll dataset.

.. GENERATED FROM PYTHON SOURCE LINES 31-41

.. code-block:: Python

    from sklearn.datasets import make_swiss_roll

    n_samples = 1500
    noise = 0.05
    X, _ = make_swiss_roll(n_samples, noise=noise)
    # Make it thinner
    X[:, 1] *= 0.5


.. GENERATED FROM PYTHON SOURCE LINES 42-46

Compute clustering
------------------

We perform agglomerative clustering, a form of hierarchical clustering,
without any connectivity constraints.

.. GENERATED FROM PYTHON SOURCE LINES 46-57

.. code-block:: Python


    from sklearn.cluster import AgglomerativeClustering

    print("Compute unstructured hierarchical clustering...")
    st = time.time()
    ward = AgglomerativeClustering(n_clusters=6, linkage="ward").fit(X)
    elapsed_time = time.time() - st
    label = ward.labels_
    print(f"Elapsed time: {elapsed_time:.2f}s")
    print(f"Number of points: {label.size}")


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Compute unstructured hierarchical clustering...
    Elapsed time: 0.02s
    Number of points: 1500


.. GENERATED FROM PYTHON SOURCE LINES 58-61

Plot result
-----------

Plot the unstructured hierarchical clusters.

.. GENERATED FROM PYTHON SOURCE LINES 61-78

.. code-block:: Python


    import matplotlib.pyplot as plt

    fig1 = plt.figure()
    ax1 = fig1.add_subplot(111, projection="3d", elev=7, azim=-80)
    ax1.set_position([0, 0, 0.95, 1])
    for l in np.unique(label):
        ax1.scatter(
            X[label == l, 0],
            X[label == l, 1],
            X[label == l, 2],
            color=plt.cm.jet(float(l) / np.max(label + 1)),
            s=20,
            edgecolor="k",
        )
    _ = fig1.suptitle(f"Without connectivity constraints (time {elapsed_time:.2f}s)")


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_ward_structured_vs_unstructured_001.png
   :alt: Without connectivity constraints (time 0.02s)
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_ward_structured_vs_unstructured_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 79-81

We are defining k-Nearest Neighbors with 10 neighbors
-----------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 81-87

.. code-block:: Python


    from sklearn.neighbors import kneighbors_graph

    connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)


.. GENERATED FROM PYTHON SOURCE LINES 88-92

Compute clustering
------------------

We perform agglomerative clustering again, this time with connectivity
constraints.

.. GENERATED FROM PYTHON SOURCE LINES 92-103

.. code-block:: Python


    print("Compute structured hierarchical clustering...")
    st = time.time()
    ward = AgglomerativeClustering(
        n_clusters=6, connectivity=connectivity, linkage="ward"
    ).fit(X)
    elapsed_time = time.time() - st
    label = ward.labels_
    print(f"Elapsed time: {elapsed_time:.2f}s")
    print(f"Number of points: {label.size}")


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Compute structured hierarchical clustering...
    Elapsed time: 0.04s
    Number of points: 1500


.. GENERATED FROM PYTHON SOURCE LINES 104-108

Plot result
-----------

Plot the structured hierarchical clusters.

.. GENERATED FROM PYTHON SOURCE LINES 108-124

.. code-block:: Python


    fig2 = plt.figure()
    ax2 = fig2.add_subplot(121, projection="3d", elev=7, azim=-80)
    ax2.set_position([0, 0, 0.95, 1])
    for l in np.unique(label):
        ax2.scatter(
            X[label == l, 0],
            X[label == l, 1],
            X[label == l, 2],
            color=plt.cm.jet(float(l) / np.max(label + 1)),
            s=20,
            edgecolor="k",
        )
    fig2.suptitle(f"With connectivity constraints (time {elapsed_time:.2f}s)")

    plt.show()


.. image-sg:: /auto_examples/cluster/images/sphx_glr_plot_ward_structured_vs_unstructured_002.png
   :alt: With connectivity constraints (time 0.04s)
   :srcset: /auto_examples/cluster/images/sphx_glr_plot_ward_structured_vs_unstructured_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.196 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_ward_structured_vs_unstructured.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-learn/scikit-learn/main?urlpath=lab/tree/notebooks/auto_examples/cluster/plot_ward_structured_vs_unstructured.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_ward_structured_vs_unstructured.ipynb <plot_ward_structured_vs_unstructured.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_ward_structured_vs_unstructured.py <plot_ward_structured_vs_unstructured.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_ward_structured_vs_unstructured.zip <plot_ward_structured_vs_unstructured.zip>`


.. include:: plot_ward_structured_vs_unstructured.recommendations


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_