fowlkes_mallows_score

sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, *, sparse=False)

Measure the similarity of two clusterings of a set of points.

Added in version 0.18.

The Fowlkes-Mallows index (FMI) is defined as the geometric mean of the pairwise precision and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

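where TP is the number of true positives (i.e. the number of pairs of points that belong to the same cluster in both labels_true and labels_pred), FP is the number of false positives (i.e. the number of pairs of points that belong to the same cluster in labels_pred but not in labels_true), and FN is the number of false negatives (i.e. the number of pairs of points that belong to the same cluster in labels_true but not in labels_pred).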

The score ranges from 0 to 1. A high value indicates strong similarity between the two clusterings.
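
The definition above can be checked by counting pairs directly. The following is a minimal sketch, not scikit-learn's implementation: `pairwise_fmi` is a hypothetical helper that visits every pair of points, which is only practical for small inputs. Here FP counts pairs grouped together only in labels_pred and FN counts pairs grouped together only in labels_true; because the formula is symmetric in FP and FN, swapping the two names leaves the score unchanged.

```python
from itertools import combinations
from math import sqrt


def pairwise_fmi(labels_true, labels_pred):
    """Fowlkes-Mallows index by brute-force pair counting (illustrative only)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair kept together in both clusterings
        elif same_pred:
            fp += 1  # together in labels_pred only
        elif same_true:
            fn += 1  # together in labels_true only
    if tp == 0:
        # No pair is shared by both clusterings; treat the score as zero.
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Partial agreement: TP = 1, FP = 2, FN = 1, so FMI = 1 / sqrt(3 * 2).
print(pairwise_fmi([0, 0, 1, 1], [0, 0, 0, 1]))
```

The explicit `tp == 0` branch is a choice made for this sketch so that inputs without any shared pair do not divide by zero.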

Read more in the User Guide.

Parameters:

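labels_true : array-like of shape (n_samples,), dtype=int
    A clustering of the data into disjoint subsets.
labels_pred : array-like of shape (n_samples,), dtype=int
    A clustering of the data into disjoint subsets.
sparse : bool, default=False
    Compute the contingency matrix internally with a sparse matrix.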

Returns:

score : float
    The resulting Fowlkes-Mallows score.
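
The `sparse` option concerns the contingency matrix, which is enough to compute the score without enumerating pairs. As a rough sketch under that assumption (again not scikit-learn's own code; `contingency_fmi` is a hypothetical helper and requires Python 3.8+ for `math.comb`), the pair counts follow from the cell counts and the cluster sizes:

```python
from collections import Counter
from math import comb, sqrt


def contingency_fmi(labels_true, labels_pred):
    """Fowlkes-Mallows index from contingency counts (illustrative only)."""
    # cells[(i, j)]: number of samples with true label i and predicted label j.
    cells = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)  # row sums of the contingency matrix
    pred_sizes = Counter(labels_pred)  # column sums of the contingency matrix

    # Pairs that share a cell are together in both clusterings.
    tp = sum(comb(n, 2) for n in cells.values())
    # Pairs together in labels_true (TP + FN) and in labels_pred (TP + FP).
    tp_plus_fn = sum(comb(n, 2) for n in true_sizes.values())
    tp_plus_fp = sum(comb(n, 2) for n in pred_sizes.values())

    if tp == 0:
        return 0.0
    return tp / sqrt(tp_plus_fp * tp_plus_fn)


print(contingency_fmi([0, 0, 1, 1], [0, 0, 0, 1]))
```

This version only needs one pass over the labels plus a pass over the non-empty cells, which is why a sparse representation of the contingency matrix can be useful when there are many clusters.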

References

[1]

E. B. Fowlkes and C. L. Mallows, 1983. "A method for comparing two hierarchical clusterings". Journal of the American Statistical Association.

[2]

Wikipedia entry for the Fowlkes-Mallows index

Examples

Perfect labelings are both homogeneous and complete, hence have a score of 1.0:

>>> from sklearn.metrics.cluster import fowlkes_mallows_score
>>> fowlkes_mallows_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> fowlkes_mallows_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

If class members are completely split across different clusters, the assignment is totally random, hence the FMI is zero:

>>> fowlkes_mallows_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0