trustworthiness
- sklearn.manifold.trustworthiness(X, X_embedded, *, n_neighbors=5, metric='euclidean')
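Indicate to what extent the local structure is retained.
The trustworthiness lies within [0, 1]. It is defined as
\[T(k) = 1 - \frac{2}{nk (2n - 3k - 1)} \sum^n_{i=1} \sum_{j \in \mathcal{N}_{i}^{k}} \max(0, (r(i, j) - k))\]
where, for each sample i, \(\mathcal{N}_{i}^{k}\) are its k nearest neighbors in the output space, and every sample j is its \(r(i, j)\)-th nearest neighbor in the input space. In other words, any unexpected nearest neighbors in the output space are penalised in proportion to their rank in the input space.
- Parameters: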
- X : {array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples)
If the metric is 'precomputed', X must be a square distance matrix. Otherwise it contains one sample per row.
- X_embedded : {array-like, sparse matrix} of shape (n_samples, n_components)
Embedding of the training data in low-dimensional space.
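- n_neighbors : int, default=5
The number of neighbors that will be considered. Should be fewer than n_samples / 2 to ensure the trustworthiness lies within [0, 1], as mentioned in [1]. An error will be raised otherwise.
- metric : str or callable, default='euclidean'
Which metric to use for computing pairwise distances between samples from the original input space. If metric is 'precomputed', X must be a matrix of pairwise distances or squared distances. Otherwise, for a list of available metrics, see the documentation of the metric argument of sklearn.metrics.pairwise.pairwise_distances and the metrics listed in sklearn.metrics.pairwise.PAIRWISE_DISTANCE_FUNCTIONS. Note that the "cosine" metric uses sklearn.metrics.pairwise.cosine_distances.
Added in version 0.20.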
- Returns:
- trustworthiness : float
Trustworthiness of the low-dimensional embedding.
References
[1] Jarkko Venna and Samuel Kaski. 2001. Neighborhood Preservation in Nonlinear Projection Methods: An Experimental Study. In Proceedings of the International Conference on Artificial Neural Networks (ICANN '01). Springer-Verlag, Berlin, Heidelberg, 485-491.
[2] Laurens van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5:384-391, 2009.
Examples
>>> from sklearn.datasets import make_blobs
>>> from sklearn.decomposition import PCA
>>> from sklearn.manifold import trustworthiness
>>> X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
>>> X_embedded = PCA(n_components=2).fit_transform(X)
>>> print(f"{trustworthiness(X, X_embedded, n_neighbors=5):.2f}")
0.92
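To make the definition of T(k) concrete, the following is a minimal NumPy sketch that evaluates the formula term by term and compares the result with the library function. The helper name trustworthiness_sketch is hypothetical and not part of scikit-learn; it assumes dense input, Euclidean distances, and no exact ties between distances, and it is not meant to mirror the library's internal implementation.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness
from sklearn.metrics import pairwise_distances


def trustworthiness_sketch(X, X_embedded, n_neighbors=5):
    """Hypothetical helper: evaluate T(k) directly from its definition."""
    n = X.shape[0]
    k = n_neighbors
    rows = np.arange(n)[:, None]

    # Input space: sort all other samples by distance to each sample i.
    d_in = pairwise_distances(X)
    np.fill_diagonal(d_in, np.inf)  # a sample is never its own neighbor
    order_in = np.argsort(d_in, axis=1)

    # rank_in[i, j] = r(i, j): 1-based rank of j among i's input-space neighbors.
    rank_in = np.empty_like(order_in)
    rank_in[rows, order_in] = np.arange(1, n + 1)

    # Output space: indices of the k nearest neighbors of each sample.
    d_out = pairwise_distances(X_embedded)
    np.fill_diagonal(d_out, np.inf)
    nn_out = np.argsort(d_out, axis=1)[:, :k]

    # Penalise output-space neighbors whose input-space rank exceeds k.
    penalty = np.maximum(0, rank_in[rows, nn_out] - k).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
X_embedded = PCA(n_components=2).fit_transform(X)

t_sketch = trustworthiness_sketch(X, X_embedded, n_neighbors=5)
t_sklearn = trustworthiness(X, X_embedded, n_neighbors=5)
```

Under the stated assumptions the two values should agree up to floating-point error. The full n-by-n distance matrices make the sketch quadratic in memory, so it is only suitable for small data sets.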