ndcg_score

sklearn.metrics.ndcg_score(y_true, y_score, *, k=None, sample_weight=None, ignore_ties=False)

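Compute Normalized Discounted Cumulative Gain (NDCG).

Sum the true scores in the order induced by the predicted scores, after applying a logarithmic discount. Then divide by the best possible score (the ideal DCG, obtained for a perfect ranking) to obtain a score between 0 and 1.

This ranking metric returns a high value when y_score ranks the true labels high.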
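The computation described above can be sketched in plain Python for a single query. This is a minimal illustration rather than scikit-learn's implementation: the helper names `dcg` and `ndcg_single_query` are hypothetical, the discount is assumed to be 1 / log2(rank + 1) with ranks starting at 1, and the sketch assumes no tied scores and non-negative relevances.

```python
import math


def dcg(relevance_in_ranked_order, k=None):
    """DCG of relevances that are already sorted by predicted rank."""
    top = relevance_in_ranked_order[:k]  # k=None keeps every position
    # enumerate() starts at 0, so the rank-1 item is divided by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(top))


def ndcg_single_query(y_true, y_score, k=None):
    """NDCG for one query (hypothetical helper; assumes no tied scores)."""
    # Order the true relevances by decreasing predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    # The ideal ranking sorts the true relevances themselves.
    ideal_dcg = dcg(sorted(y_true, reverse=True), k)
    return dcg(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


true_relevance = [10, 0, 0, 1, 5]
scores = [0.1, 0.2, 0.3, 4, 70]
print(round(ndcg_single_query(true_relevance, scores), 4))  # 0.6957
```

The item with relevance 10 receives the lowest predicted score, so its gain is discounted by log2(6), which is why the result falls well short of 1.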

Parameters:

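y_true, array-like of shape (n_samples, n_labels): True targets of multilabel classification, or true scores of entities to be ranked. Negative values in y_true may result in an output that is not between 0 and 1.
y_score, array-like of shape (n_samples, n_labels): Target scores, which can be probability estimates, confidence values, or non-thresholded measures of decisions (as returned by "decision_function" on some classifiers).
k, int, default=None: Only consider the highest k scores in the ranking. If None, use all outputs.
sample_weight, array-like of shape (n_samples,), default=None: Sample weights. If None, all samples are given the same weight.
ignore_ties, bool, default=False: Assume that there are no ties in y_score (which is likely to be the case if y_score is continuous) for efficiency gains.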

Returns:

normalized_discounted_cumulative_gain, float in [0., 1.]: The averaged NDCG scores for all samples.

See also

dcg_score: Discounted Cumulative Gain (not normalized).

References

Wikipedia entry for Discounted Cumulative Gain: https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Jarvelin, K., & Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-446.

Wang, Y., Wang, L., Li, Y., He, D., Chen, W., & Liu, T. Y. (2013, May). A theoretical analysis of NDCG ranking measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013).

McSherry, F., & Najork, M. (2008, March). Computing information retrieval performance measures efficiently in the presence of tied scores. In European conference on information retrieval (pp. 414-421). Springer, Berlin, Heidelberg.

Examples

>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> # we have ground-truth relevance of some answers to a query:
>>> true_relevance = np.asarray([[10, 0, 0, 1, 5]])
>>> # we predict some scores (relevance) for the answers
>>> scores = np.asarray([[.1, .2, .3, 4, 70]])
>>> ndcg_score(true_relevance, scores)
0.69...
>>> scores = np.asarray([[.05, 1.1, 1., .5, .0]])
>>> ndcg_score(true_relevance, scores)
0.49...
>>> # we can set k to truncate the sum; only the top k answers contribute.
>>> ndcg_score(true_relevance, scores, k=4)
0.35...
>>> # the normalization takes k into account, so a perfect answer
>>> # would still get 1.0
>>> ndcg_score(true_relevance, true_relevance, k=4)
1.0...
>>> # now we have some ties in our prediction
>>> scores = np.asarray([[1, 0, 0, 0, 1]])
>>> # by default ties are averaged, so here we get the average (normalized)
>>> # true relevance of our top predictions: (10 / 10 + 5 / 10) / 2 = .75
>>> ndcg_score(true_relevance, scores, k=1)
0.75...
>>> # we can choose to ignore ties for faster results, but only
>>> # if we know there aren't ties in our scores, otherwise we get
>>> # wrong results:
>>> ndcg_score(true_relevance,
...           scores, k=1, ignore_ties=True)
0.5...
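
The tie-averaged value of 0.75 in the example above can be reproduced by hand. The following is a minimal sketch of the tie-averaging idea (every item in a group of equal scores receives the group's mean gain, spread over the ranks that group occupies), in the spirit of McSherry & Najork (2008). The helpers `dcg_with_ties` and `ideal_dcg` are hypothetical names, and this is not claimed to be scikit-learn's actual code.

```python
import math
from itertools import groupby


def dcg_with_ties(y_true, y_score, k=None):
    """DCG where items sharing a score each receive their group's mean gain."""
    n = len(y_score)
    limit = n if k is None else min(k, n)
    # Discount per rank; ranks beyond the cutoff k contribute nothing.
    discounts = [1 / math.log2(r + 2) if r < limit else 0.0 for r in range(n)]
    # Sorting by decreasing score puts tied items next to each other.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    total, position = 0.0, 0
    for _, group in groupby(order, key=lambda i: y_score[i]):
        members = list(group)
        mean_gain = sum(y_true[i] for i in members) / len(members)
        # The group occupies ranks position .. position + len(members) - 1.
        total += mean_gain * sum(discounts[position:position + len(members)])
        position += len(members)
    return total


def ideal_dcg(y_true, k=None):
    """DCG of the perfect ranking, truncated at k."""
    ranked = sorted(y_true, reverse=True)[:k]
    return sum(rel / math.log2(r + 2) for r, rel in enumerate(ranked))


true_relevance = [10, 0, 0, 1, 5]
tied_scores = [1, 0, 0, 0, 1]
result = dcg_with_ties(true_relevance, tied_scores, k=1) / ideal_dcg(true_relevance, k=1)
print(result)  # 0.75
```

With k=1, the two items tied at score 1 share the single counted rank, so the gain is their mean relevance (10 + 5) / 2 = 7.5, and dividing by the ideal DCG of 10 gives 0.75. Breaking the tie arbitrarily instead, as ignore_ties=True effectively does, counts only one of the two items, which explains the different result of 0.5.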