Scoring: computing various performance metrics

A function for computing various performance metrics.

> from mlxtend.evaluate import scoring

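Overview

Confusion Matrix

The confusion matrix (or error matrix) is one way to summarize the performance of a classifier on a binary classification task. It is a square matrix whose rows and columns list the number of instances, as absolute counts or relative proportions, by "actual class" versus "predicted class".

Let $P$ be the label of class 1 and $N$ be the label of a second class or, in a multi-class setting, the label of all classes that are not class 1.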
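As a minimal sketch of how the four cells of the binary confusion matrix can be counted, the snippet below uses plain Python. The function name `confusion_counts` and the label lists are hypothetical, chosen only for illustration; they are not part of mlxtend.

```python
def confusion_counts(y_target, y_predicted, positive_label=1):
    """Count TP, FP, TN, FN, treating every label other than
    positive_label as the negative class N."""
    tp = fp = tn = fn = 0
    for target, predicted in zip(y_target, y_predicted):
        actual_pos = target == positive_label
        predicted_pos = predicted == positive_label
        if actual_pos and predicted_pos:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif actual_pos:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical labels, for illustration only
y_target = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_target, y_predicted)
print(tp, fp, tn, fn)  # 3 1 4 2
```

In a multi-class setting the same function collapses all labels other than `positive_label` into the negative class, which matches the definition of $N$ given above.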

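Error and Accuracy

Both the prediction error (ERR) and the accuracy (ACC) provide general information about how many samples are misclassified. The error is the number of false predictions divided by the total number of predictions, and the accuracy is the number of correct predictions divided by the total number of predictions.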

$$ERR = \frac{FP + FN}{FP + FN + TP + TN} = 1 - ACC$$

$$ACC = \frac{TP + TN}{FP + FN + TP + TN} = 1 - ERR$$
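The two formulas can be checked with a few lines of plain Python. This is only a sketch; the counts below are hypothetical and serve as an illustration.

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, tn, fn = 3, 1, 4, 2
total = tp + fp + tn + fn

err = (fp + fn) / total  # fraction of wrong predictions
acc = (tp + tn) / total  # fraction of correct predictions

print(err, acc)  # 0.3 0.7
# The two metrics are complements of each other (up to rounding)
assert abs(err + acc - 1.0) < 1e-12
```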

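True and False Positive Rates

The True Positive Rate (TPR) and False Positive Rate (FPR) are performance metrics that are especially useful for imbalanced class problems. In spam classification, for example, we are primarily interested in detecting and filtering out spam. However, it is also important to reduce the number of messages that are incorrectly classified as spam (False Positives): missing an important message is considered "worse" than ending up with a few spam messages in the inbox. In contrast to the FPR, the True Positive Rate provides useful information about the fraction of positive (or relevant) samples that were correctly identified out of the total pool of Positives.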

$$FPR = \frac{FP}{N} = \frac{FP}{FP + TN}$$

$$TPR = \frac{TP}{P} = \frac{TP}{FN + TP}$$
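To illustrate why these rates matter when classes are imbalanced, the sketch below uses hypothetical spam-filter counts (50 spam messages among 1000). The numbers are invented for illustration: the accuracy looks high even though a fifth of the spam is missed, which the TPR makes visible.

```python
# Hypothetical spam-filter counts: 50 spam (positive), 950 non-spam (negative)
tp, fn = 40, 10   # spam caught / spam missed
fp, tn = 19, 931  # legitimate mail flagged / legitimate mail passed

acc = (tp + tn) / (tp + fn + fp + tn)
tpr = tp / (fn + tp)  # share of spam that was detected
fpr = fp / (fp + tn)  # share of legitimate mail that was flagged

print(f"ACC={acc:.3f} TPR={tpr:.2f} FPR={fpr:.2f}")
# ACC=0.971 TPR=0.80 FPR=0.02
```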

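Precision, Recall, and the F1-Score

Precision (PRE) and Recall (REC) are metrics that are more commonly used in information retrieval and are related to the False and True Positive Rates. In fact, Recall is synonymous with the True Positive Rate and is sometimes also called Sensitivity. The F$_1$-Score can be understood as a combination of Precision and Recall.

$$PRE = \frac{TP}{TP + FP}$$

$$REC = TPR = \frac{TP}{P} = \frac{TP}{FN + TP}$$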

$$F_1 = 2 \cdot \frac{PRE \cdot REC}{PRE + REC}$$
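A short sketch of the three formulas in plain Python, using hypothetical counts chosen only for illustration:

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, fn = 3, 1, 2

pre = tp / (tp + fp)
rec = tp / (fn + tp)
f1 = 2 * (pre * rec) / (pre + rec)

print(f"PRE={pre:.2f} REC={rec:.2f} F1={f1:.3f}")
# PRE=0.75 REC=0.60 F1=0.667
```

The F$_1$-Score is the harmonic mean of Precision and Recall, so it can equivalently be written as $2TP / (2TP + FP + FN)$.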

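Sensitivity and Specificity

Sensitivity (SEN) is synonymous with Recall and the True Positive Rate, whereas Specificity (SPC) is synonymous with the True Negative Rate. Sensitivity measures the recovery rate of the Positives and, complementarily, Specificity measures the recovery rate of the Negatives.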

$$SEN = TPR = REC = \frac{TP}{P} = \frac{TP}{FN + TP}$$

$$SPC = TNR =\frac{TN}{N} = \frac{TN}{FP + TN}$$
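Because SPC and FPR share the denominator $N = FP + TN$, the two are complements of each other. A short derivation from the definitions above:

$$SPC = \frac{TN}{FP + TN} = \frac{(FP + TN) - FP}{FP + TN} = 1 - \frac{FP}{FP + TN} = 1 - FPR$$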

Matthews Correlation Coefficient

The Matthews correlation coefficient (MCC) was first formulated by Brian W. Matthews [3] in 1975 to assess the performance of protein secondary structure predictions. The MCC can be understood as a specific case of a linear correlation coefficient (Pearson's R) for a binary classification setting and is considered especially useful in unbalanced class settings. The previous metrics take values between 0 and 1 (for accuracy, precision, and recall, 0 is worst and 1 is best), whereas the MCC is bounded between 1 (perfect correlation between ground truth and predicted outcome) and -1 (inverse or negative correlation); a value of 0 denotes a random prediction.

$$MCC = \frac{ TP \times TN - FP \times FN } {\sqrt{ (TP + FP) ( TP + FN ) ( TN + FP ) ( TN + FN ) } }$$
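The formula can be sketched in plain Python as follows. The helper `matthews_corr_coef` is a hypothetical stand-alone function written for this illustration, not the library's implementation, and returning 0 when the denominator vanishes is an assumption (the formula itself is undefined in that case).

```python
import math


def matthews_corr_coef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: return 0 when the MCC is undefined
        # (an empty row or column in the confusion matrix)
        return 0.0
    return (tp * tn - fp * fn) / denominator


print(round(matthews_corr_coef(tp=3, tn=4, fp=1, fn=2), 3))  # 0.408
print(matthews_corr_coef(tp=5, tn=5, fp=0, fn=0))  # 1.0
print(matthews_corr_coef(tp=0, tn=0, fp=5, fn=5))  # -1.0
```

The last two calls show the bounds: a perfect prediction yields 1 and a fully inverted prediction yields -1.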

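Average Per-Class Accuracy

In a binary class setting, the "overall" accuracy is defined as the number of correct predictions (true positives TP and true negatives TN) over all samples n:

$$ACC = \frac{TP + TN}{n}$$

In a multi-class setting, we can generalize the computation of the accuracy as the fraction of all true predictions T (the diagonal of the confusion matrix) over all samples n.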

$$ACC = \frac{T}{n}$$

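Consider a multi-class problem with 3 classes (C0, C1, C2).

Assume our model's predictions on 90 samples put 3, 50, and 18 samples on the diagonal of the confusion matrix.

We compute the accuracy as: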

$$ACC = \frac{3 + 50 + 18}{90} \approx 0.79$$

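Now, in order to compute the average per-class accuracy, we compute the binary accuracy for each class label separately; i.e., if class 1 is the positive class, classes 0 and 2 are both considered the negative class.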

$$APC\;ACC = \frac{83/90 + 71/90 + 78/90}{3} \approx 0.86$$
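The computation can be sketched in plain Python. The confusion matrix below is hypothetical: its off-diagonal entries are an assumption, chosen only so that the diagonal, the sample count, and the per-class accuracies agree with the numbers used in the formulas above.

```python
# Hypothetical confusion matrix (rows: actual, columns: predicted),
# chosen only to be consistent with the numbers quoted in the text.
conf_mat = [
    [3, 4, 0],    # actual C0
    [3, 50, 7],   # actual C1
    [0, 5, 18],   # actual C2
]
n = sum(sum(row) for row in conf_mat)

# Overall accuracy: diagonal over all samples
acc = sum(conf_mat[k][k] for k in range(len(conf_mat))) / n

# Per-class accuracy: treat class k as positive, all others as negative
per_class = []
for k in range(len(conf_mat)):
    tp = conf_mat[k][k]
    fn = sum(conf_mat[k]) - tp
    fp = sum(row[k] for row in conf_mat) - tp
    tn = n - tp - fn - fp
    per_class.append((tp + tn) / n)

apc_acc = sum(per_class) / len(per_class)
print(f"{acc:.2f} {apc_acc:.2f}")  # 0.79 0.86
```

The average per-class accuracy is higher than the overall accuracy here because each one-vs-rest view also counts the correctly rejected samples of the other two classes as true negatives.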

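References

[1] S. Raschka. An overview of general performance metrics of binary classifier systems. Computing Research Repository (CoRR), abs/1410.5330, 2014.
[2] Cyril Goutte and Eric Gaussier. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Advances in Information Retrieval, pages 345–359. Springer, 2005.
[3] Brian W Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2):442–451, 1975.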

Example 1 - Classification Error

from mlxtend.evaluate import scoring

y_targ = [1, 1, 1, 0, 0, 2, 0, 3]
y_pred = [1, 0, 1, 0, 0, 2, 1, 3]
res = scoring(y_target=y_targ, y_predicted=y_pred, metric='error')

print('Error: %s%%' % (res * 100))

Error: 25.0%
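The printed value can be cross-checked without the library. The sketch below assumes that the 'error' metric is the plain fraction of mismatched labels, which is consistent with the output shown above (2 of the 8 predictions differ from their targets).

```python
y_targ = [1, 1, 1, 0, 0, 2, 0, 3]
y_pred = [1, 0, 1, 0, 0, 2, 1, 3]

# Fraction of positions where prediction and target disagree
n_wrong = sum(t != p for t, p in zip(y_targ, y_pred))
err = n_wrong / len(y_targ)

print('Error: %s%%' % (err * 100))  # Error: 25.0%
```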

API

scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto')

Compute a scoring metric for supervised learning.

Parameters

y_target : array-like, shape=[n_values]

True class labels or target values.
y_predicted : array-like, shape=[n_values]

Predicted class labels or target values.
metric : str (default: 'error')

Performance metric:

'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR

'average per-class accuracy': Average per-class accuracy

'average per-class error': Average per-class error

'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC

'false_positive_rate': FP/N = FP/(FP + TN)

'true_positive_rate': TP/P = TP/(FN + TP)

'true_negative_rate': TN/N = TN/(FP + TN)

'precision': TP/(TP + FP)

'recall': equal to 'true_positive_rate'

'sensitivity': equal to 'true_positive_rate' or 'recall'

'specificity': equal to 'true_negative_rate'

'f1': 2 * (PRE * REC)/(PRE + REC)

'matthews_corr_coef': (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

Where: [TP: True positives, TN: True negatives,

FP: False positives, FN: False negatives]
positive_label : int (default: 1)

Label of the positive class for binary classification metrics.
unique_labels : str or array-like (default: 'auto')

If 'auto', deduces the unique class labels from y_target

Returns

score : float

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/