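accuracy_score: Computing standard, balanced, and per-class accuracy
A function for computing basic classification accuracy, per-class accuracy, and average per-class accuracy.
> from mlxtend.evaluate import accuracy_score
Example 1 -- Standard Accuracy
The "overall" accuracy is defined as the number of correct predictions (true positives TP and true negatives TN) divided by the total number of samples n: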
$$ACC = \frac{TP + TN}{n}$$
```python
import numpy as np
from mlxtend.evaluate import accuracy_score

y_targ = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 2]

accuracy_score(y_targ, y_pred)
```

```
0.5555555555555556
```
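For intuition, the same value can be reproduced without mlxtend. The following is a minimal NumPy sketch, not the library's implementation; the helper name `standard_accuracy` is hypothetical. It counts the positions where the predicted label equals the target label and divides by n.

```python
import numpy as np


def standard_accuracy(y_target, y_predicted):
    """Fraction of samples whose predicted label equals the target label.

    Hypothetical helper for illustration; not part of mlxtend.
    """
    y_target = np.asarray(y_target)
    y_predicted = np.asarray(y_predicted)
    # The mean of a boolean array is the share of True entries.
    return float(np.mean(y_target == y_predicted))


y_targ = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 2]

acc = standard_accuracy(y_targ, y_pred)
print(acc)  # 5 of the 9 labels match
```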
Example 2 -- Per-Class Accuracy
The per-class accuracy is the accuracy of one class (specified via pos_label) versus all remaining data points in the dataset.
```python
import numpy as np
from mlxtend.evaluate import accuracy_score

y_targ = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 2]

std_acc = accuracy_score(y_targ, y_pred)
bin_acc = accuracy_score(y_targ, y_pred, method='binary', pos_label=1)

print(f'Standard accuracy: {std_acc*100:.2f}%')
print(f'Class 1 accuracy: {bin_acc*100:.2f}%')
```

```
Standard accuracy: 55.56%
Class 1 accuracy: 66.67%
```
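The one-versus-rest idea behind the per-class score can be sketched in plain NumPy. This is an illustrative assumption about how such a score can be computed, not mlxtend's source code, and `binary_accuracy` is a hypothetical name. Both label vectors are collapsed to "is `pos_label`" versus "is anything else" before they are compared.

```python
import numpy as np


def binary_accuracy(y_target, y_predicted, pos_label):
    """One-vs-rest accuracy for `pos_label` (hypothetical helper)."""
    # Collapse the labels to a yes/no question: is this sample pos_label?
    target_is_pos = np.asarray(y_target) == pos_label
    predicted_is_pos = np.asarray(y_predicted) == pos_label
    # A sample is correct when target and prediction agree on that question.
    return float(np.mean(target_is_pos == predicted_is_pos))


y_targ = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 2]

class1_acc = binary_accuracy(y_targ, y_pred, pos_label=1)
print(f'Class 1 accuracy: {class1_acc*100:.2f}%')
```

Note that a confusion between two negative classes (for example, a true 2 predicted as 0) still counts as correct for class 1, which is why the per-class score here is higher than the standard accuracy.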
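Example 3 -- Average Per-Class Accuracy
Overview
In a binary classification setting, the "overall" accuracy is defined as the number of correct predictions (true positives TP and true negatives TN) divided by the total number of samples n:
$$ACC = \frac{TP + TN}{n}$$
In a multi-class setting, we can generalize the computation of accuracy as the fraction of all true predictions T (the diagonal of the confusion matrix) over all samples n: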
$$ACC = \frac{T}{n}$$
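Consider a multi-class problem with 3 classes (C0, C1, C2) and n = 90 samples.
Assume that our model's predictions put 3, 50, and 18 samples on the diagonal of the confusion matrix, that is, 71 samples are classified correctly.
We compute the accuracy as: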
$$ACC = \frac{3 + 50 + 18}{90} \approx 0.79$$
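Now, to compute the average per-class accuracy, we compute the binary accuracy for each class label separately; that is, if class 1 is the positive class, classes 0 and 2 are both treated as the negative class. The average per-class accuracy is the mean of these binary accuracies: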
$$APC\;ACC = \frac{83/90 + 71/90 + 78/90}{3} \approx 0.86$$
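The arithmetic in the two formulas above can be checked with exact fractions. This short sketch only uses the numbers quoted in the text (diagonal counts 3, 50, 18, n = 90, and the per-class accuracies 83/90, 71/90, 78/90); the variable names are illustrative.

```python
from fractions import Fraction

n = 90
diagonal = [3, 50, 18]  # correctly classified samples per class

# Overall accuracy: correct predictions over all samples.
overall = Fraction(sum(diagonal), n)

# Binary (one-vs-rest) accuracies as quoted in the text.
per_class = [Fraction(83, n), Fraction(71, n), Fraction(78, n)]
average_per_class = sum(per_class) / len(per_class)

print(f'{float(overall):.2f}')            # 0.79
print(f'{float(average_per_class):.2f}')  # 0.86
```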
```python
import numpy as np
from mlxtend.evaluate import accuracy_score

y_targ = [0, 0, 0, 1, 1, 1, 2, 0, 0]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 1]

std_acc = accuracy_score(y_targ, y_pred)
bin_acc = accuracy_score(y_targ, y_pred, method='binary', pos_label=1)
avg_acc = accuracy_score(y_targ, y_pred, method='average')

print(f'Standard accuracy: {std_acc*100:.2f}%')
print(f'Class 1 accuracy: {bin_acc*100:.2f}%')
print(f'Average per-class accuracy: {avg_acc*100:.2f}%')
```

```
Standard accuracy: 33.33%
Class 1 accuracy: 55.56%
Average per-class accuracy: 55.56%
```
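To see where the average comes from, the per-class scores for these labels can be computed by hand. The sketch below is a plain NumPy illustration rather than mlxtend's implementation; the function name is hypothetical, and it assumes that the set of classes is taken from the distinct target labels.

```python
import numpy as np


def average_per_class_accuracy(y_target, y_predicted):
    """Mean of the one-vs-rest accuracies (hypothetical helper)."""
    y_target = np.asarray(y_target)
    y_predicted = np.asarray(y_predicted)
    # Assumption: the classes are the distinct labels found in the targets.
    labels = np.unique(y_target)
    per_class = {
        int(c): float(np.mean((y_target == c) == (y_predicted == c)))
        for c in labels
    }
    return float(np.mean(list(per_class.values()))), per_class


y_targ = [0, 0, 0, 1, 1, 1, 2, 0, 0]
y_pred = [1, 0, 0, 0, 1, 2, 0, 2, 1]

avg, per_class = average_per_class_accuracy(y_targ, y_pred)
for label, score in per_class.items():
    print(f'Class {label} accuracy: {score*100:.2f}%')
print(f'Average per-class accuracy: {avg*100:.2f}%')
```

For these labels the three binary accuracies are 4/9, 5/9, and 6/9, so their mean is 5/9, which agrees with the 55.56% reported above.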
API
accuracy_score(y_target, y_predicted, method='standard', pos_label=1, normalize=True)
General accuracy function for supervised learning.
Parameters
- y_target : array-like, shape=[n_values]
  True class labels or target values.
- y_predicted : array-like, shape=[n_values]
  Predicted class labels or target values.
- method : str, 'standard' by default.
  The chosen method for accuracy computation. If set to 'standard', computes overall accuracy. If set to 'binary', computes accuracy for class pos_label. If set to 'average', computes average per-class (balanced) accuracy. If set to 'balanced', computes the scikit-learn-style balanced accuracy.
- pos_label : str or int, 1 by default.
  The class whose accuracy score is to be reported. Used only when method is set to 'binary'.
- normalize : bool, True by default.
  If True, returns fraction of correctly classified samples. If False, returns number of correctly classified samples.
Returns
- score : float
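Two of the options described above can be illustrated with plain NumPy. The sketch makes two assumptions that are not confirmed by the text: that `normalize=False` corresponds to the raw count of matching labels, and that the 'balanced' method follows scikit-learn's definition of balanced accuracy, which is the mean of the per-class recall values. It does not call mlxtend, and the variable names are illustrative.

```python
import numpy as np

y_targ = np.array([0, 0, 0, 1, 1, 1, 2, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 2, 0, 2, 1])

# Count versus fraction of correctly classified samples
# (assumed to mirror normalize=False and normalize=True).
n_correct = int(np.sum(y_targ == y_pred))
frac_correct = n_correct / len(y_targ)

# scikit-learn-style balanced accuracy: recall of each class, then the mean.
recalls = [float(np.mean(y_pred[y_targ == c] == c)) for c in np.unique(y_targ)]
balanced = float(np.mean(recalls))

print(n_correct)     # 3 of the 9 labels match
print(frac_correct)
print(recalls)
print(balanced)
```

Unlike the average per-class accuracy, this balanced score ignores true negatives: each class contributes only the share of its own samples that were predicted correctly.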
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/accuracy_score/