mcnemar_tables: Contingency Tables for McNemar's Test and Cochran's Q Test
Function to compute a 2x2 contingency table for McNemar's test and Cochran's Q test
> `from mlxtend.evaluate import mcnemar_tables`
Overview
Contingency Tables
A 2x2 contingency table as used in McNemar's test (`mlxtend.evaluate.mcnemar`) is a useful aid for comparing two different models. In contrast to a typical confusion matrix, this table compares two models to each other rather than showing the false positives, true positives, false negatives, and true negatives of a single model's predictions:
For instance, given that two models have accuracies of 99.7% and 99.6%, respectively, a 2x2 contingency table can provide further insight for model selection.
In both subfigure A and B, the predictive accuracies of the two models are as follows:
- Model 1 accuracy: 9,960 / 10,000 = 99.6%
- Model 2 accuracy: 9,970 / 10,000 = 99.7%
Now, in subfigure A, we can see that model 2 got 11 predictions right that model 1 got wrong. Vice versa, model 2 made 1 prediction wrong that model 1 got right. Thus, based on this 11:1 ratio, we may conclude that model 2 performs substantially better than model 1. However, in subfigure B, the ratio is 25:15, which is less conclusive about which model is the better one to choose.
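The numbers behind both subfigures can be reproduced with a few lines of NumPy. The sketch below is ours, not part of mlxtend: the concordant cell counts are inferred from the stated accuracies (only the 11:1 and 25:15 discordant ratios are given above), the cell layout follows the convention described in the API section further down, and the test statistic is the continuity-corrected McNemar chi-square, (|b − c| − 1)² / (b + c), from the Edwards reference:

```python
import numpy as np

N = 10_000

# Hypothetical tables for the two subfigures, laid out as
# [[both right, only model 1 right], [only model 2 right, both wrong]].
# The concordant counts (9959/29 and 9945/15) are inferred so that the
# stated 99.6% and 99.7% accuracies hold; only the discordant ratios
# (11:1 and 25:15) are given in the text.
table_a = np.array([[9959, 1],
                    [11, 29]])
table_b = np.array([[9945, 15],
                    [25, 15]])

def summarize(tb):
    acc1 = (tb[0, 0] + tb[0, 1]) / N        # model 1's correct predictions
    acc2 = (tb[0, 0] + tb[1, 0]) / N        # model 2's correct predictions
    b, c = tb[0, 1], tb[1, 0]               # discordant (off-diagonal) cells
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # Edwards' continuity correction
    return acc1, acc2, chi2

print(summarize(table_a))  # accuracies 0.996 / 0.997, chi2 = 6.75
print(summarize(table_b))  # accuracies 0.996 / 0.997, chi2 = 2.025
```

With 1 degree of freedom and a significance level of 0.05, the critical chi-square value is about 3.841, so scenario A rejects the null hypothesis of equal performance while scenario B does not, matching the verbal argument above.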
References
- McNemar, Quinn, 1947. "Note on the sampling error of the difference between correlated proportions or percentages". Psychometrika. 12 (2): 153–157.
- Edwards AL: Note on the "correction for continuity" in testing the significance of the difference between correlated proportions. Psychometrika. 1948, 13 (3): 185–187. 10.1007/BF02289261.
- https://en.wikipedia.org/wiki/McNemar%27s_test
Example 1 - Single 2x2 Contingency Table
```python
import numpy as np
from mlxtend.evaluate import mcnemar_tables

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0])

tb = mcnemar_tables(y_true,
                    y_mod0,
                    y_mod1)
tb
```

```
{'model_0 vs model_1': array([[ 4.,  1.],
        [ 2.,  3.]])}
```
To visualize (and better interpret) the contingency table via matplotlib, we can use the `checkerboard_plot` function:
```python
from mlxtend.plotting import checkerboard_plot
import matplotlib.pyplot as plt

brd = checkerboard_plot(tb['model_0 vs model_1'],
                        figsize=(3, 3),
                        fmt='%d',
                        col_labels=['model 2 wrong', 'model 2 right'],
                        row_labels=['model 1 wrong', 'model 1 right'])
plt.show()
```
Example 2 - Multiple 2x2 Contingency Tables
If more than two models are provided to the `mcnemar_tables` function, a 2x2 contingency table will be created for each pair of models:
```python
import numpy as np
from mlxtend.evaluate import mcnemar_tables

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0])
y_mod2 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])

tb = mcnemar_tables(y_true,
                    y_mod0,
                    y_mod1,
                    y_mod2)

for key, value in tb.items():
    print(key, '\n', value, '\n')
```

```
model_0 vs model_1 
 [[ 4.  1.]
 [ 2.  3.]] 

model_0 vs model_2 
 [[ 4.  2.]
 [ 2.  2.]] 

model_1 vs model_2 
 [[ 5.  1.]
 [ 0.  4.]] 
```
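Where those pairwise counts come from can be made explicit with plain NumPy correctness masks (a sketch of ours, not mlxtend's internals). For the model_0 vs model_1 pair, the four cell counts of the table printed above break down as follows; note that McNemar's test itself only uses the two discordant counts:

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0])

# Boolean masks marking where each model predicted correctly
m0 = y_mod0 == y_true
m1 = y_mod1 == y_true

both_right = int(np.sum(m0 & m1))    # both models correct: 4
both_wrong = int(np.sum(~m0 & ~m1))  # both models wrong:   3
only_m0 = int(np.sum(m0 & ~m1))      # discordant: only model_0 correct: 2
only_m1 = int(np.sum(~m0 & m1))      # discordant: only model_1 correct: 1

print(both_right, only_m0, only_m1, both_wrong)
```

The four counts sum to the number of samples, and the discordant pair (2 vs 1) is what a subsequent McNemar test, e.g. via `mlxtend.evaluate.mcnemar`, would actually compare.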
API
mcnemar_tables(y_target, *y_model_predictions)
Compute multiple 2x2 contingency tables for McNemar's test or Cochran's Q test.
Parameters
- `y_target` : array-like, shape=[n_samples]
  True class labels as 1D NumPy array.
- `y_model_predictions` : array-like, shape=[n_samples]
  Predicted class labels for a model.
Returns
- `tables` : dict
  Dictionary of NumPy arrays with shape=[2, 2]. Each dictionary key names the two models to be compared based on the order the models were passed as `*y_model_predictions`. The number of dictionary entries is equal to the number of pairwise combinations between the m models, i.e., "m choose 2." For example, the following target array (containing the true labels) and 3 models
- y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
- y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
- y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0])
- y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0])
would result in the following dictionary:
{'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])}
Each array is structured in the following way:
- tb[0, 0]: # of samples that both models predicted correctly
- tb[0, 1]: # of samples that model a got right and model b got wrong
- tb[1, 0]: # of samples that model b got right and model a got wrong
- tb[1, 1]: # of samples that both models predicted incorrectly
Examples
For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/