
Detection error tradeoff (DET) curve

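This example compares two multi-threshold metrics for binary classification: the receiver operating characteristic (ROC) and the detection error tradeoff (DET). To do so, we evaluate two different classifiers on the same classification task.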

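ROC curves plot the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. The top left corner of the plot is therefore the "ideal" point: an FPR of zero and a TPR of one.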

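DET curves are a variation of ROC curves in which the false negative rate (FNR) is plotted on the y-axis instead of the TPR. In this case the origin (bottom left corner) is the "ideal" point.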
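The link between the two curves can be checked numerically. The sketch below uses a toy set of labels and scores (invented here for illustration, not part of the example data) and relies on the identity FNR = 1 - TPR, so every point reported by det_curve should also be a point of the ROC curve once the TPR is flipped.

```python
import numpy as np
from sklearn.metrics import det_curve, roc_curve

# Toy labels and scores, made up purely for illustration.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# ROC: FPR on the x-axis, TPR on the y-axis. Keep every threshold so the
# comparison with the DET points below is exhaustive.
roc_fpr, roc_tpr, _ = roc_curve(y_true, y_score, drop_intermediate=False)
roc_fnr = 1.0 - roc_tpr  # FNR is the complement of TPR

# DET: FPR on the x-axis, FNR on the y-axis.
det_fpr, det_fnr, _ = det_curve(y_true, y_score)

# Compare the two curves as sets of (FPR, FNR) points.
roc_points = set(zip(roc_fpr.round(6).tolist(), roc_fnr.round(6).tolist()))
det_points = set(zip(det_fpr.round(6).tolist(), det_fnr.round(6).tolist()))
det_is_subset_of_roc = det_points <= roc_points
print(det_is_subset_of_roc)
```

The DET points are a subset rather than an exact copy because det_curve may omit thresholds at the extremes where one of the error rates no longer changes.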

Note

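See sklearn.metrics.roc_curve for further information about ROC curves.
See sklearn.metrics.det_curve for further information about DET curves.
This example is loosely based on the Classifier comparison example.
See Receiver Operating Characteristic (ROC) with cross validation for an example estimating the variance of the ROC curves and ROC-AUC.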

Generate synthetic data

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=1_000,
    n_features=2,
    n_redundant=0,
    n_informative=2,
    random_state=1,
    n_clusters_per_class=1,
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

Define the classifiers

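Here we define two different classifiers. The goal is to visually compare their statistical performance across thresholds using ROC and DET curves. There is no particular reason why these classifiers were chosen over the other classifiers available in scikit-learn.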

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

classifiers = {
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC(C=0.025)),
    "Random Forest": RandomForestClassifier(
        max_depth=5, n_estimators=10, max_features=1
    ),
}

Plot ROC and DET curves

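DET curves are commonly plotted on a normal deviate scale. To achieve this, the DET display transforms the error rates returned by det_curve, and the axis ticks, using scipy.stats.norm.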
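As a rough illustration of what a normal deviate scale means, the sketch below maps a few error rates through the inverse CDF of the standard normal distribution, scipy.stats.norm.ppf. The list of rates is chosen here for illustration and is not necessarily the tick list used by the display.

```python
import numpy as np
from scipy.stats import norm

# Illustrative error rates (an assumption of this sketch).
rates = np.array([0.001, 0.01, 0.05, 0.2, 0.5, 0.8, 0.95, 0.99, 0.999])

# Normal deviate: the z-value whose standard normal CDF equals the rate.
deviates = norm.ppf(rates)

for rate, z in zip(rates, deviates):
    print(f"{rate:>6.1%} -> {z:+.3f}")
```

A rate of 0.5 maps to 0, and the transform is antisymmetric around that point, so small and large error rates are stretched equally. Rates of exactly 0 or 1 map to minus or plus infinity and cannot be drawn on this scale.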

import matplotlib.pyplot as plt

from sklearn.metrics import DetCurveDisplay, RocCurveDisplay

fig, [ax_roc, ax_det] = plt.subplots(1, 2, figsize=(11, 5))

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)

    RocCurveDisplay.from_estimator(clf, X_test, y_test, ax=ax_roc, name=name)
    DetCurveDisplay.from_estimator(clf, X_test, y_test, ax=ax_det, name=name)

ax_roc.set_title("Receiver Operating Characteristic (ROC) curves")
ax_det.set_title("Detection Error Tradeoff (DET) curves")

ax_roc.grid(linestyle="--")
ax_det.grid(linestyle="--")

plt.legend()
plt.show()

[Figure: Receiver Operating Characteristic (ROC) curves (left panel) and Detection Error Tradeoff (DET) curves (right panel)]

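Notice that it is easier to visually assess the overall performance of different classification algorithms using DET curves than using ROC curves. Because ROC curves are plotted on a linear scale, different classifiers usually look similar over a large part of the plot and differ the most in the top left corner. DET curves, on the other hand, appear close to straight lines on the normal deviate scale, so they tend to be distinguishable as a whole, and the region of interest spans a large part of the plot.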
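The straight-line behaviour can be derived in an idealized setting. The sketch below assumes that the scores of each class follow a normal distribution with unit variance; the means mu_neg and mu_pos are hypothetical values chosen for illustration. Under that assumption the DET curve on the normal deviate scale is exactly a line of slope -1, offset by the difference of the class means. Real classifiers only approximate this.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional score distributions: N(mu, 1) for each class.
mu_neg, mu_pos = 0.0, 2.0
thresholds = np.linspace(-1.0, 3.0, 9)

# A sample is predicted positive when its score exceeds the threshold.
fpr = norm.sf(thresholds, loc=mu_neg)   # P(score > t | negative)
fnr = norm.cdf(thresholds, loc=mu_pos)  # P(score <= t | positive)

# Move both error rates onto the normal deviate scale.
z_fpr = norm.ppf(fpr)
z_fnr = norm.ppf(fnr)

# Slope between consecutive points and the constant offset of the line.
slopes = np.diff(z_fnr) / np.diff(z_fpr)
offsets = z_fnr + z_fpr
print(slopes)
print(offsets)
```

With equal variances, a better classifier (larger gap between the class means) shifts the line toward the bottom left corner without changing its slope.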

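DET curves give direct feedback on the detection error tradeoff to aid in operating point analysis. The user can then decide which FNR they are willing to accept at the expense of the FPR (or vice versa).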
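One possible way to turn this into a concrete operating point is sketched below. It rebuilds the linear SVM pipeline and data split used in this example, then picks the threshold with the lowest FPR among those whose FNR stays within a budget. The budget max_fnr = 0.05 is a hypothetical value chosen for illustration, and the selection rule is only one of several reasonable choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import det_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Same synthetic data and split as in the example.
X, y = make_classification(
    n_samples=1_000, n_features=2, n_redundant=0, n_informative=2,
    random_state=1, n_clusters_per_class=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

clf = make_pipeline(StandardScaler(), LinearSVC(C=0.025)).fit(X_train, y_train)
scores = clf.decision_function(X_test)
fpr, fnr, thresholds = det_curve(y_test, scores)

max_fnr = 0.05  # hypothetical budget: miss at most 5% of the positives
within_budget = fnr <= max_fnr
# Among admissible thresholds, keep the one with the fewest false alarms.
best = np.flatnonzero(within_budget)[np.argmin(fpr[within_budget])]

# Apply the chosen threshold: scores at or above it are predicted positive.
y_pred = (scores >= thresholds[best]).astype(int)
observed_fnr = np.mean(y_pred[y_test == 1] == 0)
print(f"threshold={thresholds[best]:.3f} FPR={fpr[best]:.3f} FNR={fnr[best]:.3f}")
```

Tightening the FNR budget moves the operating point along the DET curve toward higher FPR, which is the tradeoff the curve makes visible.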
