lift_score: Lift Score for Classification and Association Rule Mining

Scoring function to compute the LIFT metric, the ratio of correctly predicted positive examples to the actual positive examples in the test dataset.

> `from mlxtend.evaluate import lift_score` 

Overview

In the context of classification, lift [1] compares model predictions to randomly generated predictions. Lift is often used in conjunction with gain and lift charts as a visual aid [2]. For example, assuming a 10% customer response rate as a baseline, a lift value of 3 would correspond to a 30% customer response rate when using the predictive model. Note that lift has the range $\lbrack 0, \infty \rbrack$.

There are several strategies for computing lift, and below, we will illustrate the computation of the lift score using a classic confusion matrix. For example, let's assume the following predictions and target labels, where "1" is the positive class:
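```python
y_target =    [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```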

Then, our confusion matrix would look as follows:
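|            | predicted "1" | predicted "0" |
|------------|---------------|---------------|
| actual "1" | TP = 2        | FN = 4        |
| actual "0" | FP = 1        | TN = 3        |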

Based on the confusion matrix above, with "1" as the positive label, we compute lift as follows:

$$ \text{lift} = \frac{TP/(TP+FP)}{(TP+FN)/(TP+TN+FP+FN)} $$

Plugging in the actual values from the example above, we arrive at the following lift value:

$$ \frac{2/(2+1)}{(2+4)/(2+3+1+4)} = 1.1111111111111112 $$
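As a quick sanity check, the same value can be computed directly from the confusion-matrix counts; a minimal sketch in plain Python:

```python
# Counts from the confusion matrix above, with "1" as the positive label
tp, fn, fp, tn = 2, 4, 1, 3
n = tp + tn + fp + fn

precision = tp / (tp + fp)    # fraction of predicted positives that are correct
prevalence = (tp + fn) / n    # fraction of actual positives in the dataset

print(precision / prevalence)  # 1.1111111111111112
```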

An alternative way of computing lift is to use the support metric [3]:

$$ \text{lift} = \frac{\text{support}(\text{true labels} \cap \text{prediction})}{\text{support}(\text{true labels}) \times \text{support}(\text{prediction})} $$

Support is $x / N$, where $x$ is the number of incidences of an observation and $N$ is the total number of samples in the dataset. $\text{true labels} \cap \text{prediction}$ are the true positives, $\text{true labels}$ are the true positives plus the false negatives, and $\text{prediction}$ are the true positives plus the false positives. Plugging the values from our example into the equation above, we arrive at:

$$ \frac{2/10}{(6/10 \times 3/10)} = 1.1111111111111112 $$
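The support-based formulation can be verified the same way; for instance:

```python
# Supports from the example above (N = 10 samples)
support_both = 2 / 10   # true labels ∩ prediction (the true positives)
support_true = 6 / 10   # actual positives (TP + FN)
support_pred = 3 / 10   # predicted positives (TP + FP)

print(support_both / (support_true * support_pred))  # 1.1111111111111112
```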

References

Example 1 - Computing Lift

This example demonstrates the basic usage of the `lift_score` function using the example from the Overview section.

```python
import numpy as np
from mlxtend.evaluate import lift_score

y_target =    np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

lift_score(y_target, y_predicted)
```

```
1.1111111111111112
```
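Lift is always computed with respect to a positive class. If a different class should be treated as positive, this can be specified via the `positive_label` parameter (see the API section below); for instance, a minimal sketch treating "0" as the positive label:

```python
# treat class "0" as the positive label instead of the default "1"
lift_score(y_target, y_predicted, positive_label=0)
```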

Example 2 - Using lift_score in GridSearch

The `lift_score` function can also be used with scikit-learn objects, such as GridSearch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer

# make a custom scorer from the lift_score function
lift_scorer = make_scorer(lift_score)


iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)

hyperparameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                   {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

clf = GridSearchCV(SVC(), hyperparameters, cv=10,
                   scoring=lift_scorer)
clf.fit(X_train, y_train)

print(clf.best_score_)
print(clf.best_params_)
```

```
3.0
{'gamma': 0.001, 'kernel': 'rbf', 'C': 1000}
```
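Since a test split was held out above, the best estimator (refit on the whole training set by GridSearchCV's default `refit=True`) can also be evaluated on unseen data; a minimal sketch:

```python
# score the refitted best estimator on the held-out test split
y_pred = clf.predict(X_test)
print(lift_score(y_test, y_pred))
```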

API

`lift_score(y_target, y_predicted, binary=True, positive_label=1)`

Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions.

In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as:

$$ \text{lift} = \frac{TP/(TP+FN)}{(TP+FP)/(TP+TN+FP+FN)} $$

Note that this arrangement is equivalent to the one in the Overview section, since both reduce to $TP \cdot (TP+TN+FP+FN) \, / \, \big( (TP+FP)(TP+FN) \big)$.

Parameters

- `y_target` : array-like, shape=[n_samples]. True class labels.
- `y_predicted` : array-like, shape=[n_samples]. Predicted class labels.
- `binary` : bool (default: True). Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0.
- `positive_label` : int (default: 1). Class label of the positive class.

Returns

- `score` : float. Lift score in the range $\lbrack 0, \infty \rbrack$.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/