# lift_score: Lift Score for Classification and Association Rule Mining
Scoring function to compute the LIFT metric, the ratio of correctly predicted positive examples to the actual positive examples in the test dataset.
> `from mlxtend.evaluate import lift_score`
## Overview
In the context of classification, *lift* [1] compares model predictions to randomly generated predictions. Lift is often used in conjunction with *gain and lift* charts as a visual aid [2]. For example, assuming a 10% customer response rate as baseline, a lift value of 3 would correspond to a 30% customer response when the predictive model is used. Note that *lift* has the range $\lbrack 0, \infty \rbrack$.
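To make that customer-response example concrete, lift here is simply the ratio of the model's response rate to the baseline response rate:

$$ \text{lift} = \frac{P(\text{response} \mid \text{model})}{P(\text{response} \mid \text{baseline})} = \frac{0.30}{0.10} = 3 $$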
There are several strategies for computing lift; below, we illustrate the computation of the lift score using a classic confusion matrix. For instance, let's assume the following predictions and target labels, where "1" is the positive class:
- $\text{true labels}: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]$
- $\text{prediction}: [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]$
Then, our confusion matrix would look as follows:

|              | predicted 1 | predicted 0 |
|--------------|-------------|-------------|
| **actual 1** | TP = 2      | FN = 4      |
| **actual 0** | FP = 1      | TN = 3      |
Based on the confusion matrix above, with "1" as positive label, we compute lift as follows:
$$ \text{lift} = \frac{TP/(TP+FP)}{(TP+FN)/(TP+TN+FP+FN)} $$
Plugging in the actual values from the example above, we arrive at the following lift value:
$$ \frac{2/(2+1)}{(2+4)/(2+3+1+4)} = 1.1111111111111112 $$
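As a quick sanity check, the following sketch derives the four confusion-matrix counts with scikit-learn's `confusion_matrix` and plugs them into the formula above; scikit-learn is not required by `lift_score` itself and is used here only for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_target = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

# For binary labels {0, 1}, ravel() returns the counts
# in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_target, y_predicted).ravel()

# lift = precision / prevalence of the positive class
lift = (tp / (tp + fp)) / ((tp + fn) / (tp + tn + fp + fn))

print(tp, fp, fn, tn)  # 2 1 4 3
print(lift)            # ≈ 1.1111
```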
An alternative way of computing lift is via the support metric [3]:
$$ \text{lift} = \frac{\text{support}(\text{true labels} \cap \text{prediction})}{\text{support}(\text{true labels}) \times \text{support}(\text{prediction})}, $$
where support is $x / N$, with $x$ being the number of incidences of an observation and $N$ the total number of samples in the dataset. $\text{true labels} \cap \text{prediction}$ are the true positives, $\text{true labels}$ are the true positives plus the false negatives, and $\text{prediction}$ are the true positives plus the false positives. Plugging the values from our example into the equation above, we arrive at:
$$ \frac{2/10}{(6/10 \times 3/10)} = 1.1111111111111112 $$
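The same number can be reproduced directly from the support fractions; this minimal sketch uses plain NumPy and assumes the same example labels as above:

```python
import numpy as np

y_target = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

# support(true labels ∩ prediction): fraction of samples where
# both the true label and the prediction are the positive class
support_both = np.mean((y_target == 1) & (y_predicted == 1))

# support(true labels) and support(prediction)
support_target = np.mean(y_target == 1)
support_prediction = np.mean(y_predicted == 1)

# (2/10) / (6/10 * 3/10) ≈ 1.1111
print(support_both / (support_target * support_prediction))
```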
## References
- [1] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (ACM SIGMOD '97), pages 265-276, 1997.
- [2] https://www3.nd.edu/~busiforc/Lift_chart.html
- [3] https://en.wikipedia.org/wiki/Association_rule_learning#Support
## Example 1 - Computing Lift
This example demonstrates the basic use of the `lift_score` function with the example from the Overview section.
```python
import numpy as np
from mlxtend.evaluate import lift_score

y_target = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

lift_score(y_target, y_predicted)
```

    1.1111111111111112
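Since `lift_score` accepts the `binary` and `positive_label` arguments (see the API section below), a multi-class problem can be mapped onto a binary one. The following sketch uses made-up 3-class labels and assumes, per the parameter documentation, that `binary=True` maps `positive_label` to class 1 and all other labels to class 0:

```python
import numpy as np
from mlxtend.evaluate import lift_score

# Hypothetical 3-class labels; class 2 is treated as the
# positive class, classes 0 and 1 as the negative class
y_target = np.array([0, 1, 2, 2, 1, 0])
y_predicted = np.array([0, 2, 2, 1, 1, 0])

lift_score(y_target, y_predicted, binary=True, positive_label=2)
```

Under this mapping, TP = 1, FP = 1, FN = 1, and TN = 3, so the score should come out to $(1/2)\,/\,(2/6) = 1.5$.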
## Example 2 - Using lift_score in GridSearch

The `lift_score` function can also be used with scikit-learn objects, such as `GridSearchCV`:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer
from mlxtend.evaluate import lift_score

# make a custom scorer from the lift_score function
lift_scorer = make_scorer(lift_score)

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)

hyperparameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                    'C': [1, 10, 100, 1000]},
                   {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

clf = GridSearchCV(SVC(), hyperparameters, cv=10,
                   scoring=lift_scorer)
clf.fit(X_train, y_train)

print(clf.best_score_)
print(clf.best_params_)
```

    3.0
    {'gamma': 0.001, 'kernel': 'rbf', 'C': 1000}
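As a follow-up, the held-out test split created above can be scored with the same custom scorer. This is a minimal sketch, assuming the fitted `clf`, `lift_scorer`, `X_test`, and `y_test` from the previous cell are still in scope:

```python
# score the tuned model on the held-out test data;
# a scorer created via make_scorer is called as scorer(estimator, X, y)
print(lift_scorer(clf, X_test, y_test))
```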
## API
`lift_score(y_target, y_predicted, binary=True, positive_label=1)`
Lift measures the degree to which the predictions of a classification model are better than randomly-generated predictions.
In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as: [ TP/(TP+FN) ] / [ (TP+FP) / (TP+TN+FP+FN) ]. This is algebraically equivalent to the formula in the Overview, since both reduce to TP(TP+TN+FP+FN) / ((TP+FP)(TP+FN)).
**Parameters**

- `y_target` : array-like, shape=[n_samples]

    True class labels.

- `y_predicted` : array-like, shape=[n_samples]

    Predicted class labels.

- `binary` : bool (default: True)

    Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0.

- `positive_label` : int (default: 1)

    Class label of the positive class.

**Returns**

- `score` : float

    Lift score in the range $[0, \infty]$.
**Examples**
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/