bootstrap_point632_score: The .632 and .632+ Methods for Classifier Evaluation
An implementation of the .632 bootstrap for evaluating supervised learning algorithms.
> `from mlxtend.evaluate import bootstrap_point632_score`
Overview
Originally, the bootstrap method aims to determine the statistical properties of an estimator when the underlying distribution is unknown and additional samples are not available. Now, in order to use this method for the evaluation of predictive models, such as hypotheses for classification and regression, we may prefer a slightly different bootstrap approach, the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, instead of evaluating the model on the training data, we use the out-of-bag samples as the test sets for evaluation. The out-of-bag samples are the unique set of instances that were not used for model fitting, as illustrated in the figure below [1].
The figure above illustrates what three random bootstrap samples drawn from an exemplary ten-sample dataset ($X_1, X_2, ..., X_{10}$) and their out-of-bag samples used for testing may look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples to obtain a reliable estimate [2].
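As a minimal sketch of this sampling scheme (plain NumPy, independent of the mlxtend implementation; the ten-element index range below is a hypothetical stand-in for the dataset), drawing a bootstrap sample with replacement and collecting its out-of-bag indices could look as follows:
import numpy as np

rng = np.random.RandomState(123)
n = 10  # hypothetical dataset of ten samples X_1, ..., X_10

for b in range(3):  # three bootstrap rounds, as in the figure
    boot_idx = rng.choice(n, size=n, replace=True)  # indices drawn with replacement
    oob_idx = np.setdiff1d(np.arange(n), boot_idx)  # indices never drawn in this round
    print('Round %d -- bootstrap: %s, out-of-bag: %s' % (b + 1, np.sort(boot_idx), oob_idx))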
.632 Bootstrap
In 1983, Bradley Efron described the .632 estimate, an approach to address the pessimistic bias of the bootstrap cross-validation procedure described above [3]. In the "classic" bootstrap method, the pessimistic bias can be attributed to the fact that a bootstrap sample only contains approximately 63.2% of the unique samples from the original dataset. For instance, we can compute the probability that a given sample from a dataset of size n is not drawn as a bootstrap sample as
$$P (\text{not chosen}) = \bigg(1 - \frac{1}{n}\bigg)^n,$$
which is asymptotically equivalent to $\frac{1}{e} \approx 0.368$ as n approaches infinity.
Vice versa, we can compute the probability that a sample is chosen as $P (\text{chosen}) = 1 - \bigg(1 - \frac{1}{n}\bigg)^n \approx 0.632$ for reasonably large datasets, so that we select approximately $0.632 \times n$ unique samples as the bootstrap training set and reserve $0.368 \times n$ out-of-bag samples for testing in each iteration.
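A quick numerical check of these limits (a sketch in plain Python, not part of mlxtend):
n = 10000  # a reasonably large dataset size
p_not_chosen = (1.0 - 1.0 / n) ** n
print('P(not chosen) ~ %.4f' % p_not_chosen)          # close to 1/e, about 0.368
print('P(chosen)     ~ %.4f' % (1.0 - p_not_chosen))  # close to 0.632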
Now, to address the bias that is due to this sampling with replacement, Bradley Efron proposed the .632 estimate mentioned earlier, which is computed as follows:
$$\text{ACC}_{boot} = \frac{1}{b} \sum_{i=1}^b \big(0.632 \cdot \text{ACC}_{h, i} + 0.368 \cdot \text{ACC}_{train}\big),$$
where $\text{ACC}_{train}$ is the accuracy computed on the whole training set, and $\text{ACC}_{h, i}$ is the accuracy on the out-of-bag sample.
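For illustration, a minimal sketch of this weighted average; the per-round out-of-bag accuracies and the training-set accuracy below are hypothetical numbers rather than output of mlxtend:
import numpy as np

acc_oob = np.array([0.92, 0.94, 0.90, 0.93, 0.91])  # hypothetical ACC_{h,i} for b=5 rounds
acc_train = 0.99                                    # hypothetical ACC_{train} on the whole training set

acc_632 = np.mean(0.632 * acc_oob + 0.368 * acc_train)
print('.632 bootstrap estimate: %.4f' % acc_632)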
.632+ Bootstrap
Now, while the .632 bootstrap attempts to address the pessimistic bias of the estimate, an optimistic bias may occur with models that tend to overfit, so Bradley Efron and Robert Tibshirani proposed the .632+ bootstrap method (Efron and Tibshirani, 1997). Instead of using a fixed "weight" $\omega = 0.632$, we compute the weight $\omega$ as
$$\omega = \frac{0.632}{1 - 0.368 \times R},$$
where R is the relative overfitting rate,
$$R = \frac{(-1) \times (\text{ACC}_{h, i} - \text{ACC}_{train})}{\gamma - (1 -\text{ACC}_{h, i})}.$$
(Since we are plugging $\omega$ into the equation for computing $\text{ACC}_{boot}$ above, $\text{ACC}_{h, i}$ and $\text{ACC}_{train}$ still refer to the out-of-bag accuracy in the i-th bootstrap round and the accuracy on the whole training set, respectively.)
In addition, we need to determine the no-information rate $\gamma$ in order to compute R. For instance, we can compute $\gamma$ by fitting a model to a dataset that contains all possible combinations between the samples $x_{i'}$ and the target class labels $y_{i}$; here, we pretend that the observations and class labels are independent:
$$\gamma = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} L(y_{i}, f(x_{i'})).$$
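The double sum can be evaluated directly with a 0-1 loss; the sketch below fits a hypothetical classifier to Iris and compares every label $y_i$ against every prediction $f(x_{i'})$. It mirrors the formula above, not mlxtend's internal code:
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = DecisionTreeClassifier(random_state=123).fit(X, y)
y_pred = clf.predict(X)  # f(x_{i'}) for all samples

# gamma = (1/n^2) * sum_i sum_{i'} L(y_i, f(x_{i'})) with 0-1 loss
gamma = np.mean(y[:, None] != y_pred[None, :])
print('No-information rate gamma: %.4f' % gamma)  # close to 2/3 for three balanced classes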
Alternatively, we can estimate the no-information rate $\gamma$ as
$$\gamma = \sum_{k=1}^K p_k (1 - q_k),$$
where $p_k$ is the proportion of class $k$ samples observed in the dataset, and $q_k$ is the proportion of class $k$ samples that the classifier predicts in the dataset.
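Putting these pieces together, the sketch below estimates $\gamma$ from the class proportions $p_k$ and the predicted proportions $q_k$, and then evaluates R and $\omega$ as defined above. The label vectors and accuracy values are hypothetical, and the code mirrors the formulas rather than mlxtend's internals:
import numpy as np

y = np.array([0] * 50 + [1] * 50 + [2] * 50)       # hypothetical observed class labels
y_pred = np.array([0] * 45 + [1] * 60 + [2] * 45)  # hypothetical classifier predictions

classes = np.unique(y)
p = np.array([np.mean(y == k) for k in classes])       # p_k
q = np.array([np.mean(y_pred == k) for k in classes])  # q_k
gamma = np.sum(p * (1.0 - q))

acc_oob, acc_train = 0.90, 0.99  # hypothetical ACC_{h,i} and ACC_{train}
R = -(acc_oob - acc_train) / (gamma - (1.0 - acc_oob))
omega = 0.632 / (1.0 - 0.368 * R)
print('gamma = %.3f, R = %.3f, omega = %.3f' % (gamma, R, omega))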
References
- [1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html
- [2] Efron, Bradley, and Robert J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.
- [3] Efron, Bradley. 1983. "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation." Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636.
- [4] Efron, Bradley, and Robert Tibshirani. 1997. "Improvements on Cross-Validation: The .632+ Bootstrap Method." Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703.
Example 1 -- Evaluating the predictive performance of a model via the classic out-of-bag bootstrap
The `bootstrap_point632_score` function mimics the behavior of scikit-learn's `cross_val_score`; a typical usage example is shown below:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
tree = DecisionTreeClassifier(random_state=123)
# Model accuracy
scores = bootstrap_point632_score(tree, X, y, method='oob')
acc = np.mean(scores)
print('Accuracy: %.2f%%' % (100*acc))
# Confidence interval
lower = np.percentile(scores, 2.5)
upper = np.percentile(scores, 97.5)
print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper))
Accuracy: 94.45%
95% Confidence interval: [87.71, 100.00]
Example 2 -- Evaluating the predictive performance of a model via the .632 bootstrap
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
tree = DecisionTreeClassifier(random_state=123)
# Model accuracy
scores = bootstrap_point632_score(tree, X, y)
acc = np.mean(scores)
print('Accuracy: %.2f%%' % (100*acc))
# Confidence interval
lower = np.percentile(scores, 2.5)
upper = np.percentile(scores, 97.5)
print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper))
Accuracy: 96.42%
95% Confidence interval: [92.41, 100.00]
Example 3 -- Evaluating the predictive performance of a model via the .632+ bootstrap
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
tree = DecisionTreeClassifier(random_state=123)
# Model accuracy
scores = bootstrap_point632_score(tree, X, y, method='.632+')
acc = np.mean(scores)
print('Accuracy: %.2f%%' % (100*acc))
# Confidence interval
lower = np.percentile(scores, 2.5)
upper = np.percentile(scores, 97.5)
print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper))
Accuracy: 96.29%
95% Confidence interval: [91.86, 98.92]
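Beyond accuracy, the `scoring_func` parameter documented in the API section below can swap in a different metric. The following is a minimal sketch, assuming `scoring_func` is called as `scoring_func(y_true, y_pred)` as stated in the signature; Iris is restricted to two classes so that the default (binary) `f1_score` applies:
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score
import numpy as np

iris = datasets.load_iris()
mask = iris.target < 2  # keep only classes 0 and 1 for a binary F1 score
X_bin, y_bin = iris.data[mask], iris.target[mask]

tree = DecisionTreeClassifier(random_state=123)
scores = bootstrap_point632_score(tree, X_bin, y_bin, scoring_func=f1_score)
print('Mean F1: %.2f' % np.mean(scores))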
API
bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, predict_proba=False, random_seed=None, clone_estimator=True)
Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning
References:
- [1] Efron, Bradley. 1983. "Estimating the Error Rate
of a Prediction Rule: Improvement on Cross-Validation."
Journal of the American Statistical Association
78 (382): 316. doi:10.2307/2288636.
- [2] Efron, Bradley, and Robert Tibshirani. 1997.
"Improvements on Cross-Validation: The .632+ Bootstrap Method."
Journal of the American Statistical Association
92 (438): 548. doi:10.2307/2965703.
Parameters
- `estimator` : object
    An estimator for classification or regression that follows the scikit-learn API and implements "fit" and "predict" methods.
- `X` : array-like
    The data to fit. Can be, for example, a list or an array that is at least 2d.
- `y` : array-like, optional, default: None
    The target variable to try to predict in the case of supervised learning.
- `n_splits` : int (default=200)
    Number of bootstrap iterations. Must be larger than 1.
- `method` : str (default='.632')
    The bootstrap method, which can be either 1) the '.632' bootstrap (default), 2) the '.632+' bootstrap, or 3) 'oob' (regular out-of-bag, no weighting) for comparison studies.
- `scoring_func` : callable
    Score function (or loss function) with signature `scoring_func(y, y_pred, **kwargs)`. If None, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor.
- `predict_proba` : bool
    Whether to use the `predict_proba` function for the `estimator` argument. This is to be used in conjunction with a `scoring_func` that takes in probability values instead of actual predictions. For example, if the scoring function is `sklearn.metrics.roc_auc_score`, then use `predict_proba=True`. Note that this requires the `estimator` to have the `predict_proba` method implemented.
- `random_seed` : int (default=None)
    If int, random_seed is the seed used by the random number generator.
- `clone_estimator` : bool (default=True)
    Clones the estimator if true, otherwise fits the original.
Returns
- `scores` : array of float, shape=(len(list(n_splits)),)
    Array of scores of the estimator for each bootstrap replicate.
Examples
>>> from sklearn import datasets, linear_model
>>> from mlxtend.evaluate import bootstrap_point632_score
>>> iris = datasets.load_iris()
>>> X = iris.data
>>> y = iris.target
>>> lr = linear_model.LogisticRegression()
>>> scores = bootstrap_point632_score(lr, X, y)
>>> acc = np.mean(scores)
>>> print('Accuracy:', acc)
0.953023146884
>>> lower = np.percentile(scores, 2.5)
>>> upper = np.percentile(scores, 97.5)
>>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper))
95% Confidence interval: [0.90, 0.98]
For more usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/