mlxtend.evaluate

mlxtend version: 0.23.1

BootstrapOutOfBag

BootstrapOutOfBag(n_splits=200, random_seed=None)

Parameters

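n_splits : int (default=200)

Number of bootstrap iterations. Must be larger than 1.
random_seed : int (default=None)

If int, random_seed is the seed used by the random number generator.

Returns

train_idx : ndarray

The training set indices for that split.
test_idx : ndarray

The testing set indices for that split.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/BootstrapOutOfBag/

Methods

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X : object

Always ignored, exists for compatibility with scikit-learn.
y : object

Always ignored, exists for compatibility with scikit-learn.
groups : object

Always ignored, exists for compatibility with scikit-learn.

Returns

n_splits : int

Returns the number of splitting iterations in the cross-validator.

split(X, y=None, groups=None)

y : array-like or None (default: None)

Not used; included only as a compatibility parameter, similar to KFold in scikit-learn.
groups : array-like or None (default: None)

Not used; included only as a compatibility parameter, similar to KFold in scikit-learn.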
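
The out-of-bag idea behind this splitter can be sketched in plain Python. This is an illustrative reimplementation, not mlxtend's code: the function name `oob_splits` and its internals are assumptions made for this example. Each round draws as many training indices as there are samples, with replacement, and the indices that were never drawn form the test set.

```python
import random


def oob_splits(n_samples, n_splits=3, random_seed=None):
    """Yield (train_idx, test_idx) pairs for out-of-bag bootstrap rounds.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(random_seed)
    all_idx = range(n_samples)
    for _ in range(n_splits):
        # Draw n_samples indices with replacement for training.
        train_idx = [rng.randrange(n_samples) for _ in all_idx]
        drawn = set(train_idx)
        # Samples that were never drawn form the out-of-bag test set.
        test_idx = [i for i in all_idx if i not in drawn]
        yield train_idx, test_idx


splits = list(oob_splits(n_samples=10, n_splits=3, random_seed=123))
for train_idx, test_idx in splits:
    print(sorted(train_idx), test_idx)
```

Going by the signatures documented above, the corresponding loop with the class itself would presumably be `for train_idx, test_idx in BootstrapOutOfBag(n_splits=3, random_seed=123).split(X)`.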

GroupTimeSeriesSplit

GroupTimeSeriesSplit(test_size, train_size=None, n_splits=None, gap_size=0, shift_size=1, window_type='rolling')

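Group time series cross-validator.

Parameters

test_size : int

Size of the test dataset.
train_size : int (default=None)

Size of the training dataset.
n_splits : int (default=None)

Number of splits.
gap_size : int (default=0)

Size of the gap between the training and test datasets.
shift_size : int (default=1)

Step size by which the window shifts for the next fold.
window_type : str (default="rolling")

Type of window. Possible values: "rolling", "expanding".

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/GroupTimeSeriesSplit/

Methods

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X : object

Always ignored, exists for compatibility.
y : object

Always ignored, exists for compatibility.
groups : object

Always ignored, exists for compatibility.

Returns

n_splits : int

Returns the number of splitting iterations in the cross-validator.

split(X, y=None, groups=None)

Generates indices to split data into training and test sets.

Parameters

X : array-like

Training data.
y : array-like (default=None)

Always ignored, exists for compatibility.
groups : array-like (default=None)

Array with group names or sequence numbers.

Yields

train : ndarray

The training set indices for that split.
test : ndarray

The testing set indices for that split.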
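
To make the interplay of train_size, test_size, gap_size, and shift_size concrete, here is a minimal sketch of a rolling window over groups. It is not mlxtend's implementation. It assumes that sizes are counted in groups, that groups are ordered by first appearance, and that the first window starts at the first group; the helper name `rolling_group_splits` is hypothetical, and the real class may differ in all of these respects.

```python
def rolling_group_splits(groups, train_size, test_size, gap_size=0, shift_size=1):
    """Yield (train_idx, test_idx) index lists for rolling windows over groups.

    Hypothetical helper; sizes are measured in number of groups (an assumption).
    """
    # Unique group labels in order of first appearance.
    ordered = list(dict.fromkeys(groups))
    window = train_size + gap_size + test_size
    start = 0
    while start + window <= len(ordered):
        train_groups = set(ordered[start:start + train_size])
        test_start = start + train_size + gap_size
        test_groups = set(ordered[test_start:test_start + test_size])
        train_idx = [i for i, g in enumerate(groups) if g in train_groups]
        test_idx = [i for i, g in enumerate(groups) if g in test_groups]
        yield train_idx, test_idx
        # Move the whole window forward for the next fold.
        start += shift_size


groups = [0, 0, 1, 1, 2, 2, 3, 3]
splits = list(rolling_group_splits(groups, train_size=2, test_size=1))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

An "expanding" window would keep the training start fixed at the first group and only grow the training end; that variant is left out to keep the sketch short.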

PredefinedHoldoutSplit

PredefinedHoldoutSplit(valid_indices)

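Train/validation set splitter for sklearn's GridSearchCV etc.

Splits a dataset into training and validation sets using user-specified validation indices.

Parameters

valid_indices : array-like, shape (num_examples,)

Indices of the training examples in the training set to be used for validation. All other indices in the training set form the training subset used for model fitting.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/PredefinedHoldoutSplit/

Methods

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X : object

Always ignored, exists for compatibility.
y : object

Always ignored, exists for compatibility.
groups : object

Always ignored, exists for compatibility.

Returns

n_splits : 1

Returns the number of splitting iterations in the cross-validator. Always returns 1.

split(X, y, groups=None)

Generates indices to split data into training and validation sets.

Parameters

X : array-like, shape (num_examples, num_features)

Training data, where num_examples is the number of examples and num_features is the number of features.
y : array-like, shape (num_examples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.
groups : object

Always ignored, exists for compatibility.

Yields

train_index : ndarray

The training set indices for that split.
valid_index : ndarray

The validation set indices for that split.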
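
The single split described here amounts to a set difference on index positions. The following sketch illustrates that idea only; `predefined_holdout` is a hypothetical name and this is not the class's actual code.

```python
def predefined_holdout(n_samples, valid_indices):
    """Return (train_index, valid_index) for one fixed holdout split.

    Hypothetical helper for illustration only.
    """
    valid = sorted(set(valid_indices))
    valid_set = set(valid)
    # Every index not reserved for validation goes to the training subset.
    train = [i for i in range(n_samples) if i not in valid_set]
    return train, valid


train_index, valid_index = predefined_holdout(6, valid_indices=[1, 4])
print(train_index, valid_index)
```

Because there is only one split, a grid search that uses such a splitter fits each candidate once instead of once per fold.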

RandomHoldoutSplit

RandomHoldoutSplit(valid_size=0.5, random_seed=None, stratify=False)

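Train/validation set splitter for sklearn's GridSearchCV etc.

Provides train/validation set indices that split a dataset into training and validation sets using random indices.

Parameters

valid_size : float (default: 0.5)

Proportion of examples assigned to the validation set. The remaining 1-valid_size is automatically assigned to the training set.
random_seed : int (default: None)

The random seed for splitting the data into training and validation set partitions.
stratify : bool (default: False)

Whether to perform a stratified split, True or False.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/RandomHoldoutSplit/

Methods

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Parameters

X : object

Always ignored, exists for compatibility.
y : object

Always ignored, exists for compatibility.
groups : object

Always ignored, exists for compatibility.

Returns

n_splits : 1

Returns the number of splitting iterations in the cross-validator. Always returns 1.

split(X, y, groups=None)

Generates indices to split data into training and validation sets.

Parameters

X : array-like, shape (num_examples, num_features)

Training data, where num_examples is the number of training examples and num_features is the number of features.
y : array-like, shape (num_examples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.
groups : object

Always ignored, exists for compatibility.

Yields

train_index : ndarray

The training set indices for that split.
valid_index : ndarray

The validation set indices for that split.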

accuracy_score

accuracy_score(y_target, y_predicted, method='standard', pos_label=1, normalize=True)

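General accuracy function for supervised learning.

Parameters

y_target : array-like, shape=[n_values]

True class labels or target values.
y_predicted : array-like, shape=[n_values]

Predicted class labels or target values.
method : str, 'standard' by default.

The method used for the accuracy computation. If set to 'standard', computes the overall accuracy. If set to 'binary', computes the accuracy for the class pos_label. If set to 'average', computes the average per-class (balanced) accuracy. If set to 'balanced', computes the scikit-learn-style balanced accuracy.
pos_label : str or int, 1 by default.

The class whose accuracy score is to be reported. Used only when method is set to 'binary'.
normalize : bool, True by default.

If True, returns the fraction of correctly classified samples. If False, returns the number of correctly classified samples.

Returns

score : float

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/accuracy_score/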
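
The four method options can be hard to tell apart from the prose alone. The sketch below shows one plausible reading of them in plain Python; it is not mlxtend's code, and the interpretation of 'binary' and 'average' as one-vs-rest accuracies and of 'balanced' as mean per-class recall is an assumption based on the descriptions above.

```python
def accuracy(y_target, y_predicted, method="standard", pos_label=1):
    """Illustrative accuracy variants; not mlxtend's implementation."""
    pairs = list(zip(y_target, y_predicted))
    labels = sorted(set(y_target))

    def one_vs_rest(label):
        # Accuracy after collapsing labels to "is `label`" vs. "is not".
        hits = sum((t == label) == (p == label) for t, p in pairs)
        return hits / len(pairs)

    def recall(label):
        # Fraction of samples of class `label` predicted as `label`.
        preds = [p for t, p in pairs if t == label]
        return sum(p == label for p in preds) / len(preds)

    if method == "standard":
        return sum(t == p for t, p in pairs) / len(pairs)
    if method == "binary":
        return one_vs_rest(pos_label)
    if method == "average":
        return sum(one_vs_rest(c) for c in labels) / len(labels)
    if method == "balanced":
        return sum(recall(c) for c in labels) / len(labels)
    raise ValueError("unknown method: %r" % method)


y_target = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_predicted = [1, 0, 0, 0, 1, 2, 0, 2, 2, 2]
standard = accuracy(y_target, y_predicted, method="standard")
binary = accuracy(y_target, y_predicted, method="binary", pos_label=1)
average = accuracy(y_target, y_predicted, method="average")
balanced = accuracy(y_target, y_predicted, method="balanced")
print(standard, binary, average, balanced)
```

On this toy data the overall accuracy is lower than the one-vs-rest figures, because a one-vs-rest comparison also counts every sample where both the target and the prediction are "not the class" as a hit.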

bias_variance_decomp

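bias_variance_decomp(estimator, X_train, y_train, X_test, y_test, loss='0-1_loss', num_rounds=200, random_seed=None, **fit_params)

Parameters

estimator : object

A classifier or regressor object or class implementing both a fit and a predict method similar to the scikit-learn API.
X_train : array-like, shape=(num_examples, num_features)

A training dataset for drawing the bootstrap samples to carry out the bias-variance decomposition.
y_train : array-like, shape=(num_examples)

Targets (class labels, continuous values in case of regression) associated with the X_train examples.
X_test : array-like, shape=(num_examples, num_features)

The test dataset for computing the average loss, bias, and variance.
y_test : array-like, shape=(num_examples)

Targets (class labels, continuous values in case of regression) associated with the X_test examples.
loss : str (default='0-1_loss')

Loss function for performing the bias-variance decomposition. Currently allowed values are '0-1_loss' and 'mse'.
num_rounds : int (default=200)

Number of bootstrap rounds (sampling from the training set) for performing the bias-variance decomposition. Each bootstrap sample has the same size as the original training set.
random_seed : int (default=None)

Random seed for the bootstrap sampling used for the bias-variance decomposition.
fit_params : additional parameters

Additional parameters to be passed to the .fit() function of the estimator when it is fit to the bootstrap samples.

Returns

avg_expected_loss, avg_bias, avg_var : returns the average expected loss, average bias, and average variance (all floats),

where the average is computed over the data points in the test set.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/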
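
To show what the three returned quantities mean for the '0-1_loss' case, the sketch below computes them from a table of predictions that is assumed to come from models already fitted on bootstrap samples. It follows the common definition in which the "main prediction" is the most frequent label per test point; that this matches mlxtend's internals exactly is an assumption, and `zero_one_decomposition` is a hypothetical name.

```python
from collections import Counter


def zero_one_decomposition(all_pred, y_test):
    """all_pred[r][i] is the round-r prediction for test point i."""
    n_rounds, n_points = len(all_pred), len(y_test)
    # Main prediction: the most frequent label per test point across rounds.
    main = [Counter(row[i] for row in all_pred).most_common(1)[0][0]
            for i in range(n_points)]
    # Expected loss: error rate averaged over rounds and test points.
    loss = sum(p != t for row in all_pred for p, t in zip(row, y_test))
    avg_expected_loss = loss / (n_rounds * n_points)
    # Bias: how often the main prediction itself is wrong.
    avg_bias = sum(m != t for m, t in zip(main, y_test)) / n_points
    # Variance: how often a round's prediction deviates from the main one.
    dev = sum(p != m for row in all_pred for p, m in zip(row, main))
    avg_var = dev / (n_rounds * n_points)
    return avg_expected_loss, avg_bias, avg_var


y_test = [0, 1, 1, 0]
all_pred = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
avg_expected_loss, avg_bias, avg_var = zero_one_decomposition(all_pred, y_test)
print(avg_expected_loss, avg_bias, avg_var)
```

For the 0-1 loss, bias and variance do not add up to the expected loss in general, so the three numbers are best read side by side rather than as a sum.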

bootstrap

bootstrap(x, func, num_rounds=1000, ci=0.95, ddof=1, seed=None)

Implements the ordinary nonparametric bootstrap

Parameters

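x : NumPy array, shape=(n_samples, [n_columns])

A one- or multidimensional array of data records.
func : callable

A function that computes the statistic used for the bootstrap replicates (the statistic computed from each bootstrap sample). This function must return a scalar value. For example, np.mean or np.median would be an acceptable argument for func if x is a 1-dimensional array or vector.
num_rounds : int (default=1000)

The number of bootstrap samples to draw, where each bootstrap sample has the same number of records as the original dataset.
ci : float (default=0.95)

A float in the range (0, 1) that represents the confidence level for computing the confidence interval. For example, ci=0.95 (default) will compute the 95% confidence interval from the bootstrap replicates.
ddof : int (default=1)

The delta degrees of freedom used when computing the standard error.
seed : int or None (default=None)

Random seed for generating the bootstrap samples.

Returns

original, standard_error, (lower_ci, upper_ci) : tuple

Returns the statistic of the original sample (original), the estimated standard error, and the respective confidence interval bounds.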

Examples

```
>>> import numpy as np
>>> from mlxtend.evaluate import bootstrap
>>> rng = np.random.RandomState(123)
>>> x = rng.normal(loc=5., size=100)
>>> original, std_err, ci_bounds = bootstrap(x,
...                                          num_rounds=1000,
...                                          func=np.mean,
...                                          ci=0.95,
...                                          seed=123)
>>> print('Mean: %.2f, SE: +/- %.2f, CI95: [%.2f, %.2f]' % (original,
...                                                         std_err,
...                                                         ci_bounds[0],
...                                                         ci_bounds[1]))
Mean: 5.03, SE: +/- 0.11, CI95: [4.80, 5.26]
```

For more usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap/

bootstrap_point632_score

bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring_func=None, predict_proba=False, random_seed=None, clone_estimator=True, **fit_params)

Implementation of the .632 [1] and .632+ [2] bootstrap for supervised learning.

References:

- [1] Efron, Bradley. 1983. "Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation." Journal of the American Statistical Association 78 (382): 316. doi:10.2307/2288636.
- [2] Efron, Bradley, and Robert Tibshirani. 1997. "Improvements on Cross-Validation: The .632+ Bootstrap Method." Journal of the American Statistical Association 92 (438): 548. doi:10.2307/2965703.

Parameters

estimator : object

An estimator for classification or regression that follows the scikit-learn API and implements "fit" and "predict" methods.
X : array-like

The data to fit. Can be, for example, a list or an array of at least 2 dimensions.
y : array-like, optional, default: None

The target variable to try to predict in the case of supervised learning.
n_splits : int (default=200)

Number of bootstrap iterations. Must be larger than 1.
method : str (default='.632')

The bootstrap method, which can be one of:
- '.632' bootstrap (default)
- '.632+' bootstrap
- 'oob' (regular out-of-bag, no weighting), for comparison studies.
scoring_func : callable

Score function (or loss function) with signature scoring_func(y, y_pred, **kwargs). If none is provided, uses classification accuracy if the estimator is a classifier and mean squared error if the estimator is a regressor.
predict_proba : bool (default=False)

Whether to use the predict_proba function of the estimator argument. This is meant to be combined with a scoring_func that takes probability values instead of actual predictions. For example, if scoring_func is sklearn.metrics.roc_auc_score, use predict_proba=True. Note that this requires estimator to implement a predict_proba method.
random_seed : int (default=None)

If int, random_seed is the seed used by the random number generator.
clone_estimator : bool (default=True)

If True, clones the estimator; otherwise fits the original estimator.
fit_params : additional parameters

Additional parameters to be passed to the .fit() function of the estimator when it is fit to the bootstrap samples.

Returns

scores : array of float, shape=(len(list(n_splits)),)

Array of scores of the estimator for each bootstrap replicate.

Examples

```
>>> import numpy as np
>>> from sklearn import datasets, linear_model
>>> from mlxtend.evaluate import bootstrap_point632_score
>>> iris = datasets.load_iris()
>>> X = iris.data
>>> y = iris.target
>>> lr = linear_model.LogisticRegression()
>>> scores = bootstrap_point632_score(lr, X, y)
>>> acc = np.mean(scores)
>>> print('Accuracy:', acc)
Accuracy: 0.953023146884
>>> lower = np.percentile(scores, 2.5)
>>> upper = np.percentile(scores, 97.5)
>>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper))
95% Confidence interval: [0.90, 0.98]
```

For more usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/bootstrap_point632_score/

cochrans_q

cochrans_q(y_target, *y_model_predictions)

Cochran's Q test to compare 2 or more models.

Parameters

y_target : array-like, shape=[n_samples]

True class labels as a 1D NumPy array.
*y_model_predictions : array-like, shape=[n_samples]

Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays.

Returns

q, p : float or None, float

Returns the Q (chi-squared) value and the p-value.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/cochrans_q/
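
As a rough illustration of what the Q value measures, the sketch below computes the textbook Cochran's Q statistic from per-model and per-sample counts of correct predictions. It is a plain-Python sketch under the standard formula, not mlxtend's code, and `cochrans_q_statistic` is a hypothetical name. The p-value line uses the closed form of the chi-squared tail that holds only for 2 degrees of freedom, that is, for exactly three models.

```python
import math


def cochrans_q_statistic(y_target, *y_model_predictions):
    """Return (Q, degrees of freedom) for two or more models."""
    n_models = len(y_model_predictions)
    # correct[m][j] is 1 if model m classified sample j correctly, else 0.
    correct = [[int(p == t) for p, t in zip(preds, y_target)]
               for preds in y_model_predictions]
    per_model = [sum(row) for row in correct]         # correct count per model
    per_sample = [sum(col) for col in zip(*correct)]  # models right per sample
    total = sum(per_model)
    numerator = (n_models - 1) * (
        n_models * sum(g * g for g in per_model) - total ** 2)
    denominator = n_models * total - sum(c * c for c in per_sample)
    return numerator / denominator, n_models - 1


y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_mod0 = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
y_mod1 = [0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
y_mod2 = [0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
q, dof = cochrans_q_statistic(y_true, y_mod0, y_mod1, y_mod2)
# With exactly 2 degrees of freedom the chi-squared tail is exp(-q / 2).
p = math.exp(-q / 2)
print('Q: %.3f, p: %.3f' % (q, p))
```

For other numbers of models the p-value needs a general chi-squared survival function, which the standard library does not provide.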

combined_ftest_5x2cv

combined_ftest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None)

Implements the 5x2cv combined F test proposed by Alpaydin 1999, to compare the performance of two models.

Parameters

estimator1 : scikit-learn classifier or regressor
estimator2 : scikit-learn classifier or regressor
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
scoring : str, callable, or None (default: None)

If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it must conform to sklearn's scorer signature scorer(estimator, X, y); see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information.
random_seed : int or None (default: None)

Random seed for creating the test/train splits.

Returns

f : float

The F-statistic
pvalue : float

Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences in the two compared models.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/combined_ftest_5x2cv/
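
For orientation, the F statistic of the combined 5x2cv test can be written down in a few lines once the ten score differences are known. The sketch below assumes the differences have already been computed (one pair per repetition of 2-fold cross-validation) and uses made-up numbers; it illustrates the published formula rather than mlxtend's code, and `combined_f_statistic` is a hypothetical name.

```python
def combined_f_statistic(score_diffs):
    """score_diffs: five (p_i1, p_i2) pairs of score differences,
    one pair per repetition of 2-fold cross-validation."""
    numerator = 0.0
    variance_sum = 0.0
    for p1, p2 in score_diffs:
        mean = (p1 + p2) / 2
        # Variance estimate s_i^2 of repetition i.
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
        numerator += p1 ** 2 + p2 ** 2
    return numerator / (2 * variance_sum)


# Made-up differences between the two models' scores.
diffs = [(0.02, 0.04), (0.01, 0.03), (0.03, 0.05), (0.02, 0.02), (0.00, 0.04)]
f = combined_f_statistic(diffs)
print('F: %.3f' % f)
```

The statistic is compared against an F distribution with 10 and 5 degrees of freedom; computing that p-value is omitted here because it needs a statistics library.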

confusion_matrix

confusion_matrix(y_target, y_predicted, binary=False, positive_label=1)

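Computes a confusion matrix/contingency table.

Parameters

y_target : array-like, shape=[n_samples]

True class labels.
y_predicted : array-like, shape=[n_samples]

Predicted class labels.
binary : bool (default: False)

Maps a multi-class problem onto a binary confusion matrix, where the positive class is 1 and all other classes are 0.
positive_label : int (default: 1)

Class label of the positive class.

Returns

mat : array-like, shape=[n_classes, n_classes]

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/confusion_matrix/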
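
The effect of the binary flag is easiest to see on a small example. The sketch below builds the matrix with plain lists; it is not mlxtend's code, `confusion` is a hypothetical name, and the layout (rows for true classes, columns for predicted classes) is an assumption made for this illustration.

```python
def confusion(y_target, y_predicted, binary=False, positive_label=1):
    """Rows index the true class, columns the predicted class (assumed)."""
    if binary:
        # Collapse to positive (1) vs. rest (0).
        y_target = [int(t == positive_label) for t in y_target]
        y_predicted = [int(p == positive_label) for p in y_predicted]
    labels = sorted(set(y_target) | set(y_predicted))
    position = {label: k for k, label in enumerate(labels)}
    mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_target, y_predicted):
        mat[position[t]][position[p]] += 1
    return mat


y_target = [0, 0, 1, 1, 2, 2]
y_predicted = [0, 1, 1, 1, 2, 0]
multi = confusion(y_target, y_predicted)
collapsed = confusion(y_target, y_predicted, binary=True, positive_label=1)
print(multi)
print(collapsed)
```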

create_counterfactual

create_counterfactual(x_reference, y_desired, model, X_dataset, y_desired_proba=None, lammbda=0.1, random_seed=None)

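Implementation of the counterfactual method by Wachter et al. 2017.

References:

- Wachter, S., Mittelstadt, B., & Russell, C. (2017).
Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech., 31, 841.,
https://arxiv.org/abs/1711.00399

Parameters

x_reference : array-like, shape=[m_features]

The data instance (training example) to be explained.
y_desired : int

The desired class label for x_reference.
model : estimator

A (scikit-learn) estimator implementing .predict() and/or predict_proba().
- If model supports predict_proba(), it is used by default for the first loss term, (lambda * model.predict[_proba](x_counterfact) - y_desired[_proba])^2
- Otherwise, the method falls back to predict.
X_dataset : array-like, shape=[n_examples, m_features]

A (training) dataset for picking the initial counterfactual as a starting value for the optimization procedure.
y_desired_proba : float (default: None)

A float within the range [0, 1] designating the desired class probability for y_desired.
- If y_desired_proba=None (default), the first loss term is (lambda * model(x_counterfact) - y_desired)^2, where y_desired is a class label
- If y_desired_proba is not None, the first loss term is (lambda * model(x_counterfact) - y_desired_proba)^2
lammbda : float (default=0.1)

Weighting parameter for the first loss term, (lambda * model(x_counterfact) - y_desired[_proba])^2
random_seed : int (default=None)

If int, random_seed is the seed used by the random number generator for selecting the initial counterfactual from X_dataset.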

feature_importance_permutation

feature_importance_permutation(X, y, predict_method, metric, num_rounds=1, feature_groups=None, seed=None)

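Estimates feature importance via permutation importance.

Parameters

X : NumPy array, shape = [n_samples, n_features]

Dataset, where n_samples is the number of samples and n_features is the number of features.
y : NumPy array, shape = [n_samples]

Target values.
predict_method : prediction function

A callable function that predicts the target values from X.
metric : str, callable

The metric for evaluating the feature importance through permutation. The string 'accuracy' is recommended for classifiers and the string 'r2' for regressors. Optionally, a custom scoring function (e.g., metric=scoring_func) that accepts two arguments, y_true and y_pred, which have a similar shape to the y array.
num_rounds : int (default=1)

Number of rounds the feature columns are permuted to compute the permutation importance.
feature_groups : list or None (default=None)

Optional argument for treating certain features as a group. For example [1, 2, [3, 4, 5]], which can be useful for interpretability, for example, if features 3, 4, 5 are one-hot encoded features.
seed : int or None (default=None)

Random seed for permuting the feature columns.

Returns

mean_importance_vals, all_importance_vals : NumPy arrays.

The first array, mean_importance_vals, has shape [n_features, ] and contains the importance values for all features. The second array has shape [n_features, num_rounds] and contains the feature importance for each repetition. If num_rounds=1, it contains the same values as the first array, mean_importance_vals.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/feature_importance_permutation/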
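
The procedure can be sketched without any library: score the model once, shuffle one column at a time, and record how much the score drops. The code below is a minimal illustration under that reading, not mlxtend's implementation; the names `permutation_importance`, `accuracy`, and `predict`, as well as the toy data, are made up for this example, and feature groups are not handled.

```python
import random


def permutation_importance(X, y, predict_method, metric, num_rounds=1, seed=None):
    """Return the mean drop in `metric` per feature after shuffling its column."""
    rng = random.Random(seed)
    baseline = metric(y, predict_method(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(num_rounds):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Copy X with only this column replaced by its shuffled version.
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - metric(y, predict_method(X_perm)))
        importances.append(sum(drops) / num_rounds)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(X):
    # Hypothetical model: uses feature 0 only and ignores feature 1.
    return [int(row[0] > 0.5) for row in X]


X = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 4.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(X, y, predict, accuracy,
                                     num_rounds=10, seed=0)
print(importances)
```

Because the toy model never looks at the second feature, shuffling that column cannot change the score, so its importance is exactly zero.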

ftest

ftest(y_target, *y_model_predictions)

F-test to compare 2 or more models.

Parameters

y_target : array-like, shape=[n_samples]

True class labels as 1D NumPy array.
*y_model_predictions : array-like, shape=[n_samples]

Variable number of 2 or more arrays that contain the predicted class labels from models as 1D NumPy arrays.

Returns

f, p : float or None, float

Returns the F-value and the p-value

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/ftest/

lift_score

lift_score(y_target, y_predicted, binary=True, positive_label=1)

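Lift measures the degree to which the predictions of a classification model are better than randomly generated predictions.

In terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), the lift score is computed as:
[ TP / (TP+FP) ] / [ (TP+FN) / (TP+TN+FP+FN) ]

Parameters

y_target : array-like, shape=[n_samples]

True class labels.
y_predicted : array-like, shape=[n_samples]

Predicted class labels.
binary : bool (default: True)

Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0.
positive_label : int (default: 1)

Class label of the positive class.

Returns

score : float

Lift score in the range [0, infinity]

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/lift_score/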
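
The formula above is the precision divided by the base rate of the positive class. A minimal sketch in plain Python, using a hypothetical helper `lift` and made-up labels rather than mlxtend's code, could look like this:

```python
def lift(y_target, y_predicted, positive_label=1):
    """Lift = precision / base rate of the positive class."""
    pairs = list(zip(y_target, y_predicted))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    # [ TP / (TP + FP) ]: share of predicted positives that are correct.
    precision = tp / (tp + fp)
    # [ (TP + FN) / (TP + TN + FP + FN) ]: share of positives overall.
    base_rate = (tp + fn) / len(pairs)
    return precision / base_rate


y_target = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
score = lift(y_target, y_predicted)
print('Lift: %.4f' % score)
```

A score above 1 means the predicted positives contain a higher share of true positives than the dataset as a whole. The sketch does not guard against the case where nothing is predicted positive, which would divide by zero.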

mcnemar

mcnemar(ary, corrected=True, exact=False)

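McNemar test for paired nominal data.

Parameters

ary : array-like, shape=[2, 2]

2 x 2 contingency table (as returned by evaluate.mcnemar_table), where
- a: ary[0, 0]: # of samples that both models predicted correctly
- b: ary[0, 1]: # of samples that model 1 got right and model 2 got wrong
- c: ary[1, 0]: # of samples that model 2 got right and model 1 got wrong
- d: ary[1, 1]: # of samples that both models predicted incorrectly
corrected : bool (default: True)

If True, uses Edwards' continuity correction for the chi-squared test.
exact : bool (default: False)

If True, uses an exact binomial test comparing b to a binomial distribution with n = b + c and p = 0.5. Using exact=True is highly recommended for sample sizes < 25, since the chi-squared distribution approximates the test statistic poorly in that case.

Returns

chi2, p : float or None, float

Returns the chi-squared value and the p-value; if exact=True (default: False), chi2 is None.

Examples

For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar/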
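
The two code paths described above (chi-squared approximation versus exact binomial test) can be sketched with the standard library alone. This is a textbook-style illustration with a hypothetical function name and made-up tables, not mlxtend's implementation; in particular, the exact branch uses the common two-sided form based on the smaller of b and c, which is an assumption.

```python
import math


def mcnemar_sketch(ary, corrected=True, exact=False):
    """Return (chi2, p) for a 2x2 table; chi2 is None for the exact test."""
    b, c = ary[0][1], ary[1][0]
    n = b + c
    if exact:
        # Two-sided binomial test of min(b, c) against Binomial(n, 0.5).
        tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
        return None, min(2 * tail, 1.0)
    if corrected:
        # Continuity-corrected statistic.
        chi2 = (abs(b - c) - 1) ** 2 / n
    else:
        chi2 = (b - c) ** 2 / n
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = mcnemar_sketch([[9945, 25], [15, 15]], corrected=True)
_, p_exact = mcnemar_sketch([[9959, 11], [1, 29]], exact=True)
print(chi2, p, p_exact)
```

Only the off-diagonal cells b and c enter the computation; the samples on which both models agree carry no information about which model is better.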

mcnemar_table

mcnemar_table(y_target, y_model1, y_model2)

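Computes a 2x2 contingency table for McNemar's test.

Parameters

y_target : array-like, shape=[n_samples]

True class labels as a 1D NumPy array.
y_model1 : array-like, shape=[n_samples]

Predicted class labels from model 1 as a 1D NumPy array.
y_model2 : array-like, shape=[n_samples]

Predicted class labels from model 2 as a 1D NumPy array.

Returns

tb : array-like, shape=[2, 2]

2x2 contingency table with the following contents:
- a: tb[0, 0]: # of samples that both models predicted correctly
- b: tb[0, 1]: # of samples that model 1 got right and model 2 got wrong
- c: tb[1, 0]: # of samples that model 2 got right and model 1 got wrong
- d: tb[1, 1]: # of samples that both models predicted incorrectly

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_table/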
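
The cell layout described above can be reproduced with a short loop. The following is a sketch of that bookkeeping with a hypothetical function name and made-up labels, not mlxtend's code.

```python
def mcnemar_table_sketch(y_target, y_model1, y_model2):
    """Build the 2x2 table described above from three label sequences."""
    tb = [[0, 0], [0, 0]]
    for t, m1, m2 in zip(y_target, y_model1, y_model2):
        row = 0 if m1 == t else 1  # row 0: model 1 correct
        col = 0 if m2 == t else 1  # column 0: model 2 correct
        tb[row][col] += 1
    return tb


y_target = [1, 1, 1, 1, 0, 0, 0, 0]
y_model1 = [1, 1, 1, 0, 0, 0, 1, 1]
y_model2 = [1, 1, 0, 0, 0, 1, 0, 1]
tb = mcnemar_table_sketch(y_target, y_model1, y_model2)
print(tb)
```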

mcnemar_tables

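mcnemar_tables(y_target, *y_model_predictions)

Computes multiple 2x2 contingency tables for McNemar's test or Cochran's Q test.

Parameters

y_target : array-like, shape=[n_samples]

True class labels as a 1D NumPy array.
*y_model_predictions : array-like, shape=[n_samples]

Predicted class labels for a model.

Returns

tables : dict

Dictionary of NumPy arrays with shape=[2, 2]. Each dict key names the two models being compared, based on the order in which the models were passed as *y_model_predictions. The number of dictionary entries equals the number of pairwise combinations between the m models, i.e., "m choose 2."

For example, the following target array (containing the true labels) and 3 models
- y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
- y_mod0 = np.array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])
- y_mod1 = np.array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0])
- y_mod2 = np.array([0, 1, 1, 1, 0, 1, 0, 0, 0, 0])
would result in the following dictionary:

{'model_0 vs model_1': array([[ 4., 1.], [ 2., 3.]]), 'model_0 vs model_2': array([[ 3., 0.], [ 3., 4.]]), 'model_1 vs model_2': array([[ 3., 0.], [ 2., 5.]])}

Each array is structured as follows:
- tb[0, 0]: # of samples that both models predicted correctly
- tb[0, 1]: # of samples that model a got right and model b got wrong
- tb[1, 0]: # of samples that model b got right and model a got wrong
- tb[1, 1]: # of samples that both models predicted incorrectly

Examples

For usage examples, please see
https://rasbt.github.io/mlxtend/user_guide/evaluate/mcnemar_tables/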

paired_ttest_5x2cv

paired_ttest_5x2cv(estimator1, estimator2, X, y, scoring=None, random_seed=None)

Implements the 5x2cv paired t test proposed by Dietterich (1998) to compare the performance of two models.

Parameters

estimator1 : scikit-learn classifier or regressor
estimator2 : scikit-learn classifier or regressor
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
scoring : str, callable, or None (default: None)

If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it must conform to sklearn's scorer signature scorer(estimator, X, y); see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information.
random_seed : int or None (default: None)

Random seed for creating the test/train splits.

Returns

t : float

The t-statistic
pvalue : float

Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences in the two compared models.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_5x2cv/
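
For orientation, the t statistic of the 5x2cv paired t test can be computed directly from the ten score differences. The sketch below assumes those differences are already available (one pair per repetition of 2-fold cross-validation) and uses made-up numbers; it illustrates the published formula rather than mlxtend's code, and `ttest_5x2cv_statistic` is a hypothetical name.

```python
import math


def ttest_5x2cv_statistic(score_diffs):
    """score_diffs: five (p_i1, p_i2) pairs of score differences,
    one pair per repetition of 2-fold cross-validation."""
    variance_sum = 0.0
    for p1, p2 in score_diffs:
        mean = (p1 + p2) / 2
        # Variance estimate s_i^2 of repetition i.
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    # The numerator is the difference from the first fold of the first repetition.
    first_diff = score_diffs[0][0]
    return first_diff / math.sqrt(variance_sum / len(score_diffs))


# Made-up differences between the two models' scores.
diffs = [(0.02, 0.04), (0.01, 0.03), (0.03, 0.05), (0.02, 0.02), (0.00, 0.04)]
t = ttest_5x2cv_statistic(diffs)
print('t: %.3f' % t)
```

The statistic is compared against a t distribution with 5 degrees of freedom; computing that p-value is omitted here because it needs a statistics library.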

paired_ttest_kfold_cv

paired_ttest_kfold_cv(estimator1, estimator2, X, y, cv=10, scoring=None, shuffle=False, random_seed=None)

Implements the k-fold paired t test procedure to compare the performance of two models.

Parameters

estimator1 : scikit-learn classifier or regressor
estimator2 : scikit-learn classifier or regressor
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
cv : int (default: 10)

Number of splits and iterations for the cross-validation procedure.
scoring : str, callable, or None (default: None)

If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it must conform to sklearn's scorer signature scorer(estimator, X, y); see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information.
shuffle : bool (default: False)

Whether to shuffle the dataset for generating the k-fold splits.
random_seed : int or None (default: None)

Random seed for shuffling the dataset for generating the k-fold splits. Ignored if shuffle=False.

Returns

t : float

The t-statistic
pvalue : float

Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences in the two compared models.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_kfold_cv/

paired_ttest_resampled

paired_ttest_resampled(estimator1, estimator2, X, y, num_rounds=30, test_size=0.3, scoring=None, random_seed=None)

Implements the resampled paired t test procedure to compare the performance of two models (also called k-hold-out paired t test).

Parameters

estimator1 : scikit-learn classifier or regressor
estimator2 : scikit-learn classifier or regressor
X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
num_rounds : int (default: 30)

Number of resampling iterations (i.e., train/test splits)
test_size : float or int (default: 0.3)

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to use as a test set. If int, represents the absolute number of test examples.
scoring : str, callable, or None (default: None)

If None (default), uses 'accuracy' for sklearn classifiers and 'r2' for sklearn regressors. If str, uses a sklearn scoring metric string identifier, for example {accuracy, f1, precision, recall, roc_auc} for classifiers, {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error', 'median_absolute_error', 'r2'} for regressors. If a callable object or function is provided, it must conform to sklearn's scorer signature scorer(estimator, X, y); see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html for more information.
random_seed : int or None (default: None)

Random seed for creating the test/train splits.

Returns

t : float

The t-statistic
pvalue : float

Two-tailed p-value. If the chosen significance level is larger than the p-value, we reject the null hypothesis and accept that there are significant differences in the two compared models.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/paired_ttest_resampled/

permutation_test

permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None, paired=False)

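Nonparametric permutation test.

Parameters

x : list or numpy array with shape (n_datapoints,)

A list or 1D numpy array of the first sample (e.g., the treatment group).
y : list or numpy array with shape (n_datapoints,)

A list or 1D numpy array of the second sample (e.g., the control group).
func : custom function or str (default: 'x_mean != y_mean')

Function to compute the statistic for the permutation test.
- If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test.
- If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test.
- If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test.
method : 'approximate' or 'exact' (default: 'exact')

If 'exact' (default), all possible permutations are considered. If 'approximate', the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small.
paired : bool (default: False)

If True, a paired test is performed by only exchanging each datapoint with its associated datapoint.
num_rounds : int (default: 1000)

The number of permutation samples if method='approximate'.
seed : int or None (default: None)

The random seed for generating permutation samples if method='approximate'.

Returns

p-value under the null hypothesis

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/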
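
The 'exact' method can be illustrated by enumerating every way of splitting the pooled data into two groups of the original sizes. The sketch below does this for the two-sided difference in means using only the standard library; it is an illustration of the idea with a hypothetical function name, not mlxtend's implementation, and it does not cover the paired or approximate variants.

```python
from itertools import combinations


def exact_permutation_test(x, y):
    """Two-sided exact test on the absolute difference in means."""
    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(x) + list(y)
    observed = stat(x, y)
    indices = range(len(pooled))
    at_least_as_extreme = 0
    total = 0
    # Enumerate every way to assign len(x) of the pooled values to group x.
    for chosen in combinations(indices, len(x)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if stat(a, b) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


p_value = exact_permutation_test([1, 2, 3], [4, 5, 6])
print(p_value)
```

The number of splits grows combinatorially with the sample size, which is why the exact method is only practical for small datasets.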

proportion_difference

proportion_difference(proportion_1, proportion_2, n_1, n_2=None)

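Computes the test statistic and p-value for a difference of proportions test.

Parameters

proportion_1 : float

The first proportion.
proportion_2 : float

The second proportion.
n_1 : int

The sample size of the first test sample.
n_2 : int or None (default=None)

The sample size of the second test sample. If None, n_2 is set equal to n_1.

Returns

z, p : float or None, float

Returns the z-score and the p-value.

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/proportion_difference/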

scoring

scoring(y_target, y_predicted, metric='error', positive_label=1, unique_labels='auto')

Computes a scoring metric for supervised learning.

Parameters

y_target : array-like, shape=[n_values]

True class labels or target values.
y_predicted : array-like, shape=[n_values]

Predicted class labels or target values.
metric : str (default: 'error')

Performance metric:

'accuracy': (TP + TN)/(FP + FN + TP + TN) = 1-ERR

'average per-class accuracy': Average per-class accuracy

'average per-class error': Average per-class error

'balanced per-class accuracy': Balanced per-class accuracy

'balanced per-class error': Balanced per-class error

'error': (FP + FN)/(FP + FN + TP + TN) = 1-ACC

'false_positive_rate': FP/N = FP/(FP + TN)

'true_positive_rate': TP/P = TP/(FN + TP)

'true_negative_rate': TN/N = TN/(FP + TN)

'precision': TP/(TP + FP)

'recall': equal to 'true_positive_rate'

'sensitivity': equal to 'true_positive_rate' or 'recall'

'specificity': equal to 'true_negative_rate'

'f1': 2 * (PRE * REC)/(PRE + REC)

'matthews_corr_coef': (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP: true positives, TN: true negatives, FP: false positives, FN: false negatives.
positive_label : int (default: 1)

Label of the positive class for binary classification metrics.
unique_labels : str or array-like (default: 'auto')

If 'auto', deduces the unique class labels from y_target.

Returns

score : float

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/scoring/
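
Several of the listed metrics follow directly from the four confusion counts. The sketch below computes a handful of them for a binary problem in plain Python; the helper name `binary_metrics` and the toy labels are made up for this example, and the code is not mlxtend's implementation.

```python
def binary_metrics(y_target, y_predicted, positive_label=1):
    """Compute a few of the listed metrics from TP, TN, FP, FN counts."""
    pairs = list(zip(y_target, y_predicted))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    tn = sum(t != positive_label and p != positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "precision": precision,
        "recall": recall,  # same as true_positive_rate / sensitivity
        "specificity": tn / (tn + fp),  # same as true_negative_rate
        "f1": 2 * precision * recall / (precision + recall),
    }


y_target = [0, 0, 1, 0, 0, 1, 1, 1]
y_predicted = [1, 0, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_target, y_predicted)
print(metrics)
```

Accuracy and error always sum to 1, which is a quick sanity check when comparing values reported under the two metric names.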