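permutation_test: Permutation test for hypothesis testing

An implementation of a permutation test for hypothesis testing: it tests the null hypothesis that two different groups come from the same distribution.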

from mlxtend.evaluate import permutation_test

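Overview

Permutation tests (also called exact tests, randomization tests, or re-randomization tests) are nonparametric test procedures for testing the null hypothesis that two different groups come from the same distribution. A permutation test can be used for significance or hypothesis testing (including A/B testing) without making any assumptions about the sampling distribution (e.g., it does not require the samples to be normally distributed).

In this documentation, the exact method is called a "permutation test" and the approximate method a "randomization test".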

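Mechanics of the permutation test

Under the null hypothesis (treatment = control), every permutation is equally likely. (Note that there are (n+m)! permutations in total, where n is the number of records in the treatment sample and m is the number of records in the control sample.) For a two-sided test, we define the alternative hypothesis as the two samples being different (i.e., treatment != control).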

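1. Compute the difference (here: of the means) between sample x and sample y.
2. Combine all measurements into a single dataset.
3. Draw a permuted dataset from all possible permutations of the dataset in step 2.
4. Divide the permuted dataset into two datasets x' and y' of size n and m, respectively.
5. Compute the difference (here: of the means) between sample x' and sample y' and record it.
6. Repeat steps 3-5 until all permutations have been evaluated.
7. Return the p-value: the number of times the recorded difference was at least as extreme as the original difference from step 1, divided by the total number of permutations.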
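The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `mean` and `exact_permutation_p_value` are hypothetical, the statistic is fixed to the absolute difference of means, and enumerating every ordering is only practical for very small samples.

```python
from itertools import permutations


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def exact_permutation_p_value(x, y):
    """Two-sided exact permutation test on the difference of means.

    Hypothetical helper for illustration; it visits all (n+m)! orderings.
    """
    n = len(x)
    observed = abs(mean(x) - mean(y))       # step 1
    combined = list(x) + list(y)            # step 2
    hits = 0
    total = 0
    for perm in permutations(combined):     # steps 3 and 6
        x_perm, y_perm = perm[:n], perm[n:]  # step 4
        diff = abs(mean(x_perm) - mean(y_perm))  # step 5
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total                     # step 7


p = exact_permutation_p_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(p)
```

For these two toy samples, only the original split and its mirror image reach the observed difference of 3, so 72 of the 720 orderings count and the p-value is 0.1.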

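Here, the p-value is defined as the probability, given that the null hypothesis (no difference between the samples) is true, of obtaining a result at least as extreme as the one we observed (i.e., the sample difference from step 1).

More formally, we can express the computation of the p-value as follows (adapted from [2]):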

$$p(t \geq t_0) = \frac{1}{(n+m)!} \sum^{(n+m)!}_{j=1} I(t_j \geq t_0),$$

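where $t_0$ is the observed value of the test statistic (step 1 in the list above), $t_j$ is the value of the statistic computed from the j-th permuted dataset (step 5), $t(x'_1, x'_2, ..., x'_n, y'_1, y'_2, ..., y'_m) = |\bar{x'} - \bar{y'}|$, and $I$ is the indicator function.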
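As a side note on the size of this sum: the statistic depends only on which values end up in x', not on their order within each group, so every distinct split is counted $n!\,m!$ times among the $(n+m)!$ permutations. Under that observation (an addition here, not part of the cited derivation), the same p-value can be written as a sum over distinct splits, indexed by $s$:

$$p(t \geq t_0) = \frac{n!\,m!}{(n+m)!} \sum^{\binom{n+m}{n}}_{s=1} I(t_s \geq t_0) = \frac{1}{\binom{n+m}{n}} \sum^{\binom{n+m}{n}}_{s=1} I(t_s \geq t_0).$$

For two groups of 20 observations each, this still leaves $\binom{40}{20} = 137{,}846{,}528{,}820 \approx 1.4 \times 10^{11}$ splits to evaluate.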

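Before running the permutation test, we specify a significance level (e.g., alpha=0.05). If the p-value is greater than alpha, we fail to reject the null hypothesis.

Note that if the number of permutations is large, evaluating all of them may not be feasible. A common approximation is therefore to perform k rounds of random permutations (where k is typically a value between 1000 and 2000).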
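The k-round approximation can be sketched as follows. This is an illustrative sketch only: `randomization_p_value` and `mean` are hypothetical helpers, the statistic is again the absolute difference of means, and the p-value is the plain fraction of rounds described above (other implementations may apply small-sample corrections).

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def randomization_p_value(x, y, num_rounds=1000, seed=None):
    """Approximate two-sided test using num_rounds random shuffles.

    Hypothetical helper for illustration, not the library's code.
    """
    rng = random.Random(seed)
    n = len(x)
    observed = abs(mean(x) - mean(y))
    combined = list(x) + list(y)
    hits = 0
    for _ in range(num_rounds):
        rng.shuffle(combined)  # draw one random permutation in place
        diff = abs(mean(combined[:n]) - mean(combined[n:]))
        if diff >= observed - 1e-12:
            hits += 1
    return hits / num_rounds


# Two clearly separated toy samples (made-up values).
group_a = [float(v) for v in range(1, 11)]
group_b = [float(v) for v in range(101, 111)]
p_approx = randomization_p_value(group_a, group_b, num_rounds=1000, seed=0)
print(p_approx)
```

Fixing the seed makes the estimate reproducible; a larger `num_rounds` reduces the Monte Carlo error of the estimate at the cost of run time.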

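Paired samples

Permutation (or randomization) tests can also be carried out on paired samples by setting paired=True. The paired test is related to the regular permutation test procedure described above, but the permuted samples are created by randomly swapping the treatment data point and the control data point within each pair.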
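The within-pair swapping can be sketched like this. The helper `paired_randomization_p_value` and the before/after measurements are made up for illustration; the sketch assumes each pair is swapped independently with probability 0.5 and uses the absolute difference of means as the statistic.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def paired_randomization_p_value(x, y, num_rounds=1000, seed=None):
    """Approximate paired test: values only ever move within their own pair.

    Hypothetical helper for illustration, not the library's code.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(num_rounds):
        x_perm, y_perm = [], []
        for a, b in zip(x, y):
            if rng.random() < 0.5:  # swap this pair with probability 0.5
                a, b = b, a
            x_perm.append(a)
            y_perm.append(b)
        if abs(mean(x_perm) - mean(y_perm)) >= observed - 1e-12:
            hits += 1
    return hits / num_rounds


# Made-up paired measurements in which every pair shifts upward.
before = [10.0, 12.0, 9.0, 11.0, 13.0, 10.0, 12.0, 11.0]
after = [12.5, 13.5, 11.0, 13.0, 15.5, 12.0, 14.5, 12.5]
p_paired = paired_randomization_p_value(before, after, num_rounds=5000, seed=0)
print(p_paired)
```

Because values never leave their pair, the test compares each unit with itself, which removes between-unit variation from the comparison.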

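References

[1] Efron, Bradley and Tibshirani, R. J., An Introduction to the Bootstrap, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 1994.
[2] Unpingco, José. Python for Probability, Statistics, and Machine Learning. Springer, 2016.
[3] Pitman, E. J. G., Significance tests which may be applied to samples from any population, Supplement to the Journal of the Royal Statistical Society, 1937, 4: 119-30 and 225-32.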

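Example 1 -- Two-sided randomization test

Perform a two-sided randomization test of the null hypothesis that the "treatment" group and the "control" group come from the same distribution. We set the significance level to alpha=0.01.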

treatment = [ 28.44,  29.32,  31.22,  29.58,  30.34,  28.76,  29.21,  30.4 ,
              31.12,  31.78,  27.58,  31.57,  30.73,  30.43,  30.31,  30.32,
              29.18,  29.52,  29.22,  30.56]
control = [ 33.51,  30.63,  32.38,  32.52,  29.41,  30.93,  49.78,  28.96,
            35.77,  31.42,  30.76,  30.6 ,  23.64,  30.54,  47.78,  31.98,
            34.52,  32.42,  31.32,  40.72]

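Since evaluating all possible permutations may take a while, we will use the approximate method (see the Overview for details), i.e., a randomization test: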

from mlxtend.evaluate import permutation_test

p_value = permutation_test(treatment, control,
                           method='approximate',
                           num_rounds=10000,
                           seed=0)
print(p_value)

0.0066993300669933005

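Since the p-value < alpha, we can reject the null hypothesis that the two samples come from the same distribution.

Example 2 -- Calculating the p-value for correlation analysis (Pearson's R) with a permutation test

Note: this is a one-sided hypothesis test, since the permutation test asks "how many times is the obtained correlation coefficient greater than the observed value?"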

import numpy as np
from mlxtend.evaluate import permutation_test

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 4, 1, 5, 6, 7])

print('Observed pearson R: %.2f' % np.corrcoef(x, y)[1][0])


p_value = permutation_test(x, y,
                           method='exact',
                           func=lambda x, y: np.corrcoef(x, y)[1][0],
                           seed=0)
print('P value: %.2f' % p_value)

Observed pearson R: 0.81
P value: 0.10

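Example 3 -- Paired two-sample randomization test

Suppose we have a dataset with the depths (in meters) of seven lakes in Wisconsin:

$$ \begin{array}{cccccccc} \text{Year} & \text{Lake } 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline 1980: & 3.67 & 1.72 & 3.46 & 2.60 & 2.03 & 2.10 & 3.01 \\ \hline 1990: & 2.11 & 1.79 & 2.71 & 1.89 & 1.69 & 1.71 & 2.01 \\ \hline \end{array} $$

The null hypothesis we want to test is that there is no difference between the lake depths in 1980 and 1990. For this paired two-sample test, we carry out a randomization test for paired samples at a significance level of 0.05: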

from mlxtend.evaluate import permutation_test

lakes_1980 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
lakes_1990 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]

p_value = permutation_test(
    lakes_1980, lakes_1990, paired=True, method="approximate", seed=0, num_rounds=100000
)

print('P value: %.3f' % p_value)

P value: 0.031

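Since the p-value is below the significance threshold of 0.05, we conclude that there is a significant difference in lake depths between 1980 and 1990.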
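With only seven pairs there are just 2^7 = 128 ways to swap values within pairs, so the paired test can also be checked by full enumeration. The sketch below is a hand-rolled cross-check under stated assumptions (the helper `exact_paired_p_value` is hypothetical, and the statistic is the absolute difference of means); it does not call the library.

```python
from itertools import product


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def exact_paired_p_value(x, y):
    """Enumerate every keep/swap pattern over the pairs (2**n patterns).

    Hypothetical helper for illustration, not the library's code.
    """
    observed = abs(mean(x) - mean(y))
    hits = 0
    total = 0
    for swaps in product([False, True], repeat=len(x)):
        # For each pair, True means the two values trade places.
        x_perm = [b if s else a for a, b, s in zip(x, y, swaps)]
        y_perm = [a if s else b for a, b, s in zip(x, y, swaps)]
        if abs(mean(x_perm) - mean(y_perm)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


lakes_1980 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
lakes_1990 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]

p_exact = exact_paired_p_value(lakes_1980, lakes_1990)
print('Exact paired p-value: %.5f' % p_exact)
```

Four of the 128 patterns are at least as extreme as the observed difference, giving 4/128 = 0.03125, which agrees with the approximate value printed above once rounded to three decimals.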

API

permutation_test(x, y, func='x_mean != y_mean', method='exact', num_rounds=1000, seed=None, paired=False)

Nonparametric permutation test

Parameters

x : list or numpy array with shape (n_datapoints,)

A list or 1D numpy array of the first sample (e.g., the treatment group).
y : list or numpy array with shape (n_datapoints,)

A list or 1D numpy array of the second sample (e.g., the control group).
func : custom function or str (default: 'x_mean != y_mean')

Function to compute the statistic for the permutation test.
- If 'x_mean != y_mean', uses func=lambda x, y: np.abs(np.mean(x) - np.mean(y)) for a two-sided test.
- If 'x_mean > y_mean', uses func=lambda x, y: np.mean(x) - np.mean(y) for a one-sided test.
- If 'x_mean < y_mean', uses func=lambda x, y: np.mean(y) - np.mean(x) for a one-sided test.
method : 'approximate' or 'exact' (default: 'exact')

If 'exact' (default), all possible permutations are considered. If 'approximate' the number of drawn samples is given by num_rounds. Note that 'exact' is typically not feasible unless the dataset size is relatively small.
paired : bool (default: False)

If True, a paired test is performed by only exchanging each datapoint with its associate.
num_rounds : int (default: 1000)

The number of permutation samples if method='approximate'.
seed : int or None (default: None)

The random seed for generating permutation samples if method='approximate'.

Returns

p-value under the null hypothesis

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/evaluate/permutation_test/