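Support for the array API standard#

Note

Array API standard support is still experimental and hidden behind an environment variable. Only a small part of the public API is covered right now.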

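This guide describes how to use and add support for the Python array API standard. The standard allows users to use any array API compatible array library with SciPy out of the box.

The RFC defines how SciPy implements support for the standard, with the main principle being "array type in equals array type out". In addition, the implementation validates allowed array-like inputs more strictly, e.g. rejecting numpy matrix and masked array instances, and arrays with object dtype.

In the following, an array API compatible namespace is denoted xp.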

Using array API standard support#

To enable array API standard support, an environment variable must be set before importing SciPy:

export SCIPY_ARRAY_API=1

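This both enables array API standard support and turns on stricter input validation for array-like arguments. Note that this environment variable is meant to be temporary, as a way to make incremental changes and merge them into main without affecting backwards compatibility immediately. We do not intend to keep this environment variable around long-term.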
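
Where a shell export is inconvenient (in a notebook, for example), the variable can also be set from Python, provided this happens before the first SciPy import. The following is a minimal sketch of that ordering constraint. The helper name enable_scipy_array_api and its loaded_modules parameter are hypothetical and not part of SciPy.

```python
import os
import sys


def enable_scipy_array_api(loaded_modules=None):
    """Set SCIPY_ARRAY_API=1, refusing to do so if SciPy is already imported.

    `loaded_modules` defaults to `sys.modules`; it is a parameter only so the
    check can be exercised without importing SciPy (hypothetical helper).
    """
    if loaded_modules is None:
        loaded_modules = sys.modules
    if "scipy" in loaded_modules:
        # The variable must be set before SciPy is imported,
        # so setting it at this point would be too late.
        raise RuntimeError("set SCIPY_ARRAY_API before importing scipy")
    os.environ["SCIPY_ARRAY_API"] = "1"
```

Calling enable_scipy_array_api() at the very top of a script, followed by the SciPy imports, mirrors the shell export shown above.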

This clustering example shows PyTorch tensors used as inputs and return values:

>>> import torch
>>> from scipy.cluster.vq import vq
>>> code_book = torch.tensor([[1., 1., 1.],
...                           [2., 2., 2.]])
>>> features  = torch.tensor([[1.9, 2.3, 1.7],
...                           [1.5, 2.5, 2.2],
...                           [0.8, 0.6, 1.7]])
>>> code, dist = vq(features, code_book)
>>> code
tensor([1, 1, 0], dtype=torch.int32)
>>> dist
tensor([0.4359, 0.7348, 0.8307])

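Note that the above example works for PyTorch CPU tensors. For GPU tensors or CuPy arrays, the expected result for vq is a TypeError, because vq is not a pure Python function and therefore cannot work on GPU.

The stricter array input validation rejects np.matrix and np.ma.MaskedArray instances, as well as arrays with object dtype: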

>>> import numpy as np
>>> from scipy.cluster.vq import vq
>>> code_book = np.array([[1., 1., 1.],
...                       [2., 2., 2.]])
>>> features  = np.array([[1.9, 2.3, 1.7],
...                       [1.5, 2.5, 2.2],
...                       [0.8, 0.6, 1.7]])
>>> vq(features, code_book)
(array([1, 1, 0], dtype=int32), array([0.43588989, 0.73484692, 0.83066239]))

>>> # The above uses numpy arrays; trying to use np.matrix instances or object
>>> # arrays instead will yield an exception with `SCIPY_ARRAY_API=1`:
>>> vq(np.asmatrix(features), code_book)
...
TypeError: 'numpy.matrix' are not supported

>>> vq(np.ma.asarray(features), code_book)
...
TypeError: 'numpy.ma.MaskedArray' are not supported

>>> vq(features.astype(np.object_), code_book)
...
TypeError: object arrays are not supported
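
The three rejections shown above can be approximated by a small standalone check. This is only a sketch: the function name reject_unsupported is hypothetical, the messages are copied from the output above, and SciPy's actual validation (performed when the namespace is resolved) may be structured differently.

```python
import numpy as np


def reject_unsupported(*arrays):
    """Raise TypeError for the input kinds rejected by the stricter validation."""
    for array in arrays:
        # Masked arrays and matrices are ndarray subclasses, so test for them
        # before looking at the dtype.
        if isinstance(array, np.ma.MaskedArray):
            raise TypeError("'numpy.ma.MaskedArray' are not supported")
        if isinstance(array, np.matrix):
            raise TypeError("'numpy.matrix' are not supported")
        if isinstance(array, np.ndarray) and array.dtype == np.object_:
            raise TypeError("object arrays are not supported")
```

Plain NumPy arrays with a numeric dtype pass through this check unchanged.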

Currently supported functionality#

The following modules provide array API standard support when the environment variable is set:

Support is provided in scipy.special for the following functions: scipy.special.log_ndtr, scipy.special.ndtr, scipy.special.ndtri, scipy.special.erf, scipy.special.erfc, scipy.special.i0, scipy.special.i0e, scipy.special.i1, scipy.special.i1e, scipy.special.gammaln, scipy.special.gammainc, scipy.special.gammaincc, scipy.special.logit, scipy.special.expit, scipy.special.entr, scipy.special.rel_entr, scipy.special.xlogy, and scipy.special.chdtrc.

Support is provided in scipy.stats for the following functions: scipy.stats.describe, scipy.stats.moment, scipy.stats.skew, scipy.stats.kurtosis, scipy.stats.kstat, scipy.stats.kstatvar, scipy.stats.circmean, scipy.stats.circvar, scipy.stats.circstd, scipy.stats.entropy, scipy.stats.variation, scipy.stats.sem, scipy.stats.ttest_1samp, scipy.stats.pearsonr, scipy.stats.chisquare, scipy.stats.skewtest, scipy.stats.kurtosistest, scipy.stats.normaltest, scipy.stats.jarque_bera, scipy.stats.bartlett, scipy.stats.power_divergence, and scipy.stats.monte_carlo_test.

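Implementation notes#

Support for the array API standard, and for specific compatibility functions for NumPy, CuPy and PyTorch, is provided mainly through array-api-compat. This package is included in the SciPy codebase via a git submodule (under scipy/_lib), so no new dependencies are introduced.

array-api-compat provides generic utility functions and adds aliases such as xp.concat (which, for numpy, maps to np.concatenate). This allows a uniform API to be used across NumPy, PyTorch, CuPy and JAX (with other libraries, such as Dask, to be supported in the future).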
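
The aliasing idea can be illustrated with NumPy alone. The sketch below is not how array-api-compat is implemented; it only shows what "an alias that maps to np.concatenate" means in practice, assuming nothing beyond NumPy being installed.

```python
import numpy as np

# Use the standard name if this NumPy version provides it; otherwise fall
# back to the older spelling, which performs the same operation.
concat = getattr(np, "concat", np.concatenate)

joined = concat([np.asarray([1, 2]), np.asarray([3])])
```

Code written against the standard name then runs unchanged whichever spelling the installed library offers.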

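When the environment variable is not set, and array API standard support in SciPy is therefore disabled, we still use the "augmented" version of the NumPy namespace, array_api_compat.numpy. This should not change the behavior of SciPy functions: it is effectively the existing numpy namespace with a number of aliases added and a handful of functions amended or added for array API standard support. When support is enabled, xp is the standard-compatible namespace matching the type of the input arrays (e.g., if the input to cluster.vq.kmeans is a PyTorch array, then xp is array_api_compat.torch).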

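Adding array API standard support to a SciPy function#

New code added to SciPy should follow the array API standard as closely as possible (the functions it defines are typically best-practice idioms for NumPy usage as well). When code follows the standard, adding array API standard support is usually straightforward, and ideally we do not need to maintain any customization.

Three helper functions are available:

array_namespace: returns the namespace based on the input arrays and does some input validation (such as refusing to work with masked arrays; see the RFC).
_asarray: a drop-in replacement for asarray with the additional parameters check_finite and order. As stated above, try to limit the use of non-standard features. Eventually we would like to upstream our needs to the compatibility library. Passing xp=xp avoids duplicate calls of array_namespace internally.
copy: an alias for _asarray(x, copy=True). The copy parameter was only introduced to np.asarray in NumPy 2.0, so the helper is needed to support NumPy <2.0. Passing xp=xp avoids duplicate calls of array_namespace internally.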
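
To make the role of the extra parameters concrete, here is a NumPy-only sketch of what such a helper might do. The name asarray_sketch and its exact behavior are assumptions for illustration; SciPy's real _asarray also handles other namespaces through xp and may differ in detail.

```python
import numpy as np


def asarray_sketch(x, dtype=None, order="K", copy=None, check_finite=False):
    """NumPy-only illustration of an asarray with copy/order/check_finite."""
    if copy is True:
        # np.array accepts copy=True on every NumPy version, whereas
        # np.asarray only gained a copy parameter in NumPy 2.0.
        out = np.array(x, dtype=dtype, order=order, copy=True)
    else:
        out = np.asarray(x, dtype=dtype, order=order)
    # Only floating-point and complex arrays can hold infs or NaNs.
    if check_finite and out.dtype.kind in "fc" and not np.all(np.isfinite(out)):
        raise ValueError("array must not contain infs or NaNs")
    return out
```

With copy=True the result never shares memory with the input, which is the guarantee the copy helper exists to provide on older NumPy versions.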

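To add support to a SciPy function that is defined in a .py file, what you have to change is:

Input array validation,
Using xp rather than np functions,
When calling into compiled code, converting the array to a NumPy array before the call and back to the input array type afterwards.

Input array validation uses the following pattern: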

xp = array_namespace(arr) # where arr is the input array
# alternatively, if there are multiple array inputs, include them all:
xp = array_namespace(arr1, arr2)

# uses of non-standard parameters of np.asarray can be replaced with _asarray
arr = _asarray(arr, order='C', dtype=xp.float64, xp=xp)

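Note that if one input is a non-NumPy array type, all array-like inputs have to be of that type; trying to mix non-NumPy arrays with lists, Python scalars or other arbitrary Python objects will raise an exception. For NumPy arrays, those types will continue to be accepted for backwards compatibility reasons.

If a function calls into compiled code just once, use the following pattern: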

x = np.asarray(x)  # convert to numpy right before compiled call(s)
y = _call_compiled_code(x)
y = xp.asarray(y)  # convert back to original array type

If there are multiple calls to compiled code, make sure the conversion is done only once to avoid excessive overhead.
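
A sketch of the multi-call case follows. The two _compiled_* functions are stand-ins written in Python for illustration, and xp is passed in as a parameter here only to keep the example self-contained; in SciPy it would come from array_namespace.

```python
import numpy as np


def _compiled_scale(x):
    # Stand-in for a compiled routine that only accepts NumPy arrays.
    return x * 2.0


def _compiled_shift(x):
    # Second stand-in for a compiled routine.
    return x + 1.0


def scale_then_shift(x, xp=np):
    x = np.asarray(x)       # convert to NumPy once, before the first compiled call
    y = _compiled_scale(x)
    y = _compiled_shift(y)  # the data stays a NumPy array between compiled calls
    return xp.asarray(y)    # convert back once, after the last compiled call
```

The point of the pattern is that the intermediate result is never converted back to the input array type between the two compiled calls.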

Here is an example for a hypothetical public SciPy function toto:

def toto(a, b):
    a = np.asarray(a)
    b = np.asarray(b, copy=True)

    c = np.sum(a) - np.prod(b)

    # this is some C or Cython call
    d = cdist(c)

    return d

You would convert this like so:

def toto(a, b):
    xp = array_namespace(a, b)
    a = xp.asarray(a)
    b = copy(b, xp=xp)  # our custom helper is needed for copy

    c = xp.sum(a) - xp.prod(b)

    # this is some C or Cython call
    c = np.asarray(c)
    d = cdist(c)
    d = xp.asarray(d)

    return d

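Going through compiled code requires going back to a NumPy array, because SciPy's extension modules only work with NumPy arrays (or memoryviews in the case of Cython), not with other array types. For arrays on CPU, the conversions should be zero-copy, while on GPU and other devices the attempt at conversion will raise an exception. The reason is that silent data transfer between devices is considered bad practice, as it is likely to be a large and hard-to-detect performance bottleneck.

Adding tests#

The following pytest markers are available:

array_api_compatible -> xp: use a parametrization to run a test on multiple array backends.
skip_xp_backends(*backends, reasons=None, np_only=False, cpu_only=False): skip certain backends and/or devices. np_only skips tests for all backends other than the default NumPy backend. @pytest.mark.usefixtures("skip_xp_backends") must be used alongside this marker for the skipping to apply.
skip_xp_invalid_arg: skip tests that use arguments which are invalid when SCIPY_ARRAY_API is used. For instance, some tests of scipy.stats functions pass masked arrays to the function being tested, but masked arrays are incompatible with the array API. Using the skip_xp_invalid_arg decorator allows these tests to protect against regressions when SCIPY_ARRAY_API is not used, without causing failures when SCIPY_ARRAY_API is used. In time, we will want these functions to emit deprecation warnings when they receive array API invalid input, and this decorator will check that the deprecation warning is emitted without it causing the test to fail. When SCIPY_ARRAY_API=1 behavior becomes the default and only behavior, these tests (and the decorator itself) will be removed.

The following is an example using the markers: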

import numpy as np
import pytest
from scipy.conftest import array_api_compatible, skip_xp_invalid_arg
...
@pytest.mark.skip_xp_backends(np_only=True,
                               reasons=['skip reason'])
@pytest.mark.usefixtures("skip_xp_backends")
@array_api_compatible
def test_toto1(self, xp):
    a = xp.asarray([1, 2, 3])
    b = xp.asarray([0, 2, 5])
    toto(a, b)
...
@pytest.mark.skip_xp_backends('array_api_strict', 'cupy',
                               reasons=['skip reason 1',
                                        'skip reason 2',])
@pytest.mark.usefixtures("skip_xp_backends")
@array_api_compatible
def test_toto2(self, xp):
    a = xp.asarray([1, 2, 3])
    b = xp.asarray([0, 2, 5])
    toto(a, b)
...
# Do not run when SCIPY_ARRAY_API is used
@skip_xp_invalid_arg
def test_toto_masked_array(self):
    a = np.ma.asarray([1, 2, 3])
    b = np.ma.asarray([0, 2, 5])
    toto(a, b)

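Passing a custom reason to reasons when cpu_only=True is unsupported, since cpu_only=True can be used alongside passing backends. Also, the reason for using cpu_only is most likely just that compiled code is used in the function(s) being tested.

When every test function in a file has been updated for array API compatibility, verbosity can be reduced by using pytestmark to tell pytest to apply the markers to every test function: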

import pytest
from scipy.conftest import array_api_compatible

pytestmark = [array_api_compatible, pytest.mark.usefixtures("skip_xp_backends")]
skip_xp_backends = pytest.mark.skip_xp_backends
...
@skip_xp_backends(np_only=True, reasons=['skip reason'])
def test_toto1(self, xp):
    a = xp.asarray([1, 2, 3])
    b = xp.asarray([0, 2, 5])
    toto(a, b)

After applying these markers, dev.py test can be used with the new option -b or --array-api-backend:

python dev.py test -b numpy -b pytorch -s cluster

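This automatically sets SCIPY_ARRAY_API appropriately. To test a library that has multiple devices with a non-default device, a second environment variable (SCIPY_DEVICE, only used in the test suite) can be set. Valid values depend on the array library under test, e.g. for PyTorch (currently the only library with multi-device support that is known to work) valid values are "cpu", "cuda", "mps". So to run the test suite with the PyTorch MPS backend, use: SCIPY_DEVICE=mps python dev.py test -b pytorch.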
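
How a test helper might read that second variable can be sketched as follows. The function name device_from_env, the "cpu" fallback and the validation step are assumptions made for illustration; only the variable name and the three PyTorch device names come from the text above.

```python
import os

# Device names listed above as valid for PyTorch.
PYTORCH_DEVICES = ("cpu", "cuda", "mps")


def device_from_env(environ=None):
    """Return the device named by SCIPY_DEVICE, falling back to "cpu"."""
    if environ is None:
        environ = os.environ
    # Assumed default: use the CPU when the variable is not set.
    device = environ.get("SCIPY_DEVICE", "cpu")
    if device not in PYTORCH_DEVICES:
        raise ValueError(f"unsupported SCIPY_DEVICE for PyTorch: {device!r}")
    return device
```

A test could then create its inputs on the returned device, so the same test body runs on CPU by default and on another device when SCIPY_DEVICE is set.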

Note that there is a GitHub Actions workflow which runs pytorch-cpu.

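Additional information#

The following are some additional resources which motivated some design decisions and helped during the development phase:

Initial PR with some discussions
Quick start taken from this PR, with some inspiration from scikit-learn.
PR adding Array API support to scikit-learn
Some other relevant scikit-learn PRs: #22554 and #25956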