Creating NumPy universal functions

There are two types of universal functions:

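Those which operate on scalars; these are "universal functions" or *ufuncs* (see @vectorize below).
Those which operate on higher-dimensional arrays and scalars; these are "generalized universal functions" or *gufuncs* (see @guvectorize below).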

The `@vectorize` decorator

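Numba's vectorize allows Python functions taking scalar input arguments to be used as NumPy ufuncs. Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy. Using the vectorize() decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.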

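Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel), allowing efficient iteration over the actual inputs.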

The vectorize() decorator has two modes of operation:

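Eager, or decoration-time, compilation: if you pass one or more type signatures to the decorator, you will be building a NumPy universal function (ufunc). The rest of this subsection describes building ufuncs using decoration-time compilation.
Lazy, or call-time, compilation: when not given any signatures, the decorator will give you a Numba dynamic universal function (DUFunc) that dynamically compiles a new kernel when called with a previously unsupported input type. A later subsection, "Dynamic universal functions", describes this mode in more depth.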

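As described above, if you pass a list of signatures to the vectorize() decorator, your function will be compiled into a NumPy ufunc. In the basic case, only one signature will be passed: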

from test_vectorize_one_signature of numba/tests/doc_examples/test_examples.py

from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return x + y

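If you pass several signatures, beware that you have to pass the most specific signatures before the least specific ones (e.g., single-precision floats before double-precision floats), otherwise type-based dispatching will not work as expected: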

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

from numba import vectorize, int32, int64, float32, float64
import numpy as np

@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def f(x, y):
    return x + y

The function will work as expected over the specified array types:

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

a = np.arange(6)
result = f(a, a)
# result == array([ 0,  2,  4,  6,  8, 10])

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

a = np.linspace(0, 1, 6)
result = f(a, a)
# Now, result == array([0. , 0.4, 0.8, 1.2, 1.6, 2. ])

But it will fail on other types:

>>> a = np.linspace(0, 1+1j, 6)
>>> f(a, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'ufunc' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

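You might ask yourself, "why would I go through this instead of compiling a simple iteration loop using the @jit decorator?". The answer is that NumPy ufuncs automatically get other features such as reduction, accumulation or broadcasting. Using the example above: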

from test_vectorize_multiple_signatures of numba/tests/doc_examples/test_examples.py

a = np.arange(12).reshape(3, 4)
# a == array([[ 0,  1,  2,  3],
#             [ 4,  5,  6,  7],
#             [ 8,  9, 10, 11]])

result1 = f.reduce(a, axis=0)
# result1 == array([12, 15, 18, 21])

result2 = f.reduce(a, axis=1)
# result2 == array([ 6, 22, 38])

result3 = f.accumulate(a)
# result3 == array([[ 0,  1,  2,  3],
#                   [ 4,  6,  8, 10],
#                   [12, 15, 18, 21]])

result4 = f.accumulate(a, axis=1)
# result4 == array([[ 0,  1,  3,  6],
#                   [ 4,  9, 15, 22],
#                   [ 8, 17, 27, 38]])

See also

Standard features of ufuncs (NumPy documentation).

Note

Only the broadcasting and reduce features of ufuncs are supported in compiled code.

The vectorize() decorator supports multiple ufunc targets:

Target	Description
cpu	Single-threaded CPU
parallel	Multi-core CPU
cuda	CUDA GPU. Note: this creates a ufunc-like object. See the documentation for CUDA ufunc for details.

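A general guideline is to choose different targets for different data sizes and algorithms. The "cpu" target works well for small data sizes (approx. less than 1KB) and low compute intensity algorithms. It has the least amount of overhead. The "parallel" target works well for medium data sizes (approx. less than 1MB). Threading adds a small delay. The "cuda" target works well for big data sizes (approx. greater than 1MB) and high compute intensity algorithms. Transferring memory to and from the GPU adds significant overhead.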

Starting from Numba 0.59, the cpu target supports the following attributes and methods in compiled code:

ufunc.nin
ufunc.nout
ufunc.nargs
ufunc.identity
ufunc.signature
ufunc.reduce() (only the first 5 arguments; experimental feature)

The `@guvectorize` decorator

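While vectorize() allows you to write ufuncs that work on one element at a time, the guvectorize() decorator takes the concept one step further and allows you to write ufuncs that work on an arbitrary number of elements of input arrays, and that take and return arrays of differing dimensions. The typical example is a running median or a convolution filter.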

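Contrary to vectorize() functions, guvectorize() functions don't return their result value: they take it as an array argument, which must be filled in by the function. This is because the array is actually allocated by NumPy's dispatch mechanism, which calls into the Numba-generated code.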

Similar to the vectorize() decorator, guvectorize() also has two modes of operation: eager (decoration-time compilation) and lazy (call-time compilation).

Here is a very simple example:

from test_guvectorize of numba/tests/doc_examples/test_examples.py

from numba import guvectorize, int64
import numpy as np

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def g(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i] + y

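The underlying Python function simply adds a given scalar (y) to all elements of a one-dimensional array. What's more interesting is the declaration. There are two things there: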

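the declaration of input and output layouts, in symbolic form: (n),()->(n) tells NumPy that the function takes an n-element one-dimensional array and a scalar (symbolically denoted by the empty tuple ()), and returns an n-element one-dimensional array;
the list of supported concrete signatures, as per @vectorize; here, as in the above example, we demonstrate int64 arrays.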

Note

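A 1D array type can also receive scalar arguments (those with shape ()). In the above example, the second argument could also be declared as int64[:]. In that case, the value must be read as y[0].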

We can now check what the compiled ufunc does, using a simple example:

from test_guvectorize of numba/tests/doc_examples/test_examples.py

a = np.arange(5)
result = g(a, 2)
# result == array([2, 3, 4, 5, 6])

The nice thing is that NumPy will automatically dispatch over more complicated inputs, depending on their shapes:

from test_guvectorize of numba/tests/doc_examples/test_examples.py

a = np.arange(6).reshape(2, 3)
# a == array([[0, 1, 2],
#             [3, 4, 5]])

result1 = g(a, 10)
# result1 == array([[10, 11, 12],
#                   [13, 14, 15]])

result2 = g(a, np.array([10, 20]))
# result2 == array([[10, 11, 12],
#                   [23, 24, 25]])

Note

Both vectorize() and guvectorize() support passing nopython=True, as in the @jit decorator. Use it to ensure the generated code does not fall back to object mode.

Scalar return values

Now suppose we want to return a scalar value from guvectorize(). To do this, we need to:

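in the signatures, declare the scalar return with [:], like a one-dimensional array (e.g. int64[:]),
in the layout, declare it as (),
in the implementation, write to the first element (e.g. res[0] = acc).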

The following example function computes the sum of the one-dimensional array (x) plus the scalar (y) and returns it as a scalar:

from test_guvectorize_scalar_return of numba/tests/doc_examples/test_examples.py

from numba import guvectorize, int64
import numpy as np

@guvectorize([(int64[:], int64, int64[:])], '(n),()->()')
def g(x, y, res):
    acc = 0
    for i in range(x.shape[0]):
        acc += x[i] + y
    res[0] = acc

Now if we apply the wrapped function over the array, we get a scalar value as the output:

from test_guvectorize_scalar_return of numba/tests/doc_examples/test_examples.py

a = np.arange(5)
result = g(a, 2)
# At this point, result == 20.

Overwriting input values

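In most cases, writing to inputs may also appear to work. However, this behaviour cannot be relied on. Consider the following example function: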

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py

from numba import guvectorize, float64
import numpy as np

@guvectorize([(float64[:], float64[:])], '()->()')
def init_values(invals, outvals):
    invals[0] = 6.5
    outvals[0] = 4.2

Calling the init_values function with an array of type float64 results in visible changes to the input:

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py

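invals = np.zeros(shape=(3, 3), dtype=np.float64)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]])

outvals = init_values(invals)
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])

# After the call, invals has been modified in place:
# invals == array([[6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5]])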

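This works because NumPy can pass the input data directly into the init_values function, as the dtype of the data matches that of the declared argument. However, it may also create and pass in a temporary array, in which case changes to the input are lost. For example, this can occur when casting is required. To demonstrate, we can call the init_values function with an array of float32: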

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py

invals = np.zeros(shape=(3, 3), dtype=np.float32)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)
outvals = init_values(invals)
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])
print(invals)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)

In this case, there is no change to the invals array, because the temporary cast array was mutated instead.
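The effect can be imitated in plain NumPy as a rough analogy. This is not the code path that the ufunc machinery actually takes: `astype(..., copy=False)` simply returns the original array when no conversion is needed and a new array otherwise, so writes reach the caller's data only in the first case.

```python
import numpy as np

exact = np.zeros(3, dtype=np.float64)    # dtype already matches float64
narrow = np.zeros(3, dtype=np.float32)   # needs a cast to float64

same_buffer = exact.astype(np.float64, copy=False)   # no conversion: same data
temp_copy = narrow.astype(np.float64, copy=False)    # conversion: new temporary array

same_buffer[0] = 6.5   # visible through exact
temp_copy[0] = 6.5     # lost: narrow is untouched

print(exact[0], narrow[0])
```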

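To solve this problem, one needs to tell the GUFunc engine that the invals argument is writable. This can be achieved by passing writable_args=('invals',) (specifying by name) or writable_args=(0,) (specifying by position) to @guvectorize. Now, the code above works as expected: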

from test_guvectorize_overwrite of numba/tests/doc_examples/test_examples.py

@guvectorize(
    [(float64[:], float64[:])],
    '()->()',
    writable_args=('invals',)
)
def init_values(invals, outvals):
    invals[0] = 6.5
    outvals[0] = 4.2

invals = np.zeros(shape=(3, 3), dtype=np.float32)
# invals == array([[0., 0., 0.],
#                  [0., 0., 0.],
#                  [0., 0., 0.]], dtype=float32)
outvals = init_values(invals)
# outvals == array([[4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2],
#                   [4.2, 4.2, 4.2]])
print(invals)
# invals == array([[6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5],
#                  [6.5, 6.5, 6.5]], dtype=float32)