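The Threading Layers

This section is about the Numba threading layer, the library that is used internally to perform the parallel execution that occurs through the use of the parallel targets for CPUs, namely:

The use of the parallel=True kwarg in @jit and @njit.
The use of the target='parallel' kwarg in @vectorize and @guvectorize.

Note

If a code base does not use the threading or multiprocessing modules (or any other sort of parallelism), the defaults for the threading layer that ship with Numba will work well and no further action is required.

Which threading layers are available?

There are three threading layers available, named as follows:

tbb - A threading layer backed by Intel TBB.
omp - A threading layer backed by OpenMP.
workqueue - A simple built-in work-sharing task scheduler.

In practice, the only threading layer guaranteed to be present is workqueue. The omp layer requires the presence of a suitable OpenMP runtime library. The tbb layer requires the presence of Intel's TBB libraries, which can be obtained via the conda command: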

$ conda install tbb

If you installed Numba with pip, TBB can be enabled by running:

$ pip install tbb

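Note

The default manner in which Numba searches for and loads a threading layer is tolerant of missing libraries, incompatible runtimes, and so on.

Setting the threading layer

The threading layer is set via the environment variable NUMBA_THREADING_LAYER or by assignment to numba.config.THREADING_LAYER. If the programmatic approach is used, the assignment must occur before any Numba-based compilation for a parallel target takes place. There are two approaches to choosing a threading layer: the first is to select a threading layer that is safe under various forms of parallel execution, the second is to select one explicitly by name (e.g. tbb).

Setting the threading layer selection priority

By default the threading layers are searched in the order 'tbb', 'omp', then 'workqueue'. To change this search order while still selecting a layer based on availability, use the environment variable NUMBA_THREADING_LAYER_PRIORITY.

Note that it can also be set via numba.config.THREADING_LAYER_PRIORITY. As with numba.config.THREADING_LAYER, it must be set before any Numba-based compilation for a parallel target takes place.

For example, to instruct Numba to choose omp first if available, then tbb and so on, set the environment variable as NUMBA_THREADING_LAYER_PRIORITY="omp tbb workqueue". Or programmatically, numba.config.THREADING_LAYER_PRIORITY = ["omp", "tbb", "workqueue"].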
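As a rough illustration of the environment-variable form, the sketch below builds the space-separated value from a list and stores it in an environment mapping that could be handed to a child process. The helper names priority_env_value and child_environment are hypothetical, and the name check is this example's own safeguard, not a description of how Numba validates the setting.

```python
import os

# The three layer names listed earlier in this section.
KNOWN_LAYERS = ("tbb", "omp", "workqueue")


def priority_env_value(order):
    """Join a list of layer names into the space-separated form shown above."""
    unknown = [name for name in order if name not in KNOWN_LAYERS]
    if unknown:
        raise ValueError("unknown threading layer name(s): %s" % ", ".join(unknown))
    return " ".join(order)


def child_environment(order, base_env=None):
    """Return a copy of an environment with NUMBA_THREADING_LAYER_PRIORITY set.

    The copy is meant to be passed to a new process (for example through the
    env argument of subprocess.run), so that the variable is already in place
    when Numba is imported there.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NUMBA_THREADING_LAYER_PRIORITY"] = priority_env_value(order)
    return env


env = child_environment(["omp", "tbb", "workqueue"], base_env={})
print(env["NUMBA_THREADING_LAYER_PRIORITY"])  # omp tbb workqueue
```

Building the environment for a child process, rather than mutating os.environ in place, keeps the setting clear of any Numba import that may already have happened in the current process.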

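Selecting a threading layer for safe parallel execution

Parallel execution is fundamentally derived from core Python libraries in four forms (the first three also apply to code that uses parallel execution via other means):

threads from the threading module.
spawning processes from the multiprocessing module via spawn (default on Windows, only available in Python 3.4+ on Unix).
forking processes from the multiprocessing module via fork (long the default on Unix; macOS switched its default to spawn in Python 3.8, and Python 3.14 made forkserver the default on other POSIX platforms).
forking processes from the multiprocessing module through the use of a forkserver (only available in Python 3 on Unix). Essentially a new process is spawned and then forks are made from this new process on request.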
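Which of these process start methods applies depends on the platform and the Python version, so it can help to ask the interpreter directly. The sketch below uses only the standard multiprocessing module; the function name start_method_summary is hypothetical.

```python
import multiprocessing


def start_method_summary():
    """Report the multiprocessing start methods this interpreter supports."""
    available = multiprocessing.get_all_start_methods()
    # Note: if no start method has been chosen yet, get_start_method() fixes
    # the platform default as the start method for this process.
    default = multiprocessing.get_start_method()
    return {"default": default, "available": available}


summary = start_method_summary()
print("start method in use:", summary["default"])
print("available start methods:", ", ".join(summary["available"]))
```

If the reported method is fork, a fork safe threading layer matters; with spawn, each child starts from a fresh interpreter.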

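Any library in use with these forms of parallelism must exhibit safe behaviour under the given paradigm. As a result, the threading layer selection methods are designed to provide a way to choose a threading layer library that is safe for a given paradigm in an easy, cross-platform and environment-tolerant manner. The options that can be supplied to the setting mechanisms are as follows:

default provides no specific safety guarantee and is the default.
safe is both fork and thread safe; this requires the tbb package (Intel TBB libraries) to be installed.
forksafe provides a fork safe library.
threadsafe provides a thread safe library.

To discover the threading layer that was selected, call the function numba.threading_layer() after parallel execution. For example, on a Linux machine with no TBB installed: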

from numba import config, njit, threading_layer
import numpy as np

# set the threading layer before any parallel target compilation
config.THREADING_LAYER = 'threadsafe'

@njit(parallel=True)
def foo(a, b):
    return a + b

x = np.arange(10.)
y = x.copy()

# this will force the compilation of the function, select a threading layer
# and then execute in parallel
foo(x, y)

# demonstrate the threading layer chosen
print("Threading layer chosen: %s" % threading_layer())

This produces:

Threading layer chosen: omp

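This makes sense, as GNU OpenMP, as present on Linux, is thread safe.

Selecting a named threading layer

Advanced users may wish to select a specific threading layer for their use case; this is done by supplying the threading layer name directly to the setting mechanism. The options and requirements are as follows: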

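Threading Layer Name   Platform   Requirements
--------------------   --------   ------------
tbb                    All        The tbb package ($ conda install tbb)
omp                    Linux      GNU OpenMP libraries (very likely already present)
omp                    Windows    MS OpenMP libraries (very likely already present)
omp                    OSX        Either the intel-openmp package or the llvm-openmp package (conda install the package as named)
workqueue              All        None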

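If a threading layer fails to load correctly, Numba will detect this and provide a hint about how to resolve the problem. Note also that the Numba diagnostic command numba -s has a section, __Threading Layer Information__, that reports on the availability of threading layers in the current environment.

Extra notes

The threading layers have fairly complex interactions with CPython internals and system level libraries, so some additional points are worth noting:

The installation of Intel's TBB libraries vastly widens the options available in the threading layer selection process.
On Linux, the omp threading layer is not fork safe because the GNU OpenMP runtime library (libgomp) is not fork safe. If a fork occurs in a program that is using the omp threading layer, a detection mechanism is present that will try to gracefully terminate the forked child and print an error message to STDERR.
On systems with the fork(2) system call available, if the TBB backed threading layer is in use and a fork call is made from a thread other than the thread that launched TBB (typically the main thread), this results in undefined behaviour and a warning is displayed on STDERR. As spawn is essentially fork followed by exec, it is safe to spawn from a non-main thread, but since this cannot be distinguished from a plain fork call the warning message will still be shown.
On OSX, an OpenMP runtime package (intel-openmp or llvm-openmp, as listed in the table above) is required to enable the OpenMP based threading layer.

Setting the Number of Threads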

The number of threads used by numba is based on the number of CPU cores available (see numba.config.NUMBA_DEFAULT_NUM_THREADS), but it can be overridden with the NUMBA_NUM_THREADS environment variable.

The total number of threads that numba launches is in the variable numba.config.NUMBA_NUM_THREADS.

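For some use cases, it may be desirable to set the number of threads to a lower value, so that Numba can be used together with higher-level parallelism.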

The number of threads can be set dynamically at runtime using numba.set_num_threads(). Note that set_num_threads() only allows setting the number of threads to a smaller value than NUMBA_NUM_THREADS. Numba always launches numba.config.NUMBA_NUM_THREADS threads, but set_num_threads() causes it to mask out unused threads so they aren’t used in computations.

The number of threads currently used by Numba can be accessed with numba.get_num_threads(). Both functions work inside of a jitted function.

Example of Limiting the Number of Threads

In this example, suppose the machine we are running on has 8 cores (so numba.config.NUMBA_NUM_THREADS would be 8). Suppose we want to run some code with @njit(parallel=True), but we also want to run our code concurrently in 4 different processes. With the default number of threads, each Python process would run 8 threads, for a total of 4*8 = 32 threads, which is oversubscription for our 8 cores. We should rather limit each process to 2 threads, so that the total will be 4*2 = 8, which matches our number of physical cores.

There are two ways to do this. One is to set the NUMBA_NUM_THREADS environment variable to 2.

$ NUMBA_NUM_THREADS=2 python ourcode.py

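However, there are two downsides to this approach:

NUMBA_NUM_THREADS must be set before Numba is imported, and ideally before Python is launched. As soon as Numba is imported, the environment variable is read and that number of threads is locked in as the number of threads Numba launches.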
If we want to later increase the number of threads used by the process, we cannot. NUMBA_NUM_THREADS sets the maximum number of threads that are launched for a process. Calling set_num_threads() with a value greater than numba.config.NUMBA_NUM_THREADS results in an error.

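The advantage of this approach is that we can do it from outside of the process without changing the code.

The other approach is to use the numba.set_num_threads() function in our code: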

from numba import njit, set_num_threads

@njit(parallel=True)
def func():
    ...

set_num_threads(2)
func()

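If we call set_num_threads(2) before executing our parallel code, it has the same effect as calling the process with NUMBA_NUM_THREADS=2, in that the parallel code will only execute on 2 threads. However, we can later call set_num_threads(8) to increase the number of threads back to the default size. And we do not have to worry about setting it before Numba gets imported. It only needs to be called before the parallel function is run.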
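The arithmetic behind this example (8 cores shared by 4 processes) can be sketched as a small helper. The name threads_per_process is hypothetical and the even split is an assumption; a real workload may call for a different division.

```python
def threads_per_process(total_cores, n_processes):
    """Split a core budget evenly across processes, with at least one thread each."""
    if total_cores < 1 or n_processes < 1:
        raise ValueError("total_cores and n_processes must both be at least 1")
    return max(1, total_cores // n_processes)


cores, processes = 8, 4
per_process = threads_per_process(cores, processes)

print(processes * cores)        # 32 threads if every process used the default
print(per_process)              # 2 threads per process
print(processes * per_process)  # 8 threads in total, matching the 8 cores
```

The value returned here is what would be passed to set_num_threads() in each process, or exported as NUMBA_NUM_THREADS before launching it.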

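Getting a Thread ID

In some cases it may be beneficial to have access to a unique identifier for the current thread that is executing a parallel region. For that purpose, Numba provides the numba.get_thread_id() function. This function is the counterpart of the OpenMP function omp_get_thread_num and returns an integer between 0 (inclusive) and the number of configured threads as described above (exclusive).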

API Reference

numba.config.NUMBA_NUM_THREADS

The total (maximum) number of threads launched by Numba.

Defaults to numba.config.NUMBA_DEFAULT_NUM_THREADS, but can be overridden with the NUMBA_NUM_THREADS environment variable.

numba.config.NUMBA_DEFAULT_NUM_THREADS: The number of usable CPU cores on the system (as determined by len(os.sched_getaffinity(0)), if supported by the OS, or multiprocessing.cpu_count() if not). This is the default value for numba.config.NUMBA_NUM_THREADS unless the NUMBA_NUM_THREADS environment variable is set.

numba.set_num_threads(n)

Set the number of threads to use for parallel execution.

By default, all numba.config.NUMBA_NUM_THREADS threads are used.

This functionality works by masking out threads that are not used. Therefore, the number of threads n must be less than or equal to NUMBA_NUM_THREADS, the total number of threads that are launched. See its documentation for more details.

This function can be used inside of a jitted function.

Parameters:

n: The number of threads. Must be between 1 and NUMBA_NUM_THREADS.

See also

get_num_threads, numba.config.NUMBA_NUM_THREADS
numba.config.NUMBA_DEFAULT_NUM_THREADS, NUMBA_NUM_THREADS

numba.get_num_threads()

Get the number of threads used for parallel execution.

By default (if set_num_threads() is never called), all numba.config.NUMBA_NUM_THREADS threads are used.

This number is less than or equal to the total number of threads that are launched, numba.config.NUMBA_NUM_THREADS.

This function can be used inside of a jitted function.

Returns:

The number of threads.

See also

set_num_threads, numba.config.NUMBA_NUM_THREADS
numba.config.NUMBA_DEFAULT_NUM_THREADS, NUMBA_NUM_THREADS

numba.get_thread_id(): Returns a unique ID for each thread in the range 0 (inclusive) to numba.get_num_threads() (exclusive).