Notes on timing LLVM
Getting LLVM pass timings
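When the NUMBA_LLVM_PASS_TIMINGS environment variable is enabled, or numba.config.LLVM_PASS_TIMINGS is set to a truthy value, the dispatcher stores LLVM pass timings in the dispatcher object's metadata under the llvm_pass_timings key. The timing information records how much time was spent in each pass. Pass timings are also grouped by purpose: for example, there are separate timings for function-level pre-optimization, module-level optimization, and object code generation.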
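A minimal sketch of the two ways to switch the flag on, assuming the environment variable is read when numba is first imported and that the config attribute is changed before the function of interest is compiled. The guarded import is only there so the snippet also runs where numba is not installed.

```python
import os

# Option 1: set the environment variable before numba is imported.
os.environ["NUMBA_LLVM_PASS_TIMINGS"] = "1"

try:
    import numba
except ImportError:
    # numba is not installed here; there is nothing further to configure.
    numba = None

if numba is not None:
    # Option 2: set the config attribute to a truthy value at runtime,
    # before any function whose timings you want is compiled.
    numba.config.LLVM_PASS_TIMINGS = 1
```

Functions compiled before the flag was turned on will not have timings recorded for them.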
Code example
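import numba

@numba.njit
def foo(n):
    c = 0
    for i in range(n):
        for j in range(i):
            c += j
    return c

# Call once to trigger compilation for an integer argument.
foo(10)
# Look up the metadata recorded for the first compiled signature.
md = foo.get_metadata(foo.signatures[0])
print(md['llvm_pass_timings'])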
Example output:
Printing pass timings for JITCodeLibrary('DocsLLVMPassTimings.test_pass_timings.<locals>.foo')
Total time: 0.0376
== #0 Function passes on '_ZN5numba5tests12doc_examples22test_llvm_pass_timings19DocsLLVMPassTimings17test_pass_timings12$3clocals$3e7foo$241Ex'
Percent: 4.8%
Total 0.0018s
Top timings:
0.0015s ( 81.6%) SROA #3
0.0002s ( 9.3%) Early CSE #2
0.0001s ( 4.0%) Simplify the CFG #9
0.0000s ( 1.5%) Prune NRT refops #4
0.0000s ( 1.1%) Post-Dominator Tree Construction #5
== #1 Function passes on '_ZN7cpython5numba5tests12doc_examples22test_llvm_pass_timings19DocsLLVMPassTimings17test_pass_timings12$3clocals$3e7foo$241Ex'
Percent: 0.8%
Total 0.0003s
Top timings:
0.0001s ( 30.4%) Simplify the CFG #10
0.0001s ( 24.1%) Early CSE #3
0.0001s ( 17.8%) SROA #4
0.0000s ( 8.8%) Prune NRT refops #5
0.0000s ( 5.6%) Post-Dominator Tree Construction #6
== #2 Function passes on 'cfunc._ZN5numba5tests12doc_examples22test_llvm_pass_timings19DocsLLVMPassTimings17test_pass_timings12$3clocals$3e7foo$241Ex'
Percent: 0.5%
Total 0.0002s
Top timings:
0.0001s ( 27.7%) Early CSE #4
0.0001s ( 26.8%) Simplify the CFG #11
0.0000s ( 13.8%) Prune NRT refops #6
0.0000s ( 7.4%) Post-Dominator Tree Construction #7
0.0000s ( 6.7%) Dominator Tree Construction #29
== #3 Module passes (cheap optimization for refprune)
Percent: 3.7%
Total 0.0014s
Top timings:
0.0007s ( 52.0%) Combine redundant instructions
0.0001s ( 5.4%) Function Integration/Inlining
0.0001s ( 4.9%) Prune NRT refops #2
0.0001s ( 4.8%) Natural Loop Information
0.0001s ( 4.6%) Post-Dominator Tree Construction #2
== #4 Module passes (full optimization)
Percent: 43.9%
Total 0.0165s
Top timings:
0.0032s ( 19.5%) Combine redundant instructions #9
0.0022s ( 13.5%) Combine redundant instructions #7
0.0010s ( 6.1%) Induction Variable Simplification
0.0008s ( 4.8%) Unroll loops #2
0.0007s ( 4.5%) Loop Vectorization
== #5 Finalize object
Percent: 46.3%
Total 0.0174s
Top timings:
0.0060s ( 34.6%) X86 DAG->DAG Instruction Selection #2
0.0019s ( 11.0%) Greedy Register Allocator #2
0.0013s ( 7.4%) Machine Instruction Scheduler #2
0.0012s ( 7.1%) Loop Strength Reduction
0.0004s ( 2.3%) Induction Variable Users
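When only the printed summary is available (for example, copied from a log), the "Top timings" rows can be pulled out with a small parser. This is a sketch that assumes the row layout shown above (seconds, percentage in parentheses, pass name). The helper name parse_top_timings is made up for illustration and is not part of Numba.

```python
import re

# Matches rows such as "0.0015s ( 81.6%) SROA #3".
TIMING_LINE = re.compile(r"^\s*(\d+\.\d+)s\s*\(\s*(\d+\.\d+)%\)\s*(.+?)\s*$")

def parse_top_timings(text):
    """Return (seconds, percent, pass_name) tuples found in summary text."""
    rows = []
    for line in text.splitlines():
        m = TIMING_LINE.match(line)
        if m:
            rows.append((float(m.group(1)), float(m.group(2)), m.group(3)))
    return rows

# Two rows copied from group #3 of the example output.
sample = """\
== #3 Module passes (cheap optimization for refprune)
Percent: 3.7%
Total 0.0014s
Top timings:
0.0007s ( 52.0%) Combine redundant instructions
0.0001s ( 5.4%) Function Integration/Inlining
"""
rows = parse_top_timings(sample)
for seconds, percent, name in rows:
    print(f"{name}: {seconds:.4f}s ({percent}%)")
```

Header lines such as "Percent:" and "Total" do not match the pattern and are skipped.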
API for custom analysis
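More detail is available than the summary text in the example above. The pass timings are stored in a numba.misc.llvm_pass_timings.PassTimingsCollection, which provides methods for accessing the individual record of each pass.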
- class numba.misc.llvm_pass_timings.PassTimingsCollection(name)
A collection of pass timings.
This class implements the Sequence protocol for accessing the individual timing records.
- class numba.misc.llvm_pass_timings.ProcessedPassTimings(raw_data)
A class for processing the raw timing report from LLVM.
The processing is done lazily, so no time is wasted on timing information that is never used.
- class numba.misc.llvm_pass_timings.PassTimingRecord(user_time, user_percent, system_time, system_percent, user_system_time, user_system_percent, wall_time, wall_percent, pass_name, instruction)
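As an illustration of custom analysis over such records, the sketch below sums wall time per pass across repeated runs. It uses a local namedtuple stand-in with the same field names as PassTimingRecord rather than the Numba class itself. It also assumes that a trailing "#N" in pass_name only distinguishes repeated runs of the same pass. The numbers are taken from the example output, and putting them in wall_time is an assumption. All other fields are placeholders.

```python
import re
from collections import defaultdict, namedtuple

# Stand-in mirroring the PassTimingRecord fields (not the numba class).
Record = namedtuple("Record", [
    "user_time", "user_percent", "system_time", "system_percent",
    "user_system_time", "user_system_percent",
    "wall_time", "wall_percent", "pass_name", "instruction",
])

def make(wall_time, pass_name):
    # Only wall_time and pass_name are used below; the rest are placeholders.
    return Record(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, wall_time, 0.0, pass_name, "")

def total_wall_time_by_pass(records):
    """Sum wall_time per pass, ignoring a trailing ' #N' run counter."""
    totals = defaultdict(float)
    for rec in records:
        base_name = re.sub(r"\s+#\d+$", "", rec.pass_name)
        totals[base_name] += rec.wall_time
    return dict(totals)

records = [
    make(0.0015, "SROA #3"),
    make(0.0002, "Early CSE #2"),
    make(0.0001, "SROA #4"),
    make(0.0001, "Early CSE #3"),
]
totals = total_wall_time_by_pass(records)
for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.4f}s")
```

The same aggregation should work with real records, provided they expose pass_name and wall_time attributes as the signature above suggests.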