⌘+k ctrl+k
1.1.3 (stable)
Search Shortcut cmd + k | ctrl + k
Export to Apache Arrow

查询的所有结果都可以使用arrow函数导出到Apache Arrow Table。或者,可以使用fetch_record_batch函数将结果作为RecordBatchReader返回,并且可以一次读取一批结果。此外,使用DuckDB的Relational API构建的关系也可以导出。

导出到Arrow表

import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = duckdb.sql("SELECT * FROM my_arrow_table").arrow()

导出为 RecordBatchReader

import duckdb
import pyarrow as pa

my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})

# query the Apache Arrow Table "my_arrow_table" and return as an Arrow RecordBatchReader
chunk_size = 1_000_000
results = duckdb.sql("SELECT * FROM my_arrow_table").fetch_record_batch(chunk_size)

# Loop through the results. A StopIteration exception is thrown when the RecordBatchReader is empty
while True:
    try:
        # Process a single chunk here (just printing as an example)
        print(results.read_next_batch().to_pandas())
    except StopIteration:
        print('Already fetched all batches')
        break

从关系型API导出

Arrow 对象也可以从关系 API 中导出。可以使用 arrowto_arrow_table 函数将关系转换为 Arrow 表,或者使用 record_batch 转换为记录批次。 可以使用 arrow 或其别名 fetch_arrow_table 将结果导出为 Arrow 表,或者使用 fetch_arrow_reader 导出为 RecordBatchReader。

import duckdb

# connect to an in-memory database
con = duckdb.connect()

con.execute('CREATE TABLE integers (i integer)')
con.execute('INSERT INTO integers VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (NULL)')

# Create a relation from the table and export the entire relation as Arrow
rel = con.table("integers")
relation_as_arrow = rel.arrow() # or .to_arrow_table()

# Or, calculate a result using that relation and export that result to Arrow
res = rel.aggregate("sum(i)").execute()
result_as_arrow = res.arrow() # or fetch_arrow_table()