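DuckDB offers a relational API that can be used to chain query operations together. These operations are lazily evaluated, so DuckDB can optimize their execution before anything runs. The operators can act on Pandas DataFrames, DuckDB tables, or views (which can point to any underlying storage format that DuckDB can read, such as CSV or Parquet files). Here we show a simple example that reads from a Pandas DataFrame and returns a DataFrame.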
import duckdb
import pandas
# connect to an in-memory database
con = duckdb.connect()
input_df = pandas.DataFrame.from_dict({'i': [1, 2, 3, 4],
                                       'j': ["one", "two", "three", "four"]})
# create a DuckDB relation from a dataframe
rel = con.from_df(input_df)
# chain together relational operators (this is a lazy operation, so the operations are not yet executed)
# equivalent to: SELECT i, j, i*2 AS two_i FROM input_df WHERE i >= 2 ORDER BY i DESC LIMIT 2
transformed_rel = rel.filter('i >= 2').project('i, j, i*2 AS two_i').order('i DESC').limit(2)
# trigger execution by requesting .df() of the relation
# .df() could have been added to the end of the chain above - it was separated for clarity
output_df = transformed_rel.df()
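Relational operators can also be used to group rows, aggregate, find distinct combinations of values, join, union, and more. They can also insert results directly into a DuckDB table or write them to a CSV file.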