⌘+k ctrl+k
Search Shortcut cmd + k | ctrl + k
DuckDB with Ibis

Ibis is a Python library that allows queries to be written in a pythonic relational style and then be compiled into SQL. Ibis supports multiple database backends, including DuckDB by using DuckDB's SQLAlchemy driver. Ibis expressions can also be combined with SQL statements.

Installation

To install only the DuckDB backend for Ibis, use the commands below. See the Ibis DuckDB installation instructions for a conda alternative. Note that DuckDB support was added in Ibis version 3.0.0.

pip install 'ibis-framework[duckdb]' # duckdb, sqlalchemy, duckdb_engine and more are installed as dependencies

Querying DuckDB with Ibis

The following example is loosely borrowed from the Introduction to Ibis tutorial, which uses SQLite. First, we import Ibis, set it to interactive mode (just for demo purposes - it is faster to not use this option!), and then connect to an in-memory DuckDB instance. We can then inspect the tables in our database.

import ibis
ibis.options.interactive = True # Use eager evaluation. Use only for demo purposes!

connection = ibis.duckdb.connect(':memory:') # Use an In Memory DuckDB
# connection = ibis.duckdb.connect('/path/to/my_db.db') # Use or create a physical DuckDB at this path

print(connection.list_tables())
# Output:
['pragma_database_list', 'duckdb_tables', 'duckdb_views', 'duckdb_indexes',
 'sqlite_master', 'sqlite_schema', 'sqlite_temp_master', 'sqlite_temp_schema', 
 'duckdb_constraints', 'duckdb_columns', 'duckdb_schemas', 'duckdb_types']

We then create a handler to a specific table to be able to explore it further. Here we use a built in table called duckdb_types for simplicity. The first thing we want to see is a list of columns.

duckdb_types_table = connection.table('duckdb_types')
print(duckdb_types_table.columns)
# Output:
['schema_name', 'schema_oid', 'type_oid', 'type_name', 'type_size', 'type_category', 'internal']

To access only certain columns, use bracket syntax on the table handler. We can also apply functions to transform the data, for example to show only distinct values. Use the compile function to see the SQL query that Ibis generates.

print(duckdb_types_table['type_category', 'type_size'].distinct())
print(duckdb_types_table['type_category', 'type_size'].distinct().compile())
type_category type_size
BOOLEAN 1
NUMERIC 1
NUMERIC 2
NUMERIC 4
NUMERIC 8
DATETIME 4
DATETIME 8
STRING 16
NaN 16
DATETIME 16
NUMERIC 16
COMPOSITE 16
COMPOSITE 0
NUMERIC NaN
SELECT DISTINCT t0.type_category, t0.type_size 
FROM duckdb_types AS t0

Multiple methods can be chained together to build up more complex expressions. This statement selects a subset of columns, filters to rows containing a specific value in one column, and sorts by another column. The Ibis-generated SQL is shown below. Note that it uses a parameter as a part of the filter function.

print(duckdb_types_table['type_name','type_category', 'type_size']
    .filter(duckdb_types_table['type_category'] == 'NUMERIC')
    .sort_by('type_size'))

print(duckdb_types_table['type_name','type_category', 'type_size']
    .filter(duckdb_types_table['type_category'] == 'NUMERIC')
    .sort_by('type_size').compile())
type_name type_category type_size
DECIMAL NUMERIC NaN
TINYINT NUMERIC 1
UTINYINT NUMERIC 1
SMALLINT NUMERIC 2
USMALLINT NUMERIC 2
INTEGER NUMERIC 4
FLOAT NUMERIC 4
UINTEGER NUMERIC 4
BIGINT NUMERIC 8
DOUBLE NUMERIC 8
UBIGINT NUMERIC 8
HUGEINT NUMERIC 16
SELECT t0.type_name, t0.type_category, t0.type_size 
FROM (SELECT t1.type_name AS type_name, t1.type_category AS type_category, t1.type_size AS type_size 
FROM duckdb_types AS t1 
WHERE t1.type_category = CAST(? AS TEXT)) AS t0 ORDER BY t0.type_size

Combining SQL and Ibis Expressions

Ibis can also be used to combine SQL and relational operators. SQL can precede or follow Ibis relational operations.

print(duckdb_types_table.sql("""
    SELECT 
        *,
        dense_rank() over (order by type_size) as size_rank 
    FROM duckdb_types""")
    .group_by('type_category')   
    .aggregate(avg_size_rank=lambda t:t.size_rank.mean())

print(duckdb_types_table.sql("""
    SELECT 
        *,
        dense_rank() over (order by type_size) as size_rank 
    FROM duckdb_types""")
    .group_by('type_category')   
    .aggregate(avg_size_rank=lambda t:t.size_rank.mean()).compile())
type_category avg_size_rank
NUMERIC 4.583333333333333
COMPOSITE 3.6666666666666665
BOOLEAN 3.0
DATETIME 6.0
STRING 7.0
NaN 7.0
WITH _ibis_view_11 AS 
(
    SELECT 
        *,
        dense_rank() over (order by type_size) as size_rank 
    FROM duckdb_types)
 SELECT t0.type_category, avg(t0.size_rank) AS avg_size_rank 
FROM _ibis_view_11 AS t0 GROUP BY t0.type_category

To learn more about Ibis, feel free to continue with the Ibis introductory tutorial!