Index Types
DuckDB currently uses two index types:
- A min-max index (also known as zonemap and block range index) is automatically created for columns of all general-purpose data types.
- An Adaptive Radix Tree (ART) is mainly used to ensure primary key constraints and to speed up point and very highly selective (i.e., < 0.1%) queries. Such an index is automatically created for columns with a
UNIQUE
orPRIMARY KEY
constraint and can be defined usingCREATE INDEX
.
Joins on columns with an ART index can make use of the index join algorithm.
Index joins are disabled by default, forcing them is possible by issuing the following PRAGMA
:
PRAGMA force_index_join;
ART indexes must currently be able to fit in-memory. Avoid creating ART indexes if the index does not fit in memory.
Persistence
Both min-max indexes and ART indexes are persisted on disk.
Create Index
CREATE INDEX
constructs an index on the specified column(s) of the specified table. Compound indexes on multiple columns/expressions are supported.
Unidimensional indexes are supported, while multidimensional indexes are not yet supported.
Parameters
Name | Description |
---|---|
UNIQUE |
Causes the system to check for duplicate values in the table when the index is created (if data already exist) and each time data is added. Attempts to insert or update data that would result in duplicate entries will generate an error. |
name |
The name of the index to be created. |
table |
The name of the table to be indexed. |
column |
The name of the column to be indexed. |
expression |
An expression based on one or more columns of the table. The expression usually must be written with surrounding parentheses, as shown in the syntax. However, the parentheses can be omitted if the expression has the form of a function call. |
Examples
-- Create a unique index 'films_id_idx' on the column id of table films.
CREATE UNIQUE INDEX films_id_idx ON films (id);
-- Create index 's_idx' that allows for duplicate values on column revenue of table films.
CREATE INDEX s_idx ON films (revenue);
-- Create compound index 'gy_idx' on genre and year columns.
CREATE INDEX gy_idx ON films (genre, year);
-- Create index 'i_index' on the expression of the sum of columns j and k from table integers.
CREATE INDEX i_index ON integers ((j+k));
Drop Index
DROP INDEX
drops an existing index from the database system.
Parameters
Name | Description |
---|---|
IF EXISTS |
Do not throw an error if the index does not exist. |
name |
The name of an index to remove. |
Examples
-- Remove the index title_idx.
DROP INDEX title_idx;
Index Limitations
ART indexes create a secondary copy of the data in a second location - this complicates processing, particularly when combined with transactions. Certain limitations apply when it comes to modifying data that is also stored in secondary indexes.
Updates Become Deletes and Inserts
When an update statement is executed on a column that is present in an index - the statement is transformed into a delete of the original row followed by an insert. This has certain performance implications, particularly for wide tables, as entire rows are rewritten instead of only the affected columns.
Over-Eager Unique Constraint Checking
Due to the presence of transactions, data can only be removed from the index after (1) the transaction that performed the delete is committed, and (2) no further transactions exist that refer to the old entry still present in the index. As a result of this - transactions that perform deletions followed by insertions may trigger unexpected unique constraint violations, as the deleted tuple has not actually been removed from the index yet. For example:
CREATE TABLE students(id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
BEGIN;
DELETE FROM students WHERE id=1;
INSERT INTO students VALUES (1, 'Student 2');
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint
This, combined with the fact that updates are turned into deletions and insertions within the same transaction, means that updating rows in the presence of unique or primary key constraints can often lead to unexpected unique constraint violations.
CREATE TABLE students(id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
UPDATE students SET name='Student 2', id=1 WHERE id=1;
-- Constraint Error: Duplicate key "id: 1" violates primary key constraint
Currently this is an expected limitation of the system - although we aim to resolve this in the future.