Data chunks represent a horizontal slice of a table. They hold a number of vectors, each of which can hold up to VECTOR_SIZE rows. The vector size can be obtained through the duckdb_vector_size function and is configurable, but is usually set to 1024.
Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.
The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the duckdb_data_chunk_get_vector method, and subsequently using the duckdb_vector_get_data and duckdb_vector_get_validity methods to read the internal data and the validity mask of the vector. For composite types (list and struct vectors), duckdb_list_vector_get_child and duckdb_struct_vector_get_child should be used to read child vectors.
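The access pattern described above can be sketched in plain C. The sketch does not call DuckDB itself: `data`, `validity`, and `row_count` stand in for the values that duckdb_vector_get_data, duckdb_vector_get_validity, and duckdb_data_chunk_get_size would return, and it assumes an INTEGER column whose values are stored as int32_t. The function name `sum_valid_int32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the non-NULL values of an INTEGER column.
 * data      - stand-in for the pointer returned by duckdb_vector_get_data
 * validity  - stand-in for the pointer returned by duckdb_vector_get_validity
 *             (may be NULL, meaning every row is valid)
 * row_count - stand-in for the value returned by duckdb_data_chunk_get_size */
static int64_t sum_valid_int32(const int32_t *data, const uint64_t *validity,
                               uint64_t row_count) {
    assert(data != NULL || row_count == 0);
    int64_t sum = 0;
    for (uint64_t row = 0; row < row_count; row++) {
        /* A missing mask means no row is NULL. */
        int is_valid = validity == NULL ||
                       ((validity[row / 64] >> (row % 64)) & 1) != 0;
        if (is_valid) {
            sum += data[row];
        }
    }
    return sum;
}
```

In real code the three arguments would come from a chunk and one of its vectors; values at NULL positions in the data array should not be read, which is why the mask is checked before each access.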
API Reference
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
Vector Interface
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
Validity Mask Functions
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
duckdb_create_data_chunk
Creates an empty DataChunk with the specified set of types.
Syntax
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
Parameters
types
An array of types of the data chunk.
column_count
The number of columns.
returns
The data chunk.
duckdb_destroy_data_chunk
Destroys the data chunk and de-allocates all memory allocated for that chunk.
Syntax
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
Parameters
chunk
The data chunk to destroy.
duckdb_data_chunk_reset
Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
Syntax
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
Parameters
chunk
The data chunk to reset.
duckdb_data_chunk_get_column_count
Retrieves the number of columns in a data chunk.
Syntax
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
Parameters
chunk
The data chunk to get the data from
returns
The number of columns in the data chunk
duckdb_data_chunk_get_vector
Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive. It does NOT need to be destroyed.
Syntax
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
Parameters
chunk
The data chunk to get the data from
col_idx
The index of the column whose vector should be retrieved
returns
The vector
duckdb_data_chunk_get_size
Retrieves the current number of tuples in a data chunk.
Syntax
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
Parameters
chunk
The data chunk to get the data from
returns
The number of tuples in the data chunk
duckdb_data_chunk_set_size
Sets the current number of tuples in a data chunk.
Syntax
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
Parameters
chunk
The data chunk to set the size in
size
The number of tuples in the data chunk
duckdb_vector_get_column_type
Retrieves the column type of the specified vector.
The result must be destroyed with duckdb_destroy_logical_type.
Syntax
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
Parameters
vector
The vector to get the data from
returns
The type of the vector
duckdb_vector_get_data
Retrieves the data pointer of the vector.
The data pointer can be used to read or write values from the vector. How to read or write values depends on the type of the vector.
Syntax
void *duckdb_vector_get_data(
duckdb_vector vector
);
Parameters
vector
The vector to get the data from
returns
The data pointer
duckdb_vector_get_validity
Retrieves the validity mask pointer of the specified vector.
If all values are valid, this function MIGHT return NULL!
The validity mask is a bitset that signifies null-ness within the data chunk. It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples. The bit is set to 1 if the value is valid (i.e. not NULL) or 0 if the value is invalid (i.e. NULL).
Validity of a specific value can be obtained like this:
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry);
Alternatively, the (slower) duckdb_validity_row_is_valid function can be used.
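The mask layout can be illustrated with a small self-contained sketch. It is not DuckDB code: the helper names are hypothetical, and uint64_t is used in place of idx_t so that the block compiles without duckdb.h. It shows how many entries a mask needs for a given row count, how to treat a NULL mask pointer, and why the shifted constant should be 64 bits wide (a plain `1` is an int, so shifting it by 32 or more is undefined).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of uint64_t entries needed to cover row_count rows (64 rows per entry). */
static uint64_t validity_entry_count(uint64_t row_count) {
    assert(row_count <= UINT64_MAX - 63); /* guard against wrap-around */
    return (row_count + 63) / 64;
}

/* Check one row; a NULL mask means every row is valid.
 * The constant is widened to 64 bits so shifts of 32..63 are well defined. */
static bool row_is_valid(const uint64_t *validity_mask, uint64_t row_idx) {
    if (validity_mask == NULL) {
        return true;
    }
    uint64_t entry_idx = row_idx / 64;
    uint64_t idx_in_entry = row_idx % 64;
    return (validity_mask[entry_idx] & ((uint64_t)1 << idx_in_entry)) != 0;
}
```

For example, row 70 lives in entry 1 at bit 6, and a vector of 130 rows needs three entries.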
Syntax
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
Parameters
vector
The vector to get the data from
returns
The pointer to the validity mask, or NULL if no validity mask is present
duckdb_vector_ensure_validity_writable
Ensures the validity mask is writable by allocating it.
After this function is called, duckdb_vector_get_validity will ALWAYS return non-NULL.
This allows null values to be written to the vector, regardless of whether a validity mask was present before.
Syntax
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
Parameters
vector
The vector to alter
duckdb_vector_assign_string_element
Assigns a string element in the vector at the specified location.
Syntax
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
Parameters
vector
The vector to alter
index
The row position in the vector to assign the string to
str
The null-terminated string
duckdb_vector_assign_string_element_len
Assigns a string element in the vector at the specified location.
Syntax
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
Parameters
vector
The vector to alter
index
The row position in the vector to assign the string to
str
The string
str_len
The length of the string (in bytes)
duckdb_list_vector_get_child
Retrieves the child vector of a list vector.
The resulting vector is valid as long as the parent vector is valid.
Syntax
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
Parameters
vector
The vector
returns
The child vector
duckdb_list_vector_get_size
Returns the size of the child vector of the list
Syntax
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
Parameters
vector
The vector
returns
The size of the child list
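How a list vector relates to its child vector can be sketched as follows. This rests on an assumption that is not stated in this section: each row of a list vector stores an (offset, length) pair that indexes into the child vector, which is how the duckdb_list_entry struct in duckdb.h is commonly described. The `list_entry_sketch` type and `sum_list_row` function are local stand-ins, not part of the DuckDB API, and validity masks are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the (offset, length) pair assumed to be stored per list row. */
typedef struct {
    uint64_t offset; /* index of the list's first element in the child vector */
    uint64_t length; /* number of elements in the list */
} list_entry_sketch;

/* Sum the elements of the list stored at `row`.
 * entries    - stand-in for the list vector's data pointer
 * child      - stand-in for the child vector's data pointer (BIGINT elements)
 * child_size - stand-in for the value returned by duckdb_list_vector_get_size */
static int64_t sum_list_row(const list_entry_sketch *entries, const int64_t *child,
                            uint64_t child_size, uint64_t row) {
    list_entry_sketch entry = entries[row];
    /* Every list must lie entirely inside the child vector. */
    assert(entry.offset <= child_size && entry.length <= child_size - entry.offset);
    int64_t sum = 0;
    for (uint64_t i = 0; i < entry.length; i++) {
        sum += child[entry.offset + i];
    }
    return sum;
}
```

Under this assumption, the size of the child vector is the total number of elements across all lists in the parent vector, not the number of rows in the parent.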
duckdb_list_vector_set_size
Sets the total size of the underlying child-vector of a list vector.
Syntax
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
Parameters
vector
The list vector.
size
The size of the child list.
returns
The duckdb state. Returns DuckDBError if the vector is nullptr.
duckdb_list_vector_reserve
Sets the total capacity of the underlying child-vector of a list.
Syntax
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
Parameters
vector
The list vector.
required_capacity
The total capacity to reserve.
returns
The duckdb state. Returns DuckDBError if the vector is nullptr.
duckdb_struct_vector_get_child
Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.
Syntax
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
Parameters
vector
The vector
index
The child index
returns
The child vector
duckdb_validity_row_is_valid
Returns whether or not a row is valid (i.e. not NULL) in the given validity mask.
Syntax
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
Parameters
validity
The validity mask, as obtained through duckdb_vector_get_validity
row
The row index
returns
true if the row is valid, false otherwise
duckdb_validity_set_row_validity
In a validity mask, sets a specific row to either valid or invalid.
Note that duckdb_vector_ensure_validity_writable should be called before calling duckdb_vector_get_validity, to ensure that there is a validity mask to write to.
Syntax
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
Parameters
validity
The validity mask, as obtained through duckdb_vector_get_validity.
row
The row index
valid
Whether or not to set the row to valid, or invalid
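Given the bitset layout described for duckdb_vector_get_validity (one bit per row, 1 meaning valid), setting a row's validity amounts to setting or clearing a single bit. The sketch below illustrates that bit manipulation in plain C. It is not DuckDB's implementation, and `set_row_validity_sketch` is a hypothetical name; real code should call the API function on a mask obtained from the vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set or clear the validity bit of one row in a writable mask.
 * The mask must be non-NULL and large enough to cover `row`. */
static void set_row_validity_sketch(uint64_t *validity, uint64_t row, bool valid) {
    assert(validity != NULL);
    uint64_t bit = (uint64_t)1 << (row % 64);
    if (valid) {
        validity[row / 64] |= bit;  /* 1 = valid (not NULL) */
    } else {
        validity[row / 64] &= ~bit; /* 0 = invalid (NULL) */
    }
}
```

The non-NULL precondition is the reason a writable mask has to exist before any row can be marked as NULL.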
duckdb_validity_set_row_invalid
In a validity mask, sets a specific row to invalid.
Equivalent to duckdb_validity_set_row_validity with valid set to false.
Syntax
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
Parameters
validity
The validity mask
row
The row index
duckdb_validity_set_row_valid
In a validity mask, sets a specific row to valid.
Equivalent to duckdb_validity_set_row_validity with valid set to true.
Syntax
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
Parameters
validity
The validity mask
row
The row index