- Installation
- Guides
- Overview
- Data Import & Export
- CSV Import
- CSV Export
- Parquet Import
- Parquet Export
- Query Parquet
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- JSON Import
- JSON Export
- Excel Import
- Excel Export
- SQLite Import
- PostgreSQL Import
- Meta Queries
- ODBC
- Python
- Install
- Execute SQL
- Jupyter Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- DuckDB with Ibis
- DuckDB with Polars
- DuckDB with Vaex
- DuckDB with DataFusion
- DuckDB with fsspec Filesystems
- SQL Features
- SQL Editors
- Data Viewers
- Documentation
- Connect
- Data Import
- Overview
- CSV Files
- JSON Files
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- Insert Statements
- Client APIs
- Overview
- C
- Overview
- Startup
- Configure
- Query
- Data Chunks
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Java
- Julia
- Node.js
- Python
- Overview
- Data Ingestion
- Result Conversion
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Scala
- Swift
- Wasm
- ADBC
- ODBC
- SQL
- Introduction
- Statements
- Overview
- Alter Table
- Alter View
- Attach/Detach
- Call
- Checkpoint
- Copy
- Create Macro
- Create Schema
- Create Sequence
- Create Table
- Create View
- Create Type
- Delete
- Drop
- Export
- Insert
- Pivot
- Select
- Set/Reset
- Unpivot
- Update
- Use
- Vacuum
- Query Syntax
- SELECT
- FROM & JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT
- SAMPLE
- UNNEST
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Data Types
- Overview
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Expressions
- Functions
- Overview
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Utility Functions
- Aggregates
- Configuration
- Constraints
- Indexes
- Information Schema
- Metadata Functions
- Pragmas
- Rules for Case Sensitivity
- Samples
- Window Functions
- Extensions
- Sitemap
- Why DuckDB
- Media
- FAQ
- Code of Conduct
- Live Demo
Examples
-- write a table to a hive partitioned data set of parquet files
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month));
-- write a table to a hive partitioned data set of CSV files, allowing overwrites
COPY orders TO 'orders' (FORMAT CSV, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1);
Partitioned Writes
When the partition_by
clause is specified for the COPY
statement, the files are written in a hive partitioned folder hierarchy. The target is the name of the root directory (in the example above: orders
). The files are written in-order in the file hierarchy. Currently, one file is written per thread to each directory.
orders
├── year=2021
│ ├── month=1
│ │ ├── data_1.parquet
│ │ └── data_2.parquet
│ └── month=2
│ └── data_1.parquet
└── year=2022
├── month=11
│ ├── data_1.parquet
│ └── data_2.parquet
└── month=12
└── data_1.parquet
The values of the partitions are automatically extracted from the data. Note that it can be very expensive to write many partitions as many files will be created. The ideal partition count depends on how large your data set is.
Writing data into many small partitions is expensive. It is generally recommended to have at least
100MB
of data per partition.
Overwriting
By default the partitioned write will not allow overwriting existing directories. Use the OVERWRITE_OR_IGNORE
option to allow overwriting an existing directory.
Filename Pattern
By default, files will be named data_0.parquet
or data_0.csv
. With the flag FILENAME_PATTERN
a pattern with {i}
or {uuid}
can be defined to create specific filenames:
{i}
will be replaced by an index{uuid}
will be replaced by a 128 bits long UUID
-- write a table to a hive partitioned data set of .parquet files, with an index in the filename
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "orders_{i}");
-- write a table to a hive partitioned data set of .parquet files, with unique filenames
COPY orders TO 'orders' (FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN "file_{uuid}");