- Installation
- Guides
- Overview
- SQL Features
- Data Import & Export
- CSV Import
- CSV Export
- Parquet Import
- Parquet Export
- Query Parquet
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- JSON Import
- JSON Export
- Excel Import
- Excel Export
- SQLite Import
- Postgres Import
- Meta Queries
- Python
- Install
- Execute SQL
- Jupyter Notebooks
- SQL on Pandas
- Import From Pandas
- Export To Pandas
- SQL on Arrow
- Import From Arrow
- Export To Arrow
- Relational API on Pandas
- Multiple Python Threads
- DuckDB with Ibis
- DuckDB with Fugue
- DuckDB with Polars
- DuckDB with Vaex
- DuckDB with DataFusion
- DuckDB with fsspec filesystems
- SQL Editors
- Data Viewers
- Documentation
- Connect
- Data Import
- Overview
- CSV Files
- JSON Files
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- Insert Statements
- Client APIs
- Overview
- C
- Overview
- Startup
- Configure
- Query
- Data Chunks
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Java
- Julia
- Node.js
- ODBC
- Python
- Overview
- Data Ingestion
- Result Conversion
- DB API
- Relational API
- Function API
- Types API
- API Reference
- R
- Rust
- Scala
- Swift
- Wasm
- SQL
- Introduction
- Statements
- Overview
- Alter Table
- Attach/Detach
- Call
- Checkpoint
- Copy
- Create Macro
- Create Schema
- Create Sequence
- Create Table
- Create View
- Delete
- Drop
- Export
- Insert
- Pivot
- Select
- Set/Reset
- Unpivot
- Update
- Use
- Vacuum
- Query Syntax
- SELECT
- FROM & JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT
- SAMPLE
- UNNEST
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Data Types
- Overview
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Map
- NULL Values
- Numeric
- Struct
- Text
- Timestamp
- Union
- Expressions
- Functions
- Overview
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp With Time Zone Functions
- Utility Functions
- Aggregates
- Configuration
- Constraints
- Indexes
- Information Schema
- Metadata Functions
- Pragmas
- Samples
- Window Functions
- Extensions
- Sitemap
- Why DuckDB
- Media
- FAQ
- Code of Conduct
- Live Demo
This feature is experimental, and is subject to change
DuckDB support for fsspec
filesystems allows querying data in filesystems that DuckDB's httpfs
extension does not support. fsspec
has a large number of inbuilt filesystems, and there are also many external implementations. This capability is only available in DuckDB's Python client because fsspec
is a Python library, while the httpfs
extension is available in many DuckDB clients.
Example
The following is an example of using fsspec
to query a file in Google Cloud Storage (instead of using their s3 inter-compatibility api).
Firstly, you must install duckdb
and fsspec
, and a filesystem interface of your choice
$ pip install duckdb fsspec gcsfs
then you can register whichever filesystem you'd like to query
import duckdb
from fsspec import filesystem
# this line will throw an exception if the appropriate filesystem interface is not installed
duckdb.register_filesystem(filesystem('gcs'))
duckdb.sql("select * from read_csv_auto('gcs:///bucket/file.csv')")
Potential issues
Please also note, that as these filesystems are not implemented in C++, their performance may not be comparable to the ones provided by the httpfs
extension.
It's also worth noting that as they are third party libraries, they may contain bugs that are beyond our control.