The spatial extension provides support for geospatial data processing in DuckDB.
GEOMETRY type
The core of the spatial extension is the GEOMETRY
type. If you're unfamiliar with geospatial data and GIS tooling, this type is probably works very different from what you'd expect.
In short, while the GEOMETRY
type is a binary representation of "geometry" data made up out of sets of vertices (pairs of X and Y double
precision floats), it actually stores one of several geometry subtypes. These are POINT
, LINESTRING
, POLYGON
, as well as their "collection" equivalents, MULTIPOINT
, MULTILINESTRING
and MULTIPOLYGON
. Lastly there is GEOMETRYCOLLECTION
, which can contain any of the other subtypes, as well as other GEOMETRYCOLLECTION
s recursively.
This may seem strange at first, since DuckDB already have types like LIST
, STRUCT
and UNION
which could be used in a similar way, but the design and behaviour of the GEOMETRY
type is actually based on the Simple Features geometry model, which is a standard used by many other databases and GIS software.
That said, the spatial extension also includes a couple of experimental non-standard explicit geometry types, such as POINT_2D
, LINESTRING_2D
, POLYGON_2D
and BOX_2D
that are based on DuckDBs native nested types, such as structs and lists. In theory it should be possible to optimize a lot of operations for these types much better than for the GEOMETRY
type (which is just a binary blob), but only a couple functions are implemented so far.
All of these are implicitly castable to GEOMETRY
but with a conversion cost, so the GEOMETRY
type is still the recommended type to use for now if you are planning to work with a lot of different spatial functions.
GEOMETRY
is not currently capable of storing additional geometry types, Z/M coordinates, or SRID information. These features may be added in the future.
Spatial scalar Functions
The spatial extension implements a large number of scalar functions and overloads. Most of these are implemented using the GEOS library, but we'd like to implement more of them natively in this extension to better utilize DuckDB's vectorized execution and memory management. The following symbols are used to indicate which implementation is used:
🧭 - GEOS - functions that are implemented using the GEOS library
🦆 - DuckDB - functions that are implemented natively in this extension that are capable of operating directly on the DuckDB types
🔄 - CAST(GEOMETRY) - functions that are supported by implicitly casting to GEOMETRY
and then using the GEOMETRY
implementation
The currently implemented spatial functions can roughly be categorized into the following groups:
Geometry Conversion
Convert between geometries and other formats.
Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
---|---|---|---|---|---|
VARCHAR ST_AsHEXWKB(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
VARCHAR ST_AsText(GEOMETRY) | 🧭 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
WKB_BLOB ST_AsWKB(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
GEOMETRY ST_GeomFromText(VARCHAR) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_GeomFromWKB(BLOB) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
VARCHAR ST_AsGeoJSON(VARCHAR) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
Geometry Construction
Construct new geometries from other geometries or other data.
Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
---|---|---|---|---|---|
GEOMETRY ST_Point(DOUBLE, DOUBLE) | 🦆 | 🦆 | |||
GEOMETRY ST_ConvexHull(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Boundary(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Buffer(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Centroid(GEOMETRY) | 🧭 | 🦆 | 🦆 | 🦆 | 🦆 |
GEOMETRY ST_Collect(GEOMETRY[]) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
GEOMETRY ST_Normalize(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_SimplifyPreserveTopology(GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Simplify(GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Union(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_Intersection(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_MakeLine(GEOMETRY[]) | 🦆 | 🦆 | |||
GEOMETRY ST_Envelope(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GEOMETRY ST_FlipCoordinates(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
GEOMETRY ST_Transform(GEOMETRY, VARCHAR, VARCHAR) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
Spatial Properties
Calculate and access spatial properties of geometries.
Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
---|---|---|---|---|---|
DOUBLE ST_Area(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
BOOLEAN ST_IsClosed(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_IsEmpty(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
BOOLEAN ST_IsRing(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_IsSimple(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_IsValid(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
DOUBLE ST_X(GEOMETRY) | 🧭 | 🦆 | 🔄 | 🔄 | 🔄 (as POLYGON) |
DOUBLE ST_Y(GEOMETRY) | 🧭 | 🦆 | 🔄 | 🔄 | 🔄 (as POLYGON) |
GeometryType ST_GeometryType(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
DOUBLE ST_Length(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
Spatial Relationships
Compute relationships and spatial predicates between geometries.
Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
---|---|---|---|---|---|
BOOLEAN ST_Within(GEOMETRY, GEOMETRY) | 🧭 | 🦆 or 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Touches(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Overlaps(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Contains(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🦆 or 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_CoveredBy(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Covers(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Crosses(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Difference(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Disjoint(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Intersects(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_Equals(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
DOUBLE ST_Distance(GEOMETRY, GEOMETRY) | 🧭 | 🦆 or 🔄 | 🦆 or 🔄 | 🔄 | 🔄 (as POLYGON) |
BOOLEAN ST_DWithin(GEOMETRY, GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
Spatial Table Functions
The spatial extension provides a ST_Read
table function based on the GDAL translator library to read spatial data from a variety of geospatial vector file formats as if they were DuckDB tables.
ST_Read
can take a number of optional arguments, the full signature is:
ST_Read(VARCHAR, sequential_layer_scan : BOOLEAN, spatial_filter : WKB_BLOB, open_options : VARCHAR[], layer : VARCHAR, allowed_drivers : VARCHAR[], sibling_files : VARCHAR[], spatial_filter_box : BOX_2D)
sequential_layer_scan
(default:false
): If set totrue
, the table function will scan through all layers sequentially and return the first layer that matches the givenlayer
name. This is required for some drivers to work properly, e.g. theOSM
driver.spatial_filter
(default:NULL
): If set to a WKB blob, the table function will only return rows that intersect with the given WKB geometry. Some drivers may support efficient spatial filtering natively, in which case it will be pushed down. Otherwise the filtering is done by GDAL which may be much slower.open_options
(default:[]
): A list of key-value pairs that are passed to the GDAL driver to control the opening of the file. E.g. theGeoJSON
driver supports aFLATTEN_NESTED_ATTRIBUTES=YES
option to flatten nested attributes.layer
(default:NULL
): The name of the layer to read from the file. IfNULL
, the first layer is returned. Can also be a layer index (starting at 0).allowed_drivers
(default:[]
): A list of GDAL driver names that are allowed to be used to open the file. If empty, all drivers are allowed.sibling_files
(default:[]
): A list of sibling files that are required to open the file. E.g. theESRI Shapefile
driver requires a.shx
file to be present. Although most of the time these can be discovered automatically.spatial_filter_box
(default:NULL
): If set to aBOX_2D
, the table function will only return rows that intersect with the given bounding box. Similar tospatial_filter
.
Note that GDAL is single-threaded, so this table function will not be able to make full use of parllelism. We're planning to implement support for the most common vector formats natively in this extension with additional table functions in the future.
We currently support over 50 different formats. You can generate the following table of supported GDAL drivers youself by executing SELECT * FROM ST_Drivers()
.
short_name | long_name | can_create | can_copy | can_open | help_url |
---|---|---|---|---|---|
ESRI Shapefile | ESRI Shapefile | true | false | true | https://gdal.org/drivers/vector/shapefile.html |
MapInfo File | MapInfo File | true | false | true | https://gdal.org/drivers/vector/mitab.html |
UK .NTF | UK .NTF | false | false | true | https://gdal.org/drivers/vector/ntf.html |
LVBAG | Kadaster LV BAG Extract 2.0 | false | false | true | https://gdal.org/drivers/vector/lvbag.html |
S57 | IHO S-57 (ENC) | true | false | true | https://gdal.org/drivers/vector/s57.html |
DGN | Microstation DGN | true | false | true | https://gdal.org/drivers/vector/dgn.html |
OGR_VRT | VRT - Virtual Datasource | false | false | true | https://gdal.org/drivers/vector/vrt.html |
Memory | Memory | true | false | true | |
CSV | Comma Separated Value (.csv) | true | false | true | https://gdal.org/drivers/vector/csv.html |
GML | Geography Markup Language (GML) | true | false | true | https://gdal.org/drivers/vector/gml.html |
GPX | GPX | true | false | true | https://gdal.org/drivers/vector/gpx.html |
KML | Keyhole Markup Language (KML) | true | false | true | https://gdal.org/drivers/vector/kml.html |
GeoJSON | GeoJSON | true | false | true | https://gdal.org/drivers/vector/geojson.html |
GeoJSONSeq | GeoJSON Sequence | true | false | true | https://gdal.org/drivers/vector/geojsonseq.html |
ESRIJSON | ESRIJSON | false | false | true | https://gdal.org/drivers/vector/esrijson.html |
TopoJSON | TopoJSON | false | false | true | https://gdal.org/drivers/vector/topojson.html |
OGR_GMT | GMT ASCII Vectors (.gmt) | true | false | true | https://gdal.org/drivers/vector/gmt.html |
GPKG | GeoPackage | true | true | true | https://gdal.org/drivers/vector/gpkg.html |
SQLite | SQLite / Spatialite | true | false | true | https://gdal.org/drivers/vector/sqlite.html |
WAsP | WAsP .map format | true | false | true | https://gdal.org/drivers/vector/wasp.html |
OpenFileGDB | ESRI FileGDB | true | false | true | https://gdal.org/drivers/vector/openfilegdb.html |
DXF | AutoCAD DXF | true | false | true | https://gdal.org/drivers/vector/dxf.html |
CAD | AutoCAD Driver | false | false | true | https://gdal.org/drivers/vector/cad.html |
FlatGeobuf | FlatGeobuf | true | false | true | https://gdal.org/drivers/vector/flatgeobuf.html |
Geoconcept | Geoconcept | true | false | true | |
GeoRSS | GeoRSS | true | false | true | https://gdal.org/drivers/vector/georss.html |
VFK | Czech Cadastral Exchange Data Format | false | false | true | https://gdal.org/drivers/vector/vfk.html |
PGDUMP | PostgreSQL SQL dump | true | false | false | https://gdal.org/drivers/vector/pgdump.html |
OSM | OpenStreetMap XML and PBF | false | false | true | https://gdal.org/drivers/vector/osm.html |
GPSBabel | GPSBabel | true | false | true | https://gdal.org/drivers/vector/gpsbabel.html |
WFS | OGC WFS (Web Feature Service) | false | false | true | https://gdal.org/drivers/vector/wfs.html |
OAPIF | OGC API - Features | false | false | true | https://gdal.org/drivers/vector/oapif.html |
EDIGEO | French EDIGEO exchange format | false | false | true | https://gdal.org/drivers/vector/edigeo.html |
SVG | Scalable Vector Graphics | false | false | true | https://gdal.org/drivers/vector/svg.html |
ODS | Open Document/ LibreOffice / OpenOffice Spreadsheet | true | false | true | https://gdal.org/drivers/vector/ods.html |
XLSX | MS Office Open XML spreadsheet | true | false | true | https://gdal.org/drivers/vector/xlsx.html |
Elasticsearch | Elastic Search | true | false | true | https://gdal.org/drivers/vector/elasticsearch.html |
Carto | Carto | true | false | true | https://gdal.org/drivers/vector/carto.html |
AmigoCloud | AmigoCloud | true | false | true | https://gdal.org/drivers/vector/amigocloud.html |
SXF | Storage and eXchange Format | false | false | true | https://gdal.org/drivers/vector/sxf.html |
Selafin | Selafin | true | false | true | https://gdal.org/drivers/vector/selafin.html |
JML | OpenJUMP JML | true | false | true | https://gdal.org/drivers/vector/jml.html |
PLSCENES | Planet Labs Scenes API | false | false | true | https://gdal.org/drivers/vector/plscenes.html |
CSW | OGC CSW (Catalog Service for the Web) | false | false | true | https://gdal.org/drivers/vector/csw.html |
VDV | VDV-451/VDV-452/INTREST Data Format | true | false | true | https://gdal.org/drivers/vector/vdv.html |
MVT | Mapbox Vector Tiles | true | false | true | https://gdal.org/drivers/vector/mvt.html |
NGW | NextGIS Web | true | true | true | https://gdal.org/drivers/vector/ngw.html |
MapML | MapML | true | false | true | https://gdal.org/drivers/vector/mapml.html |
TIGER | U.S. Census TIGER/Line | false | false | true | https://gdal.org/drivers/vector/tiger.html |
AVCBin | Arc/Info Binary Coverage | false | false | true | https://gdal.org/drivers/vector/avcbin.html |
AVCE00 | Arc/Info E00 (ASCII) Coverage | false | false | true | https://gdal.org/drivers/vector/avce00.html |
Note that far from all of these drivers have been tested properly, and some may require additional options to be passed to work as expected. If you run into any issues please first consult the GDAL docs.
Spatial Copy Functions
Much like the ST_Read
table function the spatial extension provides a GDAL based COPY
function to export duckdb tables to different geospatial vector formats.
For example to export a table to a GeoJSON file, with generated bounding boxes, you can use the following query:
COPY <table> TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
FORMAT
: is the only required option and must be set toGDAL
to use the GDAL based copy function.DRIVER
: is the GDAL driver to use for the export. See the table above for a list of available drivers.LAYER_CREATION_OPTIONS
: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
Extra Information
See the repo for the source code of the extension, or the blog post.