Spatial

The spatial extension provides support for geospatial data processing in DuckDB.

GEOMETRY type

The core of the spatial extension is the GEOMETRY type. If you're unfamiliar with geospatial data and GIS tooling, this type probably works very differently from what you'd expect.

In short, while the GEOMETRY type is a binary representation of "geometry" data made up out of sets of vertices (pairs of X and Y double precision floats), it actually stores one of several geometry subtypes. These are POINT, LINESTRING, POLYGON, as well as their "collection" equivalents, MULTIPOINT, MULTILINESTRING and MULTIPOLYGON. Lastly there is GEOMETRYCOLLECTION, which can contain any of the other subtypes, as well as other GEOMETRYCOLLECTIONs recursively.
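As a minimal sketch of the point above, the following query parses several WKT strings into the same GEOMETRY type and reports which subtype each value holds. The WKT literals and the column names (wkt, subtype) are illustrative assumptions; only ST_GeomFromText and ST_GeometryType come from the function lists in this document.

```sql
-- Load the extension (INSTALL is only needed once per DuckDB installation).
INSTALL spatial;
LOAD spatial;

-- Every row produces a value of the single GEOMETRY type,
-- but each value carries a different geometry subtype.
SELECT
    wkt,
    ST_GeometryType(ST_GeomFromText(wkt)) AS subtype
FROM (VALUES
    ('POINT (1 2)'),
    ('LINESTRING (0 0, 1 1, 2 0)'),
    ('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'),
    ('MULTIPOINT ((0 0), (1 1))'),
    ('GEOMETRYCOLLECTION (POINT (1 2), LINESTRING (0 0, 1 1))')
) AS t(wkt);
```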

This may seem strange at first, since DuckDB already has types like LIST, STRUCT and UNION which could be used in a similar way, but the design and behaviour of the GEOMETRY type is actually based on the Simple Features geometry model, which is a standard used by many other databases and GIS software.

That said, the spatial extension also includes a couple of experimental non-standard explicit geometry types, such as POINT_2D, LINESTRING_2D, POLYGON_2D and BOX_2D that are based on DuckDB's native nested types, such as structs and lists. In theory it should be possible to optimize a lot of operations for these types much better than for the GEOMETRY type (which is just a binary blob), but only a couple of functions are implemented so far.

All of these are implicitly castable to GEOMETRY but with a conversion cost, so the GEOMETRY type is still the recommended type to use for now if you are planning to work with a lot of different spatial functions.

GEOMETRY is not currently capable of storing additional geometry types, Z/M coordinates, or SRID information. These features may be added in the future.

Spatial Scalar Functions

The spatial extension implements a large number of scalar functions and overloads. Most of these are implemented using the GEOS library, but we'd like to implement more of them natively in this extension to better utilize DuckDB's vectorized execution and memory management. The following symbols are used to indicate which implementation is used:

🧭 - GEOS - functions that are implemented using the GEOS library

🦆 - DuckDB - functions that are implemented natively in this extension that are capable of operating directly on the DuckDB types

🔄 - CAST(GEOMETRY) - functions that are supported by implicitly casting to GEOMETRY and then using the GEOMETRY implementation

The currently implemented spatial functions can roughly be categorized into the following groups:

Geometry Conversion

Convert between geometries and other formats.

| Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
|---|---|---|---|---|---|
| VARCHAR ST_AsHEXWKB(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
| VARCHAR ST_AsText(GEOMETRY) | 🧭 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
| WKB_BLOB ST_AsWKB(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
| GEOMETRY ST_GeomFromText(VARCHAR) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_GeomFromWKB(BLOB) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
| VARCHAR ST_AsGeoJSON(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
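The conversion functions above can be chained to move between formats. The following is a small sketch, assuming a single point parsed from a WKT literal; the column aliases are made up for illustration.

```sql
INSTALL spatial;
LOAD spatial;

-- Parse WKT once, then render the same geometry in several formats.
SELECT
    ST_AsText(geom)                           AS wkt,
    ST_AsHEXWKB(geom)                         AS hex_wkb,
    ST_AsGeoJSON(geom)                        AS geojson,
    -- Round trip: GEOMETRY -> WKB -> GEOMETRY -> WKT
    ST_AsText(ST_GeomFromWKB(ST_AsWKB(geom))) AS wkb_round_trip
FROM (SELECT ST_GeomFromText('POINT (1 2)') AS geom) AS t;
```

If the round trip works as expected, the wkt and wkb_round_trip columns should describe the same point.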

Geometry Construction

Construct new geometries from other geometries or other data.

| Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
|---|---|---|---|---|---|
| GEOMETRY ST_Point(DOUBLE, DOUBLE) | 🦆 | 🦆 | | | |
| GEOMETRY ST_ConvexHull(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Boundary(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Buffer(GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Centroid(GEOMETRY) | 🧭 | 🦆 | 🦆 | 🦆 | 🦆 |
| GEOMETRY ST_Collect(GEOMETRY[]) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
| GEOMETRY ST_Normalize(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_SimplifyPreserveTopology(GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Simplify(GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Union(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Intersection(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_MakeLine(GEOMETRY[]) | 🦆 | | 🦆 | | |
| GEOMETRY ST_Envelope(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_FlipCoordinates(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
| GEOMETRY ST_Transform(GEOMETRY, VARCHAR, VARCHAR) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
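To illustrate a few of the construction functions listed above, here is a hedged sketch that builds geometries from coordinates and from other geometries. The coordinates and column aliases are arbitrary assumptions chosen to keep the shapes easy to reason about.

```sql
INSTALL spatial;
LOAD spatial;

SELECT
    -- A point from two DOUBLE coordinates
    ST_AsText(ST_Point(1.0, 2.0)) AS point,
    -- A line from a list of point geometries
    ST_AsText(ST_MakeLine([ST_Point(0.0, 0.0), ST_Point(3.0, 4.0)])) AS line,
    -- The bounding rectangle of a line
    ST_AsText(ST_Envelope(ST_GeomFromText('LINESTRING (0 0, 3 4)'))) AS envelope,
    -- The centroid of a square
    ST_AsText(ST_Centroid(
        ST_GeomFromText('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))')
    )) AS centroid,
    -- The union of two overlapping squares
    ST_AsText(ST_Union(
        ST_GeomFromText('POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))'),
        ST_GeomFromText('POLYGON ((1 1, 3 1, 3 3, 1 3, 1 1))')
    )) AS merged;
```

Wrapping each result in ST_AsText is only for readability; the functions themselves return GEOMETRY values that can be stored or passed on directly.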

Spatial Properties

Calculate and access spatial properties of geometries.

| Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
|---|---|---|---|---|---|
| DOUBLE ST_Area(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🦆 |
| BOOLEAN ST_IsClosed(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_IsEmpty(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
| BOOLEAN ST_IsRing(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_IsSimple(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_IsValid(GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| DOUBLE ST_X(GEOMETRY) | 🧭 | 🦆 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| DOUBLE ST_Y(GEOMETRY) | 🧭 | 🦆 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GeometryType ST_GeometryType(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
| DOUBLE ST_Length(GEOMETRY) | 🦆 | 🦆 | 🦆 | 🦆 | 🔄 (as POLYGON) |
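The property functions above can be tried on a few hand-written shapes. This is a sketch under the assumption of planar coordinates; the CTE name (shapes) and its column names are hypothetical.

```sql
INSTALL spatial;
LOAD spatial;

WITH shapes AS (
    SELECT
        ST_GeomFromText('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))') AS square,  -- a 4 x 4 square
        ST_GeomFromText('LINESTRING (0 0, 3 4)') AS segment,               -- a single straight segment
        ST_Point(1.0, 2.0) AS pt
)
SELECT
    ST_Area(square)         AS area,
    ST_IsValid(square)      AS is_valid,
    ST_IsEmpty(square)      AS is_empty,
    ST_Length(segment)      AS length,
    ST_IsClosed(segment)    AS is_closed,
    ST_X(pt)                AS x,
    ST_Y(pt)                AS y,
    ST_GeometryType(square) AS subtype
FROM shapes;
```

Since the square has sides of length 4 and the segment runs from (0, 0) to (3, 4), the area should come out as 16 and the length as 5.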

Spatial Relationships

Compute relationships and spatial predicates between geometries.

| Scalar functions | GEOMETRY | POINT_2D | LINESTRING_2D | POLYGON_2D | BOX_2D |
|---|---|---|---|---|---|
| BOOLEAN ST_Within(GEOMETRY, GEOMETRY) | 🧭 | 🦆 or 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Touches(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Overlaps(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Contains(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🦆 or 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_CoveredBy(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Covers(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Crosses(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| GEOMETRY ST_Difference(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Disjoint(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Intersects(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_Equals(GEOMETRY, GEOMETRY) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
| DOUBLE ST_Distance(GEOMETRY, GEOMETRY) | 🧭 | 🦆 or 🔄 | 🦆 or 🔄 | 🔄 | 🔄 (as POLYGON) |
| BOOLEAN ST_DWithin(GEOMETRY, GEOMETRY, DOUBLE) | 🧭 | 🔄 | 🔄 | 🔄 | 🔄 (as POLYGON) |
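A short sketch of how the predicates above might be combined, assuming one square and two points (one inside it, one outside). The CTE and column names are illustrative only.

```sql
INSTALL spatial;
LOAD spatial;

WITH shapes AS (
    SELECT
        ST_GeomFromText('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))') AS square,
        ST_Point(1.0, 1.0) AS inside_pt,   -- lies inside the square
        ST_Point(4.0, 5.0) AS outside_pt   -- lies above the square
)
SELECT
    ST_Within(inside_pt, square)             AS pt_within_square,
    ST_Contains(square, inside_pt)           AS square_contains_pt,
    ST_Intersects(outside_pt, square)        AS outside_intersects,
    ST_Disjoint(outside_pt, square)          AS outside_disjoint,
    ST_Distance(inside_pt, outside_pt)       AS distance,
    ST_DWithin(inside_pt, outside_pt, 10.0)  AS within_10_units
FROM shapes;
```

ST_Within(a, b) and ST_Contains(b, a) express the same relationship from opposite sides, so the first two columns should agree. The distance between (1, 1) and (4, 5) is 5 in planar units.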

Spatial Table Functions

The spatial extension provides a ST_Read table function based on the GDAL translator library to read spatial data from a variety of geospatial vector file formats as if they were DuckDB tables.

ST_Read takes a number of optional arguments. The full signature is:

ST_Read(VARCHAR, sequential_layer_scan : BOOLEAN, spatial_filter : WKB_BLOB, open_options : VARCHAR[], layer : VARCHAR, allowed_drivers : VARCHAR[], sibling_files : VARCHAR[], spatial_filter_box : BOX_2D)
  • sequential_layer_scan (default: false): If set to true, the table function will scan through all layers sequentially and return the first layer that matches the given layer name. This is required for some drivers to work properly, e.g. the OSM driver.
  • spatial_filter (default: NULL): If set to a WKB blob, the table function will only return rows that intersect with the given WKB geometry. Some drivers may support efficient spatial filtering natively, in which case it will be pushed down. Otherwise the filtering is done by GDAL which may be much slower.
  • open_options (default: []): A list of key-value pairs that are passed to the GDAL driver to control the opening of the file. E.g. the GeoJSON driver supports a FLATTEN_NESTED_ATTRIBUTES=YES option to flatten nested attributes.
  • layer (default: NULL): The name of the layer to read from the file. If NULL, the first layer is returned. Can also be a layer index (starting at 0).
  • allowed_drivers (default: []): A list of GDAL driver names that are allowed to be used to open the file. If empty, all drivers are allowed.
  • sibling_files (default: []): A list of sibling files that are required to open the file. E.g. the ESRI Shapefile driver requires a .shx file to be present, although most of the time these can be discovered automatically.
  • spatial_filter_box (default: NULL): If set to a BOX_2D, the table function will only return rows that intersect with the given bounding box. Similar to spatial_filter.
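As a rough sketch of how the optional arguments above might be passed, the query below reads a GeoJSON file using named parameters. The file path is a placeholder, the filter polygon is arbitrary, and the exact behaviour depends on the GDAL driver, so treat this as an assumption-laden example rather than a reference invocation.

```sql
INSTALL spatial;
LOAD spatial;

SELECT *
FROM ST_Read(
    'some/file/path/filename.geojson',                -- placeholder path
    allowed_drivers = ['GeoJSON'],                    -- only try the GeoJSON driver
    open_options = ['FLATTEN_NESTED_ATTRIBUTES=YES'], -- driver option mentioned above
    -- Only return rows intersecting this (arbitrary) polygon, passed as WKB
    spatial_filter = ST_AsWKB(
        ST_GeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))')
    )
)
LIMIT 10;
```

Using SELECT * avoids relying on a particular name for the geometry column in the result.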

Note that GDAL is single-threaded, so this table function will not be able to make full use of parallelism. We're planning to implement support for the most common vector formats natively in this extension with additional table functions in the future.

We currently support over 50 different formats. You can generate the following table of supported GDAL drivers yourself by executing SELECT * FROM ST_Drivers().

| short_name | long_name | can_create | can_copy | can_open | help_url |
|---|---|---|---|---|---|
| ESRI Shapefile | ESRI Shapefile | true | false | true | https://gdal.org/drivers/vector/shapefile.html |
| MapInfo File | MapInfo File | true | false | true | https://gdal.org/drivers/vector/mitab.html |
| UK .NTF | UK .NTF | false | false | true | https://gdal.org/drivers/vector/ntf.html |
| LVBAG | Kadaster LV BAG Extract 2.0 | false | false | true | https://gdal.org/drivers/vector/lvbag.html |
| S57 | IHO S-57 (ENC) | true | false | true | https://gdal.org/drivers/vector/s57.html |
| DGN | Microstation DGN | true | false | true | https://gdal.org/drivers/vector/dgn.html |
| OGR_VRT | VRT - Virtual Datasource | false | false | true | https://gdal.org/drivers/vector/vrt.html |
| Memory | Memory | true | false | true | |
| CSV | Comma Separated Value (.csv) | true | false | true | https://gdal.org/drivers/vector/csv.html |
| GML | Geography Markup Language (GML) | true | false | true | https://gdal.org/drivers/vector/gml.html |
| GPX | GPX | true | false | true | https://gdal.org/drivers/vector/gpx.html |
| KML | Keyhole Markup Language (KML) | true | false | true | https://gdal.org/drivers/vector/kml.html |
| GeoJSON | GeoJSON | true | false | true | https://gdal.org/drivers/vector/geojson.html |
| GeoJSONSeq | GeoJSON Sequence | true | false | true | https://gdal.org/drivers/vector/geojsonseq.html |
| ESRIJSON | ESRIJSON | false | false | true | https://gdal.org/drivers/vector/esrijson.html |
| TopoJSON | TopoJSON | false | false | true | https://gdal.org/drivers/vector/topojson.html |
| OGR_GMT | GMT ASCII Vectors (.gmt) | true | false | true | https://gdal.org/drivers/vector/gmt.html |
| GPKG | GeoPackage | true | true | true | https://gdal.org/drivers/vector/gpkg.html |
| SQLite | SQLite / Spatialite | true | false | true | https://gdal.org/drivers/vector/sqlite.html |
| WAsP | WAsP .map format | true | false | true | https://gdal.org/drivers/vector/wasp.html |
| OpenFileGDB | ESRI FileGDB | true | false | true | https://gdal.org/drivers/vector/openfilegdb.html |
| DXF | AutoCAD DXF | true | false | true | https://gdal.org/drivers/vector/dxf.html |
| CAD | AutoCAD Driver | false | false | true | https://gdal.org/drivers/vector/cad.html |
| FlatGeobuf | FlatGeobuf | true | false | true | https://gdal.org/drivers/vector/flatgeobuf.html |
| Geoconcept | Geoconcept | true | false | true | |
| GeoRSS | GeoRSS | true | false | true | https://gdal.org/drivers/vector/georss.html |
| VFK | Czech Cadastral Exchange Data Format | false | false | true | https://gdal.org/drivers/vector/vfk.html |
| PGDUMP | PostgreSQL SQL dump | true | false | false | https://gdal.org/drivers/vector/pgdump.html |
| OSM | OpenStreetMap XML and PBF | false | false | true | https://gdal.org/drivers/vector/osm.html |
| GPSBabel | GPSBabel | true | false | true | https://gdal.org/drivers/vector/gpsbabel.html |
| WFS | OGC WFS (Web Feature Service) | false | false | true | https://gdal.org/drivers/vector/wfs.html |
| OAPIF | OGC API - Features | false | false | true | https://gdal.org/drivers/vector/oapif.html |
| EDIGEO | French EDIGEO exchange format | false | false | true | https://gdal.org/drivers/vector/edigeo.html |
| SVG | Scalable Vector Graphics | false | false | true | https://gdal.org/drivers/vector/svg.html |
| ODS | Open Document/ LibreOffice / OpenOffice Spreadsheet | true | false | true | https://gdal.org/drivers/vector/ods.html |
| XLSX | MS Office Open XML spreadsheet | true | false | true | https://gdal.org/drivers/vector/xlsx.html |
| Elasticsearch | Elastic Search | true | false | true | https://gdal.org/drivers/vector/elasticsearch.html |
| Carto | Carto | true | false | true | https://gdal.org/drivers/vector/carto.html |
| AmigoCloud | AmigoCloud | true | false | true | https://gdal.org/drivers/vector/amigocloud.html |
| SXF | Storage and eXchange Format | false | false | true | https://gdal.org/drivers/vector/sxf.html |
| Selafin | Selafin | true | false | true | https://gdal.org/drivers/vector/selafin.html |
| JML | OpenJUMP JML | true | false | true | https://gdal.org/drivers/vector/jml.html |
| PLSCENES | Planet Labs Scenes API | false | false | true | https://gdal.org/drivers/vector/plscenes.html |
| CSW | OGC CSW (Catalog Service for the Web) | false | false | true | https://gdal.org/drivers/vector/csw.html |
| VDV | VDV-451/VDV-452/INTREST Data Format | true | false | true | https://gdal.org/drivers/vector/vdv.html |
| MVT | Mapbox Vector Tiles | true | false | true | https://gdal.org/drivers/vector/mvt.html |
| NGW | NextGIS Web | true | true | true | https://gdal.org/drivers/vector/ngw.html |
| MapML | MapML | true | false | true | https://gdal.org/drivers/vector/mapml.html |
| TIGER | U.S. Census TIGER/Line | false | false | true | https://gdal.org/drivers/vector/tiger.html |
| AVCBin | Arc/Info Binary Coverage | false | false | true | https://gdal.org/drivers/vector/avcbin.html |
| AVCE00 | Arc/Info E00 (ASCII) Coverage | false | false | true | https://gdal.org/drivers/vector/avce00.html |

Note that far from all of these drivers have been tested properly, and some may require additional options to be passed to work as expected. If you run into any issues, please consult the GDAL docs first.
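Because ST_Drivers() is an ordinary table function, its result can be filtered like any other table. As a small sketch using the column names shown in the driver table, the following query lists only the drivers that can both open and create files:

```sql
INSTALL spatial;
LOAD spatial;

-- Drivers usable for both reading (ST_Read) and writing (COPY ... FORMAT GDAL)
SELECT short_name, long_name, help_url
FROM ST_Drivers()
WHERE can_open AND can_create
ORDER BY short_name;
```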

Spatial Copy Functions

Much like the ST_Read table function, the spatial extension provides a GDAL-based COPY function to export DuckDB tables to different geospatial vector formats. For example, to export a table to a GeoJSON file with generated bounding boxes, you can use the following query:

COPY <table> TO 'some/file/path/filename.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
  • FORMAT: the only required option. It must be set to GDAL to use the GDAL-based copy function.
  • DRIVER: is the GDAL driver to use for the export. See the table above for a list of available drivers.
  • LAYER_CREATION_OPTIONS: list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
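For a more complete picture, here is a hedged end-to-end sketch that creates a small table and exports a query result with the options described above. The table name (places), its columns, and the output file name are hypothetical, and the example assumes the GeoJSON driver accepts a GEOMETRY column as written.

```sql
INSTALL spatial;
LOAD spatial;

-- A tiny hypothetical table with one attribute and one geometry column
CREATE TABLE places AS
SELECT *
FROM (VALUES
    ('a', ST_Point(1.0, 2.0)),
    ('b', ST_Point(3.0, 4.0))
) AS t(name, geom);

-- Export a query result instead of a whole table
COPY (SELECT name, geom FROM places WHERE name = 'a')
TO 'places.geojson'
WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');
```

Swapping the DRIVER value for another driver with can_create set to true should produce a different output format, though driver-specific options may then be needed.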

Extra Information

See the repo for the source code of the extension, or the blog post.