S3 Express One


In late 2023, AWS announced S3 Express One Zone, a high-speed variant of traditional S3 buckets. DuckDB can read S3 Express One buckets using the httpfs extension.

Credentials and Configuration

The configuration of S3 Express One buckets is similar to regular S3 buckets with one exception: we have to specify the endpoint according to the following pattern:

s3express-⟨availability zone⟩.⟨region⟩.amazonaws.com

where the ⟨availability zone⟩ is the availability zone ID (e.g., use1-az5), which can be obtained from the S3 Express One bucket's configuration page, and the ⟨region⟩ is the AWS region (e.g., us-east-1).
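
As a minimal sketch of the pattern above, the endpoint can be assembled from its two parts in a shell. The helper name s3express_endpoint is made up for illustration and is not part of DuckDB or the AWS CLI.

```shell
# Hypothetical helper: build the endpoint from a zone ID and a region.
# $1 = availability zone ID (e.g., use1-az5), $2 = AWS region (e.g., us-east-1)
s3express_endpoint() {
    printf 's3express-%s.%s.amazonaws.com\n' "$1" "$2"
}

s3express_endpoint use1-az5 us-east-1
# prints: s3express-use1-az5.us-east-1.amazonaws.com
```

The printed value is what goes into the ENDPOINT field of the secret.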

For example, to allow DuckDB to use an S3 Express One bucket, create a secret in the Secrets Manager as follows:

CREATE SECRET (
    TYPE S3,
    REGION 'us-east-1',
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    ENDPOINT 's3express-use1-az5.us-east-1.amazonaws.com'
);

Instance Location

For best performance, make sure that the EC2 instance running DuckDB is in the same availability zone as the S3 Express One bucket you are querying. To determine the mapping between zone names and zone IDs, use the aws ec2 describe-availability-zones command.

Zone name to zone ID mapping:

aws ec2 describe-availability-zones --output json \
    | jq -r '.AvailabilityZones[] | select(.ZoneName == "us-east-1f") | .ZoneId'

use1-az5

Zone ID to zone name mapping:

aws ec2 describe-availability-zones --output json \
    | jq -r '.AvailabilityZones[] | select(.ZoneId == "use1-az5") | .ZoneName'

us-east-1f
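
S3 Express One bucket names carry the zone ID between the base name and the --x-s3 suffix, so a co-location check can be sketched in plain shell. The function names bucket_zone_id and same_zone are hypothetical, and the instance's zone ID is passed in as an argument rather than looked up.

```shell
# Hypothetical helper: extract the zone ID from a bucket name of the form
# <base-name>--<zone-id>--x-s3. Returns 1 for names that do not match.
bucket_zone_id() {
    bz_name="${1%--x-s3}"               # drop the fixed suffix
    [ "$bz_name" != "$1" ] || return 1  # suffix was missing
    case "$bz_name" in
        *--*) printf '%s\n' "${bz_name##*--}" ;;  # keep the last "--" segment
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds when the bucket's zone ID ($1 = bucket name)
# equals the instance's zone ID ($2).
same_zone() {
    sz_bucket_zone="$(bucket_zone_id "$1")" || return 1
    [ "$sz_bucket_zone" = "$2" ]
}

if same_zone 'express-bucket-name--use1-az5--x-s3' 'use1-az5'; then
    echo "instance and bucket share a zone"
fi
```

The instance's zone ID can be obtained with the zone name to zone ID mapping shown above. On EC2 it may also be available from the instance metadata service under placement/availability-zone-id, assuming that service is reachable from the instance.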

Querying

You can query an S3 Express One bucket like any other S3 bucket:

SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';

Performance

We ran two experiments on a c7gd.12xlarge instance using the LDBC SF300 Comments creationDate Parquet file (also used in the microbenchmarks of the performance guide).

Experiment	File size	Runtime
Loading only from Parquet	4.1 GB	3.5s
Creating local table from Parquet	4.1 GB	5.1s

The “loading only” variant runs the load as part of an EXPLAIN ANALYZE statement to measure the runtime without accounting for the creation of a local table, while the “creating local table” variant uses CREATE TABLE ... AS SELECT to create a persistent table on the local disk.
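
The two variants could be reproduced roughly as follows. This sketch only assembles and prints the two statements. The object path is the placeholder from the Querying section, not the benchmark file, and the table name comments and the database file bench.db are made up for illustration. Running the statements assumes that the duckdb command-line client is installed and that an S3 secret is available to the session.

```shell
# Placeholder object path; substitute your own bucket and file.
parquet_path='s3://express-bucket-name--use1-az5--x-s3/my-file.parquet'

# "Loading only": read the file without materializing a table.
load_only_sql="EXPLAIN ANALYZE SELECT * FROM '${parquet_path}';"

# "Creating local table": read the file and persist it locally.
# The table name "comments" is a hypothetical choice.
create_table_sql="CREATE TABLE comments AS SELECT * FROM '${parquet_path}';"

printf '%s\n' "$load_only_sql" "$create_table_sql"

# To execute against a local database file (bench.db is a hypothetical name):
#   duckdb bench.db -c "$load_only_sql"
#   duckdb bench.db -c "$create_table_sql"
```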
