Set operations allow queries to be combined according to set operation semantics. Set operations refer to the UNION [ALL]
, INTERSECT
and EXCEPT
clauses.
Traditional set operations unify queries by column position, and require the to-be-combined queries to have the same number of input columns. If the columns are not of the same type, casts may be added. The result will use the column names from the first query.
DuckDB also supports UNION BY NAME
, which joins columns by name instead of by position. UNION BY NAME
does not require the inputs to have the same number of columns. NULL
values will be added in case of missing columns.
Examples
-- the values [0..10) and [0..5)
SELECT * FROM range(10) t1 UNION ALL SELECT * FROM range(5) t2;
-- the values [0..10) (`UNION` eliminates duplicates)
SELECT * FROM range(10) t1 UNION SELECT * FROM range(5) t2;
-- the values [0..5] (all values that are both in t1 and t2)
SELECT * FROM range(10) t1 INTERSECT SELECT * FROM range(6) t2;
-- the values [5..10)
SELECT * FROM range(10) t1 EXCEPT SELECT * FROM range(5) t2;
-- two rows, (24, NULL) and (NULL, Amsterdam)
SELECT 24 AS id UNION ALL BY NAME SELECT 'Amsterdam' AS City;
Syntax
Example Table
CREATE TABLE capitals(city VARCHAR, country VARCHAR);
INSERT INTO capitals VALUES ('Amsterdam', 'NL'), ('Berlin', 'Germany');
CREATE TABLE weather(city VARCHAR, degrees INTEGER, date DATE);
INSERT INTO weather VALUES ('Amsterdam', 10, '2022-10-14'), ('Seattle', 8, '2022-10-12');
UNION (ALL)
The UNION
clause can be used to combine rows from multiple queries. The queries are required to have the same number of columns and the same column types.
The UNION
clause performs duplicate elimination by default - only unique rows will be included in the result.
UNION ALL
returns all rows of both queries without duplicate elimination.
SELECT city FROM capitals UNION SELECT city FROM weather;
-- Amsterdam, Berlin, Seattle
SELECT city FROM capitals UNION ALL SELECT city FROM weather;
-- Amsterdam, Amsterdam, Berlin, Seattle
INTERSECT
The INTERSECT
clause can be used to select all rows that occur in the result of both queries. Note that INTERSECT
performs duplicate elimination, so only unique rows are returned.
SELECT city FROM capitals INTERSECT SELECT city FROM weather;
-- Amsterdam
EXCEPT
The EXCEPT
clause can be used to select all rows that only occur in the left query. Note that EXCEPT
performs duplicate elimination, so only unique rows are returned.
SELECT city FROM capitals EXCEPT SELECT city FROM weather;
-- Berlin
UNION (ALL) BY NAME
The UNION (ALL) BY NAME
clause can be used to combine rows from different tables by name, instead of by position. UNION BY NAME
does not require both queries to have the same number of columns. Any columns that are only found in one of the queries are filled with NULL
values for the other query.
SELECT * FROM capitals UNION BY NAME SELECT * FROM weather;
┌───────────┬─────────┬─────────┬────────────┐
│ city │ country │ degrees │ date │
│ varchar │ varchar │ int32 │ date │
├───────────┼─────────┼─────────┼────────────┤
│ Amsterdam │ NULL │ 10 │ 2022-10-14 │
│ Seattle │ NULL │ 8 │ 2022-10-12 │
│ Amsterdam │ NL │ NULL │ NULL │
│ Berlin │ Germany │ NULL │ NULL │
└───────────┴─────────┴─────────┴────────────┘
UNION BY NAME
performs duplicate elimination, whereas UNION ALL BY NAME
does not.