WITH
子句允许你指定公共表表达式(CTEs)。
常规(非递归)的公共表表达式本质上是范围限定在特定查询中的视图。
CTEs 可以相互引用并且可以嵌套。递归 CTEs 可以引用它们自身。
基本CTE示例
创建一个名为 cte
的CTE,并在主查询中使用它:
WITH cte AS (SELECT 42 AS x)
SELECT * FROM cte;
x |
---|
42 |
创建两个CTE cte1
和 cte2
,其中第二个CTE引用第一个CTE:
WITH
cte1 AS (SELECT 42 AS i),
cte2 AS (SELECT i * 100 AS x FROM cte1)
SELECT * FROM cte2;
x |
---|
4200 |
CTE 物化
DuckDB 可以使用 CTE 物化,即将 CTE 内联到主查询中。
这是通过启发式方法执行的:如果 CTE 执行分组聚合并且被查询多次,则会被物化。
可以通过使用 AS MATERIALIZED
定义 CTE 来显式激活物化,并通过使用 AS NOT MATERIALIZED
来禁用物化。
以以下查询为例,它三次调用了相同的CTE:
WITH t(x) AS (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
内联会为每个引用复制t
的定义,从而生成以下查询:
SELECT *
FROM
(⟨complex_query⟩) AS t1(x),
(⟨complex_query⟩) AS t2(x),
(⟨complex_query⟩) AS t3(x);
如果 ⟨complex_query⟩
很耗时,使用 MATERIALIZED
关键字将其物化可以提高性能。在这种情况下,⟨complex_query⟩
只会被评估一次。
WITH t(x) AS MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
如果想要禁用物化,请使用 NOT MATERIALIZED
:
WITH t(x) AS NOT MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
递归CTEs
WITH RECURSIVE
允许定义可以引用自身的CTE。请注意,查询必须以确保终止的方式制定,否则可能会陷入无限循环。
示例:斐波那契数列
WITH RECURSIVE
可以用于进行递归计算。例如,这里展示了如何使用 WITH RECURSIVE
来计算前十个斐波那契数列的数字:
WITH RECURSIVE FibonacciNumbers (RecursionDepth, FibonacciNumber, NextNumber) AS (
-- Base case
SELECT
0 AS RecursionDepth,
0 AS FibonacciNumber,
1 AS NextNumber
UNION ALL
-- Recursive step
SELECT
fib.RecursionDepth + 1 AS RecursionDepth,
fib.NextNumber AS FibonacciNumber,
fib.FibonacciNumber + fib.NextNumber AS NextNumber
FROM
FibonacciNumbers fib
WHERE
fib.RecursionDepth + 1 < 10
)
SELECT
fn.RecursionDepth AS FibonacciNumberIndex,
fn.FibonacciNumber
FROM
FibonacciNumbers fn;
斐波那契数列索引 | 斐波那契数列 |
---|---|
0 | 0 |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 3 |
5 | 5 |
6 | 8 |
7 | 13 |
8 | 21 |
9 | 34 |
示例:树遍历
WITH RECURSIVE
可用于遍历树。例如,以标签的层次结构为例:
CREATE TABLE tag (id INTEGER, name VARCHAR, subclassof INTEGER);
INSERT INTO tag VALUES
(1, 'U2', 5),
(2, 'Blur', 5),
(3, 'Oasis', 5),
(4, '2Pac', 6),
(5, 'Rock', 7),
(6, 'Rap', 7),
(7, 'Music', 9),
(8, 'Movies', 9),
(9, 'Art', NULL);
以下查询返回从节点 Oasis
到树的根节点 (Art
) 的路径。
WITH RECURSIVE tag_hierarchy(id, source, path) AS (
SELECT id, name, [name] AS path
FROM tag
WHERE subclassof IS NULL
UNION ALL
SELECT tag.id, tag.name, list_prepend(tag.name, tag_hierarchy.path)
FROM tag, tag_hierarchy
WHERE tag.subclassof = tag_hierarchy.id
)
SELECT path
FROM tag_hierarchy
WHERE source = 'Oasis';
路径 |
---|
[绿洲, 摇滚, 音乐, 艺术] |
图遍历
WITH RECURSIVE
子句可用于在任意图上表达图遍历。然而,如果图中有循环,查询必须执行循环检测以防止无限循环。
实现这一点的一种方法是将遍历的路径存储在列表中,并在用新边扩展路径之前检查其端点是否已被访问过(参见后面的示例)。
从LDBC Graphalytics基准测试中获取以下有向图:
CREATE TABLE edge (node1id INTEGER, node2id INTEGER);
INSERT INTO edge VALUES
(1, 3), (1, 5), (2, 4), (2, 5), (2, 10), (3, 1),
(3, 5), (3, 8), (3, 10), (5, 3), (5, 4), (5, 8),
(6, 3), (6, 4), (7, 4), (8, 1), (9, 4);
请注意,图中包含有向循环,例如在节点1、2和5之间。
枚举从节点出发的所有路径
以下查询返回从节点1开始的所有路径:
WITH RECURSIVE paths(startNode, endNode, path) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path
FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a repeated node to the path.
-- This ensures that no cycles occur.
WHERE list_position(paths.path, node2id) IS NULL
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
起始节点 | 结束节点 | 路径 |
---|---|---|
1 | 3 | [1, 3] |
1 | 5 | [1, 5] |
1 | 5 | [1, 3, 5] |
1 | 8 | [1, 3, 8] |
1 | 10 | [1, 3, 10] |
1 | 3 | [1, 5, 3] |
1 | 4 | [1, 5, 4] |
1 | 8 | [1, 5, 8] |
1 | 4 | [1, 3, 5, 4] |
1 | 8 | [1, 3, 5, 8] |
1 | 8 | [1, 5, 3, 8] |
1 | 10 | [1, 5, 3, 10] |
请注意,此查询的结果不仅限于最短路径,例如,对于节点5,结果包括路径 [1, 5]
和 [1, 3, 5]
。
枚举从一个节点的无权重最短路径
在大多数情况下,枚举所有路径是不切实际或不可行的。相反,只有(未加权的)最短路径是感兴趣的。为了找到这些路径,应该调整WITH RECURSIVE
查询的后半部分,使其仅包含尚未访问过的节点。这是通过使用一个子查询来实现的,该子查询检查之前的任何路径是否包含该节点:
WITH RECURSIVE paths(startNode, endNode, path) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path
FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a node that was visited previously by any path.
-- This ensures that (1) no cycles occur and (2) only nodes that
-- were not visited by previous (shorter) paths are added to a path.
WHERE NOT EXISTS (
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id)
)
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
startNode | endNode | path |
---|---|---|
1 | 3 | [1, 3] |
1 | 5 | [1, 5] |
1 | 8 | [1, 3, 8] |
1 | 10 | [1, 3, 10] |
1 | 4 | [1, 5, 4] |
1 | 8 | [1, 5, 8] |
枚举两个节点之间的无权最短路径
WITH RECURSIVE
也可以用来查找两个节点之间的所有(未加权的)最短路径。为了确保在到达结束节点时立即停止递归查询,我们使用了一个窗口函数来检查结束节点是否在新添加的节点中。
以下查询返回节点1(起始节点)和节点8(结束节点)之间的所有未加权最短路径:
WITH RECURSIVE paths(startNode, endNode, path, endReached) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path,
(node2id = 8) AS endReached
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path,
max(CASE WHEN node2id = 8 THEN 1 ELSE 0 END)
OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS endReached
FROM paths
JOIN edge ON paths.endNode = node1id
WHERE NOT EXISTS (
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id)
)
AND paths.endReached = 0
)
SELECT startNode, endNode, path
FROM paths
WHERE endNode = 8
ORDER BY length(path), path;
startNode | endNode | path |
---|---|---|
1 | 8 | [1, 3, 8] |
1 | 8 | [1, 5, 8] |
Limitations
DuckDB 不支持相互递归的 CTE。请参阅 DuckDB 仓库中的相关问题和讨论。