FILTER
子句可以选择性地跟在 SELECT
语句中的聚合函数后面。这将过滤输入到聚合函数的数据行,其方式与 WHERE
子句过滤行的方式相同,但仅限于特定的聚合函数。当聚合函数处于窗口上下文中时,目前无法使用 FILTER
。
在多种情况下这很有用,包括在使用不同过滤器评估多个聚合时,以及在创建数据集的透视视图时。与下面讨论的更传统的CASE WHEN
方法相比,FILTER
为数据透视提供了更简洁的语法。
一些聚合函数也不会过滤掉NULL
值,因此使用FILTER
子句在某些情况下会返回有效的结果,而CASE WHEN
方法则不会。这种情况发生在first
和last
函数中,这些函数在非聚合的透视操作中非常有用,其目标是将数据简单地重新定向到列中,而不是重新聚合它。FILTER
在使用list
和array_agg
函数时也改进了NULL
处理,因为CASE WHEN
方法会在列表结果中包含NULL
值,而FILTER
子句会将其移除。
Examples
返回以下内容:
- 总行数。
- 行数,其中
i <= 5
- 其中
i
为奇数的行数
SELECT
count(*) AS total_rows,
count(*) FILTER (i <= 5) AS lte_five,
count(*) FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);
总行数 | 小于等于五 | 概率 |
---|---|---|
10 | 5 | 5 |
可以使用不同的聚合函数,并且允许多个WHERE
表达式:
SELECT
sum(i) FILTER (i <= 5) AS lte_five_sum,
median(i) FILTER (i % 2 = 1) AS odds_median,
median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);
lte_five_sum | odds_median | odds_lte_five_median |
---|---|---|
15 | 5.0 | 3.0 |
FILTER
子句也可以用于将数据从行转换为列。这是一种静态的转换,因为在 SQL 中列必须在运行时之前定义。然而,这种语句可以在宿主编程语言中动态生成,以利用 DuckDB 的 SQL 引擎进行快速、大于内存的转换。
首先生成一个示例数据集:
CREATE TEMP TABLE stacked_data AS
SELECT
i,
CASE WHEN i <= rows * 0.25 THEN 2022
WHEN i <= rows * 0.5 THEN 2023
WHEN i <= rows * 0.75 THEN 2024
WHEN i <= rows * 0.875 THEN 2025
ELSE NULL
END AS year
FROM (
SELECT
i,
count(*) OVER () AS rows
FROM generate_series(1, 100_000_000) tbl(i)
) tbl;
按年份“透视”数据(将每一年移动到单独的列中):
SELECT
count(i) FILTER (year = 2022) AS "2022",
count(i) FILTER (year = 2023) AS "2023",
count(i) FILTER (year = 2024) AS "2024",
count(i) FILTER (year = 2025) AS "2025",
count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
此语法产生与上述FILTER
子句相同的结果:
SELECT
count(CASE WHEN year = 2022 THEN i END) AS "2022",
count(CASE WHEN year = 2023 THEN i END) AS "2023",
count(CASE WHEN year = 2024 THEN i END) AS "2024",
count(CASE WHEN year = 2025 THEN i END) AS "2025",
count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 | 2023 | 2024 | 2025 | 空值 |
---|---|---|---|---|
25000000 | 25000000 | 25000000 | 12500000 | 12500000 |
然而,当使用不忽略NULL
值的聚合函数时,CASE WHEN
方法将不会按预期工作。first
函数属于这一类,因此在这种情况下,FILTER
是首选。
按年份“透视”数据(将每一年移动到单独的列中):
SELECT
first(i) FILTER (year = 2022) AS "2022",
first(i) FILTER (year = 2023) AS "2023",
first(i) FILTER (year = 2024) AS "2024",
first(i) FILTER (year = 2025) AS "2025",
first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
2022 | 2023 | 2024 | 2025 | 空值 |
---|---|---|---|---|
1474561 | 25804801 | 50749441 | 76431361 | 87500001 |
每当CASE WHEN
子句的第一次评估返回NULL
时,这将产生NULL
值:
SELECT
first(CASE WHEN year = 2022 THEN i END) AS "2022",
first(CASE WHEN year = 2023 THEN i END) AS "2023",
first(CASE WHEN year = 2024 THEN i END) AS "2024",
first(CASE WHEN year = 2025 THEN i END) AS "2025",
first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 | 2023 | 2024 | 2025 | 空值 |
---|---|---|---|---|
1228801 | NULL | NULL | NULL | NULL |