⌘+k ctrl+k
1.1.3 (stable)
Search Shortcut cmd + k | ctrl + k
FILTER Clause

FILTER 子句可以选择性地跟在 SELECT 语句中的聚合函数后面。这将过滤输入到聚合函数的数据行,其方式与 WHERE 子句过滤行的方式相同,但仅限于特定的聚合函数。当聚合函数处于窗口上下文中时,目前无法使用 FILTER

在多种情况下这很有用,包括在使用不同过滤器评估多个聚合时,以及在创建数据集的透视视图时。与下面讨论的更传统的CASE WHEN方法相比,FILTER为数据透视提供了更简洁的语法。

一些聚合函数也不会过滤掉NULL值,因此使用FILTER子句在某些情况下会返回有效的结果,而CASE WHEN方法则不会。这种情况发生在firstlast函数中,这些函数在非聚合的透视操作中非常有用,其目标是将数据简单地重新定向到列中,而不是重新聚合它。FILTER在使用listarray_agg函数时也改进了NULL处理,因为CASE WHEN方法会在列表结果中包含NULL值,而FILTER子句会将其移除。

Examples

返回以下内容:

  • 总行数。
  • 行数,其中 i <= 5
  • 其中i为奇数的行数
SELECT
    count(*) AS total_rows,
    count(*) FILTER (i <= 5) AS lte_five,
    count(*) FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);
总行数 小于等于五 概率
10 5 5

可以使用不同的聚合函数,并且允许多个WHERE表达式:

SELECT
    sum(i) FILTER (i <= 5) AS lte_five_sum,
    median(i) FILTER (i % 2 = 1) AS odds_median,
    median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);
lte_five_sum odds_median odds_lte_five_median
15 5.0 3.0

FILTER 子句也可以用于将数据从行转换为列。这是一种静态的转换,因为在 SQL 中列必须在运行时之前定义。然而,这种语句可以在宿主编程语言中动态生成,以利用 DuckDB 的 SQL 引擎进行快速、大于内存的转换。

首先生成一个示例数据集:

CREATE TEMP TABLE stacked_data AS
    SELECT
        i,
        CASE WHEN i <= rows * 0.25  THEN 2022
             WHEN i <= rows * 0.5   THEN 2023
             WHEN i <= rows * 0.75  THEN 2024
             WHEN i <= rows * 0.875 THEN 2025
             ELSE NULL
             END AS year
    FROM (
        SELECT
            i,
            count(*) OVER () AS rows
        FROM generate_series(1, 100_000_000) tbl(i)
    ) tbl;

按年份“透视”数据(将每一年移动到单独的列中):

SELECT
    count(i) FILTER (year = 2022) AS "2022",
    count(i) FILTER (year = 2023) AS "2023",
    count(i) FILTER (year = 2024) AS "2024",
    count(i) FILTER (year = 2025) AS "2025",
    count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;

此语法产生与上述FILTER子句相同的结果:

SELECT
    count(CASE WHEN year = 2022 THEN i END) AS "2022",
    count(CASE WHEN year = 2023 THEN i END) AS "2023",
    count(CASE WHEN year = 2024 THEN i END) AS "2024",
    count(CASE WHEN year = 2025 THEN i END) AS "2025",
    count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 空值
25000000 25000000 25000000 12500000 12500000

然而,当使用不忽略NULL值的聚合函数时,CASE WHEN方法将不会按预期工作。first函数属于这一类,因此在这种情况下,FILTER是首选。

按年份“透视”数据(将每一年移动到单独的列中):

SELECT
    first(i) FILTER (year = 2022) AS "2022",
    first(i) FILTER (year = 2023) AS "2023",
    first(i) FILTER (year = 2024) AS "2024",
    first(i) FILTER (year = 2025) AS "2025",
    first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 空值
1474561 25804801 50749441 76431361 87500001

每当CASE WHEN子句的第一次评估返回NULL时,这将产生NULL值:

SELECT
    first(CASE WHEN year = 2022 THEN i END) AS "2022",
    first(CASE WHEN year = 2023 THEN i END) AS "2023",
    first(CASE WHEN year = 2024 THEN i END) AS "2024",
    first(CASE WHEN year = 2025 THEN i END) AS "2025",
    first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
2022 2023 2024 2025 空值
1228801 NULL NULL NULL NULL

聚合函数语法(包括FILTER子句)