⌘+k ctrl+k
Search Shortcut cmd + k | ctrl + k
Aggregate Functions

Examples

Produce a single row containing the sum of the amount column:

SELECT sum(amount)
FROM sales;

Produce one row per unique region, containing the sum of amount for each group:

SELECT region, sum(amount)
FROM sales
GROUP BY region;

Return only the regions that have a sum of amount higher than 100:

SELECT region
FROM sales
GROUP BY region
HAVING sum(amount) > 100;

Return the number of unique values in the region column:

SELECT count(DISTINCT region)
FROM sales;

Return two values, the total sum of amount and the sum of amount minus columns where the region is north using the FILTER clause:

SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;

Returns a list of all regions in order of the amount column:

SELECT list(region ORDER BY amount DESC)
FROM sales;

Returns the amount of the first sale using the first() aggregate function:

SELECT first(amount ORDER BY date ASC)
FROM sales;

Syntax

Aggregates are functions that combine multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the SELECT and HAVING clauses of a SQL query.

DISTINCT Clause in Aggregate Functions

When the DISTINCT clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the count aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.

ORDER BY Clause in Aggregate Functions

An ORDER BY clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.

SELECT aggregate_function(arg, sep ORDER BY ordering_criteria);

This clause ensures that the values being aggregated are sorted before applying the function. Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded. However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., first, last, list and string_agg / group_concat / listagg. These can be made deterministic by ordering the arguments.

For example:

CREATE TABLE tbl AS
    SELECT s FROM range(1, 4) r(s);

SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown
FROM tbl;
countdown
3, 2, 1

General Aggregate Functions

The table below shows the available general aggregate functions.

Function Description
any_value(arg) Returns the first non-null value from arg. This function is affected by ordering.
arbitrary(arg) Returns the first value (null or non-null) from arg. This function is affected by ordering.
arg_max(arg, val) Finds the row with the maximum val. Calculates the arg expression at that row. This function is affected by ordering.
arg_min(arg, val) Finds the row with the minimum val. Calculates the arg expression at that row. This function is affected by ordering.
array_agg(arg) Returns a LIST containing all the values of a column. This function is affected by ordering.
avg(arg) Calculates the average value for all tuples in arg.
bit_and(arg) Returns the bitwise AND of all bits in a given expression.
bit_or(arg) Returns the bitwise OR of all bits in a given expression.
bit_xor(arg) Returns the bitwise XOR of all bits in a given expression.
bitstring_agg(arg) Returns a bitstring with bits set for each distinct value.
bool_and(arg) Returns true if every input value is true, otherwise false.
bool_or(arg) Returns true if any input value is true, otherwise false.
count(arg) Calculates the number of tuples in arg.
favg(arg) Calculates the average using a more accurate floating point summation (Kahan Sum).
first(arg) Returns the first value (null or non-null) from arg. This function is affected by ordering.
fsum(arg) Calculates the sum using a more accurate floating point summation (Kahan Sum).
geomean(arg) Calculates the geometric mean for all tuples in arg.
histogram(arg) Returns a MAP of key-value pairs representing buckets and counts.
last(arg) Returns the last value of a column. This function is affected by ordering.
list(arg) Returns a LIST containing all the values of a column. This function is affected by ordering.
max(arg) Returns the maximum value present in arg.
max_by(arg, val) Finds the row with the maximum val. Calculates the arg expression at that row. This function is affected by ordering.
min(arg) Returns the minimum value present in arg.
min_by(arg, val) Finds the row with the minimum val. Calculates the arg expression at that row. This function is affected by ordering.
product(arg) Calculates the product of all tuples in arg.
string_agg(arg, sep) Concatenates the column string values with a separator. This function is affected by ordering.
sum(arg) Calculates the sum value for all tuples in arg.
sum_no_overflow(arg) Calculates the sum value for all tuples in arg without overflow checks. Unlike sum, which works on floating-point values, sum_no_overflow only accepts INTEGER and DECIMAL values.

any_value(arg)

Description Returns the first non-null value from arg. This function is affected by ordering.
Example any_value(A)
Alias(es) -

arbitrary(arg)

Description Returns the first value (null or non-null) from arg. This function is affected by ordering.
Example arbitrary(A)
Alias(es) first(A)

arg_max(arg, val)

Description Finds the row with the maximum val. Calculates the arg expression at that row. This function is affected by ordering.
Example arg_max(A, B)
Alias(es) argMax(arg, val), max_by(arg, val)

arg_min(arg, val)

Description Finds the row with the minimum val. Calculates the arg expression at that row. This function is affected by ordering.
Example arg_min(A, B)
Alias(es) argMin(arg, val), min_by(arg, val)

array_agg(arg)

Description Returns a LIST containing all the values of a column. This function is affected by ordering.
Example array_agg(A)
Alias(es) list

avg(arg)

Description Calculates the average value for all tuples in arg.
Example avg(A)
Alias(es) mean

bit_and(arg)

Description Returns the bitwise AND of all bits in a given expression.
Example bit_and(A)
Alias(es) -

bit_or(arg)

Description Returns the bitwise OR of all bits in a given expression.
Example bit_or(A)
Alias(es) -

bit_xor(arg)

Description Returns the bitwise XOR of all bits in a given expression.
Example bit_xor(A)
Alias(es) -

bitstring_agg(arg)

Description Returns a bitstring with bits set for each distinct value.
Example bitstring_agg(A)
Alias(es) -

bool_and(arg)

Description Returns true if every input value is true, otherwise false.
Example bool_and(A)
Alias(es) -

bool_or(arg)

Description Returns true if any input value is true, otherwise false.
Example bool_or(A)
Alias(es) -

count(arg)

Description Calculates the number of tuples in arg. If no arg is provided, the expression is evaluated as count(*).
Example count(A)
Alias(es) -

favg(arg)

Description Calculates the average using a more accurate floating point summation (Kahan Sum).
Example favg(A)
Alias(es) -

first(arg)

Description Returns the first value (null or non-null) from arg. This function is affected by ordering.
Example first(A)
Alias(es) arbitrary(A)

fsum(arg)

Description Calculates the sum using a more accurate floating point summation (Kahan Sum).
Example fsum(A)
Alias(es) sumKahan, kahan_sum

geomean(arg)

Description Calculates the geometric mean for all tuples in arg.
Example geomean(A)
Alias(es) geometric_mean(A)

histogram(arg)

Description Returns a MAP of key-value pairs representing buckets and counts.
Example histogram(A)
Alias(es) -

last(arg)

Description Returns the last value of a column. This function is affected by ordering.
Example last(A)
Alias(es) -

list(arg)

Description Returns a LIST containing all the values of a column. This function is affected by ordering.
Example list(A)
Alias(es) array_agg

max(arg)

Description Returns the maximum value present in arg.
Example max(A)
Alias(es) -

max_by(arg, val)

Description Finds the row with the maximum val. Calculates the arg expression at that row. This function is affected by ordering.
Example max_by(A, B)
Alias(es) argMax(arg, val), arg_max(arg, val)

min(arg)

Description Returns the minimum value present in arg.
Example min(A)
Alias(es) -

min_by(arg, val)

Description Finds the row with the minimum val. Calculates the arg expression at that row. This function is affected by ordering.
Example min_by(A, B)
Alias(es) argMin(arg, val), arg_min(arg, val)

product(arg)

Description Calculates the product of all tuples in arg.
Example product(A)
Alias(es) -

string_agg(arg, sep)

Description Concatenates the column string values with a separator. This function is affected by ordering.
Example string_agg(S, ',')
Alias(es) group_concat(arg, sep), listagg(arg, sep)

sum(arg)

Description Calculates the sum value for all tuples in arg.
Example sum(A)
Alias(es) -

sum_no_overflow(arg)

Description Calculates the sum value for all tuples in arg without overflow checks. Unlike sum, which works on floating-point values, sum_no_overflow only accepts INTEGER and DECIMAL values.
Example sum_no_overflow(A)
Alias(es) -

Approximate Aggregates

The table below shows the available approximate aggregate functions.

Function Description Example
approx_count_distinct(x) Gives the approximate count of distinct elements using HyperLogLog. approx_count_distinct(A)
approx_quantile(x, pos) Gives the approximate quantile using T-Digest. approx_quantile(A, 0.5)
reservoir_quantile(x, quantile, sample_size = 8192) Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size. reservoir_quantile(A, 0.5, 1024)

Statistical Aggregates

The table below shows the available statistical aggregate functions. They all ignore NULL values (in the case of a single input column x), or pairs where either input is NULL (in the case of two input columns y and x).

Function Description
corr(y, x) The correlation coefficient.
covar_pop(y, x) The population covariance, which does not include bias correction.
covar_samp(y, x) The sample covariance, which includes Bessel's bias correction.
entropy(x) The log-2 entropy.
kurtosis_pop(x) The excess kurtosis (Fisher’s definition) without bias correction.
kurtosis(x) The excess kurtosis (Fisher's definition) with bias correction according to the sample size.
mad(x) The median absolute deviation. Temporal types return a positive INTERVAL.
median(x) The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value.
mode(x) The most frequent value.
quantile_cont(x, pos) The interpolated pos-quantile of x for 0 <= pos <= 1, i.e., orders the values of x and returns the pos * (n_nonnull_values - 1)th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If pos is a LIST of FLOATs, then the result is a LIST of the corresponding interpolated quantiles.
quantile_disc(x, pos) The discrete pos-quantile of x for 0 <= pos <= 1, i.e., orders the values of x and returns the floor(pos * (n_nonnull_values - 1))th (zero-indexed) element. If pos is a LIST of FLOATs, then the result is a LIST of the corresponding discrete quantiles.
regr_avgx(y, x) The average of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
regr_avgy(y, x) The average of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
regr_count(y, x) The number of non-NULL pairs.
regr_intercept(y, x) The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable.
regr_r2(y, x) The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable.
regr_slope(y, x) The slope of the linear regression line, where x is the independent variable and y is the dependent variable.
regr_sxx(y, x) The sample variance, which includes Bessel's bias correction, of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
regr_sxy(y, x) The sample covariance, which includes Bessel's bias correction.
regr_syy(y, x) The sample variance, which includes Bessel's bias correction, of the dependent variable for non-NULL pairs , where x is the independent variable and y is the dependent variable.
skewness(x) The skewness.
stddev_pop(x) The population standard deviation.
stddev_samp(x) The sample standard deviation.
var_pop(x) The population variance, which does not include bias correction.
var_samp(x) The sample variance, which includes Bessel's bias correction.

corr(y, x)

Description The correlation coefficient.
Formula covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y))
Alias(es) -

covar_pop(y, x)

Description The population covariance, which does not include bias correction.
Formula (sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / regr_count(y, x), covar_samp(y, x) * (1 - 1 / regr_count(y, x))
Alias(es) -

covar_samp(y, x)

Description The sample covariance, which includes Bessel's bias correction.
Formula (sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1), covar_pop(y, x) / (1 - 1 / regr_count(y, x))
Alias(es) regr_sxy(y, x)

entropy(x)

Description The log-2 entropy.
Formula -
Alias(es) -

kurtosis_pop(x)

Description The excess kurtosis (Fisher’s definition) without bias correction.
Formula -
Alias(es) -

kurtosis(x)

Description The excess kurtosis (Fisher's definition) with bias correction according to the sample size.
Formula -
Alias(es) -

mad(x)

Description The median absolute deviation. Temporal types return a positive INTERVAL.
Formula median(abs(x - median(x)))
Alias(es) -

median(x)

Description The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value.
Formula quantile_cont(x, 0.5)
Alias(es) -

mode(x)

Description The most frequent value.
Formula -
Alias(es) -

quantile_cont(x, pos)

Description The interpolated pos-quantile of x for 0 <= pos <= 1, i.e., orders the values of x and returns the pos * (n_nonnull_values - 1)th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If pos is a LIST of FLOATs, then the result is a LIST of the corresponding interpolated quantiles.
Formula -
Alias(es) -

quantile_disc(x, pos)

Description The discrete pos-quantile of x for 0 <= pos <= 1, i.e., orders the values of x and returns the floor(pos * (n_nonnull_values - 1))th (zero-indexed) element. If pos is a LIST of FLOATs, then the result is a LIST of the corresponding discrete quantiles.
Formula -
Alias(es) quantile

regr_avgx(y, x)

Description The average of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
Formula -
Alias(es) -

regr_avgy(y, x)

Description The average of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
Formula -
Alias(es) -

regr_count(y, x)

Description The number of non-NULL pairs.
Formula -
Alias(es) -

regr_intercept(y, x)

Description The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable.
Formula regr_avgy(y, x) - regr_slope(y, x) * regr_avgx(y, x)
Alias(es) -

regr_r2(y, x)

Description The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable.
Formula -
Alias(es) -

regr_slope(y, x)

Description Returns the slope of the linear regression line, where x is the independent variable and y is the dependent variable.
Formula regr_sxy(y, x) / regr_sxx(y, x)
Alias(es) -

regr_sxx(y, x)

Description The sample variance, which includes Bessel's bias correction, of the independent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
Formula -
Alias(es) -

regr_sxy(y, x)

Description The sample covariance, which includes Bessel's bias correction.
Formula -
Alias(es) -

regr_syy(y, x)

Description The sample variance, which includes Bessel's bias correction, of the dependent variable for non-NULL pairs, where x is the independent variable and y is the dependent variable.
Formula -
Alias(es) -

skewness(x)

Description The skewness.
Formula -
Alias(es) -

stddev_pop(x)

Description The population standard deviation.
Formula sqrt(var_pop(x))
Alias(es) -

stddev_samp(x)

Description The sample standard deviation.
Formula sqrt(var_samp(x))
Alias(es) stddev(x)

var_pop(x)

Description The population variance, which does not include bias correction.
Formula (sum(x^2) - sum(x)^2 / count(x)) / count(x), var_samp(y, x) * (1 - 1 / count(x))
Alias(es) -

var_samp(x)

Description The sample variance, which includes Bessel's bias correction.
Formula (sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1), var_pop(y, x) / (1 - 1 / count(x))
Alias(es) variance(arg, val)

Ordered Set Aggregate Functions

The table below shows the available “ordered set” aggregate functions. These functions are specified using the WITHIN GROUP (ORDER BY sort_expression) syntax, and they are converted to an equivalent aggregate function that takes the ordering expression as the first argument.

Function Equivalent
mode() WITHIN GROUP (ORDER BY column [(ASC|DESC)]) mode(column ORDER BY column [(ASC|DESC)])
percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) quantile_cont(column, fraction ORDER BY column [(ASC|DESC)])
percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) quantile_cont(column, fractions ORDER BY column [(ASC|DESC)])
percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) quantile_disc(column, fraction ORDER BY column [(ASC|DESC)])
percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC|DESC)]) quantile_disc(column, fractions ORDER BY column [(ASC|DESC)])

Miscellaneous Aggregate Functions

Function Description Alias
grouping() For queries with GROUP BY and either ROLLUP or GROUPING SETS: Returns an integer identifying which of the argument expressions where used to group on to create the current supper-aggregate row. grouping_id()