Aggregate Functions

Search Shortcut cmd + k | ctrl + k

Documentation / SQL

Aggregate Functions

Examples

Produce a single row containing the sum of the amount column:

SELECT sum(amount) FROM sales;

Produce one row per unique region, containing the sum of amount for each group:

SELECT region, sum(amount) FROM sales GROUP BY region;

Return only the regions that have a sum of amount higher than 100:

SELECT region FROM sales GROUP BY region HAVING sum(amount) > 100;

Return the number of unique values in the region column:

SELECT count(DISTINCT region) FROM sales;

Return two values, the total sum of amount and the sum of amount minus columns where the region is north:

SELECT sum(amount), sum(amount) FILTER (region != 'north') FROM sales;

Returns a list of all regions in order of the amount column:

SELECT list(region ORDER BY amount DESC) FROM sales;

Returns the amount of the first sale using the first() aggregate function:

SELECT first(amount ORDER BY date ASC) FROM sales;

Aggregates are functions that combine multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the SELECT and HAVING clauses of a SQL query.

`DISTINCT` Clause in Aggregate Functions

When the DISTINCT clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the count aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.

`ORDER BY` Clause in Aggregate Functions

An ORDER BY clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.

SELECT ⟨aggregate_function⟩(⟨arg⟩, ⟨sep⟩ ORDER BY ⟨ordering_criteria⟩);

This clause ensures that the values being aggregated are sorted before applying the function. Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded. However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., first, last, list and string_agg / group_concat / listagg. These can be made deterministic by ordering the arguments.

For example:

CREATE TABLE tbl AS SELECT s FROM range(1, 4) r(s);
SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown FROM tbl;

countdown
3, 2, 1

General Aggregate Functions

The table below shows the available general aggregate functions.

Function	Description
`any_value(arg)`	Returns the first non-null value from `arg`. This function is affected by ordering.
`arbitrary(arg)`	Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
`arg_max(arg, val)`	Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`arg_min(arg, val)`	Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`array_agg(arg)`	Returns a `LIST` containing all the values of a column. This function is affected by ordering.
`avg(arg)`	Calculates the average value for all tuples in `arg`.
`bit_and(arg)`	Returns the bitwise AND of all bits in a given expression.
`bit_or(arg)`	Returns the bitwise OR of all bits in a given expression.
`bit_xor(arg)`	Returns the bitwise XOR of all bits in a given expression.
`bitstring_agg(arg)`	Returns a bitstring with bits set for each distinct value.
`bool_and(arg)`	Returns `true` if every input value is `true`, otherwise `false`.
`bool_or(arg)`	Returns `true` if any input value is `true`, otherwise `false`.
`count(arg)`	Calculates the number of tuples in `arg`.
`favg(arg)`	Calculates the average using a more accurate floating point summation (Kahan Sum).
`first(arg)`	Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
`fsum(arg)`	Calculates the sum using a more accurate floating point summation (Kahan Sum).
`geomean(arg)`	Calculates the geometric mean for all tuples in `arg`.
`histogram(arg)`	Returns a `MAP` of key-value pairs representing buckets and counts.
`last(arg)`	Returns the last value of a column. This function is affected by ordering.
`list(arg)`	Returns a `LIST` containing all the values of a column. This function is affected by ordering.
`max(arg)`	Returns the maximum value present in `arg`.
`max_by(arg, val)`	Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`min(arg)`	Returns the minimum value present in `arg`.
`min_by(arg, val)`	Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
`product(arg)`	Calculates the product of all tuples in `arg`.
`string_agg(arg, sep)`	Concatenates the column string values with a separator. This function is affected by ordering.
`sum(arg)`	Calculates the sum value for all tuples in `arg`.
`sum_no_overflow(arg)`	Calculates the sum value for all tuples in `arg` without overflow checks. Unlike `sum`, which works on floating-point values, `sum_no_overflow` only accepts `INTEGER` and `DECIMAL` values.

`any_value(arg)`

Description	Returns the first non-null value from `arg`. This function is affected by ordering.
Example	`any_value(A)`
Alias(es)	-

`arbitrary(arg)`

Description	Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
Example	`arbitrary(A)`
Alias(es)	`first(A)`

`arg_max(arg, val)`

Description	Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
Example	`arg_max(A, B)`
Alias(es)	`argMax(arg, val)`, `max_by(arg, val)`

`arg_min(arg, val)`

Description	Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
Example	`arg_min(A, B)`
Alias(es)	`argMin(arg, val)`, `min_by(arg, val)`

`array_agg(arg)`

Description	Returns a `LIST` containing all the values of a column. This function is affected by ordering.
Example	`array_agg(A)`
Alias(es)	`list`

`avg(arg)`

Description	Calculates the average value for all tuples in `arg`.
Example	`avg(A)`
Alias(es)	`mean`

`bit_and(arg)`

Description	Returns the bitwise `AND` of all bits in a given expression.
Example	`bit_and(A)`
Alias(es)	-

`bit_or(arg)`

Description	Returns the bitwise `OR` of all bits in a given expression.
Example	`bit_or(A)`
Alias(es)	-

`bit_xor(arg)`

Description	Returns the bitwise `XOR` of all bits in a given expression.
Example	`bit_xor(A)`
Alias(es)	-

`bitstring_agg(arg)`

Description	Returns a bitstring with bits set for each distinct value.
Example	`bitstring_agg(A)`
Alias(es)	-

`bool_and(arg)`

Description	Returns `true` if every input value is `true`, otherwise `false`.
Example	`bool_and(A)`
Alias(es)	-

`bool_or(arg)`

Description	Returns `true` if any input value is `true`, otherwise `false`.
Example	`bool_or(A)`
Alias(es)	-

`count(arg)`

Description	Calculates the number of tuples in `arg`.
Example	`count(A)`
Alias(es)	-

`favg(arg)`

Description	Calculates the average using a more accurate floating point summation (Kahan Sum).
Example	`favg(A)`
Alias(es)	-

`first(arg)`

Description	Returns the first value (null or non-null) from `arg`. This function is affected by ordering.
Example	`first(A)`
Alias(es)	`arbitrary(A)`

`fsum(arg)`

Description	Calculates the sum using a more accurate floating point summation (Kahan Sum).
Example	`fsum(A)`
Alias(es)	`sumKahan`, `kahan_sum`

`geomean(arg)`

Description	Calculates the geometric mean for all tuples in `arg`.
Example	`geomean(A)`
Alias(es)	`geometric_mean(A)`

`histogram(arg)`

Description	Returns a `MAP` of key-value pairs representing buckets and counts.
Example	`histogram(A)`
Alias(es)	-

`last(arg)`

Description	Returns the last value of a column. This function is affected by ordering.
Example	`last(A)`
Alias(es)	-

`list(arg)`

Description	Returns a `LIST` containing all the values of a column. This function is affected by ordering.
Example	`list(A)`
Alias(es)	`array_agg`

`max(arg)`

Description	Returns the maximum value present in `arg`.
Example	`max(A)`
Alias(es)	-

`max_by(arg, val)`

Description	Finds the row with the maximum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
Example	`max_by(A, B)`
Alias(es)	`argMax(arg, val)`, `arg_max(arg, val)`

`min(arg)`

Description	Returns the minimum value present in `arg`.
Example	`min(A)`
Alias(es)	-

`min_by(arg, val)`

Description	Finds the row with the minimum `val`. Calculates the `arg` expression at that row. This function is affected by ordering.
Example	`min_by(A, B)`
Alias(es)	`argMin(arg, val)`, `arg_min(arg, val)`

`product(arg)`

Description	Calculates the product of all tuples in `arg`.
Example	`product(A)`
Alias(es)	-

`string_agg(arg, sep)`

Description	Concatenates the column string values with a separator. This function is affected by ordering.
Example	`string_agg(S, ',')`
Alias(es)	`group_concat(arg, sep)`, `listagg(arg, sep)`

`sum(arg)`

Description	Calculates the sum value for all tuples in `arg`.
Example	`sum(A)`
Alias(es)	-

`sum_no_overflow(arg)`

Description	Calculates the sum value for all tuples in `arg` without overflow checks. Unlike `sum`, which works on floating-point values, `sum_no_overflow` only accepts `INTEGER` and `DECIMAL` values.
Example	`sum_no_overflow(A)`
Alias(es)	-

Approximate Aggregates

The table below shows the available approximate aggregate functions.

Function	Description	Example
`approx_count_distinct(x)`	Gives the approximate count of distinct elements using HyperLogLog.	`approx_count_distinct(A)`
`approx_quantile(x, pos)`	Gives the approximate quantile using T-Digest.	`approx_quantile(A, 0.5)`
`reservoir_quantile(x, quantile, sample_size = 8192)`	Gives the approximate quantile using reservoir sampling, the sample size is optional and uses 8192 as a default size.	`reservoir_quantile(A, 0.5, 1024)`

Statistical Aggregates

The table below shows the available statistical aggregate functions.

Function	Description
`corr(y, x)`	Returns the correlation coefficient for non-null pairs in a group.
`covar_pop(y, x)`	Returns the population covariance of input values.
`covar_samp(y, x)`	Returns the sample covariance for non-null pairs in a group.
`entropy(x)`	Returns the log-2 entropy of count input-values.
`kurtosis_pop(x)`	Returns the excess kurtosis (Fisher's definition) of all input values. Bias correction is not applied.
`kurtosis(x)`	Returns the excess kurtosis (Fisher's definition) of all input values, with a bias correction according to the sample size.
`mad(x)`	Returns the median absolute deviation for the values within x. NULL values are ignored. Temporal types return a positive `INTERVAL`.
`median(x)`	Returns the middle value of the set. NULL values are ignored. For even value counts, quantitative values are averaged and ordinal values return the lower value.
`mode(x)`	Returns the most frequent value for the values within x. NULL values are ignored.
`quantile_cont(x, pos)`	Returns the interpolated `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `pos * (n_nonnull_values - 1)`th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles.
`quantile_disc(x, pos)`	Returns the discrete `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `floor(pos * (n_nonnull_values - 1))`th (zero-indexed) element. If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles.
`regr_avgx(y, x)`	Returns the average of the independent variable for non-null pairs in a group, where x is the independent variable and y is the dependent variable.
`regr_avgy(y, x)`	Returns the average of the dependent variable for non-null pairs in a group, where x is the independent variable and y is the dependent variable.
`regr_count(y, x)`	Returns the number of non-null number pairs in a group.
`regr_intercept(y, x)`	Returns the intercept of the univariate linear regression line for non-null pairs in a group.
`regr_r2(y, x)`	Returns the coefficient of determination for non-null pairs in a group.
`regr_slope(y, x)`	Returns the slope of the linear regression line for non-null pairs in a group.
`regr_sxx(y, x)`	-
`regr_sxy(y, x)`	Returns the population covariance of input values.
`regr_syy(y, x)`	-
`skewness(x)`	Returns the skewness of all input values.
`stddev_pop(x)`	Returns the population standard deviation.
`stddev_samp(x)`	Returns the sample standard deviation.
`var_pop(x)`	Returns the population variance.
`var_samp(x)`	Returns the sample variance of all input values.

`corr(y, x)`

Description	Returns the correlation coefficient for non-`NULL` pairs in a group.
Formula	`covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y))`
Alias(es)	-

`covar_pop(y, x)`

Description	Returns the population covariance of input values.
Formula	`(sum(xy) - sum(x) sum(y) / count()) / count()`
Alias(es)	-

`covar_samp(y, x)`

Description	Returns the sample covariance for non-`NULL` pairs in a group.
Formula	`(sum(xy) - sum(x) sum(y) / count()) / (count() - 1)`
Alias(es)	-

`entropy(x)`

Description	Returns the log-2 entropy of count input-values.
Formula	-
Alias(es)	-

`kurtosis_pop(x)`

Description	Returns the excess kurtosis (Fisher's definition) of all input values. Bias correction is not applied.
Formula	-
Alias(es)	-

`kurtosis(x)`

Description	Returns the excess kurtosis (Fisher's definition) of all input values, with a bias correction according to the sample size.
Formula	-
Alias(es)	-

`mad(x)`

Description	Returns the median absolute deviation for the values within x. `NULL` values are ignored. Temporal types return a positive `INTERVAL`.
Formula	`median(abs(x - median(x)))`
Alias(es)	-

`median(x)`

Description	Returns the middle value of the set. `NULL` values are ignored. For even value counts, quantitative values are averaged and ordinal values return the lower value.
Formula	`quantile_cont(x, 0.5)`
Alias(es)	-

`mode(x)`

Description	Returns the most frequent value for the values within x. `NULL` values are ignored.
Formula	-
Alias(es)	-

`quantile_cont(x, pos)`

Description	Returns the interpolated `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `pos * (n_nonnull_values - 1)`th (zero-indexed) element (or an interpolation between the adjacent elements if the index is not an integer). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles.
Formula	-
Alias(es)	-

`quantile_disc(x, pos)`

Description	Returns the discrete `pos`-quantile of `x` for `0 <= pos <= 1`, i.e., orders the values of `x` and returns the `floor(pos * (n_nonnull_values - 1))`th (zero-indexed) element. If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles.
Formula	-
Alias(es)	`quantile`

`regr_avgx(y, x)`

Description	Returns the average of the independent variable for non-null pairs in a group, where x is the independent variable and y is the dependent variable.
Formula	-
Alias(es)	-

`regr_avgy(y, x)`

Description	Returns the average of the dependent variable for non-null pairs in a group, where x is the independent variable and y is the dependent variable.
Formula	-
Alias(es)	-

`regr_count(y, x)`

Description	Returns the number of non-null number pairs in a group.
Formula	`(sum(xy) - sum(x) sum(y) / count()) / count()`
Alias(es)	-

`regr_intercept(y, x)`

Description	Returns the intercept of the univariate linear regression line for non-null pairs in a group.
Formula	`avg(y) - regr_slope(y, x) * avg(x)`
Alias(es)	-

`regr_r2(y, x)`

Description	Returns the coefficient of determination for non-null pairs in a group.
Formula	-
Alias(es)	-

`regr_slope(y, x)`

Description	Returns the slope of the linear regression line for non-null pairs in a group.
Formula	`covar_pop(x, y) / var_pop(x)`
Alias(es)	-

`regr_sxx(y, x)`

Description	-
Formula	`regr_count(y, x) * var_pop(x)`
Alias(es)	-

`regr_sxy(y, x)`

Description	Returns the population covariance of input values.
Formula	`regr_count(y, x) * covar_pop(y, x)`
Alias(es)	-

`regr_syy(y, x)`

Description	-
Formula	`regr_count(y, x) * var_pop(y)`
Alias(es)	-

`skewness(x)`

Description	Returns the skewness of all input values.
Formula	-
Alias(es)	-

`stddev_pop(x)`

Description	Returns the population standard deviation.
Formula	`sqrt(var_pop(x))`
Alias(es)	-

`stddev_samp(x)`

Description	Returns the sample standard deviation.
Formula	`sqrt(var_samp(x))`
Alias(es)	`stddev(x)`

`var_pop(x)`

Description	Returns the population variance.
Formula	-
Alias(es)	-

`var_samp(x)`

Description	Returns the sample variance of all input values.
Formula	`(sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1)`
Alias(es)	`variance(arg, val)`

Ordered Set Aggregate Functions

The table below shows the available "ordered set" aggregate functions. These functions are specified using the WITHIN GROUP (ORDER BY sort_expression) syntax, and they are converted to an equivalent aggregate function that takes the ordering expression as the first argument.

Function	Equivalent
`mode() WITHIN GROUP (ORDER BY column [(ASC\|DESC)])`	`mode(column ORDER BY column [(ASC\|DESC)])`
`percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC\|DESC)])`	`quantile_cont(column, fraction ORDER BY column [(ASC\|DESC)])`
`percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC\|DESC)])`	`quantile_cont(column, fractions ORDER BY column [(ASC\|DESC)])`
`percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC\|DESC)])`	`quantile_disc(column, fraction ORDER BY column [(ASC\|DESC)])`
`percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC\|DESC)])`	`quantile_disc(column, fractions ORDER BY column [(ASC\|DESC)])`

Miscellaneous Aggregate Functions

Function	Description	Alias
`grouping()`	For queries with `GROUP BY` and either `ROLLUP` or `GROUPING SETS`: Returns an integer identifying which of the argument expressions where used to group on to create the current supper-aggregate row.	`grouping_id()`

About this page

In this article