2

We can use ARRAY_AGG aggregate function to generate an array from data.

The goal is to find a way to limit the input to specific number of entries like ARRAY_AGG(...) WITHIN GROUP(... LIMIT 3) without restructuring the main query.

Sidenote: UDAF (User-Defined Aggregate Functions) are not available at the moment of writing.

For sample data:

CREATE OR REPLACE TABLE tab(grp TEXT, col TEXT) AS SELECT * FROM VALUES ('Grp1', 'A'),('Grp1', 'B'),('Grp1', 'C'),('Grp1', 'D'), ('Grp1', 'E'), ('Grp2', 'X'),('Grp2', 'Y'),('Grp2', 'Z'),('Grp2', 'V'), ('Grp3', 'M'),('Grp3', 'N'),('Grp3', 'M'); 

Output:

GRP ARR_LIMIT_3 Grp3 [ "M", "M", "N" ] Grp2 [ "V", "X", "Y" ] Grp1 [ "A", "B", "C" ] 

Usage of ARRAY_SLICE is not an option if underlying ARRAY_AGG exceeds 16MB

SELECT grp, ARRAY_SLICE(ARRAY_AGG(col), 1,3)) FROM big_table JOIN ... GROUP BY grp; -- Result array of ARRAYAGG is too large 

1 Answer 1

5

It is possible to achieve similar effect by using MIN_BY/MIN_MAX function:

MIN_BY( <col_to_return>, <col_containing_mininum> [ , <maximum_number_of_values_to_return> ] )

SELECT grp, ARRAY_AGG(col) AS arr, MIN_BY(col, col, 3) AS arr_limit_3 FROM tab GROUP BY grp; 

Output:

GRP ARR ARR_LIMIT_3
Grp3 [ "M", "N", "M" ] [ "M", "M", "N" ]
Grp2 [ "X", "Y", "Z", "V" ] [ "V", "X", "Y" ]
Grp1 [ "A", "B", "C", "D", "E" ] [ "A", "B", "C" ]

If the sorting is irrelevant then MIN_BY(col, 'some_constant', 3).


ARRAY_UNIQUE_AGG or ARRAY_AGG(DISTINCT ...) is:

SELECT grp, ARRAY_UNIQUE_AGG(col) AS arr, ARRAY_AGG(DISTINCT col) AS arr2, ARRAY_DISTINCT(MIN_BY(col, col, 3)) AS arr_distinct_limit_3 FROM tab GROUP BY grp; 

Output:

GRP ARR ARR2 ARR_DISTINCT_LIMIT_3
Grp3 [ "M", "N" ] [ "M", "N" ] [ "M", "N" ]
Grp2 [ "X", "Y", "Z", "V" ] [ "X", "Y", "Z", "V" ] [ "V", "X", "Y" ]
Grp1 [ "A", "B", "C", "D", "E" ] [ "A", "B", "C", "D", "E" ] [ "A", "B", "C" ]

It is possible to handle WITHIN GROUP(ORDER BY <some_col> ASC/DESC) too:

SELECT grp, -- ASC ARRAY_AGG(col) WITHIN GROUP(ORDER BY some_col), MIN_BY(col, some_col, 3), -- DESC ARRAY_AGG(col) WITHIN GROUP(ORDER BY some_col DESC), MAX_BY(col, some_col, 3) FROM ...; 
Sign up to request clarification or add additional context in comments.

1 Comment

Is there a parenthesis issue in the last code block perhaps?

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.