0

Following tables:

CREATE TEMPORARY TABLE guys ( guy_id integer primary key, guy text ); CREATE TEMPORARY TABLE sales ( log_date date, sales_guys integer[], sales smallint ); INSERT INTO guys VALUES(1,'john'),(2,'joe'); INSERT INTO sales VALUES('2016-01-01', '{1,2}', 2),('2016-01-02','{1,2}',4); 

Following query works great to show names on a given date:

SELECT log_date, sales_guys, ARRAY_AGG(guy), sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY log_date, sales_guys, sales ORDER BY log_date ASC; log_date | sales_guys | array_agg | sales ------------+------------+------------+------- 2016-01-01 | {1,2} | {john,joe} | 2 2016-01-02 | {1,2} | {john,joe} | 4 

Following query problematically gives me a name per date per guy, so here each name twice, and so on):

SELECT sales_guys, ARRAY_AGG(guy), SUM(sales) AS sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY sales_guys; 

Yields:

 sales_guys | array_agg | sales ------------+---------------------+------- {1,2} | {john,joe,john,joe} | 12 

Is there a way to somehow reduce the ARRAY_AGG call to give only the unique names?

1
  • Please clarify what you want to aggregate exactly. Sales for the same team? Would you group {1,2} with {2,1}? And always declare your version of Postgres. Commented Nov 25, 2016 at 20:03

2 Answers 2

2

There is no kind of order you can trust without ORDER BY. Except that elements of arrays, when unnested, come in array order. If your query does more with the result, it may be re-ordered, though.

You an simply add ORDER BY to any aggregate function in Postgres:

SELECT s.sales_guys, ARRAY_AGG(DISTINCT g.guy ORDER BY g.guy) AS names, SUM(s.sales) AS sum_sales FROM sales s JOIN guys g ON g.guy_id = ANY(s.sales_guys) GROUP BY s.sales_guys;

But that's obviously not the original order of array elements. And the query has other issues ... Neither IN nor = ANY() care about order of elements in the set, list or array on the right side:

Proper solution

For this task (attention to the details!):

Get the total sales per array sales_guys, where the order of elements makes a difference (arrays '{1,2}' and '{2,1}' are not the same) and sales_guys has neither duplicate nor NULL elements. Add an array of resolved names in matching order.

Use unnest() with WITH ORDINALITY. and aggregate arrays before you resolve names, that's cheaper and less error prone.

SELECT s.*, g. FROM ( SELECT sales_guys, sum (sales) AS total_sales -- aggregate first in subquery FROM sales GROUP BY 1 ) s , LATERAL ( SELECT array_agg(guy ORDER BY ord) AS names -- order by original order FROM unnest(s.sales_guys) WITH ORDINALITY sg(guy_id, ord) -- with order of elements LEFT JOIN guys g USING (guy_id) -- LEFT JOIN to add NULL for missing guy_id ) g; 

The LATERAL subquery can be joined with unconditional CROSS JOIN - comma (,) is shorthand notation - because the aggregate in the subquery guarantees a result for every row. Else you'd use LEFT JOIN LATERAL .. ON true.

Detailed explanation:

Sign up to request clarification or add additional context in comments.

Comments

0

You can use DISTINCT inside of the aggregate:

SELECT sales_guys, ARRAY_AGG(DISTINCT guy), SUM(sales) AS sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY sales_guys; 

3 Comments

dang, that's obvious. Woof. Is there some kind of order you can trust in the DISTINCT call? Would they be ordered here by guy_id in guys or guy in guys or is it effectively random?
I think there is no defined order. You can fetch the guys in the join by an ordered subquery or view first to get an ordered array.
hmm, I kind of hoped to get a guy_id array, and a guy array such that I could trust guy_id[1] == guy[1] but it doesn't seem possible w/o a subquery.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.