5

I have a table of rows with the following structure name TEXT, favorite_colors TEXT[], group_name INTEGER where each row has a list of everyone's favorite colors and the group that person belongs to. How can I GROUP BY group_name and return a list of the most common colors in each group?

Could you do a combination of int[] && int[] to set for overlap, int[] & int[] to get the intersection and then something else to count and rank?

1 Answer 1

6

Quick and dirty:

SELECT group_name, color, count(*) AS ct FROM ( SELECT group_name, unnest(favorite_colors) AS color FROM tbl ) sub GROUP BY 1,2 ORDER BY 1,3 DESC; 

Better with a LATERAL JOIN

In Postgres 9.3 or later this is the cleaner form:

SELECT group_name, color, count(*) AS ct FROM tbl t, unnest(t.favorite_colors) AS color GROUP BY 1,2 ORDER BY 1,3 DESC; 

The above is shorthand for

... FROM tbl t JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE ... 

And like with any other INNER JOIN, it would exclude rows without color (favorite_colors IS NULL) - as did the first query.

To include such rows in the result, use instead:

SELECT group_name, color, count(*) AS ct FROM tbl t LEFT JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE GROUP BY 1,2 ORDER BY 1,3 DESC; 

You can easily aggregate the "most common" colors per group in the next step, but you'd need to define "most common colors" first ...

Most common colors

As per comment, pick colors with > 3 occurrences.

SELECT t.group_name, color, count(*) AS ct FROM tbl t, unnest(t.favorite_colors) AS color GROUP BY 1,2 HAVING count(*) > 3 ORDER BY 1,3 DESC; 

To aggregate the top colors in an array (in descending order):

SELECT group_name, array_agg(color) AS top_colors FROM ( SELECT group_name, color FROM tbl t, unnest(t.favorite_colors) AS color GROUP BY 1,2 HAVING count(*) > 3 ORDER BY 1, count(*) DESC ) sub GROUP BY 1; 

-> SQLfiddle demonstrating all.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, could you explain how grouping by those integers achieves this? If I want to put in a HAVING so I could ask for only the colors that have more than 3 occurrences in a group, where would I put the HAVING. I keep getting syntax errors.
@mhkeller: Consider the addendum and the updated fiddle.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.