Redshift query takes long time on grouping

Question

I have a table with userId and bookId columns.

I want to get every userId that has exactly one distinct bookId, together with that bookId. I am using the following query:

SELECT userId, count(DISTINCT bookid) AS num_books FROM table GROUP BY userId HAVING num_books=1

I then need to join the result with some other tables. This query is excruciatingly slow. I am sure there is a better way to write it, but I can't figure out how.

Gordon Linoff · Accepted Answer · 2020-09-08 17:42:06Z

You can use aggregation:

select userid, min(bookid) as bookid from t group by userid having count(distinct bookid) = 1;

Or for the having clause:

having min(bookid) = max(bookid)
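As a quick sanity check that the two having clauses select the same users, here is a minimal sketch that runs both forms on a tiny made-up table. It assumes an in-memory SQLite database as a stand-in for Redshift, and the sample rows are hypothetical. It only checks that the results match; it says nothing about performance on Redshift.

```python
import sqlite3

# Hypothetical sample data (not from the question):
# user 1 has one distinct book (stored twice), user 2 has two books, user 3 has one.
rows = [(1, 10), (1, 10), (2, 20), (2, 21), (3, 30)]

# SQLite in memory is only a stand-in for Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (userid integer, bookid integer)")
conn.executemany("insert into t values (?, ?)", rows)

# Form 1: filter on count(distinct bookid).
count_distinct = conn.execute(
    "select userid, min(bookid) as bookid from t "
    "group by userid having count(distinct bookid) = 1 "
    "order by userid"
).fetchall()

# Form 2: filter on min(bookid) = max(bookid), which avoids count(distinct).
min_max = conn.execute(
    "select userid, min(bookid) as bookid from t "
    "group by userid having min(bookid) = max(bookid) "
    "order by userid"
).fetchall()

print(count_distinct)  # [(1, 10), (3, 30)]
print(min_max)         # [(1, 10), (3, 30)]
```

Both forms ignore NULL bookid values, since count(distinct), min and max all skip NULLs, so they should agree on nullable columns as well.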

I don't think there is a significantly faster way to write the query, although eliminating the count(distinct) as above might help. You could also try:

select distinct userid, bookid
from (select t.*,
             min(bookid) over (partition by userid) as min_bookid,
             max(bookid) over (partition by userid) as max_bookid
      from t
     ) t
where min_bookid = max_bookid;

This filters before doing the select distinct, which might help performance. However, there is the cost of the window functions.
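To see the window-function version return the same rows as the aggregation, here is a small sketch on made-up data. It again assumes in-memory SQLite as a stand-in for Redshift (window functions need SQLite 3.25 or newer), and the sample rows are hypothetical. It checks the result only, not the speed.

```python
import sqlite3

# Hypothetical sample data (not from the question):
# users 1 and 3 have exactly one distinct book, user 2 has two.
rows = [(1, 10), (1, 10), (2, 20), (2, 21), (3, 30)]

# SQLite in memory is only a stand-in for Redshift; needs SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (userid integer, bookid integer)")
conn.executemany("insert into t values (?, ?)", rows)

# Window-function form: compute per-user min and max on every row,
# keep only rows of users whose min equals max, then de-duplicate.
windowed = conn.execute(
    """
    select distinct userid, bookid
    from (select t.*,
                 min(bookid) over (partition by userid) as min_bookid,
                 max(bookid) over (partition by userid) as max_bookid
          from t
         ) t
    where min_bookid = max_bookid
    order by userid
    """
).fetchall()

# Aggregation form, for comparison.
aggregated = conn.execute(
    "select userid, min(bookid) as bookid from t "
    "group by userid having min(bookid) = max(bookid) "
    "order by userid"
).fetchall()

print(windowed)    # [(1, 10), (3, 30)]
print(aggregated)  # [(1, 10), (3, 30)]
```

One difference to keep in mind if bookid is nullable: the window form filters on the per-user min and max, not on the row's own bookid, so a user with one real book plus a NULL row would also get a (userid, NULL) row back. The aggregation form returns a single row per user.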

Sorry, I updated the question, so I need to get userid and the bookid with exactly one bookid
@Yogi . . . It is a simple tweak to either query. I modified the answer.

Amish Shah · Accepted Answer · 2020-09-08 16:22:56Z

DISTINCT will run slowly if the table is big. Since you are looking for users with only one book id, you can try this (replace your_table with your table name):

select user_id, min(book_id), max(book_id) from your_table group by user_id having max(book_id) = min(book_id)

Sorry, I updated the question, so I need to get userid and the bookid with exactly one bookid?
