Create test data...
CREATE UNLOGGED TABLE a AS SELECT a_id, (random()*100000)::INTEGER owner_id FROM generate_series(1,1000000) a_id; CREATE UNLOGGED TABLE b AS SELECT b_id, (random()*100000)::INTEGER a_id, (random()*100000)::INTEGER user_id FROM generate_series(1,10000000) b_id; CREATE INDEX a_o ON a(owner_id); CREATE INDEX b_a ON b(a_id); CREATE INDEX b_u ON b(user_id); ALTER TABLE a ADD PRIMARY KEY(a_id); ALTER TABLE b ADD PRIMARY KEY(b_id); VACUUM ANALYZE a,b;
Problem with first query is postgres doesn't know how to optimize star join, so we have to give it a little help.
WITH ids AS ( SELECT a_id FROM b WHERE user_id=201 UNION SELECT a_id FROM a WHERE owner_id=100 ) SELECT * FROM ids JOIN b USING (a_id) LIMIT 50;
This gives a plan using both indices, it may be faster in your case, or maybe not.
Limit (cost=455.41..634.97 rows=50 width=12) (actual time=0.494..0.642 rows=50 loops=1) -> Nested Loop (cost=455.41..41596.19 rows=11456 width=12) (actual time=0.492..0.629 rows=50 loops=1) -> HashAggregate (cost=450.19..451.32 rows=113 width=4) (actual time=0.425..0.427 rows=1 loops=1) Group Key: b_1.a_id Batches: 1 Memory Usage: 24kB -> Append (cost=5.23..449.91 rows=113 width=4) (actual time=0.076..0.358 rows=98 loops=1) -> Bitmap Heap Scan on b b_1 (cost=5.23..401.21 rows=102 width=4) (actual time=0.075..0.299 rows=92 loops=1) Recheck Cond: (user_id = 201) Heap Blocks: exact=92 -> Bitmap Index Scan on b_u (cost=0.00..5.20 rows=102 width=0) (actual time=0.035..0.035 rows=92 loops=1) Index Cond: (user_id = 201) -> Bitmap Heap Scan on a (cost=4.51..47.00 rows=11 width=4) (actual time=0.019..0.033 rows=6 loops=1) Recheck Cond: (owner_id = 100) Heap Blocks: exact=6 -> Bitmap Index Scan on a_o (cost=0.00..4.51 rows=11 width=0) (actual time=0.014..0.014 rows=6 loops=1) Index Cond: (owner_id = 100) -> Bitmap Heap Scan on b (cost=5.22..363.09 rows=101 width=12) (actual time=0.059..0.174 rows=50 loops=1) Recheck Cond: (a_id = b_1.a_id) Heap Blocks: exact=50 -> Bitmap Index Scan on b_a (cost=0.00..5.19 rows=101 width=0) (actual time=0.023..0.023 rows=104 loops=1) Index Cond: (a_id = b_1.a_id) Planning Time: 0.448 ms Execution Time: 0.747 ms
As for the other query, I had to run this:
select owner_id, user_id, count(*) from a join b using (a_id) group by owner_id,user_id order by count(*) desc limit 100;
to get some user_id,owner_id that would actually return results from my test data. Then,
EXPLAIN ANALYZE SELECT b.* FROM b JOIN a USING (a_id) WHERE (b.user_id = 99238 AND a.owner_id = 58599) OR (b.user_id = 36859 AND a.owner_id = 99027) LIMIT 50; Limit (cost=24.97..532.32 rows=1 width=12) (actual time=0.274..0.982 rows=6 loops=1) -> Nested Loop (cost=24.97..532.32 rows=1 width=12) (actual time=0.271..0.976 rows=6 loops=1) -> Bitmap Heap Scan on a (cost=9.03..92.70 rows=22 width=8) (actual time=0.108..0.216 rows=12 loops=1) Recheck Cond: ((owner_id = 58599) OR (owner_id = 99027)) Heap Blocks: exact=12 -> BitmapOr (cost=9.03..9.03 rows=22 width=0) (actual time=0.086..0.088 rows=0 loops=1) -> Bitmap Index Scan on a_o (cost=0.00..4.51 rows=11 width=0) (actual time=0.064..0.065 rows=3 loops=1) Index Cond: (owner_id = 58599) -> Bitmap Index Scan on a_o (cost=0.00..4.51 rows=11 width=0) (actual time=0.020..0.020 rows=9 loops=1) Index Cond: (owner_id = 99027) -> Bitmap Heap Scan on b (cost=15.95..19.97 rows=1 width=12) (actual time=0.058..0.060 rows=0 loops=12) Recheck Cond: ((a_id = a.a_id) AND ((user_id = 99238) OR (user_id = 36859))) Filter: (((user_id = 99238) AND (a.owner_id = 58599)) OR ((user_id = 36859) AND (a.owner_id = 99027))) Heap Blocks: exact=6 -> BitmapAnd (cost=15.95..15.95 rows=1 width=0) (actual time=0.053..0.053 rows=0 loops=12) -> Bitmap Index Scan on b_a (cost=0.00..5.19 rows=101 width=0) (actual time=0.015..0.015 rows=50 loops=12) Index Cond: (a_id = a.a_id) -> BitmapOr (cost=10.50..10.50 rows=205 width=0) (actual time=0.046..0.046 rows=0 loops=6) -> Bitmap Index Scan on b_u (cost=0.00..5.20 rows=102 width=0) (actual time=0.021..0.021 rows=121 loops=6) Index Cond: (user_id = 99238) -> Bitmap Index Scan on b_u (cost=0.00..5.20 rows=102 width=0) (actual time=0.024..0.024 rows=105 loops=6) Index Cond: (user_id = 36859) Planning Time: 0.703 ms Execution Time: 1.063 ms
It doesn't use a seq scan as yours do, so maybe you have an old version that couldn't optimize this properly? It's quite weird that it picks a seq scan for table a when the row count estimates are pretty accurate. You should investigate, maybe try
SELECT * FROM a WHERE a.owner_id = 58599 OR a.owner_id = 99027 LIMIT 50;
this should give an index or bitmap index scan, if it does a seq scan, then you have a small test case to find out why. Anyway, you can still force use of indices with:
EXPLAIN ANALYZE WITH ids AS ( SELECT a_id FROM b WHERE user_id IN (99238,36859) UNION SELECT a_id FROM a WHERE owner_id IN (58599,99027) ) SELECT * FROM ids JOIN b USING (a_id) JOIN a USING (a_id) WHERE (b.user_id = 99238 AND a.owner_id = 58599) OR (b.user_id = 36859 AND a.owner_id = 99027);
...but it's pretty damn ugly. Or you could do each clause in your OR separately and do the AND manyally with this one, which is also ugly:
EXPLAIN ANALYZE SELECT a_id FROM b WHERE b.user_id = 99238 INTERSECT SELECT a_id FROM a WHERE a.owner_id = 58599 LIMIT 50;
How can I optimize large OFFSETs
You don't, in fact when large offsets are used it usually hints that you're doing it wrong,by repeatedly doing the same query, for example for pagination, and displaying chunks of results. There are two solutions. If results will be fetched quickly enough that a transaction can stay open while you do that, open a cursor for the query without LIMIT or OFFSET and use FETCH to get results in chunks. Otherwise, do the query once without LIMIT, store the results in a cache, and paginate from that without redoing the query.
explain (analyze, buffers, format text)- not just a "simple" explainORcondition toUNIONmaybe it will be fasterOFFSETit is very slow, it takes >10 seconds