
I started by ensuring the planner had updated stats:

my_db=> vacuum analyze;
VACUUM
Time: 1401.958 ms

When selecting only foos.bar_id, the query executes quickly with an Index Only Scan on that column:

my_db=> EXPLAIN ANALYZE SELECT foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.43..16203.46 rows=353198 width=4) (actual time=0.045..114.746 rows=196205 loops=1)
   ->  Seq Scan on bar_ids  (cost=0.00..16.71 rows=871 width=4) (actual time=0.005..0.195 rows=871 loops=1)
   ->  Index Only Scan using index_foos_on_bar_id on foos  (cost=0.43..14.80 rows=378 width=4) (actual time=0.003..0.055 rows=225 loops=871)
         Index Cond: (bar_id = bar_ids.id)
         Heap Fetches: 0
 Planning time: 0.209 ms
 Execution time: 144.364 ms
(7 rows)

Time: 145.620 ms

However, adding foos.id to the select list causes the planner to choose an extremely slow Seq Scan:

my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.60..221339.63 rows=353198 width=8) (actual time=294.091..3341.926 rows=196205 loops=1)
   Hash Cond: (foos.bar_id = bar_ids.id)
   ->  Seq Scan on foos  (cost=0.00..182314.70 rows=7093070 width=8) (actual time=0.004..1855.900 rows=7111807 loops=1)
   ->  Hash  (cost=16.71..16.71 rows=871 width=4) (actual time=0.454..0.454 rows=866 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 39kB
         ->  Seq Scan on bar_ids  (cost=0.00..16.71 rows=871 width=4) (actual time=0.002..0.222 rows=871 loops=1)
 Planning time: 0.237 ms
 Execution time: 3371.622 ms
(8 rows)

Time: 3373.150 ms

Disabling sequential scans forces an Index Scan on the same index, which is an order of magnitude faster:

my_db=> set enable_seqscan=false;
SET
Time: 0.801 ms
my_db=> EXPLAIN ANALYZE SELECT foos.id, foos.bar_id FROM foos INNER JOIN bar_ids ON foos.bar_id = bar_ids.id;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.43..10000439554.99 rows=353198 width=8) (actual time=0.028..171.632 rows=196205 loops=1)
   ->  Seq Scan on bar_ids  (cost=10000000000.00..10000000016.71 rows=871 width=4) (actual time=0.005..0.212 rows=871 loops=1)
   ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.43..500.86 rows=378 width=8) (actual time=0.003..0.118 rows=225 loops=871)
         Index Cond: (bar_id = bar_ids.id)
 Planning time: 0.186 ms
 Execution time: 201.958 ms
(6 rows)

Time: 203.185 ms

Other answers say the poor planning is due to bad statistics. My statistics are up to date. What gives?

bar_ids is a temporary table, which might be related to the insane cost estimates in the last query (Seq Scan on bar_ids (cost=10000000000.00..10000000016.71)), but explicitly running ANALYZE bar_ids doesn't change the query plan.
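In case it matters for reproducing this: autovacuum skips temporary tables entirely, so a manual ANALYZE after loading is the only way the planner gets statistics for bar_ids at all. A minimal stand-in for my setup, with generate_series substituting for the real source of ids:

-- autovacuum never processes temp tables, so bar_ids has no
-- statistics until it is analyzed manually after being populated.
CREATE TEMP TABLE bar_ids (id integer);
INSERT INTO bar_ids SELECT generate_series(1, 871);  -- stand-in for the real 871 ids
ANALYZE bar_ids;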

  • Have you tried running an ANALYZE on the foos table to determine the most efficient execution? I've had this issue before and I've never understood why the system randomly thinks a seq scan is best. Commented Nov 22, 2016 at 16:09
  • the first command, vacuum analyze, should do this for all tables if I'm understanding the docs correctly Commented Nov 22, 2016 at 16:13
  • ALTER TABLE bar_ids ADD PRIMARY KEY (id); ??? Commented Nov 22, 2016 at 16:14
  • I agree it should; in my experience analysing a single table seems to work, though I can't say why. Can anyone else shed light on this? Commented Nov 22, 2016 at 16:14
  • random_page_cost could likely stand to be reduced, though that's more art than science. I'd try 2 and see if that tilts the scale in your favor - given everything else, it may just be that the planner is reluctant to pick the index scan because of this. With good enough modern hardware, the default of 4 is a bit cautious. Commented Nov 22, 2016 at 16:51

1 Answer


Following up from the comments on the OP here.

In the case of the first query, where you select only foos.bar_id, the executor can fulfill the request with an index only scan, which is nifty. Tacking another column onto the select list (one the index doesn't cover), however, means that continuing to use this index requires the typical "double read": first read the index page, then read the table page to fetch the remaining column's value. That pattern carries the potential for quite a bit of random IO.
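As an aside, one way to sidestep the heap reads entirely (assuming you can afford another index and its write overhead) is a multicolumn index that covers both selected columns, letting the planner return to an index only scan. This is just a sketch; the index name is illustrative:

-- Hypothetical covering index: bar_id first (the join key), id second,
-- so SELECT foos.id, foos.bar_id ... can be answered from the index
-- alone, with no heap fetches.
CREATE INDEX index_foos_on_bar_id_and_id ON foos (bar_id, id);

But as your timings show, the plain index scan is already fast enough here; the real issue is how the planner costs that random IO.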

The setting random_page_cost is interpreted relative to seq_page_cost: loosely, random IO costs random_page_cost times as much as sequential IO (with the default seq_page_cost = 1). On modern drives random IO isn't that expensive, so lowering random_page_cost makes index scans look more attractive to the planner. Finding the "best" value is tricky, but starting around 2 is a decent rule of thumb.
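A minimal sketch of how to experiment with it, assuming PostgreSQL 9.4+ (for ALTER SYSTEM) and sufficient privileges; try it per-session first and persist it only if the plans actually improve:

-- Session-level experiment: affects only the current connection.
SET random_page_cost = 2;
-- Re-run the EXPLAIN ANALYZE above and compare plans and timings.

-- Once satisfied, persist it server-wide (written to postgresql.auto.conf)
-- and reload the configuration:
ALTER SYSTEM SET random_page_cost = 2;
SELECT pg_reload_conf();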

Edit: I didn't have time to add these extra thoughts earlier, but it's been bugging me. If you've found this because you're experiencing similar issues, please don't start fiddling with this setting right out of the gate. In this situation it seemed like a fruitful approach because the OP had already established that statistics were fresh and reasonable, that the index scan was viable, and that the planner still strongly preferred the table scan even though the empirical evidence showed it to be clearly worse. This setting isn't a silver bullet, and if your performance issues don't really match the ones given here, this solution may not be for you! Other situations may call for query rewrites or changes to other configuration items (especially those related to memory usage and allocation).
