PostgreSQL conducts seq scan instead of index only scan

Question

I have the following table structure:

create table transfers (
    id    serial       not null constraint transactions_pkey primary key,
    name  varchar(255) not null,
    money integer      not null
);
create index transfers_name_index on transfers (name);

The following query is quite slow because it does a sequential scan:

EXPLAIN ANALYZE SELECT name FROM transfers GROUP BY name ORDER BY name ASC;

Group  (cost=37860.49..41388.54 rows=14802 width=15) (actual time=4285.530..7459.872 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=37860.49..41314.53 rows=29604 width=15) (actual time=4285.529..7136.432 rows=999935 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=36860.46..36897.47 rows=14802 width=15) (actual time=4104.159..5107.148 rows=333312 loops=3)
              Sort Key: name
              Sort Method: external merge  Disk: 14928kB
              Worker 0:  Sort Method: external merge  Disk: 13616kB
              Worker 1:  Sort Method: external merge  Disk: 13656kB
              ->  Partial HashAggregate  (cost=35687.15..35835.17 rows=14802 width=15) (actual time=604.984..689.111 rows=333312 loops=3)
                    Group Key: name
                    ->  Parallel Seq Scan on transfers  (cost=0.00..32571.52 rows=1246252 width=15) (actual time=0.063..200.548 rows=997032 loops=3)
Planning Time: 0.088 ms
Execution Time: 7531.142 ms

However, when I set enable_seqscan to off, the index-only scan is used, as I would expect:

SET enable_seqscan = OFF;
EXPLAIN ANALYZE SELECT name FROM transfers GROUP BY name ORDER BY name ASC;

Group  (cost=1000.45..100492.67 rows=14802 width=15) (actual time=8.032..2212.538 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=1000.45..100418.66 rows=29604 width=15) (actual time=8.029..1880.388 rows=999778 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Group  (cost=0.43..96001.60 rows=14802 width=15) (actual time=0.074..383.471 rows=333259 loops=3)
              Group Key: name
              ->  Parallel Index Only Scan using transfers_name_index on transfers  (cost=0.43..92885.97 rows=1246252 width=15) (actual time=0.066..189.436 rows=997032 loops=3)
                    Heap Fetches: 0
Planning Time: 0.197 ms
Execution Time: 2279.321 ms

Why doesn't Postgres use the more efficient index-only scan without being forced to? The table contains about 3 million records. I am using PostgreSQL 11.2.
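SET enable_seqscan = OFF stays in effect for the rest of the session. When comparing plans like this, the setting can instead be confined to a single transaction with SET LOCAL. A minimal sketch, assuming the transfers table from the question already exists; note that enable_seqscan = off does not forbid sequential scans, it only makes the planner treat them as very expensive:

```sql
-- Compare plans with sequential scans discouraged, without
-- changing the setting for the rest of the session.
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
EXPLAIN ANALYZE
SELECT name FROM transfers GROUP BY name ORDER BY name;
ROLLBACK;

-- Outside the transaction the normal setting applies again.
SHOW enable_seqscan;
```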

@a_horse_with_no_name Already tried that; it doesn't seem to make a difference. I added the version to the opening post.
– edwardmp, Commented Nov 12, 2019 at 17:27
Your query wants all the records. It would need all the records for an index-only scan, too (but maybe the row size could differ?).
– joop, Commented Nov 12, 2019 at 17:31
Maybe your random_page_cost is set too high (the factory default is 4.0; for SSD/NAS you can lower it to below 2).
– joop, Commented Nov 12, 2019 at 18:10
There are several query planner parameters. As far as I know, random_page_cost should be 2 for modern SSDs. Note that it can be set at runtime, so just before your query executes, run set random_page_cost to 2;
– Abelisto, Commented Nov 12, 2019 at 18:15
"Group (cost=37860.49..41388.54 rows=14802 width=15) (actual time=4285.530..7459.872 rows=999766 loops=1)" Do you know why this estimate is so wrong?
– jjanes, Commented Nov 12, 2019 at 21:17
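The random_page_cost suggestion from the comments can be tried for one session before changing anything server-wide. A sketch, assuming the transfers table from the question; the value 2 is the one suggested in the comments, not a measured optimum for any particular storage:

```sql
SHOW random_page_cost;      -- 4 is the built-in default
SET random_page_cost = 2;   -- affects this session only
EXPLAIN ANALYZE
SELECT name FROM transfers GROUP BY name ORDER BY name;
RESET random_page_cost;     -- back to the configured value
```

If the lower value helps, it can be made persistent with ALTER SYSTEM SET random_page_cost = 2 (followed by SELECT pg_reload_conf()) or per tablespace with ALTER TABLESPACE ... SET (random_page_cost = 2).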

Jeremy · Accepted Answer · 2019-11-12 17:59:38Z


For Postgres to prefer an index-only scan, most of the table's pages should be marked all-visible in the visibility map. You can check this in pg_class:

SELECT relpages, relallvisible FROM pg_class WHERE relname='transfers';

If relallvisible is 0 or much lower than relpages, you should VACUUM the table:

VACUUM ANALYZE transfers;
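The two numbers can be combined into a single percentage, which is easier to re-check after a VACUUM. A minimal sketch; the pct_all_visible column alias is just for illustration:

```sql
-- Share of heap pages marked all-visible in the visibility map.
-- relpages and relallvisible are estimates maintained by VACUUM and ANALYZE.
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'transfers';
```

With the figures edwardmp reports in the comments (20109 for both columns), this works out to 100.0, so in that case the visibility map is fully set.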


edwardmp Over a year ago

Thank you, I already tried vacuuming. The query returns 20109 for both columns. But since we are only selecting data that is in an index (the name column), we are not actually accessing the heap, so is visibility really a concern here?

Jeremy Over a year ago

Yes, it's definitely a concern. If a page is not marked all-visible, Postgres has to visit the heap to check visibility. There is some discussion about that here: postgresql.org/docs/current/indexes-index-only-scans.html

edwardmp Over a year ago

Thanks. But since relpages equals relallvisible, this doesn't seem to be the issue here, right?

Samuel Goldenbaum · Accepted Answer · 2019-11-12 17:16:55Z

Try adding a decent amount of data and running the queries again. Postgres doesn't always use an index; when a table has only a few records, it may decide that a sequential scan is quicker.

Great, I have seen indexes ignored when there are only a few rows.
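One way to load a realistic amount of test data is generate_series. This sketch fills the question's table with 3 million rows and 1 million distinct names (the sizes jjanes uses in another answer); the name pattern and money values are made up for illustration. Run it in a scratch database, because it drops any existing transfers table:

```sql
DROP TABLE IF EXISTS transfers;
CREATE TABLE transfers (
    id    serial       NOT NULL CONSTRAINT transactions_pkey PRIMARY KEY,
    name  varchar(255) NOT NULL,
    money integer      NOT NULL
);

-- 3,000,000 rows; g % 1000000 yields 1,000,000 distinct names, 3 rows each.
INSERT INTO transfers (name, money)
SELECT 'name_' || (g % 1000000), (random() * 1000)::integer
FROM generate_series(1, 3000000) AS g;

CREATE INDEX transfers_name_index ON transfers (name);
VACUUM ANALYZE transfers;   -- sets the visibility map and collects statistics

EXPLAIN ANALYZE
SELECT name FROM transfers GROUP BY name ORDER BY name;
```

The resulting plan can still vary with hardware, version and planner settings, so treat it as a starting point for comparison rather than a guaranteed outcome.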

jjanes · Accepted Answer · 2019-11-12 21:40:03Z

When I fill your table with 3e6 rows containing 1e6 distinct names, I get the index only scan. However, if I force the distinct value estimate to match yours, it switches to the seq scan:

alter table transfers alter name set (N_DISTINCT = 14802);
analyze transfers;

So if you use the same method to set it to the correct value, I bet yours would switch the other way.
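Applied to the question's numbers, that could look like the sketch below. n_distinct accepts either an absolute count or, as a negative number, a fraction of the row count. The -0.33 here is derived from the roughly 1 million groups over about 3 million rows in the question's plans, so it is an assumption to adjust to your own data:

```sql
-- Override the planner's distinct-value estimate for transfers.name.
-- A negative value means "this fraction of the rows are distinct",
-- so the estimate scales as the table grows.
ALTER TABLE transfers ALTER COLUMN name SET (n_distinct = -0.33);
ANALYZE transfers;  -- the override takes effect at the next ANALYZE

-- Check the value the planner now uses.
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'transfers' AND attname = 'name';
```

To go back to the sampled estimate, run ALTER TABLE transfers ALTER COLUMN name RESET (n_distinct); followed by another ANALYZE.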

Why is it wrong in the first place? I bet your table is clustered on name, and your default_statistics_target is too low.

This seems like the most likely cause at this point. There are a few names that appear across a lot of records, and also a lot of names that appear in only one record. I'll see if I can play with the table statistics myself.
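The statistics hypothesis can be checked in pg_stats: n_distinct is the estimate the planner uses, and a correlation close to 1 or -1 means the physical row order closely follows name. If the estimate is far off, a higher per-column statistics target makes ANALYZE sample more rows (about 300 per unit of target). The 1000 below is an arbitrary example; the default target is 100 and the maximum is 10000:

```sql
-- What does the planner currently believe about transfers.name?
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'transfers' AND attname = 'name';

-- Sample more rows for this column only, then refresh the statistics.
ALTER TABLE transfers ALTER COLUMN name SET STATISTICS 1000;
ANALYZE transfers;

-- Compare the new estimate with the real number of distinct names.
SELECT count(DISTINCT name) FROM transfers;
```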
