
I have a query that is being made 18x slower by the presence of a single WHERE clause. The query plan is an index-only scan and the clause in question looks at one of the fields in the index.

This is the query and the offending clause is update_id < 40581:

EXPLAIN (VERBOSE, ANALYZE, COSTS, BUFFERS)
WITH a AS (
    SELECT e_match.account_id, for_time, for_eid, originating_update_id,
           attr1, attr2, attr3
    FROM combinations combo
    INNER JOIN (
        SELECT account_id, time AS for_time, eid AS for_eid, originating_update_id
        FROM entries
        WHERE account_id = '1'
          AND time <= '2017-11-30 15:57:35.267405+00'
          AND originating_update_id < 40581
    ) e_match ON combo.account_id = e_match.account_id
)
SELECT a.attr1, a.attr2, a.attr3, b.amount, for_time, for_eid,
       originating_update_id, b.eid = a.for_eid AS is_stored
FROM a
CROSS JOIN LATERAL (
    SELECT time, update_id, amount, eid
    FROM updates
    WHERE account_id = a.account_id
      AND attr1 = a.attr1
      AND attr2 = a.attr2
      AND attr3 = a.attr3
      AND update_id < 40581 -- THIS CLAUSE MAKES IT SLOW
      AND time <= a.for_time
      AND (time < a.for_time
           OR (time = a.for_time AND originating_update_id <= a.originating_update_id))
    ORDER BY time DESC, originating_update_id DESC, update_id DESC
    LIMIT 1
) b
ORDER BY for_time, originating_update_id, for_eid, attr1, attr2, attr3;

It takes 14241.977 ms to run, but if I remove the update_id < 40581 clause it runs in 797.601 ms.

The query is relatively complex. It joins a table of entries with a table of all updates for those entries, in order to use the latest entry for each of the known combinations of attributes. This is the loose-index-scan emulation trick. According to the query plan, most of the time is spent in the SELECT on the updates table, and it is that time that varies between the two versions.
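
The core shape of that trick, stripped of the entries join, looks roughly like this (a sketch using the table and column names from the schema below, not the exact query I run):

SELECT c.attr1, c.attr2, c.attr3, u.amount, u.time
FROM combinations c
CROSS JOIN LATERAL (
    -- one index probe per known attribute combination; the covering index
    -- (account_id, attr1, attr2, attr3, time DESC, ...) lets LIMIT 1 stop
    -- at the first matching entry
    SELECT amount, time
    FROM updates
    WHERE account_id = c.account_id
      AND attr1 = c.attr1
      AND attr2 = c.attr2
      AND attr3 = c.attr3
    ORDER BY time DESC, originating_update_id DESC, update_id DESC
    LIMIT 1
) u
WHERE c.account_id = '1';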

The tables are structured as follows:

 Table "public.updates" Column | Type | Collation | Nullable | Default -----------------------+-----------------------------+-----------+----------+--------- account_id | text | | not null | attr1 | text | | not null | eid | bigint | | not null | as_of_eid | bigint | | not null | update_id | bigint | | not null | originating_update_id | bigint | | not null | time | timestamp without time zone | | not null | attr2 | integer | | not null | attr3 | text | | not null | amount | numeric | | not null | Indexes: "updates_pkey" PRIMARY KEY, btree (eid, as_of_eid, account_id, attr1) "updates_update_id_eid_account_id_attr1_key" UNIQUE CONSTRAINT, btree (update_id, eid, account_id, attr1) "updates_covering_idx2" btree (account_id, attr1, attr2, attr3, time DESC, update_id DESC, originating_update_id DESC, amount, eid) "updates_covering_idx3" btree (account_id, attr1, attr2, attr3, time DESC, originating_update_id DESC, update_id DESC, amount, eid) Table "public.entries" Column | Type | Collation | Nullable | Default -----------------------+-----------------------------+-----------+----------+--------- account_id | text | | not null | eid | bigint | | not null | time | timestamp without time zone | | | originating_update_id | bigint | | | Indexes: "entries2_pkey" PRIMARY KEY, btree (account_id, eid) "entries_by_time" btree (account_id, time, eid, originating_update_id) Table "public.combinations" Column | Type | Collation | Nullable | Default --------------+---------+-----------+----------+--------- account_id | text | | not null | attr1 | text | | not null | attr2 | integer | | not null | attr3 | text | | not null | Indexes: "combinations_pkey" PRIMARY KEY, btree (account_id, attr1, attr2, attr3) 

This is the query plan from the slow run with the offending clause:

 Sort  (cost=65859.13..65983.86 rows=49889 width=107) (actual time=14237.002..14239.136 rows=12967 loops=1)
   Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, ((updates.eid = a.for_eid))
   Sort Key: a.for_time, a.originating_update_id, a.for_eid, a.attr1, a.attr2, a.attr3
   Sort Method: quicksort  Memory: 2208kB
   Buffers: shared hit=849601
   CTE a
     ->  Nested Loop  (cost=0.83..1279.76 rows=49889 width=71) (actual time=0.035..16.200 rows=18102 loops=1)
           Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id, combo.attr1, combo.attr2, combo.attr3
           Buffers: shared hit=69
           ->  Index Only Scan using entries_by_time on public.entries  (cost=0.42..651.60 rows=7127 width=57) (actual time=0.021..1.895 rows=2586 loops=1)
                 Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id
                 Index Cond: ((entries.account_id = '1'::text) AND (entries.time <= '2017-11-30 15:57:35.267405'::timestamp without time zone) AND (entries.originating_update_id < 40581))
                 Heap Fetches: 0
                 Buffers: shared hit=65
           ->  Materialize  (cost=0.41..4.57 rows=7 width=50) (actual time=0.000..0.002 rows=7 loops=2586)
                 Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                 Buffers: shared hit=4
                 ->  Index Only Scan using combinations_pkey on public.combinations combo  (cost=0.41..4.53 rows=7 width=50) (actual time=0.010..0.012 rows=7 loops=1)
                       Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                       Index Cond: (combo.account_id = '1'::text)
                       Heap Fetches: 0
                       Buffers: shared hit=4
   ->  Nested Loop  (cost=0.69..57784.93 rows=49889 width=107) (actual time=0.117..14217.115 rows=12967 loops=1)
         Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, (updates.eid = a.for_eid)
         Buffers: shared hit=849601
         ->  CTE Scan on a  (cost=0.00..997.78 rows=49889 width=124) (actual time=0.037..30.353 rows=18102 loops=1)
               Output: a.account_id, a.for_time, a.for_eid, a.originating_update_id, a.attr1, a.attr2, a.attr3
               Buffers: shared hit=69
         ->  Limit  (cost=0.69..1.12 rows=1 width=39) (actual time=0.783..0.783 rows=1 loops=18102)
               Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
               Buffers: shared hit=849532
               ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..777.72 rows=1825 width=39) (actual time=0.782..0.782 rows=1 loops=18102)
                     Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
                     Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time) AND (updates.update_id < 40581))
                     Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
                     Rows Removed by Filter: 6
                     Heap Fetches: 0
                     Buffers: shared hit=849532
 Planning time: 0.446 ms
 Execution time: 14241.977 ms
(40 rows)

and the fast run without the clause:

 Sort  (cost=58914.11..59038.83 rows=49889 width=107) (actual time=792.649..794.853 rows=12970 loops=1)
   Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, ((updates.eid = a.for_eid))
   Sort Key: a.for_time, a.originating_update_id, a.for_eid, a.attr1, a.attr2, a.attr3
   Sort Method: quicksort  Memory: 2208kB
   Buffers: shared hit=175954
   CTE a
     ->  Nested Loop  (cost=0.83..1279.76 rows=49889 width=71) (actual time=0.036..15.278 rows=18102 loops=1)
           Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id, combo.attr1, combo.attr2, combo.attr3
           Buffers: shared hit=69
           ->  Index Only Scan using entries_by_time on public.entries  (cost=0.42..651.60 rows=7127 width=57) (actual time=0.022..1.675 rows=2586 loops=1)
                 Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id
                 Index Cond: ((entries.account_id = '1'::text) AND (entries.time <= '2017-11-30 15:57:35.267405'::timestamp without time zone) AND (entries.originating_update_id < 40581))
                 Heap Fetches: 0
                 Buffers: shared hit=65
           ->  Materialize  (cost=0.41..4.57 rows=7 width=50) (actual time=0.000..0.002 rows=7 loops=2586)
                 Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                 Buffers: shared hit=4
                 ->  Index Only Scan using combinations_pkey on public.combinations combo  (cost=0.41..4.53 rows=7 width=50) (actual time=0.011..0.014 rows=7 loops=1)
                       Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                       Index Cond: (combo.account_id = '1'::text)
                       Heap Fetches: 0
                       Buffers: shared hit=4
   ->  Nested Loop  (cost=0.69..50839.90 rows=49889 width=107) (actual time=0.115..774.342 rows=12970 loops=1)
         Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, (updates.eid = a.for_eid)
         Buffers: shared hit=175954
         ->  CTE Scan on a  (cost=0.00..997.78 rows=49889 width=124) (actual time=0.039..28.176 rows=18102 loops=1)
               Output: a.account_id, a.for_time, a.for_eid, a.originating_update_id, a.attr1, a.attr2, a.attr3
               Buffers: shared hit=69
         ->  Limit  (cost=0.69..0.98 rows=1 width=39) (actual time=0.040..0.040 rows=1 loops=18102)
               Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
               Buffers: shared hit=175885
               ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..808.79 rows=2820 width=39) (actual time=0.039..0.039 rows=1 loops=18102)
                     Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
                     Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time))
                     Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
                     Rows Removed by Filter: 47
                     Heap Fetches: 0
                     Buffers: shared hit=175885
 Planning time: 0.367 ms
 Execution time: 797.601 ms
(40 rows)

There are 65788426 rows in the updates table, and it has recently been vacuumed and analyzed. The database settings are the Amazon RDS defaults. I've renamed the tables and fields in the output above; everything else is unabridged.

Why would that particular clause cause such a slowdown? The EXPLAIN ANALYZE output is almost identical between the two. The main difference is the shared hit count.
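For scale, that works out to roughly 849532 / 18102 ≈ 47 shared-buffer hits per lateral index lookup in the slow plan versus 175885 / 18102 ≈ 10 in the fast one.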

Strangely, if I change the clause to one on some other field, e.g. originating_update_id < 100000000, the query doesn't slow down.

UPDATE

I've simplified the query as @jjanes suggested:

EXPLAIN (VERBOSE, ANALYZE, COSTS, BUFFERS)
SELECT time, update_id, amount, eid
FROM updates
WHERE account_id = '1'
  AND attr1 = ''
  AND attr2 = 3
  AND attr3 = 'ABC'
  AND update_id < 40581 -- THIS CLAUSE MAKES IT SLOW
  AND time <= '2017-12-30 15:57:35.267405+00'
  AND (time < '2017-12-30 15:57:35.267405+00'
       OR (time = '2017-12-30 15:57:35.267405+00' AND originating_update_id <= 10000000))
ORDER BY time DESC, originating_update_id DESC, update_id DESC
LIMIT 1;

With the offending clause it takes 35.5ms, reproducibly. Without that clause it takes 0.073ms.

Query plan with clause:

 Limit  (cost=0.69..0.82 rows=1 width=39) (actual time=35.467..35.468 rows=1 loops=1)
   Output: time, update_id, amount, eid, originating_update_id
   Buffers: shared hit=1856
   ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..731.25 rows=5423 width=39) (actual time=35.465..35.465 rows=1 loops=1)
         Output: time, update_id, amount, eid, originating_update_id
         Index Cond: ((updates.account_id = '1'::text) AND (updates.attr1 = ''::text) AND (updates.attr2 = 3) AND (updates.attr3 = 'ABC'::text) AND (updates.time <= '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.update_id < 40581))
         Filter: ((updates.time < '2017-12-30 15:57:35.267405'::timestamp without time zone) OR ((updates.time = '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.originating_update_id <= 10000000)))
         Heap Fetches: 0
         Buffers: shared hit=1856
 Planning time: 0.202 ms
 Execution time: 35.493 ms
(11 rows)

and without it:

 Limit  (cost=0.69..0.78 rows=1 width=39) (actual time=0.045..0.045 rows=1 loops=1)
   Output: time, update_id, amount, eid, originating_update_id
   Buffers: shared hit=6
   ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..762.10 rows=8383 width=39) (actual time=0.044..0.044 rows=1 loops=1)
         Output: time, update_id, amount, eid, originating_update_id
         Index Cond: ((updates.account_id = '1'::text) AND (updates.attr1 = ''::text) AND (updates.attr2 = 3) AND (updates.attr3 = 'ABC'::text) AND (updates.time <= '2017-12-30 15:57:35.267405'::timestamp without time zone))
         Filter: ((updates.time < '2017-12-30 15:57:35.267405'::timestamp without time zone) OR ((updates.time = '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.originating_update_id <= 10000000)))
         Heap Fetches: 0
         Buffers: shared hit=6
 Planning time: 0.201 ms
 Execution time: 0.073 ms
(11 rows)
  • @erwin-brandstetter hoping you might know Commented Nov 16, 2017 at 16:16
  • If you want a fast query, don't use a CTE unless you need it. Commented Nov 16, 2017 at 16:35
  • Is this timing fully reproducible? If you add and remove the condition multiple times do you get similar results each time? Commented Nov 16, 2017 at 17:11
  • @jjanes yes totally reproducible. Have been playing with these for days now and always get the same timings Commented Nov 16, 2017 at 17:16
  • @EvanCarrol I tried a subquery instead of a CTE and it was slower Commented Nov 16, 2017 at 17:17

2 Answers


This kind of makes sense if your data are distributed in a certain way. Without the update_id < 40581 condition, the very first entry in the index that matches account_id = '1' AND attr1 = '' AND attr2 = 3 AND attr3 = 'ABC' satisfies the LIMIT clause, because the entries are ordered appropriately. With the condition in place it has to go and find one row with a matching update_id, which may be way down the list.
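
A rough way to check that against the simplified query from your update (using its literals; it ignores the extra time/originating_update_id filter): the subquery below finds the row the slow query returns, and the outer count is how many index-ordered rows the scan had to step past to reach it.

SELECT count(*) AS rows_stepped_past
FROM updates
WHERE account_id = '1' AND attr1 = '' AND attr2 = 3 AND attr3 = 'ABC'
  AND time <= '2017-12-30 15:57:35.267405+00'
  -- rows whose (time, originating_update_id, update_id) tuple is greater than
  -- the first qualifying row's, i.e. the rows the DESC-ordered index scan
  -- visits and rejects before reaching a row with update_id < 40581
  AND (time, originating_update_id, update_id) >
      (SELECT time, originating_update_id, update_id
       FROM updates
       WHERE account_id = '1' AND attr1 = '' AND attr2 = 3 AND attr3 = 'ABC'
         AND time <= '2017-12-30 15:57:35.267405+00'
         AND update_id < 40581
       ORDER BY time DESC, originating_update_id DESC, update_id DESC
       LIMIT 1);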

  • You're right! To confirm it I changed update_id to a value higher than MAX(update_id) and the query ran as fast as the one without an update_id clause. Is there some way I can speed up the process of searching for the right update_id? For example, if it's sequentially searching them, could it be made to do binary search using the index? Commented Nov 21, 2017 at 10:51
  • Since many of your predicates are range-based rather than equality, binary search is only applicable for finding the beginning of the range; from there an index scan cannot be avoided. Table partitioning might be beneficial (a rough sketch follows these comments). Commented Nov 21, 2017 at 17:40
  • With LIMIT 1 the beginning of the range is what I'm after (well, the end, technically) Commented Nov 21, 2017 at 17:42
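
A rough sketch of that partitioning idea (hypothetical bound of 50000; assumes PostgreSQL 11+, where the index can be declared once on the partitioned parent). With the rows split by update_id range, a predicate such as update_id < 40581 prunes the scan to the old partition, where the first index entry matching the equality and time conditions already satisfies the update_id bound:

CREATE TABLE updates_part (LIKE updates) PARTITION BY RANGE (update_id);

CREATE TABLE updates_part_old PARTITION OF updates_part
    FOR VALUES FROM (MINVALUE) TO (50000);
CREATE TABLE updates_part_new PARTITION OF updates_part
    FOR VALUES FROM (50000) TO (MAXVALUE);

-- same shape as updates_covering_idx3, created on each partition automatically
CREATE INDEX ON updates_part (account_id, attr1, attr2, attr3,
    time DESC, originating_update_id DESC, update_id DESC, amount, eid);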

There is something really weird going on here. Let's compare the same index scan node from the two plans.

Slow:

 ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..777.72 rows=1825 width=39) (actual time=0.782..0.782 rows=1 loops=18102)
       Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
       Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time) AND (updates.update_id < 40581))
       Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
       Rows Removed by Filter: 6
       Heap Fetches: 0
       Buffers: shared hit=849532

Fast:

 ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..808.79 rows=2820 width=39) (actual time=0.039..0.039 rows=1 loops=18102)
       Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
       Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time))
       Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
       Rows Removed by Filter: 47
       Heap Fetches: 0
       Buffers: shared hit=175885

That's the condition that changes. In the fast one you'll see MORE rows removed by the filter despite the index access condition being less selective. That tells me that these two runs were not against the same data set, and so they may not be all that useful to compare.

Moreover, on the fast one you're getting fewer hits to the shared buffer and on both of them you're getting no heap fetches. That means that on the slow one you're having to access the shared buffers 4.8x (849532/175885) as much on this one index scan. Which means,

  1. Your data changed.
  2. The index got fragmented.

Run a VACUUM FULL on the table and tell me if the slow one does not become faster than the fast one.
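
Concretely, something like the following (the REINDEX is a narrower alternative I'd add in case only the index is bloated):

-- rewrites the table and all its indexes; takes an ACCESS EXCLUSIVE lock for the duration
VACUUM (FULL, VERBOSE, ANALYZE) updates;

-- narrower alternative: rebuild just the covering index
-- (still blocks writes to the table while it runs)
REINDEX INDEX updates_covering_idx3;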

  • Running VACUUM FULL now. Will report back when it's finished. It's unlikely to be that the data changed between runs because I've run both versions of this query many times over several days and the timings are always the same Commented Nov 16, 2017 at 17:19
  • Then you need to start diving into the source, or asking an index hacker as to why reducing the range on the index scan can result in more shared buffer hits. You may want to upgrade to 10. Commented Nov 16, 2017 at 17:20
  • VACUUM FULL has been running for 3 days now and still hasn't completed. I've rerun the queries nevertheless and they still exhibit the same massive time difference. Strangely, they're now doing lots of heap lookups Commented Nov 20, 2017 at 11:19
  • VACUUM FULL finally finished. It made no difference to the run times. Commented Nov 20, 2017 at 18:36
