I have a query that is being made 18x slower by the presence of a single WHERE clause. The query plan is an index-only scan and the clause in question looks at one of the fields in the index.
This is the query and the offending clause is update_id < 40581:
EXPLAIN (VERBOSE, ANALYZE, COSTS, BUFFERS)
WITH a AS (
    SELECT e_match.account_id, for_time, for_eid, originating_update_id,
           attr1, attr2, attr3
    FROM combinations combo
    INNER JOIN (
        SELECT account_id, time AS for_time, eid AS for_eid, originating_update_id
        FROM entries
        WHERE account_id = '1'
          AND time <= '2017-11-30 15:57:35.267405+00'
          AND originating_update_id < 40581
    ) e_match ON combo.account_id = e_match.account_id
)
SELECT a.attr1, a.attr2, a.attr3, b.amount, for_time, for_eid,
       originating_update_id, b.eid = a.for_eid AS is_stored
FROM a
CROSS JOIN LATERAL (
    SELECT time, update_id, amount, eid
    FROM updates
    WHERE account_id = a.account_id
      AND attr1 = a.attr1
      AND attr2 = a.attr2
      AND attr3 = a.attr3
      AND update_id < 40581 AND -- THIS CLAUSE MAKES IT SLOW
      time <= a.for_time
      AND (time < a.for_time
           OR (time = a.for_time AND originating_update_id <= a.originating_update_id))
    ORDER BY time DESC, originating_update_id DESC, update_id DESC
    LIMIT 1
) b
ORDER BY for_time, originating_update_id, for_eid, attr1, attr2, attr3;

It takes 14241.977 ms to run, but if I remove the update_id < 40581 clause it runs in 797.601 ms.
The query is relatively complex: it joins a table of entries with a table of all updates for those entries, in order to use the latest entry for each of the known combinations of attributes. This is the loose index scan emulation trick. According to the query plan, most of the time is spent in the SELECT on the updates table, and it is that time that varies between the two versions.
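For context, the emulation follows the usual pattern: enumerate the distinct groups once, then descend a second index once per group via a correlated LATERAL subquery with LIMIT 1. A minimal generic sketch (hypothetical table t(grp, val) with an index on (grp, val DESC); not the schema below):

```sql
-- Loose index scan emulation (generic sketch): one index descent per group
-- instead of scanning every row. Hypothetical table t(grp, val).
SELECT grps.grp, latest.val
FROM (SELECT DISTINCT grp FROM t) AS grps
CROSS JOIN LATERAL (
    SELECT val
    FROM t
    WHERE t.grp = grps.grp
    ORDER BY val DESC
    LIMIT 1              -- top row per group, served straight from the index
) AS latest;
```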
The tables are structured as follows:
                         Table "public.updates"
        Column         |            Type             | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
 account_id            | text                        |           | not null |
 attr1                 | text                        |           | not null |
 eid                   | bigint                      |           | not null |
 as_of_eid             | bigint                      |           | not null |
 update_id             | bigint                      |           | not null |
 originating_update_id | bigint                      |           | not null |
 time                  | timestamp without time zone |           | not null |
 attr2                 | integer                     |           | not null |
 attr3                 | text                        |           | not null |
 amount                | numeric                     |           | not null |
Indexes:
    "updates_pkey" PRIMARY KEY, btree (eid, as_of_eid, account_id, attr1)
    "updates_update_id_eid_account_id_attr1_key" UNIQUE CONSTRAINT, btree (update_id, eid, account_id, attr1)
    "updates_covering_idx2" btree (account_id, attr1, attr2, attr3, time DESC, update_id DESC, originating_update_id DESC, amount, eid)
    "updates_covering_idx3" btree (account_id, attr1, attr2, attr3, time DESC, originating_update_id DESC, update_id DESC, amount, eid)

                         Table "public.entries"
        Column         |            Type             | Collation | Nullable | Default
-----------------------+-----------------------------+-----------+----------+---------
 account_id            | text                        |           | not null |
 eid                   | bigint                      |           | not null |
 time                  | timestamp without time zone |           |          |
 originating_update_id | bigint                      |           |          |
Indexes:
    "entries2_pkey" PRIMARY KEY, btree (account_id, eid)
    "entries_by_time" btree (account_id, time, eid, originating_update_id)

         Table "public.combinations"
   Column   |  Type   | Collation | Nullable | Default
------------+---------+-----------+----------+---------
 account_id | text    |           | not null |
 attr1      | text    |           | not null |
 attr2      | integer |           | not null |
 attr3      | text    |           | not null |
Indexes:
    "combinations_pkey" PRIMARY KEY, btree (account_id, attr1, attr2, attr3)

This is the query plan from the slow run with the offending clause:
Sort  (cost=65859.13..65983.86 rows=49889 width=107) (actual time=14237.002..14239.136 rows=12967 loops=1)
  Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, ((updates.eid = a.for_eid))
  Sort Key: a.for_time, a.originating_update_id, a.for_eid, a.attr1, a.attr2, a.attr3
  Sort Method: quicksort  Memory: 2208kB
  Buffers: shared hit=849601
  CTE a
    ->  Nested Loop  (cost=0.83..1279.76 rows=49889 width=71) (actual time=0.035..16.200 rows=18102 loops=1)
          Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id, combo.attr1, combo.attr2, combo.attr3
          Buffers: shared hit=69
          ->  Index Only Scan using entries_by_time on public.entries  (cost=0.42..651.60 rows=7127 width=57) (actual time=0.021..1.895 rows=2586 loops=1)
                Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id
                Index Cond: ((entries.account_id = '1'::text) AND (entries.time <= '2017-11-30 15:57:35.267405'::timestamp without time zone) AND (entries.originating_update_id < 40581))
                Heap Fetches: 0
                Buffers: shared hit=65
          ->  Materialize  (cost=0.41..4.57 rows=7 width=50) (actual time=0.000..0.002 rows=7 loops=2586)
                Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                Buffers: shared hit=4
                ->  Index Only Scan using combinations_pkey on public.combinations combo  (cost=0.41..4.53 rows=7 width=50) (actual time=0.010..0.012 rows=7 loops=1)
                      Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                      Index Cond: (combo.account_id = '1'::text)
                      Heap Fetches: 0
                      Buffers: shared hit=4
  ->  Nested Loop  (cost=0.69..57784.93 rows=49889 width=107) (actual time=0.117..14217.115 rows=12967 loops=1)
        Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, (updates.eid = a.for_eid)
        Buffers: shared hit=849601
        ->  CTE Scan on a  (cost=0.00..997.78 rows=49889 width=124) (actual time=0.037..30.353 rows=18102 loops=1)
              Output: a.account_id, a.for_time, a.for_eid, a.originating_update_id, a.attr1, a.attr2, a.attr3
              Buffers: shared hit=69
        ->  Limit  (cost=0.69..1.12 rows=1 width=39) (actual time=0.783..0.783 rows=1 loops=18102)
              Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
              Buffers: shared hit=849532
              ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..777.72 rows=1825 width=39) (actual time=0.782..0.782 rows=1 loops=18102)
                    Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
                    Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time) AND (updates.update_id < 40581))
                    Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
                    Rows Removed by Filter: 6
                    Heap Fetches: 0
                    Buffers: shared hit=849532
Planning time: 0.446 ms
Execution time: 14241.977 ms
(40 rows)

and the fast run without the clause:
Sort  (cost=58914.11..59038.83 rows=49889 width=107) (actual time=792.649..794.853 rows=12970 loops=1)
  Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, ((updates.eid = a.for_eid))
  Sort Key: a.for_time, a.originating_update_id, a.for_eid, a.attr1, a.attr2, a.attr3
  Sort Method: quicksort  Memory: 2208kB
  Buffers: shared hit=175954
  CTE a
    ->  Nested Loop  (cost=0.83..1279.76 rows=49889 width=71) (actual time=0.036..15.278 rows=18102 loops=1)
          Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id, combo.attr1, combo.attr2, combo.attr3
          Buffers: shared hit=69
          ->  Index Only Scan using entries_by_time on public.entries  (cost=0.42..651.60 rows=7127 width=57) (actual time=0.022..1.675 rows=2586 loops=1)
                Output: entries.account_id, entries.time, entries.eid, entries.originating_update_id
                Index Cond: ((entries.account_id = '1'::text) AND (entries.time <= '2017-11-30 15:57:35.267405'::timestamp without time zone) AND (entries.originating_update_id < 40581))
                Heap Fetches: 0
                Buffers: shared hit=65
          ->  Materialize  (cost=0.41..4.57 rows=7 width=50) (actual time=0.000..0.002 rows=7 loops=2586)
                Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                Buffers: shared hit=4
                ->  Index Only Scan using combinations_pkey on public.combinations combo  (cost=0.41..4.53 rows=7 width=50) (actual time=0.011..0.014 rows=7 loops=1)
                      Output: combo.attr1, combo.attr2, combo.attr3, combo.account_id
                      Index Cond: (combo.account_id = '1'::text)
                      Heap Fetches: 0
                      Buffers: shared hit=4
  ->  Nested Loop  (cost=0.69..50839.90 rows=49889 width=107) (actual time=0.115..774.342 rows=12970 loops=1)
        Output: a.attr1, a.attr2, a.attr3, updates.amount, a.for_time, a.for_eid, a.originating_update_id, (updates.eid = a.for_eid)
        Buffers: shared hit=175954
        ->  CTE Scan on a  (cost=0.00..997.78 rows=49889 width=124) (actual time=0.039..28.176 rows=18102 loops=1)
              Output: a.account_id, a.for_time, a.for_eid, a.originating_update_id, a.attr1, a.attr2, a.attr3
              Buffers: shared hit=69
        ->  Limit  (cost=0.69..0.98 rows=1 width=39) (actual time=0.040..0.040 rows=1 loops=18102)
              Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
              Buffers: shared hit=175885
              ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..808.79 rows=2820 width=39) (actual time=0.039..0.039 rows=1 loops=18102)
                    Output: updates.time, updates.update_id, updates.amount, updates.eid, updates.originating_update_id
                    Index Cond: ((updates.account_id = a.account_id) AND (updates.attr1 = a.attr1) AND (updates.attr2 = a.attr2) AND (updates.attr3 = a.attr3) AND (updates.time <= a.for_time))
                    Filter: ((updates.time < a.for_time) OR ((updates.time = a.for_time) AND (updates.originating_update_id <= a.originating_update_id)))
                    Rows Removed by Filter: 47
                    Heap Fetches: 0
                    Buffers: shared hit=175885
Planning time: 0.367 ms
Execution time: 797.601 ms
(40 rows)

The updates table contains 65788426 rows and has recently been vacuumed and analyzed. The database settings are the Amazon RDS defaults. I've renamed the tables and fields in the output above; everything else is unabridged.
Why would that particular clause cause such a slowdown? The EXPLAIN ANALYZE output is almost identical between the two. The main difference is the shared hit count.
Strangely, if I change the clause to one on some other field, e.g. originating_update_id < 100000000, the query doesn't slow down.
UPDATE
I've simplified the query as @jjanes suggested:
EXPLAIN (VERBOSE, ANALYZE, COSTS, BUFFERS)
SELECT time, update_id, amount, eid
FROM updates
WHERE account_id = '1'
  AND attr1 = ''
  AND attr2 = 3
  AND attr3 = 'ABC'
  AND update_id < 40581 AND -- THIS CLAUSE MAKES IT SLOW
  time <= '2017-12-30 15:57:35.267405+00'
  AND (time < '2017-12-30 15:57:35.267405+00'
       OR (time = '2017-12-30 15:57:35.267405+00' AND originating_update_id <= 10000000))
ORDER BY time DESC, originating_update_id DESC, update_id DESC
LIMIT 1;

With the offending clause it takes 35.5 ms, reproducibly. Without that clause it takes 0.073 ms.
Query plan with clause:
Limit  (cost=0.69..0.82 rows=1 width=39) (actual time=35.467..35.468 rows=1 loops=1)
  Output: time, update_id, amount, eid, originating_update_id
  Buffers: shared hit=1856
  ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..731.25 rows=5423 width=39) (actual time=35.465..35.465 rows=1 loops=1)
        Output: time, update_id, amount, eid, originating_update_id
        Index Cond: ((updates.account_id = '1'::text) AND (updates.attr1 = ''::text) AND (updates.attr2 = 3) AND (updates.attr3 = 'ABC'::text) AND (updates.time <= '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.update_id < 40581))
        Filter: ((updates.time < '2017-12-30 15:57:35.267405'::timestamp without time zone) OR ((updates.time = '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.originating_update_id <= 10000000)))
        Heap Fetches: 0
        Buffers: shared hit=1856
Planning time: 0.202 ms
Execution time: 35.493 ms
(11 rows)

and without:
Limit  (cost=0.69..0.78 rows=1 width=39) (actual time=0.045..0.045 rows=1 loops=1)
  Output: time, update_id, amount, eid, originating_update_id
  Buffers: shared hit=6
  ->  Index Only Scan using updates_covering_idx3 on public.updates  (cost=0.69..762.10 rows=8383 width=39) (actual time=0.044..0.044 rows=1 loops=1)
        Output: time, update_id, amount, eid, originating_update_id
        Index Cond: ((updates.account_id = '1'::text) AND (updates.attr1 = ''::text) AND (updates.attr2 = 3) AND (updates.attr3 = 'ABC'::text) AND (updates.time <= '2017-12-30 15:57:35.267405'::timestamp without time zone))
        Filter: ((updates.time < '2017-12-30 15:57:35.267405'::timestamp without time zone) OR ((updates.time = '2017-12-30 15:57:35.267405'::timestamp without time zone) AND (updates.originating_update_id <= 10000000)))
        Heap Fetches: 0
        Buffers: shared hit=6
Planning time: 0.201 ms
Execution time: 0.073 ms
(11 rows)
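In case it helps anyone reproduce this without my data, here is an untested, self-contained sketch of what I suspect is the same shape of problem. It uses a hypothetical toy table where update_id is assigned in time order, as it is in my real data, so rows with small update_ids sit at the tail of a (time DESC, ...) index:

```sql
-- Untested repro sketch on a hypothetical toy table. update_id grows with
-- time, so small update_ids live at the far end of the descending index.
CREATE TABLE toy (t bigint NOT NULL, update_id bigint NOT NULL);
INSERT INTO toy SELECT i, i FROM generate_series(1, 1000000) AS g(i);
CREATE INDEX toy_idx ON toy (t DESC, update_id DESC);
VACUUM ANALYZE toy;

-- Fast: the first tuple in index order satisfies LIMIT 1.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM toy ORDER BY t DESC, update_id DESC LIMIT 1;

-- Suspected slow case: update_id < 40 can appear in the Index Cond, but
-- since update_id is not a leading index column it cannot decide where the
-- scan starts, so the scan would have to step over index tuples until the
-- first qualifying one near the end of the index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM toy WHERE update_id < 40
ORDER BY t DESC, update_id DESC LIMIT 1;
```

If this model is right, it would also explain why the buffer hit counts, not the plan shape, are the main difference between my two runs.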