Why the second query is so much faster than the first one?

Question

I have a query that look like this:

SELECT DISTINCT "articles"."id", "articles"."company_uuid", "articles"."status", "articles"."discount_reference", "articles"."created_at" --- more columns FROM "articles" INNER JOIN "customers" ON "customers"."rec" = 'f' AND "customers"."article_id" = "articles"."id" WHERE "articles"."type" = 'sending_request' AND "articles"."hidden" = 'f' AND "articles"."discount_reference" = 'Discount/737251623' AND "articles"."api_domain" = 'company' AND "articles"."status" IN ('completed', 'active') AND (customers.search_text ILIKE unaccent('%verb%')) ORDER BY authored_on DESC NULLS LAST LIMIT 20 OFFSET 0;

The first query is slow, but when I reran it is always faster. I do not see a clear answer when I compare plans. I have a gin trigram index for the customers search text.

https://explain.dalibo.com/plan/b11657f576699fa8

And second run

https://explain.dalibo.com/plan/g81h74b9g521g5e7

Is the difference in the IO & Buffers the source of the difference?
I am running on PostgreSQL 14 on RDS.

Please no images. Remember to provide plain text output from EXPLAIN (ANALYZE, BUFFERS, SETTINGS) for your next question. Consider instructions for Postgres performance questions. — Erwin Brandstetter
– Erwin Brandstetter, Commented May 17, 2023 at 1:27
@erwin-brandstetter I updated the question with the full plan. — Mio
– Mio, Commented May 17, 2023 at 8:10

Erwin Brandstetter · Accepted Answer · 2023-05-16 22:12:59Z

TL/DR: Cold cache.

If only the first execution is slow it's typically cold cache. And yours is just such a case.

As you can see under IO & Buffers in your image (already made you wonder), the fast query execution has only hits, no reads. Meaning, later executions benefit from the now populated cache.

Typically, more RAM (and more available cache for Postgres accordingly), faster storage, less wasteful DB design, and/or optimized queries help.

If it's just a matter of the very first query being slow, pre-warming the cache might do it. See:

Aside, this query might do better, avoiding the unnecessary and expensive DISTINCT. Does not remove the general issue of cold cache, though.

SELECT a.id, a.status FROM articles a WHERE a.status IN ('completed', 'active') AND EXISTS ( SELECT FROM customers c WHERE c.article_id = a.id AND c.ehr = 'f' AND c.search_text ILIKE unaccent('%nam%') ) ORDER BY a.created_at DESC NULLS LAST LIMIT 20 OFFSET 0;

That's assuming created_at is really articles.created_at, and articles.id is the PK.

It's probably more than just cold cache; 431 reads totalling just 3.4 MB can hardly account for the approximately 5.5 second difference, unless there's also some "cold storage" effect, this being RDS... — mustaccio
– mustaccio, Commented May 17, 2023 at 0:01
@mustaccio: You may have missed the 97 MB of reads in the bitmap heap scan. That's substantial. But yes, the query itself seems problematic, too. Reading 100 MB for just 20 rows of output ... Anyway, I am not digging deeper without proper information for performance questions. — Erwin Brandstetter
– Erwin Brandstetter, Commented May 17, 2023 at 1:26
If DISTINCT is over the primary key of one table and no columns of other tables then in theory it's not expensive, it should just get transformed into the above EXISTS code automatically (because of functional dependencies). Why that didn't happen I'm not sure. Possibly the final sort for the ORDER BY is anyway necessary so it wouldn't have helped. — Charlieface
– Charlieface, Commented May 17, 2023 at 13:24

Stack Exchange Network

Why the second query is so much faster than the first one?

1 Answer 1

Linked

Hot Network Questions

Why the second query is so much faster than the first one?

1 Answer 1

Linked

Related

Hot Network Questions