The following query finishes in 1.5s (which is ok, the table contains about 500M rows):
    explain (analyze, buffers)
    select sales.*
    from sales
    join product on (product.id = sales.productid)
    join date on (date.id = sales.dateid)
    where product.id = 24
    order by date.timestamp
    limit 200;

Query plan: http://explain.depesz.com/s/8Ix
Searching by product.name instead increases the runtime to a totally unacceptable 200s:

    explain (analyze, buffers)
    select sales.*
    from sales
    join product on (product.id = sales.productid)
    join date on (date.id = sales.dateid)
    where product.name = 'new00000006'
    order by date.timestamp
    limit 200;

Query plan: http://explain.depesz.com/s/0RfQ
Note that the product named 'new00000006' has id 24 (same id as in the fast query above). Proof:
    select name from product where id = 24;

        name
    -------------
     new00000006

Why does that query take more than 100 times longer than the first query?
Another interesting variation: instead of product.id = 24 (as in the first query), I use product.id = (select 24). This also takes 200s to run, and it results in the same bad query plan as the search by product.name:

    explain (analyze, buffers)
    select sales.*
    from sales
    join product on (product.id = sales.productid)
    join date on (date.id = sales.dateid)
    where product.id = (select 24)
    order by date.timestamp
    limit 200;

Query plan: http://explain.depesz.com/s/K3VO
The statistics table shows that product id 24 is "rare":
    select most_common_vals from pg_stats where tablename='sales' and attname='productid';

    {19,2,7,39,40,14,33,18,8,37,16,48,6,23,49,29,46,41,20,53,47,26,38,1,32,42,56,57,10,15,27,50,30,45,51,58,17,36,4,25,44,43,5,22,11,35,52,9,21,12,24,31,28,54,34,3,55,13}

    select most_common_freqs from pg_stats where tablename='sales' and attname='productid';

    {0.020225,0.020119,0.0201133,0.0201087,0.0201,0.0200903,0.0200843,0.020069,0.0200557,0.0200477,0.0200427,0.0200303,0.0200197,0.020019,0.020012,0.0200107,0.0200067,0.020006,0.019995,0.0199947,0.0199917,0.019986,0.019986,0.0199777,0.0199747,0.0199713,0.0199693,0.019969,0.019967,0.019962,0.0199607,0.0199603,0.01996,0.0199567,0.0199567,0.0199533,0.019952,0.019951,0.0199467,0.019944,0.019944,0.01993,0.0199297,0.0199257,0.0199223,0.0199143,0.01989,0.0198887,0.019883,0.0198747,6.7e-005,6e-005,5.9e-005,5.6e-005,5.46667e-005,5.43333e-005,5.13333e-005,4.96667e-005}

Product id 24 is the 51st entry in both lists, so it has a frequency of 6.7e-005 (it's a "new product"), while the 50 older products each have a frequency of about 0.02.
Statistics show that the first query plan (the one that runs in 1.5s) makes perfect sense. It uses the sales_productid_index to quickly find the sales of this product. Why isn't the same query plan used in the other two cases? It seems like statistics are ignored.
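To make the selectivity argument concrete, the two arrays can be paired up and turned into estimated row counts. This is only a sketch: the cast through text to int[] assumes productid is an integer column (as in the table definition), and pg_class.reltuples is the planner's own approximate row count, not an exact figure. With about 500M rows, a frequency of 6.7e-005 corresponds to roughly 33,500 sales rows, while a frequency of 0.02 corresponds to roughly 10 million.

```sql
-- Pair each most-common productid with its frequency and scale the
-- frequency by the planner's row estimate for the sales table.
select v.productid,
       v.freq,
       round((v.freq * c.reltuples)::numeric) as estimated_rows
from (
    -- most_common_vals is of type anyarray, so cast via text to int[]
    select unnest(most_common_vals::text::int[]) as productid,
           unnest(most_common_freqs)             as freq
    from pg_stats
    where schemaname = 'public'
      and tablename  = 'sales'
      and attname    = 'productid'
) v
cross join (
    -- reltuples is an estimate maintained by VACUUM / ANALYZE
    select reltuples
    from pg_class
    where oid = 'public.sales'::regclass
) c
order by v.freq;
```

The rarest products appear first, which makes it easy to compare the estimate for product id 24 with the row counts shown in the query plans.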
Table definitions (slightly obfuscated / renamed):
                              Table "public.sales"
      Column   |  Type   | Modifiers | Storage  | Stats target | Description
    -----------+---------+-----------+----------+--------------+-------------
     id        | uuid    | not null  | plain    |              |
     dateid    | integer |           | plain    | 10000        |
     productid | integer |           | plain    | 10000        |
     a         | text    |           | extended | 10000        |
     b         | integer |           | plain    | 10000        |
     x1        | boolean |           | plain    |              |
     x2        | boolean |           | plain    |              |
     x3        | boolean |           | plain    |              |
     x4        | boolean |           | plain    |              |
     x5        | boolean |           | plain    |              |
    Indexes:
        "sales_pkey" PRIMARY KEY, btree (id)
        "sales_a_index" btree (a)
        "sales_b_index" btree (b)
        "sales_dateid_index" btree (dateid)
        "sales_productid_index" btree (productid)
    Foreign-key constraints:
        "sales_dateid_fkey" FOREIGN KEY (dateid) REFERENCES date(id)
        "sales_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
    Has OIDs: no

                                            Table "public.product"
     Column |  Type   |                      Modifiers                       | Storage  | Stats target | Description
    --------+---------+------------------------------------------------------+----------+--------------+-------------
     id     | integer | not null default nextval('product_id_seq'::regclass) | plain    |              |
     name   | text    |                                                      | extended |              |
    Indexes:
        "product_pkey" PRIMARY KEY, btree (id)
        "product_name_index" UNIQUE, btree (name)
    Referenced by:
        TABLE "sales" CONSTRAINT "sales_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
        TABLE "salesaggr" CONSTRAINT "salesaggr_productid_fkey" FOREIGN KEY (productid) REFERENCES product(id)
    Has OIDs: no

Version: PostgreSQL 9.3.1, compiled by Visual C++ build 1600, 64-bit
Config: default configuration except for maintenance_work_mem, which has been increased to 1GB.
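For anyone who wants to verify a similar setup, one way to list the settings that differ from the built-in defaults is to query the pg_settings view. This is a generic sketch rather than output from the system described here; on a setup like this one it would be expected to show maintenance_work_mem plus whatever initdb wrote into postgresql.conf.

```sql
-- List every setting whose value does not come from the built-in default.
select name,
       current_setting(name) as value,
       source
from pg_settings
where source not in ('default', 'override')
order by name;
```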
Operating system: Microsoft Windows [Version 6.2.9200]
Amount and size of RAM installed: 32GB
Storage: a single 1TB SSD