Simple PostgreSQL lookup table is inexplicably slow

Question

I'm trying PostgreSQL 8.4.14 for storing triples, pieces of data of the form (String, String, String).
For speed, I'm not repeatedly storing strings but rather using two tables:

main table triples (subject BigInt, predicate BigInt, object BigInt)
lookup table entities (entityId BigInt, name Varying(40000))

I've added indexes and foreign keys on triples:

"nsubject_idx" hash (subject)
"npredicate_idx" btree (predicate)
"nobject_idx" hash (object)
"triples_subject_fkey" FOREIGN KEY (subject) REFERENCES entities(entityid)
"triples_predicate_fkey" FOREIGN KEY (predicate) REFERENCES entities(entityid)
"triples_object_fkey" FOREIGN KEY (object) REFERENCES entities(entityid)

and also indexes on entities:

"entities_pkey" PRIMARY KEY, btree (entityid) CLUSTER
"name_idx" hash (name)

Now it would be reasonable to assume that lookups are fast. They aren't.
With 151M elements in triples and 44M in entities, the following query is immensely slow:

SELECT entityId FROM entities WHERE entityId in (SELECT object FROM triples LIMIT 10000);

It's only 10.000 lookups, so I'd expect this to complete at high speed. Query plan:

 Nested Loop (cost=288.69..3856.26 rows=43806140 width=8) (actual time=25.226..40110.699 rows=6959 loops=1) -> HashAggregate (cost=288.69..290.69 rows=200 width=8) (actual time=19.445..24.087 rows=6959 loops=1) -> Limit (cost=0.00..163.69 rows=10000 width=8) (actual time=0.013..15.792 rows=10000 loops=1) -> Seq Scan on triples (cost=0.00..2474009.68 rows=151135968 width=8) (actual time=0.012..14.101 rows=10000 loops=1) -> Index Scan using entities_pkey on entities (cost=0.00..17.82 rows=1 width=8) (actual time=5.756..5.759 rows=1 loops=6959) Index Cond: (entities.entityid = triples.object) Total runtime: 40112.383 ms

What would be happening here?

Note that this is even a trick query: because of the foreign key constraint, it is actually equivalent to SELECT object FROM triples LIMIT 10000. For my use case, I'd need the actual lookup.

What do you see if you create a btree index on triples.object? Also, which PostgreSQL version do you use? — András Váczi
– András Váczi, Commented Oct 17, 2013 at 13:25
I added the btree index, removed the hash index, analyzed again but no noticeable difference. — Ruben Verborgh
– Ruben Verborgh, Commented Oct 17, 2013 at 19:47
Index lookups are reported to take 5.75ms, which indeed is slow. Try explain (analyze,buffers) select... to get information on how much is actually read from disk versus buffers. I would run vmstat 1 during the query to also see at which rate data is pulled from disk during these 40s and how much I/O-wait occurs. — Daniel Vérité
– Daniel Vérité, Commented Oct 18, 2013 at 15:27
@DanielVérité It's PostgreSQL 8, so unfortunately I can't do buffers. Here is the output of vmstat 1: pastebin.com/dLmuJmgY — Ruben Verborgh
– Ruben Verborgh, Commented Oct 18, 2013 at 16:47

bma · Accepted Answer · 2013-10-17 15:09:26Z

Rather than "VARCHAR(40000)" why not use "TEXT"?
HASH index use is discouraged (see docs at http://www.postgresql.org/docs/current/static/indexes-types.html).
Have you run "ANALYZE" on your tables before running the query?
Giant IN lists can be performance killers.

How do the following queries perform?

SELECT e.entityId FROM entities e INNER JOIN triples t ON (t.object = e.entityId) LIMIT 10000;

or

SELECT e.entityId FROM entities e WHERE EXISTS (SELECT 1 FROM triples t WHERE t.object = e.entityId LIMIT 10000);

1. I was told TEXT and VARCHAR are the same (stackoverflow.com/a/4849030/476820) 2. I understand, but are those reasons related to performance? Will try BTREE now. 3. I ran ANALYZE on both tables. 4. Those queries are equally slow unfortunately (actually, the last one performs worse). — Ruben Verborgh
– Ruben Verborgh, Commented Oct 17, 2013 at 15:42
TEXT and unconstrained VARCHAR are virtually the same, that is true. You have an upper bound on the VARCHAR length though, but I doubt that has any appreciable impact on your query. — bma
– bma, Commented Oct 17, 2013 at 16:05
No difference with BTREE for my query. But the first query you suggested goes way faster now! — Ruben Verborgh
– Ruben Verborgh, Commented Oct 17, 2013 at 19:47

Stack Exchange Network

Simple PostgreSQL lookup table is inexplicably slow

1 Answer 1

Hot Network Questions

Simple PostgreSQL lookup table is inexplicably slow

1 Answer 1

Related

Hot Network Questions