PostgreSQL OK to use HASH exclude constraint for uniqueness?

Question

Since hashes are smaller than lengthy text, it seems to me that they could be preferred to b-trees for ensuring column uniqueness.

For the sole purpose of ensuring uniqueness, is there any reason the following isn't OK in PG 10?

CREATE TABLE test ( path ltree, EXCLUDE USING HASH ((path::text) WITH =) );

I assume hash collisions are internally dealt with. Would be useless otherwise.

I will use a GiST index for query enhancement.

Erwin Brandstetter · Accepted Answer · 2017-12-26 10:50:49Z

I think quoting the manual on this sums it up:

Although it's allowed, there is little point in using B-tree or hash indexes with an exclusion constraint, because this does nothing that an ordinary unique constraint doesn't do better. So in practice the access method will always be GiST or SP-GiST.

All the more since you want to create a GiST index anyway. The exclusion constraint with USING GIST will create a matching GiST index as implementation detail automatically. No point in maintaining another, inefficient hash index not even being used in queries.

For simple uniqueness (WITH =) a plain UNIQUE btree index is more efficient. If your keys are very long, consider a unique index on a hash expression (using any immutable function) to reduce size. Like:

CREATE UNIQUE INDEX test_path_hash_uni_idx ON test (my_hash_func(path));

How does PostgreSQL enforce the UNIQUE constraint / what type of index does it use?

md5() would be a simple option as hash function.

Before Postgres 10 the use of hash indexes was discouraged. But with the latest updates, this has improved. Robert Haas (core developer) sums it up in a blog entry:

PostgreSQL's Hash Indexes Are Now Cool

CREATE INDEX test_path_hash_idx ON test USING HASH (path);

Alas (missed that in my draft), the access method "hash" does not support unique indexes, yet. So I would still go with the unique index on a hash expression above. (But no hash function that reduces information can fully guarantee uniqueness of the key - which may or may not be an issue.)

One note: hash indexes do not support UNIQUE, so I'll need to use a C hash function and non-unique b-tree index with a BEFORE INSERT TRIGGER that detects hash AND field equality.
@IamIC: md5() is just a simple (good) example. It's not perfect, the cryptographic component adds some cost that may not be needed ... Your hash function may very well be superior. The cost is not too important, since it's not re-executed in select queries, the index is based on the result.
With md5() (or other functions producing compatible results), consider casting to uuid like demonstrated in the linked answer.
The EXCLUDE USING HASH solution seems to have some advantages: 1. UNIQUE btree index on a fast and small hash function will cause false errors on collisions. 2. Practically collision-free hash functions might (?) cause noticeable slowdown 3. EXCLUDE creates a simple hash index, which can also be used directly by queries. In contrast, the btree index on the hash function will only be used by queries which explicitly do something like where md5(ltree1) = md5(ltree2) and ltree1 = ltree2. Edit: My comment was about the case where the values are too large for a plain UNIQUE btree.
@ErwinBrandstetter: Assuming hash indexes use the hash$type family of functions (not sure), it does seem do deal with collisions internally. I would also expect this behavior because we were semantically asking Postgres for exclusion on the = operator, not on the hashes.

Collectives™ on Stack Overflow

PostgreSQL OK to use HASH exclude constraint for uniqueness?

1 Answer 1

13 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

13 Comments

Linked

Related