
While reading the Chess Programming Wiki article, I came across the following:

The main purpose of Zobrist hash codes in chess programming is to get an almost unique index number for any chess position, with a very important requirement that two similar positions generate entirely different indices. These index numbers are used for faster and more space efficient Hash tables or databases, e.g. transposition tables and opening books.

It is not immediately obvious to me why it would be beneficial for similar positions to generate entirely different hash values. The article seems to imply that this somehow makes the hash table more efficient, but I don't see why that would be the case.
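For reference, here is a minimal sketch of how Zobrist keys are typically built: one random 64-bit number per (piece, square) pair, XORed together for every piece on the board. The piece/square encoding, the fixed seed and the names `ZOBRIST`, `zobrist_key` and `update_key_for_quiet_move` are hypothetical choices for illustration. Real engines also hash side to move, castling rights and en passant, which this sketch omits.

```python
import random

# Hypothetical encoding: piece index 0..11 (6 kinds x 2 colours), square index 0..63.
# Side to move, castling rights and en passant are omitted in this sketch.
NUM_PIECES = 12
NUM_SQUARES = 64

_rng = random.Random(2024)  # fixed seed so the table is reproducible
# One independent random 64-bit number per (piece, square) pair.
ZOBRIST = [[_rng.getrandbits(64) for _ in range(NUM_SQUARES)]
           for _ in range(NUM_PIECES)]


def zobrist_key(position):
    """Full key: XOR of the random numbers for every (piece, square) on the board.

    `position` is a dict mapping square index -> piece index.
    """
    key = 0
    for square, piece in position.items():
        key ^= ZOBRIST[piece][square]
    return key


def update_key_for_quiet_move(key, piece, from_sq, to_sq):
    """Incremental update for a non-capturing move (XOR is its own inverse)."""
    key ^= ZOBRIST[piece][from_sq]  # take the piece off its old square
    key ^= ZOBRIST[piece][to_sq]    # put it on its new square
    return key


# Usage: the incrementally updated key matches a full recomputation.
start = {4: 5, 12: 0, 60: 11}   # toy position, not necessarily legal
after = {4: 5, 28: 0, 60: 11}   # piece 0 moved from square 12 to square 28
assert update_key_for_quiet_move(zobrist_key(start), 0, 12, 28) == zobrist_key(after)
```

The incremental update is why engines like XOR here: a move only touches two table entries, regardless of how many pieces are on the board.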

  • Hash tables are most efficient when they avoid collisions, i.e. cases where multiple distinct inputs map to the same output; each collision adds an extra level of searching to resolve it. I am not a chess programmer, but my guess is that, since similar positions are likely to occur together when analysing a given position, you really want to avoid collisions between them for efficiency. But this is a guess; I'll leave it to somebody who has actually programmed this to answer. Commented Dec 12, 2024 at 7:49
  • It's a bit unclear what "entirely different" actually means. It is a property of many hashing algorithms that very similar inputs yield radically different outputs, but it's not clear whether that's what is meant here, or whether "entirely different" just means "definitely not the same". Commented Dec 13, 2024 at 4:11

2 Answers


If a small change in the input consistently produces a small change in the hash, then it becomes fairly likely that two small changes in the input will cancel out and produce no change to the hash. In other words, collisions under such a function tend to come from inputs that are close together (a small Hamming distance apart). That is an undesirable property if you're using the hash to accelerate tree search, where every position is reached by making incremental changes from a starting position.
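To make the cancellation concrete, here is a sketch comparing a deliberately poor "small change in, small change out" hash with Zobrist keys. `linear_hash` is a hypothetical weighted sum invented purely for this illustration; no real engine is implied to use it. Moving one piece up a square and an identical piece down a square changes the sum by +1 and -1, so two different positions end up with the same hash.

```python
import random

_rng = random.Random(2024)
# Zobrist table: 12 piece indices x 64 squares, one random 64-bit number each.
ZOBRIST = [[_rng.getrandbits(64) for _ in range(64)] for _ in range(12)]


def linear_hash(position):
    """Hypothetical poor hash: weighted sum of (piece, square) pairs.

    Moving a piece by one square changes the hash by a small, predictable
    amount, so two small moves can cancel out exactly.
    """
    return sum((piece + 1) * (square + 1) for square, piece in position.items()) % (1 << 64)


def zobrist_key(position):
    key = 0
    for square, piece in position.items():
        key ^= ZOBRIST[piece][square]
    return key


# Two pieces of the same type (index 0) on squares 8 and 15 ...
a = {8: 0, 15: 0}
# ... versus one moved up a square and the other moved down a square.
b = {9: 0, 14: 0}

print(linear_hash(a), linear_hash(b))    # 25 25: the two changes cancel
print(zobrist_key(a) == zobrist_key(b))  # almost certainly False with random 64-bit values
```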

On the other hand, if even a single-bit change in the input flips on average 50% of the hash bits (the avalanche behaviour expected of cryptographic hashes), then nearby inputs will be no more likely to collide than any other pair of inputs. The "distance between collisions" will therefore be large on average, which reduces the number of collisions you're likely to encounter in a highly structured scenario like searching a chess game tree.
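One way to check the "about 50% of bits" property for Zobrist keys is to measure it. For a quiet move, the key changes by exactly `ZOBRIST[piece][from] ^ ZOBRIST[piece][to]`, independent of the rest of the board, so it is enough to sample those XORs. This is a sketch under the same hypothetical 12 x 64 encoding as above; the seeds and trial count are arbitrary.

```python
import random

_rng = random.Random(7)
# Zobrist table: 12 piece indices x 64 squares, one random 64-bit number each.
ZOBRIST = [[_rng.getrandbits(64) for _ in range(64)] for _ in range(12)]


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def average_bits_flipped_by_one_move(trials=10_000, seed=1):
    """Average number of key bits that change when one piece makes a quiet move.

    For Zobrist hashing the change is exactly ZOBRIST[p][from] ^ ZOBRIST[p][to],
    whatever else is on the board, so sampling those XORs is sufficient.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        piece = rng.randrange(12)
        from_sq, to_sq = rng.sample(range(64), 2)  # two distinct squares
        total += popcount(ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq])
    return total / trials


print(average_bits_flipped_by_one_move())  # expected to be close to 32 of 64 bits
```

Because each table entry is an independent random 64-bit value, the XOR of two entries behaves like a fresh random value, which is why roughly half of the bits flip on every move.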


The emphasis on "entirely different indices" may come from how engines compute table indices from the Zobrist key: many of them use only a subset of the key's bits to index a transposition table entry very quickly.
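A minimal sketch of indexing a transposition table with a subset of key bits. The table size of `2**bits` slots, the choice of the low bits via a mask, and the always-replace policy are all illustrative assumptions; engines differ on each of them (some use the high bits, buckets, or depth-preferred replacement). The class name `TranspositionTable` is hypothetical.

```python
class TranspositionTable:
    """Minimal sketch of a transposition table indexed by the low bits of a 64-bit key.

    Slot count (2**bits), low-bit masking and always-replace are illustrative
    assumptions, not a description of any particular engine.
    """

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.slots = [None] * (1 << bits)

    def index(self, key):
        # Only the low `bits` bits of the key choose the slot.
        return key & self.mask

    def store(self, key, score):
        # Always-replace: a new entry overwrites whatever occupied the slot.
        self.slots[self.index(key)] = (key, score)

    def probe(self, key):
        entry = self.slots[self.index(key)]
        # Compare the full key so a different position sharing the slot is rejected.
        if entry is not None and entry[0] == key:
            return entry[1]
        return None


tt = TranspositionTable(bits=4)          # 16 slots, tiny for illustration
tt.store(0x1234_5678_0000_0005, 10)
print(tt.probe(0x1234_5678_0000_0005))   # 10
tt.store(0x9999_0000_0000_0005, -3)      # same low 4 bits, so same slot
print(tt.probe(0x1234_5678_0000_0005))   # None: the first entry was evicted
```

Since only the masked bits pick the slot, any structure in those low bits (for example, similar positions producing similar keys) translates directly into crowding of particular slots.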

If similar positions (e.g. one ply apart) produced similar keys, they would often map to the same transposition table slot. Entries would then be replaced very frequently during the search, because each new position would collide with a similar position searched shortly before, negating much of the performance benefit the transposition table is meant to provide.
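A rough experiment along these lines: generate every position within two "plies" of a small starting position and count how many distinct table slots (low 16 bits) they occupy under a hypothetical similarity-preserving hash versus Zobrist keys. The move generator is a toy (any piece may jump to any empty square; it is not legal chess), and `linear_hash`, `toy_moves`, `positions_within_two_plies` and `slots_used` are names made up for this sketch.

```python
import random

_rng = random.Random(2024)
# Zobrist table: 12 piece indices x 64 squares, one random 64-bit number each.
ZOBRIST = [[_rng.getrandbits(64) for _ in range(64)] for _ in range(12)]


def zobrist_key(position):
    key = 0
    for square, piece in position.items():
        key ^= ZOBRIST[piece][square]
    return key


def linear_hash(position):
    # Hypothetical "similar input -> similar output" hash, for comparison only.
    return sum((piece + 1) * (square + 1) for square, piece in position.items()) % (1 << 64)


def toy_moves(position):
    """Relocate any one piece to any empty square (toy generator, not legal chess)."""
    for sq, piece in position.items():
        for to in range(64):
            if to not in position:
                child = dict(position)
                del child[sq]
                child[to] = piece
                yield child


def positions_within_two_plies(root):
    """All distinct positions reachable from `root` in at most two toy moves."""
    seen = {frozenset(root.items())}
    for child in toy_moves(root):
        seen.add(frozenset(child.items()))
        for grandchild in toy_moves(child):
            seen.add(frozenset(grandchild.items()))
    return seen


def slots_used(positions, hash_fn, bits=16):
    """Number of distinct table slots (low `bits` bits of the hash) the positions map to."""
    mask = (1 << bits) - 1
    return len({hash_fn(dict(p)) & mask for p in positions})


root = {0: 0, 27: 2, 63: 1}  # three pieces of different types
positions = positions_within_two_plies(root)
print(len(positions), slots_used(positions, linear_hash), slots_used(positions, zobrist_key))
```

With this setup the linear hash can take at most a few hundred distinct values, so the roughly eleven thousand nearby positions pile into a few hundred slots and would keep evicting each other, while the Zobrist keys spread them across thousands of slots.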
