0
$\begingroup$

I have two 64-bit numbers a and b which have a few properties: only 49 out of the 64 bits are used (15 bits are always 0), and the two number's binary representations are disjoint (a & b == 0). I am looking for a good way to hash the two numbers together into a deterministic 64-bit integer.

Tabular hashing with randomly generated keys works wonderfully, but I was wondering if there was a faster non-iterative alternative. The target is to minimize the number of collisions between two unique sets of a and b.

$\endgroup$
5
  • $\begingroup$ (a & b == 0a | b == a ^ b == a + b) $\endgroup$ Commented Mar 22, 2024 at 6:49
  • $\begingroup$ What does it mean to say that "49 bits are used"? Please edit your post to clarify the meaning of that. $\endgroup$ Commented Mar 22, 2024 at 7:36
  • $\begingroup$ @D.W. It means that there are 15 bits in the 64 bit integer which are always zero in both a and b. I have edited my question to reflect that. $\endgroup$ Commented Mar 22, 2024 at 9:02
  • $\begingroup$ I still don't understand. How can they possibly be disjoint, as 49+49 > 64? Please rephrase "used" and state it more explicitly and precisely. Are you saying the Hamming weight of a is 49, and the Hamming weight of b is 49? Are the locations of those 15 bits fixed and known in advance? $\endgroup$ Commented Mar 22, 2024 at 21:00
  • $\begingroup$ What I meant was that the maximum number of set bits in either a or b is 49, i.e. there are 15 bits which are never set. The actual number of set bits can be lower than 49 but never more. $\endgroup$ Commented Mar 28, 2024 at 12:24

2 Answers 2

1
$\begingroup$

I don't know of a good way to use your special properties. But I do know a very fast provably good hash.

Choose three random 64-bit integers $x, y, z$ (which must be independent of your input $a, b$). Then the following function $h$:

$$h(a, b) = xa + \lfloor ya / 2^{64}\rfloor + yb + \lfloor zb / 2^{64}\rfloor \mod 2^{64}$$

is an $2^{-63}$-ADU (almost delta universal) hash function, per Short-output universal hash functions and their use in fast and secure data authentication by Long Hoang Nguyen and Andrew William Roscoe. That is, if $(a, b) \neq (a', b')$ then for any $\delta$ (including $\delta = 0$ which implies collision resistance) $$\Pr[h(a, b) + \delta \equiv h(a', b') \mod 2^{64}] \leq 2^{-63}.$$

Note that computing $\lfloor pq / 2^{64}\rfloor$ only takes a single instruction on modern CPUs - it is the high bits of the 64x64-bit product. On Intel x86-64 this is always returned by the MUL instruction which returns both parts, on ARM64 it is given by the MULHI instruction.

So in total computing this hash takes 4 multiplication and 3 addition instructions.

$\endgroup$
0
$\begingroup$

Normally comparing objects involves comparing subobjects that can be compared and hashed, you write a good hash function once that combines hash values and use that everywhere and never worry about the problem anymore.

In your case, you might calculate hash(a or b) or hash (a ^ b) or hash (a + b). However, six months from now your assumptions may become false, so if you suddenly have many pairs where a = b or b = a+1 then hash(a ^ b) will be an awfully bad hash value.

$\endgroup$
2
  • $\begingroup$ These properties are not subject to change. Also a | b is a bad operation for this since multiple sets of a and b can have the same a | b. $\endgroup$ Commented Mar 22, 2024 at 7:32
  • $\begingroup$ @RakLaptudirm multiple sets of a and b can have the same a | b collisions are inevitable mapping a bigger set to a smaller one. Do some hurt more than others? $\endgroup$ Commented Mar 22, 2024 at 18:36

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.