
What are good hashing functions (fast, good distribution, few collisions) for hashing 2d and 3d vectors composed of IEEE 32-bit floats? I assume general 3d vectors, but algorithms assuming normals (always in [-1, 1]) are also welcome. I also do not fear bit manipulation, as IEEE floats are always IEEE floats.

Another, more general problem is hashing an N-dimensional float vector, where N is quite small (3-12) and constant, but not known at compile time. At the moment I just reinterpret these floats as uints and XOR them together, which is probably not the best solution.
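For reference, a minimal sketch of that baseline (reinterpret the bits, XOR them together); the function names are just illustrative:

    #include <cstdint>
    #include <cstring>
    #include <cstddef>

    // View an IEEE 754 32-bit float as its raw bit pattern.
    inline uint32_t float_bits(float f) {
        uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }

    // Current approach: XOR the bit patterns of all N components together.
    // Plain XOR is symmetric, so permuted vectors (and equal components) collide.
    inline uint32_t xor_hash(const float* v, std::size_t n) {
        uint32_t h = 0;
        for (std::size_t i = 0; i < n; ++i)
            h ^= float_bits(v[i]);
        return h;
    }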

  • ...have you tested how well your hashes are being distributed using the plain XOR method? You might be surprised. Commented May 8, 2011 at 16:33
  • @Matti it seems the distribution at least for 3d vectors is not very bad (tested on Stanford bunny 35k verts against hash table of size 65537). I just thought somebody perhaps has a more specialized solution, as I searched the net some time ago and haven't found anything on the topic. Commented May 8, 2011 at 17:31
  • 65537 sounds like one greater than the number you might want to be using (or is a typo) Commented Sep 13, 2013 at 3:12
  • Related: Good way to hash a float vector? Commented Mar 27, 2014 at 10:13
  • @StevenLu: absolutely not. ++ a power of two is a good safe way to almost always get a prime number. Which is necessary to avoid modulo correlations, and as such, makes awesome hash table sizing. Commented Nov 20, 2014 at 2:35

3 Answers


There's a spatial hash function described in Optimized Spatial Hashing for Collision Detection of Deformable Objects. They use the hash function

hash(x, y, z) = (x p1 xor y p2 xor z p3) mod n

where p1, p2, p3 are large prime numbers, in our case 73856093, 19349663, 83492791, respectively. The value n is the hash table size.

In the paper, x, y, and z are the discretized coordinates; you could probably also use the binary values of your floats.
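For illustration (not code from the paper), the formula might be applied to float coordinates roughly like this; the cell size l and table size n here are placeholders you would choose yourself:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>

    // Sketch of the paper's hash for float input: snap each coordinate onto a
    // grid of cell size l, then mix the cell indices with the three primes.
    inline std::size_t spatial_hash(float x, float y, float z, float l, std::size_t n) {
        std::int64_t ix = static_cast<std::int64_t>(std::floor(x / l));
        std::int64_t iy = static_cast<std::int64_t>(std::floor(y / l));
        std::int64_t iz = static_cast<std::int64_t>(std::floor(z / l));
        return (static_cast<std::size_t>(ix * 73856093) ^
                static_cast<std::size_t>(iy * 19349663) ^
                static_cast<std::size_t>(iz * 83492791)) % n;
    }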


7 Comments

Note that 19349663 isn't prime (it's the product of 41 and 471943)
I found that using the prime numbers p1 and p3 for the two-dimensional case results in very good distributions.
When they wrote x p1 xor y p2 xor z p3, did they mean (x*p1) xor (y*p2) xor (z*p3) or x * (p1 xor y) * (p2 xor z) * p3?
@tuple_cat I believe it's (x*p1) xor (y*p2) xor (z*p3)
Very interesting! Is there any implementation around? I am trying to implement this with scipy/numpy. Thanks.

I have two suggestions.

  • Quantize the coordinates and hash the quantized values. If you don't do the quantization, the hash won't be sensitive to closeness (locality).

  • Locality Sensitive Hashing has been mentioned for hashing higher-dimensional vectors. Why not use it for 3d or 2d vectors as well? A variant of LSH adapted for the Euclidean distance metric (which is what we need for 2d and 3d vectors) is called Locality Sensitive Hashing using p-stable distributions. A very readable tutorial is here; a rough sketch of one such hash function follows below.
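A minimal sketch of a single p-stable (Gaussian) hash function for Euclidean distance, h(v) = floor((a·v + b) / w). The bucket width w, the seed handling, and the struct name are assumptions for illustration, not taken from the tutorial:

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // One p-stable LSH function: project onto a random Gaussian direction a,
    // add a random offset b in [0, w), and cut the line into buckets of width w.
    // Nearby vectors (in Euclidean distance) tend to land in the same bucket.
    struct PStableHash {
        std::vector<double> a;  // direction with i.i.d. N(0,1) entries (2-stable)
        double b;               // offset drawn uniformly from [0, w)
        double w;               // bucket width; smaller w means finer locality

        PStableHash(std::size_t dim, double w_, unsigned seed) : a(dim), w(w_) {
            std::mt19937 gen(seed);
            std::normal_distribution<double> normal(0.0, 1.0);
            std::uniform_real_distribution<double> uniform(0.0, w_);
            for (double& ai : a) ai = normal(gen);
            b = uniform(gen);
        }

        long long operator()(const std::vector<double>& v) const {
            double dot = 0.0;
            for (std::size_t i = 0; i < v.size(); ++i) dot += a[i] * v[i];
            return static_cast<long long>(std::floor((dot + b) / w));
        }
    };

In practice you would concatenate several of these (with different seeds) into one key to sharpen the locality.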



I wrote this in Python based on the comments seen here:

    import numpy as np

    l = 5                      # cell size used for discretization
    n = 5                      # hash table size
    p1, p2, p3 = 73856093, 19349663, 83492791

    x1 = [33, 4, 11]
    x2 = [31, 1, 14]
    x3 = [10, 44, 19]

    def spatial_hash(x):
        ix, iy, iz = np.floor(x[0] / l), np.floor(x[1] / l), np.floor(x[2] / l)
        return (int(ix * p1) ^ int(iy * p2) ^ int(iz * p3)) % n

    print(spatial_hash(x1))
    print(spatial_hash(x2))
    print(spatial_hash(x3))

It gives

    1
    1
    3

It seemed to work.

In C++

    #include <cstdlib>
    #include <iostream>
    #include <unordered_map>
    #include <vector>
    #include <random>
    #include <eigen3/Eigen/Dense>

    using namespace Eigen;
    using namespace std;

    const int HASH_SIZE = 200;   // hash table size (n)
    const float L = 0.2f;        // cell size used for discretization
    const float mmin = -1.f;     // range of the sampled coordinates (for reference)
    const float mmax = 1.f;

    // buckets: hash value -> points that landed in that bucket
    unordered_map<int, vector<Vector3d>> grid;

    // Discretize each coordinate (shifted to be non-negative) and combine the
    // cell indices with the three primes.
    inline size_t hasha(const Vector3d &p) {
        int ix = (unsigned int)((p[0] + 2.f) / L);
        int iy = (unsigned int)((p[1] + 2.f) / L);
        int iz = (unsigned int)((p[2] + 2.f) / L);
        return (unsigned int)((ix * 73856093) ^ (iy * 19349663) ^ (iz * 83492791)) % HASH_SIZE;
    }

    int main(int argc, char** argv) {
        std::default_random_engine generator;
        std::uniform_real_distribution<double> distribution(-1.0, 1.0);

        for (size_t i = 0; i < 300; i++) {
            double x = distribution(generator);
            double y = distribution(generator);
            double z = distribution(generator);
            Vector3d v(x, y, z);
            std::cout << hasha(v) << " " << v[0] << " " << v[1] << " " << v[2] << std::endl;
            grid[hasha(v)].push_back(v);
            std::cout << "size " << grid[hasha(v)].size() << std::endl;
        }

        // For every bucket, print the pairwise distances of the points it holds,
        // to eyeball how close the points sharing a bucket are.
        for (const auto & [key, value] : grid) {
            cout << key << std::endl;
            for (size_t i = 0; i < value.size(); i++) {
                for (size_t j = 0; j < value.size(); j++) {
                    if (i != j) {
                        std::cout << " dist " << (value[i] - value[j]).norm() << std::endl;
                    }
                }
            }
        }
        return 0;
    }

2 Comments

Hmm. These results seem wrong jsfiddle.net/uwg0ehs6
Or jsfiddle.net/spowz758/2 , very not correct
