0

I have a code that should get unique string(for example, "d86c52ec8b7e8a2ea315109627888fe6228d") from client and return integer more than 2200000000 and less than 5800000000. It's important, that this generated int is not random, it should be one for one unique string. What is the best way to generate it without using DB?

Now it looks like this:

did = "d86c52ec8b7e8a2ea315109627888fe6228d" min_cid = 2200000000 max_cid = 5800000000 cid = did.hash.abs.to_s.split.last(10).to_s.to_i if cid < min_cid cid += min_cid else while cid > max_cid cid -= 1000000000 end end 
2
  • There is no way to guarantee uniqueness of a number in a key space that is smaller than your value space. The best you can hope for is that the chance of collisions is relatively small. Also, your algorithm has some flaws in the sense that you introduce additional collision possibilities by adding 220000000 to a hash that is lower than your million, which would collide with something that maps to that new value. Commented Feb 28, 2012 at 16:40
  • Ok, it's only important to generate maximum set of specific numbers for specific devices, that wont change within time. Collisions are not a big problem, but less collisions, the better. Commented Feb 28, 2012 at 16:51

2 Answers 2

3

Here's the problem - your range of numbers has only 3.6x10^9 possible values where as your sample unique string (which looks like a hex integer with 36 digits) has 16^32 possible values (i.e. many more). So when mapping your string into your integer range there will be collisions.

The mapping function itself can be pretty straightforward, I would do something such as below (also, consider using only a part of the input string for integer conversion, e.g. the first seven digits, if performance becomes critical):

def my_hash(str, min, max) range = (max - min).abs (str.to_i(16) % range) + min end my_hash(did, min_cid, max_cid) # => 2461595789 

[Edit] If you are using Ruby 1.8 and your adjusted range can be represented as a Fixnum, just use the hash value of the input string object instead of parsing it as a big integer. Note that this strategy might not be safe in Ruby 1.9 (per the comment by @DataWraith) as object hash values may be randomized between invocations of the interpreter so you would not get the same hash number for the same input string when you restart your application:

def hash_range(obj, min, max) (obj.hash % (max-min).abs) + [min, max].min end hash_range(did, min_cid, max_cid) # => 3886226395 

And, of course, you'll have to decide what to do about collisions. You'll likely have to persist a bucket of input strings which map to the same value and decide how to resolve the conflicts if you are looking up by the mapped value.

Sign up to request clarification or add additional context in comments.

4 Comments

@Alexey Krasnoperov - Ruby has a built in construct which deals with such collisions. maerics is gently directing you to it with the name of his method.
Thank you for your answer, I used the second variant, It works good.
Note that #hash will not give the same result for the same input after a restart of the application (at least in 1.9), as #hash is randomized to prevent a certain class of DoS attack.
@AlexeyKrasnoperov: yikes! Note the comment by DataWraith. I've only tested on ruby 1.8 which does not randomize, so make sure that your version of ruby uses the same hash code for the same string between invocations of the interpreter. I'll update my answer per this information.
0

You could generate a 32-bit CRC, drop one bit, and add the result to 2.2M. That gives you a max value of 4.3M.
Alternately you could use all 32 bits of the CRC, but when the result is too large, append a zero to the input string and recalculate, repeating until you get a value in range.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.