• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • Ron McLeod
  • Devaka Cooray
  • Paul Clapham
Sheriffs:
  • paul wheaton
Saloon Keepers:
  • Tim Holloway
Bartenders:

Hashcodes - need a math whiz

 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Just a silly thing for a Tuesday night: I've always thought that using 0 as a hashcode for something like null seemed a bit "lazy".

15 seems like it might be better.

My rationale is this:
1. It's unlikely to be the result of a bug in the method.
2. It's smaller than the prime seed used for most hash calculations.
3. 2^32 + 15 (4294967311) is prime, and so can't (I think) be the result of a multiplication that results in an overflow.

Basically, I'm wondering if my 3rd assumption is correct. I get tied in knots when 2's complement signs get involved. Any math/binary whizzes out there?

Winston
 
Bartender
Posts: 1952
7
Eclipse IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I couldn't begin to answer this satisfactory, but all I "know" is that most hash calculations are based on prime value 31. Which is the value returned if all attributes included in the calculation are null. The value 31 was supposedly chosen, because it's prime and because way smarter people than I found that the value 31 yields a good distribution of values with few collisions. Supposedly. That's why I just stick to 31 as the base case in my hashCode() implementations, which may in turn end up as a factor in other hashCode() caluculations. If it's good enough for the smart people, it's good enough for me
 
Marshal
Posts: 28492
113
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:
3. 2^32 + 15 (4294967311) is prime, and so can't (I think) be the result of a multiplication that results in an overflow.

Basically, I'm wondering if my 3rd assumption is correct. I get tied in knots when 2's complement signs get involved. Any math/binary whizzes out there?



I'm a math whiz, and no, that isn't correct. Multiplication of signed 32-bit integers can be anything up to 2^62, just about. So if you're concerned about primes (and it's not clear to me why you are) then you'd have to be looking at a very large set of numbers of the form 2^32 * N + 15 -- and I can guarantee you that not all 2^30-ish of them are prime. (If N = 3 then it isn't prime, for example.)
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Clapham wrote:I'm a math whiz, and no, that isn't correct. Multiplication of signed 32-bit integers can be anything up to 2^62, just about.


Hol' on hoss. They then need to be truncated in order to fit back into a 32-bit integer.

My thinking was along these lines: If we find a number that is larger than 2^32 and prime (And the smallest number greater than 2^32 that is prime is 2^32 + 15), and work out what it will be when truncated (15), then the result of a multiplication that overflows can't be equal to it (I realise that lots of other ops might also be involved in a hash, but '*' is usually one of them, and we can certainly eliminate any results that are negative).

I've had a bit of time to think about it, and I'm pretty sure I'm right for positive integers, but plainly not for other combinations; what I'm not sure is if the choice of a prime helps for other combos. The only thing (I suspect) that helps is that prime numbers are odd which, given the nature of hash calculations is likely to give a 3:1 chance of 15 NOT being the answer to a multiplication.

As I say, Tuesday night nonsense; but you haven't convinced me I'm entirely wrong - at least not yet.

Winston
 
Bartender
Posts: 612
7
Mac OS X Python
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Back in the old days --- HP3000 - the data base used, used a hash code for all keys. The rule was to always make the capacity (which was the divisor for the hash code) a prime.

Very famous paper written about the database and the hashing (i.e.: how they got the db that fast ).

Anyway one point in the paper where they had demonstrated everything, but the prime capacity issue. So i wrote a utility that would "pre hash" everything for a range of keys. we ran it for millions and millions of iterations - prime capacity (or key bases) turned out to be the worst in most cases.

If i remember correctly:

integer keys worked best with large, odd, non-prime numbers.

character based actually did well with large powers of 2

and real keys (yes they used them) actually did very well with 31 and i think 79.

-steve
 
Steve Fahlbusch
Bartender
Posts: 612
7
Mac OS X Python
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
my last post was just for historical perspective - it may mean nothing with java.
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Steve Fahlbusch wrote:integer keys worked best with large, odd, non-prime numbers.
character based actually did well with large powers of 2
and real keys (yes they used them) actually did very well with 31 and i think 79.


I've heard of #1 and #3 (especially the magical '31', which is also often an optimized constant) on your list - not familiar with #2.

The only other thing I rather like about 15 is that the business of most hash algorithms I know is "bit shuffling", and 15 is 28 0's followed by 4 '1's.

But I'm enough of a mathematician to know that that don't mean nothin'

Winston
 
Paul Clapham
Marshal
Posts: 28492
113
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:

Paul Clapham wrote:I'm a math whiz, and no, that isn't correct. Multiplication of signed 32-bit integers can be anything up to 2^62, just about.


Hol' on hoss. They then need to be truncated in order to fit back into a 32-bit integer.



Right. That's what you're doing, isn't it? You are finding a prime number which truncates to 15 as a 32-bit integer.

My thinking was along these lines: If we find a number that is larger than 2^32 and prime (And the smallest number greater than 2^32 that is prime is 2^32 + 15), and work out what it will be when truncated (15), then the result of a multiplication that overflows can't be equal to it (I realise that lots of other ops might also be involved in a hash, but '*' is usually one of them, and we can certainly eliminate any results that are negative).



Which is backwards. I've just demonstrated that there are plenty of composite 64-bit numbers which truncate to 15 as a 32-bit integer. And since those numbers are composite, that means they are the product of smaller numbers. Not all of them will be the product of two smaller numbers which are both less than 2^32 but many of them will be. So there are two 32-bit numbers A and B whose product truncates to 15.

Why that's relevant to anything I still have no idea, but that's the math of it.
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Clapham wrote:Which is backwards. I've just demonstrated that there are plenty of composite 64-bit numbers which truncate to 15 as a 32-bit integer.


Gotcha (finally). Thanks.

I still reckon that an odd number has better odds of not being the product of two random numbers than an even one though.

Winston
 
Steve Fahlbusch
Bartender
Posts: 612
7
Mac OS X Python
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Easy way to find out -- do a chi squared analysis on odd vs even on your key generator.
 
Master Rancher
Posts: 5250
85
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Winston - you also seem to be assuming that the last operation in generating a hashCode() will be a multiplication. Often, it's an addition instead. At least, that's true for the formula used by Arrays.hashCode() and List.HashCode(), as well as for the alternate formula recommended by Josh Bloch in Effective Java. In both of these, the last operation is an addition of the last data member to be included (or the hash of that member). Seems like if that last member has a reasonably even distribution, it will wash out any previous propensity for even numbers or other patterns in the last few bits.
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Mike Simmons wrote:Winston - you also seem to be assuming that the last operation in generating a hashCode() will be a multiplication...


Yup, you're quite right, all of you. Sometimes these things just pop into my head and I have to get them purged. And thanks, Paul, for setting me straight; I was plainly having a "Homer moment".

Winston
 
Let's get him boys! We'll make him read this tiny ad!
The new gardening playing cards kickstarter is now live!
https://www.kickstarter.com/projects/paulwheaton/garden-cards
reply
    Bookmark Topic Watch Topic
  • New Topic