24

In the Javadoc for Object.hashCode() it states

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

It's a common miconception this has something to do with the memory address but it doesn't as that can change without notice and the hashCode() does not and must not change for an object.

@Neet Provided a link to a good answer https://stackoverflow.com/a/565416/57695 but I am looking for more details.


Here is an example to illustrate my concern

Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); for (int t = 0; t < 10; t++) { System.gc(); Object[] objects = new Object[10]; for (int i = 0; i < objects.length; i++) objects[i] = new Object(); for (int i = 0; i < objects.length; i++) { if (i > 0) System.out.print(", "); int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i); System.out.printf("%08x: hc= %08x", location, objects[i].hashCode()); } System.out.println(); } 

prints

eac00038: hc= 4f47e0ba, eac00048: hc= 2342d884, eac00058: hc= 7994d431, eac00068: hc= 19f71b53, eac00078: hc= 2e22f376, eac00088: hc= 789ddfa3, eac00098: hc= 44c58432, eac000a8: hc= 036a11e4, eac000b8: hc= 28bc917c, eac000c8: hc= 73f378c8 eac00038: hc= 30813486, eac00048: hc= 729f624a, eac00058: hc= 3dee2310, eac00068: hc= 5d400f33, eac00078: hc= 18a60d19, eac00088: hc= 3da5f0f3, eac00098: hc= 596e0123, eac000a8: hc= 450cceb3, eac000b8: hc= 4bd66d2f, eac000c8: hc= 6a9a4f8e eac00038: hc= 711dc088, eac00048: hc= 584b5abc, eac00058: hc= 3b3219ed, eac00068: hc= 564434f7, eac00078: hc= 17f17060, eac00088: hc= 6c08bae7, eac00098: hc= 3126cb1a, eac000a8: hc= 69e0312b, eac000b8: hc= 7dbc345a, eac000c8: hc= 4f114133 eac00038: hc= 50c8c3b8, eac00048: hc= 2ca98e77, eac00058: hc= 2fc83d89, eac00068: hc= 034005e1, eac00078: hc= 6041f871, eac00088: hc= 0b1df416, eac00098: hc= 5b83d60d, eac000a8: hc= 2c5a1e6b, eac000b8: hc= 5083198c, eac000c8: hc= 4f025f9f eac00038: hc= 00c5eb8a, eac00048: hc= 41eab16b, eac00058: hc= 1726099c, eac00068: hc= 4240eca3, eac00078: hc= 346fe350, eac00088: hc= 1db4b415, eac00098: hc= 429addef, eac000a8: hc= 45609812, eac000b8: hc= 489fe953, eac000c8: hc= 7a8f6d64 eac00038: hc= 7e628e42, eac00048: hc= 7869cfe0, eac00058: hc= 6aceb8e2, eac00068: hc= 29cc3436, eac00078: hc= 1d77daaa, eac00088: hc= 27b4de03, eac00098: hc= 535bab52, eac000a8: hc= 274cbf3f, eac000b8: hc= 1f9fd541, eac000c8: hc= 3669ae9f eac00038: hc= 772a3766, eac00048: hc= 749b46a8, eac00058: hc= 7e3bfb66, eac00068: hc= 13f62649, eac00078: hc= 054b8cdc, eac00088: hc= 230cc23b, eac00098: hc= 1aa3c177, eac000a8: hc= 74f2794a, eac000b8: hc= 5af92541, eac000c8: hc= 1afcfd10 eac00038: hc= 396e1dd8, eac00048: hc= 6c696d5c, eac00058: hc= 7d8aea9e, eac00068: hc= 2b316b76, eac00078: hc= 39862621, eac00088: hc= 16315e08, eac00098: hc= 03146a9a, eac000a8: hc= 3162a60a, eac000b8: hc= 4382f3da, eac000c8: hc= 4a578fd6 eac00038: hc= 225765b0, eac00048: hc= 17d5176d, eac00058: hc= 26f50154, eac00068: hc= 1f2a45c7, eac00078: hc= 104b1bcd, eac00088: hc= 330e3816, eac00098: hc= 6a844689, eac000a8: hc= 12330301, eac000b8: hc= 530a3ffc, eac000c8: hc= 45eee3fb eac00038: hc= 3f9432e0, eac00048: hc= 1a9830bc, eac00058: hc= 7da79447, eac00068: hc= 04f801c4, eac00078: hc= 363bed68, eac00088: hc= 185f62a9, eac00098: hc= 1e4651bf, eac000a8: hc= 1aa0e220, eac000b8: hc= 385db088, eac000c8: hc= 0ef0cda1 

As a side note; If you look at this code

if (value == 0) value = 0xBAD ; 

It appears that 0xBAD is twice as likely as normal as any hashCode as 0 is mapped to this value. If you run this long enough you see

long count = 0, countBAD = 0; while (true) { for (int i = 0; i < 200000000; i++) { int hc = new Object().hashCode(); if (hc == 0xBAD) countBAD++; count++; } System.out.println("0xBAD ratio is " + (double) (countBAD << 32) / count + " times expected."); } 

prints

0xBAD ratio is 2.0183116992481205 times expected. 
14
  • 1
    Has been asked before, see here: stackoverflow.com/questions/557574/… Commented Dec 13, 2012 at 12:52
  • 1
    @Neet Since the OP was one of the people answering that other question, I would tend to think he is asking something different. Commented Dec 13, 2012 at 12:52
  • 1
    @Peter okay, then nevermind :-). @Michael every object you pass around gets passed as a pointer ... otherwise the NullPointerException won't make sense^^ Commented Dec 13, 2012 at 12:56
  • 3
    The docs says "typically implemented". All the internals of the JVM don't need to make sense to the developer. It will work is all they tell you. So most likely when the object is instantiated it will have a unique identifier stored based from the address, which stays with the object wherever. Commented Dec 13, 2012 at 13:00
  • 1
    variants of this questions have been asked multiple times (usually regarding hashCode): stackoverflow.com/questions/4930781/… or this: stackoverflow.com/questions/10917731/… Commented Jan 10, 2013 at 19:48

4 Answers 4

23

This is clearly implementation-specific.

Below I include the Object.hashCode() implementation used in OpenJDK 7.

The function supports six different calculation methods, only two of which take any notice of the object's address (the "address" being the C++ oop cast to intptr_t). One of the two methods uses the address as-is, whereas the other does some bit twiddling and then mashes the result with an infrequently-updated random number.

Of the remaining methods, one returns a constant (presumably for testing), one returns sequential numbers, and the rest are based on pseudo-random sequences.

It would appear that the method can be chosen at runtime, and the default seems to be method 0, which is os::random(). The latter is a linear congruential generator, with an alleged race condition thrown in. :-) The race condition is acceptable because at worst it would result in two objects sharing the same hash code; this does not break any invariants.

The computation is performed the first time a hash code is required. To maintain consistency, the result is then stored in the object's header and is returned on subsequent calls to hashCode(). The caching is done outside this function.

In summary, the notion that Object.hashCode() is based on the object's address is largely a historic artefact that has been obsoleted by the properties of modern garbage collectors.

// hotspot/src/share/vm/runtime/synchronizer.hpp // hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: // 2654435761 = 2^32 * Phi (golden ratio) // HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result // in undesirable regularity in the hashCode values of adjacent objects // (objects allocated back-to-back, in particular). This could potentially // result in hashtable collisions and reduced hashtable efficiency. // There are simple ways to "diffuse" the middle address bits over the // generated hashCode values: // static inline intptr_t get_next_hash(Thread * Self, oop obj) { intptr_t value = 0 ; if (hashCode == 0) { // This form uses an unguarded global Park-Miller RNG, // so it's possible for two threads to race and generate the same RNG. // On MP system we'll have lots of RW access to a global, so the // mechanism induces lots of coherency traffic. value = os::random() ; } else if (hashCode == 1) { // This variation has the property of being stable (idempotent) // between STW operations. This can be useful in some of the 1-0 // synchronization schemes. intptr_t addrBits = intptr_t(obj) >> 3 ; value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; } else if (hashCode == 2) { value = 1 ; // for sensitivity testing } else if (hashCode == 3) { value = ++GVars.hcSequence ; } else if (hashCode == 4) { value = intptr_t(obj) ; } else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation -- we'll // likely make this the default in future releases. unsigned t = Self->_hashStateX ; t ^= (t << 11) ; Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ; unsigned v = Self->_hashStateW ; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask; if (value == 0) value = 0xBAD ; assert (value != markOopDesc::no_hash, "invariant") ; TEVENT (hashCode: GENERATE) ; return value; } 
Sign up to request clarification or add additional context in comments.

7 Comments

I was browsing the source but could not find it - what file is that in?
hotspot/src/share/vm/runtime/synchronizer.hpp
Thanks - for the record, link to openjdk7: hg.openjdk.java.net/jdk7/jdk7/hotspot/file/9b0ca45cd756/src/… - it does not seem to have changed much...
Any idea which one it uses, or how you might change it?
@PeterLawrey: It looks like hashCode is a runtime flags (not yet sure how to set it). The default seems to be 0.
|
5

It typically is the memory address of the object. However, the first time the hashcode method is called on an object, the integer is stored in the header of that object so that the next invocation will return the same value (as you say, compacting garbage collection can change the address). To my knowledge that is how it is implemented in the Oracle JVM.

EDIT: digging down into the JVM source code, this is what shows up (synchronizer.cpp):

// hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: // 2654435761 = 2^32 * Phi (golden ratio) // HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result // in undesirable regularity in the hashCode values of adjacent objects // (objects allocated back-to-back, in particular). This could potentially // result in hashtable collisions and reduced hashtable efficiency. // There are simple ways to "diffuse" the middle address bits over the // generated hashCode values: // static inline intptr_t get_next_hash(Thread * Self, oop obj) { intptr_t value = 0 ; if (hashCode == 0) { // This form uses an unguarded global Park-Miller RNG, // so it's possible for two threads to race and generate the same RNG. // On MP system we'll have lots of RW access to a global, so the // mechanism induces lots of coherency traffic. value = os::random() ; } else if (hashCode == 1) { // This variation has the property of being stable (idempotent) // between STW operations. This can be useful in some of the 1-0 // synchronization schemes. intptr_t addrBits = intptr_t(obj) >> 3 ; value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; } else if (hashCode == 2) { value = 1 ; // for sensitivity testing } else if (hashCode == 3) { value = ++GVars.hcSequence ; } else if (hashCode == 4) { value = intptr_t(obj) ; } else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation -- we'll // likely make this the default in future releases. unsigned t = Self->_hashStateX ; t ^= (t << 11) ; Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ; unsigned v = Self->_hashStateW ; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask; if (value == 0) value = 0xBAD ; assert (value != markOopDesc::no_hash, "invariant") ; TEVENT (hashCode: GENERATE) ; return value; } 

So 6 different ways to do it in the Oracle JVM, one of them is equivalent to what I said... The value is stored in the header of the object by the method calling get_next_hash (that one is called FastHashCode and is called from the native version of Object.hashCode().

4 Comments

Objects are first created in the eden space, and everytime a minor/full GC is run the eden space is reused with the same addresses. If the hashCode used the memory address they would repeat a lot.
Whether it is the address depends on which of the 6 hashcode algorithms is chosen. Really, all that is needed is a way to generate integers that do not overlap too much and do it fast...
I don't really know why so many people are voting me down for my answer. I will leave this discussion now. I have better things to do than fighting over who found some random source code first...
@MathiasSchwarz, It is possible that they decided to upvote NPE answer and remove the vote thay have given to you. It is not about who found example first but how does it added a value to it. You just posted to code NPE pointed some thing out.
2

IMHO, although implementation dependent, Object references in JVM are never actual memory addresses, but internal JVM references that point to actual addresses. These internal references ARE generated initially based on memory address, but they remain associated with the object until it is discarded.

I say this because Java HotSpot GCs are copying collectors of some form or the other, and they work by traversing live objects and copying them from one heap to the other, destroying the old heap subsequently. So when GC occurs, the JVM does not have to update the references in all objects, but just change the actual memory address mapped to the internal reference.

Comments

0

The Javadoc's suggestion for Object.hashCode() that it is implemented by deriving the hashcode from the memory address is simply outdated.

Probably nobody bothered (or noticed) that this implementation path is no longer possible whith a copying garbage collector (Since it would change the hashcode when the Object is copied to another memory location).

It made a lot of sense to implement hashcode that way before there was a copying garbage collector, because it would save heap space. A non-copying GC (CMS) may still implement hashcode that way today.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.