77

Typically the default implementation of Object.hashCode() is some function of the allocated address of the object in memory (though this is not mandated by the Java Language Specification). Given that the VM shunts objects about in memory, why does the value returned by System.identityHashCode() never change during the object's lifetime?

If it is a "one-shot" calculation (the object's hashCode is calculated once and stashed in the object header or something), then does that mean it is possible for two objects to have the same identityHashCode (if they happen to be first allocated at the same address in memory)?

6
  • 1
    Related question: Is that memory address a real memory address or something virtual that can stay fixed even as the object gets shuffled about? If virtual, that would be nice because the pointers to it would not need to be adjusted. On the other hand, this would mean an extra indirection and a potentially big mapping table. Commented Jun 30, 2009 at 11:06
  • 3
    It's a slight rearrangement of the address when first requested. (Returning a hash code with low bits all zero isn't great.) Commented Jun 30, 2009 at 11:14
  • Actually, where does it say that the identityHashCode must never change? The JavaDoc for System.identityHashCode is not clear on that. Commented Jun 30, 2009 at 11:32
  • Of course, if identityHashCode did change, you could only use objects that override hashCode() as keys in hash tables. Commented Jun 30, 2009 at 12:06
  • 3
    Okay, got it: "Whenever (hashCode) is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified." And equals in this case is object identity comparison. Commented Jul 1, 2009 at 1:15

5 Answers

45

Modern JVMs save the value in the object header. I believe the value is typically calculated only on first use, in order to keep the time spent in object allocation to a minimum (sometimes as few as a dozen cycles). The common Sun JVM can be compiled so that the identity hash code is always 1 for all objects.

Multiple objects can have the same identity hash code. That is the nature of hash codes.
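To illustrate the "calculated once, then stored" behaviour, here is a minimal sketch that checks the identity hash of one object before and after several garbage collection requests. The class and method names are made up for illustration, and System.gc() is only a hint, so the object is not guaranteed to actually move; the point is merely that the reported value stays the same either way.

```java
public class IdentityHashStability {

    /**
     * Returns true if the identity hash code of a fresh object is unchanged
     * after each of the requested GC rounds.
     */
    static boolean stableAcrossGc(int rounds) {
        Object o = new Object();
        // First request: this is the point at which a lazy JVM would compute the value.
        int first = System.identityHashCode(o);
        for (int i = 0; i < rounds; i++) {
            byte[] garbage = new byte[1 << 20]; // create some short-lived garbage
            System.gc();                        // a hint only; the JVM may ignore it
            if (System.identityHashCode(o) != first || garbage.length == 0) {
                return false;
            }
        }
        // For a plain Object, hashCode() is not overridden, so both calls agree.
        return o.hashCode() == first;
    }

    public static void main(String[] args) {
        System.out.println("stable across GC: " + stableAcrossGc(5));
    }
}
```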


6 Comments

Right - I've just looked through ObjectSynchronizer::FastHashCode in synchronizer.cpp (VM runtime source code), and after generating the hash code, it looks like it merges it into the object header. There appear to be several possible implementations of the hash code; the one you allude to that returns 1 for all objects is used to ensure no part of the VM assumes hash codes are unique for any reason.
public static native int identityHashCode(Object x); is a native method. Are you able to explain it from the perspective of the native (C++) implementation? It is mainly used in IdentityHashMap, right?
@Tom What do you mean by object header? You also wrote " I believe the value is typically calculated only on first use in order to keep object allocation to a minimum (sometimes down to as low as a dozen cycles)." Can you explain which object allocation you are referring to here?
@Geek I meant the execution time spent allocating an object is kept to a minimum (I have clarified the text). Every object (including arrays) in a typical Java implementation will start with some bytes indicating the runtime type, the monitor for intrinsic locking, possibly GC-related bits and the identity hash code. Actual details may be quite complicated because it needs to be heavily optimised.
@Lil Identity and monitors on objects are rarely used, yet they are still always there. This severely hampers the JVM, but there you go. Where are you proposing the header be expanded to? Stop the machine and track down every incoming reference for every object so used? / You are right, in that typically a few bits short of four bytes will be used for the hash code. Some implementations may do peculiar things, such as copying the hash out onto the stack during synchronisation to make more room for nice contended lock behaviour. No need for the downvote, IMO.
18

In answer to the second question, irrespective of the implementation, it is possible for multiple objects to have the same identityHashCode.

See bug 6321873 for a brief discussion on the wording in the javadoc, and a program to demonstrate non-uniqueness.
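A program along those lines is easy to write. The following is a sketch of my own, not the program from the bug report: it keeps allocating objects, holds on to all of them so they are alive at the same time, and stops as soon as two distinct objects report the same identity hash code. The class name, method name and the object limit are arbitrary choices; how quickly a collision turns up (if at all within the limit) depends on the JVM.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityHashCollision {

    /**
     * Allocates up to maxObjects objects and returns the first pair of distinct,
     * simultaneously live objects with equal identity hash codes, or null if
     * none was found within the limit.
     */
    static Object[] findCollision(int maxObjects) {
        // The map keeps every object strongly reachable, so none can be collected.
        Map<Integer, Object> seen = new HashMap<>();
        for (int i = 0; i < maxObjects; i++) {
            Object candidate = new Object();
            Object previous = seen.put(System.identityHashCode(candidate), candidate);
            if (previous != null) {
                return new Object[] { previous, candidate };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object[] pair = findCollision(2_000_000);
        if (pair == null) {
            System.out.println("No collision found within the limit");
        } else {
            System.out.println("Distinct objects: " + (pair[0] != pair[1])
                    + ", shared identity hash: " + System.identityHashCode(pair[0]));
        }
    }
}
```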

3 Comments

True. Two different objects can have the same hashCode. That is the case with all hash functions (over a domain bigger than their result size).
@Thilo: The JVM could have been written in such a fashion as to guarantee that, provided there are never more than four billion objects in existence at once, identityHashCode would never return a value which had been returned for any other object which is still in existence. Depending upon how the memory manager is implemented, this could be expensive, or it might add zero additional cost. For example, an Object could contain an index into a table of pointers, with each object being immutably assigned a table slot for as long as it exists. Typical JVM implementations don't do that...
...but some other "handle-based" memory-management schemes do, so it may be worthwhile to document that the JVM essentially picks an arbitrary number the first time an object is asked for its identity hash code, and then stores it for later use. [By the way, I don't recall ever reading anything that officially documents whether identityHashCode is thread-safe. If an object's hash code has never been retrieved, is there any guarantee that simultaneous "first" calls to identityHashCode on that object will yield the same value?]
2

The header of an object in HotSpot consists of a class pointer and a "mark" word.

The source code of the data structure for the mark word can be found in the markOop.hpp file. In this file there is a comment describing the memory layout of the mark word:

hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)

Here we can see that the identity hash code for normal Java objects on a 32-bit system is saved in the mark word and is 25 bits long.
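To make the bit layout concrete, here is a small sketch that packs and unpacks a 32-bit value using exactly the field widths from that comment (hash:25, age:4, biased_lock:1, lock:2, with the hash in the highest bits). This is plain bit arithmetic on an int, not HotSpot code; the class and method names are hypothetical, and the field values used in main are arbitrary.

```java
public class MarkWordLayout {
    // Field widths taken from the layout comment quoted above.
    static final int LOCK_BITS = 2;
    static final int BIASED_LOCK_BITS = 1;
    static final int AGE_BITS = 4;
    static final int HASH_BITS = 25;

    // Shifts follow from the widths: lock is lowest, hash is highest.
    static final int BIASED_LOCK_SHIFT = LOCK_BITS;                   // 2
    static final int AGE_SHIFT = BIASED_LOCK_SHIFT + BIASED_LOCK_BITS; // 3
    static final int HASH_SHIFT = AGE_SHIFT + AGE_BITS;                // 7

    /** Packs the four fields into one 32-bit word, truncating each to its width. */
    static int pack(int hash, int age, int biasedLock, int lock) {
        return ((hash & ((1 << HASH_BITS) - 1)) << HASH_SHIFT)
                | ((age & ((1 << AGE_BITS) - 1)) << AGE_SHIFT)
                | ((biasedLock & 1) << BIASED_LOCK_SHIFT)
                | (lock & ((1 << LOCK_BITS) - 1));
    }

    static int hash(int mark) { return mark >>> HASH_SHIFT; }
    static int age(int mark)  { return (mark >>> AGE_SHIFT) & ((1 << AGE_BITS) - 1); }
    static int lock(int mark) { return mark & ((1 << LOCK_BITS) - 1); }

    public static void main(String[] args) {
        int mark = pack(0x1ABCDEF, 5, 0, 1); // arbitrary example values
        System.out.printf("hash=0x%X age=%d lock=%d%n", hash(mark), age(mark), lock(mark));
    }
}
```

One consequence of this layout is that at most 2^25 distinct hash values fit, so collisions between live objects are unavoidable once enough objects have been asked for their hash.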


0

The general guideline for implementing a hashing function is:

  • the same object should return a consistent hashCode; it should not change with time or depend on any variable information (e.g. an algorithm seeded by a random number, or values of mutable member fields)
  • the hash function should have a good random distribution, and by that I mean if you consider the hash codes as buckets, two objects should map to different buckets (hash codes) as far as possible. The possibility that two objects would have the same hash code should be rare, although it can happen.
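The first guideline can be sketched as follows. MutableKey and IdentityKey are hypothetical classes written for this example: the first derives its hashCode from a mutable field, the second inherits the identity-based hashCode from Object. With the HashSet in OpenJDK, I would expect the mutated MutableKey to no longer be found, while the IdentityKey stays findable because its hash code does not depend on its fields.

```java
import java.util.HashSet;
import java.util.Set;

public class MutableKeyDemo {

    /** hashCode depends on a mutable field: violates the consistency guideline. */
    static final class MutableKey {
        int value;
        MutableKey(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value == value;
        }
        @Override public int hashCode() { return Integer.hashCode(value); }
    }

    /** Inherits Object's identity-based equals and hashCode. */
    static final class IdentityKey {
        int value;
        IdentityKey(int value) { this.value = value; }
    }

    /** Adds a key, mutates it, then asks the set whether it still contains it. */
    static boolean mutableKeyFoundAfterMutation() {
        Set<MutableKey> set = new HashSet<>();
        MutableKey key = new MutableKey(1);
        set.add(key);
        key.value = 2; // hashCode() now differs from the one used at insertion
        return set.contains(key);
    }

    static boolean identityKeyFoundAfterMutation() {
        Set<IdentityKey> set = new HashSet<>();
        IdentityKey key = new IdentityKey(1);
        set.add(key);
        key.value = 2; // identity hash code is unaffected by field changes
        return set.contains(key);
    }

    public static void main(String[] args) {
        System.out.println("mutable key found:  " + mutableKeyFoundAfterMutation());
        System.out.println("identity key found: " + identityKeyFoundAfterMutation());
    }
}
```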


-6

As far as I know, this is implemented to return the reference, which never changes during an object's lifetime.

1 Comment

So you are saying that the reference is not a real memory address (or directly derived from that). So is it a sort of a pointer to the real memory address?
