
I encountered an interview question today related to Java's HashMaps and I'm seeking clarification on a few points regarding the time complexity of certain operations and key comparisons in Java 8 and later versions.

Given the following code snippet, what is the time complexity of the map.get(new Key(1)); line, considering Java 8's improvements to HashMap?

import java.util.*;

class Key {
    int v;

    public Key(int v) {
        this.v = v;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Key key = (Key) o;
        return v == key.v;
    }

    @Override
    public int hashCode() {
        return 0; // always a hash collision
    }
}

class Solution {
    public static void main(String[] args) {
        int n = 1_000_000;
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            Key key = new Key(i);
            map.put(key, i);
        }
        map.get(new Key(1)); // What's the time complexity of this line?
    }
}

I initially answered O(log n), mentioning that Java 8 automatically converts lengthy linked lists in buckets into Red-Black Trees for efficiency.

Q1: However, the interviewer then inquired about how comparison is done within the RB Tree when the Key class doesn't implement the Comparable interface.

I answered that the hash code is used first for ordering, since being in the same bucket only means that h1 % length == h2 % length, not that the full hash codes are equal. Upon further questioning about the scenario where the hash codes are identical, I mentioned the use of System.identityHashCode. My answer was the same as https://stackoverflow.com/a/43911638/10969942
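For example, two different hash codes can land in the same bucket because only the low-order bits of the hash (the index) have to match. A minimal sketch of the index computation, ignoring HashMap's hash-spreading step:

class SameBucketSketch {
    public static void main(String[] args) {
        int tableLength = 16;                       // HashMap's default capacity
        int h1 = 5;
        int h2 = 21;                                // a different hash code (5 + 16)
        // HashMap derives the bucket index as (length - 1) & hash.
        System.out.println((tableLength - 1) & h1); // 5
        System.out.println((tableLength - 1) & h2); // 5 -> same bucket, different hash codes
    }
}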

Q2: This led to a discussion of the case where two keys are equal according to equals() but likely have different System.identityHashCode values:

Key k1 = new Key(1);
Key k2 = new Key(1);

I speculated that it might require traversing all entries in the subtree that share the same hash code. The interviewer then asked whether that means the worst-case time complexity is O(n). I wasn't sure, since I had always heard that since Java 8 HashMap has a guaranteed worst-case time complexity of O(log n).

Q3: The interviewer then questioned how two different keys with the same System.identityHashCode can be inserted into the map given that an RB Tree does not allow duplicate keys:

Key k1 = new Key(1); // same identityHashCode
Key k2 = new Key(2); // same identityHashCode

I remember that it is possible for different objects to have the same System.identityHashCode (https://stackoverflow.com/a/1063100/10969942), but I'm unclear on how HashMap manages these insertions without violating the RB Tree's constraint against duplicate keys, that is, how it handles a collision in both hashCode and identityHashCode.

  • Yes, of course it means the worst-case time complexity is O(n). Using System.identityHashCode wouldn't be compatible with alternate definitions of equals and hashCode, so it can't be used. Commented Apr 8, 2024 at 20:52
  • @LouisWasserman However, I have always heard that since Java 8 there is a guaranteed worst-case O(log n) performance. Is that wrong? Commented Apr 8, 2024 at 20:54
  • Yes. You heard wrong. Commented Apr 8, 2024 at 21:18
  • @LouisWasserman However, the answer at stackoverflow.com/questions/43911369/… does say that the identity hash code is used for comparison when the key class does not implement Comparable. What is the point of doing that if it cannot improve the time complexity? Commented Apr 8, 2024 at 21:35
  • See e.g. here. identityHashCode cannot be used to make get O(log n). Commented Apr 8, 2024 at 21:44

1 Answer


Given the following code snippet, what is the time complexity of the map.get(new Key(1)); line, considering Java 8's improvements to HashMap?

It is O(n).

I initially answered O(log n), mentioning that Java 8 automatically converts lengthy linked lists in buckets into Red-Black Trees for efficiency.

Yes. In this case, every key has the same hash code, so there will be only one hash bucket. With enough entries it will be organized as a balanced tree.

Q1: However, the interviewer then inquired about how comparison is done within the RB Tree when the Key class doesn't implement the Comparable interface.

As you said, the tree order falls back to the identity hash code when hash codes are equal and the key type does not have a natural order.
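Roughly, the ordering inside a tree bin can be sketched as below. This is a simplified illustration of the idea, not the actual OpenJDK source, and compareKeys is a hypothetical helper invented for the sketch:

class TreeBinOrderSketch {
    static int compareKeys(int h1, Object k1, int h2, Object k2) {
        // 1. Different hash codes: order by hash code.
        if (h1 != h2) {
            return Integer.compare(h1, h2);
        }
        // 2. Same hash code and both keys are Comparable of the same class: use natural order.
        if (k1 instanceof Comparable && k1.getClass() == k2.getClass()) {
            @SuppressWarnings("unchecked")
            Comparable<Object> c1 = (Comparable<Object>) k1;
            int cmp = c1.compareTo(k2);
            if (cmp != 0) {
                return cmp;
            }
        }
        // 3. Tie-break: class name, then System.identityHashCode.
        int byClass = k1.getClass().getName().compareTo(k2.getClass().getName());
        if (byClass != 0) {
            return byClass;
        }
        return Integer.compare(System.identityHashCode(k1), System.identityHashCode(k2));
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // Same hash code, not Comparable: the order falls back to the identity hash codes.
        System.out.println(compareKeys(0, a, 0, b));
    }
}

In OpenJDK's implementation, this identity-based tie-break is only consulted when inserting and balancing, not during lookup, which is why it cannot speed up get (the comment at the end of this answer makes the same point).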

Q2: This leads to a discussion about the case where two keys are equals but likely have different System.identityHashCode values:

This is indeed a problem for HashMap. If the key to be looked up is the same object as one already in the map, then it will be found in O(log n) steps, but if it isn't, then the map still has to be searched for an entry whose key is equals() to it. That will take O(n) steps if such a key is present, and Θ(n) steps if not.
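An informal way to see the linear behaviour is to time repeated lookups of a key that is not in the map at all, so the whole bin has to be examined, for two different map sizes. This is a rough sketch rather than a proper benchmark (no JIT warm-up, single run); it reuses the Key class from the question, with smaller sizes than the question's n because populating an all-colliding map is itself slow:

import java.util.HashMap;
import java.util.Map;

class DegradedLookupSketch {
    static long timeAbsentLookups(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), i);   // every key collides, since hashCode() is always 0
        }
        Key absent = new Key(-1);     // equal to nothing in the map
        long start = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {
            map.get(absent);          // has to examine the whole bin each time
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // If lookups were O(log n), doubling n would barely change these numbers;
        // with all-colliding, non-Comparable keys they grow roughly linearly instead.
        System.out.println("n = 10000: " + timeAbsentLookups(10_000) + " ns");
        System.out.println("n = 20000: " + timeAbsentLookups(20_000) + " ns");
    }
}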

Then the interviewer asked whether it means that the worst case time complexity is O(n)?

Yes, it does.

I don't know since I always heard that after Java8, HashMap can have guaranteed worst case O(log n) time complexity.

I hope you didn't say that. A good interviewer doesn't expect you to know everything, and people can acquire misinformation in a wide variety of ways. But having led you to the conclusion, they could reasonably expect you to accept it, or at minimum to acknowledge that the conclusion seems sound.

Q3: The interviewer then questioned how two different keys with the same System.identityHashCode can be inserted into the map given that an RB Tree does not allow duplicate keys:

I guess the supposition here is

it being possible for different objects to have the same System.identityHashCode

It is true that Java does not guarantee distinct identity hash codes, but it does specify:

As far as is reasonably practical, the hashCode method defined by class Object returns distinct integers for distinct objects.

(Object's implementation of hashCode() is, of course, the identity hash code.)
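For instance, for any class that does not override hashCode(), the two values coincide; a quick check (the class name is just for this example):

class IdentityHashDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // Object's default hashCode() is the identity hash code.
        System.out.println(o.hashCode() == System.identityHashCode(o)); // true
    }
}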

I'm unclear on how HashMap manages these insertions without violating the RB Tree's constraint against duplicate keys, that is, how it handles a collision in both hashCode and identityHashCode.

I don't know off the top of my head, and a quick doc check did not turn up an answer. But assuming that HashMap does handle that case, I expect it does so with a linked list in each tree node, or similar. This does not present the same issue here that using a linked list did at the bucket level, because the standard library implementation can be relied upon to provide reasonably good identity hash codes (see above).

Even if that's not what HashMap actually does, I suspect that the interviewer would have been satisfied with it. Again, they don't expect you to know everything, but they likely are looking for people who are prepared to think.


1 Comment

When the identity hash codes are the same, the tree will become unbalanced (which is as good as a linked list). This may happen when you have lots of non-comparable key objects with the same hash code because, depending on the implementation and environment, only a few bits in the object header might be reserved for the identity hash code (far less than 32). But then, it’s your key class’s hash code implementation that must be fixed. The identity hash code does not help here; it is not checked during a lookup, as both branches have to be checked when looking up a key below such a node.
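A simplified sketch of the lookup the comment describes, using a hypothetical TreeBinNode type rather than the actual OpenJDK classes: the probe object's identity hash code says nothing about where an equal-but-distinct key was placed at insertion time, so once the hash codes tie and the keys are not Comparable, the search has to descend into both subtrees.

class TreeBinNode {
    int hash;
    Object key;
    Object value;
    TreeBinNode left, right;

    // Look up a non-Comparable key in a tree bin ordered by hash code (with
    // identity-hash tie-breaks applied only at insertion time).
    Object find(int h, Object k) {
        if (h < hash) {
            return left == null ? null : left.find(h, k);
        }
        if (h > hash) {
            return right == null ? null : right.find(h, k);
        }
        if (k.equals(key)) {
            return value;
        }
        // Hash codes are equal and the key is not Comparable: no way to pick a
        // side, so both subtrees may have to be searched.
        Object fromLeft = (left == null) ? null : left.find(h, k);
        if (fromLeft != null) {
            return fromLeft;
        }
        return (right == null) ? null : right.find(h, k);
    }
}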
