
I was wondering what the complexity of replace(Key, Value) for a HashMap is.

My initial thought is O(1), since it's O(1) to get the value, and I can simply replace the value assigned to the key.

I'm unsure whether I should take into account the collisions there might be in a large HashMap implemented in Java with java.util.
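For concreteness, here is a minimal, self-contained sketch of the call in question (the map contents are made up):

import java.util.HashMap;
import java.util.Map;

public class ReplaceDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("answers", 2);

        // replace(K, V) only swaps the value for an existing key;
        // it returns the old value, or null if the key is absent.
        Integer old = map.replace("answers", 3);
        System.out.println(old);                 // 2
        System.out.println(map.get("answers"));  // 3

        // No mapping for "missing", so nothing is inserted.
        System.out.println(map.replace("missing", 1)); // null
    }
}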

  • It is O(1) amortized, just like containsKey or put or remove. Not amortized, it's probably O(n), since any change might trigger a rehashing, potentially touching all entries. But who cares about non-amortized analysis?
  • I want to expand on "But who cares about non-amortized analysis?": non-amortized analysis only matters in real-time use cases, where a maximum time per item has to be strongly enforced. Almost all use cases care more about throughput and average runtime, and then the non-amortized case becomes irrelevant.
  • Just in case you are wondering: what is amortized time?
  • Actually, replace only changes the value, which is not subject to hashing. So it actually runs in the same complexity as get or contains.
  • @Zabuzard that's exactly what I was thinking, but the number of votes on your first comment made me think that I might be wrong :P

3 Answers


tl;dr

HashMap#replace runs in O(1) amortized;

and, under the premise that the map is properly balanced (which Java takes care of during your put and remove calls), also non-amortized.

Non-amortized

Whether the bound also holds for non-amortized analysis hinges on the implemented self-balancing mechanism.

Basically, since replace only changes the value, which influences neither the hashing nor the overall structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure.

Hence we only pay for the cost of locating the key, which depends on the bucket size.

The bucket size, if the map is properly self-balanced, can be considered a constant, leading to O(1) for replace also in the non-amortized sense.

However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.

So the reality is probably somewhere in between due to the heuristics.


Implementation

To be sure, let us take a look at the current implementation (Java 16):

@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}

The method afterNodeAccess is a hook for subclasses (such as LinkedHashMap) and is empty in HashMap. Everything except getNode trivially runs in O(1).

getNode

getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balancing map, like Java's implementation. Let's take a look at the code:

/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

This method basically computes the hash, hash = hash(key), then looks up the bucket in the table, first = tab[(n - 1) & (hash = hash(key))], and starts iterating through the data structure stored in that bucket.
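As a side note, here is a minimal standalone sketch of that index computation (the class and method names are made up for illustration; the spreading step mirrors HashMap.hash):

// Illustrative sketch of how HashMap derives a bucket index;
// standalone code, not the JDK's private implementation.
public class BucketIndexDemo {
    // Same spreading step as java.util.HashMap.hash(Object):
    // XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int tableLength = 16; // always a power of two in HashMap
        String key = "example";
        int index = (tableLength - 1) & spread(key);
        System.out.println("bucket index: " + index); // in [0, 15]
    }
}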

Regarding the data structure for the bucket, we have a little branching going on at if (first instanceof TreeNode).

Bucket

The buckets are either simple implicitly linked lists or red-black trees.

Linked List

For the linked list, we have a straightforward iteration

do {
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
} while ((e = e.next) != null);

which obviously runs in O(m) with m being the size of the linked list.

Red-Black-Tree

For the red-black-tree, we have

return ((TreeNode<K,V>)first).getTreeNode(hash, key); 

Lookup in a red-black tree is O(log m), with m being the size of the tree.
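To make the O(log m) bound concrete, here is a minimal sketch of a binary-search-tree descent (not the JDK's TreeNode code, which additionally compares hashes and falls back to Comparable or identity tie-breaking):

// Minimal BST-descent sketch to illustrate the O(log m) lookup;
// a red-black tree keeps this path logarithmic by staying balanced.
class TreeLookupSketch<K extends Comparable<K>, V> {
    static class Node<K, V> {
        K key; V value;
        Node<K, V> left, right;
    }

    V find(Node<K, V> root, K key) {
        Node<K, V> cur = root;
        while (cur != null) {
            int cmp = key.compareTo(cur.key);
            if (cmp == 0) return cur.value;          // found
            cur = (cmp < 0) ? cur.left : cur.right;  // descend one level
        }
        return null; // not present
    }
}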

Bucket size

Java's implementation makes sure to re-balance the buckets by rehashing if it detects that their size gets out of hand (you pay for that on each modifying method like put or remove).

So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.

Conclusion

Having the buckets at constant size effectively makes getNode run in O(1), leading to replace running in O(1) as well.

Without any self-balancing mechanism, the worst case would degrade to O(n) if a linked list is used and to O(log n) for a red-black tree (for the case that all keys collide).

Feel free to dig deeper into the code but it gets a bit more complex down there.


9 Comments

No, it is not O(1); it is O(1) amortized. A table with a large number of collisions takes O(log m), with m the number of entries in the target bucket.
@YannTM You pay the cost for collisions during put and remove, not during lookups like containsKey, get, or replace. See the last paragraphs for why. tl;dr: the bucket size can be considered constant due to rehashing and self-balancing.
This is ridiculous; the lookup in a hash table is simply not O(1). That would only be the case if there never were any collisions. It is normal to have some collisions in the table, and put/remove will not guarantee their absence.
It's not O(1) amortized time, it's O(1) expected time. Unlike amortized time, there's no guarantee that performing N operations is O(N).
I prefer the new wording, thanks for the edit.

You are right, the main cost is the lookup, which is amortized O(1).

Replacing the associated value with the new one is O(1) once we have found the correct spot. But the lookup is only amortized O(1).

As shown in the code accompanying the (in my view incorrect) answer of Zabuzard, Java's HashMap uses a classical approach: if you are lucky (the entry you are looking for is the first in the bucket), you get O(1).

If you are less lucky, or you have a poor-quality hash function (just suppose the worst case, where all elements map to the same hash), to avoid the dreaded O(n) of iterating a plain linked list in the bucket, Java's implementation uses a red-black tree (TreeNode) to provide O(log n) complexity.

So Java's HashMap, if used correctly, should yield basically O(1) replace, and if used incorrectly will degrade gracefully to O(log n) complexity. The threshold is TREEIFY_THRESHOLD (8 in modern implementations).
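To see that graceful degradation in action, here is a small sketch (the class name and the constant hash are illustrative): keys that share one hashCode but implement Comparable keep lookups logarithmic once the bucket is treeified.

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // All instances collide in one bucket, but Comparable lets
    // HashMap build an ordered tree once the bucket exceeds
    // TREEIFY_THRESHOLD (8), so lookups stay O(log n).
    static class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // constant: worst case
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey(i), i);
        }
        // Despite 1000 hash collisions, this lookup walks a tree,
        // not a 1000-element list.
        System.out.println(map.get(new BadKey(500))); // 500
    }
}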

Please have a look at these implementation notes in the source: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/HashMap.java#L143-L231

1 Comment

I don't think this is right. Firstly, for it to be true that the lookup is amortized O(1) but not worst-case O(1), the lookup would have to support mutating the map in a way that guarantees O(1) time for later lookups. As far as I can tell, it does not. Secondly, according to the linked source, the O(log n) fallback complexity (when many mappings fall into the same bucket) is only possible when either (1) the keys have distinct hash-codes (that just unfortunately end up in the same bucket) or (2) the keys are Comparable to each other. The worst-case complexity is still O(n).

The basics:

  • java.util.HashMap will resize itself to match the given number of elements
  • so bucket collisions are quite rare (compared to n)
  • for collisions, modern HashMap implementations use trees (TreeNode instead of plain Node chains) inside the buckets

For bucket collisions within one replace/contains/put/get operation:

  • if you have k bucket collisions out of n entries, a plain scan of the bucket costs O(k),
  • which the tree search reduces to O(log k),
  • which, with k being a small number, is effectively O(1).

Furthermore, for the worst case of hash collisions:

  • if you have a really bad hash function that always gives the same result
  • so we get hash collisions
  • for hash collisions, the Node implementation functions like a LinkedList
  • you would have (with this LinkedList-like search) O(n/2) = O(n) complexity.
  • but this would have to be done on purpose, because
  • the hash-spreading step and the power-of-two index masking give really good distributions, as long as you don't have too many identical hashCode()s
  • most IDE-generated hashCode() implementations, or simple id sequencing (like primary keys in databases), will provide a near-perfect distribution (see the sketch after this list)
    • with an id-sequenced hash function, you will not have any (hash or bucket) collisions, so you could actually just use array indexing instead of hash functions and collision handling
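As referenced above, a small sketch (with hypothetical class names) of an id-sequenced key that spreads perfectly across buckets:

import java.util.HashMap;
import java.util.Map;

public class SequentialIdDemo {
    // Illustrative key whose hashCode is a sequential id, as with
    // database primary keys.
    static class IdKey {
        final int id;
        IdKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; } // near-perfect spread
        @Override public boolean equals(Object o) {
            return o instanceof IdKey && ((IdKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<IdKey, String> map = new HashMap<>(32);
        // Sequential ids map to distinct buckets: (n - 1) & id simply
        // cycles through the table, so there are no bucket collisions
        // while the id range fits the table size.
        for (int i = 0; i < 16; i++) {
            map.put(new IdKey(i), "value" + i);
        }
        System.out.println(map.get(new IdKey(7))); // value7
    }
}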

Also, check out the comments and the code yourself: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

  • tableSizeFor(int cap)
  • getNode()

Specifically:

  • tableSizeFor(int cap) rounds the table size for the bucket array up to a power of two, so the index mask (table size minus one) has the form 2^n - 1 (see the sketch after this list)
  • getting the bucket is first = tab[(n - 1) & hash], with first being the head of the bucket
    • which is NOT a modulo operation, but simply a bit-wise AND,
      • which is faster,
      • can use more valid bits
      • and produces comparably distributed results
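For reference, here is a standalone re-creation of that rounding step (a sketch of the JDK 8 bit-smearing approach, minus the MAXIMUM_CAPACITY clamp):

// Sketch of HashMap.tableSizeFor(int cap): rounds cap up to the next
// power of two by smearing the highest set bit into all lower bits.
public class TableSizeForDemo {
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : n + 1; // the JDK also clamps at MAXIMUM_CAPACITY
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(13)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
        // The index mask is then tableSize - 1, e.g. 15 = 0b1111,
        // which is why (n - 1) & hash replaces a modulo.
    }
}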

To illustrate how to research this yourself, I wrote some code showing worst-case (hash collision) behaviour:

import java.util.HashMap;

public class TestHashMapCollisions {

    static class C {
        private final String mName;

        public C(final String pName) {
            mName = pName;
        }

        @Override
        public int hashCode() {
            return 1; // force every instance into the same bucket
        }

        @Override
        public boolean equals(final Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final C other = (C) obj;
            if (mName == null) {
                if (other.mName != null) return false;
            } else if (!mName.equals(other.mName)) return false;
            return true;
        }
    }

    public static void main(final String[] args) {
        final HashMap<C, Long> testMap = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            final String name = "name" + i;
            final C c = new C(name);
            final Long value = Long.valueOf(i);
            testMap.put(c, value);
        }

        final C c = new C("name2");
        System.out.println("Result: " + testMap.get(c));
        System.out.println("End.");
    }
}

Procedure:

  • use an IDE
  • link the source code of the JDK/JRE you're using to your IDE
  • set a breakpoint at the line System.out.println("Result: " + testMap.get(c));
  • run in debug
  • debugger halts at the breakpoint
  • now go into the HashMap implementation
  • set a breakpoint at the first line of HashMap.getNode() (Node<K,V>[] tab; Node<K,V> first, e; int n; K k;)
  • resume debug; debugger will halt inside HashMap
  • now you can follow the debugger step-by-step

Hint: you could immediately set the breakpoint inside HashMap, but this would lead to a little chaos, as HashMap is used quite often when the JVM initializes, so you'll hit a lot of unwanted stops before you get to testing your code.

