3

I have few doubts about the Java HashMap class. It is my understanding that

 transient Entry[] table; 

the table array is going to hold the data based on the value of hashCode(). I need to know when this array gets initialized. Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?

How is the hashcode scaled to the array index? For example, if the hashcode has a huge value, how it is scaled to array index like 10, 20?

I have read that when the threshold value is reached, rehashing will occur. For example, in the default case, when 16 is the capacity and 0.75 is the load factor, then the threshold value is 16*0.75=12. Once the 12 items are added rehashing will occur and capacity will increase. Does this mean that the table array size gets increased?

2 Answers 2

7

since your post has many questions I'm going to enumerate your questions as part of my answer. Also, please note that I'm going off HashMap's source code for Java 1.8 b132 for my answers.

  1. Q: I need to know when this array gets initialized.
    A: The table array only gets initialized when data is first entered into the map (e.g. a put() method call). It does not happen as part of the instantiation of the map, itself, unless the copy constructor is called, or the map is being deserialized into an object.
  2. Q: Is the array length based on the capacity we define during the HashMap's initialization or the default capacity of 16 if it is not defined when calling the constructor?
    A: Correct, the table array's length is based on the initial capacity your pass to the constructor. When the initial capacity is not specified and the default constructor is called, the default capacity is used.
  3. Q: How is the hashcode scaled to the array index?
    A: For the actual code that does this, itself, see the implementation of the putVal() method. Basically what happens is that the code takes the very large hash value and performs a bitwise-AND with the last element index of the table. That effectively randomizes the position of the key/value pair with the table array. For example, if the hash value is 333 (101001101 in base 2) and the table array size is 32 (100000), the last element index would be 31 (11111). Thus the index chosen would be 11111 & 101001101 == 01101 == 13.
  4. Q: I have read that when the threshold value is reached, rehashing will occur. ... Does this mean that the table array size gets increased?
    A: More or less, yes. When the threshold is exceeded, the table is resized. Note that by resizing, the existing table array isn't modified. Rather, a new table array is created with the twice the capacity of the first table array. For details, see the implementation of the resize() method.
Sign up to request clarification or add additional context in comments.

6 Comments

Thank you Jonathan. Am almost clear. just an another question i.e related to the Q:3. you have clearly explained in the example of how to fit the hashcode 333 in 32 length array. so it means any hashcode value can fit into 32 length array. then what is the need for resizing. In case of index collision, the collide object will be stored inside the same index element. am i right? on what basis then resizing happens.
Good question. Yes, multiple elements can be stored inside the same index because the type of the table array is an internal type to HashMap - Node, which has references to a possible next node. Thus, in such a manner, a fixed size table can have an unbounded number of entries. Resizing is usually triggered when the number of key-value pairs (i.e. the size field in HashMap) exceeds the threshold value (i.e. the size of the table array multiplied by the loadFactor).
So while resizing, will the objects gets rearranged? In the above example, when table size is 32, the object with hash code 333 occupied will be in 13th position. During resize, the size gets double , so now size will become 64.to avoid confusion we consider again it get doubled to 128 because with size 64 also it occupies in 13 position. With 128, 1111111 & 101001101 == 1001101==77. Will it get stored in 77 position?
With respect to resizing, the objects could get rearranged, yes. Consider a table of size 4 (last index is 3, which is 11 in base 2), indexing a hash of 14 (1110). In this case, the value will be stored at index 3 & 14 == 11 & 1110 == 0011 & 1110 == 10 == 2. Later, the table is resized to 8 (last index is 7, or 111 in base 2). So when resized, the 14 will be indexed at 7 & 14 == 111 & 1110 == 0111 & 1110 == 110 == 6. Thus, index in the table array of the same element changes from 2 to 6 when sized.
As for your concerns with the get() method, the nodes are not indexed " inside one another". Rather, you can think of the table as having a linked list of nodes at each index, even though the type is Node[]. This is because Node instances contain a reference to the next node. See the implementation of Node class for details (tinyurl.com/hhqcxl8). Note that this is referred to as chaining. For a description of hash table chaining check out the Wikipedia article on the subject (tinyurl.com/zn8gl3c).
|
0
public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Find a power of 2 >= initialCapacity int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; this.loadFactor = loadFactor; threshold = (int)(capacity * loadFactor); table = new Entry[capacity]; init(); } 

Above code block explains how and when you populate the table.

Once the rehashing occurs it doesn't increase the table array size as you can declare array size once for ever; It creates a new array every time with the updated size:

void resize(int newCapacity) { Entry[] oldTable = table; int oldCapacity = oldTable.length; if (oldCapacity == MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return; } Entry[] newTable = new Entry[newCapacity]; transfer(newTable); table = newTable; threshold = (int)(newCapacity * loadFactor); } 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.