
Which is the better option to use: a HashMap with an explicit initial capacity, or one without?

And why?

Also, what are loadFactor and modCount in a HashMap's properties?

When I debug my code in Eclipse and look at the value of a HashMap, it shows a property named loadFactor with a value of 0.75 and a property named modCount with a value of 3.


Where I use the HashMap in my code:

I'm developing a communication application, you could say a chat application, in which I store all sent/received messages in a HashMap. Since I can't predict how many messages a user will send or receive, I declared the HashMap without an initial capacity. What I have written is:

Map<String, Map<String, List<String>>> usersMessagesMap = new HashMap<String, Map<String,List<String>>>(); 

If I use it with an initial capacity of 100 or higher, will it affect the code?
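
For context, here's a rough sketch of how a message might land in that nested map. The key roles (outer key = user, inner key = chat peer) and the MessageStore wrapper are my assumptions, not something stated above:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MessageStore {
        private final Map<String, Map<String, List<String>>> usersMessagesMap =
                new HashMap<String, Map<String, List<String>>>();

        // Assumed layout: outer key = user, inner key = chat peer, list = messages.
        public void addMessage(String user, String peer, String message) {
            Map<String, List<String>> conversations = usersMessagesMap.get(user);
            if (conversations == null) {
                conversations = new HashMap<String, List<String>>();
                usersMessagesMap.put(user, conversations);
            }
            List<String> messages = conversations.get(peer);
            if (messages == null) {
                messages = new ArrayList<String>();
                conversations.put(peer, messages);
            }
            messages.add(message);
        }
    }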

  • What exactly do you mean by "with default size" or "without size"? Commented Feb 8, 2011 at 11:05
  • I guess he means the initial capacity. Commented Feb 8, 2011 at 11:06
  • It depends greatly. Even if you know the keyset size ahead of time, there's no guarantee that the keys' hashCode function will be uniformly distributed (unless you write one by yourself). If you know that the default capacity is much smaller than the keyset size, I'd recommend using a higher initial capacity to reduce rehashing. Commented Feb 8, 2011 at 11:07
  • @Joachim, @DR: yes, I mean the initial capacity. Commented Feb 8, 2011 at 11:08
  • I suggest reading the Javadocs, they explain this quite well. Commented Feb 8, 2011 at 11:12

4 Answers


Have you checked the HashMap API Javadoc?

  • The capacity is the number of buckets in the hash table
  • The initial capacity is simply the capacity at the time the hash table is created
  • The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased

On setting the initial size too high:

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Impact of the load factor on the performance:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

Well, in short: depending on the estimated size and the expected growth rate, you'll have to choose one approach or the other.

Usually, if you know the initial number of elements for your Map, it's recommended to set it at construction time, avoiding early rehashes during initialization.
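
To illustrate, a minimal sketch of the sizing rule from the Javadoc quoted above (the expected entry count of 10000 is an arbitrary example):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMap {
        public static void main(String[] args) {
            int expectedEntries = 10000;  // arbitrary example value
            float loadFactor = 0.75f;     // HashMap's default

            // Per the Javadoc: if the initial capacity is greater than
            // expectedEntries / loadFactor, no rehash will ever occur.
            int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

            Map<String, String> map =
                    new HashMap<String, String>(initialCapacity, loadFactor);
            // ... filling up to expectedEntries entries now triggers no resize.
        }
    }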


3 Comments

Thanks @Adeel (I have edited it to make it a little clearer).
Indeed. As a side comment, I often find the Java API documentation (well, at least in this case) is a nice example of good, useful software documentation.
I never knew before what the load factor was there for, thanks!

If you know that the keyset size will be much bigger than the initial capacity (which is 16 by default), I'd use a higher initial capacity to reduce rehashing: as the number of keys grows and the N/C value (where N is the number of stored keys and C is the map's capacity) reaches the load factor, the map's array is extended and the keys are rehashed. Also, since the map's capacity grows exponentially, you won't see a drastic reduction in the number of rehashes unless you have a significant number of keys.

So, my opinion is: if you have the spare memory and lots of keys, go for a higher initial capacity.
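
If you want to watch those resizes happen, here's a rough probe. It assumes an OpenJDK-style HashMap whose bucket array is a private field named table (an implementation detail, not part of the API; on newer JDKs it may also need --add-opens java.base/java.util=ALL-UNNAMED):

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class CapacityProbe {
        // Reads the internal bucket array length via reflection.
        static int capacityOf(HashMap<?, ?> map) throws Exception {
            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true);
            Object[] table = (Object[]) tableField.get(map);
            return table == null ? 0 : table.length;
        }

        public static void main(String[] args) throws Exception {
            HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
            int lastCapacity = -1;
            for (int i = 0; i < 100; i++) {
                map.put(i, i);
                int capacity = capacityOf(map);
                if (capacity != lastCapacity) {
                    // The table was (re)allocated: the initial allocation on
                    // the first put, or a resize once N/C crossed 0.75.
                    System.out.println("size=" + map.size()
                            + " -> capacity=" + capacity);
                    lastCapacity = capacity;
                }
            }
        }
    }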

5 Comments

What does this load factor mean?
@Harry Joy: The load factor is the maximum value allowed for N/C, where N is the number of keys currently stored in the map and C is the map's capacity. When the N/C value reaches the load factor, the map is resized. The idea of resizing the map before N/C reaches 1 is that a high N/C value means a higher probability of collisions (the chance of the hash function mapping two keys to the same array slot).
Strictly speaking, the load factor is the maximum value that the HashMap will allow N/C to approach. Once that threshold is reached, it will resize the internal array.
@Joachim Sauer: Thanks for clearing that up. I will edit the comment and the answer.
Thanks. [+1] for being the first to explain rehashing and the load factor.
  • Better in terms of simplicity: without an initial size.
  • Better in terms of performance: try that out yourself (see the timing sketch below).

Found an SO thread, Performance of Hashmap with Different Initial Capacity And Load Factor
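
If you want to try it yourself, here's a quick-and-dirty timing sketch (not a proper benchmark; serious measurements should use a harness like JMH, and the one-million-entry count is an arbitrary choice):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizeBenchmark {
        static long fill(Map<Integer, Integer> map, int n) {
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                map.put(i, i);
            }
            return System.nanoTime() - start;
        }

        public static void main(String[] args) {
            int n = 1000000;
            // Warm up the JIT a little before measuring.
            for (int i = 0; i < 5; i++) {
                fill(new HashMap<Integer, Integer>(), n);
                fill(new HashMap<Integer, Integer>((int) (n / 0.75f) + 1), n);
            }
            long defaultNs = fill(new HashMap<Integer, Integer>(), n);
            long presizedNs =
                    fill(new HashMap<Integer, Integer>((int) (n / 0.75f) + 1), n);
            System.out.println("default:  " + defaultNs / 1000000 + " ms");
            System.out.println("presized: " + presizedNs / 1000000 + " ms");
        }
    }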

Load Factor

The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its bucket array. Sometimes this is referred to as the fill factor, as it represents the portion of the s buckets in the structure that are filled with one of the n stored entries. With a good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. -- Wikipedia on Load Factor

Now, to your new question:

If I use it with an initial capacity of 100 or higher, will it affect the code?

It's not a good idea; you are good to go with the default. Don't think too much about this at the start. As Knuth said, "premature optimization is the root of all evil". It wouldn't give you any real benefit whatsoever.

6 Comments

@Adeel: I tried it out on my side but couldn't figure out the difference. That's why I'm asking here.
@Harry: Doesn't that mean it doesn't matter as far as performance is concerned? And of course, when one doesn't gain anything by giving an initial size, why would one even want to give one? Further, you might like to share the code you wrote to find the difference; that way, you wouldn't get these kinds of dumb answers.
Most of the time it doesn't matter, but if your HashMap is going to be very big and you know the final size (approximately or exactly, doesn't really matter), then providing that can improve the performance of constructing the HashMap and filling it with values.
@Joachim: Yes, very logical. But I just wanted him to try it out, and that's why I asked him to share the code he tried, so we can comment.
@Harry, added my answer for your new question.

Strictly speaking, you should not care about the internal fields of the HashMap (loadFactor and modCount are fields, not properties; properties would have getters/setters).

modCount is most likely the number of structural modifications applied to the Map since its creation. It's used to detect concurrent modifications and to know when an Iterator becomes "broken" (because the originating Map was structurally modified after the iterator was created).
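
A minimal sketch of that fail-fast behavior, using only standard java.util classes:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public class FailFastDemo {
        public static void main(String[] args) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("a", "1");
            map.put("b", "2");

            Iterator<String> it = map.keySet().iterator();
            it.next();
            // A structural modification bumps modCount; the iterator notices
            // the mismatch on its next access and fails fast.
            map.put("c", "3");
            it.next(); // throws ConcurrentModificationException
        }
    }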

loadFactor is probably the field storing the second argument of the two-argument constructor. It defines how "tightly packed" the internal array may become before it is resized (which results in a rehashing of all the keys).
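
For reference, a minimal illustration of that two-argument constructor (the values are arbitrary):

    // Capacity 64, load factor 0.5f: the map resizes (and rehashes)
    // once it holds more than 64 * 0.5 = 32 entries.
    Map<String, String> map = new HashMap<String, String>(64, 0.5f);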

