2

The documentation for HashSet.add says

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

Since my code below will return false for e.equals(e2), I'd expect it to let me add the same instance twice. But the set only contains my instance once. Can someone explain why?

package com.sandbox; import java.util.HashSet; import java.util.Set; public class Sandbox { public static void main(String[] args) { Set<A> as = new HashSet<A>(); A oneInstance = new A(); System.out.println(oneInstance.equals(oneInstance)); //this prints false as.add(oneInstance); as.add(oneInstance); System.out.println(as.size()); //this prints 1, I'd expect it to print 2 since the System.out printed false } private static class A { private Integer key; @Override public boolean equals(Object o) { if (!(o instanceof A)) { return false; } A a = (A) o; if (this.key == null || a.key == null) { return false; //the key is null, it should return false } if (key != null ? !key.equals(a.key) : a.key != null) { return false; } return true; } @Override public int hashCode() { return key != null ? key.hashCode() : 0; } } } 
10
  • 4
    The problem is not in HashSet, is in your equals method that doesn't follow the equals contract. Commented May 2, 2013 at 17:07
  • 3
    @tieTYT: "It is reflexive: for any non-null reference value x, x.equals(x) should return true." (From: docs.oracle.com/javase/7/docs/api/java/lang/… Commented May 2, 2013 at 17:09
  • 1
    HashSet (or HashMap) assumes Object.equals is implemented as specified in Object. In particularly consistency with == which it checks first, presumably for efficiency. Commented May 2, 2013 at 17:16
  • 1
    @tieTYT That's what the source code says (UTSL). Commented May 2, 2013 at 17:20
  • 1
    Anyone care to say why I'm getting downvotes? I think it's a pretty legitimate question. It explains what it's asking for clearly, it gives sample code, it even has comments. Commented May 2, 2013 at 17:24

4 Answers 4

11

HashSet (really HashMap under the hood) has an "optimization" in that it checks for object reference equality before calling the equals() method. since you are putting the same instance in twice, they are considered equal even though the equals() method does not agree.

Relevant line from HashMap.put():

 if (e.hash == hash && ((k = e.key) == key || key.equals(k))) { 
Sign up to request clarification or add additional context in comments.

4 Comments

@JackStraw It doesn't matter how many you assign, you'll get the same result.
@JackStraw - how does that change anything? copying a reference doesn't make it not the same instance.
@jtahlborn The code given does not use HashMap.put(). It uses HashSet.add. Refer to Javadoc here: <docs.oracle.com/javase/6/docs/api/java/util/HashSet.html#add(E)> In HashSet.add if the element is already in the HashSet, it doesn't add it again.
@ScottShipp - HashSet.add() just calls HashMap.put() internally (as i said, HashSet is backed by HashMap).
3

You're breaking the contract of Object.equals(Object), which starts with:

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.

As your sample code says:

System.out.println(oneInstance.equals(oneInstance)); //this prints false 

It seems that HashSet<E> (entirely reasonably) assumes reflexivity, and doesn't check for equality when it finds that the exact same object is already in the set, as an optimization. Therefore it will not even call your equals method - it considers that the object is already in the set, so doesn't add a second copy.

In particular, if x.equals(x) is false, then any containment check would also be useless.

I'd implement equals like this:

public boolean equals(Object o) { // Normal reflexive optimization if (this == o) { return true; } // "Correct type" check if (!(o instanceof A)) {     return false; } A a = (A) o; // If both keys are null, the objects are equal. This is the most normal // approach; you *could* make non-identical objects with null keys non-equal, // but that would be odd. if (this.key == null && a.key == null) { return true; } // If exactly *one* key is null, the objects are not equal. if (this.key == null || a.key == null) { return false; } // By now we know that both keys are non-null; use normal equality. return this.key.equals(a.key); } 

Or if you're using Java 7:

public boolean equals(Object o) { // Normal reflexive optimization if (this == o) { return true; } // "Correct type" check if (!(o instanceof A)) {     return false; } A a = (A) o; return Objects.equals(this.key, a.key); } 

2 Comments

I hope you don't hate me for this. I said earlier that I'd mark this as the answer if you wrote it. But now I think that @jtahlborn is giving a more accurate reason for why this isn't working. You're pointing out a legitimate problem, but I don't think it's the cause of what I'm seeing.
@tieTYT: My answer effectively includes that though: "It seems that HashSet<E> (entirely reasonably) assumes reflexivity, and doesn't check for equality when it finds that the exact same object is already in the set, as an optimization." My answer shows that you're violating the contract, and then why that causes HashSet a problem.
2

Hash maps/tables work by taking an object and 'hashing' it with a 'hash' function to produce a Psuedo Random Uniformly Distributed unique id representing the object where said id can be used as a key into an indexable structure like an array. Ideally you would have a perfect hash where each unique item produces a unique indexable id.

Obviously your array is fixed in size (you can grow the array but this will dramatically affect runtime performance) so at some point, if you continue to add elements to the Hash map/table you will eventually get 2 items with the same hash code and then you will have a collision; this is where equals comes into play.

When this occurs equality is used to disambiguate WHICH key/value you are seeking by iterating through (usually by storing a LinkedList at the index position and not just the element) the available objects and checking the equals method.

So, the problem for your case is easy: If your hash implementation is wrong then HashSet (which is backed by HashMap) fails to find your object in it's table and thus never bothers to call equals (have a look at HashMap.get() to see their implementation).

Whatever you use in equals MUST be used in hashCode() if you want this to work and vice versa. If you implement equals() it's a damn good idea to implement hashCode(). If you implement hashCode() then you MUST implement equals for hashing to actually work.

3 Comments

You haven't said how the hash is invalid anywhere. It's more the equality test which seems invalid to me.
(In fact, the hash looks fine to me. It's the equals handling of null key values which is wrong.) And your description of a hash is wrong too. You describe it as a unique ID - which is absolutely not the case, and isn't feasible for this case. (Heck, a hashCode method which always returns a constant value is "correct" - just not efficient.)
@JonSkeet read carefully: It has to be a uniformly distributed id to be effective. If you return the same number every time, then you will have a collision on every add. Performance thusly becomes O(N) as it essentially is a LinkedList. As for the other point, I have corrected my above assertions. Others had mentioned the hash as being wrong so I explained why a wrong hash would appear to ignore equals. Thanks for the feedback
0

I don't know exactly why but I feel compelled to point out that when you implement equals, part of the contract for the equals method that you should uphold is that it is reflexive, meaning the same object equals itself. So your equals should return true.

My thought for an answer to your question is that the .equals() method is not getting called when you merely add the two items of the same instance to the HashSet. Probably only hashCode is called up to that point. Since hashCode returns the key it will return the same key each time and the item just gets hashed to the same place twice, leaving only one item in the set.

9 Comments

no, that is not how hashCodes are used. you can have many items in a HashSet with the same hashCode.
Once there is a hash collision, the equals() method is used.
@SotiriosDelimanolis Under normal circumstances, yes. But not in this example. You can try it youself to see.
@tieTYT I'm just trying to clarify in case there was confusion, not related to your question.
So why did I get downvoted if my answer (.equals method is not getting called) was right?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.