
The problem I am having is spread over many source files, and my attempts to reproduce it in a simple, self-contained script have failed. Nonetheless, the problem itself is simple to describe.

I have a class Path for which I implement __hash__ and __eq__.

I have an item of type Path in a dict, as evidenced by:

    >>> path in list(thedict)
    True

I have verified that path == other, hash(path) == hash(other), and id(path) == id(other), where other is an item taken straight out of list(thedict.keys()). Yet I get the following:

    >>> path in thedict
    False

and attempting the following results in a KeyError:

    thedict[path]

So my question is: under what circumstances is this possible? I would have expected that if path is in list(thedict), then it must be in thedict.keys(), and hence thedict[path] must work. What is wrong with this assumption?

Further Info

If it helps, the classes in question are listed below. The issue above is observed at the level of SpecificationPath:

    from dataclasses import dataclass

    class Path:
        pass

    @dataclass
    class ConfigurationPath(Path):
        configurationName: str = None

        def __repr__(self) -> str:
            return self.configurationName

        def __hash__(self):
            return hash(self.configurationName)

        def __eq__(self, other):
            if not isinstance(other, ConfigurationPath):
                return False
            return self.configurationName == other.configurationName

    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    @dataclass
    class SpecificationPath(Path):
        configurationPath: ConfigurationPath
        specificationName: str = None

        def __repr__(self) -> str:
            return f"{self.configurationPath}.{self.specificationName or ''}"

        def __hash__(self):
            return hash((self.configurationPath, self.specificationName))

        def __eq__(self, other):
            if not isinstance(other, SpecificationPath):
                return False
            if self.configurationPath != other.configurationPath:
                return False
            if self.specificationName != other.specificationName:
                return False
            return True

In response to a comment below, here is the output from the (Spyder) debug terminal. pf is an object holding the paths dictionary, which uses paths as keys, and self is the object whose path is being looked up:

    In : others = list(pf.paths.keys())
    In : other = others[1]
    In : self.path is other
    Out[1]: True
    In : self.path in pf.paths
    Out[1]: False
  • Is Path mutable? Commented Dec 11, 2020 at 3:29
  • Minor nitpick: When your isinstance check fails, you should be returning NotImplemented, not False; that will allow the right-side class (if it's not the same as the left) to attempt the comparison (if both return NotImplemented, == falls back to an identity check, which gives False for distinct objects). Commented Dec 11, 2020 at 3:43
  • @tdelaney Not really, since I have several items in that dictionary. These are not class variables, as I am using the dataclass mechanism, which converts them to instance variables. Commented Dec 11, 2020 at 3:46
  • Side note: Is there a reason you're manually implementing __eq__ and __hash__? @dataclass would generate the __eq__ for you with no changes; making it @dataclass(frozen=True) would generate the __hash__ for you too (@dataclass(unsafe_hash=True) would do it too, but would leave your instances mutable, which hashable instances should not be). Commented Dec 11, 2020 at 3:50
  • "The paths do want to be mutable": then they shouldn't be hashable. Mutation breaks hashes. Commented Dec 11, 2020 at 3:55

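A sketch of what the dataclass suggestions above could look like, assuming every field can be fixed when the object is created (the question says that is not currently the case). Frozen dataclasses get a generated __eq__ and __hash__, and the generated __eq__ returns NotImplemented for objects of another class, which also covers the nitpick above. The custom __repr__ methods are left out for brevity, and using dataclasses.replace to "fill out" an anonymous path later is one possible approach, not something proposed in the thread:

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


class Path:
    pass


@dataclass(frozen=True)
class ConfigurationPath(Path):
    configurationName: Optional[str] = None


@dataclass(frozen=True)
class SpecificationPath(Path):
    configurationPath: ConfigurationPath
    specificationName: Optional[str] = None


cfg = ConfigurationPath("cfg")
a = SpecificationPath(cfg)                        # anonymous for now
b = SpecificationPath(ConfigurationPath("cfg"))   # separate but equal object
print(a == b, hash(a) == hash(b))                 # generated __eq__ and __hash__ agree

try:
    a.specificationName = "spec"                  # frozen: assignment is rejected
except dataclasses.FrozenInstanceError:
    print("cannot mutate a frozen instance")

# "Filling in" the name means building a new object instead of mutating
named = dataclasses.replace(a, specificationName="spec")
paths = {named: "payload"}
print(SpecificationPath(cfg, "spec") in paths)
```

Because replace returns a new object, any dict entry keyed on the anonymous path would have to be re-inserted under the named one.
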
1 Answer


Per your comment:

The paths do want to be mutable as I am setting specificationName to None in places (leaving them Anonymous to be filled out later). Further, it is on an instance where the specificationName is None that this occurs, however in my simple test scripts I can get away with setting this to None without an error. Could mutability of the hashable instances cause an error such as this?

There's your problem. You're putting these objects in a dict immediately after creation, while specificationName is None, so each one is stored under a hash code computed from None. The dict caches that hash code alongside the key, and the hash code is the only way it will ever find the entry again. If you subsequently change specificationName to anything that produces a different hash value (which is almost any other value), the entry stays in the bucket chosen by the old hash code, but every lookup computes the new hash code and searches somewhere else. That is also why path in list(thedict) still returns True: membership in a list is a linear scan that compares each element by identity and ==, and never consults the hash.
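A minimal reproduction, using a hypothetical trimmed stand-in (SpecPath) whose __hash__ and __eq__ follow the same pattern as SpecificationPath, with the configuration reduced to a plain string:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecPath:
    """Hypothetical stand-in for SpecificationPath: the hash uses a mutable field."""
    configurationName: str
    specificationName: Optional[str] = None

    def __hash__(self):
        return hash((self.configurationName, self.specificationName))

    def __eq__(self, other):
        if not isinstance(other, SpecPath):
            return NotImplemented
        return (self.configurationName, self.specificationName) == (
            other.configurationName, other.specificationName)


path = SpecPath("cfg")            # specificationName is None at insertion time
paths = {path: "payload"}         # the dict records hash(("cfg", None)) for this entry
path.specificationName = "spec"   # mutation changes hash(path); the dict is not told

# List membership is a linear scan using identity/==, so it still succeeds
print(path in list(paths))                  # True
# An equal object with the new value hashes to where the entry is NOT stored
print(SpecPath("cfg", "spec") in paths)     # False
# An object with the old value finds the right bucket but no longer compares equal
print(SpecPath("cfg") in paths)             # False
```

Looking up the mutated object itself (path in paths) usually fails too, as in the question. CPython's dict does check identity before ==, but only for entries it actually probes, so it finds the stale entry only when the new hash happens to lead to the same slot; identical objects do not rescue the lookup in general.
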

If specificationName must be mutable, then it cannot be part of the hash; it's as simple as that. Leaving it out may increase hash collisions, but that can't be helped: a mutable field can't be part of the hash without triggering this exact problem.
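One way to apply this, sketched with a hypothetical trimmed stand-in (SpecPath, with a plain configurationName string in place of the ConfigurationPath object): keep specificationName in __eq__ but leave it out of __hash__. The hash stays consistent with equality because equal objects always share configurationName; the sketch assumes configurationName itself is never changed once the object is used as a key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecPath:
    """Hypothetical stand-in: only the field that never changes feeds the hash."""
    configurationName: str
    specificationName: Optional[str] = None   # may be filled in after insertion

    def __hash__(self):
        # Equal objects necessarily share configurationName, so this
        # still satisfies "a == b implies hash(a) == hash(b)".
        return hash(self.configurationName)

    def __eq__(self, other):
        if not isinstance(other, SpecPath):
            return NotImplemented
        return (self.configurationName, self.specificationName) == (
            other.configurationName, other.specificationName)


path = SpecPath("cfg")
paths = {path: "payload"}
path.specificationName = "spec"   # hash(path) is unchanged, so lookups still work
print(path in paths)              # True
print(paths[path])                # payload
```

The trade-off is that all paths sharing a configuration land in the same hash bucket and are told apart by __eq__ alone.
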


3 Comments

Also, avoid implementing your own __hash__ and __eq__. Let dataclass do it for you. There are so many pitfalls...
@BrianMcCutchon: The OP got those mostly right (aside from not using NotImplemented correctly). But yeah, it's much better to let dataclass do it for you. Although the docs discourage it, they do note that you can omit specific fields from the generated hash by defining the attribute as field(hash=False). That would require @dataclass(unsafe_hash=True) instead of @dataclass(frozen=True) to support the OP's case of a mutable field that shouldn't be part of the hash, but it would let dataclass generate __hash__ without using specificationName.
In case anyone wants it, here's a minimum example of this in action: gist.github.com/Multihuntr/7756efa077b7837e494098adf0053dac
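The field(hash=False) variant described in the comment above might look like this; SpecPath is again a hypothetical trimmed stand-in. unsafe_hash=True is used because frozen=True would forbid assigning specificationName at all, and with the default eq=True and neither frozen nor unsafe_hash set, dataclass sets __hash__ to None:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(unsafe_hash=True)
class SpecPath:
    configurationName: str
    # compare defaults to True, so this field stays in the generated __eq__;
    # hash=False keeps it out of the generated __hash__
    specificationName: Optional[str] = field(default=None, hash=False)


path = SpecPath("cfg")
paths = {path: "payload"}
path.specificationName = "spec"   # allowed: the instance is not frozen
print(path in paths)              # True: the hash only covers configurationName
```

As with the hand-written version, this only stays correct if configurationName is not changed after the object is used as a key.
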
