3

A stupid newbie question here For a python dictionary q len(set(q.keys())) != len(q.keys()). Is that even possible?

12
  • can you post the dictionary for which this happens? Commented Mar 27, 2011 at 18:20
  • 1
    That is, in fact, not possible. Commented Mar 27, 2011 at 18:20
  • 1
    @user it's possible that you're getting overflow errors. What are the values for len(q.keys()) and len(set(q.keys()))? Commented Mar 27, 2011 at 18:22
  • 1
    Overflow shouldn't be the case. And this shouldn't happen, except if __hash__ is flawed and/or you don't have __eq__/__ne__ overloaded. Commented Mar 27, 2011 at 18:24
  • 1
    Oh sorry I think I found the problem. __ne__ wasn't overloaded. Thanks everyone for the help Commented Mar 27, 2011 at 18:32

2 Answers 2

17

This can happen if you violate a requirement of dict, and change its hash.

When an object is used in a dict, its hash value must not change, and its equality to other objects must not change. Other properties may change, as long as they don't affect how it appears to the dict.

(This does not mean that a hash value is never allowed to change. That's a common misconception. Hash values themselves may change. It's only dict which requires that key hashes be immutable, not __hash__ itself.)

The following code adds an object to a dict, then changes its hash out from under the dict. q[a] = 2 then adds a as a new key in the dict, even though it's already present; since the hash value changed, the dict doesn't find the old value. This reproduces the peculiarity you saw.

class Test(object): def __init__(self, h): self.h = h def __hash__(self): return self.h a = Test(1) q = {} q[a] = 1 a.h = 2 q[a] = 2 print q # True: print len(set(q.keys())) != len(q.keys()) 
Sign up to request clarification or add additional context in comments.

Comments

1

The underlying code for dictionaries and sets is substantially the same, so you can usually expect that len(set(d.keys()) == len(d.keys()) is an invariant.

That said, both sets and dicts depend on __eq__ and __hash__ to identify unique values and to organize them for efficient search. So, if those return inconsistent results (or violate the rule that "a==b implies hash(a)==hash(b)", then there is no way to enforce the invariant:

>>> from random import randrange >>> class A(): def __init__(self, x): self.x = x def __eq__(self, other): return bool(randrange(2)) def __hash__(self): return randrange(8) def __repr__(self): return '|%d|' % self.x >>> s = [A(i) for i in range(100)] >>> d = dict.fromkeys(s) >>> len(d.keys()) 29 >>> len(set(d.keys())) 12 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.