
In Tim Peters' answer to "Are there any reasons not to use an ordered dictionary?", he says:

OrderedDict is a subclass of dict.

It's not a lot slower, but at least doubles the memory over using a plain dict.

Now, while working through a particular question, I ran some sample checks using IPython, and both of them contradict the reasoning above:

  1. both dict and OrderedDict are the same size
  2. operating on an OrderedDict easily takes around 7-8 times longer than operating on a dict (hence it is a lot slower)

Can someone explain to me where I'm going wrong in my reasoning?


Create a large dict and OrderedDict and compare their sizes:

import sys
import random
from collections import OrderedDict

test_dict = {}
test_ordered_dict = OrderedDict()

for key in range(10000):
    test_dict[key] = random.random()
    test_ordered_dict[key] = random.random()

>>> sys.getsizeof(test_dict)
786712
>>> sys.getsizeof(test_ordered_dict)
786712

Check the time taken for insertions using %timeit:

import sys
import random
from collections import OrderedDict

def operate_on_dict(r):
    test_dict = {}
    for key in range(r):
        test_dict[key] = random.random()

def operate_on_ordered_dict(r):
    test_ordered_dict = OrderedDict()
    for key in range(r):
        test_ordered_dict[key] = random.random()

>>> %timeit for x in range(100): operate_on_ordered_dict(100)
100 loops, best of 3: 9.24 ms per loop
>>> %timeit for x in range(100): operate_on_dict(100)
1000 loops, best of 3: 1.23 ms per loop

1 Answer


I think the size issue is due to the fact that there's no __sizeof__ method defined in the Python 2.x implementation of OrderedDict, so it simply falls back to dict's __sizeof__ method.
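You can check this quickly in an interpreter. A minimal sketch, assuming CPython 2.7 (a class's own namespace lists only the methods it defines itself, not the ones it inherits):

>>> from collections import OrderedDict
>>> '__sizeof__' in vars(OrderedDict)   # OrderedDict doesn't define its own
False
>>> '__sizeof__' in vars(dict)          # so attribute lookup finds dict's
True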

To demonstrate this, I've created a class A which extends list, and also added an additional method foo to check whether that affects the size.

class A(list):
    def __getitem__(self, k):
        return list.__getitem__(self, k)
    def foo(self):
        print 'abcde'

>>> a = A(range(1000))
>>> b = list(range(1000))

But sys.getsizeof still returns the same size for both:

>>> sys.getsizeof(a), sys.getsizeof(b)
(9120, 9120)

Of course A is going to be slower, because its methods run in Python while list's methods run in pure C.

>>> %%timeit
... for _ in xrange(1000):
...     a[_]
...
1000 loops, best of 3: 449 µs per loop
>>> %%timeit
... for _ in xrange(1000):
...     b[_]
...
10000 loops, best of 3: 52 µs per loop
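The same reasoning applies to OrderedDict itself: in Python 2.7 its __setitem__ is pure Python and maintains a doubly linked list on top of the inherited C-level insert. This is essentially the 2.7 stdlib implementation (excerpted and abridged from collections.py):

class OrderedDict(dict):
    # ... (other methods omitted)
    def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
        # Each new key appends a [prev, next, key] link to the doubly
        # linked list before delegating storage to the C-level dict.
        if key not in self:
            root = self.__root
            last = root[0]
            last[1] = root[0] = self.__map[key] = [last, root, key]
        dict_setitem(self, key, value)

So every insertion pays for a Python-level method call plus the linked-list bookkeeping in addition to the plain dict insert, which is consistent with the slowdown measured above.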

And this seems to be fixed in Python 3, where there's a well-defined __sizeof__ method now:

def __sizeof__(self):
    sizeof = _sys.getsizeof
    n = len(self) + 1                   # number of links including root
    size = sizeof(self.__dict__)        # instance dictionary
    size += sizeof(self.__map) * 2      # internal dict and inherited dict
    size += sizeof(self.__hardroot) * n # link objects
    size += sizeof(self.__root) * n     # proxy objects
    return size
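For comparison, you can approximate the real footprint of a Python 2.7 OrderedDict by hand. This is only a rough sketch under stated assumptions: it presumes the pure-Python 2.7 implementation, which keeps its bookkeeping in the name-mangled attributes _OrderedDict__map and _OrderedDict__root, and the helper name ordered_dict_footprint is made up for illustration:

import sys
from collections import OrderedDict

def ordered_dict_footprint(od):
    # Rough estimate: the dict part plus the linked-list bookkeeping
    # that dict's __sizeof__ knows nothing about.
    size = sys.getsizeof(od)            # what getsizeof reports (the dict part)
    size += sys.getsizeof(od.__dict__)  # instance dictionary
    links = od._OrderedDict__map        # maps each key to its [prev, next, key] link
    size += sys.getsizeof(links)        # the internal map dict itself
    for link in links.itervalues():
        size += sys.getsizeof(link)     # each doubly-linked-list node
    return size

Run against the 10000-key test_ordered_dict from the question, this should come out well above the 786712 bytes that sys.getsizeof reports, which is consistent with Tim's "at least doubles the memory" claim.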

Comments

+1 I think you are right about the missing __sizeof__, but your answer doesn't explain Tim's claim that "it's not a *lot* slower". FWIW, I get the same 7-8x running-time factor whether I choose a range of 100 or 10000 keys.
@mu無 Maybe that's what Tim meant by "lot" there.
@mu無 It would be better to post a comment on his answer.
> also added an additional method foo to check if that affects the size. — Adding a method does not increase the size of an instance, since methods are attributes on the class, not on the instance. A more accurate demonstration would add a new attribute (e.g. "foo") in the subclass's __init__.
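To illustrate that comment's point, here's a small sketch (the class B and its attributes are made-up names): even a genuine per-instance attribute doesn't change sys.getsizeof of the instance, because the attribute lives in the instance's __dict__, which getsizeof does not traverse:

import sys

class B(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self.foo = 'abcde'    # a real per-instance attribute

>>> b = B(range(1000))
>>> before = sys.getsizeof(b)
>>> b.bar = 'more per-instance state'
>>> sys.getsizeof(b) == before   # unchanged: getsizeof doesn't follow __dict__
True
>>> sys.getsizeof(b.__dict__)    # the attributes live (and grow) here

This is exactly why Python 3's OrderedDict.__sizeof__ above has to add in the instance dictionary and the link objects explicitly.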
