The only way to be sure would be to implement both and check, but my informed guess is that the dictionary will be faster: a binary search tree costs O(log n) for lookup and insertion, and I think that except under the most pessimal of situations (such as massive hash collisions) the hash table's O(1) lookups and insertions will win, even allowing for the occasional resize.
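(The standard library has no AVL tree to measure against, but a sorted list probed with `bisect` does the same O(log n) search that a balanced tree would, so a rough, unscientific way to test the guess is something like the sketch below; the key values and function names are illustrative, not a real benchmark.)

    import timeit
    from bisect import bisect_left

    n = 1_000_000
    keys = list(range(n))      # sorted keys: a stand-in for a balanced tree
    d = dict.fromkeys(keys)    # the hash table

    def dict_lookup():
        return 123456 in d                 # expected O(1)

    def tree_like_lookup():
        i = bisect_left(keys, 123456)      # O(log n) probing
        return i < n and keys[i] == 123456

    print("dict:  ", timeit.timeit(dict_lookup, number=100_000))
    print("bisect:", timeit.timeit(tree_like_lookup, number=100_000))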
If you take a look at the Python dictionary implementation, you'll see that:
- a dictionary starts out with 8 entries (PyDict_MINSIZE);
- a dictionary with 50,000 or fewer entries quadruples in size when it grows;
- a dictionary with more than 50,000 entries doubles in size when it grows;
- key hashes are cached in the dictionary, so they are not recomputed when the dictionary is resized.
(The "NOTES ON OPTIMIZING DICTIONARIES" are worth reading too.)
So if your dictionary has 1,000,000 entries, I believe that it will be resized eleven times (8 → 32 → 128 → 512 → 2048 → 8192 → 32768 → 131072 → 262144 → 524288 → 1048576 → 2097152) at a cost of roughly 2,009,768 operations' worth of extra work during the resizes (that figure is the sum of the successive table sizes, since each resize walks the old table and reinserts its live entries). This seems likely to be much less than the cost of all the rebalancing involved in 1,000,000 insertions into an AVL tree.
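As a quick sanity check on that arithmetic, here's a sketch that replays the growth policy described above, assuming a resize triggers when the table is about two-thirds full and costs work proportional to the old table's size:

    # Replay the growth policy: quadruple while holding <= 50,000 entries,
    # double once the dictionary is bigger than that.
    sizes = [8]
    while sizes[-1] < 2_000_000:
        entries = sizes[-1] * 2 // 3   # a table resizes at ~2/3 full
        sizes.append(sizes[-1] * (4 if entries <= 50000 else 2))

    print(" -> ".join(map(str, sizes)))           # 8 -> 32 -> ... -> 2097152
    print(len(sizes) - 1, "resizes")              # 11
    print(format(sum(sizes[:-1]), ","), "slots")  # 2,009,768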