3

Is there a way to extract existing key hashes from a dictionary, without recalculating them again?

What would be the risks for exposing them and consequntly, accessing the dict by hashes not keys?

3
  • 1
    I'm not sure what you're asking. A hash and a key are not equivalent - hashes can collide, but keys cannot. I guess it would depend on the implementation if you can go through the internals of the hash and inspect each entry, but this doesn't exist in CPython Commented Apr 1, 2016 at 19:55
  • 1
    for key in my_dict: print hash(key) I guess maybe ... Commented Apr 1, 2016 at 19:56
  • @JoranBeasley technically those hashes can be recomputed again though (a __hash__ implementation may cache the hash, but it's not guaranteed) Commented Apr 1, 2016 at 19:58

2 Answers 2

2

I don't think Python's dictionary objects have any public API that allows you to see the hashes their objects are stored with. You can't store an object directly by hash in Python code (it may be possible by calling internal C functions in CPython). There are a few good reasons that you can't add values to a dictionary by hash value, rather than by key.

The most obvious is that multiple key objects might have the same hash. If such a hash collision happens, the second value will be inserted somewhere else in the hash table. The important thing is that it won't overwrite the previous value stored under a different key that hashes the same. If you could just pass the hash and not the key too, Python wouldn't be able to tell if you were using the same key or if you were providing a new key that happened to have a colliding hash.

A secondary reason that you can't insert by hash is that it would be a security vulnerability. The performance of a hash table like Python's dictionaries is very good when there are few hash collisions. It is very bad however if every hash is the same. If you could submit data to a Python program that all hashes to the same value, you can perform a very efficient denial of service attack (the new hash randomization for strings was added in recent versions of Python to make this kind of attack harder).

Sign up to request clarification or add additional context in comments.

2 Comments

Well, you could still shoot yourself in the foot by providing a custom __hash__ method that returns the same value regardless of the contents, and simply differs in comparison results
Exactly doing that: query a dict by providing a hash -> datadump of (value or a list of values). Dict object Introspection.
2

A Python dict's keys must be hashable, i.e., implement the __hash__ special method (as well as some other methods irrelevant to your question), or be one of some predetermined built in types. So you actually can access the hash value of a key without the table, e.g., through

>>> '123'.__hash__() 163512108404620371 

or more uniformly

>>> hash('123') 163512108404620371 >>> hash(2) 2 

That being said, as noted by the comments, a hash value and a position in a table are not the same thing. In fact, as a table resizes, the hash value of a key will remain the same, but the position might change. Consequently, as:

  • the hash value is easily available to you via hash()

  • the position would expose the dictionary's internal state

  • you can "cache" hash values in your objects easily enough in the __hash__ method

there is probably no point in exposing the keys' positions.

3 Comments

You could also use the hash() built-in function instead of going to the method directly.
Thanks, @zondo - I've already updated it, but I appreciate the comment.
@zondo However, it was important for me to mention this method, as perhaps the OP's motivation was to avoid recalculating the hash value, and my point was that it could be cached within the __hash__ method.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.