
In some cases, organizations are not permitted to use or store useful keys, such as Social Security numbers (SSNs), phone numbers, etc.

However, these unique keys are very useful for matching data. So, theoretically, if a data provider were able to provide you with a hashed value of the SSN, and you were to store that hash and use it for matching, you would never have to use or store the SSN.
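A minimal sketch of the matching scheme described above, assuming both parties agree on the same hash function (plain unsalted SHA-256 here, chosen only for illustration) and normalize SSNs the same way (dashes stripped). The helper `ssn_key` and the sample records are hypothetical.

```python
import hashlib


def ssn_key(ssn: str) -> str:
    """Hypothetical helper: normalize an SSN and return its SHA-256 hex digest."""
    digits = ssn.replace("-", "").strip()
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected a 9-digit SSN")
    return hashlib.sha256(digits.encode("ascii")).hexdigest()


# Provider side: records keyed by hashed SSN (made-up sample data).
provider = {ssn_key("123-45-6789"): {"credit_score": 700}}

# Your side: in practice these keys would also arrive already hashed,
# so only the hash is stored, never the SSN itself.
local = {ssn_key("123456789"): {"customer_id": 42}}

# Join the two data sets on the shared hash key.
matches = {k: {**local[k], **provider[k]} for k in local.keys() & provider.keys()}
print(len(matches))  # 1
```

This only works because the hash is deterministic and unsalted, which is exactly what makes it vulnerable to the brute-force attack raised in the answers.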

What would be an appropriate hash function for something like an SSN?

3 Answers


You need to treat the SSN exactly like a password. Hash them using a strong, slow hash algorithm such as bcrypt or PBKDF2, using a unique per-record prefix and suffix salt.
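A rough sketch of password-style SSN hashing with Python's built-in `hashlib.pbkdf2_hmac`. PBKDF2 takes the salt as a separate parameter, so this uses one random 16-byte salt per record rather than a literal prefix and suffix. The iteration count and the helper names `hash_ssn` and `verify_ssn` are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; tune for your hardware


def hash_ssn(ssn: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for a 9-digit SSN using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each record
    key = hashlib.pbkdf2_hmac("sha256", ssn.encode("ascii"), salt, ITERATIONS)
    return salt, key


def verify_ssn(ssn: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_ssn(ssn, salt)
    return hmac.compare_digest(key, expected)


salt, stored = hash_ssn("123456789")
print(verify_ssn("123456789", salt, stored))  # True
print(verify_ssn("987654321", salt, stored))  # False
```

Note the trade-off: with a random per-record salt, two parties hashing the same SSN get different outputs, so the hashes can only be matched if both sides use the same salt. That tension is what the comments on this answer are arguing about.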

The downside of hashing SSNs is that they're predictable and have very little entropy, which makes a brute-force attack on the plaintext quite easy. If you can afford it, I'd suggest investing in hardware protection (e.g. an HSM) for this kind of thing. In fact, you should avoid identifying people by their SSN entirely.
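To show why low entropy matters: there are at most 10^9 nine-digit strings, few enough to enumerate against a fast hash. This sketch cracks an unsalted SHA-256 hash, and to keep pure Python fast it assumes (hypothetically) that the attacker already knows the first five digits, leaving only 10,000 candidates. The helpers `sha256_hex` and `crack` are illustrative names.

```python
import hashlib
from typing import Optional


def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("ascii")).hexdigest()


def crack(target: str, prefix: str) -> Optional[str]:
    """Try every 4-digit serial after an assumed-known 5-digit prefix."""
    for serial in range(10_000):
        candidate = f"{prefix}{serial:04d}"
        if sha256_hex(candidate) == target:
            return candidate
    return None


leaked = sha256_hex("123456789")  # an unsalted hash as it might appear in a data set
print(crack(leaked, "12345"))  # 123456789
```

A per-record salt forces the attacker to repeat this search for each record, and a slow hash such as PBKDF2 multiplies the cost of each guess, but neither enlarges the underlying 10^9 search space.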


9 Comments

@HunterMcMillen The number of bits of padding doesn't have to be specific, as long as there are plenty of them before and after the data.
Generally, hash functions process data in blocks of n bits; if the incoming data is shorter than n bits, it gets padded in a predictable way defined by the algorithm.
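As a concrete example of that padding, taking SHA-256 as a representative algorithm (the comment does not name one): it processes 64-byte blocks and pads each message with a 0x80 byte, zeros, and an 8-byte length field, rounded up to a multiple of 64. The helper `sha256_padded_length` is a hypothetical illustration of that rule.

```python
import hashlib


def sha256_padded_length(message_len: int) -> int:
    """Bytes after SHA-256 padding: message + 0x80 + zeros + 8-byte length, rounded up to 64."""
    block = hashlib.sha256().block_size  # 64 bytes for SHA-256
    return ((message_len + 1 + 8 + block - 1) // block) * block


print(hashlib.sha256().block_size)              # 64
print(sha256_padded_length(len("123456789")))   # 64: a 9-digit SSN fits in one block
print(sha256_padded_length(56))                 # 128: no room for the length field, so two blocks
```

Because this padding is fixed by the standard, it adds no secret entropy to a short input like an SSN.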
@HunterMcMillen Sure, but for all practical attacks the salt is simply there to prevent collisions between equal plaintexts and stop rainbow tables from being effective.
The salt here seems irrelevant due to the already unique nature of an SSN. If you are going to add salt values to pad the SSN, you might as well just generate some other unique ID instead.
@chris Then I think the best option is to never store the SSN, and give the user a different form of unique ID.

So, theoretically, if a data provider were able to provide you with a hashed value of the SSN, and you were to store that hash and use it for matching, you would never have to use or store the SSN.

That is false; hashes by design are not unique and cannot be used to uniquely identify anything. If you must uniquely identify something and are not allowed to use someone else's identifier, you must come up with your own identifier. That is why things like gas cards, movie rental cards, etc. come with their own unique membership identifiers.

3 Comments

If the provider hashes a number, and I hash the same number with the same algorithm, the hash value will be the same. I can then match my data keyed on my hash with their data, keyed on the same hash value.
@Chris, if the provider hashes two different numbers they can come out to the same hash value. You would treat two different SSNs as the same one.
I believe that the point of a good hashing algorithm is to reduce or eliminate the possibility of collisions, even for small inputs. Take a look at stackoverflow.com/questions/4676828/…
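The disagreement above can be quantified with the birthday bound, P(collision) ≈ 1 − exp(−n(n−1)/2^(b+1)) for n inputs and a b-bit hash. This sketch assumes SHA-256 (b = 256) and every one of the 10^9 nine-digit strings being hashed, and contrasts that with a hypothetical 32-bit hash. The function name is illustrative.

```python
import math


def collision_probability(n: int, hash_bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n hashed inputs."""
    exponent = n * (n - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x


print(f"{collision_probability(10**9, 256):.1e}")  # 4.3e-60
print(collision_probability(10**9, 32))            # 1.0
```

So both commenters have a point: collisions are possible in principle, but with a 256-bit hash they are vanishingly unlikely even across the entire SSN space, while a short hash would certainly collide.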

True, but you can still use it to uniquely fingerprint something, in this case the SSN, relying on the second-preimage resistance of a cryptographic hash function. (As said above, hash them using a strong, slow hash algorithm with a unique per-record prefix and suffix salt, because the data is so small.)
