Skip to main content
Revised and Expanded
Source Link
S.Lott
  • 392.9k
  • 83
  • 520
  • 791

"I'm more interested in the way Hash Tables look up the key and how the key is generated."

  1. Hashing transforms ana key object to a number. This is called "hashing" -- it makes a hash out of the object. See Hash Function. Summing the bytes of a string, for example, is a standard hash technique. You compute the sum modulo 232 to keep the hash to a manageable size. Hash always gives the same answer. This is O(1).

  2. The number isgives you a "slot" in the HashTable. Given an arbitrary key object, the hash value computes a hash value. The hash value then gives you the slot in table. Usually mod( hash, table size ). This is O(1), also.

That's itthe general solution. Two numeric calculations and you've gone from arbitrary object as key to arbitrary object as value. Few things can be as fast.

The transformation from object to numberhash value happens in one of threethese common ways.

  1. If it's a "primitive" object of 4 bytes, then the object's native value is a number.

  2. The object's address is 4 bytes, then the object's address iscan be used as a numberhash value.

  3. A simple hash algorithm hash function (MD5, SHA1, whatever) sumsaccumulates the bytes of the object to create a 4-byte number. The advanced hashes aren't simple sums of bytes, a simple sum doesn't reflect all the original input bits fairly enough.

The slot in the hash table is mod( number, size of table ).

If that slot has the desired value, you're done. If that's not the desired value, you need to look somewhere else. RandomThere are several popular probing works wellalgorithms to look for a free spot in the table. Linear is a simple search for the next free spot. Quadratic is a non-linear hopping around looking for a free slot. A random number generator (with a fixed seed) can be used to generate a series of probes that will spread data evenly but arbitrarily.

The probing algorithms are not O(1). If the table's big enough, the odds of collision are low, and probes don't matter. If the table's too small, then collisions happen and probing happens. At that point, it becomes a matter of "tuning and tweaking" to balance probing and table size to optimize performance. Usually we just make the table bigger.

See Hash Table.

"I'm more interested in the way Hash Tables look up the key and how the key is generated."

  1. Hashing transforms an object to a number.

  2. The number is a "slot" in the HashTable.

That's it.

The transformation from object to number happens in one of three ways.

  1. If it's a "primitive" object of 4 bytes, then the object's native value is a number.

  2. The object's address is 4 bytes, then the object's address is a number.

  3. A simple hash algorithm (MD5, SHA1, whatever) sums the bytes of the object to create a 4-byte number.

The slot in the hash table is mod( number, size of table ).

If that slot has the desired value, you're done. If that's not the desired value, you need to look somewhere else. Random probing works well.

See Hash Table.

"I'm more interested in the way Hash Tables look up the key and how the key is generated."

  1. Hashing transforms a key object to a number. This is called "hashing" -- it makes a hash out of the object. See Hash Function. Summing the bytes of a string, for example, is a standard hash technique. You compute the sum modulo 232 to keep the hash to a manageable size. Hash always gives the same answer. This is O(1).

  2. The number gives you a "slot" in the HashTable. Given an arbitrary key object, the hash value computes a hash value. The hash value then gives you the slot in table. Usually mod( hash, table size ). This is O(1), also.

That's the general solution. Two numeric calculations and you've gone from arbitrary object as key to arbitrary object as value. Few things can be as fast.

The transformation from object to hash value happens in one of these common ways.

  1. If it's a "primitive" object of 4 bytes, then the object's native value is a number.

  2. The object's address is 4 bytes, then the object's address can be used as a hash value.

  3. A simple hash function (MD5, SHA1, whatever) accumulates the bytes of the object to create a 4-byte number. The advanced hashes aren't simple sums of bytes, a simple sum doesn't reflect all the original input bits fairly enough.

The slot in the hash table is mod( number, size of table ).

If that slot has the desired value, you're done. If that's not the desired value, you need to look somewhere else. There are several popular probing algorithms to look for a free spot in the table. Linear is a simple search for the next free spot. Quadratic is a non-linear hopping around looking for a free slot. A random number generator (with a fixed seed) can be used to generate a series of probes that will spread data evenly but arbitrarily.

The probing algorithms are not O(1). If the table's big enough, the odds of collision are low, and probes don't matter. If the table's too small, then collisions happen and probing happens. At that point, it becomes a matter of "tuning and tweaking" to balance probing and table size to optimize performance. Usually we just make the table bigger.

See Hash Table.

Source Link
S.Lott
  • 392.9k
  • 83
  • 520
  • 791

"I'm more interested in the way Hash Tables look up the key and how the key is generated."

  1. Hashing transforms an object to a number.

  2. The number is a "slot" in the HashTable.

That's it.

The transformation from object to number happens in one of three ways.

  1. If it's a "primitive" object of 4 bytes, then the object's native value is a number.

  2. The object's address is 4 bytes, then the object's address is a number.

  3. A simple hash algorithm (MD5, SHA1, whatever) sums the bytes of the object to create a 4-byte number.

The slot in the hash table is mod( number, size of table ).

If that slot has the desired value, you're done. If that's not the desired value, you need to look somewhere else. Random probing works well.

See Hash Table.