  • Which database? Do you want to do the hashing in the DB or in your code? Commented Feb 17, 2011 at 11:01
  • A hash is never guaranteed to be unique, so syncing only by a (short) hash is dangerous. Also, you speak of columns, but you tagged your question with c# and .Net. Are you working in a database, or do you mean the fields/properties of an object? Commented Feb 17, 2011 at 11:04
  • @SemVanmeenen: What about a CRC check? I used to use this in a C program to check validity of files. Commented Feb 17, 2011 at 11:43
  • @Jon I just read the edit to your question. Hashing is usually used to speed up the process: first compare the hashes; if those are equal, compare the values on which the hash is based; only if those are also equal do you sync. If you really want to do it on a hash basis alone, you'll have to trade off the length of the hash against the chance of two different values producing equal hashes. Wikipedia has a list of hash functions. Generally speaking, the longer the hash, the better. I wouldn't recommend it, however; I wouldn't feel safe implementing it that way myself. Commented Feb 17, 2011 at 12:24
  • The chances that multiple instances of the putatively "same" address will be slightly different from one another are very high, in my experience, at least with U.S. addresses. "Road" and "Avenue" can be abbreviated "Rd" or "Ave" or spelled out. "Apt 3B" might be "Apt. #3B". Etc. Joining on addresses (or on a hashed representation of an address) is notoriously difficult. There are "address sanitization" measures that one can take to regularize the address format by reducing variants. A hash on sanitized addresses would work much more reliably than a hash on raw addresses. Commented Feb 17, 2011 at 13:12
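
The two suggestions in the comments above (sanitize the address before hashing, and treat a hash match only as a hint that is then confirmed on the underlying values) could be combined roughly as follows. This is a minimal sketch in Python for illustration only, since the question itself is tagged c#. The `ABBREVIATIONS` table and the function names `normalize_address`, `address_hash`, and `same_address` are hypothetical, and the few rules shown are nowhere near a complete address sanitizer.

```python
import hashlib
import re

# Hypothetical, deliberately tiny abbreviation table (an assumption for
# illustration); real address sanitization needs many more rules.
ABBREVIATIONS = {"rd": "road", "ave": "avenue", "apt": "apartment"}


def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    words = re.sub(r"[^a-z0-9\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def address_hash(raw: str) -> str:
    """SHA-256 hex digest of the normalized address."""
    return hashlib.sha256(normalize_address(raw).encode("utf-8")).hexdigest()


def same_address(a: str, b: str) -> bool:
    """Compare hashes first; on a match, confirm on the normalized values."""
    if address_hash(a) != address_hash(b):
        return False  # different hashes: the normalized values must differ
    # Equal hashes could in principle be a collision, so verify the values.
    return normalize_address(a) == normalize_address(b)
```

In a real sync the hash would presumably be computed once and stored alongside each record, so that only the short hashes are compared in the common case; recomputing it on every call, as this sketch does, gives no speed benefit and is done here only to keep the example self-contained.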