Implementing Equals and GetHashCode - an easier way

Question

I have a tree of objects (DTOs), where one object references other objects and so on:

class Person { public int Id { get; } public Address Address { get; } // Several other properties } public Address { public int Id { get; } public Location Location { get; } // Several other properties }

These objects can be quite complex and have many other properties.

In my app, a Person with same Id could be in two storages, local storage in the app and coming from backend. I need to merge the online Person with local Person in a specific way, so for this I need to first know if the online Person is same with the one stored locally (in other words if local Person hasn't been updated by the app).

In order to use LINQ's Except, I know I need to implement Equatable<T> and the usual way I've seen it is like this:

class Person : IEquatable<Person> { public int Id { get; } public Address Address { get; } public override bool Equals(object obj) { return Equals(obj as Person); } public bool Equals(Person other) { return other != null && Id == other.Id && Address.Equals(other.Address); } public override int GetHashCode() { var hashCode = -306707981; hashCode = hashCode * -1521134295 + Id.GetHashCode(); hashCode = hashCode * -1521134295 + (Address != null ? Address.GetHashCode() : 0); return hashCode; }

To me this sounds complicated and hard to maintain, it's easy to forget to update Equals and GetHashCode when properties change. Depending on the objects, it can also be a bit computational expensive.

Wouldn't the following be a simpler and much effective way of implementing Equals and GethashCode?

class Person : IEquatable<Person> { public int Id { get; } public Address Address { get; private set; } public DateTime UpdatedAt { get; private set; } public void SetAdress(Address address) { Address = address; UpdatedAt = DateTime.Now; } public override bool Equals(object obj) { return Equals(obj as Person); } public bool Equals(Person other) { return other != null && Id == other.Id && UpdatedAt.Ticks == other.UpdatedAt.Ticks; } public override int GetHashCode() { var hashCode = -306707981; hashCode = hashCode * -1521134295 + Id.GetHashCode(); hashCode = hashCode * -1521134295 + UpdatedAt.Ticks.GetHashCode(); return hashCode; } }

My idea is whenever the object changes, there's a timestamp. This timestamp gets saved along with the object. I am thinking to use this field as a concurrency token in storage too.

Since resolution of DateTime could be an issue, instead of using time, I'm thinking a Guid is also a good option instead of DateTime. There wouldn't be too many objects, so the uniqueness of Guid shouldn't be an issue.

Do you see a problem with this approach?

Like I said above, I think it would be much easier to implement and faster to run than having Equals and GetHashCode go over all the properties.

Update: The more I think about it, I tend to feel that having Equals and GetHashCode implemented on the class is not a good approach. I think it would be better to implement a specialized IEqualityComparer<Person> which compares Persons in a specific way and pass it to LINQ's methods.

The reason for this is because, like in the comments and answer, a Person could be used in different ways.

A GUID or a TimeStamp knows nothing about the properties you have and their values. Neither does the Id field you already have. How would those be better than this - supposing you can guarantee them to be unique? — ZorgoZ
– ZorgoZ, Commented Jun 16, 2018 at 7:07
Person has the property Id. Isn't that enough by itself to distinguish different instances? Why would you have more than one instance of that class with the same Id? -- if you want to track changes in the data of an instance, then there are other ways to do it. For example by using property change. — Corak
– Corak, Commented Jun 16, 2018 at 7:10
On other hand, you can have a private hash field that is calculated only on entity update, and thus you will not need to recalculate each time. — ZorgoZ
– ZorgoZ, Commented Jun 16, 2018 at 7:15
IEquatable work in two steps. 1) Create a hash for each item and creates a binary tree for the hash values. 2) Then when two values have equal hash it does 2nd comparison using Equal method. You can have the hash just return zero for all values and then make the Equal method the only criteria for comparison. You can make the hash on return the ID and then let the Equal check for address. — jdweng
– jdweng, Commented Jun 16, 2018 at 8:17
@DonBox - so you have two instances of a class (with hopefully the same Id) and you want to decide which one is "better" (more up-to-date). -- well, it ain't that easy, or is it? Imagine a Person with only the properties Name and Surname. At point A in time server and app are equal. At point B app changes (only) the name. At point C server changes (only) the surname. At point D, you want to merge. Is the server instance "better" than the app instance? Do you want to lose any one of the changes? — Corak
– Corak, Commented Jun 16, 2018 at 8:37

Marc Gravell · Accepted Answer · 2018-06-16 09:08:41Z

Really lazy version for implementing GetHashCode / Equals using value-tuples (which don't allocate for this):

class Person : IEquatable<Person> { public int Id { get; } public Address Address { get; } public Person(int id, Address address) => (Id, Address) = (id, address); public override bool Equals(object obj) => Equals(obj as Person); public bool Equals(Person other) => other != null && (Id, Address).Equals((other.Id,other.Address)); public override int GetHashCode() => (Id, Address).GetHashCode(); }

@DonBox I simply mean that the "value tuple" code doesn't involve any object allocations, so the overhead is minimal
Thanks for your answer. I already mentioned I am not looking to know how to implement IEquatable but how to implement in a specific way.

George Helyar · Accepted Answer · 2018-06-16 07:32:09Z

1

This would give you false negative equality if two objects have the same properties but were created at different times, and it would give you false positive equality if two objects were created with different properties but right after each other (the clock is not that accurate).

For LINQ Except, it's really GetHashCode you need to implement, and this should be using the hash code of all of the properties.

Ideally, they should also be immutable (remove the private setter) so that one object has the same hash code for its whole life.

Your GetHashCode should also be unchecked.

Alternatively, you could use Except with a custom comparer.

edited Jun 16, 2018 at 7:32

answered Jun 16, 2018 at 7:11

George Helyar

5,5382 gold badges25 silver badges21 bronze badges

5 Comments

Don Box Over a year ago

I think "equality" can vary depending on what kind of equality you need. If you're looking from a general \ abstract point of view, then yes, you're correct, if two Person objects are created at different times, but they have same properties, they would be different. But that's not possible in my case, because of the way UpdatedAt works Please take a look at the question, I updated it to be more clear and precise.

Don Box Over a year ago

The more I think about it, I tend to feel that having Equals and GetHashCode implemented on the class is not a good approach. I feel like implementing an IEqualityComparer<Person> and pass it to LINQ's methods is much better. Not sure...

George Helyar Over a year ago

How you want to define equality determines what your implementation should be. If you don't care about the property values, just whether is has been updated, then rather than updating a GUID every time you update the object, you could just make it immutable and use the default reference equality. If you do care about the property values, you should only use these to compare equality. Most of the time Equals could just use GetHashCode if you really want to be lazy though.

Don Box Over a year ago

Good point on immutability, but it doesn't work in my scenario. I mentioned I have a functionality where I need to merge two sets of Person objects, stored in different storages. Also, I already mentioned in the question that the reason I don't want to use properties is because it could get computational heavy since a Person has complex properties which in turn have complex properties and so on. I already kindly mentioned please read my question again since I updated it.

Don Box Over a year ago

Thank you for your help. Again, you're correct if you're looking at the problem in an abstract way, without having an idea of what the implementation is actually doing. But please read the question

ZorgoZ · Accepted Answer · 2018-06-17 19:57:57Z

Following is a LinqPad sketch, you could start from. It has all the tools you could use to tailor it to your needs. Of course, this is just a concept, and not all aspects are completely elaborated.

As you can see, there is an Include attribute that can be applied to the backing fields you want to include in the hash.

void Main() { var o1 = new C { Interesting = "Whatever", NotSoInterresting = "Blah.." }; var o2 = new C { Interesting = "Whatever", NotSoInterresting = "Blah-blah.." }; (o1 == o2).Dump("o1 == o2"); // False (o2 == o1).Dump("o2 == o1"); // False var o3 = o1.Clone(); (o3 == o1).Dump("o3 == o1"); // True (object.ReferenceEquals(o1, o3)).Dump("R(o3) == R(o2)"); // False o3.NotSoInterresting = "Changed!"; (o1 == o3).Dump("o1 == C(o3)"); // True o3.Interesting = "Changed!"; (o1 == o3).Dump("o1 == C(o3)"); // False } [AttributeUsage(AttributeTargets.Field)] public class IncludeAttribute : Attribute { } public static class ObjectExtensions { public static int GetHash(this object obj) => obj?.GetHashCode() ?? 1; public static int CalculateHashFromFields(this object obj) { var fields = obj.GetType() .GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly /*or not*/) .Where(f => f.CustomAttributes.Any(x => x.AttributeType.Equals(typeof(IncludeAttribute)))); var result = 1; unchecked { foreach(var f in fields) result *= f.GetValue(obj).GetHash(); } return result; } } public partial class C { [Include] private int id; public int Id { get => id; private set { id = value; UpdateHash(); } } [Include] private string interesting; public string Interesting { get => interesting; set { interesting = value; UpdateHash(); } } public string NotSoInterresting { get; set; } } public partial class C: IEquatable<C> { public C Clone() => new C { Id = this.Id, Interesting = this.Interesting, NotSoInterresting = this.NotSoInterresting }; private static int _id = 1; // Some persistence is required instead public C() { Id = _id++; } private int hash; private void UpdateHash() => hash = this.CalculateHashFromFields(); public override bool Equals(object obj) { return Equals(obj as C); } public bool Equals(C other) => this.hash == other.hash; public override int GetHashCode() => hash; public static bool operator ==(C obj1, C obj2) => obj1.Equals(obj2); public static bool operator !=(C obj1, C obj2) => !obj1.Equals(obj2); }

[Update 18.06.17]

Updated version:

void Main() { var o1 = new C { Interesting = "Whatever", NotSoInterresting = "Blah.." }; var o2 = new C { Interesting = "Whatever", NotSoInterresting = "Blah-blah.." }; (o1 == o2).Dump("o1 == o2"); // False (o2 == o1).Dump("o2 == o1"); // False var o3 = o1.Clone(); (o3 == o1).Dump("o3 == o1"); // True (object.ReferenceEquals(o1, o3)).Dump("R(o3) == R(o2)"); // False o3.NotSoInterresting = "Changed!"; (o1 == o3).Dump("o1 == C(o3)"); // True o3.Interesting = "Changed!"; (o1 == o3).Dump("o1 == C(o3)"); // False C o4 = null; (null == o4).Dump("o4 == null"); // True } [AttributeUsage(AttributeTargets.Field)] public class IncludeAttribute : Attribute { } public static class ObjectExtensions { public static int GetHash(this object obj) => obj?.GetHashCode() ?? 1; } public abstract class EquatableBase : IEquatable<EquatableBase> { private static FieldInfo[] fields = null; private void PrepareFields() { fields = this.GetType() .GetFields(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly /*or not*/) .Where(f => f.CustomAttributes.Any(x => x.AttributeType.Equals(typeof(IncludeAttribute)))) .ToArray(); } private int CalculateHashFromProperties() { if (fields == null) PrepareFields(); var result = 1; unchecked { foreach (var f in fields) result ^= f.GetValue(this).GetHash(); } return result; } private bool CheckDeepEqualityTo(EquatableBase other) { if (ReferenceEquals(other, null) || other.GetType() != GetType()) return false; if (fields == null) PrepareFields(); var result = true; for(int i = 0; i < fields.Length && result; i++) { var field = fields[i]; result &= field.GetValue(this).Equals(field.GetValue(other)); } return result; } private int hash; protected int UpdateHash() => hash = this.CalculateHashFromProperties(); protected void InvalidateHash() => hash = 0; public override bool Equals(object obj) => Equals(obj as EquatableBase); public bool Equals(EquatableBase other) => object.ReferenceEquals(this, other) || this.CheckDeepEqualityTo(other); public override int GetHashCode() => hash == 0 ? UpdateHash() : hash; public static bool operator ==(EquatableBase obj1, EquatableBase obj2) => ReferenceEquals(obj1, obj2) || obj1?.CheckDeepEqualityTo(obj2) == true; public static bool operator !=(EquatableBase obj1, EquatableBase obj2) => !(obj1 == obj2); } public partial class C: EquatableBase { private static int _id = 1; // Some persistence is required instead public C() { Id = _id++; } public C Clone() => new C { Id = this.Id, Interesting = this.Interesting, NotSoInterresting = this.NotSoInterresting }; [Include] private int id; public int Id { get => id; private set { id = value; InvalidateHash(); } } [Include] private string interesting; public string Interesting { get => interesting; set { interesting = value; InvalidateHash(); } } public string NotSoInterresting { get; set; } }

One still can't get rid of calling something in the setter (and there certainly is still place for optimization), but these are the improvements so har:

Reusable base class instead of partial
The fields of interest are cached per type
Hash is recalculated only at first request after it was invalidated, and invalidation is cheep
Deep equality check based on the fields of interest, instead of just comparing the hashes

... And then you pay the cost of the reflection every time you modify a property (assuming you remember to add the UpdateHash call into the setter). Depending on usage, this could be much worse. If anything, build and cache a delegate/expression tree once per type and use that inside the CalculateHashFromFields method... But still I don't know if I really like this approach. That code could be in a custom, generic, equality comparer. Then it's be easy to cache the delegate per type, still querying your attributes and only invoked when necessary (when you need equality checks, not always).
True. I never said it is a complete solution. Caching part of the reflection is one possible improvement. On the other hand, the "cost" is relative: if the instances are far less often updated than compared you can still gain this way. There could be another possibility to intercept somehow the spot where updating the instance is finished, and calculate the hash only then.

Collectives™ on Stack Overflow

Implementing Equals and GetHashCode - an easier way

3 Answers 3

3 Comments

5 Comments

2 Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

5 Comments

2 Comments

Related