1

Say I have a collection of object arrays of equal dimension, like this:

    var rows = new List<object[]>
    {
        new object[] {1, "test1", "foo", 1},
        new object[] {1, "test1", "foo", 2},
        new object[] {2, "test1", "foo", 3},
        new object[] {2, "test2", "foo", 4},
    };

And I want to group by one or more of the "columns" -- which ones to be determined dynamically at runtime. For instance grouping by columns 1, 2 and 3 would result in three groups:

  • group 1: [1, "test1", "foo"] (includes rows 1 and 2)
  • group 2: [2, "test1", "foo"] (includes row 3)
  • group 3: [2, "test2", "foo"] (includes row 4)

Certainly I can achieve this with some kind of custom group class and by sorting and iterating. However, it seems like I should be able to do it much more cleanly with LINQ grouping. But my LINQ-fu is failing me. Any ideas?

3 Answers

2

@Matthew Whited's solution is nice if you know the grouping columns up front. However, it sounds like you need to determine them at runtime. In that case, you can create an equality comparer which defines row equality for GroupBy using a configurable column set:

    rows.GroupBy(row => row, new ColumnComparer(0, 1, 2))

The comparer checks the equality of the value of each specified column. It also combines the hash codes of each value:

    using System.Collections.Generic;
    using System.Linq;

    public class ColumnComparer : IEqualityComparer<object[]>
    {
        private readonly IList<int> _comparedIndexes;

        public ColumnComparer(params int[] comparedIndexes)
        {
            _comparedIndexes = comparedIndexes.ToList();
        }

        #region IEqualityComparer

        public bool Equals(object[] x, object[] y)
        {
            return ReferenceEquals(x, y) || (x != null && y != null && ColumnsEqual(x, y));
        }

        public int GetHashCode(object[] obj)
        {
            return obj == null ? 0 : CombineColumnHashCodes(obj);
        }

        #endregion

        private bool ColumnsEqual(object[] x, object[] y)
        {
            return _comparedIndexes.All(index => ColumnEqual(x, y, index));
        }

        private bool ColumnEqual(object[] x, object[] y, int index)
        {
            // Resolves to the static object.Equals(object, object), which is null-safe.
            return Equals(x[index], y[index]);
        }

        private int CombineColumnHashCodes(object[] row)
        {
            return _comparedIndexes
                .Select(index => row[index])
                .Aggregate(0, (hashCode, value) => hashCode ^ (value == null ? 0 : value.GetHashCode()));
        }
    }

If this is something you will do often, you can put it behind an extension method:

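    // Extension methods must live in a static class; the class name here is arbitrary.
    public static class RowGroupingExtensions
    {
        // GroupBy returns a sequence of groupings, not a single grouping.
        public static IEnumerable<IGrouping<object[], object[]>> GroupByIndexes(
            this IEnumerable<object[]> source, params int[] indexes)
        {
            return source.GroupBy(row => row, new ColumnComparer(indexes));
        }
    }

    // Usage
    rows.GroupByIndexes(0, 1, 2)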

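A List<object[]> can be passed to this method as-is, since it already implements IEnumerable<object[]>. Passing a sequence of more specific arrays, such as a List<string[]>, relies on generic covariance and therefore needs .NET 4; in .NET 3.5 you would need to extend that list type directly.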
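The comparer above combines column hash codes with XOR, which ignores column order and lets two equal values cancel each other out. As a minimal sketch of one common alternative (multiply by a small prime, then add), the variant below is order-sensitive. The class name OrderedColumnComparer and the constants 17 and 31 are illustrative assumptions, not part of the original answer.

```csharp
using System.Collections.Generic;

// Hypothetical variant of ColumnComparer with an order-sensitive hash.
public class OrderedColumnComparer : IEqualityComparer<object[]>
{
    private readonly int[] _indexes;

    public OrderedColumnComparer(params int[] indexes)
    {
        // Copy so later changes to the caller's array cannot affect the comparer.
        _indexes = (int[])indexes.Clone();
    }

    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        foreach (int index in _indexes)
        {
            // Static object.Equals handles null column values.
            if (!object.Equals(x[index], y[index])) return false;
        }
        return true;
    }

    public int GetHashCode(object[] row)
    {
        if (row == null) return 0;
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17; // arbitrary non-zero seed (assumption)
            foreach (int index in _indexes)
            {
                object value = row[index];
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            }
            return hash;
        }
    }
}
```

With this version, the rows {1, 2} and {2, 1} compared on columns 0 and 1 get different hash codes, whereas XOR yields the same value for both. It can be passed to GroupBy exactly like ColumnComparer.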


7 Comments

You won't want to just xor the hashcodes. If you do, you will increase the chance of collisions.
Of course! Nice elegant solution. There were a few little errors in ColumnComparer. I edited your post with the corrections.
@Matthew Whited: You are correct, that is a less-than-optimal implementation of GetHashCode. I wanted to avoid getting into that messy discussion, though, so went with the low-friction approach.
@Tim Scott: Thanks for fixing the errors I had - it was late :-) I noticed that you removed the null check in GetHashCode. I included that because ColumnComparer is a public type. If you make it private, where you can absolutely guarantee no nulls, then it is safe to remove it. In the future, though, please refrain from making stylistic edits such as adding a local variable within CombineColumnHashCodes. To me, that is superfluous and I don't want it to be mistaken for code I wrote. Thanks.
@Bryan: Yeah the null check should be there. Resharper told me it would be always false. Never seen Resharper be wrong about something like that before. @Matthew Whited: Can you suggest a more robust way to implement GetHashCode?
1

If your collection contains items with an indexer (such as your object[]), you could do it like this:

    var byColumn = 3;
    var rows = new List<object[]>
    {
        new object[] {1, "test1", "foo", 1},
        new object[] {1, "test1", "foo", 2},
        new object[] {2, "test1", "foo", 3},
        new object[] {2, "test2", "foo", 4},
    };

    var grouped = rows.GroupBy(k => k[byColumn]);
    var otherGrouped = rows.GroupBy(k => new { k1 = k[1], k2 = k[2] });

If you don't like the static column sets above, you could also do something a little more interesting directly in LINQ. This assumes that equal hash codes are good enough to stand in for Equals, so rows whose combined hash codes collide would land in the same group. Note that you may want to just write an IEqualityComparer<T> instead.

    var cols = new[] { 1, 2 };
    var grouped = rows.GroupBy(
        row => cols.Select(col => row[col])
                   .Aggregate(
                       97654321,
                       (a, v) => (v.GetHashCode() * 12356789) ^ a));

    foreach (var keyed in grouped)
    {
        Console.WriteLine(keyed.Key);
        foreach (var value in keyed)
            Console.WriteLine("{0}|{1}|{2}|{3}", value);
    }
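Grouping on a combined hash code leaves only a number as the group key. As a rough sketch of the IEqualityComparer<T> route mentioned above, the code below projects each row onto its selected columns and compares those key arrays element by element, so each group's Key holds the actual column values. The names KeyArrayComparer, KeyGrouping and GroupByColumns are hypothetical, chosen for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two key arrays are equal when their elements are equal in order.
public class KeyArrayComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] key)
    {
        if (key == null) return 0;
        unchecked // wrap on overflow instead of throwing
        {
            int hash = 17; // arbitrary seed and multiplier (assumptions)
            foreach (object value in key)
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            return hash;
        }
    }
}

public static class KeyGrouping
{
    // Groups rows by the values in the given columns; each Key is an array of just those values.
    public static IEnumerable<IGrouping<object[], object[]>> GroupByColumns(
        IEnumerable<object[]> rows, params int[] cols)
    {
        return rows.GroupBy(
            row => cols.Select(col => row[col]).ToArray(),
            new KeyArrayComparer());
    }
}
```

Calling KeyGrouping.GroupByColumns(rows, 0, 1, 2) on the sample rows should give three groups whose keys are the three column values, which is the shape the question asks for.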


0

Shortest solution:

    int[] columns = { 0, 1 };
    // IEnumerable<object[]> = group, IEnumerable<group> = result
    var seed = new[] { rows.AsEnumerable() }.AsEnumerable();
    var result = columns.Aggregate(seed,
        (groups, nCol) => groups.SelectMany(g => g.GroupBy(row => row[nCol])));
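The Aggregate call above starts from one group holding every row and splits each group again for every column in turn. The same idea can be sketched as an explicit loop, which may be easier to follow; the class and method names below are hypothetical. Note that the result is a list of row groups without separate keys, so the key values have to be read from any row in a group.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NestedGrouping
{
    // Hypothetical helper: successive refinement of groups, one column at a time.
    public static List<List<object[]>> GroupByColumns(
        IEnumerable<object[]> rows, params int[] columns)
    {
        // Start with a single group that contains every row.
        var groups = new List<List<object[]>> { rows.ToList() };

        foreach (int column in columns)
        {
            // Split each existing group by the value in the current column.
            groups = groups
                .SelectMany(g => g.GroupBy(row => row[column]))
                .Select(g => g.ToList())
                .ToList();
        }

        return groups;
    }
}
```

For the sample rows, NestedGrouping.GroupByColumns(rows, 0, 1) first splits on column 0 into two groups and then splits the second of those on column 1.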
