
I have a LINQ expression that groups customers from an Azure Table Storage by partition.

Because Azure only supports batch operations with a maximum of 100 entities at a time (and all entities in a batch must have the same PartitionKey), I need each group to contain a maximum of 100 entities.

    // How do I complete this LINQ expression?
    var groups = customers.GroupBy(c => c.PartitionKey)....;

    // Do some Azure Table Storage magic in parallel
    Parallel.ForEach(groups, customersInGroup => { ... });

How do I complete my LINQ expression so that each group contains at most 100 customers? That is, if the customers collection has, e.g., 142 customers with the same PartitionKey, I want to create two groups: one with 100 customers and one with 42 customers.

4 Answers


For LINQ to Objects:

    yourCollection
        .Select((v, i) => new { Value = v, Index = i })
        .GroupBy(x => x.Index / 100)

Not sure if this works with Azure though...
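To see what this produces, here is a small self-contained run (numbers stand in for entities, with a chunk size of 3 instead of 100 for brevity):

```csharp
using System;
using System.Linq;

// Numbers stand in for entities; chunk size 3 instead of 100 for brevity.
var items = Enumerable.Range(1, 8);

var chunks = items
    .Select((v, i) => new { Value = v, Index = i })
    .GroupBy(x => x.Index / 3)            // keys: 0,0,0,1,1,1,2,2
    .Select(g => g.Select(x => x.Value).ToList())
    .ToList();

foreach (var chunk in chunks)
    Console.WriteLine(string.Join(" ", chunk));
// prints:
// 1 2 3
// 4 5 6
// 7 8
```

Because the index division is integer division, every run of `chunkSize` consecutive indexes maps to the same key, which is exactly what makes the grouping chunk the sequence.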


5 Comments

Neat - for some reason I hadn't thought of this :)
This works if I have a collection that I want to split into batches... right? But I have a collection of collections... and if one of these inner collections has more than 100 objects, I want one more "outer collection of collections". I know it might not be clear in my question; that's why I gave a practical example using Azure. But this is still a nice answer +1.
@Thomas Jesperen: Yes, you can use this code on each of your groups. groups.Select(customersInGroup => customersInGroup.Select(...).GroupBy(...))
This solution is elegant and perfect for dividing objects into batches. Thanks!
@JHubbard80 The value of x.Index / 100 is 0 for indexes between 0 and 99, then 1 for indexes 100 to 199, etc... Note that it's integer division so the result of the division is truncated to an integer.
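Putting the answer and the comment above together, here is a runnable sketch of per-partition chunking; plain strings stand in for customer entities, with the string value itself acting as the PartitionKey:

```csharp
using System;
using System.Linq;

// 142 "customers" sharing a single PartitionKey ("A").
var customers = Enumerable.Repeat("A", 142).ToList();

// Group by PartitionKey, then split each group into chunks of at most 100.
var chunks = customers
    .GroupBy(pk => pk)
    .SelectMany(g => g
        .Select((c, i) => new { Customer = c, Index = i })
        .GroupBy(x => x.Index / 100, x => x.Customer))
    .ToList();

Console.WriteLine(string.Join(",", chunks.Select(ch => ch.Count())));
// prints "100,42"
```

Each resulting chunk still contains entities from only one partition, so each one is a valid Azure batch.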

There's nothing within "normal" LINQ to do this directly, but MoreLINQ has a Batch method which you may find useful:

    public static IEnumerable<TResult> Batch<TSource, TResult>(
        this IEnumerable<TSource> source,
        int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)

    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source,
        int size)

Note that in your case you'd probably want something like:

var groups = customers.GroupBy(c => c.PartitionKey).Batch(100, p => p.ToList()); 

so that the returned results are materialized immediately.

Of course, this is assuming you're using LINQ to Objects - if you're trying to partition via another LINQ provider, I'm not sure how you'd go about it.
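If adding the MoreLINQ dependency isn't an option, the parameterless Batch overload above can be approximated in plain LINQ-to-Objects code; this is a simplified sketch, not MoreLINQ's actual implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A simplified Batch: walks the source once, emitting full buckets of
// `size` items and a final, possibly shorter, bucket.
static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
{
    var bucket = new List<T>(size);
    foreach (var item in source)
    {
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;
            bucket = new List<T>(size);
        }
    }
    if (bucket.Count > 0)
        yield return bucket;
}

var batches = Batch(Enumerable.Range(1, 7), 3).ToList();
Console.WriteLine(string.Join(" | ", batches.Select(b => string.Join(",", b))));
// prints "1,2,3 | 4,5,6 | 7"
```

Unlike the index-grouping approach, this makes a single pass and never buffers more than one bucket at a time.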

2 Comments

+1 for the mention of MoreLINQ - this is something that could be useful for me. Thanks :)
Thanks... this did almost what I wanted. The Batch() did return an IEnumerable<IEnumerable<IGrouping<string, Posting>>> compared to the original IEnumerable<IGrouping<string, Posting>>. I had to do a nested loop but it works.

This sounds like a job for .Skip and .Take, something like the following:

result = collection.Skip(100 * i).Take(100); 

Where i is the page or group number you want to fetch.
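Looping until a page comes back empty yields all the batches; a sketch with a page size of 3 for brevity (it would be 100 in the Azure case):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var collection = Enumerable.Range(1, 7).ToList();
const int pageSize = 3; // 100 in the Azure case

// Keep taking pages until one comes back empty.
var pages = new List<List<int>>();
for (int i = 0; ; i++)
{
    var page = collection.Skip(pageSize * i).Take(pageSize).ToList();
    if (page.Count == 0)
        break;
    pages.Add(page);
}

foreach (var page in pages)
    Console.WriteLine(string.Join(" ", page));
// prints:
// 1 2 3
// 4 5 6
// 7
```

Note that each Skip re-enumerates the source from the beginning, so paging an in-memory sequence this way is quadratic overall; the index-grouping and Batch approaches above chunk in a single pass.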



This is my test applying "into" and "Take" to the group result:

    static void Main(string[] args)
    {
        int[] numbers = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

        var result = from n in numbers
                     group n by n % 2 into group_numbers
                     select new { short_group = group_numbers.Take(3) };

        foreach (var v in result)
        {
            foreach (var v1 in v.short_group)
            {
                Console.WriteLine(v1.ToString());
            }
            Console.WriteLine();
        }
    }

Output:

    1
    3
    5

    2
    4
    6

1 Comment

Grouping by the index modulo a number only restricts the number of groups, not their size. Combined with the Take above, this drops data from the result (here everything past the first three items of each group); perhaps an approach more like paging would work?
