When processing data sets in batches, I can usually think of the following three implementations.
Which one do you consider better than the others, and why?
Notes:
- The implementations are in C#, but the question is about the algorithm.
- `GetBatchedData` works with a fixed batch size.
- The `Process` method can take an empty batch as an argument, which means there is nothing to process.
- In the case of `EmptyBatch`, `Items` is empty and `HasMoreData` returns `true`.
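The notes describe a contract without showing the types behind it. To try the three options side by side, a minimal stand-in could look like the sketch below. Everything in it is hypothetical: the `Batch` and `BatchSource` classes, the in-memory data, the batch size of 2, and especially the reading of `HasMoreData` as "another batch follows this one", which the question does not pin down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the batch type returned by GetBatchedData.
public class Batch
{
    public Batch(IReadOnlyList<int> items, bool hasMoreData)
    {
        Items = items;
        HasMoreData = hasMoreData;
    }

    public IReadOnlyList<int> Items { get; }

    // Assumption: true when another non-empty batch follows this one.
    public bool HasMoreData { get; }
}

// Matches the note: no items, and HasMoreData returns true.
public class EmptyBatch : Batch
{
    public EmptyBatch() : base(Array.Empty<int>(), true) { }
}

public static class BatchSource
{
    // Hypothetical data source and fixed batch size.
    private static readonly int[] Data = { 1, 2, 3, 4, 5 };
    private const int BatchSize = 2;

    // Records everything Process has seen, so results can be inspected.
    public static List<int> Processed { get; } = new List<int>();

    public static Batch GetBatchedData(int batchIndex)
    {
        int start = batchIndex * BatchSize;
        // Skip/Take return an empty sequence once start is past the end.
        int[] items = Data.Skip(start).Take(BatchSize).ToArray();
        bool hasMore = start + BatchSize < Data.Length;
        return new Batch(items, hasMore);
    }

    // Accepts an empty batch; in that case nothing is added.
    public static void Process(IReadOnlyList<int> items) => Processed.AddRange(items);
}
```

With `using static BatchSource;` the option snippets can call `GetBatchedData` and `Process` unqualified. Option C needs one change to compile against this stub: `var` would infer `EmptyBatch`, so the variable has to be declared as `Batch batch = new EmptyBatch();`.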
**Option A**
As @Flater pointed out, this approach has a bug!
```csharp
var batchIndex = 0;
var batch = GetBatchedData(batchIndex++);
while (batch.HasMoreData)
{
    Process(batch.Items);
    batch = GetBatchedData(batchIndex++);
}
```
**Option B**
```csharp
var batchIndex = 0;
var batch = GetBatchedData(batchIndex++);
do
{
    Process(batch.Items);
    batch = GetBatchedData(batchIndex++);
} while (batch.HasMoreData);
```
**Option C**
```csharp
var batchIndex = 0;
var batch = new EmptyBatch();
do
{
    Process(batch.Items);
    batch = GetBatchedData(batchIndex++);
} while (batch.HasMoreData);
```
Additional approaches suggested only in comments, not in answers to the question:
**Suggestion A**
The `GetBatchedData` method returns `IEnumerable<Data[]>`, where each `Data[]` batch is produced with `yield return`:

```csharp
foreach (var batch in GetBatchedData())
{
    Process(batch);
}
```
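A minimal sketch of Suggestion A, assuming an in-memory source. The `IteratorBatching` class and its `source` and `batchSize` parameters are hypothetical; the suggestion itself only says that `GetBatchedData` yields `Data[]` batches.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorBatching
{
    // Hypothetical: splits an in-memory list into fixed-size batches.
    // The last batch may be shorter than batchSize.
    public static IEnumerable<T[]> GetBatchedData<T>(IReadOnlyList<T> source, int batchSize)
    {
        // Validate eagerly, before the lazy iterator starts.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate();

        IEnumerable<T[]> Iterate()
        {
            for (int start = 0; start < source.Count; start += batchSize)
            {
                int size = Math.Min(batchSize, source.Count - start);
                var batch = new T[size];
                for (int i = 0; i < size; i++)
                {
                    batch[i] = source[start + i];
                }
                // Each batch is produced on demand as foreach advances.
                yield return batch;
            }
        }
    }
}
```

Here `foreach` itself ends the loop when the iterator is exhausted, so the caller needs no `HasMoreData` check or sentinel batch. The argument checks sit outside the iterator so they run when the method is called rather than on the first `MoveNext`. On .NET 6 and later, `Enumerable.Chunk(size)` in `System.Linq` returns `IEnumerable<TSource[]>` with the same shape, so `foreach (var batch in data.Chunk(batchSize)) Process(batch);` is a built-in equivalent.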