When processing data sets in batches, I can usually think of the following three implementations.
Which one do you consider better than the others, and why?
Notes:
- The implementation is in C# but the question is about the algorithm.
- The `GetBatchedData` method works with a fixed batch size.
- The `Process` method can take an empty batch as argument, which means nothing has to be processed.
- In case of `EmptyBatch`, `Items` is empty and `HasMoreData` returns `true`.
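
To make the notes above concrete, here is a minimal sketch of the types the three options seem to rely on. The question does not show these definitions, so the class names `Data` and `Batch`, the constructor, and the use of `IReadOnlyList<Data>` are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; the question does not define it.
public class Data { }

// Hypothetical batch shape, inferred from how the options use Items and HasMoreData.
public class Batch
{
    public Batch(IReadOnlyList<Data> items, bool hasMoreData)
    {
        Items = items;
        HasMoreData = hasMoreData;
    }

    // The items to hand to Process; may be empty.
    public IReadOnlyList<Data> Items { get; }

    // Flag checked by the loop conditions in the options.
    public bool HasMoreData { get; }
}

// As stated in the notes: Items is empty and HasMoreData returns true.
public sealed class EmptyBatch : Batch
{
    public EmptyBatch() : base(Array.Empty<Data>(), hasMoreData: true) { }
}
```

Under these assumed definitions, a variable declared as `var batch = new EmptyBatch();` is typed as `EmptyBatch`, so assigning a plain `Batch` to it later would need an explicit declaration such as `Batch batch = new EmptyBatch();`.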
Option A
As @Flater pointed out, this approach has a bug!

    var batchIndex = 0;
    var batch = GetBatchedData(batchIndex++);
    while (batch.HasMoreData)
    {
        Process(batch.Items);
        // Reassign the existing variable; redeclaring it with var here does not compile.
        batch = GetBatchedData(batchIndex++);
    }

Option B

    var batchIndex = 0;
    var batch = GetBatchedData(batchIndex++);
    do
    {
        Process(batch.Items);
        batch = GetBatchedData(batchIndex++);
    } while (batch.HasMoreData);

Option C

    var batchIndex = 0;
    var batch = new EmptyBatch();
    do
    {
        // The first pass processes the empty batch, which is a no-op.
        Process(batch.Items);
        batch = GetBatchedData(batchIndex++);
    } while (batch.HasMoreData);

Additional approaches suggested only in comments, not in answers to the question:
Suggestion A
The `GetBatchedData` method returns `IEnumerable<Data[]>`, and each `Data[]` batch is produced with `yield return`:

    foreach (var batch in GetBatchedData())
    {
        Process(batch);
    }
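
The producer side of this suggestion is not shown in the comments, so the following is only a sketch of what such an iterator might look like. The `Batching` class, the `source` and `batchSize` parameters, and the in-memory list are all assumptions; a real implementation would more likely page through a database or an API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; the question does not define it.
public class Data { }

public static class Batching
{
    // Hypothetical producer: slices an in-memory list into arrays of at most batchSize items.
    public static IEnumerable<Data[]> GetBatchedData(IReadOnlyList<Data> source, int batchSize)
    {
        // Because this is an iterator, the check runs when enumeration starts, not at the call.
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (var offset = 0; offset < source.Count; offset += batchSize)
        {
            // The last batch may be shorter than batchSize.
            var size = Math.Min(batchSize, source.Count - offset);
            var batch = new Data[size];
            for (var i = 0; i < size; i++)
            {
                batch[i] = source[offset + i];
            }

            // Hand one batch to the consumer, then pause until the next one is requested.
            yield return batch;
        }
    }
}
```

With a producer like this, an empty source simply yields no batches, so the consuming `foreach` needs neither a `HasMoreData` flag nor an empty sentinel batch.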