
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.

```csharp
public static async Task<List<Deployment>> GetDeployments()
{
    try
    {
        var depjson = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
        var deployments = JsonConvert.DeserializeObject<DeploymentWrapper>(depjson)?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();

        foreach (var deployment in deployments)
        {
            var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
            deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
        }

        return deployments;
    }
    catch (Exception)
    {
        throw;
    }
}
```

This all works perfectly fine. However, I don't like the await inside the foreach-loop at all, and I believe it is not considered good practice. I just don't see how to refactor this so the calls are made in parallel, because the result of each call is used to set a property of its deployment.

I would appreciate any suggestions on how to make this method faster and, if possible, avoid await-ing inside the foreach-loop.

  • You could use Parallel.ForEach(), or PLINQ, or create a task for each GetJson() call and wait for them all; see: Use Task.WaitAll() to handle awaited tasks? Commented Sep 25, 2018 at 9:09
  • Or create a List<Task<...>>, add each call to the API, then await Task.WhenAll(tasks). gigi.nullneuron.net/gigilabs/avoid-await-in-foreach Commented Sep 25, 2018 at 9:12
  • This is a job for TPL Dataflow, yehaa Commented Sep 25, 2018 at 9:16
  • @J.vanLangen Thanks, although I have read that Parallel.ForEach is recommended for CPU-intensive tasks and not so much for I/O-bound work. As @Ric mentioned, I could indeed put each call to the API in a List<Task<...>> and then map the correct Release by id afterwards. I will look into the TPL Dataflow @Saruman mentioned as well. Thanks! Commented Sep 25, 2018 at 9:25

3 Answers


There is nothing wrong with what you are doing now, but there is a way to start all the calls at once instead of awaiting one call, processing it, and only then awaiting the next.

This is how you can turn this:

wait for one -> process -> wait for one -> process ...

into

wait for all -> process -> done

Convert this:

```csharp
foreach (var deployment in deployments)
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}
```

To:

```csharp
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);

for (var index = 0; index < deployments.Count; index++)
{
    deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
```

First you build a collection of in-flight tasks. Then you await them all and get back an array of results (the reljsons). Finally you deserialize each result and assign it to the matching deployment's Release.

By using await Task.WhenAll() you wait for all the tasks at the same time, so the HTTP calls overlap instead of running one after another, and the total time approaches that of the slowest single call rather than the sum of all calls.
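One caveat not covered in the answer: Task.WhenAll fires every request at once, which can overwhelm the API or trip rate limits when the list is large. A common remedy is capping concurrency with a SemaphoreSlim. Below is a minimal, self-contained sketch of that pattern; GetJson here is a hypothetical stand-in that only simulates latency, and the limit of 4 is arbitrary:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    // Hypothetical stand-in for the real GetJson: delays, then echoes a fake payload.
    static async Task<string> GetJson(string url)
    {
        await Task.Delay(100);           // simulate network latency
        return $"{{\"url\":\"{url}\"}}"; // fake JSON body
    }

    static async Task Main()
    {
        var releaseIds = Enumerable.Range(1, 10).ToList();
        using var throttle = new SemaphoreSlim(4); // at most 4 requests in flight

        var tasks = releaseIds.Select(async id =>
        {
            await throttle.WaitAsync(); // wait for a free slot
            try
            {
                return await GetJson($"release/releases/{id}");
            }
            finally
            {
                throttle.Release();     // free the slot for the next request
            }
        });

        var results = await Task.WhenAll(tasks);
        Console.WriteLine(results.Length); // 10
    }
}
```

The requests still overlap, but never more than four at a time.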

Let me know if there are typos; I didn't compile this code.


2 Comments

Thanks! I went with this approach with some minor tweaks on the result. Instead of looping through all Deployments, I created a list of Releases and then mapped the right Release with the right Deployment by its Id.
@ZiNNED I'm glad it worked. I'm not proud of this for loop myself, so I'm glad you figured out a nicer way.

Fcin suggested starting all the tasks, awaiting them all to finish, and only then deserializing the fetched data.

However, if the first task has already finished while the second is still awaiting its response, the first result could already be deserialized. That shortens the time your process spends idly waiting.

So instead of:

```csharp
var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);

for (var index = 0; index < deployments.Count; index++)
{
    deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}
```

I'd suggest the following slight change:

```csharp
// Asynchronously fetch the Release data of a Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    return JsonConvert.DeserializeObject<Release>(reljson);
}

// Asynchronously fill the Release property of a Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
    deployment.Release = await FetchReleaseDataAsync(deployment);
}
```

Then your procedure is similar to the solution that Fcin suggested:

```csharp
IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments
    .Select(deployment => FillReleaseDataAsync(deployment))
    .ToList();

await Task.WhenAll(tasksFillDeploymentWithReleaseData);
```

Now if the first task has to wait while fetching its release data, the second task begins, then the third, and so on. If the first task has already fetched its release data while the other tasks are still awaiting theirs, the first task starts deserializing and assigns the result to deployment.Release, after which it is complete.

If, for instance, the 7th task got its data while the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release; task 7 is then complete.

This continues until all tasks are completed. With this method there is less idle waiting, because as soon as a task has its data it is scheduled to start deserializing.
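The payoff of awaiting all the FillReleaseDataAsync tasks together is that the waits overlap, so total time approaches that of the slowest single call rather than the sum of all calls. A minimal, self-contained sketch illustrates this; FillAsync is a hypothetical stand-in for FillReleaseDataAsync that merely simulates the fetch:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class InterleavingDemo
{
    // Hypothetical stand-in for FillReleaseDataAsync: simulates a 100 ms fetch.
    static async Task FillAsync(int id)
    {
        await Task.Delay(100);
    }

    static async Task Main()
    {
        var sw = Stopwatch.StartNew();

        // Start all ten "fetches" up front, then await them together.
        var tasks = Enumerable.Range(1, 10).Select(FillAsync).ToList();
        await Task.WhenAll(tasks);

        sw.Stop();

        // The ten 100 ms waits overlap, so the total is close to 100 ms,
        // far under the ~1000 ms a sequential await-in-foreach would take.
        Console.WriteLine(sw.ElapsedMilliseconds < 900); // True
    }
}
```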

1 Comment

This is a better solution than mine :)

If I understand you right and you want to make the var reljson = await GetJson(...) calls parallel:

Try this:

```csharp
Parallel.ForEach(deployments, (deployment) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});
```

You might limit the number of parallel executions, for example:

```csharp
Parallel.ForEach(
    deployments,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    (deployment) =>
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    });
```

You might also want to be able to break out of the loop:

```csharp
Parallel.ForEach(deployments, (deployment, state) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);

    if (noFurtherProcessingRequired)
        state.Break();
});
```

3 Comments

Parallel.For/ForEach is the wrong tool for I/O-bound work; you tie threads up waiting on I/O completion ports. TPL Dataflow is better suited for this.
Thanks for the suggestion, although as @Saruman already mentioned, I also thought Parallel.ForEach should be used primarily for CPU-intensive tasks and not so much for I/O tasks.
Parallel.ForEach does not support await.
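The last comment is the crux: Parallel.ForEach takes an Action<T>, so an async lambda compiles to async void, and the loop "completes" as soon as each body hits its first await, without ever observing the awaited work. A minimal, self-contained sketch of the pitfall (Work is a hypothetical stand-in for the HTTP call):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ParallelForEachPitfall
{
    static int completed;

    // Hypothetical stand-in for the HTTP call: waits, then records completion.
    static async Task Work(int id)
    {
        await Task.Delay(200); // simulate an I/O call
        Interlocked.Increment(ref completed);
    }

    static void Main()
    {
        // The async lambda is converted to an Action<int>, i.e. async void:
        // Parallel.ForEach only sees the synchronous part before the first await.
        Parallel.ForEach(Enumerable.Range(1, 5), async id => await Work(id));

        // Parallel.ForEach has already returned, but the awaited work has not finished.
        Console.WriteLine(completed < 5); // True

        Thread.Sleep(1000);               // give the orphaned tasks time to land
        Console.WriteLine(completed);     // 5
    }
}
```

This is why deployment.Release would often still be null after the loop "finishes", and why exceptions inside the lambda would crash the process instead of propagating.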
