3

My aim is to download images from an Amazon Web Services bucket.

I have the following code function which downloads multiple images at once:

public static void DownloadFilesFromAWS(string bucketName, List<string> imageNames) { int batchSize = 50; int maxDownloadMilliseconds = 10000; List<Task> tasks = new List<Task>(); for (int i = 0; i < imageNames.Count; i++) { string imageName = imageNames[i]; Task task = Task.Run(() => GetFile(bucketName, imageName)); tasks.Add(task); if (tasks.Count > 0 && tasks.Count % batchSize == 0) { Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds);//wait to download tasks.Clear(); } } //if there are any left, wait for them Task.WaitAll(tasks.ToArray(), maxDownloadMilliseconds); } private static void GetFile(string bucketName, string filename) { try { using (AmazonS3Client awsClient = new AmazonS3Client(Amazon.RegionEndpoint.EUWest1)) { string key = Path.GetFileName(filename); GetObjectRequest getObjectRequest = new GetObjectRequest() { BucketName = bucketName, Key = key }; using (GetObjectResponse response = awsClient.GetObject(getObjectRequest)) { string directory = Path.GetDirectoryName(filename); if (!Directory.Exists(directory)) { Directory.CreateDirectory(directory); } if (!File.Exists(filename)) { response.WriteResponseStreamToFile(filename); } } } } catch (AmazonS3Exception amazonS3Exception) { if (amazonS3Exception.ErrorCode == "NoSuchKey") { return; } if (amazonS3Exception.ErrorCode != null && (amazonS3Exception.ErrorCode.Equals("InvalidAccessKeyId") || amazonS3Exception.ErrorCode.Equals("InvalidSecurity"))) { // Log AWS invalid credentials throw new ApplicationException("AWS Invalid Credentials"); } else { // Log generic AWS exception throw new ApplicationException("AWS Exception: " + amazonS3Exception.Message); } } catch { // } } 

The downloading of the images all works fine but the Task.WaitAll seems to be ignored and the rest of the code continues to be executed - meaning I try to get files that are currently non existent (as they've not yet been downloaded).

I found this answer to another question which seems to be the same as mine. I tried to use the answer to change my code but it still wouldn't wait for all files to be downloaded.

Can anyone tell me where I am going wrong?

14
  • Should you not first add all download tasks and only afterwards wait for them to complete ? This is how I do it. The way you are doing - waiting right away after placing a new thread in the pool for it to complete - looks like synchronous programming - in other words you are not taking any benefit from multithreading. This does not directly relate to not waiting but... am leaving as a remark. Commented May 16, 2016 at 9:53
  • @Veverke I'm new to Tasks so I thought if I added them all, suppose there were 10,000 files I needed to download, would all tasks be running at the same time and be too much for the system to handle??? - For this reason I decides to run the tasks in batches. Commented May 16, 2016 at 9:55
  • Well I missed the fact that you have 2 waits, I saw the first only. I then now ask something else: why do you need the one inside the loop ? Did you try running once without it ? Try commenting the if inside the for loop and see what happens. Do you get exceptions ? Commented May 16, 2016 at 9:57
  • 1
    Relating to your comment - there is an amount of threads the operating system can handle in an effective way, and everything beyond that will not only not provide benefits but will decrease performance due task handling overheads (context switches, etc) - but I do not know what this number is. It also depends on how much time your tasks take to complete/process time estimate. If they are supposed to complete their job in 5 seconds, you should be able to handle much more of them concurrently rather than if each must take 5 minutes. Commented May 16, 2016 at 10:03
  • 1
    The batching might actually be a good thing, because the thing you'll most likely be expending first is not the amount of tasks that can be run at the same time, but rather the number of connections to the web service and the bandwidth. Commented May 16, 2016 at 10:07

1 Answer 1

6

The code behaves as expected. Task.WaitAll returns after ten seconds even when not all files have been downloaded, because you have specified a timeout of 10 seconds (10000 milliseconds) in variable maxDownloadMilliseconds.

If you really want to wait for all downloads to finish, call Task.WaitAll without specifying a timeout.

Use

Task.WaitAll(tasks.ToArray());//wait to download 

at both places.

To see some good explanations on how to implement parallel downloads while not stressing the system (only have a maximum number of parallel downloads), see the answer at How can I limit Parallel.ForEach?

Sign up to request clarification or add additional context in comments.

15 Comments

The unfinished tasks are not aborted, the WaitAll just returns while the tasks continue running in the background.
If you want the tasks getting aborted you have to bring a CancellationToken within this game :o)
@NineBerry It wont abort them. It just returns false if the tasks are still executing and you can choose to continue waiting or to cancel them manually.
The final Status of a task is RanToCompletion, Canceled or Faulted
When an exception occurs in the task, the task function finishes, so the task is considered finished.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.