0

I've been working on refactoring a process that iterates over a collection of FileClass objects that have a Filename, NewFilename, and a string[] FileReferences property, and replaces all FileReferences in which reference the old filename with the new one. The code below is slightly simplified, in that the real file references property is not just a list of the file names- they are lines which may contain a file name somewhere in them, or not. The current code is ok when the _fileClass collection is below around 1000 objects... but is painfully slow if there are more objects, or the file references property has thousands.

Following the answers on this post: Run two async tasks in parallel and collect results in .NET 4.5 (and several like it). I've been trying to make a async method which would take a list of all the old and new file names as well as an individual FileClass, then build an array of these Task<FileClass> and try to process them in parallel via Task.WhenAll(). But running into a "Cannot await void" error. I believe it's due to the Task.Run(() => ...); but removing the () => causes further problems.

It's an older code base, and I can't let the async propagate further than the calling code (in this case, Main, as I have found in some other examples. Nor can I use C#8's async foreach due to a .Net 4.5 limitation.

class Program { private static List<FileClass> _fileClasses; static void Main(string[] args) { var watch = new Stopwatch(); _fileClasses = GetFileClasses(); watch.Start(); ReplaceFileNamesAsync(); watch.Stop(); Console.WriteLine($"Async Elapsed Ticks: {watch.ElapsedTicks}"); watch.Reset(); //watch.Start(); //ReplaceFileNamesSLOW(); //watch.Stop(); //Console.WriteLine($"Slow Elapsed Ticks: {watch.ElapsedTicks}"); Console.ReadLine(); } public static async void ReplaceFileNamesAsync() { var newOldFilePairs = _fileClasses.Select(p => new NewOldFilePair() { OldFile = p.Filename, NewFile = p.NewFilename }).ToArray(); var tasks = new List<Task<FileClass>>(); foreach (var file in _fileClasses) { tasks.Add(ReplaceFileNamesAsync(newOldFilePairs, file)); } //Red underline "Cannot await void". FileClass[] result = await Task.WaitAll(tasks.ToArray()); } private static async Task<FileClass> ReplaceFileNamesAsync(NewOldFilePair[] fastConfigs, FileClass fileClass) { foreach (var config in fastConfigs) { //I suspect this is part of the issue. await Task.Run(() => fileClass.ReplaceFileNamesInFileReferences(config.OldFile, config.NewFile)); } return fileClass; } public static void ReplaceFileNamesSLOW() { // Current Technique for (var i = 0; i < _fileClasses.Count; i++) { var oldName = _fileClasses[i].Filename; var newName = _fileClasses[i].NewFilename; for (var j = 0; j < _fileClasses.Count; j++) { _fileClasses[j].ReplaceFileNamesInFileReferences(oldName, newName); } } } public static List<FileClass> GetFileClasses(int numberToGet = 2000) { //helper method to build a bunch of FileClasses var fileClasses = new List<FileClass>(); for (int i = 0; i < numberToGet; i++) { fileClasses.Add(new FileClass() { Filename = $@"C:\fake folder\fake file_{i}.ext", NewFilename = $@"C:\some location\sub folder\fake file_{i}.ext" }); } var fileReferences = fileClasses.Select(p => p.Filename).ToArray(); foreach (var fileClass in fileClasses) { fileClass.FileReferences = fileReferences; } return fileClasses; } } public class NewOldFilePair { public string OldFile { get; set; } public string NewFile { get; set; } } public class FileClass { public string Filename { get; set; } public string NewFilename { get; set; } public string[] FileReferences { get; set; } //Or this might be the void it doesn't like. public void ReplaceFileNamesInFileReferences(string oldName, string newName) { if (FileReferences == null) return; if (FileReferences.Length == 0) return; for (var i = 0; i < FileReferences.Length; i++) { if (FileReferences[i] == oldName) FileReferences[i] = newName; } } } 

Update If others find this question and actually need to implement something similar to above, there were some potential pitfalls worth mentioning. Obviously, I had a typo for Task.WaitAll() vs Task.WhenAll() (I blame VS autocomplete, and maybe me rushing to make a scratch app 😉). Secondly, once the code was "working", I found out that while async was cutting down the time to get through this, it was not completing the entire list of tasks (as they could be in the thousands) before continuing onto the next stage of the process. This resulted in a Task.Run(() => ReplaceFileNamesAsync()).Wait() call, which actually took longer than the nested loop method. Unpacking and merging the results back into the _fileClasses property also required some logic, which contributed to the problem.

The Parallel.ForEach was a much quicker process, and while I didn't see the updated code posted below, I ended up with largely the same result (other than the dictionary).

1
  • @MichaelRandall yes, that is the only way we've been able to make it work consistently. With the WhenAll and async methods (which remove the nested loop, but still fit every file with every file reference) are around 6.17 ticks after on my laptop.. That'll be more than enough on a real workstation. Commented Feb 14, 2020 at 5:16

2 Answers 2

4

To solve the initial problem, is you should be using await Task.WhenAll not Task.WaitAll

Task.WhenAll

Creates a task that will complete when all of the supplied tasks have completed.

However, this looks like more of a job for Parallel.ForEach

Another issue is you looping over the same list twice (nested) which is a quadratic time complexity and is definitely not thread safe

As a solve, you could create a dictionary of the changes, loop over the change set once (in parallel), and update the references in one go.

_fileClasses = GetFileClasses(); // create a dictionary for fast lookup var changes = _fileClasses.Where(x => x.Filename != null && x.NewFilename != null) .ToDictionary(x => x.Filename, x => x.NewFilename); // parallel the workloads Parallel.ForEach(_fileClasses, (item) => { // iterate through the references for (var i = 0; i < item.FileReferences.Length; i++) { // check for updates if (changes.TryGetValue(item.FileReferences[i], out var value)) item.FileReferences[i] = value; } }); 

Note : This wasn't intended to be a complete solution as all the code wasn't supplied, however the time complexity should be a lot better

Sign up to request clarification or add additional context in comments.

2 Comments

That works. But I'd like to look into the Parallel.ForEach...
Thanks for taking the time to knock together a Parallel.ForEach example. I ended up with something very similar today, however as I mentioned, and you noted, lack of the real code, and complexity of what was actually going on in the "check for updates" loop, the dictionary and logic differ. The resulting refactor improved the process dramatically- and I've now been tasked with finding similar bottlenecks to improve 🤦‍♂️
1

Try use Task.WhenAll. It will allow you to await it because it returns a task. Task.WaitAll is a blocking call and will wait till all the tasks are finished before returning void.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.