5

While I was dealing with relatively large text files I noticed something strange. Async Reading and writing is actually slower than the non-async reading:

E,g, executing this dummy code:

 var res1 = File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines); var res2 = File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}_bck.txt", i), lines); await res1; await res2; 

is actually much slower than

 File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines); File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}_bck.txt", i), lines); 

In theory the first approach should be faster, because the second writing should be stared before the first one is completed. The performance difference is about 100% for 15 ~25MBs files (10 vs 20 seconds).

I noticed the same behavior for ReadAllLines and ReadAllLinesAsync.

Update:0 The main idea is to have all files processed after the completion of the function of TestFileWriteXXX functions. Therefore

Task.WhenAll(allTasks1); // Without await is not a valid option 

Update:1 I added read write using threads and it showed 50% improvement. Here is the complete example:

Update:2 I updated the code in order to eliminate the buffer generation overhead

 const int MaxAttempts = 5; static void Main(string[] args) { TestFileWrite(); TestFileWriteViaThread(); TestFileWriteAsync(); Console.ReadLine(); } private static void TestFileWrite() { Clear(); Stopwatch stopWatch = new Stopwatch(); stopWatch.Start(); Console.WriteLine( "Begin TestFileWrite"); for (int i = 0; i < MaxAttempts; ++i) { TestFileWriteInt(i); } TimeSpan ts = stopWatch.Elapsed; string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10); Console.WriteLine("TestFileWrite took: " + elapsedTime); } private static void TestFileWriteViaThread() { Clear(); Stopwatch stopWatch = new Stopwatch(); stopWatch.Start(); Console.WriteLine("Begin TestFileWriteViaThread"); List<Thread> _threads = new List<Thread>(); for (int i = 0; i < MaxAttempts; ++i) { var t = new Thread(TestFileWriteInt); t.Start(i); _threads.Add(t); } _threads.ForEach(T => T.Join()); TimeSpan ts = stopWatch.Elapsed; string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10); Console.WriteLine("TestFileWriteViaThread took: " + elapsedTime); } private static void TestFileWriteInt(object oIndex) { int index = (int)oIndex; List<string> lines = GenerateLines(index); File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index), lines); File.WriteAllLines(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", index), lines); var text = File.ReadAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index)); var text1 = File.ReadAllLines(string.Format(@"C:\Projects\DelMee\file{0}.txt", index)); //File.WriteAllLines(string.Format(@"C:\Projects\DelMee\file_test{0}.txt", index), text1); } private static async void TestFileWriteAsync() { Clear(); Console.WriteLine("Begin TestFileWriteAsync "); Stopwatch stopWatch = new Stopwatch(); stopWatch.Start(); for (int i = 0; i < MaxAttempts; ++i) { List<string> lines = GenerateLines(i); var allTasks = new List<Task>(); allTasks.Add(File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i), lines)); allTasks.Add(File.WriteAllLinesAsync(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", i), lines)); await Task.WhenAll(allTasks); var allTasks1 = new List<Task<string[]>>(); allTasks1.Add(File.ReadAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i))); allTasks1.Add(File.ReadAllLinesAsync(string.Format(@"C:\Projects\DelMee\file{0}.txt", i))); await Task.WhenAll(allTasks1); // await File.WriteAllLinesAsync(string.Format(@"C:\Projects\DelMee\file_test{0}.txt", i), allTasks1[0].Result); } stopWatch.Stop(); TimeSpan ts = stopWatch.Elapsed; string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10); Console.WriteLine("TestFileWriteAsync took: " + elapsedTime); } private static void Clear() { for (int i = 0; i < 15; ++i) { System.IO.File.Delete(string.Format(@"C:\Projects\DelMee\file{0}.txt", i)); System.IO.File.Delete(string.Format(@"F:\Projects\DelMee\file{0}_bck.txt", i)); } } static string buffer = new string('a', 25 * 1024 * 1024); private static List<string> GenerateLines(int i) { return new List<string>() { buffer }; } 

And the results are:

TestFileWrite took: 00:00:03.50

TestFileWriteViaThread took: 00:00:01.63

TestFileWriteAsync took: 00:00:06.78

8 Code CPU/ C and F are 2 different SSD drive 850 EVOs on 2 different SATAs.

Update:3 - Conclusion Looks like the File.WriteAllLinesAsync does handle well the scenario when we want to flush large amount of data. As it was pointed out from the answers bellow it is better to use FileStream directly. But still async is slower than sequential access.

But for now the fastest approach still remains if you use multi threading.

14
  • 6
    In theory the first approach should be faster That is an incorrect assumption. async is not necessarily about making things faster. It depends on implementation. stackoverflow.com/questions/54753339/… Commented Apr 7, 2020 at 6:51
  • 2
    @Kiko I think that the problem is that you are performing writing data to a single physical drive. If you use SATA drive than it cannot write data simultaneously. Therefore write operations itself in your code could perform sequentially. Commented Apr 7, 2020 at 6:58
  • 1
    Your question is so lacking in detail, it would be impossible for anyone to say for sure why you see the relative performance difference that you do. Both of the previous comments suggest very plausible explanations, but frankly your question cannot be provided any sort of good answer without more details and a good minimal reproducible example, at a minimum. Even then, it may not be all that novel or useful a question anyway. Commented Apr 7, 2020 at 7:01
  • 1
    @IliarTurdushev very good point. But I tried on 2 different drives and the result was Same. Commented Apr 7, 2020 at 7:17
  • 2
    @Kiko you can't benchmark code like this. Your code is performing a ton of string operations that would affect performance far more than any difference between sync and async. Disk IO is affected by everything too, so performing just 5 iterations is meaningless - each runner will be completely different. Use BenchmarkDotNet instead, and check for allocation overhead. Rewrite your tests so they all write the same static string Commented Apr 7, 2020 at 8:22

2 Answers 2

5

I think this is a known issue. If you Google, you will see heaps similar posts.

Eg https://github.com/dotnet/runtime/issues/23196

If fast is the requirement for single IO operation, you should always use sync IO as well as sync methods.

Write*Async methods internally opens file stream in asynchronous IO mode, which has overhead cost compared to sync IO.

https://learn.microsoft.com/en-us/windows/win32/fileio/synchronous-and-asynchronous-i-o

However, for relatively fast I/O operations, the overhead of processing kernel I/O requests and kernel signals may make asynchronous I/O less beneficial, particularly if many fast I/O operations need to be made. In this case, synchronous I/O would be better.

Also asynchronous methods in FileStream and StreamWriter may have issues with small buffer size. The default buffer size for writing to file stream is 4KB, which is way too small than file size (25MB to 50MB). Despite the buffer size seems working for sync methods fine, but it exaggerates the overhead cost incurred by async methods.

See this line, the method yields thread every time when buffer is full. If you are writing a 25MB file with the default 4096 bytes buffer, this happens 6400 times.

To optimize that, if the entire file is in memory, the buffer size could be set to the file size to reduce context switching and synchronization between each write and flush.

If you open FileStream and StreamWriter with different buffer sizes in your code and run the tests for Write and WriteAsync, you will see difference. If the buffer size is identical to the file size, the difference between sync and async methods is very small.

For example:

// 4KB buffer sync stream using (var stream = new FileStream( path, FileMode.Create, FileAccess.Write, FileShare.Read, 4096, FileOptions.SequentialScan)) { using (var writer = new StreamWriter(stream, Encoding.UTF8)) { writer.Write(str25mb); } } // 25MB buffer sync stream using (var stream = new FileStream( path, FileMode.Create, FileAccess.Write, FileShare.Read, 25 * 1024 * 1024, FileOptions.SequentialScan)) { using (var writer = new StreamWriter(stream, Encoding.UTF8)) { writer.Write(str25mb); } } // 4KB buffer async stream using (var stream = new FileStream( path, FileMode.Create, FileAccess.Write, FileShare.Read, 4096, FileOptions.Asynchronous | FileOptions.SequentialScan)) using (var writer = new StreamWriter(stream, Encoding.UTF8)) { await writer.WriteAsync(str25mb); } // 25MB buffer async stream using (var stream = new FileStream( path, FileMode.Create, FileAccess.Write, FileShare.Read, 25 * 1024 * 1024, FileOptions.Asynchronous | FileOptions.SequentialScan)) using (var writer = new StreamWriter(stream, Encoding.UTF8)) { await writer.WriteAsync(str25mb); } 

Result (I run each test 10 times) was:

TestFileWriteWithLargeBuffer took: 00:00:00.9291647 TestFileWriteWithLargeBufferAsync took: 00:00:01.1950127 TestFileWrite took: 00:00:01.5251026 TestFileWriteAsync took: 00:00:03.6913877 
Sign up to request clarification or add additional context in comments.

3 Comments

You are right, but still Async is 2 times slower than non-async. The same proportion like WriteAllLines.
The WriteAllLines and WriteAllLinesAsync internally use FileStream and StreamWriter with small buffer, so that is why the async method is twice slower than the sync method. But this only happens when writing large files when needing to flush the buffer many times. In this case, you could optimize the write operation by using a FileStream with large size buffer.
In terms of async is slower than sync. This is true for any single IO operation as per the MSDN docs says.
1

I was wrong with my first answer since not used wait for Task.WhenAll when tried to remove async from the TestFileWriteAsync.

I've fixed test and it shows that File.Write*Async is really slower.

Begin TestFileWriteAsync TestFileWriteAsync took: 00:00:13.7128699 Begin TestFileWrite TestFileWrite took: 00:00:01.5734895 Begin TestFileWriteViaThread TestFileWriteViaThread took: 00:00:00.8322218 

Please excuse me

PS I've checked the Async methods source code.

Looks like File.WriteAllLinesAsync and File.WriteAllTextAsync uses the same InternalWriteAllTextAsync that copies the part of the original buffer one more time

buffer = ArrayPool<char>.Shared.Rent(DefaultBufferSize); int count = contents.Length; int index = 0; while (index < count) { int batchSize = Math.Min(DefaultBufferSize, count - index); contents.CopyTo(index, buffer, 0, batchSize); #if MS_IO_REDIST await sw.WriteAsync(buffer, 0, batchSize).ConfigureAwait(false); #else await sw.WriteAsync(new ReadOnlyMemory<char>(buffer, 0, batchSize), cancellationToken).ConfigureAwait(false); #endif 

The contents.CopyTo(index, buffer, 0, batchSize); is the line that copies the part of the original data buffer.

You can try with File.WriteAllBytesAsync it takes data buffer "as is" and does not execute additional copy operation:

Begin TestFileWriteAsync TestFileWriteAsync took: 00:00:00.7741439 Begin TestFileWrite TestFileWrite took: 00:00:00.5772008 Begin TestFileWriteViaThread TestFileWriteViaThread took: 00:00:00.4457552 

WriteAllBytesAsync test source code

9 Comments

N.P. Yes I've noticed this and updated my comment + reduced the overhead caused by buffer generation, just to make it clear.
@Kiko please try with WriteAllBytesAsync - it is the same fast as the WriteAllBytes method. I've updated my answer with part of .Net Core source code that can explain why WriteAllLinesAsync is so slow. And here is test v3 with byte array source code
Hi @oleksa, thanks for you analysis. I am still trying to realise what I am seeing :). I see this function is very dummy implemented. M$ goal is to write write on chunks but they could simple use use WriteAsync(buffer, offset, count) with proper offsets, instead of copy the buffer + to use ReadOnlyMemory.
I don't think that line of code "copies the part of the original buffer one more time". That line of code copies string to a char[] then the next line writes the char[] to stream. The sync method writes string to stream, where it is also copied to a char[] buffer.
@weichch The WriteAllBytesAsync is the same fast as WriteAllBytes so I do not think that the problem is in the async approach. For example, InternalWriteAllTextAsync uses string.CopyTo that moves data between memory using the Buffer.Memmove<T>. The InternalWriteAllLines uses Buffer.Memmove. I don't know what is RuntimeHelpers.IsReferenceOrContainsReferences<string>() result. Will it do the BulkMoveWithWriteBarrier or not ?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.