> As far as I can tell, every request is already being run on a thread
> pool (as empirically tested by logging the thread ID during each
> request), so making all calls use async/await within your webmethods
> will, at best, move the execution from your thread pool to a different
> thread pool.

The second part of that assumption is incorrect. Async/await calls, assuming they await IO, will not be offloaded to a different thread pool thread. Essentially, while the IO happens, the thread that encountered the await is free to pick up other requests. This improves the throughput of the web application.

The fundamental reason is that IO is not done by the CPU but by the various IO devices in the machine (disk, network card, etc.); the CPU merely coordinates them. A synchronous call simply blocks a thread while it waits for the IO device to finish, which is not ideal for maximum throughput. This is a pretty good read on the matter: https://blog.stephencleary.com/2013/11/there-is-no-thread.html

> It doesn't free up the socket, because, well the connection is still
> open and the client is still waiting (synchronously or not) for a
> response.

A simplified view: your server binds a listener on a port (usually 80 or 443). When a request comes in, a new socket is created for that connection (you can't have the same socket shared between two clients). The simplified workflow is like this (see the accept-loop sketch at the end of this answer):

1. Server binds the listener port.
2. A connection comes in.
3. A socket is created between server and client.
4. The request is assigned to a thread pool thread and processing begins -> this is where your async happens.
5. The listener is again free to serve a new connection. Repeat steps 2-4.

Note that steps 4 and 5 happen in parallel. **Async in step 4 allows the physical thread to pick up multiple sockets from the listener.**

There's a hard limit on how many requests can be processed at the same time. As you correctly identified, there is a limit on how many sockets you can have open, and you cannot simply close a socket on someone. That is true. However, the limit on sockets is in the range of tens of thousands, whereas the limit on threads is in the thousands. So in order to fully saturate your sockets, which would be ideal (100% usage of the hardware), you need to manage your threads better, which is where async/await comes in.

When a thread processing a request at step 4 encounters an await on async IO, it simply returns to the pool, ready to process another request. The IO device sends a notification to the CPU when it is done, so processing of the interrupted request can continue. In the case of web APIs, the thread that continues after an await is not always the thread that encountered the await; this can be configured using `ConfigureAwait`.

You can picture this as a clown juggling 3-4 balls with just one hand. The thread is the clown. The balls in the air are async IO operations being handled by IO devices. The ball in the clown's hand is the request currently being actively processed by that thread. If the clown weren't allowed to throw balls into the air, he'd be limited by the number of hands (one, in this case) on how many balls he can handle.
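To make steps 1-5 a bit more concrete, here is a bare-bones accept loop. This is only a sketch using a raw `TcpListener`; real servers (IIS, Kestrel, http.sys) are far more sophisticated internally, and the handler and names here are made up for illustration:

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ListenerSketch
{
    public static async Task RunAsync()
    {
        var listener = new TcpListener(IPAddress.Any, 8080); // step 1: bind the listener port
        listener.Start();

        while (true)
        {
            // Steps 2-3: each incoming connection gets its own socket/client object.
            TcpClient client = await listener.AcceptTcpClientAsync();

            // Step 4: hand the request off for processing. The loop (step 5) is
            // immediately free to accept the next connection, so 4 and 5 overlap.
            _ = Task.Run(() => HandleClientAsync(client));
        }
    }

    private static async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        using (NetworkStream stream = client.GetStream())
        {
            var buffer = new byte[4096];
            // At this await the thread returns to the pool while the NIC does the IO.
            int bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length);
            // ... process the request and write a response here ...
        }
    }
}
```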
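And for the main point about the thread being freed at the await, here is a minimal sketch of the difference between a blocking and an async action. It assumes an ASP.NET Core controller with an injected `HttpClient`; the routes, URL and names are invented for the example:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ReportsController : ControllerBase
{
    private readonly HttpClient _httpClient;

    public ReportsController(HttpClient httpClient) => _httpClient = httpClient;

    // Synchronous version: the thread pool thread is blocked for the whole
    // duration of the downstream call (the clown keeps holding the ball).
    [HttpGet("sync")]
    public string GetSync()
    {
        return _httpClient.GetStringAsync("https://example.com/data")
                          .GetAwaiter().GetResult(); // blocks the thread
    }

    // Async version: at the await, the thread goes back to the pool and can
    // serve other requests while the network IO is in flight (the ball is in
    // the air). A pool thread resumes the method when the IO completes.
    [HttpGet("async")]
    public async Task<string> GetAsync()
    {
        return await _httpClient.GetStringAsync("https://example.com/data");
    }
}
```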
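On the `ConfigureAwait` remark: a small, hypothetical helper that opts out of capturing the current context, so the continuation after the await can run on whichever pool thread is available rather than the one (or context) that started the call:

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FileHelpers
{
    // Illustrative helper; the name is made up.
    public static async Task<string> ReadAllTextNoContextAsync(string path)
    {
        using (var reader = new StreamReader(path))
        {
            // While the read is in flight, no thread is blocked on this call,
            // and ConfigureAwait(false) lets any free pool thread continue it.
            return await reader.ReadToEndAsync().ConfigureAwait(false);
        }
    }
}
```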