
I have an application with very strict requirements around auditing and the "replayability" of user actions. For this reason, I've chosen an event-sourced architecture because of its append-only/ledger nature. The system is event-driven and uses CQRS as well. Specifically: writes are appended to Redpanda (Kafka) topics, consumers process those writes and store the results in Redis/Scylla, then the client reads exclusively from Redis as a pull-through cache with Scylla reads on cache misses.

The issue is with notifying the browser/client application when a change has made its way through the back end. When a write occurs, the client application obviously needs to know when the change is completed (or rejected). A change may require multiple consumers to handle different processing steps (something like the saga pattern), any of which could be potentially long-running (up to 10s, waiting on third-party services, etc). I'm familiar with strategies where the data is simply optimistically updated at the client in a sort of "fire and forget" way but I'm talking about something where it's critical that the user knows each step has succeeded/failed.

My question is: why is short polling so bad for this? To be clear - the client only reads from read-optimized stores like Redis (with the exception of the cache miss explained earlier). It never hits a relational DB with a huge, multiple-scan join, parses text files, etc. It only exchanges keys for values. On writes, the mutating change request is given a UUID, written to Redpanda immediately and the UUID is returned - all in a single request/response. The client can then poll the read store (possibly with some sort of backoff function) with that UUID to check the status of the request, which will be stored in Redis. Either the status is returned, or a timeout is reached, at which point the client stops polling. I think I've heard this referred to as the "coat check pattern"?
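The write path described above - append the change, hand back a UUID "ticket", and let the client redeem it later - can be sketched as follows. The in-memory stores and all names here are illustrative stand-ins, not the actual system: a real deployment would append to a Redpanda topic and read status from Redis.

```python
import uuid

# Illustrative in-memory stand-ins: event_log plays the role of a Redpanda
# topic, status_store the role of the Redis read store.
event_log = []
status_store = {}

def submit_change(payload):
    """Accept a mutating request: assign a UUID, append it to the log,
    mark it PENDING, and return the UUID in the same request/response."""
    request_id = str(uuid.uuid4())
    event_log.append({"id": request_id, "payload": payload})
    status_store[request_id] = "PENDING"
    return request_id

def consumer_step(request_id, outcome):
    """A downstream consumer records the result of its processing step."""
    status_store[request_id] = outcome

def poll_status(request_id):
    """The 'coat check': exchange the ticket (UUID) for the current status."""
    return status_store.get(request_id)
```

A client would call `submit_change` once, then call `poll_status` on a backoff schedule until it sees a terminal status or its timeout expires.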

My reasoning is as follows:

Why not websockets?

While WS would provide real-time feedback for the client, it drastically reduces the number of clients a single machine can serve, since the connections must be held open until the result is returned. Besides this, websockets have been a considerable issue for me in the past when multiple proxies, load balancers, kubernetes pods, ingresses, etc. are involved. This application will serve tens of millions of users and will need to scale elastically. The user actions being performed are largely transactional, and there are no real-time requirements beyond the one I've described (no long-running chats or the like).

Why not SSE?

This would essentially be the same as the WS situation. While there aren't all the protocol upgrade issues and everything happens over standard HTTP, the issue still remains that once a client is connected, it must hold a connection open while the requested change makes it through the system - thereby tying up server resources until the change is complete.

Why short polling?

It seems like everything I read on this topic lists the options as WS, SSE, and short polling, only to categorically dismiss short polling due to its "chattiness". In summary: "It's too chatty because you have to make a bunch of requests, a majority of which don't return anything, which uses resources for nothing." Then comes what feels like a hand-wave: "just use WS/SSE because short polling is too chatty", with no mention of the complex path a WS request actually needs to traverse to make it through all of the (ever-elastically-shifting) layers of a typical, large-scale, enterprise network. Not to mention that a WS/SSE connection may return a majority of its responses as heartbeats until the one "success/failure" response comes back - all while preventing other users' requests (also in the form of websockets/SSE) from even reaching the server - which sounds a lot like the complaint about short polling: that most responses carry no domain-level data.

This is why short polling seems so appealing to me. Yes, it's chatty - there are more requests made and some of them will return empty response bodies. However:

  • As stated before: these requests are only made to read-optimized stores.
  • The back-end infra can scale/shift/etc along with load and the client won't care because there's no long-running connection to sever.
  • One server can handle more users - not from the standpoint of concurrent connections - but from the standpoint that eventually, if 1,000,000 users make a request at the same time to one server, all "is it done yet?" polling sessions will eventually get their answer - assuming it's within the timeout (to be clear - I'm not actually planning to throw millions of users at a single box; just using the extreme for the sake of example). I know this because I load tested it myself. I ran the test with far more users than a single machine could handle and all users ultimately received their "yes, it's done" responses well within the timeout.
  • Keep-alive is standard since HTTP/1.1. I realize this won't be useful on each and every connection - and will be less likely to apply as load increases - but in the case where a client is able to make multiple short-poll requests before its connection is severed, all requests after the initial one require much lower overhead, similar to the repeat requests made in a WS/SSE approach.
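The "backoff function" mentioned above could be as simple as capped exponential delays that stop once the overall timeout would be exceeded. A sketch, with every constant made up purely for illustration:

```python
def backoff_schedule(base=0.5, factor=2.0, cap=8.0, timeout=30.0):
    """Return the list of poll delays: exponential growth capped at `cap`,
    stopping once the cumulative wait would exceed `timeout`."""
    delays, elapsed, delay = [], 0.0, base
    while elapsed + delay <= timeout:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, cap)
    return delays
```

With the defaults this yields `[0.5, 1.0, 2.0, 4.0, 8.0, 8.0]` - six requests spread over roughly 24 seconds, rather than a fixed-interval hammer.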

What am I missing that makes short polling so bad? I'm not building Discord or some sort of low-latency gaming server. I just need to get a lot of users through multi-step, transactional workflows and don't want to worry about having to "hold the door open", so to speak while the back end converges on the final state of each request.

To be clear, I'm not saying that WS and SSE don't have their place. I've built trading applications that relied on tick-level data and other monitoring/status dashboards where it's nice to just have that one pipe connected to the client and let the server emit whatever happens whenever it happens. My current requirements, however, aren't that kind of real-time.

Am I way off here? Am I missing something obvious? Like I said, I've load tested this myself (on an actual, physical rack across multiple machines with actual networking cables between them - not just on my computer) and it seems to work quite well. Just seems like I'm missing something considering how opposed the internet evidently is to this idea.

Thanks!

3 Answers


My question is: why is short polling so bad for this?

It is not "bad". It simply is either imprecise or extremely inefficient. To increase precision, you have to decrease polling interval. And that costs a lot of bandwidth and processing.

Short polling has only one real advantage: it is very easy to set up. But since you are thinking (probably overthinking) about scaling this app to millions of users, that cannot be the deciding factor.

While WS would provide real-time feedback for the client, it requires drastically reducing the number of clients a single machine can serve since the connections must be held open until the result is returned.

That claim does not seem to be backed by anything. Modern machines can handle tens of thousands of idle connections (if implemented correctly, e.g. through async I/O). It is extremely unlikely to be less efficient than clients constantly reconnecting, constantly renegotiating TLS (which costs), constantly going through authentication and other layers, and making (potentially) empty queries. I simply don't buy that argument.

Besides this, websockets have been a considerable issue for me in the past when multiple proxies, load balancers, kubernetes pods, ingresses, etc are involved.

I understand. But that's, you know, a skill issue?

Why not SSE?

SSE seems to be a dead tech anyway. If you need to use it, then you may as well use WebSockets.

  • As stated before: these requests are only made to read-optimized stores.

So? Does it mean they are free? No, it doesn't.

  • The back-end infra can scale/shift/etc along with load and the client won't care because there's no long-running connection to sever.

Are you saying that you are not using long-running connections at all - e.g. no HTTP Keep-Alive? That would be strange and inefficient. It sounds to me like you overestimate how hard it is to scale long-running connections. Most modern apps out there use Keep-Alive, so scaling long-running connections is not inherently hard.

  • One server can handle more users - not from the standpoint of concurrent connections - but from the standpoint that eventually, if 1,000,000 users make a request at the same time to one server, all "is it done yet?" polling sessions will eventually get their answer

What? Do you claim that a single server can handle 1,000,000 requests at the same time? How will it open that many sockets - temporary, but still existing? No, you will need more servers to do that. You will need proxies and queueing. It is not different at all; it simply looks easier because this is a well-known problem, already solved by infrastructure - mostly because of how well understood HTTP is.

  • Keep-alive is standard since HTTP/1.1.

If you use Keep-Alive, then you literally have 1,000,000 open, long-running connections now. There's no way a single, real-world server can handle that. You may as well use WebSockets at that point, for which you can fine-tune the overhead.


The last thing you can consider is using long polling instead of short polling. It has the advantage of using well-understood, standard HTTP while only reacting to changes - a mix of short polling and WS. In fact, even big companies like Facebook still use long polling (e.g. for Messenger), which shows how powerful and useful that technique is.
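To make the suggestion concrete, here is a minimal long-poll sketch. The per-request `threading.Event` and all names are purely illustrative; a production server would use async I/O and a real pub/sub signal from the consumers:

```python
import threading

_status = {}    # request_id -> status; stand-in for the read store
_events = {}    # request_id -> Event used to wake parked poll requests

def complete(request_id, outcome):
    """Called when the back end finishes: record the outcome, wake waiters."""
    _status[request_id] = outcome
    _events.setdefault(request_id, threading.Event()).set()

def long_poll(request_id, timeout=25.0):
    """Hold the request open until the status appears or the timeout fires;
    the client simply re-issues the poll after each response."""
    event = _events.setdefault(request_id, threading.Event())
    if _status.get(request_id) is None:
        event.wait(timeout)     # park here instead of answering "not yet"
    return _status.get(request_id, "PENDING")
```

The client-visible contract is the same as short polling (plain HTTP request, status in the body), but each request only returns when there is actually something to say, or when the server-side timeout elapses.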

  • "SSE seems to be a dead tech anyway." Huh? EventSource is not "dead". It's still defined in modern browsers. Commented Oct 24 at 14:18
  • @guest271314 I didn't mean it is unsupported. I meant that hardly anyone uses it anymore, since WebSockets are simply superior in every aspect. Commented Oct 24 at 14:44
  • So, you think that if a lot of people use something, that makes it a viable software solution? A bunch of people use the garbage that is Microsoft Windows. That doesn't mean anything. The HTTP/3 folks would say QUIC is "superior" to WebSockets, even given the broad usage of WebSockets. Commented Oct 24 at 14:45
  • @guest271314 I think your opinion on Windows doesn't really mean anything. Yes, it does matter how many people use a tech - but this isn't the time for that discussion. Regardless, WebSockets are not only used more but, more importantly, are vastly superior to SSE. Commented Oct 24 at 14:49
  • @guest271314 QUIC is superior to WebSockets? These aren't even comparable: one is a transport protocol, the other is not really one (it requires HTTP). You can't open a QUIC connection from the browser. Commented Oct 24 at 14:50

There is polling, and there is event-driven. Polling is always going to be "slow", since an event-driven result can be delivered before the next polling result would show up.

different processing steps ... up to 10s

I'm not building ... some sort of low-latency ... server.

"Long poll" is essentially "event driven", since we keep the link open, and an asynchronously arriving event will traverse the link without delay. As you observe, there are costs to keeping it open, and there may be multiple layers which incur those dollar costs.

So we have an engineering tradeoff, of P95 latency vs dollar cost. What the OP did not describe was how to evaluate that tradeoff. At best we are told this is not realtime gaming or stock trading, so the several steps of up to ten seconds each could suggest a client polling interval of once every sixty seconds.

On a trading floor it's straightforward to get stakeholders to say that cutting off half of a 0.1 second delay or of a ten second delay will be worth X million or Y million dollars in additional development cost.

For the OP application, we would need to know the dollar cost A to serve a "not yet" miss, the cost B to serve "it's complete", and the cost F(t) incurred by a client receiving delayed completion status after t=20 seconds or 20 minutes or 20 hours, when the actual completion time was somewhat less than that. The client has some willingness to pay for receiving timely updates, and a hosting service like AWS has some willingness to deploy webservers that can answer such queries if you're willing to pay the hosting fees. The one can be harmonized with the other, but that would require more information than was divulged in the OP.


I spoke of costs A and B above - the costs for a "not yet" miss and for a hit. The OP essentially grumbled about the dollar cost of a multi-layered serving architecture, where "resources" (memory?) are allocated for the tens of seconds it takes to confirm an airline reservation or whatever. By hypothesis we have > 1e7 users.

This suggests breaking out a bunch of a.example.com and b.example.com webservers, specializing in the "not yet" and "completed" portions of the user experience. We should be able to architect very cheap A servers which can hold open a large number of idle TCP connections. Upon receiving an asynchronous "completed!" event, they immediately produce a 307 redirect, so the patiently waiting client will go ask the B servers about current details. This costs a RTT, which is probably less than the short poll interval you had contemplated. Crucially the cost structure of the A servers and of the B servers can be different, in terms of CPU, RAM, storage, and config.
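The A-server routing decision might look like this compressed sketch (in the full design the A server would park the connection until the asynchronous completion event arrives rather than answer immediately; the hostnames follow the example above, and everything else is hypothetical):

```python
completed = set()   # request IDs whose results are ready on the B servers

def a_server_response(request_id):
    """Cheap 'not yet' tier: redirect finished requests to the B servers,
    otherwise tell the client to keep waiting."""
    if request_id in completed:
        # 307 preserves the method; the client then asks a B server for details
        return 307, {"Location": f"https://b.example.com/result/{request_id}"}
    return 200, {"Retry-After": "5"}  # still pending; ask again later
```

The point of the split is that the two tiers can be provisioned independently: A servers optimized for holding many idle connections cheaply, B servers for serving result payloads.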


If short polling works for you, use short polling.

It is possible to achieve full-duplex streaming between the browser and peers or a server using a variety of Web APIs.

There is WebTransport, which uses HTTP/3 over QUIC.

WebRTC Data Channels can be used for persistent connections between peers, a peer could be a server for your use cases.

It is possible to establish direct UDP connections by using Direct Sockets' UDPSocket in an Isolated Web App (IWA), and to relay between the IWA and arbitrary Web pages - for example via WebRTC Data Channels, or via fetch() calls to the IWA. With WHATWG Fetch it is also possible to stream persistently from server to client and from client to server, using streaming requests with duplex set to "half" and/or Transfer-Encoding: chunked requests, if there is concern about head-of-line blocking over TCP.
