I have been looking at different solutions for large-scale chat. I feel as if I understand 90% of it, but I'm turning to this forum to tie up the last loose end.
I imagine running a bunch of message servers behind a load balancer, each holding long-lived connections to its clients (WSS or XMPP). The number of servers can scale horizontally based on incoming traffic. On the backend, messages are distributed using a pub-sub pattern: if two clients trying to communicate are connected to different servers, the pub-sub message distribution takes care of delivery.
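To make that concrete, here is a rough sketch of what I picture one message server doing, written in Python with the `websockets` and `redis` (redis-py) packages. The single shared "chat" channel is just a placeholder I made up to keep the sketch short; it is exactly the part I am unsure about below.

```python
# Rough sketch of one message server: clients connect over WebSocket, outgoing
# messages are published to Redis, and a background task relays anything
# published on the shared channel back to the clients connected to *this* server.
# The single "chat" channel is a placeholder, not a real topic design.
import asyncio

import redis.asyncio as redis
import websockets

r = redis.Redis()
local_clients = set()            # WebSocket connections held by this server


async def relay_from_redis():
    pubsub = r.pubsub()
    await pubsub.subscribe("chat")                 # placeholder global channel
    async for msg in pubsub.listen():
        if msg["type"] == "message":
            data = msg["data"].decode()
            for ws in list(local_clients):
                try:
                    await ws.send(data)            # fan out to local clients only
                except websockets.exceptions.ConnectionClosed:
                    pass                           # its handler will clean up


async def handle_client(ws):
    local_clients.add(ws)
    try:
        async for text in ws:
            # hand the message to Redis so whichever server holds the
            # recipient's connection can deliver it
            await r.publish("chat", text)
    finally:
        local_clients.discard(ws)


async def main():
    relay = asyncio.create_task(relay_from_redis())  # keep a reference alive
    async with websockets.serve(handle_client, "0.0.0.0", 8080):
        await asyncio.Future()                       # run forever


asyncio.run(main())
```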
So far, it all makes sense. The image below clarifies it perfectly.
But I can't see how each WebSocket server manages to subscribe to all of the topics the pub-sub server (Redis) will handle.
I imagine the pub-sub server handling 100,000 requests per second through sharding. But I can't understand how every WebSocket server then handles the subscriptions. If each server is subscribed to all topics on all shards, won't that flood every WebSocket server?
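The only way I can picture "subscribed to everything" is a pattern subscription like the sketch below (again Python/redis-py, against a single Redis node for simplicity; the `conv:*` naming is made up), which delivers every message in the system to every server:

```python
# The approach I am questioning: every WebSocket server pattern-subscribes to
# *all* conversation channels, so every published message is delivered to every
# server whether or not it holds either participant's connection.
import asyncio

import redis.asyncio as redis


async def subscribe_to_everything():
    pubsub = redis.Redis().pubsub()
    await pubsub.psubscribe("conv:*")      # matches every conversation topic
    async for msg in pubsub.listen():
        if msg["type"] == "pmessage":
            # fires for every message in the whole system; this is the
            # flood I am worried about
            print(msg["channel"], msg["data"])


asyncio.run(subscribe_to_everything())
```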
So once we go down the path of sharding, how do we distribute messages properly? We then need some intelligence so that each WebSocket server is subscribed to exactly the topics on which client A and client B are sending their messages. Somehow each WebSocket server needs to continuously subscribe and unsubscribe based on which topics its connected clients are currently using.
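In other words, I imagine each server doing something like the sketch below: subscribe to a per-user channel when a client connects and unsubscribe when it disconnects. The `user:<id>` channel naming, the first-frame handshake, and the JSON message format are all assumptions on my part, not something I have seen documented.

```python
# Sketch of the "intelligence" I am describing: each server subscribes only to
# the channels of the users it currently holds connections for, and
# unsubscribes again when the client goes away.
import asyncio
import json

import redis.asyncio as redis
import websockets

r = redis.Redis()


async def handle_client(ws):
    # hypothetical handshake: the client's first text frame is its user id
    user_id = await ws.recv()

    pubsub = r.pubsub()
    await pubsub.subscribe(f"user:{user_id}")        # subscribe on connect

    async def relay():
        # forward anything published on this user's channel down their socket
        async for msg in pubsub.listen():
            if msg["type"] == "message":
                await ws.send(msg["data"].decode())

    relay_task = asyncio.create_task(relay())
    try:
        async for text in ws:
            # hypothetical message format: {"to": "<user id>", "body": "..."}
            payload = json.loads(text)
            await r.publish(f"user:{payload['to']}", payload["body"])
    finally:
        relay_task.cancel()                          # stop relaying for this user
        await pubsub.unsubscribe(f"user:{user_id}")  # unsubscribe on disconnect


async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8080):
        await asyncio.Future()


asyncio.run(main())
```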
Or are the WebSocket servers actually capable of being subscribed to EVERY topic? Am I wrong to see this as a problem? If so, why use load balancing at all?
Edit: Taking inspiration from here and here
Edit2: rephrasing.
