I have 3 Node.js apps running on a GCP Compute Engine instance (2 CPUs, 2 GB RAM, Ubuntu 20.04) with an Nginx reverse proxy. One of them is a socket.io chat server. The socket.io app uses @socket.io/cluster-adapter to utilize all available CPU cores. I followed this tutorial to update the Linux settings to get the maximum number of connections. Here is the output of the ulimit command,
core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 7856 max locked memory (kbytes, -l) 65536 max memory size (kbytes, -m) unlimited open files (-n) 500000 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 7856 virtual memory (kbytes, -v) unlimited cat /proc/sys/fs/file-max 2097152 /etc/nginx/nginx.conf
user www-data; worker_processes auto; worker_rlimit_nofile 65535; pid /run/nginx.pid; include /etc/nginx/modules-enabled/*.conf; events { worker_connections 30000; # multi_accept on; } ... /etc/nginx/sites-available/default
... //socket.io part location /socket.io/ { proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header Host $http_host; proxy_set_header X-NginX-Proxy false; proxy_pass http://localhost:3001/socket.io/; proxy_redirect off; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; } ... My chat server code,
const os = require("os"); const cluster = require("cluster"); const http = require("http"); const { Server } = require("socket.io"); const { setupMaster, setupWorker } = require("@socket.io/sticky"); const { createAdapter, setupPrimary } = require("@socket.io/cluster-adapter"); const { response } = require("express"); const PORT = process.env.PORT || 3001; const numberOfCPUs = os.cpus().length || 2; if (cluster.isPrimary) { const httpServer = http.createServer(); // setup sticky sessions setupMaster(httpServer, { loadBalancingMethod: "least-connection", // either "random", "round-robin" or "least-connection" }); // setup connections between the workers setupPrimary(); cluster.setupPrimary({ serialization: "advanced", }); httpServer.listen(PORT); for (let i = 0; i < numberOfCPUs; i++) { cluster.fork(); } cluster.on("exit", (worker) => { console.log(`Worker ${worker.process.pid} died`); cluster.fork(); }); } //worker process else { const express = require("express"); const app = express(); const Chat = require("./models/chat"); const mongoose = require("mongoose"); const request = require("request"); //todo remove var admin = require("firebase-admin"); var serviceAccount = require("./serviceAccountKey.json"); const httpServer = http.createServer(app); const io = require("socket.io")(httpServer, { cors: { origin: "*", methods: ["GET", "POST"], }, transports: "websocket", }); mongoose.connect(process.env.DB_URL, { authSource: "admin", user: process.env.DB_USERNAME, pass: process.env.DB_PASSWORD, }); app.use(express.json()); app.get("/", (req, res) => { res .status(200) .json({ status: "success", message: "Hello, I'm your chat server.." }); }); // use the cluster adapter io.adapter(createAdapter()); // setup connection with the primary process setupWorker(io); io.on("connection", (socket) => { activityLog( "Num of connected users: " + io.engine.clientsCount + " (per CPU)" ); ... //chat implementations }); } Load test client code,
const { io } = require("socket.io-client"); const URL = //"https://myserver.com/"; const MAX_CLIENTS = 6000; const CLIENT_CREATION_INTERVAL_IN_MS = 100; const EMIT_INTERVAL_IN_MS = 300; //1000; let clientCount = 0; let lastReport = new Date().getTime(); let packetsSinceLastReport = 0; const createClient = () => { const transports = ["websocket"]; const socket = io(URL, { transports, }); setInterval(() => { socket.emit("chat_event", {}); }, EMIT_INTERVAL_IN_MS); socket.on("chat_event", (e) => { packetsSinceLastReport++; }); socket.on("disconnect", (reason) => { console.log(`disconnect due to ${reason}`); }); if (++clientCount < MAX_CLIENTS) { setTimeout(createClient, CLIENT_CREATION_INTERVAL_IN_MS); } }; createClient(); const printReport = () => { const now = new Date().getTime(); const durationSinceLastReport = (now - lastReport) / 1000; const packetsPerSeconds = ( packetsSinceLastReport / durationSinceLastReport ).toFixed(2); console.log( `client count: ${clientCount} ; average packets received per second: ${packetsPerSeconds}` ); packetsSinceLastReport = 0; lastReport = now; }; setInterval(printReport, 5000); As you can see from the code, I'm only using websocket for transports. So, it should be able to serve up to 8000 connections as per this StackOverflow answer. But when I run the load test, the server becomes unstable after 1600 connections. And CPU usage goes up to 90% and memory usage up to 70%. I couldn’t find anything in the Nginx error log. How can increase the number of connections to at least 8000? Should I upgrade the instance or change any Linux settings? Any help would be appreciated.
UPDATE: I removed everything related to clustering and ran it again as a regular single-threaded Node.js app. This time, the result was a little better: 2800 stable connections (CPU usage 40%, memory usage 50%). Please note that I'm not performing any disk I/O during the test.