Many sources recommend a "filter input, escape output" approach to handling content. I have been following this fairly well, but I have run into a situation where I might want to violate this mantra, and I am not sure of the potential costs of doing so.
I have a pubsub websocket server (node.js) whose purpose is to read data from another server and then send it to subscribed clients through a websocket.
Currently, when a client receives data from the websocket server, the client-side JavaScript escapes it for HTML. This seems fragile. I have limited security expertise, so it is possible that having client-side JavaScript do the encoding is a vulnerability I am unaware of. As a hypothetical example, maybe the client is an older browser that does not fully support the escaping function, leaving it vulnerable. It seems kind of strange to make a browser client do this sort of escaping in general.
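To make the current setup concrete, here is a minimal sketch of what the client does. The `escapeHtml` function is only a stand-in that approximates Underscore's `_.escape`, and the `{ title, body }` message shape and the `renderMessage` name are made up for illustration; my real messages look different.

```javascript
// Stand-in approximating Underscore's _.escape: replaces the characters
// that are significant in HTML with entity equivalents.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '`': '&#x60;',
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"'`]/g, (ch) => HTML_ENTITIES[ch]);
}

// Hypothetical handler: builds the markup for one incoming message.
// The { title, body } shape is assumed for illustration only.
function renderMessage(rawJson) {
  const msg = JSON.parse(rawJson);
  return '<h3>' + escapeHtml(msg.title) + '</h3><p>' + escapeHtml(msg.body) + '</p>';
}
```

In the browser, the returned string is what gets inserted into the page, so every field has to pass through the escape call before it is concatenated into markup.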
I am considering moving the HTML escaping from the client to the websocket server. The only encoding needed is to convert strings to HTML entities, but each message would have to be parsed, its strings encoded, and then re-serialized before being sent to the client. This might add significant overhead since each string can be several KB long. The websocket server is supposed to be just a pipe between a publisher and the subscribers, so it is meant to be as fast as possible, but security is even more important.
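The server-side alternative would look roughly like the sketch below. It assumes the messages are JSON (an assumption for this example), and `escapeDeep` and `relayEscaped` are hypothetical names, not part of any library. Note that it escapes string values only, not object keys.

```javascript
// Stand-in for an HTML-entity encoder (approximates Underscore's _.escape).
function escapeHtml(value) {
  const entities = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;',
    '"': '&quot;', "'": '&#x27;', '`': '&#x60;',
  };
  return String(value).replace(/[&<>"'`]/g, (ch) => entities[ch]);
}

// Recursively escape every string value in a parsed JSON structure.
// Numbers, booleans, and null pass through unchanged.
function escapeDeep(value) {
  if (typeof value === 'string') return escapeHtml(value);
  if (Array.isArray(value)) return value.map(escapeDeep);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, val] of Object.entries(value)) {
      out[key] = escapeDeep(val);
    }
    return out;
  }
  return value;
}

// Hypothetical relay step: parse the publisher's message, escape it,
// and re-serialize it for the subscribers.
function relayEscaped(rawJson) {
  return JSON.stringify(escapeDeep(JSON.parse(rawJson)));
}
```

The parse, walk, and stringify steps all run once per message on the server, which is the overhead I am worried about for payloads of several KB.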
Clients of the websocket server will ONLY need to display the data as HTML, so I am confident that no client will need to do anything different with the data. (If a new sort of client must consume the data differently, I would set up a new websocket server or a new subscription that does not encode the data to be sent.)
This doesn't quite violate the "filter input, escape output" idea because the original data remains untouched. The intermediate "client," the websocket server, would receive and secure the data before sending it to a browser, which is like how a normal webserver operates when it fetches from the database and then sends HTML to the browser.
The options I am considering now are:
- Keep encoding the data as html entities in the client-side javascript
- Encode as html entities on the websocket server before sending it to clients
Encoding on the websocket server might incur some overhead, but it seems like the safest thing to do since it happens server-side. Erring on the side of safety makes me want to move away from encoding HTML in client-side JS.
Encoding on the client seems the riskiest option to me due to being full of unknowns.
I use Underscore's escape function (https://underscorejs.org/docs/modules/escape.html) to escape HTML in client-side JS.
Should I move the html escaping to the websocket server or keep it in client-side javascript? Are there any vulnerabilities I should be aware of in keeping it in client-side javascript?