I would like to store a JSON's contents in a HTML document's source, inside a script tag.
The content of that JSON does depend on user submitted input, thus great care is needed to sanitise that string for XSS.
I've read two concept here on SO.
1. Replace all occurrences of the </script tag into <\/script, or replace all </ into <\/ server side.
Code wise it looks like the following (using Python and jinja2 for the example):
// view data = { 'test': 'asdas</script><b>as\'da</b><b>as"da</b>', } context_dict = { 'data_json': json.dumps(data, ensure_ascii=False).replace('</script', r'<\/script'), } // template <script> var data_json = {{ data_json | safe }}; </script> // js access it simply as window.data_json object 2. Encode the data as a HTML entity encoded JSON string, and unescape + parse it in client side. Unescape is from this answer: https://stackoverflow.com/a/34064434/518169
// view context_dict = { 'data_json': json.dumps(data, ensure_ascii=False), } // template <script> var data_json = '{{ data_json }}'; // encoded into HTML entities, like < > & </script> // js function htmlDecode(input) { var doc = new DOMParser().parseFromString(input, "text/html"); return doc.documentElement.textContent; } var decoded = htmlDecode(window.data_json); var data_json = JSON.parse(decoded); This method doesn't work because \" in a script source becames " in a JS variable. Also, it creates a much bigger HTML document and also is not really human readable, so I'd go with the first one if it doesn't mean a huge security risk.
Is there any security risk in using the first version? Is it enough to sanitise a JSON encoded string with .replace('</script', r'<\/script')?
Reference on SO:
Best way to store JSON in an HTML attribute?
Why split the <script> tag when writing it with document.write()?
Script tag in JavaScript string
Sanitize <script> element contents
Escape </ in script tag contents
Some great external resources about this issue:
Flask's tojson filter's implementation source
Rail's json_escape method's help and source
A 5 year long discussion in Django ticket and proposed code
<,>, and&as HTML entities.JSON.stringify()and Python'sjson.dumps()doesn't escape/into\/. I'm looking for an automated way, which uses either the script tag parser to decode JSON orJSON.parse()on a string. Escaping manually on the server side would need something manual on the client side as well.|tojsonfilter's implementation in Flask to be the best resource. The source code as well as some really important comments are written there. github.com/pallets/flask/blob/… My understanding of the correct approach is the following then: 1. Use method 1. from my question. 2. encode <, >, & and ' into u00 form (not HTML entities!). 3. Double check if the JSON encoder escapes `\` or not, as it depends from implementation to implementation (or even changed mid-version sometimes).