47

Are there any equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()?

The closest I've come across are encodeURI()/encodeURIComponent() and escape() (and their corresponding un-encoding functions), but they don't encode/decode the same set of special characters as far as I can tell.

1
  • (un)escape did the work for me Commented Oct 31, 2021 at 16:41

8 Answers

104
JavaScript              | Python
-----------------------------------
encodeURI(str)          | urllib.parse.quote(str, safe='~@#$&()*!+=:;,?/\'')
-----------------------------------
encodeURIComponent(str) | urllib.parse.quote(str, safe='~()*!\'')

On Python 3.7+ you can remove ~ from safe=.
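To see what the two safe sets do in practice, here is a small sketch that runs one sample string through both. The constant names and the sample string are only for illustration; the safe sets are copied from the table.

```python
from urllib.parse import quote

# Safe sets copied from the table ("~" is redundant on Python 3.7+).
ENCODE_URI_SAFE = "~@#$&()*!+=:;,?/'"
ENCODE_URI_COMPONENT_SAFE = "~()*!'"

sample = "a b/c?d=e&f"  # illustrative input

like_encode_uri = quote(sample, safe=ENCODE_URI_SAFE)
like_encode_uri_component = quote(sample, safe=ENCODE_URI_COMPONENT_SAFE)

print(like_encode_uri)            # a%20b/c?d=e&f
print(like_encode_uri_component)  # a%20b%2Fc%3Fd%3De%26f
```

The first call leaves URL delimiters such as /, ?, = and & alone, while the second escapes them, which mirrors the split between encodeURI() and encodeURIComponent().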


6

OK, I think I'm going to go with a hybrid custom set of functions:

Encode: Use encodeURIComponent(), then put slashes back in.
Decode: Decode any %hex values found.

Here's a more complete variant of what I ended up using (it handles Unicode properly, too):

function quoteUrl(url, safe) {
    if (typeof safe !== 'string') {
        safe = '/';  // Don't escape slashes by default
    }

    url = encodeURIComponent(url);

    // Unescape characters that were in the safe list
    var toUnencode = [];  // declared with var so it doesn't leak into the global scope
    for (var i = safe.length - 1; i >= 0; --i) {
        var encoded = encodeURIComponent(safe.charAt(i));
        if (encoded !== safe.charAt(i)) {  // Ignore safe char if it wasn't escaped
            toUnencode.push(encoded);
        }
    }

    url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent);

    return url;
}

var unquoteUrl = decodeURIComponent;  // Make alias to have symmetric function names

Note that if you don't need "safe" characters when encoding ('/' by default in Python), then you can just use the built-in encodeURIComponent() and decodeURIComponent() functions directly.

Also, if there are Unicode characters (i.e. characters with codepoint >= 128) in the string, then to maintain compatibility with JavaScript's encodeURIComponent(), the Python quote_url() would have to be:

import urllib  # Python 2

def quote_url(url, safe):
    """URL-encodes a string (either str (i.e. ASCII) or unicode);
    uses de-facto UTF-8 encoding to handle Unicode codepoints in given string.
    """
    return urllib.quote(unicode(url).encode('utf-8'), safe)

And unquote_url() would be:

def unquote_url(url):
    """Decodes a URL that was encoded using quote_url.
    Returns a unicode instance.
    """
    return urllib.unquote(url).decode('utf-8')
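The two helpers above are Python 2 code. On Python 3, a rough equivalent can be sketched without the explicit encode/decode steps, because urllib.parse.quote() encodes str input as UTF-8 by default and urllib.parse.unquote() decodes as UTF-8 and returns str. The function names are kept from the answer; the sample path is only an illustration.

```python
from urllib.parse import quote, unquote


def quote_url(url: str, safe: str = "/") -> str:
    """Percent-encode url as UTF-8, leaving characters in safe untouched."""
    return quote(url, safe=safe)


def unquote_url(url: str) -> str:
    """Reverse quote_url(); percent-escapes are decoded as UTF-8."""
    return unquote(url)


original = "/café/menü"  # illustrative input with non-ASCII characters
encoded = quote_url(original)

print(encoded)                           # /caf%C3%A9/men%C3%BC
print(unquote_url(encoded) == original)  # True
```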

1 Comment

I used the unquote_url function but ran into issues when moving to Python 3 - the decode is automatic in python 3, in python 2, it is still required. I couldn't figure out a way to do it nicely that worked in both languages. My py3 code is urllib.parse.unquote(six.text_type(a))
6

The requests library is a bit more popular if you don't mind the extra dependency

from requests.utils import quote

quote(str)

1 Comment

requests.utils.quote is just urllib.parse.quote. Don't install requests just for that function.
3

Here are implementations based on an implementation in the GitHub repo purescript-python:

import urllib.parse as urllp

def encodeURI(s):
    return urllp.quote(s, safe="~@#$&()*!+=:;,.?/'")

def decodeURI(s):
    return urllp.unquote(s, errors="strict")

def encodeURIComponent(s):
    return urllp.quote(s, safe="~()*!.'")

def decodeURIComponent(s):
    return urllp.unquote(s, errors="strict")


2

Python: urllib.quote

JavaScript: unescape

I haven't done extensive testing but for my purposes it works most of the time. I guess you have some specific characters that don't work. Maybe if I use some Asian text or something it will break :)

This came up when I googled so I put this in for all the others, if not specifically for the original question.


0

Try a regex. Something like this:

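mystring.replace(/[\u0100-\uFFFF]/g, function (ch) {
    return "%" + ch.charCodeAt(0).toString(16).toUpperCase();
});

That will replace any character above ordinal 255 with "%" followed by the hex value of its character code. The replacement is a function so that it is computed separately for each match.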

2 Comments

That's great for characters above 255, but there are some other funny ones that quote() catches that are below 255 (like '?', '&', '@', and others I don't know about)
The brackets denote a character set, which can include individual characters as well as ranges. You can just as easily write it as /[\?&@\u0100-\uFFFF]/g to achieve that result. You just need to escape any chars that are also regex special chars (like ? or /).
0

decodeURIComponent() is similar to unquote

const unquote = decodeURIComponent
const unquote_plus = (s) => decodeURIComponent(s.replace(/\+/g, ' '))

except that Python is much more forgiving. If either of the two characters after a % is not a hex digit (or there are fewer than two characters after a %), JavaScript will throw a "URIError: URI malformed" error, whereas Python will just leave the % as is.
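The Python half of that difference is easy to check. This is a minimal sketch with made-up inputs; it only exercises urllib.parse, so the JavaScript behaviour described above is not tested here.

```python
from urllib.parse import unquote, unquote_plus

# Malformed escapes are passed through unchanged instead of raising.
trailing_percent = unquote("100%")
bad_hex_digits = unquote("%zz")

# Well-formed escapes are decoded as usual.
decoded = unquote("a%2Fb%20c")
decoded_plus = unquote_plus("a+b%21")

print(trailing_percent)  # 100%
print(bad_hex_digits)    # %zz
print(decoded)           # a/b c
print(decoded_plus)      # a b!
```

If the JavaScript side has to accept the same input, one option is to wrap decodeURIComponent() in a try/catch.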

encodeURIComponent() is not quite the same as quote: you need to percent-encode a few more characters and un-escape /:

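const quoteChar = (c) => '%' + c.charCodeAt(0).toString(16).padStart(2, '0').toUpperCase()
// Escape the characters that encodeURIComponent() leaves alone but quote() encodes
const escapeExtra = (s) => encodeURIComponent(s).replace(/[()*!']/g, quoteChar)
const quote = (s) => escapeExtra(s).replace(/%2F/g, '/')
// Python's quote_plus() defaults to safe='', so '/' stays encoded here
const quote_plus = (s) => escapeExtra(s).replace(/%20/g, '+')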

The characters that Python's quote doesn't escape are documented here and are listed as (on Python 3.7+) "Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted — its default value is '/'"

The characters that JavaScript's encodeURIComponent doesn't encode are documented here and are listed as uriAlpha (upper and lowercase ASCII letters), DecimalDigit and uriMark, which are - _ . ! ~ * ' ( ).
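Comparing the two lists shows which characters need special handling. The sketch below builds both sets from the descriptions above (the set names are just for illustration) and confirms the result against quote() itself.

```python
import string
from urllib.parse import quote

ALNUM = string.ascii_letters + string.digits

# Characters left unescaped, per the two descriptions above.
JS_COMPONENT_UNESCAPED = set(ALNUM + "-_.!~*'()")
PY_QUOTE_UNESCAPED = set(ALNUM + "_.-~" + "/")  # "/" is the default safe value

only_js = "".join(sorted(JS_COMPONENT_UNESCAPED - PY_QUOTE_UNESCAPED))
only_py = "".join(sorted(PY_QUOTE_UNESCAPED - JS_COMPONENT_UNESCAPED))

print(only_js)  # !'()*
print(only_py)  # /

# quote() escapes exactly the JavaScript-only characters and keeps "/".
print(quote(only_js + only_py))  # %21%27%28%29%2A/
```

Those are the five characters the JavaScript quote has to percent-encode by hand, plus the one %2F it has to turn back into a slash.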


0

I am passing text files back and forth between Python and JavaScript.

Although urllib.parse.quote (Python side) and decodeURIComponent (JavaScript side) seem to work OK, they may not handle every character correctly.

So I wrote my own function that should be 100% reliable, regardless of the characters in the text file.

On the Python side I use xxd to encode the file. xxd is a Linux utility that converts a binary file to a string of two hex digits per byte. The Python code to encode the file to a string of hex codes is:

import os

mystring = os.popen("xxd -p " + your_file_name_here).read().replace('\n', '')

If you want to do the xxd conversion in Python instead of using the external program, you can use these functions. They only work with text files, though. If you need to work with binary, stick with the external xxd program.

def doxxd(s):
    xd = ""
    c = ""
    for i in range(0, len(s)):
        if ord(s[i]) < 16:
            # pad single-digit values to two hex digits
            c = hex(ord(s[i])).replace('0x', '0')
        else:
            c = hex(ord(s[i])).replace('0x', '')
        xd += c
    return xd

def unxxd(x):
    s = ""
    # get two chars at a time
    for i in range(0, len(x), 2):
        s += chr(int('0x' + x[i:i+2], 16))
    return s
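If the external program is the only obstacle for binary data, the standard library's bytes.hex() and bytes.fromhex() are one alternative. This is a different technique from the functions above, not a drop-in for them: it works on bytes rather than str, and hex_encode and hex_decode are hypothetical names chosen for this sketch.

```python
def hex_encode(data: bytes) -> str:
    """Return two lowercase hex digits per byte, like the joined output of xxd -p."""
    return data.hex()


def hex_decode(hex_string: str) -> bytes:
    """Reverse hex_encode()."""
    return bytes.fromhex(hex_string)


# Text is encoded to UTF-8 bytes first; arbitrary binary bytes work as well.
payload = "café".encode("utf-8") + b"\x00\xff"
round_trip_ok = hex_decode(hex_encode(payload)) == payload

print(hex_encode(b"\x00\x0fAz"))  # 000f417a
print(round_trip_ok)              # True
```

For a file, reading it in binary mode (open(name, "rb").read()) gives the bytes to pass in. The receiving side then has to interpret the decoded bytes as UTF-8 rather than as one character per byte.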

On the JavaScript side this function restores the hex code file back to the original text string:

function unxxd(str) {
    var s = "";
    // get two chars at a time
    for (var i = 0; i < str.length; i = i + 2) {
        s += String.fromCharCode(parseInt(str.substr(i, 2), 16));
    }
    return s;
}
