
Are there any known ways for ast.literal_eval(node_or_string)'s evaluation to not actually be safe?

If yes, are patches available for them?

(I already know about PyPy[sandbox], which is presumably more secure, but unless the answers are "yes" and then "no", my needs are minor enough that I won't be going that far.)


2 Answers


The documentation formerly stated it was safe, but now contains the caveat:

This function had been documented as “safe” in the past without defining what that meant. That was misleading. This is specifically designed not to execute Python code, unlike the more general eval(). There is no namespace, no name lookups, or ability to call out. But it is not free from attack: A relatively small input can lead to memory exhaustion or to C stack exhaustion, crashing the process. There is also the possibility for excessive CPU consumption denial of service on some inputs. Calling it on untrusted data is thus not recommended.

It is possible to crash the Python interpreter due to stack depth limitations in Python’s AST compiler. It can raise ValueError, TypeError, SyntaxError, MemoryError and RecursionError depending on the malformed input.
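As a rough sketch of how a caller might handle the exception types listed above, the wrapper below caps the input length and catches those exceptions. The name `parse_literal` and the `max_len` default are hypothetical, chosen only for illustration. This handles failures that surface as Python exceptions; it does not protect against a hard interpreter crash.

```python
import ast


# Hypothetical helper: the name and the default limit are illustrative only.
def parse_literal(text, max_len=10_000):
    """Return (True, value) on success, (False, exception) on failure."""
    if len(text) > max_len:
        # Reject oversized input before it ever reaches the parser.
        return False, ValueError("input too long")
    try:
        return True, ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        # These are the exception types the documentation mentions.
        return False, exc


print(parse_literal("[1, 2, {'a': 3}]"))
print(parse_literal("1 +"))
```

A length cap is a blunt tool, but it is cheap and limits how deeply nested an input can be.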

Also, according to the source, literal_eval parses the string to a Python AST (abstract syntax tree) and returns a value only if that tree is a literal.

So, because the code is never executed, only parsed, arbitrary code execution (ACE) exploits should† be impossible. Denial of service (DOS) attacks are possible, however.

†known bugs of this nature will be patched, but there are the unknown bugs ;)
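A small illustration of the parse-only behaviour described above, using example inputs that are not from the original answer: a literal is converted to a Python object, while an expression containing a function call is rejected with ValueError instead of being run.

```python
import ast

# A plain literal is converted to the corresponding Python object.
value = ast.literal_eval("{'name': 'x', 'ids': [1, 2, 3]}")

# A function call is parsed but never executed; literal_eval rejects it.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(value, rejected)
```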


6 Comments

+1 The reason there aren't more answers here is that nothing more needs to be said.
Well, it's always difficult to prove that there is no risk, but the fact that the code is never actually executed should help convince you that there is not much risk.
The risk is about the same as using Python itself.
Unfortunately, I would like to use ast.literal_eval() to filter an input before passing it to eval() or exec(), which always represents a risk. But in fact, the source code seems to show that the input is pretty strictly filtered. I just hope that I did not miss an edge case...
If the input is a literal, literal_eval() will return the value. If the input is more than a literal (it contains code), then literal_eval() will fail, and there would be a risk in executing the code. In both cases, literal_eval() does the job. Why do you want to use eval() or exec() after that?
>>> code = '()' * 1000000
>>> ast.literal_eval(code)
[1]    3061 segmentation fault (core dumped)  python2

This input, or possibly a smaller one, will crash with SIGSEGV in Python 2. It might be exploitable under some conditions. This particular bug has some mitigations in Python 3, but bugs may still exist in the AST parser, as evidenced by a crash that user caot found on Red Hat 9 with Python 3.9.18:

$ python
Python 3.9.18 (main, Jan 24 2024, 00:00:00)
[GCC 11.4.1 20231218 (Red Hat 11.4.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> code = '()' * 100
>>> ast.literal_eval(code)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/ast.py", line 105, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib64/python3.9/ast.py", line 104, in _convert
    return _convert_signed_num(node)
  File "/usr/lib64/python3.9/ast.py", line 78, in _convert_signed_num
    return _convert_num(node)
  File "/usr/lib64/python3.9/ast.py", line 69, in _convert_num
    _raise_malformed_node(node)
  File "/usr/lib64/python3.9/ast.py", line 66, in _raise_malformed_node
    raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Call object at 0x7f882ad96fa0>
>>> code = '()' * 1000000
>>> ast.literal_eval(code)
Segmentation fault (core dumped)
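One possible way to keep such a crash from taking down the calling program is to run literal_eval in a child process. The following is only a sketch under stated assumptions: the helper name `literal_eval_isolated`, the `CHILD_CODE` snippet, and the 5-second timeout are all hypothetical, and the approach does not limit memory use in the child.

```python
import subprocess
import sys

# Hypothetical child program: reads the input from stdin and prints its repr.
CHILD_CODE = (
    "import ast, sys\n"
    "print(repr(ast.literal_eval(sys.stdin.read())))\n"
)


# Hypothetical helper; the name and the timeout value are illustrative.
def literal_eval_isolated(text, timeout=5):
    """Evaluate `text` in a child process; return (ok, payload)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            input=text, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        # A non-zero code covers both Python exceptions and fatal signals.
        return False, f"child exited with code {proc.returncode}"
    # On success the payload is the repr of the evaluated literal.
    return True, proc.stdout.strip()


print(literal_eval_isolated("[1, 2]"))
print(literal_eval_isolated("f()"))
```

If the child segfaults, only the child dies; the parent just sees a non-zero return code.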

7 Comments

You are using an operation in the argument to literal_eval (which is not a string or node), and that has nothing to do with literal_eval.
@ProdiptaGhosh it is a string. There is a very good reason why I didn't expand those million parentheses in this answer!
The point is, you are first evaluating an expression (the string multiplied a gazillion times; it is an expression, not a string) before you call literal_eval, and that string expansion has nothing to do with literal_eval whatsoever. If things go right, it gets the expanded string. If it goes wrong, Python crashes even before literal_eval is called.
OK, this makes things much clearer. This seems a valid point: it has less to do with literal_eval itself than with the underlying parse and then the compile call, which segfaults on exceeding the maximum recursion limit. I have reversed my vote. This seems to be an open issue for later versions as well.
In a 2022 issue and later patch, the safety claims were updated to reflect that it is not safe from DOS attacks, just ACE attacks.