
My Python script uses subprocess to call a Linux utility that is very noisy. I want to store all of the output in a log file and show some of it to the user. I thought the following would work, but the output doesn't show up in my application until the utility has produced a significant amount of output.

# fake_utility.py, just generates lots of output over time
import time

i = 0
while True:
    print(hex(i) * 512)
    i += 1
    time.sleep(0.5)

In the parent process:

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
for line in proc.stdout:
    # the real code does filtering here
    print("test:", line.rstrip())

The behavior I really want is for the filter script to print each line as it is received from the subprocess, like tee does but within Python code.

What am I missing? Is this even possible?


14 Answers


I think the problem is with the statement for line in proc.stdout, which reads the entire input before iterating over it. The solution is to use readline() instead:

# filters output
import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'], stdout=subprocess.PIPE)
while True:
    line = proc.stdout.readline()
    if not line:
        break
    # the real code does filtering here
    print "test:", line.rstrip()

Of course you still have to deal with the subprocess' buffering.
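For a non-Python child on Linux, one common workaround (not part of the original answer; 'noisy_utility' is a placeholder for the real command) is GNU coreutils' stdbuf, which can often force the child into line-buffered output:

import subprocess

# stdbuf -oL asks the child process to line-buffer its stdout
proc = subprocess.Popen(['stdbuf', '-oL', 'noisy_utility'],
                        stdout=subprocess.PIPE)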

Note: according to the documentation the solution with an iterator should be equivalent to using readline(), except for the read-ahead buffer, but (or exactly because of this) the proposed change did produce different results for me (Python 2.5 on Windows XP).


13 Comments

for file.readline() vs. for line in file see bugs.python.org/issue3907 (in short: it works on Python3; use io.open() on Python 2.6+)
The more pythonic test for an EOF, per the "Programming Recommendations" in PEP 8 (python.org/dev/peps/pep-0008), would be 'if not line:'.
@naxa: for pipes: for line in iter(proc.stdout.readline, ''):.
@Jan-PhilipGehrcke: yes. 1. you could use for line in proc.stdout on Python 3 (the read-ahead bug is not present there) 2. '' != b'' on Python 3 -- don't copy-paste the code blindly -- think about what it does and how it works.
@J.F.Sebastian: sure, the iter(f.readline, b'') solution is rather obvious (and also works on Python 2, if anyone is interested). The point of my comment was not to blame your solution (sorry if it appeared like that, I read that now, too!), but to describe the extent of the symptoms, which are quite severe in this case (most of the Py2/3 issues result in exceptions, whereas here a well-behaved loop changed to be endless, and garbage collection struggles fighting the flood of newly created objects, yielding memory usage oscillations with long period and large amplitude).

Bit late to the party, but was surprised not to see what I think is the simplest solution here:

import io
import subprocess

proc = subprocess.Popen(["prog", "arg"], stdout=subprocess.PIPE)
for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):  # or another encoding
    # do something with line, e.g.
    print(line, end="")

(This requires Python 3.)

9 Comments

I'd like to use this answer but I am getting: AttributeError: 'file' object has no attribute 'readable' py2.7
Works with python 3
@sorin neither of those things make it "not valid". If you're writing a library that still needs to support Python 2, then don't use this code. But many people have the luxury of being able to use software released more recently than a decade ago. If you try to read on a closed file you'll get that exception regardless of whether you use TextIOWrapper or not. You can simply handle the exception.
you may be late to the party, but your answer is up to date with the current version of Python, ty
@Ammad \n is the newline character. it's conventional in Python for the newline to not be removed when splitting by lines - you'll see the same behaviour if you iterate over a file's lines or use a readlines() method. You can get the line without it with just line[:-1] (TextIOWrapper operates in "universal newlines" mode by default, so even if you're on Windows and the line ends with \r\n, you'll only have \n at the end, so -1 works). You can also use line.rstrip() if you don't mind any other whitespace-like characters at the end of the line also being removed.

Indeed, if you have sorted out the iterator, buffering could now be your problem. You can tell the Python in the subprocess not to buffer its output.

proc = subprocess.Popen(['python','fake_utility.py'],stdout=subprocess.PIPE) 

becomes

proc = subprocess.Popen(['python','-u', 'fake_utility.py'],stdout=subprocess.PIPE) 

I have needed this when calling python from within python.



A function that allows iterating over both stdout and stderr concurrently, in realtime, line by line

In case you need to get the output stream for both stdout and stderr at the same time, you can use the following function.

The function uses Queues to merge both Popen pipes into a single iterator.

Here we create the function read_popen_pipes():

from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):
    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:
            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
            except Empty:
                pass
            try:
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

read_popen_pipes() in use:

import subprocess as sp

with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:
    for out_line, err_line in read_popen_pipes(p):
        # Do stuff with each line, e.g.:
        print(out_line, end='')
        print(err_line, end='')

    return p.poll()  # return status-code (assumes this runs inside a function)



The subprocess module has come a long way since 2010, and most of the answers here are quite outdated.

Here is a simple way that works on modern Python versions:

from subprocess import Popen, PIPE, STDOUT

with Popen(args, stdout=PIPE, stderr=STDOUT, text=True) as proc:
    for line in proc.stdout:
        print(line)
rc = proc.returncode

About using Popen as a context manager (supported since Python 3.2): on exit of the with block, the standard file descriptors are closed and the process is waited for, which sets the returncode attribute. See subprocess.py:Popen.__exit__ in the CPython sources.
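Roughly, the manual equivalent of what the context manager does would look like this (a sketch, not from the original answer):

proc = Popen(args, stdout=PIPE, stderr=STDOUT, text=True)
try:
    for line in proc.stdout:
        print(line)
finally:
    proc.stdout.close()  # close our end of the pipe
    proc.wait()          # reap the child and set returncode
rc = proc.returncode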



You want to pass these extra parameters to subprocess.Popen:

bufsize=1, universal_newlines=True 

Then you can iterate as in your example. (Tested with Python 3.5)
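For example, a minimal sketch using the question's fake_utility.py:

import subprocess

proc = subprocess.Popen(['python', 'fake_utility.py'],
                        stdout=subprocess.PIPE,
                        bufsize=1, universal_newlines=True)
for line in proc.stdout:
    print("test:", line.rstrip())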

1 Comment

@nicoulaj It should work if using the subprocess32 package.

You can also read lines without a loop. Works in Python 3.6.

import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
list_of_byte_strings = process.stdout.readlines()

2 Comments

Or to convert into strings: list_of_strings = [x.decode('utf-8').rstrip('\n') for x in iter(process.stdout.readlines())]
@ndtreviv, you can pass text=True to Popen or use its "encoding" kwarg if you want the output as strings, no need to convert it yourself
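For example, a sketch of that suggestion (command is a placeholder):

import subprocess

# let Popen decode the bytes for you instead of decoding each line
process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
list_of_strings = [line.rstrip('\n') for line in process.stdout.readlines()]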

Python 3.5 added the run() function to the subprocess module; it returns a CompletedProcess object (the older call() returns only the exit code). With this you are fine using proc.stdout.splitlines():

proc = subprocess.run(
    command, shell=True, capture_output=True, text=True, check=True
)
for line in proc.stdout.splitlines():
    print("stdout:", line)

See also How to Execute Shell Commands in Python Using the Subprocess Run Method

3 Comments

This solution is short and effective. One problem, compared to the original question: it does not print each line "as it is received," which I think means printing the messages in realtime just as if running the process directly in the command line. Instead it only prints the output after the process finishes running.
Thanks @sfuqua for mentioning that. I use pipelines extensively and rely on streaming data and would have wrongly chosen this for its brevity.
This does not answer the question. It buffers entire output of subprocess into memory.

I tried this with Python 3 and it worked (source).

When you use Popen to spawn the new process, you tell the operating system to PIPE the stdout of the child process so the parent process can read it; here, stderr is merged into the same stream via stderr=subprocess.STDOUT.

In output_reader we read each line of the child's stdout by wrapping readline in an iterator (iter(proc.stdout.readline, b'')) that yields the child's output line by line whenever a new line is ready.

import subprocess
import threading
import time


def output_reader(proc):
    for line in iter(proc.stdout.readline, b''):
        print('got line: {0}'.format(line.decode('utf-8')), end='')


def main():
    proc = subprocess.Popen(['python', 'fake_utility.py'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)

    t = threading.Thread(target=output_reader, args=(proc,))
    t.start()

    try:
        # the parent can do its own work here while output_reader
        # prints the child's output in the background
        time.sleep(0.2)
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=0.2)
            print('== subprocess exited with rc =', proc.returncode)
        except subprocess.TimeoutExpired:
            print('subprocess did not terminate in time')

    t.join()


main()



The following modification of Rômulo's answer works for me on Python 2 and 3 (2.7.12 and 3.6.1):

import os
import subprocess

process = subprocess.Popen(command, stdout=subprocess.PIPE)
while True:
    line = process.stdout.readline()
    if line:  # readline() returns empty bytes/string at EOF
        os.write(1, line)
    else:
        break



I was having a problem with the arg list of Popen while updating servers; the following code resolves this a bit.

import getpass
from subprocess import Popen, PIPE

username = 'user1'
ip = '127.0.0.1'

print('What is the password?')
password = getpass.getpass()

cmd1 = f"""sshpass -p {password} ssh {username}@{ip}"""
cmd2 = f"""echo {password} | sudo -S apt update"""
cmd3 = " && "
cmd4 = f"""echo {password} | sudo -S apt upgrade -y"""
cmd5 = " && "
cmd6 = "exit"

commands = [cmd1, cmd2, cmd3, cmd4, cmd5, cmd6]
command = " ".join(commands)
cmd = command.split()

with Popen(cmd, stdout=PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

And to run the update on a local computer, the following code example does this.

import getpass
from subprocess import Popen, PIPE

print('What is the password?')
password = getpass.getpass()

cmd1_local = "apt update"
cmd2_local = "apt upgrade -y"
commands = [cmd1_local, cmd2_local]

with Popen(['echo', password], stdout=PIPE) as auth:
    for cmd in commands:
        cmd = cmd.split()
        with Popen(['sudo', '-S'] + cmd, stdin=auth.stdout,
                   stdout=PIPE, bufsize=1, universal_newlines=True) as p:
            for line in p.stdout:
                print(line, end='')



An improved version of https://stackoverflow.com/a/57093927/2580077, suitable for Python 3.10.

A function to iterate over both stdout and stderr of the process in parallel.

Improvements:

  • Unified queue to maintain the order of entries in stdout and stderr.
  • Yield all available lines in stdout and stderr - this is useful when the calling process is slower.
  • Use blocking in the loop to prevent the process from utilizing 100% of the CPU.
import time
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue, level):
    for line in file:
        queue.put((level, line))
    file.close()


def read_popen_pipes(p, blocking_delay=0.5):
    with ThreadPoolExecutor(2) as pool:
        q = Queue()

        pool.submit(enqueue_output, p.stdout, q, 'stdout')
        pool.submit(enqueue_output, p.stderr, q, 'stderr')

        while True:
            if p.poll() is not None and q.empty():
                break

            lines = []
            while not q.empty():
                lines.append(q.get_nowait())
            if lines:
                yield lines

            # otherwise, the loop will run as fast as possible and
            # utilize 100% of the CPU
            time.sleep(blocking_delay)

Usage:

with subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                      bufsize=1, universal_newlines=True) as p:
    for lines in read_popen_pipes(p):
        # lines - all the log entries since the last loop run
        print('ext cmd', lines)
        # process lines



I came here with the same problem and found that none of the provided answers really worked for me. The closest was adding sys.stdout.flush() to the child process, which works but means modifying that process, which I didn't want to do.

Setting the bufsize=1 in the Popen() didn't seem to have any effect for my use case. I guess the problem is that the child process is buffering, regardless of how I call the Popen().

However, I found this question with similar problem (How can I flush the output of the print function?) and one of the answers is to set the environment variable PYTHONUNBUFFERED=1 when calling Popen. This works how I want it to, i.e. real-time line-by-line reading of the output of the child process.
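For example, a minimal sketch of that approach, reusing fake_utility.py from the question:

import os
import subprocess

# copy the current environment and disable the child interpreter's buffering
env = dict(os.environ, PYTHONUNBUFFERED='1')
proc = subprocess.Popen(['python', 'fake_utility.py'],
                        stdout=subprocess.PIPE, text=True, env=env)
for line in proc.stdout:
    print('test:', line.rstrip())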



On Linux (and presumably OSX), sometimes the parent process doesn't see the output immediately because the child process is buffering its output (see this article for a more detailed explanation).

If the child process is a Python program, you can disable this by setting the environment variable PYTHONUNBUFFERED to 1 as described in this answer.

If the child process is not a Python program, you can sometimes trick it into running in line-buffered mode by creating a pseudo-terminal like so:

import os
import pty
import subprocess

# Open a pseudo-terminal
master_fd, slave_fd = pty.openpty()

# Open the child process on the slave end of the PTY
with subprocess.Popen(
        ['python', 'fake_utility.py'],
        stdout=slave_fd, stdin=slave_fd, stderr=slave_fd) as proc:
    # Close our copy of the slave FD (without this we won't notice
    # when the child process closes theirs)
    os.close(slave_fd)

    # Convert the master FD into a file-like object
    with open(master_fd, 'r') as stdout:
        try:
            for line in stdout:
                # Do the actual filtering here
                print("test:", line.rstrip())
        except OSError:
            # This happens when the child process closes its STDOUT,
            # usually when it exits
            pass

If the child process needs to read from STDIN, you can get away without the stdin=slave_fd argument to subprocess.Popen(), as the child process should be checking the status of STDOUT (not STDIN) when it decides whether or not to use line-buffering.

Finally, some programs may actually directly open and write to their controlling terminal instead of writing to STDOUT. If you need to catch this case, you can use the setsid utility by replacing ['python', 'fake_utility.py'] with ['setsid', 'python', 'fake_utility.py'] in the call to subprocess.Popen().

