5

I am writing a data logging app. All programs are started like this:

./program > out.bin 

The data collector periodically polls the stdout output files and reads the data.

The issue is that the I/O streams are buffered: if a program outputs data at about 1 byte per second, it takes a long time (up to roughly 4,096 seconds with the default 4 kB buffer size) before the data is actually flushed out.

My question is how to force the stdout/pipe/printf buffer to flush externally, i.e. how to trigger something like fflush(stdout) from outside the process.

I have read various pages such as "Turn off buffering in pipe", but I cannot disable the buffers because that has a huge I/O performance impact (measured).

I am looking for a high-performance solution for production. The following conditions are always met:

  • the program (data producer) PID is always known
  • the output is always a file with known path
  • the data logging process has full root access

3 Answers

6
gdb -p PID -batch -ex 'p fflush(stdout)' 

As with any debugging and hacking, YMMV.

4

Do you have access to the source of the running programs?

Forcing a flush of an arbitrary executable is, while not theoretically impossible, very difficult. You would need to find the fflush function in the code and the stdout argument, then interrupt the execution of the program, arrange for the call to fflush, then continue execution. If the program is using a shared library, that at least makes the first part easier, finding fflush and stdout, but you still need to simulate the call. Also, for an unknown binary, you can't know whether it used stdio or whether it implements its own buffering mechanism. Knowing the path of the output file will not help you.

You can try to use gdb, then use the attach command to attach to the process. Maybe gdb can call the fflush function.

If you have the source to the program, implement a signal handler that flushes the buffer and just send the signal when you want the buffer flushed.
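A minimal sketch of that signal-handler idea, assuming the producer is a C program you can rebuild. Since fflush is not async-signal-safe, this version only sets a flag inside the handler and performs the flush from normal code. The names install_flush_handler and flush_if_requested are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Set by the signal handler, consumed by normal (non-handler) code. */
static volatile sig_atomic_t flush_requested = 0;

/* Handler: only records the request, which is async-signal-safe. */
static void on_sigusr1(int signo)
{
    (void)signo;
    flush_requested = 1;
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on error. */
int install_flush_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call this between writes. Flushes stdout if a flush was requested.
   Returns 1 if a flush was performed, 0 otherwise. */
int flush_if_requested(void)
{
    if (!flush_requested)
        return 0;
    flush_requested = 0;
    fflush(stdout);
    return 1;
}
```

The producer would call install_flush_handler() once at startup and flush_if_requested() after each write; the collector then sends `kill -USR1 PID` before reading the file. One limitation of this sketch: if the producer sits idle between writes, the flush is delayed until the next call to flush_if_requested().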

You can also try a pipe; maybe the programs don't buffer if the output is a pipe (though C stdio usually block-buffers pipes as well, so this is not guaranteed). Change the invocation to

./program | your_program > out.bin 

Your program can accept the input, buffer it and flush the buffer when it receives a signal. It would still add CPU overhead, but not disk overhead.
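A rough sketch of what such a relay could look like, under the assumption that SIGUSR1 is the flush signal. The function name relay is hypothetical. It keeps SIGUSR1 blocked except while waiting in pselect(), so a signal cannot get lost between checking the flag and going to sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t flush_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    flush_requested = 1;
}

/* Copy in_fd to out until EOF, flushing out whenever SIGUSR1 arrives.
   Returns 0 on EOF, -1 on error. */
int relay(int in_fd, FILE *out)
{
    struct sigaction sa;
    sigset_t blocked, orig, waitmask;
    char buf[4096];
    int rc = 0;

    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block SIGUSR1 everywhere except inside pselect(). */
    sigemptyset(&blocked);
    sigaddset(&blocked, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &blocked, &orig) != 0)
        return -1;
    waitmask = orig;
    sigdelset(&waitmask, SIGUSR1);

    for (;;) {
        fd_set rfds;
        int ready;
        ssize_t n;

        FD_ZERO(&rfds);
        FD_SET(in_fd, &rfds);
        /* Sleep until input is readable or a signal arrives. */
        ready = pselect(in_fd + 1, &rfds, NULL, NULL, NULL, &waitmask);
        if (ready < 0 && errno != EINTR) { rc = -1; break; }

        if (flush_requested) {          /* flush on request */
            flush_requested = 0;
            fflush(out);
        }
        if (ready <= 0)
            continue;                   /* woken by a signal only */

        n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* EOF: producer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n) { rc = -1; break; }
    }

    if (fflush(out) != 0)               /* final flush before returning */
        rc = -1;
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return rc;
}
```

A main() that simply calls relay(STDIN_FILENO, stdout) would play the role of your_program in the pipeline above; the collector then signals the relay's PID rather than the producer's. This only helps if the producer itself writes to the pipe promptly.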

8
  • Unfortunately I can not modify the data producer programs in any way. Commented Jun 10, 2019 at 9:36
  • I added an idea about gdb. But I think stdout is a macro and not available as a symbol at runtime. Commented Jun 10, 2019 at 9:39
  • 2
    Does it use dynamic libraries? Maybe you can change the libraries to do the same. Commented Jun 10, 2019 at 10:24
  • 1
    I added an idea about using a pipe. @ctrl-alt-delor's idea with changes in a dynamic library might also work. Commented Jun 10, 2019 at 10:38
  • @ctrl-alt-delor I presume most of them yes, the typical programs are java, php cli, ... I will definitely test. Commented Jun 10, 2019 at 10:56
0

The answer from user313992 is nearly the best approach and is likely to work with other binaries as well. Explanation: start gdb in batch mode, attach to the PID, execute the flush, and then (implicitly) detach and exit.

A slightly improved version:

gdb -batch -p $PID -ex 'call (int)fflush(stdout)' -ex 'call (int)fflush(stderr)'

The explicit (int) cast tells gdb the return type, which it needs when fflush has no debug information. Note that call only suppresses void results, so the return value is still printed. For completeness, I've also added a flush of stderr.

This won't work if other file handles are used internally or if those handles are macros that GDB cannot resolve.

For an application you control, the best approach would be to:

  • provide an external function that does the flush (this removes all the issues with macros or different file handles, and it can already be called by GDB as in the example above)
  • call the function within your program at a point where it is useful:
    • register a signal handler, for example for SIGUSR1, that calls the function; you can then send that signal to force flushing
    • poll a file for existence (if it is there - delete then flush)
    • poll for a message/semaphore/socket/...
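The first bullet and the file-polling variant could be sketched as follows. The names flush_outputs and poll_flush_trigger, and the idea of passing the trigger path as a parameter, are assumptions made for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Non-static on purpose, so a debugger can look it up by name.
   Flushes both standard output streams.
   Returns 0 on success, -1 if either flush failed. */
int flush_outputs(void)
{
    int rc = 0;

    if (fflush(stdout) != 0)
        rc = -1;
    if (fflush(stderr) != 0)
        rc = -1;
    return rc;
}

/* If the trigger file exists, delete it and flush.
   unlink() is used directly instead of a separate existence check,
   so there is no window between "check" and "delete".
   Returns 1 if a flush was performed, 0 if there was no trigger
   (or it could not be removed), -1 if the flush itself failed. */
int poll_flush_trigger(const char *trigger_path)
{
    if (unlink(trigger_path) != 0)
        return 0;
    return flush_outputs() == 0 ? 1 : -1;
}
```

The program would call poll_flush_trigger() periodically from its main loop, and the collector would create the trigger file (for example with touch) whenever it wants fresh data. With gdb, the same function can be reached via `call (int)flush_outputs()` instead of resolving stdout and stderr by hand.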
