I have a long loop that needs to write some data to a file on every iteration. The problem is that writing to a file can be slow, so I would like to reduce the time this takes by doing the writing asynchronously.

Does anyone know a good way to do this? Should I be creating a thread that consumes whatever is put into its buffer by writing it out (in this case, a single producer, single consumer)?

I am interested mostly in solutions that don't involve anything but the standard library (C++11).

  • It's not part of the standard library, but you should check out libuv if you don't end up liking the standard library solution. Commented Jan 15, 2014 at 0:36
  • @TaylorFlores: Thanks! I'll look into that, but at first blush, it looks like much more than I need. Commented Jan 15, 2014 at 0:46
  • What functions are you using now for read and write? If you're not already using the stdio library, which does buffered I/O, try that. If you are, you could try calling setvbuf to increase the buffer size. Commented Jan 15, 2014 at 1:03

2 Answers

19

Before going into asynchronous writing, if you are using IOStreams you might want to try to avoid flushing the stream accidentally, e.g., by not using std::endl but rather using '\n' instead. Since writing to IOStreams is buffered this can improve performance quite a bit.
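
As a rough illustration of the flushing point, the sketch below counts how often a stream synchronizes its buffer when each line ends with std::endl versus '\n'. The names counting_buf, write_with_endl, and write_with_newline are hypothetical helpers made up for this example, and a string buffer stands in for the file so the effect can be observed without touching the disk.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: a string buffer that counts sync() calls,
// i.e. how often the stream asks for its contents to be flushed.
class counting_buf : public std::stringbuf {
public:
    int syncs = 0;

protected:
    int sync() override {
        ++syncs;
        return std::stringbuf::sync();
    }
};

// Ends every line with std::endl: one flush per line.
inline void write_with_endl(std::ostream& out, int n) {
    for (int i = 0; i < n; ++i) {
        out << "line " << i << std::endl;
    }
}

// Ends every line with '\n' and flushes once, after the loop.
inline void write_with_newline(std::ostream& out, int n) {
    for (int i = 0; i < n; ++i) {
        out << "line " << i << '\n';
    }
    out << std::flush;
}
```

Both functions produce the same characters; only the number of flushes differs. With a real std::ofstream, each flush typically means a trip to the operating system, which is where the cost comes from.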

If that's not sufficient, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the actual formatting takes most of the time. You might be able to push the formatting off into a separate thread but that's quite different from merely passing off writing a couple of bytes to another thread: you'd need to pass on a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing, though.
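
To make the "pass on a suitable data structure" idea concrete, here is a minimal sketch in which the loop only enqueues raw values and a second thread does the formatting. The record type sample, the class formatting_writer, and the function format_three_samples are all hypothetical; what a suitable record looks like depends entirely on what you are actually writing.

```cpp
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical record: the raw values produced by one loop iteration.
struct sample {
    int    iteration;
    double value;
};

class formatting_writer {
    std::ostream&           out_;
    std::mutex              mutex_;
    std::condition_variable condition_;
    std::queue<sample>      queue_;
    bool                    done_;
    std::thread             thread_;  // declared last so it starts last

    void run() {
        for (;;) {
            sample s;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                condition_.wait(lock, [this] { return !queue_.empty() || done_; });
                if (queue_.empty()) return;  // done_ is set and nothing is left
                s = queue_.front();
                queue_.pop();
            }
            // The formatting happens here, outside the lock and off the
            // producer's thread.
            out_ << s.iteration << ' ' << s.value << '\n';
        }
    }

public:
    explicit formatting_writer(std::ostream& out)
        : out_(out), done_(false), thread_(&formatting_writer::run, this) {}

    ~formatting_writer() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        condition_.notify_one();
        thread_.join();  // the worker drains the queue before it returns
        out_.flush();
    }

    // Called by the producer: cheap, no formatting involved.
    void push(sample s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(s);
        }
        condition_.notify_one();
    }
};

// Usage sketch: three records are queued and formatted by the worker.
inline std::string format_three_samples() {
    std::ostringstream os;
    {
        formatting_writer writer(os);
        for (int i = 0; i < 3; ++i) writer.push(sample{i, i * 0.5});
    }  // the destructor joins the worker, so os is complete here
    return os.str();
}
```

The stream passed to the constructor must not be used by any other thread while the writer is alive, since the worker accesses it without further synchronization.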

Finally, if writing the buffers to a file really is the bottleneck and you want to stick with the standard C++ library, it may be reasonable to have a writer thread that listens on a queue of buffers filled by a suitable stream buffer and writes them to an std::ofstream. The producer interface would be an std::ostream that sends off (probably fixed-size) buffers to the queue, on which the other thread listens, either when a buffer is full or when the stream is flushed (for which I'd use std::flush explicitly). Below is a quick implementation of that idea using only standard library facilities:

    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <ostream>
    #include <queue>
    #include <streambuf>
    #include <string>
    #include <thread>
    #include <utility>
    #include <vector>

    struct async_buf
        : std::streambuf
    {
        std::ofstream                 out;
        std::mutex                    mutex;
        std::condition_variable       condition;
        std::queue<std::vector<char>> queue;
        std::vector<char>             buffer;
        bool                          done;
        std::thread                   thread;  // declared last so it starts last

        // Consumer: takes buffers off the queue and writes them to the file.
        void worker() {
            bool local_done(false);
            std::vector<char> buf;
            while (!local_done) {
                {
                    std::unique_lock<std::mutex> guard(this->mutex);
                    this->condition.wait(guard,
                                         [this](){ return !this->queue.empty()
                                                       || this->done; });
                    if (!this->queue.empty()) {
                        buf.swap(queue.front());
                        queue.pop();
                    }
                    local_done = this->queue.empty() && this->done;
                }
                // The actual write happens outside the lock.
                if (!buf.empty()) {
                    out.write(buf.data(), std::streamsize(buf.size()));
                    buf.clear();
                }
            }
            out.flush();
        }

    public:
        async_buf(std::string const& name)
            : out(name)
            , buffer(128)
            , done(false)
            , thread(&async_buf::worker, this) {
            // Keep one character of space in reserve for overflow().
            this->setp(this->buffer.data(),
                       this->buffer.data() + this->buffer.size() - 1);
        }
        ~async_buf() {
            // Hand off any characters still sitting in the put area.
            this->sync();
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->done = true;
            }
            this->condition.notify_one();
            this->thread.join();
        }
        // Called when the put area is full: store c in the reserved slot
        // and send the buffer off.
        int overflow(int c) {
            if (c != std::char_traits<char>::eof()) {
                *this->pptr() = std::char_traits<char>::to_char_type(c);
                this->pbump(1);
            }
            return this->sync() != -1
                ? std::char_traits<char>::not_eof(c)
                : std::char_traits<char>::eof();
        }
        // Called on flush: queue the filled part of the buffer for the
        // worker and continue with a fresh buffer.
        int sync() {
            if (this->pbase() != this->pptr()) {
                this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
                {
                    std::unique_lock<std::mutex> guard(this->mutex);
                    this->queue.push(std::move(this->buffer));
                }
                this->condition.notify_one();
                this->buffer = std::vector<char>(128);
                this->setp(this->buffer.data(),
                           this->buffer.data() + this->buffer.size() - 1);
            }
            return 0;
        }
    };

    int main()
    {
        async_buf    sbuf("async.out");
        std::ostream astream(&sbuf);
        std::ifstream in("async_stream.cpp");
        for (std::string line; std::getline(in, line); ) {
            astream << line << '\n' << std::flush;
        }
    }

16 Comments

AndrewSpott: with the file stream's default setup it is buffered. You can disable buffering for file streams by calling stream.rdbuf()->setbuf(0, 0).
@zangw: When the buffer is full, it should flush automatically: overflow() is called when a character is written for which there is no more space in the buffer [based on what the stream knows: there is one more character space to stick the argument to overflow() in]. If you want to send data without the buffer being full, you'll need to flush (when the stream is destroyed it'll flush). The implementation above chops the data up into units of 128 bytes. The constant can be changed, of course (I haven't profiled the code to see which size makes most sense).
@zangw: well, sure. The design of the class above fills a buffer and hands it off to another thread when it is full, using a new buffer to write to. There could be a queue of available buffers (prefer to use these and create a new one only if none is available) but that's not implemented. You can also grow the buffer in overflow() and only send it when flushed explicitly (i.e., when sync() is called).
@zangw: next, you seem to confuse this site with a source of free labor, which it is not. If you want someone to review your code, you'd need to put it, e.g., on codereview, possibly pointing at it from a comment over here to draw the attention of people interested in this question to your code review. If you have concrete questions about how something works, you can possibly ask over here.
@zangw: With respect to the concrete question asked above: you should create a question, not ask in a comment. ... and the short answer is: whether you can use one buffer has little to do with stream buffers but rather with how you synchronize the access to the buffer between the two threads. If you make sure that the threads don't touch the same bytes in the one buffer, things would be OK. If you end up writing a byte in one thread which is accessed unsynchronized in another thread, you have undefined behavior.
3

Search the web for "double buffering."

In general, one thread will write to one or more buffers. Another thread reads from the buffers, "chasing" the writing thread.

This may not make your program more efficient. Efficiency with files is achieved by writing in huge blocks so that the drive doesn't get a chance to spin down. One write of many bytes is more efficient than many writes of a few bytes.

This could be achieved by having the writing thread only write when the buffer content has exceeded some threshold like 1k.
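
A minimal sketch of that threshold idea, combined with two buffers that are swapped between the threads, might look like the following. The class batching_writer and the function demo_batching are hypothetical names, the 1024-byte threshold simply mirrors the "1k" mentioned above, and an in-memory stream stands in for the file.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

class batching_writer {
    std::ostream&           out_;
    std::size_t             threshold_;
    std::mutex              mutex_;
    std::condition_variable condition_;
    std::string             front_;   // filled by the producer
    bool                    done_;
    std::size_t             writes_;  // touched only by the writer thread
    std::thread             thread_;  // declared last so it starts last

    void run() {
        std::string back;             // drained by the writer thread
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                condition_.wait(lock, [this] {
                    return front_.size() >= threshold_ || done_;
                });
                back.swap(front_);          // exchange the two buffers
                if (back.empty()) return;   // only possible once done_ is set
            }
            // One large write, performed outside the lock.
            out_.write(back.data(), std::streamsize(back.size()));
            ++writes_;
            back.clear();                   // keeps its capacity for reuse
        }
    }

public:
    batching_writer(std::ostream& out, std::size_t threshold)
        : out_(out), threshold_(threshold), done_(false), writes_(0),
          thread_(&batching_writer::run, this) {
        assert(threshold > 0);
    }

    ~batching_writer() { finish(); }

    // Producer side: append data, wake the writer only past the threshold.
    void append(const std::string& data) {
        bool ready;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            front_ += data;
            ready = front_.size() >= threshold_;
        }
        if (ready) condition_.notify_one();
    }

    // Writes whatever is left, stops the thread, returns the write count.
    std::size_t finish() {
        if (thread_.joinable()) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                done_ = true;
            }
            condition_.notify_one();
            thread_.join();
            out_.flush();
        }
        return writes_;
    }
};

// Usage sketch: 1000 appends of 10 bytes each end up in a few large writes.
inline std::size_t demo_batching(std::string& result) {
    std::ostringstream os;
    batching_writer writer(os, 1024);
    for (int i = 0; i < 1000; ++i) writer.append("0123456789");
    std::size_t writes = writer.finish();
    result = os.str();
    return writes;
}
```

The exact number of writes depends on thread timing, but every write except possibly the last carries at least the threshold number of bytes, so the 1000 small appends collapse into at most ten writes here.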

Also research the topic of "spooling" or "print spooling".

You'll need to use C++11 since previous versions don't have threading support in the standard library. I don't know why you limit yourself, since Boost has some good stuff in it.
