Before going into asynchronous writing: if you are using IOStreams, first make sure you are not flushing the stream accidentally, e.g., by writing '\n' rather than std::endl. Since writing to IOStreams is buffered, avoiding unnecessary flushes can improve performance quite a bit.
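To make the difference visible, here is a small sketch that counts how often the stream buffer gets flushed. The names counting_buf, write_with_endl and write_with_newline are made up for this illustration, and a std::stringbuf stands in for a file buffer:

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: a string buffer that counts calls to sync(),
// i.e., how often the stream is flushed.
struct counting_buf : std::stringbuf {
    int flushes = 0;
    int sync() override {
        ++flushes;
        return std::stringbuf::sync();
    }
};

// std::endl writes '\n' and then flushes, so this flushes once per line.
void write_with_endl(std::ostream& out, std::vector<std::string> const& lines) {
    for (auto const& line : lines) {
        out << line << std::endl;
    }
}

// '\n' only writes the character; the single flush happens at the end.
void write_with_newline(std::ostream& out, std::vector<std::string> const& lines) {
    for (auto const& line : lines) {
        out << line << '\n';
    }
    out << std::flush;
}
```

Both functions produce the same characters; only the number of flushes differs. With a real file each flush typically turns into a system call, which is where the cost tends to come from.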
If that's not sufficient, the next question is how the data is written. If there is a lot of formatting going on, there is a chance that the actual formatting takes most of the time. You might be able to push the formatting off into a separate thread but that's quite different from merely passing off writing a couple of bytes to another thread: you'd need to pass on a suitable data structure holding the data to be formatted. What is suitable depends on what you are actually writing, though.
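As a rough sketch of what handing off the formatting could look like: the producer passes the raw values along, and the formatting runs on another thread. The record type and the functions format_batch and format_async are assumptions made up for this example; what a suitable structure looks like really depends on your data.

```cpp
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: the unformatted values the producer hands off.
struct record {
    int         id;
    double      value;
    std::string label;
};

// The potentially expensive part: turning the values into text.
std::string format_batch(std::vector<record> const& batch) {
    std::ostringstream out;
    for (auto const& r : batch) {
        out << r.id << ' ' << r.label << ' ' << r.value << '\n';
    }
    return out.str();
}

// Moves the batch to another thread, which does the formatting there.
// The returned future yields the formatted text once it is ready.
std::future<std::string> format_async(std::vector<record> batch) {
    return std::async(std::launch::async, format_batch, std::move(batch));
}
```

The producer only pays for moving the vector; the formatted text could then be written to the file by whoever picks up the future. Starting a thread per batch is just the simplest thing to show here, a long-lived worker with a queue would probably be the better fit for a steady stream of data.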
Finally, if writing the buffers to a file is really the bottleneck and you want to stick with the standard C++ library, it may be reasonable to have a writer thread which listens on a queue filled with buffers from a suitable stream buffer and writes the buffers to an std::ofstream. The producer interface would be an std::ostream which sends off buffers, probably of fixed size, to a queue on which the other thread listens, either when the buffer is full or when the stream is flushed (for which I'd use std::flush explicitly). Below is a quick implementation of that idea using only standard library facilities:
```cpp
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <ostream>
#include <queue>
#include <streambuf>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct async_buf
    : std::streambuf
{
    std::ofstream                 out;
    std::mutex                    mutex;
    std::condition_variable       condition;
    std::queue<std::vector<char>> queue;
    std::vector<char>             buffer;
    bool                          done;
    std::thread                   thread;   // declared last: started after the other members exist

    // Writer thread: takes buffers off the queue and writes them to the file.
    void worker() {
        bool local_done(false);
        std::vector<char> buf;
        while (!local_done) {
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->condition.wait(guard,
                                     [this](){ return !this->queue.empty()
                                                   || this->done; });
                if (!this->queue.empty()) {
                    buf.swap(queue.front());
                    queue.pop();
                }
                local_done = this->queue.empty() && this->done;
            }
            // Write outside the lock so the producer isn't blocked by file I/O.
            if (!buf.empty()) {
                out.write(buf.data(), std::streamsize(buf.size()));
                buf.clear();
            }
        }
        out.flush();
    }

public:
    async_buf(std::string const& name)
        : out(name)
        , buffer(128)
        , done(false)
        , thread(&async_buf::worker, this) {
        // Keep one character in reserve so overflow() can always store c.
        this->setp(this->buffer.data(),
                   this->buffer.data() + this->buffer.size() - 1);
    }
    // Note: characters still in the put area are not enqueued here;
    // flush the stream before the buffer is destroyed.
    ~async_buf() {
        {
            std::unique_lock<std::mutex> guard(this->mutex);
            this->done = true;
        }
        this->condition.notify_one();
        this->thread.join();
    }
    // Called when the put area is full: store c and hand the buffer off.
    int overflow(int c) override {
        if (c != std::char_traits<char>::eof()) {
            *this->pptr() = std::char_traits<char>::to_char_type(c);
            this->pbump(1);
        }
        return this->sync() != -1
            ? std::char_traits<char>::not_eof(c)
            : std::char_traits<char>::eof();
    }
    // Called on flush: move the filled part of the buffer onto the queue
    // and start over with a fresh buffer.
    int sync() override {
        if (this->pbase() != this->pptr()) {
            this->buffer.resize(std::size_t(this->pptr() - this->pbase()));
            {
                std::unique_lock<std::mutex> guard(this->mutex);
                this->queue.push(std::move(this->buffer));
            }
            this->condition.notify_one();
            this->buffer = std::vector<char>(128);
            this->setp(this->buffer.data(),
                       this->buffer.data() + this->buffer.size() - 1);
        }
        return 0;
    }
};

int main()
{
    async_buf    sbuf("async.out");
    std::ostream astream(&sbuf);
    std::ifstream in("async_stream.cpp");
    for (std::string line; std::getline(in, line); ) {
        astream << line << '\n' << std::flush;
    }
}
```