I'd take a look at Moody Camel's implementation.
It is a fast, general-purpose lock-free queue for C++, written entirely in C++11 by design. The documentation seems rather good, and it comes with a few performance tests.
Among other interesting things (the docs are worth a read anyway), it is contained in a single header and available under the simplified BSD license. Just drop it into your project and enjoy!
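To give an idea of what "drop it in" looks like, here is a minimal sketch. It assumes the header is named `concurrentqueue.h` and sits on your include path, and it only uses the `enqueue` and `try_dequeue` calls shown in the library's README. The `Queue` alias, the `HAVE_MOODYCAMEL` macro, the function name `sum_through_queue` and the mutex-based fallback are my own additions for illustration: the fallback is not lock-free and is only there so the snippet still compiles when the header is missing.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Use the real header when it is available (assumed file name).
#ifdef __has_include
#  if __has_include("concurrentqueue.h")
#    include "concurrentqueue.h"
#    define HAVE_MOODYCAMEL 1
#  endif
#endif

#ifdef HAVE_MOODYCAMEL
template <typename T>
using Queue = moodycamel::ConcurrentQueue<T>;
#else
// Hypothetical stand-in with the same two methods, NOT lock-free.
#include <mutex>
#include <queue>
template <typename T>
class Queue {
public:
    bool enqueue(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push(value);
        return true;
    }
    bool try_dequeue(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<T> items_;
};
#endif

// Several producer threads push 1..items_per_producer; the calling
// thread drains the queue and returns the sum of everything received.
long long sum_through_queue(int producers, int items_per_producer) {
    assert(producers > 0 && items_per_producer > 0);
    Queue<int> queue;

    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&queue, items_per_producer] {
            for (int i = 1; i <= items_per_producer; ++i) queue.enqueue(i);
        });
    }

    const long long expected = 1LL * producers * items_per_producer;
    long long received = 0;
    long long sum = 0;
    int item = 0;
    while (received < expected) {
        if (queue.try_dequeue(item)) {   // false means "nothing available right now"
            sum += item;
            ++received;
        } else {
            std::this_thread::yield();   // let the producers make progress
        }
    }

    for (auto& t : threads) t.join();
    return sum;
}
```

Calling `sum_through_queue(4, 1000)` from your own `main` should return 2002000 (four producers, each contributing 1 + 2 + ... + 1000). Note that `try_dequeue` never blocks, so the consumer has to retry itself when the queue is momentarily empty.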