Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance; ... static Tp & instance() { if (!m_instance.load(std::memory_order_relaxed)) { std::lock_guard<std::mutex> lock(m_mutex); if (!m_instance.load(std::memory_order_acquire)) { Tp * i = new Tp; m_instance.store(i, std::memory_order_release); } } return * m_instance.load(std::memory_order_relaxed); } Is the std::memory_model_acquire of the load operation superfluous? Is it possible to further relax both load and store operations by switching them to std::memory_order_relaxed? In that case, is the acquire/release semantic of std::mutex enough to guarantee its correctness, or a further std::atomic_thread_fence(std::memory_order_release) is also required to ensure that the writes to memory of the constructor happen before the relaxed store? Yet, is the use of fence equivalent to have the store with memory_order_release?
EDIT: Thanks to the answer of John, I came up with the following implementation that should be data-race free. Even though the inner load could be non-atomic at all, I decided to leave a relaxed load in that it does not affect the performance. In comparison to always have an outer load with the acquire memory order, the thread_local machinery improves the performance of accessing the instance of about an order of magnitude.
static Tp & instance() { static thread_local Tp *instance; if (!instance && !(instance = m_instance.load(std::memory_order_acquire))) { std::lock_guard<std::mutex> lock(m_mutex); if (!(instance = m_instance.load(std::memory_order_relaxed))) { instance = new Tp; m_instance.store(instance, std::memory_order_release); } } return *instance; }
[15] ISO/IEC 14882:1998