
The problems with the original double-checked locking pattern are well documented, e.g. in Meyers and Alexandrescu's "C++ and the Perils of Double-Checked Locking". I have seen the topic come up in questions on SO fairly often.

I have come up with a version that, to me, seems to solve the race condition problem of the original pattern, but does it look ok to you?

In the code below, I assume that LOCK is a properly implemented mutex type that issues a memory barrier at its lock/unlock, and I don't attempt to deal with the deallocation of the instance, as that's not part of the pattern.

I am aware of the other possible ways to do a singleton, but that's not what I am asking about. I am specifically asking about the pattern: is the race condition solved by moving the allocation outside of the mutex lock?

template <typename T, typename LOCK>
class Singleton {
private:
    static T * instance_;
    static LOCK mutex;

    // private - inaccessible
    Singleton();
    Singleton(const Singleton &);
    Singleton & operator=(const Singleton &);

public:
    static T * get_instance() {
        if (instance_ == NULL) {
            T * temp = new T;
            mutex.lock();
            if (instance_ == NULL)
                instance_ = temp;
            else
                delete temp;
            mutex.unlock();
        }
        return instance_;
    }
};
  • What's wrong with just allocating the instance in a temporary, issuing a memory barrier, then stuffing it into the final shared variable (then unlocking)? Commented Aug 26, 2012 at 8:37
  • @KevinBallard, there is no portable way of doing a memory barrier, AFAIK Commented Aug 26, 2012 at 8:38
  • Weird question. There is a portable way of doing a memory barrier in C++11. Before then, there wasn't, but there was no portable way of creating a mutex either. Or a thread, for that matter Commented Aug 26, 2012 at 9:20
  • @jalf, ok I see your point. So this could be rewritten with a memory barrier in C++11, but that's completely equivalent to this version. The question would be the same. Commented Aug 26, 2012 at 9:23
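For reference, the portable C++11 barrier the commenters are alluding to is std::atomic_thread_fence. Below is a sketch of the "allocate in a temporary, fence, then publish" idea from the first comment. Note that the shared pointer still has to be a std::atomic for the fences to give any guarantee; a fence next to a plain store does not remove the data race. The names (Widget, g_instance, g_mutex) are illustrative, not from the question.

```cpp
#include <atomic>
#include <mutex>

// Illustrative types and globals, not from the question.
struct Widget { int value = 42; };

std::atomic<Widget *> g_instance{nullptr};  // must still be atomic: a fence alone
                                            // does not legalize a racy plain store
std::mutex g_mutex;

Widget *get_instance() {
    Widget *p = g_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence below
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_mutex);
        p = g_instance.load(std::memory_order_relaxed);   // re-check under the lock
        if (p == nullptr) {
            Widget *temp = new Widget;                    // construct in a temporary...
            std::atomic_thread_fence(std::memory_order_release);
            g_instance.store(temp, std::memory_order_relaxed);  // ...then publish it
            p = temp;
        }
    }
    return p;
}
```

The release fence before the relaxed store and the acquire fence after the relaxed load synchronize with each other once the load observes the stored pointer, so the fully constructed Widget is visible to the reader.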

2 Answers


No, this is not safe. There is a race condition on instance_: one thread could be reading it (if (instance_ == NULL)) while another is writing to it (instance_ = temp;), so this code has undefined behaviour.

You are asking if the single memory fence created by the lock is sufficient. From the perspective of C++11, it is not. From a pre-C++11 perspective, I can't say for sure, but relying on non-atomic, non-mutex types for synchronization seems unlikely to work: before C++11, mutexes and atomic variables only work through compiler- and processor-specific hacks, and it seems foolish to rely on them for anything beyond their bare specification.


12 Comments

I am not sure I follow. Are you saying that writing to a pointer is somehow not atomic? If the writing thread gets preempted after it has temp in a register, but before it has written it out, then the reading thread will see NULL and continue and hit the mutex.
@LuboAntonov: Unless you have explicitly made the pointer atomic, any concurrent access/modification will result in undefined behaviour.
Do you think that, at the system level, the write from a register to memory is not atomic? It seems to me that the reading thread will either see a 0 (in which case it will continue and stop on the mutex), or the correct pointer value for instance_. There is no intermediate possibility where the memory location holds only half the address or something like that.
As far as a singleton implementation goes, your version is fine, but it suffers the performance penalty of a cache synchronization every time you access instance_, essentially the same as volatile (right?). Plus, this only works in C++11. As I mentioned, I am specifically interested in the DCL pattern and possible issues with this implementation of it.
@MSalters, there is no problem with the cache - the mutex operations have an implied memory fence, or you can do an explicit memory fence (see the comments on the question). The problem is that the standard leaves the atomicity of the (low-level) memory store operation undefined. So on x86 the atomicity is effectively guaranteed, since no compiler would break down a 32-bit store into two 16-bit stores, let's say. But on an embedded architecture one could imagine a 32-bit address being stored with two 16-bit machine instructions, resulting in a partial address if the thread gets preempted.

The problem, as has been mentioned elsewhere, is that there is a data race on accesses to instance_: the first if statement reads the value, and the later assignment to instance_ writes to that variable. With no synchronization between these two operations, the behavior is undefined. But there's an easy solution if you have C++11. Change the declaration

static T * instance_; 

to

static std::atomic<T *> instance_; 

Now all the accesses of instance_ are atomic, which guarantees no tearing and provides synchronization.
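Putting that suggestion back into the question's template gives something like the sketch below. It keeps the question's structure (allocate outside the lock, discard the temporary on a lost race), swaps LOCK for std::mutex, and uses acquire/release ordering so the loser of the race also sees a fully constructed object. The Widget test type is illustrative, not from the question.

```cpp
#include <atomic>
#include <mutex>

template <typename T>
class Singleton {
private:
    static std::atomic<T *> instance_;  // atomic: concurrent read/write is now defined
    static std::mutex mutex_;

    // private - inaccessible
    Singleton();
    Singleton(const Singleton &);
    Singleton &operator=(const Singleton &);

public:
    static T *get_instance() {
        // Acquire load pairs with the release store below, so the
        // pointed-to T is fully visible to whoever reads the pointer.
        T *p = instance_.load(std::memory_order_acquire);
        if (p == nullptr) {
            T *temp = new T;  // allocate outside the lock, as in the question
            std::lock_guard<std::mutex> guard(mutex_);
            p = instance_.load(std::memory_order_relaxed);
            if (p == nullptr) {
                instance_.store(temp, std::memory_order_release);
                p = temp;
            } else {
                delete temp;  // lost the race; discard our copy
            }
        }
        return p;
    }
};

template <typename T> std::atomic<T *> Singleton<T>::instance_{nullptr};
template <typename T> std::mutex Singleton<T>::mutex_;
```

The relaxed re-load inside the critical section is fine because the mutex already orders it against the other thread's store.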
