As the title says: how do you properly test and benchmark different implementations of mutexes in C++?

Essentially, I wrote my own std::mutex-like class for a project running on a dual-core ARMv7, with the aim of minimizing the overhead in the uncontended case. Now I'm considering using said mutex in more places and also on different architectures, but before I do this I'd like to make sure that

- it is actually correct
- there aren't any pathological cases in which it performs much worse than a standard std::mutex.

Obviously, I wrote a few basic unit tests and micro-benchmarks and everything seems to work, but in multi-threaded code "seems to work" doesn't give me great comfort.

- So, are there any established static or dynamic analysis techniques?
- What are common pitfalls when writing unit tests for mutex classes?
- What are typical edge cases one should look out for (performance-wise)?

I'm only using standard library types for the implementation, which includes non-sequentially-consistent load and store operations on atomics. However, I'm mainly interested in implementation-agnostic advice, since I'd like to use the same test harness for other implementations, too.
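
To make "the same test harness for other implementations" concrete, here is a minimal sketch of the kind of basic test I mean. It assumes only that the mutex type provides lock() and unlock(); the names stress_counter and StressResult are hypothetical, and std::mutex stands in for the custom class.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: final counter value and number of times
// two threads were observed inside the critical section at once.
struct StressResult {
    long counter;
    long overlaps;
};

// Hypothetical helper: several threads increment a plain (non-atomic)
// counter while holding MutexT. Works for any type with lock()/unlock().
template <typename MutexT>
StressResult stress_counter(int num_threads, long iters_per_thread) {
    MutexT mtx;
    long counter = 0;              // deliberately non-atomic
    std::atomic<int> inside{0};    // threads currently in the critical section
    std::atomic<long> overlaps{0}; // observed mutual-exclusion violations

    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < iters_per_thread; ++i) {
                std::lock_guard<MutexT> guard(mtx);
                // If another thread is already inside, the lock is broken.
                if (inside.fetch_add(1, std::memory_order_relaxed) != 0) {
                    overlaps.fetch_add(1, std::memory_order_relaxed);
                }
                ++counter;
                inside.fetch_sub(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return StressResult{counter, overlaps.load()};
}
```

Calling stress_counter<std::mutex>(4, 10000) and checking that counter equals 4 * 10000 and overlaps is 0 catches gross failures, and swapping in another mutex type only changes the template argument. A passing run proves nothing about correctness, though, which is exactly why I'm asking.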