Multiprocessor systems have some kind of cache coherency protocol built into them, e.g. MSI, MESI, etc. Cache coherency only matters when instructions executing on two different processors try to read/write shared data. But for shared data to be usable in practice, the programmer has to introduce memory barriers anyway. Without memory barriers, the shared data is going to be "wrong" regardless of whether the underlying processor implements cache coherence or not. Why, then, do we need cache coherence mechanisms at the hardware level?
1 Answer
Without cache coherency, instead of mere barriers, you'd have to explicitly flush and invalidate caches when accessing shared data, which has a much higher overhead than cache coherency.
Historically, there have been a few non-coherent shared-memory multiprocessor architectures, but they have all died out in favor of cache coherency (CC) because they were very difficult to program correctly and efficiently.
Coherency is also what makes cheap atomic operations possible, e.g. std::memory_order_relaxed, i.e. just atomicity, no ordering wrt. other operations. Perhaps you're misunderstanding exactly what barriers do: see Does a memory barrier ensure that the cache coherence has been completed?. Also, When to use volatile with multi threading? discusses how coherence makes hand-rolled C atomics work.