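The general question from the first part is solved by introducing a third mutex that is always acquired before working with any of the lower-level components. Because every thread takes this outer mutex first, the components' own mutexes are only ever locked by one thread at a time, so the order in which they are taken no longer matters. This gives a safe locking order without invasive changes to the lower-level components.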
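A minimal sketch of that idea, assuming two lower-level components that each guard their data with their own mutex. `Component`, `System`, `moveAllTo` and `runOppositeMoves` are hypothetical names for illustration, not taken from the question. `moveAllTo` locks its own mutex and then the other component's, so calling `a.moveAllTo(b)` and `b.moveAllTo(a)` concurrently could deadlock on its own. Routing every operation through `System`, which takes `outer_` first, rules that out without touching `Component`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical lower-level component with its own internal mutex.
class Component {
public:
    void add(int x) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(x);
    }
    // Locks this component, then (inside other.add) the other one.
    // Two threads calling this in opposite directions could deadlock.
    void moveAllTo(Component& other) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (int x : items_) other.add(x);
        items_.clear();
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<int> items_;
};

// Owner of both components: every operation acquires outer_ first,
// so at most one thread at a time ever holds the inner mutexes.
class System {
public:
    void add(int x) {
        std::lock_guard<std::mutex> guard(outer_);
        a_.add(x);
    }
    void moveAToB() {
        std::lock_guard<std::mutex> guard(outer_);
        a_.moveAllTo(b_);
    }
    void moveBToA() {
        std::lock_guard<std::mutex> guard(outer_);
        b_.moveAllTo(a_);
    }
    std::size_t total() const {
        std::lock_guard<std::mutex> guard(outer_);
        return a_.size() + b_.size();
    }
private:
    mutable std::mutex outer_;
    Component a_;
    Component b_;
};

// Two threads move data in opposite directions; thanks to outer_ this
// terminates and no item is lost or duplicated.
inline std::size_t runOppositeMoves(System& s, int rounds) {
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) s.moveAToB(); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) s.moveBToA(); });
    t1.join();
    t2.join();
    return s.total();
}
```

The cost of this design is that operations on unrelated components are serialized too; in exchange, the components keep their existing internal locking unchanged.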
Surprisingly, the example problem from the second part has nothing to do with concurrency!
The abstraction leak you are experiencing is caused by exposing an implementation detail (shared_ptr) to clients of the component.
Instead of a shared_ptr, return a wrapper object that knows about the Cache's mutex and handles weak dereferencing, acquiring the mutex before every dereference or disposal. Note that in your context the client code is unlikely to need the full power of shared_ptr, so expose only the minimal interface the client needs and keep the implementation simple.
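A minimal sketch of such a wrapper, assuming the cache stores values as `std::shared_ptr<Value>` in a map guarded by a single mutex. `Cache`, `Handle`, `Value`, `get`, `evict` and `with` are hypothetical names. The handle holds only a `std::weak_ptr`, and `with` locks the cache mutex, promotes the weak pointer, runs the callback and drops the temporary strong reference while the mutex is still held. The mutex itself lives in a `shared_ptr` so a handle can safely outlive the cache.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical cached value type.
struct Value {
    std::string data;
};

class Cache;

// What clients get instead of a raw shared_ptr<Value>.
class Handle {
public:
    // Runs f on the value while holding the cache mutex.
    // Returns false if the value has already been evicted.
    template <typename F>
    bool with(F&& f) const {
        std::lock_guard<std::mutex> guard(*mutex_);
        std::shared_ptr<Value> strong = weak_.lock();
        if (!strong) return false;
        f(*strong);
        return true;
        // `strong` is destroyed before `guard`, i.e. still under the mutex.
    }
private:
    friend class Cache;
    Handle(std::shared_ptr<std::mutex> m, std::weak_ptr<Value> w)
        : mutex_(std::move(m)), weak_(std::move(w)) {}
    std::shared_ptr<std::mutex> mutex_;
    std::weak_ptr<Value> weak_;
};

class Cache {
public:
    // Creates the value on first access and hands out a Handle to it.
    Handle get(const std::string& key) {
        std::lock_guard<std::mutex> guard(*mutex_);
        std::shared_ptr<Value>& slot = items_[key];
        if (!slot) slot = std::make_shared<Value>(Value{"value for " + key});
        return Handle(mutex_, slot);
    }
    // Drops the cache's reference; the Value is destroyed under the mutex.
    void evict(const std::string& key) {
        std::lock_guard<std::mutex> guard(*mutex_);
        items_.erase(key);
    }
private:
    std::shared_ptr<std::mutex> mutex_ = std::make_shared<std::mutex>();
    std::map<std::string, std::shared_ptr<Value>> items_;
};
```

Because the callback runs with the cache mutex held, it must not call back into the Cache (std::mutex is not recursive), and it should stay short. Clients never see shared_ptr at all, so the cache is free to change how it owns its values later.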