"Common mode voltage" is simply the average appearing on both signal pathways. It's easier for me to think of it in the context of two inputs of a differential amplifier, where the common mode voltage is unambiguously defined as \$(V_+ + V_-)/2 \$. Whether this number reflects what some consider noise or what others consider signal is irrelevant with respect to definition.
Now, as for why it's problematic: sometimes it is, sometimes it isn't. Usually, my goal is to have all EM noise appear to a good amplifier as common mode, and I use twisted pairs to achieve this. By "good", I mean an amplifier with a high common mode rejection ratio. For such an amplifier, differential signals \$(V_+ - V_-)\$ get amplified and common mode voltages get attenuated (VERY attenuated if you're doing it right). If you don't use twisted pair, each signal path can see a very different pattern of EM noise, so the EM noise is no longer common mode, but differential.
One particular example highlighting the difference is pro audio, which passes signals around using twisted pair cable with XLR connectors, vs. consumer audio, which uses single-ended signaling.
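To put rough numbers on the twisted-pair argument, here is a minimal sketch of an amplifier modeled as a differential gain plus a (much smaller) common mode gain. The function name, the gain of 10, the 80 dB CMRR, and the signal and noise levels are all assumptions chosen for illustration, not values from any particular part.

```python
def amplifier_output(v_plus, v_minus, diff_gain=10.0, cmrr_db=80.0):
    """Output of a simple amplifier model: A_d * V_diff + A_cm * V_cm.

    diff_gain and cmrr_db are illustrative assumptions.
    """
    cm_gain = diff_gain / (10 ** (cmrr_db / 20.0))  # CMRR = A_d / A_cm
    v_diff = v_plus - v_minus
    v_cm = (v_plus + v_minus) / 2.0
    return diff_gain * v_diff + cm_gain * v_cm

signal = 0.010  # 10 mV differential signal (assumed)
noise = 0.100   # 100 mV of picked-up EM noise (assumed)

# No noise at all.
clean = amplifier_output(+signal / 2, -signal / 2)

# Twisted pair (idealized): both conductors pick up the same noise.
twisted = amplifier_output(+signal / 2 + noise, -signal / 2 + noise)

# Untwisted (extreme case): only one conductor picks up the noise.
untwisted = amplifier_output(+signal / 2 + noise, -signal / 2)

error_twisted = twisted - clean
error_untwisted = untwisted - clean
print(error_twisted, error_untwisted)
```

In this model the same 100 mV of noise adds roughly 0.1 mV of output error when it is purely common mode, and roughly 1 V when it lands on only one conductor. Real cables fall somewhere between these two extremes.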
Even common mode noise is problematic if you don't have a high common mode rejection ratio. For example, if you build a "typical" one op-amp differential amplifier with poorly toleranced (i.e., most) resistors, the common mode rejection ratio will be poor.
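As a rough illustration of the resistor-tolerance point, the sketch below estimates the CMRR of a four-resistor, one op-amp differential amplifier. It assumes an ideal op-amp and the classic topology described in the docstring; the resistor names `r1` to `r4`, the 10 kΩ nominal value, and the helper functions are hypothetical choices for this example.

```python
import math

def diff_amp_cmrr_db(r1, r2, r3, r4):
    """CMRR in dB of a four-resistor, one op-amp differential amplifier.

    Assumed topology (ideal op-amp): V- drives the inverting input through r1,
    r2 is the feedback resistor, V+ drives the non-inverting input through r3,
    and r4 ties the non-inverting input to ground.
    """
    plus_gain = (r4 / (r3 + r4)) * (1.0 + r2 / r1)  # gain from V+ to output
    minus_gain = r2 / r1                            # magnitude of gain from V- to output
    diff_gain = 0.5 * (plus_gain + minus_gain)      # response to V+ = +0.5, V- = -0.5
    cm_gain = plus_gain - minus_gain                # response to V+ = V- = 1
    if cm_gain == 0.0:
        return math.inf  # perfectly matched resistors reject common mode entirely
    return 20.0 * math.log10(abs(diff_gain / cm_gain))

def worst_case_resistors(nominal, tol):
    """A worst-case mismatch pattern: r1 and r4 high, r2 and r3 low."""
    return (nominal * (1 + tol), nominal * (1 - tol),
            nominal * (1 - tol), nominal * (1 + tol))

cmrr_1pct = diff_amp_cmrr_db(*worst_case_resistors(10e3, 0.01))      # 1% parts
cmrr_001pct = diff_amp_cmrr_db(*worst_case_resistors(10e3, 0.0001))  # 0.01% parts
print(f"1% resistors:    {cmrr_1pct:.1f} dB")
print(f"0.01% resistors: {cmrr_001pct:.1f} dB")
```

Under these assumptions, a unity-gain stage built from 1% resistors can end up with a CMRR of only about 34 dB in the worst case, versus about 74 dB with 0.01% parts. A real op-amp's own finite CMRR would lower these figures further.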
So, back to "why is it problematic?": it's less problematic than differential noise, but converting noise to common mode is not necessarily a magic technique for ridding signals of noise, especially if the hardware isn't built to optimally attenuate common mode signals.