Time is the problem. If you localize the voices for every language you publish in, you need to spend a great deal of time on voice-over.
FPS games are luckier in this respect, though: they have far less text to voice than RPGs.
Synchronizing animations is another problem. When you voice a character, it looks very unrealistic if that character makes sounds without moving its mouth, and so on. So less voice = less effort spent syncing animations with the voice.
And actually, considering that most gamers usually skip the dialogue and just read the summary, spending most of your effort on voice acting is a kind of "playing to the audience".