The intuition built by the top response is spot-on for tf-idf vectors, and it carries over to any vector that naturally wants to be normalized. However, in that setting cosine similarity is in bijection with Euclidean distance (for unit vectors, squared Euclidean distance is a monotone function of the cosine), so theoretically there is no real advantage to one over the other; in practice, cosine similarity is simply faster to compute.
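To make that equivalence concrete, here's a quick NumPy sketch (random stand-in vectors, nothing more) of the identity linking the two measures once vectors are L2-normalized:

```python
# For unit vectors u, v:  ||u - v||^2 = 2 - 2 * cos(u, v).
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=300), rng.normal(size=300)   # stand-ins for tf-idf rows
u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)  # L2-normalize

cos_sim = u @ v
sq_euclidean = np.sum((u - v) ** 2)

# The identity holds up to floating-point error, so ranking neighbors by
# cosine similarity or by Euclidean distance gives exactly the same order.
assert np.isclose(sq_euclidean, 2 - 2 * cos_sim)
```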
The second response is incorrect, and sparsity doesn't matter in this context, at least not in practice. In fact, I wrote a paper on the algebraic (and some geometric) properties of word embeddings trained end-to-end with RNNs on an NLP classification task, and found that cosine similarity is a far weaker estimate of a word's action on the hidden state (heuristically, the "meaning" of the sentence) than Euclidean distance. You can see Fig. 16 here if you wish.
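To be clear, the toy example below is mine, not from the paper; it only illustrates how the two measures can disagree once vector norms carry information:

```python
# Cosine ignores magnitude entirely, so a long vector pointing in the same
# direction as the query looks "identical" to a short one.
import numpy as np

def cosine(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

query = np.array([1.0, 0.0])
a = np.array([5.0, 0.0])   # same direction, very different magnitude
b = np.array([0.9, 0.1])   # slightly different direction, similar magnitude

print(cosine(query, a), np.linalg.norm(query - a))  # cos = 1.0,   dist = 4.0
print(cosine(query, b), np.linalg.norm(query - b))  # cos ~ 0.994, dist ~ 0.14
# Cosine prefers `a`; Euclidean distance prefers `b`. If embedding norms encode
# something (e.g. the strength of a word's effect on the hidden state),
# discarding the magnitude discards that signal.
```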
For these reasons, I actually think the use of cosine similarity is a holdover from a more classical NLP built around tf-idf vectors, and it really needs to be abandoned.