The intuition built by the top response is spot-on for tf-idf vectors, and carries over to any vector that naturally wants to be normalized. However, for such normalized vectors, cosine similarity is bijective with Euclidean distance, so there's no real advantage to one over the other theoretically; in practice, cosine similarity is simply faster to compute.

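To make that bijection concrete, here is a minimal sketch (plain NumPy, with illustrative array names) of the identity behind it: for unit-normalized vectors a and b, ||a - b||^2 = 2 - 2 cos(a, b), so either quantity determines the other and both induce the same nearest-neighbor ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, normalized to unit length (as tf-idf vectors often are).
a = rng.normal(size=300)
b = rng.normal(size=300)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_sim = a @ b                 # cosine similarity of unit vectors = dot product
sq_dist = np.sum((a - b) ** 2)  # squared Euclidean distance

# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(a, b)
assert np.isclose(sq_dist, 2 - 2 * cos_sim)
```

The only practical difference after normalization is that the dot product skips the subtraction and squaring, which is why cosine similarity tends to be the cheaper choice.
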
The second response is incorrect; sparsity doesn't matter in this context, at least not in practice. In fact, I wrote a paper on the algebraic (and some geometric) properties of word embeddings trained end-to-end using RNNs on NLP tasks (well, on a classification task) and found that cosine similarity is a far, far weaker estimate of a word's action on the hidden state (heuristically, the "meaning" of the sentence) than Euclidean distance. You can review Fig. 16 there if you wish: https://arxiv.org/pdf/1803.02839.pdf

To this end, I actually think the use of cosine similarity is a hold-over from more classical NLP that made use of tf-idf vectors, and really needs to be abandoned.
