The intuition built by the top response is spot-on for tf-idf vectors, and carries over to any vector that naturally wants to be normalized. However, for such normalized vectors, cosine similarity is bijective with Euclidean distance, so there's no real advantage to one over the other theoretically; in practice, cosine similarity is simply faster to compute.

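To make that bijection concrete, here is a minimal sketch (plain NumPy, with illustrative array names) of the identity behind it: for unit-normalized vectors a and b, ||a - b||^2 = 2 - 2 cos(a, b), so either quantity determines the other and both induce the same nearest-neighbor ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, normalized to unit length (as tf-idf vectors often are).
a = rng.normal(size=300)
b = rng.normal(size=300)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos_sim = a @ b                 # cosine similarity of unit vectors = dot product
sq_dist = np.sum((a - b) ** 2)  # squared Euclidean distance

# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 cos(a, b)
assert np.isclose(sq_dist, 2 - 2 * cos_sim)
```

The only practical difference after normalization is that the dot product skips the subtraction and squaring, which is why cosine similarity tends to be the cheaper choice.
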
The second response is incorrect; sparsity doesn't matter in this context, at least not in practice. In fact, I wrote a paper on the algebraic (and some geometric) properties of word embeddings trained end-to-end using RNNs on NLP tasks (well, on a classification task) and found that cosine similarity is a far, far weaker estimate of a word's action on the hidden state (heuristically, the "meaning" of the sentence) than Euclidean distance. You can review Fig. 16 there if you wish: https://arxiv.org/pdf/1803.02839.pdf

To this end, I actually think the use of cosine similarity is a hold-over from more classical NLP that made use of tf-idf vectors, and really needs to be abandoned.
