When to use cosine similarity over Euclidean similarity

In NLP, people tend to use cosine similarity to measure document/text distances. I want to hear what people think of the following two scenarios: which would you pick, cosine similarity or Euclidean distance?

Overview of the task setting: the task is to compute context similarities of multi-word expressions (MWEs), e.g. put up, rain cats and dogs. For example, given the MWE put up, its context refers to the words on the left side of put up as well as the words on the right side of it in one text. Mathematically speaking, similarity in this task is about calculating

sim(context_of_using_"put_up", context_of_using_"in_short") 

Note that the context feature is built on top of word embeddings; let's assume each word has an embedding of dimension 200.

Two scenarios of representing context_of_an_expression:

  1. Concatenate the left and right context words, producing an embedding vector of dimension 200*4=800 if we pick two words on each side. In other words, a feature vector [lc1, lc2, rc1, rc2] is built for the context, where lc=left_context and rc=right_context.

  2. Take the mean of the left and right context word embeddings, producing a vector of 200 dimensions. In other words, a feature vector [mean(lc1+lc2+rc1+rc2)] is built for the context. (Both scenarios are sketched in the code after this list.)
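
For concreteness, here is a minimal Python sketch of the two representations. It assumes we already have a 200-dimensional embedding for each word; the random emb table and the context_vectors helper below are illustrative stand-ins, not part of the original setup.

    import numpy as np

    # Toy stand-in for a real embedding table (e.g. word2vec/GloVe vectors);
    # in practice each entry would be a trained 200-d embedding.
    rng = np.random.default_rng(0)
    DIM = 200
    vocab = ["the", "tenants", "decided", "to", "put", "up", "a", "fight"]
    emb = {w: rng.normal(size=DIM) for w in vocab}

    def context_vectors(left_words, right_words):
        """Build both context representations for one MWE occurrence.

        Scenario 1: concatenation -> 200*4 = 800-d vector [lc1, lc2, rc1, rc2].
        Scenario 2: mean of the four context embeddings -> 200-d vector.
        """
        vecs = [emb[w] for w in left_words + right_words]
        concat = np.concatenate(vecs)    # scenario 1, shape (800,)
        mean = np.mean(vecs, axis=0)     # scenario 2, shape (200,)
        return concat, mean

    # Context of "put up" in "the tenants decided to put up a fight":
    concat_vec, mean_vec = context_vectors(["decided", "to"], ["a", "fight"])
    print(concat_vec.shape, mean_vec.shape)   # (800,) (200,)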

[Edited] For both scenarios, I think Euclidean distance is a better fit. Cosine similarity is known for handling scale/length effects because of its normalization, but I don't think there is much to normalize here: in scenario 1 the context length is fixed at 4 words, so there are no scale effects, and in scenario 2 term frequency matters (a word that appears once is different from a word that appears twice), so I don't want cosine to normalize that away.
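
To make the comparison concrete, here is a minimal sketch of both measures on two hypothetical 200-d context vectors (random stand-ins for context_of_using_"put_up" and context_of_using_"in_short", not real data). It also shows the normalization effect mentioned above: rescaling a vector leaves cosine similarity unchanged but changes the Euclidean distance.

    import numpy as np

    def cosine_similarity(u, v):
        # Cosine of the angle between u and v; invariant to vector length.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def euclidean_distance(u, v):
        # Straight-line distance between u and v; sensitive to vector length.
        return np.linalg.norm(u - v)

    rng = np.random.default_rng(1)
    ctx_put_up = rng.normal(size=200)    # stand-in for context_of_using_"put_up"
    ctx_in_short = rng.normal(size=200)  # stand-in for context_of_using_"in_short"

    print(cosine_similarity(ctx_put_up, ctx_in_short))
    print(euclidean_distance(ctx_put_up, ctx_in_short))

    # Rescaling one vector: cosine is unchanged, Euclidean distance is not.
    print(cosine_similarity(ctx_put_up, 3 * ctx_in_short))   # same value as before
    print(euclidean_distance(ctx_put_up, 3 * ctx_in_short))  # different value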
