
I am new to NLP and I'm trying to compute embeddings for a clustering problem. I have created a word2vec model using Python's gensim library, but I am wondering about the following:

The word2vec model embeds each word into a vector of size vector_size. However, in later steps of the clustering pipeline, I realised I was clustering single words instead of the sentences my dataset originally contained.

Let's say my vocabulary is composed of the two words foo and bar, mapped as follows:

foo: [0.0045, -0.0593, 0.0045]
bar: [-0.943, 0.05311, 0.5839]

If I have the sentence bar foo, how can I embed it? That is, how can I get a vector for the entire sentence as a whole?

Thanks in advance.


1 Answer


The usual approach is to average the vectors of all words in the sentence.
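A minimal sketch of this averaging, using the toy vectors from the question (with a trained gensim model you would look up each word in `model.wv` instead of a hand-built dict):

```python
import numpy as np

# Toy word vectors from the question (vector_size = 3).
word_vectors = {
    "foo": np.array([0.0045, -0.0593, 0.0045]),
    "bar": np.array([-0.943, 0.05311, 0.5839]),
}

def sentence_vector(sentence, vectors):
    """Embed a sentence as the element-wise mean of its word vectors.

    Words missing from the vocabulary are skipped; if no word is
    known, fall back to a zero vector of the same dimensionality."""
    known = [vectors[w] for w in sentence.split() if w in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))
        return np.zeros(dim)
    return np.mean(known, axis=0)

vec = sentence_vector("bar foo", word_vectors)
# vec is the mean of the "bar" and "foo" vectors, e.g. vec[0] == (0.0045 - 0.943) / 2
```

The resulting sentence vector has the same dimensionality as the word vectors, so it can be fed directly into any clustering algorithm that expects fixed-size inputs.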

  • This was my first thought too. However, I have just realised that there is a Doc2Vec model that basically appends each word's vector. In the end, I would have to average all of the vectors appended by Doc2Vec, I guess? – Commented Feb 14, 2022 at 8:30
  • Sorry, I have never used doc2vec, so I can't answer the question in your comment. You may want to post it as a separate question to get answers to it. – Commented Feb 14, 2022 at 9:47
