Does the mean/median of a set of sentence embedding vectors represent anything?

Question

Please bear with me as I am new to NLP. I am specifically using TensorFlow's Universal Sentence Encoder: https://tfhub.dev/google/universal-sentence-encoder-large/3

I am clustering text based on the cosine similarity of the embeddings produced by the model, and I want to find out which cluster a new text would most likely belong to. My plan is to compare the new text's embedding to the mean/median of all the embeddings within each cluster and pick the closest one. Would the mean/median of a cluster's vectors "represent" the general idea of the cluster, or will that vector not represent what I am looking for?
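A minimal sketch of the comparison described above, assuming the embeddings have already been computed and are available as plain lists of floats. The function names (`mean_vector`, `cosine_similarity`, `nearest_cluster`) and the toy 3-dimensional vectors are hypothetical and only for illustration; real Universal Sentence Encoder embeddings are much higher-dimensional.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_cluster(new_embedding, clusters):
    """Return the label whose mean vector is most cosine-similar to new_embedding.

    clusters maps a label to the list of embeddings assigned to that cluster.
    """
    centroids = {label: mean_vector(vecs) for label, vecs in clusters.items()}
    return max(
        centroids,
        key=lambda label: cosine_similarity(new_embedding, centroids[label]),
    )

# Toy data standing in for real sentence embeddings (assumption).
clusters = {
    "a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "b": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
best = nearest_cluster([0.8, 0.3, 0.0], clusters)
print(best)
```

Because cosine similarity ignores vector length, the unnormalized mean gives the same ranking as a normalized one. Whether that mean is a meaningful summary of the cluster is exactly the open question here.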

Has QUIT--Anony-Mousse · Accepted Answer · 2019-07-17 20:32:37Z

Well, the mean is pretty average for all the words.

Such mean vectors tend to all be quite similar to one another, to cluster near the center of the data, and to have pretty bland, generic words as their nearest neighbors.

The average word vector is not a good representation of what a text is about.
