2
$\begingroup$

I have trained word2vec on a dataset. Is it a good idea to cluster the output vectors?

$\endgroup$
1
  • $\begingroup$ Don't cluster with the Euclidean distance if you're operating in very high dimensions (typical of word2vec). Use cosine similarity instead. The reason is a bit technical; cf. this thread. $\endgroup$ Commented Mar 11, 2016 at 2:35

1 Answer

2
$\begingroup$

It's totally fine to cluster word2vec output to find semantically similar words. K-means is one option; you might also want to check out an approximate nearest-neighbor scheme such as locality-sensitive hashing.
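
As a rough illustration, here is a minimal pure-Python sketch of k-means on unit-normalized vectors (often called spherical k-means), so that cluster assignment uses cosine similarity rather than raw Euclidean distance. The four 3-dimensional "word vectors" are made-up toy values, not real word2vec output, and the function names are hypothetical; with a real model you would substitute its vectors and probably use a library implementation instead.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, n_iter=20, seed=0):
    """Cluster vectors by cosine similarity; returns one cluster label per vector."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assign each point to the centroid with the highest cosine similarity.
        new_labels = [
            max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the re-normalized mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy stand-ins for word2vec vectors (assumed values, for illustration only).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.2],
    "apple": [0.1, 0.2, 0.9],
    "banana": [0.2, 0.1, 0.8],
}

words = list(toy_vectors)
labels = spherical_kmeans([toy_vectors[w] for w in words], k=2)
for word, label in zip(words, labels):
    print(word, label)
```

Normalizing first means the ranking by Euclidean distance between points matches the ranking by cosine similarity, which is why ordinary k-means on normalized vectors is a common shortcut for the same idea.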

$\endgroup$
1
  • $\begingroup$ I was also looking at examples where people had taken an average output of the prediction. $\endgroup$ Commented Mar 12, 2016 at 19:03
