I am relatively new to word2vec. I am interested in solving the topic-word intrusion task introduced here using the word vector spaces generated by word2vec together with an SVC.
I have a corpus with a vocabulary of 8000 words, all of which appear in Google's pre-trained word2vec model. I was wondering which would provide a better representation of those words: the pre-trained model with its 3M-word vocabulary, or a model trained only on the 8000 words appearing in my corpus?