
I am a beginner in machine learning. My project is to build an AI-based search engine that shows related articles when someone searches on the website. For this I decided to train my own word embeddings.

I found two methods for this:

  • One is to train a network to predict the next word (i.e. inputs = [the quick, the quick brown, the quick brown fox] and outputs = [brown, fox, lazy]).
  • The other is to train on pairs of nearby words (i.e. [brown, fox], [brown, quick], [brown, quick]). A sketch of how the training pairs would look for both methods follows this list.
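To make the two setups concrete, here is a rough sketch of how the training pairs could be generated (the extended sentence and the window size of 1 are just illustrative choices):

```python
# Rough sketch (my own illustration): generating training pairs for both
# methods from a toy sentence.  The sentence and the window size of 1 are
# arbitrary choices for the example.
sentence = "the quick brown fox jumps over the lazy dog".split()

# Method 1: next-word prediction (language-model style):
# input = the words so far, target = the word that follows.
next_word_pairs = [(sentence[:i], sentence[i]) for i in range(1, len(sentence))]
# e.g. (['the', 'quick', 'brown'], 'fox')

# Method 2: nearest-word pairs (skip-gram style), window size 1:
window = 1
nearby_pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            nearby_pairs.append((center, sentence[j]))
# e.g. ('brown', 'quick'), ('brown', 'fox')

print(next_word_pairs[:3])
print(nearby_pairs[:6])
```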

Which method should I use? And after training, how should I convert a sentence into a single vector so I can apply cosine similarity? For example, the sentence "the quick brown fox" will return 4 vectors; how should I combine them into one vector to compare against another sentence with cosine similarity (which takes only one vector per sentence)?
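To make the mismatch concrete, here is a small sketch of what I mean (the vectors are random placeholders for the trained embedding, and the averaging step is just one possible way to pool them):

```python
import numpy as np

# Sketch of the problem (random numbers stand in for the trained embedding):
# "the quick brown fox" gives one vector per word, i.e. 4 vectors ...
embedding_dim = 100
word_vectors = np.random.rand(4, embedding_dim)       # shape (4, 100)

# ... but cosine similarity compares exactly one vector per sentence, so the
# 4 word vectors have to be pooled somehow; averaging is one common option.
sentence_vector = word_vectors.mean(axis=0)           # shape (100,)

other_sentence_vector = np.random.rand(embedding_dim)
cosine = np.dot(sentence_vector, other_sentence_vector) / (
    np.linalg.norm(sentence_vector) * np.linalg.norm(other_sentence_vector)
)
print(cosine)
```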


1 Answer


I find your question a bit convoluted, so I will answer with the following bullet points:

  • Train your own word embeddings: there are many implementations out there; gensim is one (see the sketch after this list).
  • Find related articles: on that point, without being an expert, I would suggest doing some research on topic modelling. There are also a lot of libraries you can use for it.
  • Word embeddings to sentence embeddings: this step is not as straightforward, since the semantics change just by adding words together. You can use Word Mover's Distance, or one of the numerous methods that learn sentence embeddings directly, either supervised or unsupervised.
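A minimal sketch tying the first and third points together, assuming gensim 4.x and a toy two-sentence corpus (the corpus, the parameter values, and the plain averaging step are illustrative choices, not a recommendation):

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus; in practice you would feed your own tokenized articles.
corpus = [
    "the quick brown fox jumps over the lazy dog".split(),
    "a fast dark fox leaped over a sleepy dog".split(),
]

# sg=1 trains skip-gram (the "nearest words" method); sg=0 would be CBOW.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

def sentence_vector(tokens, wv):
    """Average the word vectors of the tokens that are in the vocabulary."""
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0)

s1 = "the quick brown fox".split()
s2 = "a fast dark fox".split()

v1 = sentence_vector(s1, model.wv)
v2 = sentence_vector(s2, model.wv)
cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print("cosine similarity of averaged vectors:", cosine)

# gensim also exposes this average-then-cosine comparison directly:
print("n_similarity:", model.wv.n_similarity(s1, s2))

# Word Mover's Distance (lower = more similar) is available too, but it needs
# an optional dependency (POT) installed:
# print("WMD:", model.wv.wmdistance(s1, s2))
```

Plain averaging is only a baseline; for a real search engine, methods that learn sentence embeddings directly tend to work better, but the mechanics of comparing two sentences stay the same.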
