Hello, I'm looking for a solution to my issue: I want to find a list of similar words in French and English. For example, name could be: first name, last name, nom, prénom, username... Postal address could be: city, country, street, ville, pays, code postal...
2 Answers
The other answer, and comments, describe how to get synonyms, but I think you want more than that?
I can suggest two broad approaches: WordNet and word embeddings.
Using nltk and WordNet, you want to explore the graph nodes adjacent to your start word. See http://www.nltk.org/howto/wordnet.html for an overview of the functions available. I'd suggest that once you've found your start word in WordNet, you follow all its relations, but also go up to the hypernym and do the same there.
Finding the start word is not always easy. For example, "Postal address" does not seem to have an entry of its own: http://wordnetweb.princeton.edu/perl/webwn?s=Postal+address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=
Instead it seems I have to use "address": http://wordnetweb.princeton.edu/perl/webwn?s=address&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h= and then decide which of those is the correct sense here. Then try clicking the hypernym, hyponym, sister term, etc. To be honest, none of those feels quite right.
Open Multilingual WordNet tries to link different languages: http://compling.hss.ntu.edu.sg/omw/ So you could take the WordNet code (the synset identifier) you found in English and use it to move to the French WordNet, or vice versa.
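As a rough sketch of the WordNet walk described above (start synset, its hyponyms, its hypernyms, and the sister terms under each hypernym), something like the following could work. The helper names related_lemmas and demo are made up for this illustration. The demo function assumes nltk is installed and that the 'wordnet' and 'omw-1.4' corpora have been fetched with nltk.download() (older nltk releases named the multilingual corpus 'omw'); "fra" is the language code used for French.

```python
def related_lemmas(synset, lang="eng"):
    """Collect lemma names from a synset and its immediate WordNet neighbours.

    Works on any object exposing the nltk Synset methods used here:
    lemma_names(lang=...), hypernyms() and hyponyms().
    """
    neighbours = [synset]
    neighbours += synset.hyponyms()        # more specific terms
    for parent in synset.hypernyms():      # more general terms
        neighbours.append(parent)
        neighbours += parent.hyponyms()    # sister terms (includes the start synset)
    words = set()
    for node in neighbours:
        for name in node.lemma_names(lang=lang):
            words.add(name.replace("_", " "))  # WordNet joins multi-word lemmas with "_"
    return sorted(words)


def demo(word="address"):
    """Print English and French neighbours for every noun sense of `word`.

    Assumption: nltk plus the 'wordnet' and 'omw-1.4' corpora are available.
    """
    from nltk.corpus import wordnet as wn  # imported lazily so the helper above has no dependency

    for synset in wn.synsets(word, pos=wn.NOUN):
        print(synset.name(), "-", synset.definition())
        print("  eng:", related_lemmas(synset, "eng"))
        print("  fra:", related_lemmas(synset, "fra"))
```

Calling demo("address") prints one block per noun sense, so you still have to pick the sense that matches your use case by reading the definitions. Expect the French lists to be shorter than the English ones, since not every synset has French lemmas.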
The other approach is to use word embeddings. You find the vector of your source word (say, 300-dimensional), and then hunt for the nearest words in that vector space. This returns words that are used in similar contexts, so they could be similar in meaning, or similar syntactically.
spaCy has a good implementation, see https://spacy.io/usage/spacy-101#vectors-similarity and https://spacy.io/usage/vectors-similarity
Regarding English and French, normally you would work in the two languages independently. But if you search for "multilingual word embeddings" you will find some papers and projects where the vector stays the same for the same concept in different languages.
Note: the API is geared towards telling you how two words are similar, not finding similar words. To find similar words you need to take your vector and compare with every other word vector, which is O(N) in the size of the vocabulary. So you might want to do this offline, and build your own "synonyms-and-similar" dictionary for each word of interest.
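To make the "compare with every other word vector" step concrete, here is a minimal pure-Python sketch. The function names and the 3-dimensional toy vectors are invented for illustration; in practice the vectors would come from whatever embedding model you use (spaCy or a multilingual embedding), and you would probably use numpy rather than plain lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n=3):
    """Rank every other word by similarity to `word`: one O(N) scan of the vocabulary."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


def build_similar_dictionary(words_of_interest, vectors, top_n=3):
    """Precompute the neighbours offline, once per word of interest."""
    return {w: most_similar(w, vectors, top_n)
            for w in words_of_interest if w in vectors}


# Made-up vectors, for illustration only (real embeddings have hundreds of dimensions).
toy_vectors = {
    "name": [0.9, 0.1, 0.0],
    "nom": [0.85, 0.15, 0.05],
    "username": [0.8, 0.3, 0.1],
    "city": [0.1, 0.9, 0.2],
    "ville": [0.12, 0.88, 0.25],
    "street": [0.2, 0.8, 0.4],
}

similar = build_similar_dictionary(["name", "city"], toy_vectors, top_n=2)
print(similar)
```

With these toy vectors, "name" maps to "nom" and "username", and "city" maps to "ville" and "street". Mixing English and French words in one table like this only makes sense if the vectors come from a shared multilingual space.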
You can use the nltk library or the PyDictionary library (from PyDictionary import PyDictionary) to get the English synonyms.