
Using Python, I want to identify French text in a list of short strings (from 1 to about 50 words) which are otherwise in English.

An example of the input data (input strings here are separated by commas):

year of the snake, legendary 'dragon horse', thunder, damsel-fly, larvae of mosquito, treillage, libellule, mythical water creature, petites chevrettes, de papillon hideux, the horse-fly, 5th earthly branch, dragon, mythical creature, a shore plant whose leaves dry a bright orange, dragon horse, god of rain, year of the dragon, orthopteran, crocodile, dont le duvet des ailes s'en va en poussière, insecte, dragonfly, dracontomelon vitiense, dragon king, petit filet pour une espèce de papillon, sorte d'insecte 

Ideally I want to use a library that's already been built, as I'm aware that this is a difficult problem. However, the natural language library in Python I am most familiar with, nltk, does not seem to have the ability to do this, or if it does I haven't found it.

I'm aware that identifying a single word or two is likely to be very difficult, and I'd rather have false negatives (French misidentified as English) than false positives.

    There are datasets and NN models here and here to do so! Commented Aug 23, 2021 at 7:39

1 Answer


There are several approaches to this problem. A more traditional and exact one (though prone to issues with new or rare words) is to use word lists (dictionaries) for French and English and check which list each phrase matches, either as a full match or by counting how many of its words appear in each.
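
As a rough sketch of the word-list idea, assuming you have French and English lexicons available: the two tiny sets below are placeholders built from the question's sample data, and `looks_french` and `min_french_ratio` are hypothetical names, not part of any library. A phrase is flagged only when most of its tokens appear in the French list and not in the English one, which biases the check toward false negatives.

```python
import re

# Placeholder word lists (assumption): real use would load full lexicons
# for both languages from files or a dictionary package.
FRENCH_WORDS = {
    "de", "le", "des", "une", "pour", "dont", "en", "va", "petit", "petites",
    "papillon", "insecte", "sorte", "espèce", "filet", "libellule", "hideux",
    "ailes", "duvet", "poussière", "chevrettes",
}
ENGLISH_WORDS = {
    "year", "of", "the", "snake", "dragon", "horse", "thunder", "god",
    "rain", "king", "mythical", "creature", "water", "crocodile", "insect",
}

# Runs of Unicode letters; apostrophes and hyphens act as separators.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def looks_french(text, min_french_ratio=0.6):
    """Return True if enough tokens are found only in the French list.

    Tokens present in both lists (or in neither) do not count as French,
    so ambiguous or unknown words push the result toward "not French".
    """
    tokens = tokenize(text)
    if not tokens:
        return False
    french_only = sum(
        1 for t in tokens if t in FRENCH_WORDS and t not in ENGLISH_WORDS
    )
    return french_only / len(tokens) >= min_french_ratio
```

Raising `min_french_ratio` makes the check stricter; words missing from both lists (such as Latin species names) are never counted as French.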

Another is to use a dedicated language-detection package, such as langid or langdetect.
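
A minimal sketch of how such a package could be wired in while favouring false negatives. `flag_french`, `min_prob`, `min_words` and `make_langid_classifier` are hypothetical helper names; the thresholds are assumptions to tune on your own data. The langid part assumes the third-party `langid` package is installed and uses its `LanguageIdentifier` with normalised probabilities, restricted to English and French.

```python
from typing import Callable, Iterable, List, Tuple

# A classifier maps a string to (language code, probability in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def flag_french(
    strings: Iterable[str],
    classify: Classifier,
    min_prob: float = 0.95,
    min_words: int = 2,
) -> List[str]:
    """Return the strings confidently classified as French.

    Strings shorter than `min_words` are skipped (assumption: very short
    inputs are too unreliable), and anything below `min_prob` is treated
    as English, trading recall for fewer false positives.
    """
    flagged = []
    for text in strings:
        if len(text.split()) < min_words:
            continue
        lang, prob = classify(text)
        if lang == "fr" and prob >= min_prob:
            flagged.append(text)
    return flagged


def make_langid_classifier() -> Classifier:
    """Build a classifier backed by langid (requires `pip install langid`)."""
    # Imported lazily so the rest of the module works without langid.
    from langid.langid import LanguageIdentifier, model

    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    identifier.set_languages(["en", "fr"])  # only choose between these two
    return identifier.classify
```

Calling `flag_french(strings, make_langid_classifier())` would then return the subset of `strings` flagged as French. Because the classifier is passed in as a function, another detector can be swapped in for comparison on the same data.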

Yet another would be to use an ML language model to classify phrases (e.g. a language-detection component for spaCy such as spacy-langdetect).


1 Comment

Thank you! I chose to use langid since it had the best performance on the data I was looking at, but langdetect (also suggested by Jordi Carr on the nltk-users mailing list) and cld3 were also viable options for this task.
