
I have millions of sentence fragments, and I am trying to determine whether each one is in English, French, Japanese, or German. Is there a Python library that can do this?

s1 = 'This is where lies a person'
s2 = 'ボウリング・フォー・コロンバイン(字幕版)'
s3 = 'Ep. 2448 : épisode du 12 mars 2014 (Plus belle la vie, Saison 10, Vol. 6)'

language_of_string(s1) ==> EN
language_of_string(s2) ==> JP
language_of_string(s3) ==> FR

2 Answers


Try langid. Its source code is at https://github.com/saffsd/langid.py

>>> import langid
>>> langid.classify("This is a test")
('en', 0.99999999099035441)
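Since the question only cares about four languages, it may help to restrict langid's candidates with `langid.set_languages` and map its ISO codes to the labels used in the question. The following is only a rough sketch: `make_langid_classifier`, `count_languages`, and the `LABELS` table are hypothetical names, and the `DE` label for German is an assumption (the question only shows `EN`, `JP`, and `FR`).

```python
from collections import Counter

# ISO 639-1 codes returned by langid, mapped to the question's labels.
# "DE" is an assumed label; the question does not show one for German.
LABELS = {"en": "EN", "fr": "FR", "ja": "JP", "de": "DE"}


def make_langid_classifier():
    """Return a classify(text) -> ISO code function backed by langid."""
    import langid  # imported lazily; requires `pip install langid`

    # Restrict the candidate set to the four languages of interest.
    langid.set_languages(list(LABELS))
    return lambda text: langid.classify(text)[0]


def language_of_string(text, classify):
    """Return the question's label for text, or 'UNKNOWN' for any other code."""
    return LABELS.get(classify(text), "UNKNOWN")


def count_languages(fragments, classify):
    """Tally labels over any iterable, so fragments can be streamed from disk."""
    return Counter(language_of_string(f, classify) for f in fragments)
```

With langid installed, `classify = make_langid_classifier()` followed by `language_of_string(s1, classify)` labels a single fragment. Passing the classifier in as an argument keeps the labelling logic independent of the detector, so another library can be swapped in without changing the rest.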


Try guess_language:

s1 = 'This is where lies a person'
s2 = 'ボウリング・フォー・コロンバイン(字幕版)'
s3 = 'Ep. 2448 : épisode du 12 mars 2014 (Plus belle la vie, Saison 10, Vol. 6)'

import guess_language

print(guess_language.guessLanguage(s1))  # en
print(guess_language.guessLanguage(s2))  # ja
print(guess_language.guessLanguage(s3))  # fr
