Makes the whole process of collecting, cleaning and sorting datasets a lot easier.

Supported, but not enough datasets

    git clone https://github.com/silenterus/deepspeech-cleaner
    cd deepspeech-cleaner
    pip install -r requirements.txt

Download, analyze and insert all available corpora for French:

    python3 deepspeech-cleaner.py download --lang fr

Insert corpora, in case you downloaded the files yourself:

    python3 deepspeech-cleaner.py insert /path/to/corpora/

Clean, sort and create all files necessary for training (includes lm.binary/trie if KenLM is installed):

    python3 deepspeech-cleaner.py create

Clean, sort and create all files necessary for training, without cleaning and without lm.binary/trie creation:

    python3 deepspeech-cleaner.py create --noclean --notrie

Start DeepSpeech training:

    bash languages/fr/training/standard/start_train.sh

    python3 deepspeech-cleaner.py crawl

Test num2words and your replacement rules:

    python3 deepspeech-cleaner.py test 1 2 3 is not for me
    python3 deepspeech-cleaner.py test /path/to/textfile.txt

Convert, trim and trim silence from all audio files in your database:

    python3 deepspeech-cleaner.py convert

All arguments are saved separately for each language:

    python3 deepspeech-cleaner.py autosave

Only files with a number attached will be used.
<0: used before number translation.
>=0: used after number translation.
Replace a word/symbol with '�' and the whole sentence gets rejected.
Spaces at the start/end are important for whole words.

Change the string-based SQL queries in languages/fr/sql_query/..
Files are named like the tables in your "audio.db".
'!' at the end of a line functions as NOT.

    python3 deepspeech-cleaner.py help

<---< samplerate [16000-48000] >---> corpora [forscher-tuda-vox16-zamia-custom-tatoeba-librivox-cv] >---> words per sec [2.07] >---> letters per sec [13.35] >---> train files [237463]

Test - WER: 0.098498, CER: 3.228931, loss: 23.721140
WER: 3.500000, CER: 37.000000, loss: 326.320953
src: “eine neue”
res: “einem neuen leben und neuen pflichten entgegen”
WER: 3.000000, CER: 6.000000, loss: 7.963222
src: “ausverkauft”
res: “aus der fast”
WER: 3.000000, CER: 5.000000, loss: 11.577581
src: “riesengebirge”
res: “riesen der berge”
WER: 3.000000, CER: 6.000000, loss: 11.873451
src: “beerdigung”
res: “wer die un”
WER: 3.000000, CER: 8.000000, loss: 17.944910
src: “besuchstermin”
res: “es wuchs der”
WER: 3.000000, CER: 6.000000, loss: 22.410923
src: “beerdigung”
res: “wer die un”
WER: 3.000000, CER: 4.000000, loss: 25.310646
src: “weitermachen”
res: “bei der machen”
WER: 3.000000, CER: 34.000000, loss: 237.857559
src: “misses dent”
res: “es ist mein wunsch vergessen vernachlässigt”
WER: 3.000000, CER: 74.000000, loss: 484.282074
src: “es endigte mit einem”
res: “es endigte mit einem lauten schall welcher in jedem einsamen zimmer in echo zu wecken schienen”
WER: 2.800000, CER: 69.000000, loss: 650.892578
src: “computer alarm in neun minuten”
res: “per definition handelt es sich bei diesen geräten im engeren sinn um personal computer”

<---> samplerate [16000-48000] <---> corpora [librivox-tatoeba] <---> words per sec [2.25] <---> letters per sec [14.53] <---> train files [18134]

I Test of Epoch 12 - WER: 0.137465, loss: 29.99004187996005, mean edit distance: 0.058884
I WER: 0.142857, loss: 4.163468, mean edit distance: 0.065217
I - src: "jak w ogóle we wszystkich naszych obliczeniach"
I - res: "a w ogóle we wszystkich naszych obliczeniach "
I WER: 0.142857, loss: 4.163468, mean edit distance: 0.065217
I - src: "jak w ogóle we wszystkich naszych obliczeniach"
I - res: "a w ogóle we wszystkich naszych obliczeniach "
I WER: 0.181818, loss: 6.447145, mean edit distance: 0.025641
I - src: "pomimoto w stosunku wokulskiego do panny izabeli pierwsze lody były przełamane"
I - res: "pomimo to w stosunku wokulskiego do panny izabeli pierwsze lody były przełamane "
I WER: 0.400000, loss: 6.677766, mean edit distance: 0.107143
I - src: "otarła oczy i ciągnęła dalej"
I - res: "otarołaoczy i ciągnęła dalej "
I WER: 0.400000, loss: 6.677766, mean edit distance: 0.107143
I - src: "otarła oczy i ciągnęła dalej"
I - res: "otarołaoczy i ciągnęła dalej "
I WER: 0.500000, loss: 1.875308, mean edit distance: 0.105263
I - src: "niedziela sprowadzą"
I - res: "niedziela prowadzą "
I WER: 0.500000, loss: 1.875308, mean edit distance: 0.105263
I - src: "niedziela sprowadzą"
I - res: "niedziela prowadzą "
I WER: 1.000000, loss: 3.942765, mean edit distance: 0.105263
I - src: "tu będzie licytacya"
I - res: "tubędzielicytacya"
I WER: 1.000000, loss: 3.942765, mean edit distance: 0.105263
I - src: "tu będzie licytacya"
I - res: "tubędzielicytacya"
I WER: 1.000000, loss: 6.762781, mean edit distance: 0.176471
I - src: "jakto z kucharzem"
I - res: "jak to skucharzem"

<---> corpora [librivox-vox-tatoeba] <---> samplerate [16000-48000] <---> words per sec [2.26] <---> letters per sec [12.85] <---> train files [97486]

I Test of Epoch 12 - WER: 0.139222, loss: 16.857607432188242, mean edit distance: 0.060826
I WER: 0.250000, loss: 0.047055, mean edit distance: 0.047619
I - src: "tengo que comprar uno"
I - res: "tengo que comprar un "
I WER: 0.500000, loss: 0.039710, mean edit distance: 0.083333
I WER: 0.500000, loss: 0.072996, mean edit distance: 0.111111
I WER: 0.500000, loss: 0.072996, mean edit distance: 0.111111
I WER: 0.500000, loss: 0.098463, mean edit distance: 0.071429
I - src: "cuándo termina"
I - res: "cuando termina"
I WER: 1.000000, loss: 0.027957, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.089742, mean edit distance: 0.125000
I WER: 1.000000, loss: 0.092845, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.092845, mean edit distance: 0.100000
I WER: 1.000000, loss: 0.099211, mean edit distance: 0.076923

<---> samplerate [16000-48000] <---> corpora [librivox-tatoeba-vox16-accent] <---> words per sec [2.37] <---> letters per sec [14.41] <---> train files [87938]

I Test of Epoch 11 - WER: 0.227659, loss: 38.279466658148145, mean edit distance: 0.123504
I WER: 0.333333, loss: 0.538573, mean edit distance: 0.166667
I WER: 0.333333, loss: 0.656955, mean edit distance: 0.166667
I WER: 0.333333, loss: 0.885854, mean edit distance: 0.062500
I - src: "nous avons gagné"
I - res: "nous avons gagne"
I WER: 0.333333, loss: 0.885854, mean edit distance: 0.062500
I - src: "nous avons gagné"
I - res: "nous avons gagne"
I WER: 0.500000, loss: 0.314220, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.245572, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.448257, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.448257, mean edit distance: 1.000000
I WER: 1.000000, loss: 0.628055, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.628055, mean edit distance: 0.333333

<---> corpora [librivox-vox-tatoeba] <---> samplerate [16000-48000] <---> words per sec [2.17] <---> letters per sec [12.83] <---> train files [58304]

I Test of Epoch 10 - WER: 0.184894, loss: 28.62499210021505, mean edit distance: 0.075463
I WER: 0.083333, loss: 1.599633, mean edit distance: 0.029851
I - src: "cosí riflettendo su le sue sciagure bruno celèsia si ridusse a casa"
I - res: "così riflettendo su le sue sciagure bruno celèsia si ridusse a casa "
I WER: 0.090909, loss: 1.664164, mean edit distance: 0.033333
I - src: "abbiamo forse fatto male no niente di male rispose il medico"
I - res: "abbiamo forse fatto male no niente di male rispose il medio "
I WER: 0.100000, loss: 1.168548, mean edit distance: 0.033898
I - src: "perchè vedete signora voi siete stata la pietra di paragone"
I - res: "perché vedete signora voi siete stata la pietra di paragone "
I WER: 0.100000, loss: 1.493682, mean edit distance: 0.016129
I - src: "state zitto avaraccio gridò carmaux che slegava il povero uomo"
I - res: "state zitto avaraccio gridò carmaux che slegava il povero uuomo"
I WER: 0.100000, loss: 1.706887, mean edit distance: 0.040816
I - src: "oh esclamò in quel momento toby che si era levato"
I - res: "o esclamò in quel momento toby che si era levato "
I WER: 0.142857, loss: 0.449785, mean edit distance: 0.046512
I - src: "giunsi al paese senza averne fissato alcuno"
I - res: "giunse al paese senza averne fissato alcuno "
I WER: 0.142857, loss: 1.841321, mean edit distance: 0.058824
I - src: "le ricerche durarono più d un mese"
I - res: "le ricerche durarono più di un mese "
I WER: 0.200000, loss: 0.612865, mean edit distance: 0.083333
I - src: "ah e quale filippo ferri"
I - res: "a e quale filippo ferri "
I WER: 0.200000, loss: 0.969935, mean edit distance: 0.086957
I - src: "entrai in un altra sala"
I - res: "entra in un altra sala "
I WER: 0.200000, loss: 0.969935, mean edit distance: 0.086957
I - src: "entrai in un altra sala"
I - res: "entra in un altra sala "

<---> corpora [librivox-vox-tatoeba] <---> samplerate [16000-48000] <---> words per sec [1.94] <---> letters per sec [11.66] <---> train files [22351]

I Test of Epoch 10 - WER: 0.299552, loss: 41.175528268814084, mean edit distance: 0.117625
I WER: 0.250000, loss: 1.425027, mean edit distance: 0.117647
I - src: "але як се зробити"
I - res: "але як це зробити "
I WER: 0.250000, loss: 1.425027, mean edit distance: 0.117647
I - src: "але як се зробити"
I - res: "але як це зробити "
I WER: 0.285714, loss: 2.314395, mean edit distance: 0.066667
I - src: "тож до тебе я зверну свою мову"
I - res: "то ж до тебе я зверну свою мову "
I WER: 0.285714, loss: 2.314395, mean edit distance: 0.066667
I - src: "тож до тебе я зверну свою мову"
I - res: "то ж до тебе я зверну свою мову "
I WER: 0.333333, loss: 2.467164, mean edit distance: 0.250000
I WER: 0.333333, loss: 2.467164, mean edit distance: 0.250000
I WER: 0.500000, loss: 2.119555, mean edit distance: 0.142857
I WER: 0.500000, loss: 2.119555, mean edit distance: 0.142857
I WER: 1.000000, loss: 0.684362, mean edit distance: 0.333333
I WER: 1.000000, loss: 0.684362, mean edit distance: 0.333333

<---> samplerate [16000-48000] <---> words per sec [1.91] <---> letters per sec [11.79] <---> train files [20360]

I Test of Epoch 12 - WER: 0.369255, loss: 49.01442650910262, mean edit distance: 0.155081
I WER: 0.500000, loss: 0.076582, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.076582, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.199971, mean edit distance: 0.166667
I WER: 0.500000, loss: 0.199971, mean edit distance: 0.166667
I WER: 0.500000, loss: 0.276903, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.276903, mean edit distance: 0.200000
I WER: 0.500000, loss: 0.312152, mean edit distance: 0.142857
I WER: 0.500000, loss: 0.312152, mean edit distance: 0.142857
I WER: 0.500000, loss: 0.868555, mean edit distance: 0.285714
I WER: 0.500000, loss: 0.868555, mean edit distance: 0.285714

<---> corpora [swc-vox-tatoeba] <---> samplerate [16000-48000] <---> words per sec [2.22] <---> letters per sec [13.9] <---> train files [30598]

I Test of Epoch 9 - WER: 0.396161, loss: 92.96824162893921, mean edit distance: 0.193605
I WER: 0.083333, loss: 3.263168, mean edit distance: 0.014706
I - src: "de buurtschap ligt ten zuiden van dasselaar en ten westen van norden"
I - res: "de buurtschap ligt ten zuiden van dasselaar en ten westen van noorden"
I WER: 0.125000, loss: 3.376268, mean edit distance: 0.026316
I - src: "het is een restant van de oude zeedijk"
I - res: "het is een restant van de oude zeedik"
I WER: 0.142857, loss: 2.820412, mean edit distance: 0.025000
I - src: "de herkomst van dit wapen is onduidelijk"
I - res: "de herkomst van dit wapen is onduidenlijk"
I WER: 0.142857, loss: 3.029150, mean edit distance: 0.028571
I - src: "het ligt iets ten noorden van gendt"
I - res: "het ligt iets ten noorden van gent"
I WER: 0.142857, loss: 3.029150, mean edit distance: 0.028571
I - src: "het ligt iets ten noorden van gendt"
I - res: "het ligt iets ten noorden van gent"
I WER: 0.142857, loss: 3.058265, mean edit distance: 0.025641
I - src: "bij het buurtje lag een wierde die in de negentiende eeuw geheel is afgegraven"
I - res: "bij het buurtje lag een wierde die in de negentien e eeuw geheel is afgegraven "
I WER: 0.222222, loss: 2.067109, mean edit distance: 0.023256
I - src: "het dorp ligt op de rechteroever van de lek"
I - res: "het dorp ligt op de rechter oever van de lek"
I WER: 0.285714, loss: 1.334180, mean edit distance: 0.025000
I - src: "het dorp ontstond in de negentiende eeuw"
I - res: "het dorp ontstond in de negentien e eeuw"
I WER: 0.333333, loss: 2.122649, mean edit distance: 0.017857
I - src: "in duizendzeshonderdeenenvijftig wordt een sluis gebouwd"
I - res: "in duizendzeshonderdeenenvijftig wordt een sluisgebouwd"
I WER: 0.333333, loss: 2.648912, mean edit distance: 0.026316
I - src: "hier wordt lesgegeven aan de onderbouw"
I - res: "hier wordt les gegeven aan de onderbouw"

<---> samplerate [16000-48000] <---> corpora [tatoeba-vox16] <---> words per sec [1.88] <---> letters per sec [9.98]

I Test of Epoch 10 - WER: 0.507568, loss: 21.292116564373636, mean edit distance: 0.244271
I WER: 0.200000, loss: 1.065989, mean edit distance: 0.058824
I - src: "não foi tom não é"
I - res: "não foi tom não "
I WER: 0.250000, loss: 1.081908, mean edit distance: 0.200000
I - src: "tom não tem pai"
I WER: 0.250000, loss: 1.081908, mean edit distance: 0.200000
I - src: "tom não tem pai"
I WER: 0.333333, loss: 1.577532, mean edit distance: 0.083333
I WER: 0.333333, loss: 1.577532, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.114254, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.114254, mean edit distance: 0.083333
I WER: 0.500000, loss: 1.841137, mean edit distance: 0.333333
I WER: 0.500000, loss: 1.879081, mean edit distance: 0.100000
I WER: 0.500000, loss: 1.879081, mean edit distance: 0.100000
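
To help read the per-sample numbers in the test logs above, here is a minimal sketch of how WER and "mean edit distance" can be computed from a `src`/`res` pair. It assumes the usual definitions (word-level and character-level Levenshtein distance, each divided by the length of the reference); the function names are hypothetical and this is not taken from DeepSpeech or deepspeech-cleaner. Under these assumptions it matches the values logged for the sample "tengo que comprar uno".

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(src, res):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = src.split()
    return edit_distance(ref_words, res.split()) / len(ref_words)


def mean_edit_distance(src, res):
    """Character-level edit distance divided by the reference length."""
    return edit_distance(src, res) / len(src)


# Sample taken from the log above (the trailing space in res is kept).
src = "tengo que comprar uno"
res = "tengo que comprar un "
print(round(wer(src, res), 6))                 # 0.25
print(round(mean_edit_distance(src, res), 6))  # 0.047619
```

Because insertions count as errors, WER can exceed 1 when the result contains more words than the source: "ausverkauft" against "aus der fast" gives 3.0 with this sketch.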