A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
Updated Oct 14, 2025 (Python)
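Bitext mining usually scores candidate sentence pairs with a margin criterion rather than raw cosine similarity. The sketch below implements the ratio-margin score popularised by LASER-based mining: margin(x, y) = cos(x, y) / ((knn(x) + knn(y)) / 2), where knn(·) is the mean cosine similarity to the k nearest neighbours on the other side. It assumes sentence embeddings have already been computed by some multilingual encoder and uses brute-force search over plain lists instead of a real nearest-neighbour index. `mine_pairs` and its parameters are hypothetical illustrations, not this library's API.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_mean(query: Vector, candidates: List[Vector], k: int) -> float:
    """Mean cosine similarity of `query` to its k most similar candidates."""
    top = sorted((cosine(query, c) for c in candidates), reverse=True)[:k]
    return sum(top) / len(top) if top else 0.0


def mine_pairs(
    src: List[Vector], tgt: List[Vector], k: int = 4, threshold: float = 1.05
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: for each source embedding, keep the target with the
    highest ratio-margin score if that score reaches `threshold`."""
    # Neighbourhood density of every sentence, measured against the other side.
    src_density = [knn_mean(x, tgt, k) for x in src]
    tgt_density = [knn_mean(y, src, k) for y in tgt]
    pairs = []
    for i, x in enumerate(src):
        best_j, best_score = -1, float("-inf")
        for j, y in enumerate(tgt):
            denom = (src_density[i] + tgt_density[j]) / 2
            if denom <= 0:
                continue  # avoid dividing by a non-positive density
            score = cosine(x, y) / denom
            if score > best_score:
                best_j, best_score = j, score
        if best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Dividing by the neighbourhood density penalises "hub" sentences that are close to everything, which is the main reason margin scoring tends to outperform a plain cosine threshold; the default `threshold` here is an assumption to be tuned per language pair.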
Neural Fuzzy Repair (NFR) is a data augmentation pipeline that integrates fuzzy matches (i.e., similar translations retrieved from a translation memory) into neural machine translation.
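The core NFR idea can be sketched as: retrieve the most similar source sentence from a translation memory and append its target side to the input, so the NMT model can copy from it. This is a minimal sketch under stated assumptions: `difflib.SequenceMatcher` token-level ratio is a stand-in similarity measure (NFR's own matching metrics may differ), and the `"@@@"` separator, `min_score` cut-off, and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# A translation memory as a list of (source, target) sentence pairs.
TranslationMemory = List[Tuple[str, str]]


def best_fuzzy_match(
    source: str, tm: TranslationMemory, min_score: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (target, score) of the most similar TM entry, or None if no
    entry reaches `min_score`. Similarity is difflib's ratio over tokens."""
    src_tokens = source.split()
    best, best_score = None, min_score
    for tm_src, tm_tgt in tm:
        score = SequenceMatcher(None, src_tokens, tm_src.split()).ratio()
        if score >= best_score:
            best, best_score = tm_tgt, score
    return (best, best_score) if best is not None else None


def augment(
    source: str, tm: TranslationMemory, sep: str = "@@@", min_score: float = 0.5
) -> str:
    """Append the target side of the best fuzzy match to the source sentence;
    leave the source unchanged when no match is good enough."""
    match = best_fuzzy_match(source, tm, min_score)
    if match is None:
        return source
    return f"{source} {sep} {match[0]}"
```

For example, with a memory containing `("the cat sat on the mat", "le chat était assis sur le tapis")`, the input `"the cat sat on a mat"` becomes `"the cat sat on a mat @@@ le chat était assis sur le tapis"`. Applied to both training and test inputs, this lets the model learn when to reuse the matched translation.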
Scripts for filtering machine translation parallel corpora
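Parallel-corpus filtering scripts typically apply a handful of cheap rule-based checks. The sketch below shows common heuristics (empty sides, untranslated copies, exact duplicates, overlong segments, extreme length ratios); it is an illustration of the general technique, not the rules this repository actually uses. Lengths are measured in characters because whitespace tokenisation does not work for Chinese; `max_ratio` and `max_chars` are hypothetical defaults that should be tuned per language pair.

```python
from typing import Iterable, Iterator, Tuple


def filter_bitext(
    pairs: Iterable[Tuple[str, str]], max_ratio: float = 3.0, max_chars: int = 500
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs that pass simple rule-based filters."""
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # likely an untranslated copy
        if (src, tgt) in seen:
            continue  # exact duplicate pair
        ls, lt = len(src), len(tgt)
        if ls > max_chars or lt > max_chars:
            continue  # overlong segment, often a misalignment
        if max(ls, lt) / min(ls, lt) > max_ratio:
            continue  # implausible character-length ratio
        seen.add((src, tgt))
        yield src, tgt
```

Because it is a generator, the filter can stream over corpora that do not fit in memory, although the `seen` set for deduplication still grows with the number of kept pairs.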
Personal NMT Playground
Python script that splits the output of the 'wikipedia parallel title extractor' into separate text files, one per language
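Splitting aligned output into one file per language is a column-split followed by one write per column. The sketch below assumes, hypothetically, that each input line holds tab-separated titles in a fixed language order (the real extractor's output format may differ); `split_by_language`, `write_per_language`, and the `titles.<lang>` file naming are illustrative names only.

```python
import os
from typing import Dict, Iterable, List


def split_by_language(
    lines: Iterable[str], langs: List[str], sep: str = "\t"
) -> Dict[str, List[str]]:
    """Collect column i of every non-empty line under langs[i]."""
    columns: Dict[str, List[str]] = {lang: [] for lang in langs}
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != len(langs):
            raise ValueError(
                f"line {line_no}: expected {len(langs)} fields, got {len(fields)}"
            )
        for lang, title in zip(langs, fields):
            columns[lang].append(title)
    return columns


def write_per_language(
    columns: Dict[str, List[str]], out_dir: str, prefix: str = "titles"
) -> Dict[str, str]:
    """Write each language's titles to <out_dir>/<prefix>.<lang>, one per line."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for lang, titles in columns.items():
        path = os.path.join(out_dir, f"{prefix}.{lang}")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(title + "\n" for title in titles)
        paths[lang] = path
    return paths
```

Keeping line i of every output file aligned with line i of the others preserves the parallel structure, which is what downstream tools that read line-aligned bitext expect.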
Replication package for SO processing for bitext
Extending a seq2seq encoder to accept extra source tokens (PyTorch)
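One common way to pass extra source tokens (for example domain or language tags) to an unmodified seq2seq encoder is to prepend them to each source sequence before padding and to extend the attention mask to match. The sketch below shows only this input-side step in plain Python so it runs without PyTorch; it is an assumption about the approach, not this repository's code, and `add_extra_source_tokens` and the token ids are hypothetical. The returned lists can be converted with `torch.tensor(...)` before being fed to the encoder.

```python
from typing import List, Tuple


def add_extra_source_tokens(
    src_ids: List[List[int]], extra_ids: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Prepend per-example extra token ids to each source sequence, right-pad
    the batch with `pad_id`, and build the matching attention mask
    (1 = real token, 0 = padding)."""
    if len(src_ids) != len(extra_ids):
        raise ValueError("src_ids and extra_ids must have the same batch size")
    combined = [extra + src for extra, src in zip(extra_ids, src_ids)]
    max_len = max((len(seq) for seq in combined), default=0)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in combined]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in combined]
    return input_ids, attention_mask
```

Prepending keeps the model architecture unchanged; the alternative is to give the encoder a separate embedding for the extra tokens and concatenate or add it inside the model, which requires modifying the encoder's forward pass.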