Statistical and Neural Machine Translation

This website contains resources for research in statistical and neural machine translation, i.e. the translation of text from one human language to another by a computer that has learned to translate from vast amounts of translated text.

Events
- Conference on machine translation: 2022, 2021, 2020, 2019, 2018, 2017, 2016.
- Workshop on machine translation: 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006.
- Workshop on building and using parallel text: 2015.
- Machine Translation Marathon: 2022, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011b, 2011a, 2010, 2009, 2008, 2007.
- Machine Translation Marathon of the Americas: 2022, 2019, 2018, 2017, 2016, 2015.

Resources
- Textbook: Neural Machine Translation (2020)
- Textbook: Statistical Machine Translation (2010)
- Moses statistical machine translation toolkit
- Machine Translation Research Survey Wiki
- Proceedings of the European Parliament (Europarl)
- 1 Billion Word Language Model Benchmark
- News Commentary
- N-gram counts and language models from the CommonCrawl (2014)
- SIGIR 2020 Tutorial: Searching the Web for Cross-lingual Web Data
- Data for "On the Impact of Various Types of Noise on Neural Machine Translation" (2018)
- Early Release of Parallel Data of Paracrawl (2016)
- Benchmark data for "Paracrawl: Web-Scale Acquisition of Parallel Corpora" (2020)
- Code and data for "Simulated Multiple Reference Training (SMRT) Improves Low-Resource Machine Translation" (2020)
- Parallel Named Entity Corpus for "XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment" (2021)
- Data for "Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings" (2017)
- Data for experiments on context-aware neural machine translation (2018)
- CC-100: Monolingual data used to train XLM-R extracted from CommonCrawl (2020)
- CC-Matrix
- Translation Service Containers for the European Language Grid
- Monolingual News Crawl used for WMT
- Monolingual News Discussions used for WMT 2020
- Data for "PMIndia - A Collection of Parallel Corpora of Languages of India" (2020)
- PRISM: Data for "Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing" (2020)
- Wikititles used for WMT
- University of Edinburgh's models from WMT 2020, 2019, 2017, 2016.
- Data resources for WMT 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2013.
- CC-Aligned: A Massive Collection of Cross-lingual Web-Document Pairs (2020)
- Resources for the paper "When Does Unsupervised Machine Translation Work?" (Marchisio et al., 2020)
- Wiki of the Machine Translation Research Group at Johns Hopkins University

External Historic Links: Introduction to Statistical MT Research
- The Mathematics of Statistical Machine Translation by Brown, Della Pietra, Della Pietra, and Mercer
- Statistical MT Handbook by Kevin Knight
- SMT Tutorial (2003) by Kevin Knight and Philipp Koehn
- ESSLLI Summer Course on SMT (2005), day 1, 2, 3, 4, 5, by Chris Callison-Burch and Philipp Koehn
- MT Archive by John Hutchins, electronic repository and bibliography of articles, books and papers on topics in machine translation and computer-based translation tools

External Historic Software
External Parallel Corpora

Maintained by Philipp Koehn