Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementaions from scratch with python
- Updated
Jan 30, 2023 - Python
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementaions from scratch with python
LLM-inspired BiLSTM pipeline for real-time, multi-label toxicity inference across adversarial discourse modalities.
A clean, educational implementation of the Byte Pair Encoding algorithm used in modern language models like GPT.
Fast & Efficient BPE & Unigram tokenizer trainer library
A minimal Python implementation of Byte Pair Encoding (BPE) with step-by-step visualization of merge operations and vocabulary updates.
Add a description, image, and links to the subword-tokenization topic page so that developers can more easily learn about it.
To associate your repository with the subword-tokenization topic, visit your repo's landing page and select "manage topics."