- Beersheba, Israel
- www.yuvalpinter.com
- @melelbgu
Stars
Python package for dictionary-based inline tokenization preprocessing
A repo with slides and reading list for Subword Tokenization Meets Formal Language Theory @ DLT2025.
Official code release for "SuperBPE: Space Travel for Language Models"
This course covers the applied side of algorithmics in machine learning, with some deep learning and evolutionary algorithms thrown in as well.
A collection of Pythonic subword tokenisers and text preprocessing tools.
BPE modification that removes intermediate tokens during tokenizer training.
alexandermorgan / BatchBPE
Forked from karpathy/minbpe. Lightweight batched implementation of the Byte Pair Encoding (BPE) algorithm for LLM tokenization.
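The BPE training loop behind repos like BatchBPE can be sketched in a few lines: repeatedly count adjacent symbol pairs over the corpus and merge the most frequent one. This is a minimal illustrative sketch, not code from any of the listed repositories; the function name `bpe_train` and its interface are assumptions.

```python
from collections import Counter

def bpe_train(words, num_merges):
    # Hypothetical minimal BPE trainer (not from BatchBPE/minbpe).
    # Start from character-level symbols, weighted by word frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with its merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_train(["low", "lower", "lowest"], 2)
```

Real implementations differ mainly in how they batch and cache the pair counts rather than recomputing them from scratch each iteration.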
a pretty-committed wikipedia markup parser
Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
The Israeli Association of Human Language Technologies
SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Research code for pixel-based encoders of language (PIXEL)
A project for building a classification model that sorts quotes from Knesset session protocols into eight topics.
A comprehensive list of Hebrew NLP resources.
A national initiative for the creation of infrastructure, research and development of advanced capabilities for the advancement of the field of NLP in Hebrew and Arabic.
Riemannian Adaptive Optimization Methods with pytorch optim
Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
A framework for few-shot evaluation of language models.
Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.
alonmln / ILNewsDiff
Forked from xuv/NYTdiff. Code for the ILNewsDiff Twitter account.
Code and source for the paper "How to Fine-Tune BERT for Text Classification?"
[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
😸 💬 A module to compute textual lexical richness (aka lexical diversity).
