

Python package for dictionary-based inline tokenization preprocessing

Python 3 Updated Jul 20, 2025

Data models for Hugging Face tokenizers

Python 105 4 Updated Mar 12, 2026

A repo with slides and reading list for Subword Tokenization Meets Formal Language Theory @ DLT2025.

4 Updated Aug 19, 2025

Hebrew grapheme to phoneme (G2P)

Python 92 11 Updated Mar 17, 2026

Official code release for "SuperBPE: Space Travel for Language Models"

Jupyter Notebook 89 12 Updated Jan 9, 2026

This course covers the applied side of algorithmics in machine learning, with some deep learning and evolutionary algorithms thrown in as well.

Python 53 6 Updated Mar 21, 2026

Code for BLT research paper

Python 2,031 190 Updated Nov 3, 2025

A collection of Pythonic subword tokenisers and text preprocessing tools.

Python 13 1 Updated Mar 27, 2026

BPE modification that removes intermediate tokens during tokenizer training.

Python 27 4 Updated Nov 25, 2024

Lightweight batched implementation of the Byte Pair Encoding (BPE) algorithm for LLM tokenization.

Python 8 Updated Oct 18, 2025

a pretty-committed wikipedia markup parser

JavaScript 851 132 Updated Dec 12, 2025

Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"

Python 13 1 Updated Nov 26, 2024

The Israeli Association of Human Language Technologies

5 Updated Jun 15, 2023

SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Jupyter Notebook 33 13 Updated Mar 26, 2023

Research code for pixel-based encoders of language (PIXEL)

Python 346 39 Updated Jul 15, 2025

A project for building a classification model that sorts quotes from Knesset meeting protocols into eight topics.

Jupyter Notebook 7 Updated Sep 30, 2022

A comprehensive list of Hebrew NLP resources.

Java 288 49 Updated May 11, 2025

A national initiative for the creation of infrastructure, research and development of advanced capabilities for the advancement of the field of NLP in Hebrew and Arabic.

40 2 Updated Nov 2, 2022

Riemannian Adaptive Optimization Methods with pytorch optim

Python 1,048 93 Updated Jan 27, 2026

Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.

Python 302 55 Updated Oct 22, 2025
Python 2 2 Updated May 10, 2021
Python 151 10 Updated Jan 17, 2025

A framework for few-shot evaluation of language models.

Python 11,869 3,127 Updated Mar 18, 2026

Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Python 13 Updated Jan 5, 2023

Code for the ILNewsDiff Twitter account

Python 10 3 Updated May 23, 2023

Code and source for paper ``How to Fine-Tune BERT for Text Classification?``

Python 641 101 Updated Oct 19, 2021

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Python 729 131 Updated Aug 29, 2022

😸 💬 A module to compute textual lexical richness (aka lexical diversity).

Python 112 22 Updated Aug 27, 2023