Yuval Pinter (yuvalpinter)

Senior Lecturer at the Department of Computer Science at Ben-Gurion University, focusing on NLP. PhD in CS from Georgia Tech (2021).

65 followers · 4 following

Highlights: Pro

Stars

Kiryukhasemenov / InFlags

Python package for dictionary-based inline tokenization preprocessing

Python · 3 stars · Updated Jul 20, 2025

stephantul / skeletoken

Datamodels for Hugging Face tokenizers

Python · 105 stars · 4 forks · Updated Mar 12, 2026

cimeister / tokenizer-analysis-suite

Python · 44 stars · 10 forks · Updated Feb 11, 2026

mcognetta / subword_tokenization_meets_formal_language_theory

A repo with slides and reading list for Subword Tokenization Meets Formal Language Theory @ DLT2025.

4 stars · Updated Aug 19, 2025

thewh1teagle / phonikud

Hebrew grapheme to phoneme (G2P)

Python · 92 stars · 11 forks · Updated Mar 17, 2026

PythonNut / superbpe

Official code release for "SuperBPE: Space Travel for Language Models"

Jupyter Notebook · 89 stars · 12 forks · Updated Jan 9, 2026

moshesipper / Applied-Machine-Learning-Course

This course covers the applied side of algorithmics in machine learning, with some deep learning and evolutionary algorithms thrown in as well.

Python · 53 stars · 6 forks · Updated Mar 21, 2026

facebookresearch / blt

Code for the BLT research paper

Python · 2,031 stars · 190 forks · Updated Nov 3, 2025

bauwenst / TkTkT

A collection of Pythonic subword tokenisers and text preprocessing tools.

Python · 13 stars · 1 fork · Updated Mar 27, 2026

pchizhov / picky_bpe

BPE modification that removes intermediate tokens during tokenizer training.

Python · 27 stars · 4 forks · Updated Nov 25, 2024

alexandermorgan / BatchBPE

Forked from karpathy/minbpe

Lightweight batched implementation of the Byte Pair Encoding (BPE) algorithm for LLM tokenization.

Python · 8 stars · Updated Oct 18, 2025

spencermountain / wtf_wikipedia

a pretty-committed wikipedia markup parser

JavaScript · 851 stars · 132 forks · Updated Dec 12, 2025

pentagonalize / Transformer-Cookbook

TeX · 18 stars · 4 forks · Updated Feb 4, 2025

MeLeLBGU / tokenizers_intrinsic_benchmark

Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"

Python · 13 stars · 1 fork · Updated Nov 26, 2024

IAHLT / iahlt.github.io

The Israeli Association of Human Language Technologies

5 stars · Updated Jun 15, 2023

sigmorphon / 2022SegmentationST

SIGMORPHON 2022 Shared Task on Morpheme Segmentation

Jupyter Notebook · 33 stars · 13 forks · Updated Mar 26, 2023

xplip / pixel

Research code for pixel-based encoders of language (PIXEL)

Python · 346 stars · 39 forks · Updated Jul 15, 2025

NitzanBarzilay / KnessetTopicClassification

A project to build a classification model that sorts quotes from Knesset session protocols into eight topics.

Jupyter Notebook · 7 stars · Updated Sep 30, 2022

NNLP-IL / Hebrew-Resources

A comprehensive list of Hebrew NLP resources.

Java · 288 stars · 49 forks · Updated May 11, 2025

NNLP-IL / NNLP-IL

A national initiative for the creation of infrastructure, research and development of advanced capabilities for the advancement of the field of NLP in Hebrew and Arabic.

40 stars · 2 forks · Updated Nov 2, 2022

geoopt / geoopt

Riemannian Adaptive Optimization Methods with pytorch optim

Python · 1,048 stars · 93 forks · Updated Jan 27, 2026

dmort27 / panphon

Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.

Python · 302 stars · 55 forks · Updated Oct 22, 2025

OnlpLab / HebrewResources

Python · 2 stars · 2 forks · Updated May 10, 2021

nyu-mll / quality

Python · 151 stars · 10 forks · Updated Jan 17, 2025

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python · 11,869 stars · 3,127 forks · Updated Mar 18, 2026

mbollmann / sonnet-finder

Finds snippets in iambic pentameter in English-language text and tries to combine them into a rhyming sonnet.

Python · 13 stars · Updated Jan 5, 2023

alonmln / ILNewsDiff

Forked from xuv/NYTdiff

Code for the ILNewsDiff Twitter account

Python · 10 stars · 3 forks · Updated May 23, 2023

xuyige / BERT4doc-Classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"

Python · 641 stars · 101 forks · Updated Oct 19, 2021

princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Python · 729 stars · 131 forks · Updated Aug 29, 2022

LSYS / LexicalRichness

😸 💬 A module to compute textual lexical richness (aka lexical diversity).

Python · 112 stars · 22 forks · Updated Aug 27, 2023