Charlie Snell

I'm a fourth-year CS PhD student in Berkeley EECS, advised by Dan Klein. Previously, I was a UC Berkeley undergrad, where I had the great opportunity to work with and learn from a number of fantastic AI researchers, including Sergey Levine, Ruiqi Zhong, Dan Klein, Jacob Steinhardt, and Jason Eisner. I was also previously a Student Researcher at Google DeepMind. I am now working at Cursor on training frontier coding agents.

Email / Google Scholar / Twitter / Github

Sleep-time Compute: Beyond Inference Scaling at Test-time
Kevin Lin*, Charlie Snell*, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, Joseph E. Gonzalez
arXiv 2025 [paper]
We introduce sleep-time compute, which allows models to "think" offline about contexts before queries are presented: by anticipating what queries users might ask and pre-computing useful quantities, we can significantly reduce the compute requirements at test-time.

Predicting Emergent Capabilities by Finetuning
Charlie Snell, Eric Wallace, Dan Klein, Sergey Levine
COLM 2024 [paper]
Can we predict emergent capabilities in GPT-N+1 using GPT-N, which has random performance on the task? To do this, we use information about how pre-emergence model checkpoints behave under the influence of task-specific finetuning to obtain predictive power about the point of emergence in the few-shot setting.

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
arXiv 2024 [paper]
On difficult problems, humans tend to think longer to improve their decisions. Can we instill a similar capability into LLMs, and how well does it perform? We find that by optimally scaling test-time compute, we can outperform much larger models in a FLOPs-matched evaluation.
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande*, Eric Wallace*, Charlie Snell*, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song
ICLR 2024 [paper]
Recent systems, such as Koala, Vicuna, and Alpaca, finetune a weaker language model to imitate the outputs of a stronger model, like ChatGPT or GPT-4. In this work, we critically analyze the shortcomings of this approach.

Learning by Distilling Context
Charlie Snell, Dan Klein, Ruiqi Zhong
arXiv 2022 [paper] [talk]
Language models benefit significantly from context tokens, such as prompts or scratchpads. We propose to apply context distillation so that a language model can improve itself by internalizing these gains.

Offline RL for Natural Language Generation with Implicit Language Q Learning
Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine
ICLR 2023 [paper] [project page] [code] [talk]
We propose an effective and easy-to-use offline-RL-motivated method for steering language models toward successfully completing language tasks, such as goal-directed dialogue, controlled generation, and word games.

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong*, Charlie Snell*, Dan Klein, Jason Eisner
EMNLP 2023 [paper]
We introduce APEL, a new framework that enables non-programmers to indirectly annotate natural language utterances with executable meaning representations, such as SQL programs.

Context-Aware Language Modeling for Goal-Oriented Dialogue Systems
Charlie Snell, Mengjiao Yang, Justin Fu, Yi Su, Sergey Levine
NAACL 2022, Findings [paper] [project page] [code]
We extend techniques from learning-based control, such as task relabeling, to derive a simple and effective method for finetuning language models in a goal-aware way.
Approximating How Single Head Attention Learns
Charlie Snell*, Ruiqi Zhong*, Dan Klein, Jacob Steinhardt
arXiv 2021 [paper] [slides] [code] [blog]
Why do models often attend to salient words, and how does this behavior evolve throughout training?

The Omniglot Jr. Challenge: Can a Model Achieve Child-Level Character Generation and Classification?
Eliza Kosoy, Masha Belyi, Charlie Snell, Josh Tenenbaum, Brenden Lake, Alison Gopnik
NeurIPS Workshop on BabyMind 2020 [paper]
We augment the original Omniglot dataset with a new dataset of children's handwritten characters. We then study the properties of a Bayesian Program Learning model trained on this new data.

Alien Dreams: An Emerging Art Scene
June 2021 [blog] [coverage] [discussion]
A tour through the wonderful AI art scene that emerged when CLIP was released in January 2021.

How is it so good? (DALL-E Explained Pt. 2)
April 2021 [blog]
A technical and philosophical discussion of how DALL-E works, why it is so effective at generating images from a text prompt, and its theoretical limitations.

Understanding VQ-VAE (DALL-E Explained Pt. 1)
February 2021 [blog]
How do vector quantized variational autoencoders (VQ-VAEs) work? And what role do they play in modern generative models, such as DALL-E and Jukebox?

Side Projects / Open Source Implementations
Selected projects. See my github for much more.

JaxSeq
October 2022 [code]
Built on top of HuggingFace's Transformers library, JaxSeq enables training very large language models in Jax with model and data parallelism across both multi-device and multi-node clusters.

Re-implementation of the paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets"
November 2021 [code]
Re-create the dramatic train/test curves from the original paper and experiment with the grokking phenomenon yourself.
Music Preference Visualization with Deep Embeddings
June-July 2020 [tweet]
Harness the power of deep music representations to generate playlists and visualize your music preferences in an interactive web app.

Train Deep Neural Networks on a 2013 MacBook Air GPU
2017/2018 [code]
A deep learning framework implemented from scratch in C++/OpenCL. It implements GPU kernels that run on a 2013 MacBook Air GPU (and other Apple computers), along with LSTM training/inference for music lyric generation.

Yeah JeCUB App
2017/2018 [app store]
A humorous sound-box app.

2D Procedural Endless World
2015 [code]
Scroll through an infinite 2D block world of rugged terrain, endless caves, fluffy clouds, and extreme biomes, all synthesized with PRNGs and Perlin noise.