KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
[ICLR 2026] Learning to Reason without External Rewards
A benchmark for LLMs on complicated tasks in the terminal
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
Long context evaluation for large language models
Multiple datasets for ARC (Abstraction and Reasoning Corpus)
Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/
Orbax provides common checkpointing and persistence utilities for JAX users
Flax is a neural network library for JAX that is designed for flexibility.
Hackable and optimized Transformers building blocks, supporting a composable construction.
The original code for the paper "How to train your MAML", along with a PyTorch replication of the original "Model-Agnostic Meta-Learning" (MAML) paper.
higher is a PyTorch library allowing users to obtain higher-order gradients over losses spanning training loops rather than individual training steps.
Domain Specific Language for the Abstraction and Reasoning Corpus
Reverse Engineering the Abstraction and Reasoning Corpus
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Embeddable Postgres with real-time, reactive bindings.
Devika is the first open-source implementation of an agentic software engineer. It started as an open-source alternative to Devin.
A Desktop App for Easily Viewing and Editing Markdown Files
Noosphere is a protocol for thought; let's discover it together!
A powerful, flexible, Markdown-based authoring framework.
A fast implementation of a Farcaster Hub, in Rust.
Generative Agents: Interactive Simulacra of Human Behavior