Get Curated Research Datasets
Access benchmark-quality RL, multimodal, vision, and STEM datasets to accelerate your post-training research. Choose from pre-defined packs or create custom datasets tailored to your experiments.
Dataset Catalog
Choose from our curated data collections optimized for post-training research and ready to request:
Multimodal Data
Domain-Specific Data
STEM Data
Coding Data
Robotics & Embodied AI Data
Custom Data
Why These Datasets
Real-World & Benchmark Relevance
Expert Curation & Quality
Reproducible Methods
Frequently Asked Questions
How long does it take to receive sample data?
Samples are delivered by email, typically within 48 hours of your request, so you can begin integration and evaluation without delay.
Can I request multiple datasets at once?
Yes, you can select any combination of pre-defined packs or custom datasets in a single request form, and we’ll bundle them in one delivery.
What formats and modalities are supported?
We provide samples in ML-ready formats (e.g., image folders, CSV/JSON for tabular and text, WAV for audio). All modalities listed in the catalog, including vision, audio, STEM, and coding, are available.
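As a rough illustration of working with these formats, the sketch below takes an inventory of a downloaded sample pack by file type. The folder layout and the extension-to-modality mapping are assumptions made for the example, not a description of how samples are actually packaged.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to modality.
# Adjust it to match the files you actually receive.
EXTENSION_KINDS = {
    ".csv": "tabular/text",
    ".json": "tabular/text",
    ".wav": "audio",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
}


def summarize_sample_pack(root):
    """Count the files under `root` by modality, based on file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            kind = EXTENSION_KINDS.get(path.suffix.lower(), "other")
            counts[kind] += 1
    return dict(counts)
```

Calling `summarize_sample_pack("path/to/sample_pack")` on an unpacked sample returns a dictionary of counts per modality, which is a quick way to confirm that the delivery matches what you requested before wiring it into a data loader.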
How do you license sample data?
Sample datasets are provided under a research-only license. For full-pack access or commercial use, please ask about terms and pricing.
Can I get a custom data pack if I don’t see what I need?
Yes, select the "Custom" option in your request and provide additional details. Our research team will work with you to assemble the right dataset.
What happens after I receive samples?
You’ll receive curated sample files and metadata, followed by outreach from our research team to discuss full-pack access, volume, pricing, and any custom adjustments.
Ready for Frontier Model Data?
Request your data packs today and accelerate your research.