llamafile lets you distribute and run LLMs with a single file.
llamafile is a Mozilla Builders project (see its announcement blog post), now revamped by Mozilla.ai.
Our goal is to make open LLMs much more accessible to both developers and end users. We're doing that by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most operating systems and CPU architectures, with no installation.
llamafile also includes whisperfile, a single-file speech-to-text tool built on whisper.cpp and the same Cosmopolitan packaging. It supports transcription and translation of audio files across all the same platforms, with no installation required.
llamafile versions starting from 0.10.0 use a new build system, aimed at keeping our code more easily aligned with the latest versions of llama.cpp. This means they support more recent models and functionalities, but at the same time they might be missing some of the features you were accustomed to (check out this doc for a high-level description of what has been done). If you liked the "classic experience" more, you will always be able to access the previous versions from our releases page. Our pre-built llamafiles always show which version of the server they have been bundled with (0.9.* example, 0.10.* example), so you will always know which version of the software you are downloading.
We want to hear from you! Whether you are a new user or a long-time fan, please share what you find most valuable about llamafile and what would make it more useful for you. Read more via the blog and add your voice to the discussion here.
Download and run your first llamafile in minutes:
```shell
# Download an example model (Qwen3.5 0.8B)
curl -LO https://huggingface.co/mozilla-ai/llamafile_0.10.0/resolve/main/Qwen3.5-0.8B-Q8_0.llamafile

# Make it executable (macOS/Linux/BSD)
chmod +x Qwen3.5-0.8B-Q8_0.llamafile

# Run it
./Qwen3.5-0.8B-Q8_0.llamafile
```

We chose this model because it is the smallest one we have built a llamafile for, and therefore the most likely to work out of the box for you. If you have powerful hardware and/or GPUs, feel free to choose a larger, more expressive model, which should provide more accurate responses.
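Once a llamafile is running, it serves a browser chat UI and an OpenAI-compatible HTTP API; port 8080 and the `/v1/chat/completions` path below are llama.cpp server defaults, so adjust them if you launched with different options. A minimal curl sketch, assuming the example llamafile above is running locally:

```shell
# Query the local llamafile server's OpenAI-compatible endpoint;
# prints a note instead if nothing is listening on :8080.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}' \
  || echo "server not running on :8080"
```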
Windows users: Rename the file to add the `.exe` extension before running.
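Concretely (reusing the example filename from the quickstart above), the rename just appends `.exe` to the same file; on Windows `cmd.exe` this would be `ren Qwen3.5-0.8B-Q8_0.llamafile Qwen3.5-0.8B-Q8_0.exe`. A POSIX sketch of the same step, using a placeholder file in place of the real download:

```shell
# Stand-in for the downloaded llamafile (placeholder contents, not a real binary).
printf 'placeholder' > Qwen3.5-0.8B-Q8_0.llamafile

# Windows only needs the .exe suffix; the file contents are unchanged.
mv Qwen3.5-0.8B-Q8_0.llamafile Qwen3.5-0.8B-Q8_0.exe
```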
Check the full documentation in the docs/ folder or online at mozilla-ai.github.io/llamafile, or directly jump into one of the following subsections:
- Quickstart
- Example llamafiles
- Running a llamafile
- Creating llamafiles
- Source installation
- Technical details
- Supported Systems
- Troubleshooting
- Whisperfile
While the llamafile project is Apache 2.0-licensed, our changes to llama.cpp and whisper.cpp are licensed under MIT (just like the projects themselves) so as to remain compatible and upstreamable in the future, should that be desired.
The llamafile logo on this page was generated with the assistance of DALL·E 3.
![line drawing of llama animal head in front of slightly open manila folder filled with files](http://github.com/mozilla-ai/llamafile/raw/main/docs/images/llamafile-640x640.png)