If you find this useful, please ⭐ the repo! Also check out Vesta AI Explorer, my full-featured native macOS AI app.
Note
5 Mar 2026: Apologies. The deployed brew and pip packages had a few glitches, which should now be fixed. Please report any issues.
Attention M-series Mac AI enthusiasts! You don't need to be a Swift developer to explore. Vibe coding really allows anyone to participate in this project. A lot of the hype is real! It does work.
Fork this repo first, then clone your fork to submit PRs:
```bash
git clone https://github.com/<your-username>/maclocal-api.git
cd maclocal-api
claude /build-afm
```

To just experiment locally:

```bash
git clone https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api
claude /build-afm
```

`/build-afm` is an AI skill that builds the project for the first time so that you can start coding.
Start vibe coding! I will add support for skills with more coding agents in the future.
Extensive testing of Qwen3.5-35B-A3B with afm. Uses an experimental technique with Claude and Codex as judges for evaluation scoring. Click the link below to view test results.
Run open-source MLX models or Apple's on-device Foundation Model through an OpenAI-compatible API. Built entirely in Swift for maximum Metal GPU performance. No Python runtime, no cloud, no API keys.
| | Stable (v0.9.7) | Nightly (afm-next) |
|---|---|---|
| Homebrew | `brew install scouzi1966/afm/afm` | `brew install scouzi1966/afm/afm-next` |
| pip | `pip install macafm` | `pip install --extra-index-url https://kruks.ai/afm/wheels/simple/ macafm-next` |
| Release notes | v0.9.7 | v0.9.8-next |
Note
The stable release (v0.9.7) and the latest nightly are currently at the same level. Either one will give you the same experience.
Tip
Switching between stable and nightly:
```bash
brew unlink afm && brew install scouzi1966/afm/afm-next  # switch to nightly
brew unlink afm-next && brew link afm                    # switch back to stable
```

This assumes you previously ran `brew install scouzi1966/afm/afm`.

Important
The nightly build is the future stable release. It includes everything in v0.9.7 plus:
- No new features yet: nightly is currently in sync with the stable release
```bash
# Run any MLX model with WebUI
afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

# Or any smaller model
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Chat from the terminal (auto-downloads from Hugging Face)
afm mlx -m Qwen3-0.6B-4bit -s "Explain quantum computing"

# Interactive model picker (lists your downloaded models)
MACAFM_MLX_MODEL_CACHE=/path/to/models afm mlx -w

# Apple's on-device Foundation Model with WebUI
afm -w
```

OpenCode is a terminal-based AI coding assistant. Connect it to afm for a fully local coding experience: no cloud, no API keys. No Internet connection is required (other than for the initial model download, of course!).
1. Configure OpenCode (`~/.config/opencode/opencode.json`):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "macafm (local)",
      "options": {
        "baseURL": "http://localhost:9999/v1"
      },
      "models": {
        "mlx-community/Qwen3-Coder-Next-4bit": {
          "name": "mlx-community/Qwen3-Coder-Next-4bit"
        }
      }
    }
  }
}
```

2. Start afm with a coding model:

```bash
afm mlx -m mlx-community/Qwen3-Coder-Next-4bit -t 1.0 --top-p 0.95 --max-tokens 8192
```

3. Launch OpenCode and type `/connect`. Scroll down to the very bottom of the provider list; macafm (local) will likely be the last entry. Select it, and when prompted for an API key, enter any value (e.g. `x`). Token-based access is not yet implemented in afm, so the key is ignored. All inference runs locally on your Mac's GPU.
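If you would rather generate the OpenCode config than hand-edit it, the following is a minimal Python sketch that builds the same structure as step 1. The helper name `opencode_config` is hypothetical (it is not part of afm or OpenCode); the model ID and port are parameters so they can match whatever you pass to `afm mlx`.

```python
import json


def opencode_config(model_id: str, port: int = 9999) -> dict:
    """Build an OpenCode provider entry that points at a local afm server.

    Hypothetical helper: it mirrors the JSON shown in step 1 and changes
    only the model ID and the port in the base URL.
    """
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "ollama": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "macafm (local)",
                "options": {"baseURL": f"http://localhost:{port}/v1"},
                "models": {model_id: {"name": model_id}},
            }
        },
    }


if __name__ == "__main__":
    # Print the config; save the output as ~/.config/opencode/opencode.json.
    config = opencode_config("mlx-community/Qwen3-Coder-Next-4bit")
    print(json.dumps(config, indent=2))
```

Keeping the model ID in one place avoids the easy mistake of starting afm with one model and listing a different one in the config.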
28 models tested and verified including Qwen3, Gemma 3/3n, GLM-4/5, DeepSeek V3, LFM2, SmolLM3, Llama 3.2, MiniMax M2.5, Nemotron, and more. See test reports.
- Vesta AI Explorer - full-featured native macOS AI chat app
- AFMTrainer - LoRA fine-tuning wrapper for Apple's toolkit (Mac M-series & Linux CUDA)
- Apple Foundation Model Adapters - Apple's adapter training toolkit
- OpenAI API Compatible - Works with existing OpenAI client libraries and applications
- MLX Local Models - Run any Hugging Face MLX model locally (Qwen, Gemma, Llama, DeepSeek, GLM, and 28+ tested models)
- API Gateway - Auto-discovers and proxies Ollama, LM Studio, Jan, and other local backends into a single API
- LoRA adapter support - Supports fine-tuning with LoRA adapters using Apple's tuning Toolkit
- Apple Foundation Models - Uses Apple's on-device 3B parameter language model
- Vision OCR - Extract text from images and PDFs using Apple Vision (`afm vision`)
- Built-in WebUI - Chat interface with model selection (`afm -w`)
- Privacy-First - All processing happens locally on your device
- Fast & Lightweight - No network calls, no API keys required
- Easy Integration - Drop-in replacement for OpenAI API endpoints
- Token Usage Tracking - Reports token consumption metrics (word-based estimates for the Foundation model)
- **macOS 26 (Tahoe)** or later
- Apple Silicon Mac (M1/M2/M3/M4 series)
- Apple Intelligence enabled in System Settings
- **Xcode 26** (for building from source)
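Two of the requirements above can be checked from a script before installing. This is a rough Python sketch, not an official afm check: the helper name `meets_requirements` is hypothetical, and it cannot tell whether Apple Intelligence is enabled or which Xcode is installed. Depending on how your Python was built (or if it runs under Rosetta), macOS may report a compatibility version or `x86_64`, so treat a `False` result as a prompt to check manually.

```python
import platform


def meets_requirements(macos_version: str, machine: str) -> bool:
    """Return True if the macOS major version is 26+ and the CPU is arm64.

    Hypothetical helper covering only the OS-version and Apple Silicon
    requirements from the list above.
    """
    if not macos_version:  # platform.mac_ver() returns "" when not on macOS
        return False
    major = int(macos_version.split(".")[0])
    return major >= 26 and machine == "arm64"


if __name__ == "__main__":
    version = platform.mac_ver()[0]
    print(meets_requirements(version, platform.machine()))
```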
```bash
# Add the tap
brew tap scouzi1966/afm

# Install AFM
brew install afm

# Verify installation
afm --version
```

```bash
# Install from PyPI
pip install macafm

# Verify installation
afm --version
```

```bash
# Clone the repository with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Build everything from scratch (patches + webui + release build)
./Scripts/build-from-scratch.sh

# Or skip webui if you don't have Node.js
./Scripts/build-from-scratch.sh --skip-webui

# Or use make (patches + release build, no webui)
make

# Run
./.build/release/afm --version
```

```bash
# API server only (Apple Foundation Model on port 9999)
afm

# API server with WebUI chat interface
afm -w

# WebUI + API gateway (auto-discovers Ollama, LM Studio, Jan, etc.)
afm -w -g

# Custom port with verbose logging
afm -p 8080 -v

# Show help
afm -h
```

Run open-source models locally on Apple Silicon using MLX:
```bash
# Run a model with single prompt
afm mlx -m mlx-community/Qwen2.5-0.5B-Instruct-4bit -s "Explain gravity"

# Start MLX model with WebUI
afm mlx -m mlx-community/gemma-3-4b-it-8bit -w

# Interactive model picker (lists downloaded models)
afm mlx -w

# MLX model as API server
afm mlx -m mlx-community/Llama-3.2-1B-Instruct-4bit -p 8080

# Pipe mode
cat essay.txt | afm mlx -m mlx-community/Qwen3-0.6B-4bit -i "Summarize this"

# MLX help
afm mlx --help
```

Models are downloaded from Hugging Face on first use and cached locally. Any model from the mlx-community collection is supported.
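Once an MLX model is running as an API server (the `-p 8080` example above), it can be queried with nothing but the Python standard library. The sketch below makes two assumptions: that the request's `model` field should be the same Hugging Face ID passed to `-m`, and that the response follows the usual OpenAI chat completions shape. The helper names `build_chat_request` and `ask` are illustrative, not part of afm.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str, port: int = 8080,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    body = {
        "model": model,  # assumed to match the ID given to `afm mlx -m`
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str, port: int = 8080) -> str:
    """Send the request and return the assistant's reply text.

    Requires a running afm server; raises URLError if none is listening.
    """
    request = build_chat_request(prompt, model, port)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

For example, `ask("Explain gravity", "mlx-community/Llama-3.2-1B-Instruct-4bit")` would send one prompt to the server started on port 8080. Separating request construction from sending makes the payload easy to inspect without a server running.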
POST /v1/chat/completions

Compatible with OpenAI's chat completions API.

```bash
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

GET /v1/models

Returns available Foundation Models.

```bash
curl http://localhost:9999/v1/models
```

GET /health

Server health status endpoint.
```bash
curl http://localhost:9999/health
```

```python
from openai import OpenAI

# Point to your local MacLocalAPI server
client = OpenAI(
    api_key="not-needed-for-local",
    base_url="http://localhost:9999/v1"
)

response = client.chat.completions.create(
    model="foundation",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ]
)

print(response.choices[0].message.content)
```

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'not-needed-for-local',
  baseURL: 'http://localhost:9999/v1',
});

const completion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Write a haiku about programming' }],
  model: 'foundation',
});

console.log(completion.choices[0].message.content);
```

```bash
# Basic chat completion
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

# With temperature control
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Be creative!"}],
    "temperature": 0.8
  }'
```

```bash
# Single prompt mode
afm -s "Explain quantum computing"

# Piped input from other commands
echo "What is the meaning of life?" | afm
cat file.txt | afm
git log --oneline | head -5 | afm

# Custom instructions with pipe
echo "Review this code" | afm -i "You are a senior software engineer"
```

```
MacLocalAPI/
├── Package.swift                                # Swift Package Manager config
├── Sources/MacLocalAPI/
│   ├── main.swift                               # CLI entry point & ArgumentParser
│   ├── Server.swift                             # Vapor web server configuration
│   ├── Controllers/
│   │   └── ChatCompletionsController.swift      # OpenAI API endpoints
│   └── Models/
│       ├── FoundationModelService.swift         # Apple Foundation Models wrapper
│       ├── OpenAIRequest.swift                  # Request data models
│       └── OpenAIResponse.swift                 # Response data models
└── README.md
```

```
OVERVIEW: macOS server that exposes Apple's Foundation Models through
OpenAI-compatible API

Use -w to enable the WebUI, -g to enable API gateway mode (auto-discovers and
proxies to Ollama, LM Studio, Jan, and other local LLM backends).

USAGE: afm <options>
       afm mlx [<options>]      Run local MLX models from Hugging Face
       afm vision <image>       OCR text extraction from images/PDFs

OPTIONS:
  -s, --single-prompt <single-prompt>
                          Run a single prompt without starting the server
  -i, --instructions <instructions>
                          Custom instructions for the AI assistant (default:
                          You are a helpful assistant)
  -v, --verbose           Enable verbose logging
  --no-streaming          Disable streaming responses (streaming is enabled by
                          default)
  -a, --adapter <adapter> Path to a .fmadapter file for LoRA adapter fine-tuning
  -p, --port <port>       Port to run the server on (default: 9999)
  -H, --hostname <hostname>
                          Hostname to bind server to (default: 127.0.0.1)
  -t, --temperature <temperature>
                          Temperature for response generation (0.0-1.0)
  -r, --randomness <randomness>
                          Sampling mode: 'greedy', 'random',
                          'random:top-p=<0.0-1.0>', 'random:top-k=<int>', with
                          optional ':seed=<int>'
  -P, --permissive-guardrails
                          Permissive guardrails for unsafe or inappropriate
                          responses
  -w, --webui             Enable webui and open in default browser
  -g, --gateway           Enable API gateway mode: discover and proxy to local
                          LLM backends (Ollama, LM Studio, Jan, etc.)
  --prewarm <prewarm>     Pre-warm the model on server startup for faster first
                          response (y/n, default: y)
  --version               Show the version.
  -h, --help              Show help information.

Note: afm also accepts piped input from other commands, equivalent to using -s
with the piped content as the prompt.
```

The server respects standard logging environment variables:

- `LOG_LEVEL` - Set logging level (trace, debug, info, notice, warning, error, critical)
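The option list notes that streaming is enabled by default. If afm follows OpenAI's streaming convention (an assumption here, based on the API being described as OpenAI-compatible), a streamed reply arrives as server-sent event lines of the form `data: <json>`, ending with `data: [DONE]`. The sketch below shows how a client might stitch the text fragments back together; `iter_stream_text` is a hypothetical helper and the sample lines are hand-written, not captured from afm.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI convention
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from a real afm session.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```

Because the function accepts any iterable of strings, it can be fed decoded lines from an HTTP response as easily as the sample list.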
- Model Scope: Apple Foundation Model is a 3B parameter model (optimized for on-device performance)
- macOS 26+ Only: Requires the latest macOS with Foundation Models framework
- Apple Intelligence Required: Must be enabled in System Settings
- Token Estimation: Uses word-based approximation for token counting (Foundation model only; proxied backends report real counts)
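To make the token estimation caveat concrete, here is what a word-based approximation can look like. This is only an illustration: the 4-tokens-per-3-words ratio is a common rule of thumb for English text chosen for this sketch, and afm's actual heuristic may differ.

```python
def estimate_tokens(text: str) -> int:
    """Approximate a token count from whitespace-separated words.

    Assumes roughly 4 tokens per 3 words, rounded up. The ratio is an
    assumption for illustration, not afm's documented formula.
    """
    words = len(text.split())
    return (words * 4 + 2) // 3  # integer ceiling of words * 4 / 3


print(estimate_tokens("What is the capital of France?"))  # prints 8
```

An estimate like this ignores how a real tokenizer splits punctuation, numbers, and rare words, which is why usage figures for the Foundation model should be read as approximate.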
- Ensure you're running **macOS 26** or later
- Enable Apple Intelligence in System Settings > Apple Intelligence & Siri
- Verify you're on an Apple Silicon Mac
- Restart the application after enabling Apple Intelligence
- Check if the port is already in use: `lsof -i :9999`
- Try a different port: `afm -p 8080`
- Enable verbose logging: `afm -v`
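The first two checks can also be scripted. The following Python sketch tests whether anything is accepting TCP connections on a port and picks the first free one from a list of candidates; the helper names and the candidate ports are illustrative choices, not part of afm.

```python
import socket
from typing import Iterable, Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port with no listener, or None."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    # 9999 is afm's default port; the other candidates are arbitrary.
    print(first_free_port([9999, 8080, 8081]))
```

The result can be passed to `afm -p <port>`. Note that this only detects listeners on the given host, so `lsof` remains the better tool for finding out which process owns a port.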
- Ensure you have **Xcode 26** installed
- Update Swift toolchain: `xcode-select --install`
- Clean and rebuild: `swift package clean && swift build -c release`
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
```bash
# Clone the repo with submodules
git clone --recurse-submodules https://github.com/scouzi1966/maclocal-api.git
cd maclocal-api

# Full build from scratch (submodules + patches + webui + release)
./Scripts/build-from-scratch.sh

# Or for debug builds during development
./Scripts/build-from-scratch.sh --debug --skip-webui

# Run with verbose logging
./.build/debug/afm -w -g -v
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Apple for the Foundation Models framework
- The Vapor Swift web framework team
- OpenAI for the API specification standard
- The Swift community for excellent tooling
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Search existing GitHub Issues
- Create a new issue with detailed information about your problem
- Streaming response support
- MLX local model support (28+ models tested)
- Multiple model support (API gateway mode)
- Web UI for testing (llama.cpp WebUI integration)
- Vision OCR subcommand
- Function/tool calling (OpenAI-compatible, multiple formats)
- Performance optimizations
- BFCL integration for automated tool calling validation
- Docker containerization (when supported)
Made with ❤️ for the Apple Silicon community
Bringing the power of local AI to your fingertips.
