ZeusClaw with On-device LLM
March 31, 2026
Run ZeusClaw with models that stay on your Mac: no LLM API costs, and private by default. Prompts, repo context, and agent traces do not go to a third-party model API unless you explicitly choose a cloud model.
This unlocks the same class of stack Ollama describes for demanding local work on Apple silicon—MLX, unified memory, and fast prefill/decode for coding agents. See their write-up: Ollama is now powered by MLX on Apple Silicon in preview.
Why local ZeusClaw:
- No token bills — inference runs on your hardware.
- Private — sensitive code and agent history stay on-device.
- Strong benchmarks — Qwen3.5-35B-A3B tracks GPT-5-mini on Qwen’s published tables (e.g. MMLU-Pro 85.3 vs 83.7).
Local serving also accelerates coding agents like Pi or Claude Code, and OpenClaw responds much faster in the same setup.
Two integration paths
- TurboQuant / KV cache — compress and serve long agent threads on macOS; see TurboQuant on Apple macOS — five integration paths.
- Ollama + MLX — turnkey local serving with NVFP4-style options and cache upgrades aimed at agents; read the MLX preview post for charts and requirements (e.g. 32GB+ unified memory for the highlighted Qwen build).
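To gauge whether a long agent thread fits on-device, a rough KV-cache estimate helps. The sketch below uses the standard transformer KV accounting; the layer and head counts are illustrative assumptions, not published specs for Qwen3.5-35B-A3B:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # K and V each store layers * kv_heads * head_dim values per token,
    # hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape (illustrative only):
layers, kv_heads, head_dim = 48, 8, 128

fp16 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=128_000)
q4 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=128_000,
                    bytes_per_value=0.5)  # 4-bit quantized cache

print(f"fp16 KV cache @ 128k tokens: {fp16 / 1e9:.1f} GB")  # 25.2 GB
print(f"4-bit KV cache @ 128k tokens: {q4 / 1e9:.1f} GB")   # 6.3 GB
```

Numbers like these are why cache compression matters for agent threads: a quantized cache can be the difference between fitting a long session in unified memory or not.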
Reference models
- Qwen3.5-35B-A3B — official MoE; strong coding and agent scores; aligned with Ollama’s MLX preview examples.
- Qwen3.5-40B-RoughHouse-Claude-4.6-Opus (example quantized build) — community merge; pick a quant for your RAM envelope.
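As a rough guide for picking a quant, weight memory scales with parameter count times bits per weight. A minimal sketch; the 1.2 overhead multiplier is an assumption for runtime buffers, not a measured figure:

```python
def weight_gb(params_billions, bits_per_weight, overhead=1.2):
    # Raw weight storage in GB, plus a rough multiplier
    # (assumed, not measured) for activations and runtime overhead.
    return params_billions * bits_per_weight / 8 * overhead

# A 35B-parameter model at a 4-bit quant:
print(f"{weight_gb(35, 4):.1f} GB")  # 35 * 4 / 8 * 1.2 = 21.0 GB
```

Under these assumptions a 4-bit 35B build lands comfortably inside a 32GB unified-memory envelope, while an 8-bit quant of the same model would not.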
Example: Ollama + Qwen (from Ollama’s MLX post)
Ollama documents commands such as:
Launch OpenClaw with the model:
ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4
Chat with the model:
ollama run qwen3.5:35b-a3b-coding-nvfp4
Model names and flags may change; always check ollama.com/blog/mlx and ollama.com/download for current builds.
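Once the model is served locally, any tool can talk to it over Ollama's standard REST API at `localhost:11434`. A minimal sketch using only the standard library; the model name is carried over from the example above and may differ in current builds:

```python
import json
import urllib.request

def build_chat_request(model, prompt):
    # Payload shape for Ollama's /api/chat endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(payload, host="http://localhost:11434"):
    # Send a non-streaming chat request to a local Ollama server.
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Usage, with `ollama serve` running:
#   print(chat(build_chat_request("qwen3.5:35b-a3b-coding-nvfp4",
#                                 "Explain this diff.")))
```

Nothing in this path leaves the machine: the request and response both stay on `localhost`.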
For enterprise deployment, governance, or stack questions: Contact · ZeusClaw Enterprise.