ZeusClaw with On-device LLM
March 31, 2026
Run ZeusClaw with models that stay on your Mac: no LLM API costs, and private by default. Prompts, repo context, and agent traces do not go to a third-party model API unless you explicitly choose a cloud model.
This unlocks the same class of stack Ollama describes for demanding local work on Apple silicon—MLX, unified memory, and fast prefill/decode for coding agents. See their write-up: Ollama is now powered by MLX on Apple Silicon in preview.
Why local ZeusClaw:
- No token bills — inference runs on your hardware.
- Private — sensitive code and agent history stay on-device.
- Strong benchmarks — Qwen3.5-35B-A3B tracks GPT-5-mini on Qwen’s published tables (e.g. MMLU-Pro 85.3 vs 83.7).
Local serving also accelerates coding agents like Pi or Claude Code, and OpenClaw responds much faster in the same setup.
Two integration paths
- TurboQuant / KV cache — compress and serve long agent threads on macOS; see TurboQuant on Apple macOS — five integration paths.
- Ollama + MLX — turnkey local serving with NVFP4-style options and cache upgrades aimed at agents; read the MLX preview post for charts and requirements (e.g. 32GB+ unified memory for the highlighted Qwen build).
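To gauge whether a long agent thread fits on-device, a rough KV-cache estimate helps. The sketch below uses the standard transformer KV accounting; the layer and head counts are illustrative assumptions, not published specs for Qwen3.5-35B-A3B:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # K and V each store layers * kv_heads * head_dim values per token,
    # hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical model shape (illustrative only):
layers, kv_heads, head_dim = 48, 8, 128

fp16 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=128_000)
q4 = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=128_000,
                    bytes_per_value=0.5)  # 4-bit quantized cache

print(f"fp16 KV cache @ 128k tokens: {fp16 / 1e9:.1f} GB")  # 25.2 GB
print(f"4-bit KV cache @ 128k tokens: {q4 / 1e9:.1f} GB")   # 6.3 GB
```

Numbers like these are why cache compression matters for agent threads: a quantized cache can be the difference between fitting a long session in unified memory or not.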
Reference models
- Qwen3.5-35B-A3B — official MoE; strong coding and agent scores; aligned with Ollama’s MLX preview examples.
- Qwen3.5-40B-RoughHouse-Claude-4.6-Opus (example quantized build) — community merge; pick a quant for your RAM envelope.
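As a rough guide for picking a quant, weight memory scales with parameter count times bits per weight. A minimal sketch; the 1.2 overhead multiplier is an assumption for runtime buffers, not a measured figure:

```python
def weight_gb(params_billions, bits_per_weight, overhead=1.2):
    # Raw weight storage in GB, plus a rough multiplier
    # (assumed, not measured) for activations and runtime overhead.
    return params_billions * bits_per_weight / 8 * overhead

# A 35B-parameter model at a 4-bit quant:
print(f"{weight_gb(35, 4):.1f} GB")  # 35 * 4 / 8 * 1.2 = 21.0 GB
```

Under these assumptions a 4-bit 35B build lands comfortably inside a 32GB unified-memory envelope, while an 8-bit quant of the same model would not.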
Example: Ollama + Qwen (from Ollama’s MLX post)
Ollama documents commands such as:
Launch OpenClaw with the model:
ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4
Chat with the model:
ollama run qwen3.5:35b-a3b-coding-nvfp4
Model names and flags may change; always check ollama.com/blog/mlx and ollama.com/download for current builds.
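Once the model is served locally, any tool can talk to it over Ollama's standard REST API at `localhost:11434`. A minimal sketch using only the standard library; the model name is carried over from the example above and may differ in current builds:

```python
import json
import urllib.request

def build_chat_request(model, prompt):
    # Payload shape for Ollama's /api/chat endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(payload, host="http://localhost:11434"):
    # Send a non-streaming chat request to a local Ollama server.
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Usage, with `ollama serve` running:
#   print(chat(build_chat_request("qwen3.5:35b-a3b-coding-nvfp4",
#                                 "Explain this diff.")))
```

Nothing in this path leaves the machine: the request and response both stay on `localhost`.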
For enterprise deployment, governance, or stack questions: Contact · ZeusClaw Enterprise.