Zeus AI / Blog

ZeusClaw with On-device LLM

March 31, 2026

Run ZeusClaw with models that stay on your Mac — no LLM API costs, and private by default (prompts, repo context, and traces never reach a third-party model API unless you opt into cloud).

This unlocks the same class of stack Ollama describes for demanding local work on Apple silicon—MLX, unified memory, and fast prefill/decode for coding agents. See their write-up: Ollama is now powered by MLX on Apple Silicon in preview.

Why local ZeusClaw:

  • No token bills — inference runs on your hardware.
  • Private — sensitive code and agent history stay on-device.
  • Strong benchmarks — Qwen3.5-35B-A3B tracks GPT-5-mini on Qwen’s published tables (e.g. MMLU-Pro 85.3 vs 83.7).

Accelerate coding agents like Pi or Claude Code

OpenClaw now responds much faster

Two integration paths

TurboQuant / KV cache — compress and serve long agent threads on macOS. For details, see the write-up: TurboQuant on Apple macOS — five integration paths.
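TurboQuant's actual compression scheme is described in its own write-up, not here. As an illustrative sketch of the general idea behind KV-cache compression — storing attention keys/values as int8 codes with per-channel scales so long agent threads take a quarter of the memory — here is a minimal NumPy example. All function names are hypothetical and this is not TurboQuant's algorithm:

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Per-channel symmetric int8 quantization of a KV-cache tensor.

    kv: float32 array of shape (seq_len, num_heads, head_dim).
    Returns (int8 codes, float32 per-channel scales).
    """
    # Scale each (head, channel) so its max magnitude maps to 127.
    scales = np.abs(kv).max(axis=0, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)  # avoid divide-by-zero on all-zero channels
    codes = np.clip(np.round(kv / scales), -127, 127).astype(np.int8)
    return codes, scales.astype(np.float32)

def dequantize_kv(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scales

# 4x memory saving: int8 codes vs float32 values, plus a small scale overhead.
kv = np.random.randn(128, 8, 64).astype(np.float32)
codes, scales = quantize_kv(kv)
err = np.abs(dequantize_kv(codes, scales) - kv).max()
```

The rounding error is bounded by half a quantization step per element, which is why int8 KV caches usually cost little model quality while roughly quadrupling the context you can hold in unified memory.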

Ollama + MLX — turnkey local serving with NVFP4-style options and cache upgrades aimed at agents; read the MLX preview post for charts and requirements (e.g. 32GB+ unified memory for the highlighted Qwen build).

Reference models

Example: Ollama + Qwen (from Ollama’s MLX post)

Ollama documents commands such as:

Launch OpenClaw against the local model:

ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4

Chat with the model:

ollama run qwen3.5:35b-a3b-coding-nvfp4

Model names and flags may change; always check ollama.com/blog/mlx and ollama.com/download for current builds.
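Once a model is served locally, any HTTP client can reach Ollama's REST API on its default port, 11434. A minimal sketch using only the Python standard library — the model tag is the one from the post above and may change, so treat it as a placeholder:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> bytes:
    # /api/generate takes a JSON body; "stream": False returns a single JSON object.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Model tag from the post above; check ollama.com for current builds.
    print(generate("qwen3.5:35b-a3b-coding-nvfp4", "Write a haiku about unified memory."))
```

Because everything goes over localhost, the same code works for any agent or script you point at the local server — no API key, no egress.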

ZeusClaw

For enterprise deployment, governance, or stack questions: Contact · ZeusClaw Enterprise.