OpenClaw Multi-Model Router — Local Gemma + Qwen + Claude
A three-model AI setup that routes requests intelligently between local models and Claude. Simple questions stay local (free, private, fast). Complex reasoning escalates to the Claude API.
Prerequisite. You need OpenClaw already running, with your agent (the “Operator”) configured. If you haven’t done that yet, follow the OpenClaw Setup Guide first — this post picks up where that one ends.
The Architecture
Every response is labeled [Gemma], [Qwen], or [Claude] so you always know which model answered.
Customize the labels. I personally use [EDI] for Claude (named after the AI in Mass Effect) and a custom name for my Operator. Pick whatever fits your setup — just keep the labeling consistent so you always know who’s talking.
Hardware tested on: Mac Mini M4, 24GB unified memory, ~60GB free disk for models.
Step 1 — Install Ollama and Python 3.11
brew install ollama python@3.11
Verify:
ollama --version
/opt/homebrew/bin/python3.11 --version
Step 2 — Start Ollama
brew services start ollama
Verify with ollama list.
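If you prefer to check the service itself rather than the CLI, Ollama's HTTP API answers on port 11434 once it's running; `/api/tags` lists the installed models. A quick sketch:

```shell
# Health check: /api/tags responds once the Ollama service is up.
if curl -sf --max-time 5 http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: up"
else
  echo "ollama: down"
fi
```

If this prints "ollama: down", re-run `brew services start ollama` before continuing.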
Step 3 — Pull the Models
ollama pull gemma4:31b
ollama pull qwen2.5-coder:32b
- Gemma 4 31B — Google DeepMind. General-purpose, 256K context, multimodal.
- Qwen 2.5 Coder 32B — Alibaba. Coding specialist, 92 languages.
Each is ~19–20 GB. Disk needed: ~40 GB. RAM needed: 24 GB (one model loads at a time).
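Pulls can silently fail on a flaky connection, so it's worth confirming both models actually landed. A small check, using the same model names as the pull commands above:

```shell
# Sanity check: print "present" or "missing" for each expected model.
for m in gemma4:31b qwen2.5-coder:32b; do
  if ollama list 2>/dev/null | grep -qF "$m"; then
    echo "$m: present"
  else
    echo "$m: missing"
  fi
done
```

Any "missing" line means that pull needs to be re-run before the router will work.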
Step 4 — Install Python Packages
/opt/homebrew/bin/python3.11 -m pip install litellm anthropic uvicorn fastapi
Step 5 — Create the Router Project
mkdir -p ~/.openclaw/workspace/openclaw-router
Create ~/.openclaw/workspace/openclaw-router/start.sh:
#!/bin/bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "$SCRIPT_DIR/.env" ] && export $(grep -v '^#' "$SCRIPT_DIR/.env" | xargs)
[ -z "$ANTHROPIC_API_KEY" ] && echo "ERROR: ANTHROPIC_API_KEY not set" && exit 1
exec /opt/homebrew/bin/python3.11 -m uvicorn router:app \
--host 0.0.0.0 --port 4242 --log-level info --app-dir "$SCRIPT_DIR"
chmod +x ~/.openclaw/workspace/openclaw-router/start.sh
Create ~/.openclaw/workspace/openclaw-router/.env:
ANTHROPIC_API_KEY=sk-ant-...
Then drop in the router.py source from the repo.
Never commit your .env. The repo’s .gitignore excludes it for a reason.
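If you're curious what the router does before grabbing the repo's version, here's a minimal, hypothetical sketch of the routing heuristic. The function name `choose_model` and the keyword lists are illustrative only; the repo's actual router.py wraps this kind of decision in a FastAPI endpoint and forwards the request to Ollama or the Anthropic API accordingly.

```python
# Hypothetical sketch of the routing decision (not the repo's router.py).
# Mirrors the rules from Step 7: code-flavored prompts -> qwen,
# short casual prompts -> gemma, long or complex input -> claude.

CODE_HINTS = ("def ", "class ", "```", "function", "debug", "script",
              "python", "bash", "error", "traceback", "compile")

def choose_model(prompt: str) -> str:
    """Return 'gemma', 'qwen', or 'claude' for a given prompt."""
    text = prompt.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "qwen"        # coding specialist handles anything code-shaped
    if len(prompt.split()) > 500:
        return "claude"      # long input stays with Claude
    if len(prompt.split()) < 20:
        return "gemma"       # short, casual -> local generalist
    return "claude"          # multi-step reasoning by default

print(choose_model("hi there!"))                    # short chat -> gemma
print(choose_model("Debug this Python traceback"))  # code -> qwen
```

The real router also handles timeouts and streaming; this sketch only shows where each request would go.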
Step 6 — Auto-Start on Login (Optional)
Save this as ~/Library/LaunchAgents/com.openclaw.router.plist — replace YOUR_USERNAME with your Mac username:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.openclaw.router</string>
<key>ProgramArguments</key>
<array>
<string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/start.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/router.log</string>
<key>StandardErrorPath</key>
<string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/router.log</string>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
</dict>
</dict>
</plist>
Load it:
launchctl load ~/Library/LaunchAgents/com.openclaw.router.plist
The router will now start automatically every time you log in, and restart if it crashes.
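To confirm the agent actually registered, `launchctl list` shows one line per loaded job, so a match on the label means the router will start and restart automatically:

```shell
# Check whether the LaunchAgent from Step 6 is registered with launchd.
if launchctl list 2>/dev/null | grep -q com.openclaw.router; then
  echo "router agent: loaded"
else
  echo "router agent: not loaded"
fi
```

If it reports "not loaded", re-run the `launchctl load` command above and check router.log for errors.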
Step 7 — Configure Your Operator to Route Automatically
Add this section to ~/.openclaw/workspace/AGENTS.md before the “Make It Yours” section:
## Smart Routing — Three-Way: Gemma / Qwen / Claude
| Model | Label | Role |
|--------------------|----------|-------------------------|
| Gemma 4 31B | [Gemma] | On-premise generalist |
| Qwen 2.5 Coder 32B | [Qwen] | Coding agent |
| Claude (you) | [Claude] | Deep reasoning |
You are the Operator: route each request to the right model and label every reply with the model that produced it.
Route to Gemma: trivia, definitions, casual chat, greetings, sign-offs,
anything under ~20 words with no technical content.
Route to Qwen: writing/debugging/explaining code, scripting, architecture,
anything involving a code block or programming language.
Keep with Claude: multi-step reasoning, analysis, tradeoffs, personal context
(user's family/work/schedule), tool use, long input (>500 words),
anything requiring Bash/file/web access.
How to call Gemma (Bash tool):
curl -s --max-time 90 http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"gemma4:31b","messages":[{"role":"user","content":"PROMPT"}],"stream":false}' \
| /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"
How to call Qwen (Bash tool):
curl -s --max-time 90 http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5-coder:32b","messages":[{"role":"user","content":"PROMPT"}],"stream":false}' \
| /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"
Always start every reply with [Gemma], [Qwen], or [Claude].
If a local model times out, fall back to Claude silently and label [Claude].
Step 8 — Test Everything
# Test Gemma
curl -s --max-time 90 http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"gemma4:31b","messages":[{"role":"user","content":"What is the capital of France?"}],"stream":false}' \
| /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"
# Test Qwen
curl -s --max-time 90 http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5-coder:32b","messages":[{"role":"user","content":"Write a Python function to reverse a string."}],"stream":false}' \
| /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"
Both should respond within about 30 seconds; the first call takes longer while Ollama loads the model into memory.
Routing at a Glance
| Signal | Goes to |
|---|---|
| Greeting, joke, trivia, simple question | Gemma |
| Code, debugging, scripting, architecture | Qwen |
| Planning, analysis, personal info, tools | Claude |
| Local model timeout | Claude (fallback) |
Troubleshooting
| Problem | Fix |
|---|---|
| Ollama not responding | brew services restart ollama |
| Model not found | ollama pull gemma4:31b or ollama pull qwen2.5-coder:32b |
| Slow first response | Normal — model loads on first query (~30 sec) |
| Router won’t start | cat ~/.openclaw/workspace/openclaw-router/router.log |
| Out of memory | Restart Ollama: brew services restart ollama |
| API key not set | Check .env exists in openclaw-router/ and contains the key |
Why this exists
Running everything through the Claude API gets expensive fast, especially for casual chat. Running everything locally means slow, inconsistent answers on hard problems. This router gives you the best of both: cheap local speed for the 80% of requests that don’t need a frontier model, and Claude’s deep reasoning for the 20% that do.
The labels ([Gemma], [Qwen], [Claude]) make the trade-off transparent. You always know which model answered, so you can tell the Operator to “ask Claude this one” if the local answer was weak.
Source code & further reading
- mcornelia/openclaw-multi-model-router — the GitHub repo (start.sh, plist, router.py, full README)
- OpenClaw Setup Guide — the prerequisite (sets up your local Operator)
- Gemma 4 on Ollama · Qwen 2.5 Coder on Ollama
- LiteLLM docs · FastAPI docs
Questions, ideas, or want to share your own routing rules? Reach out.