
OpenClaw Multi-Model Router — Local Gemma + Qwen + Claude

A three-model AI setup that routes requests intelligently between local models and Claude. Simple questions stay local (free, private, fast). Complex reasoning escalates to the Claude API.

Prerequisite. You need OpenClaw already running, with your agent (the “Operator”) configured. If you haven’t done that yet, follow the OpenClaw Setup Guide first — this post picks up where that one ends.

The Architecture

Your message
   │
   ▼
the Operator (your local agent — decides where to route)
   ├── [Gemma]  Gemma 4 31B — general Q&A, chat, trivia (Local · Free)
   ├── [Qwen]   Qwen 2.5 Coder 32B — code tasks: writing, debugging, scripting (Local · Free)
   └── [Claude] Claude Sonnet — complex reasoning, personal context, tools (API)

Every response is labeled [Gemma], [Qwen], or [Claude] so you always know which model answered.

Customize the labels. I personally use [EDI] for Claude (named after the AI in Mass Effect) and a custom name for my Operator. Pick whatever fits your setup — just keep the labeling consistent so you always know who’s talking.

Hardware tested on: Mac Mini M4, 24GB unified memory, ~60GB free disk for models.

Step 1 — Install Ollama and Python 3.11

brew install ollama python@3.11

Verify:

ollama --version
/opt/homebrew/bin/python3.11 --version

Step 2 — Start Ollama

brew services start ollama

Verify with ollama list.
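Beyond `ollama list`, you can confirm the HTTP API itself is answering. A small sketch (`check_ollama` is an illustrative helper name, not part of Ollama; `/api/tags` is Ollama's cheap "list installed models" endpoint and makes a good health check):

```shell
#!/bin/bash
# check_ollama [PORT] — reports whether the Ollama HTTP API answers.
# Defaults to Ollama's standard port, 11434.
check_ollama() {
  local port="${1:-11434}"
  if curl -sf --max-time 3 "http://localhost:${port}/api/tags" >/dev/null 2>&1; then
    echo "ollama: up"
  else
    echo "ollama: down"
  fi
}

check_ollama
```

If this prints `ollama: down` right after `brew services start ollama`, give it a few seconds — the server takes a moment to bind the port.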

Step 3 — Pull the Models

ollama pull gemma4:31b
ollama pull qwen2.5-coder:32b

Each is ~19–20 GB. Disk needed: ~40 GB. RAM needed: 24 GB (one model loads at a time).
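If a download was interrupted, `ollama list` won't show the model. A quick check that both tags are actually present (`check_models` is an illustrative helper, not an Ollama command):

```shell
#!/bin/bash
# check_models TAG... — prints "present" or "MISSING" for each model tag
# by grepping the output of `ollama list`.
check_models() {
  local list m
  list="$(ollama list 2>/dev/null || true)"
  for m in "$@"; do
    if printf '%s' "$list" | grep -q "$m"; then
      echo "$m: present"
    else
      echo "$m: MISSING"
    fi
  done
}

check_models gemma4:31b qwen2.5-coder:32b
```

Re-run the matching `ollama pull` for anything reported `MISSING`.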

Step 4 — Install Python Packages

/opt/homebrew/bin/python3.11 -m pip install litellm anthropic uvicorn fastapi
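It's worth confirming the packages import under the exact interpreter the router will use — a different `python3` on your PATH having them doesn't help. A sketch (`check_py_modules` is an illustrative helper name):

```shell
#!/bin/bash
# check_py_modules PYTHON MODULE... — try importing each module under the
# given interpreter and report ok/MISSING.
check_py_modules() {
  local py="$1"; shift
  local mod
  for mod in "$@"; do
    if "$py" -c "import $mod" 2>/dev/null; then
      echo "$mod: ok"
    else
      echo "$mod: MISSING"
    fi
  done
}

check_py_modules /opt/homebrew/bin/python3.11 litellm anthropic uvicorn fastapi
```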

Step 5 — Create the Router Project

mkdir -p ~/.openclaw/workspace/openclaw-router

Create ~/.openclaw/workspace/openclaw-router/start.sh:

start.sh
#!/bin/bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "$SCRIPT_DIR/.env" ] && export $(grep -v '^#' "$SCRIPT_DIR/.env" | xargs)
[ -z "$ANTHROPIC_API_KEY" ] && echo "ERROR: ANTHROPIC_API_KEY not set" && exit 1
exec /opt/homebrew/bin/python3.11 -m uvicorn router:app \
    --host 0.0.0.0 --port 4242 --log-level info --app-dir "$SCRIPT_DIR"

Make it executable:

chmod +x ~/.openclaw/workspace/openclaw-router/start.sh

Create ~/.openclaw/workspace/openclaw-router/.env:

ANTHROPIC_API_KEY=sk-ant-...

Then drop in the router.py source from the repo.

Never commit your .env. The repo’s .gitignore excludes it for a reason.
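One caveat on start.sh's `export $(grep ... | xargs)` line: it's fine for a simple KEY=value file like this one, but it breaks on values containing spaces or quotes. If you ever put more in .env, a more robust loader uses `set -a` (mark every assignment for export) — a sketch you could swap in (`load_env` is an illustrative name):

```shell
#!/bin/bash
# load_env FILE — source FILE with set -a so every variable it assigns is
# exported, including quoted values containing spaces.
load_env() {
  local file="$1"
  [ -f "$file" ] || return 1
  set -a
  # shellcheck disable=SC1090
  . "$file"
  set +a
}
```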

Step 6 — Auto-Start on Login (Optional)

Save this as ~/Library/LaunchAgents/com.openclaw.router.plist — replace YOUR_USERNAME with your Mac username:

com.openclaw.router.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.openclaw.router</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/start.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/router.log</string>
    <key>StandardErrorPath</key>
    <string>/Users/YOUR_USERNAME/.openclaw/workspace/openclaw-router/router.log</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
    </dict>
</dict>
</plist>

Load it:

launchctl load ~/Library/LaunchAgents/com.openclaw.router.plist

The router will now start automatically every time you log in, and restart if it crashes.
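To check on the service later, the useful signal is whether anything is actually listening on the router's port. A sketch using lsof (`router_listening` is an illustrative helper name; the log path assumes the layout above):

```shell
#!/bin/bash
# router_listening [PORT] — reports whether any process is listening on PORT.
router_listening() {
  local port="${1:-4242}"
  if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
    echo "listening"
  else
    echo "not listening"
  fi
}

router_listening 4242
# Recent router output, if the log exists yet:
tail -n 20 ~/.openclaw/workspace/openclaw-router/router.log 2>/dev/null || true
```

`launchctl list | grep com.openclaw.router` shows whether the agent loaded, and `launchctl unload ~/Library/LaunchAgents/com.openclaw.router.plist` stops it.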

Step 7 — Configure Your Operator to Route Automatically

Add this section to ~/.openclaw/workspace/AGENTS.md before the “Make It Yours” section:

AGENTS.md addition — Smart Routing
## Smart Routing — Three-Way: Gemma / Qwen / Claude

| Model              | Label    | Role                    |
|--------------------|----------|-------------------------|
| Gemma 4 31B        | [Gemma]  | On-premise generalist   |
| Qwen 2.5 Coder 32B | [Qwen]   | Coding agent            |
| Claude (you)       | [Claude] | Deep reasoning          |

You are the Operator — direct the models, don't label yourself.

Route to Gemma: trivia, definitions, casual chat, greetings, sign-offs,
anything under ~20 words with no technical content.

Route to Qwen: writing/debugging/explaining code, scripting, architecture,
anything involving a code block or programming language.

Keep with Claude: multi-step reasoning, analysis, tradeoffs, personal context
(user's family/work/schedule), tool use, long input (>500 words),
anything requiring Bash/file/web access.

How to call Gemma (Bash tool):
curl -s --max-time 90 http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4:31b","messages":[{"role":"user","content":"PROMPT"}],"stream":false}' \
  | /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"

How to call Qwen (Bash tool):
curl -s --max-time 90 http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-coder:32b","messages":[{"role":"user","content":"PROMPT"}],"stream":false}' \
  | /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"

Always start every reply with [Gemma], [Qwen], or [Claude].
If a local model times out, fall back to Claude silently and label [Claude].
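That last fallback rule can be sketched as a shell pattern: if the curl call fails or exceeds its deadline, the pipeline emits a sentinel string the Operator treats as "answer this yourself and label it [Claude]". (`ask_local` and `FALLBACK_TO_CLAUDE` are illustrative names, not part of the repo; the naive string interpolation into JSON assumes a prompt with no quotes in it.)

```shell
#!/bin/bash
# ask_local MODEL PROMPT — query a local model via Ollama; print the reply,
# or a sentinel if the call fails or times out.
ask_local() {
  local model="$1" prompt="$2"
  curl -sf --max-time 90 http://localhost:11434/api/chat \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"${model}\",\"messages\":[{\"role\":\"user\",\"content\":\"${prompt}\"}],\"stream\":false}" \
    | /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])" \
    || echo "FALLBACK_TO_CLAUDE"
}
```

Because the JSON-extraction step fails on empty input, any curl error (connection refused, timeout, HTTP error under -f) falls through to the sentinel.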

Step 8 — Test Everything

Smoke tests for both local models
# Test Gemma
curl -s --max-time 90 http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"gemma4:31b","messages":[{"role":"user","content":"What is the capital of France?"}],"stream":false}' \
  | /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"

# Test Qwen
curl -s --max-time 90 http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-coder:32b","messages":[{"role":"user","content":"Write a Python function to reverse a string."}],"stream":false}' \
  | /opt/homebrew/bin/python3.11 -c "import sys,json; print(json.load(sys.stdin)['message']['content'])"

Both should respond within 30 seconds (longer on first call as the model loads into memory).

Routing at a Glance

| Signal                                   | Goes to           |
|------------------------------------------|-------------------|
| Greeting, joke, trivia, simple question  | Gemma             |
| Code, debugging, scripting, architecture | Qwen              |
| Planning, analysis, personal info, tools | Claude            |
| Local model timeout                      | Claude (fallback) |

Troubleshooting

| Problem               | Fix                                                        |
|-----------------------|------------------------------------------------------------|
| Ollama not responding | brew services restart ollama                               |
| Model not found       | ollama pull gemma4:31b or ollama pull qwen2.5-coder:32b    |
| Slow first response   | Normal — model loads on first query (~30 sec)              |
| Router won’t start    | cat ~/.openclaw/workspace/openclaw-router/router.log       |
| Out of memory         | Restart Ollama: brew services restart ollama               |
| API key not set       | Check .env exists in openclaw-router/ and contains the key |

Why this exists

Running everything through the Claude API gets expensive fast, especially for casual chat. Running everything locally means slow, inconsistent answers on hard problems. This router gives you the best of both: cheap local speed for the 80% of requests that don’t need a frontier model, and Claude’s deep reasoning for the 20% that do.

The labels ([Gemma], [Qwen], [Claude]) make the trade-off transparent. You always know which model answered, so you can tell the Operator to “ask Claude this one” if the local answer was weak.

Source code & further reading


Questions, ideas, or want to share your own routing rules? Reach out.