# How to Use OpenClaw with Ollama for Fully Local AI
OpenClaw + Ollama lets you run a fully local AI CRM with zero data leaving your machine. Step-by-step setup guide for privacy-first teams.
OpenClaw works with Ollama out of the box, giving you a fully local AI agent where no data ever leaves your machine. If you're running DenchClaw and want to keep every contact, conversation, and query 100% on-device, pairing it with Ollama is the cleanest path. Here's exactly how to set it up.
## Why Run Ollama with OpenClaw?
The default DenchClaw setup lets you plug in any LLM provider — OpenAI, Anthropic, Gemini, or a locally hosted model. Ollama is the most popular local model runner on macOS, Linux, and Windows. It handles model downloads, GPU acceleration, and serving a local API endpoint that mimics the OpenAI format.
The combination means:
- No API keys required — no monthly billing, no rate limits
- Full data privacy — your CRM data never hits an external server
- Offline capable — works on a plane, in a basement, behind a firewall
- No usage caps — run as many queries as you want
For teams in legal, healthcare, finance, or any regulated industry, this isn't just a preference — it's often a requirement.
## Prerequisites
Before starting, you need:
- Node.js 18+ installed
- DenchClaw installed (`npx denchclaw` or the global install)
- About 4-8 GB of free disk space per model
## Step 1: Install Ollama
Download Ollama from ollama.com and install it for your platform.
On macOS:

```bash
brew install ollama
```

On Linux:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Start the Ollama server:

```bash
ollama serve
```

By default, Ollama runs on http://localhost:11434. Keep this running in a terminal or set it up as a background service.
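Before moving on, it can help to confirm the server is actually reachable. The snippet below (my addition, assuming the default port) queries Ollama's /api/tags endpoint, which lists the models you have pulled:

```shell
# Check whether the Ollama server is reachable on the default port.
# /api/tags returns the pulled models; an empty list is still a 200 response.
if curl -sf http://localhost:11434/api/tags > /dev/null; then
  echo "Ollama is running"
else
  echo "Ollama is not reachable on :11434"
fi
```

Either message is fine at this stage; you just want the first one before continuing to Step 2.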
## Step 2: Pull a Model
Ollama supports dozens of models. For DenchClaw's agent tasks (CRM queries, research, writing), you want a model with at least 7B parameters and strong instruction-following.
Good starting points:
```bash
# Llama 3.2 — excellent general-purpose, 8B
ollama pull llama3.2

# Mistral — fast, efficient, great for structured tasks
ollama pull mistral

# Qwen2.5 — strong at coding and data tasks
ollama pull qwen2.5

# Phi-4 — smaller but surprisingly capable
ollama pull phi4
```

Test that the model works:

```bash
ollama run llama3.2 "Summarize this in one sentence: DenchClaw is a local-first AI CRM."
```

You should get a sensible response within a few seconds.
## Step 3: Configure OpenClaw to Use Ollama
OpenClaw reads its model configuration from `~/.openclaw/config.json` (or the equivalent path on your system). You need to point it at your local Ollama endpoint.
Open your OpenClaw config:

```bash
openclaw config
```

Set the model provider to `ollama` and point it at your local server:

```json
{
  "model": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434/v1",
    "model": "llama3.2",
    "apiKey": "ollama"
  }
}
```

The `apiKey` field is required by the OpenAI-compatible client but ignored by Ollama — set it to any non-empty string.

Save the config and restart OpenClaw:

```bash
openclaw restart
```

## Step 4: Verify the Connection
Run a quick test to confirm OpenClaw is talking to Ollama:

```bash
openclaw chat "What CRM objects are available in my workspace?"
```

If configured correctly, the response will come from your local Llama model, not any cloud service. You'll notice the response latency depends on your hardware — expect 1-5 seconds on a modern MacBook, faster on machines with dedicated GPUs.
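You can also verify the OpenAI-compatible endpoint directly, without going through OpenClaw at all. This sketch (my addition) assumes the same baseUrl and placeholder key as the config above, and that llama3.2 has been pulled:

```shell
# Direct request to Ollama's OpenAI-compatible chat endpoint.
# The Authorization header is required by the format but ignored by Ollama.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Reply with OK"}]}' \
  || echo "Ollama is not reachable on :11434"
```

A JSON response here means the exact endpoint OpenClaw uses is working, which narrows any remaining problem to the OpenClaw config itself.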
## Step 5: Set Ollama as the Default Model
If you want every DenchClaw operation — skills, agents, queries — to use Ollama, set it as the workspace default:
```bash
openclaw config set defaultModel ollama/llama3.2
```

You can override this per-session or per-command if needed.
## Choosing the Right Model for CRM Tasks
Not all models perform equally well on structured CRM tasks. Here's how the popular ones stack up for DenchClaw use cases:
| Model | Size | Best For | Weaknesses |
|---|---|---|---|
| llama3.2 | 8B | General tasks, writing | Can be verbose |
| mistral | 7B | Fast queries, structured output | Less creative writing |
| qwen2.5 | 7B | Code, data analysis | Slightly less conversational |
| phi4 | 14B | Strong reasoning for its size | Struggles with long context |
| llama3.1:70b | 70B | Best quality | Needs 40GB+ RAM |
For most users on a MacBook Pro M-series, llama3.2 is the sweet spot. If you're on a machine with 64GB+ RAM, consider llama3.1:70b for noticeably better reasoning.
## Running Skills with Ollama
DenchClaw's Skills system works with any configured model. When you run a skill — say, the CRM analyst or browser agent — it will use your Ollama model automatically.
One important note: complex Skills that require multi-step reasoning or long context windows perform better with larger models. If you find a Skill producing poor results, try switching to a bigger model for that session:
```bash
openclaw chat --model ollama/llama3.1:70b "Analyze all my leads from last month"
```

## Troubleshooting Common Issues
**"Connection refused" when OpenClaw tries to reach Ollama**

Ollama isn't running. Start it with `ollama serve` and check it responds at http://localhost:11434/api/tags.
**Responses are very slow**

You may be running CPU-only. Check whether Ollama is using your GPU:

```bash
ollama ps
```

On Apple Silicon, Ollama uses the GPU (via Metal) automatically. On Linux, ensure your CUDA drivers are installed.
**Model returns garbled or empty responses**

Try a smaller or different model. Some models have known issues with certain prompt formats. `mistral` and `llama3.2` are the most reliable for tool use and structured output.
**Context window errors**

Ollama models have a default context window of 2048-4096 tokens. For longer conversations, configure a larger context:

```bash
ollama run llama3.2 --context-length 8192
```

Or set it in your Modelfile:

```
FROM llama3.2
PARAMETER num_ctx 8192
```
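A Modelfile only takes effect once you build a model from it with ollama create. In the sketch below (my addition), the tag llama3.2-8k is an arbitrary name of my choosing, and the guard keeps the snippet safe to run even where Ollama isn't installed:

```shell
# Recreate the two-line Modelfile from above.
printf 'FROM llama3.2\nPARAMETER num_ctx 8192\n' > Modelfile

# Build a named variant with the larger context window baked in.
if command -v ollama >/dev/null 2>&1; then
  ollama create llama3.2-8k -f Modelfile || echo "create failed (is llama3.2 pulled?)"
else
  echo "ollama not installed; skipping create"
fi
```

Once created, point OpenClaw (or `ollama run`) at the new tag instead of plain `llama3.2`.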
## Performance Tips

- **Keep Ollama running persistently** — Use `ollama serve` as a background service (launchd on macOS, systemd on Linux) so there's no cold-start delay.
- **Preload your model** — Run `ollama run llama3.2 ""` to load the model into memory before your first query.
- **Match model size to your RAM** — Rule of thumb: model parameters × 2 = minimum RAM in GB. An 8B model needs ~16GB.
- **Use flash attention** — Some models support `OLLAMA_FLASH_ATTENTION=1` for faster inference on compatible hardware.
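The RAM rule of thumb above is easy to turn into a quick helper. This function is purely illustrative, not part of either tool:

```shell
# Rule of thumb from above: parameter count (in billions) x 2 = minimum RAM in GB.
# Assumes unquantized weights; a Q4-quantized model needs roughly a quarter of this.
min_ram_gb() {
  echo $(( $1 * 2 ))
}

min_ram_gb 8   # prints 16
```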
## The Privacy Guarantee
When OpenClaw uses Ollama, the data flow is entirely local:
```
Your CRM data (DuckDB)
        ↓
OpenClaw agent (local process)
        ↓
Ollama API (localhost:11434)
        ↓
Model inference (your CPU/GPU)
        ↓
Response back to OpenClaw
```
No step in this chain leaves your machine. Your contacts, pipeline data, notes, and queries stay on your hardware.
This is the core promise of DenchClaw's local-first architecture — the AI layer doesn't require cloud infrastructure to function.
## FAQ
**Can I use multiple Ollama models simultaneously in DenchClaw?**

Yes. You can configure different models for different tasks. Set a default in your config and override per-command with `--model ollama/modelname`. Ollama can serve multiple models, though only one runs actively at a time.
**Does Ollama work on Windows with DenchClaw?**
Yes. Ollama supports Windows, and OpenClaw runs on Node.js which is cross-platform. The setup steps are the same — just use the Windows installer from ollama.com.
**What's the minimum hardware to run Ollama with DenchClaw?**
A 7B model like Mistral runs acceptably on 8GB RAM (though 16GB is better). For a smooth experience with complex Skills, 16GB RAM and an M-series chip or NVIDIA GPU is recommended.
**Can I use a quantized model to save RAM?**

Yes. Ollama serves quantized models (Q4, Q5, Q8) by default for most pulls. You can specify the quantization level: `ollama pull llama3.2:8b-instruct-q4_K_M`. Quantized models trade a small amount of quality for significantly less RAM usage.
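To see what quantization actually saves, ollama list reports each pulled model's on-disk size. The guard in this snippet (my addition) just keeps it safe to run when Ollama is absent or stopped:

```shell
# List pulled models with their on-disk sizes; the chosen
# quantization is reflected directly in the size column.
if command -v ollama >/dev/null 2>&1; then
  ollama list || echo "ollama server not running"
else
  echo "ollama not installed"
fi
```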
**Will DenchClaw Skills work the same with a local model?**
Most Skills work well with capable 7B+ models. Skills that require complex reasoning, multi-step planning, or long context (like the full CRM analyst) perform best with 13B+ models. Simpler skills like weather or calendar checks work fine with small models.
Ready to try DenchClaw? Install in one command: `npx denchclaw`. Full setup guide →
