OpenClaw with Llama: Running Open Source LLMs Locally
OpenClaw + Meta's Llama models: run Llama 3 locally with your DenchClaw CRM for a fully open-source, private AI stack. Complete setup guide.
OpenClaw runs with Meta's Llama models out of the box. If you want a fully open-source AI CRM stack — DenchClaw for the agent layer, DuckDB for storage, and Llama for the model — you can have it running in under 30 minutes. This guide covers every step from downloading Llama to running your first CRM query.
Why Llama with DenchClaw?#
Meta's Llama series has become the de facto foundation for open-source LLM deployment:
- Truly open weights: Llama 3 is available under Meta's community license for research and commercial use
- Massive ecosystem: More fine-tunes, LoRAs, and quantized variants than any other open model family
- Strong performance: Llama 3.1 8B and 70B are competitive with much larger proprietary models
- Hardware flexibility: The 8B model runs on consumer hardware; 70B requires a workstation
- Active development: Meta releases frequent improvements and new capabilities
For DenchClaw users who care about keeping their stack fully open-source and auditable, Llama is the natural choice.
Llama Variants: Which One to Use#
Meta releases Llama in several configurations. Here's what matters for DenchClaw:
| Model | Parameters | RAM Needed (Q4) | Best For |
|---|---|---|---|
| Llama 3.2 1B | 1B | ~1GB | Very low-resource machines |
| Llama 3.2 3B | 3B | ~3GB | Low-RAM laptops, light use |
| Llama 3.1 8B | 8B | ~6GB | Most laptops, daily use |
| Llama 3.1 70B | 70B | ~40GB | Workstations, best quality |
| Llama 3.1 405B | 405B | 200GB+ | Cloud inference only |
Start with Llama 3.1 8B for most setups. It handles the vast majority of DenchClaw tasks well. Note that the Llama 3.2 line's text models come in 1B and 3B only — they're the fallback for low-RAM machines, not an upgrade over 3.1 8B.
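The table above can be folded into a tiny shell helper for scripted setups. This is an illustrative sketch — pick_model and its thresholds are not part of any tool, and the cutoffs assume Q4 quantization:

```shell
# Illustrative helper: choose an Ollama model tag from available RAM (GB).
pick_model() {
  local ram_gb=$1
  if [ "$ram_gb" -ge 48 ]; then
    echo "llama3.1:70b"     # workstation-class, best quality
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.1"         # 8B daily driver
  else
    echo "llama3.2:3b"      # low-RAM fallback
  fi
}

pick_model 16   # prints llama3.1
```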
Step 1: Install Ollama#
Ollama is the simplest way to run Llama locally. It handles model downloads, GPU acceleration, and the local API server.
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com
```

Start Ollama:

```bash
ollama serve
```

Step 2: Pull a Llama Model#
```bash
# Recommended: Llama 3.1 8B (fast, capable)
ollama pull llama3.1

# Lightweight: Llama 3.2 3B for low-RAM machines
ollama pull llama3.2

# For more capable reasoning (needs 40GB+ RAM)
ollama pull llama3.1:70b
```

Verify the model downloaded and works:

```bash
ollama run llama3.1 "Hello, are you working?"
```

Step 3: Configure OpenClaw to Use Llama#
Open your OpenClaw config:
```bash
openclaw config
```

Add this model configuration:

```json
{
  "model": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434/v1",
    "model": "llama3.1",
    "apiKey": "ollama"
  }
}
```

Apply and restart:

```bash
openclaw restart
```

Step 4: Set Llama as Your Default#
```bash
openclaw config set defaultModel ollama/llama3.1
```

Now every DenchClaw operation — Skills, queries, agent tasks — uses your local Llama model.
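The baseUrl you configured points at Ollama's OpenAI-compatible API, so you can sanity-check the same endpoint OpenClaw talks to. This sketch only builds and prints the request body; the actual curl call is left commented out because it needs ollama serve running:

```shell
# Request body for Ollama's OpenAI-compatible chat endpoint.
payload='{"model": "llama3.1", "messages": [{"role": "user", "content": "Say hello."}]}'

# With `ollama serve` running, send it like so:
# curl -s http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$payload"

echo "$payload"
```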
Step 5: Run Your First CRM Query#
```bash
openclaw chat "How many contacts are in my CRM and what's the breakdown by status?"
```

Watch the response come from your local Llama model. On Apple Silicon M-series chips, expect roughly 20-40 tokens/second from the 8B model — fast enough for interactive use.
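Those token rates translate directly into wait time. A quick back-of-envelope, with gen_seconds as a throwaway helper (integer math, so it truncates):

```shell
# Rough wait time for a generation: tokens ÷ tokens-per-second.
gen_seconds() {
  local tokens=$1 tok_per_s=$2
  echo $(( tokens / tok_per_s ))
}

gen_seconds 600 30   # a 600-token answer at 30 tok/s: prints 20 (seconds)
```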
Getting the Most from Llama's Instruction Models#
Llama comes in two flavors: base models and instruct models. Always use instruct models for DenchClaw. The base models are pre-trained text completers; the instruct variants are fine-tuned to follow directions.
When using Ollama, the default llama3.1 pull fetches the instruct version. To be explicit about size and quantization:

```bash
ollama pull llama3.1:8b-instruct-q4_K_M
```

The q4_K_M suffix specifies the quantization level — a 4-bit quantization with medium quality. This is the standard choice balancing size and capability.
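The leading number in a tag like q4_K_M is the bit width, which you can pull out mechanically. quant_bits is a hypothetical helper for scripts, not an Ollama command:

```shell
# Hypothetical helper: extract the bit width from a quantization tag.
quant_bits() {
  printf '%s\n' "$1" | sed -E 's/^[qQ]([0-9]+).*/\1/'
}

quant_bits q4_K_M   # prints 4
quant_bits Q8_0     # prints 8
```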
Quantization Levels Explained#
Llama models are distributed in multiple quantization levels. More aggressive quantization (fewer bits per weight) means a smaller file and slightly lower quality:
| Quantization | Size (8B) | Quality | Recommended For |
|---|---|---|---|
| Q8_0 | ~9GB | Best | If you have the RAM |
| Q6_K | ~7GB | Excellent | 16GB+ RAM machines |
| Q5_K_M | ~6GB | Very good | 12GB+ RAM |
| Q4_K_M | ~5GB | Good | 8GB+ RAM (standard) |
| Q3_K_M | ~4GB | Acceptable | 8GB constrained machines |
| Q2_K | ~3GB | Poor | Not recommended |
For most DenchClaw use, Q4_K_M is the right choice. Only drop to a lower level if you're RAM-constrained; going below Q3 noticeably hurts instruction-following quality.
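The sizes in the table follow from simple arithmetic: file size is roughly parameters × bits-per-weight ÷ 8, plus overhead for embeddings and metadata. A sketch (integer math; billions of parameters in, GB out):

```shell
# Rough model file size before overhead: params (billions) × bits ÷ 8 → GB.
est_size_gb() {
  local params_b=$1 bits=$2
  echo $(( params_b * bits / 8 ))
}

est_size_gb 8 4    # prints 4 — Q4_K_M's ~5GB is this plus overhead
est_size_gb 70 4   # prints 35 — why 70B at Q4 wants ~40GB of RAM
```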
Optimizing Llama for CRM Agent Tasks#
Extend the Context Window#
Llama 3.1 and later models support up to 128K tokens of context, but Ollama defaults to a much smaller window (2048 tokens in many builds). For DenchClaw's agent workflows, you want more:
Create a Modelfile:
```bash
cat > ~/llama-crm.modelfile << 'EOF'
FROM llama3.1
PARAMETER num_ctx 8192
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.1
EOF
```

Create a custom model:

```bash
ollama create llama-crm -f ~/llama-crm.modelfile
```

Use it in OpenClaw:

```bash
openclaw config set model ollama/llama-crm
```

Lower Temperature for Structured Tasks#
CRM tasks benefit from deterministic, structured output. Lower temperature reduces randomness:
```
PARAMETER temperature 0.1
```
For creative tasks (writing emails, summaries), increase to 0.6-0.8.
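If you script model switching, the guidance above can live in one place. temp_for_task and its task categories are illustrative only, not a DenchClaw feature:

```shell
# Illustrative mapping from task type to sampling temperature.
temp_for_task() {
  case "$1" in
    extract|classify|json) echo "0.1" ;;   # structured, deterministic output
    email|summary)         echo "0.7" ;;   # creative writing
    *)                     echo "0.2" ;;   # general CRM default
  esac
}

temp_for_task json    # prints 0.1
temp_for_task email   # prints 0.7
```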
Enable GPU Acceleration#
Ollama uses GPU automatically on supported hardware. Verify it's active:
```bash
ollama ps
```

Look for the GPU utilization column. On Apple Silicon, Ollama uses the GPU via Metal; on NVIDIA cards, it uses CUDA. If you're running CPU-only despite having a compatible GPU, check your driver installation.
Running Llama 70B for Complex Analysis#
When 8B isn't enough — for deep pipeline analysis, nuanced customer research, or complex reasoning — Llama 3.1 70B is a significant step up and still fully local.
Requirements: 40GB RAM minimum for Q4 quantization. A Mac Studio or MacBook Pro with 64GB+ runs it well.
```bash
ollama pull llama3.1:70b
openclaw chat --model ollama/llama3.1:70b "Analyze my entire sales pipeline and create a prioritized action plan"
```

The quality difference is substantial for complex tasks. Many users keep Llama 8B as their default and switch to 70B for strategic analysis sessions.
The Fully Open-Source Stack#
With Llama running locally, your entire DenchClaw setup can be open-source and auditable:
| Layer | Tool | License |
|---|---|---|
| Agent framework | OpenClaw | MIT |
| CRM layer | DenchClaw | MIT |
| Database | DuckDB | MIT |
| Model runner | Ollama | MIT |
| Model | Llama 3.x | Meta Community License |
Every component is inspectable. You can audit the code, verify data flows, and know exactly what's happening on your machine.
This is significantly different from cloud CRM tools where you're trusting vendor systems you can't inspect. With DenchClaw's architecture, trust is earned by design — there's nothing to trust blindly because everything is visible.
Skills with Local Llama#
DenchClaw's Skills system works with local Llama models. A few notes:
Simple Skills (weather, calendar, basic CRM queries): work fine with Llama 8B.
Complex Skills (multi-step research, bulk enrichment, data analysis): perform better with Llama 70B or a fine-tuned 8B variant.
Skills requiring reliable JSON output: configure a system prompt that explicitly requests JSON output format. Llama 3.x is generally good at this but benefits from explicit instructions.
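One way to make that robust in practice is to validate the model's reply before a Skill consumes it. A sketch that uses python3 (assumed on PATH) for the JSON parse; the response variable stands in for actual model output:

```shell
# Guard sketch: only accept the reply if it parses as JSON.
response='{"contacts": 42, "by_status": {"lead": 30, "customer": 12}}'

if printf '%s' "$response" | python3 -c 'import json,sys; json.load(sys.stdin)' 2>/dev/null; then
  echo "valid json"
else
  echo "retry with an explicit JSON instruction"
fi
```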
Troubleshooting#
Llama gives repetitive or looping output
Add PARAMETER repeat_penalty 1.1 to your Modelfile. This reduces the repetition loops some Llama models fall into on long generations.
Slow token generation
Check whether the GPU is being used:

```bash
ollama ps
```

If it shows 0% GPU, ensure Ollama has permission to access your GPU. On macOS this should work automatically; on Linux, verify your CUDA drivers.
Model outputs wrong format for Skills
Try adding an explicit system prompt. Create a Modelfile with:

```
SYSTEM """You are a precise assistant. Always respond with valid JSON when asked for structured output. Never add explanations around JSON blocks."""
```
RAM usage too high
Switch to a lower quantization (Q4_K_M → Q3_K_M) or use the 3B model. Also check if other applications are consuming RAM.
FAQ#
Can I fine-tune Llama on my own CRM data?
Yes, though this is advanced. You can create a LoRA fine-tune using tools like Unsloth or Axolotl on your own data to specialize the model for your CRM domain. The resulting fine-tune can be served via Ollama.
Is Meta's community license compatible with commercial use?
Generally yes for companies under 700M monthly active users. Read the Llama 3 license for specifics. For most DenchClaw users running an internal CRM, commercial use is permitted.
How often does Meta release new Llama versions?
Meta has released major versions roughly annually. Incremental updates and fine-tunes from the community appear continuously. Follow Ollama's model library for the latest available versions.
Can I use Llama GGUF files with other runners?
GGUF files work across runners. If you have a Llama GGUF file, you can load it in LM Studio or Ollama regardless of origin. See the LM Studio guide for loading custom GGUF files.
What's the difference between Llama 3.1 and 3.2?
Llama 3.2 adds vision models (image understanding) and very small text models (1B and 3B), plus improvements to the instruct fine-tuning. For text-only CRM tasks, the difference is modest; Llama 3.1 70B remains strong for high-complexity reasoning.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
