OpenClaw with Llama: Running Open Source LLMs Locally
OpenClaw + Meta's Llama models: run Llama 3 locally with your DenchClaw CRM for a fully open-source, private AI stack. Complete setup guide.
OpenClaw runs with Meta's Llama models out of the box. If you want a fully open-source AI CRM stack — DenchClaw for the agent layer, DuckDB for storage, and Llama for the model — you can have it running in under 30 minutes. This guide covers every step from downloading Llama to running your first CRM query.
Why Llama with DenchClaw?#
Meta's Llama series has become the de facto foundation for open-source LLM deployment:
- Truly open weights: Llama 3 is available under Meta's community license for research and commercial use
- Massive ecosystem: More fine-tunes, LoRAs, and quantized variants than any other open model family
- Strong performance: Llama 3.1 8B and 70B are competitive with much larger proprietary models
- Hardware flexibility: The 8B model runs on consumer hardware; 70B requires a workstation
- Active development: Meta releases frequent improvements and new capabilities
For DenchClaw users who care about keeping their stack fully open-source and auditable, Llama is the natural choice.
Llama Variants: Which One to Use#
Meta releases Llama in several configurations. Here's what matters for DenchClaw:
| Model | Parameters | RAM Needed (Q4) | Best For |
|---|---|---|---|
| Llama 3.2 1B | 1B | ~1GB | Very low-resource machines |
| Llama 3.2 3B | 3B | ~3GB | Low-RAM laptops, light use |
| Llama 3.1 8B | 8B | ~6GB | Most laptops, daily use |
| Llama 3.1 70B | 70B | ~40GB | Workstations, best quality |
| Llama 3.1 405B | 405B | 200GB+ | Cloud inference only |
Start with Llama 3.1 8B for most setups. It handles the vast majority of DenchClaw tasks well. Note that the Llama 3.2 line's text models come in 1B and 3B only — they're the fallback for low-RAM machines, not an upgrade over 3.1 8B.
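The table above can be folded into a tiny shell helper for scripted setups. This is an illustrative sketch — pick_model and its thresholds are not part of any tool, and the cutoffs assume Q4 quantization:

```shell
# Illustrative helper: choose an Ollama model tag from available RAM (GB).
pick_model() {
  local ram_gb=$1
  if [ "$ram_gb" -ge 48 ]; then
    echo "llama3.1:70b"     # workstation-class, best quality
  elif [ "$ram_gb" -ge 8 ]; then
    echo "llama3.1"         # 8B daily driver
  else
    echo "llama3.2:3b"      # low-RAM fallback
  fi
}

pick_model 16   # prints llama3.1
```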
Step 1: Install Ollama#
Ollama is the simplest way to run Llama locally. It handles model downloads, GPU acceleration, and the local API server.
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com
```

Start Ollama:

```bash
ollama serve
```

Step 2: Pull a Llama Model#
```bash
# Recommended: Llama 3.1 8B (fast, capable)
ollama pull llama3.1

# Lightweight: Llama 3.2 3B for low-RAM machines
ollama pull llama3.2

# For more capable reasoning (needs 40GB+ RAM)
ollama pull llama3.1:70b
```

Verify the model downloaded and works:

```bash
ollama run llama3.1 "Hello, are you working?"
```

Step 3: Configure OpenClaw to Use Llama#
Open your OpenClaw config:
```bash
openclaw config
```

Add this model configuration:

```json
{
  "model": {
    "provider": "ollama",
    "baseUrl": "http://localhost:11434/v1",
    "model": "llama3.1",
    "apiKey": "ollama"
  }
}
```

Apply and restart:

```bash
openclaw restart
```

Step 4: Set Llama as Your Default#
```bash
openclaw config set defaultModel ollama/llama3.1
```

Now every DenchClaw operation — Skills, queries, agent tasks — uses your local Llama model.
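The baseUrl you configured points at Ollama's OpenAI-compatible API, so you can sanity-check the same endpoint OpenClaw talks to. This sketch only builds and prints the request body; the actual curl call is left commented out because it needs ollama serve running:

```shell
# Request body for Ollama's OpenAI-compatible chat endpoint.
payload='{"model": "llama3.1", "messages": [{"role": "user", "content": "Say hello."}]}'

# With `ollama serve` running, send it like so:
# curl -s http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$payload"

echo "$payload"
```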
Step 5: Run Your First CRM Query#
```bash
openclaw chat "How many contacts are in my CRM and what's the breakdown by status?"
```

Watch the response come from your local Llama model. On Apple Silicon M-series chips, expect roughly 20-40 tokens/second from the 8B model — fast enough for interactive use.
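Those token rates translate directly into wait time. A quick back-of-envelope, with gen_seconds as a throwaway helper (integer math, so it truncates):

```shell
# Rough wait time for a generation: tokens ÷ tokens-per-second.
gen_seconds() {
  local tokens=$1 tok_per_s=$2
  echo $(( tokens / tok_per_s ))
}

gen_seconds 600 30   # a 600-token answer at 30 tok/s: prints 20 (seconds)
```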
Getting the Most from Llama's Instruction Models#
Llama comes in two flavors: base models and instruct models. Always use instruct models for DenchClaw. The base models are pre-trained text completers; the instruct variants are fine-tuned to follow directions.
When using Ollama, the default llama3.1 pull fetches the instruct version. To be explicit about size and quantization:

```bash
ollama pull llama3.1:8b-instruct-q4_K_M
```

The q4_K_M suffix specifies the quantization level — a 4-bit quantization with medium quality. This is the standard choice balancing size and capability.
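The leading number in a tag like q4_K_M is the bit width, which you can pull out mechanically. quant_bits is a hypothetical helper for scripts, not an Ollama command:

```shell
# Hypothetical helper: extract the bit width from a quantization tag.
quant_bits() {
  printf '%s\n' "$1" | sed -E 's/^[qQ]([0-9]+).*/\1/'
}

quant_bits q4_K_M   # prints 4
quant_bits Q8_0     # prints 8
```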
Quantization Levels Explained#
Llama models are distributed in multiple quantization levels. More aggressive quantization (fewer bits per weight) means a smaller file and slightly lower quality:
| Quantization | Size (8B) | Quality | Recommended For |
|---|---|---|---|
| Q8_0 | ~9GB | Best | If you have the RAM |
| Q6_K | ~7GB | Excellent | 16GB+ RAM machines |
| Q5_K_M | ~6GB | Very good | 12GB+ RAM |
| Q4_K_M | ~5GB | Good | 8GB+ RAM (standard) |
| Q3_K_M | ~4GB | Acceptable | 8GB constrained machines |
| Q2_K | ~3GB | Poor | Not recommended |
For most DenchClaw use, Q4_K_M is the right choice. Only drop to a lower level if you're RAM-constrained; going below Q3 noticeably hurts instruction-following quality.
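The sizes in the table follow from simple arithmetic: file size is roughly parameters × bits-per-weight ÷ 8, plus overhead for embeddings and metadata. A sketch (integer math; billions of parameters in, GB out):

```shell
# Rough model file size before overhead: params (billions) × bits ÷ 8 → GB.
est_size_gb() {
  local params_b=$1 bits=$2
  echo $(( params_b * bits / 8 ))
}

est_size_gb 8 4    # prints 4 — Q4_K_M's ~5GB is this plus overhead
est_size_gb 70 4   # prints 35 — why 70B at Q4 wants ~40GB of RAM
```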
Optimizing Llama for CRM Agent Tasks#
Extend the Context Window#
Llama 3.1 and later models support up to 128K tokens of context, but Ollama defaults to a much smaller window (2048 tokens in many builds). For DenchClaw's agent workflows, you want more:
Create a Modelfile:
```bash
cat > ~/llama-crm.modelfile << 'EOF'
FROM llama3.1
PARAMETER num_ctx 8192
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.1
EOF
```

Create a custom model:

```bash
ollama create llama-crm -f ~/llama-crm.modelfile
```

Use it in OpenClaw:

```bash
openclaw config set model ollama/llama-crm
```

Lower Temperature for Structured Tasks#
CRM tasks benefit from deterministic, structured output. Lower temperature reduces randomness:
```
PARAMETER temperature 0.1
```
For creative tasks (writing emails, summaries), increase to 0.6-0.8.
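If you script model switching, the guidance above can live in one place. temp_for_task and its task categories are illustrative only, not a DenchClaw feature:

```shell
# Illustrative mapping from task type to sampling temperature.
temp_for_task() {
  case "$1" in
    extract|classify|json) echo "0.1" ;;   # structured, deterministic output
    email|summary)         echo "0.7" ;;   # creative writing
    *)                     echo "0.2" ;;   # general CRM default
  esac
}

temp_for_task json    # prints 0.1
temp_for_task email   # prints 0.7
```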
Enable GPU Acceleration#
Ollama uses GPU automatically on supported hardware. Verify it's active:
```bash
ollama ps
```

Look for the GPU utilization column. On Apple Silicon, Ollama uses the GPU via Metal; on NVIDIA cards, it uses CUDA. If you're running CPU-only despite having a compatible GPU, check your driver installation.
Running Llama 70B for Complex Analysis#
When 8B isn't enough — for deep pipeline analysis, nuanced customer research, or complex reasoning — Llama 3.1 70B is a significant step up and still fully local.
Requirements: 40GB RAM minimum for Q4 quantization. A Mac Studio or MacBook Pro with 64GB+ runs it well.
```bash
ollama pull llama3.1:70b
openclaw chat --model ollama/llama3.1:70b "Analyze my entire sales pipeline and create a prioritized action plan"
```

The quality difference is substantial for complex tasks. Many users keep Llama 8B as their default and switch to 70B for strategic analysis sessions.
The Fully Open-Source Stack#
With Llama running locally, your entire DenchClaw setup can be open-source and auditable:
| Layer | Tool | License |
|---|---|---|
| Agent framework | OpenClaw | MIT |
| CRM layer | DenchClaw | MIT |
| Database | DuckDB | MIT |
| Model runner | Ollama | MIT |
| Model | Llama 3.x | Meta Community License |
Every component is inspectable. You can audit the code, verify data flows, and know exactly what's happening on your machine.
This is significantly different from cloud CRM tools where you're trusting vendor systems you can't inspect. With DenchClaw's architecture, trust is earned by design — there's nothing to trust blindly because everything is visible.
Skills with Local Llama#
DenchClaw's Skills system works with local Llama models. A few notes:
Simple Skills (weather, calendar, basic CRM queries): work fine with Llama 8B.
Complex Skills (multi-step research, bulk enrichment, data analysis): perform better with Llama 70B or a fine-tuned 8B variant.
Skills requiring reliable JSON output: configure a system prompt that explicitly requests JSON output format. Llama 3.x is generally good at this but benefits from explicit instructions.
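One way to make that robust in practice is to validate the model's reply before a Skill consumes it. A sketch that uses python3 (assumed on PATH) for the JSON parse; the response variable stands in for actual model output:

```shell
# Guard sketch: only accept the reply if it parses as JSON.
response='{"contacts": 42, "by_status": {"lead": 30, "customer": 12}}'

if printf '%s' "$response" | python3 -c 'import json,sys; json.load(sys.stdin)' 2>/dev/null; then
  echo "valid json"
else
  echo "retry with an explicit JSON instruction"
fi
```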
Troubleshooting#
Llama gives repetitive or looping output
Add PARAMETER repeat_penalty 1.1 to your Modelfile. This reduces the repetition loops some Llama models fall into on long generations.
Slow token generation
Check whether the GPU is being used:

```bash
ollama ps
```

If it shows 0% GPU, ensure Ollama has permission to access your GPU. On macOS this should work automatically; on Linux, verify your CUDA drivers.
Model outputs wrong format for Skills
Try adding an explicit system prompt. Create a Modelfile with:

```
SYSTEM """You are a precise assistant. Always respond with valid JSON when asked for structured output. Never add explanations around JSON blocks."""
```
RAM usage too high
Switch to a lower quantization (Q4_K_M → Q3_K_M) or use the 3B model. Also check if other applications are consuming RAM.
FAQ#
Can I fine-tune Llama on my own CRM data?
Yes, though this is advanced. You can create a LoRA fine-tune using tools like Unsloth or Axolotl on your own data to specialize the model for your CRM domain. The resulting fine-tune can be served via Ollama.
Is Meta's community license compatible with commercial use?
Generally yes for companies under 700M monthly active users. Read the Llama 3 license for specifics. For most DenchClaw users running an internal CRM, commercial use is permitted.
How often does Meta release new Llama versions?
Meta has released major versions roughly annually. Incremental updates and fine-tunes from the community appear continuously. Follow Ollama's model library for the latest available versions.
Can I use Llama GGUF files with other runners?
GGUF files work across runners. If you have a Llama GGUF file, you can load it in LM Studio or Ollama regardless of origin. See the LM Studio guide for loading custom GGUF files.
What's the difference between Llama 3.1 and 3.2?
Llama 3.2 adds vision models (image understanding) and very small text models (1B and 3B), plus improvements to the instruct fine-tuning. For text-only CRM tasks, the difference is modest; Llama 3.1 70B remains strong for high-complexity reasoning.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
