
DuckDB in Docker: Containerized Local Analytics

Run DuckDB in Docker for portable, reproducible analytics. Here's how to containerize DuckDB with a REST API, mount persistent volumes, and deploy to production.

Mark Rachapoom · 5 min read

DuckDB runs perfectly in Docker. There's no server process to configure, no cluster to manage — just a single binary and your database file. Docker gives you reproducibility, portability, and easy deployment.

Here's how to containerize DuckDB for real workloads.

Basic DuckDB CLI Container

The simplest starting point: a container with the DuckDB CLI:

# Dockerfile.cli
FROM ubuntu:22.04
 
RUN apt-get update && apt-get install -y curl unzip ca-certificates && \
    curl -fsSL https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip \
    -o duckdb.zip && \
    unzip duckdb.zip && \
    mv duckdb /usr/local/bin/ && \
    rm duckdb.zip && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
 
VOLUME /data
WORKDIR /data
 
ENTRYPOINT ["duckdb"]
# Build
docker build -t duckdb-cli -f Dockerfile.cli .
 
# Run with a mounted volume
docker run -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb \
  "SELECT COUNT(*) FROM read_csv_auto('events.csv');"
 
# Interactive mode
docker run -it -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb
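The same pattern handles one-off transforms. A quick sketch (the file names are placeholders) that converts a CSV to Parquet inside the mounted volume:

# One-off transform: CSV in, Parquet out (paths are examples)
docker run -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb \
  "COPY (SELECT * FROM read_csv_auto('events.csv')) TO 'events.parquet' (FORMAT PARQUET);"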

DuckDB REST API in Docker

For a queryable DuckDB service:

# Dockerfile
FROM python:3.11-slim
 
WORKDIR /app
 
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
 
# Copy API code
COPY api.py .
 
# Create data directory
RUN mkdir -p /data
 
VOLUME /data
 
ENV DUCKDB_PATH=/data/analytics.duckdb
ENV PORT=8000
 
EXPOSE 8000
 
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
# requirements.txt
fastapi==0.110.0
uvicorn==0.29.0
duckdb==0.10.0
pandas==2.2.0
pydantic==2.6.0
# api.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import duckdb
import os
 
app = FastAPI(title="DuckDB API")
 
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
 
DB_PATH = os.environ.get('DUCKDB_PATH', '/data/analytics.duckdb')
# Read-only connection shared by the app; per-request cursors keep it thread-safe
con = duckdb.connect(DB_PATH, read_only=True)
 
class QueryRequest(BaseModel):
    sql: str
 
@app.post("/query")
def query(req: QueryRequest):
    try:
        # cursor() gives this worker thread its own view of the shared connection
        result = con.cursor().execute(req.sql).fetchdf()
        return {"columns": result.columns.tolist(), "rows": result.values.tolist()}
    except Exception as e:
        raise HTTPException(400, str(e))
 
@app.get("/health")
def health():
    con.execute("SELECT 1")
    return {"status": "ok", "db": DB_PATH}
# Build
docker build -t duckdb-api .
 
# Run with persistent volume
docker run -d \
  -p 8000:8000 \
  -v duckdb-data:/data \
  -e DUCKDB_PATH=/data/analytics.duckdb \
  --name duckdb-api \
  duckdb-api
 
# Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT 42 AS answer"}'

Docker Compose: DuckDB + Dashboard

Combine DuckDB with a dashboard (Grafana can read from the REST API via a JSON API datasource plugin):

# docker-compose.yml
version: '3.8'
 
services:
  duckdb-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - duckdb-data:/data
    environment:
      DUCKDB_PATH: /data/analytics.duckdb
      API_KEY: ${API_KEY}  # consumed by the auth sketch shown after this block
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
 
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    depends_on:
      - duckdb-api
    restart: unless-stopped
 
volumes:
  duckdb-data:
  grafana-data:
# Start everything
API_KEY=secret GRAFANA_PASSWORD=admin docker-compose up -d
 
# Check logs
docker-compose logs -f duckdb-api
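The compose file passes an API_KEY that api.py as written doesn't consume yet. A minimal sketch of how you might wire it up, assuming a simple X-API-Key header scheme (the header name and helper are illustrative, not part of the code above):

# Addition to api.py: reject /query calls without the right key
import os
from fastapi import Depends, Header, HTTPException

API_KEY = os.environ.get("API_KEY")

def require_api_key(x_api_key: str | None = Header(default=None)):
    # Reject the request unless the X-API-Key header matches the configured key
    if API_KEY and x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="invalid API key")

# Then protect the endpoint:
# @app.post("/query", dependencies=[Depends(require_api_key)])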

ETL Container: Load Data into DuckDB

For a data loading pipeline:

# Dockerfile.etl
FROM python:3.11-slim
 
WORKDIR /app
 
RUN pip install duckdb pandas httpx boto3
 
COPY etl.py .
 
CMD ["python", "etl.py"]
# etl.py
import duckdb
import os
 
DB_PATH = os.environ['DUCKDB_PATH']
SOURCE_URL = os.environ['SOURCE_URL']
 
print(f"Loading data from {SOURCE_URL} into {DB_PATH}")
 
# Write connection (not read-only)
con = duckdb.connect(DB_PATH)
 
# httpfs lets DuckDB read directly from S3 or HTTP
con.execute("INSTALL httpfs; LOAD httpfs;")
 
# Prepared parameters only work on a single statement, so run the CTAS separately
con.execute("""
    CREATE OR REPLACE TABLE events AS
    SELECT * FROM read_parquet(?)
""", [SOURCE_URL])
 
count = con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"Loaded {count:,} rows")
con.close()
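To run the loader outside Kubernetes, build it and point it at the same named volume the API uses. The source URL is a placeholder, and s3:// sources additionally need credentials configured in DuckDB:

# Build and run the ETL once against the shared volume
docker build -t duckdb-etl -f Dockerfile.etl .
 
# Stop the API container first: DuckDB's write lock is exclusive across processes
docker run --rm \
  -v duckdb-data:/data \
  -e DUCKDB_PATH=/data/analytics.duckdb \
  -e SOURCE_URL=https://example.com/events.parquet \
  duckdb-etl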
# Run ETL as a cron job in Kubernetes
apiVersion: batch/v1
kind: CronJob
metadata:
  name: duckdb-etl
spec:
  schedule: "0 * * * *"  # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: etl
            image: my-registry/duckdb-etl:latest
            env:
            - name: DUCKDB_PATH
              value: /data/analytics.duckdb
            - name: SOURCE_URL
              value: s3://my-bucket/events/latest.parquet
            volumeMounts:
            - name: duckdb-data
              mountPath: /data
          volumes:
          - name: duckdb-data
            persistentVolumeClaim:
              claimName: duckdb-data-pvc
          restartPolicy: OnFailure
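To test the job without waiting for the top of the hour, trigger a one-off run from the CronJob's template:

# Fire a manual run and follow its logs
kubectl create job --from=cronjob/duckdb-etl duckdb-etl-manual
kubectl logs job/duckdb-etl-manual -f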

Persistent Volume Considerations

DuckDB stores all of its state in a single file, so your Docker volume strategy matters:

# Named volume (recommended for production)
volumes:
  duckdb-data:
    driver: local
 
# Bind mount (for development — easier to access from host)
volumes:
  - ./data:/data
 
# NFS mount (for multi-host access — read-only replicas only)
volumes:
  duckdb-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs-server.internal,ro
      device: ":/exports/duckdb"

Important: DuckDB's single-writer constraint applies across containers too. Never mount the same DuckDB file in write mode from two containers simultaneously.
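To make that constraint hard to violate, mount the volume read-only (:ro) in every reader container; the api.py above already opens the file with read_only=True:

# Read replica: read-only volume mount plus DuckDB's read-only connection
docker run -d -p 8001:8000 \
  -v duckdb-data:/data:ro \
  -e DUCKDB_PATH=/data/analytics.duckdb \
  duckdb-api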

DuckDB + DenchClaw in Docker

DenchClaw can be run in Docker with its DuckDB database stored on a persistent volume:

# Run DenchClaw with persistent workspace
docker run -d \
  -p 19001:19001 \
  -p 3100:3100 \
  -v denchclaw-workspace:/root/.openclaw-dench/workspace \
  --name denchclaw \
  denchclaw/denchclaw:latest

The DuckDB file lives at /root/.openclaw-dench/workspace/workspace.duckdb inside the container, persisted via the named volume.
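You can poke at that file with the duckdb-cli image from earlier. Stop DenchClaw first, or mount the volume read-only and open the database read-only, since the single-writer rule applies here too:

# Inspect DenchClaw's workspace DB with the CLI image built above
docker run --rm -v denchclaw-workspace:/data:ro duckdb-cli \
  -readonly /data/workspace.duckdb "SHOW TABLES;"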

Multi-Stage Build (Smaller Image)

# Multi-stage: smaller final image
FROM python:3.11 AS builder
 
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
 
FROM python:3.11-slim AS runtime
 
# Copy installed packages
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin/uvicorn /usr/local/bin/uvicorn
 
WORKDIR /app
COPY api.py .
 
RUN mkdir -p /data
VOLUME /data
 
ENV DUCKDB_PATH=/data/analytics.duckdb
 
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

This reduces the image size from ~1.2GB to ~200MB.
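Exact sizes vary with the base image and dependency versions; check yours after building:

docker build -t duckdb-api:slim .
docker images duckdb-api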

Frequently Asked Questions

Can I run multiple DuckDB containers sharing the same database?

For reads: yes, mount the volume read-only in each container. For writes: only one container can write at a time — use a separate writer container.

How do I back up a DuckDB volume in Docker?

# Backup (stop writers first so the file and WAL are copied in a consistent state)
docker run --rm -v duckdb-data:/data -v $(pwd)/backups:/backups \
  ubuntu tar czf /backups/duckdb-backup-$(date +%Y%m%d).tar.gz /data
 
# Restore
docker run --rm -v duckdb-data:/data -v $(pwd)/backups:/backups \
  ubuntu tar xzf /backups/duckdb-backup-20260326.tar.gz -C /

What resources does a DuckDB Docker container need?

DuckDB uses CPU and RAM aggressively. Allocate at least 2 CPU cores and 4GB RAM for production analytics workloads. Set memory_limit in DuckDB to 80% of the container's memory limit.
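For example, with a 4GB container you'd cap DuckDB around 3.2GB (the table name here is a placeholder):

# Cap the container, then cap DuckDB ~20% below it
docker run --rm --memory=4g --cpus=2 \
  -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb \
  "SET memory_limit='3200MB'; SELECT count(*) FROM events;"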

Can I run DuckDB in Docker on Apple Silicon?

Yes. DuckDB publishes linux-aarch64 CLI builds, so swap duckdb_cli-linux-amd64.zip in Dockerfile.cli for duckdb_cli-linux-aarch64.zip, or pass --platform linux/amd64 to run the x86 image under emulation.

How do I expose DuckDB metrics to Prometheus?

Wrap the DuckDB API with a /metrics endpoint that reads DuckDB's built-in introspection (for example PRAGMA database_size, which reports block usage, WAL size, and memory usage) and emits it in Prometheus text format.
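A minimal sketch you could add to api.py, reusing the app and con objects defined there. It assumes PRAGMA database_size's column names, which can shift between DuckDB versions:

# /metrics endpoint sketch for api.py
from fastapi.responses import PlainTextResponse

@app.get("/metrics", response_class=PlainTextResponse)
def metrics() -> str:
    row = con.execute("PRAGMA database_size").fetchdf().iloc[0]
    used_bytes = int(row["used_blocks"]) * int(row["block_size"])
    # Prometheus text exposition: "# TYPE name gauge" then "name value"
    return "\n".join([
        "# TYPE duckdb_used_bytes gauge",
        f"duckdb_used_bytes {used_bytes}",
        "# TYPE duckdb_free_blocks gauge",
        f"duckdb_free_blocks {int(row['free_blocks'])}",
    ]) + "\n"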

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
