DuckDB in Docker: Containerized Local Analytics
Run DuckDB in Docker for portable, reproducible analytics. Here's how to containerize DuckDB with a REST API, mount persistent volumes, and deploy to production.
DuckDB runs perfectly in Docker. There's no server process to configure, no cluster to manage — just a single binary and your database file. Docker gives you reproducibility, portability, and easy deployment.
Here's how to containerize DuckDB for real workloads.
Basic DuckDB CLI Container
The simplest starting point: a container with the DuckDB CLI:
# Dockerfile.cli
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl unzip ca-certificates && \
    curl -fsSL https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.zip \
      -o duckdb.zip && \
    unzip duckdb.zip && \
    mv duckdb /usr/local/bin/ && \
    rm duckdb.zip && \
    apt-get clean
VOLUME /data
WORKDIR /data
ENTRYPOINT ["duckdb"]
# Build
docker build -t duckdb-cli -f Dockerfile.cli .
# Run with a mounted volume
docker run -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb \
"SELECT COUNT(*) FROM read_csv_auto('events.csv');"
# Interactive mode
docker run -it -v $(pwd)/data:/data duckdb-cli /data/analytics.duckdb
DuckDB REST API in Docker
For a queryable DuckDB service:
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy API code
COPY api.py .
# Create data directory
RUN mkdir -p /data
VOLUME /data
ENV DUCKDB_PATH=/data/analytics.duckdb
ENV PORT=8000
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]# requirements.txt
fastapi==0.110.0
uvicorn==0.29.0
duckdb==0.10.0
pandas==2.2.0
pydantic==2.6.0
# api.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import duckdb
import os

app = FastAPI(title="DuckDB API")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

# Read-only connection: the database file must already exist
# (created by a writer such as the ETL container shown later)
DB_PATH = os.environ.get('DUCKDB_PATH', '/data/analytics.duckdb')
con = duckdb.connect(DB_PATH, read_only=True)

class QueryRequest(BaseModel):
    sql: str

@app.post("/query")
def query(req: QueryRequest):
    try:
        result = con.execute(req.sql).fetchdf()
        return {"columns": result.columns.tolist(), "rows": result.values.tolist()}
    except Exception as e:
        raise HTTPException(400, str(e))

@app.get("/health")
def health():
    con.execute("SELECT 1")
    return {"status": "ok", "db": DB_PATH}
# Build
docker build -t duckdb-api .
# Run with persistent volume
docker run -d \
-p 8000:8000 \
-v duckdb-data:/data \
-e DUCKDB_PATH=/data/analytics.duckdb \
--name duckdb-api \
duckdb-api
# Test
curl http://localhost:8000/health
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT 42 AS answer"}'
Docker Compose: DuckDB + Dashboard
Combine DuckDB with a dashboard:
# docker-compose.yml
version: '3.8'

services:
  duckdb-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - duckdb-data:/data
    environment:
      DUCKDB_PATH: /data/analytics.duckdb
      API_KEY: ${API_KEY}
    restart: unless-stopped
    healthcheck:
      # python:3.11-slim ships without curl, so probe with the stdlib
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    depends_on:
      - duckdb-api
    restart: unless-stopped

volumes:
  duckdb-data:
  grafana-data:
# Start everything
API_KEY=secret GRAFANA_PASSWORD=admin docker-compose up -d
# Check logs
docker-compose logs -f duckdb-api
ETL Container: Load Data into DuckDB
For a data loading pipeline:
# Dockerfile.etl
FROM python:3.11-slim
WORKDIR /app
RUN pip install duckdb pandas httpx boto3
COPY etl.py .
CMD ["python", "etl.py"]# etl.py
import duckdb
import os
DB_PATH = os.environ['DUCKDB_PATH']
SOURCE_URL = os.environ['SOURCE_URL']
print(f"Loading data from {SOURCE_URL} into {DB_PATH}")
# Write connection (not read-only)
con = duckdb.connect(DB_PATH)
# Load from S3 or HTTP. Install/load the extension in separate calls:
# execute() can't mix multiple statements with bound parameters.
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("""
    CREATE OR REPLACE TABLE events AS
    SELECT * FROM read_parquet(?)
""", [SOURCE_URL])

count = con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(f"Loaded {count:,} rows")
con.close()
# Run ETL as a cron job in Kubernetes
apiVersion: batch/v1
kind: CronJob
metadata:
  name: duckdb-etl
spec:
  schedule: "0 * * * *"  # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: etl
              image: my-registry/duckdb-etl:latest
              env:
                - name: DUCKDB_PATH
                  value: /data/analytics.duckdb
                - name: SOURCE_URL
                  value: s3://my-bucket/events/latest.parquet
              volumeMounts:
                - name: duckdb-data
                  mountPath: /data
          volumes:
            - name: duckdb-data
              persistentVolumeClaim:
                claimName: duckdb-data-pvc
          restartPolicy: OnFailure
Persistent Volume Considerations
DuckDB stores all of its state in a single file, so the Docker volume strategy matters:
# Named volume (recommended for production)
volumes:
  duckdb-data:
    driver: local

# Bind mount (for development — easier to access from host)
volumes:
  - ./data:/data

# NFS mount (for multi-host access — read-only replicas only)
volumes:
  duckdb-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs-server.internal,ro
      device: ":/exports/duckdb"
Important: DuckDB's single-writer constraint applies across containers too. Never mount the same DuckDB file in write mode from two containers simultaneously.
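In practice that means giving exactly one container the read-write mount and attaching everyone else with the :ro flag. A minimal sketch reusing the images built above (the duckdb-etl tag and container names are illustrative):
# One writer: run the ETL image with a read-write mount
docker run --rm -v duckdb-data:/data \
  -e DUCKDB_PATH=/data/analytics.duckdb \
  -e SOURCE_URL=s3://my-bucket/events/latest.parquet \
  duckdb-etl

# Readers: mount the same volume read-only
docker run -d -p 8001:8000 -v duckdb-data:/data:ro --name duckdb-reader duckdb-api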
DuckDB + DenchClaw in Docker
DenchClaw can be run in Docker with its DuckDB database stored on a persistent volume:
# Run DenchClaw with persistent workspace
docker run -d \
-p 19001:19001 \
-p 3100:3100 \
-v denchclaw-workspace:/root/.openclaw-dench/workspace \
--name denchclaw \
denchclaw/denchclaw:latest
The DuckDB file lives at /root/.openclaw-dench/workspace/workspace.duckdb inside the container, persisted via the named volume.
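To inspect that database from outside, you can reuse the duckdb-cli image from the start of this post and mount the same named volume read-only (a sketch; it assumes the DenchClaw container stays the only writer):
docker run --rm -it -v denchclaw-workspace:/data:ro \
  duckdb-cli -readonly /data/workspace.duckdb "SHOW TABLES;"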
Multi-Stage Build (Smaller Image)
# Multi-stage: smaller final image
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.11-slim AS runtime
# Copy installed packages
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin/uvicorn /usr/local/bin/uvicorn
WORKDIR /app
COPY api.py .
RUN mkdir -p /data
VOLUME /data
ENV DUCKDB_PATH=/data/analytics.duckdb
EXPOSE 8000
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]This reduces the image size from ~1.2GB to ~200MB.
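To verify, build both variants and compare (the Dockerfile.multistage filename is an assumption; use whatever name you saved the multi-stage file under):
docker build -t duckdb-api:slim -f Dockerfile.multistage .
docker images duckdb-api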
Frequently Asked Questions
Can I run multiple DuckDB containers sharing the same database?
For reads: yes, mount the volume read-only in each container. For writes: only one container can write at a time — use a separate writer container.
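In docker-compose terms, the pattern looks roughly like this (service names are illustrative; only the writer gets a read-write mount):
services:
  writer:
    build:
      context: .
      dockerfile: Dockerfile.etl
    volumes:
      - duckdb-data:/data
  api:
    build: .
    volumes:
      - duckdb-data:/data:ro
volumes:
  duckdb-data: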
How do I back up a DuckDB volume in Docker?
# Backup
docker run --rm -v duckdb-data:/data -v $(pwd)/backups:/backups \
ubuntu tar czf /backups/duckdb-backup-$(date +%Y%m%d).tar.gz /data
# Restore
docker run --rm -v duckdb-data:/data -v $(pwd)/backups:/backups \
ubuntu tar xzf /backups/duckdb-backup-20260326.tar.gz -C /
Stop any writer container before taking the backup so the tarball captures a consistent snapshot of the database file.
What resources does a DuckDB Docker container need?
DuckDB uses CPU and RAM aggressively. Allocate at least 2 CPU cores and 4GB RAM for production analytics workloads. Set memory_limit in DuckDB to 80% of the container's memory limit.
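A sketch of lining the two limits up (the numbers are illustrative):
# Cap the container at 2 CPUs and 4GB of RAM
docker run -d --cpus=2 --memory=4g -p 8000:8000 \
  -v duckdb-data:/data duckdb-api

-- Then, inside DuckDB, stay below the container cap
SET memory_limit = '3.2GB';
SET threads = 2;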
Can I run DuckDB in Docker on Apple Silicon?
Yes. The python base images used here are multi-arch, so Docker selects the ARM build automatically. One caveat: Dockerfile.cli above hard-codes the amd64 CLI zip, so either swap in the linux-aarch64 release asset or run that image under emulation with --platform linux/amd64.
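For example:
# Native ARM build of the API image on Apple Silicon
docker build --platform linux/arm64 -t duckdb-api .

# Or run an amd64-only image under emulation
docker run --platform linux/amd64 -it -v $(pwd)/data:/data duckdb-cli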
How do I expose DuckDB metrics to Prometheus?
Wrap the DuckDB API in a /metrics endpoint that reads DuckDB's built-in introspection (for example the duckdb_memory() table function) alongside your own request counters, and exposes the values in Prometheus text format.
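A minimal sketch that extends api.py above (the metric name is mine; duckdb_memory() is available in recent DuckDB releases, including the 0.10.0 pinned here):
from fastapi import Response

@app.get("/metrics")
def metrics():
    # duckdb_memory() reports per-component memory usage in bytes
    rows = con.execute(
        "SELECT tag, memory_usage_bytes FROM duckdb_memory()"
    ).fetchall()
    lines = ["# TYPE duckdb_memory_usage_bytes gauge"]
    for tag, used in rows:
        lines.append(f'duckdb_memory_usage_bytes{{tag="{tag}"}} {used}')
    return Response("\n".join(lines) + "\n", media_type="text/plain")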
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
