
The Ultimate Guide to DuckDB

DuckDB is one of the fastest embedded analytical databases available. This guide covers what DuckDB is, how it works, key use cases, and how to get started.

Mark Rachapoom · 8 min read

DuckDB is an in-process analytical database management system. It's fast, zero-dependency, and designed for the workloads that matter most in data analysis and AI applications. If you've heard it described as "SQLite for analytics," that's a fair starting point — but DuckDB is much more capable than that comparison implies.

What Is DuckDB?#

DuckDB is an open-source, in-process SQL OLAP (Online Analytical Processing) database. Key characteristics:

  • In-process: Runs inside your application, no separate database server needed
  • Embedded: Stores data in a single .duckdb file
  • Columnar: Data is stored column-by-column, ideal for analytical queries
  • SQL-complete: Supports the full SQL standard including window functions, CTEs, PIVOT
  • Fast: Vectorized execution engine, typically 10-100x faster than SQLite for analytical queries
  • Open source: MIT-licensed, actively maintained by DuckDB Labs

DuckDB was created at CWI Amsterdam (the same lab that created MonetDB) and first released in 2019. It reached 1.0 in 2024.

DuckDB vs. SQLite#

The most common comparison is with SQLite: both are embedded, single-file databases that require no server, but they're designed for very different workloads.

| Aspect | SQLite | DuckDB |
| --- | --- | --- |
| Storage format | Row-oriented | Columnar |
| Best workload | Transactional (OLTP) | Analytical (OLAP) |
| Concurrent writes | Limited | Limited |
| Analytical query speed | Adequate | Excellent (10-100x faster) |
| Window functions | Limited | Full support |
| PIVOT/UNPIVOT | Manual | Native |
| JSON handling | Basic | Advanced |
| Parquet support | Via extension | Native |
| Memory use | Very low | Moderate |
| Maturity | Decades | Since 2019 |

Choose SQLite when: Your workload is primarily transactional (inserts/updates/single-row reads). You need the most mature embedded database.

Choose DuckDB when: Your workload involves analytics, aggregations, complex joins, or reading wide datasets. You need full SQL including window functions and PIVOT.

DenchClaw uses DuckDB because its CRM workload is predominantly analytical — pipeline reports, contact search, activity aggregations, funnel analysis.

Core DuckDB Concepts#

Column Store#

DuckDB stores data in columns rather than rows. For analytical queries that read few columns across many rows, this is dramatically more efficient.

If you have a table with 20 columns and 100,000 rows, and you query 3 of those columns:

  • Row store: reads all 20 columns × 100,000 rows
  • Column store: reads only 3 columns × 100,000 rows

For DenchClaw's CRM with many fields per contact, columnar storage means queries that filter on 2-3 fields read 10-20% of the data a row-store would read.

Vectorized Execution#

DuckDB processes data in batches (vectors) of up to 2,048 values at a time, using SIMD instructions for parallel processing. This makes even simple operations significantly faster than row-by-row processing.

SQL Completeness#

DuckDB supports the full SQL standard. Key features beyond what SQLite provides:

Window functions:

SELECT 
  rep_name,
  deal_value,
  SUM(deal_value) OVER (PARTITION BY rep_name) as rep_total,
  RANK() OVER (PARTITION BY rep_name ORDER BY deal_value DESC) as rank_in_rep
FROM deals;

Native PIVOT:

PIVOT deals
ON stage
USING SUM(value) AS total_value, COUNT(*) AS deal_count
GROUP BY quarter;

Recursive CTEs:

WITH RECURSIVE org_hierarchy AS (
  SELECT id, manager_id, name, 0 as level FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.manager_id, e.name, h.level + 1
  FROM employees e JOIN org_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM org_hierarchy;

Direct File Reading#

DuckDB can read external files directly without importing:

-- Read a CSV without importing
SELECT * FROM read_csv('contacts.csv');
 
-- Read a Parquet file
SELECT COUNT(*) FROM 'data.parquet';
 
-- Read all Parquet files in a directory
SELECT * FROM 'data/*.parquet';

This makes DuckDB excellent for data analysis pipelines where you want to query files directly.

Getting Started with DuckDB#

Installation#

# macOS
brew install duckdb
 
# Python
pip install duckdb
 
# Node.js
npm install duckdb
 
# Download CLI
# https://duckdb.org/docs/installation/

Basic Usage (CLI)#

# Start the DuckDB CLI (shell)
duckdb mydata.duckdb
 
-- Create a table (inside the CLI, SQL comments use --)
CREATE TABLE contacts (
  id INTEGER PRIMARY KEY,
  name VARCHAR,
  email VARCHAR,
  company VARCHAR,
  created_at TIMESTAMP DEFAULT NOW()
);
 
-- Insert data
INSERT INTO contacts VALUES (1, 'Sarah Chen', 'sarah@stripe.com', 'Stripe', DEFAULT);
 
-- Query
SELECT * FROM contacts WHERE company = 'Stripe';

Python API#

import duckdb
 
# Connect to file (or :memory: for in-memory)
conn = duckdb.connect('mydata.duckdb')
 
# Create table from CSV
conn.execute("CREATE TABLE contacts AS SELECT * FROM read_csv('contacts.csv')")
 
# Analytical query
result = conn.execute("""
  SELECT company, COUNT(*) as contacts, AVG(deal_value) as avg_deal
  FROM contacts c JOIN deals d ON c.company = d.company
  GROUP BY company
  ORDER BY avg_deal DESC
""").fetchdf()  # Returns Pandas DataFrame
 
print(result)

Node.js API#

import duckdb from 'duckdb';
 
const db = new duckdb.Database(':memory:');
const conn = db.connect();
 
conn.exec(`
  CREATE TABLE contacts AS SELECT * FROM read_csv('/path/to/contacts.csv');
`);
 
conn.all(`
  SELECT company, COUNT(*) as count FROM contacts GROUP BY company ORDER BY count DESC
`, (err, result) => {
  console.log(result);
});

DenchClaw uses DuckDB via Node.js for all CRM operations.

DuckDB for CRM Applications#

DenchClaw's use of DuckDB demonstrates its fitness for CRM workloads.

The EAV + PIVOT Pattern#

DenchClaw uses Entity-Attribute-Value storage with PIVOT views. This allows custom fields without schema migrations:

-- EAV storage
CREATE TABLE entry_fields (
  entry_id INTEGER,
  field_id INTEGER,
  field_name VARCHAR,
  field_value VARCHAR
);
 
-- PIVOT view: flat access
CREATE VIEW v_people AS
PIVOT entry_fields
ON field_name IN ('Full Name', 'Email', 'Company', 'Status', 'Phone')
USING FIRST(field_value)
GROUP BY entry_id;

A query like this would be painful to express in SQLite; in DuckDB it is native and fast.

Full-Text Search#

DuckDB's FTS extension enables fast full-text search:

INSTALL fts;
LOAD fts;
 
PRAGMA create_fts_index('v_people', 'entry_id', 'Full Name', 'Company');
 
SELECT * FROM v_people WHERE fts_main_v_people.match_bm25(entry_id, 'stripe engineer') IS NOT NULL;

Analytical Pipeline Queries#

-- Pipeline by stage with conversion
SELECT 
  stage,
  COUNT(*) as deal_count,
  SUM(CAST(value AS DECIMAL)) as total_value,
  AVG(EXTRACT(day FROM NOW() - created_at)) as avg_days_in_stage
FROM v_deals
WHERE status = 'Active'
GROUP BY stage
ORDER BY total_value DESC;

This returns in 2-3ms on a typical DenchClaw dataset. A similar query in a cloud CRM API takes 200-500ms.

DuckDB Extensions#

DuckDB has a rich extension ecosystem:

| Extension | Purpose |
| --- | --- |
| httpfs | Read files from HTTP/S3/GCS |
| fts | Full-text search |
| parquet | Parquet file support (built-in) |
| json | JSON processing (built-in) |
| spatial | Geospatial operations |
| excel | Read Excel files |
| sqlite | Read SQLite databases |
| postgres | Connect to PostgreSQL |

INSTALL httpfs;
LOAD httpfs;
 
-- Query a Parquet file directly from S3
SELECT * FROM 's3://my-bucket/data.parquet' LIMIT 10;

DuckDB in Production#

When DuckDB Is Right for Production#

DuckDB is appropriate for production when:

  • Workload is primarily analytical (read-heavy)
  • Single-writer, multi-reader pattern
  • Local or single-machine deployment
  • Dataset fits comfortably on disk

DenchClaw is a production DuckDB deployment — a personal CRM with 10,000-100,000 records running on your laptop.

Limitations#

DuckDB is not appropriate for:

  • High-concurrency multi-writer workloads
  • Distributed systems requiring replication
  • Very large datasets (multi-terabyte) on memory-constrained hardware
  • Applications requiring strong ACID guarantees across processes

For these use cases, PostgreSQL or ClickHouse are more appropriate.

Backup Strategies#

# Simple file copy (shell; works when DuckDB is not connected)
cp workspace.duckdb workspace.duckdb.backup
 
-- Export to Parquet (SQL, while connected)
EXPORT DATABASE 'backup_dir/' (FORMAT PARQUET);
 
-- Export specific tables
COPY (SELECT * FROM v_people) TO 'people_backup.parquet';

Frequently Asked Questions#

How fast is DuckDB really?#

On analytical queries (GROUP BY, aggregations, full-table scans), DuckDB is typically 10-100x faster than SQLite. On transactional workloads (single-row inserts/updates), SQLite is faster. DuckDB's speed for analytics is what makes DenchClaw's instant CRM queries possible.

Can DuckDB handle millions of records?#

Yes. DuckDB handles hundreds of millions of rows well on modern hardware. For a CRM use case (even enterprise scale), DuckDB is more than sufficient.

Is DuckDB production-stable?#

Yes, since the 1.0 release in 2024. It's used in production by major data platforms and analytics companies.

What's the difference between DuckDB and ClickHouse?#

Both are columnar analytical databases. ClickHouse is a distributed server-based system designed for petabyte-scale analytics. DuckDB is embedded and single-machine. For personal and small-team use cases, DuckDB is simpler and more appropriate. ClickHouse scales further but requires infrastructure.

Can I use DuckDB with Pandas?#

Yes. DuckDB integrates tightly with Pandas: conn.execute("SELECT * FROM df").fetchdf() converts a DuckDB result to a Pandas DataFrame, and conn.register("df", pandas_df) registers a DataFrame as a queryable table.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by Mark Rachapoom

Building the future of AI CRM software.


© 2026 DenchHQ · San Francisco, CA