Back to The Times of Claw

DuckDB Full-Text Search: Find Anything in Your Data

DuckDB full-text search lets you find text across millions of rows instantly. Here's how to set it up and use it in your local analytics stack.

Mark Rachapoom
Mark Rachapoom
·7 min read
DuckDB Full-Text Search: Find Anything in Your Data

DuckDB Full-Text Search: Find Anything in Your Data

DuckDB full-text search lets you run fast text queries across large datasets without spinning up Elasticsearch or a separate search service. If you've got a DuckDB database and you need to find records by text content, this guide shows you exactly how to do it.

This is one of those capabilities that sounds simple but opens up a surprising number of use cases — from searching CRM notes to querying log files to finding relevant entries across thousands of documents.

How DuckDB Full-Text Search Works#

DuckDB's FTS is implemented as an extension (fts). Once loaded, you can create a full-text search index on any table, then query it using the match_bm25 function, which ranks results by BM25 relevance — the same algorithm behind most modern search engines.

The extension creates inverted index structures internally. When you run a search, it scans the index rather than doing a sequential table scan, which is what makes it fast even on large datasets.

Step 1: Load the Extension#

INSTALL fts;
LOAD fts;

That's it. No server to configure, no connection string to set up.

Step 2: Create a Table with Text Data#

CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    title VARCHAR,
    body TEXT,
    created_at TIMESTAMP
);
 
INSERT INTO notes VALUES
    (1, 'Q1 Review', 'Revenue was up 40% in Q1. Pipeline looks strong going into Q2.', NOW()),
    (2, 'Customer Call', 'Sarah from Acme mentioned they might expand their contract.', NOW()),
    (3, 'Churn Analysis', 'Three customers churned this month. Main reason: pricing.', NOW()),
    (4, 'Product Roadmap', 'Next quarter we ship the mobile app and API v2.', NOW());

Step 3: Build the FTS Index#

PRAGMA create_fts_index('notes', 'id', 'title', 'body');

The first argument is the table name. The second is the primary key column. The remaining arguments are the columns to index. You can index as many text columns as you need.

SELECT id, title, fts_main_notes.match_bm25(id, 'pipeline revenue') AS score
FROM notes
WHERE score IS NOT NULL
ORDER BY score DESC;

This returns results ranked by relevance. Records mentioning both "pipeline" and "revenue" rank higher than records that only match one term.

Advanced Search Patterns#

Boolean Queries#

DuckDB FTS supports AND, OR, and NOT operators:

-- Must contain both terms
SELECT id, title, fts_main_notes.match_bm25(id, 'pipeline AND revenue') AS score
FROM notes WHERE score IS NOT NULL ORDER BY score DESC;
 
-- Either term
SELECT id, title, fts_main_notes.match_bm25(id, 'churn OR expansion') AS score
FROM notes WHERE score IS NOT NULL ORDER BY score DESC;
 
-- Exclude a term
SELECT id, title, fts_main_notes.match_bm25(id, 'customer NOT churn') AS score
FROM notes WHERE score IS NOT NULL ORDER BY score DESC;

Combining FTS with Regular Filters#

This is where DuckDB shines — you can mix full-text search with normal SQL predicates:

SELECT id, title, created_at,
    fts_main_notes.match_bm25(id, 'expansion contract') AS score
FROM notes
WHERE score IS NOT NULL
    AND created_at > '2026-01-01'
ORDER BY score DESC
LIMIT 10;

Full-text search with date filtering and result limiting — no separate search service needed.

If you only want to search specific fields, create a separate index or use field prefixes in the query:

-- Index only the body field
PRAGMA create_fts_index('notes', 'id', 'body', stemmer='porter');

The stemmer option enables stemming — so "pricing" also matches "price" and "priced". Supported stemmers include porter (English), german, french, spanish, and others.

Real-World Use Case: Searching CRM Notes#

DenchClaw uses DuckDB as its core database. Every CRM entry can have an attached document — meeting notes, call summaries, proposals. Here's how you'd implement search across all of them:

-- Assume you have an entry_documents table
CREATE TABLE entry_documents (
    entry_id VARCHAR,
    object_name VARCHAR,
    title VARCHAR,
    content TEXT,
    updated_at TIMESTAMP
);
 
-- Build the index
PRAGMA create_fts_index('entry_documents', 'entry_id', 'title', 'content', stemmer='porter');
 
-- Search across all documents
SELECT 
    entry_id,
    object_name,
    title,
    fts_main_entry_documents.match_bm25(entry_id, 'contract renewal Q2') AS relevance
FROM entry_documents
WHERE relevance IS NOT NULL
ORDER BY relevance DESC
LIMIT 20;

This lets you type a query in natural language and surface the most relevant CRM records — all running locally, all in SQL.

Performance Considerations#

Index Size#

FTS indexes add overhead. A rough rule: expect the index to be 30-50% the size of the raw text being indexed. For a 1GB text dataset, budget 300-500MB for the index.

Rebuild vs. Incremental Updates#

DuckDB FTS indexes are not automatically updated when you insert new rows. You need to rebuild:

PRAGMA drop_fts_index('notes');
PRAGMA create_fts_index('notes', 'id', 'title', 'body');

For append-only workloads (like logs), rebuild on a schedule. For interactive CRM use, rebuild after bulk imports.

Query Performance#

For a table with 1 million rows, a BM25 search typically returns in under 100ms on modern hardware. For 10 million rows, expect 200-500ms. If you need faster, reduce the indexed columns or pre-filter with a WHERE clause before the FTS scan.

Comparing DuckDB FTS to Elasticsearch#

FeatureDuckDB FTSElasticsearch
Setup2 SQL commandsDocker + config + cluster
Relevance algorithmBM25BM25 (customizable)
Boolean queriesYesYes
Faceted searchManual with GROUP BYNative
Fuzzy matchingLimitedAdvanced
ScaleSingle machineDistributed
CostFree$16/GB/month (managed)
SQL integrationNativeSeparate API

For most use cases — internal tools, business analytics, CRM search — DuckDB FTS is more than sufficient. Elasticsearch is worth adding when you need fuzzy matching, sophisticated relevance tuning, or distributed scale.

Practical Recipes#

Autocomplete Suggestions#

-- Get unique words from the title column for autocomplete
SELECT DISTINCT unnest(string_split(lower(title), ' ')) AS word
FROM notes
WHERE word LIKE 'rev%'
ORDER BY word
LIMIT 10;

Search with Snippets#

DuckDB doesn't have a built-in snippet function, but you can fake it:

SELECT 
    id,
    title,
    substr(body, 
        max(1, position('revenue' IN lower(body)) - 50), 
        200) AS snippet,
    fts_main_notes.match_bm25(id, 'revenue') AS score
FROM notes
WHERE score IS NOT NULL
ORDER BY score DESC;
-- Search across both notes and tasks
WITH combined AS (
    SELECT 'note' AS type, id, title AS text, body AS content FROM notes
    UNION ALL
    SELECT 'task' AS type, id, title AS text, description AS content FROM tasks
)
SELECT type, id, text
FROM combined
WHERE lower(content) LIKE '%pipeline%'
   OR lower(text) LIKE '%pipeline%'
ORDER BY type, id;

For proper multi-table BM25, you'd need separate indexes per table and combine results in application code.

When to Use DuckDB FTS vs. Other Approaches#

Use DuckDB FTS when:

  • Your dataset fits on one machine (under a few hundred GB)
  • You want SQL-native search without extra infrastructure
  • You need to combine search with aggregations and filters
  • You're building internal tools or analytics dashboards

Consider alternatives when:

  • You need fuzzy matching (typo tolerance) — look at pg_trgm or Meilisearch
  • You need real-time index updates without rebuilds — SQLite FTS5 has better incremental support
  • You need distributed search at petabyte scale — Elasticsearch or OpenSearch

For local analytics use cases — which is exactly what DuckDB for business is designed for — the built-in FTS extension handles the vast majority of search requirements.

Frequently Asked Questions#

Not natively as a quoted phrase operator. You can approximate it by using LIKE for exact phrase matching in combination with FTS for relevance ranking.

Can I use DuckDB FTS with multiple languages?#

Yes. The stemmer parameter supports English (porter), German, French, Spanish, Dutch, Italian, Portuguese, and several others. Set it when creating the index.

How do I update the FTS index when data changes?#

Drop and recreate: PRAGMA drop_fts_index('table'); PRAGMA create_fts_index(...);. There's no incremental update mechanism in the current version.

Is DuckDB FTS case-sensitive?#

No. Queries are automatically lowercased before matching.

What's the maximum text size DuckDB FTS can handle?#

There's no hard limit per document, but very large documents (>1MB each) will slow down index creation. For large text, consider chunking documents into smaller passages.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Mark Rachapoom

Written by

Mark Rachapoom

Building the future of AI CRM software.

Continue reading

DENCH

© 2026 DenchHQ · San Francisco, CA