
How DenchClaw Built DenchClaw: AI-Assisted Development in Practice

DenchClaw was built using DenchClaw itself — an AI-assisted workflow with gstack roles, local DuckDB, and OpenClaw agents. Here's exactly how it worked.

Kumar Abhirup · 14 min read

There's a version of this story where I say we used DenchClaw to build DenchClaw and leave it at that — a nice recursive flourish, self-referential enough to seem clever, vague enough not to require explanation. That's not the version I'm going to tell.

The version I'm going to tell is specific. It involves real workflows, real failures, real tools, and a few things I didn't expect when we started eating our own cooking as early as month two of development. It's a story about what AI-assisted development actually looks like when you commit to it fully, not as a feature but as the default operating mode.

Why Dogfooding DenchClaw Was Necessary, Not Optional#

The honest reason we started using DenchClaw to build DenchClaw wasn't altruism or some principled commitment to dogfooding. It was desperation.

In the early days, Mark and I were moving faster than our tooling could keep up with. We had a growing list of issues on GitHub, a product roadmap in Notion that was perpetually two weeks out of date, and a set of investor relationships we were managing in — I'm embarrassed to admit this — a shared Google Sheet. We were building a CRM and not using one. The irony was starting to sting.

So we switched. We moved our investor pipeline into DenchClaw. We started logging product decisions as entry documents. We moved our roadmap into a kanban view. Not because it was perfect — it wasn't — but because using it was the fastest way to find out what was broken.

This turned out to be one of the best engineering decisions we made. Finding a bug because a user reported it is one kind of feedback. Finding a bug because you personally hit it while trying to do real work is a different category entirely. The urgency is different. The clarity is different. When I couldn't add a relation field to a deal entry because of a null-pointer error in the PIVOT view generation, I fixed it within two hours because I couldn't do my actual work until it was fixed.

The gstack Workflow in Practice#

By month three, we had formalized something we were calling gstack — a structured development workflow built around specialist AI roles. I've written about it separately, so I won't rehash all 18 roles here. What I want to talk about is what it felt like to actually use it for a non-trivial project.

The first time I ran a full gstack cycle on a real feature — the entry documents feature, where every CRM record can have its own rich-text markdown page — I set aside a full afternoon for it. I expected to get through maybe the Think and Plan phases and stub out some code.

Instead, I got to a working prototype by 6pm.

Here's what happened, concretely. The YC Office Hours phase (the first gstack role, which asks the hard strategic questions before any code gets written) surfaced a question I hadn't thought about: if entry documents are stored as markdown files on the filesystem, and the agent has direct write access to those files, and the user also has direct write access through the UI, what happens when they edit simultaneously? This is a conflict resolution question. It sounds like a detail but it's actually architectural. Getting it wrong means data loss.

Because gstack forced me to answer that question before writing code, I designed the feature correctly from the start. Documents are stored as plain markdown files, and edits never patch them in place: the UI always reads the full file and writes a complete replacement, and the agent does the same. Conflicts are resolved at the filesystem level, last write wins, and the history is recoverable via git. Simple, correct, and no special-case code.

If I'd started coding without that question, I would have built a mutable document system and discovered the conflict problem three weeks later in production.
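The read-whole-file, write-whole-replacement contract is simple enough to sketch. This is an illustrative Python sketch, not DenchClaw's actual code; the temp-file-plus-rename step is one way to make last-write-wins safe against torn files, since `os.replace` swaps the file atomically:

```python
import os
import tempfile

# Sketch of the whole-file read/replace pattern described above.
# Function names are illustrative, not DenchClaw's actual API.

def read_document(path: str) -> str:
    """Read the full markdown file; callers never patch byte ranges in place."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def write_document(path: str, content: str) -> None:
    """Replace the whole file atomically: write a temp file, then rename.
    os.replace is atomic, so a concurrent writer can never leave a
    half-written document on disk -- the last complete write simply wins."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        os.replace(tmp, path)  # last write wins, never a torn file
    except BaseException:
        os.unlink(tmp)
        raise
```

With the file history sitting in git, "recoverable" falls out for free: any overwritten version is one `git log -- path` away.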

The Engineering Manager phase then produced an architecture document: where the files live, how they're registered in DuckDB's documents table, what the API endpoints look like, how the frontend loads and saves them. This document became the contract between the backend work and the frontend work. When I asked a subagent to implement the backend and another to implement the frontend, both were working from the same spec. Integration was almost trivial.

Subagents Building the Product#

This deserves more than a passing mention. A meaningful fraction of DenchClaw's code was written not by Mark or me directly, but by subagents that we orchestrated through DenchClaw itself.

The pattern looks like this: I'm working in the DenchClaw web chat. I describe a feature or a bug fix. I say "spawn a coding agent for this." DenchClaw spawns a subagent — typically a Claude Code or Codex session — with the relevant context: the gstack plan doc, the relevant files, the specific task. The subagent does the implementation work, typically in 10–30 minutes. I review the output, give feedback if needed, and either accept or ask for revisions.

What I'm doing in this flow is mostly thinking and directing. The reading, writing, compiling, testing cycle is happening in the subagent. My job is to hold the product vision clearly, catch architectural mistakes in the output, and maintain coherence across multiple simultaneous subagent sessions.

This is genuinely different from using an AI coding assistant like Copilot inline. With an inline assistant, the AI is filling in lines as you type. The cognition is mostly yours; the AI is doing autocomplete on a larger scale. With an orchestrated subagent, the AI is doing extended, multi-file work while you're thinking about something else. The division of labor is more like managing an engineer than pairing with one.

The quality control challenge is real. Subagents don't hold product context the way a human engineer who's been on the project for six months does. They do what the task says, which means the task description has to be very good. Early on, we produced a lot of technically correct code that was architecturally wrong — correctly implementing a feature that we'd described poorly, creating technical debt we had to clean up later. Over time we got better at writing task briefs that included not just "what to do" but "how it should fit with the existing architecture."
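A task brief that carries architectural context alongside the task itself might look like this. The structure and field names are hypothetical, a sketch of the kind of brief we converged on rather than DenchClaw's actual format:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a subagent task brief: not just "what to do"
# but how the work should fit the existing architecture, and what is
# explicitly out of scope. Field names are hypothetical.

@dataclass
class TaskBrief:
    goal: str                                            # what to build, in one sentence
    plan_doc: str                                        # path to the gstack plan document
    relevant_files: list = field(default_factory=list)   # context the agent should read
    architecture_notes: list = field(default_factory=list)  # "how it should fit"
    out_of_scope: list = field(default_factory=list)     # what NOT to touch

    def render(self) -> str:
        """Flatten the brief into the prompt handed to the subagent."""
        lines = [f"Goal: {self.goal}", f"Plan: {self.plan_doc}"]
        lines += [f"File: {p}" for p in self.relevant_files]
        lines += [f"Constraint: {c}" for c in self.architecture_notes]
        lines += [f"Do not touch: {s}" for s in self.out_of_scope]
        return "\n".join(lines)
```

The `out_of_scope` list earned its keep: most of our "technically correct, architecturally wrong" output came from agents helpfully refactoring things nobody asked them to touch.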

DuckDB as the Development Database#

One thing that surprised me about eating our own dogfood was how often I found myself using raw DuckDB queries to understand what was happening in the product.

The DenchClaw database is accessible directly at ~/.openclaw-dench/workspace/workspace.duckdb. Any time something looks wrong in the UI — a count is off, a relation isn't resolving, a PIVOT view is returning unexpected data — I drop into a DuckDB shell and query directly. This is remarkably fast.

```sql
-- How many entries does the 'deal' object have?
SELECT COUNT(*) FROM entries e
JOIN objects o ON o.id = e.object_id
WHERE o.name = 'deal';

-- Show me all entry_fields for deal #42
SELECT f.name AS field_name, ef.value
FROM entry_fields ef
JOIN fields f ON f.id = ef.field_id
WHERE ef.entry_id = 42
ORDER BY f.sort_order;
```

These queries return in single-digit milliseconds. The ability to directly inspect your production database with no intermediary, no export step, no ORM obscuring what's actually there — this is one of the underrated virtues of local-first software. When you're debugging, there's no mystery. The data is right there.

We also use DuckDB directly for our own internal metrics. How many objects have we created? What's the average number of fields per object? What's the ratio of entries with entry documents to entries without? These questions are trivially answerable because the data is local and we own it completely.

```sql
-- Field type distribution across all objects
SELECT ft.type_name, COUNT(*) AS field_count
FROM fields f
JOIN field_types ft ON ft.id = f.field_type_id
GROUP BY ft.type_name
ORDER BY field_count DESC;
```

That query tells me, at a glance, which field types are most used in real workspaces. That's product signal. We acted on it — when we saw that relation fields were the most frequently created type after text, we prioritized making relation fields faster and more reliable.
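To show the shape of that signal end to end, here is the field-type distribution query run against a toy version of the schema. I'm using Python's stdlib sqlite3 purely as a stand-in because the SQL is plain; in practice you'd run it in a DuckDB shell against workspace.duckdb, and the toy data below is invented for illustration:

```python
import sqlite3

# Toy fields/field_types schema, mirroring the query in the article.
# sqlite3 is a stand-in here; the same SQL runs unchanged in DuckDB.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE field_types (id INTEGER PRIMARY KEY, type_name TEXT);
CREATE TABLE fields (id INTEGER PRIMARY KEY, name TEXT, field_type_id INTEGER);
INSERT INTO field_types VALUES (1, 'text'), (2, 'relation'), (3, 'number');
INSERT INTO fields VALUES
  (1, 'name', 1), (2, 'notes', 1), (3, 'company', 2), (4, 'amount', 3);
""")

rows = con.execute("""
    SELECT ft.type_name, COUNT(*) AS field_count
    FROM fields f
    JOIN field_types ft ON ft.id = f.field_type_id
    GROUP BY ft.type_name
    ORDER BY field_count DESC
""").fetchall()

print(rows)  # 'text' appears twice in this toy data; the rest once each
```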

Memory Files and Async Context#

DenchClaw's memory system — MEMORY.md for long-term context, memory/YYYY-MM-DD.md for daily logs — was originally designed for end users. But we use it ourselves, heavily, as the development team.

Every significant product decision gets logged in memory/YYYY-MM-DD.md. Why we chose this approach over that one. What we tried that didn't work. What we decided to defer and why. When you come back to a part of the codebase six weeks later, the daily notes tell you what was in your head when you wrote it.

This has been more useful than comments in the code, and more honest. Code comments describe what code does. Memory files describe why decisions were made, what constraints were operating, what alternatives were considered. That's the context that's hardest to reconstruct and most valuable when you need it.

The agent also reads these files as part of its startup context, which means when I ask DenchClaw a question about a past decision, it has the actual notes in context, not just the codebase. "Why did we use EAV instead of a fixed schema?" — the agent can answer this because the decision is documented in the memory files, not just implied by the code.
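The daily-log convention is simple enough to sketch. This helper is illustrative, not DenchClaw's actual implementation; it just shows the memory/YYYY-MM-DD.md naming and the append-only way we use it:

```python
import datetime
import pathlib

# Illustrative sketch of the daily memory-log convention described above:
# one markdown file per day under memory/, entries appended as they happen.

def log_decision(workspace: pathlib.Path, decision: str, why: str) -> pathlib.Path:
    """Append a decision and its rationale to today's memory file."""
    memory_dir = workspace / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    today = datetime.date.today().isoformat()      # YYYY-MM-DD
    log_file = memory_dir / f"{today}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"## Decision: {decision}\n\nWhy: {why}\n\n")
    return log_file
```

Because the file name is the date, "what was in my head six weeks ago" is a single file open away, and the agent can load the same files into its startup context.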

What Didn't Work#

I want to be honest about the failures, because the success stories can make this sound easier than it is.

The biggest failure was sprint planning. We tried to use DenchClaw's kanban view as our sprint board, with gstack sprints mapped to two-week cycles. In theory this should have worked perfectly. In practice, we found that the sprint planning process requires a kind of structured discussion — prioritization debates, estimation arguments, scope negotiation — that doesn't naturally happen in the format of CRM entries.

We ended up keeping sprint planning in a separate tool (we used a simple markdown file, honestly) and only syncing outcomes back to DenchClaw. This was a valuable lesson: DenchClaw is great for managing outcomes — tracking what happened, what's in progress, what's done — but the process of deciding what to work on benefits from a more freeform workspace.

The second failure was using the browser automation agent for dependency research. We thought it would be useful to have the agent automatically research libraries and tools, pull information from GitHub readmes and npm pages, and summarize options. In theory, yes. In practice, the agent spent a lot of time on pages that required JavaScript rendering and interaction that it couldn't handle reliably, and the summaries it produced were too shallow to actually inform decisions. We reverted to doing this manually.

Third: the gstack Document phase (technical documentation) consistently produced documentation that was accurate at the moment of writing and stale within two weeks. Documentation that describes code is always racing against code that changes. We got better results by treating documentation as a review checklist — the Technical Writer role checks that existing docs are accurate, rather than writing new ones from scratch.

The Compounding Effect#

Six months into using DenchClaw to build DenchClaw, something interesting happened. The velocity of the tooling improvements started compounding.

When Mark or I hit a friction point in DenchClaw — something that's annoying or slow or missing — we have the option to fix it immediately, because we're already in the codebase. This means that the features we personally use most tend to get polished fastest. The kanban view got multiple quality-of-life improvements because I use it all day. The entry document editor got keyboard shortcut support because Mark types faster than he clicks. The search got better because we both search constantly.

This is the virtuous cycle of dogfooding: the users with the most context about what would make the product better are also the people building it. Every friction point is immediately both a bug report and a pull request.

The other compounding effect is psychological. When I'm building a feature and I'm not sure whether it's worth the complexity, the fastest way to decide is to try to use it for real work. Not a demo, not a toy dataset. Real work. Features that seem clever in the abstract often turn out to be confusing in practice. Features that seem marginal in planning often turn out to be essential in use. Dogfooding collapses the feedback loop from months to days.

What This Means for AI Development Workflows#

I want to draw a broader conclusion from this experience, because I think it has implications beyond DenchClaw specifically.

We are in an early period of AI-assisted development where the tools are powerful but the workflows are immature. Most developers are using AI assistants in a fairly narrow way: inline code completion, occasional "write me a function that does X," maybe some test generation. These are genuine improvements, but they're improvements to the existing workflow, not transformations of it.

What we've learned at DenchClaw is that the bigger gains come from transforming the workflow itself. Not using AI to write code faster within a traditional development process, but using AI to do entire phases of the process that previously required extended human time: the architecture planning, the task decomposition, the code review, the documentation, the testing.

When you make this shift, the bottleneck moves. The bottleneck is no longer "how fast can we write code" — the AI can write code at a rate that exceeds what humans can review and understand. The bottleneck becomes "how clearly can we specify what we want." Technical product thinking — the ability to describe a desired system behavior precisely enough that an AI agent can implement it correctly — becomes the scarce resource.

This is a skills shift. The developers who will thrive in this environment are not necessarily the fastest typists or the most knowledgeable about language quirks. They're the ones who can think most clearly about what a system should do, describe it with enough precision for an AI to implement it, and recognize when the implementation is right or wrong.

DenchClaw is our ongoing experiment in developing these skills ourselves, using the product as both the tool and the subject. We're still learning. The workflow keeps improving. And every improvement we find while building DenchClaw becomes a capability that our users benefit from.

Related: gstack explained: the AI development workflow → | What Is DenchClaw? → | OpenClaw CRM Setup →

Frequently Asked Questions#

Did you really use DenchClaw to manage your own development before it was stable?#

Yes. The instability was actually useful — it forced us to find bugs that polished demos would never surface. We hit null pointer errors, migration failures, and sync edge cases that we fixed immediately because we personally couldn't do our work until they were resolved.

How do you handle the chicken-and-egg problem of using software to build that same software?#

Carefully. We kept a minimal fallback (a local SQLite file and a text editor) for the periods when DenchClaw's core was being refactored. In practice, the core database and entry system were stable enough within the first six weeks that we could rely on them. Less critical features we dogfooded while they were still unstable.

What percentage of DenchClaw's code was written by AI subagents?#

Hard to say precisely, but my best estimate is around 60% of lines of code have at least one AI touch — either generated, substantially modified, or reviewed/fixed by a coding agent. The architectural decisions, the task briefs, and the reviews are human. The implementation loops are increasingly agent-driven.

Is the gstack workflow documented somewhere I can use it?#

Yes — see the full gstack explained article. The SKILL.md for gstack is also included in the DenchClaw workspace and can be triggered by asking your DenchClaw agent to run a gstack sprint on a feature.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by Kumar Abhirup

Building the future of AI CRM software.


© 2026 DenchHQ · San Francisco, CA