gstack: The AI Development Workflow That Changes Everything

gstack is DenchClaw's structured AI development workflow — 7 phases, 18 specialist roles — that transforms how teams ship software with AI. Here's how it works.

Kumar Abhirup

March 26, 2026

There is a version of AI-assisted development that looks like chaos: a developer chatting with an LLM, accepting patches wholesale, shipping things that sort of work until they catastrophically don't. And there is a version that looks like leverage: a structured workflow where AI specialists handle research, architecture, implementation, review, testing, and documentation — each doing their job with clear handoffs, locked artifacts, and human decision points at the right moments.

gstack is the second version. And it changes everything about how software gets shipped.

What gstack Is

gstack is DenchClaw's structured AI development workflow, adapted from Garry Tan's open-source garrytan/gstack (MIT licensed). The core insight behind gstack is simple but profound: AI makes process cheap. The reason software teams skip design reviews, architectural planning, performance testing, and proper documentation isn't that they don't want those things. It's that each step costs time they don't have.

When AI can fill specialist roles in each of those phases — doing the work in seconds that previously took hours — the cost equation inverts. Now it's cheaper to do it properly than to skip steps and fix problems later.

gstack encodes "doing it properly" into a repeatable workflow.

The Seven Phases

gstack moves code from idea to production through seven phases:

Think → Plan → Build → Review → Test → Ship → Reflect

Each phase has specific roles, specific outputs, and specific gates before the next phase begins. This isn't bureaucracy — it's architecture for reliable outcomes.
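
One way to picture the ordering and the gates is a small state machine. The sketch below is illustrative only: the phase list comes from this article, while `Run`, `record_output`, and `advance` are hypothetical names and not part of gstack or DenchClaw.

```python
from dataclasses import dataclass, field

# Phase order as listed in the article.
PHASES = ["Think", "Plan", "Build", "Review", "Test", "Ship", "Reflect"]


@dataclass
class Run:
    """Hypothetical tracker for one feature moving through the phases."""

    current: int = 0
    outputs: dict = field(default_factory=dict)

    @property
    def phase(self) -> str:
        return PHASES[self.current]

    def record_output(self, artifact: str) -> None:
        # Store the artifact produced by the current phase (e.g. a design doc).
        self.outputs[self.phase] = artifact

    def advance(self) -> str:
        # Gate: the current phase must have produced its output first.
        if self.phase not in self.outputs:
            raise RuntimeError(f"{self.phase} has no output yet; gate not passed")
        if self.current == len(PHASES) - 1:
            raise RuntimeError("Reflect is the final phase")
        self.current += 1
        return self.phase


run = Run()
run.record_output("design-doc.md")
print(run.advance())  # prints "Plan"
```

Calling `advance()` before `record_output()` raises, which is the point of a gate: a phase cannot be skipped past without leaving its artifact behind.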

Think

The Think phase does something most AI development workflows skip entirely: it forces you to ask whether you're solving the right problem.

The Think role is modeled on YC Office Hours — a skeptical, experienced founder asking hard questions about your assumptions before you write any code. It reframes the problem, challenges your stated requirements, and generates a design document that becomes the shared artifact for everything that follows.

This is where many AI systems fail. They're great at answering "how do I implement X?" but terrible at asking "should you build X at all?" The Think phase enforces that question.

Plan

Planning in gstack splits into three specialist roles, each working on a different dimension:

Plan CEO operates from a founder mindset. It evaluates the design document against 10-star product thinking, which asks "what would the most delightful version of this look like?", and explicitly works in two modes: Expansion (generating possibilities) and Reduction (eliminating everything that isn't essential). The output is a product vision document that keeps the build focused on what actually matters.

Plan Eng is the Engineering Manager role. It takes the product vision and turns it into a locked technical architecture: data flow diagrams, API contracts, edge cases, database schema, error handling, test strategy. "Locked" is key — the Build phase implements against this document, not against a moving target.

Plan Design rates every design dimension on a 0–10 scale, edits the plan until every dimension hits 10, and signs off before Build begins. No more "we'll polish the UX later." Later never comes. The design gets resolved in Plan.
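
The sign-off rule for Plan Design can be expressed as a simple check. This is a minimal sketch under stated assumptions: the dimension names are invented for illustration, and `unresolved` and `signed_off` are hypothetical helpers, not gstack functions.

```python
def unresolved(ratings: dict[str, int], target: int = 10) -> list[str]:
    """Return the design dimensions still scored below the target."""
    for name, score in ratings.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{name}: score {score} is outside the 0-10 scale")
    return sorted(name for name, score in ratings.items() if score < target)


def signed_off(ratings: dict[str, int]) -> bool:
    """Sign-off requires every dimension to reach the target score."""
    return not unresolved(ratings)


# Hypothetical dimensions, for illustration only.
ratings = {"hierarchy": 10, "empty states": 7, "error states": 9}
print(unresolved(ratings))  # ['empty states', 'error states']
print(signed_off(ratings))  # False
```

The plan keeps getting edited until `unresolved` returns an empty list; only then does Build begin.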

Build

The Build phase is where code gets written — but only against the locked plan. This constraint is more important than it sounds. Most AI development failures happen because the AI starts implementing and then encounters edge cases that require architectural decisions. Without a locked plan, those decisions get made ad hoc, often incorrectly.

With gstack, the Build role — an experienced engineer — has a complete spec. It implements faithfully. When it encounters something the plan didn't cover, it stops and flags it rather than inventing a solution.
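
The stop-and-flag behaviour can be sketched as a lookup against a read-only plan. Everything here is hypothetical: the plan entries, the `PlanGap` exception, and the `decide` function are invented to show the idea, not taken from gstack.

```python
from types import MappingProxyType


class PlanGap(Exception):
    """Raised when the build meets a case the locked plan does not cover."""


# Hypothetical locked plan: each known case maps to the decided behaviour.
# MappingProxyType makes the mapping read-only, mirroring "locked".
LOCKED_PLAN = MappingProxyType({
    "empty_cart": "show empty-state screen",
    "payment_declined": "keep cart, show retry prompt",
})


def decide(case: str) -> str:
    """Return the planned behaviour, or stop and flag the gap."""
    try:
        return LOCKED_PLAN[case]
    except KeyError:
        # Do not invent a solution here; surface the gap instead.
        raise PlanGap(f"plan does not cover {case!r}; send back to Plan") from None


print(decide("empty_cart"))  # show empty-state screen
```

An uncovered case such as `decide("refund_requested")` raises `PlanGap` rather than returning a guess, so the architectural decision goes back to a human or to the Plan phase.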

Review

The Review phase uses a Staff Engineer perspective — someone who has seen production systems fail in unexpected ways. The reviewer specifically looks for bugs that pass CI but blow up in production: race conditions, missing error handling, N+1 queries, security vulnerabilities, incorrect assumptions about data shape.

Critically, Review auto-fixes what it finds. No separate bug-filing process. Find it, fix it, move on.

Test

Testing in gstack splits into two distinct roles:

Test QA is the QA Lead who tests the running application. Not unit tests — actual usage. Clicks through flows, tries edge cases, attempts to break things. When it finds bugs, it fixes them with atomic commits and re-verifies.

Test Bench is the Performance Engineer. Core Web Vitals, bundle sizes, page load times, database query performance. Performance problems caught here cost 30 minutes to fix. The same problems caught after launch cost days and a production incident.

Ship

Ship is a Release Engineer running a specific checklist: sync main, run full test suite, audit coverage, verify there are no TODOs in production-bound code, push, open PR. No surprises. No heroics.
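
One item on that checklist, verifying there are no TODOs in production-bound code, is easy to automate. The scanner below is a minimal sketch and an assumption about how such a check might look; `find_todos` and the file suffix list are hypothetical and not part of gstack.

```python
import re
from pathlib import Path

TODO_PATTERN = re.compile(r"\bTODO\b")


def find_todos(root: Path, suffixes=(".py", ".ts", ".js")) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line text) for each TODO found."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if TODO_PATTERN.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```

A release step could call `find_todos(Path("src"))` and refuse to push while the returned list is non-empty.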

The Deploy sub-phase follows: merge PR, wait for CI, deploy to production, verify health metrics, check error rates. Each step is logged.

The Monitor sub-phase runs post-deploy: watch error rates, check key user flows, verify expected behavior in production. Close the loop before moving on.

Reflect

The Reflect phase does something rare in software development: it learns from what happened. The Engineering Manager role runs a retrospective with per-person (per-role) breakdowns. What did each specialist do well? What failed? What should change in the next run?

The Document sub-phase handles the detail work: updating all documentation to match what shipped. README, API docs, changelog, deployment guides. All current. All accurate.

The 18 Specialist Roles

Breakdown of gstack's specialist roster by phase:

Phase	Role	Responsibility
Think	YC Office Hours	Problem reframing, design doc
Plan	CEO	10-star product thinking, Expansion/Reduction
Plan	Eng Manager	Architecture, data flow, edge cases, test strategy
Plan	Senior Designer	Design ratings (0–10), design decisions
Build	Engineer	Implementation against locked plan
Review	Staff Engineer	Production-failure-mode bugs, auto-fix
Test	QA Lead	Running app testing, atomic fixes
Test	Performance Eng	Core Web Vitals, bundle sizes, load times
Ship	Release Engineer	Sync, test, push, open PR
Deploy	Release Engineer	Merge, CI, deploy, health check
Monitor	SRE	Post-deploy monitoring
Reflect	Eng Manager	Retrospective, per-role breakdowns
Document	Technical Writer	Docs updated to match what shipped

The companion piece, "The 18 roles explained in detail", is worth reading if you want to understand each specialist's specific responsibilities and outputs.

The Completeness Principle

gstack operates under what we call the Completeness Principle: Always boil the lake, never the ocean.

The lake is your feature. The ocean is a complete rewrite. AI makes completeness cheap — so implement the full feature with all edge cases handled, all error states covered, all performance implications considered. That's the lake.

What AI can't do is attempt multi-quarter architectural overhauls. That's the ocean. Attempting the ocean produces incomplete implementations that break in unexpected ways. Boiling the lake — doing one thing completely — produces shippable software.

This principle shapes every phase of gstack. The Think phase scopes the lake carefully. Plan locks it. Build implements it completely. Review ensures it's actually complete. Ship doesn't let incomplete things through.

Safety Tools

gstack ships with three safety tools that deserve their own explanation (covered in the full safety-tools breakdown), but briefly:

careful blocks dangerous commands (rm -rf, DROP TABLE, force-push) unless they are explicitly confirmed
freeze restricts the AI to edits in a single directory, preventing scope creep
guard combines both
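
To make the three behaviours concrete, here is a rough sketch of the checks they imply. It is not DenchClaw's implementation: the regex patterns are a deliberately small, illustrative list, and the function signatures are assumptions made for this example.

```python
import re
from pathlib import Path

# Illustrative patterns only; a real guard would need a much fuller list.
DANGEROUS = [
    # rm with both -r and -f in one flag group, in either order
    re.compile(r"\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*(--force|-f\b)"),
]


def careful(command: str, confirmed: bool = False) -> bool:
    """Allow a command unless it looks dangerous and was not confirmed."""
    risky = any(pattern.search(command) for pattern in DANGEROUS)
    return confirmed or not risky


def freeze(target: str, allowed_dir: str) -> bool:
    """Allow an edit only if the target resolves inside allowed_dir."""
    root = Path(allowed_dir).resolve()
    return Path(target).resolve().is_relative_to(root)  # Python 3.9+


def guard(command: str, target: str, allowed_dir: str, confirmed: bool = False) -> bool:
    """Combine both checks: the command must pass and the edit must stay in scope."""
    return careful(command, confirmed) and freeze(target, allowed_dir)


print(careful("rm -rf build"))                  # False
print(careful("rm -rf build", confirmed=True))  # True
print(freeze("src/other/main.py", "src/app"))   # False
```

Resolving paths before comparing them means a target like `src/app/../other/x.py` is treated as outside `src/app`, which is the scope-creep case freeze is meant to stop.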

These tools matter because AI under pressure makes mistakes. When a deadline is close and a bug is blocking, the temptation is to run a broad command that fixes the symptom and creates three new problems. Safety tools prevent that.

Why gstack Is Different from "Just Using AI"

Most teams "using AI for development" are doing one of two things:

Using Copilot-style autocomplete to write code faster
Chatting with an LLM about how to solve problems

Neither of these is a workflow. Neither produces consistently reliable software. Neither makes the quality of the output predictable.

gstack is a workflow. The inputs, outputs, and responsibilities at each phase are defined. A given task run through gstack produces a result with known properties: a design doc, a locked architecture, a reviewed implementation, QA-verified functionality, and updated documentation.

That predictability is the point. Not AI magic — repeatable engineering excellence.

How to Get Started

gstack is built into DenchClaw as the gstack skill. To run a feature through the full workflow:

# Install DenchClaw
npx denchclaw
 
# Load the gstack skill and describe your feature
# DenchClaw will walk you through Think → Plan → Build → ...

The "What is DenchClaw" overview explains how gstack fits into the broader product.

For teams new to gstack, start with a small feature — something that takes a day to build manually. Run it through gstack. Observe where the workflow catches things you'd have missed. Then scale to larger features once you trust the process.

FAQ

Q: Is gstack only for teams using DenchClaw?
No. gstack is adapted from an open-source workflow (garrytan/gstack, MIT licensed) and can be used with any AI coding environment. DenchClaw's implementation includes the safety tools and skill infrastructure that make it easier to run.

Q: How much does each gstack run cost in API tokens?
A typical feature spanning Think through Ship consumes roughly 200–400K tokens across all specialist roles. At current API pricing, that's $0.20–2.00 per feature depending on complexity and model choice. The cost of not catching production bugs is higher.
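
The arithmetic behind that range is tokens times price. The quoted figures are consistent with a blended price of roughly $1 to $5 per million tokens; those two prices are back-calculated from the range above as an assumption, not quoted from any provider's price list.

```python
def run_cost(tokens: int, usd_per_million: float) -> float:
    """Estimated API cost in USD for a run that uses `tokens` tokens."""
    return tokens / 1_000_000 * usd_per_million


# Assumed blended prices (USD per million tokens), for illustration only.
low = run_cost(200_000, 1.00)
high = run_cost(400_000, 5.00)
print(f"${low:.2f} to ${high:.2f}")  # $0.20 to $2.00
```

Substitute your own model's input and output prices to get an estimate for your setup.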

Q: Can I skip phases I think are unnecessary?
You can, but gstack's value comes from the complete chain. Teams that skip Review or Test phases consistently ship more production incidents. The phases that feel like overhead are usually the ones preventing the most expensive problems.

Q: Does gstack work for bug fixes, not just new features?
Yes. For small bugs, you might run only Think + Build + Review. For anything touching core logic or data, run the full chain.

Q: How does gstack interact with existing CI/CD pipelines?
The Ship phase integrates with your existing CI. It opens a PR that your normal CI pipeline processes. gstack doesn't replace CI; it makes the code that goes into CI dramatically better before it gets there.
