
gstack: The AI Development Workflow That Changes Everything

gstack is DenchClaw's structured AI development workflow — 7 phases, 18 specialist roles — that transforms how teams ship software with AI. Here's how it works.

Kumar Abhirup

There is a version of AI-assisted development that looks like chaos: a developer chatting with an LLM, accepting patches wholesale, shipping things that sort of work until they catastrophically don't. And there is a version that looks like leverage: a structured workflow where AI specialists handle research, architecture, implementation, review, testing, and documentation — each doing their job with clear handoffs, locked artifacts, and human decision points at the right moments.

gstack is the second version. And it changes everything about how software gets shipped.

What gstack Is

gstack is DenchClaw's structured AI development workflow, adapted from Garry Tan's open-source garrytan/gstack (MIT licensed). The core insight behind gstack is simple but profound: AI makes process cheap. The reason software teams skip design reviews, architectural planning, performance testing, and proper documentation isn't that they don't want those things. It's that each step costs time they don't have.

When AI can fill specialist roles in each of those phases — doing the work in seconds that previously took hours — the cost equation inverts. Now it's cheaper to do it properly than to skip steps and fix problems later.

gstack encodes "doing it properly" into a repeatable workflow.

The Seven Phases

gstack moves code from idea to production through seven phases:

Think → Plan → Build → Review → Test → Ship → Reflect

Each phase has specific roles, specific outputs, and specific gates before the next phase begins. This isn't bureaucracy — it's architecture for reliable outcomes.
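One way to picture the ordering and the gates is as a tiny state machine. This is a minimal sketch, not gstack's actual implementation: the `Phase` enum, the `advance` helper, and the idea of reducing a gate to a single boolean are all assumptions made for illustration.

```python
from enum import Enum


class Phase(Enum):
    """The seven gstack phases, in the order the article lists them."""
    THINK = 1
    PLAN = 2
    BUILD = 3
    REVIEW = 4
    TEST = 5
    SHIP = 6
    REFLECT = 7


def advance(current: Phase, gate_passed: bool) -> Phase:
    """Return the next phase, but only if the current phase's gate passed.

    Hypothetical helper: a failed gate keeps the work in the current phase.
    """
    if not gate_passed:
        return current
    if current is Phase.REFLECT:
        return current  # Reflect is the last phase; nothing follows it.
    return Phase(current.value + 1)
```

The point of the sketch is that there is no path from Build to Ship that bypasses Review and Test: every transition goes through a gate.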

Think

The Think phase does something most AI development workflows skip entirely: it forces you to ask whether you're solving the right problem.

The Think role is modeled on YC Office Hours — a skeptical, experienced founder asking hard questions about your assumptions before you write any code. It reframes the problem, challenges your stated requirements, and generates a design document that becomes the shared artifact for everything that follows.

This is where many AI systems fail. They're great at answering "how do I implement X?" but terrible at asking "should you build X at all?" The Think phase enforces that question.

Plan

Planning in gstack splits into three specialist roles, each covering a different dimension of the plan:

Plan CEO operates from a founder mindset. It evaluates the design document against 10-star product thinking, the exercise popularized by Airbnb's Brian Chesky that asks "what would the most delightful version of this look like?", and explicitly works in two modes: Expansion (generating possibilities) and Reduction (eliminating everything that isn't essential). The output is a product vision document that keeps the build focused on what actually matters.

Plan Eng is the Engineering Manager role. It takes the product vision and turns it into a locked technical architecture: data flow diagrams, API contracts, edge cases, database schema, error handling, test strategy. "Locked" is key — the Build phase implements against this document, not against a moving target.

Plan Design rates every design dimension on a 0–10 scale, edits the plan until every dimension hits 10, and signs off before Build begins. No more "we'll polish the UX later." Later never comes. The design gets resolved in Plan.
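The sign-off rule for Plan Design is simple enough to write down. The sketch below assumes ratings arrive as a plain dictionary; the `design_gate` function and the dimension names are hypothetical and only illustrate the "every dimension hits 10" condition.

```python
def design_gate(ratings: dict[str, int]) -> list[str]:
    """Return the design dimensions that still score below 10.

    An empty list means Plan Design can sign off. Dimension names are
    illustrative only.
    """
    for name, score in ratings.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{name}: score {score} is outside 0-10")
    return [name for name, score in ratings.items() if score < 10]


ratings = {"hierarchy": 10, "empty states": 7, "accessibility": 9}
print(design_gate(ratings))  # ['empty states', 'accessibility']
```

Build starts only once the returned list is empty.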

Build

The Build phase is where code gets written — but only against the locked plan. This constraint is more important than it sounds. Most AI development failures happen because the AI starts implementing and then encounters edge cases that require architectural decisions. Without a locked plan, those decisions get made ad hoc, often incorrectly.

With gstack, the Build role — an experienced engineer — has a complete spec. It implements faithfully. When it encounters something the plan didn't cover, it stops and flags it rather than inventing a solution.
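The "stop and flag" behavior can be sketched as a lookup against an immutable plan. Everything here is an assumption for illustration: `LockedPlan`, `PlanGap`, and the example decision are hypothetical names, and a real plan is a document rather than a tuple of pairs.

```python
from dataclasses import dataclass


class PlanGap(Exception):
    """Raised when Build hits a case the locked plan does not cover."""


@dataclass(frozen=True)
class LockedPlan:
    """Immutable stand-in for the Plan Eng architecture document."""
    decisions: tuple[tuple[str, str], ...]  # (question, decided answer)

    def decision_for(self, question: str) -> str:
        for known_question, answer in self.decisions:
            if known_question == question:
                return answer
        # Stop and flag instead of inventing an answer.
        raise PlanGap(f"Plan does not cover: {question}")


plan = LockedPlan(decisions=(("duplicate email on signup", "return HTTP 409"),))
```

`frozen=True` mirrors the "locked" idea: Build can read the plan but cannot reassign it, and an unanswered question surfaces as an error for a human to resolve.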

Review

The Review phase uses a Staff Engineer perspective — someone who has seen production systems fail in unexpected ways. The reviewer specifically looks for bugs that pass CI but blow up in production: race conditions, missing error handling, N+1 queries, security vulnerabilities, incorrect assumptions about data shape.

Critically, Review auto-fixes what it finds. No separate bug-filing process. Find it, fix it, move on.

Test

Testing in gstack splits into two distinct roles:

Test QA is the QA Lead who tests the running application. Not unit tests — actual usage. Clicks through flows, tries edge cases, attempts to break things. When it finds bugs, it fixes them with atomic commits and re-verifies.

Test Bench is the Performance Engineer. Core Web Vitals, bundle sizes, page load times, database query performance. Performance problems caught here cost 30 minutes to fix. The same problems caught after launch cost days and a production incident.
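A performance gate of this kind usually comes down to comparing measurements against budgets. The sketch below is an assumption about how that check could look, not Test Bench's real logic. The LCP, INP, and CLS limits follow Google's published "good" Core Web Vitals thresholds; the bundle limit and all names are made up for the example.

```python
# Budgets are assumptions for illustration. The LCP, INP, and CLS limits
# follow Google's published "good" Core Web Vitals thresholds; the bundle
# limit is an arbitrary example value.
BUDGETS = {
    "lcp_seconds": 2.5,
    "inp_ms": 200,
    "cls": 0.1,
    "bundle_kb": 250,
}


def over_budget(measured: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each metric that exceeds its budget to (measured, budget)."""
    return {
        metric: (value, BUDGETS[metric])
        for metric, value in measured.items()
        if metric in BUDGETS and value > BUDGETS[metric]
    }


print(over_budget({"lcp_seconds": 3.1, "cls": 0.05, "bundle_kb": 250}))
# {'lcp_seconds': (3.1, 2.5)}
```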

Ship

Ship is a Release Engineer running a specific checklist: sync main, run full test suite, audit coverage, verify there are no TODOs in production-bound code, push, open PR. No surprises. No heroics.
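The "no TODOs in production-bound code" item is the easiest part of that checklist to automate. Here is a minimal sketch, assuming file contents are passed in as strings so the example runs without touching disk; `find_todos` is a hypothetical helper, not part of gstack.

```python
import re

TODO_PATTERN = re.compile(r"\bTODO\b")


def find_todos(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every TODO marker.

    `files` maps a path to its contents so the example runs without disk access.
    """
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if TODO_PATTERN.search(line):
                hits.append((path, number, line.strip()))
    return hits


files = {"a.py": "x = 1\n# TODO: remove\n", "b.py": "print('ok')\n"}
print(find_todos(files))  # [('a.py', 2, '# TODO: remove')]
```

A non-empty result would block the push step.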

The Deploy sub-phase follows: merge PR, wait for CI, deploy to production, verify health metrics, check error rates. Each step is logged.

The Monitor sub-phase runs post-deploy: watch error rates, check key user flows, verify expected behavior in production. Close the loop before moving on.
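"Watch error rates" needs a concrete comparison to be actionable. The sketch below assumes the check is a before/after comparison with a fixed tolerance; the function names and the one-percentage-point default are illustrative assumptions, not values from gstack.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def deploy_looks_healthy(
    baseline: float, current: float, max_increase: float = 0.01
) -> bool:
    """True if the error rate rose by no more than `max_increase`.

    The 0.01 (one percentage point) default is an arbitrary example value.
    """
    return current - baseline <= max_increase


before = error_rate(errors=5, requests=1_000)   # 0.005
after = error_rate(errors=30, requests=1_000)   # 0.03
print(deploy_looks_healthy(before, after))      # False
```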

Reflect

The Reflect phase does something rare in software development: it learns from what happened. The Engineering Manager role runs a retrospective with per-person (per-role) breakdowns. What did each specialist do well? What failed? What should change in the next run?

The Document sub-phase handles the detail work: updating all documentation to match what shipped. README, API docs, changelog, deployment guides. All current. All accurate.

The 18 Specialist Roles

The roles covered in this article, by phase:

| Phase    | Role             | Responsibility                                     |
|----------|------------------|----------------------------------------------------|
| Think    | YC Office Hours  | Problem reframing, design doc                      |
| Plan     | CEO              | 10-star product thinking, Expansion/Reduction      |
| Plan     | Eng Manager      | Architecture, data flow, edge cases, test strategy |
| Plan     | Senior Designer  | Design ratings (0–10), design decisions            |
| Build    | Engineer         | Implementation against locked plan                 |
| Review   | Staff Engineer   | Production-failure-mode bugs, auto-fix             |
| Test     | QA Lead          | Running app testing, atomic fixes                  |
| Test     | Performance Eng  | Core Web Vitals, bundle sizes, load times          |
| Ship     | Release Engineer | Sync, test, push, open PR                          |
| Deploy   | Release Engineer | Merge, CI, deploy, health check                    |
| Monitor  | SRE              | Post-deploy monitoring                             |
| Reflect  | Eng Manager      | Retrospective, per-role breakdowns                 |
| Document | Technical Writer | Docs updated to match what shipped                 |

The companion piece, "The 18 roles explained in detail," is worth reading if you want to understand each specialist's specific responsibilities and outputs.

The Completeness Principle

gstack operates under what we call the Completeness Principle: Always boil the lake, never the ocean.

The lake is your feature. The ocean is a complete rewrite. AI makes completeness cheap — so implement the full feature with all edge cases handled, all error states covered, all performance implications considered. That's the lake.

What AI can't do well is a multi-quarter architectural overhaul. That's the ocean. Attempting the ocean produces incomplete implementations that break in unexpected ways. Boiling the lake, doing one thing completely, produces shippable software.

This principle shapes every phase of gstack. The Think phase scopes the lake carefully. Plan locks it. Build implements it completely. Review ensures it's actually complete. Ship doesn't let incomplete things through.

Safety Tools

gstack ships with three safety tools. They are covered in full in a separate article, but briefly:

  • careful blocks dangerous commands (rm -rf, DROP TABLE, force-push) unless they are explicitly confirmed
  • freeze restricts the AI to edits in a single directory, preventing scope creep
  • guard combines both

These tools matter because AI under pressure makes mistakes. When a deadline is close and a bug is blocking, the temptation is to run a broad command that fixes the symptom and creates three new problems. Safety tools prevent that.
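To make the three tools concrete, here is a rough sketch of the checks they imply. This is not DenchClaw's implementation: the regular expressions, the function signatures, and the decision to reject any path containing `..` are all assumptions chosen to keep the example small.

```python
import re
from pathlib import PurePosixPath

# Patterns are illustrative, not DenchClaw's actual rule set.
DANGEROUS = [
    re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"),               # rm -rf, rm -fr
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),           # SQL table drop
    re.compile(r"\bgit\s+push\b.*(?:--force\b|\s-f\b)"),      # force-push
]


def careful(command: str, confirmed: bool = False) -> bool:
    """Allow a command unless it looks dangerous and was not confirmed."""
    dangerous = any(pattern.search(command) for pattern in DANGEROUS)
    return confirmed or not dangerous


def freeze(target: str, allowed_dir: str) -> bool:
    """Allow an edit only if `target` is inside `allowed_dir`."""
    target_path = PurePosixPath(target)
    if ".." in target_path.parts:
        return False  # Reject traversal instead of trying to resolve it.
    return target_path.is_relative_to(PurePosixPath(allowed_dir))


def guard(command: str, target: str, allowed_dir: str, confirmed: bool = False) -> bool:
    """Combine both checks, as the article describes for guard."""
    return careful(command, confirmed) and freeze(target, allowed_dir)
```

Pattern matching like this is a coarse filter, so a real tool would need a more careful parser. The sketch only shows where the two checks sit and how guard composes them.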

Why gstack Is Different from "Just Using AI"

Most teams "using AI for development" are doing one of two things:

  1. Using Copilot-style autocomplete to write code faster
  2. Chatting with an LLM about how to solve problems

Neither of these is a workflow. Neither produces consistently reliable software. Neither makes the quality of the output predictable.

gstack is a workflow. The inputs, outputs, and responsibilities at each phase are defined. A given task run through gstack produces a result with known properties: a design doc, a locked architecture, a reviewed implementation, QA-verified functionality, and updated documentation.
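Those "known properties" amount to a checklist of artifacts that either exist or don't. A minimal sketch, assuming each artifact can be tracked as a boolean; the `RunArtifacts` class and its field names are hypothetical and simply mirror the list in the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class RunArtifacts:
    """One flag per artifact the article says a gstack run produces."""
    design_doc: bool = False
    locked_architecture: bool = False
    reviewed_implementation: bool = False
    qa_verified: bool = False
    docs_updated: bool = False


def missing_artifacts(run: RunArtifacts) -> list[str]:
    """List the artifacts a run has not produced yet, in declaration order."""
    return [f.name for f in fields(run) if not getattr(run, f.name)]


run = RunArtifacts(design_doc=True, locked_architecture=True)
print(missing_artifacts(run))
# ['reviewed_implementation', 'qa_verified', 'docs_updated']
```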

That predictability is the point. Not AI magic — repeatable engineering excellence.

How to Get Started

gstack is built into DenchClaw as the gstack skill. To run a feature through the full workflow:

# Install DenchClaw
npx denchclaw
 
# Load the gstack skill and describe your feature
# DenchClaw will walk you through Think → Plan → Build → ...

The "What is DenchClaw" overview explains how gstack fits into the broader product.

For teams new to gstack, start with a small feature — something that takes a day to build manually. Run it through gstack. Observe where the workflow catches things you'd have missed. Then scale to larger features once you trust the process.

FAQ

Q: Is gstack only for teams using DenchClaw? No. gstack is adapted from an open-source workflow (garrytan/gstack, MIT licensed) and can be used with any AI coding environment. DenchClaw's implementation includes the safety tools and skill infrastructure that make it easier to run.

Q: How much does each gstack run cost in API tokens? A typical feature spanning Think through Ship consumes roughly 200–400K tokens across all specialist roles. At current API pricing, that's $0.20–2.00 per feature depending on complexity and model choice. The cost of not catching production bugs is higher.
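The arithmetic behind that range is worth spelling out. The sketch below assumes a single blended price per million tokens; the $1 and $5 figures are assumptions chosen because they reproduce the endpoints quoted above, and real pricing differs by model and between input and output tokens.

```python
def run_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of a run at a single blended per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Assumed blended prices, not quotes from any provider.
low = run_cost_usd(200_000, 1.00)
high = run_cost_usd(400_000, 5.00)
print(f"${low:.2f} to ${high:.2f}")  # $0.20 to $2.00
```

Plug in your own model's prices and measured token counts to get a figure for your setup.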

Q: Can I skip phases I think are unnecessary? You can, but gstack's value comes from the complete chain. Teams that skip Review or Test phases consistently ship more production incidents. The phases that feel like overhead are usually the ones preventing the most expensive problems.

Q: Does gstack work for bug fixes, not just new features? Yes. For small bugs, you might run only Think + Build + Review. For anything touching core logic or data, run the full chain.
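That rule of thumb can be written as a small selection function. This is only a sketch of the guidance in the answer above: `phases_for` and its two boolean inputs are hypothetical, and deciding what counts as "core logic or data" remains a human judgment.

```python
FULL_CHAIN = ["Think", "Plan", "Build", "Review", "Test", "Ship", "Reflect"]
SMALL_BUG_CHAIN = ["Think", "Build", "Review"]


def phases_for(is_bug_fix: bool, touches_core_logic_or_data: bool) -> list[str]:
    """Pick the phases to run, following the rule of thumb in the FAQ."""
    if is_bug_fix and not touches_core_logic_or_data:
        return list(SMALL_BUG_CHAIN)
    return list(FULL_CHAIN)


print(phases_for(is_bug_fix=True, touches_core_logic_or_data=False))
# ['Think', 'Build', 'Review']
```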

Q: How does gstack interact with existing CI/CD pipelines? The Ship phase integrates with your existing CI. It opens a PR that your normal CI pipeline processes. gstack doesn't replace CI — it makes the code that goes into CI dramatically better before it gets there.



© 2026 DenchHQ · San Francisco, CA