gstack 18 Roles Explained: Your Virtual Engineering Team
gstack's 18 specialist roles give every feature a full virtual engineering team — from product thinking to post-deploy monitoring. Here's what each role does.
gstack gives every feature a full virtual engineering team: a product thinker, an engineering manager, a senior designer, an engineer, a staff reviewer, QA, a performance engineer, a release engineer, an SRE, and a technical writer — all working in sequence, with locked handoffs. These 18 roles cover the complete software development lifecycle, from problem framing to post-deploy documentation.
Here's what each role does, why it matters, and what it produces.
Why Roles Matter#
Before diving into the roster, it's worth understanding why gstack uses specialist roles rather than a single "AI assistant" that handles everything.
The problem with general-purpose AI development assistance is context collapse. When one entity is simultaneously the product thinker, the implementer, the reviewer, and the QA engineer, it has an incentive to find its own implementation correct. It lacks the adversarial perspective that makes each role valuable.
A staff engineer reviewing code doesn't assume the implementer made good decisions — she looks for the ways she's seen good decisions go wrong in production. A QA lead testing a feature doesn't assume the engineer wrote correct behavior — she tries to break it. A performance engineer doesn't assume the implementation is fast — she measures it.
Role separation creates the right adversarial tension at each phase. The result is better software.
The 18 Roles#
1. Think: YC Office Hours#
Phase: Think
Inspired by: Y Combinator office hours — the skeptical mentor asking hard questions
What it does: The YC Office Hours role reframes the problem before any code is written. It asks: What are you actually trying to accomplish? Is this feature request the right solution to the underlying problem, or is it a workaround? What would make this feature unnecessary?
It generates a design document that becomes the shared artifact for the entire build. The design doc includes:
- Problem statement (reframed if necessary)
- Success criteria
- Out-of-scope explicitly listed
- Assumptions worth challenging
- Risks worth flagging
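As an illustration, the design doc can be thought of as a typed artifact that downstream roles consume. This is a sketch only; the field names and example values are assumptions, not gstack's actual schema:

```typescript
// Hypothetical shape of the Think phase's design-doc artifact.
// Field names and values are illustrative, not gstack's real schema.
interface DesignDoc {
  problemStatement: string; // reframed if necessary
  successCriteria: string[];
  outOfScope: string[];     // explicit non-goals
  assumptions: string[];    // worth challenging before Build
  risks: string[];          // worth flagging early
}

const doc: DesignDoc = {
  problemStatement: "Users abandon checkout when shipping costs appear late.",
  successCriteria: ["Shipping cost is visible before the payment step"],
  outOfScope: ["Redesigning the payment-provider integration"],
  assumptions: ["Shipping rates can be computed from cart contents alone"],
  risks: ["Rate-API latency could slow the cart page"],
};
```

Treating the doc as a fixed shape, rather than freeform notes, is what lets later roles check their work against it.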
Why it matters: Most software bugs are requirements bugs. The feature was built correctly, but the feature itself was wrong. The Think role catches these before they become expensive.
2. Plan CEO#
Phase: Plan
Inspired by: Founder product thinking, Airbnb's 10-star experience framework
What it does: Plan CEO evaluates the design document through a product lens. It operates in two explicit modes:
- Expansion mode: Generate every possible way to solve this problem well. No constraints.
- Reduction mode: Eliminate everything except the essential, highest-leverage solution.
It applies "10-star thinking": what would the most delightful version of this feature look like? From there, it works backward to what's actually shippable in the current sprint.
The output is a product vision document — what the feature should feel like, what success looks like for the user, and what trade-offs were consciously made.
3. Plan Eng#
Phase: Plan
Inspired by: Engineering Manager with technical depth
What it does: Plan Eng takes the product vision and translates it into a locked technical architecture. Locked means it doesn't change during Build. If the Build phase discovers something the architecture missed, it stops and flags — not improvises.
The architecture document includes:
- Data model and schema
- API contracts (inputs, outputs, error states)
- Data flow diagrams
- Edge cases with explicit handling decisions
- Test strategy (what needs unit tests vs. integration vs. E2E)
- Performance considerations and acceptable bounds
This is what software teams skip under deadline pressure. gstack doesn't let you skip it.
4. Plan Design#
Phase: Plan
Inspired by: Senior Designer running a design critique
What it does: Plan Design evaluates every visual and interaction design dimension of the feature, rating each on a 0–10 scale. It then edits the plan until every dimension reaches 10.
Design dimensions it evaluates:
- Information hierarchy
- Interaction flow
- Error state handling
- Loading and empty states
- Accessibility
- Consistency with existing design system
- Mobile responsiveness
Why 0–10 per dimension instead of a general review? Numeric ratings prevent the "it looks fine" trap. Anything below an 8 is a concrete problem that needs a concrete fix. Plan Design doesn't move on until every dimension is resolved.
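The gate itself is simple to picture. In this sketch the dimension names mirror the list above, and the scores are made up for illustration:

```typescript
// Sketch of the "every dimension reaches 10" gate.
// Scores here are invented for illustration.
const scores: Record<string, number> = {
  informationHierarchy: 10,
  interactionFlow: 9,
  errorStates: 10,
  loadingAndEmptyStates: 10,
  accessibility: 8,
  designSystemConsistency: 10,
  mobileResponsiveness: 10,
};

// Plan Design keeps editing the plan while any dimension scores below 10.
const unresolved = Object.entries(scores)
  .filter(([, score]) => score < 10)
  .map(([name]) => name);

console.log(unresolved); // [ 'interactionFlow', 'accessibility' ]
```

The loop ends only when `unresolved` is empty, which is exactly the "doesn't move on until every dimension is resolved" rule.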
5. Build: Engineer#
Phase: Build
Inspired by: Senior engineer implementing against a spec
What it does: The Build role implements the feature against the locked plan. It does not deviate from the architecture without flagging. It implements completely — all edge cases, all error states, all the things the plan said to handle.
It writes code, tests, and initial documentation simultaneously, not as separate steps.
What makes this different from normal AI coding: The locked plan is the constraint. Without it, AI coding tools make architectural decisions on the fly and those decisions are often inconsistent with each other. With a locked plan, every implementation decision has a reference to check against.
6. Review: Staff Engineer#
Phase: Review
Inspired by: Senior individual contributor with production experience
What it does: The Review role looks specifically for bugs that pass CI but blow up in production:
- Race conditions and state management issues
- Missing error handling in async flows
- N+1 database queries that work in dev but kill prod
- Security vulnerabilities: injection, auth bypass, privilege escalation
- Incorrect assumptions about data shapes from external APIs
- Missing input validation
- Memory leaks
When it finds something, it auto-fixes it. No bug ticket, no comment-and-move-on. Find it, fix it, explain the change.
This role is modeled on the kind of review you'd get from a staff engineer who's been paged at 3am because of exactly these kinds of bugs.
7. Test QA: QA Lead#
Phase: Test
Inspired by: QA Lead who treats assumptions as hypotheses
What it does: Test QA tests the running application, not the code. It clicks through user flows, tries edge cases, attempts to trigger error states, tests boundary conditions.
When it finds bugs:
- It logs them
- It fixes them with atomic commits
- It re-verifies the fix didn't break anything else
The distinction from Review: Review reads code and finds theoretical problems. QA finds actual problems in working software.
8. Test Bench: Performance Engineer#
Phase: Test
Inspired by: Performance engineer with a focus on user-perceived speed
What it does: Test Bench measures:
- Core Web Vitals: LCP, FID/INP, CLS
- Bundle sizes: JavaScript, CSS, images
- Page load times under realistic network conditions
- Database query performance under expected load
- API response times at P50, P95, P99
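P50/P95/P99 summarize a distribution of response times: the median, and the two tail cutoffs where slow requests live. A minimal sketch of computing them from raw samples, using the nearest-rank method:

```typescript
// Minimal nearest-rank percentile over response-time samples (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Index of the smallest value at or above the p-th percentile rank.
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

const latencies = [12, 15, 14, 13, 110, 16, 15, 14, 230, 13];
console.log(percentile(latencies, 50)); // 14  — typical request
console.log(percentile(latencies, 95)); // 230 — tail latency
console.log(percentile(latencies, 99)); // 230 — worst-case tail
```

The point of measuring the tails is visible even in this toy data: the median looks healthy while a small fraction of requests are an order of magnitude slower.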
Performance problems caught in Test Bench cost 30–60 minutes to fix. The same problems caught after launch cost days, a production incident, and usually a regression in user trust.
9. Ship: Release Engineer#
Phase: Ship
Inspired by: Release engineer running a deployment checklist
What it does: Ship runs a specific, invariant checklist before code leaves the dev environment:
- Sync with main branch
- Resolve any conflicts
- Run the full test suite
- Audit test coverage (flag anything below threshold)
- Check for TODOs or debug code
- Verify environment configs are correct
- Push branch
- Open PR with full description
Nothing optional. Nothing "we'll do it later." If any step fails, Ship halts and reports. The gstack Ship workflow guide covers this in depth.
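The halt-and-report behavior is the essence of the checklist. A sketch of that control flow (the step names echo the list above; the runner itself is illustrative, not gstack's CLI):

```typescript
// Sketch of a halt-on-first-failure checklist runner.
// Step names echo the Ship checklist; the runner is illustrative only.
type Step = { name: string; run: () => boolean };

const checklist: Step[] = [
  { name: "sync with main", run: () => true },
  { name: "run full test suite", run: () => true },
  { name: "audit coverage threshold", run: () => true },
  { name: "check for TODOs / debug code", run: () => true },
];

function ship(steps: Step[]): string {
  for (const step of steps) {
    // Nothing optional: the first failing step stops the whole run.
    if (!step.run()) return `halted: ${step.name} failed`;
  }
  return "ready to push";
}

console.log(ship(checklist)); // ready to push
```

There is no "skip this step" branch, which is the whole design: a checklist with optional steps is a suggestion, not a gate.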
10. Deploy: Release Engineer#
Phase: Ship
Inspired by: SRE managing production deployments
What it does:
- Merge the PR after CI passes
- Trigger the deployment pipeline
- Wait for deployment to complete
- Verify production health metrics (error rates, latency, uptime)
- Run smoke tests against production
- Confirm key user flows work in prod
This is the "don't celebrate until the smoke tests pass" role.
11. Monitor: SRE#
Phase: Ship
Inspired by: Site Reliability Engineer on post-deploy watch
What it does: Post-deploy monitoring for 15–30 minutes after deployment:
- Watch error rate dashboards
- Check database connection pool usage
- Monitor memory and CPU
- Verify caching behavior
- Alert on any unexpected patterns
Most post-deploy incidents are visible within 20 minutes. Monitor catches them before users report them.
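One simple check in that watch window is comparing the post-deploy error rate against a threshold. A sketch, with the threshold and sample statuses invented for illustration:

```typescript
// Sketch of a post-deploy error-rate check over a rolling window.
// The 5% threshold and the sample statuses are illustrative.
function errorRate(statuses: number[]): number {
  if (statuses.length === 0) return 0;
  return statuses.filter((s) => s >= 500).length / statuses.length;
}

const recent = [200, 200, 500, 200, 200, 200, 503, 200, 200, 200];
const rate = errorRate(recent); // 0.2
console.log(rate > 0.05 ? "alert" : "healthy"); // alert
```

In practice the baseline would come from pre-deploy metrics rather than a fixed constant, so a deploy is judged against its own service's normal behavior.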
12. Reflect: Eng Manager#
Phase: Reflect
Inspired by: Engineering manager running a sprint retrospective
What it does: The Reflect role runs a retrospective on the completed build:
- What worked well in each phase?
- Where did the process break down?
- What took longer than expected and why?
- What would we do differently next time?
It generates per-role breakdowns — reviewing not just the code outcome but the process quality at each step. This is how gstack improves over time.
13. Document: Technical Writer#
Phase: Reflect
Inspired by: Technical writer who maintains living documentation
What it does: Updates all documentation to match what shipped:
- README
- API reference
- Changelog
- Architecture docs
- Deployment runbook
- Developer guides
Updating documentation to match what shipped is a forcing function: if the plan and the implementation diverged, Document will catch it.
How the Roles Coordinate#
The gstack overview explains the full coordination model. In brief: each role produces a locked artifact that the next role consumes. Design doc → architecture → implementation → review findings → QA results → deployment log → retrospective.
The chain is what makes it work. Skip one link and the chain breaks.
The Completeness Principle ensures no role ships partial work forward: each phase is complete before the next begins.
FAQ#
Q: Do I need all 18 roles for every task? No. Small bug fixes might only use Think + Build + Review. Database migrations might skip Design. Use judgment about which roles add value for your specific task, but err toward the full chain for anything that could affect production data or core flows.
Q: Can one AI session handle all 18 roles? Technically yes, but not recommended. Role separation matters because it creates adversarial perspectives. A single context handling all roles tends to be self-consistent in ways that miss real problems.
Q: How long does a full 18-role gstack run take? For a typical medium-complexity feature: 2–4 hours of AI processing, with human review points at the end of Think, Plan, and Ship. Active human time: 45–90 minutes.
Q: Can I add custom roles? Yes. gstack's role definitions are files you can edit. Teams often add domain-specific roles: a Security Reviewer for auth-related changes, a Data Reviewer for schema migrations, a Localization Reviewer for internationalized features.
Q: What's the most commonly skipped role in practice? Test Bench (performance) and Document (documentation). Both are consistently described as "nice to have" under deadline pressure and consistently deliver the highest-ROI catches when actually run.
Ready to try gstack? Full setup guide →
