gstack 18 Roles Explained: Your Virtual Engineering Team
gstack's 18 specialist roles give every feature a full virtual engineering team — from product thinking to post-deploy monitoring. Here's what each role does.
gstack gives every feature a full virtual engineering team: a product thinker, an engineering manager, a senior designer, an engineer, a staff reviewer, QA, a performance engineer, a release engineer, an SRE, and a technical writer — all working in sequence, with locked handoffs. These 18 roles cover the complete software development lifecycle, from problem framing to post-deploy documentation.
Here's what each role does, why it matters, and what it produces.
Why Roles Matter#
Before diving into the roster, it's worth understanding why gstack uses specialist roles rather than a single "AI assistant" that handles everything.
The problem with general-purpose AI development assistance is context collapse. When one entity is simultaneously the product thinker, the implementer, the reviewer, and the QA engineer, it has an incentive to find its own implementation correct. It lacks the adversarial perspective that makes each role valuable.
A staff engineer reviewing code doesn't assume the implementer made good decisions — she looks for the ways she's seen good decisions go wrong in production. A QA lead testing a feature doesn't assume the engineer wrote correct behavior — she tries to break it. A performance engineer doesn't assume the implementation is fast — she measures it.
Role separation creates the right adversarial tension at each phase. The result is better software.
The 18 Roles#
1. Think: YC Office Hours#
Phase: Think
Inspired by: Y Combinator office hours — the skeptical mentor asking hard questions
What it does: The YC Office Hours role reframes the problem before any code is written. It asks: What are you actually trying to accomplish? Is this feature request the right solution to the underlying problem, or is it a workaround? What would make this feature unnecessary?
It generates a design document that becomes the shared artifact for the entire build. The design doc includes:
- Problem statement (reframed if necessary)
- Success criteria
- Out-of-scope explicitly listed
- Assumptions worth challenging
- Risks worth flagging
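As an illustration, the design doc can be thought of as a typed artifact that downstream roles consume. This is a sketch only; the field names and example values are assumptions, not gstack's actual schema:

```typescript
// Hypothetical shape of the Think phase's design-doc artifact.
// Field names and values are illustrative, not gstack's real schema.
interface DesignDoc {
  problemStatement: string; // reframed if necessary
  successCriteria: string[];
  outOfScope: string[];     // explicit non-goals
  assumptions: string[];    // worth challenging before Build
  risks: string[];          // worth flagging early
}

const doc: DesignDoc = {
  problemStatement: "Users abandon checkout when shipping costs appear late.",
  successCriteria: ["Shipping cost is visible before the payment step"],
  outOfScope: ["Redesigning the payment-provider integration"],
  assumptions: ["Shipping rates can be computed from cart contents alone"],
  risks: ["Rate-API latency could slow the cart page"],
};
```

Treating the doc as a fixed shape, rather than freeform notes, is what lets later roles check their work against it.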
Why it matters: Most software bugs are requirements bugs. The feature was built correctly, but the feature itself was wrong. The Think role catches these before they become expensive.
2. Plan CEO#
Phase: Plan
Inspired by: Founder product thinking, Airbnb's 10-star experience framework
What it does: Plan CEO evaluates the design document through a product lens. It operates in two explicit modes:
- Expansion mode: Generate every possible way to solve this problem well. No constraints.
- Reduction mode: Eliminate everything except the essential, highest-leverage solution.
It applies "10-star thinking": what would the most delightful version of this feature look like? From there, it works backward to what's actually shippable in the current sprint.
The output is a product vision document — what the feature should feel like, what success looks like for the user, and what trade-offs were consciously made.
3. Plan Eng#
Phase: Plan
Inspired by: Engineering Manager with technical depth
What it does: Plan Eng takes the product vision and translates it into a locked technical architecture. Locked means it doesn't change during Build. If the Build phase discovers something the architecture missed, it stops and flags — not improvises.
The architecture document includes:
- Data model and schema
- API contracts (inputs, outputs, error states)
- Data flow diagrams
- Edge cases with explicit handling decisions
- Test strategy (what needs unit tests vs. integration vs. E2E)
- Performance considerations and acceptable bounds
This is what software teams skip under deadline pressure. gstack doesn't let you skip it.
4. Plan Design#
Phase: Plan
Inspired by: Senior Designer running a design critique
What it does: Plan Design evaluates every visual and interaction design dimension of the feature, rating each on a 0–10 scale. It then edits the plan until every dimension reaches 10.
Design dimensions it evaluates:
- Information hierarchy
- Interaction flow
- Error state handling
- Loading and empty states
- Accessibility
- Consistency with existing design system
- Mobile responsiveness
Why 0–10 per dimension instead of a general review? Numeric ratings prevent the "it looks fine" trap. Anything below an 8 is a concrete problem that needs a concrete fix. Plan Design doesn't move on until every dimension is resolved.
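The gate itself is simple to picture. In this sketch the dimension names mirror the list above, and the scores are made up for illustration:

```typescript
// Sketch of the "every dimension reaches 10" gate.
// Scores here are invented for illustration.
const scores: Record<string, number> = {
  informationHierarchy: 10,
  interactionFlow: 9,
  errorStates: 10,
  loadingAndEmptyStates: 10,
  accessibility: 8,
  designSystemConsistency: 10,
  mobileResponsiveness: 10,
};

// Plan Design keeps editing the plan while any dimension scores below 10.
const unresolved = Object.entries(scores)
  .filter(([, score]) => score < 10)
  .map(([name]) => name);

console.log(unresolved); // [ 'interactionFlow', 'accessibility' ]
```

The loop ends only when `unresolved` is empty, which is exactly the "doesn't move on until every dimension is resolved" rule.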
5. Build: Engineer#
Phase: Build
Inspired by: Senior engineer implementing against a spec
What it does: The Build role implements the feature against the locked plan. It does not deviate from the architecture without flagging. It implements completely — all edge cases, all error states, all the things the plan said to handle.
It writes code, tests, and initial documentation simultaneously, not as separate steps.
What makes this different from normal AI coding: The locked plan is the constraint. Without it, AI coding tools make architectural decisions on the fly and those decisions are often inconsistent with each other. With a locked plan, every implementation decision has a reference to check against.
6. Review: Staff Engineer#
Phase: Review
Inspired by: Senior individual contributor with production experience
What it does: The Review role looks specifically for bugs that pass CI but blow up in production:
- Race conditions and state management issues
- Missing error handling in async flows
- N+1 database queries that work in dev but kill prod
- Security vulnerabilities: injection, auth bypass, privilege escalation
- Incorrect assumptions about data shapes from external APIs
- Missing input validation
- Memory leaks
When it finds something, it auto-fixes it. No bug ticket, no comment-and-move-on. Find it, fix it, explain the change.
This role is modeled on the kind of review you'd get from a staff engineer who's been paged at 3am because of exactly these kinds of bugs.
7. Test QA: QA Lead#
Phase: Test
Inspired by: QA Lead who treats assumptions as hypotheses
What it does: Test QA tests the running application, not the code. It clicks through user flows, tries edge cases, attempts to trigger error states, tests boundary conditions.
When it finds bugs:
- It logs them
- It fixes them with atomic commits
- It re-verifies the fix didn't break anything else
The distinction from Review: Review reads code and finds theoretical problems. QA finds actual problems in working software.
8. Test Bench: Performance Engineer#
Phase: Test
Inspired by: Performance engineer with a focus on user-perceived speed
What it does: Test Bench measures:
- Core Web Vitals: LCP, FID/INP, CLS
- Bundle sizes: JavaScript, CSS, images
- Page load times under realistic network conditions
- Database query performance under expected load
- API response times at P50, P95, P99
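P50/P95/P99 summarize a distribution of response times: the median, and the two tail cutoffs where slow requests live. A minimal sketch of computing them from raw samples, using the nearest-rank method:

```typescript
// Minimal nearest-rank percentile over response-time samples (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Index of the smallest value at or above the p-th percentile rank.
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

const latencies = [12, 15, 14, 13, 110, 16, 15, 14, 230, 13];
console.log(percentile(latencies, 50)); // 14  — typical request
console.log(percentile(latencies, 95)); // 230 — tail latency
console.log(percentile(latencies, 99)); // 230 — worst-case tail
```

The point of measuring the tails is visible even in this toy data: the median looks healthy while a small fraction of requests are an order of magnitude slower.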
Performance problems caught in Test Bench cost 30–60 minutes to fix. The same problems caught after launch cost days, a production incident, and usually a regression in user trust.
9. Ship: Release Engineer#
Phase: Ship
Inspired by: Release engineer running a deployment checklist
What it does: Ship runs a specific, invariant checklist before code leaves the dev environment:
- Sync with main branch
- Resolve any conflicts
- Run the full test suite
- Audit test coverage (flag anything below threshold)
- Check for TODOs or debug code
- Verify environment configs are correct
- Push branch
- Open PR with full description
Nothing optional. Nothing "we'll do it later." If any step fails, Ship halts and reports. The gstack Ship workflow guide covers this in depth.
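The halt-and-report behavior is the essence of the checklist. A sketch of that control flow (the step names echo the list above; the runner itself is illustrative, not gstack's CLI):

```typescript
// Sketch of a halt-on-first-failure checklist runner.
// Step names echo the Ship checklist; the runner is illustrative only.
type Step = { name: string; run: () => boolean };

const checklist: Step[] = [
  { name: "sync with main", run: () => true },
  { name: "run full test suite", run: () => true },
  { name: "audit coverage threshold", run: () => true },
  { name: "check for TODOs / debug code", run: () => true },
];

function ship(steps: Step[]): string {
  for (const step of steps) {
    // Nothing optional: the first failing step stops the whole run.
    if (!step.run()) return `halted: ${step.name} failed`;
  }
  return "ready to push";
}

console.log(ship(checklist)); // ready to push
```

There is no "skip this step" branch, which is the whole design: a checklist with optional steps is a suggestion, not a gate.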
10. Deploy: Release Engineer#
Phase: Ship
Inspired by: SRE managing production deployments
What it does:
- Merge the PR after CI passes
- Trigger the deployment pipeline
- Wait for deployment to complete
- Verify production health metrics (error rates, latency, uptime)
- Run smoke tests against production
- Confirm key user flows work in prod
This is the "don't celebrate until the smoke tests pass" role.
11. Monitor: SRE#
Phase: Ship
Inspired by: Site Reliability Engineer on post-deploy watch
What it does: Post-deploy monitoring for 15–30 minutes after deployment:
- Watch error rate dashboards
- Check database connection pool usage
- Monitor memory and CPU
- Verify caching behavior
- Alert on any unexpected patterns
Most post-deploy incidents are visible within 20 minutes. Monitor catches them before users report them.
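One simple check in that watch window is comparing the post-deploy error rate against a threshold. A sketch, with the threshold and sample statuses invented for illustration:

```typescript
// Sketch of a post-deploy error-rate check over a rolling window.
// The 5% threshold and the sample statuses are illustrative.
function errorRate(statuses: number[]): number {
  if (statuses.length === 0) return 0;
  return statuses.filter((s) => s >= 500).length / statuses.length;
}

const recent = [200, 200, 500, 200, 200, 200, 503, 200, 200, 200];
const rate = errorRate(recent); // 0.2
console.log(rate > 0.05 ? "alert" : "healthy"); // alert
```

In practice the baseline would come from pre-deploy metrics rather than a fixed constant, so a deploy is judged against its own service's normal behavior.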
12. Reflect: Eng Manager#
Phase: Reflect
Inspired by: Engineering manager running a sprint retrospective
What it does: The Reflect role runs a retrospective on the completed build:
- What worked well in each phase?
- Where did the process break down?
- What took longer than expected and why?
- What would we do differently next time?
It generates per-role breakdowns — reviewing not just the code outcome but the process quality at each step. This is how gstack improves over time.
13. Document: Technical Writer#
Phase: Reflect
Inspired by: Technical writer who maintains living documentation
What it does: Updates all documentation to match what shipped:
- README
- API reference
- Changelog
- Architecture docs
- Deployment runbook
- Developer guides
Updating documentation to match what shipped is a forcing function: if the plan and the implementation diverged, Document will catch it.
How the Roles Coordinate#
The gstack overview explains the full coordination model. In brief: each role produces a locked artifact that the next role consumes. Design doc → architecture → implementation → review findings → QA results → deployment log → retrospective.
The chain is what makes it work. Skip one link and the chain breaks.
The Completeness Principle ensures no role ships partial work forward: each phase is complete before the next begins.
FAQ#
Q: Do I need all 18 roles for every task? No. Small bug fixes might only use Think + Build + Review. Database migrations might skip Design. Use judgment about which roles add value for your specific task, but err toward the full chain for anything that could affect production data or core flows.
Q: Can one AI session handle all 18 roles? Technically yes, but not recommended. Role separation matters because it creates adversarial perspectives. A single context handling all roles tends to be self-consistent in ways that miss real problems.
Q: How long does a full 18-role gstack run take? For a typical medium-complexity feature: 2–4 hours of AI processing, with human review points at the end of Think, Plan, and Ship. Active human time: 45–90 minutes.
Q: Can I add custom roles? Yes. gstack's role definitions are files you can edit. Teams often add domain-specific roles: a Security Reviewer for auth-related changes, a Data Reviewer for schema migrations, a Localization Reviewer for internationalized features.
Q: What's the most commonly skipped role in practice? Test Bench (performance) and Document (documentation). Both are consistently described as "nice to have" under deadline pressure and consistently deliver the highest-ROI catches when actually run.
Ready to try gstack? Full setup guide →
