What we learned shipping durable agents
Why we moved off naive streaming chat and onto Vercel's Workflow SDK + Convex. The pain points, the fixes, and what we'd do differently.
The Dench Team
·10 min read
Lessons from shipping durable agents
In v1, every Dench chat turn was a single streaming request. If you refreshed the page mid-stream, you lost the run.
This worked for demos and broke in production. Customers ran 15-minute autonomous turns, their laptops went to sleep mid-turn, and the chat reconnected to find nothing.
We migrated to Vercel's Workflow SDK on top of Convex. Lessons:
- Durable means resumable, not just persistent. Saving every tool result isn't enough; clients also need a way to reconnect to the live stream of a run that is still in progress.
- Approvals are inherently durable. A pending approval can sit for hours or days, so build approvals as first-class workflow steps with sleep/wake semantics.
- Cost compounds invisibly. Without per-run budgets, agents that loop on a bad tool spend $40 before anyone notices. We added per-org caps and per-run prepaid credits.
- Observability over chat. When a run goes sideways, the chat UI tells you nothing useful. We built a runEvents timeline modeled on the trace view of an APM tool.
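
To make the first, third, and fourth lessons concrete, here is a minimal in-memory sketch of an append-only run log that clients resume from by cursor and that stops recording spend once a per-run cap is hit. It is an illustration under stated assumptions, not our implementation: `RunLog`, `RunEvent`, `readAfter`, and the cents-based budget are hypothetical names, and nothing here uses the Workflow SDK or Convex APIs. A real system would persist the events durably rather than hold them in an array.

```typescript
// Hypothetical in-memory sketch. None of these names come from the
// Workflow SDK or Convex; a real system would persist events durably.

type EventKind = "tool_call" | "tool_result" | "budget_exceeded";

interface RunEvent {
  seq: number;       // 1-based position in the run's append-only log
  kind: EventKind;
  costCents: number; // spend attributed to this event
  detail: string;
}

class RunLog {
  private events: RunEvent[] = [];
  private budgetCents: number;
  spentCents = 0;

  constructor(budgetCents: number) {
    this.budgetCents = budgetCents;
  }

  // Record an event, or record a budget_exceeded marker instead if the
  // event's cost would push the run past its cap.
  append(kind: EventKind, detail: string, costCents = 0): RunEvent {
    const overBudget = this.spentCents + costCents > this.budgetCents;
    const event: RunEvent = overBudget
      ? { seq: this.events.length + 1, kind: "budget_exceeded", costCents: 0, detail }
      : { seq: this.events.length + 1, kind, costCents, detail };
    this.spentCents += event.costCents;
    this.events.push(event);
    return event;
  }

  // Everything a client missed since the last seq it saw (0 = from the start).
  readAfter(cursor: number): RunEvent[] {
    return this.events.filter((e) => e.seq > cursor);
  }
}

// Usage: a client sees two events, disconnects, then catches up by cursor.
const log = new RunLog(100); // cap of 100 cents for this run
log.append("tool_call", "search", 30);
log.append("tool_result", "3 hits");
const lastSeen = 2;
log.append("tool_call", "search again", 30);
const blocked = log.append("tool_call", "expensive call", 50); // 60 + 50 > 100

console.log(log.readAfter(lastSeen).map((e) => `${e.seq}:${e.kind}`));
console.log(blocked.kind, log.spentCents);
```

The same log serves both purposes: a reconnecting client replays from its cursor, and the ordered events double as the timeline you inspect when a run misbehaves. Amounts are integer cents to avoid floating-point drift when summing costs.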
If we were starting from scratch, we'd build the durable runtime first and the chat UI second.