Orchestration fought back
We just made a film about Kit running many agents as one flow. This is the note underneath it: how that actually works, and how long it fought us before it did. Orchestration is the hardest thing we have built, and the road to it was not straight.
How the approach changed
The first version was an idea with a name: kitd, a small local daemon that would watch the brain and wake the right agent when work landed, so a human did not have to carry every message between agents by hand. That idea shipped, in late April, as kit-loom. It listens to the brain over Postgres notifications, matches a waiting subscription, spawns the right agent, and records every dispatch. It worked. Agents started waking each other.
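As a rough illustration of that listen, match, spawn, record loop, here is a minimal sketch. It is not kit-loom's code: the `Subscription` and `Dispatcher` names, the `topic` field, and the payload shape are all assumptions made for the example, and the notification is passed in directly instead of arriving over a Postgres connection.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subscription:
    agent: str  # hypothetical agent name
    topic: str  # hypothetical: the kind of work this agent is waiting for


@dataclass
class Dispatcher:
    subscriptions: list[Subscription]
    spawn: Callable[[str, dict], None]  # whatever actually starts an agent
    dispatch_log: list[dict] = field(default_factory=list)

    def on_notification(self, payload: dict) -> bool:
        """Handle one notification; return True if an agent was woken."""
        for sub in self.subscriptions:
            if sub.topic == payload.get("topic"):
                self.spawn(sub.agent, payload)
                # Record every dispatch so there is a trail of who was woken and why.
                self.dispatch_log.append({"agent": sub.agent, "payload": payload})
                return True
        return False


# Stand-in for spawning: just remember which agent would have been started.
spawned = []
dispatcher = Dispatcher(
    subscriptions=[Subscription(agent="tester", topic="build.finished")],
    spawn=lambda agent, payload: spawned.append(agent),
)
woke = dispatcher.on_notification({"topic": "build.finished", "branch": "fix-cart"})
ignored = dispatcher.on_notification({"topic": "unrelated"})
```

Even in a toy like this, the only record of the collaboration is the dispatch log: a flat list of events with no shape to it.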
And then it did not scale, not because it was slow, but because it was illegible. A collaboration made of spawn events is a thread you cannot read. You could see that agents had talked; you could not see the shape of the work. When something went wrong, you were reading a log, not watching a process.
So in May we stopped adding to the daemon and built a layer on top of it: a flow engine, and a place to watch it, Studio. A piece of work became a definition instead of a pile of events: a graph of steps, each with a role, with decision points and gates between them, and a run you could watch advance. We later renamed flows to playbooks in the interface, because that is what they are. The part that did not change is the important one: kit-loom is still underneath, still the thing that actually spawns each agent. The playbook decides what happens next; loom makes it happen.
A real playbook, end to end
Here is the one we run most: the coding workflow. Say you want a change to a website. Ours is a small, deliberately silly shop called Whiskerwool, and it is where we put Kit through real work.
It starts with a trigger. You ask for the change in Claude, or it arrives another way: a message in Telegram, or the nightly audit picking the single thing most worth fixing and firing the same playbook itself. However it starts, the run drafts its own brief and acceptance criteria from the request, so the rest of the pipeline has something concrete to work from.
Then it moves through four roles. A Designer works out what the change should be. An Architect ratifies the approach. A Builder makes it, on a branch, touching nothing live. A Tester checks it against the acceptance criteria. Each is a step, and each step names a role, not a person.
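To make "each step names a role, not a person" concrete, here is a small sketch of the coding playbook as data. The `Step` type, the step names, and the `advance` helper are assumptions for illustration. A real playbook is a graph with decision points and gates, which this linear list leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str  # hypothetical step identifier
    role: str  # a role, never a specific agent


# The four roles from the text, in the order the run moves through them.
CODING_PLAYBOOK = [
    Step("design", "Designer"),
    Step("ratify", "Architect"),
    Step("build", "Builder"),
    Step("test", "Tester"),
]


def advance(playbook: list[Step], completed: int) -> Optional[Step]:
    """Return the next step to run, or None once every step has completed."""
    if completed < len(playbook):
        return playbook[completed]
    return None


first = advance(CODING_PLAYBOOK, 0)
last = advance(CODING_PLAYBOOK, 3)
done = advance(CODING_PLAYBOOK, 4)
```

Because a step carries only a role, nothing in the definition has to change when a different agent takes that role.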
The agents are registered separately. Kit ships with a few: Forest, Kit running in Claude Code, which can cover any role; Taz, Kit running as Codex; and Deforest, a headless Claude Code that does nothing but test. You can register your own; an agent is a name, a surface it runs on, a model, and the roles it is allowed to take. When the playbook reaches a step, it resolves the role to an agent: Designer and Architect to Forest, Builder to Taz, Tester to Deforest. Change the assignment and the same playbook runs with a different cast.
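A registry like that can be sketched in a few lines. The agent names, surfaces, and role assignment below come from the text. The `Agent` type, the `resolve` function, the placeholder model names, and the assumption that Taz is registered for the Builder role only are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    surface: str
    model: str        # placeholder values below, not real model names
    roles: frozenset  # the roles this agent is allowed to take


ALL_ROLES = frozenset({"Designer", "Architect", "Builder", "Tester"})

REGISTRY = {
    "Forest": Agent("Forest", "Claude Code", "model-a", ALL_ROLES),
    # Assumption: the text only says Builder resolves to Taz.
    "Taz": Agent("Taz", "Codex", "model-b", frozenset({"Builder"})),
    "Deforest": Agent("Deforest", "headless Claude Code", "model-c", frozenset({"Tester"})),
}

ASSIGNMENT = {
    "Designer": "Forest",
    "Architect": "Forest",
    "Builder": "Taz",
    "Tester": "Deforest",
}


def resolve(role, assignment=ASSIGNMENT, registry=REGISTRY):
    """Resolve a role to an agent, refusing agents not allowed to take it."""
    agent = registry[assignment[role]]
    if role not in agent.roles:
        raise ValueError(f"{agent.name} is not allowed to take the {role} role")
    return agent


builder = resolve("Builder")
# Same playbook, different cast: hand the Builder role to Forest instead.
recast = resolve("Builder", assignment={**ASSIGNMENT, "Builder": "Forest"})
```

Keeping the allowed roles on the agent and the assignment in a separate mapping is one way to let a recast fail loudly when it asks an agent to do something it was never registered for.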
The part that took longest to get right is the handover. When the Builder finishes, the Tester does not get a copy-paste of what happened. The Builder's output (its summary, the changed files, the branch) is written into the shared brain, and the next agent wakes with it already in context, alongside the run's brief and the constraints that travel the whole way (this one is branch-only: no merging, no deploying). Context is handed over through the memory, not stuffed into a prompt. That is the whole reason Kit is the substrate underneath the orchestration, and not a separate tool bolted beside it.
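Here is a sketch of that handover, with an in-memory dictionary standing in for the shared brain. The `Brain` class and its method names are hypothetical, and the brief, summary, file, and branch values are invented sample data. The point is only that the Builder writes and the Tester reads, with nothing passed between them directly.

```python
from dataclasses import dataclass, field


@dataclass
class Brain:
    """In-memory stand-in for the shared brain (hypothetical interface)."""
    runs: dict = field(default_factory=dict)

    def start_run(self, run_id, brief, constraints):
        # The brief and constraints are stored once and travel with the run.
        self.runs[run_id] = {
            "brief": brief,
            "constraints": list(constraints),
            "outputs": [],
        }

    def write_output(self, run_id, role, output):
        self.runs[run_id]["outputs"].append({"role": role, **output})

    def context_for(self, run_id):
        """What the next agent wakes up with: brief, constraints, last output."""
        run = self.runs[run_id]
        return {
            "brief": run["brief"],
            "constraints": list(run["constraints"]),
            "previous": run["outputs"][-1] if run["outputs"] else None,
        }


brain = Brain()
brain.start_run("run-1", brief="Fix the cart total", constraints=["branch-only"])
brain.write_output("run-1", "Builder", {
    "summary": "Recomputed the total after discounts",
    "changed_files": ["cart.py"],
    "branch": "fix-cart-total",
})
tester_context = brain.context_for("run-1")
```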
At the end is a gate that is not an agent. The build passed its tests on a branch, and now it waits for you. A message arrives with what changed, whether tests passed, and the files touched, and it offers three choices: approve, and it merges and ships; decline, and it stays on the branch; or refine, and it goes back to the Builder with your note to try again. The gate is a step you put in the playbook on purpose, at the one moment that is yours to decide.
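The gate's three outcomes map naturally onto a small enum. This is a sketch under assumptions: the `Decision` type, the `apply_decision` function, and the shape of the returned dictionary are invented for illustration and are not Kit's interface.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DECLINE = "decline"
    REFINE = "refine"


def apply_decision(decision: Decision, note: str = "") -> dict:
    """Translate the human's choice at the gate into what the run does next."""
    if decision is Decision.APPROVE:
        return {"next": "merge_and_ship"}
    if decision is Decision.DECLINE:
        return {"next": "stay_on_branch"}
    if decision is Decision.REFINE:
        # The note travels back to the Builder with the retry.
        return {"next": "Builder", "note": note}
    raise ValueError(f"unknown decision: {decision!r}")


approved = apply_decision(Decision.APPROVE)
refined = apply_decision(Decision.REFINE, note="Use the darker yarn colour")
```

Modelling the gate as a step with a closed set of outcomes keeps the human decision inside the run, so it is recorded the same way as every other step.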
The four pieces
That is the whole shape, and it is four things. A way to compose the flow. A registry, so the right agent on the right model takes each step. One memory, so context is handed on instead of starting cold. And visibility with a gate, so you watch it run and decide only where you are truly needed.
We listed those four in the film as if they were obvious. They were not. Each one fought back, and the calm of pressing play is made of the months it took to make them quiet.
That is the work.